Tesla wants to kill "brake failure".

2025-03-26 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

Recently, Ashok Elluswamy, Tesla's director of Autopilot software, gave a talk at CVPR 2022 introducing many of the achievements Tesla's Autopilot team has made over the past year, especially a neural network model called Occupancy Networks (hereinafter, the occupancy network).

He said that the traditional semantic-segmentation and depth-estimation methods used in autonomous-driving systems have many problems, such as the difficulty of lifting 2D predictions into 3D and inaccurate depth estimates.

With the occupancy network, the model can predict the space occupied by objects around the vehicle, including the space a dynamic object's next action will sweep through.
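
Conceptually, the occupancy representation reduces the planner's question to a type-agnostic lookup over a 3D volume. A minimal sketch of the idea (our toy illustration, not Tesla's implementation; the grid size and API are invented):

```python
# Toy occupancy grid (our illustration, not Tesla's implementation):
# a 3D boolean volume around the vehicle. The planner only ever asks
# "is this cell occupied?", never "what kind of object is this?".
GRID = 4  # 4 x 4 x 4 toy grid; real grids are far denser

def make_grid():
    """All cells start free (False = unoccupied)."""
    return [[[False] * GRID for _ in range(GRID)] for _ in range(GRID)]

def mark_occupied(grid, x, y, z):
    """Perception writes: some object occupies this cell (type unknown)."""
    grid[x][y][z] = True

def is_free(grid, x, y, z):
    """Planner reads: may the vehicle move into this cell?"""
    return not grid[x][y][z]

grid = make_grid()
mark_occupied(grid, 1, 2, 0)   # obstacle of unknown type
print(is_free(grid, 1, 2, 0))  # False -> must be avoided
print(is_free(grid, 0, 0, 0))  # True  -> driveable
```

The planner never needs to know what occupies a cell, only that it is occupied, which is exactly why unusual obstacles pose no special problem.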

Based on this, the vehicle can dodge obstacles without identifying exactly what they are. Ashok Elluswamy even joked on Twitter that a Tesla could avoid a UFO!

With the same technology, the vehicle can also tell whether there are obstacles around blind corners, enabling unprotected turns the way a human driver makes them.

In short, the occupancy network has significantly enhanced Tesla's Level 2 (L2) driver-assistance capability.

It is said that Tesla's Autopilot system prevents about 40 accidents caused by driver error every day!

In addition, Ashok Elluswamy highlighted the Autopilot team's efforts to prevent driver misoperation.

By perceiving both the external environment and the driver's inputs, the vehicle can recognize misoperation, such as pressing the accelerator pedal at the wrong time; in that case it stops accelerating and brakes automatically.
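
The intervention can be pictured as a simple rule on top of the perception stack. The sketch below is our toy illustration, not Tesla's actual logic; the function, the threshold, and the time-to-obstacle heuristic are all invented:

```python
# Toy pedal-misapplication rule (our illustration, not Tesla's logic):
# if the driver is accelerating toward an obstacle that would be reached
# within a short horizon, cut acceleration and brake.
def pedal_misapplication(accel_pressed, obstacle_distance_m, speed_mps,
                         horizon_s=2.0):
    """True if accelerating now would reach the obstacle within horizon_s."""
    time_to_obstacle = obstacle_distance_m / max(speed_mps, 0.1)
    return accel_pressed and time_to_obstacle < horizon_s

print(pedal_misapplication(True, 5.0, 10.0))    # True: 0.5 s away -> intervene
print(pedal_misapplication(True, 100.0, 10.0))  # False: 10 s away -> do nothing
```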

▲ Tesla active braking

That is to say, some of the "brake failure" incidents frequently reported in China, those actually caused by driver misoperation, will be technically prevented.

I have to say that Tesla is really good at driving technology forward. The following is a transcript of Ashok Elluswamy's talk, lightly abridged.

I. A Powerful Pure-Vision Algorithm: From 2D Images to 3D

At the beginning of the talk, Ashok said that not everyone knows exactly what Tesla's Autopilot system does, so he gave a brief introduction.

▲ Ashok

According to him, Tesla's Autopilot system can help the vehicle keep its lane, follow other vehicles, slow down, and take corners. Beyond these, Autopilot also ships with standard safety functions, such as emergency braking and obstacle avoidance, to prevent a variety of collisions.

In addition, since 2019, about 1 million Tesla vehicles have been able to use more advanced navigation on highways, checking adjacent-lane information to perform lane changes and identifying highway entrances and exits.

Tesla's Autopilot system can also park the car automatically, identify traffic lights and road signs, and steer around obstacles such as other cars in parking lots. These functions have been validated by more than 100,000 Tesla owners.

In the talk, Ashok also showed a video recorded by a user. It shows that, while the user drives on a crowded San Francisco street, the in-car screen displays the surrounding environment: road boundaries, lane lines, and the positions and speeds of nearby vehicles.

▲ The system recognizes the surrounding environment

These capabilities depend on hardware, Tesla's onboard computer and cameras, on the one hand, and on Tesla's algorithms and neural networks on the other.

According to Ashok, Tesla vehicles carry eight 1.2-megapixel cameras that capture 360-degree images of the surroundings, generating an average of 36 images per second. The onboard computer then processes this information, performing 144 trillion operations per second (144 TOPS).
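
As a rough sanity check on those figures, assuming the quoted 36 images per second is a per-camera rate (an assumption; the article does not spell out how the numbers aggregate):

```python
# Back-of-the-envelope camera throughput, assuming the quoted
# 36 images/s is a per-camera rate (an assumption, not stated here).
cameras = 8
pixels_per_image = 1.2e6  # 1.2 megapixels
fps = 36                  # images per second per camera (assumed)

pixels_per_second = cameras * pixels_per_image * fps
print(f"{pixels_per_second / 1e6:.1f} megapixels per second")  # 345.6
```

On these assumptions the cameras deliver roughly 350 megapixels of imagery per second, which is why the 144 TOPS compute budget matters.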

All of this is done with a pure vision algorithm, without lidar, without ultrasonic radar, and without high-definition maps.

So how does Tesla's Autopilot identify general obstacles?

Ashok said that for general obstacles the system uses a space-segmentation method: it labels every pixel as "driveable" or "undriveable", and the Autopilot chip then processes the scene. However, this method has several problems.
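
The space-segmentation idea can be sketched as a per-pixel mask (a toy illustration; the labels and grid are invented, and real masks are per-camera and far larger):

```python
# Toy per-pixel space segmentation (labels invented for this sketch):
# 0 = "driveable", 1 = "undriveable".
image_labels = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 1, 1],
]

def undriveable_fraction(mask):
    """Share of pixels the planner must treat as blocked."""
    total = sum(len(row) for row in mask)
    blocked = sum(sum(row) for row in mask)
    return blocked / total

print(undriveable_fraction(image_labels))  # 4 of 9 pixels blocked (~0.444)
```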

▲ The system's labeling of objects

First, the pixels the system labels live in two-dimensional image space. To navigate the car in three dimensions, those labels must be converted into corresponding predictions in 3D space, so that Tesla's system can build an interactive physical model and handle the navigation task.

When converting labeled pixels from a 2D image into 3D, the system performs semantic segmentation of the image (pixel-level recognition that assigns each pixel to an object category).

This process produces unnecessary images and pixels, and a handful of pixels near the image's ground plane can have an outsized influence, directly determining how the 2D image is lifted into 3D. Tesla therefore does not want planning to hinge on such high-leverage pixels.

In addition, different obstacles need to be judged by different methods.

Generally speaking, the object's depth value is used: the distance from the observer's viewpoint to the object, obtained through projection transformation, normalized device coordinates, and scaling and translation.
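
The depth-based pipeline ultimately back-projects pixels into 3D. A minimal pinhole-camera sketch (the intrinsics are made-up values, not Tesla's calibration):

```python
# Pinhole back-projection: pixel (u, v) plus predicted depth -> 3D point
# in the camera frame. The intrinsics are made-up values, not Tesla's.
fx = fy = 1000.0        # focal lengths in pixels (assumed)
cx, cy = 640.0, 360.0   # principal point (assumed)

def unproject(u, v, depth):
    """Invert the projection for one pixel at a known depth."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

print(unproject(740.0, 360.0, 20.0))  # (2.0, 0.0, 20.0): 2 m right, 20 m ahead
```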

In some scenes the system can predict obstacles directly; in others it estimates the depth of the image's pixels, so each pixel yields a depth value.

▲ Depth map (right)

However, although the resulting depth map is very pretty, when the depth map is used for prediction, only a sparse set of points actually survives.

And in the visualization of those points, things look fine up close, but they deform as distance increases, and such output is hard to use in the next stage.

For example, a wall may appear deformed and bent. Objects near the ground plane are determined by even fewer points, so the planner cannot judge such obstacles correctly.

And because these depth maps are derived from flat images captured by multiple cameras, it is hard to reconcile the same obstacle across cameras, and hard for the system to predict the obstacle's boundary.
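
Part of why those reconstructions degrade with distance falls out of simple geometry: under a pinhole model, one pixel of image-space error corresponds to a lateral error that grows linearly with depth. A toy calculation (the focal length is an assumed value):

```python
# Under a pinhole model, one pixel of image error maps to a lateral error
# of roughly depth / fx metres, so the same pixel noise is ten times worse
# at 50 m than at 5 m.
fx = 1000.0  # focal length in pixels (assumed)

def lateral_error_per_pixel(depth_m):
    """Metres of lateral error caused by one pixel of image error."""
    return depth_m / fx

print(lateral_error_per_pixel(5.0))   # 0.005 m at 5 m
print(lateral_error_per_pixel(50.0))  # 0.05 m at 50 m
```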

Therefore, Tesla proposed the occupancy network to solve these problems.

II. Encoding Objects by Computing Space Occupancy

During the presentation, Ashok demonstrated the occupancy network in a video. As the video shows, the system processes the images captured by the eight cameras, computes the space occupancy of surrounding objects, and generates a schematic visualization.

▲ Simulated image generated by the system

Each time the car moves, the network recalculates the space occupancy of surrounding objects. And it computes occupancy not only for static objects, such as trees and walls, but also for dynamic objects, including moving cars.

The network then outputs a three-dimensional representation and can also predict occluded parts of objects, so even if the cameras capture only part of an object's outline, the object can still be distinguished clearly.

In addition, although the resolution of the captured images varies with distance, the simulated 3D output has uniform resolution throughout.

▲ The system produces uniform-resolution output

This means the whole scheme runs very efficiently, Ashok says: it completes in about 10 milliseconds on the onboard computing platform, so the network can run at 100 Hz, faster than many cameras record images.

So how is this accomplished? That requires an understanding of the occupancy network's architecture.

When explaining the architecture, Ashok compared the image-rectification process between Tesla's fisheye camera and its left-hand camera.

First, the system rectifies and stretches the image, extracts image features, queries whether points in 3D space are occupied, applies 3D positional encoding, and maps the results to fixed positions. This information is then aggregated in subsequent computation.

▲ Preliminary processing of the image

After that, the system embeds the image space and continues to process the image stream through 3D queries, finally generating three-dimensional occupancy features. Because these occupancy features are high-dimensional, it is too expensive to compute them at every point in space, so the system produces them at lower resolution and then applies typical upsampling techniques to generate the high-resolution occupancy volume.
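
The "3D position encoding" step can be sketched with a standard sinusoidal code attached to each 3D query point. Tesla's exact encoding is not public; the frequencies and dimensions below are illustrative only:

```python
import math

# A standard sinusoidal position code for a 3D query point. Tesla's exact
# encoding is not public; the frequencies and sizes here are illustrative.
def position_encoding(x, y, z, n_freqs=2):
    """Encode (x, y, z) as sin/cos features at several frequencies."""
    code = []
    for coord in (x, y, z):
        for k in range(n_freqs):
            f = 2.0 ** k
            code.append(math.sin(f * coord))
            code.append(math.cos(f * coord))
    return code

code = position_encoding(1.0, 0.5, -2.0)
print(len(code))  # 12 features: 3 coords x 2 frequencies x (sin, cos)
```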

▲ Computing the space occupancy of objects

Interestingly, Ashok revealed that the occupancy scheme was originally intended only for static objects, but in the end it proved hard to restrict it to static things like trees, and early on the system had great difficulty distinguishing "real pedestrians" from "fake pedestrians".

But the team eventually realized that whether an obstacle is moving or static, the system only needs to be able to avoid it.

▲ Real and fake pedestrians

Therefore, the occupancy network no longer distinguishes dynamic obstacles from static ones; it handles them under other categories and computes the instantaneous space occupancy of objects. But instantaneous occupancy alone is not enough to guarantee that a Tesla drives safely.

If only instantaneous occupancy were computed, it would make no sense for a Tesla on the highway to start slowing down every time it encounters another car. The system wants to know that car's space occupancy at later moments, and how it changes.

That way, the system can predict when the other car will move out of the way. The scheme therefore also predicts occupancy flow.

▲ The computation of occupancy flow

Occupancy flow can be the first or a higher time-derivative of space occupancy, and it can also enable more precise control by unifying everything in the same coordinate frame. The system generates space occupancy and occupancy flow with the same method, which provides strong protection against a wide variety of obstacles.
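
Treating occupancy flow as a time-derivative of occupancy can be sketched with a finite difference between two grid snapshots (a toy scalar version; the real flow is a dense velocity field):

```python
# Occupancy flow as a first time-derivative of occupancy, approximated by
# a finite difference between two grid snapshots. This toy version is a
# scalar change rate; the real flow is a dense velocity field.
def occupancy_change(prev, curr, dt):
    """Per-cell rate of change of occupancy between two timestamps."""
    return [[(c - p) / dt for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

prev = [[1, 0], [0, 0]]  # obstacle in cell (0, 0)
curr = [[0, 1], [0, 0]]  # 0.1 s later it occupies cell (0, 1)
print(occupancy_change(prev, curr, 0.1))  # [[-10.0, 10.0], [0.0, 0.0]]
```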

III. Obstacle Type Does Not Matter: The System Can Avoid Collisions

Ashok also said that conventional motion or detection networks cannot always determine an object's type, such as whether it is a static object or a moving vehicle.

From the control perspective, however, the object's type does not actually matter, and the occupancy network provides good protection against the network's classification dilemmas.

Whatever produced the obstacle, the system treats that part of space as occupied and moving at some velocity. Some special vehicles have strange protrusions that are hard to model with traditional techniques, and the system can represent such moving objects with cubes or other polyhedra.

In this way an object can be squeezed into an arbitrary shape; with this occupancy-based approach, no complex mesh-topology modeling is needed.

When the vehicle makes an unprotected or protected turn, geometric information can be used to infer occlusion. This requires reasoning not only about what the vehicle's cameras can see but also about what they cannot.

For example, when a car makes an unprotected turn at a fork ahead, potential oncoming vehicles may be hidden behind trees and road signs, so the car "knows" that it cannot see through those obstructions. With different control strategies, the car can probe and resolve the occlusion.

Thus, for a stationary object, the car can identify when it will become visible along the way. Because obstacles are modeled as complete three-dimensional shapes, the car can also predict how far it can travel before hitting something, and the system then negotiates occluded objects with smooth control.

So the occupancy network improves the control stack in many different ways. The scheme is an extension of neural radiance fields (NeRF), which have largely taken over computer vision research in the past few years.

▲ The relationship between NeRF and the occupancy network

NeRF reconstructs the imagery of a single scene or a single location, viewed from a point at that location.

Ashok said that because Tesla's vehicle is in motion, the images received by background processing are more accurate, so NeRF can be used to generate accurate image trajectories across time; differentiably rendering images from the NeRF model and the 3D state then produces a more accurate 3D reconstruction.

There is a problem with real-world images: we see a great deal of unreal or inconsistent content in the real world.

For example, solar glare, or dirt and dust on the windshield, changes how light is diffracted, and raindrops further distort the propagation of light, producing artifacts.

One way to improve robustness is to use higher-level descriptors, which to some extent do not change under local lighting artifacts such as glare.

Because RGB images can be very noisy, adding descriptors on top of RGB provides a layer of semantic protection against changes in RGB values. Tesla's goal is to use this approach in the occupancy network.

▲ Descriptors are more robust than RGB

Because the occupancy scheme must produce space occupancy within a few frames, full neural optimization cannot run in the car; but a reduced form of it can run in the background, ensuring that the occupancy it produces explains the images received by all of the car's sensors while driving.

In addition, descriptors can be superimposed during training to provide good supervision for these networks; at the same time, data from different sensors can be rendered in different ways to supervise the held-out images.

Tesla already has networks that map obstacles; the next step is to avoid any collision, and Autopilot already has many safety features.

Ashok then showed three videos of Autopilot intervening to avoid collisions.

The collisions in question are crashes caused by the driver accidentally pressing the accelerator pedal instead of the brake pedal.

Ashok said that when a driver mistakes the throttle for the brake, the car accelerates toward a collision, but the vehicle recognizes this, automatically stops accelerating, and brakes to prevent the crash.

In the first video, Ashok said that if Autopilot had not intervened to stop the car from accelerating, the driver would probably have ended up in the river.

▲ Tesla AP intervenes to avoid a plunge into the river

Similarly, the second video shows a Tesla driver stepping on the gas pedal while parking; Autopilot intervenes quickly and prevents the car from hitting shops and pedestrians.

▲ Tesla AP intervenes to keep the car out of the store

IV. Automatic Path Planning Through Occupancy

But braking a car smoothly to a stop takes time, perhaps seconds, and while driving there may not be enough time to identify obstacles and run the computation.

So neural networks are needed to achieve this goal, especially in the increasingly complex occluded scenarios seen recently. All Tesla's Autopilot team has to do is take the space occupancy produced by the network described above.

First, the space occupancy is encoded into a highly compressed multilayer perceptron (MLP). In essence, this MLP is an implicit representation of whether a collision can be avoided from any particular query state, and it provides guarantees within a certain time frame: for example, that collisions can be avoided for the next 2 or 4 seconds.
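
The queryable-MLP idea can be sketched as an implicit function from a state to a safe/unsafe verdict. The stand-in below uses a hand-picked linear score, not a trained network, purely to show the interface; every weight is invented:

```python
# Interface sketch of the compressed MLP: an implicit function
# f(state, horizon) -> "collision avoidable?". The linear score below is a
# hand-picked stand-in for a trained network; all weights are invented.
def collision_free(x, y, horizon_s, w=(-1.0, 0.0, -0.5), bias=8.0):
    """Toy verdict: safety falls as the obstacle ahead (x) nears and as
    the requested guarantee horizon grows."""
    score = w[0] * x + w[1] * y + w[2] * horizon_s + bias
    return score > 0

print(collision_free(2.0, 0.0, 2.0))  # True: a 2 s guarantee is possible
print(collision_free(9.0, 0.0, 2.0))  # False: too close to guarantee 2 s
```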

Ashok gave another example: a top-down view of a road in which black pixels are obstacles, gray pixels are road, and white pixels are lane markings. In this top-down view of the 3D space, the car can be placed at any pixel to test whether a collision can be avoided.

▲ Schematic of the vehicle's running state

"If you treat the car as a single point and set the collision-avoidance window to an instant, then whether a collision occurs right now depends only on the obstacle's position. But the problem is that the car is not a point," he said. "It has a rectangular shape, and it can also turn."

Therefore, only by convolving the car's shape with the obstacle map can the system know immediately whether the car is in a colliding state.

As the car turns (or spins out of control), the collision field changes. Green means the car is in a safe, collision-free position and red means a collision; when the car rotates there are more colliding positions, but when the car is aligned with the road the green region expands, meaning the car will not collide.
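
The "convolve the car's shape with the obstacle map" step can be sketched by testing every grid cell under the car's footprint at a candidate pose (rotation is omitted here to keep the toy readable; the footprint size and obstacle set are invented):

```python
# Testing a candidate pose by checking every grid cell under the car's
# footprint (rotation omitted; footprint size and obstacle set invented).
obstacles = {(3, 2)}   # occupied cells in the top-down grid
CAR_L, CAR_W = 3, 2    # toy footprint: 3 cells long, 2 cells wide

def in_collision(x, y):
    """True if any cell under the footprint anchored at (x, y) is occupied."""
    return any((x + dx, y + dy) in obstacles
               for dx in range(CAR_L) for dy in range(CAR_W))

print(in_collision(2, 2))  # True: the footprint covers the obstacle
print(in_collision(0, 0))  # False: a safe ("green") pose
```

Sweeping `in_collision` over all candidate poses yields exactly the kind of green/red collision field the talk describes.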

Overall, Ashok showed how videos from multiple cameras are used to generate dense space occupancy and occupancy flow, which a neural network turns into an effective collision-avoidance field; that is, the vehicle "looks" through its cameras and, from experience, threads a road full of obstacles at the right speed and heading.

▲ Implicit neural network for collision avoidance

Ashok also shared an experiment in a simulated environment in which the driver floored the throttle without steering; the car detected the impending collision and planned a path that let it pass safely.

Ashok said at the end of the talk that if all of the above technologies can be implemented successfully, Tesla can produce a car that never crashes.

Obviously, this work is not finished yet. On his final slide, Ashok actively invited engineers to join Tesla and build a car that never crashes.

▲ Ashok Elluswamy welcomes more talents to join Tesla

Conclusion: Tesla Continues to Explore Autonomous Driving

Since Tesla made autonomous-driving technology popular, a large number of followers have appeared on the self-driving track. But it must be said that Tesla stays at the forefront of the industry, constantly exploring new approaches to autonomous driving.

The head of Tesla's Autopilot program delivered a fresh technical interpretation and, to some extent, gave us an early look at the highlights of Tesla's future self-driving technology. With its spirit of continuous exploration, Tesla's self-driving effort will keep leading the auto market.
