Every time I see humans training robots, I wonder how many of the skills learned in such a brutal environment will actually carry over to the real world.
Kicking a robot in the back, making it get up every time it falls, setting two robots against each other to grapple on the ground, or having it jump off platforms more than ten meters high over and over again: who exactly is this supposed to help?
Never mind that watching heavily built robots treated this way is hard to look at, does nobody have to pay for the repairs and the wear and tear?
Then I realized that this is a consequence of the current technology path. Deep learning needs enormous amounts of practice and training to keep optimizing the algorithm until the robot can cope with the variety of situations it will meet in the real world.
In theory, given enough time and an unlimited budget, even a monkey at a keyboard will eventually type out great literature, let alone a machine with serious computing power behind it.
But what does this have to do with us ordinary people? Can we use (and afford) reliable problem-solving machine assistants in our lifetime? Ten thousand years is too long, shall we just seize the day?
Recently, Soft Actor-Critic (SAC), the latest reinforcement learning algorithm developed by Berkeley and Google Brain, is said to have the potential to change how real-world robot learning is approached.
Today, let's talk about how SAC will change the "path to success" of robots.
Why is it so hard to get a reliable robot?
Before introducing the new SAC algorithm, it is worth explaining why a better robot algorithm is needed at all. In other words, what exactly can this algorithm change?
The answer lies in the challenges robots face in the real world.
First of all, the current training methods mean that machines master new skills far too slowly.
In traditional machine learning pipelines, the designer has to re-tune the parameters every time a new task is attempted, and some tasks require collecting data all over again for training, so the total time needed for a machine to pick up a new skill grows quickly.
Second, countless accidents in the real world may cause the machine to break down.
In real use, when problems occur, such as a power outage or a network delay, the machine may simply "crash" in the face of the crisis. If the job has to be restarted from scratch every time, what is the practical value of the robot?
The above is only about efficiency; the cost dilemma brought by traditional training methods is enough to make researchers worry themselves bald.
Whether it is rough handling caused by human-introduced bugs or the high-frequency jitter of actuators in all kinds of complex environments, the hardware takes heavy wear and tear. Is this a robot? It is a banknote shredder!
Of course, humans have tried workarounds, such as letting robots learn in games and designing simulation environments. These efforts have greatly reduced the dependence on real-world training, but in the end they cannot reproduce the diversity and randomness of the real environment. What matters most is to create a set of algorithms "tailored" for real-world robots.
What kind of algorithms do real-world robots need?
So, what attributes should such an algorithm have?
At least a few key elements are needed:
1. Good sample complexity. Collecting real-world data is slow and expensive, so the algorithm should learn as much as possible from as few samples as possible; the fewer interactions it needs, the lower the time cost and the better the practical performance in reinforcement learning.
2. No sensitive hyperparameters. Improving a machine learning system usually involves tuning hyperparameters, but in the real world the less tuning the better; the algorithm needs to minimize the need to adjust hyperparameters by hand.
3. Asynchronous sampling. In the real world there will inevitably be interruptions in the data stream and delays in inference, and the machine should stay stable through such "restart" phases. Data collection and training therefore need to run in separate threads, so that neither one blocks the other (see the sketch after this list).
4. Smooth movement. To prevent large jerky motions or vibrations from damaging the hardware, temporally correlated, coherent exploration becomes particularly important.
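To make point 3 a bit more concrete, here is a minimal sketch (our own illustration, not the authors' implementation) of what "data collection and training in separate threads" can look like in Python. The `env` and `policy` objects are hypothetical stand-ins for a real robot interface and a learner.

```python
import random
import threading
import time

replay_buffer = []                    # shared experience storage
buffer_lock = threading.Lock()        # protects the buffer across threads

def collection_thread(env, policy, steps=10_000):
    """Interact with the robot and store transitions; never waits on training."""
    obs = env.reset()
    for _ in range(steps):
        action = policy.act(obs)
        next_obs, reward, done = env.step(action)
        with buffer_lock:
            replay_buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

def training_thread(policy, batch_size=256):
    """Sample minibatches and update the policy; never blocks data collection."""
    while True:
        with buffer_lock:
            batch = (random.sample(replay_buffer, batch_size)
                     if len(replay_buffer) >= batch_size else None)
        if batch is None:
            time.sleep(0.01)          # wait for more data instead of stalling
            continue
        policy.update(batch)          # one gradient step (e.g. a SAC update)

# threading.Thread(target=collection_thread, args=(env, policy), daemon=True).start()
# threading.Thread(target=training_thread, args=(policy,), daemon=True).start()
```

Because neither loop waits on the other, a stalled sensor stream only slows learning down rather than crashing it.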
To sum up, if we believe real-world robots will be indispensable in the future, it is obviously unwise to require unlimited time, unlimited investment, and countless collisions with walls before they master these skills.
How can a robot make these trade-offs and train practically on its own? SAC appeared in answer to exactly this question.
The secret of SAC's success: take it easy
Having said all this, the main character finally takes the stage. So, what on earth is SAC?
SAC's full name is Soft Actor-Critic. As the name suggests, SAC follows the logic of the actor-critic family of algorithms: the actor (the "player") tries out actions, the critic (the "judge") scores them, and under these mutual checks and balances the pair pursues better performance (higher reward).
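To make the "player and judge" picture concrete, here is a minimal sketch of the two networks in an actor-critic setup (a generic PyTorch illustration, not the paper's code; the layer sizes and names are ours, and SAC's actual actor outputs a distribution over actions rather than a single action).

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """The 'judge': estimates how good a state-action pair is (a Q-value)."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                               nn.Linear(256, 1))

    def forward(self, obs, act):
        return self.q(torch.cat([obs, act], dim=-1))

class Actor(nn.Module):
    """The 'player': maps an observation to an action for each motor."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.pi = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                nn.Linear(256, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.pi(obs)           # actions squashed into [-1, 1]

# Training alternates: the critic learns to predict the return of the actor's
# actions, and the actor is nudged toward actions the critic scores highly.
```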
The difference is that SAC's attitude toward parameters is very "gentle". It automatically balances expected return (maximizing reward) against the breadth of exploration (maximizing entropy, i.e. uncertainty), and the weight between the two is learned automatically rather than treated as a hyperparameter that has to be hand-tuned, which is how it arrives at the optimal policy.
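The key trick is the entropy "temperature". SAC maximizes reward plus an entropy bonus, roughly E[ r(s, a) + alpha * H(pi(.|s)) ], and instead of hand-tuning the weight alpha, it learns alpha so that the policy's entropy stays near a target value. Below is a minimal sketch of that temperature update, assuming PyTorch and a stochastic policy that reports log-probabilities of its sampled actions; it is an illustration, not the authors' released code.

```python
import torch

action_dim = 8                                   # e.g. Minitaur's 8 actuators
target_entropy = -float(action_dim)              # common heuristic: -|A|
log_alpha = torch.zeros(1, requires_grad=True)   # learn log(alpha) so alpha > 0
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)

def update_temperature(log_probs):
    """One gradient step on alpha given log pi(a|s) for a batch of sampled actions.

    If the policy is too deterministic (entropy below target), alpha grows and
    pushes the actor to explore more; if it wanders too much, alpha shrinks.
    """
    alpha_loss = -(log_alpha * (log_probs + target_entropy).detach()).mean()
    alpha_opt.zero_grad()
    alpha_loss.backward()
    alpha_opt.step()
    return log_alpha.exp().item()   # current alpha, reused in actor/critic losses
```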
The advantage is that training samples can be diverse, parameters do not need to be adjusted frequently, and learning efficiency is much higher. Even in the worst experimental environments, it performed well.
It is as if machines used to stay up all night grinding Math Olympiad problems under the "care" of their human parents, hoping to become "math prodigies", whereas now they have learned to balance work and rest and are content with a high score in the ordinary college entrance exam. Obviously, the latter is the role model for most ordinary machines, and what human parents should expect.
I would like to invite three robot students from Berkeley Primary School to speak for themselves:
The first to step forward is Minitaur, a small quadruped robot with eight actuators. When walking forward, its controller has to track the swing of each limb and monitor various joint angles to balance the forces on the legs. Without an effective training strategy it easily loses balance and falls, and if it falls too often, even a body of steel and iron will break.
However, after mastering the new learning method, which maximizes uncertainty (entropy) during training, Minitaur could handle a wide range of balance disturbances without any additional learning.
The second is a very dexterous "three-fingered hand" student, whose task is to rotate a valve so that its colored marker faces to the right. A small motor mounted on the valve resets it automatically, however, so on each attempt the valve's initial position is random and the machine must re-perceive the valve's current orientation. The task demands perception, prediction, and precise control of nine servo motors, which is very challenging, but our "three fingers" still completed it successfully.
The last robot student was playing with Lego, though it did not get much fun out of it, because the trainer required it to aim precisely at the studs when stacking the bricks and press them together against the friction.
Besides determining the positions and velocities of its joints, it also has to regulate the force at the end-effector and send coordinated commands to seven joints at once. How is this any different from asking a human child to "shoot a willow leaf from a hundred paces"?
The machine that had mastered the great art of SAC did not disappoint, however: it took only three hours to learn the task, while the previously used PPO policy needed 7.4 hours. Isn't SAC great?
In the paper, the researchers give SAC a high-profile label, "state-of-the-art", and it can be said to be well deserved.
Of course, the above are only proof-of-concept experiments, and it will take many rounds of optimization, iteration, and engineering to extend this ability to more challenging real-world tasks. But predictably, with the emergence of SAC, robots are edging ever closer to the tipping point from concept to practice.
With this guide in hand, robots can finally be "abused" a little less and really work their way into the details of everyday life.