
Two of the "deep learning big three": Hinton and LeCun predict the future of deep learning


Shulou (Shulou.com) 06/02 report --

Source: https://www.toutiao.com/a6707483763141509643/

On June 23, local time, Geoffrey Hinton and Yann LeCun, winners of this year's ACM Turing Award and two of the deep learning "big three", gave speeches at ACM FCRC 2019, sharing their latest views on deep learning.

The title of Geoffrey Hinton's speech was "The Deep Learning Revolution". So far, he said, there have been two paradigms of artificial intelligence. The first is the logic-inspired intelligence of the 1950s, in which the essence of intelligence is using symbolic rules to manipulate symbolic expressions. This approach focuses on reasoning, chiefly on how to make computers reason the way human beings do. The second is biologically inspired artificial intelligence, in which the essence of intelligence is learning the connection strengths in a neural network. This approach focuses on learning and perception.

(source: Geoffrey Hinton)

Seen this way, the two paradigms of artificial intelligence are very different, and they hold different views of internal representations.

(source: Geoffrey Hinton)

The internal representations of logic-based artificial intelligence are symbolic expressions. Programmers can enter these symbols into the computer in an explicit language, and the computer produces new representations from the existing symbols by applying rules. The internal representations of biologically based artificial intelligence, by contrast, have nothing to do with language. Like neural activity itself, they are large vectors, learned directly from the data, and they have direct causal effects on neural activity.

This leads to two ways in which computers perform tasks.

The first is programming, which Hinton also calls intelligent design. When programming, the programmer has already worked out the methods and steps for handling the task; all he needs to do is specify them precisely, enter every detail into the computer, and let the computer execute them.

The second is learning: you only provide the computer with a large number of input and output examples, and the computer learns to connect inputs with outputs and to map one to the other. Of course, this also requires programming, but the program used is a simplified, general-purpose learning program.

For more than 50 years, people tried to make symbolic artificial intelligence (symbolic AI) "look at a picture and talk about it", that is, describe images in words. Both approaches were tried on this task for a long time, and in the end it was the neural network, built on the pure-learning approach, that completed it successfully.

(source: Geoffrey Hinton)

Hinton: the core question of neural networks

This leads to the core question of neural networks: a large neural network with millions of weight parameters and many layers of nonlinear neurons is a very powerful computing device. Can such a network start from random weights and learn to perform a difficult task, such as object recognition or machine translation, by acquiring all of its knowledge from the training data?

Next, Hinton reviews the efforts of his predecessors.

(source: Geoffrey Hinton)

How does a neural network work? Hinton gave a brief introduction.

(source: Geoffrey Hinton)

Researchers first made a rough idealization of real neurons so that they could study how neurons work together to perform difficult computations.

A neural network consists of a large number of interconnected nodes (neurons). Each node applies a particular output function, called an activation function. Each connection between two nodes carries a weight that scales the signal passing through it; these weights serve as the artificial neural network's memory. The network's output varies with its connection pattern, its weight values, and its activation functions. The network as a whole usually approximates some algorithm or function in nature, or expresses a logical strategy.

(source: Geoffrey Hinton)
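To make this concrete, here is a minimal sketch in Python of signals flowing through such a network. The layer sizes, random weights, and sigmoid activation are illustrative choices, not taken from the talk:

```python
import numpy as np

def sigmoid(x):
    """A common activation function: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# The weights are the network's "memory": one matrix per layer of connections.
W1 = rng.normal(size=(4, 3))   # 3 inputs -> 4 hidden neurons
W2 = rng.normal(size=(2, 4))   # 4 hidden neurons -> 2 outputs

def forward(x):
    """Each neuron sums its weighted inputs and applies its activation."""
    h = sigmoid(W1 @ x)        # hidden-layer activity
    y = sigmoid(W2 @ h)        # output-layer activity
    return y

print(forward(np.array([0.5, -1.0, 2.0])))
```

Changing any weight changes the output, which is exactly what training exploits.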

So, how are neural networks trained? Hinton said there are two main methods: supervised training and unsupervised training.

Supervised training: show the network an input vector and tell it the correct output, then adjust the weights to reduce the difference between the correct output and the actual output.

Unsupervised training: show the network only the input, and adjust the weights so that the input (or part of it) can be reconstructed better from the activity of the hidden neurons, from which the output is finally generated.

Of the two, supervised learning is the better-understood training method, but the naive "mutation" approach to it, perturbing one weight at a time and measuring the effect, is very inefficient.

(source: Geoffrey Hinton)

By contrast, backpropagation (the backpropagation algorithm) is an efficient way to compute how weight changes affect the output error. Instead of perturbing the weights one at a time and measuring the effect, it uses calculus to compute the error gradients of all the weights simultaneously. When there are a million weights, backpropagation is a million times more efficient than the mutation method.

(source: Geoffrey Hinton)
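The gap is easy to see in a toy example (a single linear layer with squared error, chosen purely for illustration): the mutation method needs one extra forward pass per weight, while the chain rule yields every gradient in one backward computation.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 5))          # 15 weights: 5 inputs -> 3 outputs
x = rng.normal(size=5)               # one input vector
t = rng.normal(size=3)               # its target output

def loss(W):
    """Squared error of a single linear layer."""
    y = W @ x
    return 0.5 * np.sum((y - t) ** 2)

# "Mutation" method: perturb each weight separately and measure the effect.
# This costs one extra forward pass per weight (15 here, a million at scale).
eps = 1e-6
grad_mutation = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy()
        Wp[i, j] += eps
        grad_mutation[i, j] = (loss(Wp) - loss(W)) / eps

# Backpropagation: the chain rule gives all gradients at once.
# For L = 0.5*||Wx - t||^2, dL/dW = (Wx - t) x^T.
grad_backprop = np.outer(W @ x - t, x)

print(np.allclose(grad_mutation, grad_backprop, atol=1e-4))  # True
```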

However, backpropagation did not develop as well as hoped.

In the 1990s, although the backpropagation algorithm worked, it did not work as well as expected: training deep networks remained very difficult, and on medium-sized data sets some other machine learning methods were even more effective than backpropagation.

(source: Yann LeCun)

Researchers in symbolic artificial intelligence said it was foolish to expect large deep neural networks to learn difficult tasks, because they start from random connections and have no prior knowledge.

So deep learning went through a "winter" until 2012, when people realized that it worked and it found a large number of applications, such as image recognition and machine translation.

Finally, Hinton talked about his vision for the future of neural networks. According to Hinton, almost all artificial neural networks use only two time scales: slow adaptation of the weights and rapid changes in neural activity. Real synapses, however, adapt on many different time scales, and adding fast weight adaptation (fast weights) for short-term memory would make neural networks better.
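As a rough illustration of the fast-weight idea (a minimal Hebbian update with decay, in the spirit of the later fast-weights work by Hinton and colleagues; the decay and learning rates are assumed values):

```python
import numpy as np

lam, eta = 0.95, 0.5   # assumed decay rate and fast learning rate

def fast_weight_step(A, h):
    """Hebbian fast-weight update: A decays gently while storing a trace
    of the recent hidden activity h, acting as short-term memory."""
    return lam * A + eta * np.outer(h, h)

A = np.zeros((4, 4))   # fast weights, changing at every time step
for h in (np.array([1., 0., 0., 0.]), np.array([0., 1., 0., 0.])):
    A = fast_weight_step(A, h)

# Multiplying a cue by A retrieves activity correlated with what was
# just seen -- memory stored in weights, not in ongoing neural activity.
print(A @ np.array([1., 0., 0., 0.]))   # echoes the first pattern
```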

Yann LeCun: the future lies in self-supervised learning

Yann LeCun said in his speech that supervised learning works well when there is a large amount of data: it can do speech recognition, image recognition, face recognition, attribute generation from pictures, machine translation, and so on.

If a neural network has a special architecture, such as those proposed in the 1980s and 1990s, it can recognize handwritten words, and it works very well. By the end of the 1990s, a system of this kind, developed by Yann LeCun at Bell Labs, was handling 10% of the handwritten-text recognition work in the United States; it was a success not only technically but also commercially.

(source: Yann LeCun)

Later, the academic community largely abandoned neural networks. This was partly because large databases were lacking, partly because the software of the time was too complex and required heavy investment, and partly because computers were not yet fast enough.

Convolutional neural networks are inspired by biology, but they do not copy it. Drawing on biological ideas and findings, Yann LeCun found that backpropagation could be used to train neural networks that realize these phenomena. The idea behind convolutional networks is that objects in the world are composed of parts, each part is composed of patterns, patterns are basic combinations of textures and edges, and edges are composed of distributions of pixels. A system that can detect useful combinations of pixels, then edges, then patterns, and finally the parts of objects is an object recognition system. This applies not only to visual recognition but also to speech, text, and other natural signals. Convolutional networks can be used to recognize faces and pedestrians on the road.
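The first step of that hierarchy, from pixels to edges, can be sketched directly. In this toy NumPy example the edge-detecting filter is hand-picked for illustration; in a trained convolutional network such filters are learned from data:

```python
import numpy as np

def conv2d(image, kernel):
    """2-D convolution as used in CNNs (cross-correlation, no padding,
    stride 1): slide the kernel and take a weighted sum at each spot."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# A tiny image: dark on the left, bright on the right (a vertical edge).
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

# A hand-picked vertical-edge detector.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

print(conv2d(image, kernel))   # strong responses where the edge sits
```

Stacking such filter layers, with nonlinearities and pooling between them, takes the network from edges to patterns to parts to objects.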

Between the 1990s and around 2010 there was a so-called "AI winter", but people like Yann LeCun continued their research into face recognition, pedestrian recognition, and so on. They also applied machine learning to robotics, using convolutional networks to label entire images automatically, marking each pixel as traversable or not in order to guide the robot forward.

(source: Yann LeCun)

A few years later, they used a similar system for object segmentation; the whole system ran in real time at VGA resolution, segmenting every pixel of the image. It could detect pedestrians, roads, and trees, but the results were not immediately recognized by the computer vision community.

Convolutional neural networks have found many applications in recent years, in fields such as medical imaging, autonomous driving, machine translation, and games. But such systems require a great deal of training, and massively repeated trial and error is not feasible in reality. If you want to teach a self-driving car to drive, for example, you cannot repeat training like that in the real world. Pure reinforcement learning can therefore only be applied in virtual worlds.

So why can people and animals learn so fast?

Unlike an autonomous driving system, humans can intuitively build models of reality, which is why they do not drive their cars off a cliff. Humans have mastered this kind of internal model. So how do humans learn it, and how can machines be made to learn it?

A similar mechanism exists in animals. Prediction is an indispensable part of intelligence, and whenever there is a gap between what actually happens and what was predicted, that gap is itself a learning signal.

Take video prediction as an example: given a stretch of video, the task is to predict missing content from the rest. The typical setting of self-supervised learning is that nothing really has to be blanked out or announced in advance; the system is simply made to reconstruct its input under some constraints. It completes the task through observation alone, without external interaction, so learning is more efficient.
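A minimal sketch of this fill-in-the-blank setup (a toy masked-reconstruction model in NumPy; the linear reconstructor and synthetic data are illustrative assumptions, not LeCun's actual system):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: the two halves of each 8-d input are correlated,
# so hidden entries are predictable from the visible ones.
Z = rng.normal(size=(500, 4))
X = np.hstack([Z, Z + 0.1 * rng.normal(size=(500, 4))])

W = np.zeros((8, 8))                       # linear reconstructor
lr = 0.05
for epoch in range(300):
    mask = rng.random(X.shape) > 0.25      # hide ~25% of each input
    X_in = X * mask                        # the "blanked" input
    err = X_in @ W - X                     # reconstruct the full input
    W -= lr * (X_in.T @ err) / len(X)      # gradient step on squared error

# No labels anywhere: the supervisory signal comes from the data itself.
test_mask = rng.random(X.shape) > 0.25
recon = (X * test_mask) @ W
hidden = ~test_mask
print(np.mean((recon[hidden] - X[hidden]) ** 2))   # well below the ~1.0 data variance
```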

The future of machine learning lies in self-supervised and semi-supervised learning, rather than in supervised learning and pure reinforcement learning. Self-supervised learning is like filling in the blanks: it performs well on NLP tasks but is mediocre on image recognition and understanding tasks. This is because the world is not entirely predictable. A video prediction task may have many possible outcomes, and a system trained to produce a single prediction tends to output a "fuzzy" result, the "average" of all possible futures, which is not an ideal prediction.
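The "average of all futures" effect is easy to reproduce: if the same situation is followed equally often by +1 and by -1, a predictor trained with squared error settles on 0, the mean, which matches neither real outcome. A toy illustration, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two equally likely futures for the same input: the target is +1 or -1.
targets = rng.choice([-1.0, 1.0], size=10_000)

# A single learnable prediction p, trained to minimize mean squared error.
p = 0.0
for step in range(100):
    grad = np.mean(p - targets)   # gradient of 0.5 * mean((p - t)^2)
    p -= 0.5 * grad

print(round(p, 3))   # ~0.0: the blurry "average", matching neither future
```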

Finally, Yann LeCun said that for hundreds of years, the emergence of new theories has often been followed by great inventions and creations. What will deep learning and the theory of intelligence bring in the future? It is worth waiting to see.

(source: Yann LeCun)
