On October 31, the 2019 Beijing Zhiyuan Conference, hosted by the Beijing Zhiyuan Institute of Artificial Intelligence, opened at the National Convention Center. Focusing on the current state, opportunities, and challenges of basic research in artificial intelligence, as well as the core directions of future AI technology, the conference returned to AI technology itself for in-depth discussion, exploring the frontier trends of the field.
On the first day of the main forum, Professor Zhu Songchun of the University of California, Los Angeles, delivered a keynote entitled "Towards General Artificial Intelligence: From Big Data to Big Task," putting forward the view that tasks are the center of intelligence.
The following is a transcript of Zhu Songchun's speech, compiled by AI Technology Camp (ID:rgznai100):
Everyone has their own opinion about general artificial intelligence: some think it is impossible, others think it will arrive soon and be terrifying. Whichever view you hold, we should study the problem in a down-to-earth way. Today I would like to share one line of thought with you: from big data to big task.
I will cover three things. First, the dispute between two paradigms of artificial intelligence: big data versus big task. Second, the claim that the core of intelligence is the task, and that intelligence is driven by all kinds of tasks at every moment. Third, how to study general artificial intelligence by building a platform of big tasks.
The dispute between two AI paradigms: "big data" versus "big task"
The first topic is the struggle between the two paradigms. If you ask most artificial intelligence researchers, they will tell you that AI = B + C + D: artificial intelligence equals big data plus computing power plus deep learning. This is a generally accepted view, but I have always opposed it. A few years ago, when I argued against deep learning as the solution to artificial intelligence, many people were displeased; today many have come around to this view.
At present, artificial intelligence driven by big data has run into many problems in industrial deployment, all of which could have been foreseen long ago. First, such systems can only perform specific, human-predefined tasks; they cannot handle general tasks or define tasks for themselves. Second, each task requires a huge amount of data at very high cost, the models offer no explanation, and their knowledge representation differs from that of humans.
In fact, we were among the first teams to do big data. In 2005 we led a group of international scholars, including people who later built ImageNet at Stanford and a later director of an MIT laboratory, in annotating the Lotus Hill (Lianhua Mountain) dataset in Hubei Province, China. Big data was just on the rise then, and we ambitiously set out to label data, developing an annotation manual of more than 200 pages covering, for example, how to decompose a lotus into flowers, stamens, petals, and living environment.
After several years of annotation, I found something was wrong. When students first asked me how to label something, I could still answer; later I could not. So I concluded this road was impassable, and around 2009 I turned to cognitive science.
I wrote an article about two paradigms of artificial intelligence.
One is the "parrot paradigm": a parrot can talk with humans but does not understand what you are saying. If you say "Lin Daiyu," it also says "Lin Daiyu," but it does not know who Lin Daiyu is.
The other is the "crow paradigm." When a crow finds a walnut, it drops it on the road and lets cars crush it before eating. But because eating walnuts in the middle of the road was too dangerous, crows learned to drop walnuts at the zebra crossing, where there is a traffic light: when the light changes and the cars stop, the crow can eat safely. This example is astonishing, because the crow has neither big data nor supervised learning, yet it works out the causality on its own, uses available resources to complete the task, and consumes very little power, less than one watt. This is deeply inspiring.
Crows can do more than that; they also know how to use tools. I believe that to this day the planning ability of many robots is far inferior to a crow's understanding of physics.
Suppose we want to define an artificial intelligence system. Any animal or machine can be regarded as an AI system, and such a system is determined by three elements. The first is architecture: if a piece is missing from your brain, you will never evolve to a certain level, and more than 90% of human intelligence is innate. The second is the environment and its data. The third is tasks.
The first level of solution is to use big data for a single task, such as face recognition: fix a framework, say a deep network with a certain number of layers. This is the typical big-data system today. I believe humans have gone the other way: very little data but a great many tasks, together with a very advanced architecture. That is a different kind of system.
For example, how do we teach a computer to recognize chairs? Big data's method is simple and brute-force: search for a large number of examples and label them by hand. You collect chairs of all materials and camera angles, feed them in for training, and the system memorizes those features. But artists keep designing new chairs; there are always special cases, and the machine keeps getting confused. It cannot generalize and cannot explain what a chair is. That is its core problem, and the same problem appears in autonomous driving, video surveillance, and other fields: we cannot enumerate all the examples.
What is the second-level solution? Suppose we want to understand and define what a chair is. First, collect people's typical sitting postures, then use those postures to fit the image, in various positions, orientations, and poses. Whatever one can sit on comfortably is a chair. This is a task.
The chair thereby becomes equivalent to a task: any object one can sit on comfortably is a chair. This embodies a kind of imagination: I imagine how I would sit in this chair. It differs from deep learning: regression is statistics, fitting with features, whereas simulation uses my own body to imagine. That is the fundamental difference between the two.
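This imagine-then-evaluate idea can be sketched in code. The following is a toy illustration under invented assumptions (the pose set, the comfort model, and every object attribute are hypothetical, not the speaker's actual system): a candidate object is scored as a chair by simulating a few sitting poses and keeping the best outcome.

```python
# Toy sketch of "chair = whatever affords comfortable sitting".
# All features and numbers below are hypothetical assumptions.

SITTING_POSES = ["upright", "reclined", "perched"]

def simulate_sit(obj, pose):
    """Hypothetical physics check: how comfortable is this pose on this object?"""
    if not obj["supports_weight"]:
        return 0.0  # a chair must first be physically stable under the body
    # Toy comfort model: seats near 45 cm score highest.
    comfort = max(0.0, 1.0 - abs(obj["seat_height_cm"] - 45) / 45)
    if pose == "reclined" and not obj["has_backrest"]:
        comfort *= 0.5  # reclining without a backrest is half as comfortable
    return comfort

def chair_score(obj):
    # "Imagine" sitting in several poses and keep the best outcome.
    return max(simulate_sit(obj, p) for p in SITTING_POSES)

stool = {"seat_height_cm": 45, "supports_weight": True, "has_backrest": False}
crate = {"seat_height_cm": 30, "supports_weight": True, "has_backrest": False}
lamp  = {"seat_height_cm": 120, "supports_weight": False, "has_backrest": False}

for name, obj in [("stool", stool), ("crate", crate), ("lamp", lamp)]:
    print(name, round(chair_score(obj), 2))
```

The point is not the particular numbers but the structure: the object is classified by simulating the task against it, not by matching appearance features.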
The third level: what does it mean to sit comfortably? First of all, the chair must be stable. Physical stability is something the nervous system perceives very sensitively.
I ran a simple experiment: I put all kinds of chairs in the office and the lab to see which ones students chose to sit in first when they came in.
Another relevant ability is empathy: we can feel how a person exerts force and how force is exerted on them, and we can mirror it. This is a very strong human ability. Through this mapping, I learn how you sit comfortably; I learn your values.
Once you understand this, you no longer need any data. I know that chairs exist for people to sit on comfortably, and at that point I fundamentally need no data at all. This is what I call small data, big task.
Task-centric intelligence
It took a long time to arrive at task-centered intelligence.
The first layer of representation is image-centered: we see an image, and with the image as the center we extract various features to understand its components. The second layer is centered on scenes and objects, that is, geometric representation.
The first layer is deep learning: the image is treated merely as an image, with no geometric, physical, or functional understanding. The second layer begins to represent the scene with three-dimensional geometry, such as tables and chairs.
Finally, I believe the real representation is task-centered, because tasks came before the objects in our world: objects were designed to meet people's needs and to accomplish certain tasks. Task-centered representations, covering function, causality, values, physics, and social norms, can be collectively called physical and social common sense, which children acquire before 18 months of age.
What is a task-centered representation? We do not think about the world in terms of object categories. For example, to open a wine bottle, not only a bottle opener will do; anything that works can open the bottle, just as cracking walnuts hides a physical principle. Once we understand the physics, we do not need one fixed object to open the bottle; we only need something that completes the task of opening it. This, I believe, is general artificial intelligence.
How do we approach this? Recent research has found that most knowledge representation in the human brain is organized not by physical categories, such as chairs, desks, and cars, but by how things are used. It can be divided into two scales: the scale of the body and the scale of the hand.
This is not actually new. More than a third of Chinese characters contain human-related radicals. When Chinese characters were invented, tasks were built into them: the person is put into the character and thought about together with it, with components for hands, feet, ears, the body, and so on.
Take cracking walnuts again. The crow grinds walnuts with car wheels. If, in a new environment, the tools are taken away but you can still complete the task of cracking walnuts, that is what matters. This is an important part of primary education: teaching basic common sense.
Although there is only one example of cracking walnuts, a person must choose one option from tens of thousands, and the process involves massive simulation. I see the walnuts and several tools, and my brain quickly considers what to do, perhaps running through tens of thousands of options. This is a process of massive computation, but it is not the computation of deep learning; it is the computation of simulation.
There is also spatio-temporal causal reasoning: from the current situation, various goals can be achieved through various actions, forming causal chains that are unified by physics.
Another point is causal learning versus reinforcement learning (RL). RL is very popular now, but according to neuroscience researchers, RL is what lower animals such as mice use; it requires testing over and over with a large number of trials. People, by contrast, use causal learning, which needs only two or three examples.
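To make the contrast concrete, here is a contrived sketch (entirely illustrative; the observations, feature names, and rule-induction scheme are invented, not from the talk): from three observed examples, a conjunctive causal rule for when a walnut cracks is induced, with no repeated trial-and-error.

```python
# Three observed examples: (tool_hardness, impact_force, walnut_cracked).
# Hypothetical data for illustration only.
examples = [(0.9, 0.8, True), (0.9, 0.2, False), (0.2, 0.8, False)]

def induce_rule(examples):
    """Infer a conjunctive causal rule: crack iff tool is hard enough
    AND the impact is strong enough, from the positive examples alone."""
    positives = [e for e in examples if e[2]]
    min_hardness = min(e[0] for e in positives)
    min_force = min(e[1] for e in positives)
    return lambda h, f: h >= min_hardness and f >= min_force

cracks = induce_rule(examples)
print(cracks(0.95, 0.9))  # a new hard tool swung hard
print(cracks(0.1, 0.9))   # a soft tool, however hard it is swung
```

An RL agent would instead need many repeated trials to estimate the same regularity; here the rule generalizes to unseen tools immediately, which is the essence of the two-or-three-examples claim.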
The walnut-cracking task cannot simply be transferred to a robot; there must be a physical causal equivalence, reasoning about physical function.
This is learning from one or two examples. A smart person can learn the values behind sitting in a chair from a few simple choices, and learn the essence from one simple act of cracking walnuts. Once you grasp the essence, what do you need data for? This is a core issue.
Take shoveling as an example. Asked to shovel dirt with a tool, you imagine how to shovel. If no proper tool is available, you can still shovel dirt with household objects. After automatic computation, the machine's first choice was the pot, and its second choice was the cup.
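The tool ranking described above can be illustrated with a toy scoring function (the features, weights, and candidate objects are hypothetical assumptions, not the actual system): household objects are ranked for the shoveling task by how well their shape and rigidity afford scooping.

```python
# Toy illustration of tool ranking by imagined use; all features
# and weights below are invented for the sketch.

def scoop_score(obj):
    """Hypothetical physical-utility score for the task 'shovel dirt'."""
    if not obj["rigid"]:
        return 0.0  # a floppy object cannot transmit force into the dirt
    # Concave volume carries dirt; a graspable handle helps apply force.
    score = obj["cavity_volume_ml"] / 1000.0
    if obj["graspable_handle"]:
        score *= 1.5
    return score

candidates = {
    "pot": {"rigid": True, "cavity_volume_ml": 2000, "graspable_handle": True},
    "cup": {"rigid": True, "cavity_volume_ml": 300, "graspable_handle": True},
    "towel": {"rigid": False, "cavity_volume_ml": 0, "graspable_handle": False},
}

ranking = sorted(candidates, key=lambda n: scoop_score(candidates[n]),
                 reverse=True)
print(ranking)
```

As in the talk's example, the pot ranks first and the cup second, because the ranking comes from the physics of the task rather than from the object's category label.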
By the time humans, or their ape-man ancestors, had passed through the Stone Age, the nervous system had already learned tools and physics; it understood the essence.
Now, back to how to define a task. An image can be defined precisely down to its pixels, but how do you define a task? A task is defined as causally changing the fluents in a scene. "Fluent" is a term that goes back to Newton; it covers time-varying physical states, inner states, social relations, and so on, which can be roughly divided into physical fluents and social fluents.
If we define the space of these atomic tasks, they can be composed to generate a compound mathematical space, and that is the task space. Once this matter is made clear, more than half of the problem of artificial intelligence is solved.
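One minimal way to picture "atomic tasks composed into a compound space" is a tiny planner over fluents (an illustrative sketch under invented assumptions; the action names and fluents are hypothetical, echoing the earlier bottle-opening example): each atomic action causally changes some fluents, and a compound task is a sequence of atomic actions whose net effect reaches the goal fluents.

```python
from itertools import permutations

# Each atomic action: (preconditions that must hold, effects on fluents).
# Hypothetical domain for illustration only.
ATOMIC = {
    "grasp_opener": ({"opener_free": True}, {"holding_opener": True}),
    "apply_opener": ({"holding_opener": True, "bottle_sealed": True},
                     {"bottle_sealed": False}),
}

def apply_action(state, action):
    pre, eff = ATOMIC[action]
    if all(state.get(k) == v for k, v in pre.items()):
        new = dict(state)
        new.update(eff)  # the action causally changes these fluents
        return new
    return None  # preconditions not met

def plan(state, goal):
    """Brute-force search over action orderings, enough for this tiny domain."""
    for seq in permutations(ATOMIC):
        s = state
        for a in seq:
            s = apply_action(s, a)
            if s is None:
                break
        if s is not None and all(s.get(k) == v for k, v in goal.items()):
            return list(seq)
    return None

start = {"opener_free": True, "bottle_sealed": True}
print(plan(start, {"bottle_sealed": False}))
```

The compound task "open the bottle" is here literally a composition of atoms in the task space; any object whose actions produce the same fluent change would count as an opener.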
Artificial intelligence currently runs into great difficulty because people do not know exactly what tasks are to be done. Unclear task definition is why many products cannot be sold, or are complained about after they are sold. For example, the product definition of a robot vacuum does not make clear what should and should not be sucked up; the machine itself has no clear task definition. The same is true of surveillance: which people should be caught, which should not, and under what circumstances cannot be precisely defined.
I have discussed some basic physical tasks and common sense, which is the main obstacle for artificial intelligence at present. In natural language understanding, for example, language by itself is at most symbol-to-symbol. Take the phrase "playing with water": without experience of three-dimensional data and without physical common sense, it is very difficult to understand what the phrase means. Physical common sense is the key.
There is a Chinese fable, "The Blind Man and the Sun." A blind man who has never seen the sun cannot explain what the sun is. This is the embarrassment of natural language. Natural language must be linked to cognitive science, computer vision, and robotics, or it cannot be studied clearly. That is my view.
Another kind of intelligence concerns social common sense and social tasks. Human toddlers begin pointing at things after 12 months: the child knows something, believes you do not know it, and therefore shows it to you. That is very strong intelligence. To achieve it, one must first be able to change perspective, reasoning about what others see and think. This is fundamental to intelligence. In dialogue, humans rely on context: knowing what the context is and what both parties know together.
From a third-person point of view, there is what a person actually sees; we then reason about what he saw from that third-person stance. That is what the computer must infer: I roughly know what you are looking at, so I know how to answer when you suddenly ask me a question. There is also the consensus we reach: I know that you know, and you know that I know. This is how a common task is formed.
What is the human cognitive framework that makes dialogue possible? Dialogue and language are very important problems here. Represent each objective world by a circle in which each point is a state: red is what I think, blue is what the robot sees.
At first the robot sees an incomplete, uncertain world, while two humans see a shared world because they view things from comparable standpoints. We look at things from each other's point of view, and each side knows what the other sees. Only with this common ground can we build a model.
There is a decision function: in this state I know what you should do and what I should do. There is a value function: I roughly know what you should do, what I think you will do, and what you think I will do. With shared context and knowledge come shared values. Finally, through communication, we reach a consensus.
Christopher Manning has noted that communication between people runs at only about 10 bits per second, extremely slow compared with 5G, and yet human communication is very fast. Why? Because we already share all of this.
Let me summarize the "crow model" of AI.
With only a small number of examples, but with function, causality, values, and so on, we can understand the world from the start. I call this the dark matter of intelligence: you see a chair and imagine how a body would sit in it. Roughly 95% of intelligence is this cognitive reasoning, and only when that 95% is in place can we understand the remaining 5%; otherwise we can only enumerate every situation.
Here is a simple demonstration of a robot interacting with a human. When the robot sees a person come in, it must understand the person's intention; knowing the intention, it can help open the refrigerator and knows to put the food inside. Throughout the process there is not only verbal exchange but also exchange of movements and expressions, so that the two sides reach a consensus. Guessing the other's intention is the basic mechanism.
This is a desktop robot we recently built, which can reconstruct the 3D scene; some basic computer vision methods can be used for the reconstruction. It then imagines what people could do in this scene, in order to define the usefulness of the furniture.
This includes top-down inference: small objects can never be recognized on their own and must be disambiguated through scene context. This is a fatal problem, because today's deep learning is entirely bottom-up, with no top-down processing.
At this point we have to build a unified system that integrates several fields: computer vision, cognitive science, language and dialogue, machine learning, robot learning, and so on.
How to build a "big task" training and testing platform?
How do you build a big task? My goal is to train a "crow" with general artificial intelligence inside a system. This is the core problem.
Of course, training in a single physical scene is not enough. The first step is to generate a large number of three-dimensional objects in a database according to human needs. These generated examples can later be tested in a variety of environments. With today's big-data fitting, everyone can run tests; that is one way to play the game.
My way of playing is different: once the intelligent system arrives, I present it with a brand-new scene and see whether it can complete a variety of tasks, rather than tasks specified in advance.
In this system we have to make things physically realistic, such as pouring wine, pouring water, squeezing toothpaste, playing with sand, and playing with water. This is very laborious, and we have been at it for many years.
First we define the basic tasks. A person can connect to the system, a machine can connect, and then they can perform the tasks together in person.
We made a glove that records perceptual and motor behavior in detail. Once I put it on, the system knows the basic operations being performed; this is learning from demonstration.
This is a virtual robot agent asked to complete the task "fresh juice": it must first find the oranges, then cut them, then press them in the juicer, going through the whole procedure. Tasks like cooking or making noodles are very big tasks. In everyday life, the things people look down on are often the hardest for machines.
People can also interact with the machine inside the system, cooperating with it to accomplish something together.
Finally, people can teach the robot. For example, a person demonstrates how to crack walnuts; behind this simple action lies a great deal of engineering.
The robot then reasons about how to crack walnuts in a new environment, and the entire reasoning process takes place here.
Most importantly, I can stop the machine at any time and ask it: what do you know now, do you know what I am doing? Or I can ask it to explain what it is going to do and why. This is explainable AI.
The core of the agent is to combine natural language dialogue, computer vision, robotics, and so on, teaching it with small data the way one teaches a child.
Summary
First, in the struggle between the "big data" and "big task" paradigms, 99% of people bet on big data, but I bet on the big task ten years ago.
Second, I believe the task is the center of intelligence, and we should have task-oriented operating systems, programming languages, and architectures.
Third, we must build a platform of big tasks in which the "crow" can be trained and driven by autonomous tasks. Many mathematical, theoretical, and engineering problems remain to be solved, and China and the United States must cooperate to solve them.
Source: https://www.toutiao.com/i6754302467501982212/