2025-04-06 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)12/24 Report--
What happens if you let GPT-4 operate a humanoid robot, with no programming and no training in advance?
The answer: things get very strange!
For example, the robot was asked to act out "wolfing down popcorn in a dark cinema, only to suddenly realize it was eating the popcorn of the stranger in the next seat 😅".
Under GPT-4's control, the robot's smile freezes; it shakes its head awkwardly and tilts its head back:
But even in its embarrassment, it doesn't forget to grab another piece and put it in its mouth.
Another example: ask the robot to "play the guitar".
GPT-4 thinks for a moment, then steers the robot to wiggle its fingers and shake its head wildly, with a faint air of rock and roll.
But look closely: while the head is shaking, the fingers barely move at all.
If you said this was a garrulous fortune-teller pinching his fingers to divine, that would seem about right (doge).
To sum up this series of actions:
Unlike Boston Dynamics' humanoid robots, whose every move is carefully controlled by human programming, this robot driven directly by GPT-4 wears unsettling expressions and moves strangely, yet all of its behavior meets the requirements of the prompt.
After this series of videos of GPT-4 controlling a robot was posted online, many netizens exclaimed that it "triggers the uncanny valley effect":
It even unnerved experts who have worked in robotics for 20 years:
These movements give me the creeps. Take a look for yourselves, is there any grace to them?
Some netizens joked: "It looks just like me on stage."
Other netizens, however, marveled that it is almost unbelievable that a humanoid robot can be controlled through GPT-4 at all.
As it turns out, this is the first study to drive a humanoid robot with GPT-4, from the University of Tokyo and the Japanese company Alternative Machine.
With this work, users no longer need to program the robot beforehand; language input alone, that is, chatting with GPT-4 for a bit, is enough to get the robot to carry out the instructions.
Let's take a look at more details and principles behind this study.
A new attempt at combining large models and robots

With no programming and no training, using GPT-4 as its brain, what other baffling-yet-reasonable actions can the humanoid robot Alter3 perform?
Why not give Alter3 an instruction to pretend to be a ghost 👻?
It knows how to get into character in a second: mouth wide open, hands stretched out in front.
But for some reason, the slightly parted lips and hollow eyes make it look more like a hopping zombie summoned in a Lin Zhengying film:
Ask it to take a selfie, and Alter3 can pull a funny face on the spot.
Little does it know how it looks in the unfiltered camera: instead of enjoying the pose, it closes its eyes in anguish:
Next, have it enjoy some rock music. Music, go:
You would be right to say it is nodding along to the beat, but if you said it was standing there deferentially going "ah, yes, yes, yes", that would fit just as well (doge):
Of all the demo videos released, "drinking tea" is the least bizarre, and it even seems to be impersonating me:
Dragging yourself through a joyless workday, even drinking tea looks this despairing. If you ask us: don't open your mouth before the cup reaches it; a tea like this is fine to skip.
Since Alter3, a humanoid robot, imitates human behavior in such amusing ways... why not try something non-human?
For example, playing a snake that sways along to music:
See, it isn't that flexible, but it twists its torso as much as it can, like an unhinged version of a snake GIF.
Judging from all this, wiring a humanoid robot directly to GPT-4 is feasible; it's just that the aesthetics still fall short.
In fact, looking back, scientists and researchers have spent the whole year working to combine large models with robots.
The usual practice, however, is to do additional training, attempting to transfer the capabilities and knowledge of large vision-language models into the robotics domain.
Microsoft's ChatGPT for Robotics, Google's PaLM-E, RT-1, and RT-2, as well as VoxPoser, RoboCat, and many other works all follow this route.
Among them, Google's much-discussed RT (Robotics Transformer) series works well, but it took Google 17 months of training and 130,000 robot-specific demonstrations collected from 13 robots, money and effort that ordinary teams can hardly muster.
In the middle of the year, the embodied-intelligence work from Fei-Fei Li's team went a step further, improving the robot's ability to interact with its environment by combining an LLM (large language model) with a VLM (vision language model).
In this way, the robot does not need additional data and training to complete the task.
However, in the demos released by Fei-Fei Li's team, the hardware body was still just a robotic arm. In the study we introduce today, the subjects are GPT-4, the strongest player in the large-model field, and Alter3, which serves as the "body".
Both GPT-4, developed by OpenAI, and the humanoid robot Alter3, jointly developed by the University of Tokyo and Hiroshi Ishiguro, Japan's "father of robots", are existing research achievements.
The real purpose of this research is to explore how a large model like GPT-4 can control a humanoid robot through various actions without programming, thereby verifying GPT-4's ability to generate motion and reducing the complexity of human-robot interaction.
With this line of work, Alter3 can complete all the complex actions seen above (setting aside how well, or how gracefully, it performs them).
The researchers also found that after integrating Alter3 with GPT-4, even when Alter3 was given the same instruction, its response was not the same every time.
After some analysis, they attribute this to a characteristic of large language models themselves, namely that the same input can map to different outputs; it does not mean that GPT-4 cannot control the humanoid robot well.
For example, if a robot is required to "eat", it may make different actions to eat with chopsticks and with a knife and fork.
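This variability comes from sampled decoding in large language models: with a nonzero temperature, the same prompt can yield different continuations, so the same "eat" instruction can land on different plans. A toy Python sketch of the idea (the candidate plans and scores below are made up for illustration, not taken from the paper):

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw model scores into a probability distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate action plans for the instruction "eat".
plans = ["eat with chopsticks", "eat with a knife and fork", "eat with hands"]
logits = [2.0, 1.8, 0.5]  # made-up model scores

probs = softmax(logits, temperature=1.0)
# Sampling: different runs can pick different plans.
choice = random.choices(plans, weights=probs, k=1)[0]
print(choice)
```

As the temperature approaches zero the distribution concentrates on the top-scoring plan and the behavior becomes effectively deterministic; at temperature 1 the second-ranked plan remains quite likely, which matches the "chopsticks vs. knife and fork" observation.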
So how exactly does GPT-4 know how to control Alter3 the moment it receives a language input?
The key lies in two-step prompting. Before being connected to GPT-4, Alter3 already had a "brain" (an AI neural network) and carried a variety of sensors.
Previously, Alter3's behavior depended mainly on its built-in CPG (Central Pattern Generator), which analyzed the sensor data and then drove its 43 pneumatic actuators in a particular order to complete the corresponding action.
The whole process often required human intervention and patching before any improvement could be made.
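The article doesn't give the CPG's equations; as a rough illustration of the concept only, a CPG can be sketched as phase-shifted oscillators emitting rhythmic targets for each actuator, modulated by a sensor reading. All names and constants below are illustrative; only the actuator count of 43 comes from the article:

```python
import math

NUM_ACTUATORS = 43  # Alter3 drives 43 pneumatic actuators

def cpg_targets(t, amplitude=1.0, freq_hz=0.5):
    """CPG-style targets: one phase-shifted sine oscillator per
    actuator, producing a travelling wave across the body."""
    targets = []
    for i in range(NUM_ACTUATORS):
        phase = 2 * math.pi * i / NUM_ACTUATORS
        targets.append(amplitude * math.sin(2 * math.pi * freq_hz * t + phase))
    return targets

def modulate_by_temperature(amplitude, room_temp_c):
    """Illustrative sensor coupling: a sharp temperature drop
    raises the oscillation amplitude, producing a 'shudder'."""
    if room_temp_c < 10.0:
        return amplitude * 2.0
    return amplitude

# One control tick at t = 0.5 s in a cold room:
amp = modulate_by_temperature(1.0, room_temp_c=5.0)
targets = cpg_targets(0.5, amplitude=amp)
print(len(targets))  # 43
```

The point of the sketch is the division of labor: the oscillators supply rhythm and ordering, while sensor data only modulates parameters, which is why improving behavior previously required humans to patch the generator itself.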
But everything is different now: the research team says the integration of GPT-4 has "liberated" them.
Alter3 can now be commanded with verbal instructions, relying mainly on two techniques:
CoT (chain of thought) and zero-shot learning.
Thanks to these two techniques, control of Alter3 no longer depends entirely on the hardware itself; with GPT-4, natural language can be converted directly into actions the robot can understand and execute.
Most importantly, the whole process does not require explicit programming of any part of the body.
OK, now let's talk about how to integrate GPT-4 and Alter3 together.
It can be divided into two steps:
First, use a prompt to describe the action or behavior you want Alter3 to perform, such as "Let's take a selfie" or "Raise your arms higher when you take the selfie".
On receiving the input, GPT-4 generates a series of thought steps detailing what must be done to accomplish the action.
This process, which the research team calls part of CoT, breaks a complex task down into a series of simpler thought steps.
The researchers then bring out a second prompt, which transforms the decomposed steps into action instructions Alter3 can understand.
Put simply, it converts the human's instructions into Python code that can directly control the motion parameters of Alter3's various body parts.
With the converted code, Alter3 can wink when it wants to wink and pout when it wants to pout.
The research team sees this second step as part of CoT because it accomplishes "translating an abstract description into a concrete operation".
The team says CoT gives GPT-4 effective control over Alter3, ordering it to do complex movements without extra training or fine-tuning.
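The team's actual prompts and Alter3's real motor API are not reproduced in this article, so the following Python sketch only shows the shape of the two-step pipeline: `call_gpt4` is a stub returning canned text so the example runs offline, and `set_axis(index, value)` is a made-up command format standing in for Alter3's real control code:

```python
def call_gpt4(prompt):
    """Stub for the GPT-4 API call. The real system would hit the
    OpenAI chat API; here we return canned text so the sketch runs
    offline. The canned outputs are invented for illustration."""
    if "list the steps" in prompt:
        return "1. Raise right arm\n2. Tilt head\n3. Smile"
    return "set_axis(5, 0.8)\nset_axis(12, 0.3)\nset_axis(30, 0.6)"

def plan_steps(instruction):
    """Step 1 (CoT): decompose the instruction into thought steps."""
    prompt = (f"You control a humanoid robot. For '{instruction}', "
              "list the steps needed to perform it.")
    return call_gpt4(prompt).splitlines()

def steps_to_commands(steps):
    """Step 2: translate the steps into low-level actuator commands
    (the real system emits Python driving Alter3's 43 axes)."""
    prompt = "Convert these steps into set_axis commands:\n" + "\n".join(steps)
    return call_gpt4(prompt).splitlines()

steps = plan_steps("take a selfie")
commands = steps_to_commands(steps)
print(commands[0])  # "set_axis(5, 0.8)" with this stub
```

In the real pipeline, step 2's output is executable code rather than strings, which is what makes explicit per-joint programming unnecessary: the language model does the translation at both levels.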
A few more words: beyond the two prompts used to control the robot, the research team also carried out several other analyses.
For example, they dissected Alter3's behavior in dialogue, focusing on its conversation trajectory and the temporal evolution of its semantics.
For the conversation trajectory, the team used a method called UMAP (Uniform Manifold Approximation and Projection), embedding the dialogue content into a two-dimensional space so that the development of the conversation can be observed in simplified form.
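As a rough, dependency-light sketch of this kind of trajectory analysis: the snippet below substitutes a toy bag-of-words embedding and a PCA projection (via NumPy's SVD) for the real sentence embeddings and UMAP, which in practice would come from the umap-learn package. The sample dialogue turns are invented:

```python
import numpy as np

def bag_of_words(turns):
    """Toy embedding: word-count vectors over the dialogue vocabulary."""
    vocab = sorted({w for t in turns for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(turns), len(vocab)))
    for row, t in enumerate(turns):
        for w in t.lower().split():
            X[row, index[w]] += 1
    return X

def project_2d(X):
    """Stand-in for UMAP: project to 2-D with PCA (the top two
    principal components), enough to plot a dialogue trajectory."""
    Xc = X - X.mean(axis=0)          # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T             # coordinates along the top 2 components

turns = [
    "let us talk about art",
    "art and learning shape culture",
    "culture changes how humans learn",
    "goodbye goodbye",
]
coords = project_2d(bag_of_words(turns))
print(coords.shape)  # (4, 2): one 2-D point per dialogue turn
```

Plotting `coords` in order gives the "trajectory"; with a fixed turn order one would look for the loop-like patterns the team reports, and with randomized order, for more scattered paths.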
They found that when the order of the conversation was fixed, the trajectory showed a circular pattern, that is, the same topics kept repeating.
When the dialogue order was randomized, the content became more divergent and creative.
What's really interesting is that the study found GPT-4 tends to say "goodbye" over and over after talking for a long time; without intervention, it becomes absorbed in finding ways to bid you farewell.
In the semantic time-evolution analysis, the team observed how the chat content changed over time.
They found that some keywords from the early stages of the conversation, such as "art" or "learning", were forgotten by GPT-4 and replaced by words such as "culture", "human", and "inhibition".
This shows that the content of the dialogue develops and shifts over time.
Of course, if you start saying "goodbye" to GPT-4, it just wants to say byebye~ (doge) to you.
This study, which has gone viral online, comes from the University of Tokyo and the Japanese company Alternative Machine.
The first author, Takahide Yoshida, is from the Department of General Systems Science at the University of Tokyo.
The other two authors, Atsushi Masumori and Takashi Ikegami, are affiliated with both the University of Tokyo and Alternative Machine.
Finally, it's worth mentioning that Alter3, the protagonist of this study, also has University of Tokyo roots: it was jointly created by Takashi Ikegami, an AI researcher at the University of Tokyo, and Hiroshi Ishiguro, Japan's "father of robots".
Alter3, born in 2020, is the third generation of robots in the same series.
Reportedly, the iterations of the first two generations of the Alter series were carried out through opera singing, while the third generation made its debut conducting an orchestra and taking part in other live performances at the New National Theatre in Tokyo.
At the time, its highlights were enhanced sensors, an improved singing expression and vocal system,
and the in-body CPG that ultimately drives its 43 pneumatic actuators.
How sensitive is the CPG's analysis of sensor data? Put it this way: if the temperature in Alter3's room drops sharply, Alter3 will shudder to show that it is cold.
This may be part of why, once connected to GPT-4 as its brain, it can produce expressions and complete movements so vividly.
One More Thing

Speaking of the latest humanoid-robot news, we have to mention the latest progress on Musk's Tesla Optimus.
Just now, Musk posted a video of Optimus on Twitter, saying that the second-generation Optimus robot (Gen 2) will be released this month.
One "slight" improvement: the second-generation Optimus walks 30 percent faster.
The sense of balance and physical control have also improved.
I'm looking forward to it!
Reference link:
[1] https://tnoinkwms.github.io/ALTER-LLM/
[2] https://arxiv.org/abs/2312.06571
[3] https://twitter.com/elonmusk/status/1734763060244386074
This article comes from the WeChat official account QbitAI (ID: QbitAI); authors: Hengyu, Xiao Xiao.