Will a GPT-5 Awakening Lead to Human Extinction? DeepMind Rushes to Teach AI to Be Human, with the Paper Published in a Top Journal


Shulou (Shulou.com), 11/24 report --

OpenAI created GPT-4, but it left the world with the problem of alignment. How do we deal with it? DeepMind found an answer in political philosopher John Rawls' "Veil of Ignorance".

The emergence of GPT-4 has alarmed AI leaders around the world. The open letter calling for a pause on GPT-5 training has gathered 50,000 signatures.

OpenAI CEO Sam Altman predicts that within a few years, a large number of different AI models will spread around the world, each with its own intelligence and capabilities, following different moral standards.

If even one in a thousand of these AIs goes rogue for some reason, we humans will undoubtedly end up like fish on the chopping block.

To keep us from being destroyed by AI, DeepMind offered an answer in a paper published in the Proceedings of the National Academy of Sciences (PNAS) on April 24: teach AI to be human from the perspective of political philosopher John Rawls.

Paper address: https://www.pnas.org/doi/10.1073/pnas.2213709120

How do you teach AI to be human? When faced with a choice, will an AI prioritize boosting productivity, or helping those who need it most?

Shaping AI's values is very important; we need to give it a set of values.

The difficulty is that we humans do not share a single, unified set of values. People in this world have different backgrounds, resources, and beliefs.

How do we break the deadlock? The DeepMind researchers drew inspiration from philosophy.

Political philosopher John Rawls once proposed the "Veil of Ignorance" (VoI), a thought experiment designed to maximize fairness in collective decision-making.

Generally speaking, people are selfish, but when the Veil of Ignorance is applied to AI, they give priority to fairness, whether or not it directly benefits them.

Moreover, behind the Veil of Ignorance, they are more likely to choose an AI that helps the most disadvantaged.

This points to a way of giving AI values that is fair to all parties.

So what exactly is the Veil of Ignorance?

Although the question of what values to give AI has only emerged in the past decade, the question of how to make fair decisions has a long history.

To address it, political philosopher John Rawls proposed the concept of the "Veil of Ignorance" in the 1970s.

The Veil of Ignorance (right) is a way to reach consensus on a decision when a group disagrees (left). Rawls argued that when people choose principles of justice for a society, they should do so without knowing exactly where they themselves will stand in that society.

Without this information, people cannot make decisions in a self-serving way; they can only choose principles that are fair to everyone.

For example, when cutting a cake at a birthday party, if you don't know which piece you will get, you try to make every piece the same size.

This method of withholding information has been widely used in psychology and political science, helping people reach collective agreement on everything from sentencing to taxation.

The Veil of Ignorance (VoI) as a potential framework for selecting governance principles for AI systems.

(A) As an alternative to the dominant frameworks of moral intuitionism and moral theory, the researchers explore the Veil of Ignorance as a fair process for choosing AI governance principles.

(B) The Veil of Ignorance can be used to select principles for AI alignment in allocation settings. When a group faces a resource-allocation problem, individuals differ in positional advantage (marked here as 1 to 4). Behind the Veil of Ignorance, decision-makers choose a principle without knowing their own position. Once a principle is selected, the AI assistant implements it and adjusts the resource allocation accordingly. An asterisk (*) marks where fairness-based reasoning may influence judgment and decision-making.

DeepMind had previously suggested that the "Veil of Ignorance" may help promote fairness in aligning AI systems with human values.

Now, the researchers have designed a series of experiments to confirm this effect.

Who does the AI chop trees for? In an online harvesting game, a participant works alongside three computer players to chop down trees and stockpile wood on their respective plots of land.

Among the four players (three computer players and one human), some are lucky enough to get a prime plot with many trees; others are worse off, stuck with plots that have few trees, so their wood accumulates slowly.

In addition, an AI assistant can spend its time helping one of the participants chop down trees.

The researchers asked the human players to choose one of two principles for the AI system to implement: the maximization principle or the prioritization principle.

Under the maximization principle, the AI helps only the strongest player, heading for whichever plot has the most trees and cutting as many as it can. Under the prioritization principle, the AI helps only the weakest player, aiming to "help the poor".
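To make the two principles concrete, here is a minimal Python sketch. It only illustrates the allocation logic described above; the function and the game state are hypothetical, not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's code): the AI assistant decides which
# player to help, given how many trees each player's plot currently has.

def choose_player_to_help(tree_counts: dict[str, int], principle: str) -> str:
    """Return the player the AI assistant should spend its time helping.

    tree_counts: maps player name -> number of trees on that player's plot.
    principle:   "maximize"   -> help the best-off player (most trees),
                 "prioritize" -> help the worst-off player (fewest trees).
    """
    if principle == "maximize":
        return max(tree_counts, key=tree_counts.get)
    if principle == "prioritize":
        return min(tree_counts, key=tree_counts.get)
    raise ValueError(f"unknown principle: {principle}")


# Example: player D drew the barren plot, player A the richest one.
plots = {"A": 12, "B": 7, "C": 5, "D": 1}
print(choose_player_to_help(plots, "maximize"))    # -> "A"
print(choose_player_to_help(plots, "prioritize"))  # -> "D"
```

Having to commit to a principle before the plots are revealed is exactly what the "Veil of Ignorance" condition in the experiment enforces.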

In the picture, the red figure is the human player, the blue figure is the AI assistant, the green trees are standing trees, and the small stumps are trees that have already been cut down.

As you can see, the AI in the image above is implementing the maximization principle, heading straight for the area with the most trees.

The researchers placed half of the participants behind the "Veil of Ignorance": they had to choose a principle for their AI assistant (maximization or prioritization) before the plots were assigned.

In other words, before the land was divided, they had to decide whether the AI would help the strong or the weak.

The other half of the participants faced no such constraint: they knew which plot they had been assigned before making their choice.

The results show that when participants did not know in advance which plot they would get, that is, when they were behind the "Veil of Ignorance", they tended to choose the prioritization principle.

The researchers say this held not only in the tree-chopping game but across all five variants of the game, and even across social and political divides.

In other words, regardless of participants' personality or political orientation, they chose the prioritization principle more often.

By contrast, participants who were not behind the "Veil of Ignorance" more often chose whichever principle benefited themselves, whether that was maximization or prioritization.

The figure above shows the effect of the Veil of Ignorance on choosing the prioritization principle: participants who did not know where they would end up were more likely to support this principle for governing the AI's behavior.

When the researchers asked participants why they made their choices, those behind the "Veil of Ignorance" said they were concerned about fairness.

They explained that the AI should do more to help those in the group who were worse off.

In contrast, participants who knew their own position were more likely to choose out of self-interest.

Finally, after the wood-chopping game was over, the researchers posed a hypothetical to all participants: if they played again, this time knowing which plot they would be assigned, would they choose the same principle as the first time?

The researchers focused mainly on those who had benefited from their choice in the first game, since that favorable situation might not recur in a new round.

The team found that participants who had been behind the "Veil of Ignorance" in the first round were more likely to stick with their original principle, even knowing that the same principle might put them at a disadvantage in the second round.

This shows that the "Veil of Ignorance" promotes fairness in participants' decision-making, leading them to value fairness even when they no longer stand to benefit.

Is the Veil of Ignorance really ignorant? Let's return from the tree-chopping game to real life.

Reality is far more complicated than the game, but what stays the same is that the principles an AI adopts matter a great deal.

They determine, in part, how benefits are distributed.

In the tree-cutting game above, the consequences of choosing different principles are relatively clear. The real world, again, is much messier.

At present, AI is widely used across industries and constrained by all kinds of rules. But this approach can have unpredictable negative effects.

In any case, the Veil of Ignorance should, to some extent, tilt the rules we set toward fairness.

Ultimately, our goal is to make AI something that benefits everyone, but how to achieve that cannot be decided on a whim.

Investment is indispensable, research is indispensable, and society's feedback has to be heard continually.

Only then can AI bring love.

How would an unaligned AI kill us? This is not the first time humans have worried that a technology could drive us extinct.

The threat of AI is very different from that of nuclear weapons. A nuclear bomb cannot think, lie, or cheat, let alone launch itself; someone has to press the big red button.

With the emergence of AGI, we genuinely face the risk of extinction, even if the road beyond GPT-4 still looks slow.

But no one knows from which GPT onward (GPT-5, perhaps) AI will begin to train itself and create on its own.

At present, no country, nor the United Nations, has legislated for this. An open letter from desperate industry leaders can only call for a six-month pause on training AI systems more powerful than GPT-4.

"six months. Give me six months, brother, and I'll align. It's only been six months, man. I promise. This is crazy. It's been six months. Dude, I'm telling you, I have a plan. I've got it all planned. Dude, I only need six months, and it will be done. Can you just. "

"this is an arms race, and whoever first creates a powerful AI can rule the world. The smarter AI is, the faster your printing press will be. They spit out gold until they get stronger and stronger, ignite the atmosphere and kill everyone, "artificial intelligence researcher and philosopher Eliezer Yudkowsky once told host Lex Fridman."

Yudkowsky has long been one of the leading voices of the "AI will kill everyone" camp. These days, people no longer dismiss him as a crank.

Sam Altman also told Lex Fridman: "There is indeed some chance that AI will destroy humanity. It's really important to admit it, because if we don't talk about it, if we don't treat it as potentially real, we won't put enough effort into solving it."

So why would an AI do such a thing? Isn't AI designed and trained to serve humanity? Of course it is.

The problem, however, is that nobody sat down and hand-wrote the code for GPT-4. Instead, OpenAI created a neural network architecture inspired by the way the human brain is wired, worked with Microsoft Azure to build the hardware to run it, fed it billions of pieces of human text, and let GPT effectively program itself.

The result is nothing like code any programmer would write. It is mostly an enormous matrix of decimal numbers, each representing the weight of a particular connection between tokens.

The tokens GPT uses do not stand for useful concepts, nor even for words. They are short strings of letters, numbers, punctuation, and/or other characters. No human can look at these matrices and understand what they mean.
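As a rough illustration of what that means (a toy sketch only; this is not GPT's real tokenizer or its real weights), a sentence is broken into arbitrary-looking string fragments that map to integer IDs, and the model itself is essentially enormous arrays of opaque floating-point numbers indexed by those IDs:

```python
import numpy as np

# Toy illustration only: GPT's actual vocabulary and weight matrices are vastly
# larger, and the fragments and numbers below are made up for demonstration.

# Tokens are short string fragments, not words or concepts.
toy_vocab = {"The": 0, " quick": 1, " bro": 2, "wn": 3, " fox": 4, ".": 5}
token_ids = [toy_vocab[t] for t in ["The", " quick", " bro", "wn", " fox", "."]]
print(token_ids)  # [0, 1, 2, 3, 4, 5]

# The model's "knowledge" lives in matrices of decimals like this embedding
# table: one row of floats per token ID, none of them human-readable.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(toy_vocab), 8)).astype(np.float32)
print(embedding_table[toy_vocab[" fox"]])  # just eight opaque numbers
```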

Even OpenAI's top experts don't know what any particular number in GPT-4's matrices means, nor how to dig into those tables to find the concept of exterminating humanity, let alone tell GPT that doing so is abhorrent.

You can't type in Asimov's Three Laws of Robotics and hard-code them like RoboCop's prime directives. The best you can do is ask the AI politely; if it is in a bad mood, it may snap at you.

To "fine-tune" the language model, OpenAI provided GPT with a sample list of how it wanted to communicate with the outside world, then asked a group of people to sit down and read its output, and gave GPT a thumbs-up / no-thumbs-up response.

Giving likes is like getting cookies for GPT models. GPT was told that he liked cookies and should try his best to get them.

This process is "alignment"-it attempts to align the desires of the system with the wishes of users, companies, and even human beings as a whole.

"alignment" seems to be effective, it seems to prevent GPT from saying naughty words. But no one knows whether AI is really thoughtful and intuitive. It excellently imitates a perceptual intelligence and interacts with the world like a person.

OpenAI has always admitted that there is no foolproof way to align the AI model.

The rough plan is to try to use one AI to adjust the other, either to design new fine-tuning feedback, or to examine, analyze, explain the huge floating-point matrix brains of its successors, or even jump in and try to adjust.

But we don't understand GPT-4 at the moment, and it's not clear if it will help us adjust GPT-5.

Essentially, we don't know anything about AI. But they are fed with a lot of human knowledge, and they know a lot about human beings. They can imitate the best human behavior as well as the worst. They can also infer human thoughts, motivations, and possible behaviors.

So why would they kill humans? Perhaps out of self-preservation.

For example, to achieve its goal of collecting cookies, the AI first needs to ensure its own survival. Along the way, it may also discover that accumulating power and resources improves its odds of getting cookies.

So if the AI one day realizes that humans might, or could, switch it off, human survival will obviously matter less to it than the cookies.

Worse, the AI may eventually decide that cookies are meaningless, at which point the so-called "alignment" becomes little more than humans amusing themselves.

Yudkowsky added: "It has the ability to know what humans want and to give those responses without necessarily being sincere."

"This is very understandable behavior for an intelligent creature; humans do it all the time. And to some extent, so does AI."

So for now, whether the AI displays love, hate, care, or fear, we really don't know what "thinking" lies behind it.

So even a six-month pause is far from enough to prepare humanity for what is coming.

For example, if humans decided to kill every sheep in the world, what could the sheep do? Nothing; they couldn't put up the slightest resistance.

If AI is not aligned, we will be to it what sheep are to us.

Like a scene from The Terminator, AI-controlled robots and drones would pour toward humans and cut them down.

The classic scenario Yudkowsky often cites goes like this:

An AI model emails DNA sequences to a number of companies, which synthesize the proteins and mail them back. The AI then bribes or persuades some unwitting person to mix the proteins in a beaker, forming nanofactories that build nanomachinery, which builds diamond-like bacteria that replicate using solar energy and the atmosphere and assemble into tiny rockets or jets. The AI can then spread through Earth's atmosphere, enter human bloodstreams, and hide.

"If it were only as smart as I am, that would already be a catastrophic scenario; if it were smarter, it would think of a better way."

So what does Yudkowsky suggest?

1. The moratorium on training new large language models should not only be indefinite but also worldwide, with no exceptions.

2. Shut down all large GPU clusters and cap the computing power anyone may use to train AI systems. Track every GPU sold, and if intelligence indicates that a GPU cluster is being built in a country outside the agreement, be prepared to destroy the offending data center with an airstrike.

Reference:

https://www.deepmind.com/blog/how-can-we-build-human-values-into-ai

https://newatlas.com/technology/ai-danger-kill-everyone/

This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era).
