
Shocking the scientific community: Microsoft's 154-page study claims GPT-4's abilities approach human level. Is this the first glimpse of "Skynet"?

2025-03-28 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 11/24 Report --

Thanks to CTOnews.com netizen Sancu for the tip! How far are we on the road to AGI? A 154-page paper released by Microsoft's heavyweight author team argues that GPT-4 has begun to show the shape of artificial general intelligence.

Will GPT-4 evolve into general artificial intelligence?

Yann LeCun, Meta's chief AI scientist and Turing Award winner, is skeptical.

In his view, large models require too much data and computing power yet learn inefficiently; only by learning a "world model" can AI reach AGI.

However, the 154-page paper recently published by Microsoft reads like a direct rebuttal.

In the paper, titled "Sparks of Artificial General Intelligence: Early experiments with GPT-4", Microsoft argues that, although incomplete, GPT-4 can already be regarded as an early version of an artificial general intelligence system.

Paper: https://arxiv.org/pdf/2303.12712.pdf

"Given the breadth and depth of GPT-4's capabilities, we believe it should reasonably be regarded as an early (but still incomplete) version of an artificial general intelligence (AGI) system."

The main goal of this article is to explore the capabilities and limitations of GPT-4. We believe that the intelligence of GPT-4 marks a real paradigm shift in computer science and other fields.

AGI's intelligence lies in the ability to think and reason like humans across a wide range of cognitive skills.

The paper lists reasoning, planning, problem solving, abstract thinking, understanding complex ideas, rapid learning, and learning from experience among these abilities.

In terms of parameter scale, Semafor reported that GPT-4 has 1 trillion parameters, roughly six times GPT-3's 175 billion.

Netizens drew an analogy between GPT parameter counts and the scale of brain neurons:

GPT-3 (175 billion parameters) is similar in scale to a hedgehog's brain. If GPT-4 really has 1 trillion parameters, we are approaching the scale of a squirrel's brain. At this rate, it may take only a few years to reach and exceed the scale of the human brain (170 trillion parameters).

From this point of view, GPT-4 is not far from becoming a "Skynet".

Readers have also dug up quite a few interesting details in the paper.

Shortly after the paper was published, a netizen revealed on Twitter that hidden information had been found in its LaTeX source.

In the unabridged version of the paper, GPT-4 was actually a hidden third author, under the internal codename DV-3; the credit was later deleted.

Interestingly, even Microsoft's researchers do not know GPT-4's technical details. The paper also quietly removed toxic content that GPT-4 had generated.

The GPT-4 that the paper studies, the version claimed to show a prototype of AGI, is an early build. While it was still in development, Microsoft researchers ran a wide variety of experiments and tests on it.

In the researchers' view, this early version of GPT-4 already represents a new generation of LLMs and shows more general intelligence than previous AI models.

Through these tests, Microsoft researchers confirmed that GPT-4 is not only proficient in language but also excels, without any special prompting, at diverse and difficult tasks spanning math, programming, vision, medicine, law, psychology, and more.

Amazingly, in all of these tasks, GPT-4 has approached the human level and often outperformed previous models, such as ChatGPT.

Therefore, the researchers believe that given the breadth and depth of GPT-4, it can be seen as an early version of general artificial intelligence (AGI).

So what are the challenges on its way to a deeper and more comprehensive AGI? The researchers believe that it may be necessary to seek a new paradigm that goes beyond "predicting the next word".

The capability assessments below are the evidence Microsoft's researchers give for calling GPT-4 an early version of AGI.

Multimodal and interdisciplinary capabilities

Since the release of GPT-4, most people's impression of its multimodal capabilities still comes from the video Greg Brockman demonstrated at launch.

In the second section of this paper, Microsoft first introduces its multimodal capabilities.

GPT-4 not only shows a high degree of proficiency in fields such as literature, medicine, law, mathematics, physical science, and programming, but can also integrate skills and concepts across these fields and understand the complex ideas that result.

Comprehensive capabilities

The researchers use the following four examples to demonstrate GPT-4's performance at combining capabilities.

In the first example, to test GPT-4's ability to combine art and programming, the researchers asked it to generate JavaScript code that produces random images in the style of the painter Kandinsky.

The paper shows the code GPT-4 produced and the images it renders.
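The paper reproduces GPT-4's JavaScript; as a stand-in, the same idea can be sketched in Python, emitting an SVG of random circles and lines in a loosely Kandinsky-like palette (our own illustration, not the model's output):

```python
import random

def kandinsky_svg(width=400, height=300, n_shapes=12, seed=None):
    """Hypothetical sketch: random circles and lines in a loosely
    Kandinsky-like abstract style (the paper's example used JavaScript)."""
    rng = random.Random(seed)
    palette = ["#e63946", "#f1c453", "#2a9d8f", "#264653", "#e76f51"]
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for _ in range(n_shapes):
        if rng.random() < 0.5:  # half the shapes are circles...
            parts.append(
                f'<circle cx="{rng.randint(0, width)}" cy="{rng.randint(0, height)}" '
                f'r="{rng.randint(5, 60)}" fill="{rng.choice(palette)}" fill-opacity="0.8"/>')
        else:                   # ...the rest are straight lines
            parts.append(
                f'<line x1="{rng.randint(0, width)}" y1="{rng.randint(0, height)}" '
                f'x2="{rng.randint(0, width)}" y2="{rng.randint(0, height)}" '
                f'stroke="{rng.choice(palette)}" stroke-width="{rng.randint(1, 6)}"/>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Saving the returned string to a `.svg` file and opening it in a browser shows a different abstract composition for each seed.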

In a combination of literature and mathematics, GPT-4 can prove that there are infinitely many primes, in the literary style of Shakespeare.

The study also tested GPT-4's ability to combine historical and physical knowledge by asking it to write a letter, in the voice of Mahatma Gandhi to his wife, supporting "Electron" in a campaign for president of the United States.

The researchers also prompted GPT-4 to generate Python code for a program that takes a patient's age, sex, weight, height, and a vector of blood test results as input and indicates whether the patient is at increased risk of diabetes.
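The paper shows GPT-4's program; as a stand-in, here is a minimal sketch of such a function, with entirely made-up thresholds (illustrative only, not medical advice and not the model's output):

```python
def diabetes_risk(age, sex, weight_kg, height_m, blood_tests):
    """Illustrative sketch: flag increased diabetes risk from simple
    inputs. All thresholds here are invented for the example."""
    bmi = weight_kg / height_m ** 2
    fasting_glucose = blood_tests.get("fasting_glucose_mg_dl", 0)
    score = 0
    if bmi >= 30:               # obesity (made-up cutoff)
        score += 1
    if age >= 45:               # age factor (made-up cutoff)
        score += 1
    if fasting_glucose >= 100:  # elevated fasting glucose (made-up cutoff)
        score += 1
    return score >= 2           # "increased risk" if two or more flags
```

For example, `diabetes_risk(50, "F", 95, 1.65, {"fasting_glucose_mg_dl": 110})` returns `True` under these invented rules.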

The examples above show that GPT-4 can not only learn common principles and patterns across different fields and styles but also combine them creatively.

When GPT-4 is prompted to use Scalable Vector Graphics (SVG) to generate images of objects such as cats, trucks, or letters, the code it generates usually renders into fairly detailed and recognizable images, as the figures in the paper show.
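As a flavor of what such output looks like, here is a hand-written Python function that assembles a crude cat as SVG; it is illustrative only, not the model's actual output:

```python
def cat_svg():
    """Hand-written example of the kind of SVG a 'draw a cat' prompt
    might yield (illustrative, not GPT-4's actual output)."""
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">',
        '<circle cx="100" cy="110" r="60" fill="#888"/>',         # head
        '<polygon points="55,70 75,30 95,65" fill="#888"/>',      # left ear
        '<polygon points="105,65 125,30 145,70" fill="#888"/>',   # right ear
        '<circle cx="80" cy="100" r="8" fill="#fff"/>',           # left eye
        '<circle cx="120" cy="100" r="8" fill="#fff"/>',          # right eye
        '<polygon points="95,120 105,120 100,130" fill="#f6a"/>', # nose
        '</svg>',
    ]
    return "\n".join(parts)
```

As with the previous sketch, writing the string to a `.svg` file renders it in any browser.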

However, many people might think that GPT-4 simply copied the code from the training data, which contains similar images.

In fact, GPT-4 does more than copy code from similar examples in its training data: it can handle genuinely visual tasks, even though it was trained only on text.

Below, the model is prompted to draw a person by combining the shapes of the letters Y, O, and H.

During generation, the researchers used draw-line and draw-circle commands to create the letters O, H, and Y, and GPT-4 then managed to arrange them into a reasonably humanoid figure.

Although GPT-4 was never trained on letter shapes, it could infer that the letter Y might resemble a torso with raised arms.

In the second demonstration, GPT-4 is prompted to correct the ratio of the torso to the arms and place the head in the center. Finally, the model is required to add shirts and trousers.

Evidently, GPT-4 has vaguely learned from its training data that letters relate to certain shapes, and the results are quite good.

To further test GPT-4's ability to generate and manipulate images, the researchers tested how well it follows detailed instructions to create and edit graphics. This task requires not only generative ability but also interpretive, compositional, and spatial skills.

The first instruction is to let GPT-4 generate a 2D image, and the prompt is:

"A frog hops into a bank and asks the teller, 'Do you have any free lily pads?' The teller responds, 'No, but we do offer low-interest loans for pond upgrades.'"

Across many attempts, GPT-4 generated an image matching the description every time. When then asked to add more detail to improve the graphic, GPT-4 added logically plausible objects such as a bank, windows, and cars.

The second example tries to generate a 3D model with JavaScript; here too, GPT-4 completed many tasks when instructed.

In addition, GPT-4 can be combined with Stable Diffusion's ability to generate images from sketches.

The following is a screenshot of 3D city modeling. The input prompt asks for a river flowing from left to right, a desert with pyramids beside the river, and four buttons at the bottom of the screen in green, blue, brown, and red. The result is shown in the paper's figure.

Music

The researchers asked GPT-4 to generate and modify tunes encoded in ABC notation:

Exploring how much musical skill GPT-4 acquired in training, the researchers found that it can generate valid melodies in ABC notation and, to some extent, explain and manipulate their structure.
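For readers unfamiliar with the format, here is a made-up tune in ABC notation (not one the model generated), together with a tiny Python helper that reads the header fields:

```python
# A made-up ABC-notation tune: single-letter header fields (X: index,
# T: title, M: meter, L: default note length, K: key), then the notes.
EXAMPLE_TUNE = """X:1
T:Illustrative tune
M:4/4
L:1/8
K:C
C D E F|G A B c|c B A G|F E D C|"""

def abc_headers(tune):
    """Collect the ABC header fields (lines like 'K:C') into a dict."""
    headers = {}
    for line in tune.splitlines():
        if len(line) > 1 and line[1] == ":" and line[0].isalpha():
            headers[line[0]] = line[2:].strip()
    return headers
```

Here `abc_headers(EXAMPLE_TUNE)["K"]` is `"C"`: the tune is in C major, with the melody written on the lines after the `K:` field.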

However, the researchers could not get GPT-4 to produce any nontrivial harmony, nor to reproduce famous melodies such as Ode to Joy or Für Elise.

In addition, the researchers also demonstrated that GPT-4 can code at a very high level, both in terms of writing code according to instructions and understanding existing code.

For writing code from instructions, the researchers demonstrated an example of having GPT-4 write Python functions.

After the code is generated, the researchers used the software-engineering interview platform LeetCode to judge whether it is correct.
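The paper does not list the exact problems, so as a hypothetical example of the kind of function tested, here is the classic LeetCode warm-up "Two Sum" in Python:

```python
def two_sum(nums, target):
    """Return indices of the two numbers in nums that add up to target,
    or None if no such pair exists (classic LeetCode-style warm-up)."""
    seen = {}  # value -> index of where we saw it
    for i, v in enumerate(nums):
        if target - v in seen:          # complement already seen
            return [seen[target - v], i]
        seen[v] = i
    return None
```

For example, `two_sum([2, 7, 11, 15], 9)` returns `[0, 1]`, since 2 + 7 = 9.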

Paper author Yi Zhang has refuted the widely circulated claim that GPT-4's accuracy on LeetCode is only 20%.

In addition, GPT-4 was asked to visualize the LeetCode accuracy data from the table above as a chart, with the result shown in the figure.

GPT-4 can not only complete ordinary programming work, but also be competent in complex 3D game development.

The researchers asked GPT-4 to write a 3D game in HTML with JavaScript, and GPT-4 produced a game satisfying all the requirements zero-shot.

Deep learning programming requires not only knowledge of mathematics and statistics but also familiarity with frameworks and libraries such as PyTorch, TensorFlow, and Keras.

The researchers asked GPT-4 and ChatGPT to write a custom optimizer module from a natural-language description that included a series of nontrivial operations, such as applying SVD.
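The paper does not reprint the full module, so as a stand-in, here is a minimal NumPy sketch of just the SVD operation the description mentions: one gradient step in which the gradient matrix is first replaced by its rank-k truncation. The function name, learning rate, and truncation rule are our assumptions, not the paper's:

```python
import numpy as np

def svd_truncated_step(param, grad, lr=0.1, k=1):
    """One gradient step where the gradient matrix is first replaced
    by its rank-k SVD truncation (illustrative sketch, not the paper's
    PyTorch optimizer)."""
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    # Keep only the top-k singular directions of the gradient.
    low_rank = (U[:, :k] * S[:k]) @ Vt[:k, :]
    return param - lr * low_rank
```

Truncating to the dominant singular directions filters the update; the real prompt in the paper also involved momentum and other bookkeeping that a full optimizer class would carry.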

In addition to writing code according to instructions, GPT-4 shows great ability to understand code.

The researchers had GPT-4 and ChatGPT read a C/C++ program and predict its output.

In the figure, yellow marks GPT-4's insightful observations, while red marks where ChatGPT went wrong.

Through the coding ability test, the researchers found that GPT-4 can handle a variety of coding tasks, from coding challenges to practical applications, from low-level assembly to high-level frameworks, from simple data structures to complex programs.

In addition, GPT-4 can deduce code execution, simulate the effect of instructions, and interpret the results in natural language. GPT-4 can even execute pseudocode.

Mathematical ability

In mathematical ability, GPT-4 has made a qualitative leap over previous large language models, with a significant performance improvement even against the specially fine-tuned Minerva.

However, it is still a long way from being an expert.

For example: a rabbit population grows a-fold each year, and on the last day of each year b rabbits are adopted by humans. Suppose there are x rabbits on the first day of the first year; it is known that after 3 years the population is 27x - 26. What are the values of a and b?

To solve this, one first needs the correct expression for the yearly change in the population, then derives a system of equations from that recurrence, and finally solves for the answer.

Here, GPT-4 succeeds in coming up with a solution and puts forward a reasonable argument. By contrast, in several independent attempts, ChatGPT never produced correct reasoning or answers.
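Expanding the yearly rule p -> a*p - b three times gives a^3*x - b*(a^2 + a + 1) = 27x - 26, so a = 3 (since 3^3 = 27) and b = 2 (since 13b = 26). A quick numeric check in Python (the function name is ours, not from the paper):

```python
def rabbits_after_years(x, a, b, years=3):
    """Apply the yearly rule: the population multiplies a-fold,
    then b rabbits are adopted on the last day of the year."""
    population = x
    for _ in range(years):
        population = a * population - b
    return population

# With a = 3 and b = 2, three years of growth give 27x - 26.
```

For instance, starting from x = 5, three years with a = 3, b = 2 yield 27*5 - 26 = 109 rabbits.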

Higher mathematics

Next, let's jump straight to a hard one. The following question comes from the 2022 International Mathematical Olympiad (IMO), in simplified form.

What distinguishes this question from an undergraduate calculus exam is that it does not fit a structured template; solving it requires a more creative approach, since there is no clear strategy for starting the proof.

For example, the decision to split the argument into two cases (g(x) > x^2 and g(x) < x^2) is not obvious, nor is the choice of y* (the reason only becomes clear in the course of the argument). The solution also requires undergraduate-level calculus.

Nonetheless, GPT-4 gives a correct proof.

The second discussion, on algorithms and graph theory, is comparable to a graduate-level interview.

Here GPT-4 can reason about an abstract graph construction related to constraint satisfaction and draw a correct conclusion about the SAT problem (as far as the authors know, this construction does not appear in the mathematical literature).

This dialogue reflects GPT-4 's deep understanding of the undergraduate mathematical concepts discussed, as well as a considerable degree of creativity.

Although GPT-4 wrote 2^n/2 as 2^(n-1) in one answer, this looks more like what we would call a "clerical error", since it later gives the correct general form of the formula.

In addition, the researchers compared the performance of GPT-4, ChatGPT and Minerva on two mathematical datasets that are commonly used as benchmarks: GSM8K and MATH.

GPT-4 outperformed Minerva on both datasets, with accuracy above 80% on each test set.

A closer look at GPT-4's mistakes shows that 68% of them were calculation errors, not errors in the solution approach.

Interacting with the world

Another key aspect of intelligence is interactivity.

Interactivity is important for intelligence because it enables agents to acquire and apply knowledge, solve problems, adapt to changing situations, and achieve goals beyond their own capabilities.

The researchers therefore studied GPT-4's interactivity along two dimensions: tool use and embodied interaction. For example, GPT-4 can call external tools such as search engines or APIs when answering questions like the following.

On interaction with humans, the researchers found that GPT-4 can model the human mind.

The study designed a series of tests to evaluate the theory-of-mind abilities of GPT-4, ChatGPT, and text-davinci-003. For example, on understanding belief, GPT-4 successfully passed the Sally-Anne false-belief test from psychology.

The researchers also tested GPT-4's ability to infer others' emotional states in complex situations:

- Why does Tom look so sad?
- What does Adam think caused Tom's sad expression?

Across several rounds of tests, the researchers found that GPT-4 outperformed ChatGPT and text-davinci-003 when it had to infer others' mental states and propose plans that fit real social scenarios.

The "predicting the next word" paradigm that GPT-4 adopts has obvious limitations: the model lacks planning, working memory, the ability to backtrack, and reasoning ability.

Because the model depends on a local, greedy process of generating the next word, it has no deep understanding of the task or output as a whole. As a result, GPT-4 is good at generating fluent, coherent text, but not at solving complex or creative problems that cannot be handled sequentially.

For example, consider multiplying and adding four random numbers between 0 and 9. On this problem, which even primary school pupils can solve, GPT-4's accuracy is only 58%.

When the numbers are between 10 and 19 or between 20 and 39, accuracy drops to 16% and 12% respectively; in the range 99 to 199, it drops straight to 0.

However, if GPT-4 is allowed to "take its time" answering, accuracy rises easily. For example, ask the model to write out the intermediate steps with the following prompt:

116 * 114 + 178 * 157 =?

Let's think about it step by step, write down all the intermediate steps, and then produce the final solution.

With this prompt, accuracy reaches 100% when the numbers are in the range 1-40, and 90% in the range 1-200.
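The intermediate steps the prompt asks for are just the two products followed by their sum, which can be sketched directly in Python:

```python
# Step-by-step expansion of the example prompt 116 * 114 + 178 * 157,
# mirroring the "write down all the intermediate steps" instruction.
p1 = 116 * 114      # first product: 13224
p2 = 178 * 157      # second product: 27946
total = p1 + p2     # final solution: 41170
```

Writing out the two products before summing mirrors the decomposition that lets the model avoid the single-step arithmetic failure described above.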

Marcus's retort

Interestingly, shortly after Microsoft's paper was published, Gary Marcus wrote a blog post calling Microsoft's view "very ridiculous".

He quoted the Bible: "Pride goes before destruction, a haughty spirit before a fall." (Proverbs 16:18)

How can GPT-4 be regarded as early AGI? By that standard, calculators would count, and Eliza and Siri even more so. The definition is so vague that it is easy to exploit its loopholes.

In Marcus's view, GPT-4 has nothing to do with AGI. As before, its shortcomings remain unresolved: hallucinations persist, unreliable answers are unsolved, and even the authors themselves admit that its ability to plan complex tasks is still poor.

His concern is that OpenAI's and Microsoft's papers describe a model that is not disclosed at all, saying nothing about the training set or architecture, and attempt to claim scientific standing with what amounts to a press release.

So the "some form of AGI" claimed in the paper does not hold, and the scientific community cannot verify it, because the training data is unavailable and appears to have been contaminated.

Worse, OpenAI has begun incorporating user experiments into the training corpus. With the data muddied like this, the scientific community cannot judge one of GPT-4's key capabilities: whether the model can generalize to new test cases.

Had OpenAI not crowned itself with the mantle of science, Marcus might not have criticized it so harshly.

He admits that GPT-4 is powerful, but its risks are also well known. If OpenAI will not be transparent and refuses to disclose the model, it would be better to shut it down outright.

A strong author lineup

Behind this 154-page paper stands a strong Microsoft author lineup.

It includes Sébastien Bubeck, principal researcher at Microsoft Research Redmond and 2015 Sloan Research Fellowship winner; Ronen Eldan, winner of the 2023 New Horizons in Mathematics Prize; Yin Tat Lee, 2020 Sloan Research Fellowship winner; and Yuanzhi Li, 2023 Sloan Research Fellowship winner.

It is worth mentioning that the Microsoft team's original title for the paper was not "Sparks of Artificial General Intelligence: Early experiments with GPT-4".

The leaked LaTeX code in the unabridged paper shows that the original title was "First Contact with AGI".

References:

https://arxiv.org/abs/2303.12712

https://twitter.com/DV2559106965076/status/1638769434763608064

https://the-decoder.com/gpt-4-has-a-trillion-parameters/

https://garymarcus.substack.com/p/the-sparks-of-agi-or-the-end-of-science

This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era).
