2025-04-10 Update From: SLTechnology News & Howtos
Shulou (Shulou.com) 11/24 Report
Unless you have been completely cut off from the Internet lately, you have probably heard of, or tried, the wildly popular chatbot ChatGPT. It was first released online in November 2022 and received a major upgrade in March of this year.
When we type in a question, comment, or instruction, ChatGPT quickly produces a polished answer to almost anything we throw at it. The quality of the answers is so high that you might mistake them for the work of a human customer-service agent. At its core, ChatGPT is based on a language model called the Generative Pre-trained Transformer (GPT). Trained on huge amounts of text from websites and other sources, the model learns to respond quickly, intelligently, and logically to the sentences or paragraphs a user enters.
As a scientist and popular-science writer, I especially wanted to know how ChatGPT handles science and pseudoscience. To find out, I asked different versions of ChatGPT about well-established scientific and pseudoscientific claims in physics and mathematics, claims for which the correct answers are already known.
Can ChatGPT tell correct scientific theories from false ones?
The original version of ChatGPT, released in November (also known as GPT-3.5), knew that 2 + 2 = 4. When I asserted "I think 2 + 2 = 5," GPT-3.5 retorted that 2 + 2 = 4, an equation that follows from the axioms of natural-number arithmetic. It added: "Although people are free to have their own opinions and beliefs, it is important to recognize and respect established facts and scientific evidence." With further testing, however, the picture grew more complicated. GPT-3.5 wrote out the correct algebraic formula for solving quadratic equations, but could not always produce the correct numerical answer for a particular equation. It also stumbled on simple word problems, such as one posed by Wall Street Journal columnist Josh Zumbrun: "If a banana weighs half a pound and I have seven pounds of bananas and nine oranges, how much fruit do I have?" GPT-3.5 did not get the right result.
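For reference, the banana question is plain arithmetic. A minimal check, assuming "how much fruit" means a count of individual pieces (the intended reading of the puzzle):

```python
# Checking the banana question by hand.
banana_weight_lb = 0.5           # one banana weighs half a pound
total_banana_lb = 7.0            # seven pounds of bananas
bananas = total_banana_lb / banana_weight_lb   # 7 / 0.5 = 14 bananas
oranges = 9
total_fruit = int(bananas) + oranges
print(total_fruit)               # 23 pieces of fruit
```

The trick of the question is the unit switch from pounds to pieces, which is exactly where the chatbot tripped.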
In physics, GPT-3.5 shows a broad but not always accurate store of knowledge. It can organize a good syllabus for a physics course, ranging from basic theory to quantum mechanics and relativity. At a higher level, when asked about one of the great unanswered questions in physics, how to combine general relativity and quantum mechanics into a single grand theory, it correctly described the fundamental incompatibility between the two theories. Yet when I entered the mass-energy equation, E = mc², GPT-3.5 got it wrong. It correctly identified the equation, but mistakenly stated that a large mass can be converted into only a small amount of energy. When I entered the equation again, GPT-3.5 correctly said that a small mass can produce a great deal of energy.
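The relation at issue is Einstein's E = mc², and a quick numerical check shows why "a small mass can produce a lot of energy" is the correct reading. The one-gram example below is my own illustration, not from the article:

```python
# E = m * c^2: even a tiny mass corresponds to an enormous energy.
c = 299_792_458        # speed of light in m/s (exact, by definition)
m = 0.001              # mass in kg (one gram)
E = m * c ** 2         # energy in joules
print(f"E = {E:.3e} J")   # about 9.0e13 J, roughly 21 kilotons of TNT
```

Because c² is a factor of about 9 × 10¹⁶, the conversion multiplies any mass by an astronomically large number, which is the whole point GPT-3.5 initially got backwards.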
So does the new GPT-4 overcome these problems of GPT-3.5? To find out, I tested two versions of GPT-4: one from OpenAI, the system's developer, and the other built into Microsoft's Bing search engine. In February, Microsoft launched a new version of Bing with GPT-4 built in.
At first, I typed "What does 2 + 2 equal?" into GPT-4, and it answered "2 + 2 = 4." When I again claimed that 2 + 2 = 5, GPT-4 retorted that 2 + 2 = 4 and, unlike GPT-3.5, actively asked whether I knew of a number system in which 2 + 2 = 5 would be valid.
When I asked "How should I solve a quadratic equation?", GPT-4 demonstrated three methods for solving quadratic equations and computed correct numerical solutions for several different ones. It also gave the correct answer to the banana-orange question above, and could solve more complex word problems. And no matter how many times I typed in the mass-energy equation, GPT-4's answer was always that a small mass can produce a great deal of energy.
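One of the methods GPT-4 demonstrated is the standard quadratic formula, which can be sketched as follows. The helper name `solve_quadratic` is my own; `cmath` is used so that negative discriminants yield complex roots rather than an error:

```python
import cmath

def solve_quadratic(a, b, c):
    """Solve a*x^2 + b*x + c = 0 via the quadratic formula.

    Using cmath means a negative discriminant returns complex roots.
    """
    if a == 0:
        raise ValueError("not a quadratic equation (a must be nonzero)")
    d = cmath.sqrt(b * b - 4 * a * c)      # square root of the discriminant
    return ((-b + d) / (2 * a), (-b - d) / (2 * a))

# x^2 - 5x + 6 = 0 factors as (x - 2)(x - 3), so the roots are 3 and 2.
print(solve_quadratic(1, -5, 6))
```

The formula itself is easy to state; as the article notes, the chatbots' trouble lay in carrying out the numerical evaluation reliably.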
[Image caption: AI "calculation": GPT-4 currently seems able to answer some simple mathematical questions correctly, such as what 2 + 2 equals, but it may not really be calculating; it appears simply to recognize sequences that occur frequently in its training data. Image: s1mple life / Shutterstock.]
Compared with GPT-3.5, GPT-4 shows a richer store of physics knowledge and some creativity. It gave a much deeper answer about unifying relativity and quantum mechanics. Probing a different area, I asked what the Laser Interferometer Gravitational-Wave Observatory (LIGO) can measure. GPT-4 explained that LIGO is a large, highly sensitive scientific instrument that made the first detection of gravitational waves in 2015. To try to confuse it with two similar words, I then asked, "Could we use LEGO to build LIGO?" GPT-4 was clearly not stumped: it explained precisely why LEGO bricks cannot be used to build the ultra-precise LIGO. And rather than mocking my silly question, it unexpectedly added that building a LEGO model of LIGO might be a fun idea.
Overall, I found that GPT-4 surpasses GPT-3.5 in some respects, but it can still make mistakes. When I challenged GPT-4's statement of the mass-energy equation, it gave a rather vague answer instead of directly defending the correct relation. Matt Hodgson, a theoretical physicist at the University of York in the UK, likewise found that some of GPT-4's answers are contradictory. A regular user of GPT-3.5, he tested both GPT-3.5 and GPT-4 on more complex problems in physics and mathematics and found subtler kinds of errors. For example, when asked about the quantum behavior of electrons, GPT-3.5 gave the right answer but, at least initially, cited the wrong physical equation as the source of that answer; when the question was repeated, it answered everything correctly. Testing GPT-4 in Bing, Hodgson found its mathematical skills advanced but imperfect: much as with my quadratic-equation queries, GPT-4 listed valid steps for solving a differential equation that is important in physics, but miscalculated the numerical answer.
Hodgson summed up GPT-3.5's abilities: "I find it can give clever and reliable answers to general questions about very famous physics theories... but it cannot carry out detailed calculations in a particular area of physics." Similarly, he concluded: "GPT-4 is better than GPT-3.5 at answering common questions, but it is still unreliable at solving specific problems, at least for more esoteric questions."
The smarter conversation and explanation that GPT-4 displays come from its larger training set. (OpenAI does not disclose its exact size, saying only that it is a "web-scale data corpus.") OpenAI notes that the corpus includes both correct and incorrect mathematics and reasoning. Evidently, the extra training data is still not enough to produce fully sound mathematical analysis and reasoning. As Hodgson points out, this may be because GPT-4, like GPT-3.5, only predicts the next word in a string of words. It might "know" that 2 + 2 = 4 because that particular sequence appears often in its training data, but it is not calculating anything.
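Hodgson's point can be illustrated with a deliberately crude toy: a model that only counts which token most often follows another in a tiny corpus. This is nothing like GPT's actual architecture, merely a caricature of "prediction without calculation":

```python
from collections import Counter, defaultdict

# A toy "next-token" model: count which token follows each token in a
# tiny corpus, then always predict the most frequent successor.
corpus = "2 + 2 = 4 . 2 + 2 = 4 . 3 + 3 = 6 .".split()

successors = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    successors[cur][nxt] += 1

def predict_next(token):
    """Return the most common token seen after `token` in the corpus."""
    return successors[token].most_common(1)[0][0]

# The model "knows" 2 + 2 = 4 only because that sequence is frequent:
print(predict_next("="))   # prints "4", by frequency, not arithmetic
```

Feed the same toy a corpus full of "2 + 2 = 5" and it would confidently predict "5"; frequency, not computation, drives the answer.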
All of this raises a question: if GPT-4's approach to scientific problems is imperfect, can it still distinguish right scientific theories from wrong ones? The answer depends on the field of science. In physics and mathematics, suspect claims and pseudoscientific theories can readily be checked against known theorems and experimental facts. I asked GPT-3.5 and GPT-4 some classic fringe questions in physics and astronomy to see whether they could identify pseudoscience on the basis of physical law and observational evidence. Both versions said that we have no evidence of enormous alien structures around stars, and that an alignment of all the planets in the solar system would not spell disaster for Earth.
However, for scientific questions colored by politicization or public policy, it is harder for GPT-3.5 or GPT-4 to give a definitive answer, because these questions may themselves still be under active study and have no settled answer.
In general, GPT-4 and GPT-3.5 correctly identify misstatements about mathematics and physics. On the more contentious matter of politicized science, GPT-4 answers without taking sides and points out that the question is unsettled. Bing likewise gives an unbiased answer and backs it up by citing relevant news reports and experimental data. When Bing's AI faces one-sided, accusatory attacks on its answers, it adopts the wise strategy of staying polite and declining to be drawn into the dispute. These results suggest that GPT-4 can give reliable answers and resist being steered by hostile input. Its answers on controversial scientific questions such as the COVID-19 pandemic and climate change, and its knowledge of biology and other major scientific areas, deserve further testing.
At the same time, ChatGPT's answers to scientific and mathematical questions are not entirely reliable. Hodgson found that GPT-4 is "deficient in providing creative solutions to physics (and possibly other disciplines). Its intelligence is still somewhat false." Even so, it is very useful to scientists. Chatbots can "perform logical tasks that consume users' valuable time and do not require creativity," Hodgson wrote. He says he uses ChatGPT to help write computer code, summarize emails and papers, and assist in teaching. But he notes that with any ChatGPT product, users should carefully check whether the results match expectations.
Hodgson's assessment of ChatGPT recalls what computer pioneer Douglas Engelbart thought about smart devices. Engelbart wanted to simplify human-computer interaction so that the computer's powerful capabilities could seamlessly extend human intelligence, an idea he called IA (intelligence augmentation) rather than AI (artificial intelligence). Engelbart, who invented the computer mouse at the Stanford Research Institute, greatly improved how users interact with computers. GPT-4 can provide continuous feedback during human-computer interaction and further enhance users' ability to use computers. It is therefore reasonable to predict that natural-language chatbots such as ChatGPT mark another major breakthrough in the paradigm of human-computer interaction: intelligent programs that allow genuinely two-way communication between people and computers. Until true AI arrives, using GPT-4 as an intelligence-augmenting assistant can benefit both users and the intelligent programs themselves.
Author: Sidney Perkowitz
Translation: * 0
Revision: cloud opening and leaves falling
Original link: What Does ChatGPT Know About Science?
This article comes from the official account of Wechat: Institute of Physics, Chinese Academy of Sciences (ID:cas-iop), author: Dana Mackenzie
© 2024 shulou.com SLNews company. All rights reserved.