A large language model (LLM) is an artificial intelligence model trained with machine learning techniques on vast amounts of text so that it can understand and generate natural language. These models can be used for natural language processing tasks such as text classification, text generation, translation, question answering and summarization. In recent years, advances in deep learning have produced striking results from large language models, such as OpenAI's GPT models and Google's BERT. These models appear to have human-like intelligence and creativity, providing detailed, articulate answers to written questions.
For decades, mathematicians have been trying to turn proofs into computer code, a process known as formalization. If you encode a proof and the computer runs the code without errors, you know the proof is correct. But formalizing a single proposition can take hundreds or thousands of hours.
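To make the idea of a proof as checkable code concrete, here is a minimal example written in the Lean proof assistant; the particular theorem is an illustrative choice of ours, not one taken from the work discussed below.

-- A tiny formalized proof in Lean 4: addition of natural numbers is
-- commutative. If Lean accepts this without errors, the proof is correct.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

Writing such code for a trivial fact is easy; doing it for a research-level proof is what takes the hundreds or thousands of hours mentioned above.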
In the past five years, artificial intelligence researchers have begun teaching LLMs to formalize mathematical statements automatically. LLMs can already translate one natural language into another, but going from mathematics to code is a far more daunting challenge.
Although LLMs have achieved great success in areas such as natural language processing, they also have problems:
Data bias: an LLM's performance depends on its training data. If the training data contains biases, the model learns them, which harms its performance.
Biased output: LLMs may absorb prejudices from their training data and reproduce them in the text they generate, which can lead to discriminatory language or misstatements.
Knowledge representation: LLMs do not really understand language or the world; they only learn the patterns that appear in the data. This means they may struggle with new situations.
Model size: LLMs require large amounts of computing resources, storage and training data, which makes training and deployment very expensive.
Context dependence: an LLM's output depends on the context of its input. If the input differs from the training data, the model may produce incorrect output.
Because of these problems, such models sometimes make illogical statements or confidently present falsehoods as truth. "We don't want to create a language model that speaks like a human, we want it to understand what it's talking about," said Yuhuai Wu of Google AI.
Wu is a co-author of two recent papers that propose a way to achieve this goal for one very specific application: training artificial intelligence systems to do mathematics.
The first paper describes how to teach an LLM to convert ordinary mathematical statements into formal code that a computer can run and check. The second trains an LLM not only to understand natural-language math problems but to actually solve them, using a system called Minerva.
Minerva is a system for solving mathematical problems that combines natural language processing with mathematical reasoning: it helps a computer understand a math problem posed in natural language and reach the answer through reasoning and calculation, drawing on natural-language understanding, a model of the problem, mathematical knowledge and a reasoning process.
In short, these papers lay out a blueprint for future artificial intelligence design in which an LLM learns to reason through mathematical thinking.
The researchers mainly used an LLM called Codex (based on GPT-3). To get Codex to understand mathematics well enough to formalize it automatically, they provided just two examples of natural-language math problems paired with their formal-code translations. After this brief priming, Codex was given natural-language statements of nearly 4,000 math problems from high-school competitions. At first Codex's accuracy was slightly under 30 per cent. When it failed, it made up terms to fill gaps in its translation vocabulary.
Before this study, Codex had never tried to translate between natural language and formal mathematical code. But Codex is familiar with code from its training on GitHub, and with natural-language mathematics from the internet. On that basis, the researchers only needed to show it a few examples of what they wanted, and Codex could start connecting the dots.
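The few-shot setup described above is straightforward to picture. The sketch below, in Python, shows one plausible way to assemble such a prompt; the Isabelle-style target statements and the omission of the model call itself are our own simplifications, not the exact prompt used in the paper.

# A minimal sketch of a few-shot autoformalization prompt: two worked
# examples followed by the new problem to translate. The formal statements
# here are illustrative, not taken from the paper.
FEW_SHOT_EXAMPLES = [
    ("Prove that the sum of two even integers is even.",
     'theorem "even a ==> even b ==> even (a + b)"'),
    ("Show that n squared is at least n for every natural number n.",
     'theorem "n * n >= (n::nat)"'),
]

def build_prompt(problem: str) -> str:
    # Concatenate the worked examples, then append the unsolved problem.
    parts = []
    for statement, formal in FEW_SHOT_EXAMPLES:
        parts.append("Natural language: " + statement)
        parts.append("Formal statement: " + formal)
        parts.append("")
    parts.append("Natural language: " + problem)
    parts.append("Formal statement:")
    return "\n".join(parts)

if __name__ == "__main__":
    # The completion request to Codex itself is deliberately left out;
    # this only shows the structure of the prompt the model would receive.
    print(build_prompt("Prove that the product of two odd integers is odd."))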
The researchers are trying to teach LLMs not only how to translate math problems, but also how to solve them.
Minerva mathematics
Although the second paper is independent of the earlier autoformalization work, it has a similar flavor. Google's research team trained an LLM to answer high-school contest-level math questions in detail, such as: a line parallel to y = 4x + 6 passes through the point (5, 10); what is the y-coordinate of the point where this line crosses the y axis?
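For reference, this example can be worked out by hand, and the detailed answers Minerva is trained to give walk through this kind of reasoning step by step: a parallel line has the same slope, 4, so it can be written y = 4x + b; substituting the point (5, 10) gives 10 = 4 * 5 + b, so b = -10, and the line crosses the y axis at y = -10.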
The authors start from an LLM called PaLM, which, like GPT-3, had been trained on general natural-language content. They named the enhanced model Minerva.
The researchers showed Minerva four examples of the kind of answer they wanted, then tested the model on a range of quantitative reasoning problems. Minerva's performance varied from subject to subject: its accuracy was slightly above one half in some subjects, such as algebra, and slightly below one half in others, such as geometry.
One of the authors' concerns is that Minerva might answer questions correctly simply because it had already seen those questions, or similar ones, in its training data. This problem, known as contamination, makes it hard to tell whether a model is really solving a problem or just copying someone else's work.
To guard against this possibility, the researchers had Minerva take Poland's 2022 national math exam, on which it answered 65 per cent of the questions correctly. This suggests the trained model genuinely has some ability to solve mathematical problems.
Bridge
While Minerva's work is impressive, it comes with a serious limitation, which the authors themselves point out: Minerva has no way to automatically verify that it has answered a question correctly. Even when it does answer correctly, it cannot check whether the steps it took are valid.
In other words, Minerva cannot check its own work, which means it has to rely on human feedback to improve. As a result, researchers doubt whether this approach can scale to more complex problems.
Wu points out that, on the one hand, if you work with natural language or Minerva-style reasoning, there is a great deal of data to draw on, essentially all the mathematics on the internet, but you cannot really use it for reinforcement learning. On the other hand, proof assistants such as Isabelle/HOL provide a grounded environment, but there is little data to train on. Some kind of bridge is needed to connect the two.
Autoformalization is that bridge. Better autoformalization can help mathematicians automate parts of writing proofs and of verifying that the work is correct.
By combining the advances of these two papers, a system like Minerva could first autoformalize natural-language math problems, then solve them, and then use a proof assistant to check its work. That real-time check would provide the feedback needed for reinforcement learning, so these programs could learn from their mistakes. In the end they would arrive at a provably correct answer, accompanied by a chain of logical steps, effectively combining the power of LLMs and reinforcement learning.
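One way to picture that loop is the sketch below, in Python. All three helper functions are hypothetical stubs of ours, standing in for an autoformalizer, a solver and a proof assistant; it only shows how a verified answer could become a reward signal.

# A minimal sketch of the proposed loop: formalize, attempt a proof, have a
# proof assistant check it, and turn the result into a reinforcement signal.
# Every function here is a placeholder, not a real Minerva or Isabelle API.

def autoformalize(problem: str) -> str:
    # Translate a natural-language problem into a formal statement (stub).
    return 'theorem "' + problem + '"'

def attempt_proof(formal_statement: str) -> str:
    # Ask the model for a candidate proof of the formal statement (stub).
    return "by auto"

def proof_assistant_accepts(formal_statement: str, proof: str) -> bool:
    # A real system would hand the statement and proof to a proof assistant
    # such as Isabelle/HOL; this stub simply reports failure.
    return False

def training_reward(problem: str) -> int:
    # Reward 1 only when the checked proof goes through, 0 otherwise.
    statement = autoformalize(problem)
    proof = attempt_proof(statement)
    return 1 if proof_assistant_accepts(statement, proof) else 0

if __name__ == "__main__":
    print(training_reward("the sum of two even integers is even"))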
Artificial intelligence researchers have broader goals. They see mathematics as the perfect proving ground for developing AI reasoning skills, because it is arguably the hardest of all reasoning tasks. By this logic, if a machine can reason effectively about mathematics, it should naturally acquire other skills, such as the ability to write computer code or offer a medical diagnosis.
However, there are still jobs that current artificial intelligence cannot replace, such as:
Artistic creation: producing genuinely original and creative works of art requires human creativity and emotional experience.
Psychotherapy: for serious psychological problems, the treatment and support provided by trained human psychologists cannot be replaced.
Manual labor: although robots can already perform some manual work, complex tasks still require human skills.
Social relationships: building and maintaining relationships requires human emotional and social skills.
In short, in many areas, human emotion, judgment and creativity are irreplaceable.
This article comes from the WeChat official account Lao Hu Shuo Science (ID: LaohuSci). Author: I am Lao Hu.