
Xueersi joins hands with Google and others to launch a global math problem-solving competition for large models, exploring the uncharted territory of AI mathematical reasoning

Updated 2025-01-17. Source: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

Mathematics has long been regarded as a touchstone for artificial intelligence. Once large language models overcome their "congenital defects", such as weak complex reasoning and inaccurate numerical calculation, and meet the challenge of mathematical reasoning, artificial intelligence will enter a new era. How to improve the mathematical reasoning of large language models and overcome these inherent shortcomings has become a focus of the global AI community.

Exploring the uncharted territory of AI mathematical reasoning

Recently, Xueersi took the lead in organizing the AAAI 2024 global mathematical reasoning competition for large models, together with experts and scholars from well-known technology companies and universities, including Google and Jinan University, and with the support of the National New Generation Artificial Intelligence Open Innovation Platform for Smart Education. The competition invites AI experts, developers and enthusiasts worldwide to use large models to automatically solve primary and secondary school math problems. The aim is to explore and address the challenges AI faces in mathematics.

AAAI (Association for the Advancement of Artificial Intelligence), founded by computer science and artificial intelligence scientists Allen Newell, Marvin Minsky and John McCarthy, is one of the most authoritative and important international associations in the field of artificial intelligence. The AAAI conference is recommended as a Class A conference by the China Computer Federation (CCF).

During the competition, contestants must use a large model to generate reasoning steps and a final answer for each given math problem. The organizers rank contestants by accuracy, comparing each model's output answers against the correct answers. The contestant with the highest accuracy wins.
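The ranking rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say how answers are compared, so the `normalize` helper (which treats "0.5" and "1/2" as equal) and the function names `accuracy` and `rank` are hypothetical, not the organizers' actual scoring code.

```python
from fractions import Fraction


def normalize(answer: str) -> str:
    """Canonicalize an answer so equivalent numeric forms compare equal.

    Assumption: numeric answers are matched by value, everything else by
    case-insensitive text with spaces removed.
    """
    text = answer.strip().replace(" ", "")
    try:
        return str(Fraction(text))
    except (ValueError, ZeroDivisionError):
        return text.lower()


def accuracy(predictions: dict[str, str], references: dict[str, str]) -> float:
    """Share of reference questions answered correctly; missing answers count as wrong."""
    correct = sum(
        normalize(predictions.get(qid, "")) == normalize(ref)
        for qid, ref in references.items()
    )
    return correct / len(references)


def rank(submissions: dict[str, dict[str, str]],
         references: dict[str, str]) -> list[tuple[str, float]]:
    """Return (contestant, accuracy) pairs sorted from highest to lowest accuracy."""
    scores = [(name, accuracy(preds, references)) for name, preds in submissions.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Only the final answers are scored here; the generated reasoning steps are not evaluated in this sketch.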

To fully explore the mathematical reasoning ability of different large models, the competition is divided into two tracks: Chinese math problem solving and English math problem solving. Xueersi provides one dataset for each language, TAL-SAQ7K-CN and TAL-SAQ6K-EN. The datasets consist of real problems from primary and secondary school mathematics competitions in China and abroad. The problems have been carefully formatted, and each one includes the problem text, a difficulty level, and a chain of knowledge points running from coarse-grained to fine-grained. All mathematical expressions in both datasets have been converted to a unified LaTeX text format.
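To make the described record layout concrete, here is a minimal sketch of how one problem might be represented. The field names (`content`, `difficulty`, `knowledge_chain`) and the integer difficulty scale are assumptions for illustration; the article does not give the actual schema of TAL-SAQ7K-CN or TAL-SAQ6K-EN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    # All field names below are hypothetical, not the datasets' real keys.
    content: str                      # problem text, with math written in LaTeX
    difficulty: int                   # difficulty level (scale is an assumption)
    knowledge_chain: tuple[str, ...]  # knowledge points, coarse-grained first


def group_by_top_level_topic(problems: list[Problem]) -> dict[str, list[Problem]]:
    """Group problems by the coarsest knowledge point in their chain."""
    groups: dict[str, list[Problem]] = {}
    for problem in problems:
        groups.setdefault(problem.knowledge_chain[0], []).append(problem)
    return groups
```

Keeping the chain ordered from coarse to fine means a per-topic error analysis can be done at any granularity by truncating the tuple.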

The competition has two stages. The first, the public leaderboard stage, runs from now until December 31. The organizers randomly selected 30% of the data from TAL-SAQ7K-CN and TAL-SAQ6K-EN in advance for contestants to tune their models. The second, the private leaderboard stage, runs from January 1 to January 10, 2024. During this stage, contestants use the model tuned in the first stage to solve the remaining 70% of the problems in the datasets. The results of this stage are the final results of the competition.
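A random 30%/70% split of this kind can be sketched as follows. The function name, the fixed seed and the use of question IDs are assumptions for illustration; the article does not describe how the organizers actually drew the sample.

```python
import random


def split_public_private(question_ids, public_fraction=0.3, seed=0):
    """Randomly split question IDs into a public set and a private set.

    A fixed seed makes the split reproducible, so the two sets never overlap
    between runs.
    """
    ids = list(question_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    cut = round(len(ids) * public_fraction)
    return ids[:cut], ids[cut:]
```

For example, `split_public_private(range(1000))` yields 300 public and 700 private IDs with no question in both sets.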

In addition, the organizers provide three baselines as a reference: the public leaderboard performance of GPT-3.5, GPT-4 and MathGPT, a math model developed in-house by TAL Education Group. The results are as follows:

Track 1: [results figure not reproduced]

Track 2: [results figure not reproduced]

Doing the foundational work in mathematics for the era of large AI models

Large models have been one of the hottest areas of AI development in recent years, and the emergence of ChatGPT has shown more people where artificial intelligence is heading. However, existing large language models still have obvious weaknesses in solving, explaining, answering and recommending math problems: they frequently make mistakes when solving problems and struggle with complex calculations.

As the initiator of this global math competition for large models, Xueersi said it hopes the competition will help explore and address a known weakness of existing models: they are good at the humanities but weak at scientific reasoning and calculation. Xueersi is also actively exploring solutions of its own. Its MathGPT (official website: https://www.mathgpt.com/) combines a large model with a computing engine to tackle three major challenges for large models in mathematics: solving problems correctly, explaining the steps clearly, and making explanations engaging. The large model is responsible for understanding the problem, breaking it down step by step, and calling the computing engine at the appropriate steps, which improves accuracy. Training the model on a large volume of problem-solving data from renowned teachers makes its solution steps clearer. Incorporating the teaching ideas and methods of excellent teachers further helps the model make its explanations more engaging.
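The idea of a language model handing arithmetic to a computing engine can be illustrated with a small sketch. Everything here is an assumption made for the example: the `<<...>>` marker convention, the function names, and the use of Python's `ast` module with exact fractions. It is not a description of how MathGPT is actually implemented.

```python
import ast
import operator
import re
from fractions import Fraction

# Operators the toy engine supports; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression: str) -> Fraction:
    """Evaluate + - * / arithmetic exactly, without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            # Go through str() so 0.1 becomes exactly 1/10, not a binary float.
            return Fraction(str(node.value))
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression.strip(), mode="eval"))


# Hypothetical convention: the model writes <<expr>> where it wants a computed value.
_CALL = re.compile(r"<<(.+?)>>")


def fill_calculations(model_text: str) -> str:
    """Replace each <<expr>> marker with 'expr = exact result'."""

    def run(match: re.Match) -> str:
        expr = match.group(1).strip()
        return f"{expr} = {evaluate(expr)}"

    return _CALL.sub(run, model_text)
```

In this sketch the model only decides what to compute; the engine does the computing, so a step such as `<<0.1+0.2>>` comes back as the exact value 3/10 rather than a rounded float.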

Taking one problem as an example, the answer given by MathGPT consists of three parts, "analysis", "detailed solution" and "key takeaways", which is more thorough than the cursory explanation of a general-purpose model. "Analysis" lays out the approach to the problem and helps users understand it better. "Detailed solution" gives the specific calculation and the answer. "Key takeaways" highlights the points being tested, the difficulties and the key points of the problem, helping users review it, reflect on what it was designed to test, and apply the same idea to similar problems.

As the first large model for mathematics in China, MathGPT's mathematical ability covers primary school, junior high and senior high school, and the question types it handles include calculation problems, word problems and algebra problems. According to the relevant technical report, across six public math evaluation sets (CEval-Math, AGIEval-Math, APE5K, CMMLU-Math, Gaokao (college entrance examination) Mathematics and Math401), MathGPT achieved the highest score on several of them. It has also performed well on C-Eval's general junior and senior high school test sets.

In addition, Xueersi has released MathGPT's training and test datasets, TAL-SCQ5K-EN and TAL-SCQ5K-CN (each with a 3K training set and a 2K test set), on GitHub, Hugging Face and other technical communities. The questions are single-choice and cover primary, junior high and senior high school mathematics, with detailed solution steps to facilitate chain-of-thought (CoT) training.

As the organization building the National New Generation Artificial Intelligence Open Innovation Platform for Smart Education, Xueersi has been actively promoting the development and progress of AI technology in China. With the arrival of the large-model era, Xueersi hopes to draw on its years of experience in mathematics and AI to serve math enthusiasts and research institutions around the world and to do the foundational work in mathematics for the era of large AI models.
