2025-03-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)11/24 Report--
In China's "war of a thousand models", which large model is the smartest? The latest evaluation report on large models released by MIT Technology Review China gives an answer.
The report shows that in a blind evaluation of 600 questions across 8 first-level categories, iFLYTEK Spark Cognitive Model V2.0 ranked first in six categories. It topped the list with a score of 81.5 (on a 100-point scale) and won the title of "smartest" domestic model.
Figure: comprehensive score rate of large model evaluation
Figure: radar diagrams of various capabilities of four large models
MIT Technology Review China examines the ability of large models from the dimensions of R&D and commercialization, external attitudes and development trends, and tries to identify the "smartest" domestic model. It selected iFLYTEK Spark, Baidu ERNIE Bot (Wenxin Yiyan), SenseTime SenseChat and Alibaba Tongyi Qianwen as representatives of Chinese large model platforms for a systematic evaluation.
The test set used in this evaluation contains 600 questions spanning 8 first-level categories, 126 second-level categories and 290 third-level labels. The first-level categories are language, mathematics, comprehensive science, comprehensive liberal arts, logical thinking, programming ability, comprehensive knowledge and security. The question set was optimized for richness and diversity.
To balance quantitative and qualitative evaluation, four question types were set: single-choice, multiple-choice, fill-in-the-blank and short answer, with 145, 138, 136 and 181 questions respectively. The evaluation used blind scoring to assess the intelligence of domestic large models objectively.
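As a quick sanity check on the figures above, the following minimal Python sketch confirms that the four question-type counts add up to the 600 questions in the test set and shows each type's share. Only the four counts come from the report; the names `QUESTION_COUNTS` and `type_shares` are illustrative.

```python
# Question counts per type, as quoted in the report.
QUESTION_COUNTS = {
    "single-choice": 145,
    "multiple-choice": 138,
    "fill-in-the-blank": 136,
    "short answer": 181,
}


def type_shares(counts):
    """Return the total question count and each type's share in percent."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, shares


total, shares = type_shares(QUESTION_COUNTS)
print(f"total questions: {total}")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Short-answer questions form the largest group at roughly 30% of the set, while the three objective types each account for between 22% and 25%.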
Language is the basic ability of the "smartest" large model. The language evaluation includes 61 secondary categories, such as dialogue understanding, multilingualism, satire, understanding of ancient poetry, text generation, key-point summarization, sentiment analysis and semantic judgment. The results show that iFLYTEK Spark ranked first with a scoring rate of 85.73%, significantly higher than the average.
Figure: score rate of language special evaluation
Mathematics is an indispensable evaluation dimension for the "smartest" large model. This evaluation includes 9 secondary categories, such as algebra, geometry, solving equations, complex mathematics and statistics, and consists mainly of multiple-choice questions.
iFLYTEK Spark ranked first with a scoring rate of 77.75%, much higher than the average of 56%, while the scoring rates of the other platforms were roughly the same as one another. The report called this a rare achievement at a time when large models are generally "bad at math". The lead also shows at the secondary level: iFLYTEK Spark had the highest scoring rate in 77.8% of the secondary categories, far ahead of the other platforms, and the report judged it to be strong in geometry and situational applications.
Figure: score rate of mathematics special evaluation
Comprehensive science is an indispensable "hard-core" indicator of a large model's intelligence. This evaluation includes five secondary categories: question and answer table, chemistry, biology, physics and medicine, with mainly single-choice and short-answer questions.
iFLYTEK Spark ranked first with a scoring rate of 78.50%. It also had the top scoring rate in 80% of the secondary categories under comprehensive science, standing out most in chemistry and biology.
Figure: score rate of comprehensive evaluation of science
Logical thinking is another important measure of the "smartest" large model. This evaluation places more emphasis on logical reasoning and chain of thought, and includes 19 secondary categories, such as analogy, common-sense reasoning, spatial orientation, deductive reasoning, logical fallacy detection and causal reasoning.
iFLYTEK Spark ranked first with a scoring rate of 81.25%, significantly higher than the average of 72.6%. It also had the highest scoring rate in 63.2% of the secondary categories. Logical thinking is essential for large models to truly understand the physical world.
Figure: logical thinking evaluation score rate
Programming is a higher-order ability of large models. This evaluation includes six secondary categories: ASCII, ASCII code recognition, Python, code, code correction and computer. The Python category mainly evaluates the model's code generation ability and accuracy through short-answer questions, while the others use objective questions.
The results show that iFLYTEK Spark's scoring rate of 80% is significantly higher than the average of 71%, while the scoring rates of the other platforms are roughly the same as one another. It is worth mentioning that iFLYTEK Spark's score reached as high as 82%, far higher than the other platforms.
Figure: comprehensive score rate of programming ability evaluation
Comprehensive knowledge is a more difficult dimension that also demands a high degree of intelligence from a large model. It covers a variety of topics across 13 secondary categories, including encyclopedic Q&A, common sense, scientific knowledge, factual Q&A, work skills and riddles. The questions are mainly multiple-choice.
In the comprehensive knowledge evaluation, iFLYTEK Spark ranked first with a scoring rate of 80.61% and ranked first in 84.6% of the secondary categories, showing particular strength in encyclopedic Q&A and in history and the humanities.
Figure: score rate of comprehensive knowledge evaluation
The report pointed out that in this round of large model evaluation, iFLYTEK Spark took first place with a score of 81.5 points, making it the "smartest" domestic model.
iFLYTEK Spark ranks first in scoring rate in six first-level categories: programming ability, comprehensive science, logical thinking, mathematics, language and comprehensive knowledge. Its performance in this evaluation is broad, with clear advantages in code generation, mathematics, science and logic in particular, which makes it the "smartest science student".
By question type, iFLYTEK Spark ranks first on subjective short-answer questions with a scoring rate of 83.98%, and first on objective questions with a scoring rate of 75.7%, performing well on both.
In addition, on August 12, the "Artificial Intelligence Large Model Experience Report 2.0" released by the China Enterprise Development Research Center of the Xinhua News Agency Research Institute ranked iFLYTEK Spark V1.5 first among mainstream domestic large models with a total score of 1013. It placed first in two of the four evaluation dimensions, the IQ index and the tool efficiency index, and the report stated that iFLYTEK Spark has "obvious advantages in terms of work efficiency".
On August 15, iFLYTEK Spark Cognitive Model V2.0 was released as scheduled, with further advances in code and multimodal capabilities. Alongside the technical progress, a growing number of applications and products are built on iFLYTEK Spark V2.0's core capabilities: iFlyCode1.0, an intelligent coding assistant that helps programmers work efficiently; an educational digital base application development assistant that can easily build light applications; and an educational digital base application development assistant that can create videos. There are also Spark Teacher Assistant, which helps teachers design teaching activities; Spark Teacher Assistant, which generates courseware with one click; and Spark Language Partner 2.0, which offers oral practice for English learners. The iFLYTEK AI Learning Machine has also upgraded its AI 1-to-1 Intelligent Programming Assistant and AI 1-to-1 Creative Painting Partner. In addition, iFLYTEK and Huawei jointly launched the Spark all-in-one machine, giving every company the opportunity to build its own large model.