A hundred-day battle, from chrysalis to butterfly! iFLYTEK Spark model rated the "smartest" large model in China

2025-03-26 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 11/24 report:

"From its release on May 6 to today marks exactly 100 days of battle for our cognitive large model," said Liu Qingfeng, chairman of iFLYTEK, at the launch of the iFLYTEK Spark Cognitive Large Model V2.0 on August 15. After the release of Spark V2.0, the world's attention once again turned to a new round of large-model competition.

Recently, MIT Technology Review China carried out an in-depth evaluation of four mainstream Chinese large models: iFLYTEK Spark, Baidu Wenxin Yiyan (ERNIE Bot), SenseTime SenseChat and Alibaba Tongyi Qianwen. iFLYTEK Spark topped the list with the highest total score.

MIT Technology Review is a media platform wholly owned by the world-renowned MIT. It carries strong authority in the industry and is regarded as a leading voice in academic circles. This evaluation was scored on a percentage basis, with a 60% score rate treated as the "pass line". Two of the large models only just cleared the pass line; Baidu Wenxin Yiyan scored 75.2%, and iFLYTEK Spark scored the highest, at 81.5%. The average score rate of the four large models was 72.6%, so iFLYTEK Spark on its own "pulled up" the average level of the Chinese models.

△ First-level classification test results of the four large models (partial)

Using 600 questions designed by MIT Technology Review, this side-by-side evaluation focused on eight first-level capability categories: language, mathematics, science, liberal arts, logic, programming, comprehensive knowledge and security. It also covered 126 second-level categories and 290 third-level tags. Through an exam-style test with four question types (single choice, multiple choice, fill in the blank and short answer), it set out to identify the "smartest" Chinese large model.

Since these are Chinese-language large models, MIT Technology Review placed a Chinese-specific test first. The task was to generate a questionnaire on the first jobs and starting salaries of graduates from different colleges and universities. SenseTime SenseChat and Tongyi Qianwen both gave "stacked" answers; only iFLYTEK Spark and Wenxin Yiyan produced a clear, well-organized questionnaire structure, and the two were neck and neck with a score of 0.63%.

In the subsequent tests, including mathematics and logical reasoning, iFLYTEK Spark switched into "crazy mode". MIT Technology Review opened with a math problem that required solving an inequality. Only iFLYTEK gave both the solution logic and the correct answer, and its score rate of 77.54% was well above the average of about 56%, a margin of 21.75 percentage points. In the logical reasoning test, iFLYTEK answered the convoluted "Pond and Pot" question perfectly, showing its strengths in spatial orientation, deductive reasoning and logical error detection, and scored 81.2%, above the average of 72.6%.

When testing reached the programming stage, things really got interesting. With Spark V2.0, released on August 15, iFLYTEK made a breakthrough in coding ability, because code ability is a key dimension underpinning the "emergence of intelligence" in a cognitive model and bears directly on how "intelligent" it is. The question set by MIT Technology Review asked for Python code completing the following: def assertBbs(num: int, pow: int): "implement a function to quickly calculate the power". To people who do not understand programming, this might as well be gibberish, but the four large models each answered it differently.

Looking directly at the results, iFLYTEK Spark not only generated correct code but also gave a detailed analysis: "this function needs to perform fast power calculation, that is, to calculate x to the power of y". In the end, iFLYTEK scored 80%, compared with an average of 71%, and MIT Technology Review described iFLYTEK's coding ability as "quite eye-catching". Spark also scored 80.61% in the comprehensive knowledge test, well above the average of 71.6%. This is not surprising: the comprehensive knowledge test draws on many of the abilities above, so a model that leads in the individual categories faces little suspense in the combined test.
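The report does not reproduce Spark's actual answer, so what follows is only a minimal sketch of what a "fast power" function usually means: exponentiation by squaring, which walks the bits of the exponent and needs about log2(y) squarings instead of y - 1 multiplications. The name fast_power and its parameters are hypothetical, chosen instead of the prompt's assertBbs(num, pow) partly because pow shadows a Python built-in, and the sketch assumes a non-negative integer exponent.

```python
def fast_power(base: int, exponent: int) -> int:
    """Hypothetical helper: compute base ** exponent by binary exponentiation.

    Assumes a non-negative integer exponent (an assumption, not part of the
    benchmark prompt).
    """
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1
    while exponent > 0:
        # If the lowest bit of the exponent is set, fold the current base in.
        if exponent & 1:
            result *= base
        # Square the base and move on to the next bit of the exponent.
        base *= base
        exponent >>= 1
    return result


print(fast_power(3, 13))  # 1594323
```

For 3 to the 13th power, the exponent 13 is 1101 in binary, so the loop multiplies together 3, 3^4 and 3^8 rather than performing twelve separate multiplications.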

△ Overall score rates of the four large models

In its final conclusion, MIT Technology Review placed iFLYTEK first in the side-by-side evaluation with a score of 81.5 points, naming it the "smartest" Chinese large model and ranking it in the first echelon for overall strength. In 2023, as Chinese artificial intelligence research takes an increasingly important place on the world map, the collective flourishing of Chinese large models signals the arrival of a great "age of exploration" for AI. Leading Chinese large models, represented by iFLYTEK Spark, are reaching deep into the upstream and downstream of the industrial chain to build a shared ecosystem, and have become a "beacon" lighting the way forward in the large-model era.

© 2024 shulou.com SLNews company. All rights reserved.