
All failed! Stanford's 100-page paper ranks the transparency of large models, and GPT-4 comes only third.

2025-03-17 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

In the midst of the "hundred-model war", which large model is the most transparent?

(Transparency here means, for example, information about how a model is built, how it works, and how people use it.)

Now there is finally an answer.

Stanford HAI and other research institutions have released a new study that introduces a purpose-built scoring system:

the Foundation Model Transparency Index.

It scores 10 major overseas models on 100 indicators and gives an overall assessment of how transparent each one is.

The result was a big surprise!

If 60 points is taken as the passing mark, every model evaluated fails: not one of them passes.

Here are the numbers:

Llama 2 ranked first with a score of 54, followed by BLOOMZ with 53.

GPT-4 ranked third with a score of 48, while Amazon's Titan Text was at the bottom with a score of 12.
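As a minimal sketch, the pass-mark check can be reproduced with the four scores quoted above. Only those four scores come from the article; the other six models are left out rather than guessed, and the names `scores` and `PASS_LINE` are illustrative, not part of the index.

```python
# Scores quoted in the article (out of 100). The other six models are omitted
# because the article does not state their scores.
scores = {
    "Llama 2": 54,
    "BLOOMZ": 53,
    "GPT-4": 48,
    "Titan Text": 12,
}

# The article's informal passing mark; an assumption of this sketch,
# not a threshold defined by the index itself.
PASS_LINE = 60

# Sort from highest to lowest score and collect any model that passes.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
passed = [name for name, score in ranked if score >= PASS_LINE]

for name, score in ranked:
    status = "pass" if score >= PASS_LINE else "fail"
    print(f"{name:<12} {score:>3}  {status}")
```

With these inputs the `passed` list stays empty, which matches the article's observation that none of the models clears 60.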

Not only that: on the official Stanford HAI blog, project lead Rishi Bommasani singled out OpenAI by name:

Companies in the foundation model space are becoming less and less transparent.

For example, OpenAI, which has "open" right in its name, has made it clear that most information about GPT-4 will not be made public.

All in all, the team believes that at this stage of large-model development, transparency is a key issue that bears directly on credibility.

At a deeper level, they argue, the results reflect a fundamental lack of transparency across the artificial intelligence industry.

A 100-plus-page paper on model transparency

So where does this ranking come from?

Alongside the results, the team published a paper of more than 100 pages.

As mentioned above, the ranking covers 100 indicators in total.

Broadly, these indicators fall into three categories:

Upstream: the ingredients and processes involved in building a foundation model, such as computing resources and data.

Model: the properties and functions of the foundation model itself, such as architecture, capabilities, and risks.

Downstream: how the foundation model is distributed and used, such as its impact on users, updates, and governing policies.
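The three categories above suggest a simple tally: score each indicator, then sum within each category. The sketch below assumes every indicator is worth either 0 or 1 and that the overall score is the number of indicators satisfied. That scoring rule is an assumption of this sketch, and the six indicator names and values are made up for illustration; they are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical indicators for one imaginary model:
# (category, indicator name, satisfied?). The real index has 100 indicators;
# these six are invented purely to show the aggregation.
indicators = [
    ("upstream", "training data sources disclosed", True),
    ("upstream", "compute usage disclosed", False),
    ("model", "architecture disclosed", True),
    ("model", "known risks described", True),
    ("downstream", "usage policy published", True),
    ("downstream", "user feedback channel documented", False),
]


def score_by_category(rows):
    """Return {category: (satisfied, total)}, counting each indicator as 0 or 1."""
    tally = defaultdict(lambda: [0, 0])
    for category, _name, satisfied in rows:
        tally[category][0] += int(satisfied)
        tally[category][1] += 1
    return {category: tuple(counts) for category, counts in tally.items()}


category_scores = score_by_category(indicators)
total = sum(satisfied for satisfied, _ in category_scores.values())

for category, (satisfied, out_of) in category_scores.items():
    print(f"{category:<10} {satisfied}/{out_of}")
print(f"overall    {total}/{len(indicators)}")
```

Keeping the per-category counts alongside the total makes it easy to see where a model gains or loses points, which is the kind of comparison the article draws between upstream, model, and downstream scores.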

The paper breaks down each of the 10 models' scores across these three categories.

The widest gaps appear in the upstream scores. BLOOMZ, for example, earns a comparatively large share of its overall score from upstream indicators.

By contrast, Jurassic-2, Inflection-1 and Titan Text each score zero on the upstream indicators.

Treating upstream, model, and downstream as three top-level domains, the team further divides the indicators into 13 finer-grained subdomains:

Data, Labor, Compute

Methods, Model Basics, Model Access, Capabilities

Risks, Mitigations, Distribution, Usage Policy, Feedback, Impact

The paper reports detailed scores for each of the 13 subdomains, along with the complete list of 100 indicators.

The open source versus closed source debate, one of the hottest topics in the large-model field, also features in this research.

The team classified widely downloadable models as open source. Three of the 10 models evaluated fall into this group: Llama 2, BLOOMZ, and Stable Diffusion 2.

The ranking shows the open source models generally well ahead: GPT-4 is the only closed-source model to outscore any of them, beating Stable Diffusion 2 by 1 point.

The researchers offered an explanation:

This difference is largely due to closed-source developers' lack of transparency on upstream matters, such as the data, labor, and compute used to build the model.
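The open versus closed comparison can be checked with a few lines of arithmetic. This sketch uses only figures from the article; the Stable Diffusion 2 score is not stated directly and is inferred here from the remark that GPT-4 scores 1 point higher. Variable names are illustrative.

```python
# GPT-4's score as stated in the article.
gpt4 = 48

# Open models named in the article. Stable Diffusion 2's score is inferred
# from "GPT-4 scores 1 point higher than Stable Diffusion 2".
open_models = {
    "Llama 2": 54,
    "BLOOMZ": 53,
    "Stable Diffusion 2": gpt4 - 1,
}

# Find the lowest-scoring open model and how far GPT-4 is ahead of it.
lowest_open = min(open_models, key=open_models.get)
gap = gpt4 - open_models[lowest_open]

# Average score of the three open models.
open_mean = sum(open_models.values()) / len(open_models)

print(f"Lowest open model: {lowest_open} ({open_models[lowest_open]})")
print(f"GPT-4 lead over it: {gap} point(s)")
print(f"Mean open score: {open_mean:.2f}")
```

The closed-source average is not computed because the article gives scores for only two of the seven closed models.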

For more details on the transparency ranking of the model, please refer to the paper at the end of the article.

Why is transparency important?

Stanford HAI also addresses this question on its official blog.

In the view of project lead Rishi Bommasani, for example:

Lack of transparency has long been a problem for consumers of digital technology.

The internet today has many such problems, including deceptive advertising and pricing, and tricking users into online purchases without their knowledge.

Shayne Longpre, a PhD candidate at MIT, believes that as large models grow more popular and are rapidly deployed across industries, scientists need to understand how they are designed, especially on the upstream side.

The same holds for industry: decision makers facing questions such as which large model to use, and how to use it, need to base their choices on model transparency.

So what do you think of this transparency ranking for large models? Feel free to share your thoughts in the comments.

Paper address:

https://crfm.stanford.edu/fmti/fmti.pdf

Reference links:

[1] https://hai.stanford.edu/news/introducing-foundation-model-transparency-index

[2] https://github.com/stanford-crfm/fmti

[3] https://www.theverge.com/2023/10/18/23922973/stanford-ai-foundation-model-transparency-index
