LLaMA model leaked: Meta's ChatGPT rival forcibly "open-sourced", the GitHub repo hits 8k stars, and detailed reviews roll in.


The model is "open source", but not open enough? No problem: netizens will open it up for you.

The battle over ChatGPT-style models is heating up.

A few weeks ago, Meta released its own large language model, LLaMA, in sizes ranging from 7 billion to 65 billion parameters.

According to the paper, the 13-billion-parameter LLaMA surpasses GPT-3 on most benchmarks while using only about one tenth as many parameters.

The 65-billion-parameter LLaMA, meanwhile, is comparable to DeepMind's Chinchilla (70 billion parameters) and Google's PaLM (540 billion parameters).

Although Meta bills LLaMA as open, researchers still have to apply for access and pass a review.

Unexpectedly, just a few days after the release, LLaMA's model files were leaked.

So, the question is: was the leak intentional or accidental?

LLaMA forced into "open source"? Recently, LLaMA's trained model weights were leaked on the forum 4chan.

Last Thursday, a user called llamanon posted on 4chan's technology board, sharing the 7B and 65B LLaMA models via a torrent file.

That torrent link has since been merged into LLaMA's GitHub page.

He also submitted a second pull request to the project, providing a torrent link to another set of weights for the model.

At present, the project has gained 8k stars on GitHub.

However, the leaker made one big mistake: the leaked model files include their unique identifier code.

This code exists precisely to trace leakers, and it puts user llamanon's personal information at risk.

As the saying goes: LLaMA's open-sourcing wasn't quite open enough, so netizens opened it up the rest of the way.

In addition, 4chan users have put together a handy resource for anyone who wants to deploy the model on their own workstation.

It includes a step-by-step tutorial on how to obtain the model and how to apply modified weights to it for more efficient inference.

More importantly, the resource even provides a way to integrate LLaMA into the online writing platform KoboldAI.

Whether Meta leaked the model intentionally or by accident, netizens have been quick to weigh in.

One netizen speculated: "Maybe Meta leaked it deliberately, to counter OpenAI."

Another view is that this is a better model than the one at the heart of OpenAI's business plan of selling access for $250,000 a year; a single month of that service would buy a machine capable of running the leaked model. By leaking it, Meta weakens a potential upstart competitor and keeps the current big-tech cartel stable. Maybe that is a bit of a conspiracy theory, but we live in an age of big tech and big conspiracies.

On Monday, Meta said that although LLaMA had been leaked to unauthorized users, it would continue to release its artificial intelligence tools to approved researchers.

Some netizens said outright that they had already downloaded the 7-billion-parameter LLaMA, even though they had no idea how to run it, just in case it came in handy later.

To some, the leak and de facto open-sourcing of LLaMA is a big deal:

Stable Diffusion was open-sourced; eight months later, we can already read people's minds and decode everything they see.

With LLMs opened up like this, we're going to get something really crazy.

Preliminary evaluations: shortly after the model's release, netizens found that even the smallest version needed close to 30 GB of GPU memory to run.

However, with quantization via the bitsandbytes library, they managed to get the model running on a single NVIDIA RTX 3060.
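How the 4chan guide does this exactly is not reproduced here, but as a rough illustration of the idea, here is a minimal sketch of loading a LLaMA checkpoint in 8-bit through the Hugging Face transformers API with bitsandbytes; the local path, and the assumption that the weights have already been converted to that format, are placeholders rather than part of any official release.

```python
# Minimal sketch: load a converted LLaMA checkpoint in 8-bit so it fits a 12 GB GPU.
# Assumes the transformers, accelerate and bitsandbytes packages are installed, a CUDA
# GPU is available, and the leaked weights were converted to a Hugging Face-style
# checkpoint beforehand (the path below is a hypothetical placeholder).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "./llama-7b-hf"  # hypothetical local path to converted weights

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",   # let accelerate place layers on the available GPU
    load_in_8bit=True,   # bitsandbytes int8 quantization roughly halves memory vs fp16
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```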

In addition, a researcher on GitHub even managed to run the 7B version on a Ryzen 7900X CPU, generating several words per second.

So how does the LLaMA model actually perform? A blogger abroad put it to the test.

LLaMA performed well in many tests.

On massive multitask language understanding (MMLU), even the relatively small 13B model is comparable to GPT-3, a model 13 times its size.

The 33B version is much better than GPT-3, and the 65B version can compete with Google's 540B-parameter PaLM, the most powerful LLM available.

On text that requires logic or computation, LLaMA also performs well: it is comparable to PaLM on quantitative reasoning and even better on code generation.

Given these results, LLaMA looks like one of the most advanced models available, and it is small enough to run without many resources. That makes it very tempting to play with and see what it can do.

Interpreting jokes: the original PaLM paper showed off a very cool use case. Given a joke, ask the model to explain why it is funny. The task requires a mix of world knowledge and logic, and no model before PaLM could do it.
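The task itself is just a prompt template. As a rough illustration (not the exact prompt used in the PaLM paper or in this evaluation), a joke-explanation query might be wrapped like this, where generate_text is a stand-in for whatever prompt-to-completion function the local model exposes:

```python
# Illustrative sketch of an "explain the joke" prompt. `generate_text` is a
# hypothetical stand-in for any prompt -> completion callable (for example, a thin
# wrapper around model.generate from the loading sketch above).
def explain_joke(joke: str, generate_text) -> str:
    prompt = (
        "Explain why the following joke is funny.\n\n"
        f"Joke: {joke}\n"
        "Explanation:"
    )
    return generate_text(prompt)
```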

Feeding some of these jokes to LLaMA and ChatGPT for explanation, there are a few that the language models do get, such as the one about Schmidhuber's tedious speeches.

But in general, neither LLaMA nor ChatGPT has a sense of humor.

But their strategies for handling jokes they don't understand differ. ChatGPT produces a "wall of text" in the hope that at least some of the sentences are correct, like a student who doesn't know the answer and hopes the teacher will find it somewhere in their rambling.

Zero-shot classification is a very practical capability: it lets people use an LLM instead of human raters to build training sets, on which smaller, deployable models can then be trained.

A more challenging task is classifying clickbait. Since even humans cannot agree on what counts as clickbait, a few examples are provided to the model in the prompt, so this is really few-shot rather than zero-shot classification; the sketch below illustrates the kind of prompt involved.
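This is only an illustration of what such a few-shot clickbait-classification prompt can look like, with invented example headlines; it is not the evaluation's actual prompt.

```python
# Illustrative few-shot clickbait-classification prompt; the example headlines and
# labels are invented for demonstration and are not the evaluation's actual prompt.
FEW_SHOT_PROMPT = """Decide whether each headline is clickbait. Answer only "yes" or "no".

Headline: 10 secrets doctors don't want you to know
Clickbait: yes

Headline: Parliament passes the 2023 budget after a late-night vote
Clickbait: no

Headline: {headline}
Clickbait:"""

def classify_headline(headline: str, generate_text) -> str:
    # `generate_text` is again a hypothetical prompt -> completion callable.
    completion = generate_text(FEW_SHOT_PROMPT.format(headline=headline))
    words = completion.strip().split()
    return words[0].lower() if words else ""  # "yes" or "no" if the model follows the format
```

Pinning the answer down to a single word is the point: the comparison below is largely about whether a model actually sticks to the requested format.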

In this test, only LLaMA-33B tried to answer in the required format, and its predictions were reasonable. ChatGPT came second: it gave fairly reasonable answers but often ignored the prescribed format, while the smaller 7B and 13B models were not well suited to the task.

Code generation: LLMs are good at the humanities but not as good at STEM subjects, so how does LLaMA fare here?

In the prompt, the model is given the schema of the table to query and the desired purpose, and is asked to produce the SQL query.
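The evaluation's actual tables and questions are not reproduced here; the following is only an illustrative sketch of that kind of text-to-SQL prompt, with an invented schema and request.

```python
# Illustrative text-to-SQL prompt: give the model a table schema and a natural-language
# request, then ask it for a single SQL query. The schema and question are invented.
SQL_PROMPT = """Given the table:

users(id INTEGER, name TEXT, country TEXT, signup_date DATE)

Write a single SQL query that returns the number of users who signed up in 2022,
grouped by country and sorted from most to fewest signups.

SQL:"""

# Any prompt -> completion callable works here, e.g. the generate call from the
# loading sketch above:
# print(generate_text(SQL_PROMPT))

# For this made-up schema, a correct completion would look like:
#   SELECT country, COUNT(*) AS signups
#   FROM users
#   WHERE signup_date BETWEEN '2022-01-01' AND '2022-12-31'
#   GROUP BY country
#   ORDER BY signups DESC;
```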

ChatGPT performs a little better on this task, but the results that language models give here are generally unreliable.

Across these head-to-head tests with ChatGPT, LLaMA did not win as convincingly as one might have expected. Of course, if the gap comes only from RLHF (reinforcement learning from human feedback), then the future of small models may be brighter.

References:

https://www.reddit.com/r/MachineLearning/comments/11h3p2x/d_facebooks_llama_leaks_via_torrent_file_in_p

https://medium.com/@enryu9000/mini-post-first-look-at-llama-4403517d41a1

This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era).
