Kai-Fu Lee responds to the controversy over the Yi model being a "shell" of LLaMA: we benefit from open source and contribute back to it.


Shulou (Shulou.com) 11/24 Report --

Thanks to CTOnews.com readers for the tip! According to a CTOnews.com report on November 16, Kai-Fu Lee responded on his WeChat Moments to recent doubts that the Yi model fully reuses the LLaMA framework and merely changes the names of two tensors (Tensor).

"the global big model architecture has gone from GPT2-> Gopher-- > Chinchilla-- > Llama2-- > Yi, and the industry has gradually formed a common standard for big models, just like being a mobile App developer without creating a new infrastructure other than iOS and Android," Lee said. "01.AI starts to benefit from open source and contributes to open source, and we will continue to make progress by learning modestly from the community."

According to previous CTOnews.com reports, Kai-Fu Lee, chairman and CEO of Innovation Works, founded the AI large-model startup 01.AI this year. The company has released two open-source models, Yi-34B and Yi-6B, which it says are fully open to academic research and at the same time free for commercial use.

However, on the Yi-34B open-source page on Hugging Face, developer ehartford pointed out that the model uses Meta's LLaMA architecture and only renames two tensors (Tensor), input_layernorm and post_attention_layernorm.
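
For context, a renaming of this kind is mechanically simple. The sketch below shows how renamed layer-norm keys in a checkpoint could be mapped back to LLaMA's expected names; the renamed keys "ln1" and "ln2" are assumptions for illustration and are not necessarily the names the Yi checkpoint actually used.

```python
# Hypothetical sketch: map renamed layer-norm keys in a checkpoint back to
# LLaMA's expected names (input_layernorm / post_attention_layernorm).
# The "ln1"/"ln2" keys are illustrative assumptions, not the actual Yi names.
def remap_to_llama_names(state_dict: dict) -> dict:
    """Return a copy of state_dict with per-layer norm keys renamed to LLaMA's."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = key.replace(".ln1.", ".input_layernorm.")
        new_key = new_key.replace(".ln2.", ".post_attention_layernorm.")
        remapped[new_key] = value
    return remapped

# Example with dummy keys (values would normally be weight tensors):
renamed = remap_to_llama_names({"model.layers.0.ln1.weight": None,
                                "model.layers.0.ln2.weight": None})
print(list(renamed))
```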

Richard Lin, head of the open-source team, responded that the naming issue was an oversight on the developers' part: the tensors had been renamed several times in the code to meet experimental requirements, but the developers were sorry that they "forgot to change the tensor names back to LLaMA" before release.

It is worth mentioning that Jia Yangqing, former chief AI scientist at Alibaba, had earlier complained that a new domestic model actually used the LLaMA architecture and changed only a few variable names in the code, sparking heated discussion online. After the team behind the accused model responded and explained its training, Jia Yangqing posted a follow-up saying that this kind of "magic modification" in the open-source field is undesirable.

Yesterday, 01.AI officially explained the Yi-34B training process, saying that the key to the sustainable development of large models and to seeking breakthroughs lies not only in the architecture but also in the parameters obtained through training. CTOnews.com attaches the full text of the response:

In 01.AI's observation and analysis, the technical architecture of the large-model community is gradually converging. Essentially, the mainstream international models are based on the Transformer architecture, with changes to attention, activation, normalization, positional embedding, and other components. The architectures of LLaMA, Chinchilla, Gopher, and other models are broadly similar to the GPT architecture; the global open-source community has produced many variations on this mainstream architecture, and the ecosystem is thriving. Most of the open-source models published in China adopt the GPT / LLaMA architecture, which is becoming the industry standard. However, the key to the sustainable development of large models and to seeking breakthroughs lies not only in the architecture but also in the parameters obtained through training.
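
To make that list of components concrete, here is a minimal, hypothetical configuration sketch of the "knobs" a Transformer decoder block typically exposes; the field names and default values are illustrative and are not taken from the Yi codebase.

```python
# Illustrative (hypothetical) decoder-block configuration listing the parts
# that mainstream Transformer models vary: attention, activation,
# normalization, and positional embedding. Values are examples only,
# not the Yi model's actual configuration.
from dataclasses import dataclass

@dataclass
class DecoderBlockConfig:
    attention: str = "grouped_query"    # e.g. "multi_head", "grouped_query", "multi_query"
    activation: str = "swiglu"          # e.g. "gelu", "swiglu"
    normalization: str = "rmsnorm"      # e.g. "layernorm", "rmsnorm"
    norm_placement: str = "pre"         # "pre" (Pre-Norm) or "post" (Post-Norm)
    positional_embedding: str = "rope"  # e.g. "learned", "alibi", "rope"

print(DecoderBlockConfig())
```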

Training a model is like cooking a dish: the architecture only determines the raw materials and the general cooking steps, a point that is gradually being recognized by most people. To train a good model, you also need better "raw materials" (data) and control over the details of each step (training methods and specific parameters). Since large-model technology is still in its early days, the industry consensus, from a technical point of view, is that a model structure consistent with the mainstream is more conducive to overall adaptation and future iteration.

In training its models, 01.AI followed the basic GPT / LLaMA architecture, and the LLaMA community's open-source contributions allowed it to get started quickly. 01.AI trained the Yi-34B and Yi-6B models from scratch, reimplemented the training code to fit its own training framework, and used self-built data pipelines to construct high-quality training datasets (selecting 3T tokens of high-quality data from 3PB of raw data). On the infrastructure side, it carried out joint end-to-end optimization of algorithms, hardware, and software, achieving original breakthroughs such as a doubling of training efficiency and strong fault tolerance. Compared with the basic model structure, this systematic, scientific training work often contributes far greater value.
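
As a rough illustration of what selecting high-quality tokens from raw data involves, the sketch below deduplicates documents and applies simple quality heuristics; the filters and thresholds are hypothetical and are not 01.AI's actual data-engineering pipeline.

```python
# Hypothetical sketch of a raw-text filtering pipeline: deduplicate documents
# and keep only those passing simple quality heuristics. Thresholds and rules
# are illustrative only.
import hashlib

def quality_ok(doc: str, min_chars: int = 200, max_symbol_ratio: float = 0.3) -> bool:
    """Very rough quality heuristic: long enough and mostly alphanumeric text."""
    if len(doc) < min_chars:
        return False
    symbols = sum(1 for c in doc if not (c.isalnum() or c.isspace()))
    return symbols / len(doc) <= max_symbol_ratio

def dedup_and_filter(docs):
    """Yield documents that are both unique (by hash) and pass the quality check."""
    seen = set()
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in seen or not quality_ok(doc):
            continue
        seen.add(digest)
        yield doc

# Usage: clean = list(dedup_and_filter(raw_documents))
```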

In pre-training experiments, the team tried different data mixtures and scientifically selected the optimal mixing scheme, and it invested most of its effort in tuning training methods, data mixtures, data engineering, detailed parameters, and "babysitting" (training-process monitoring) skills. This series of R&D tasks beyond the model architecture, combining research and engineering with cutting-edge breakthroughs, is the real core of model training and forms the moat of know-how in large-model technology. Alongside model training, extensive experiments and comparative verification were carried out on key points of the model structure. For example, the team experimented with grouped-query attention (GQA), multi-head attention (MHA), and vanilla attention and selected GQA; tested Pre-Norm versus Post-Norm at different network widths and depths and selected Pre-Norm; and used RoPE ABF as the positional embedding, among other choices. It was in the course of these experiments and explorations that some of the inference parameters were renamed in order to carry out the comparative experiments.
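
For readers unfamiliar with the trade-off, the following minimal sketch shows the basic idea of grouped-query attention: groups of query heads share a smaller set of key/value heads, and with as many key/value heads as query heads it reduces to standard MHA. It is a self-contained illustration, not code from the Yi model, and omits causal masking for brevity.

```python
# Minimal grouped-query attention (GQA) sketch in PyTorch; dimensions are
# illustrative and causal masking is omitted for brevity.
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, n_kv_heads: int):
    """q: (batch, n_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim).
    Each group of query heads shares one key/value head; with
    n_kv_heads == n_heads this reduces to standard multi-head attention."""
    n_heads = q.shape[1]
    group = n_heads // n_kv_heads
    # Repeat K/V heads so every query head has a matching key/value head.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

# Tiny usage example: 8 query heads sharing 2 key/value heads.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
print(grouped_query_attention(q, k, v, n_kv_heads=2).shape)  # (1, 8, 16, 64)
```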

In the process of 01.AI's first open-source release, we found that the LLaMA architecture, which is widely used in the open-source community, would be more developer-friendly. The oversight of leaving the experimental renaming in part of the inference code using LLaMA stemmed from the intent to fully test the model, not to conceal its source. 01.AI explains this and expresses its sincere apologies. While resubmitting the models and code and supplementing a copy of the LLaMA license, we are committed to completing the version updates in the open-source community as soon as possible.

We are very grateful for the community's feedback. 01.AI has only just set out in the open-source community, and we hope to join hands with everyone to build a thriving community. Following the recent release of the Chat model, we will publish a technical report, and the Yi open-source team will do its best to learn humbly and continue to make progress.


