Large models are entering their Android era, and hopefully a domestic model can replace LLaMA.
So says Fudan professor Qiu Xipeng as the "war of a thousand models" gets underway.
As the leader of the MOSS team, which launched China's first ChatGPT-like model, he has watched large models develop at home and abroad for more than half a year and come away with new insights.
At the first International Conference on Artificial Intelligence Generated Content (AIGC 2023), organized by Sibaicheng Technology, he acknowledged that although large models are widely regarded as an engineering exercise, many scientific challenges remain unsolved, such as the design of training objectives, memory optimization, automated evaluation, the democratization of large models, and new architectures.
In the six months since MOSS was released, his team has made solid progress: SpeechGPT; the LOMO optimizer, which can fine-tune a 65-billion-parameter model on a single machine; and MOSS's Chinese-language ability now surpassing ChatGPT's.
Still, Qiu Xipeng revealed that he is in no hurry to commercialize MOSS, preferring to keep exploring new architectures and to further improve large models' coding and mathematical abilities.
In a conversation with Qubits, he discussed the LLaMA open-source ecosystem, the domestic competitive landscape, the industrial deployment of large models, and phenomena such as model hallucination and leaderboard gaming.
Without changing the original meaning, Qubits has organized the highlights as follows:
Large models are entering their Android era; of the many open-source large models, one may eventually win out, and right now that is LLaMA.
Domestic large models are seriously homogenized; hopefully a domestic model can replace LLaMA.
Many large models skip RLHF, which is not necessary in some technical domains.
Hallucination in large models is not necessarily a bad thing; eliminating it often degrades performance.
The startup window is getting bigger: everyone can build what they want with the help of large models.
On the LLaMA open-source ecosystem
Qubits: Are large models now having their Android moment?
Qiu Xipeng: Yes. On the whole it is an open-source ecosystem dominated by LLaMA. There is still a gap with GPT-4: many complex applications can only be done with GPT-4, and LLaMA needs further improvement.
Qubits: Where exactly does it need to improve?
Qiu Xipeng: Mainly the base model.
Qubits: How do you view LLaMA's impact on the market landscape?
Qiu Xipeng: There will be many open-source models at the beginning, and one may win out in the end. At the moment, LLaMA looks like that winner.
Qubits: Why?
Qiu Xipeng: First, its performance is good enough; second, a substantial upstream and downstream ecosystem has grown around it. When you put forward a new model, you have to consider those upstream and downstream dependencies. It is not impossible for another large model to replace LLaMA in the future, but the cost would be very, very high, equivalent to breaking an ecological chain.
On the domestic competitive landscape
Qubits: Will there be a second LLaMA in China?
Qiu Xipeng: Basically, everyone in China does their own thing. Without a significant difference or performance improvement, it is hard to establish a comparable ecosystem. We hope a domestic model can replace LLaMA; otherwise it may limit some of our future development.
Qubits: Thousands of models are fighting now. Will a few large models win out in the end?
Qiu Xipeng: Definitely. But the field is quite homogeneous at the moment, which makes it hard to build user stickiness; ultimately it will come down to performance.
On the industrial deployment of large models
Qubits: Many people say large models are at the "last kilometer" of industrial deployment. What do you think?
Qiu Xipeng: Whether it is the last kilometer is uncertain, but they will certainly accelerate industrial deployment. Large models genuinely change the old AI paradigm: previously, building a product might require a lot of manpower for data labeling, which was itself a large market. Now large models don't need much labeled data, which lowers the overall technical and application threshold dramatically. The downside is that they demand far more compute.
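To make the labeled-data point concrete, here is a minimal sketch, in Python, of replacing a supervised classifier with a prompted large model: the task specification lives in the prompt rather than in thousands of labeled examples. The generate function is a deliberate placeholder for whatever completion API or local model is used; it is an assumption for illustration, not any specific product's interface.

    # Zero-shot sentiment classification by prompting, where a classical
    # pipeline would first need a large labeled training set.

    def build_prompt(review: str) -> str:
        # The task is specified in the prompt instead of in labeled data.
        return (
            "Classify the sentiment of the following product review as "
            "Positive or Negative. Answer with a single word.\n\n"
            f"Review: {review}\nSentiment:"
        )

    def generate(prompt: str) -> str:
        # Placeholder (hypothetical): call any chat/completion model here.
        raise NotImplementedError("plug in your model of choice")

    def classify(review: str) -> str:
        answer = generate(build_prompt(review)).strip().lower()
        return "positive" if answer.startswith("pos") else "negative"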
Qubits: Is the start-up window bigger?
Qiu Xipeng: Yes. Applications are moving toward more end users, and everyone can build what they want with the help of a large model.
Qubits: SFT and RLHF have not yet settled into a good paradigm. When can they reach industrial-application maturity?
Qiu Xipeng: There is now a complete technical path, plus many tools that help apply large models in vertical industries. That established path keeps the barrier to entry very low, so I think the technology is already fairly mature.
Qubits: How do you balance vertical-domain requirements against the pursuit of generality?
Qiu Xipeng: If the large model itself is general-purpose, supplementing it with some vertical-domain knowledge may be enough. That part is not especially difficult, and the cost is much lower than pre-training.
Qubits: Take Llama 2: its SFT and RLHF used manually labeled data on the order of a million examples, which is enormous in both volume and cost.
Qiu Xipeng: Many large models today have only done SFT, not RLHF.
Qubits: Is that step necessary for industrial deployment?
Qiu Xipeng: It's not necessary. In a technical-domain model, for example, you won't pay special attention to so-called harmlessness and honesty, say when you just have it write code. And generally speaking, alignment reduces a model's capability.
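As a concrete picture of the SFT-only route described above, here is a minimal sketch of a single supervised fine-tuning step on an instruction-response pair, written with Hugging Face Transformers and plain PyTorch. The base model name and the toy example are placeholder assumptions, not what any team mentioned here actually used; a real run would loop over a large instruction dataset with batching and a learning-rate schedule.

    # One SFT step: next-token prediction on "instruction + response" text,
    # with the loss masked on padding. No RLHF stage involved.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder base model (assumption)
    tok = AutoTokenizer.from_pretrained(model_name)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    pairs = [("Explain what SFT is in one sentence.",
              "SFT fine-tunes a pretrained model on instruction-response pairs.")]
    texts = [f"Instruction: {i}\nResponse: {r}{tok.eos_token}" for i, r in pairs]
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)

    # Ignore padded positions in the loss (-100 is the ignore index).
    labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)

    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
    model.train()
    loss = model(**batch, labels=labels).loss
    loss.backward()
    opt.step()
    print(f"SFT step done, loss = {loss.item():.3f}")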
On leaderboard gaming
Qubits: How do you view some large-model teams gaming the evaluation leaderboards?
Qiu Xipeng: At present there is no particularly good dataset that reflects the full range of large-model capabilities; everyone is still exploring. But the main problem is that, overall, evaluating generative models is genuinely difficult.
Qubits: Can you give an example?
Qiu Xipeng: ChatGPT, for instance, may not out-score Google's large model on the leaderboards, yet its user experience is better. True evaluation may still have to come from real human judgment, but that kind of evaluation is costly and hard to quantify.
Qubits: Are objective indicators still needed?
Qiu Xipeng: They are still necessary, but ideally as an academic comparison method, as before. Many companies now game the leaderboards without disclosing their data or explaining how they did it, which I consider unfair competition.
For example, the domestic C-Eval benchmark is itself quite high quality, but it was gamed within a few days of release, leaving it with little academic value.
Qubits: Has there been any progress on large-model hallucination?
Qiu Xipeng: Not much has been done specifically here. At present the reliable approach is to mitigate hallucination within particular applications. Others try to detect it through alignment or negative feedback. But my personal view is that hallucination should be addressed by verification against external knowledge, rather than mechanically eliminated inside the model.
Qubits: Why?
Qiu Xipeng: I feel it is strongly tied to the model's thinking ability. Making the hallucination disappear could cause the model's capability to decline.
Qubits: So hallucination is not a bad thing?
Qiu Xipeng: Not necessarily; it depends on the setting. In artistic creation and scientific discovery, for example, hallucination can actually be useful.
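Before moving on, a hedged sketch of the external-verification route Qiu favors: let the model generate freely, then check each claim against a trusted corpus before surfacing it, flagging what cannot be supported instead of trying to make the model incapable of producing it. The tiny corpus and the token-overlap scoring below are illustrative assumptions; a production system would use embedding-based retrieval of the kind LangChain or LlamaIndex wrap.

    # Verify model output against trusted passages and flag unsupported claims.

    KNOWLEDGE = [  # stand-in for a real document store (assumption)
        "MOSS was released by the team at Fudan University in 2023.",
        "LLaMA is a family of open large language models released by Meta.",
    ]

    def support_score(claim: str, passage: str) -> float:
        # Naive lexical overlap; a real system would use embedding similarity.
        c, p = set(claim.lower().split()), set(passage.lower().split())
        return len(c & p) / max(len(c), 1)

    def verify(claim: str, threshold: float = 0.6) -> bool:
        return any(support_score(claim, p) >= threshold for p in KNOWLEDGE)

    for claim in ["MOSS was released by the team at Fudan University in 2023.",
                  "MOSS was trained entirely on chess transcripts."]:
        print("supported" if verify(claim) else "flag for review", "->", claim)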
On AI alignment
Qubits: OpenAI has a superalignment team, ultimately using AI to align AI. What do you think of this?
Qiu Xipeng: Alignment is genuinely hard. For so-called alignment with human values, even our own human values are hard to measure. But using AI to align certain capabilities works fine, such as solving math problems or playing chess, because those need no human judgment to evaluate, and there AI-driven alignment does better.
Qubits: Mathematically, large models are still relatively weak.
Qiu Xipeng: I think this area needs higher-quality datasets.
On NLP
Qubits: What impact do large language models have on natural language processing?
Qiu Xipeng: The whole field effectively has to be repartitioned. It used to be divided by domain and task; now it is divided by stage, roughly pre-training, instruction fine-tuning, and RLHF. That makes everyone's work look similar, less diverse than before.
And although there are many large language models, they are basically all built on the Transformer architecture, with similar training data and training methods.
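Since every model in this discussion shares the same Transformer core, a few lines of PyTorch showing that core, single-head scaled dot-product self-attention, also make visible the quadratic cost that reappears as an architectural challenge later in this interview. The dimensions are arbitrary toy values.

    # Single-head self-attention: the (n, n) score matrix is why compute and
    # memory grow quadratically with sequence length n.
    import math
    import torch

    n, d = 128, 64                   # toy sequence length and head dimension
    q, k, v = (torch.randn(n, d) for _ in range(3))

    scores = q @ k.T / math.sqrt(d)  # shape (n, n): the quadratic term
    attn = torch.softmax(scores, dim=-1)
    out = attn @ v                   # shape (n, d)
    print(scores.shape, out.shape)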
Qubits: What challenges do they pose?
Qiu Xipeng: First, the track is crowded: everyone is concentrated on it. Second, the prompt becomes very important, which feels a bit like a return to the old days of feature engineering. There are also high compute and energy costs, the difficulty of finding objective evaluation metrics, and safety problems.
These challenges run through every stage of large models, from pre-training to application.
On the scientific challenges of large models
Qubits: The general perception is that large models are mostly engineering. What scientific problems still need to be solved?
Qiu Xipeng: Mainly the following:
1. Model architecture. The Transformer's drawback is that its complexity grows quadratically with sequence length, which becomes a bottleneck when scaling up; new architectures will certainly appear in the future.
2. Chain-of-thought ability. There is no accepted method for preparing data that improves chain-of-thought reasoning.
3. The design of training objectives. Early machine learning had very clear objectives and could reduce generalization error end to end; in large language models, the objectives of each stage (pre-training, fine-tuning, and alignment) are not consistent with the final objective, and how to design them still needs to be explored.
4. Hallucination. Popular frameworks such as LangChain and LlamaIndex work around it, but there is no deeper understanding of why they work and where they fail, which is worth studying.
5. Multimodal extension. Knowledge of the symbolic world alone is always limited, so the question is how to align with more modalities. The current mainstream approach, attaching an encoder in front of a large language model to convert multimodal information into vectors, aligns only in one direction, with insufficient fusion between modalities. We have SpeechGPT, which lets the large model receive speech signals directly: speech is discretized into tokens and fed straight into the model, so the model can take speech as input and produce speech as output.
6. Knowledge sources. Large models have learned a great deal of text-level knowledge; how can they improve further from here? For knowledge that text cannot carry, there are approaches such as multimodal learning and embodied learning.
7. Real-time learning. How can large models learn from interaction with humans and fold that into parameter updates, so their knowledge keeps improving?
8. Agents. Use the large model as the carrier of an agent, endowing it with the abilities needed to complete complex tasks; and going a step further, how do multiple agents interact?
9. Automatic evaluation. Large models are still far weaker in reasoning, mathematics, and coding, yet these abilities already suffice to support a great deal of complex work, so we need metrics that measure them while avoiding the leaderboard-gaming phenomenon.
10. The tenth is somewhat engineering, but it is also a scientific problem: the democratization of large models. If compute requirements remain very high, research will be limited to a small circle of people. We have been working on optimization methods for full-parameter fine-tuning; some time ago we proposed a new optimizer called LOMO (LOw-Memory Optimization) and successfully fine-tuned a 65B LLaMA on a single server with eight RTX 3090 GPUs (24 GB each). A simplified sketch of the idea follows this list.
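As promised above, a simplified sketch of the fused-update idea behind LOMO, reconstructed from the description in this interview rather than taken from the team's actual code: apply a plain SGD step to each parameter as soon as its gradient is ready during the backward pass, then free that gradient immediately, so neither the full set of gradients nor Adam-style optimizer states ever resides in memory at once. It relies on PyTorch's register_post_accumulate_grad_hook, available from PyTorch 2.1; the real LOMO also handles mixed precision and gradient clipping.

    # Fused "optimizer in backward": update each parameter the moment its
    # gradient is accumulated, then drop the gradient to reclaim memory.
    import torch
    import torch.nn as nn

    lr = 1e-3
    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 1))

    def fused_sgd_step(p: torch.Tensor) -> None:
        with torch.no_grad():
            p.add_(p.grad, alpha=-lr)  # plain SGD: no momentum or Adam state
        p.grad = None                  # free this gradient immediately

    for p in model.parameters():
        p.register_post_accumulate_grad_hook(fused_sgd_step)

    x = torch.randn(8, 512)
    model(x).pow(2).mean().backward()  # weights update during backward itself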
Qubits: Do these questions deserve industry's attention?
Qiu Xipeng: I think they are worth attention. Some domestic teams basically see only the engineering, but in fact many of these challenges remain unsolved.
Qubits: What is the division of labor between academia and industry on large models?
Qiu Xipeng: I don't think there is a division of labor; OpenAI's own research team and DeepMind are studying these same problems.
On MOSS's progress
Qubits: What capabilities will MOSS improve next?
Qiu Xipeng: Possibly new architectures, and further improving large models' coding and mathematical abilities.
Qubits: What about cost? OpenAI reportedly burned $700,000 a day before it took off.
Qiu Xipeng: We don't have a strictly precise number, but there are hundreds of GPUs running every day.
Qubits: Will you consider application-side products in the future?
Qiu Xipeng: Maybe in the future.
Qubits: Why?
Qiu Xipeng: At present many large models are structurally similar, with nothing unique relative to the others. If some technological innovation can form a competitive edge in the future, then we can do something commercial.
Qubits: Is there an expected time?
Qiu Xipeng: No.
Qubits: You previously mentioned rolling out models with larger parameter counts. Is that in progress?
Qiu Xipeng: Not within MOSS itself, but the team has participated in training large models with other organizations.
On changing perceptions
Qubits: How has your understanding of large models changed over the past six months?
Qiu Xipeng: At first, of course, nobody understood why large models could work so well; now people take it for granted. For example, that a model can understand human instructions after SFT instruction fine-tuning: half a year ago, before ChatGPT appeared, not everyone could have imagined that.
And today large models are understood differently again: not as chat-only models but increasingly as decision-making models that let people take part in more complex intelligent decision-making, including agents.