

Supporting 200,000-character input, Moonshot AI opens the "long text" era for hundred-billion-parameter models

2025-04-10 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

On October 9, 2023, Moonshot AI, a large-model startup founded just six months earlier, announced a breakthrough in "long text" processing and launched its first intelligent assistant product, Kimi Chat, which supports input of up to 200,000 Chinese characters. This is the longest context length supported by any large-model service available on the market worldwide, indicating that Moonshot AI has reached a world-leading level in this key technology.

From a technical point of view, a model's parameter count determines how complex a "computation" it can perform, while the amount of text it can take as input (that is, its long-text capability) determines how much "memory" it has; together they determine how well the model works in applications. Supporting a longer context means the model has more "memory", which makes its applications deeper and broader: analyzing a market across multiple financial reports, processing extremely long legal contracts, quickly distilling the key information from many articles or web pages, role-playing based on a novel's setting, and so on can all become part of our work and life with the help of ultra-long-text technology.

Compared with the English-trained large-model services on the market, Kimi Chat has strong multilingual ability and a significant advantage in Chinese: in actual use it supports a context of about 200,000 Chinese characters, 2.5 times that of Anthropic's Claude-100k (about 80,000 characters) and 8 times that of OpenAI's GPT-4-32k (about 25,000 characters). At the same time, through an innovative network architecture and engineering optimization, Kimi Chat achieves a lossless long-range attention mechanism at the hundred-billion-parameter scale, without relying on performance-damaging "shortcuts" such as sliding windows, downsampling, or smaller models.

At present, Moonshot AI's intelligent assistant product Kimi Chat has opened for beta testing.

The application dilemma caused by limited large-model input length

The generally short input length of current large models greatly restricts how the technology can be put to use. For example:

In the currently popular virtual-character scenario, the lack of long-text capability means a virtual character easily forgets important information. In the Character AI community, for instance, users often complain that "after many rounds of conversation the character has forgotten its identity, and a new conversation has to be started."

For large-model developers, the prompt length limit constrains the scenarios and capabilities of their applications. For example, when building a scripted murder-mystery game on top of a large model, plot settings and game rules of tens of thousands or even more than 100,000 words often need to be fed to the application as the prompt. If the model's input length is not long enough, the rules and settings have to be cut down, and the intended game experience cannot be achieved.

In Agents, another major direction for large-model applications, operation requires multiple rounds of automatic planning and decision-making, and every action must refer to historical information to be completed. Model input therefore grows rapidly, and a model that cannot handle longer context is less likely to run an Agent successfully, because it cannot plan and decide comprehensively and accurately on the basis of that history.

When using a large model as a work assistant, almost every heavy user has run into the input-length limit. Lawyers, analysts, consultants and other professional users in particular often need to analyze and process long documents, so the frustration occurs frequently.

All of these problems become easy to solve once a large model supports a long enough context input.

Long text opens a new world of large model applications

So how does a large model behave with ultra-long context input? Here are some examples of Kimi Chat in actual use:

Send a long official-account article directly to Kimi Chat, and it will quickly summarize and analyze it for you:

Hand the newly released Nvidia financial report to Kimi Chat for a quick analysis of the key information:

Too many invoices from a business trip? Drag them all into Kimi Chat and have them quickly organized into the information you need:

When you find a new algorithm paper, Kimi Chat can reproduce the code for you directly from the paper:

With just one URL, you can chat with your favorite Genshin Impact character in Kimi Chat:

Feed in the whole of The Moon and Sixpence and let Kimi Chat read it with you, helping you better understand and apply the knowledge in the book:

These examples show that when the context a model can handle becomes longer, its capabilities cover more usage scenarios and it can truly play a role in people's work, life and study. And because question answering and information processing can be grounded in an understanding of the full text, the "hallucination" problem of large models can also be alleviated to a great extent.

No shortcuts: facing the dual challenges of algorithms and engineering

In developing long-text technology there are some "shortcuts" that do great harm to quality, mainly the following:

The "goldfish" model, characterized by forgetfulness: it actively discards earlier context via sliding windows and the like, retaining attention only over the most recent input. Such a model cannot understand the full text and cannot handle cross-document comparison or comprehensive understanding of long text (for example, it cannot extract the 10 most valuable insights from a 100,000-word user-interview transcript).

The "bee" model, characterized by attending to parts while missing the whole: it keeps attention only over part of the input, via context downsampling or RAG (retrieval-augmented generation). Such a model likewise fails to understand the full text (for example, it cannot summarize candidate profiles across 50 resumes).

The "tadpole" model, characterized by underdeveloped capability: it extends the context length by reducing the parameter count (for example, to the tens of billions). Although a longer context is supported, the model itself becomes weaker and cannot handle a large number of tasks.
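The "goldfish" shortcut can be made concrete with a toy attention mask. This is an illustrative sketch, not Moonshot AI's method (the point of this section is precisely that they avoid such shortcuts): under a sliding window, every position outside the last `window` tokens is masked out, so the model literally cannot see earlier context.

```python
def sliding_window_mask(seq_len: int, window: int) -> list:
    """Causal attention mask where query position i can attend only to
    the last `window` key positions (the "goldfish" shortcut: everything
    earlier is simply invisible to the model)."""
    return [[(j <= i) and (j > i - window) for j in range(seq_len)]
            for i in range(seq_len)]

mask = sliding_window_mask(seq_len=8, window=3)
# The final token sees only positions 5, 6 and 7; positions 0-4 are forgotten.
print([j for j, visible in enumerate(mask[7]) if visible])  # → [5, 6, 7]
```

In a real model this mask is applied to the attention scores before the softmax; the compute saving is exactly why the shortcut is tempting, and the amnesia it causes is exactly why the article rejects it.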

Simple shortcuts cannot deliver the ideal product. To build something genuinely useful and easy to use, one must not take false shortcuts but face the challenges head-on.

At the training level, anyone who wants to train a model that supports a long enough context inevitably faces the following difficulties:

How can the model accurately attend to the needed content within a context window of hundreds of thousands of tokens without degrading its basic capabilities? Existing techniques such as sliding windows and length extrapolation damage model performance so much that a truly usable long context cannot be achieved in many scenarios.

Training a long-context model at the hundred-billion-parameter scale brings far higher compute requirements and extremely heavy GPU memory pressure, and traditional 3D parallelism can no longer meet the training needs.

High-quality long-sequence data is in short supply; how can more effective data be provided for model training?

At the inference level, once a model supporting ultra-long context is obtained, serving it to a large number of users also poses arduous challenges:

In the Transformer architecture, the computation of the self-attention mechanism grows quadratically with context length. For example, when the context grows 32 times, the attention computation actually grows about 1,000 times, which means that with a naive implementation users would have to wait a very long time for a response.
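The quadratic blow-up is easy to verify with back-of-the-envelope arithmetic. The sketch below counts the roughly n²·d multiply-adds of the two attention matmuls; the 8,192 hidden size is an arbitrary assumption and cancels out of the ratio anyway.

```python
def self_attention_flops(n_tokens: int, d_model: int) -> int:
    # QK^T scores plus scores @ V: each costs about n^2 * d multiply-adds.
    return 2 * n_tokens ** 2 * d_model

base = self_attention_flops(4_000, 8_192)
long_ctx = self_attention_flops(128_000, 8_192)  # context grown 32x
print(long_ctx // base)  # → 1024, i.e. roughly 1000x more attention compute
```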

Ultra-long context further increases the demand for GPU memory. Take the 175-billion-parameter GPT-3 as an example: the current maximum single-node configuration (80 GiB × 8) can support inference over at most about a 64k context, which shows how demanding ultra-long text is on GPU memory.
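The 64k ceiling can be sanity-checked with a rough KV-cache estimate. The numbers below assume fp16 storage and GPT-3's published shape (96 layers, hidden size 12,288); the rest is arithmetic, and activation overheads are ignored, so this is only an order-of-magnitude sketch.

```python
GiB = 2 ** 30
n_layers, d_model, fp16_bytes = 96, 12_288, 2       # GPT-3 175B architecture

weights = 175e9 * fp16_bytes                        # ~350 GB of parameters
kv_per_token = 2 * n_layers * d_model * fp16_bytes  # one K and one V per layer
budget = 8 * 80 * GiB                               # max single-node setup: 8 x 80 GiB

max_tokens = (budget - weights) / kv_per_token
print(f"~{max_tokens / 1000:.0f}k tokens of context")  # on the order of the 64k figure
```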

Enormous memory-bandwidth pressure: the HBM bandwidth of an Nvidia A800 or H800 is as high as 2-3 TiB/s, yet with such a long context a naive approach can generate only 2-5 tokens/s, an extremely stuttery experience.
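That tokens-per-second figure follows from memory bandwidth alone: a naive decoder must stream all the weights plus the entire KV cache from HBM for every generated token, so throughput is bounded by bandwidth divided by bytes read per step. A rough estimate, again assuming fp16 and a GPT-3-like shape:

```python
TiB = 2 ** 40
n_layers, d_model, fp16_bytes = 96, 12_288, 2   # GPT-3-scale shape (assumed)

bandwidth = 2 * TiB                             # low end of A800/H800 HBM, ~2-3 TiB/s
weights = 175e9 * fp16_bytes                    # all parameters read per decode step
kv_cache = 2 * n_layers * d_model * fp16_bytes * 200_000  # K+V for a 200k context

tokens_per_s = bandwidth / (weights + kv_cache)
print(f"{tokens_per_s:.1f} tokens/s")           # → 1.7 tokens/s
```

Even the most optimistic reading puts a naive implementation at single-digit tokens per second, which is why serving long context demands engineering work beyond simply fitting the model in memory.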

Moonshot AI's technical team carried out extreme algorithm and engineering optimization, overcame the above difficulties, completed the productization of the model, and released a hundred-billion-parameter LLM product that supports 200,000-character input.

The first step of the moon-landing program: welcome to the Long LLM era

Yang Zhilin, founder of Moonshot AI, said in an interview that a high degree of intelligence can be achieved through lossless compression of massive amounts of data, whether text, speech, or video.

Progress in lossless compression has long depended on a "parameters are king" mode, in which the compression ratio is tied directly to the parameter count; this greatly raises a model's training cost and application threshold. Moonshot AI believes that the capability ceiling of a large model (that is, its lossless compression ratio) is determined jointly by single-step capability and the number of steps executed: single-step capability is positively correlated with the parameter count, and the number of executed steps is the context length.

Moonshot AI believes that longer context length can open a whole new chapter for large-model applications, pushing the field from the LLM era into the Long LLM (LLLM) era:

Everyone can have a virtual companion with lifelong memory, which remembers every detail of its interactions with you over the course of your life and builds a long-term emotional connection.

Everyone can have an assistant that co-inhabits your work environment, knows all public-domain knowledge (the Internet) and private-domain knowledge (a company's internal documents), and helps you complete your OKRs on that basis.

Everyone can have an omniscient learning tutor that not only provides accurate knowledge but also guides you across disciplinary boundaries to explore and innovate more freely.

Of course, longer context length is only Moonshot AI's first step in next-generation large-model technology. With its lead in this field, Moonshot AI plans to accelerate the innovation and application of large-model technology.

Partners in the moon-landing project said:

Monolith Capital, which focuses on investing in the next-generation digital industry and intelligent technology, was one of the three investors in Moonshot AI's first financing round and has supported the company's development with concrete action. Cao Xi, founding partner of Monolith Capital, said that Yang Zhilin is the most internationally recognized Chinese technical expert in the large-model field. His team has deep accumulation in artificial intelligence, especially in large language models (LLM), and has been widely recognized internationally. While companies such as OpenAI and Anthropic in Silicon Valley attract much attention, in China Moonshot AI, with ample technology reserves, is also growing into a globally leading AGI startup. Multimodal large models are a key battleground among AI vendors, and long-text input is one of the core technologies there. The large model and Kimi Chat newly released by the Moonshot AI team mark an important breakthrough in this respect and have already been applied in many practical scenarios. We will continue to support the Moonshot AI team in bold innovation and technological breakthroughs in the field of AGI, leading the future development of China's artificial intelligence technology.

ZhenFund partner Dai Yusen expressed his confidence in and expectations for the company: "We believe the recent explosion of AI applications is only the prelude to a revolution. For AI technology to truly change the world and create enormous value, there must still be a major breakthrough in the level of intelligence itself, which requires a team with top technical capability, the courage to pursue a moonshot, and the persistence to keep pushing the boundary of intelligence." As first author of XLNet and other well-known research, Yang Zhilin has very rich research and practical experience. For many years he has firmly believed that compressing high-dimensional data through large models is the necessary path for the development of artificial intelligence, and he has united an entrepreneurial team with extremely high talent density, strong mutual understanding, and the spirit to take on seemingly impossible challenges. "It is a great honor to be able to support Yang Zhilin's new journey once again, starting from the angel round."
