CTOnews.com, July 25: A number of large domestic models have emerged recently, supporting the development of related industries. Beijing Zhiwei Intelligent Technology Co., Ltd. recently released its KDF large model in Shanghai, along with a series of products built on the model, including financial-industry tools such as "KDF Intelligence News", "KDF never" and "KDF Zhongshu".
On inquiry, CTOnews.com learned that the training data of Zhiwei Intelligent's KDF model is mainly Chinese and includes a large amount of financial data, in order to improve the model's problem-solving ability in business and finance.
In addition, some English and code data are mixed into the training data to support the model's general capabilities. During training, the KDF model treats each Chinese character as a separate token. The model has 140 billion parameters and was trained on 400 billion tokens. In terms of code size, the data processing part has about 5,000 lines, the model experiment part about 2,000 lines, and the model training part about 2,000 lines.
In training, Zhiwei Intelligent's KDF model uses a GELU nonlinear activation function optimized on PyTorch. GELU performs relatively well across a wide range of tasks, which helps the model capture complex data characteristics more accurately and keeps the whole development, training and deployment process running efficiently.
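For readers unfamiliar with GELU, the sketch below writes out its standard definition, x · Φ(x) with Φ the standard normal CDF, together with the widely used tanh approximation. It uses only the Python standard library and says nothing about how the KDF team optimized the function in PyTorch; the function names are chosen here for illustration.

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Common tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

# Compare the two forms at a few sample points.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, gelu(x), gelu_tanh(x))
```

Unlike ReLU, GELU is smooth around zero and lets small negative inputs pass through with a small negative weight.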
In terms of network structure, the development team has heavily optimized the model. Compared with the LLaMA model, it uses fewer parameters per layer, which reduces computing requirements and memory consumption. At the same time, the network is made deeper, giving the model stronger representational capacity and letting it learn more complex data features.
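The trade-off between width and depth can be illustrated with a rough parameter count. The sketch below uses the common approximation of about 12·d² weights per Transformer layer (4·d² for the attention projections plus 8·d² for a feed-forward block with 4x expansion). The two configurations are hypothetical and are not the actual KDF or LLaMA settings.

```python
def per_layer_params(d_model: int) -> int:
    """Approximate weights in one Transformer layer (biases and norms ignored)."""
    attention = 4 * d_model * d_model          # Q, K, V and output projections
    feed_forward = 2 * d_model * (4 * d_model)  # up- and down-projection, 4x expansion
    return attention + feed_forward

def total_params(d_model: int, n_layers: int) -> int:
    """Approximate weights across all layers (embeddings ignored)."""
    return n_layers * per_layer_params(d_model)

# Hypothetical configurations, chosen only to show the trade-off.
wide_shallow = {"d_model": 5120, "n_layers": 40}
narrow_deep = {"d_model": 4096, "n_layers": 60}

for name, cfg in (("wide/shallow", wide_shallow), ("narrow/deep", narrow_deep)):
    print(name, per_layer_params(cfg["d_model"]), total_params(**cfg))
```

Because the per-layer cost grows with the square of the hidden size, a narrower model can add many layers while keeping a similar total budget.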
To improve the model's scalability in large-scale data processing, the development team readjusted the bias of the attention layer and introduced Flash Attention to save GPU memory and speed up training and inference. Because it reduces computation and memory requirements, Flash Attention lets Zhiwei Intelligent's KDF model run more efficiently on limited hardware resources.
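The memory saving in Flash Attention comes from processing keys and values in blocks while keeping a running softmax, so the full score matrix is never stored. The sketch below shows only that arithmetic, for a single query vector in plain Python. The real technique is a fused GPU kernel, and the function names and block size here are assumptions for illustration.

```python
import math

def naive_attention(q, keys, values):
    """Reference: softmax(q.k / sqrt(d)) weighted sum, with all scores held at once."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]

def blockwise_attention(q, keys, values, block_size=2):
    """Same result, computed block by block with a running max and running sum."""
    d = len(q)
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]

q = [0.1, -0.3, 0.5]
keys = [[0.2, 0.1, -0.4], [1.0, 0.0, 0.3], [-0.5, 0.7, 0.2], [0.3, 0.3, 0.3], [0.0, -1.0, 0.9]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [-1.0, 3.0]]
print(naive_attention(q, keys, values))
print(blockwise_attention(q, keys, values, block_size=2))
```

Both functions return the same vector up to floating-point rounding, but the blockwise version only ever holds one block of scores at a time.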
According to some benchmark results, Zhiwei Intelligent's KDF model performs stably across seven natural language processing tasks. On some tasks, such as iFlytek and CMNLI, the KDF large model performs relatively well, while on the ExamQA and OCNLI tests the compared models perform roughly the same, which highlights the model's ability to handle different types of text and domain knowledge.
▲ Image source: Hugging Face
Zhiwei Intelligent said that the limitations of existing general-purpose models in industry-specific applications and in Chinese are the main reason it chose to train the KDF model from scratch. ChatGLM is relatively weak in industry-specific applications, MOSS is built on an English model and lacks support for Chinese, and most of LLaMA's training data is English, so its Chinese ability is relatively weak. The R&D team therefore chose to train the KDF model from scratch in order to improve its Chinese language ability and industry applicability.
During model training, the development team has continued to deepen its understanding of the technical details and strives to create a "powerful and superior" Chinese model. As a large vertical model for finance and commerce, Zhiwei Intelligent's KDF model will continue to drive the development and innovation of the company's products.
Zhiwei Intelligent's KDF model is currently open source on Hugging Face, and there will be no restrictions on commercial use in the future. Interested readers can learn more about it there.