
Tsinghua's second-generation, 6-billion-parameter ChatGLM2 is open source: first place on the Chinese leaderboard, ahead of GPT-4, with 42% faster inference.

2025-04-05 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

Tsinghua's ChatGLM2-6B model is making waves again! The new version improves inference performance by 42% and supports contexts of up to 32K.

Since its release in March, ChatGLM-6B has become a hit in the AI community and has won 29.8k stars on GitHub.

Now, the second generation of ChatGLM is coming!

Tsinghua's Knowledge Engineering and Data Mining group (THUDM) has released the Chinese-English bilingual dialogue model ChatGLM2-6B.

Project address: https://github.com/THUDM/ChatGLM2-6B

HuggingFace: https://huggingface.co/THUDM/chatglm2-6b

The latest version of ChatGLM2-6B adds a number of features:

- An upgraded base model with stronger performance
- Support for contexts from 8K up to 32K
- Inference performance improved by 42%
- Weights fully open to academic research, with applications for commercial licensing allowed

It is worth mentioning that ChatGLM2 tops the Chinese C-Eval leaderboard with a score of 71.1, well ahead of GPT-4. The latest open-source version, ChatGLM2-6B, ranks 6th with a score of 51.7.

ChatGLM2-6B, the second-generation version of ChatGLM-6B, retains the strengths of the original model, such as smooth dialogue and a low deployment barrier, while adding many new features:

1. More powerful performance. Building on the development experience of the first-generation ChatGLM model, the base model of ChatGLM2-6B has been comprehensively upgraded.

ChatGLM2-6B uses the hybrid objective function of GLM and has gone through pre-training on 1.4T Chinese and English tokens, as well as human preference alignment training.

The evaluation results show that, compared with the original model, ChatGLM2-6B's performance on datasets such as MMLU (+23%), C-Eval (+33%), GSM8K (+571%) and BBH (+60%) has improved greatly, making it highly competitive among open-source models of the same size.

2. Longer context. Based on FlashAttention, the researchers extended the context length of the base model from ChatGLM-6B's 2K to 32K, and trained with an 8K context length in the dialogue phase to allow more rounds of conversation.

However, the current version of ChatGLM2-6B has limited ability to understand very long single-turn documents; this will be a focus of optimization in subsequent iterations.

3. More efficient inference. Based on Multi-Query Attention, ChatGLM2-6B has faster inference speed and lower GPU memory consumption.

With the official model implementation, inference is 42% faster than the first generation, and under INT4 quantization, the dialogue length supported by 6 GB of GPU memory has increased from 1K to 8K.
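The idea behind Multi-Query Attention is that all query heads share a single key/value head, so the per-token KV storage shrinks by a factor of the head count. The sketch below illustrates the mechanism with NumPy; the shapes and sizes are illustrative assumptions, not ChatGLM2-6B's real configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(q, k, v):
    """q: (heads, seq, dim); k, v: (seq, dim) -- one K/V head shared by all query heads."""
    dim = q.shape[-1]
    scores = q @ k.T / np.sqrt(dim)   # (heads, seq, seq): every head attends to the same K
    weights = softmax(scores, axis=-1)
    return weights @ v                # (heads, seq, dim)

# Illustrative sizes only (assumption, not the model's config)
rng = np.random.default_rng(0)
heads, seq, dim = 4, 8, 16
q = rng.normal(size=(heads, seq, dim))
k = rng.normal(size=(seq, dim))
v = rng.normal(size=(seq, dim))

out = multi_query_attention(q, k, v)
print(out.shape)  # (4, 8, 16)
# The KV cache stores 2*seq*dim values instead of 2*heads*seq*dim,
# i.e. `heads` times fewer entries than standard multi-head attention.
```

In standard multi-head attention each head has its own K and V projections; MQA keeps per-head queries but collapses K and V to one shared head, which is why both the cache and the memory bandwidth per decoded token drop.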

4. A more open license. The ChatGLM2-6B weights are fully open to academic research, and commercial use is also permitted with official written permission.

Compared with the original model, ChatGLM2-6B has made a great improvement in multi-dimensional ability.

- Mathematical logic
- Knowledge reasoning
- Long document understanding

The research team selected several representative Chinese and English datasets for evaluation. The following are the results of the ChatGLM2-6B model on MMLU (English), C-Eval (Chinese), GSM8K (mathematics) and BBH (English).

(Evaluation charts for MMLU, C-Eval, GSM8K and BBH omitted.)

Inference performance. ChatGLM2-6B uses Multi-Query Attention, which improves generation speed. The comparison of average speed when generating 2000 characters is as follows:

Multi-Query Attention also reduces the footprint of KV Cache during the generation process.

In addition, ChatGLM2-6B uses a Causal Mask for dialogue training, so the KV Cache of previous rounds can be reused in multi-turn conversations, further reducing GPU memory usage.
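The memory saving can be sketched with back-of-envelope arithmetic: per generated token, each layer caches one K and one V vector per KV head. The layer/head/width numbers below are assumptions chosen for illustration, not ChatGLM2-6B's published configuration.

```python
def kv_cache_bytes(seq_len, layers, kv_heads, head_dim, bytes_per_val=2):
    """Approximate KV-cache size: the leading 2 covers both K and V;
    bytes_per_val=2 assumes fp16 storage."""
    return 2 * seq_len * layers * kv_heads * head_dim * bytes_per_val

# Assumed sizes for illustration only
layers, heads, head_dim = 28, 32, 128
mha = kv_cache_bytes(8192, layers, kv_heads=heads, head_dim=head_dim)  # per-head K/V
mqa = kv_cache_bytes(8192, layers, kv_heads=1, head_dim=head_dim)      # one shared K/V head

print(f"MHA cache at 8K context: {mha / 2**30:.2f} GiB")
print(f"MQA cache at 8K context: {mqa / 2**30:.2f} GiB")
print(f"reduction: {mha // mqa}x")  # 32x, i.e. the head count
```

Under these assumed sizes, an 8K-token cache shrinks from a few GiB to roughly a hundred MiB, which is consistent with the article's claim that long conversations fit in much less GPU memory.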

As a result, when running INT4 quantized inference on a GPU with 6 GB of memory, the original ChatGLM-6B model runs out of memory after generating at most 1,119 characters, while ChatGLM2-6B can generate at least 8,192 characters.

The research team also tested the impact of quantization on model performance and found it to be acceptable.
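To make the INT4 idea concrete, here is a minimal round-trip of symmetric 4-bit weight quantization in NumPy: weights are rounded to integers in [-8, 7] with a per-tensor scale and dequantized at use time. This is a sketch of the general technique, not ChatGLM2-6B's actual quantization kernel (real implementations typically use per-group scales and packed storage).

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-tensor INT4 quantization: map the max magnitude to 7."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # 4-bit range
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Illustrative weight matrix (assumption: typical small-magnitude weights)
rng = np.random.default_rng(1)
w = rng.normal(scale=0.02, size=(64, 64)).astype(np.float32)

q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).max()
print(f"max abs error: {err:.6f} (at most half a quantization step, {scale / 2:.6f})")
```

The rounding error per weight is bounded by half a quantization step, which is why moderate quantization usually degrades benchmark scores only slightly, as the article reports.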

How to use: environment installation

First, clone the repository:

git clone https://github.com/THUDM/ChatGLM2-6B
cd ChatGLM2-6B

Then install the dependencies with pip:

pip install -r requirements.txt

where version 4.30.2 or above of the transformers library is recommended for the best inference performance.

Code invocation. You can generate a conversation by calling the ChatGLM2-6B model with the following code:

>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "What should I do if I can't sleep at night?", history=[])
>>> print(response)
1. Keep a regular sleep schedule: maintaining a regular schedule helps you build healthy sleep habits and makes it easier to fall asleep. Try to go to bed and get up at the same time every day.
2. Create a comfortable sleep environment: make sure the environment is comfortable, quiet, dark and warm. Use comfortable bedding and keep the room ventilated.
3. Relax before bed: relaxing activities such as a hot bath, soft music or an interesting book can relieve tension and anxiety and make it easier to fall asleep.
4. Avoid caffeinated drinks: caffeine is a stimulant that can affect sleep quality. Try to avoid coffee, tea, cola and other caffeinated drinks before bed.
5. Avoid doing things unrelated to sleep in bed: watching movies, playing games or working in bed may interfere with your sleep.
6. Try breathing techniques: deep breathing is a relaxation technique that can relieve tension and anxiety and help you fall asleep. Try inhaling slowly for a few seconds, then exhaling slowly.
If none of these methods help you sleep, consider consulting a doctor or sleep specialist for further advice.

Loading the model locally. Before downloading the model from the Hugging Face Hub, you need to install Git LFS, then run:

git clone https://huggingface.co/THUDM/chatglm2-6b

If the checkpoint downloads slowly, you can download only the model implementation:

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm2-6b

Then manually download the model parameter files and place them in the local chatglm2-6b directory.

Download address: https://cloud.tsinghua.edu.cn/d/674208019e314311ab5c/

After the model has been downloaded, replace THUDM/chatglm2-6b in the code above with the path of your local chatglm2-6b folder to load the model locally.

Reference:

https://github.com/THUDM/ChatGLM2-6B

https://huggingface.co/THUDM/chatglm2-6b

This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era).
