
Runs on a single GPU: UC Berkeley-led team releases the weights of the 13-billion-parameter "Little Alpaca"


Just now, UC Berkeley, CMU, Stanford, and collaborators jointly released the weights of the latest open-source model, Vicuna.

On March 31, UC Berkeley teamed up with CMU, Stanford, UCSD, and MBZUAI to launch the 13-billion-parameter Vicuna, commonly known as the "Little Alpaca," which achieves 90% of ChatGPT's performance for as little as $300.

Today, the team officially released the weights of Vicuna, which can run on a single GPU!

Project address: https://github.com/lm-sys/FastChat/#fine-tuning

13 billion parameters, rivaling ChatGPT 90% of the time

Vicuna was built by fine-tuning LLaMA on user-shared conversations collected from ShareGPT, at a cost of nearly $300.

The researchers designed eight question categories, including math, writing, and coding, and tested the performance of Vicuna-13B and four other models.

The evaluation used GPT-4 as the judge, and the results show that Vicuna-13B achieves quality competitive with ChatGPT and Bard in more than 90% of cases, while outperforming other models, such as LLaMA and Stanford's Alpaca, in more than 90% of cases.

The training process for Vicuna-13B is as follows:

First, the researchers collected about 70K conversations from ShareGPT, a site where users share their ChatGPT conversations. Next, they optimized the training scripts provided by Alpaca so that the model could better handle multi-round conversations and long sequences. Finally, they used PyTorch FSDP to train on eight A100 GPUs for one day.
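For readers unfamiliar with FSDP, here is a minimal sketch (an assumed setup, not the team's actual script) of wrapping a model so that parameters, gradients, and optimizer state are sharded across the GPUs:

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")  # one process per GPU, typically launched via torchrun
torch.cuda.set_device(dist.get_rank())
model = torch.nn.Linear(4096, 4096).cuda()  # stand-in for the real LLaMA model
model = FSDP(model)  # parameters, gradients, and optimizer state are sharded across ranks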

Memory optimization:

To let Vicuna understand long contexts, the maximum context length was extended from Alpaca's 512 to 2048, which greatly increases GPU memory requirements. The researchers relieved the memory pressure with gradient checkpointing and flash attention.
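As a rough illustration (a minimal sketch using the Hugging Face Transformers API, not the team's exact code), gradient checkpointing can be enabled for a LLaMA-style model like this:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/path/to/llama-13b")
model.gradient_checkpointing_enable()  # recompute activations in the backward pass to save memory
model.config.use_cache = False  # the generation KV cache is incompatible with checkpointing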

Multi-round conversations:

The training loss was adjusted to account for multi-round conversations: the fine-tuning loss is computed only on the chatbot's outputs.
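A minimal sketch of that masking idea (an illustrative helper, not the team's code): label positions outside the chatbot's turns are set to -100, the index PyTorch's cross-entropy loss ignores by default.

import torch

IGNORE_INDEX = -100  # torch.nn.CrossEntropyLoss skips targets with this index

def mask_labels(input_ids, assistant_spans):
    # assistant_spans: list of (start, end) token ranges covering the bot's turns
    labels = torch.full_like(input_ids, IGNORE_INDEX)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels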

Reduce costs through Spot instances:

By using spot instances managed by SkyPilot, the training cost of the 7B model was reduced from $500 to about $140, and the training cost of the 13B model from about $1,000 to $300.

Evaluation

To assess model quality, the researchers created 80 diverse questions and evaluated the models' outputs with GPT-4.

To compare different models, the researchers combined each model's output into a single prompt and asked GPT-4 to judge which model gave the better answer.
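A hedged sketch of that pairwise setup (the prompt wording here is illustrative, not the team's actual template):

def build_judge_prompt(question, answer_a, answer_b):
    # Both candidates go into one prompt; GPT-4 is asked to compare them.
    return (
        f"Question: {question}\n\n"
        f"Assistant A's answer:\n{answer_a}\n\n"
        f"Assistant B's answer:\n{answer_b}\n\n"
        "Which assistant answered better? Give each a score from 1 to 10 and explain."
    )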

GPT-4 preferred Vicuna over the existing SOTA open-source models (LLaMA, Alpaca) on more than 90% of the questions.

On 45% of the questions, GPT-4 rated Vicuna's answer as comparable to or even better than ChatGPT's.

Taken together, Vicuna reaches a total score of 92% relative to ChatGPT.

Installation

Method 1:

# Install FastChat
pip3 install fschat

# Install a specific commit of huggingface/transformers
# The released weights do not work with later commits due to upstream changes in the tokenizer.
pip3 install git+https://github.com/huggingface/transformers@c612628045822f909020f7eb6784c79700813eda

Method 2:

1. Clone the repository and change into the FastChat directory

git clone https://github.com/lm-sys/FastChat.git
cd FastChat

2. Install the package

pip3 install --upgrade pip  # enable PEP 660 support
pip3 install -e .

In accordance with the LLaMA model license, the weights are released as deltas. Adding the delta to the original LLaMA weights yields the final Vicuna weights.
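Conceptually, applying a delta is elementwise addition over matching parameter tensors; a minimal sketch of the idea (an illustrative helper, the real logic lives in FastChat's delta-application script):

def apply_delta(base_state, delta_state):
    # Recover the fine-tuned weights: final = base (LLaMA) + delta (released).
    return {name: base_state[name] + delta_state[name] for name in base_state}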

1. Follow the instructions on Hugging Face to obtain the original LLaMA weights

2. Automatically download the delta weights from the team's Hugging Face account with the script below:

python3 -m fastchat.model.apply_delta \
    --base /path/to/llama-13b \
    --target /output/path/to/vicuna-13b \
    --delta lmsys/vicuna-13b-delta-v0

Using a single GPU

Vicuna-13B requires approximately 28 GB of GPU memory.
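That figure is roughly what the parameter count implies: in 16-bit precision, the 13 billion weights alone take about 26 GB, with activations accounting for the rest. A quick back-of-the-envelope check:

params = 13e9  # 13 billion parameters
bytes_per_param = 2  # fp16/bf16
print(params * bytes_per_param / 1e9)  # ~26 GB for the weights alone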

python3 -m fastchat.serve.cli --model-name /path/to/vicuna/weights

Multiple GPUs

If a single GPU does not have enough memory, you can use model parallelism to pool the memory of multiple GPUs on one machine.
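The idea resembles Hugging Face's device_map sharding; a hedged sketch (FastChat's --num-gpus flag does the splitting internally, so this is only an illustration):

from transformers import AutoModelForCausalLM

# Requires the accelerate package; layers are placed across all visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "/path/to/vicuna/weights",
    device_map="auto",
)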

python3 -m fastchat.serve.cli --model-name /path/to/vicuna/weights --num-gpus 2

CPU only

To run on CPU only, about 60 GB of RAM is needed.

python3 -m fastchat.serve.cli --model-name /path/to/vicuna/weights --device cpu

Web UI

Start the controller

python3 -m fastchat.serve.controller

Start the model worker

python3 -m fastchat.serve.model_worker --model-path /path/to/vicuna/weights

When the process finishes loading the model, you will see "Uvicorn running on ...".

Send a test message

python3 -m fastchat.serve.test_message --model-name vicuna-13b

Start the Gradio web server

python3 -m fastchat.serve.gradio_web_server

Now you can open a browser and chat with the model.

Fine-tuning data

Vicuna was created by fine-tuning a LLaMA base model on roughly 70K user-shared conversations collected from ShareGPT via its public APIs.

To ensure data quality, the team converted the HTML back to markdown and filtered out inappropriate or low-quality samples. In addition, lengthy conversations were split into smaller pieces that fit the model's maximum context length.
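A hedged sketch of that splitting step (an assumed helper, not the team's actual preprocessing code):

def split_conversation(turns, tokenizer, max_len=2048):
    # Greedily pack whole turns into chunks that fit the context window.
    chunks, current, current_len = [], [], 0
    for turn in turns:
        n = len(tokenizer.encode(turn))
        if current and current_len + n > max_len:
            chunks.append(current)
            current, current_len = [], 0
        current.append(turn)
        current_len += n
    if current:
        chunks.append(current)
    return chunks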

Code and hyperparameters

The team fine-tuned the model using the Stanford Alpaca code, with modifications to support gradient checkpointing and flash attention, and used hyperparameters similar to Stanford Alpaca's.

Fine-tuning on cloud services with SkyPilot

SkyPilot is a framework built at UC Berkeley that makes it easy and economical to run ML workloads on any cloud (AWS, GCP, Azure, Lambda, etc.).

Installation instructions: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html

# Install SkyPilot from the master branch
pip install git+https://github.com/skypilot-org/skypilot.git

Vicuna can be trained on eight A100 GPUs (80GB). The following command automatically launches a node that meets the requirements, then sets up and runs the training job on it:

sky launch -c vicuna -s scripts/train-vicuna.yaml --env WANDB_API_KEY

For Alpaca, the training job runs on a single node with four A100-80GB GPUs:

sky launch -c alpaca -s scripts/train-alpaca.yaml --env WANDB_API_KEY

Fine-tuning with local GPUs

Vicuna can also be trained locally on eight A100 GPUs (80GB) with the following command.

If you train on fewer GPUs, you can reduce per_device_train_batch_size and increase gradient_accumulation_steps accordingly to keep the global batch size constant. To set up the environment, see the setup section in scripts/train-vicuna.yaml.
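The invariant being preserved is global batch size = per_device_train_batch_size x number of GPUs x gradient_accumulation_steps; for example (illustrative numbers):

# 8 GPUs: 4 per device x 8 GPUs x 1 accumulation step  = 32
# 4 GPUs: 4 per device x 4 GPUs x 2 accumulation steps = 32
per_device, gpus, accum = 4, 4, 2
print(per_device * gpus * accum)  # 32, the same global batch as the 8-GPU run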

torchrun --nnodes=1 --nproc_per_node=8 --master_port= \
    fastchat/train/train_mem.py \
    --model_name_or_path \
    --data_path \
    --bf16 True \
    --output_dir ./checkpoints \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 1200 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True

Reference:

https://github.com/lm-sys/FastChat/#fine-tuning

This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era).
