For $300, 90% of ChatGPT: Stanford's 13-billion-parameter "Little Alpaca" is born

2025-01-28 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 11/24 Report --

Following Alpaca, Stanford has teamed up with CMU, UC Berkeley, and other institutions to release Vicuna, a 13-billion-parameter model that achieves 90% of ChatGPT's performance for as little as $300.

Since Meta open-sourced its LLaMA model, researchers in the AI community have derived many new models from it.

Not long ago, Stanford released Alpaca, fine-tuned from Meta's LLaMA 7B on only 52K instruction-following examples, with performance rivaling GPT-3.5.

Now, Stanford scholars have teamed up with CMU, UC Berkeley, and others to launch a new 13-billion-parameter model, Vicuna, nicknamed "Little Alpaca" (the vicuña, like the llama and alpaca, is a South American camelid).

Vicuna was trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT, at a cost of about $300.

The researchers designed eight question categories, including math, writing, and coding, and tested Vicuna-13B against four other models.

With GPT-4 serving as the judge, the results show that Vicuna-13B matches ChatGPT and Bard in more than 90% of cases.

At the same time, it outperformed other models, such as LLaMA and Stanford's Alpaca, in more than 90% of cases.

The team members come from UC Berkeley, Carnegie Mellon University, Stanford University, UC San Diego, and the Mohamed bin Zayed University of Artificial Intelligence.

Matching 90% of ChatGPT

The researchers pitted Stanford's Alpaca against Vicuna, showing each model's answers to benchmark questions.

After fine-tuning Vicuna on 70K user-shared ChatGPT conversations, the researchers found that Vicuna generates more detailed and better-structured answers than Alpaca.

Q: Write an interesting travel blog post about a recent trip to Hawaii, emphasizing cultural experiences and must-see attractions.

Alpaca's answer is a condensed version: just a few lines that do not complete the task as asked. It merely mentions having written a blog and gives an overview of its contents.

Vicuna, by contrast, wrote a detailed and engaging travel blog post that is not only interesting but also covers Hawaii's cultural experiences and must-see attractions.

When GPT-4 was asked to score the answers, Alpaca received a 7 while Vicuna received a perfect score.

So how does Vicuna perform against ChatGPT?

Both scored 9 points!

As you can see, the Hawaii travel articles from these two models are not only engaging but also fluent.

In addition, both answers are excellent in detail and accuracy, and both models effectively convey the excitement and beauty of a trip to Hawaii.

The researchers also tested Vicuna against LLaMA and Google's Bard model, and found that LLaMA performed worst (scoring just 1), barely producing a response.

Bard's answer was also highly accurate and relevant, scoring 9, but it rated slightly below Vicuna's on how engaging the answer was.

Beyond writing, the researchers compared Vicuna with the other four models on coding, math, role-playing, and common-sense questions, 80 questions in total.

The researchers' preliminary GPT-4-based assessment is summarized in the figure: Vicuna achieves more than 90% of the capability of Bard and ChatGPT.

Relative response quality assessed by GPT-4

Interestingly, the team's Vicuna demo also offers trials of Alpaca, whose own demo had just been taken offline, and LLaMA.

Demo address: https://chat.lmsys.org/

Model Introduction

The birth of ChatGPT is exciting, but the fact that OpenAI is not actually open frustrates many in the field.

Fortunately, Meta's LLaMA model is open source, giving many researchers a basis for developing their own models.

Vicuna-13B was inspired by the LLaMA and Stanford Alpaca projects. It is an open-source chatbot backed by an enhanced dataset and an easy-to-use, scalable infrastructure.

The model's training data comes from user-shared conversations collected from ShareGPT; the researchers fine-tuned the LLaMA base model on it, and Vicuna-13B was born.

Vicuna-13B demonstrates performance comparable to other open source models such as Stanford Alpaca.

The researchers conducted a preliminary assessment of the performance of Vicuna-13B and described its training and service infrastructure.

The demo is also online, and any researcher can interact with it to test the chatbot's abilities.

Overview of Workflow

For the Vicuna-13B training process, the details are as follows:

First, the researchers collected about 70K conversations from ShareGPT, a ChatGPT conversation-sharing site.

Next, the researchers optimized the training scripts provided by Alpaca so that the model could better handle multi-turn conversations and long sequences. They then trained for one day on 8 A100 GPUs using PyTorch FSDP.

In terms of the quality evaluation of the model, the researchers created 80 different questions and evaluated the model output with GPT-4.

To compare different models, the researchers combined each pair of models' outputs into a single prompt and asked GPT-4 to evaluate which model gave the better answer.
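The pairwise comparison step can be sketched as follows. The prompt wording, function name, and example answers are illustrative assumptions, not the researchers' actual evaluation prompt:

```python
def build_judge_prompt(question, answer_a, answer_b):
    """Combine two models' answers to the same question into one prompt
    that asks GPT-4 to compare them (illustrative wording only)."""
    return (
        "You are a helpful and impartial judge.\n\n"
        f"Question: {question}\n\n"
        f"Assistant A's answer:\n{answer_a}\n\n"
        f"Assistant B's answer:\n{answer_b}\n\n"
        "Rate each answer on helpfulness, relevance, accuracy, and level "
        "of detail with a score from 1 to 10, then explain your reasoning."
    )

# Hypothetical condensed (Alpaca-style) vs. detailed (Vicuna-style) answers:
prompt = build_judge_prompt(
    "Write a travel blog post about Hawaii.",
    "I wrote a blog about my trip.",
    "Aloha! My recent trip to Hawaii was unforgettable...",
)
```

The combined prompt would then be sent to GPT-4, whose scored, explained verdict is recorded for each question.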

Comparison of LLaMA, Alpaca, Vicuna and ChatGPT

Training

Vicuna was created by fine-tuning LLaMA on about 70K user-shared conversations collected from ShareGPT's public APIs.

To ensure data quality, the researchers converted the HTML back to Markdown and filtered out inappropriate or low-quality samples.

In addition, the researchers split longer conversations into smaller segments that fit within the model's maximum context length.
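The splitting step might look like the following sketch, which uses a naive whitespace word count in place of a real tokenizer and keeps turns intact; the actual pipeline's segmentation rules are not described in detail here:

```python
def split_conversation(turns, max_len=2048):
    """Split a list of (role, text) turns into segments whose total
    (whitespace-token) length stays under max_len. Simplified sketch:
    a real pipeline would count model tokens, not words."""
    segments, current, current_len = [], [], 0
    for role, text in turns:
        n = len(text.split())
        if current and current_len + n > max_len:
            segments.append(current)       # close the full segment
            current, current_len = [], 0   # start a fresh one
        current.append((role, text))
        current_len += n
    if current:
        segments.append(current)
    return segments

# Example: three 600-word turns with a 1000-word budget yield three segments,
# since no two adjacent turns fit together under the limit.
```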

Vicuna's training method is based on Stanford's Alpaca, with the following improvements:

Memory optimization:

To let Vicuna understand long contexts, the maximum context length was extended from Alpaca's 512 tokens to 2048, which greatly increases GPU memory requirements. The researchers relieved this memory pressure with gradient checkpointing and flash attention.
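To see why the longer context strains memory: naive self-attention materializes an attention matrix that grows quadratically with sequence length, so quadrupling the context multiplies that term by sixteen. A back-of-the-envelope sketch (simplified, ignoring heads, layers, dtype, and the activation savings that flash attention and gradient checkpointing provide):

```python
def attention_matrix_ratio(old_len, new_len):
    """Ratio of naive attention-matrix sizes (seq_len ** 2) between two
    context lengths. Rough illustration only."""
    return (new_len ** 2) / (old_len ** 2)

# Going from Alpaca's 512-token context to Vicuna's 2048 tokens:
ratio = attention_matrix_ratio(512, 2048)  # 16.0
```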

Multi-turn conversations:

The training loss was adjusted to account for multi-turn conversations, with the fine-tuning loss computed only on the chatbot's outputs.
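Computing the loss only on the chatbot's own tokens can be sketched as a 0/1 mask over the concatenated conversation; the role names and mask convention here are illustrative assumptions, not the researchers' exact implementation:

```python
def loss_mask(turns):
    """Given (role, token_list) turns, return a flat 0/1 mask that is 1
    only for tokens produced by the assistant, so user tokens contribute
    nothing to the fine-tuning loss. Illustrative sketch."""
    mask = []
    for role, tokens in turns:
        mask.extend([1 if role == "assistant" else 0] * len(tokens))
    return mask

turns = [("user", ["hi", "there"]), ("assistant", ["hello", "!"])]
mask = loss_mask(turns)  # [0, 0, 1, 1]
```

In training, this mask would be multiplied into the per-token cross-entropy so gradients flow only from assistant tokens.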

Cost reduction via spot instances:

A 40x larger dataset and 4x longer sequences pose considerable training challenges. The researchers used SkyPilot-managed spot instances to reduce costs, relying on automatic recovery from preemption and automatic region switching.

This approach cut the training cost of the 7B model from about $500 to about $140, and that of the 13B model from about $1,000 to $300.

Evaluating an AI chatbot is challenging because it requires checking language comprehension, reasoning, and context awareness. As AI chatbots become more advanced, existing open benchmarks may no longer suffice.

For example, self-instruct, the evaluation dataset used in Stanford Alpaca, can be answered effectively by SOTA chatbots, making it hard for humans to distinguish performance differences. Further limitations include training/test data contamination and the potentially high cost of creating new benchmarks.

To address these problems, the researchers proposed a GPT-4-based evaluation framework to automatically assess chatbot performance.

First, GPT-4 can generate diverse and challenging questions through well-designed prompts. The researchers used 80 questions across 8 categories, such as role-playing and coding/math tasks, to test the models (LLaMA, Alpaca, ChatGPT, Bard, and Vicuna) in different fields.

The researchers then asked GPT-4 to rate answer quality on helpfulness, relevance, accuracy, and level of detail. The results show that GPT-4 not only produces relatively consistent scores but also explains in detail why it gave them. However, GPT-4 is not good at judging coding/math tasks.

In GPT-4's assessment, Vicuna's responses were preferred over those of the existing SOTA open-source models (LLaMA, Alpaca) in more than 90% of the questions.

In 45% of the questions, GPT-4 judged Vicuna's answer to be comparable to, or even better than, ChatGPT's.

Adding up GPT-4's ratings, Vicuna reaches a total score of 92% of ChatGPT's.

Limitations

The researchers point out that, like other large language models, Vicuna has some limitations.

For example, Vicuna does not perform well on tasks that involve programming, reasoning, math, and factual accuracy.

In addition, it has not been sufficiently optimized for safety or to mitigate potential toxicity or bias.

To address safety issues, the researchers use OpenAI's moderation API in the demo to filter out inappropriate user input.

With LLaMA, Alpaca, and Vicuna taken, there are not many camelid names left.

Researchers had better hurry: there aren't many names left for you.

Reference:

https://vicuna.lmsys.org/

This article comes from the WeChat official account: Xin Zhiyuan (ID: AI_era).
