
How to solve the small-data problem in AI

2025-01-16 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 report

This article looks at how to deal with the small-data problem in AI. It is a practical topic, and I hope you find something useful in it.

When there is not enough data to train a deep learning model, there are two main ways to tackle the problem: generate synthetic data to supplement the real data, or develop AI models that can learn from small amounts of data.

As is well known, deep learning is data-hungry: models are trained on large amounts of labeled data, for example millions of labeled animal images used to teach a model to recognize animals. For some applications, however, that much labeled data simply is not available, and training an AI model from scratch is then difficult, if not impossible.

One potential solution is to supplement real datasets with synthetic data. This approach is already widely used in autonomous driving: self-driving cars are driven for millions of miles in realistic simulated environments, where they encounter situations such as snowstorms and unexpected pedestrian behavior for which real data is hard to obtain.
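Driving simulators are far beyond a short example, but the basic idea of supplementing a small real dataset with generated samples can be sketched in a few lines. The Python below is a minimal sketch, assuming numeric tabular features: it creates synthetic rows for each class by interpolating between random pairs of real rows of that class. This is a simplified variant of the SMOTE oversampling idea (SMOTE proper interpolates towards nearest neighbours), not the simulation approach described above, and the function names `interpolate_synthetic` and `augment` are hypothetical.

```python
import random


def interpolate_synthetic(points, n_new, rng):
    """Create n_new synthetic points by picking two real points at random and
    taking a random point on the line segment between them (SMOTE-like, but
    pairs are chosen at random rather than among nearest neighbours)."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(points, 2)
        t = rng.random()  # 0 <= t < 1: how far to move from a towards b
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


def augment(dataset, n_new_per_class, seed=0):
    """dataset is a list of (features, label) pairs; returns real + synthetic pairs."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in dataset:
        by_label.setdefault(label, []).append(features)
    augmented = list(dataset)
    for label, points in by_label.items():
        if len(points) < 2:
            continue  # interpolation needs at least two real examples of a class
        for p in interpolate_synthetic(points, n_new_per_class, rng):
            augmented.append((p, label))
    return augmented


real = [((0.0, 0.0), "a"), ((1.0, 0.0), "a"), ((5.0, 5.0), "b"), ((6.0, 5.0), "b")]
data = augment(real, n_new_per_class=3)
print(len(data))  # 4 real rows + 3 synthetic rows for each of the 2 classes = 10
```

Interpolation only produces points between existing examples, so unlike a simulator it cannot create genuinely new situations (such as a snowstorm) that the real data never contained.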

Another way around the data problem is to develop AI models that can learn from small datasets. One method, transfer learning, has already been applied to computer vision tasks. It takes an AI model pre-trained on a task with plenty of labeled data (such as identifying cars in images) and transfers that knowledge to a different task that has little data (such as identifying trucks). Using a pre-trained model is like using ready-made wrappers when making dumplings: it saves you the step of mixing and rolling out the dough.
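The core of transfer learning, starting a data-poor task from weights learned on a data-rich one, can be illustrated with a toy problem. The sketch below is a minimal illustration under stated assumptions, not how computer vision systems are actually built: a logistic-regression model is trained on a hypothetical "source" task with 1,000 labeled 2-D points, and its weights are then used as the starting point for a related "target" task that has only 10 labeled points. The two tasks, the data sizes, the learning rate and the epoch counts are all illustrative assumptions. In real computer vision, the reused part would typically be a large pre-trained convolutional network.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data, init_weights, epochs, lr=0.1, seed=0):
    """Logistic regression fitted with plain SGD, starting from init_weights = [w0, w1, bias]."""
    rng = random.Random(seed)
    w = list(init_weights)  # copy so the caller's weights are not modified
    rows = list(data)
    for _ in range(epochs):
        rng.shuffle(rows)
        for (x0, x1), y in rows:
            p = sigmoid(w[0] * x0 + w[1] * x1 + w[2])
            g = p - y  # derivative of the log loss with respect to the logit
            w[0] -= lr * g * x0
            w[1] -= lr * g * x1
            w[2] -= lr * g
    return w


def accuracy(w, data):
    hits = sum((w[0] * x0 + w[1] * x1 + w[2] > 0) == (y == 1) for (x0, x1), y in data)
    return hits / len(data)


def make_task(rule, n, seed):
    """n random points in [-1, 1]^2, labelled 1 where rule(x0, x1) is true."""
    rng = random.Random(seed)
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [(p, 1 if rule(*p) else 0) for p in points]


def source_rule(x0, x1):  # hypothetical stand-in for "is this a car?"
    return x0 + x1 > 0


def target_rule(x0, x1):  # hypothetical stand-in for "is this a truck?"
    return x0 + 0.8 * x1 > 0.2


source = make_task(source_rule, 1000, seed=1)      # plenty of labelled data
target_train = make_task(target_rule, 10, seed=2)  # only 10 labelled examples
target_test = make_task(target_rule, 2000, seed=3)

pretrained = train(source, [0.0, 0.0, 0.0], epochs=20)
fine_tuned = train(target_train, pretrained, epochs=5)    # start from source weights
scratch = train(target_train, [0.0, 0.0, 0.0], epochs=5)  # start from zero

print("fine-tuned:  ", accuracy(fine_tuned, target_test))
print("from scratch:", accuracy(scratch, target_test))
```

Because the two decision rules are similar, the pre-trained weights already sit close to the target boundary, and the 10 target examples only need to nudge them. The comparison with training from scratch shows what the pre-trained weights contribute; exact accuracies depend on the random seeds.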

Although pre-trained models have made great progress in computer vision, pre-training has long been harder in natural language processing (NLP) because of the lack of labeled data. However, a method called self-supervised pre-training is becoming increasingly popular in NLP.

Self-supervised pre-training starts by training an AI model on large amounts of data from the web. For example, OpenAI carried out an extremely compute-intensive training run in which a model was trained on 8 million web pages to predict the next word given the preceding text. This is called self-supervised learning because no human-made labels are involved: the model learns language by predicting a hidden word (here, the next one) from the other words in the text.
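The key point of self-supervised learning is that the training labels come from the raw text itself. The Python sketch below shows that idea at toy scale, assuming a whitespace-tokenized corpus: each word becomes the "label" for the word before it, and a simple bigram count table stands in for the neural network. OpenAI's model is a large neural network trained on far more text; nothing here reflects its actual architecture, and the helper names are hypothetical.

```python
from collections import Counter, defaultdict


def next_word_pairs(text):
    """Turn raw, unlabelled text into (context, target) training pairs.
    The 'label' for each position is simply the word that follows it."""
    words = text.lower().split()
    return [(words[i], words[i + 1]) for i in range(len(words) - 1)]


def train_bigram(pairs):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in pairs:
        counts[prev][nxt] += 1
    return counts


def predict_next(model, word):
    """Most frequent continuation seen in training, or None if the word is unseen."""
    following = model.get(word.lower())
    if not following:
        return None
    return following.most_common(1)[0][0]


corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigram(next_word_pairs(corpus))
print(predict_next(model, "the"))  # cat (follows "the" twice, every other word once)
print(predict_next(model, "sat"))  # on
```

No one labelled this corpus by hand; the supervision signal is produced automatically by shifting the text by one word, which is why the approach scales to millions of web pages.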

Another well-known example is Google's BERT. Its language model does not only predict a word from the text that comes before it; it also uses the text that comes after it. In other words, BERT is bidirectional, which lets it combine information from both the preceding and the following context.
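The benefit of looking in both directions can be illustrated the same way. In the sketch below (a toy under stated assumptions, not BERT's Transformer architecture or training code), a word is hidden and predicted from its immediate left and right neighbours. In the tiny corpus, the word before the blank ("the") is followed equally often by "cat" and "dog", so a model that only looks left cannot decide; the word after the blank settles it. The `[MASK]` token name follows BERT's convention; the function names are hypothetical.

```python
from collections import Counter, defaultdict

MASK = "[MASK]"


def masked_examples(text):
    """Hide each interior word in turn; its (left, right) neighbours form the
    context and the hidden word itself is the label, so no human labelling is needed."""
    words = text.lower().split()
    return [((words[i - 1], words[i + 1]), words[i]) for i in range(1, len(words) - 1)]


def train_mask_model(examples):
    """Count which words appear between each (left, right) pair of neighbours."""
    table = defaultdict(Counter)
    for context, target in examples:
        table[context][target] += 1
    return table


def fill_mask(table, sentence):
    """Predict the [MASK] word from the words immediately before AND after it."""
    words = sentence.split()
    i = words.index(MASK)
    if i == 0 or i == len(words) - 1:
        return None  # this toy model needs a neighbour on both sides
    candidates = table.get((words[i - 1].lower(), words[i + 1].lower()))
    return candidates.most_common(1)[0][0] if candidates else None


corpus = "the cat purred . the dog barked . the cat meowed . the dog growled ."
table = train_mask_model(masked_examples(corpus))

# Left context alone is ambiguous: "the" is followed by "cat" twice and "dog" twice.
print(fill_mask(table, "the [MASK] barked"))  # dog
print(fill_mask(table, "the [MASK] purred"))  # cat
```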

Facebook AI Research, led by Yann LeCun, has long been optimistic about self-supervised learning. For example, they first pre-train a language model and then fine-tune it to identify hate speech. Facebook has also open-sourced wav2vec, a self-supervised speech recognition model that reduces the need for manually transcribed speech, which is especially helpful for small research projects. Because labeled training data for non-English languages is often scarce, wav2vec is particularly useful for speech recognition in those languages.

These are the main approaches to the small-data problem in AI: supplementing real data with synthetic data, and using transfer learning or self-supervised pre-training so that models can learn from little labeled data.


© 2024 shulou.com SLNews company. All rights reserved.
