2025-01-27 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Data is at the core of AI training, a point that has been confirmed again and again. Data-driven training is not the only way to build AI algorithms, but the industry trend is clear: fields where data is plentiful and cheap are more likely to produce working AI technology. Machine translation between Chinese and English, for example, is far stronger than translation involving low-resource languages, and face recognition, for which data is easy to collect, is more widely deployed than recognition of other biometric features such as the iris or eye-vein patterns.
In other words, scarce or expensive data is often the direct obstacle holding back AI development.
There is, of course, a technical answer to this problem: data augmentation.
How does data "mitosis" work?
Data augmentation can be thought of as a petri dish for data: it lets existing data divide, mitosis-style, producing additional samples that expand the data set.
Take image data as an example. When images are in short supply, we can make slight changes to the ones we have: cropping, rotating, mirroring, slightly distorting, adding noise, adding occlusion, and so on. A human sees at a glance that the original and the altered image are essentially the same, but for an AI model a change of even a few pixels yields a new data sample.
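The image changes listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular library's pipeline: the function name `augment`, the 80% crop ratio, the noise level, and the patch size are all assumptions chosen for the example.

```python
import numpy as np

def augment(image, rng):
    """Return slightly altered copies of an H x W x C uint8 image.

    All parameters (crop ratio, noise level, patch size) are
    illustrative assumptions, not recommended settings.
    """
    h, w = image.shape[:2]
    variants = {}

    # Mirror: reverse the column order.
    variants["mirror"] = image[:, ::-1].copy()

    # Rotation: a quarter turn.
    variants["rotate"] = np.rot90(image).copy()

    # Crop: keep a random window covering 80% of each side.
    ch, cw = int(h * 0.8), int(w * 0.8)
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    variants["crop"] = image[top:top + ch, left:left + cw].copy()

    # Noise: add Gaussian noise, then clip back to the valid pixel range.
    noisy = image.astype(np.float32) + rng.normal(0.0, 10.0, image.shape)
    variants["noise"] = np.clip(noisy, 0, 255).astype(np.uint8)

    # Occlusion: black out a random patch a quarter of each side long.
    ph, pw = max(1, h // 4), max(1, w // 4)
    pt = rng.integers(0, h - ph + 1)
    pl = rng.integers(0, w - pw + 1)
    occluded = image.copy()
    occluded[pt:pt + ph, pl:pl + pw] = 0
    variants["occlude"] = occluded

    return variants
```

Calling `augment(image, np.random.default_rng(0))` on one photo returns five variants, so under these assumptions each original image yields several training samples.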
For text data, two common methods are back-translation and word-vector substitution. With back-translation, a sentence is machine-translated from Chinese into English and then back into Chinese; along the way the word order shifts and words are swapped for synonyms, so the corpus is effectively doubled. With word-vector substitution, words in a sentence are replaced by words that lie close to them in the embedding space. Natural language generation can also be used to split out the different objects in a sentence and replace them, generating new sentences.
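Both text methods can be sketched as follows. The synonym table is a hypothetical stand-in for a nearest-neighbour lookup in a word-vector space, and `translate` is a caller-supplied function; no real translation API or its signature is assumed here.

```python
import random

# Hypothetical toy table. A real system would look up the nearest
# neighbours of each word in a trained word-vector space instead.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "picture": ["image", "photo"],
    "small": ["little", "tiny"],
}

def substitute_words(sentence, rng, prob=0.5):
    """Replace each known word with a random synonym with probability prob."""
    out = []
    for word in sentence.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < prob:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)

def back_translate(sentence, translate, src="zh", pivot="en"):
    """Translate src -> pivot -> src.

    translate(text, source, target) is supplied by the caller; it is an
    assumed interface, not the signature of any real service.
    """
    pivot_text = translate(sentence, src, pivot)
    return translate(pivot_text, pivot, src)
```

Passing the translator in as a parameter keeps the sketch independent of any one provider, which matters later when per-call pricing enters the picture.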
Deep learning has begun to make these augmentation techniques more efficient. For example, in April last year Google introduced AutoAugment, which defines a search space of augmentation operations, uses a search algorithm to find the image augmentation policies best suited to a given data set (such as translation and zooming), sets the order in which they are applied, and executes them automatically.
For example, given a data set of animal photos, AutoAugment might determine that translating and cropping are the operations that make the data most unfamiliar to the model, and then apply them automatically.
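The idea of searching for an augmentation policy can be illustrated with a deliberately simplified sketch. It uses plain random search rather than the search algorithm AutoAugment actually employs, and the operation names, the value ranges, and the `evaluate` callback are all assumptions made for the example.

```python
import random

# Illustrative operation names, not an official list.
OPERATIONS = ["translate_x", "translate_y", "rotate", "shear_x", "crop"]

def sample_policy(rng, n_ops=2):
    """Sample an ordered list of (operation, probability, magnitude) steps."""
    return [
        {
            "op": rng.choice(OPERATIONS),
            "prob": round(rng.random(), 1),   # chance of applying the step
            "magnitude": rng.randint(0, 9),   # discretised strength
        }
        for _ in range(n_ops)
    ]

def search(evaluate, n_trials, seed=0):
    """Random search: keep the policy with the highest score.

    evaluate(policy) is caller-supplied. In a real setting it would
    train a model with the policy and return validation accuracy,
    which is where almost all of the cost comes from.
    """
    rng = random.Random(seed)
    best_policy, best_score = None, float("-inf")
    for _ in range(n_trials):
        policy = sample_policy(rng)
        score = evaluate(policy)
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy, best_score
```

Every call to `evaluate` stands for a full training run, so the number of trials translates directly into compute spent.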
Why isn't data augmentation everywhere? The cost trap for AI companies
Seeing these solutions, you may feel that dawn has arrived. If data can "reproduce itself", accumulating and collecting data is no longer an obstacle to AI development. Translation of low-resource languages and identification of obscure plants and animals could quickly be handled by AI, and the data monopoly held by the mobile-Internet giants would be about to crack. But wait: if data augmentation were really that powerful, it should at least have attracted as much attention in academia and industry as BERT, and quickly grown an industry chain of its own.
In fact, we can still see a large number of AI companies worrying about how to get data today.
Why hasn't data augmentation solved their problem once and for all? The reason is the familiar one: cost.
Data augmentation has never been free. Often the underlying AI service is billed per API call, to say nothing of the compute and time costs behind it.
Take machine translation, which text augmentation often relies on. The translation services offered by Baidu, Sogou, Youdao and others are free for ordinary users but are billed once usage exceeds a certain volume, and data augmentation obviously falls into the billed category. According to some Zhihu users, several mainstream machine translation services charge between 48 and 60 yuan per million characters. For a company, that is a significant expense.
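A rough calculation shows how this adds up. The sketch below uses the 48 to 60 yuan per million characters quoted above; the corpus size is hypothetical, and it assumes for simplicity that the pivot text is billed at the same character count as the original and that no free quota applies.

```python
def back_translation_cost(corpus_chars, price_per_million, passes=2):
    """Estimated cost in yuan of back-translating a corpus.

    passes=2 because each sentence is translated out and back.
    Assumes both passes are billed on the original character count,
    which is a simplification.
    """
    return corpus_chars / 1_000_000 * price_per_million * passes

# Hypothetical corpus of 500 million characters.
corpus = 500_000_000
low = back_translation_cost(corpus, 48)
high = back_translation_cost(corpus, 60)
print(f"{low:,.0f} to {high:,.0f} yuan")
```

Under these assumptions, doubling a 500-million-character corpus would cost between 48,000 and 60,000 yuan in translation fees alone.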
AutoAugment, used for image augmentation, is a very expensive algorithm. Applying it requires training 15,000 models to convergence, which consumes a great deal of compute. On a data set like CIFAR-10, it takes thousands of hours on Nvidia Tesla P100 GPUs, which costs about $7,500 at Google's rates.
In other words, if augmentation costs more than collecting photos by hand, companies will naturally choose the cheaper approach. In practice, many companies can afford neither the cost of collecting data manually nor the cost of applying data augmentation.
When AI enters the era of cost-effectiveness
This phenomenon reminds us that AI is entering a "cost-effective" era.
The years when capital poured money into AI are over. As the giants' AI technology is steadily industrialized, other AI companies can no longer invest as heavily as they once did. At the same time, as the industry chain matures, the "price list" for each stage of an AI company's development has become clear. With the giants occupying the market, AI companies can easily see what costs and business risks lie ahead as they grow.
At this point, techniques that help reduce the cost of applying technologies such as data augmentation become important.
For example, the Berkeley Artificial Intelligence Research lab (BAIR) recently introduced Population Based Augmentation, which finds augmentation policies more efficiently using a population-based search. Compared with plain AutoAugment, it is nearly a thousand times more efficient, which lets many companies, and even individuals, who cannot afford high computing costs take advantage of data augmentation.
Technology of this kind, aimed at reducing the cost of AI research and development, may become typical in the future. The reason is that giants like Google and research institutions like OpenAI have strong capital and resource backing, so they rarely have to consider cost-effectiveness when developing technology. What's more, these companies often control the computing resources themselves, and to some extent hold pricing power over both the technology and the compute. From Google's point of view, it would certainly like companies to consume as much compute as possible when developing AI; after all, everything from the algorithms to the cloud computing resources is its business.
If the technical standards and the computing resources are both monopolized, AI companies run into an obvious threshold as they develop. Once costs rise past that threshold, it is very hard for them to get over it, and they are simply shut out. Tech giants could even use this to manipulate the industry indirectly. For example, if Google wanted to protect its advantage in machine translation for a low-resource language, it would only need to raise the price of machine translation for that language, hindering related research by raising the cost of obtaining corpus data.
In this situation, breaking the monopoly, reshaping pricing power, and making AI research and development more cost-effective may open up a path of its own.