

AI is getting stronger and stronger, but we can hardly afford it.


Photo source: Pixabay

In 2016, the Go match between AlphaGo and Lee Sedol brought artificial intelligence and deep learning into the public eye; AlphaGo won the man-machine battle 4-1. And it is not only Go: deep learning has developed rapidly in recent years, showing powerful abilities in fields such as language and medicine. All of this, however, comes at a cost. To keep reducing error rates, deep learning requires ever more computation, and the resulting economic cost, electricity consumption, and environmental pollution may exceed what human society can afford. Will the day artificial intelligence becomes ubiquitous also be the day computers exhaust humanity's energy supply?

The current field of deep learning originated in the era of vacuum-tube computers. In 1958, Frank Rosenblatt of Cornell University designed the first artificial neural network, inspired by the neurons of the brain; this line of work was later named "deep learning." Rosenblatt knew the technology was beyond the computing power of his day, lamenting that "as the number of connections in a neural network grows... traditional digital computers will soon be unable to handle the load."

Fortunately, computer hardware has been rapidly upgraded over the decades, increasing computing speed by about 10 million times. As a result, researchers in the 21st century are able to implement neural networks with more connections to simulate more complex phenomena. Deep learning is now widely used in many fields, such as Go, translation, predicting protein folding, and analyzing medical images.

The rise of deep learning is unstoppable, but its future is likely to be bumpy. The computational limitations that Rosenblatt worried about remain a cloud hanging over the field of deep learning. Today, researchers in the field of deep learning are approaching the limits of computational tools.

How Deep Learning Works

Deep learning is the fruit of long-term development in the field of artificial intelligence. Early AI systems were based on logic and rules supplied by human experts; adjustable parameters learned from data were introduced only gradually. Today's neural networks learn to build highly malleable computer models: their output is no longer the result of a single formula but of extremely complex calculations, and a sufficiently large neural network model can fit almost any type of data.
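
To make the idea of a malleable model with many adjustable parameters concrete, here is a minimal sketch (not from the article) of a two-layer neural network forward pass in NumPy; the layer sizes and the ReLU activation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer network: 100 input features -> 64 hidden units -> 1 output.
# Every entry of W1, b1, W2, b2 is an adjustable parameter
# (100*64 + 64 + 64*1 + 1 = 6,529 of them).
W1 = rng.normal(scale=0.1, size=(100, 64))
b1 = np.zeros(64)
W2 = rng.normal(scale=0.1, size=(64, 1))
b2 = np.zeros(1)

def forward(x):
    """Output is a composition of linear maps and nonlinearities, not a single hand-written formula."""
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU hidden layer
    return h @ W2 + b2                 # linear output, e.g. a risk score

x = rng.normal(size=(1, 100))          # one example with 100 features
print(forward(x))
```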

To understand the difference between an expert-system approach and a flexible-system approach, consider a scenario where a patient is diagnosed with cancer based on an X-ray. We assume that there are 100 features (variables) in the X-ray, but we don't know which features are important.

Expert systems solve problems by having experts in radiology and oncology specify important variables and allowing the system to examine only those variables. This method requires a small amount of calculation, so it has been widely used. But if experts fail to point out key variables, the system's ability to learn is poor.

Flexible systems solve problems by examining as many variables as possible and letting the system decide for itself what is important. This requires more data and higher computational costs, and is less efficient than expert systems. However, flexible systems can outperform expert systems provided sufficient data and computation are available.
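
As an illustration only (the article describes no code), the sketch below contrasts the two approaches on synthetic data: an "expert" model restricted to a few hand-picked features versus a "flexible" model that sees all 100 features and decides for itself which ones matter. The feature indices and the use of logistic regression are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 2000, 100                      # 2,000 synthetic "X-rays", 100 features each
X = rng.normal(size=(n, d))
true_weights = np.zeros(d)
true_weights[[3, 17, 42, 80]] = 2.0   # only a few features actually drive the label
y = (X @ true_weights + rng.normal(size=n) > 0).astype(int)

# Expert system: experts pick which variables to examine (cheap, but wrong picks hurt).
expert_picks = [3, 17, 55, 90]        # two correct and two wrong guesses
expert = LogisticRegression().fit(X[:1500, expert_picks], y[:1500])

# Flexible system: look at every variable and let the model decide (more data and compute).
flexible = LogisticRegression(max_iter=1000).fit(X[:1500], y[:1500])

print("expert accuracy:  ", expert.score(X[1500:, expert_picks], y[1500:]))
print("flexible accuracy:", flexible.score(X[1500:], y[1500:]))
```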

Deep learning models are over-parameterized: they have more parameters than there are data points available for training. The Noisy Student neural network, for example, has 480 million parameters but was trained on only 1.2 million labeled images. Over-parameterization often leads to overfitting, where the model fits the training dataset so closely that it learns the quirks of the training set rather than the general trend. Deep learning avoids this by randomly initializing the parameters and then iteratively adjusting them with stochastic gradient descent, among other techniques.
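
A minimal sketch of the "random initialization plus stochastic gradient descent" recipe mentioned above, assuming for brevity a plain over-parameterized linear model and squared-error loss (the article specifies neither):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 1000                       # deliberately over-parameterized: 1,000 params, 200 examples
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) * 0.1 + rng.normal(scale=0.1, size=n)

w = rng.normal(scale=0.01, size=d)     # random initialization
lr, batch = 0.01, 32

for step in range(2000):               # stochastic gradient descent
    idx = rng.choice(n, size=batch, replace=False)   # a random mini-batch
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch  # gradient of 0.5 * mean squared error
    w -= lr * grad                     # take a small step against the gradient

print("training loss:", np.mean((X @ w - y) ** 2))
```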

Deep learning is already playing a role in machine translation. In the early days, translation software translated according to rules formulated by grammar experts. When translating Urdu, Arabic, Malay, etc., rule-based methods initially outperformed statistics-based deep learning methods. But with the increase in textual data, deep learning has overtaken all other approaches. Deep learning has proven to be superior in almost all application areas.

One rule that applies to all statistical models is that to improve performance by a factor of k, you need at least k² times more data to train the model. And because deep learning models are over-parameterized, improving performance by a factor of k requires at least k⁴ times more computation. An exponent of 4 means that a 10,000-fold increase in computation buys at most a tenfold improvement.
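
As a quick worked example of the k² / k⁴ rule (the numbers follow directly from the rule stated above, not from the paper):

```python
k = 10                                 # target: a 10x performance improvement
print("data needed:   ", k**2, "x")    # at least 100x more training data
print("compute needed:", k**4, "x")    # at least 10,000x more computation
```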

Obviously, in order to improve the performance of deep learning models, scientists need to build larger models and train them with more data. But how expensive will the calculations become? Will it be too high for us to afford, and therefore hinder the development of the field?

To explore this question, MIT scientists collected data from more than 1,000 deep learning research papers in areas such as image classification, object detection, question answering, named-entity recognition, and machine translation. Their research warns that deep learning faces serious challenges: "If you can't improve performance without increasing the computational burden, computational limitations will stall the field of deep learning."

Take image classification as an example. Reducing image classification errors comes with a huge computational burden. For example, the AlexNet model first demonstrated the ability to train a deep learning system on a graphics processor (GPU) in 2012, using two GPUs for five to six days of training. By 2018, NASNet-A, another model, had reduced its error rate to half that of AlexNet, but it used more than 1000 times as much computation.

Has the improvement in chip performance kept pace with the development of deep learning? Not really. Of NASNet-A's 1,000-fold increase in computation, only a factor of six came from better hardware; the rest came from using more processors or running them longer, at higher cost.

Theory tells us that a k-fold improvement in performance requires k⁴ times more computation, but in practice it has taken at least k⁹ times more. That means halving the error rate requires more than 500 times the computing resources (2⁹ = 512), which is expensive. However, the gap between practice and theoretical prediction also means there may be room to improve the algorithms, and thus an opportunity to make deep learning more efficient.
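
The same arithmetic, comparing the theoretical k⁴ rule with the empirically observed k⁹ scaling, shows where the roughly 500-fold figure comes from:

```python
k = 2                                   # halving the error rate ~ a 2x performance improvement
print("theory   (k^4):", k**4, "x compute")   # 16x
print("practice (k^9):", k**9, "x compute")   # 512x, roughly the 500x cited above
```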

According to the researchers' estimate of the "compute cost versus performance" curve for image recognition, cutting the error rate to 5 percent would require 10²⁸ floating-point operations. Another study, from the University of Massachusetts Amherst, shows the enormous economic and environmental cost hidden in that computational burden: training an image-recognition model with an error rate below 5 percent would cost about $100 billion, and the electricity consumed would produce carbon emissions comparable to a month of New York City's. Training an image-recognition model with an error rate below 1 percent would be far more expensive still.

It is inferred that by 2025 the best image-recognition system for the ImageNet dataset will have pushed the error rate down to 5%, but training such a deep learning system would produce the equivalent of a month of New York City's CO2 emissions. (Photo source: N. C. Thompson, K. Greenewald, K. Lee, G. F. Manso)

The burden of computational cost has already become apparent at the cutting edge of deep learning. The machine-learning think tank OpenAI spent more than $4 million to design and train the deep learning language system GPT-3. Although the researchers made a mistake during training, they did not fix it, explaining simply in an appendix to the paper: "Retraining the model is unrealistic due to the high cost of training."

Enterprises are also beginning to shy away from the computational costs of deep learning. A large supermarket chain in Europe recently abandoned a system based on deep learning to predict which products will be purchased. Company executives judged the cost of training and running the system to be too high.

Faced with rising economic and environmental costs, the field of deep learning urgently needs ways to improve performance while keeping computation under control. Researchers have explored this from several angles.

One strategy is to use processors designed specifically for deep learning. Over the past decade, CPUs have given way to GPUs, field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs). Specialization makes these approaches more efficient, but at the expense of generality, and they face diminishing returns. In the long run, we may need an entirely new hardware paradigm.

Another strategy for reducing the computational burden is to use smaller neural networks. This lowers the cost of each use but usually raises the cost of training. Which matters more depends on the situation: a model that is used very widely should weigh operating cost most heavily, while a model that must be retrained continuously should weigh training cost.
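
A toy calculation of the trade-off described above; the cost figures and usage counts are made-up placeholders, not numbers from the article:

```python
def total_cost(training_cost, cost_per_use, num_uses):
    """Total lifetime cost = one-off training cost + per-prediction cost times usage."""
    return training_cost + cost_per_use * num_uses

# Hypothetical numbers: a large model is cheaper to train to a given accuracy
# but more expensive per prediction; a smaller model is the reverse.
big_model   = total_cost(training_cost=1_000_000, cost_per_use=0.010, num_uses=10**9)
small_model = total_cost(training_cost=3_000_000, cost_per_use=0.001, num_uses=10**9)

print(f"big model:   ${big_model:,.0f}")    # per-use cost dominates when usage is heavy
print(f"small model: ${small_model:,.0f}")
```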

Meta-learning promises to reduce the cost of training deep learning systems. The idea is to let the results of one learning effort carry over to multiple domains: rather than building separate systems to recognize dogs, cats, and cars, train one recognition system and reuse it. However, studies have found that even a slight mismatch between the original training data and the actual application scenario severely degrades a meta-learning system's performance, so a truly general meta-learning system may require an enormous amount of data.

Some as-yet undiscovered or undervalued types of machine learning could also reduce computation. For example, machine learning systems built on expert insight are more efficient, but if experts cannot identify all the relevant factors, such systems cannot compete with deep learning. Still-developing techniques such as neuro-symbolic methods promise to better combine the knowledge of human experts with the reasoning power of neural networks.

Just as Rosenblatt sensed the problem at the dawn of neural networks, today's deep learning researchers are beginning to run up against the limits of their computational tools. Under the twin pressures of economics and the environment, if we cannot change how deep learning is done, we must brace for slower progress in the field. We can hope for an algorithmic or hardware breakthrough that allows flexible and powerful deep learning models to keep evolving and remain affordable.

Original article: https://spectrum.ieee.org/deep-learning-computational-cost

Paper: https://arxiv.org/abs/2007.05558#

Reference: https://www.csail.mit.edu/news/computational-limits-deep-learning

This article comes from the WeChat Official Account Global Science (ID: huanqiukexue). Compiled by Zheng Yuhong; revised by Bai Defan.
