2020-05-30 04:22:44
Author | Jiang Baoshang
Editor | Cong Mo
On May 29th, Science published an article reporting that core progress in some areas of artificial intelligence has stagnated. In it, author Matthew Hutson notes that some "old algorithms" from years ago, once fine-tuned, can match the performance of today's SOTA.
In addition, the author lists several papers that analyze today's key AI modeling techniques. Their findings fall into two broad categories: (1) the core innovation claimed by the researchers is only a minor improvement over the original algorithm; (2) the new technique performs little better than algorithms from many years ago.
At the technical level, the AI modeling methods analyzed in these papers include: neural network pruning, neural recommendation algorithms, deep metric learning, adversarial training, and language models.
Scientific research carries risks, so watch your step before diving in. Below, AI Science and Technology Review briefly introduces these papers as a guide to the pitfalls.
1 Neural network pruning: ambiguous evaluation metrics
Paper address:
https://proceedings.mlsys.org/static/paper_files/mlsys/2020/73-Paper.pdf
The paper comparing neural network pruning techniques is "What is the State of Neural Network Pruning?". Its first author is Davis Blalock, a researcher at MIT.
By surveying 81 related papers and pruning hundreds of models under controlled conditions, they found a clear absence of standardized benchmarks and metrics in the field of neural network pruning. In other words, the techniques published in the latest papers are hard to quantify, so it is difficult to determine how much progress the field has actually made over the past three decades.
The main findings are as follows: 1. Although many papers claim to advance the state of the art, they ignore comparisons with other methods (which also claim to reach the SOTA). This neglect takes two forms: ignoring pruning techniques from before 2010, and ignoring current pruning techniques.
2. Datasets and architectures are fragmented. Across the 81 papers, a total of 49 datasets, 132 architectures, and 195 (dataset, architecture) combinations were used.
3. Evaluation metrics are fragmented. The papers use a wide variety of metrics, so results are difficult to compare across papers.
4. Confounding variables. Several confounders make quantitative analysis very difficult, for example the accuracy and efficiency of the initial model and random variation in training and fine-tuning.
At the end of the paper, Davis Blalock proposes concrete remedies and introduces the open-source framework ShrinkBench to promote standardized evaluation of pruning methods. The paper was presented at the MLSys conference in March.
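For readers unfamiliar with the technique under discussion, here is a minimal sketch of global magnitude pruning, one of the simple baselines such surveys measure against. It is illustrative code written for this article, not the ShrinkBench API, and threshold-based masking is only one of many pruning heuristics.

```python
# Minimal global magnitude pruning: zero out the smallest-magnitude weights
# across all Linear/Conv layers (illustrative sketch, not ShrinkBench).
import torch
import torch.nn as nn

def global_magnitude_prune(model: nn.Module, sparsity: float) -> None:
    weights = [m.weight for m in model.modules()
               if isinstance(m, (nn.Linear, nn.Conv2d))]
    scores = torch.cat([w.detach().abs().flatten() for w in weights])
    k = int(sparsity * scores.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(scores, k).values          # k-th smallest magnitude
    with torch.no_grad():
        for w in weights:
            w.mul_((w.abs() > threshold).float())         # apply the binary mask

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
global_magnitude_prune(model, sparsity=0.9)               # keep roughly 10% of weights
```

In practice the pruned model is then fine-tuned, and, as the survey stresses, both the sparsity level and the accuracy and efficiency of the starting model must be reported for results to be comparable.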
2 Neural recommendation algorithms: none of the 18 algorithms is spared
https://dl.acm.org/doi/pdf/10.1145/3298689.3347058
The paper analyzing neural recommendation algorithms is "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches". Its authors are researchers from the Polytechnic University of Milan, Italy.
In this paper, the authors systematically analyze today's top recommendation algorithms and find that only 7 of the 18 algorithms proposed at top conferences in recent years can be reproduced with reasonable effort. Six of those can be outperformed by relatively simple heuristics. The remaining one, while clearly better than the baselines, still cannot beat a fine-tuned non-neural linear ranking method.
The authors identify three causes of this phenomenon: (1) weak baselines; (2) establishing weak methods as new baselines; (3) differences in how results are compared or reproduced across papers.
To obtain these results, the authors describe a two-step procedure. The first step is to try to reproduce the results with the source code and data provided by the relevant papers. The second step is to re-run the experiments reported in the original papers but add extra baselines to the comparison, specifically user-based and item-based nearest-neighbor heuristics, as well as a simple graph-based approach; a sketch of such a heuristic follows below.
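To make the baseline concrete, here is a rough sketch of an item-based nearest-neighbor recommender of the kind used as a heuristic comparison. The toy data, the cosine similarity, and the neighborhood size are placeholder choices for illustration, not the authors' exact configuration.

```python
# Item-based KNN recommendation heuristic (toy sketch, not the paper's code).
import numpy as np

def item_knn_scores(interactions: np.ndarray, user: int, k: int = 20) -> np.ndarray:
    """interactions: (num_users, num_items) binary matrix of past interactions."""
    norms = np.linalg.norm(interactions, axis=0) + 1e-8
    sim = (interactions.T @ interactions) / np.outer(norms, norms)  # item-item cosine
    np.fill_diagonal(sim, 0.0)
    for i in range(sim.shape[0]):                  # keep only the k nearest neighbors
        sim[i, np.argsort(sim[i])[:-k]] = 0.0
    scores = interactions[user] @ sim              # similarity-weighted sum over liked items
    scores[interactions[user] > 0] = -np.inf       # do not re-recommend seen items
    return scores

ratings = (np.random.rand(50, 30) > 0.8).astype(float)
top5 = np.argsort(item_knn_scores(ratings, user=3))[::-1][:5]
```

Baselines of roughly this complexity, once tuned, were enough to outperform most of the reproducible neural methods in the study.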
3 Deep metric learning: 13 years without real progress
https://arxiv.org/pdf/2003.08505.pdf
The analysis of deep metric learning comes from researchers at Facebook AI and Cornell Tech, who released a preprint of their research paper titled "A Metric Learning Reality Check".
In the paper, the researchers argue that current research progress in deep metric learning shows no substantial improvement over the baseline methods of 13 years ago (contrastive and triplet losses).
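As a reminder of what those old baselines look like, below is the classic triplet loss in a few lines of PyTorch. This is a simplified sketch; the margin value and the mining strategy vary from paper to paper, and the contrastive loss is analogous but operates on pairs.

```python
# Classic triplet loss: pull anchor toward positive, push away from negative.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

emb_a, emb_p, emb_n = (torch.randn(32, 128) for _ in range(3))  # batch of embeddings
loss = triplet_loss(emb_a, emb_p, emb_n)
```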
The researchers point out three flaws in the existing literature: unfair comparisons, training with test-set feedback, and unreasonable evaluation metrics.
Unfair comparisons: to claim that one algorithm performs better than another, as many parameters as possible should usually be held constant, but this is not the case in metric learning papers. Moreover, the accuracy gains reported in some papers actually come from the chosen backbone network, not from the "innovative" method proposed. For example, a 2017 paper claimed a huge performance improvement using ResNet50, when in fact it was compared against the less accurate GoogLeNet.
Training with test-set feedback: most papers, and not only in metric learning, share this problem: half of the dataset is used as the test set and half as the training set, with no validation set. During training, the model's accuracy on the test set is checked periodically and the best test-set accuracy is reported. In other words, model selection and hyperparameter tuning are driven by direct feedback from the test set, which obviously carries a risk of overfitting.
Unreasonable evaluation metrics: to report accuracy, most metric learning papers use Recall@K, normalized mutual information (NMI), and F1 score. But are these necessarily the best measures? The figure below shows three embedding spaces; each scores nearly perfectly on Recall@1, yet their feature distributions are quite different. Their F1 and NMI scores are also similar, which suggests that these metrics do not convey much information.
Figure: three toy embedding spaces and how different accuracy metrics score them.
Alongside the problems, the Facebook AI and Cornell researchers naturally also proposed improvements targeting the three shortcomings above: fair comparisons and reproducible experiments, hyperparameter search via cross-validation, and more informative accuracy measures.
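The cross-validation suggestion is straightforward to follow in practice. The sketch below uses scikit-learn with a placeholder classifier and synthetic data; the point is only the protocol: hyperparameters are selected on validation folds, and the held-out test set is evaluated exactly once at the end.

```python
# Hyperparameter search via cross-validation, with the test set held out
# until the very end (generic illustration, not the paper's experiments).
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = np.random.randn(500, 16), np.random.randint(0, 5, 500)
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)                   # test set is never tuned on
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 5, 15]}, cv=4)
search.fit(X_trainval, y_trainval)                          # model selection on folds only
print(search.best_params_, search.score(X_test, y_test))    # single final test evaluation
```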
4 Adversarial training: all the improvements can be matched by early stopping
https://openreview.net/pdf?id=ByJHuTgA-
The title of the paper on adversarial training is "Overfitting in adversarially robust deep learning", and the first author is Leslie Rice, a researcher from Carnegie Mellon University.
In this paper, the authors note that progress in machine learning algorithms can come from changes to the architecture, the loss function, or the optimization strategy, and fine-tuning any of these factors can change an algorithm's performance.
His research area is adversarial training. As he explains, adversarial training can protect trained image recognition models from hackers' "adversarial attacks", and an early adversarial training method is projected gradient descent (PGD).
Many recent studies have claimed that their adversarial training algorithms are much better than PGD, but it turns out that almost all of the recent algorithmic improvements in adversarial training can be matched by simply using early stopping. Moreover, effects such as the double-descent curve still appear in adversarially trained models, and they cannot explain much of the overfitting observed.
Finally, the authors study several classic and modern remedies for overfitting in deep learning, including regularization and data augmentation, and find that none of them exceeds the gains achieved by early stopping. They conclude that genuine improvements over methods like PGD are hard to come by, and that recent research has made few substantive advances.
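For context, the sketch below shows what a bare-bones PGD adversarial example generator looks like; the step size, radius, and number of steps are illustrative values, and the code assumes image pixels in [0, 1]. Adversarial training then minimizes the loss on these perturbed inputs rather than on the clean ones, and the paper's observation is that simply checkpointing the model at its best robust validation accuracy (early stopping) recovers most of the gains later methods claim.

```python
# Projected gradient descent (PGD) attack, the "old" adversarial training
# workhorse (illustrative hyperparameters; assumes inputs in [0, 1]).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)     # random start in the L-inf ball
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()                  # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)             # project back into the ball
        x_adv = x_adv.clamp(0, 1)                            # keep a valid image
    return x_adv.detach()
```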
5 Language models: LSTM still outshines the rest
The paper studying language models is "On the State of the Art of Evaluation in Neural Language Models", a joint work by DeepMind and the University of Oxford.
In this paper, the authors note that continual innovation in neural network architectures has supplied a steady stream of new state-of-the-art results on language modeling benchmarks. However, these results are obtained with different codebases and limited computing resources, so the evaluation is uncontrolled.
According to the paper, the authors focus on three recurrent architectures: the LSTM, the RHN (Recurrent Highway Network), and NAS. The RHN is studied because it achieves SOTA on multiple datasets, while NAS is studied because its architecture is the result of an automated, reinforcement-learning-based optimization process.
Finally, through large-scale automated black-box hyperparameter tuning, the authors re-evaluate several popular architectures and regularization methods and conclude that, with proper regularization, the standard LSTM architecture outperforms the more "recent" models.
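To ground the conclusion, here is a skeletal LSTM language model of the kind the study found competitive once properly regularized. The sizes, dropout rate, and vocabulary are placeholder values; the study's actual contribution is the large-scale, controlled hyperparameter tuning around such models, not any particular setting shown here.

```python
# Standard LSTM language model with dropout regularization (skeletal sketch).
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size=10000, emb=512, hidden=512, layers=2, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.drop = nn.Dropout(dropout)          # regularization does much of the work
        self.lstm = nn.LSTM(emb, hidden, layers, dropout=dropout, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):                   # tokens: (batch, seq_len) word ids
        h, _ = self.lstm(self.drop(self.embed(tokens)))
        return self.head(self.drop(h))           # next-token logits

model = LSTMLanguageModel()
logits = model(torch.randint(0, 10000, (8, 35)))  # -> shape (8, 35, 10000)
```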
Via
https://www.sciencemag.org/news/2020/05/eye-catching-advances-some-ai-fields-are-not-real
https://www.toutiao.com/i6832364243111641613/