
A new milestone for Transformer: six years after its birth, the founding paper has been cited nearly 80,000 times.



The original title: "Transformer a new milestone!" Six years after its birth, the first work was attracted nearly 80,000. It didn't win the NeurIPS best paper, but it completely changed the AI world. "

The Transformer is six years old! In 2017, the foundational paper "Attention Is All You Need" was published, and it has since been cited nearly 80,000 times. How much longer can this reigning architecture stay on top?

On June 12, 2017, "Attention Is All You Need", the paper that introduced the famous Transformer, arrived like a thunderclap.

Its emergence not only changed NLP, making it the mainstream model for natural language, but also successfully crossed over into CV, bringing unexpected surprises to the AI world.

Today marks the sixth anniversary of the Transformer's birth, and the paper has been cited as many as 77,926 times.

Nvidia scientist Jim Fan offered an in-depth summary of this foundational work:

1. Transformer didn't invent attention, but pushed it to the extreme.

The first attention paper had been published three years earlier, in 2014.

That paper came from Yoshua Bengio's lab, under the unassuming title "Neural Machine Translation by Jointly Learning to Align and Translate".

It combined an RNN with a context vector (that is, attention).

Many people may not have heard of this paper, but it is one of the greatest milestones in NLP and has been cited 29K times (compared with 77K for Transformer).

2. Neither the Transformer nor the original attention paper set out to describe a general-purpose sequence computer.

Instead, both aimed to solve a narrow and specific problem: machine translation. It is remarkable that AGI (perhaps someday soon) can be traced back to humble Google Translate.

3. The Transformer was published at NeurIPS 2017, one of the world's top artificial intelligence conferences. Yet it didn't even get an oral presentation, let alone an award.

NeurIPS had three best papers that year. As of today, their citations add up to 529 combined.

The Transformer, now a classic, attracted little attention at NeurIPS 2017.

On this point, Jim Fan noted that it is hard for people to recognize a good work before it becomes influential.

I don't blame the NeurIPS committee: the award-winning papers are still first-rate, just not as influential. A counterexample is ResNet.

Kaiming He and his colleagues won Best Paper at CVPR 2016, an award that was well deserved and correctly recognized.

In 2017, few people in the field, however smart, could have predicted the revolutionary scale of today's LLMs, just as in the 1980s few could foresee the deep learning tsunami that began in 2012.

OpenAI scientist Andrej Karpathy was struck by point 2 of Jim Fan's summary and said:

The paper that introduced attention (by @DBahdanau, @kchonyc, Bengio) has received 1,000 times less attention than the "Attention Is All You Need" paper. Historically, both papers look quite ordinary, and interestingly, both happened to be developed for machine translation.

Attention is all you need! Before the Transformer was born, people in the AI community mostly used encoder-decoder structures based on RNNs (recurrent neural networks) to perform sequence translation in natural language processing.

However, the most fatal drawback of RNNs and their derivatives is that they are slow: because each hidden state depends on the previous one, the computation across time steps cannot be parallelized.
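
To make the bottleneck concrete, here is a minimal illustrative sketch (plain NumPy, with toy dimensions invented for this example, not taken from the paper): the RNN loop must run one time step after another because each hidden state feeds the next, while self-attention produces a representation for every position with a few matrix multiplications that can run in parallel.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 6, 8                      # toy sequence length and hidden size
X = rng.standard_normal((seq_len, d))  # input vectors, one per time step

# RNN: each hidden state depends on the previous one, so the loop is
# inherently sequential -- step t cannot start before step t-1 finishes.
W, U = rng.standard_normal((d, d)), rng.standard_normal((d, d))
h = np.zeros(d)
rnn_states = []
for x_t in X:                          # seq_len sequential steps
    h = np.tanh(W @ h + U @ x_t)
    rnn_states.append(h)

# Self-attention: every position attends to every other position through
# batched matrix multiplications, so all time steps are processed at once.
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d)          # (seq_len, seq_len) similarity matrix
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
attn_out = weights @ V                 # new representation for every position
print(len(rnn_states), attn_out.shape)
```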

Today the Transformer is in its heyday, and many researchers have set off to chase this star.

In 2017, eight Google researchers published "Attention Is All You Need". It is fair to say that this paper was a disruptor in the field of NLP.

Paper: https://arxiv.org/pdf/1706.03762.pdf

It completely abandons the recurrent structure and instead relies on the attention mechanism to mine the relationships between inputs and outputs, thereby enabling parallel computation.

Some even asked, "Can RNNs be abandoned entirely in favor of the Transformer framework?"

There is no doubt that, as Jim Fan noted, the Transformer was originally designed to solve the translation problem.

Google's blog post that year described the Transformer as a novel neural network architecture for language understanding.

Blog post: https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html

Specifically, the Transformer consists of four parts: input, encoder, decoder, and output.

The input tokens are first converted into vectors by an embedding layer, and position information is added through positional encoding.

Then the encoder and decoder, each built from multi-head self-attention and feed-forward networks, extract features, and the results are finally output.
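
As a rough sketch of that four-part pipeline, the following PyTorch snippet builds one toy encoder layer with an embedding, sinusoidal positional encoding, multi-head self-attention, and a feed-forward network; the layer sizes and names here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TinyEncoderLayer(nn.Module):
    """One encoder block: multi-head self-attention + feed-forward,
    each wrapped with a residual connection and layer normalization."""
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)        # every token attends to all tokens
        x = self.norm1(x + attn_out)            # residual + norm
        return self.norm2(x + self.ff(x))       # feed-forward + residual + norm

def positional_encoding(seq_len, d_model):
    """Sinusoidal position codes added to the token embeddings."""
    pos = torch.arange(seq_len).unsqueeze(1).float()
    i = torch.arange(0, d_model, 2).float()
    angles = pos / torch.pow(10000.0, i / d_model)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2], pe[:, 1::2] = torch.sin(angles), torch.cos(angles)
    return pe

vocab, d_model, seq_len = 1000, 64, 10
tokens = torch.randint(0, vocab, (1, seq_len))     # a toy token sequence
x = nn.Embedding(vocab, d_model)(tokens)           # input -> vectors
x = x + positional_encoding(seq_len, d_model)      # add position information
out = TinyEncoderLayer(d_model)(x)                 # extract features
print(out.shape)                                   # torch.Size([1, 10, 64])
```

A full Transformer stacks several such layers in both the encoder and decoder, with the decoder additionally attending to the encoder's outputs.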

As shown in the figure below, Google gives an example of how Transformer can be used in machine translation.

A machine translation neural network typically contains an encoder that reads a sentence and produces a representation of it. In the figure, hollow circles denote the initial representation the Transformer generates for each word.

Then, using self-attention, information is aggregated from all other words, producing for each word a new representation informed by the whole context, shown as solid circles.

This step is repeated for all words in parallel, generating successive new representations.

The decoder works similarly, but generates one word at a time, from left to right. It attends not only to the previously generated words but also to the final representations produced by the encoder.
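
To illustrate that left-to-right constraint, here is a small hedged sketch of scaled dot-product attention with a causal mask, so each position can only aggregate information from itself and earlier positions; the function and variable names are illustrative, and a real decoder would also use learned projections and cross-attention to the encoder outputs.

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x):
    """Scaled dot-product attention where position t may only attend
    to positions <= t, mimicking the decoder's left-to-right generation."""
    seq_len, d = x.shape
    q, k, v = x, x, x                               # untrained toy case: use x directly as Q, K, V
    scores = q @ k.T / d ** 0.5                     # (seq_len, seq_len) similarities
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))  # hide future positions
    weights = F.softmax(scores, dim=-1)             # rows sum to 1 over visible positions
    return weights @ v                              # aggregate only past and current words

x = torch.randn(5, 16)                              # 5 toy "word" vectors
out = causal_self_attention(x)
print(out.shape)                                    # torch.Size([5, 16])
```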

Google also applied for a patent specifically for it in 2019.

From then on, the Transformer mounted its counterattack and became the king of natural language processing.

Going back to the source, every flavor of GPT (Generative Pre-trained Transformer) originates from this 2017 paper.

However, academic NLP circles are not the only place where the Transformer has taken off.

The universal Transformer: from NLP to CV. In its 2017 blog post, Google had already imagined the Transformer's future application potential:

It applies not only to natural language but also to very different inputs and outputs, such as images and video.

Indeed, after making huge waves in NLP, the Transformer moved into computer vision, and even then many people proclaimed that it had conquered yet another territory.

Since 2012, CNNs have been the architecture of choice for vision tasks.

As more and more efficient architectures emerged, using the Transformer for CV tasks became a new research direction, one that can reduce architectural complexity while exploring scalability and training efficiency.

In October 2020, Google proposed the Vision Transformer (ViT), which can classify images directly with a Transformer, without convolutional neural networks (CNNs).

It is worth mentioning that ViT performs remarkably well, surpassing state-of-the-art CNNs while requiring four times fewer computing resources.

Then, in 2021, OpenAI dropped two bombshells, releasing DALL-E, built on the Transformer, as well as CLIP.

Both models achieved strong results with the help of the Transformer: DALL-E can reliably generate images from text, and CLIP can match images with text for classification.

Later, DALL-E's successor DALL-E 2, along with Stable Diffusion, both also built on the Transformer architecture, once again upended AI image generation.

Below is the full timeline of Transformer-based models.

From this we can see how capable Transformer is.

In 2021, even Google researcher David Ha said that Transformers are the new LSTMs.

Before the Transformer's birth, he had called the LSTM the AK-47 of neural networks: no matter how hard we try to replace it with something new, it will still be in use 50 years from now.

It took Transformer only four years to break this prediction.

The New Silicon Valley "Seven traitors" now, six years later, what happened to the Transformers who once teamed up to create Google's strongest Transformer?

Jakob Uszkoreit is recognized as a major contributor to the Transformer architecture.

He left Google in mid-2021 and co-founded Inceptive Labs to design mRNA using neural networks.

So far, they have raised $20 million and have a team of more than 20 people.

Ashish Vaswani left Google at the end of 2021 to start AdeptAILabs.

It can be said that AdeptAILabs is in the stage of rapid development.

So far, the company has not only raised $415 million, but is also valued at more than $1 billion.

In addition, the size of the team has just exceeded 40.

However, Ashish left Adept a few months ago.

Niki Parmar is the only female author of the Transformer paper.

She left Google at the end of 2021 and co-founded AdeptAILabs with the aforementioned Ashish Vaswani.

However, Niki also left Adept a few months ago.

Noam Shazeer left Google at the end of 2021, after 20 years there.

He immediately founded Character AI with his friend Dan Abitbol.

Although the company has only about 20 employees, it is quite efficient.

So far, they have raised nearly $200 million and are about to join the ranks of unicorns.

Aidan Gomez left Google Brain in September 2019 to start CohereAI.

After three years of steady growth, the company is still expanding: Cohere now has more than 180 employees.

At the same time, the funds raised by the company are about to break through the $400 million mark.

Lukasz Kaiser, one of the co-authors of TensorFlow, left Google in mid-2021 to join OpenAI.

Illia Polosukhin left Google in February 2017 and founded NEAR Protocol in June 2017.

NEAR is currently valued at about $2 billion.

At the same time, the company has raised about $375 million, along with substantial secondary financing.

Right now, only Llion Jones still works at Google.

Regarding his contribution to the paper, he joked: "My greatest contribution was the title."

Hot comments from netizens: even now, looking back at the Transformer still provokes reflection among many netizens.

A groundbreaking paper in AI.

Marcus said it was a bit like the Portland Trail Blazers giving up on Michael Jordan.

This incident shows that even at this level of research, it is difficult to predict which paper will have what degree of impact in this field.

This story tells us that the real value of a research paper reveals itself only over the long run.

Wow, how time flies! Surprisingly, this model broke through the limits of attention and revolutionized NLP.

During my PhD, my advisor @WenmeiHwu always taught us that the most influential papers never win the best paper award or any recognition, yet over time they eventually change the world. We should not fight for awards; we should focus on influential research!

Reference:

https://twitter.com/DrJimFan/status/1668287791200108544

https://twitter.com/karpathy/status/1668302116576976906

https://twitter.com/JosephJacks_/status/1647328379266551808

This article comes from the WeChat official account: Xin Zhiyuan (ID: AI_era).
