
Huawei launches AI poet "Yuefu": Tang shi and Song ci pose no problem, and its poems are hard to tell from the real thing

2025-02-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Qian Ming, reporting from Aofei Temple

QbitAI report | WeChat official account QbitAI

When science students turn literary, liberal arts students may find themselves with nothing left to do.

Don't believe it? Take a look at this seven-character quatrain:

After reading it, some netizens said:

Honestly, the rhyme, the imagery, and the depth are all very good.

It can write not only shi poems but also ci, such as this piece set to the tune Man Jiang Hong:

It can even write acrostic poems:

Would you believe these are the work of science students who don't know how to write poetry?

But it is.

These poems come from Yuefu, a new poetry-writing AI released by Huawei's Noah's Ark Lab.

When it came out, it attracted a lot of attention.

Some people praised its work:

Full of poetic flavor, neat and interesting. The program is really good; a like for the developers.

Others "stirred things up":

At a cry, the frontier geese head south of the Yangtze; a few letters from home are sent north. Do not say the wandering geese shed no tears: year after year they toil on toward Yanran. I refuse to believe this AI falls short of the average student in Peking University's Chinese Department.

Some even said, "Li Bai would fall silent, and Du Fu would weep."

Of course, some people have pointed out the problem:

Very neat, but I feel most of it is still at the level of syntax rather than semantics. It lacks a bit of soul.

And one self-styled "truth-teller" weighed in:

Xin Qiji's technique of writing ci in the manner of prose and Old Du's (Du Fu's) deep, brooding, halting cadence are both relatively hard for AI to learn. The problem is not that AI is too powerful, but that readers can no longer appreciate the more sophisticated techniques of regulated verse.

Liu Qun, chief scientist for speech and semantics at Huawei Noah's Ark Lab, also answered these questions on Weibo and shared some of the story behind the AI:

In fact, we don't understand poetry, and we didn't use any rules of poetry to train the system; it learned everything by itself.

So how exactly did this AI learn? The paper has been published.

The science student's literary talent comes from GPT

Unlike free-form text generation, generating classical Chinese poetry is a challenge, because the output usually has to satisfy requirements on both form and content.

Classical Chinese poetry comes in many forms: the five-character quatrain (wujue), the seven-character quatrain (qijue), five-character regulated verse (wulü), seven-character regulated verse (qilü), ci tune patterns such as Man Jiang Hong, Xi Jiang Yue, and Shui Diao Ge Tou, and couplets. Each has its own rules on the number of characters, rhyme, tonal pattern, parallelism, and so on.

The content requirement is simpler to state but harder to pin down: a poem should revolve around a theme and stay consistent in content.

Unlike most current approaches, the "Yuefu" system proposed by Huawei requires no hand-crafted rules or features and adds no extra neural network components.

All the work needed is to serialize the training poems into formatted text sequences to use as training data.

Poems that meet the form and content requirements, such as quatrains, regulated verse, ci, and couplets, are then generated by sampling tokens from the language model.
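To make the "number of characters" rule concrete, here is a minimal sketch that checks only the shape of the four regulated forms (a quatrain has 4 lines, regulated verse 8, each line 5 or 7 characters). Rhyme, tone, and parallelism are ignored, and the form keys and the `matches_shape` helper are hypothetical names for illustration; Yuefu itself uses no such checker.

```python
import re
from typing import Dict, Tuple

# (number of lines, characters per line) for the four regulated forms.
SHAPES: Dict[str, Tuple[int, int]] = {
    "wujue": (4, 5),  # five-character quatrain
    "qijue": (4, 7),  # seven-character quatrain
    "wulu": (8, 5),   # five-character regulated verse (wulü)
    "qilu": (8, 7),   # seven-character regulated verse (qilü)
}


def matches_shape(body: str, form: str) -> bool:
    """True if the poem has the right line count and line length for `form`.

    Lines are split on Chinese punctuation; only character counts are checked.
    """
    n_lines, n_chars = SHAPES[form]
    lines = [s for s in re.split(r"[，。？！]", body) if s]
    return len(lines) == n_lines and all(len(s) == n_chars for s in lines)


if __name__ == "__main__":
    poem = "床前明月光，疑是地上霜。举头望明月，低头思故乡。"
    print(matches_shape(poem, "wujue"), matches_shape(poem, "qijue"))
```

Li Bai's "Quiet Night Thoughts" fits the five-character quatrain shape and fails the seven-character one.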

They also proposed and implemented a way to fine-tune the model to generate acrostic poems.

The power behind this comes from GPT, a pre-trained natural language model proposed by OpenAI. Its core idea is to first pre-train a language model on unlabeled text and then fine-tune it on labeled data for specific tasks.

Yuefu is the first GPT-based poetry-writing system, and it is also closely related to Google's BERT.

The GPT model is implemented on top of BERT's source code: the Transformer size is configured the same as BERT-Base, and it uses the tokenization script and Chinese vocabulary released with BERT.

Specifically, the process of training the poetry generation model is as follows:

The whole model training process has two stages: pre-training and fine-tuning.

Huawei's GPT model is pre-trained on a Chinese news corpus and then fine-tuned on a collection of publicly available classical Chinese poetry.

As shown in the figure above, each sample poem is first converted into a formatted sequence made up of three main parts, the format, the theme, and the body of the poem, separated by identifiers.
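A minimal sketch of this serialization follows. The article only says the fields are "separated by identifiers", so the tag strings `(format)`, `(theme)`, and `[EOS]` and the helper names are assumptions for illustration, not the paper's exact tokens.

```python
# Hypothetical separator strings; the paper's exact identifiers may differ.
FORMAT_TAG = "(format)"
THEME_TAG = "(theme)"
EOS = "[EOS]"


def serialize_poem(form: str, theme: str, body: str) -> str:
    """Flatten one training poem into a single sequence:
    <form>(format)<theme>(theme)<body>[EOS]
    """
    return f"{form}{FORMAT_TAG}{theme}{THEME_TAG}{body}{EOS}"


def make_prompt(form: str, theme: str) -> str:
    """At generation time only form and theme are known, so the prompt
    ends right after the theme tag and the model writes the body."""
    return f"{form}{FORMAT_TAG}{theme}{THEME_TAG}"


if __name__ == "__main__":
    print(serialize_poem("五言绝句", "静夜思",
                         "床前明月光，疑是地上霜。举头望明月，低头思故乡。"))
    # 五言绝句(format)静夜思(theme)床前明月光，疑是地上霜。举头望明月，低头思故乡。[EOS]
```

Because the training sequence always starts with the same prefix that is used as a prompt, the language model learns to continue "format + theme" with a matching body.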

Couplets have no theme, so the first line (the upper couplet) takes the place of the theme and the second line (the lower couplet) serves as the body. Generating a couplet therefore means being given the upper line and producing the lower one, which matches the traditional practice of "matching couplets".

The data sets are large. The Chinese news corpus used for pre-training contains 235 million sentences. The fine-tuning data set consists of 250,000 quatrains and regulated verses, 20,000 ci, and 700,000 couplets.

Pre-training was done on Huawei Cloud, using 8 NVIDIA V100 (16 GB) GPUs for 4 epochs, and took 90 hours in total.

Fine-tuning feeds all the poem sequences into the Transformer and trains an autoregressive language model, with the goal of maximizing the probability of the observed sequences.

Fine-tuning does not take long. If training runs too long, the model tends to copy sentences directly from the corpus.
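The couplet case can be sketched with the same hypothetical tags: the upper line fills the theme slot. The form label `对联` ("couplet") and the helper names are assumptions for illustration.

```python
# Same hypothetical tags as a plain poem; the "对联" form label is also assumed.
FORMAT_TAG = "(format)"
THEME_TAG = "(theme)"
EOS = "[EOS]"
COUPLET_FORM = "对联"


def serialize_couplet(upper: str, lower: str) -> str:
    """Training sequence: the upper line fills the theme slot and the
    lower line is the body the model learns to produce."""
    return f"{COUPLET_FORM}{FORMAT_TAG}{upper}{THEME_TAG}{lower}{EOS}"


def couplet_prompt(upper: str) -> str:
    """Generation prompt: give the upper line, let the model write the lower."""
    return f"{COUPLET_FORM}{FORMAT_TAG}{upper}{THEME_TAG}"


if __name__ == "__main__":
    print(serialize_couplet("天增岁月人增寿", "春满乾坤福满门"))
```

No separate couplet model is needed in this scheme: the same language model simply sees a different prefix.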
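The formula for this objective appeared as an image in the original article and did not survive. Assuming it is the standard autoregressive language-modeling objective (the notation below is ours, not necessarily the paper's), for a serialized sequence $\mathbf{x} = (x_1, \dots, x_n)$ drawn from the training set $\mathcal{D}$ it reads:

```latex
\begin{aligned}
p_\theta(\mathbf{x}) &= \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \dots, x_{i-1}) \\
\mathcal{L}(\theta) &= \sum_{\mathbf{x} \in \mathcal{D}} \log p_\theta(\mathbf{x})
  = \sum_{\mathbf{x} \in \mathcal{D}} \sum_{i=1}^{n} \log p_\theta(x_i \mid x_1, \dots, x_{i-1})
\end{aligned}
```

Maximizing $\mathcal{L}(\theta)$ is the same as minimizing the usual next-token cross-entropy. Since the format and theme fields are part of $\mathbf{x}$, the model also learns how a body should follow a given format-and-theme prefix.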

After training, the format and theme of the desired poem are first converted into an initial sequence, which is fed into the model; the remaining body of the poem is then decoded token by token.

During decoding, no hard constraints are used to guarantee a correct format. Instead, the model places commas and periods at the right positions on its own, and decoding ends when it generates the "EOS" token.
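The loop below sketches this decoding scheme under stated assumptions: a tiny hypothetical bigram table (`TOY_MODEL`) stands in for the fine-tuned GPT, and greedy selection is used only to keep the demo deterministic. The point it illustrates is that punctuation is just another token and the only stopping rule is EOS (plus a length cap).

```python
from typing import Callable, Dict, List

EOS = "[EOS]"

# Hypothetical stand-in for the fine-tuned GPT: a toy bigram table that maps
# the last token of the context to a distribution over the next token.
Model = Dict[str, Dict[str, float]]


def greedy(probs: Dict[str, float]) -> str:
    """Pick the most probable token (demo only; the article's system samples)."""
    return max(probs, key=probs.get)


def decode(model: Model, prompt: List[str],
           pick: Callable[[Dict[str, float]], str] = greedy,
           max_len: int = 64) -> List[str]:
    """Append one token at a time until the model emits EOS or max_len is hit.

    No format rules are enforced: '，' and '。' are ordinary tokens whose
    positions the model decides by itself.
    """
    tokens = list(prompt)
    generated: List[str] = []
    for _ in range(max_len):
        # Unknown contexts end the poem (a convenience of this toy model).
        probs = model.get(tokens[-1], {EOS: 1.0})
        nxt = pick(probs)
        if nxt == EOS:
            break
        tokens.append(nxt)
        generated.append(nxt)
    return generated


TOY_MODEL: Model = {
    "(theme)": {"床": 0.9, "疑": 0.1},
    "床": {"前": 1.0},
    "前": {"，": 0.6, "明": 0.4},
    "，": {EOS: 1.0},
}

if __name__ == "__main__":
    print(decode(TOY_MODEL, ["静夜思", "(theme)"]))  # ['床', '前', '，']
```

The `pick` parameter is where a sampling strategy such as truncated top-k would plug in.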

In addition, truncated top-k sampling is used instead of beam search, so that different poems can be obtained. Specifically, each time a token is sampled, the k tokens with the highest probability are selected first, and then one token is sampled from those k.

The authors report that even with truncated top-k sampling, the generated poems still keep the correct form.
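A minimal sketch of truncated top-k sampling over a next-token distribution, using only Python's standard `random` module; the function name and the dictionary representation of the distribution are illustrative assumptions.

```python
import random
from typing import Dict, Optional


def top_k_sample(probs: Dict[str, float], k: int,
                 rng: Optional[random.Random] = None) -> str:
    """Truncated top-k sampling.

    1. Keep only the k tokens with the highest probability.
    2. Renormalize over those k (random.choices accepts relative weights,
       so dividing by their sum is unnecessary).
    3. Draw one token from the truncated distribution.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = rng or random.Random()
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens = [t for t, _ in top]
    weights = [p for _, p in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    dist = {"荷": 0.5, "蝉": 0.3, "雪": 0.2}
    rng = random.Random(0)
    print([top_k_sample(dist, 2, rng) for _ in range(5)])  # only 荷 or 蝉
```

With k = 1 this reduces to greedy decoding; larger k allows more variety, while the cut-off keeps very unlikely tokens out. Beam search, by contrast, tends to return the same high-probability output for the same prompt.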

According to the paper, acrostic poems are trained the same way; the only difference is in how the sequence is formatted: the first characters of the lines, taken together, replace the poem's theme. For Li Bai's "Quiet Night Thoughts", whose four lines begin with 床, 疑, 举 and 低, the sequence becomes "Five-character quatrain (format) 床疑举低 (acrostic) 床前明月光，疑是地上霜。举头望明月，低头思故乡。"

Huawei also showed the results in the paper. For example, of the following four poems titled "Jiangshang Tian Jia", only one was written by a Tang Dynasty poet; the other three all came from Yuefu.

Labeled A to D from top to bottom, can you tell which one is authentic? (The answer is revealed at the end of the article.)
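A minimal sketch of the acrostic formatting, assuming tag strings modeled on the "(format)" and "(acrostic)" labels in the example; the exact tokens and helper names are hypothetical.

```python
import re

# Hypothetical tags modeled on the "(format)" / "(acrostic)" labels above.
FORMAT_TAG = "(format)"
ACROSTIC_TAG = "(acrostic)"
EOS = "[EOS]"


def line_heads(body: str) -> str:
    """First character of every line, splitting on Chinese comma and period."""
    lines = [s for s in re.split(r"[，。]", body) if s]
    return "".join(line[0] for line in lines)


def serialize_acrostic(form: str, body: str) -> str:
    """Training sequence: the line heads take the place of the theme."""
    return f"{form}{FORMAT_TAG}{line_heads(body)}{ACROSTIC_TAG}{body}{EOS}"


def acrostic_prompt(form: str, heads: str) -> str:
    """Generation prompt, e.g. heads="神经网络" for a four-line acrostic."""
    return f"{form}{FORMAT_TAG}{heads}{ACROSTIC_TAG}"


if __name__ == "__main__":
    poem = "床前明月光，疑是地上霜。举头望明月，低头思故乡。"
    print(serialize_acrostic("五言绝句", poem))
```

At generation time the user supplies the head characters directly, so an acrostic on "神经网络" (neural network) becomes the prompt `七言绝句(format)神经网络(acrostic)`.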

Which AI poet is number one?

Huawei's Yuefu is neither the first nor the last AI to generate classical Chinese poetry.

Before it came "Jiuge" (Nine Songs), developed by Sun Maosong's team at Tsinghua University.

According to the official introduction, this system uses deep learning combined with several models specially designed for poetry generation, and was trained on more than 800,000 poems written by human poets. Its features include multimodal input, multiple genres and styles, and human-computer interaction.

Recently, someone also trained a Chinese version of GPT-2 on a Chinese corpus and used it to generate poetry.

On the day Yuefu was launched, Peking University, the National University of Defense Technology, and other institutions jointly released a new poetry model based on unsupervised machine translation. It uses segment-based filling and reinforcement learning to generate seven-character regulated verse from vernacular Chinese.

So, which one is stronger?

Because the Chinese GPT-2 and the Peking University joint system are not yet open to the public, only Huawei's Yuefu and Tsinghua's Nine Songs took part in this "sword contest on Mount Hua".

Round one: theme "Summer", seven-character quatrain.

Tsinghua's Nine Songs wrote:

Huawei's Yuefu wrote:

Both AIs have flaws. Tsinghua's Nine Songs starts talking about "autumn coming" in its very first line, and Huawei's Yuefu mentions "April"; both miss the point and clearly stray from summer.

By comparison, though, Huawei's Yuefu includes more summer imagery, such as "hexiang" and "xiayin".

Round two: theme "Long Night", five-character quatrain.

Tsinghua's Nine Songs wrote:

"No need to grieve sitting alone; facing each other in sorrow"? This mood... emmm. Is a marriage falling apart?

Huawei's Yuefu wrote:

At first glance the mood is well rendered, but the poem lacks impact.

In this round both AIs performed well, and each conveyed a fitting mood. Relatively speaking, Tsinghua's Nine Songs has richer emotional layers.

Round three: acrostic poem on "Neural Network", seven-character quatrain.

Tsinghua's Nine Songs produced:

In terms of rhyme and mood, it's not bad. Huawei's Yuefu produced this poem:

Likewise, this acrostic poem also conveys some mood.

In this round, both AIs completed the task fairly accurately, each with a touch of poetic mood.

After three rounds, it is hard to separate the two overall. The difference lies in how each is built.

Tsinghua's Nine Songs, based on several models specially designed for poetry generation, is relatively complex and controls poetic form fairly strictly. It is rigorous, but it composes poems quite slowly.

Huawei's Yuefu, by contrast, is based only on GPT. According to Liu Qun, the team doesn't understand poetry and didn't use poetic rules to train the system; the system learned everything by itself, and it generates poems very quickly.

Liu Qun is also quite modest about the quality of Yuefu's poems:

We talked to people who understand poetry, and they said the prosody does not fully follow the rules, but it is easier for laypeople to read.

As for the strengths and weaknesses of the two approaches, consider the old saying: in literature there is no first place.

Huawei Noah's Ark Lab

Huawei Noah's Ark Lab was founded in 2012 and belongs to Huawei's 2012 Laboratories.

The name Noah's Ark also reflects the importance of this laboratory within Huawei. Ren Zhengfei also mentioned earlier that he hoped these laboratories would become Huawei's "Noah's Ark".

The lab currently has branches in Shenzhen, Hong Kong, Beijing, Shanghai, Xi'an, North America, and Europe. Its research areas include computer vision, natural language processing, search and recommendation, decision-making and reasoning, human-computer interaction, AI theory, high-speed computing, and more.

Regarding Yuefu, Huawei also noted in the paper that it is a by-product of their GPT research. Yuefu is now available in the "EI Experience Space" mini program.

It supports five-character quatrains, seven-character quatrains, five-character regulated verse, and seven-character regulated verse, as well as an acrostic mode. Ci and couplets are not online yet.

Finally, here is a seven-character regulated poem generated by Yuefu.

By the way, the answer is C.

Related portals:

Yuefu AI paper:

GPT-based Generation for Classical Chinese Poetry

https://arxiv.org/pdf/1907.00151.pdf

Tsinghua Nine Songs poetry-writing website:

http://118.190.162.99:8080/

-end-

https://www.toutiao.com/a6734146874963395084/
