2025-03-04 Update From: SLTechnology News & Howtos
Shulou(Shulou.com)11/24 Report--
"More progress is coming," Jeff Dean said.
In recent years the technology industry has raced to build ever larger language models, and one of the most important results has been the rapid rise of AI-generated content (AIGC). Two years ago, OpenAI's GPT-3 demonstrated the versatility of large language models; more recently, AI products that generate images from text prompts have multiplied.
Interestingly, since the start of this year the AIGC spotlight has largely been captured by smaller players such as Stable Diffusion, Craiyon, and Midjourney, while AI giants like Google have shown little visible movement.
But in fact, Google has not been sitting idle.
Then, on the morning of November 2, near the end of the year, Google finally made its move. The Silicon Valley giant, best known for its AI research, unveiled a batch of new AIGC technologies that generate content from text prompts:
High resolution long video
3D model
Music
Code
And controllable text generation technology.
Source: Google Research

"Generative AI models have the potential to unleash creativity. Through these technologies, people from different cultures can express themselves using images, video, and design in ways they could not before," said Jeff Dean, head of Google AI.
He said that thanks to the sustained efforts of Google's researchers, the company now has models whose output quality leads the industry, and has built further innovations on top of them.
Jeff Dean. Image source: Google

These innovations include a super-resolution video diffusion model, which extends AI from text-to-image to text-to-video while preserving ultra-high definition.
They also include AudioLM, a model that needs no text or musical-score training data: given only a short audio sample, it can continue the audio, filling in music on its own.
From generating text to generating code, audio, images, video, and 3D models, Google seems intent on proving that the power of AIGC technology is far from its limits, and that there is plenty of room left to push it.
Next, let's take a good look at what Google has done this time.
An AI writing assistant that won over Ken Liu

To be honest, we at Silicon Star felt a twinge of job anxiety when we saw Google release an AI writing tool.
But after learning more about it, that worry turned mostly into relief.
We have repeatedly emphasized the large-language-model technology underpinning today's AI. Wordcraft, newly launched by Google and built on LaMDA, pushes the core capability of a language model to its limit.
Wordcraft is a writing-assistance tool based on the LaMDA language model, built jointly by the Google Brain team, the PAIR (People + AI Research) team, and the Magenta generative-audio project team.
Its job is to generate new ideas from existing text, or to help rewrite existing sentences during drafting, so that creators can break through writer's block.
Wordcraft user interface. Image source: Google Research

LaMDA's underlying design is simple: given some words, predict which word is most likely to come next; you can think of it as cloze or sentence completion.
Interestingly, because LaMDA's model size and training data are so large (text drawn from across the Internet), it has picked up an almost "subconscious" grasp of many higher-level concepts in language, and it is precisely these high-level concepts that can help a creator's workflow.
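The next-word objective described above can be illustrated with a toy sketch. This is not LaMDA (which is a huge neural network); it is a tiny bigram counter over a made-up corpus, but the core task is the same: repeatedly predict the most likely next word.

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction: learn word -> next-word
# statistics from a tiny corpus, then complete a prompt greedily.
corpus = (
    "the cat sat on the mat . "
    "the cat sat on the rug . "
    "the dog ran"
).split()

# Count transitions between adjacent words.
next_counts = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    next_counts[word][nxt] += 1

def complete(prompt_words, n=3):
    """Greedily append the most frequent next word, n times."""
    words = list(prompt_words)
    for _ in range(n):
        options = next_counts.get(words[-1])
        if not options:  # no known continuation
            break
        words.append(options.most_common(1)[0][0])
    return words

print(complete(["the", "cat"]))  # ['the', 'cat', 'sat', 'on', 'the']
```

A real large language model replaces the count table with a learned neural distribution over an enormous vocabulary, which is where the "higher-level concepts" emerge.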
Google has built a number of features into the Wordcraft user interface that let creators adjust the style of the generated text. "We like to think of Wordcraft as a magical text editor: it looks like a familiar web editor, but it integrates a set of powerful LaMDA-driven features behind the scenes," Google wrote.
You can use Wordcraft to rewrite a sentence, or ask it to adjust your original text, say, to be a little funnier or a little more melancholy.
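A hedged sketch of how such an instruction-driven rewrite feature can sit on top of any large language model. The `llm` callable here is a stand-in of our own, not Google's API; in a real system it would invoke a hosted model.

```python
# Sketch (assumption, not Google's actual API) of instruction-based
# rewriting: wrap the user's text and a style instruction into a prompt
# and hand it to a language model.
def build_rewrite_prompt(text, instruction):
    """Combine the user's text and a style instruction into one prompt."""
    return (
        f"Rewrite the following text {instruction}.\n"
        f"Text: {text}\n"
        f"Rewritten:"
    )

def rewrite(text, instruction, llm):
    return llm(build_rewrite_prompt(text, instruction))

# Trivial fake model so the sketch runs end to end: it just upper-cases
# the input line; a real model would actually restyle the prose.
def fake_llm(prompt):
    text_line = prompt.splitlines()[1]
    return text_line.removeprefix("Text: ").upper()

print(rewrite("the rain fell softly", "to be more dramatic", fake_llm))
# prints: THE RAIN FELL SOFTLY
```

The design point is that "make it funnier" and "make it more melancholy" need no separate models; they are just different instructions in the prompt.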
Over the past year, Google has run a collaborative project called the Wordcraft Writers Workshop, bringing together 13 professional writers and text creators for a long, in-depth collaboration in which they wrote short stories with the Wordcraft editor as part of their own creative process.
Notably, the well-known science fiction writer Ken Liu (Liu Yukun), author of the short fiction behind the hit series Pantheon and translator of the English edition of The Three-Body Problem, took part in the project.
While writing, he hit a scene that required describing the assorted items on a store's shelves. In the past, that kind of detail work would derail his train of thought; with Wordcraft he could generate the list directly, saving his mental bandwidth for what mattered more to the story.
Image source: Sina Weibo

In another scene, he found his imagination constrained, circling the same familiar concepts. So he handed the initiative to LaMDA and let it lead: "It forced me to explore possibilities I had never considered and find new inspiration for the writing."
You can find Evaluative Soliloquies, the short story Ken Liu wrote with Wordcraft's help, on the official Wordcraft Writers Workshop page. He also used Imagen to generate several illustrations for the story:
Image source: Emily Reif via Imagen

Has ultra-long, consistent video generation finally been cracked?

AI text-to-image should be familiar to everyone by now. Over the past year, well-known products such as DALL·E 2, Midjourney, Stable Diffusion, and Craiyon have all arrived (leaving aside who came first). Google has its own text-to-image models too, and there are two of them: Imagen (based on a large language model plus the now-popular diffusion approach) and Parti (based on Google's own Pathways framework).
Image source: Google Research

Although this year's AIGC excitement has been stolen by hot newcomers such as Stable Diffusion, the low-key Google has not been idle.
While others seem content, for now, with turning text prompts into small still images, Google is already accelerating: earlier than anyone else, it moved into text-generated high-resolution video, a complex and largely unexplored technology.
"Generating high-resolution, temporally coherent video is a very difficult job," said Douglas Eck, senior research director at Google Research.
"Fortunately, we recently published two studies, Imagen Video and Phenaki, that tackle the video-generation problem."
Image source: Google Research

You can think of it this way: text-to-image generates one picture (or several in parallel) from a single text prompt, while Imagen Video and Phenaki generate many temporally consistent frames from a sequence of text prompts; in other words, video.
Specifically, Imagen Video is a text-conditioned diffusion model that can generate high-definition frames of unprecedented realism; and because it builds on a large Transformer-based language model, it also has strong language understanding.
Phenaki, by contrast, generates video entirely with a large language model, autoregressively emitting tokens over time. Its advantage is that it can generate extremely long (minutes-long) videos whose imagery stays logically and visually coherent.
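The autoregressive scheme described for Phenaki can be sketched schematically. This is not Phenaki's actual code: the "model" below is a trivial stand-in, but it shows the key structure, where each prompt in a sequence conditions a run of frame tokens, and every step also sees everything generated so far, which is what keeps long videos coherent.

```python
# Toy sketch of autoregressive, multi-prompt video-token generation.
# step_fn stands in for the real model; it sees the current prompt and
# the full history of tokens generated so far.
def generate_video_tokens(prompts, frames_per_prompt, step_fn):
    """Generate tokens prompt by prompt, conditioning on all history."""
    tokens = []
    for prompt in prompts:
        for _ in range(frames_per_prompt):
            tokens.append(step_fn(prompt, tuple(tokens)))
    return tokens

# Stand-in "model": tags each frame token with its prompt and position.
def demo_step(prompt, history):
    return f"{prompt}:{len(history)}"

video = generate_video_tokens(["a cat walks", "the cat jumps"], 2, demo_step)
print(video)
# ['a cat walks:0', 'a cat walks:1', 'the cat jumps:2', 'the cat jumps:3']
```

In the real system the tokens are learned discrete video codes that a decoder turns back into pixels; the sequence-of-prompts interface is what enables minutes-long storytelling.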
"To be honest, I didn't work on this project myself, but I think it's truly amazing." The most powerful part of the technology, Eck says, is that it can turn a sequence of text prompts into ultra-high-definition video, enabling a whole new way of telling stories.
"Of course, AI video generation is still in its infancy, and we look forward to working with more film and television professionals and visual content creators to see how they use this technology."
Douglas Eck. Image source: Google

Audio generation with no reference

When OpenAI released GPT-3, the paper's title became a classic: "Language Models are Few-Shot Learners." It made the point that large language models can handle a wide range of natural language processing tasks given only a handful of examples, and it implicitly predicted that still larger language models would be able to do ever more impressive things.
Today, AudioLM, Google's audio-only model, validates that prediction.
Image source: Google Research

AudioLM is a framework for high-quality audio generation with long-term consistency: with no text or symbolic music representation, and prompted with only a very short (three-to-four-second) audio sample, it generates natural, coherent, realistic audio, and it is not limited to speech or music.
The speech AudioLM generates remains syntactically and semantically plausible and coherent, and even carries over the voice and intonation of the speaker in the sample.
Even more impressively, the model was not originally trained on any music data, yet the result is astonishing: it can automatically "continue" a recording of any instrument or piece of music, once again demonstrating the true power of the large-language-model approach.
The clip below is about 20 seconds of piano music. Give it a listen first:
In fact, only the first 4 seconds were given to the model as a prompt; everything after that was "filled in" by AudioLM itself. And that 4-second sample was all it got: no supplementary text hints like "piano" or "march."
"You don't need to give it a whole piece of music to learn from. Give it a short clip and it can start composing directly in that audio space; any audio clip, whether music or speech." This ability to generate audio without a reference, Eck said, goes well beyond what people thought AI could create.
Other AIGC technologies and products

Beyond the technologies above, Google also announced AI content generation in other formats.
For example, Google has extended text generation to 3D models by way of 2D images. By combining Imagen with the latest Neural Radiance Field (NeRF) techniques, Google developed DreamFusion, which generates 3D models with high-fidelity appearance, depth, and normal vectors from a text description, and supports rendering under different lighting conditions.
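The combination just described can be sketched at a very high level: treat the 3D scene as learnable parameters, render it to a 2D image, and nudge the parameters so the rendering better satisfies a text-conditioned image prior. The "renderer" and "prior" below are toy stand-ins of ours, not the real NeRF or Imagen components.

```python
# Heavily simplified sketch of DreamFusion's optimization idea.
# The real system renders a NeRF and gets gradient feedback from a
# frozen text-to-image diffusion model; here both are toy functions.
def optimize_scene(params, render, prior_grad, lr=0.1, steps=50):
    """Gradient-descent loop: params -> rendered image -> prior feedback."""
    for _ in range(steps):
        image = render(params)
        grad = prior_grad(image)  # how the prior wants the image to change
        params = [p - lr * g for p, g in zip(params, grad)]
    return params

# Toy stand-ins: "rendering" is the identity map, and the "prior"
# simply prefers the image to match a fixed target.
target = [1.0, 2.0, 3.0]
render = lambda p: p
prior_grad = lambda img: [x - t for x, t in zip(img, target)]

scene = optimize_scene([0.0, 0.0, 0.0], render, prior_grad, lr=0.5, steps=40)
print([round(x, 3) for x in scene])  # [1.0, 2.0, 3.0]
```

The key design choice is that no 3D training data is needed: all the supervision flows backward from a 2D image prior through the renderer into the 3D parameters.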
Image source: DreamFusion: Text-to-3D using 2D Diffusion (dreamfusion3d.github.io)

AI Test Kitchen, the public-facing app Google announced at this year's I/O conference, will also be updated soon with new features unlocked by LaMDA model innovations; for example, City Dreamer, which builds a custom city from text commands, and Wobble, which creates wriggling cartoon characters.
Users can download AI Test Kitchen from their platform's app store and apply for test access on Google's website; approval is quick.
AI Test Kitchen supports iOS and Android. Image source: Google and Apple

"Our advances in neural network architecture, machine learning algorithms, and new hardware approaches to machine learning have helped AI solve important real-world problems for billions of people," Jeff Dean said.
"More progress is coming. What we shared today is a hopeful vision for the future: AI is prompting us to reimagine how technology can help."
This article comes from the WeChat official account Silicon Star (ID: guixingren123); author: Du Chen, editor: VickyXiao.
© 2024 shulou.com SLNews company. All rights reserved.