2025-02-02 Update From: SLTechnology News & Howtos
Shulou(Shulou.com)12/24 Report--
Xin Zhiyuan reports
Editors: editorial department
[Xin Zhiyuan guide] It's wild: Google has just released Imagen 2, a pinnacle among AI image models. In hands-on tests the results are realistic and exquisite, the generated portraits look like real photographs, and its fidelity to prompts beats DALL·E 3 and Midjourney. Is the title of strongest text-to-image model about to change hands?
Question: is the picture below an AI-generated image or a real photo?
Were it not for the question, most people would never suspect that this is not a photograph.
Yes: just enter a prompt like this into Imagen 2, Google's latest AI tool:
A shot of a 32-year-old female, up and coming conservationist in a jungle; athletic with short, curly hair and a warm smile
And you get a strikingly realistic image, one that looks more like a photograph than a rendering.
Although Christmas is approaching, Google keeps grinding: Imagen 2, the text-to-image model billed as the strongest rival to DALL·E 3, has finally arrived.
Right after taking on OpenAI's GPT-4 with Gemini, Google immediately released Imagen 2 to challenge DALL·E 3; at the end of 2023, Google has earned its reputation for relentless shipping.
Not only are the fingers lifelike, the chopstick grip is textbook-correct. Imagen 2 arguably represents the state of the art in text-to-image generation, pushing the boundaries of AI imagery.
Powered by machine learning, Imagen 2 turns text descriptions into vivid, crisp, high-resolution images.
What sets Imagen 2 apart is that it understands complex and abstract concepts with remarkable accuracy, then visualizes them with remarkable delicacy.
At the core of Imagen 2 is a sophisticated neural-network architecture: a fine-tuned Transformer model with exceptional performance in both text understanding and image synthesis.
Google has now set a new benchmark in text-to-image generation.
Besides DALL·E 3, we now have another model that can generate images from nothing but natural language.
By contrast, Midjourney, which demands elaborate, specialist prompts, has been left far behind by these two rivals in ease of use.
All kinds of complex images can now be produced from simple text alone, and text-to-image models of this kind will have a far-reaching impact on content creation.
For industries that rely on visual content, this changes the rules of the game: it dramatically cuts the time traditional production requires, and content creators can turn out high-quality visuals at unprecedented speed.
At the same time, Imagen 2 offers exceptional image quality and versatility.
Imagen 2 uses Google's state-of-the-art text-to-image diffusion technology to produce high-quality, realistic images that closely follow the user's prompt.
One reason is that it learns the natural distribution of its training data to generate more realistic images, rather than adopting a pre-programmed style.
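The core diffusion idea can be sketched in a few lines. The following is a toy, self-contained illustration, not Imagen 2's actual pipeline: the noise schedule, the tiny "image", and the assumption of a perfect noise predictor are all made up for the demo. A real model learns to predict the injected noise from the noisy image plus a text embedding; here we cheat and use the true noise, so the inversion is exact.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.02, T)      # hypothetical noise schedule
alphas = np.cumprod(1.0 - betas)        # cumulative signal retention

x0 = rng.standard_normal((8, 8))        # stand-in for an 8x8 "image"
eps = rng.standard_normal(x0.shape)     # the noise we inject

# Forward process (closed form): x_t = sqrt(a_t)*x0 + sqrt(1-a_t)*eps
x_T = np.sqrt(alphas[-1]) * x0 + np.sqrt(1 - alphas[-1]) * eps

# A perfect denoiser would predict eps exactly; inverting the forward
# formula then recovers x0. A trained network only approximates this.
x0_hat = (x_T - np.sqrt(1 - alphas[-1]) * eps) / np.sqrt(alphas[-1])

print(np.allclose(x0_hat, x0))          # True
```

In practice the reverse process runs step by step from pure noise, with the text prompt steering each denoising step; this one-shot inversion only shows why predicting the noise is enough to recover the image.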
A jellyfish on a dark blue background (a jellyfish floats leisurely against a dark blue background)
As you can see, the image generation ability of Imagen 2 is amazing.
Whether rendering intricate landscapes, detailed objects, or fantastical scenes, the resulting images are faithful enough to rival, and sometimes outright surpass, images created by human artists.
Small canvas oil painting of an orange on a chopping board. Light is passing through orange segments, casting an orange light across part of the chopping board. There is a blue and white cloth in the background. Caustics, bounce light, expressive brush strokes.
(A small oil painting of an orange on a chopping board. Sunlight passes through the orange segments, casting soft orange light across the board; a blue-and-white cloth sits in the background. The painting deftly captures the refraction and reflection of light while showing the painter's expressive brushwork.)
One netizen said that this orange painting from Imagen genuinely surprised them: the way the light projects after passing through the orange matches the mood described in the prompt.
Someone used the same prompt to have DALL·E 3 generate the orange oil painting, and the result was indeed noticeably weaker than Imagen 2's.
Likewise, the oranges produced by Midjourney fall short in both realism and mood.
Poetic imagery, faithfully rendered in one click
Earlier text-to-image models typically learned to match user prompts from training datasets of images paired with detailed captions.
But they share a flaw: across image-caption pairs, the quality and accuracy of detail can vary significantly.
To help it create higher-quality, more accurate images that better follow user prompts, Imagen 2's training dataset adds richer descriptions, helping the model learn different caption styles and understand a wide range of user prompts.
This image-caption pairing helps Imagen 2 better grasp the relationship between images and text, greatly improving its handling of context and nuance.
For example, A Hymn to the Evening by the American poet Phillis Wheatley describes purling streams and singing birds, their mingled music floating through the air.
Imagen 2 captures every key element of the poem's gentle imagery.
"Soft purl the streams, the birds renew their notes, And through the air their mingled music floats." (A Hymn to the Evening, Phillis Wheatley)
By contrast, Midjourney seems to miss parts of the literary description and, more often than not, inserts a human figure into the picture, though the overall result still looks good.
As for DALL·E 3, it actually adds a few lines of text to the image, turning it into a "greeting card"?
In his famous novel Moby-Dick, Herman Melville describes the subtleness of the sea: how its most dreaded creatures glide underwater, largely unseen, treacherously hidden beneath the loveliest tints of azure.
Imagen 2 also understands the characteristics of "marine literature" very well.
"Consider the subtleness of the sea; how its most dreaded creatures glide under water, unapparent for the most part, and treacherously hidden beneath the loveliest tints of azure." (Moby-Dick, Herman Melville)
By contrast, Midjourney and DALL·E 3 veer straight into Cthulhu territory the moment the deep sea comes up.
Midjourney
DALL·E 3
In The Secret Garden, written by the children's-literature master Frances Hodgson Burnett, there is a description of the robin:
The robin flew from the tangled ivy onto the wall, opened his beak and sang a loud, sweet trill, merely to show off. Nothing in the world is quite as lovely as a robin showing off, and they are nearly always doing it.
Look: this image produced by Imagen 2 captures every detail, including the ivy, the wall, and the singing.
"The robin flew from his swinging spray of ivy on to the top of the wall and he opened his beak and sang a loud, lovely trill, merely to show off. Nothing in the world is quite as adorably lovely as a robin when he shows off, and they are nearly always doing it." (The Secret Garden, Frances Hodgson Burnett)
For the same prompt, Midjourney's result is a little less realistic.
DALL·E 3 does even worse than the other two, especially in the details of the plants and feathers.
Style reproduction at will, and a better grasp of human aesthetics
One long-criticized problem in image generation has been rendering characters' fingers.
This time, Imagen 2's datasets and models have improved in many areas, including rendering realistic hands and faces and keeping images free of visual artifacts.
Meanwhile, Google DeepMind has trained a dedicated "image aesthetics model" based on human preferences for lighting, framing, exposure, sharpness, and so on.
Each image is given an aesthetic score, which is used to tune Imagen 2 so that human-preferred images carry more weight in its training dataset.
As a result, Imagen 2 generates higher-quality images.
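The article does not say exactly how the aesthetic scores are applied, but the reweighting idea can be sketched with a hypothetical sampler: scores and image names below are made up, and higher-scored training examples are simply drawn more often.

```python
import random

# Hypothetical sketch of score-weighted sampling: each training image
# carries an aesthetic score, and sampling probability is proportional
# to that score, so preferred images appear in more training batches.

random.seed(0)
dataset = [("img_a", 0.2), ("img_b", 0.9), ("img_c", 0.5)]  # (image, score)
images = [img for img, _ in dataset]
weights = [score for _, score in dataset]

draws = random.choices(images, weights=weights, k=10_000)
counts = {img: draws.count(img) for img in images}

# img_b (score 0.9) should dominate the sampled batches
print(max(counts, key=counts.get))
```

This is one plausible reading of "give more weight to human-preferred images"; loss reweighting or fine-tuning on a filtered subset would be equally consistent with the description.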
Images generated by AI from the prompt "flowers", ordered by aesthetic score from low (left) to high (right)
Imagen 2's diffusion technology also offers a high degree of flexibility, making it easier to control and adjust the style of an image.
By providing reference style images and combining text prompts, Imagen 2 can be trained to generate new images that follow the same style.
With reference images plus text prompts, Imagen 2's output style becomes much easier to control.
Inpainting and outpainting
In addition, Imagen 2 supports image-editing features such as inpainting (repairing a region) and outpainting (extending the canvas).
By providing a reference image and an image mask, inpainting can generate new content directly inside the original image.
In the original picture below, just enter "there is a shelf on the green wall with several books and vases on the shelf", and the corresponding content will be generated in the original picture!
The new content is unobtrusive and perfectly integrated into the original picture.
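The compositing step behind this kind of mask-based editing can be illustrated with a minimal, hypothetical sketch: the model's output (faked here as a constant array) replaces only the masked region, leaving the rest of the original untouched.

```python
import numpy as np

# Minimal inpainting compositing sketch: new content lands only where
# the mask is 1 and is blended back into the original image.

original = np.zeros((4, 4))          # stand-in for the source image
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                 # region the user asked to repaint
generated = np.ones((4, 4))          # stand-in for the model's output

result = mask * generated + (1 - mask) * original
print(int(result.sum()))             # only the 2x2 masked region changed -> 4
```

A real system generates the masked content conditioned on both the prompt and the surrounding pixels, and typically feathers the mask edges so the seam is invisible; the hard 0/1 mask here is the simplest case.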
We can also use outpainting to extend the original image.
At sunset, a head-and-shoulders shot of a giraffe and a zebra on the African savanna suddenly expands into a full-length view.
Full support for enterprise scenarios: one-click logo generation, with Chinese supported
Google has now rolled Imagen 2 out to its developer platform, Vertex AI.
On the Vertex AI platform, customers can use intuitive tools to customize and deploy Imagen 2, enjoying a fully managed infrastructure and built-in privacy and security protection.
With the help of Google DeepMind technology, Imagen 2 has achieved a significant improvement in image quality, helping developers create images based on specific needs, including:
- Generate high-quality, realistic, high-resolution, aesthetically pleasing images from natural-language prompts
- Support multilingual text rendering, accurately adding text content to images
- Design a logo for a company or product and embed it in an image
- Provide visual question answering: generate captions from an image, or give informative text answers to questions about details in an image
High-quality images: with improved image and text understanding, plus a range of new training and modeling techniques, Imagen 2 produces accurate, high-quality, realistic images.
Text rendering: Imagen 2 can accurately render the text specified in the prompt; when generating an image of an object that contains specific words or phrases, it ensures the output contains the correct text.
Logo design: Imagen 2 can generate a wide variety of creative, realistic logos for brands and products, including badges, lettermarks, and even highly abstract marks.
Captioning and Q&A: with enhanced image understanding, Imagen 2 can write detailed long captions and give detailed answers to questions about elements in an image.
Multilingual prompts: besides English, Imagen 2 supports six other languages (Chinese, Hindi, Japanese, Korean, Portuguese, Spanish), with more planned for early 2024. The feature also includes translation between prompt and output; for example, you can prompt in Spanish but specify that the output text be in Portuguese.
Image watermarking makes generation safer
To help reduce the potential risks and challenges of text-to-image technology, Google has built strong guardrails spanning design, development, and product deployment.
Imagen 2 integrates SynthID, Google DeepMind's cutting-edge toolkit for watermarking and identifying AI-generated content.
In this way, customers of the Google Cloud platform can add digital watermarks directly to the image without reducing the image quality.
And SynthID can still detect the watermark even after the image has been filtered, cropped, or saved with a lossy compression scheme.
In addition, Google runs rigorous safety tests to minimize the risk of harm before releasing to all users.
From the start, the Google team has built safety into Imagen 2's training data and added technical guardrails to limit problematic output, such as violent, offensive, or sexually explicit content.
Google also runs safety checks on training data, input prompts, and system-generated output; for example, comprehensive safety filters are applied to avoid potentially problematic content such as images of celebrities.
Netizens exclaim: the strongest text-to-image model really is here!
Oriol Vinyals, VP of research and deep-learning lead at Google DeepMind, tried using Imagen 2 to generate logos for Gemini.
Another Google scientist uses Imagen 2 to generate the following image.
The following is a blue cat generated by a netizen.
Some netizens think Imagen 2 is the best of its kind; as with Gemini Ultra, the hands and the lettering alone tell the story.
However, he also complained that Google does not open its products to everyone.
"As usual, Google has announced a product that most people can't use. What's the point?!"
References:
https://deepmind.google/technologies/imagen-2/
https://cloud.google.com/blog/products/ai-machine-learning/imagen-2-on-vertex-ai-is-now-generally-available
This article comes from the official account of Wechat: Xin Zhiyuan (ID:AI_era)
© 2024 shulou.com SLNews company. All rights reserved.