Midjourney under threat: Stable Diffusion XL opens public testing, can draw hands and render text, and no longer needs long prompts

2025-02-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

The free, open-source Stable Diffusion can now reach Midjourney's level!

Since the release of Midjourney v5, the realism of people and the detail of fingers in its generated images have improved significantly, and it has also made progress in prompt accuracy, aesthetic diversity and language understanding.

By contrast, although Stable Diffusion is free and open source, it requires a long prompt every time, and it often takes many attempts to produce a high-quality image.

Recently, Stability AI officially announced that Stable Diffusion XL, which is still under development, has entered public testing and can now be tried for free on the Clipdrop platform.

Trial link: https://clipdrop.co/stable-diffusion

Emad Mostaque, founder and CEO of Stability AI, says the model is still in training and will be open-sourced once the parameters are stable; SD-XL will perform better on image details such as "handshake" and is almost completely controllable.

Stable Diffusion XL is not the name of the final release, nor is it v3, because the architecture of SD-XL is very similar to that of the SD-v2 series.

Prompt: Minimalistic home gym with rubber flooring, wall-mounted TV, weight bench, medicine ball, dumbbells, yoga mats, high-tech equipment, high detail, organized and efficient.

The official SD-XL sample images show that image quality is already very good.

But less is not always more. Some users feel that, in order to get rid of "bad taste", SD-XL imposes too many rules and leaves less and less room for customization, which does not suit most people's preferences. Stable Diffusion v1.5 is still the most popular base model in the community.

Users also hope that the new version of SD will be compatible with the embeddings, hypernetworks and LoRA models built for SD 2.1; retraining them from scratch would be too painful.

Others believe that SD-XL performs about as well as the community models shared on Civitai, and that the new model is not particularly impressive, merely average.

SD-XL: the open-source version of Midjourney

Stability AI has not revealed much specific information about the Stable Diffusion XL model, except that its architecture is similar to the v2 models but with a larger number of parameters.

SD-v2.1 has 900 million parameters and SD-XL has about 2.3 billion; Emad says a smaller distilled version may be released with the official version.
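To put the two parameter counts side by side, here is a minimal arithmetic sketch. The counts are the approximate figures quoted above; the 2-bytes-per-parameter (fp16) storage figure is an assumption made only for illustration, and the function names are hypothetical.

```python
# Approximate parameter counts as quoted in the article.
PARAMS = {
    "SD-v2.1": 900_000_000,
    "SD-XL": 2_300_000_000,
}


def size_ratio(larger: str, smaller: str) -> float:
    """How many times more parameters `larger` has than `smaller`."""
    return PARAMS[larger] / PARAMS[smaller]


def weight_size_gib(model: str, bytes_per_param: int = 2) -> float:
    """Rough size of the weights alone, assuming fp16 storage (2 bytes each)."""
    return PARAMS[model] * bytes_per_param / 2**30


if __name__ == "__main__":
    print(f"SD-XL / SD-v2.1 parameter ratio: {size_ratio('SD-XL', 'SD-v2.1'):.2f}x")
    for name in PARAMS:
        print(f"{name}: ~{weight_size_gib(name):.1f} GiB of weights at fp16")
```

On these figures SD-XL is roughly two and a half times the size of SD-v2.1, which is consistent with the mention of a smaller distilled variant.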

SD-XL's improvements over previous versions are as follows:

High-quality images can be generated from a short descriptive prompt.

Generated images match the prompt more closely.

Human anatomy in the images is more plausible.

Compared with v2.1 and v1.5 (to a lesser extent), the images generated by SD-XL are more in line with mainstream aesthetics.

Negative prompts are optional.

Generated portraits are more realistic.

Text in the images is clearer.

It is important to note that SD-XL may not be compatible with previous versions of plug-ins.
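In practice, a short prompt with an optional negative prompt can be modeled as a small request object. The sketch below is purely illustrative: `GenerationRequest` and its field names are hypothetical and do not correspond to any Clipdrop or Stability AI API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical container for one text-to-image request."""

    prompt: str
    negative_prompt: Optional[str] = None  # optional, as described for SD-XL

    def to_payload(self) -> Dict[str, str]:
        """Build a dict, leaving out the negative prompt when none is given."""
        payload = {"prompt": self.prompt.strip()}
        if self.negative_prompt:
            payload["negative_prompt"] = self.negative_prompt.strip()
        return payload


if __name__ == "__main__":
    # A short prompt from the article, with no negative prompt.
    print(GenerationRequest("Photo shot of a woman").to_payload())
    # The same prompt with an (illustrative) negative prompt added.
    print(GenerationRequest("Photo shot of a woman", "blurry, deformed").to_payload())
```

Keeping the negative prompt out of the payload when it is empty mirrors the idea that it is an optional refinement rather than a required input.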

Clearly readable text

The v1 series and v2.1 versions of the Stable Diffusion model cannot generate readable text in images.

Although the text generated by SD-XL is not always accurate, it is greatly improved.

Prompt: Photo of a woman sitting in a restaurant holding a menu that says "Menu"

Prompt: Photo of a man holding a sign that says "Stable Diffusion"

Prompt: A young female holding a sign that says "Stable Diffusion", highlights in hair, sitting outside restaurant, brown eyes, wearing a dress, side light

Better human structure

Stable Diffusion has always had trouble with human anatomy. Extra legs and missing arms are all too common. It is usually necessary to use the inpaint function to fix image details, or to use ControlNet's OpenPose function to copy the pose from a reference image.

For example, when SD-v1.5 generates yoga images, the bodies are often distorted.

Prompt: Photo of a woman in yoga outfit, triangle pose, beach in evening, rim lighting

Although the image generated by SD-XL is not perfect, it has made remarkable progress in terms of human posture.

More aesthetic

For example, given the same room as the subject, SD-XL produces photos that are more symmetrical and more visually appealing.

SD-XL has also made significant improvements in portrait photos.

Prompt: Photo shot of a woman

Images that better match the prompt

SD-XL understands the input prompt better and produces a more accurate image.

Take duotone (two-color) as an example: SD-v1.5 only generates black-and-white images, while SD-XL can generate two-tone images in a variety of colors.

Compared with the v1 model, the ability to understand prompts is improved.

Prompt: Duotone portrait of a woman

Because SD-XL also belongs to the v2 family, its text model is larger and it understands prompts better than the v1 models.

In the following example, the v1.5 model fails to capture both subjects in the image (robot and human), while SD-XL produces a sensible image (although the robot is still not big enough).

Prompt: Big robot friend sitting next to a human, ghost in the shell style, anime wallpaper

Prompt: A young man, highlights in hair, brown eyes, in white shirt and blue jean on a beach with a volcano in background

Artistic style

SD-XL shows no significant improvement in artistic style; it and the previous versions each have their own strengths.

For example, the two models render Edward Hopper-style images from different perspectives.

Prompt: New York city by Edward Hopper

In Leonid Afremov's style, SD-v1.5 is more accurate, while SD-XL lacks his unmistakable colorful, broad brushstrokes.

Prompt: New York city by Leonid Afremov

In the William-Adolphe Bouguereau style, both SD-v1.5 and SD-XL generate broadly similar content, but SD-XL is closer to Bouguereau's classic academic paintings and shows more facial detail.

Prompt: Portrait of beautiful woman by William-Adolphe Bouguereau

Style change problems

After adding some seemingly unimportant keywords, the style of the generated image may change abruptly.

For example, the following prompt produces a photo-style image.

Prompt: A young man, highlights in hair, brown eyes, in white shirt and blue jean on a beach with a volcano in background

After adding a yellow scarf, the image style becomes cartoon style.

Prompt: A young man, highlights in hair, brown eyes, wearing a yellow scarf, in white shirt and blue jean on a beach with a volcano in background

This failure may stem from the preview version, and it is not known whether it will be resolved after the official release.
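When a style flips unexpectedly like this, it helps to pin down exactly which keyword changed between two prompts. The following is a minimal sketch that assumes prompts are comma-separated keyword lists, as in the examples above; the helper names are hypothetical.

```python
from typing import List


def split_keywords(prompt: str) -> List[str]:
    """Split a comma-separated prompt into trimmed, non-empty keywords."""
    return [part.strip() for part in prompt.split(",") if part.strip()]


def added_keywords(before: str, after: str) -> List[str]:
    """Keywords present in `after` but not in `before`, in prompt order."""
    seen = set(split_keywords(before))
    return [kw for kw in split_keywords(after) if kw not in seen]


if __name__ == "__main__":
    photo_prompt = (
        "A young man, highlights in hair, brown eyes, "
        "in white shirt and blue jean on a beach with a volcano in background"
    )
    cartoon_prompt = (
        "A young man, highlights in hair, brown eyes, wearing a yellow scarf, "
        "in white shirt and blue jean on a beach with a volcano in background"
    )
    print(added_keywords(photo_prompt, cartoon_prompt))
```

For the two prompts in this section the only difference is the scarf keyword, which makes it the natural suspect for the switch from photo style to cartoon style.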

Reference:

https://clipdrop.co/stable-diffusion

This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era)
