Google's new AI model tackles the two hard problems of AI outfit-swapping: preserving garment details and changing poses at will. Impulse shopping may be about to get even easier!
Google has made one-click outfit changes a reality!
With this AI fitting model, TryOnDiffusion, you only need to provide a full-length photo of yourself and a picture of a clothing model, and you can see what you would look like wearing that garment.
The emphasis is on realism. It's essentially a real-life version of Miracle Nikki, isn't it?
Of course, AI outfit-swapping tools have been around for a long time, so what is the breakthrough in Google's model?
Project address: https://tryondiffusion.github.io/
The key is that they propose a diffusion-based framework that unifies two UNets (Parallel-UNet).
Previously, the key challenge for such models was how to preserve garment details while deforming the clothing to fit the pose and body shape of different subjects.
Earlier methods could not do both at once: either they preserved garment details but could not handle pose and shape changes, or they could change the pose but lost the garment details.
Because TryOnDiffusion unifies the two UNets, it can preserve garment details and apply substantial pose and body changes to the clothing within a single network.
As you can see, the deformation of the clothes on the subjects looks extremely natural, and the garment details are reproduced very faithfully.
Without further ado, let's take a look at how powerful Google's AI try-on is.
Using AI to generate try-on images
Specifically, Virtual Try-On (VTO) can show customers how clothes look on real models of different sizes and body shapes.
Trying on clothes virtually involves many subtle but crucial details, such as how garments drape, fold, cling, stretch, and wrinkle.
Earlier techniques, such as geometric warping, cut and paste the clothing image and then deform it to fit the contours of the body.
But with these techniques it is hard to make the clothes adapt to the body properly, and visual defects appear, such as misplaced folds, which make the clothes look deformed and unnatural.
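To make the older cut-and-warp idea concrete, here is a minimal sketch using OpenCV. The synthetic images and hand-picked landmark coordinates are hypothetical placeholders; real systems use denser correspondences, but the basic fit-a-transform-and-paste pattern is the same.

```python
# Minimal sketch of the older "cut, paste, and warp" approach (not Google's method).
# Synthetic placeholder images stand in for a garment photo and a full-body photo.
import cv2
import numpy as np

garment = np.full((400, 300, 3), 200, dtype=np.uint8)   # hypothetical garment crop
mask = np.full((400, 300), 255, dtype=np.uint8)         # garment alpha mask
person = np.full((800, 600, 3), 80, dtype=np.uint8)     # hypothetical body photo

# Three corresponding landmarks (e.g. shoulders and waist), picked by hand here.
src_pts = np.float32([[40, 30], [260, 30], [150, 380]])    # on the garment image
dst_pts = np.float32([[210, 240], [430, 250], [320, 620]]) # on the person image

# Fit a single affine transform and warp the garment toward the body contour.
M = cv2.getAffineTransform(src_pts, dst_pts)
h, w = person.shape[:2]
warped = cv2.warpAffine(garment, M, (w, h))
warped_mask = cv2.warpAffine(mask, M, (w, h))

# Paste the warped garment over the person wherever the mask survives.
result = person.copy()
result[warped_mask > 127] = warped[warped_mask > 127]
cv2.imwrite("naive_tryon.png", result)
```

A single global transform like this cannot model draping, folds, or occlusion, which is exactly why the misplaced-fold artifacts described above appear.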
So the Google researchers set out to generate every pixel of the clothing from scratch in order to produce high-quality, lifelike images.
The technology they adopted is a new Diffusion-based AI model, TryOnDiffusion.
Diffusion works by gradually adding extra pixels (or "noise") to an image until it becomes unrecognizable, and then removing the noise entirely until the original image is reconstructed in perfect quality.
Text-to-image models such as Imagen combine diffusion with text from a large language model (LLM), which lets them generate realistic images from input text alone.
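As a very rough illustration of the noise-then-denoise idea above, the sketch below adds Gaussian noise to an image over many steps and then reverses the process with a placeholder network. The linear schedule, step count, and untrained `denoiser` are illustrative assumptions, not Imagen's or TryOnDiffusion's actual components.

```python
# Toy sketch of the diffusion idea: noise an image step by step, then denoise it.
# The linear beta schedule and the untrained `denoiser` are illustrative only.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0) # cumulative signal fraction

def add_noise(x0, t):
    """Forward process: blend the clean image with Gaussian noise at step t."""
    noise = torch.randn_like(x0)
    return alphas_bar[t].sqrt() * x0 + (1.0 - alphas_bar[t]).sqrt() * noise, noise

# A placeholder denoiser; a real model would be a UNet trained to predict the noise.
denoiser = nn.Conv2d(3, 3, kernel_size=3, padding=1)

x0 = torch.rand(1, 3, 64, 64)                  # stand-in "image"
x, _ = add_noise(x0, t=T - 1)                  # almost pure noise

# Reverse process (greatly simplified): repeatedly estimate and strip the noise.
for t in reversed(range(T)):
    pred_noise = denoiser(x)
    a_bar = alphas_bar[t]
    x0_est = (x - (1 - a_bar).sqrt() * pred_noise) / a_bar.sqrt()
    x = x0_est if t == 0 else add_noise(x0_est, t - 1)[0]
```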
In TryOnDiffusion, instead of text, a pair of images is used: a picture of a garment (or of a model wearing it) and a picture of a person.
Each image is sent to its own neural network (a U-Net), and the two networks share information with each other through a process called cross attention, producing a realistic image of the person wearing the garment.
This combination of image-based diffusion and cross attention forms the core of the AI model.
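A minimal sketch of that cross-attention fusion, assuming PyTorch and made-up feature shapes: person features act as queries, garment features as keys and values. This is the general pattern the article describes, not Google's exact implementation.

```python
# Sketch of cross attention between two UNet feature streams (person and garment).
# Channel counts and spatial sizes are illustrative; this is not Google's code.
import torch
import torch.nn as nn

class CrossAttentionFuse(nn.Module):
    def __init__(self, channels: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, person_feat, garment_feat):
        # person_feat, garment_feat: (batch, channels, H, W) feature maps.
        b, c, h, w = person_feat.shape
        q = person_feat.flatten(2).transpose(1, 2)    # (b, H*W, c) queries
        kv = garment_feat.flatten(2).transpose(1, 2)  # (b, H'*W', c) keys/values
        fused, _ = self.attn(self.norm(q), kv, kv)    # garment info flows into person
        fused = fused.transpose(1, 2).reshape(b, c, h, w)
        return person_feat + fused                    # residual connection

person_feat = torch.randn(2, 128, 16, 16)   # from the person branch
garment_feat = torch.randn(2, 128, 16, 16)  # from the garment branch
print(CrossAttentionFuse(128)(person_feat, garment_feat).shape)  # (2, 128, 16, 16)
```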
The VTO feature lets users render and view how a top looks on a model who matches their figure.
Training on massive, high-quality data
Google trained the AI model extensively to make the VTO feature as realistic as possible and genuinely helpful for choosing clothes.
However, it was not trained with a large language model; instead, Google used its Shopping Graph.
This data set contains the world's most comprehensive and up-to-date data on products, sellers, brands, reviews and inventory.
Google trained the model on many image pairs, each consisting of the same person wearing the same garment in two different poses.
For example, an image of a person in a shirt standing sideways, and another of them facing forward.
Google's specialized diffusion model feeds the images into their own neural networks (U-Nets) to generate the output: a realistic image of the person wearing the garment. From this pair of training images, the model learns to match the shape of the shirt in the sideways pose to the image in the forward-facing pose,
and vice versa, until it can generate realistic images of that person in the shirt from every angle.
To improve results further, Google repeated the process with millions of random image pairs covering different garments and people.
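A hedged sketch of how such pose-pair training data might be organized: each sample is two photos of the same person in the same garment, with one pose used as the garment source and the other as the target. The directory layout and filenames are assumptions made for illustration only.

```python
# Illustrative pairing of training images: same person, same garment, two poses.
# The layout "pairs/<person_id>/pose_a.jpg" and "pose_b.jpg" is a made-up convention.
import random
from pathlib import Path
from typing import List, Tuple

def build_pose_pairs(root: str) -> List[Tuple[Path, Path]]:
    """Return (garment_source, target) photo pairs; each direction is used once."""
    pairs: List[Tuple[Path, Path]] = []
    root_path = Path(root)
    if not root_path.exists():
        return pairs
    for person_dir in sorted(root_path.iterdir()):
        a, b = person_dir / "pose_a.jpg", person_dir / "pose_b.jpg"
        if a.exists() and b.exists():
            pairs.append((a, b))  # learn to render the garment from A onto pose B
            pairs.append((b, a))  # and vice versa
    random.shuffle(pairs)
    return pairs

if __name__ == "__main__":
    for src, tgt in build_pose_pairs("pairs")[:5]:
        print(f"garment from {src} -> rendered on the pose in {tgt}")
```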
The result is the effect shown in the images at the beginning of this article.
In short, TryOnDiffusion preserves the garment details while adapting them to the figure and pose of the new model. Google's technique achieves both at once, and the effect is remarkably lifelike.
Technical details
Given one image showing a person's body and another showing a different person wearing a certain garment, TryOnDiffusion aims to generate a visualization of how that garment might look on the first person.
The key difficulty is to preserve lifelike garment detail while deforming the garment to accommodate the changes in pose and body shape between the two subjects.
Previous methods either focused on preserving garment details but could not handle changes in pose and body shape effectively, or allowed the try-on to be rendered at the desired size and pose but lost the garment details.
Google proposes a diffusion-based architecture that combines two UNets (called Parallel-UNet), which lets it preserve garment details and apply significant pose and body changes to the try-on result within a single network.
The key ideas of Parallel-UNet include:
1) the garment is warped implicitly via a cross-attention mechanism;
2) garment warping and person blending happen as a unified process rather than as a sequence of two separate tasks.
Experimental results show that TryOnDiffusion achieves state-of-the-art performance both qualitatively and quantitatively.
The specific implementation is shown in the following figure.
In the preprocessing step, the target person is segmented out of the person image to create a "clothing-agnostic RGB" image, the target garment is segmented out of the garment image, and poses are computed for both the person and garment images.
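A sketch of what that preprocessing might look like in code. The `segment_person`, `segment_garment`, and `estimate_pose` calls are hypothetical helpers standing in for whatever models are actually used; only the shape of the pipeline is taken from the description above.

```python
# Sketch of the preprocessing step described above. `segment_person`,
# `segment_garment`, and `estimate_pose` are hypothetical helper callables.
import numpy as np

def make_clothing_agnostic_rgb(person_img: np.ndarray,
                               clothing_mask: np.ndarray) -> np.ndarray:
    """Blank out the clothed region so only body shape and identity cues remain."""
    out = person_img.copy()
    out[clothing_mask > 0] = 128  # neutral grey where the original clothes were
    return out

def preprocess(person_img, garment_img, segment_person, segment_garment, estimate_pose):
    person_mask, clothing_mask = segment_person(person_img)   # person and their clothes
    garment_mask = segment_garment(garment_img)               # isolate the target garment
    return {
        "clothing_agnostic_rgb": make_clothing_agnostic_rgb(person_img, clothing_mask),
        "segmented_garment": garment_img * (garment_mask[..., None] > 0),
        "person_pose": estimate_pose(person_img),              # e.g. 2D keypoints
        "garment_pose": estimate_pose(garment_img),
    }
```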
These inputs are fed into the 128 × 128 Parallel-UNet (the key step) to create a 128 × 128 try-on image, which is then passed, together with the try-on conditioning inputs, to the 256 × 256 Parallel-UNet.
The output of the 256 × 256 Parallel-UNet is then sent to a standard super-resolution diffusion model to create the final 1024 × 1024 image.
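Put together, the cascade described in the last two paragraphs could be sketched as below; the three stage callables are placeholders for the 128 × 128 Parallel-UNet, the 256 × 256 Parallel-UNet, and the super-resolution diffusion model.

```python
# Sketch of the three-stage cascade: 128x128 -> 256x256 -> 1024x1024.
# The stage callables are placeholders, not real trained models.
import torch

def run_cascade(conditioning: dict,
                parallel_unet_128, parallel_unet_256, super_res_1024) -> torch.Tensor:
    # Stage 1: the key step, a 128x128 try-on image from noise plus conditioning.
    tryon_128 = parallel_unet_128(torch.randn(1, 3, 128, 128), conditioning)

    # Stage 2: refine at 256x256, conditioned on the try-on inputs and stage 1 output.
    tryon_256 = parallel_unet_256(torch.randn(1, 3, 256, 256), conditioning, tryon_128)

    # Stage 3: standard super-resolution diffusion up to 1024x1024.
    return super_res_1024(tryon_256)
```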
The 128 × 128 Parallel-UNet, the most important component of the pipeline above, is structured and processed as shown in the following figure.
The clothing-agnostic RGB image and the noisy image are fed into the person-UNet at the top.
Since both inputs are pixel-aligned, they are concatenated directly along the channel dimension at the start of UNet processing.
The segmented garment image is fed into the garment-UNet at the bottom.
Garment features are fused into the target image via cross attention.
To save model parameters, the Google researchers stop the garment-UNet early, after the 32 × 32 upsampling block, by which point the final cross-attention module in the person-UNet has already been run.
The person pose and the garment pose are first passed through a linear layer to compute their respective pose embeddings.
The pose embeddings are then fused into the person-UNet through an attention mechanism.
In addition, they are used to modulate the features of both UNets at all scales via FiLM.
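A small sketch of the pose-conditioning path just described: pose keypoints pass through a linear layer to form an embedding, which then drives FiLM-style modulation (a per-channel scale and shift) of a UNet feature map. The keypoint count and channel sizes are illustrative assumptions.

```python
# Sketch of pose conditioning: linear pose embedding, then FiLM modulation
# (per-channel scale and shift) of a UNet feature map. Sizes are illustrative.
import torch
import torch.nn as nn

class PoseFiLM(nn.Module):
    def __init__(self, num_keypoints: int = 17, channels: int = 128):
        super().__init__()
        self.embed = nn.Linear(num_keypoints * 2, channels)      # pose -> embedding
        self.to_scale_shift = nn.Linear(channels, 2 * channels)  # FiLM parameters

    def forward(self, features: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        # features: (batch, channels, H, W); pose: (batch, num_keypoints, 2)
        emb = torch.relu(self.embed(pose.flatten(1)))
        scale, shift = self.to_scale_shift(emb).chunk(2, dim=-1)
        return features * (1 + scale[:, :, None, None]) + shift[:, :, None, None]

feat = torch.randn(2, 128, 16, 16)
pose = torch.rand(2, 17, 2)          # e.g. 17 2D keypoints per person
print(PoseFiLM()(feat, pose).shape)  # torch.Size([2, 128, 16, 16])
```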
Comparison with mainstream technology
User study: for each set of input images, 15 ordinary users picked whichever of four candidate techniques they thought was best, or chose "indistinguishable". TryOnDiffusion significantly outperformed the other techniques. The figure below shows, from left to right: input, TryOnGAN, SDAFN, HR-VITON, and Google's method.
Limitations
That said, TryOnDiffusion has some limitations.
First, if the preprocessing produces errors in the segmentation maps or pose estimation, the method may show artifacts such as garment leakage.
Fortunately, accuracy in these areas has improved greatly in recent years, so this does not happen very often.
Second, the clothing-agnostic RGB representation of the body is not ideal, because it sometimes preserves only part of the person's identity.
For example, tattoos and some muscle structure will not be visible in such cases.
Third, the training and test data sets mostly have clean, uniform backgrounds, so it is unclear how the method performs in more complex scenes.
Fourth, the work does not guarantee that the garment actually fits the person; it focuses only on the visual effect of the try-on.
Finally, this study focuses on upper-body clothing; Google has not yet tested full-body try-on and plans to study it in future work.
Reference:
https://blog.google/products/shopping/virtual-try-on-google-generative-ai/?continueFlag=3bff5717caf44179385521a75a571d04
This article comes from the WeChat official account: Xin Zhiyuan (ID: AI_era)