AI visual anagrams go viral! Monroe rotated 180 degrees becomes Einstein. Nvidia senior AI scientist: the coolest diffusion model in recent times.

2025-01-30 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)12/24 Report--

Marilyn Monroe, as painted by AI, turns into Einstein when rotated 180 degrees.

These are the diffusion-model illusion images that have recently gone viral on social media. Give the AI two different prompts, and it can draw them both into a single picture!

It even works for completely different subjects: a man, after color inversion, magically becomes a woman:

Even words can be flipped into new meanings: "Happy" and "Holiday" are just one rotation apart:

It turns out this is a new "visual anagrams" study from the University of Michigan. As soon as the paper was published, it went viral on Hacker News, where its score soared to nearly 800.

Jim Fan, senior AI scientist at Nvidia, was full of praise:

This is the coolest diffusion model I've seen recently!

One netizen sighed:

This reminds me of my time working on fractal compression. I always thought of it as pure art.

Mind you, creating a painting that reveals a new subject after rotation, color inversion, or deformation requires the painter to have a solid grasp of color, shape, and space.

Now even AI can pull off this effect. How on earth is it achieved? And is it really as good as it looks?

We tried it ourselves and explored the principle behind it.

It can be tried directly in Colab. We used the model to draw a set of low-poly-style paintings that look like a mountain from one angle and a city skyline from another.

For comparison, we asked ChatGPT (DALL·E 3) to attempt the same task; apart from higher resolution, it didn't seem to have any advantage.

The examples shown by the authors themselves are richer and more striking.

A snow-covered mountain rotates 90 degrees into a horse; a dining table becomes a waterfall from a different angle.

The best is the picture below: the content is different from every angle, whether viewed from the top, bottom, left, or right.

(A test for readers: can you tell what these four animals are?)

Starting from the rabbit, each 90-degree counterclockwise rotation reveals a bird, a giraffe, and then a teddy bear.

The following two pictures don't show new content in all four directions, but they still work in three of them.

Besides rotation, the model can also cut an image into jigsaw pieces and reassemble them into new content, or even shuffle it all the way down to the pixel level.
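
As a quick illustration of the pixel-level shuffling idea, here is a hedged sketch (our own code, not the authors'; the permutation is a random one of our choosing):

```python
import numpy as np

def make_permutation_view(shape, seed=0):
    """Build a random pixel permutation and its inverse.

    A sketch of a 'jigsaw'/pixel-shuffle view: rearranging pixels is
    invertible, so the image can always be mapped back exactly.
    """
    n = shape[0] * shape[1]
    perm = np.random.default_rng(seed).permutation(n)
    inv = np.argsort(perm)  # inverse permutation

    def apply(img):
        return img.reshape(-1)[perm].reshape(shape)

    def invert(img):
        return img.reshape(-1)[inv].reshape(shape)

    return apply, invert
```

Applying `invert` after `apply` recovers the original image pixel for pixel, which is exactly the property the illusion method needs from a view.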

The style is endlessly variable too: watercolor, oil painting, ink wash, line art. You name it.

So where can I play with this model?

To let more netizens try the new toy, the author prepared a Colab notebook.

However, the free-tier T4 GPU in Colab can barely handle it, and even a V100 occasionally runs out of memory; only an A100 runs it stably.

The author himself said that if anyone finds a way to run it on the free tier, please let him know immediately.

Getting down to business: the first line of code asks us to fill in a Hugging Face access token (with read permission).

You also need to accept the user agreement on DeepFloyd's project page before continuing.

Once the preparation is complete, run the three code sections in turn to finish deploying the environment.

Note that the author has not yet built a graphical interface for the model, so choosing an effect and editing the prompts both require modifying the code by hand.

The author put three effects in the notebook: uncomment whichever one you want (remove the # in front of that line) and delete or comment out the unused ones (add the # back).

These three are not the only options; if you edit the code yourself, the following effects are supported:

After making that change, run the line of code, then edit the prompts in the same way:

Once that is modified and run, you reach the generation step, where you can also adjust the number of inference steps and the guidance scale.

Note that you must run the image_64 function first to generate a small image, and then use the next cell to upscale it into a large one; otherwise an error will be raised.
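
The ordering constraint can be sketched like this (the names `image_64` and `image_1024` stand in for the notebook's cells; this is our illustration, not the notebook's actual code):

```python
def run_pipeline(image_64, image_1024, prompts, views):
    """Run the two stages in the required order (sketch).

    `image_64` and `image_1024` are placeholders for the notebook's
    generation functions: the 64x64 base image must exist before the
    upscaling stage can consume it.
    """
    small = image_64(prompts, views)          # stage 1: 64x64 base image
    return image_1024(small, prompts, views)  # stage 2: upscale the base
```

Running the upscaling cell without the small image is exactly the error case the notebook warns about.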

To sum up, our main impression after trying it is that this model is quite demanding about prompts.

The author is also aware of this and gives some tips:

△ Machine-translated, for reference only

So how does the research team achieve these effects?

First, let's look at the key idea behind generating these visual illusion images.

To make one image show different pictures under different prompts and viewing transformations, the authors adopt a "noise averaging" approach that combines the two views.

In short, the core of a diffusion model (DDPM) is to train a model to "break down and rebuild" images, generating a new image from a noise map:

So, to make the image match different prompts before and after the transformation, the denoising process of the diffusion model has to be changed.

Simply put, at each step the diffusion model estimates the noise for the original image and for the transformed image at the same time, and the two estimates are averaged into a new noise map.

The denoising then proceeds from this averaged noise map, so the final image shows the desired visual effect under both views.
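
The averaging step described above can be sketched as follows. This is a minimal illustration under our own naming, not the paper's implementation; `predict_noise` stands in for the diffusion model's noise estimator:

```python
import numpy as np

def averaged_noise_estimate(predict_noise, x_t, t, prompts, views):
    """Combine per-view noise estimates into one noise map (sketch).

    Each view is a pair (apply, invert) of mutually inverse transforms,
    e.g. identity and a 180-degree rotation; each view gets its own
    text prompt.
    """
    estimates = []
    for prompt, (apply, invert) in zip(prompts, views):
        eps = predict_noise(apply(x_t), t, prompt)  # estimate in view space
        estimates.append(invert(eps))               # align back to the canvas
    return np.mean(estimates, axis=0)               # averaged noise map
```

The averaged map then drives the usual denoising update, so both prompts shape the same set of pixels at every step.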

Of course, the transformations applied to the image must be orthogonal, which covers the rotations, deformations, pixel shuffles, and color inversions seen in the demos.
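
Why orthogonal? An orthogonal transform preserves the statistics of i.i.d. Gaussian noise, which the denoiser assumes at every step. A 90-degree rotation, for instance, only permutes pixels, so as a linear map on the flattened image it is a permutation matrix, one kind of orthogonal matrix (our illustration, not from the paper):

```python
import numpy as np

# Rotating only permutes pixel positions, so the Euclidean norm of the
# flattened image -- and the i.i.d. Gaussian character of noise -- is
# preserved exactly, as any orthogonal transform guarantees.
noise = np.random.default_rng(7).standard_normal((16, 16))
rotated = np.rot90(noise)

assert np.isclose(np.linalg.norm(noise), np.linalg.norm(rotated))
# Negation (the "color inversion" view) is orthogonal too:
assert np.isclose(np.linalg.norm(noise), np.linalg.norm(-noise))
```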

There are requirements on the choice of diffusion model as well.

Specifically, this paper uses DeepFloyd IF to achieve visual illusion image generation.

DeepFloyd IF is a pixel-based diffusion model: compared with other diffusion models, it operates directly in pixel space rather than in a latent space or some other intermediate representation.

This makes it better at handling local image information, especially when generating low-resolution images.

In this way, the image can finally show the effect of visual illusion.

To evaluate the method, the authors used GPT-3.5 to compile a dataset of 50 image-transformation pairs.

Specifically, they had GPT-3.5 randomly generate an image style (such as oil painting or street art) and two sets of prompts (say, an old man and a snowy mountain), then handed them to the model to generate a transformation painting.

This is the result of some random transformations:

They then compared image generation across different models on CIFAR-10:

Evaluated with CLIP, the results show that the transformed view matches its prompt about as well as the original view does:
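
A CLIP-style per-view score boils down to cosine similarity between each view's image embedding and its own prompt embedding. A minimal sketch, assuming unit-normalized embeddings from any encoder pair (our simplification, not the paper's exact metric):

```python
import numpy as np

def per_view_alignment(image_embs, prompt_embs):
    """Cosine similarity between each view and its prompt (sketch).

    `image_embs` and `prompt_embs` are lists of unit-norm vectors; for
    unit vectors the dot product equals the cosine similarity.
    """
    return [float(img @ txt) for img, txt in zip(image_embs, prompt_embs)]
```

A high score for every view means each orientation genuinely depicts its own prompt, not just the dominant one.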

The authors also tested how many pieces an image can withstand being cut into.

It turns out that from 8 × 8 all the way up to 64 × 64 pieces, the shuffled-and-reassembled images still look good:

Some netizens were "impressed" by this series of image transformations, especially the man-to-woman one:

I've seen it about 10 times.

Some netizens already want to turn it into a work of art and hang it on the wall, or display it on an e-ink screen:

But some professional photographers argue that these AI-generated images still fall short at this stage:

Look closely and the details don't hold up to scrutiny. A sharp eye can always spot the flaws, though the general public doesn't care.

So, what do you think of this series of visual illusion images generated by AI? Where else can it be used?

Reference link:

[1] https://news.ycombinator.com/item?id=38477259

[2] https://arxiv.org/pdf/2311.17919.pdf

[3] https://twitter.com/DrJimFan/status/1730253638935920738

This article is from the official WeChat account QbitAI (ID: QbitAI); author: Creasy Xiao Xiao.
