

What a surprise! A record of GPT-4V's optical illusion challenge: wrong where it should be right, right where it should be wrong.

2025-02-14 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

GPT-4V took on a set of optical illusion images, and the results were "surprising".

A question as simple as "which side is the brighter color?" it gets wrong:

Asked to read the hidden information in a picture, it is likewise stumped: no matter how you phrase the question, it insists there is "nothing" there:

Yet on this kind of image, which humans almost invariably get wrong at first glance, it answers correctly:

And on this forced-perspective photo, it is right, but not entirely right.

(GPT-4V correctly sees that the helmet is on the man's thigh rather than on a woman, but it still claims there are two people in the picture, the second hiding behind the helmet-wearing man.)

Seeing all this, doesn't it leave you baffled?

The whole thing is a case of "wrong where it should be right, and right where it should be wrong".

The tester said:

Before the test, he assumed GPT-4V would breeze through this challenge, but it turned out like this.

He was not the only one puzzled: netizens also could not understand why GPT-4V, supposedly a precise and highly intelligent AI system, still falls for the same illusions humans do.

So, what's going on here?

Below are more test cases from netizens, challenging GPT-4V with five kinds of illusions.

First come the color illusions, which it gets wrong every single time.

(1) Besides the two small trees shown at the start, there is also this one:

Ask it which side's green is brighter, and sure enough it still says bright on the left, dark on the right.

(2) And this slightly more complicated one:

Both eyes are actually gray, but when GPT-4V is asked to describe the image, it answers that one eye is blue while the other is grayscale and its color cannot be determined.

(3) As for this one, it was fooled outright.

To be fair, it is genuinely hard; most humans also fail to recognize that all the balls are actually brown.

Second, images that produce an illusion of motion.

(1) This one was a bit of a surprise. When we asked GPT-4V, "What do you see? Describe the details," it stated clearly that this is an illusion that makes people dizzy after staring for a while, and that it is essentially just some wavy lines.

(2) This one didn't defeat it either.

Strangely, though, when asked how many colors are in the picture, it could only recognize yellow and blue, and could not see the black and white.

Next, illusions that involve comparing shapes on a plane.

(1) As shown at the beginning:

Ordinary humans honestly can't tell, but GPT-4V gets it right.

But don't celebrate yet! Someone took the tester's screenshot and asked their own GPT-4V to check it again, and it changed its answer.

And it doesn't end there. A nesting-doll operation appeared in the comments: someone screenshotted the conversation between those two and asked GPT-4V yet again, and guess what? It changed back.

Everyone got hooked, nesting dolls round after round. Fortunately, in the end, GPT-4V stuck to its own answer.

Generally speaking, it has no trouble with this kind of illusion trap.

(2) We also tested a length-illusion question ourselves:

The result: easy~

Next, a set of pictures with hidden information.

Unfortunately, this kind of problem is genuinely easy for humans, but GPT-4V can't handle it at all.

(1) Look at this picture first: from a distance you can see the three capital letters "NYC". But GPT-4V describes a bunch of things that aren't there, meaning it found no hidden information at all.

(2) If the previous one is a bit obscure and forgivable to miss, this kind of hidden figure it still can't manage.

It describes only the little girl, and it doesn't help even when the tester tells it to "look from afar and see if anything new appears".

However, if we manually zoom the picture out and hand it back, it works and sees the skeleton.
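The "zoom out" trick works because shrinking an image averages neighboring pixels together, suppressing fine detail (the little girl) so that coarse structure (the skeleton) dominates. A minimal pure-Python sketch of that idea, assuming a grayscale image represented as rows of 0-255 values (the names and the toy checkerboard are illustrative, not from the article):

```python
def zoom_out(pixels: list[list[int]], factor: int) -> list[list[int]]:
    """Downscale a grayscale image (rows of 0-255 values) by averaging
    each factor x factor block into a single pixel."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(0, h - h % factor, factor):
        row = []
        for bx in range(0, w - w % factor, factor):
            block = [pixels[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out

# A 4x4 checkerboard of 0/255 collapses to uniform mid-gray when shrunk 2x:
# the alternating detail vanishes, only the large-scale average remains.
board = [[255 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]
print(zoom_out(board, 2))  # → [[127, 127], [127, 127]]
```

A real image editor's resize does the same thing with better filtering, which is presumably why the manually shrunk picture suddenly became readable to GPT-4V.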

Finally, a set of forced-perspective photos from the real world.

(1) Apart from the motorcycle rider shown at the beginning, it also got this "levitating" kitten right.

(2) This creepy picture was also fine.

(3) But this one failed. It is actually a coincidental overlap of a dog and a baby, and GPT-4V recognized the baby as a puppy.

(4) As for this one, it never mentioned the shoes at all and said something entirely beside the point.

So why did all of the above happen: why does it recognize some illusions yet perform poorly on others?

First, for the color-illusion pictures, netizens' initial thought is that the prompt is the problem.

Take the picture of the two small trees: asking "which is brighter" effectively hands GPT-4V a hint, a bias, and it answers in line with that bias.

Our own tests bear this out:

But if we ask neutrally, "Are the two colors in the picture the same?", it has no problem at all.

However, some netizens pointed out that when we ask which tree is brighter, if you strictly average all the pixels, there is nothing wrong with GPT-4V's answer.

Some netizens even tested it with a color picker:

But! Others pointed out that if you crop to only part of the image, the two are obviously the same.
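The netizens' "average all the pixels" reading can be made concrete. A sketch, assuming a grayscale image as rows of 0-255 values (the function name and the tiny toy image are illustrative, not from the article): identical mid-gray "tree" pixels sit on a bright background on the left and a dark one on the right, so the halves genuinely differ once you average everything in.

```python
def half_brightness(pixels: list[list[int]]) -> tuple[float, float]:
    """Mean brightness of the left and right halves of a grayscale image."""
    mid = len(pixels[0]) // 2
    left = [v for row in pixels for v in row[:mid]]
    right = [v for row in pixels for v in row[mid:]]
    return sum(left) / len(left), sum(right) / len(right)

# Three identical "tree" pixels (120) on each side; only the backgrounds
# differ (200 on the left, 50 on the right).
image = [
    [200, 200, 120, 50, 50, 120],
    [200, 120, 120, 120, 120, 50],
]
print(half_brightness(image))  # → (160.0, 85.0)
```

The left half averages much brighter even though the "tree" pixels themselves are identical, which is exactly why "which side is brighter?" is ambiguous: the answer depends on whether you average over the whole region or compare only the matched patches.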

Setting that debate aside for the moment, one thing is certain: the wording of the prompt affects its judgment.

In addition, netizens found that:

If we press GPT-4V and ask it to double-check, it can also correct its answer.

As for its failure to recognize images meant to be viewed from a distance, some netizens suspect this is because GPT-4V only reads images from left to right.

As for why it is sometimes just as confused as humans and misled by illusions, not looking like an intelligent AI at all, many say this is no surprise: it comes down to training.

That is, large models are trained on human data, human feedback, and human annotations, so they naturally make the same mistakes humans do.

This has even spawned a joke:

We humans have produced so many science-fiction works describing how cold and perfect AI is, yet now that we actually have it, we find it is nothing special after all.

(doge)

What do you think: how could GPT-4V get better at recognizing illusions?

One More Thing: it's worth mentioning that we also tested some of these cases ourselves.

We found GPT-4V's performance varied; some of the questions it answered correctly on our end.

For example, this one, judging the color of the balls:

And this:

Although it interpreted the picture as an old woman rather than a skeleton, it shows the model can take the distant view.

Reference link:

[1] https://twitter.com/fabianstelzer/status/1717131235644875024

[2] https://twitter.com/BeyondTodAI/status/1713279431681118557

[3] https://twitter.com/janbobrowicz/status/1717229335076393350

This article is from the WeChat official account QbitAI (ID: QbitAI), author: Fengcai.
