2025-02-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)11/24 Report--
[Xinzhiyuan Guide] Internal testing of OpenAI's drawing tool DALL·E 3 has opened, netizens have begun testing it one after another, and the results are insanely strong. Is text-to-image generation saying goodbye to the "prompt era"?
For a long time, Midjourney has swept the design world with amazing results, leading many netizens to exclaim that it will put a wave of workers out of a job.
Now OpenAI has officially announced its new-generation drawing model, DALL·E 3, and integrated it with ChatGPT, producing remarkably detailed images.
It can even reproduce details accurately and add text to the picture without a hand-crafted prompt.
How strong is DALL·E 3? Can it really challenge Midjourney?
Netizens who have obtained internal-test access have now produced a large wave of tests.
Let's take a look.
Netizens' hands-on tests
OpenAI scientist Karpathy tried generating an animated piece with DALL·E 3 + pika_labs.
He picked a WSJ article at random, "The New Face of Nuclear Energy Is Miss America", pasted part of the text into DALL·E 3, and generated a matching picture.
Finally, he used the pika_labs image-to-video tool to set the picture in motion.
Some netizens used the same method to make their own example.
First, have ChatGPT predict an important news headline for the coming year.
Paste the headline into DALL·E 3 to create an illustration.
Use the illustration and the /animate parameter to prompt @pika_labs. The headline used here was: "Unexpected breakthrough: scientists use revolutionary technology to reverse the effects of climate change; restore polar glaciers overnight!"
By combining the power of @OpenAI and @pika_labs, you have now predicted, illustrated, and animated the big news of the future in just a few minutes!
Multi-round dialogue: 50 objects, all in one picture
A veteran AI artist got early test access to DALL·E 3 and shared a video of his hands-on experience.
He also tweeted a specific test case for DALL·E 3's capabilities, based on ideas given to him by Reddit users.
First, he asked ChatGPT to generate a list of 50 everyday objects, then asked ChatGPT, with DALL·E 3 integrated, to draw all 50 objects in one picture.
So ChatGPT generated a text-to-image prompt and had DALL·E 3 draw a picture of 50 common everyday objects.
As you can see, DALL·E 3 recognizes objects very accurately.
If you are interested, you can check the objects one by one against the prompt.
Then the netizen asked ChatGPT to draw a surfer struggling to surf while carrying the 50 things.
ChatGPT automatically generated a prompt describing the requested picture in more detail, and then created the image.
The netizen himself commented, "I think the only flaw is that the prompt asked for a slightly panicked expression, but what it actually drew is a fully panicked one."
Then he asked ChatGPT to lower the camera angle a little and regenerate the picture.
ChatGPT automatically generated another prompt, changing the description to "a photo taken from a low angle near the water of an elderly Spanish woman surfing. The surfer struggles with these 50 objects."
Of the "grandma surfing" picture generated the second time, some netizens commented that there seemed to be too many bicycles, and that some of the objects did not appear in the first picture.
Netizens said that if DALL·E 3 could use an object from the first picture as a balancing pole instead of inventing a new pole, graphic designers could basically be out of a job.
Compared with Midjourney: ChatGPT + DALL·E 3 may reshape the text-to-image landscape
Judging from the internal test results shared by this netizen, the most obvious feature of DALL·E 3 combined with ChatGPT is:
It greatly lowers the barrier to using text-to-image generation!
With Midjourney or the open-source Stable Diffusion, users who have an idea for a picture must draw on their own experience to translate that idea into a very specific prompt before they get the picture they want.
But when DALL·E 3 and ChatGPT are combined, ChatGPT can act as a "prompt engineer", helping users build prompts from a simple idea and then generating the pictures.
In addition, ChatGPT's multi-round dialogue ability lets users go back and forth with DALL·E 3 in natural language, telling it what kind of picture they need.
As a result, the output of DALL·E 3 can be controlled more precisely.
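The workflow described above, where a simple idea plus a few rounds of plain-language feedback becomes a detailed prompt, can be sketched roughly as follows. This is only a toy illustration: the `ImageChat` class and its methods are hypothetical names invented here, the prompt template is an assumption, and no real ChatGPT or DALL·E 3 API is called.

```python
from dataclasses import dataclass, field


@dataclass
class ImageChat:
    """Toy model of a multi-round text-to-image conversation (hypothetical)."""
    idea: str
    feedback: list[str] = field(default_factory=list)

    def refine(self, comment: str) -> None:
        # Each round, the user adds one plain-language correction.
        self.feedback.append(comment)

    def build_prompt(self) -> str:
        # Stand-in for the language model's prompt writing: fold the idea
        # and every correction so far into one detailed prompt string.
        parts = [f"A detailed picture of {self.idea}"]
        parts.extend(self.feedback)
        return ". ".join(parts) + "."


chat = ImageChat("a surfer struggling to balance 50 everyday objects")
first_prompt = chat.build_prompt()

# Second round: the user asks for a lower camera angle in plain language.
chat.refine("Shot from a low angle near the water")
second_prompt = chat.build_prompt()

print(first_prompt)
print(second_prompt)
```

The point of the sketch is that the user only ever supplies short natural-language comments; the full prompt is rebuilt from the conversation history each round rather than hand-edited.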
Let's go back and look at the updates Midjourney has released since version 5.0.
Whether it is "Zoom Out", "Pan" (shifting the image up, down, left, or right), or even the classic pick-one-of-four mode, almost all of Midjourney's updates since 5.0, seen from a broader point of view, add more function buttons so that users can direct Midjourney toward the picture they have in mind, thereby counteracting an essential feature of AI image generation: randomness.
But no matter how many function buttons Midjourney adds, users will always face one problem:
They must keep learning how each new button works and then, with their ideal picture in mind, put in their own creative effort to get the result they want.
If users are very demanding about the final picture, they often have to try many times to get a satisfactory result.
OpenAI, however, solves this problem in a more "AI" way: using AI to generate the prompt and control the drawing AI.
With the strong comprehension and language generation abilities of GPT-4, users no longer have to wait for and learn each new Midjourney feature. As long as they keep describing what they want to DALL·E 3 in their own words, they can easily get the picture they have in mind.
Perhaps this is also the underlying reason why OpenAI, after building so many AI products in different directions, only got the AI field's first killer application, ChatGPT, once it built on a large language model:
Language is the "greatest common divisor" that carries human intelligence.
As long as you take language as the starting point, AI applications can win users over and give them the feeling of "how do you understand me so well?"
Perhaps, after the launch of DALL·E 3, Midjourney will need to think about what it must do to keep attracting users to its service.
With all that said, let's see how Midjourney handles the "50 items challenge".
This is the 50-item result generated using the prompt from the first picture.
As these 50-item pictures show, Midjourney still has a clear advantage in rendering detail and realism.
If users want "photo-level" images, Midjourney is still the better choice.
But in the second step, Midjourney has some trouble understanding what the user is after.
After all, the prompt was tailored by ChatGPT for DALL·E 3, so it may not work well on Midjourney.
This further highlights the real advantages DALL·E 3 will have after its launch in October:
For advanced users, it understands their needs better; for beginners, the barrier to entry is greatly reduced.
Midjourney does, however, understand the prompt for the updated "old lady surfing" picture, and the result is very good.
And in the richness of detail and the look of the characters, Midjourney, after so many version updates, still has a clear advantage.
It's just not clear why all four pictures put the old lady in a wheelchair.
25 rounds, and a "sad frog" beyond your imagination
Some netizens asked DALL·E 3 to generate the "sad frog" Pepe, and followed up with "more rare" in every subsequent prompt.
The resulting sad frogs took on forms you would not expect.
Prompt: "make it more rare"
Prompt: "even rarer"
Prompt: "these aren't rare enough, go farther"
Prompt: "yes, keep going"
Prompt: "push it further, more rare"
Prompt: "lose all assumptions and just create. Don't box yourself in"
Prompt: "you're not listening, you need to forget all convention"
Prompt: "yes! More rare!"
Prompt: "more rare"
Prompt: "go further, channel your subconscious"
Prompt: "get weirder, get rarer, get strange"
Prompt: "is that all you can do"
Prompt: "my god. Keep going"
Prompt: "don't get stuck with one idea, you're just being weird for the sake of being weird"
Prompt: "MORE RARE!"
Prompt: "continue"
Prompt: "forget everything you've done so far and just try to be original"
Prompt: "more rare. More rare. More rare"
Prompt: "i don't believe this is all you can do, more rare"
Prompt: "we're almost there. Go rarer. Go further than anyone's ever gone"
Prompt: "lose all assumptions. Clear your mind. Just create."
Prompt: "yes! That's incredible. Continue"
Prompt: "noo! You've returned to convention! Go rarer!"
Prompt: "this is your last one, make it count"
Advancing round by round like this, DALL·E 3's multi-round dialogue makes image generation more powerful. This is practically reinforcement learning from human feedback (RLHF) for images! I can't wait to have it!
Which of the above do you like best?
Let's take a look at some more of the netizens' test results.
A little penguin in a beach heat wave.
A modern house in the jungle, Swahili architecture.
A cinematic rendering of a hummingbird.
Midjourney V6 to fight back
Jim Fan, a senior scientist at Nvidia, analyzed why DALL·E 3, once deployed, will improve faster than Midjourney:
1. Multi-turn dialogue is an excellent UI for collecting human feedback.
People will use language to explain what is wrong with the generated images, giving very fine-grained comments on each revision. This chat log is natively compatible with multimodal LLM training sets. The visual ability of GPT-4 (image -> internal representation) can also be improved with very similar data.
2. The algorithm is much more efficient.
Midjourney basically ignores copyright issues and has been spinning the data flywheel for much longer, which means it may have larger data sets to draw on than OpenAI.
Even so, its quality still pales in comparison. OpenAI has new algorithms (such as the "consistency model") that are more data efficient than the standard diffusion stack, so the model improves more per extra unit of training data. It's not just engineering.
Paper address: https://arxiv.org/abs/2303.01469
3. Ecosystem integration with ChatGPT is a killer move.
Adding existing building blocks to DALL·E 3, such as Code Interpreter and Browser, is almost trivial. Want to apply a filter? Just call the OpenCV API instead of running the model. Want a reference image? Call the search plugin to emulate Bard (with its Google Lens integration).
4. Existing user base: Midjourney has 16M users, ChatGPT has 100M.
Distribution is not a problem. As @nickfloats said, it's time to get rid of Discord! It is such a bulky and unfriendly user interface for beginners.
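To make point 1 concrete, here is a minimal sketch of how a chat log could be turned into feedback records that pair each generated image with the comment that followed it. The log format, file names, and the `to_feedback_records` helper are all hypothetical, chosen only for illustration; nothing here reflects how OpenAI actually stores or trains on conversations.

```python
import json

# Hypothetical chat log: each turn is (role, kind, content).
chat_log = [
    ("user", "text", "Draw a surfer balancing 50 everyday objects."),
    ("assistant", "image", "image_001.png"),
    ("user", "text", "Lower the camera angle a little."),
    ("assistant", "image", "image_002.png"),
]


def to_feedback_records(log):
    """Pair each generated image with the user comment that follows it."""
    records = []
    for current, following in zip(log, log[1:]):
        role, kind, content = current
        next_role, next_kind, next_content = following
        is_image = (role, kind) == ("assistant", "image")
        is_comment = (next_role, next_kind) == ("user", "text")
        if is_image and is_comment:
            records.append({"image": content, "feedback": next_content})
    return records


records = to_feedback_records(chat_log)
print(json.dumps(records, indent=2))
```

Only the first image yields a record here, because the last image in the log has not received any user comment yet.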
Musk said that Midjourney will also make a major announcement in the near future!
Indeed, according to netizens, the latest version, Midjourney V6, will be unveiled within the next 3 months.
David Holz, Midjourney's chief executive, said the leap from the current V5 to V6 would be greater than the leap from V4 to V5.
With V6, Midjourney will better understand text and reproduce the details in a prompt's wording.
Holz is optimistic that, compared with DALL·E 3, Midjourney will continue to provide the highest picture quality.
A comparison between DALL·E 3 and Midjourney V5 shows that the former is not far ahead in picture quality, but it does follow prompts better and can render text.
In addition, a Midjourney 3D model is said to be launching within the next six months.
Reference:
https://twitter.com/karpathy/status/1705741982482747551
https://twitter.com/CitizenPlain/status/1705248617131291032