Ever since the author was unexpectedly struck by the level of today's AI painting (see the earlier article on a transcendent AI painting tool and its 234 armored future female warriors), it has seemed that the rapid progress of AI painting may have far exceeded everyone's expectations. The background here, including the history of AI painting and its recent breakthroughs, is worth organizing and sharing. Hence this article.
This article is divided into the following sections:
1. The attack of AI painting in 2022
2. The history of AI painting
3. Why has AI painting advanced by leaps and bounds?
4. The competition among top AI painting models
5. What does the breakthrough in AI painting mean for mankind?
The Attack of AI Painting in 2022
This year, AI painting tools that automatically generate pictures from text descriptions have sprung up one after another.
The first was Disco Diffusion.
Disco Diffusion is an AI image generator that became popular in early February this year. It can render a corresponding image from keywords describing a scene:
In April this year, the famous artificial intelligence team OpenAI released a new model, DALL-E 2, whose name mashes up the painter Dalí and Pixar's robot WALL-E. It, too, generates high-quality images from text descriptions.
Many readers, though, may have started paying special attention to AI painting because of the following news:
This is a digital painting generated by the AI painting service MidJourney; the user who generated it entered it in the art competition at the Colorado State Fair and won first prize. The story set off a huge debate online as soon as it came to light.
AI painting technology is still evolving rapidly, and its iteration speed can fairly be described as advancing by the day. Compare AI painting at the beginning of this year with today's, and the difference is enormous.
At the beginning of the year, Disco Diffusion could generate atmospheric scenes but could not yet render faces; just two months later, DALL-E 2 could produce accurate facial features; and now, the most powerful model, Stable Diffusion, has improved by an order of magnitude in both refinement and speed.
AI painting technology is not a recent invention, but since the beginning of this year the quality of AI's work has improved at a rate visible to the naked eye, while generation time has dropped from about an hour at the start of the year to a dozen seconds now.
What happened behind this change? Let's first review the full history of AI painting, and then walk through the breakthrough developments of the past year or so.
The History of AI Painting
The emergence of AI painting may be earlier than many people think.
As early as the 1970s, the artist Harold Cohen, a painter and professor at the University of California, San Diego, began creating a computer program called "AARON." Unlike today's AI painting systems that output digital works, AARON actually controlled a robotic arm to paint on canvas.
Harold kept improving AARON for decades, until his death. In the 1980s, AARON "mastered" drawing three-dimensional objects; in the 1990s, it could paint in multiple colors; and AARON is said to be still creating to this day.
However, AARON's code was never open-sourced, so the details of how it paints are unknown. One can guess that AARON merely encodes, in a complicated programmatic way, Harold's own understanding of painting. That is why, after decades of iteration, AARON could still only produce colorful abstract paintings, and precisely in Harold Cohen's own abstract color style: Harold spent decades programmatically guiding a robotic arm to present his understanding and expression of art on canvas.
(Left: AARON and Harold Cohen. Right: a work painted by AARON in 1992.)
Although it is hard to say how intelligent AARON really is, as the first program to paint automatically, and to actually paint on physical canvas, it has fairly earned the title of ancestor of AI painting.
In 2006, a computer painting product in the same spirit as AARON appeared: The Painting Fool. It could observe photos, extract color-block information from them, and create with real painting materials such as paint, pastel, and pencil.
The two examples above can be regarded as "classical" computer-automated painting: a bit like a toddler's drawing, passable in shape, but from the standpoint of intelligence quite elementary.
What we now call "AI painting" refers more specifically to computer programs that paint automatically based on deep learning models, and that line of development actually started relatively late.
In 2012, two famous AI researchers at Google, Andrew Ng and Jeff Dean, ran an unprecedented experiment: they connected 16,000 CPUs to train what was then the world's largest deep learning network, teaching a computer to draw cat faces. They used 10 million cat face images from YouTube, trained on those 16,000 CPUs for three days, and finally obtained a model that, excitingly, could produce a very blurry cat face.
By today's standards, this model's training efficiency and output are hardly worth mentioning. But for the AI research field of that time, it was a breakthrough attempt, and it formally opened the "brand-new" research direction of deep-learning-based AI painting.
A small technical digression here: just how hard is AI painting based on deep learning models? Why did days of training on a large computer cluster, already very advanced for 2012, produce only such pitiful results?
Readers may already have the basic concept: training a deep learning model is, simply put, the process of feeding it large amounts of labeled training data and repeatedly adjusting its internal parameters so that the model's outputs match the expected outputs for the given inputs.
Teaching AI to paint, then, means building training data from existing paintings, feeding it to the model, and iteratively adjusting the parameters.
How much information does a picture contain? To start with, it is length × width RGB pixels. The simplest starting point for computer painting is to obtain an AI model that outputs orderly combinations of pixels.
But not every combination of RGB pixels is a painting; most are just noise. A painting with rich texture and natural strokes involves many strokes, each with its own position, shape, color, and other parameters, so the space of parameter combinations is enormous; and the computational complexity of training a deep model increases sharply as the input parameter space grows. You can see why this is not simple.
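As a rough back-of-the-envelope sketch (the image size here is an illustrative assumption, not a figure from the article), you can count just how large the raw pixel space is:

```python
# How big is the space of raw images? A quick, illustrative count.
import math

width, height, channels = 512, 512, 3   # one modest RGB image
values_per_pixel = 256                  # 8-bit color per channel

dims = width * height * channels        # independent numbers per image
print(f"{dims:,} values per image")     # 786,432 values

# The number of distinct images is 256**786432; print its order of magnitude.
digits = int(dims * math.log10(values_per_pixel))
print(f"roughly 10^{digits:,} possible images")
```

Almost all of those combinations are pure noise; a painting model has to learn to land on the vanishingly small subset that reads as art.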
After Andrew Ng and Jeff Dean's groundbreaking cat-face model, AI scientists began to devote themselves to this new and challenging field. In 2014, AI academia proposed a very important deep learning model: the famous Generative Adversarial Network (GAN).
As the name "adversarial generation" suggests, the core idea of this model is to have two internal networks, a "generator" and a "discriminator," compete with each other to produce the result.
The GAN model became popular in AI academic circles as soon as it appeared and has been widely applied in many fields. It also became the basic framework of many AI painting models, with the generator producing pictures and the discriminator judging their quality. The emergence of GANs greatly advanced the development of AI painting.
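To make the generator-versus-discriminator idea concrete, here is a minimal PyTorch sketch of a GAN training loop (the tiny networks and the random stand-in for "real paintings" are illustrative assumptions, not anything from an actual painting model):

```python
# A minimal GAN training loop on toy data.
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 28 * 28  # e.g. flattened 28x28 grayscale images

# Generator: maps random noise to a fake image.
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh(),
)
# Discriminator: scores how "real" an image looks.
D = nn.Sequential(
    nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

for step in range(1000):
    real = torch.randn(32, img_dim)   # stand-in for a batch of real paintings
    fake = G(torch.randn(32, latent_dim))

    # 1) Train the discriminator to tell real from fake.
    d_loss = loss_fn(D(real), torch.ones(32, 1)) + \
             loss_fn(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train the generator to fool the discriminator.
    g_loss = loss_fn(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The two optimizers pull in opposite directions: the discriminator learns to separate real from fake, while the generator learns to erase that separation.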
However, using the basic GAN model for AI painting has obvious defects. On one hand, control over the output is very weak: the model easily produces random images, whereas an AI artist's output should be stable. On the other hand, the resolution of the generated images is low.
The resolution problem could be solved in time, but GAN has a deadlock at the very point of "creation," and it comes from its own core feature: under the basic GAN architecture, the discriminator judges whether a generated image belongs to the same category as the images it has already been shown, which means that, at best, the output is an imitation of existing work rather than an innovation.
Besides the adversarial GAN line, researchers also tried other kinds of deep learning models to teach AI to paint.
A famous example is Deep Dream, an image tool released by Google in 2015. Deep Dream produced a series of paintings that drew a lot of attention for a while; Google even organized an exhibition for these dreamlike works.
But, if we are being serious, Deep Dream is less an AI painter than an advanced AI filter; one look at the works above makes its filter-like style clear.
Compared with the somewhat gimmicky Deep Dream, a more solid effort from Google was a model trained in 2017 on thousands of hand-drawn sketches, after which the AI could draw some simple sketch figures of its own. (Google, "A Neural Representation of Sketch Drawings")
One reason this model drew attention is that Google open-sourced the relevant code, so third-party developers could build fun AI sketching applications on top of it. One online application, "Draw Together with a Neural Network," lets you scribble a few strokes and has the AI automatically complete the whole figure.
It is worth noting that in research on AI painting models, the major Internet companies have been the main force. Besides Google's work above, a more famous example is the model produced in 2017 by Facebook together with Rutgers University and the art history department of the College of Charleston: the Creative Adversarial Network (CAN).
(Facebook, "CAN: Creative Adversarial Networks, Generating 'Art' by Learning About Styles and Deviating from Style Norms")
As can be seen from the collection of works below, this Creative Adversarial Network tries to output images that look like an artist's work and that are unique, rather than imitations of existing artworks.
The creativity of the CAN model's output shocked developers and researchers at the time, because the works looked remarkably like the abstract paintings popular in the art world. So the researchers organized a Turing-test-style survey, asking viewers to guess whether the works were created by human artists or by an artificial intelligence.
As a result, 53% of viewers judged the CAN model's artworks to be made by humans, the first time in such a Turing test that the figure exceeded half.
But CAN's painting is limited to certain abstract expressions; in terms of artistic merit it is far from the level of human masters.
As for creating realistic or figurative paintings, that was simply out of the question.
In fact, even at the beginning of 2021, when OpenAI released the widely watched DALL-E system, its painting level was still mediocre. Below is DALL-E's attempt at drawing a fox, barely recognizable.
But it is worth noting that with DALL-E, AI gained an important new ability: following text prompts to create!
Next, let us return to the question raised at the beginning of this article. The author does not know whether every reader feels the same, but since the beginning of this year the level of AI painting has suddenly soared, a qualitative leap beyond earlier works.
As the saying goes, when something this unusual happens, there must be a reason. What on earth happened? Let's take it step by step.
Why Has AI Painting Advanced by Leaps and Bounds?
In many sci-fi films and shows there is a familiar scene: the protagonist speaks to a futuristic computer AI, and the AI generates a 3D image, presented before the protagonist via VR / AR / holography.
Setting aside the cool visual packaging, the core capability here is this: a human gives input in natural language, and the computer AI understands the human's meaning, generates a graphic image that satisfies the request, and displays it.
If you think about it, the most basic form of this ability is precisely AI painting. (Of course there is still some distance from flat painting to 3D generation, but compared with the difficulty of AI creating a meaningful painting out of thin air, automatically generating a corresponding 3D model from a 2D image is not a problem of the same order of magnitude.)
So whether it is voice control or the more mysterious brainwave control, what those cool sci-fi scenes actually depict is one AI capability: automatically turning a "language description" into an image, via AI understanding. Speech-to-text technology is already extremely mature, so the essence is the text-to-image process of AI painting.
Think about it: this is genuinely powerful. Relying only on a text description, with no reference picture at all, the AI understands and automatically draws the corresponding content, and it keeps painting better and better! Something that felt rather distant only yesterday has now truly appeared in front of everyone.
How on earth did all this happen?
First, the birth of a new model deserves mention. In January 2021, the aforementioned OpenAI team open-sourced a new deep learning model: CLIP (Contrastive Language-Image Pre-Training), one of the most advanced image classification AIs today.
CLIP trains AI to do two things at once: natural language understanding and computer vision analysis. It was designed as a general-purpose image classifier: CLIP can judge how well an image corresponds to a text prompt, for example matching an image of a cat to the word "cat."
To put it simply, CLIP's training uses labeled "text-image" pairs: one model encodes the text while another encodes the image, and the internal parameters of both are adjusted continuously until the text features and image features they output allow corresponding "text-image" pairs to be matched by a simple verification.
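Here is a hedged sketch of that contrastive objective (simplified; the 512-dimensional dummy features stand in for the outputs of real image and text encoders):

```python
# A simplified CLIP-style contrastive loss.
import torch
import torch.nn.functional as F

def clip_loss(image_features, text_features, temperature=0.07):
    # Normalize both feature sets onto the unit sphere.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Similarity of every image to every caption in the batch.
    logits = image_features @ text_features.T / temperature

    # The i-th image should match the i-th caption and nothing else.
    targets = torch.arange(len(logits))
    loss_i = F.cross_entropy(logits, targets)     # image -> text
    loss_t = F.cross_entropy(logits.T, targets)   # text -> image
    return (loss_i + loss_t) / 2

# Usage with dummy features from hypothetical encoders:
img = torch.randn(8, 512)   # image_encoder(images)
txt = torch.randn(8, 512)   # text_encoder(captions)
print(clip_loss(img, txt))
```

In a real run, `img` and `txt` come from the two encoders; minimizing this loss pulls matching pairs together and pushes mismatched pairs apart.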
The key point: people had tried to train "text-image" matching models before, but CLIP's biggest difference is that it scraped 400 million "text-image" pairs as training data! With that volume of data and a staggeringly expensive amount of training, the CLIP model finally came to fruition.
A smart reader will ask: who labeled so many "text-image" pairs? Four hundred million! If the text for each image had to be annotated by hand, the time and labor costs would be astronomical. And this is the cleverest part of CLIP: it uses pictures already widely available on the Internet!
Pictures on the Internet generally come with all kinds of text: titles, captions, even user tags, which naturally become usable training samples. In this particularly clever way, CLIP's training completely sidesteps the most expensive and time-consuming manual labeling; or rather, Internet users around the world had already done the labeling in advance.
CLIP is powerful, but at first glance it seems to have nothing to do with artistic creation.
Yet just a few days after CLIP was open-sourced, some machine learning engineers realized the model could be used for much more. Ryan Murdock, for example, figured out how to hook another AI up to CLIP to build an AI image generator. "After a few days of playing with it, I realized I could generate images," Murdock said in an interview.
In the end he chose BigGAN, a variant of the GAN model, and released the code as a Colab notebook, The Big Sleep.
(Note: Colab Notebook is a very convenient online Python notebook service provided by Google, backed by Google's cloud computing. Technically inclined users can edit and run Python scripts in a notebook-like web interface and see the output. Importantly, these notebooks can be shared.)
The pictures Big Sleep creates are, frankly, a bit weird and abstract, but it was a good start.
Subsequently the Twitter user @RiversHaveWings released a CLIP+VQGAN version and tutorial, which spread widely on Twitter and drew great attention from the AI research community and enthusiasts. The person behind that handle is now known as the data scientist Katherine Crowson.
Earlier, generation tools like VQGAN could synthesize new images resembling their training images after being trained on a large corpus. However, as mentioned above (if the reader still remembers), GAN-type models on their own cannot generate images from text prompts, nor are they good at creating genuinely new image content.
The idea of grafting CLIP onto a GAN to generate images is simple and clear:
Since CLIP can compute how well a set of image features matches any string of text, we need only link this matching check to the AI model responsible for generating images (here, VQGAN) and let the generator work backwards toward image features that pass the check. Won't the result be a work that matches the text description?
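Sketched in code, the loop might look like this (a hedged illustration: `generator` and `clip_image_encoder` are toy stand-ins for VQGAN and CLIP, and the real methods use more elaborate losses and augmentations):

```python
# CLIP-guided generation: optimize a latent code so the decoded image
# matches the text prompt according to CLIP.
import torch
import torch.nn.functional as F

def clip_guided_generate(generator, clip_image_encoder, text_features,
                         latent_dim=256, steps=300, lr=0.05):
    latent = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([latent], lr=lr)

    for _ in range(steps):
        image = generator(latent)                 # decode latent -> image
        img_feat = F.normalize(clip_image_encoder(image), dim=-1)
        txt_feat = F.normalize(text_features, dim=-1)

        # Maximize cosine similarity between image and prompt features.
        loss = -(img_feat * txt_feat).sum()
        opt.zero_grad(); loss.backward(); opt.step()

    return generator(latent).detach()

# Toy usage with stand-in modules:
gen = torch.nn.Linear(256, 3 * 64 * 64)   # toy "generator"
enc = torch.nn.Linear(3 * 64 * 64, 512)   # toy "image encoder"
txt = torch.randn(1, 512)                 # features of, say, "a fox in snow"
img = clip_guided_generate(gen, enc, txt)
```

The generator never sees the text; CLIP's matching score alone steers the search through the generator's latent space.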
Some consider CLIP+VQGAN the biggest innovation in AI art since Deep Dream in 2015. The wonderful thing is that CLIP+VQGAN was readily available to anyone who wanted it: following Katherine Crowson's online tutorial and Colab notebook, a user with a little technical knowledge could have the system running in minutes.
Interestingly, as mentioned in the previous chapter, at about the same time (early 2021) the OpenAI team that open-sourced CLIP also released its own image generation engine, DALL-E. DALL-E uses CLIP internally as well, but DALL-E was not open-sourced!
So in terms of community influence and contribution, DALL-E cannot compare with the open-source release of CLIP+VQGAN. Of course, open-sourcing CLIP was itself a great contribution from OpenAI to the community.
Speaking of open-source contributions, we have to mention LAION.
LAION, a global nonprofit machine learning research organization, released the largest open-source cross-modal database, LAION-5B, in March this year. It contains nearly 6 billion (5.85 billion) image-text pairs, usable both for training text-to-image generation models of every kind and for training CLIP-style models that score how well a text matches an image; the two are the twin cores of today's AI image generation models.
Beyond providing this vast training corpus, LAION also trained an AI to score the images in LAION-5B on artistic and visual quality, gathering the high-scoring images into a subset called LAION-Aesthetics.
In fact, the latest AI painting models, including the reigning champion Stable Diffusion, were trained on this high-quality dataset, LAION-Aesthetics.
CLIP+VQGAN set the trend for a new generation of AI image generation technology, and profiles of open-source TTI (Text-to-Image) models now routinely thank Katherine Crowson. She is a well-deserved pioneer of the new generation of AI painting models.
Technical players began to form communities around CLIP+VQGAN, with code refinements and Twitter accounts dedicated to collecting and publishing AI paintings. The earliest practitioner, Ryan Murdock, was in turn recruited by Adobe as a machine learning engineer.
However, the players in this wave of AI painting were still mainly AI technology enthusiasts.
Running CLIP+VQGAN on Colab notebooks has a relatively low threshold compared with deploying an AI development environment locally, but you still have to request a GPU in Colab, run the code, call the AI to output pictures, and cope with code errors from time to time. That is not something the general public, especially artists without a technical background, can manage, and it is exactly why foolproof paid AI creation services like MidJourney shine today.
But the exciting progress was far from over. Careful readers will have noticed that the powerful CLIP+VQGAN combination was released early last year and spread within a small circle, whereas the AI painting boom, as described at the beginning, was only triggered by the online service Disco Diffusion at the start of this year. That leaves a gap of more than half a year. What caused the delay?
One reason is that the image-generation half of CLIP+VQGAN, the GAN model, produced unsatisfying results.
So AI researchers turned to another way of generating images.
Recall how the GAN model works: the output image is the result of a competitive compromise between the internal generator and discriminator.
But there is another line of thinking: the Diffusion model.
The word "Diffusion" sounds rather classy, but the basic principle is something everyone can understand: it is, in essence, "denoising." Yes, the same automatic noise reduction familiar from phone photography (especially night-scene photography). If the denoising computation is repeated again and again, then in the extreme case, could a picture of pure noise be restored into a clear picture?
A human certainly cannot do it, and a simple denoising program cannot either, but denoising while "guessing" what should be there, based on what an AI model has learned, is feasible.
That is the basic idea of the Diffusion model.
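A minimal sketch of that reverse-denoising loop (hedged and heavily simplified: `noise_predictor` stands in for the trained denoising network, and real samplers such as DDPM use carefully derived step coefficients rather than the fixed constant here):

```python
# Diffusion sampling idea: repeatedly subtract predicted noise
# to walk from pure noise toward a clean image.
import torch

@torch.no_grad()
def sample(noise_predictor, steps=1000, shape=(1, 3, 64, 64)):
    x = torch.randn(shape)                 # start from pure noise
    for t in reversed(range(steps)):
        predicted_noise = noise_predictor(x, t)
        # Remove a small amount of the predicted noise at each step
        # (the fixed 0.01 is purely illustrative).
        x = x - 0.01 * predicted_noise
    return x

# Toy usage: a "predictor" that ignores the timestep.
net = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
image = sample(lambda x, t: net(x), steps=50)
print(image.shape)  # torch.Size([1, 3, 64, 64])
```

The training side is the mirror image: noise is added to real paintings, and the network learns to predict exactly which noise was added.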
Today, Diffusion models are increasingly influential in computer vision. They synthesize visual data efficiently, outperform GAN models outright at image generation, and show great potential in other fields such as video generation and audio synthesis.
Disco Diffusion, the AI painting product that first became known to the public at the beginning of this year, is the first practical AI painting product based on the CLIP + Diffusion combination.
However, Disco Diffusion still has obvious shortcomings. The professional artist Stijn Windig, for example, tried Disco Diffusion repeatedly and concluded that it cannot yet replace manual creation, for two core reasons:
1) Disco Diffusion cannot render fine detail. Its images are striking at first glance, but look closely and most of them are vague approximations that fall short of commercial-grade detail.
2) Disco Diffusion's initial render time is measured in hours, and adding detail on top of the rendered image amounts to redrawing the whole picture, which takes more time and effort than drawing by hand from the start.
Nevertheless, Stijn Windig is optimistic about the development of AI painting. Although he finds using Disco Diffusion directly for commercial work infeasible, he still considers it an excellent source of inspiration: "… I find it more suitable as a creativity generator. Given a text hint, it returns pictures that stimulate my imagination and can serve as sketches to paint over."
Technically speaking, the two pain points Stijn raises, 1) insufficient detail and 2) excessive rendering time, both stem from one inherent weakness of the Diffusion model: the iterative reverse-denoising process that generates an image is very slow, and the model computes in pixel space, which demands enormous computing time and memory. Generating high-resolution images therefore becomes extremely expensive.
(Pixel space sounds a bit technical; it simply means the model computes directly at the level of raw pixel information.)
So for a mass-market, application-grade product, such a model cannot compute and refine much image detail within a generation time users will accept; even a draft-level image takes Disco Diffusion hours to compute.
Even so, compared with every previous AI painting model, the painting quality Disco Diffusion delivers is a crushing advance, already beyond the reach of most ordinary people. Stijn's criticisms are merely the demands of someone standing at the heights of professional human creation.
What Stijn probably did not expect, however, is that both pain points he identified would be solved almost perfectly by AI researchers within a few months.
At this point, with a drum roll, the world's most powerful AI painting model, Stable Diffusion, finally makes its debut!
Stable Diffusion began testing in July this year, and it addresses the pain points above very well.
In essence, compared with earlier Diffusion models, Stable Diffusion focuses on doing one thing: mathematically transforming the model's computation from pixel space into a lower-dimensional space called latent space, and carrying out the heavy model training and image generation there.
How much impact did this "simple" change of thinking bring?
Compared with a pixel-space Diffusion model, a latent-space Diffusion model drastically reduces memory and compute requirements. For example, Stable Diffusion uses a latent-space downscaling factor of 8: the image's length and width are each reduced 8×, so a 512×512 image becomes 64×64 in latent space, saving 8×8 = 64 times the memory!
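A quick sanity check of that arithmetic (purely illustrative; the latent's channel count differs from RGB and is ignored here):

```python
# Spatial savings from Stable Diffusion's move into latent space.
pixel_w = pixel_h = 512
factor = 8                                  # latent downscaling factor

latent_w, latent_h = pixel_w // factor, pixel_h // factor
print(latent_w, latent_h)                   # 64 64

saving = (pixel_w * pixel_h) / (latent_w * latent_h)
print(saving)                               # 64.0, i.e. 8 x 8 = 64x fewer positions
```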
This is why Stable Diffusion is both fast and good: it can generate a detail-rich 512×512 image in seconds on nothing more than a consumer-grade graphics card with 8GB of memory, such as an RTX 2060!
Readers can do the simple math: without this spatial compression, matching Stable Diffusion's seconds-level generation experience would require a monster graphics card with 8GB × 64 = 512GB of memory. Going by the pace of graphics hardware development, consumer cards might not reach that much memory for another 8 to 10 years.
One important algorithmic iteration by AI researchers has brought the fruits of AI painting we might otherwise have enjoyed 10 years from now straight onto the computers of ordinary users today!
So it is perfectly normal for people to be startled by the progress of AI painting. From last year to this year, the technology has broken through again and again: first CLIP, trained on massive Internet images without manual labeling; then the model-grafting craze set off by CLIP's open-sourcing; then the Diffusion model adopted as a better image generation module; and finally the latent-space dimensionality reduction that solved the Diffusion model's huge costs in time and memory. It is dizzying; this year, AI painting has changed by the day.
No one in this process is happier than the AI technology enthusiasts and art creators. We have watched the level of AI painting, stagnant for so many years, reach its peak at rocket speed. This is, without question, a highlight moment in the history of AI development.
For ordinary users, of course, the greatest pleasure is simply using today's top painting AIs, such as Stable Diffusion or MidJourney, to generate professional-level paintings.
Interestingly, the birth of Stable Diffusion also traces back to the two pioneers mentioned earlier, Katherine Crowson and Ryan Murdock. Both became core members of EleutherAI, a decentralized open-source AI research organization. Despite calling itself a grassroots team, EleutherAI is already a leader among open-source teams in ultra-large-scale pretrained models and AI image generation.
It is EleutherAI that serves as the core technical team behind Stability.AI, an AI solution provider founded in London. These idealists came together and, building on the latest breakthroughs in AI painting described above, launched today's most powerful AI painting model, Stable Diffusion. Most importantly, Stable Diffusion was fully open-sourced in August, as promised! The release moved AI scholars and enthusiasts around the world to tears, and from the moment it was open-sourced, Stable Diffusion has sat at the top of GitHub's trending list.
Stability.AI has fully lived up to the slogan on its official homepage, "AI by the people, for the people," and deserves a big thumbs-up.
The picture below is from the author's own online run of Stable Diffusion. Thanks to open source! The AI-generated Japanese-style boy with a halo is, it must be said, quite handsome :)
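For readers who want to try it themselves, here is a minimal sketch using the open-source weights through the Hugging Face diffusers library (the checkpoint name, prompt, and parameters are typical examples assumed for illustration, not the author's exact setup):

```python
# Minimal text-to-image with open-source Stable Diffusion weights
# via the `diffusers` library. Needs a GPU with roughly 8GB of VRAM.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # a commonly used SD v1.5 checkpoint
    torch_dtype=torch.float16,          # half precision to fit consumer cards
)
pipe = pipe.to("cuda")

prompt = "portrait of a handsome anime boy with a glowing halo, detailed"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("halo_boy.png")
```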
The Competition Among Top AI Painting Models: Stable Diffusion vs. MidJourney
The author has introduced the online AI painting tool MidJourney in previous articles. Its biggest advantages are zero-threshold interaction and very good output: through the Discord-based MidJourney bot, creators can paint conversationally without any technical background (though, admittedly, entirely in English).
In terms of output style, MidJourney has clearly been optimized for portraits. Use it enough (the author spent hundreds of dollars of MidJourney compute trying a wide range of subjects) and its stylistic tendency becomes obvious: put kindly, delicate and refined; put bluntly, a little cloying.
Stable Diffusion's works, by contrast, are noticeably more restrained and artistic.
Below is a comparison of AI works created on the two platforms from the same text prompts. Readers can judge for themselves.
(Note: the generated paintings below are fully copyrighted; please credit the source if reproducing them separately.)
Stable Diffusion (left) vs. MidJourney (right):
Tree house
Dieselpunk-style city
Orgrimmar, the main city of World of Warcraft
Armored wolf knight
Blue fantasy-style comic girl
Romantic realist beauty oil painting (style reference: the American painter Daniel Gerhartz)
A labyrinth of old urban buildings with long, narrow walkways
Which style is better? Honestly, it is a matter of taste.
Because MidJourney has done targeted optimization, creating portraits or sweet, glossy beauty pictures is more convenient there. But after comparing many works, the author finds Stable Diffusion clearly superior in both artistic expressiveness and diversity of styles.
That said, MidJourney's iteration over the past few months has been plain for all to see (it is, after all, a paid service, profitable and highly motivated), and with Stable Diffusion now fully open source, its technical advantages will presumably be absorbed into MidJourney before long. Meanwhile, training of the Stable Diffusion model itself continues, and we can fully expect future versions to go further still.
For creator users, all of this is very good news.
What Does the Breakthrough in AI Painting Mean for Mankind?
In the AI field of 2022, text-to-image painting models have been the stars of the show. Disco Diffusion kicked things off in February, DALL-E 2 and MidJourney opened invited betas in April, and Google released its two major models, Imagen and Parti, in May and June.
It is genuinely dazzling. No wonder the author lamented in the previous article that, while nobody was paying attention, the level of AI painting improved by leaps and bounds. It is in this past year and a half that revolutionary, even historic, breakthroughs occurred in AI painting.
And in the time ahead, what more will happen in AI painting, or more broadly in AI-generated content (images, sound, video, 3D content, and so on), leaves plenty of room for imagination and expectation.
But one thing we need not wait for the future to learn: from the artistic heights already reached by the most advanced AI painting models, represented by Stable Diffusion, we can basically confirm that "imagination" and "creativity," two words once full of mysticism and long held as humanity's final pride, can in fact also be deconstructed by technology.
For believers in the supremacy of the human soul, the creativity shown by today's AI painting models is a merciless blow to that faith. So-called inspiration, creativity, and imagination, words steeped in divinity, are about to be (or already have been) struck hard by the mighty combination of massive compute, big data, and mathematical models.
In fact, a core idea of generation models like Stable Diffusion, and indeed of much of deep learning, is to represent human-created content as a vector in some high- or low-dimensional mathematical space (more plainly, as a string of numbers). If this "content -> vector" transformation is designed well enough, then everything humans have created can be expressed as some of the vectors in a mathematical space, and the remaining vectors in that boundless space are precisely the content humans could in theory create but have not yet created. Through the reverse "vector -> content" transformation, AI mines that uncreated content.
This is exactly what the latest AI painting models such as MidJourney and Stable Diffusion are doing. You can say AI is creating new content, or you can say AI is a porter of new paintings: the new paintings AI produces have always existed, objectively, in the mathematical sense; AI merely restores them from mathematical space in an exceedingly clever way.
"A fine piece of writing is made by nature; a deft hand merely chances upon it."
That old line fits perfectly here, except that the "nature" is the infinite mathematical space, and the "hand" has changed from a human one to AI's.
Mathematics really is the supreme law of the world :)
Today, the "creativity" of the latest AI painting has begun to approach, even nearly equal, that of humans, which may deal a further blow to human dignity. Ever since AlphaGo and the game of Go, the ground on which human "intelligence" can claim dignity has kept shrinking, and the breakthrough in AI painting further shatters human pride in "imagination" and "creativity." Perhaps not completely shattered yet, but cracked all over and teetering.
The author has always held a neutral view of technological progress: although we hope technology will make human life better, in reality, as with the invention of the nuclear bomb, the emergence of a technology is neutral in itself and may even prove fatal. A super AI that fully replaces humans looks increasingly possible in practice. What humanity needs to think about is how, in the not-too-distant future, to retain dominance over a world in which AI surpasses us in every field.
A friend put it well: if AI eventually learns to write code, and there seems to be no insurmountable barrier to that, then the movie plots may be about to happen. If that sounds too pessimistic, humans should at least consider how to live alongside an AI world that transcends all our wisdom and creativity.
Of course, seen optimistically, the future world will simply be better: humans enter a shared or personal metaverse through AR / VR, and an omnipotent AI assistant generates content on demand, even directly generating stories / games / virtual lives for humans to experience.
Is that a better Inception, or a better Matrix? (laughs)
In any case, the breakthrough and transcendence in AI's painting ability that we are witnessing today is the first step down this road of no return.
One digression to close. Although it has not appeared yet, within the next couple of years we should be able to ask AI directly for a complete novel in a specified style, especially genre works such as the xianxia fantasy 凡人修仙传 (A Record of a Mortal's Journey to Immortality). You could even specify the length, the number of heroines, the direction of the plot, the degree of tragedy and gore, even the degree of spice, and AI would generate it in one click :)
This is not fantasy at all. Given the rocket-like development of AI painting this year, the author even feels that day is just around the corner.
At present no AI model can generate long-form literary content that is sufficiently compelling and logically coherent, but judging from the aggressive development of AI painting models, it is almost certain that AI will generate high-quality genre fiction in the near future; in theory there is no doubt.
This may be a blow to hard-working web novelists, but as a technology enthusiast and fantasy lover, the author still looks forward to that day. From then on, no more begging serial authors for updates, no more worrying about their writing condition; better yet, if you dislike how a story is going halfway through, you can always have AI adjust the direction of the subsequent plot and regenerate it.
If you are not convinced such a day is coming, we can agree to disagree and wait together.
Finally, here is a set of works the author generated with Stable Diffusion in the "labyrinth of old urban buildings with long, narrow walkways" series: completely different in detail, identical in style, and consistently high in quality. Looking at these exquisite AI works, the author has only one feeling: this AI creation has "soul." I wonder if readers feel the same.
This article comes from the WeChat official account: Web3 Sky City (ID: Web3SkyCity). Author: City Owner.