
We tried to make a painting AI our new colleague in charge of illustrations.


This article comes from the WeChat official account Touch Music (ID: chuappgame), by Liu Wenghua.

Technology is advancing.

As the discussion cools, the topic of "AI painting" seems to have stopped making waves recently, but many game companies have quietly added it to their workflows. Amid this wave of exploration, Touch Music, a group of laymen where painting is concerned, has also taken its first step: under the direction of Teacher Zhu Jiayin, we are trying to use AI painting tools to generate the illustrations our articles need.

The copyright of article illustrations is a big concern for most users, and it is not easy to find illustrations that fit an article's subject on open-license or stock image sites. AI painting seems like a good choice: in our imagination, as long as we give the AI a few paragraphs of description or a handful of keywords, it can "read and draw" the picture we want. Is it really that simple? To achieve the goal of "letting AI draw our illustrations" and free up as much labor as possible, we made some attempts.

Painting, payment, local deployment?

To generate stylized illustrations, the first step, of course, is to pick a suitable model. AI painting models have evolved rapidly over the last six months: there are five or six mainstream models at home and abroad alone, and stylized variants of every kind are blooming. Finding a model suitable for generating article illustrations is not easy, though. Some models are open source, some charge for use, and some wander in legal gray areas, quietly downloaded after being cracked...

No matter what, you have to try. In the end we chose four candidates: Stable Diffusion, once hailed as the "strongest painting AI" after going open source; DALL·E, one of the earliest image-generating AIs, from the veteran AI research team OpenAI; Midjourney, which lives in a Discord channel and keeps updating its model; and finally NovelAI, which specializes in Japanese-style images and supports local deployment.

First, it must be said that although the copyright status of AI painting remains unclear, the "locally deployed NovelAI" is surely one of the least reliable options: setting aside the copyright of its image library, the model itself circulates in a legal gray area. By contrast, a locally deployed Stable Diffusion is much more "legitimate." Since Stable Diffusion went open source, new and old versions can be downloaded from GitHub, and after local installation, the various generation parameters and previews of generated images can be adjusted intuitively with the help of WebUI tools.
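For reference, a locally deployed Stable Diffusion can also be driven without the WebUI. Below is a minimal sketch using the Hugging Face diffusers library; the model ID, prompt, and parameters are illustrative assumptions rather than the exact setup we used.

import torch
from diffusers import StableDiffusionPipeline

# Load an open Stable Diffusion checkpoint onto the GPU (any locally downloaded checkpoint works).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Generate one 512x512 image; these parameters mirror the knobs exposed by WebUI tools.
image = pipe(
    prompt="hand-drawn magazine illustration, clean line art, a girl reading at a desk",
    negative_prompt="blurry, photo, 3d render",
    width=512, height=512,
    num_inference_steps=30,   # iteration steps: more steps, more time
    guidance_scale=7.5,       # how strictly to follow the prompt
    generator=torch.Generator("cuda").manual_seed(42),  # fixed seed for reproducibility
).images[0]
image.save("illustration.png")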

But there is no such thing as a free lunch. While open source practically "puts lunch in your mouth," running the program still requires enough computing power. The GeForce RTX 2060, considered a good configuration a few years ago, now struggles a little to keep up. Some people have benchmarked how long different graphics cards take to generate a 512×512 image with the Stable Diffusion model: the 2060 takes about 17 seconds and the 3080 only about 7 seconds. The numbers are not necessarily precise, but they are a useful reference.

The 3080 takes less than half the time of the 2060, and of course, in practice you discover that 17 seconds is the ideal case. As the number of iteration steps, the canvas size, and the number of images per batch grow, the time required increases almost exponentially. The most sensible approach is to generate a 512-pixel image first and then enlarge it with an upscaling algorithm. Even so, raising the number of iteration steps risks running out of video memory. The most tangible sign: during image generation, the computer's fans almost never stop.
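To make the "generate small, then enlarge" workflow concrete, here is a hedged sketch that assumes the 512×512 image saved above and the diffusers x4 upscaler checkpoint; attention slicing is one common way to trade a little speed for lower video memory use.

import torch
from diffusers import StableDiffusionUpscalePipeline
from PIL import Image

# Load the 4x upscaler model; like the base model, it runs locally on the GPU.
upscaler = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    torch_dtype=torch.float16,
).to("cuda")
upscaler.enable_attention_slicing()  # reduces peak VRAM usage at a small speed cost

low_res = Image.open("illustration.png")  # the 512x512 image generated earlier
large = upscaler(
    prompt="hand-drawn magazine illustration, clean line art",  # a short prompt guides the upscale
    image=low_res,
    num_inference_steps=20,
).images[0]
large.save("illustration_2048.png")  # 4x upscale: 512 -> 2048 pixels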

By contrast, the two paid painting AIs, DALL·E and Midjourney, are far friendlier to your graphics card. Generating images with them doesn't require nervously watching GPU temperatures for fear of burning the card out; you simply send the description to their servers, and the servers return a set of images to choose from. The trade-off is that every generation occupies server resources and consumes some of the user's credits. You can try them for free at first, but once an account's free quota is used up, you must top up its credits to keep generating.

In general, generating four 512×512 samples from one set of keywords costs about 1 credit. Pricing differs slightly between the painting AIs: DALL·E's paid credits are relatively expensive, about 15 US dollars for 115 credits, or roughly 1 yuan per generation; Midjourney offers a monthly plan of roughly 200 pictures for 10 US dollars, which is much cheaper.
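A quick back-of-the-envelope check of those prices, assuming an exchange rate of about 7 yuan to the dollar (our assumption, not a figure from either platform):

# Rough per-generation cost comparison; the 7 CNY/USD exchange rate is an assumption.
dalle_usd_per_gen = 15 / 115         # about $0.13 for one batch of four images
midjourney_usd_per_image = 10 / 200  # about $0.05 per image on the $10/month plan
cny_per_usd = 7.0
print(f"DALL-E: ~{dalle_usd_per_gen * cny_per_usd:.2f} yuan per generation")
print(f"Midjourney: ~{midjourney_usd_per_image * cny_per_usd:.2f} yuan per image")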

Whether it's DALL·E, Midjourney, or Stable Diffusion, what matters most is, of course, the quality of the generated images. We used several different sets of keywords to test each AI's performance.

Descriptions and keywords

For illustrations, compared with exquisite 3D renders or near-photographic realistic images, Teacher Zhu Jiayin prefers a hand-drawn magazine-illustration style. But style descriptions bring plenty of trouble: how do we tell the AI what we need?

At first, we tried to describe the style broadly with a magazine's name, for example by adding "New Yorker" to the keywords. The problem is that even within a single magazine, the illustration style is not uniform. Here the free Stable Diffusion gave us plenty of room for trial and error: even after adding keywords such as "hand-drawn," "no blur," "clean line art," and an artist's name, the painting AI still doesn't quite understand what you want. It can only offer a few pictures in different styles each time; you can see that these styles have indeed appeared in magazines, but whether you find the one you want comes down to luck.

When you specify only a broad range, Stable Diffusion generates several pictures in different styles at once, and it is not easy to know the name of the specific painting style you are after. In most cases we could rack our brains and still only come up with general descriptions like "hand-painted" or "watercolor." Fortunately, we found Lexica, a search engine for prompts. The site collects a great many descriptive keywords along with the images they generated, and you can search by text or by image to find the words you want.

Lexica lets you search cases shared by other users, but it does not always work. On the one hand, if you are after a less popular artist, few users have tried to generate that style, and the available examples are limited. On the other hand, AI image generation involves a great deal of randomness, and the pictures and keywords users upload are not always reproducible; an uploaded example may look good, yet failing to generate anything similar in practice is entirely normal.

Stable Diffusion's own problem also gradually shows itself here: the model excels at photorealistic images and finely detailed concept-art styles, especially since the 2.1 update, whose photo-style output can almost pass for the real thing. Correspondingly, for relatively flat art styles, it takes a great deal of effort to generate suitable images.

Stable Diffusion does a pretty good job of generating photorealistic images

Of course, the threshold of the descriptors themselves is also a problem: anyone needs time to keep adjusting, revising, and hunting for the right keywords while generating images. Here Stable Diffusion is not very friendly to complete beginners. If you skip fine-tuning the keywords and describe the desired image directly in natural language, you may need to generate a great many images before getting a satisfying one. For example, we described a specific scene: "A girl at a cluttered desk with takeout bags and instant noodles stacked high, a calendar on the wall with the dates after 'release date' crossed out in red. The girl hugs her head and looks very pained."

When this entire description is shoved directly into Stable Diffusion, it shows unprecedented confusion.
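One common workaround is to break that kind of natural-language scene down into short, comma-separated keywords before handing it to Stable Diffusion. The tags below are our own illustrative guess, not a tested recipe.

# Illustrative contrast between a natural-language description and keyword-style tags.
# The specific tags are hypothetical, chosen only to show the idea.
natural_language = (
    "A girl at a cluttered desk with takeout bags and instant noodles stacked high, "
    "a calendar on the wall with the dates after 'release date' crossed out in red. "
    "The girl hugs her head and looks very pained."
)
keyword_prompt = (
    "a girl sitting at a cluttered desk, takeout bags, stacked instant noodle cups, "
    "wall calendar, dates crossed out in red, hands on head, anguished expression, "
    "hand-drawn magazine illustration, clean line art"
)
# Prompt-tuned models tend to parse the comma-separated version more reliably.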

If the description is imprecise, Stable Diffusion does not always produce a satisfactory picture, even after repeated iteration and adjustment. The paid DALL·E and Midjourney, by comparison, are far less likely to produce unusable "waste pictures." Using the same set of keywords in Midjourney, a simple "by Yuko Shimizu" is enough to specify the style and get quite good results.

It can be seen that Midjourney correctly understands "takeout box," though its understanding of "pained" is somewhat strange. DALL·E does not grasp the painting style well, but it correctly understands the content of the description; among the several models, DALL·E depicts the character's emotions the most vividly.

DALL·E associates "pain" with "hands on head"

As far as generating article illustrations goes, the paid DALL·E and Midjourney do seem to be the better choices for practical use. For stylized illustrations, Midjourney is better still: with just a few keywords about the style or the artist, it can quickly "understand" exactly what you want.

American comic and storyboard styles

Copyright

The next trouble is obvious. Some painting AIs have already gone into commercial operation, and some of their users want to put AI-generated images to commercial use. In that case, copyright is naturally an unavoidable topic. Because the field of AI painting has developed so quickly, relevant laws and regulations have generally not had time to keep up. At present, the copyright terms of most AI painting models take a "hands-off shopkeeper" approach: both Midjourney and Stable Diffusion make clear that copyright in the generated images belongs to the creator, but they also state that users must not attempt to create images involving pornography, discrimination, or other content that may harm others, and that in case of dispute the platform bears no responsibility and the matter is to be resolved by the parties themselves.

Whether the content is legal ultimately comes down to the specific work. If it is the work of an artist no longer protected by copyright law (usually 50 years after the artist's death), there is certainly no problem imitating that style and creating something new. If you want to imitate a modern artist still protected by copyright law, you have to think carefully.

Van Gogh-style paintings generated by Stable Diffusion

Although in most cases AI painting does not perfectly reproduce the single style you want, and the result looks more like a mixture of styles, a model trained on one particular style or one particular artist's work will inevitably produce images that look too similar to the thing being imitated. In this respect the mainstream large models actually "behave well": even when an artist is specified, it is hard to get DALL·E or Midjourney to draw something exactly the same.

It is worth emphasizing again that over the last six months AI painting models have evolved remarkably fast, while laws and regulations have not kept pace; there are still no corresponding international rules governing the copyright of AI painting. Commercial use of AI images therefore remains risky. And when we pay AI painting platforms, should they in turn pay for the web images they collected for training?

The problems facing individual users are relatively simple. As long as the AI painting platform does not turn around and suddenly claim that the copyright does not belong to the creator, there will generally be no copyright trouble, and for personal, non-commercial use there is not much risk. Artists who use AI tools to assist their work face a different problem: AI-generated images may not be accepted by every audience. A few days ago, a promotional image for a mobile game was suspected of having been painted with AI tools; the metal parts in the background were clearly "not like human drawings." The incident caused a stir among players. Some were disappointed; in their view, "AI painting" seems naturally associated with labels such as "cheap" and "not serious."

This is not an isolated incident. Many game developers have revealed that they have added AI painting to their workflows but dare not say so publicly. From another point of view, though, if AI painting is used as an auxiliary tool and the copyright risks lurking in its training library are eliminated, how is it different from 3D auxiliary software such as Blender or Enscape? And if AI painting is used as a library of picture material, how different is it from royalty-free libraries such as Unsplash?

In any case, today's AI painting models are not very mature, but you can feel the technology's rapid progress firsthand. Perhaps soon we really will use AI to generate our illustrations, and they will be more realistic and more like human drawings. I wonder whether readers will be able to tell.

Midjourney's "Pope with Corky and Potato Chips in Dip" looks really good. (The pictures in this article are all generated by the painting AI Midjourney.)
