While humans are busy with the rat race, AI has completed its biggest evolution in recent years


Updated 2025-07-01. From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

For a long time, the word "AI" had stopped being exciting.

AI had become impossible to avoid, yet both its technical evolution and its commercial application seemed to have hit bottlenecks. It had been years since anything as stunning as AlphaGo, and the industry had seen no business opportunity on the scale of the voice-assistant boom. Many investors only reluctantly turned back to AI when they had no other new story to tell.

Yet in 2022, while everyone was busy with the rat race, AI was undergoing its biggest evolution in recent years.

AI burst back into the spotlight on October 18, when Stability AI, the company that rose to fame with its Stable Diffusion text-to-image model, announced that it had closed a $101 million seed round at a valuation of $1 billion, making it a so-called "unicorn." Stability AI had been founded only two years earlier.

Photo source: Stability AI official website

Even by the standards of the tech and internet industry, Stability AI's growth is astonishing, and it is a microcosm of the explosive growth of the global AI industry in 2022. At that point, less than two months had passed since the open-source Stable Diffusion model took the world by storm.

This storm has advanced so suddenly and so fast that it can fairly be called a revolution, especially against a backdrop of weak global economic expectations.

Like all revolutions, the AI revolution was not completed overnight.

People have long dreamed of using AI to push past the existing boundaries of human wisdom, knowledge and creativity. But the learning ability that arises from the complex structure of the human brain far exceeds what humans can currently build into AI, so AI has only achieved breakthroughs in specific areas through specialized deep learning models, such as AlphaGo learning to play Go, or models that sift astronomical big data for pulsar candidates.

AIGC, that is, AI-generated content (text, images, video and so on), is another important category. Before 2022 the field stayed lukewarm because of limits in the core technology: AI cannot turn stone into gold, and it cannot create out of thin air. "Deep learning" training is not self-aware autonomous learning. It is a process in which the AI extracts patterns from a massive number of samples and then, following human instructions, produces content according to those patterns. It is therefore constrained by the core algorithm, the hardware and the training data.

Before 2022, the most widely used model in AIGC was the generative adversarial network (GAN). As the name implies, it pits two networks against each other: a generator that produces images and a discriminator that judges them against real samples, which pushes the output toward images that look right to a human. However, the approach has a serious problem. Because the yardstick for the comparison is a set of existing samples, the generated content is essentially an ever-closer approximation of existing content, and imitation alone cannot produce a real breakthrough.
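To make the adversarial idea concrete, here is a minimal sketch of the two competing objectives, assuming the common binary cross-entropy formulation with a non-saturating generator loss. The function names and the probability values are hypothetical and chosen only for illustration; no real network is trained here.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Loss the discriminator minimizes.

    d_real: probabilities the discriminator assigns to real samples being real
    d_fake: probabilities it assigns to generated samples being real
    """
    real_term = sum(-math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants its samples judged real."""
    return sum(-math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs, for illustration only.
real_scores = [0.9, 0.8]
weak_fakes = [0.1, 0.2]      # generated samples that are easily spotted
strong_fakes = [0.6, 0.7]    # generated samples that mostly fool the discriminator

d_loss_easy = discriminator_loss(real_scores, weak_fakes)
d_loss_fooled = discriminator_loss(real_scores, strong_fakes)
g_loss_weak = generator_loss(weak_fakes)
g_loss_strong = generator_loss(strong_fakes)
```

As the generator improves, its own loss falls while the discriminator's loss rises. Because the discriminator only ever sees existing samples as "real," the generator is rewarded for resembling them, which is the limitation described above.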

Image source: https://developers.google.com/machine-learning/gan/gan_structure

The shortcomings of GANs were finally overcome by the diffusion model, which is the technical core of many of the AIGC image generation models that emerged this year, including the open-source Stable Diffusion.

A diffusion model works much like denoising a photo: by learning how to remove noise from an image, it learns how meaningful images are formed. As a result, images generated by diffusion models are more accurate than those from GANs and closer to human visual and aesthetic expectations. And as training samples and training time accumulate, diffusion models become better at imitating artistic styles.
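The denoising idea can be sketched in a few lines, assuming the standard DDPM-style formulation with a linear noise schedule. This is an illustration rather than Stable Diffusion's actual pipeline: the schedule values are conventional defaults, the four-value "image" is a toy, and the true noise stands in for what a trained network would predict.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, noise, alpha_bar):
    """Forward process: blend the clean signal with Gaussian noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, noise)]

def estimate_clean(xt, predicted_noise, alpha_bar):
    """Invert the blend, given a prediction of the noise that was added."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(xt, predicted_noise)]

rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)
x0 = [0.5, -1.0, 0.25, 0.0]                  # toy "image" of four pixel values
noise = [rng.gauss(0.0, 1.0) for _ in x0]
t = 500
xt = add_noise(x0, noise, alpha_bars[t])

# A trained network would predict the noise from xt; the true noise is a stand-in here.
recovered = estimate_clean(xt, noise, alpha_bars[t])
```

Training amounts to teaching a network to make that noise prediction at every step. Generation then starts from pure noise and removes it step by step.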

Image source: https://towardsdatascience.com/diffusion-models-made-easy-8414298ce4da

From Disco Diffusion, which drew wide attention at the start of this year, to DALL-E 2 and Midjourney, these models are all based on diffusion, and the freshly funded Stable Diffusion is among the most popular. In keeping with its support for the open technical community and its belief in technological neutrality, Stability AI released Stable Diffusion as open source. That makes local deployment easy (an ordinary consumer graphics card meets the hardware requirements), and it also delivers a magical user experience: open the website, type in keywords describing the picture you want, wait a few minutes, and the model produces a finished image. The barrier for ordinary people to use cutting-edge AI has dropped to its lowest point: more than 170 million images have been generated through the official platform DreamStudio alone since its launch.

Image: generated by Stable Diffusion. Source: Stability AI official website

After a long silence, AIGC has spread like a prairie fire.

AIGC image generation, represented by Stability AI's models, has matured in a very short time and opens up a vast blue ocean, with great potential in everything from traditional fields such as design, illustration, game art and e-commerce to the much-hyped metaverse and virtual reality.

Image: generated by DreamStudio (based on Stable Diffusion) from the prompt "AI wins"

Imagine a future VR/AR virtual world in which the images in your mind can be rendered in real time with AI generation. How would that upend the way people entertain themselves and get information?

But that is not the whole reason the market has voted for AI while the economy is in the doldrums. The broad commercial potential is attractive, but what is even more worth investing in is the technology itself. The revolution is not over, and its next chapter is already upon us.

That chapter is video generation.

In essence, a video is a sequence of still images. As AI image generation matured, many turned their attention to video generation. Since September, Meta and Google have each announced their latest results in this frontier of AIGC.

Meta's model is called Make-A-Video. By learning from a large number of paired text-image samples and from unlabeled videos, it picks up how objects move in the real world. Make-A-Video can therefore take a generated image and set it in motion, and it shows some understanding of the three-dimensional structure of objects.

Image source: the related paper published by Meta, https://arxiv.org/pdf/2209.14792.pdf

Google's model, called Imagen Video, generates video with a method called cascaded diffusion models. A base diffusion model first generates a low-resolution video, and a series of temporal and spatial super-resolution models then raise its frame count and resolution.
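The cascade structure can be sketched as a pipeline of stages, assuming simple stand-ins for each model. In the real system every stage is a diffusion model. Here nearest-neighbor upscaling and frame averaging are used only to show how the stages chain together and how the video's shape grows, and all function names are hypothetical.

```python
def base_model(num_frames, height, width):
    """Stand-in for the base model: a tiny low-resolution clip of integer pixels."""
    return [[[(f + y + x) % 256 for x in range(width)]
             for y in range(height)]
            for f in range(num_frames)]

def spatial_super_resolution(video, factor):
    """Stand-in spatial stage: nearest-neighbor upscaling of every frame."""
    upscaled = []
    for frame in video:
        new_frame = []
        for row in frame:
            wide_row = [px for px in row for _ in range(factor)]
            new_frame.extend(list(wide_row) for _ in range(factor))
        upscaled.append(new_frame)
    return upscaled

def temporal_super_resolution(video):
    """Stand-in temporal stage: insert an averaged frame between neighbors."""
    result = []
    for cur, nxt in zip(video, video[1:]):
        result.append(cur)
        result.append([[(a + b) / 2 for a, b in zip(r1, r2)]
                       for r1, r2 in zip(cur, nxt)])
    result.append(video[-1])
    return result

video = base_model(num_frames=4, height=3, width=5)   # low-resolution, few frames
video = spatial_super_resolution(video, factor=2)     # more pixels per frame
video = temporal_super_resolution(video)              # more frames
shape = (len(video), len(video[0]), len(video[0][0]))  # (frames, height, width)
```

Splitting the work this way lets the base model concentrate on content and motion at a small size, while later stages only add detail.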

Image source: Imagen Video official website

A side-by-side comparison shows that Imagen Video's resolution (1280×768) is higher than Make-A-Video's, and its clips are also slightly longer.

Image source: the related paper published by Google, https://imagen.research.google/video/paper.pdf

But the breakthroughs do not stop there. Another AI video generation model, Phenaki (also from a Google team), has shown that it can generate variable-length videos from text: it can extract and understand a storyline from the text and turn it into video.

In the published demo, Phenaki generates a continuous video of more than two minutes from a sequence of logically connected text instructions totaling a few hundred words. This prototype of a feature film, with multiple shots, a rich plot and transitions, is bound to have a broad impact on the whole video industry, including short video, television and film.

Video generation models are still in their infancy. They remain immature in the details of motion, the fineness of the picture and the interaction between objects and people, and the resolution and image quality still look obviously "artificial." But AI image generation also went from being mocked across the internet to a stunning comeback, so who can say this is not a preview of the next great chapter of the AIGC revolution?

Image: generated by Midjourney from the prompt "AI wins"

Drastic change is always accompanied by controversy, and the "image phase" of the AIGC revolution represented by Stable Diffusion is no exception. We try to sum up the controversy in the following questions and offer preliminary answers.

(1) How should the copyright of AI-generated content be defined? China's copyright law stipulates that only natural persons or organizations can be recognized as authors, so AI-generated content has no copyright holder. Absent further contractual constraints, AI-generated content can be used freely, including commercially. Midjourney, DALL-E and others all state explicitly that users own the works they generate.

Image source: response to copyright questions on StabilityAI's official website

It is worth mentioning that many of the datasets used to train AI generation models may contain infringing content. Even so, the chance that user-generated content infringes is very low, because the output is highly random and unpredictable. Even if a copyright dispute did arise, proving infringement would be extremely difficult.

(2) Is AI-generated content art? If so, how should it be evaluated and defined? Six months ago the artistic merit of AI-generated content was a question few cared about, but after the work "Théâtre D'opéra Spatial" ("Space Opera Theater") won an award, people began discussing it more and more.

In general, AI does not create on its own: its output is shaped by its model, its algorithm and the size of its training set, which is why many people call AI-generated content "soulless."

However, it is unfair to treat AI generation as a mere tool, because it does more than imitate: the algorithm and the samples together offer a creative perspective that humans alone cannot fully provide.

Image: generated by Stable Diffusion. Source: Stability AI official website

Existing AI image generation technology has lowered the threshold for taking part in image creation almost to zero, so the artistic appraisal of generated works may need to start from a more fine-grained perspective. Like NFTs relative to traditional artworks, their value has to be tested by the market, and the art market is still at an early stage of understanding and accepting them.

(3) What does the "image phase" of the AIGC revolution mean for image workers and artists? As AI generation technology is "democratized," mid- and low-end illustration and its market will be taken over by AI, which means many image workers, illustrators and designers in the middle tier and below will lose their current jobs.

As AI-generated images grow richer and more realistic, they are also fundamentally undermining the business model of commercial stock photo libraries: who would pay for pictures they can generate themselves?

Image source: Getty Images' statement on AI-generated content

But AI generation technology also expands people's understanding of what painting tools can do. For artists, it will help them create richer, more inventive work driven by their ideas rather than their technique.

The future will be a contest of creators' ideas, because AI "removes the barriers for laymen to express their creativity," in the words of Björn Ommer, who says his team developed the original underlying algorithm for Stable Diffusion.

Image: generated by Stable Diffusion. Source: Stability AI official website

(4) How should AI-generated content be regulated, and how can the spread of false and inappropriate information be prevented? Researchers who take a technology-neutral stance, such as those at Stability AI, keep control of and intervention in content to a minimum. They believe an open community with thorough discussion will gradually develop its own mechanisms for monitoring what is disseminated.

"Users themselves are responsible for how the technology is used, including moral and legal compliance," Stability AI CEO Emad Mostaque said in an interview.

Image: generated by Stable Diffusion. Source: Stability AI official website

At the same time, although the datasets used for deep learning are strictly screened to block pornography, violence, terrorism and similar content, social stereotypes, racial discrimination and the like cannot be completely eliminated by technical means. More importantly, how to define such bias remains ethically contested. For this reason, Google decided to postpone public release of the Imagen Video model until the related risks are addressed, and many released models add non-removable watermarks to generated works to avoid potential disputes.

The AIGC revolution is in full swing. It is not in the future tense but in the present continuous: we are already in it.

Now is the future.

This article comes from a WeChat official account (ID: pinwancool). Author: Neil Shen

