
How to analyze the principle of the deep learning algorithm behind ZAO



Many newcomers are not very clear about how the deep learning algorithm behind ZAO works. To help with that, this article explains the principle in detail; readers who are interested can follow along, and hopefully you will take something away from it.

Starting from the underlying algorithm, we will dig into its essence to understand how ZAO swaps faces based on GAN.

First, here is an overall flow chart of the face-swapping process:

Image source: Exposing DeepFake Videos By Detecting Face Warping Artifacts

The picture above shows the general DeepFake-based face-swapping pipeline. First, face detection (b) is run on the input image (a); once a face is found, its key points are detected (c). A transformation matrix (d) computed from those key points aligns the face, which is then fed into the DeepFake network (GAN/CycleGAN) to perform the replacement. The replaced face (g) is mapped back through the inverse of the transformation matrix to realign the key points, and finally it is blended back into the original image to obtain (h) and (i).
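To make this concrete, here is a minimal Python sketch of the image face-swap steps described above. It assumes dlib for detection and landmarks, OpenCV for warping and blending, and a placeholder swap_model standing in for the GAN/CycleGAN generator; template_points are assumed canonical landmark positions in the size x size crop (ZAO's real implementation is not public, so this is only an illustration).

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()                                 # step (b): face detection
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")   # step (c): key points

def swap_face(image, swap_model, template_points, size=256):
    faces = detector(image, 1)
    if not faces:
        return image
    shape = predictor(image, faces[0])
    points = np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float32)

    # step (d): estimate the transform that maps the detected landmarks onto the canonical template
    M, _ = cv2.estimateAffinePartial2D(points, template_points)
    aligned = cv2.warpAffine(image, M, (size, size))

    # steps (e)-(g): the generator replaces the aligned face (placeholder for the GAN/CycleGAN model)
    swapped = swap_model(aligned)

    # inverse transform: map the swapped face back into original-image coordinates
    M_inv = cv2.invertAffineTransform(M)
    restored = cv2.warpAffine(swapped, M_inv, (image.shape[1], image.shape[0]))

    # steps (h)/(i): fuse the restored face region back into the original image
    mask = cv2.warpAffine(np.full((size, size), 255, np.uint8), M_inv,
                          (image.shape[1], image.shape[0]))
    ys, xs = np.nonzero(mask)
    center = (int((xs.min() + xs.max()) // 2), int((ys.min() + ys.max()) // 2))
    return cv2.seamlessClone(restored, image, mask, center, cv2.NORMAL_CLONE)
```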

What we described above is the general process of face replacement on a single image. For a short video, the frames must first be extracted and the face replaced frame by frame. During frame-by-frame replacement, a face recognition network is needed to make sure the same person is being replaced throughout (for example, if we want to replace the face of one character, say Xiao Yanzi, in a video, we must check that each detected face really is Xiao Yanzi and not another character such as Ziwei). In addition, because the video is replaced frame by frame, the swapped faces in neighboring frames must be smoothed over time so that the result stays natural and temporally consistent, otherwise the output flickers.
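Below is a minimal sketch of this frame-by-frame video loop. It reuses the swap_face helper from the previous sketch and assumes a hypothetical embed_face function that returns an L2-normalized identity embedding (for example from a face-recognition network); the identity check and the simple frame blending are illustrative, not ZAO's actual logic.

```python
import cv2
import numpy as np

def swap_video(path_in, path_out, swap_model, template_points,
               target_embedding, embed_face, alpha=0.5, threshold=0.6):
    cap = cv2.VideoCapture(path_in)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(path_out, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

    prev_swapped = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # identity check: only swap if the detected face matches the target person
        emb = embed_face(frame)
        if emb is not None and float(np.dot(emb, target_embedding)) > threshold:
            swapped = swap_face(frame, swap_model, template_points)
            # temporal smoothing: blend with the previous swapped frame to reduce flicker
            if prev_swapped is not None:
                swapped = cv2.addWeighted(swapped, alpha, prev_swapped, 1 - alpha, 0)
            prev_swapped = swapped
        else:
            swapped = frame
        out.write(swapped)
    cap.release()
    out.release()
```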

That is the general process for swapping faces in images and in video. For ZAO in particular, we find that the swap quality is noticeably better than that of a generic face-swapping algorithm, especially under head rotation (bowing, turning around, looking up), so we have reason to believe that ZAO's algorithm uses 3D face key-point detection, which makes the replacement look more natural.

Now that the overall process is clear, let's look more closely at how the DeepFake (GAN/CycleGAN) algorithm works. To make GAN/CycleGAN easier to understand, we again illustrate it graphically:

The image above shows the simplest face-replacement network. The input face (left) is encoded by a neural network into an intermediate state (often a vector or a very small image), which is then passed to a decoder to reconstruct the face (right). Note that this intermediate coding state essentially stores all the information of the face. In the picture above there is no face replacement yet: face A is encoded and decoded back to face A, and face B is encoded and decoded back to face B.

Next, what happens if we take the vector encoded from face B and decode it with face A's decoder? The output will look like face A, while face B's expression, pose, and other details are retained. This is exactly how the face gets swapped.

One more thing to note from the image: for the codes to be swappable, the encoder must be shared across all faces, i.e. every face is encoded with the same unified encoder (the single red encoder above). Each identity, however, needs its own decoder (the separate blue and green decoders above) in order to complete the face swap.
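Here is a minimal PyTorch sketch of this shared-encoder, per-identity-decoder structure. Layer sizes, input resolution, and the training/swapping comments are illustrative assumptions, not ZAO's actual architecture.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 5, stride=2, padding=2), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.LeakyReLU(0.1),
            nn.Conv2d(128, 256, 5, stride=2, padding=2), nn.LeakyReLU(0.1),
            nn.Flatten(),
            nn.Linear(256 * 8 * 8, 512),          # the intermediate "coding state"
        )

    def forward(self, x):                         # x: (N, 3, 64, 64)
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(512, 256 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 256, 8, 8)
        return self.net(h)

encoder = Encoder()       # shared by all identities (the single red encoder)
decoder_A = Decoder()     # reconstructs face A (the blue decoder)
decoder_B = Decoder()     # reconstructs face B (the green decoder)

# Training: each decoder learns to reconstruct its own identity, e.g.
#   loss_A = mse(decoder_A(encoder(faces_A)), faces_A)
#   loss_B = mse(decoder_B(encoder(faces_B)), faces_B)
# Swapping: encode a B face, but decode it with A's decoder:
#   fake = decoder_A(encoder(face_B))
```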

However, with only the structure above, the generated face looks rather fake, and obvious traces of artificial replacement are visible. To make the replacement look more realistic, CycleGAN comes into play. Again, a simple picture helps explain the essence of the CycleGAN algorithm:

We can see that, in essence, CycleGAN simply adds an extra loss between the swapped (fake) face and the real face, which shrinks the gap between the two. In addition, compared with the earlier one-way A -> B mapping, CycleGAN also generates B -> A and narrows that gap as well; the whole process forms a closed loop, which is why it is called Cycle.

This cyclic training in CycleGAN significantly reduces the artificial look caused by simply decoding the B face with the A decoder.
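For reference, here is a minimal sketch of the standard CycleGAN objective: two generators G_AB (A -> B) and G_BA (B -> A), adversarial losses that pull the fakes toward the real domain, and an L1 cycle-consistency loss that closes the loop. G_AB, G_BA, D_A, and D_B are hypothetical networks; the weighting and loss form follow the usual CycleGAN recipe, not anything ZAO has published.

```python
import torch
import torch.nn.functional as F

def cyclegan_losses(real_A, real_B, G_AB, G_BA, D_A, D_B, lambda_cyc=10.0):
    fake_B = G_AB(real_A)                 # A -> B
    fake_A = G_BA(real_B)                 # B -> A

    # adversarial losses: push each fake toward its target domain
    adv = F.mse_loss(D_B(fake_B), torch.ones_like(D_B(fake_B))) + \
          F.mse_loss(D_A(fake_A), torch.ones_like(D_A(fake_A)))

    # cycle-consistency losses: A -> B -> A and B -> A -> B must return to the start
    cyc = F.l1_loss(G_BA(fake_B), real_A) + F.l1_loss(G_AB(fake_A), real_B)

    return adv + lambda_cyc * cyc
```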

Of course, in real-world use, some post-processing is usually needed to make the result smoother and more natural, such as blurring the edges of the swapped region or applying a bit of style transfer between the swapped face and the original face region.
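As one example of such post-processing, the small sketch below feathers the swap mask with a Gaussian blur and alpha-blends the swapped face into the frame; the mask format and kernel size are assumptions for illustration.

```python
import cv2
import numpy as np

def blend_swapped_face(original, swapped, mask, blur_ksize=31):
    # feather the binary mask (uint8, 0/255) so the face boundary transitions gradually
    soft = cv2.GaussianBlur(mask.astype(np.float32), (blur_ksize, blur_ksize), 0) / 255.0
    soft = soft[..., None]                      # broadcast over the color channels
    return (soft * swapped + (1.0 - soft) * original).astype(np.uint8)
```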

Was the content above helpful to you? If you want to learn more or read more related articles, please follow the industry information channel. Thank you for your support.
