
New AI technology: it takes only a single photo to generate a realistic 3D face.


Shulou(Shulou.com)06/03 Report--

2020-06-19 19:12:13

How do you create a lifelike digital avatar of a person if you have only one image?

At the 2020 Conference on Computer Vision and Pattern Recognition (CVPR), researchers from Imperial College London and the AI facial-analysis startup FaceSoft.io introduced AvatarMe, a technology that can reconstruct realistic 3D busts from just an ordinary image or photo. What's more, it not only generates true 4K x 6K-resolution 3D faces from low-resolution inputs, but also renders detailed light reflectance.

Fig.| 3D face reconstruction and real-time rendering effects (Source: GitHub)

From video conferencing to virtual reality to video games, rendering 3D faces has countless application scenarios. And although facial geometry can be fitted without AI, rendering a face in arbitrary scenes requires more information.

To extract this information, the researchers used a capture rig with 168 LED lights and nine SLR cameras to record pore-level reflectance maps of 200 faces. They then used this data to train GANFIT, an AI model that synthesizes realistic facial texture maps while optimizing "identity matching" between the rendering and the input.

Like other generative adversarial networks (GANs), GANFIT is a two-part model: a generator that produces samples and a discriminator that tries to tell generated samples from real ones. The two are trained against each other until the discriminator can no longer distinguish real examples from synthetic ones.
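This adversarial loop is straightforward to sketch. Below is a minimal, generic GAN training step in PyTorch; the architectures, sizes, and flattened-texture format are hypothetical placeholders, not the actual GANFIT networks.

```python
import torch
import torch.nn as nn

# Hypothetical toy networks; the real GANFIT architectures differ.
G = nn.Sequential(nn.Linear(128, 256), nn.ReLU(),
                  nn.Linear(256, 64 * 64 * 3))
D = nn.Sequential(nn.Linear(64 * 64 * 3, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):                       # real: (batch, 64*64*3) textures
    n = real.size(0)
    z = torch.randn(n, 128)                 # latent codes

    # 1) Discriminator: label real samples 1, generated samples 0.
    fake = G(z).detach()                    # detach: don't update G here
    loss_d = bce(D(real), torch.ones(n, 1)) + bce(D(fake), torch.zeros(n, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Generator: try to make the discriminator output 1 on fakes.
    loss_g = bce(D(G(z)), torch.ones(n, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```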

In addition, another AvatarMe component is responsible for upsampling the texture resolution, while a separate module predicts the per-pixel reflectance of skin structures (such as pores, wrinkles, or hair) from the lit texture, even estimating fine surface details (such as fine wrinkles, scars, and skin pores).

In experiments, the researchers report, AvatarMe produced no artifacts in the final rendering and successfully handled "extreme" cases such as sunglasses and occlusion; the inferred reflectance stayed consistent, and the system illuminated the subject convincingly even across different environments.

Fig.| Adaptive face light reflection in different scenes (Source: GitHub)

3D face and geometric texture reconstruction is one of the most active directions at the intersection of computer vision, graphics, and machine learning. A key task in this research is improving the fitting of 3D morphable models (3DMMs).

AvatarMe first fits a 3DMM to the "in-the-wild" input image and synthesizes a complete UV texture while optimizing identity matching between the rendering and the input.

The texture is then upsampled 8x to synthesize plausible high-frequency detail. The researchers used an image-to-image translation network to de-light the texture and obtain a diffuse albedo with high-frequency detail, with separate networks inferring the specular albedo, diffuse normals, and specular normals from the diffuse albedo and the 3DMM shape normals. The networks were trained on 512 x 512 patches, while inference runs on 1536 x 1536 patches. Finally, the face shape and the consistently inferred reflectance maps are passed to a head model, which renders in real time in any environment.
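Putting the steps together, a minimal sketch of the inference pipeline might look like the following; every callable here is a hypothetical stand-in for one of the paper's networks, not a published API.

```python
def avatarme_inference(image, fit_3dmm, superres, delight, infer_maps, renderer):
    """Sketch of an AvatarMe-style pipeline; all callables are stand-ins
    for the paper's trained networks (names are hypothetical)."""
    # 1) Fit the 3DMM to the input and unwrap a completed UV texture.
    shape, texture = fit_3dmm(image)

    # 2) Upsample the UV texture 8x to add high-frequency detail.
    texture_hr = superres(texture)

    # 3) De-light the texture to obtain the diffuse albedo A_D.
    A_D = delight(texture_hr)

    # 4) Infer A_S, N_D, N_S from the albedo and the 3DMM shape normals.
    A_S, N_D, N_S = infer_maps(A_D, shape.normals)

    # 5) Pass shape and reflectance maps to the head model for
    #    real-time rendering in any environment.
    return renderer(shape, A_D, N_D, A_S, N_S)
```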

Fig.| AvatarMe's basic methodology framework (Source: GitHub)

How are the details enhanced? At its core is patch-based image-to-image translation. The task of de-lighting a given input UV texture and inferring its diffuse and specular components can be formulated as a domain-adaptation problem, and the model the researchers chose is pix2pixHD, which has shown impressive results in image-to-image translation on high-resolution data.
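As a rough schematic of such a translator, consider this toy fully convolutional encoder-decoder; the real pix2pixHD generator is far larger, with residual blocks and multi-scale discriminators.

```python
import torch.nn as nn

# Toy fully convolutional translator in the spirit of a pix2pixHD-style
# generator (illustrative only; not the actual pix2pixHD architecture).
translator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, padding=3),               # encode
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),   # downsample
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # upsample
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 3, kernel_size=7, padding=3),               # decode
    nn.Tanh(),                                                # outputs in [-1, 1]
)

# Because every layer is convolutional, the same weights accept any
# spatial size: train on small patches, run inference on larger ones.
```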

To achieve realistic skin rendering, the researchers modeled the diffuse reflectance, the specular albedo, and the normals of the target geometry separately. Thus, given an unconstrained facial image as input, they can infer the geometric parameters of the face as well as the diffuse albedo (A_D), diffuse normals (N_D), specular albedo (A_S), and specular normals (N_S).
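To see why these four maps suffice for relighting, here is a toy per-pixel shading rule in the spirit of a Blinn-Phong model; this is a deliberately simplified illustration, not the renderer used in the paper.

```python
import numpy as np

def shade(A_D, N_D, A_S, N_S, light_dir, view_dir, shininess=32.0):
    """Toy relighting: combine diffuse and specular terms per pixel.

    A_D, A_S: (H, W, 3) diffuse/specular albedo maps in [0, 1]
    N_D, N_S: (H, W, 3) unit normal maps for the two lobes
    light_dir, view_dir: unit 3-vectors toward the light / camera
    """
    # Lambertian diffuse term uses the diffuse normals.
    diffuse = A_D * np.clip(N_D @ light_dir, 0.0, None)[..., None]

    # Blinn-Phong specular term uses the sharper specular normals.
    half_vec = light_dir + view_dir
    half_vec = half_vec / np.linalg.norm(half_vec)
    specular = A_S * (np.clip(N_S @ half_vec, 0.0, None) ** shininess)[..., None]

    return np.clip(diffuse + specular, 0.0, 1.0)
```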

Fig.| a. Input image; b. Base reconstruction; c. Super-resolution; d. De-lighting; e. Final rendering (Source: GitHub)

There are still some minor bumps in this detail-optimization process. For example, the data the researchers captured to train the model has very high resolution (over 4K) and therefore could not be used "as is" to train pix2pixHD because of hardware limits (even a 32 GB GPU cannot fit such high-resolution data in its original format). In addition, pix2pixHD considers only texture information and cannot exploit geometric detail in the form of shape normals and depth to improve the quality of the resulting diffuse and specular components.

To overcome this, the developers split the raw high-resolution data into small 512×512-pixel patches for training; at inference time, since the network is fully convolutional, the patches can be larger (e.g., 1536×1536 pixels), as the sketch below shows.
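A minimal sketch of this patch strategy (the 512 and 1536 sizes come from the paper; the helper itself is illustrative):

```python
import torch

def split_into_patches(texture, patch=512):
    """Cut a (C, H, W) texture into non-overlapping patch x patch tiles.

    Assumes H and W are multiples of `patch`, as with the paper's
    512-pixel training crops.
    """
    c, h, w = texture.shape
    tiles = texture.unfold(1, patch, patch).unfold(2, patch, patch)
    # -> (C, H//patch, W//patch, patch, patch); flatten the grid dims.
    return tiles.permute(1, 2, 0, 3, 4).reshape(-1, c, patch, patch)

# Training sees 512x512 crops; a fully convolutional network can then
# consume larger tiles (e.g., 1536x1536) at inference time.
train_tiles = split_into_patches(torch.rand(3, 2048, 2048), patch=512)
infer_tiles = split_into_patches(torch.rand(3, 3072, 3072), patch=1536)
```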

AvatarMe is not without limitations. One of them is the racial-bias problem that American technology companies are now being pressed to address.

The paper notes that because the training dataset lacks subject examples from certain ethnic groups, attempts to reconstruct darker-skinned faces produce poor results. The reconstructed specular albedo and normals can also show slight blurring of some high-frequency pore detail, owing to minor alignment errors between the captured data and the 3DMM model. Finally, reconstruction accuracy is closely tied to the quality of the input photo: a well-lit, higher-resolution photo produces more accurate results.

The researchers say this is the industry's first method for obtaining "renderable" faces from an arbitrary portrait image, including black-and-white photos and hand-drawn pictures. AvatarMe, the latest AI system for 3D face generation and real-time rendering, promises to gradually automate processes that previously required manual design.

https://www.toutiao.com/i6840015163211383310/
