2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
2020-03-13 09:21 Introduction: CNNs can be used this way, too!
With the development of technology, people can use mobile phones and other devices to take photos and record their favorite moments anytime, anywhere. Many people have probably wished for a technology that could turn flat 2D photos into 3D photos.
Facebook considered this problem long ago. To improve the user experience, Facebook launched its 3D photo feature in 2018, a new immersive format for sharing photos with friends and family. However, the feature relied on the dual-lens "portrait mode" of high-end smartphones and could not be used on ordinary mobile devices.
To let more people experience this new visual format, Facebook developed a machine learning system that can infer the 3D structure of any image, so that a photo taken on any device, at any time, can be converted into 3D. This makes 3D photo technology easy for anyone to use.
It can even handle family photos and other precious images from decades ago. Anyone with an iPhone 7 or newer, or a mid-range or better Android device, can now try the feature in the Facebook app.
Estimating the depth of different areas of a 2D image to create a 3D image
Building such enhanced 3D images involved a number of technical challenges, such as training a model that can correctly infer the 3D position of a wide variety of subjects, and optimizing the system to run on a typical mobile processor in less than a second. To overcome these challenges, Facebook trained a convolutional neural network (CNN) on millions of public 3D images and their accompanying depth maps, and used various mobile-optimization techniques previously developed by Facebook AI, such as FBNet and ChamNet. The team has also recently discussed its research on 3D understanding.
Now that this feature is available to anyone using Facebook, how exactly is it built? Let's take a look at the technical details.
The original photo of the dog was taken with a single-lens camera without any depth map data, and the system converted it into the 3D image shown above.
Providing efficient performance on mobile devices
Given a standard RGB image, the 3D Photos CNN (3D photo convolutional neural network) can estimate the distance between each pixel and the camera. The researchers achieved this in four ways:
Building a network architecture from a set of parameterized, mobile-optimized neural building blocks
Automating the architecture search to find effective configurations of these blocks, enabling the system to run on a wide range of devices in less than a second
Using quantization-aware training to take advantage of high-performance INT8 quantization on mobile devices while minimizing the quality loss introduced by quantization
Obtaining large amounts of training data from public 3D photos.
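The core task the list above supports is a function that maps an RGB image to a per-pixel distance estimate. The sketch below is purely illustrative, not Facebook's actual model: `estimate_depth` uses pixel luminance as a fake depth signal just to show the input/output contract of such a network.

```python
import numpy as np

def estimate_depth(rgb: np.ndarray) -> np.ndarray:
    """Toy stand-in for the 3D Photos CNN: maps an H x W x 3 RGB image
    to an H x W map of per-pixel distance estimates.

    Illustrative only -- the real model is a trained convolutional
    network; here luminance serves as a placeholder 'depth' signal.
    """
    assert rgb.ndim == 3 and rgb.shape[2] == 3
    # Luminance as placeholder depth: brighter pixels treated as nearer.
    luma = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    depth = 1.0 / (1.0 + luma / 255.0)   # values fall in [0.5, 1.0]
    return depth

image = np.random.randint(0, 256, size=(4, 6, 3)).astype(np.float32)
depth_map = estimate_depth(image)
print(depth_map.shape)  # one depth value per pixel: (4, 6)
```

The key property shown is that the output has one depth value per input pixel, which is what the 3D reconstruction step consumes.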
Neural building blocks
Facebook's architecture is inspired by the building blocks of FBNet, a framework for optimizing ConvNet architectures for resource-constrained devices such as mobile phones. A building block consists of a pointwise (1x1) convolution, optional upsampling, a k x k depthwise convolution, and an additional pointwise convolution. Facebook implemented a U-net-style architecture, modified to place FBNet building blocks along the skip connections. The U-net encoder and decoder each contain five stages, each corresponding to a different spatial resolution.
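The pointwise / depthwise / pointwise structure described above can be sketched with plain numpy. This is a minimal sketch of the general pattern, not FBNet's actual implementation: normalization, activations, and the optional upsampling are omitted, and the channel sizes are made up for illustration.

```python
import numpy as np

def pointwise_conv(x, w):
    """1x1 convolution: mixes channels at each spatial location.
    x: (H, W, C_in), w: (C_in, C_out) -> (H, W, C_out)."""
    return x @ w

def depthwise_conv(x, k):
    """k x k depthwise convolution: each channel filtered independently.
    x: (H, W, C), k: (kh, kw, C), 'same' padding -> (H, W, C)."""
    kh, kw, c = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + kh, j:j + kw, :]      # (kh, kw, C) window
            out[i, j, :] = np.sum(patch * k, axis=(0, 1))
    return out

def fbnet_style_block(x, w1, dw, w2):
    """FBNet-style block: pointwise -> k x k depthwise -> pointwise.
    (Upsampling, normalization, and activations omitted for brevity.)"""
    return pointwise_conv(depthwise_conv(pointwise_conv(x, w1), dw), w2)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))        # H x W x C_in feature map
w1 = rng.normal(size=(16, 32))         # expand channels 16 -> 32
dw = rng.normal(size=(3, 3, 32))       # 3x3 depthwise filters
w2 = rng.normal(size=(32, 24))         # project 32 -> 24 output channels
y = fbnet_style_block(x, w1, dw, w2)
print(y.shape)  # (8, 8, 24)
```

The appeal of this decomposition is cost: the expensive spatial filtering touches each channel independently, while cheap 1x1 convolutions do the channel mixing, which is why it suits mobile processors.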
Network architecture overview: a U-net with additional macro-level building blocks placed along the skip connections
Automated architecture search
To find an effective architecture configuration, the ChamNet algorithm developed by Facebook AI automates the search process. ChamNet iteratively samples points from the search space to train an accuracy predictor, which is then used to accelerate a genetic search for the model that maximizes predicted accuracy while satisfying specific resource constraints.
In this setting, the search space allows varying the channel expansion factor and the number of output channels per block, yielding 3.4 × 10^22 possible architectures. Facebook then completed the search in about three days using 800 Tesla V100 GPUs, setting and adjusting FLOP constraints on the model architecture to achieve different operating points.
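The predictor-guided genetic search can be sketched as follows. Everything here is a hypothetical miniature: the search space (per-block expansion factors), the FLOP cost model, and the accuracy predictor are stand-ins, not ChamNet's real components; the point is only the loop of "mutate, check the resource constraint, keep the fittest candidates".

```python
import random

# Toy search space: a choice of channel expansion factor per block.
EXPANSIONS = [1, 3, 6]
NUM_BLOCKS = 5
FLOP_BUDGET = 60

def flops(arch):
    # Hypothetical cost model: cost grows with the expansion factor.
    return sum(3 * e for e in arch)

def predicted_accuracy(arch):
    # Stand-in for ChamNet's trained accuracy predictor:
    # wider blocks help, with diminishing returns.
    return sum(e ** 0.5 for e in arch)

def mutate(arch):
    child = list(arch)
    child[random.randrange(NUM_BLOCKS)] = random.choice(EXPANSIONS)
    return child

def genetic_search(generations=200, pop_size=16, seed=0):
    random.seed(seed)
    pop = [[random.choice(EXPANSIONS) for _ in range(NUM_BLOCKS)]
           for _ in range(pop_size)]
    # Keep only candidates that fit the budget; fall back to the smallest.
    pop = [a for a in pop if flops(a) <= FLOP_BUDGET] or [[1] * NUM_BLOCKS]
    for _ in range(generations):
        parent = max(pop, key=predicted_accuracy)
        child = mutate(parent)
        if flops(child) <= FLOP_BUDGET:            # resource constraint
            pop.append(child)
        pop = sorted(pop, key=predicted_accuracy, reverse=True)[:pop_size]
    return max(pop, key=predicted_accuracy)

best = genetic_search()
print(best, flops(best))
```

Because candidates are scored by the cheap predictor rather than by training each network, thousands of architectures can be evaluated per second; the real search still needed large GPU clusters because the predictor itself must be trained on sampled architectures.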
Quantization-aware training
By default, the model is trained with single-precision floating-point weights and activations, but the researchers found that quantizing weights and activations to 8 bits has significant advantages. In particular, int8 weights require only a quarter of the storage of float32 weights, reducing the number of bytes that must be transferred to the device on first use.
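The 4x storage saving is easy to see concretely. Below is a minimal sketch of symmetric per-tensor int8 quantization (one common scheme; the article does not specify which scheme Facebook's model uses):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization of float32 weights to int8.
    Returns the int8 tensor and the scale needed to dequantize."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)

print(w.nbytes)   # 262144 bytes as float32
print(q.nbytes)   # 65536 bytes as int8 -- exactly 1/4 the storage
err = np.abs(dequantize(q, scale) - w).max()
print(err < scale)  # round-off stays within one quantization step
```

Each weight drops from 4 bytes to 1 byte plus a single shared scale, at the cost of a bounded rounding error per weight.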
Each image starts as a regular 2D image and is then converted into a 3D image using a depth estimation neural network.
Int8-based operators also have much higher throughput than their float32 counterparts, thanks to optimized libraries such as Facebook AI's QNNPACK, which has been integrated into PyTorch. The team uses quantization-aware training (QAT) to avoid the quality degradation caused by quantization. QAT, now part of PyTorch, simulates quantization during training and supports backpropagation through it, closing the gap between training and production performance.
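The "simulated quantization" idea behind QAT can be sketched in a few lines. This is a generic illustration of the technique, not PyTorch's implementation: the forward pass snaps values to the int8 grid while keeping them in floating point, and the backward pass treats the rounding as identity (the straight-through estimator) so gradients can flow.

```python
import numpy as np

def fake_quant(x, scale):
    """Simulate int8 quantization in the forward pass (QAT-style):
    values are rounded to the int8 grid but kept as floats, so the
    rest of the training graph sees the quantization error."""
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

def fake_quant_grad(upstream_grad):
    """Straight-through estimator for the backward pass: the
    non-differentiable round() is treated as identity."""
    return upstream_grad

w = np.array([0.304, -0.718, 0.052], dtype=np.float32)
scale = 0.01
wq = fake_quant(w, scale)
print(wq)  # each value snapped to the nearest multiple of 0.01
```

Training against `wq` instead of `w` lets the network adapt its weights to the int8 grid it will actually run on, which is what closes the training/production gap mentioned above.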
The neural network handles a wide variety of content, including paintings and images of complex scenes
Looking for new ways to create 3D experiences
In addition to improving the depth estimation algorithm, the researchers are working on providing high-quality depth estimates for video captured on mobile devices.
Video is challenging because each frame's depth must be consistent with the next frame's, but it is also an opportunity to improve performance: multiple observations of the same object provide additional signal for high-precision depth estimation. As the performance of Facebook's neural networks continues to improve, the team will also explore using depth estimation, surface-normal estimation, and spatial reasoning in real-time applications such as augmented reality.
Beyond these potential new experiences, this work will help researchers better understand the content of 2D images. A better understanding of 3D scenes can also help robots navigate and interact with the physical world. By sharing the details of the 3D photo system, Facebook hopes to help the AI community make progress in these areas and create new experiences that take advantage of advanced 3D understanding.
Via: https://ai.facebook.com/blog/-powered-by-ai-turning-any-2d-photo-into-3d-using-convolutional-neural-nets/
https://www.leiphone.com/news/202003/CVEKRbNuCKTGR5Xw.html