Shulou (Shulou.com) 11/24 report:
The ultimate 2022 AI research roundup is here! Well-known blogger Louis Bouchard pairs every entry with a self-made video explanation and a short written analysis, making it friendly even for beginners.
Although the world is still recovering, research has not slowed down its frenzied pace, especially in the field of artificial intelligence.
In addition, this year brought renewed attention to AI ethics, bias, governance, and transparency.
Artificial intelligence, along with our understanding of the human brain and its connection to AI, is constantly evolving, and applications that improve our quality of life will shine in the near future.
Well-known blogger Louis Bouchard catalogued 32 (!) AI breakthroughs of 2022 on his blog.
Next, let's take a look at these amazing studies.
Article address: https://www.louisbouchard.ai/2022-ai-recap/
LaMa: Resolution-robust large mask inpainting with Fourier convolutions
You must have experienced this: you and a friend take a great photo, only to discover someone in the background ruining the shot you wanted to post to Moments or Xiaohongshu. But now, that is no longer a problem.
This resolution-robust large mask inpainting method, based on Fourier convolutions, lets users easily remove unwanted content from an image. People and trash cans alike disappear with ease.
It's like having a professional Photoshop artist in your pocket, clearing things away with a single click.
Simple as it looks, image inpainting is a problem AI researchers have been working on for a long time.
Paper link: https://arxiv.org/abs/2109.07161
Project address: https://github.com/saic-mdal/lama
Colab Demo: https://colab.research.google.com/github/saic-mdal/lama/blob/master/colab/LaMa_inpainting.ipynb
Video explanation: https://youtu.be/Ia79AvGzveQ
Short analysis: https://www.louisbouchard.ai/lama/
STIT: Real video face editing based on GANs
You must have had this experience too: watching a movie, you notice that the actors look much younger than they really are.
Before, as with Will Smith in Gemini Man, it took professionals hundreds or even thousands of hours of manual work to edit the scenes in which these actors appear. With AI, you can do it in a few minutes.
In fact, many techniques can widen your smile or make you look younger or older, all automatically, using algorithms based on artificial intelligence. In the video this is called AI-based face manipulation, and it represents the state of the art as of 2022.
Paper link: https://arxiv.org/abs/2201.08361
Project address: https://github.com/rotemtzaban/STIT
Video explanation: https://youtu.be/mqItu9XoUgk
Short analysis: https://www.louisbouchard.ai/stitch-it-in-time/
NeROIC: Neural rendering of objects from online image collections
With neural rendering, you can generate realistic 3D models in space from pictures of objects, people, or scenes.
With this technique, you only need a few pictures of an object to have the machine understand those pictures and simulate how the object would look in space.
Inferring the physical shape of objects from images is easy for humans because we know the real world. But for machines that see only pixels, it is an entirely different challenge.
How does the generated model fit into a new scene? What if the photos were taken under different lighting conditions and angles, so that the resulting model changes? These are the problems Snapchat and the University of Southern California set out to address in this new study.
Paper link: https://arxiv.org/abs/2201.02533
Project address: https://github.com/snap-research/NeROIC
Video explanation: https://youtu.be/88Pl9zD1Z78
Short analysis: https://www.louisbouchard.ai/neroic/
SpeechPainter: Text-conditioned speech inpainting
For images, machine-learning-based inpainting can not only remove content but also fill in the missing parts of an image based on background information.
For video, the challenge is not only maintaining consistency from frame to frame but also avoiding spurious artifacts. And when you successfully "kick" someone out of a video, you need to remove their voice as well.
To this end, Google researchers proposed a new speech inpainting method that can fix grammar and pronunciation and even remove background noise.
Paper link: https://arxiv.org/abs/2202.07273
Video explanation: https://youtu.be/zIIc4bRf5Hg
Short analysis: https://www.louisbouchard.ai/speech-inpainting-with-ai/
GFP-GAN: Real-world blind face restoration with generative facial priors
Do you have treasured old photos that have blurred with age? Don't worry: with blind face restoration, your memories can be made fresh again.
This new, free AI model can restore most of your old photos in an instant. It works well even when the pre-restoration quality is very low, which used to be a considerable challenge.
Even cooler, you can try it however you like. The code is open source, and the authors provide a demo and an online application for everyone to try. This technology is sure to surprise you!
Paper link: https://arxiv.org/abs/2101.04061
Project address: https://github.com/TencentARC/GFPGAN
Colab Demo: https://colab.research.google.com/drive/1sVsoBd9AjckIXThgtZhGrHRfFI6UUYOo
Online application: https://huggingface.co/spaces/akhaliq/GFPGAN
Video explanation: https://youtu.be/nLDVtzcSeqM
Short analysis: https://www.louisbouchard.ai/gfp-gan/
4D-Net: Learning multi-modal alignment
How do self-driving cars "keep an eye on everything"?
You may have heard of the LiDAR sensors or other strange cameras that car companies are using. But how do they work, how do they see the world, and what exactly do they see differently from us?
Article link: https://arxiv.org/abs/2109.01066
Unlike Tesla, which uses only cameras to understand the world, most self-driving car makers, such as Waymo, use ordinary cameras together with 3D LiDAR sensors.
Instead of producing RGB images the way ordinary cameras do, these sensors generate 3D point clouds, measuring the distance to objects by calculating the travel time of the laser pulses they project onto them.
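The distance computation behind each LiDAR point is simple time-of-flight arithmetic. A minimal sketch of the principle (an illustration only, not Waymo's actual pipeline):

```python
# Time-of-flight distance estimation, as used by pulsed LiDAR.
# The laser pulse travels to the object and back, so the one-way
# distance is half of (speed of light x round-trip time).

C = 299_792_458.0  # speed of light in m/s

def tof_distance(round_trip_seconds: float) -> float:
    """Distance in meters given a pulse's round-trip travel time."""
    return C * round_trip_seconds / 2.0

# A pulse that returns after 200 nanoseconds hit an object ~30 m away.
print(round(tof_distance(200e-9), 2))  # → 29.98
```

Repeating this for millions of pulses per second, swept across the scene, is what yields the point cloud.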
Still, how can we effectively combine this information and get the vehicle to understand it? What will the vehicle eventually see? Is autonomous driving safe enough? A new research paper from Waymo and Google addresses these puzzles.
Video explanation: https://youtu.be/0nJMnw1Ldks
Short analysis: https://www.louisbouchard.ai/waymo-lidar/
Instant NeRF: Instant neural graphics primitives with a multiresolution hash encoding
How do we model what the world looks like from photos, in real time?
Using AI models, people can turn captured images into high-quality 3D models. This challenging task lets researchers use 2D images to recreate how objects or people look in the three-dimensional world.
With graphics primitives based on a hash encoding, NVIDIA can train a NeRF in as little as 5 seconds while achieving better results. In under two years of research, NeRF training has been sped up by more than 1,000x.
Paper link: https://arxiv.org/ abs / 2201.05989 Project address: https://github.com/ NVlabs / instant-ngp
Video explanation: https://youtu.be/ UHQZBQOVAIU short story analysis: https://www.louisbouchard.ai/nvidia-photos-into-3d-scenes/DALL E 2: text generation image model based on CLIP features last year, OpenAI released the text-image generation model DALL E. Now, the upgraded version of DALL E 2 is here again.
DALL E 2 can not only generate realistic images from text, but its output resolution is four times that of the former!
However, the performance improvements don't seem to be enough to satisfy OpenAI, so they taught DALL E 2 a new skill: image repair.
In other words, you can edit the image with DALL E 2, or add any new elements you want, such as adding a flamingo to the background.
Links to papers: https://arxiv.org/ abs / 2204.06125
Video explanation: https://youtu.be/ rdGVbPI42sA short story analysis: https://www.louisbouchard.ai/openais-new-model-dall-e-2-is-amazing/MyStyle: personalized generation a priori Google and Tel Aviv University have proposed a very powerful DeepFake technology. With it, you can do almost anything.
By taking hundreds of photos of a person, you can encode his or her image and repair, edit or create whatever you want.
This is both amazing and frightening, especially when you see the results generated.
Paper link: https://arxiv.org/ abs / 2203.17272 Project address: https://mystyle-personalized-prior.github.io/ Video explanation: https://youtu.be/ BNWAEvFfFvQ short story analysis: https://www.louisbouchard.ai/ mystyle/OPT: open pre-training Transformer language model GPT-3 is so powerful because of its architecture and size.
It has 175 billion parameters, twice the number of neurons in the human brain! Such a huge neural network makes the model learn almost the content of the entire Internet and understand how we write, exchange and understand text.
Just as people marveled at the power of GPT-3, Meta took a big step towards the open source community. They released an equally powerful model, and the model is completely open source!
This model not only has more than 100 billion-level parameters, but also OPT-175B is more open and easy to access than GPT-3.
Paper link: https://arxiv.org/ abs / 2205.01068 Project address: https://github.com/ facebookresearch / metaseq
Video link: https://youtu.be/ Ejg0OunCi9U short story analysis: https://www.louisbouchard.ai/ opt-meta/BlobGAN: spatially discrete scene representation the Adobe team has come up with a new approach to how to describe a scene: BlobGAN.
BlobGAN uses "blob" to describe objects in the scene. Researchers can move these spots, make them larger, smaller, or even delete them, which has the same effect on the objects represented by the spots in the image.
As the authors share in their results, you can create new images in the dataset by copying speckles.
Now, the BlobGAN code has been open source, interested partners, hurry up and try it!
Paper link: https://arxiv.org/ abs / 2205.02837 Project address: https://github.com/ dave-epstein / blobgan
Video description: https://youtu.be/ mnEzjpiA_4E short story analysis: https://www.louisbouchard.ai/ blobgan/Gato: generalist agent DeepMind builds a single "general" agent Gato. You can play Atari games, do subtitle images, chat with people, and control the robotic arm!
What is even more shocking is that it can complete all tasks only once and with the same weight.
Gato is a multimodal agent. This means that it can both create titles for images and answer questions as chatbots.
Although GPT-3 can chat with you, it is clear that Gato can do more. After all, there are often AI people who can chat, but not many people who can play games with them.
Links to papers: https://arxiv.org/ abs / 2205.06175
Video explanation: https://youtu.be/ xZKSWNv6Esc short story analysis: https://www.louisbouchard.ai/ deepmind-gato/Imagen: text-to-image diffusion model with deep language understanding if you think DALL E 2 is excellent, take a look at this new model from Google Brain-Imagen--.
DALL E is amazing, but the images often lack realism, which is what the Imagen developed by the Google team is trying to solve.
According to the benchmark of comparing text to image model, Imagen has achieved remarkable results in text-image synthesis of text embedding in large language models. The resulting image is both wild and authentic.
Paper link: https://arxiv.org/ abs / 2205.11487 Project address: https://imagen.research.google/
Video explanation: https://youtu.be/ qhtYPhPWCsI short story analysis: https://www.louisbouchard.ai/ google-brain-imagen/ DALL ·E Mini a group of thriller pictures of Xiaoza were popular on Twitter for a while. This group of San value crazy works, by DALL E. mini.
As the "youth version" of the DALL E family, DALL E mini is better than free and open source. The code has been left, who will be the next character to be changed by magic?
Project address: https://github.com/ borisdayma / dalle-mini online experience: https://huggingface.co/ spaces / dalle-mini/ dalle-mini Video explanation: https://youtu.be/ K3bZXXjW788 short story analysis: https://www.louisbouchard.ai/ dalle-mini/NLLB: this NLLB-200 model released by Meta AI, the model naming concept comes from "Don't leave any language behind" (No Language Left Behind) Arbitrary translation has been realized in more than 200 languages.
The highlight of the study is that the researchers improved most of the low-resource language training by multiple orders of magnitude and achieved the SOTA results of 200 + language translation.
Paper link: https://research.facebook.com/ publications / no-language-left-behind/ Project address: https://github.com/ facebookresearch / fairseq / tree / nllb online experience: https://nllb.metademolab.com/
Video explanation: https://youtu.be/ 2G4NeG17Eis short story analysis: https://www.louisbouchard.ai/ no-language-left-behind/Dual-Shutter optical vibration sensing system sound can also be seen?
This research, which won the Best Paper Award of CVPR 2022, proposes a novel Dual-Shutter method by using a "slow" camera (130FPS) to detect high-speed (up to 63kHz) surface vibrations from multiple scene sources at the same time, and by capturing the vibration caused by audio sources.
As a result, various requirements such as the separation of musical instruments and the elimination of noise can be realized.
Links to papers: https://openaccess.thecvf.com/ content / CVPR2022 / papers / Sheinin_Dual-Shutter_Optical_Vibration_Sensing_CVPR_2022_paper.pdf Project address: https://imaging.cs.cmu.edu/ vibration/
Video description: https://youtu.be/ n1M8ZVspJcs short story analysis: https://www.louisbouchard.ai/ cvpr-2022-best-paper/Make-A-Scene: scene-based and human a priori text-to-image generation Make-A-Scene is not just "another DALL E."
While it's cool that DALL ·E can generate random images based on text prompts, it also limits user control over the results.
The goal of Meta is to promote creative expression, combining this trend of text-to-image with previous sketches to image models to produce "Make-A-Scene": a wonderful fusion between text and sketch conditional image generation.
Links to papers: https://arxiv.org/ abs / 2203.13131
Video description: https://youtu.be/ K3bZXXjW788 short story analysis: https://www.louisbouchard.ai/ make-a-scene/BANMo: building a 3D animation model of a target from any video based on Meta, you just need to give any video that captures a deformable object, such as uploading a few kittens and dogs, and BANMo can integrate 2D clues from thousands of images into the canonical space. Then an editable animated 3D model is reconstructed without a predefined shape template.
Paper link: https://arxiv.org/ abs / 2112.12761 Project address: https://github.com/ facebookresearch / banmo
Video description: https://youtu.be/ jDTy-liFoCQ short story analysis: https://www.louisbouchard.ai/ banmo/ uses the potential diffusion model to synthesize high-resolution images. This year's fire image generation models DALL ·E, Imagen and strong out-of-circle Stable Diffusion, what do these powerful image generation models have in common? In addition to high computing costs and a lot of training time, they are all based on the same diffusion mechanism.
The diffusion model has recently achieved SOTA results in most image tasks, including text-to-image using DALL E, as well as many other image generation-related tasks, such as image restoration, style conversion, or image super resolution.
Paper link: https://arxiv.org/ abs / 2112.10752 Project address: https://github.com/ CompVis / latent-diffusion
Video description: https://youtu.be/ RGBNdD3Wn-g short story analysis: https://www.louisbouchard.ai/ latent-diffusion-models/PSG: scene-based image generation model AI can help you accurately identify objects in the image, but it is not so easy to understand the relationship between objects and the environment.
For this reason, a panoramic panoptic scene graph generation (PSG) task based on panoramic segmentation is proposed for researchers from Nanyang Technology.
Compared with the traditional scene graph generation based on detection box, the PSG task requires to comprehensively output all the relationships in the image (including the relationship between object and object, the relationship between object and background, and the relationship between background and background), and use accurate segmentation blocks to locate objects.
Paper link: https://arxiv.org/ abs / 2207.11247 Project address: https://psgdataset.org/ online Application: https://huggingface.co/ spaces / ECCV2022 / PSG
Video description: https://youtu.be/ cSsE_H_0Cr8 short story analysis: https://www.louisbouchard.ai/ psg/ uses text reversal to achieve personalized generation of text to image. This year, the image generation models of the major factories can be described as eight Immortals crossing the sea, but how to make the model generate image works of a specific style?
Scholars from Tel Aviv University and Nvidia have jointly launched a personalized image generation model that can DIY the images you want.
Paper link: https://arxiv.org/ abs / 2208.01618 Project address: https://textual-inversion.github.io/
Video explanation: https://youtu.be/ f3oXa7_SYek short story analysis: https://www.louisbouchard.ai/ imageworthoneword/ language image pre-training model for general video recognition visual text model learning has undoubtedly achieved great success, but how to extend this new language image pre-training method to the video field is still an open question.
Scholars from Microsoft and the Chinese Academy of Sciences have proposed a simple and effective method to make the pre-trained language image model directly adapt to video recognition instead of pre-training the new model from scratch.
Paper link: https://arxiv.org/ abs / 2208.02816 Project address: https://github.com/ microsoft / VideoX / tree / master / X-CLIP
Video description: https://youtu.be/ seb4lmVPEe8 short story analysis: https://www.louisbouchard.ai/ general-video-recognition/Make-A-Video: one-click text generation video model painters enjoy painting on the canvas, such a clear and smooth picture, can you think of every frame of the video is generated by AI?
The Make-A-Video launched by MetaAI can generate different styles of video in a few seconds by simply typing a few words. It is not too much to say that it is a "video version of DALL E.".
Links to papers: https://arxiv.org/ abs / 2209.14792
Video explanation: https://youtu.be/ MWwESVyHWto short story analysis: https://www.louisbouchard.ai/ make-a-video/Whisper: large-scale weakly supervised speech recognition model have you ever thought that there is a translation software that can quickly translate the voice in the video, even the languages you don't understand?
OpenAI's open source Whisper can do just that.
Whisper is trained on more than 680000 hours of multilingual data to recognize and translate multilingual sounds into text in noisy backgrounds, as well as the translation of professional terms.
Paper link: https://arxiv.org/ abs / 2212.04356 Project address: https://github.com/ openai / whisper
Video explanation: https://youtu.be/ uFOkMme19Zs short story analysis: https://www.louisbouchard.ai/ whisper/DreamFusion: using 2D images to generate 3D model text can generate images, videos, and 3D models.
Google's DreamFusion can generate 3D models with one click by using pre-trained 2D text-to-image diffusion models, which are trained on billions of image text pairs to promote the latest breakthrough in text-to-3D model synthesis.
Links to papers: https://arxiv.org/ abs / 2209.14988
Video description: https://youtu.be/ epuU0VRIcjE short story analysis: https://www.louisbouchard.ai/ dreamfusion/Imagic: based on the diffusion model of real image editing method using DALL E and other text image generation model, only need to enter a line of text to get the desired picture, but the image generated by AI is sometimes not so perfect.
Researchers from Google, Israel Institute of Technology and Weizman Institute of Science introduced a real image editing method based on diffusion model-Imagic, which uses only text to realize the PS of real photos.
For example, we can change a person's posture and composition while retaining its original features, or I want a standing dog to sit down and a bird to spread its wings.
Paper link: https://arxiv.org/ abs / 2210.09276 Project address: https://imagic-editing.github.io/
Video explanation: https://youtu.be/ gbpPQ5kVJhM short story analysis: https://www.louisbouchard.ai/ imagic/eDiffi: higher quality text image synthesis model stronger than DALL E and Stable Diffusion image synthesis model!
This is Nvidia's eDiffi, which can generate higher quality images more accurately, and adding a brush mold can add more creativity and flexibility to your work.
Paper link: https://arxiv.org/ abs / 2211.01324 Project address: https://deepimagination.cc/ eDiff-I/
Video description: https://youtu.be/ grwp-ht_ixo short story analysis: https://www.louisbouchard.ai/ ediffi/Infinite Nature: learning infinite view generation of natural scenes from a single image have you ever thought of taking a picture and flying into it like opening a door?
Scholars from Google and Cornell University have turned this vision into reality: InfiniteNature-Zero, who can generate unlimited views of natural scenes from a single image.
Paper link: https://arxiv.org/ abs / 2207.11148 Project address: https://infinite-nature.github.io/
Video description: https://youtu.be/ FQzGhukV-l0 short story analysis: https://www.louisbouchard.ai/ infinitenature-zeroGalactica: a large language model for science Galactica developed by Meta is a large language model of the same size as GPT-3, but it specializes in scientific knowledge.
The model can write government white papers, news reviews, Wikipedia pages and code, and it knows how to quote and write equations. This is a big deal for artificial intelligence and science.
Links to papers: https://arxiv.org/ abs / 2211.09085
Video explanation: https://youtu.be/ 2GfxkCWWzLU short story analysis: https://www.louisbouchard.ai/ galactica/RAD-NeRF: real-time portrait composition model based on audio spatial decomposition since the advent of DeepFake and NeRF, AI face change seems to be common, but there is a problem. AI's face is sometimes exposed because it doesn't match the mouth shape.
The emergence of RAD-NeRF can solve this problem, it can make real-time portraits of the speakers in the video, and it also supports custom avatars.
Paper link: https://arxiv.org/ abs / 2211.12368 Project address: https://me.kiui.moe/ radnerf/ChatGPT: language Model optimized for Dialogue 2022 how can AI's blockbuster work without ChatGPT, which has been popular all over the network and has been developed by netizens to write Xiao Huang, typing code and other applications of the universal model, if you do not know it, then come and have a look!
Video explanation: https://youtu.be/ AsFgn8vU-tQ short story analysis: https://www.louisbouchard.ai/ chatgpt/ can be directly used in the production of video face re-aging. Although the current computer vision model can generate the age of the face, style transfer, etc., but it only looks cool, but it has almost no effect in practical application. The existing techniques usually have the problems of loss of facial features, low resolution and unstable results in subsequent video frames, which often require manual secondary editing.
Recently, Disney released FRAN (Face Re-Aging Network), the first practical and fully automated method that can be used to produce re-age faces in video images, officially announcing the end of the visual effect of relying on makeup artists to change the age of actors in the film.
Link to the paper: https://dl.acm.org/ doi / pdf / 10.1145 Universe 3550454.3555520 Project address: https://studios.disneyresearch.com/2022/11/30/production-ready-face-re-aging-for-visual-effects/
Video description: https://youtu.be/ WC03N0NFfwk short story analysis: https://www.louisbouchard.ai/ disney-re-age/ Resources:
Https://www.louisbouchard.ai/2022-ai-recap/
This article comes from the official account of Wechat: Xin Zhiyuan (ID:AI_era)
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.