Meta open-sources a multi-sensory artificial intelligence model that integrates six types of data, including text, audio and visual

Updated 2025-02-23. From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

CTOnews.com, May 9 / PRNewswire-FirstCall-Asianet /-- Meta has released a new open-source artificial intelligence model, ImageBind, which links together multiple streams of data, including text, audio, visual data, temperature and motion readings. The model is only a research project at this point, with no immediate consumer or practical applications, but it points to a future of generative AI systems that can create immersive, multi-sensory experiences. It also shows that Meta continues to share its AI research openly at a time when competitors such as OpenAI and Google are becoming increasingly closed.

The core idea of the research is to link multiple types of data into a single multidimensional index (or, in AI terms, an "embedding space"). The concept may seem abstract, but it underpins the recent boom in generative AI. For example, AI image generators such as DALL-E, Stable Diffusion, and Midjourney rely on systems that link text and images during the training phase. They look for patterns in visual data while connecting that information to descriptions of the images, which is why these systems can generate pictures from a user's text prompt. The same is true of many AI tools that generate video or audio in the same way.
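To make "linking text and images during the training phase" more concrete, here is a minimal sketch of one widely used approach, a contrastive objective of the kind popularized by CLIP. The article does not say which objective any of the systems above actually use, so treat this as an illustration only. The vectors, the `temperature` value and the function names are all made up for the example. The loss is small when each caption sits closest to its own image in the shared space and large when the pairs are mismatched.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(text_vecs, image_vecs, temperature=0.1):
    """Text-to-image contrastive loss (one direction only).

    For caption i, the "correct" answer is image i; every other image
    in the batch acts as a negative. Returns the mean cross-entropy.
    """
    total = 0.0
    for i, text in enumerate(text_vecs):
        logits = [cosine(text, img) / temperature for img in image_vecs]
        # Numerically stable log-sum-exp over all candidate images.
        top = max(logits)
        log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum - logits[i]
    return total / len(text_vecs)

# Hypothetical 2-d embeddings; real models use hundreds of dimensions.
captions = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[0.9, 0.1], [0.1, 0.9]]   # image i matches caption i
swapped_images = [[0.1, 0.9], [0.9, 0.1]]   # pairs deliberately mismatched

loss_aligned = contrastive_loss(captions, aligned_images)
loss_swapped = contrastive_loss(captions, swapped_images)
print(loss_aligned < loss_swapped)
```

Training pushes the encoders toward the low-loss arrangement, which is what lets a text prompt later land near the right kind of image in the same space.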

Meta says ImageBind is the first model to combine six types of data in a single embedding space. The six types are: visual (both images and video); thermal (infrared images); text; audio; depth information; and, most interesting of all, motion readings generated by an inertial measurement unit (IMU). (IMUs are found in phones and smartwatches, where they are used for a range of tasks, from switching a screen between landscape and portrait to distinguishing between different types of physical activity.)

The idea is that future AI systems will be able to cross-reference this data in the same way that current systems do for text input. Imagine, for example, a future virtual reality device that generates not only audio and visual input but also your environment and movement on a physical platform. You could ask it to simulate a long sea voyage, and it would not only place you on a ship with the sound of the waves in the background but also let you feel the deck rocking under your feet and the sea breeze blowing.
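Once every modality lives in the same embedding space, this kind of cross-referencing reduces to a nearest-neighbour lookup. The sketch below is a toy illustration, not ImageBind's API: the file names, the 3-d vectors and the `nearest_per_modality` helper are hypothetical, and a real system would obtain the vectors from trained encoders rather than writing them by hand.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical shared index: (modality, item name, hand-written embedding).
INDEX = [
    ("text",  "waves crashing against a ship", [0.90, 0.10, 0.00]),
    ("audio", "ocean_surf.wav",                [0.80, 0.20, 0.10]),
    ("image", "ship_at_sea.jpg",               [0.85, 0.15, 0.05]),
    ("imu",   "rolling_deck_motion.csv",       [0.70, 0.30, 0.10]),
    ("text",  "traffic in a busy city",        [0.00, 0.90, 0.20]),
    ("audio", "car_horns.wav",                 [0.10, 0.80, 0.30]),
    ("imu",   "walking_on_pavement.csv",       [0.20, 0.70, 0.40]),
]

def nearest_per_modality(query_vec, index):
    """Return the best-matching item name for each modality in the index."""
    best = {}
    for modality, name, vec in index:
        score = cosine(query_vec, vec)
        if modality not in best or score > best[modality][1]:
            best[modality] = (name, score)
    return {modality: name for modality, (name, _) in best.items()}

# Query with the embedding of the sea-voyage caption.
query = [0.90, 0.10, 0.00]
matches = nearest_per_modality(query, INDEX)
for modality, name in sorted(matches.items()):
    print(modality, "->", name)
```

In this toy index, the sea-voyage query pulls back the surf recording and the rolling-deck motion trace rather than the city items, which is the behaviour the sea-voyage scenario relies on.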

Meta noted in a blog post that other streams of sensory input could be added to future models, including "touch, speech, smell, and brain fMRI signals". The company also claims that the research "brings machines one step closer to humans' ability to learn simultaneously, holistically, and directly from many different forms of information."

Of course, much of this is speculative, and the immediate applications of this research are likely to be far more limited. Last year, for example, Meta demonstrated an AI model that generates short, blurry videos from text descriptions. Research like ImageBind shows how future versions of that system could incorporate other data streams, for example by generating audio to match the video output.

The research is also of interest to industry watchers because, as CTOnews.com notes, Meta has open-sourced the underlying model, a practice that is drawing growing scrutiny in the field of artificial intelligence.
