
Shh, AI is quietly listening to you.

2025-01-15 Update | From: SLTechnology News & Howtos > Internet Technology


Shulou (Shulou.com) 06/02 Report

In the spy movies we watched as kids, agents had one essential skill: staring at someone hundreds of meters away and working out what they were saying from the shape of their mouth. Against that memory, today's speech recognition technology feels rather ordinary. You only understand me once I say it out loud? Some AI hero.

But a recently filed patent shows that Microsoft is taking a page from those agents and moving into silent speech recognition. When AI can, like an agent, understand words that are never voiced, how will our world be different?

Besides lip reading, how else can AI quietly understand what you are saying?

When it comes to silent speech recognition, many people's first reaction is probably to copy the human approach and use image recognition to read lips. Lip reading for speech recognition has a long history, but its accuracy has never been high. DeepMind ran a test in 2016: after training on 10,000 hours of news video, the AI's lip-reading accuracy reached 46.8%. A domestic company once reported that on Chinese news video its accuracy reached 70%. The lip-reading system Sogou demonstrated for driving scenarios can reach 90% accuracy, but only because the vocabulary involved is very small.
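A toy sketch can illustrate why a small vocabulary makes lip reading so much easier: many phonemes share the same visible mouth shape (viseme), so as the vocabulary grows, more words become visually indistinguishable. The phoneme-to-viseme table and the word lists below are purely illustrative, not from any real system.

```python
from collections import Counter

# A crude, illustrative letter -> viseme table. Sounds in the same group
# (e.g. p/b/m) look identical on the lips.
VISEME = {
    "p": "PBM", "b": "PBM", "m": "PBM",
    "f": "FV",  "v": "FV",
    "t": "TDN", "d": "TDN", "n": "TDN",
    "k": "KG",  "g": "KG",
    "a": "A", "e": "E", "i": "I", "o": "O", "u": "U",
}

def viseme_seq(word):
    """Collapse a word to the viseme sequence a camera would see."""
    return tuple(VISEME.get(ch, ch) for ch in word)

def ambiguous_fraction(vocab):
    """Fraction of words whose lip shapes collide with another word's."""
    counts = Counter(viseme_seq(w) for w in vocab)
    return sum(1 for w in vocab if counts[viseme_seq(w)] > 1) / len(vocab)

small = ["up", "down", "stop", "go"]                      # command-style vocabulary
large = small + ["pat", "bat", "mat", "tan", "dan", "nan"]

print(ambiguous_fraction(small))  # no collisions in the small vocabulary
print(ambiguous_fraction(large))  # "pat"/"bat"/"mat" etc. all look alike
```

Even this crude model shows the effect: a handful of driving commands stay distinguishable, while a larger vocabulary quickly accumulates words that are identical on the lips.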

Compared with mainstream speech recognition, which routinely reaches 95% or 97% accuracy, lip-reading accuracy is clearly not presentable. And that is for Chinese, where each character is a single syllable; for English, a language full of liaison between words, lip reading has even more thresholds to cross.

On the other hand, the ethical problems of lip reading are serious. Its "range" is too long: if the technology ever matures, conversations held under Skynet surveillance cameras will no longer be private. In an age of growing anxiety about privacy, any company researching this openly might as well conclude that its PR department has too little to do.

So industry and academia, Microsoft included, are looking for silent speech recognition that is both more accurate and more private. At present, the technical directions fall into two "factions": an "air faction" and an "electric faction".

The patent Microsoft filed is typical of the air faction: a sensor is added to a terminal device that judges what the user is saying by sensing the airflow produced as they speak. The terminal looks like a small microphone placed next to the user's mouth, and the airflow formed while speaking registers on the device. After training, the signals produced by these air currents can be mapped one-to-one to text.
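The patent does not disclose its model, but the "train on airflow signals, then map them to text" step can be sketched with a stand-in classifier. Everything here is fabricated for illustration: the waveforms are made-up sensor readings, and a nearest-centroid classifier stands in for whatever trained model a real system would use.

```python
# Minimal sketch of the air-faction idea: each spoken word produces a
# characteristic airflow waveform at a sensor near the mouth; training
# builds a template per word, and recognition picks the nearest template.

def centroid(samples):
    """Average several training waveforms into one template."""
    n = len(samples)
    return [sum(vals) / n for vals in zip(*samples)]

def distance(a, b):
    """Squared Euclidean distance between two waveforms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(labelled):
    """labelled: {word: [waveform, ...]} -> {word: template}"""
    return {word: centroid(ws) for word, ws in labelled.items()}

def recognise(templates, waveform):
    """Return the word whose template is closest to the waveform."""
    return min(templates, key=lambda w: distance(templates[w], waveform))

# Fabricated airflow readings (arbitrary units) for two words.
training = {
    "yes": [[0.1, 0.8, 0.3, 0.1], [0.2, 0.7, 0.4, 0.1]],
    "no":  [[0.6, 0.2, 0.1, 0.5], [0.5, 0.3, 0.2, 0.6]],
}
model = train(training)
print(recognise(model, [0.15, 0.75, 0.35, 0.1]))  # near the "yes" templates
```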

The electric faction is even more magical. Speaking requires mobilizing the muscles of the lower half of the face, and different words are pronounced with different movements. By collecting facial EMG (electromyography) signals, a system can learn the characteristics of the EMG signals produced during speech and, through neural-network training, map those signals to text.
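The EMG-to-text pipeline can also be sketched in miniature. A real system would feed raw multi-channel EMG into a neural network; the sketch below, on entirely synthetic numbers, extracts a per-channel RMS feature and trains a single perceptron, which is enough to show the shape of the training loop.

```python
# Toy electric-faction pipeline on synthetic EMG: features -> perceptron.

def rms(channel):
    """Root-mean-square amplitude of one EMG channel."""
    return (sum(x * x for x in channel) / len(channel)) ** 0.5

def features(emg):
    """emg: list of channels (each a list of samples) -> RMS per channel."""
    return [rms(ch) for ch in emg]

def train_perceptron(data, epochs=20, lr=0.1):
    """data: [(feature_vec, label)] with labels +1 / -1."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:  # mistake-driven update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return "yes" if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else "no"

# Synthetic 2-channel EMG clips: "yes" activates channel 0, "no" channel 1.
yes_clip = [[0.9, 1.1, 0.8], [0.1, 0.2, 0.1]]
no_clip  = [[0.1, 0.1, 0.2], [1.0, 0.9, 1.1]]
data = [(features(yes_clip), 1), (features(no_clip), -1)]
w, b = train_perceptron(data)
print(predict(w, b, features(yes_clip)))  # classifies the "yes" clip
```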

These two kinds of silent speech recognition share a common feature: autonomy and privacy. Whether it collects EMG signals or airflow, the device must be worn by the speaker; it cannot collect and analyze remotely, without the speaker's knowledge, the way camera-based lip reading can.

Is silent speech recognition qigong made real?

Whether air faction or electric faction, these silent speech recognition technologies face the same question: since users still have to mouth the words to be recognized, why not simply use ordinary speech recognition for text conversion and translation? Why bother with tricks that, like qigong, seem to have no practical application?

In fact, the applications of silent speech recognition may not be as broad as people imagine. It is not the most efficient way to help the hearing impaired, nor can it be used for surveillance and similar work. But in a few key situations, silent speech recognition can play a wonderful role.

Put our heads together: where do people need to talk but cannot hear each other? The answer is simple: either where sound cannot travel, or where it is drowned out by noise. That gives silent speech recognition the following application scenarios:

Disaster sites, extravehicular activity, underwater operations.

In such places, people may wear special suits such as hazmat suits or spacesuits to keep out contaminated air or to carry oxygen. Once suited up, you can neither see the other person's expression nor hear their voice, let alone use voice interaction to control other devices. At the same time, environmental conditions such as limited oxygen often rule out speaking at normal volume, and the sealed suit produces echo, so conventional speech recognition struggles to work at all.

Here, silent speech recognition hardware that fits inside the protective suit is very valuable: the speaker only needs to mouth the words to convey information to the outside world.

Then there are noisy roads, factory floors, and airports.

In these places, you often have to shout at the top of your voice to be heard clearly, and accurate sound pickup for speech recognition is even harder. Silent speech recognition makes things much easier: it can convey information accurately, and it lets staff in such environments wear soundproof earplugs to protect their hearing.

In fact, EMG-based silent speech recognition has already been applied in some European fighter aircraft, where cabin noise is so loud that pilots cannot otherwise communicate with each other.

Of course, compared with speech recognition and even lip reading, silent speech recognition is still at a very early stage of development, and its practical efficiency is not high.

In fact, silent speech recognition is a typical "beautiful but useless" AI technology. It perfectly embodies the combination of a whole series of techniques, such as the marriage of AI and neuroscience in EMG-based recognition. Yet its applications are greatly limited: even in scenarios where sound genuinely cannot travel, one must still consider compute conditions and the medium for transmitting the recognized text, not to mention the complex data collection work.

But we have reason to believe that as AI technology spreads and application costs fall, there will always be some extreme scenario where these seemingly useless technologies find a use. Perhaps one day voice interaction, too, will be used to control fighter planes.
