A Brief History of Looking for Jarvis

2025-01-31 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Maybe everyone wants to have a Jarvis.

Although he left the Marvel movies years ago, we still remember the omnipotent AI assistant in the Iron Man suit. His distinctive humor, elegant tone, and dependable character have made countless sci-fi fans fond of this invisible character.

How crazy is the obsession with Jarvis? I've seen Jarvis-themed desktop programs, Jarvis-style mobile phone UIs, and AI algorithms named after Jarvis. Designers and geeks have thought of countless ways to "revive" a Jarvis of their own.

However, ever more realistic Jarvis interfaces obviously cannot capture his essence: communication and companionship like that of a friend.

If any technology is trying to find Jarvis in "essence", it must be the voice assistant.

In an age of technological explosion, we rarely pay attention to how any one technology develops in daily life. But stop and look back, and you will be surprised at how much a technology has changed. The voice assistant in the mobile phone, for example, has become part of daily life. Look back on its history and you may be surprised again: using a voice assistant to revive Jarvis is no joke. Over the years, in human-machine interaction, application capability, and function integration, the voice assistant experience has indeed been approaching the ultimate goal of "Jarvis".

Brief histories of all kinds seem to be popular lately, so today let's tell a brief history of "looking for Jarvis". It is not hard to see that the evolutionary trajectory of voice assistants is actually very clear.

When the voice assistant was just growing up

As we all know, the earliest widely known voice assistant was Apple's Siri.

Back in those days, when no one had ever talked to a mobile phone, Siri seemed all-powerful. There is even an episode of The Big Bang Theory in which Rajesh falls in love with Siri.

However, it should be admitted that every technology has its stages. A classic car is a classic, but it certainly can't keep up on a 21st-century highway. Siri in its initial stage feels very rudimentary by today's standards.

Apple's large-scale investment in voice assistants was made possible mainly by cloud computing. In fact, prototypes of the voice assistant had already appeared earlier. At that time, however, cloud computing had not been rolled out, so only a limited number of voice commands could be stored on the phone itself, which made the question-and-answer templates monotonous.

Starting with Siri, the voice assistant moved to the cloud. A large number of response templates could then be stored in the cloud and updated in real time, which made voice services practical.

At this stage Siri knew more, but its IQ was still worrying. The voice assistant of that time was completely templated. In other words, you had to phrase the request exactly, and Siri had to recognize it exactly, before it could find the corresponding answer for you.

But a template is still a template. At that time, the experience of using Siri looked something like this:

User: Siri, call xx for me.

Siri: Okay, I've dialed.

User: Siri, make a call for me. I need to contact xx.

Siri: I'm sorry, Siri didn't understand.

In short, the wording could not deviate from the template even slightly, or the user was sent back to start all over again. Fortunately, the era in which voice assistants competed purely on the number of templates passed quickly, because AI was coming.
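The brittleness of templated matching can be sketched in a few lines. This is only an illustration, not how Siri was actually implemented: the pattern `CALL_TEMPLATE` and the function `templated_reply` are hypothetical names invented for this example.

```python
import re

# Hypothetical template: one fixed phrasing with a single slot for the contact name.
# This is an illustrative assumption, not Apple's actual implementation.
CALL_TEMPLATE = re.compile(r"^siri, call (?P<name>\w+) for me\.?$", re.IGNORECASE)


def templated_reply(utterance: str) -> str:
    """Return a canned reply only when the utterance fits the template exactly."""
    match = CALL_TEMPLATE.match(utterance.strip())
    if match:
        return f"Okay, dialing {match.group('name')}"
    # Any other phrasing, even with the same meaning, falls through.
    return "I'm sorry, Siri didn't understand."
```

Calling `templated_reply("Siri, call Alice for me.")` fits the template, while `templated_reply("Siri, make a call for me. I need to contact Alice.")` expresses the same wish and still gets the apology, because only the surface wording is compared.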

After AI came

The biggest disadvantage of early voice assistants was that questions and answers could only be templated. The user was not really chatting with the voice assistant, just replacing a remote control with voice operation. To some extent, this not only raised the user's interaction cost but also lowered users' expectations of how intelligent a "voice assistant" could be.

It's a good thing AI came.

With the revival of deep learning, using neural networks for speech tasks has gradually become mainstream. With AI on board, voice assistants began to show capabilities such as semantic understanding, multi-turn dialogue, and speech synthesis, and they continue to develop along these tracks. As a result, voice assistants are getting better and better at listening, and in many cases they can talk with users as in a normal chat. Even problems such as dialects and children's voices have been solved to a certain extent.

The most distinctive voice assistants of this period are probably Google Assistant and Microsoft Xiaoice. The former pulled off the feat last year of imitating a real person on a phone call without being detected, while the latter has long been active in the chat world and is famous for not being recognized as an AI.

At this stage, the voice assistant experience was significantly upgraded. For example, whether the user says "I want to make a phone call", "make a phone call for me", or "check someone's number for me, and then make a phone call", the voice assistant can basically understand.
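To illustrate why differently worded requests can land on the same action, here is a deliberately simple sketch. It uses keyword overlap as a stand-in for the neural semantic understanding described above, so it shows only the idea of mapping many phrasings to one intent, not the actual technique. The table `INTENT_KEYWORDS` and the function `detect_intent` are hypothetical.

```python
from typing import Optional

# Hypothetical intent vocabulary. A real assistant would use a trained model;
# keyword overlap is only a toy stand-in for semantic understanding.
INTENT_KEYWORDS = {
    "make_call": {"call", "phone", "dial", "number"},
    "send_text": {"text", "message", "sms"},
}


def detect_intent(utterance: str) -> Optional[str]:
    """Pick the intent whose keywords overlap most with the utterance."""
    words = {w.strip(".,!?").lower() for w in utterance.split()}
    scores = {intent: len(words & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # No keyword hit at all means the request is out of scope.
    return best if scores[best] > 0 else None
```

All three phrasings from the paragraph above resolve to `make_call` here, whereas an exact-template matcher would accept only one of them.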

With the help of deep learning, the voice assistant can also remember the user's chat habits, banter with users from time to time, and take on a personalized chat persona.

However, this is not the end. As AI voice assistants became popular, especially on mobile phones, users found that there was still too little they could do. They were basically limited to making calls, sending text messages, organizing schedules, and so on, which today are truly marginal applications.

A chatty but useless voice assistant is like a Jarvis who can only chat with Tony but can't start the Iron Man suit, which would obviously have hurt the box office badly.

In the past two years of the mobile AI awakening, as on-device AI processing power has grown stronger, things have begun to develop further.

Folding out a humanized interaction

In the evolution of voice assistants, two things determine the direction of today's story. The first is objective: AI processing capacity on both the device side and the cloud side is getting stronger. Many AI applications that could not run before are becoming a reality, and these capabilities are like Iron Man's armor and weapons, giving voice assistants more to work with.

The second, more obvious change is subjective: consumers and manufacturers have raised their expectations of the voice assistant. In the past, as a novelty to be sampled, the voice assistant's main job was to show how intelligent and powerful it was. Today, users who have become accustomed to voice interaction want to "take the initiative" and have voice assistants provide services rather than show off skills.

As we all know, the main services on our mobile phones are delivered through apps. The next step for the voice assistant is therefore to integrate with apps, abstract their services, and bring convenience to users. Samsung's Bixby, for example, was among the first to try to reach through the app layer.

Together, these subjective and objective conditions make up the main upgrade path of today's voice assistant: folding interactions, integrating applications, and being more humanized.

There are already many contestants on this road, and the one visibly pushing the voice assistant to new heights is the newly updated EMUI 9.1.

The name Xiaoyi is familiar to Huawei users. In EMUI 9.1, however, Xiaoyi has undergone a very large upgrade. From the perspective of the whole history of voice assistant development, these upgrades may also prove very important. Let's look at what the new Xiaoyi brings, based on the changes in several scenarios.

1. Able to see as well as listen and speak

Everyone takes it for granted that a voice assistant's abilities are listening and speaking. In fact, however, a real "assistant" also needs another basic ability: reading the situation with its eyes.

The reason voice assistants have lacked the ability to "see" is largely that most mobile phones still cannot handle complex AI visual computing. Since Huawei has taken the lead in this field over the past two years, it has naturally laid the foundation for voice assistants that can listen, speak, and see.

In EMUI 9.1, Xiaoyi introduces multimodal fusion interaction. Previously, users who wanted their phones to identify flowers, plants, cars, and so on had to open the camera or a dedicated app. This process is a bit tedious, and a lot of fleeting scenery may be missed.

With the new Xiaoyi, users can press and hold the power button for one second to wake it up, say "what is this" directly to the phone, and the voice assistant will automatically identify flowers, cars, and so on. This ability is even more useful for identifying food calories. You can ask your phone directly, "will I get fat if I eat this?" and Xiaoyi will automatically turn on calorie recognition and report the number of calories in the food. Depending on the calorie level, Xiaoyi will also advise you to eat less or to go ahead without worry. Probably only Xiaoyi is this friendly. If you ask your friends the same question, they will probably say, "you would get fat even eating air."

Xiaoyi can see not only through the camera but also into the pictures stored on the phone. Another use of Xiaoyi's combination with vision is searching pictures by voice. For example, a user can say, "find a picture of my girlfriend from last year," and the phone will find what you want among the many pictures in the gallery.

2. Reach through apps to serve you

Today we spend our lives shuttling between countless apps. At the same time, we have to admit that many functions are hidden deep inside those apps and are a bit troublesome to reach.

For example, when you are thirsty, you want to buy a bottle of water from a vending machine. After much deliberation you finally pick the right drink, the machine prompts you to scan a code to pay, and then you have to find Alipay, open Alipay, and tap the scan button. Every second of the process is painful.

In the EMUI 9.1 upgrade, Xiaoyi gained more ways to unlock in-app scenarios by voice. For the operations above, you can now simply say "scan code" to the phone and bring up the scanner in one phrase. This skill suits any situation where you are in a hurry to pay, and it is worth trying for yourself.

Another typical scenario is in WeChat. WeChat voice calls now seem to have completely replaced phone calls, but in the clutter of WeChat it is actually hard to find the right person. You have to tap search, type, tap the result to enter the chat, and start the voice call, a total of four steps. With Xiaoyi, you can just say "call someone on WeChat" and it is done.

In addition, it is said that waking Xiaoyi inside different applications brings different surprises.

3. Learnable and definable

Another upgrade in Xiaoyi is a stronger ability to learn the user's voice habits and usage habits, along with a new mode of custom combined skills.

For example, when we get into the car after work on Friday, there are actually a lot of things to do on the phone first: open the navigation to confirm the destination, play "Today is a good day", send a WeChat message to buddies about going to the game, and send a text message to the wife saying we will not be home for dinner. (This practice is not worth advocating.)

As we can see, although this sequence of operations is carried out in a pleasant mood, it is rather complicated. People with slow hands will be caught by the evening rush in minutes. In the new version of Xiaoyi, users can customize these operations and combine them into a single voice command. For example, shout "I'm off work!" at Xiaoyi, and Xiaoyi will automatically carry out the operations the user has set, which is both simple and satisfying.
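A custom combined command of this kind can be pictured as a phrase mapped to an ordered list of actions. The sketch below is an assumption about the general shape of such a feature, not Huawei's implementation: `MacroRegistry`, its methods, and the stand-in actions are all hypothetical, and the actions only return descriptive strings instead of controlling a phone.

```python
from typing import Callable, Dict, List


class MacroRegistry:
    """Hypothetical registry mapping one spoken phrase to several actions."""

    def __init__(self) -> None:
        self._macros: Dict[str, List[Callable[[], str]]] = {}

    @staticmethod
    def _normalize(phrase: str) -> str:
        # Ignore case and surrounding punctuation so "Off work!" equals "off work".
        return phrase.strip().strip("!?.").lower()

    def define(self, phrase: str, actions: List[Callable[[], str]]) -> None:
        """Bind a user-chosen phrase to an ordered list of actions."""
        self._macros[self._normalize(phrase)] = list(actions)

    def run(self, phrase: str) -> List[str]:
        """Run every action bound to the phrase, in order; empty list if unknown."""
        return [action() for action in self._macros.get(self._normalize(phrase), [])]


# Stand-in actions (assumptions for illustration only).
registry = MacroRegistry()
registry.define("Off work!", [
    lambda: "navigation opened",
    lambda: "music playing",
    lambda: "WeChat message sent",
    lambda: "text message sent",
])
results = registry.run("off work")
```

Keeping the actions as an ordered list means the user defines the sequence once and a single phrase replays it, which is the "folding" of several interactions into one.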

What Xiaoyi's story shows is that today's mobile voice assistant is no longer just a question of how intelligent the voice software itself is. The capabilities of the whole phone have to be integrated and invoked through the voice assistant. By folding together a variety of capabilities, multiple applications, and multiple interactions, it finally produces the time-saving, labor-saving, natural interaction that users expect.

From the fixed templates of the voice assistant's birth, to the arrival of AI, to today's era of intelligent folding built on the integration of phone software and hardware, it seems that a formula can be distilled from the changing history of voice assistants.

The Jarvis formula for voice assistants

How on earth can we build the Jarvis everyone has in mind? From the ever closer coupling of the mobile phone and the voice assistant, we can see that the following are prerequisites:

1. A continuously upgraded AI complex

From the fiercely competing voice assistants of Google, Microsoft, and Amazon to the smart speaker boom in China and the continuous upgrading of mobile voice assistants, AI capability has always been the "main plot" of this story.

With the arrival of the mobile AI era, the task of the voice assistant has grown from integrating AI technology to a three-in-one mode that integrates AI applications, AI capabilities, and AI technology.

Looking back on the two years in which Huawei phones began their AI evolution, many intelligent abilities started to emerge in the EMUI 8.0 period. The 9.0 era strengthened visual AI applications and the integration of AI capabilities. By 9.1, the voice assistant began to integrate more AI applications. AI going from nothing to something, from weak to strong, and from scattered to integrated is the main line of the evolution of mobile phone assistants.

2. Hardware and software working as one

On a mobile phone, the ability of a single piece of software is always limited. Only when it is better combined with applications, content, and hardware can users get an intelligent experience with practical value.

A Jarvis who can only banter would not be well liked; the Jarvis who knows everything from astronomy to geography and can operate the Iron Man suit is the real Jarvis. The reason Huawei and EMUI lead among today's voice assistants is largely Huawei's active exploration of software and hardware integration in recent years, which has to some extent broken down the barriers of the old mobile phone model.

When a voice assistant can mobilize smart vision and a variety of applications, its value rises as if your friend had suddenly been promoted from employee to boss.

3. Fold every interaction based on human nature

It is important to note that voice assistants are always accompanied by the temptation to show off: because AI brings such an abundance of technology, it is easy for developers to make voice interactions extremely complex. The original intention may be to let consumers feel the charm of the technology explosion, but in practice consumers are often overwhelmed and simply stay away.

Therefore, as the voice assistant evolves, interactions must be folded and trimmed according to what users actually perceive, guided by humanized product thinking. Only when the technology is complex but the interaction is simple does the voice assistant feel approachable.

At this point, it is not hard to see that the road to finding Jarvis can be summed up in a formula: more complex intelligent technology + more integrated products + more humanized interaction = voice assistants that are more like Jarvis.

And we have reason to believe that all searches will eventually reach the finish line.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

© 2024 shulou.com SLNews company. All rights reserved.
