Shulou (Shulou.com) 06/03 Report --
This article explains in detail how to perform speech recognition in JavaScript applications. The editor shares it as a reference, and I hope you will have a good understanding of the relevant knowledge after reading it.
Speech recognition is an interdisciplinary sub-field of computer science and computational linguistics. It can recognize spoken language and translate it into text. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT).
Machine learning (ML) is an application of artificial intelligence (AI) that enables systems to automatically learn and improve from experience without being explicitly programmed. Machine learning has provided most of the breakthroughs in speech recognition in this century. Today, speech recognition technologies are everywhere, such as Apple Siri, Amazon Echo, and Google Nest.
Speech recognition and voice response (also known as speech synthesis, or text-to-speech (TTS)) are both supported by the Web Speech API.
In this article, we focus on speech recognition in JavaScript applications. Another article introduces speech synthesis.
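As a quick orientation, the sketch below shows the two entry points of the Web Speech API in the browser: the (possibly vendor-prefixed) SpeechRecognition constructor and the speechSynthesis global. This is a minimal sketch that assumes it runs in a browser that implements the API, such as Chrome.

// Speech recognition: Chrome exposes the constructor with a webkit prefix.
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

// Speech synthesis (text-to-speech), covered in detail in the other article.
const utterance = new SpeechSynthesisUtterance('Hello from the Web Speech API');
window.speechSynthesis.speak(utterance);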
Speech recognition interface
SpeechRecognition is the controller interface for the recognition service; in Chrome it is called webkitSpeechRecognition. SpeechRecognition handles the SpeechRecognitionEvent sent from the recognition service. SpeechRecognitionEvent.results returns a SpeechRecognitionResultList object that represents all speech recognition results for the current session.
You can initialize SpeechRecognition with the following lines of code:
// create a SpeechRecognition object
const recognition = new webkitSpeechRecognition();

// configure the setting to return continuous results for each recognition
recognition.continuous = true;

// configure the setting to return interim results
recognition.interimResults = true;

// event handler invoked when a word or phrase is recognized
recognition.onresult = function (event) {
  console.log(event.results);
};
recognition.start() starts speech recognition, and recognition.stop() stops it; it can also be stopped with recognition.abort().
When the page is accessing your microphone, a microphone icon appears in the address bar to show that the microphone is on and running.
We speak to the page in sentences, for example "hello comma I'm talking period." The onresult handler displays all the interim results as we speak.
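Because interimResults is enabled, each SpeechRecognitionResult in event.results may still change until its isFinal flag becomes true. The following is a minimal sketch, using only the standard result properties described above, of one way to separate interim transcripts from final ones.

recognition.onresult = (event) => {
  let finalTranscript = '';
  let interimTranscript = '';
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      // this result will not change any more
      finalTranscript += result[0].transcript;
    } else {
      // this result is still being refined as you speak
      interimTranscript += result[0].transcript;
    }
  }
  console.log('final:', finalTranscript, 'interim:', interimTranscript);
};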
This is the HTML code for this example:
<html>
  <head>
    <title>Speech Recognition</title>
    <script>
      window.onload = () => {
        const button = document.getElementById('button');
        button.addEventListener('click', () => {
          if (button.style['animation-name'] === 'flash') {
            recognition.stop();
            button.style['animation-name'] = 'none';
            button.innerText = 'Press to Start';
            content.innerText = '';
          } else {
            button.style['animation-name'] = 'flash';
            button.innerText = 'Press to Stop';
            recognition.start();
          }
        });
        const content = document.getElementById('content');
        const recognition = new webkitSpeechRecognition();
        recognition.continuous = true;
        recognition.interimResults = true;
        recognition.onresult = function (event) {
          let result = '';
          for (let i = event.resultIndex; i < event.results.length; i++) {
            result += event.results[i][0].transcript;
          }
          content.innerText = result;
        };
      };
    </script>
    <style>
      button {
        background: yellow;
        animation-name: none;
        animation-duration: 3s;
        animation-iteration-count: infinite;
      }
      @keyframes flash {
        0% { background: red; }
        50% { background: green; }
      }
    </style>
  </head>
  <body>
    <button id="button">Press to Start</button>
    <div id="content"></div>
  </body>
</html>

The script creates the SpeechRecognition object and configures it with continuous = true and interimResults = true. The onresult event handler is set up to run when a word or phrase is recognized. Clicking the button starts speech recognition with recognition.start() and stops it with recognition.stop().

After you click the button to stop, a few more messages may still be printed. That is because recognition.stop() tries to return the SpeechRecognitionResults captured so far. If you want it to stop completely, use recognition.abort() instead.

You will notice that the code for the animated button is longer than the speech recognition code itself. Here is a video clip of this example: https://youtu.be/5V3bb5YOnj0

As for browser compatibility, web speech recognition depends on the browser's own speech recognition engine. In Chrome, this engine performs recognition in the cloud, so it only works online.
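Since Chrome performs recognition in the cloud and other browsers may not expose the API at all, it is worth guarding your code. The following is a minimal sketch, assuming only the standard SpeechRecognition/webkitSpeechRecognition constructor and the onerror event; the warning messages are placeholders.

const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;

if (!SpeechRecognition) {
  console.warn('Speech recognition is not supported in this browser.');
} else {
  const recognition = new SpeechRecognition();
  recognition.onerror = (event) => {
    // 'network' typically means the cloud engine could not be reached,
    // 'not-allowed' means microphone access was denied.
    console.error('Recognition error:', event.error);
  };
  recognition.start();
}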
Speech recognition library
There are some open source speech recognition libraries, and here is a list of these libraries based on npm trends:
1. Annyang
Annyang is a JavaScript speech recognition library that is used to control websites through voice commands. It is based on SpeechRecognition Web API. In the next section, we will give examples of how annyang works.
2. Artyom.js
Artyom.js is a JavaScript speech recognition and speech synthesis library. It is based on the Web Speech API. In addition to voice commands, it also provides voice responses (see the short sketch after this list).
3. Mumble
Mumble is a JavaScript speech recognition library that is used to control websites through voice commands. It is based on SpeechRecognition Web API, which is similar to the way annyang works.
4. Julius.js
Julius is a high-performance, small-footprint large vocabulary continuous speech recognition (LVCSR) decoder for speech-related researchers and developers. It can perform real-time decoding on a variety of computers and devices, from microcomputers to cloud servers. Julius is written in C, and julius.js is a port of Julius to JavaScript.
5.voice-commands.js
Voice-commands.js is a JavaScript speech recognition library that is used to control websites through voice commands. It is based on SpeechRecognition Web API, which is similar to the way annyang works.
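Of these libraries, Artyom.js is the one that also speaks back. As a rough illustration of its command-plus-response model, here is a minimal sketch based on Artyom's documented usage; treat the constructor name and the option names (lang, continuous, listen) as assumptions to verify against the library's current documentation.

// assumes artyom.js has been loaded (e.g. via a script tag or npm)
const artyom = new Artyom();

artyom.addCommands({
  indexes: ['hello'],
  action: () => {
    artyom.say('Hello, nice to meet you'); // voice response (TTS)
  },
});

// start listening; option names taken from Artyom's docs, verify before use
artyom.initialize({
  lang: 'en-US',
  continuous: true,
  listen: true,
});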
Annyang
Annyang initializes a SpeechRecognition object, which is defined as follows:
var SpeechRecognition = root.SpeechRecognition ||
                        root.webkitSpeechRecognition ||
                        root.mozSpeechRecognition ||
                        root.msSpeechRecognition ||
                        root.oSpeechRecognition;
Annyang exposes several APIs to start or stop listening (a short usage sketch follows the list):
annyang.start: start listening, with options such as auto restart and continuous mode, for example annyang.start({ autoRestart: true, continuous: false }).
annyang.abort: stop listening and stop the SpeechRecognition engine (turns off the microphone).
annyang.pause: stop listening, without stopping the SpeechRecognition engine or turning off the microphone.
annyang.resume: start listening again with no options.
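The sketch below strings these calls together. It is a minimal illustration of the annyang API listed above, with a trivial command definition.

// assumes annyang has been loaded on the page
if (annyang) {
  annyang.addCommands({ hello: () => console.log('Hello!') });

  // begin listening; restart automatically, one phrase at a time
  annyang.start({ autoRestart: true, continuous: false });

  // temporarily stop listening without releasing the microphone
  annyang.pause();

  // pick up again with the same options
  annyang.resume();

  // stop completely and turn off the microphone
  annyang.abort();
}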
This is the HTML code for this example:
<html>
  <head>
    <title>Annyang</title>
    <!-- load the annyang source (e.g. from npm or a CDN) -->
    <script src="annyang.min.js"></script>
    <script>
      window.onload = () => {
        const button = document.getElementById('button');
        button.addEventListener('click', () => {
          if (button.style['animation-name'] === 'flash') {
            annyang.pause();
            button.style['animation-name'] = 'none';
            button.innerText = 'Press to Start';
            content.innerText = '';
          } else {
            button.style['animation-name'] = 'flash';
            button.innerText = 'Press to Stop';
            annyang.start();
          }
        });
        const content = document.getElementById('content');
        const commands = {
          hello: () => {
            content.innerText = 'You said hello.';
          },
          'hi *splats': (name) => {
            content.innerText = `You greeted ${name}.`;
          },
          'Today is :day': (day) => {
            content.innerText = `You said ${day}.`;
          },
          '(red) (green) (blue)': () => {
            content.innerText = 'You said a primary color name.';
          },
        };
        annyang.addCommands(commands);
      };
    </script>
    <style>
      button {
        background: yellow;
        animation-name: none;
        animation-duration: 3s;
        animation-iteration-count: infinite;
      }
      @keyframes flash {
        0% { background: red; }
        50% { background: green; }
      }
    </style>
  </head>
  <body>
    <button id="button">Press to Start</button>
    <div id="content"></div>
  </body>
</html>
The first script tag adds the annyang source code.
Clicking the button calls annyang.start() to start annyang and annyang.pause() to pause it.
Annyang controls the web page through the voice commands defined in the commands object.
hello is a simple command. If the user says "hello", the page replies "You said hello."
'hi *splats' is a command with splats, which greedily capture the multi-word text at the end of the command. If you say "Hi, Alice", the answer is "You greeted Alice." If you say "Hi, Alice and John", the answer is "You greeted Alice and John."
'Today is :day' is a command with a named variable. The day of the week is captured as day and echoed back in the response.
'(red) (green) (blue)' is a command with optional words. If you say "yellow", it is ignored. If you mention any of the primary color names, the page responds "You said a primary color name."
All of the commands defined above are registered with annyang by annyang.addCommands(commands).
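If a phrase does not match any registered command, nothing happens by default. As a small extension of the example, and assuming annyang's documented callback API, you can hook the result events to see what was heard:

// log every recognized phrase, matched or not
annyang.addCallback('result', (phrases) => {
  console.log('Possible phrases heard:', phrases);
});

// react when nothing matched a registered command
annyang.addCallback('resultNoMatch', (phrases) => {
  console.log('No command matched:', phrases);
});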
We have looked at speech recognition in JavaScript applications. Chrome provides the best support for the Web Speech API, and all of our examples were implemented and tested in the Chrome browser.
One tip when exploring the Web Speech API: if you do not want your everyday conversations to be listened to, remember to turn off your speech recognition application when you are not using it.
That is all we have to share about how to perform speech recognition in JavaScript applications. I hope the above content has been helpful; if you think the article is good, feel free to share it for more people to see.