How to implement speech recognition by calling the Baidu API from Python

2025-04-16 Update From: SLTechnology News&Howtos

Many newcomers are unsure how to implement speech recognition by calling the Baidu API from Python. This article walks through the problem and one working solution.

Recently I have been learning Python and working through some Python exercises.

The exercises come from a GitHub repository that is a few years old.

One question goes like this:

Use Python to make the browser automatically open a preset website when you yell at the computer.

For example, when you yell "Baidu" at a laptop, the browser automatically opens the home page of Baidu.

I then searched for the modules needed for each function (on Windows 10) and sorted out the approach:

Record audio locally

Upload the recording and get the returned result

Build a mapping and open the corresponding web page according to the result

Required modules:

PyAudio: recording interface

wave: opens the recording file and sets the audio parameters

requests: GET/POST requests
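Before going further, it can help to confirm that these modules are importable. The following is a minimal sketch; the helper name `missing_modules` is made up for illustration, and note that the PyAudio package is imported under the name `pyaudio`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # wave ships with Python; pyaudio and requests are third-party packages.
    print(missing_modules(["pyaudio", "wave", "requests"]))
```

Anything the helper reports as missing has to be installed before the recording or upload code will run.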

Why use the Baidu speech recognition API? Because it offers a free trial.

The first step, naturally, is to log in to Baidu Cloud and create an application.

In short, there are two options:

1. Download and use the SDK

2. Call the API directly, without the SDK

I chose option 2.

Get a token by assembling the URL as described in the documentation
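As a rough sketch of the URL assembly, the query string can be built with `urllib.parse.urlencode`, which also escapes any special characters in the keys. The endpoint and parameter names are the ones used later in this article; the function name `build_token_url` and the key values shown are placeholders.

```python
from urllib.parse import urlencode

# Token endpoint as used in this article.
TOKEN_ENDPOINT = "https://openapi.baidu.com/oauth/2.0/token"


def build_token_url(api_key, secret_key):
    """Assemble the token URL from the application's API key and secret key."""
    query = urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    })
    return f"{TOKEN_ENDPOINT}?{query}"


if __name__ == "__main__":
    # Placeholder credentials; substitute the values from your own application.
    print(build_token_url("YOUR_API_KEY", "YOUR_SECRET_KEY"))
```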

POST the local audio to the Baidu speech recognition server in JSON format and get the returned result.
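Because JSON cannot carry raw bytes, the audio has to be base64-encoded before it is uploaded. The sketch below shows only that step; the helper name `encode_speech` is hypothetical, and it assumes (as this article's request code does) that the `len` field is the byte length of the audio before encoding, not the length of the base64 string.

```python
import base64


def encode_speech(speech_data):
    """Return (base64 text, raw byte length) for the 'speech' and 'len' fields."""
    speech_b64 = base64.b64encode(speech_data).decode("utf-8")
    return speech_b64, len(speech_data)


if __name__ == "__main__":
    text, length = encode_speech(b"\x00\x01\x02")
    print(text, length)  # prints: AAEC 3
```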

Speech format

Supported formats: pcm (uncompressed), wav (uncompressed, PCM-encoded), and amr (compressed). pcm is recommended. Sampling rate: 16000, a fixed value. Encoding: mono, 16-bit depth.

The Baidu server converts non-pcm formats to pcm, so using wav or amr adds conversion time.

A file saved as pcm can be recognized, but the built-in Windows player cannot play pcm files, so I use wav. After all, the module is called wave.
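To check that a wav file matches the parameters listed above before uploading it, the standard `wave` module can read them back from the header. This is a minimal sketch; the function name `matches_recommended_format` is hypothetical and the defaults simply mirror the values quoted above.

```python
import wave


def matches_recommended_format(path, framerate=16000, channels=1, sampwidth=2):
    """True if the wav file is uncompressed, mono, 16-bit and sampled at 16 kHz."""
    with wave.open(path, "rb") as wf:
        return (
            wf.getcomptype() == "NONE"          # uncompressed PCM
            and wf.getframerate() == framerate  # samples per second
            and wf.getnchannels() == channels   # 1 = mono
            and wf.getsampwidth() == sampwidth  # bytes per sample, 2 = 16 bit
        )
```

A file recorded with different settings would return False here and would need to be re-recorded or resampled first.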

First, the local recording.

    import time
    import wave

    from pyaudio import PyAudio, paInt16

    framerate = 16000  # sampling rate
    num_samples = 2000  # frames read per call
    channels = 1  # mono
    sampwidth = 2  # sample width: 2 bytes
    FILEPATH = 'speech.wav'


    def save_wave_file(filepath, data):
        wf = wave.open(filepath, 'wb')
        wf.setnchannels(channels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(framerate)
        wf.writeframes(b''.join(data))
        wf.close()


    # Recording
    def my_record():
        pa = PyAudio()
        # Open a new audio stream
        stream = pa.open(format=paInt16, channels=channels, rate=framerate,
                         input=True, frames_per_buffer=num_samples)
        my_buf = []  # stores the recorded data
        t = time.time()
        print('recording...')
        while time.time() < t + 4:  # recording time in seconds
            # Read in a loop, 2000 frames each time
            string_audio_data = stream.read(num_samples)
            my_buf.append(string_audio_data)
        print('recording ends.')
        save_wave_file(FILEPATH, my_buf)
        stream.close()
        pa.terminate()  # release the audio device
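The recording loop stops based on wall-clock time, so the exact length of the file can vary slightly from run to run. An alternative, sketched here under the same sampling parameters, is to compute how many reads are needed for a given duration and loop that many times. The helper name `chunks_for_duration` is hypothetical.

```python
import math


def chunks_for_duration(seconds, framerate=16000, frames_per_chunk=2000):
    """Number of fixed-size reads needed to cover `seconds` of audio."""
    return math.ceil(seconds * framerate / frames_per_chunk)


if __name__ == "__main__":
    n = chunks_for_duration(4)
    # In the recording function the loop would then become:
    #     for _ in range(n):
    #         my_buf.append(stream.read(num_samples))
    print(n)  # prints: 32
```

With 32 reads of 2000 frames at 2 bytes per frame, a 4-second mono recording holds 128000 bytes of audio data.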

Then get the token.

    import base64  # Baidu requires the local audio bytes to be base64-encoded

    import requests

    # Assemble the URL used to obtain a token; see the documentation for details
    base_url = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=%s&client_secret=%s"
    APIKey = "LZAdqHUGC*mbfKm"
    SecretKey = "WYPPwgHu*BU6GM*"
    HOST = base_url % (APIKey, SecretKey)


    def getToken(host):
        res = requests.post(host)
        return res.json()['access_token']


    # Pass in the audio bytes and the token; dev_pid selects one of the
    # language models that Baidu speech recognition provides
    def speech3text(speech_data, token, dev_pid=1537):
        FORMAT = 'wav'
        RATE = 16000
        CHANNEL = 1
        CUID = '*'
        SPEECH = base64.b64encode(speech_data).decode('utf-8')
        data = {
            'format': FORMAT,
            'rate': RATE,
            'channel': CHANNEL,
            'cuid': CUID,
            'len': len(speech_data),
            'speech': SPEECH,
            'token': token,
            'dev_pid': dev_pid,
        }
        url = 'https://vop.baidu.com/server_api'
        headers = {'Content-Type': 'application/json'}
        # r = requests.post(url, data=json.dumps(data), headers=headers)
        print('identifying...')
        r = requests.post(url, json=data, headers=headers)
        Result = r.json()
        if 'result' in Result:
            return Result['result'][0]
        else:
            return Result
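The function above returns either a string or the whole response dictionary, which leaves the caller to check the type. A stricter alternative is sketched below: it raises an exception when no transcript is present. The `result` field is the one used in this article; the `err_no` and `err_msg` fields are an assumption about the response format and should be checked against the Baidu documentation. The name `extract_text` is hypothetical.

```python
def extract_text(response):
    """Return the first transcript from a recognition response, or raise."""
    # Assumption: a non-zero 'err_no' signals failure and 'err_msg' explains it.
    if response.get("err_no", 0) != 0 or not response.get("result"):
        raise RuntimeError(
            f"recognition failed: err_no={response.get('err_no')} "
            f"err_msg={response.get('err_msg')}"
        )
    return response["result"][0]
```

It would be called on the parsed JSON, for example `extract_text(r.json())`, inside a `try` block that reports the error to the user.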

Finally, match the returned result against the known keywords. The webbrowser module is used here.

    webbrowser.open(url)
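The matching step can be sketched as a lookup that turns recognized text into a URL, which keeps the decision separate from actually opening the browser. The sites and keywords are the ones this article uses; the names `SITES` and `url_for` are hypothetical, and escaping the search text with `urllib.parse.quote` is an addition of this sketch.

```python
from urllib.parse import quote

# Each target URL with the recognized words that should open it.
SITES = {
    "https://www.baidu.com": ("Baidu", "baidu"),
    "https://www.qq.com": ("Tencent", "tengxun"),
    "https://www.163.com/": ("NetEase", "wangyi"),
}


def url_for(text):
    """Return the site matching `text`, or a Baidu search URL as a fallback."""
    for url, keywords in SITES.items():
        if text in keywords:
            return url
    return "https://www.baidu.com/s?wd=" + quote(text)
```

The result can then be passed straight to `webbrowser.open_new_tab(url_for(text))`.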

Complete demo

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Date: 2018-12-02 19:04:55

    import base64
    import time
    import wave
    import webbrowser

    import requests
    from pyaudio import PyAudio, paInt16

    framerate = 16000  # sampling rate
    num_samples = 2000  # frames read per call
    channels = 1  # mono
    sampwidth = 2  # sample width: 2 bytes
    FILEPATH = 'speech.wav'

    base_url = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=%s&client_secret=%s"
    APIKey = "*"
    SecretKey = "*"
    HOST = base_url % (APIKey, SecretKey)


    def getToken(host):
        res = requests.post(host)
        return res.json()['access_token']


    def save_wave_file(filepath, data):
        wf = wave.open(filepath, 'wb')
        wf.setnchannels(channels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(framerate)
        wf.writeframes(b''.join(data))
        wf.close()


    def my_record():
        pa = PyAudio()
        stream = pa.open(format=paInt16, channels=channels, rate=framerate,
                         input=True, frames_per_buffer=num_samples)
        my_buf = []
        # count = 0
        t = time.time()
        print('recording...')
        while time.time() < t + 4:  # seconds
            string_audio_data = stream.read(num_samples)
            my_buf.append(string_audio_data)
        print('recording ends.')
        save_wave_file(FILEPATH, my_buf)
        stream.close()
        pa.terminate()  # release the audio device


    def get_audio(file):
        with open(file, 'rb') as f:
            data = f.read()
        return data


    def speech3text(speech_data, token, dev_pid=1537):
        FORMAT = 'wav'
        RATE = 16000
        CHANNEL = 1
        CUID = '*'
        SPEECH = base64.b64encode(speech_data).decode('utf-8')
        data = {
            'format': FORMAT,
            'rate': RATE,
            'channel': CHANNEL,
            'cuid': CUID,
            'len': len(speech_data),
            'speech': SPEECH,
            'token': token,
            'dev_pid': dev_pid,
        }
        url = 'https://vop.baidu.com/server_api'
        headers = {'Content-Type': 'application/json'}
        # r = requests.post(url, data=json.dumps(data), headers=headers)
        print('identifying...')
        r = requests.post(url, json=data, headers=headers)
        Result = r.json()
        if 'result' in Result:
            return Result['result'][0]
        else:
            return Result


    def openbrowser(text):
        # Keywords per site; with a Putonghua model the recognized text is
        # Chinese, so the Chinese spellings belong in these lists too.
        maps = {
            'Baidu': ['Baidu', 'baidu'],
            'Tencent': ['Tencent', 'tengxun'],
            'NetEase': ['NetEase', 'wangyi'],
        }
        if text in maps['Baidu']:
            webbrowser.open_new_tab('https://www.baidu.com')
        elif text in maps['Tencent']:
            webbrowser.open_new_tab('https://www.qq.com')
        elif text in maps['NetEase']:
            webbrowser.open_new_tab('https://www.163.com/')
        else:
            webbrowser.open_new_tab('https://www.baidu.com/s?wd=%s' % text)


    if __name__ == '__main__':
        flag = 'y'
        while flag.lower() == 'y':
            print('Please enter a number to select the language:')
            devpid = input('1536: Putonghua (simple English), 1537: Putonghua (punctuated), '
                           '1737: English, 1637: Cantonese, 1837: Sichuan\n')
            my_record()
            TOKEN = getToken(HOST)
            speech = get_audio(FILEPATH)
            result = speech3text(speech, TOKEN, int(devpid))
            print(result)
            if isinstance(result, str):
                # Strip the punctuation that the punctuated models append
                openbrowser(result.strip('，。,. '))
            flag = input('Continue? (y/n): ')
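The demo requests a new token on every pass through the loop. Since a token stays valid for some time, it could be cached and reused. The sketch below is one way to do that under a stated assumption: the token response is taken to include an `expires_in` field in seconds, which is not shown in this article and should be verified against the Baidu documentation. The class name `TokenCache` and its parameters are hypothetical.

```python
import time


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(self, fetch, clock=time.time, safety_margin=60):
        self._fetch = fetch            # callable returning the token response as a dict
        self._clock = clock            # injectable clock, which makes testing easy
        self._margin = safety_margin   # refresh this many seconds before expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        if self._token is None or self._clock() >= self._expires_at:
            data = self._fetch()
            self._token = data["access_token"]
            # Assumption: 'expires_in' is the lifetime in seconds. If the field
            # is absent, the token is treated as expired and fetched again next time.
            lifetime = data.get("expires_in", 0)
            self._expires_at = self._clock() + lifetime - self._margin
        return self._token
```

In the demo this could be wired up as `cache = TokenCache(lambda: requests.post(HOST).json())` before the loop, with `cache.get()` replacing the call to `getToken(HOST)`.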

In testing, recognition works better when you speak loudly.
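A quick way to check whether a recording was loud enough is to measure its level before uploading it. The following is a minimal sketch, assuming 16-bit mono PCM frames as recorded in this article. The function name `rms_level` is hypothetical, and any threshold for "too quiet" would have to be found by experiment.

```python
import array
import math
import sys


def rms_level(pcm_bytes):
    """Root-mean-square amplitude of 16-bit little-endian PCM samples."""
    samples = array.array("h", pcm_bytes)  # 'h' = signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # wav data is little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

The input should be the raw frames (for example the joined chunks in `my_buf`, or the output of `readframes` from the `wave` module), not the bytes of the whole wav file, since those include the file header.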
