Azure Cognitive Services- Spee 07/01 Update SLTechnology News&Howtos

2025-07-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

The Speech service is one of the Cognitive Services. It provides speech-to-text, text-to-speech, speech translation and more. Today's hands-on exercise is speech-to-text (Speech To Text, STT).

STT supports two access methods: 1. the SDK; 2. the REST API.

Where:

The SDK supports recognition of both microphone audio streams and audio files.

The REST API only supports audio files.

Preparation: create a Speech service for cognitive services:


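After creation, two important parameters can be viewed on the resource page: the subscription key and the endpoint (region), both of which are used in the requests and code that follow.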

1. Convert voice files to text in REST API mode:

For the Speech API endpoint of Azure global, please refer to:

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech-service/rest-speech-to-text#regions-and-endpoints

Speech API endpoints in Azure China:

As of February 2020, only the China East 2 region offers the Speech service, and its service endpoint is:

https://chinaeast2.stt.speech.azure.cn/speech/recognition/conversation/cognitiveservices/v1

For Speech To Text, there are two ways to authenticate: the Ocp-Apim-Subscription-Key header, or an Authorization Token.

An Authorization Token is valid for 10 minutes.

For simplicity, this article uses the Ocp-Apim-Subscription-Key method.

Note: for text-to-speech, you must authenticate with an Authorization Token.

Additional considerations for building the request:

File format:
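A quick local check of the audio file before uploading can save a failed request. The sketch below assumes the target format is mono, 16-bit PCM WAV at 16 kHz (an assumption here, in line with the mono WAV / PCM note in the SDK sample later in this article). The helper names `describe_wav`, `is_mono_16khz_16bit` and `make_silence` are hypothetical, and only the Python standard library is used.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, sample_width_bytes, frame_rate_hz) for a WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()


def is_mono_16khz_16bit(source):
    """True when the file is mono, 16-bit, 16 kHz (the assumed target format)."""
    return describe_wav(source) == (1, 2, 16000)


def make_silence(seconds=1, frame_rate=16000):
    """Build an in-memory mono 16-bit WAV of silence for a quick self-check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(frame_rate)
        wav.writeframes(b"\x00\x00" * frame_rate * seconds)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(describe_wav(make_silence()))
```

Calling `is_mono_16khz_16bit("speechtotext.wav")` on a real file returns `False` when the file needs to be resampled or down-mixed first.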

Request header:

Note that the key and the Authorization header are mutually exclusive: send one or the other, not both.
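The either/or rule can be enforced in code. The following is a minimal sketch; the function name `auth_headers` is hypothetical, and the `Bearer ` prefix on the token follows the usual form of the Authorization header for this service.

```python
def auth_headers(subscription_key=None, token=None):
    """Return the authentication header for a Speech REST request.

    Exactly one of `subscription_key` or `token` must be supplied.
    """
    if (subscription_key is None) == (token is None):
        raise ValueError("pass either subscription_key or token, not both or neither")
    if subscription_key is not None:
        return {"Ocp-Apim-Subscription-Key": subscription_key}
    return {"Authorization": "Bearer " + token}


if __name__ == "__main__":
    print(auth_headers(subscription_key="YourSubscriptionKey"))
```

Raising an error early keeps a request from silently carrying two competing credentials.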

Request parameters:

The example in Postman is as follows:
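For readers without Postman, the same request can be sketched in Python with only the standard library. This is a sketch under assumptions, not the exact request from the screenshot: the `Content-Type` and `Accept` values and the `language` query parameter follow the commonly documented form of the REST API for short audio, and the function names are hypothetical. The endpoint is the China East 2 endpoint given above.

```python
import json
import urllib.parse
import urllib.request

STT_ENDPOINT = (
    "https://chinaeast2.stt.speech.azure.cn"
    "/speech/recognition/conversation/cognitiveservices/v1"
)


def build_stt_request(subscription_key, audio_bytes, language="zh-CN"):
    """Build (but do not send) the POST request that uploads a WAV file."""
    url = STT_ENDPOINT + "?" + urllib.parse.urlencode({"language": language})
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        # Assumed audio format: 16 kHz PCM WAV.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=audio_bytes, headers=headers, method="POST")


def recognize_file(subscription_key, wav_path, language="zh-CN"):
    """Send the file to the service and return the parsed JSON response."""
    with open(wav_path, "rb") as audio_file:
        request = build_stt_request(subscription_key, audio_file.read(), language)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    result = recognize_file("YourSubscriptionKey", "speechtotext.wav")
    print(result.get("RecognitionStatus"), result.get("DisplayText"))
```

Separating `build_stt_request` from `recognize_file` makes it possible to inspect the URL and headers without making a network call.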

If you want to use an Authorization Token with the REST API, you need to obtain the token first.

For the token endpoint of Azure Global, please refer to:

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech-service/rest-speech-to-text#authentication

Token endpoint for Azure China:

As of February 2020, only China East 2 offers the Speech service, and its token endpoint is:

https://chinaeast2.api.cognitive.azure.cn/sts/v1.0/issuetoken

An example of obtaining the token in Postman is as follows:
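The token request can also be sketched in Python. The sketch assumes the usual behaviour of the token endpoint: a POST with an empty body and the subscription key in the `Ocp-Apim-Subscription-Key` header, with the token returned as plain text. Because a token is valid for 10 minutes, a small cache (the hypothetical `TokenCache` class) refreshes it after 9 minutes. All names here are illustrative.

```python
import time
import urllib.request

TOKEN_ENDPOINT = "https://chinaeast2.api.cognitive.azure.cn/sts/v1.0/issuetoken"


def build_token_request(subscription_key):
    """Build (but do not send) the POST request that asks for a token."""
    return urllib.request.Request(
        TOKEN_ENDPOINT,
        data=b"",  # empty body
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
        method="POST",
    )


def fetch_token(subscription_key):
    """Call the token endpoint and return the token text."""
    with urllib.request.urlopen(build_token_request(subscription_key)) as response:
        return response.read().decode("utf-8")


class TokenCache:
    """Reuse a token until shortly before its 10-minute validity runs out."""

    def __init__(self, fetch, max_age_seconds=9 * 60, clock=time.monotonic):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._max_age:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token


if __name__ == "__main__":
    cache = TokenCache(lambda: fetch_token("YourSubscriptionKey"))
    print({"Authorization": "Bearer " + cache.get()})
```

The cached token is then sent as `Authorization: Bearer <token>` in place of the subscription key header.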

2. Convert voice files to text with the SDK (Python example):

Similar code can be found on the official website, but note that it only works with the Azure Global Speech service and must be modified for China (see below).

    import azure.cognitiveservices.speech as speechsdk

    # Creates an instance of a speech config with specified subscription key and service region.
    # Replace with your own subscription key and service region (e.g., "chinaeast2").
    speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

    # Creates an audio configuration that points to an audio file.
    # Replace with your own audio filename.
    audio_filename = "whatstheweatherlike.wav"
    audio_input = speechsdk.AudioConfig(filename=audio_filename)

    # Creates a recognizer with the given settings.
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)

    print("Recognizing first result...")

    # Starts speech recognition, and returns after a single utterance is recognized. The end of a
    # single utterance is determined by listening for silence at the end or until a maximum of 15
    # seconds of audio is processed. The task returns the recognition text as result.
    # Note: Since recognize_once() returns only a single utterance, it is suitable only for single
    # shot recognition like command or query.
    # For long-running multi-utterance recognition, use start_continuous_recognition() instead.
    result = speech_recognizer.recognize_once()

    # Checks result.
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(result.text))
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(result.no_match_details))
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))

The code comes from this page:

https://docs.azure.cn/zh-cn/cognitive-services/speech-service/quickstarts/speech-to-text-from-file?tabs=linux&pivots=programming-language-python#create-a-python-application-that-uses-the-speech-sdk

For China, you need to use a custom endpoint in order to use the SDK:

    speech_key, service_region = "Your Key", "chinaeast2"
    initial_silence_timeout_ms = 15 * 1e3
    template = "wss://{}.stt.speech.azure.cn/speech/recognition" \
               "/conversation/cognitiveservices/v1?initialSilenceTimeoutMs={:d}&language=zh-CN"
    # Pass the formatted URL as the endpoint instead of a region.
    speech_config = speechsdk.SpeechConfig(
        subscription=speech_key,
        endpoint=template.format(service_region, int(initial_silence_timeout_ms)))

The complete code for China is as follows:

    #!/usr/bin/env python
    # coding: utf-8

    # Copyright (c) Microsoft. All rights reserved.
    # Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
    """
    Speech recognition samples for the Microsoft Cognitive Services Speech SDK
    """

    import time
    import wave

    try:
        import azure.cognitiveservices.speech as speechsdk
    except ImportError:
        print("""
        Importing the Speech SDK for Python failed.
        Refer to
        https://docs.microsoft.com/azure/cognitive-services/speech-service/quickstart-python for
        installation instructions.
        """)
        import sys
        sys.exit(1)

    # Set up the subscription info for the Speech Service:
    # Replace with your own subscription key and service region (e.g., "westus").
    speech_key, service_region = "your key", "chinaeast2"

    # Specify the path to an audio file containing speech (mono WAV / PCM with a sampling rate of 16 kHz).
    filename = r"D:\FFOutput\speechtotext.wav"


    def speech_recognize_once_from_file_with_custom_endpoint_parameters():
        """performs one-shot speech recognition with input from an audio file,
        specifying an endpoint with custom parameters"""
        initial_silence_timeout_ms = 15 * 1e3
        template = ("wss://{}.stt.speech.azure.cn/speech/recognition/conversation/cognitiveservices/v1"
                    "?initialSilenceTimeoutMs={:d}&language=zh-CN")
        speech_config = speechsdk.SpeechConfig(
            subscription=speech_key,
            endpoint=template.format(service_region, int(initial_silence_timeout_ms)))
        print("Using endpoint",
              speech_config.get_property(speechsdk.PropertyId.SpeechServiceConnection_Endpoint))
        audio_config = speechsdk.audio.AudioConfig(filename=filename)

        # Creates a speech recognizer using a file as audio input.
        # The default language is "en-us".
        speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

        result = speech_recognizer.recognize_once()

        # Check the result
        if result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print("Recognized: {}".format(result.text))
        elif result.reason == speechsdk.ResultReason.NoMatch:
            print("No speech could be recognized: {}".format(result.no_match_details))
        elif result.reason == speechsdk.ResultReason.Canceled:
            cancellation_details = result.cancellation_details
            print("Speech Recognition canceled: {}".format(cancellation_details.reason))
            if cancellation_details.reason == speechsdk.CancellationReason.Error:
                print("Error details: {}".format(cancellation_details.error_details))


    speech_recognize_once_from_file_with_custom_endpoint_parameters()

Note that to recognize speech from the microphone with the SDK, take the line

    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

and change it to the following (remove the audio_config parameter):

    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

Official account link: https://mp.weixin.qq.com/s/NA9kQsVDfzTXEqHMTdDExA

Yuque address: https://www.yuque.com/seanyu/azure/blwb5i
