How to automatically convert voice to text in Python
2025-01-19 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 Report--
This article explains in detail how to automatically convert voice to text in Python. The content is of high quality, so the editor shares it as a reference. I hope you will gain a solid understanding of the relevant topics after reading it.
Build a development environment
Change to the directory where you keep your Python virtual environments. I keep mine in a venvs subdirectory under my home directory. Use the following command to create a new virtualenv for this project:
python3 -m venv ~/venvs/pytranscribe
Activate virtualenv with the shell command:
source ~/venvs/pytranscribe/bin/activate
After executing the above command, the command prompt changes so that the name of the virtualenv is prepended to the original prompt format. If your prompt was previously just $, it now looks like this:
(pytranscribe) $
Keep in mind that you must activate the virtualenv in every new terminal window where you want to use its dependencies.
We can now install the requests package into the activated, but otherwise empty, virtualenv:
pip install requests==2.24.0
Look for output similar to the following to confirm that the appropriate package is installed correctly from PyPI.
(pytranscribe) $ pip install requests==2.24.0
Collecting requests==2.24.0
  Using cached https://files.pythonhosted.org/packages/45/1e/0c169c6a5381e241ba7404532c16a21d86ab872c9bed8bdcd4c423954103/requests-2.24.0-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests==2.24.0)
  Using cached https://files.pythonhosted.org/packages/5e/c4/6c4fe722df5343c33226f0b4e0bb042e4dc13483228b4718baf286f86d87/certifi-2020.6.20-py2.py3-none-any.whl
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests==2.24.0)
  Using cached https://files.pythonhosted.org/packages/9f/f0/a391d1463ebb1b233795cabfc0ef38d3db4442339de68f847026199e69d7/urllib3-1.25.10-py2.py3-none-any.whl
Collecting chardet<4,>=3.0.2 (from requests==2.24.0)
  Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting idna<3,>=2.5 (from requests==2.24.0)
  Using cached https://files.pythonhosted.org/packages/a2/38/928ddce2273eaa564f6f50de919327bf3a00f091b5baba8dfa9460f3a8a8/idna-2.10-py2.py3-none-any.whl
Installing collected packages: certifi, urllib3, chardet, idna, requests
Successfully installed certifi-2020.6.20 chardet-3.0.4 idna-2.10 requests-2.24.0 urllib3-1.25.10
We have installed all the necessary dependencies, so we can start coding the application.
Upload, start, and transcribe audio
We have done everything we need to start building an application that converts audio to text. We will build this application in three files:
1. upload_audio_file.py: uploads your audio file to a secure location on the AssemblyAI service so it can be processed. If your audio file is already accessible through a public URL, you do not need this step; just follow this quickstart (https://docs.assemblyai.com/overview/getting-started)
2. initiate_transcription.py: tells the API which file to transcribe and starts the transcription immediately
3. get_transcription.py: displays the transcription status if it is still being processed, or the transcription result once processing is complete
Create a new directory called pytranscribe to store the files as we write them. Then go to the new project directory.
mkdir pytranscribe
cd pytranscribe
We also need to export our AssemblyAI API key as an environment variable. Sign up for an AssemblyAI account, log in to the AssemblyAI dashboard, and copy your API token, as shown in the following screenshot:
export ASSEMBLYAI_KEY=your-api-key-here
Note that you must run the export command in every command-line window where you want this key to be accessible. The scripts we are writing will not be able to access the API if the token is not exported as ASSEMBLYAI_KEY in the environment they run in.
Now that we have created the project directory and set the API key to the environment variable, let's continue writing the code for the first file, which will upload the audio file to the AssemblyAI service.
Upload audio files and transcribe them
Create a new file called upload_audio_file.py and put the following code in it:
import argparse
import os

import requests

API_URL = "https://api.assemblyai.com/v2/"


def upload_file_to_api(filename):
    """Checks for a valid file and then uploads it to AssemblyAI
    so it can be saved to a secure URL that only that service can access.
    When the upload is complete we can then initiate the transcription
    API call.
    Returns the API JSON if successful, or None if file does not exist.
    """
    if not os.path.exists(filename):
        return None

    def read_file(filename, chunk_size=5242880):
        with open(filename, 'rb') as _file:
            while True:
                data = _file.read(chunk_size)
                if not data:
                    break
                yield data

    headers = {'authorization': os.getenv("ASSEMBLYAI_KEY")}
    response = requests.post("".join([API_URL, "upload"]),
                             headers=headers,
                             data=read_file(filename))
    return response.json()
The above code imports the argparse, os and requests packages so that we can use them in this script. API_URL is a constant holding the base URL of the AssemblyAI service. We define the upload_file_to_api function with a single parameter, filename, which should be a string containing the absolute path to the file.
Inside the function, we check that the file exists, then stream the potentially large file to the AssemblyAI API using requests' chunked transfer encoding.
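To see why the nested read_file generator enables that streaming, here is a minimal standalone sketch of the same generator pattern, demonstrated on a small temporary file with a tiny chunk size (the 10-byte payload and chunk size of 4 are illustrative values, not from the article):

```python
import os
import tempfile


def read_file(filename, chunk_size=5242880):
    # Yield the file's bytes in fixed-size chunks so requests can
    # stream the upload instead of loading the whole file into memory.
    with open(filename, "rb") as _file:
        while True:
            data = _file.read(chunk_size)
            if not data:
                break
            yield data


# Demonstrate with a small temporary file and a tiny chunk size.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdefghij")  # 10 bytes of sample data
    path = tmp.name

chunks = list(read_file(path, chunk_size=4))
os.unlink(path)
print(chunks)            # three chunks of 4 + 4 + 2 bytes
print(b"".join(chunks))  # joining the chunks recovers the original bytes
```

Because requests accepts any iterable as the data argument, handing it this generator sends the file chunk by chunk.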
The os module's getenv function reads the API key that was set on the command line with the export command. Be sure to run the export command in the terminal where the script runs, otherwise the ASSEMBLYAI_KEY value will be blank. If in doubt, run echo $ASSEMBLYAI_KEY to check whether the value matches your API key.
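Since a missing key silently produces an empty authorization header, one option is to fail fast instead. The following is a hypothetical helper (not part of the article's scripts) that raises a clear error when the variable was never exported; it takes the environment mapping as a parameter purely so it is easy to demonstrate:

```python
import os


def require_api_key(env=None):
    """Return the AssemblyAI key from the environment, or raise a
    clear error if it was never exported in this shell session."""
    if env is None:
        env = os.environ
    key = env.get("ASSEMBLYAI_KEY")
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_KEY is not set; run the export command first.")
    return key


# Works with any mapping, so the behavior is easy to check:
print(require_api_key({"ASSEMBLYAI_KEY": "abc123"}))  # abc123
```

Calling require_api_key() at the top of each script would surface the configuration mistake immediately rather than as a failed API call.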
To use the upload_file_to_api function, add the following line of code to the upload_audio_file.py file so that we can execute this code correctly as a script called with the python command:
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("filename")
    args = parser.parse_args()
    upload_filename = args.filename
    response_json = upload_file_to_api(upload_filename)
    if not response_json:
        print("file does not exist")
    else:
        print("File uploaded to URL: {}".format(response_json['upload_url']))
The above code creates an ArgumentParser object so the application can take a single command-line argument specifying the file we want to access, read, and upload to the AssemblyAI service.
If the file does not exist, the script prints a message that the file cannot be found. Otherwise, the file at the given path is uploaded using the code in the upload_file_to_api function.
Execute the complete upload_audio_file.py script on the command line with the python command. Replace FULL_PATH_TO_FILE with the absolute path to the file you want to upload, for example /Users/matt/devel/audio.mp3.
python upload_audio_file.py FULL_PATH_TO_FILE
Assuming that the file is found in the location you specified, when the script finishes uploading the file, it will print a message with a unique URL:
File uploaded to URL: https://cdn.assemblyai.com/upload/463ce27f-0922-4ea9-9ce4-3353d84b5638
This URL is not public; it can only be used by the AssemblyAI service, so no one except you and the transcription API can access your file and its contents.
The important part is the last segment of the URL, which in this example is 463ce27f-0922-4ea9-9ce4-3353d84b5638. Save that unique identifier, because we need to pass it to the next script, which starts the transcription service.
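Rather than copying the identifier by hand, you could extract it from the printed URL in code. A minimal sketch, using the example URL from the output above (the assumption here is simply that the identifier is everything after the final slash, as in the article's examples):

```python
# Example upload URL, taken from the script output shown above.
upload_url = ("https://cdn.assemblyai.com/upload/"
              "463ce27f-0922-4ea9-9ce4-3353d84b5638")

# rsplit from the right on the last "/" and keep the final segment.
file_id = upload_url.rsplit("/", 1)[-1]
print(file_id)  # 463ce27f-0922-4ea9-9ce4-3353d84b5638
```

This makes it straightforward to chain the upload and transcription steps in a single script if you prefer.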
Initiate transcription
Next, we will write some code to start the transcription. Create a new file called initiate_transcription.py and add the following code to it.
import argparse
import os

import requests

API_URL = "https://api.assemblyai.com/v2/"
CDN_URL = "https://cdn.assemblyai.com/"


def initiate_transcription(file_id):
    """Sends a request to the API to transcribe a specific
    file that was previously uploaded to the API. This will
    not immediately return the transcription because it takes
    a moment for the service to analyze and perform the
    transcription, so there is a different function to retrieve
    the results.
    """
    endpoint = "".join([API_URL, "transcript"])
    json = {"audio_url": "".join([CDN_URL, "upload/{}".format(file_id)])}
    headers = {
        "authorization": os.getenv("ASSEMBLYAI_KEY"),
        "content-type": "application/json"
    }
    response = requests.post(endpoint, json=json, headers=headers)
    return response.json()
We have the same imports as in the previous script, plus a new constant, CDN_URL, which holds the separate URL where AssemblyAI stores uploaded audio files.
The initiate_transcription function essentially sends an HTTP POST request to the AssemblyAI API to start the transcription process for the audio file at the specific URL passed in. This is why passing file_id matters: it completes the URL of the audio file that we tell AssemblyAI to retrieve.
Complete the file by appending this code so that it can easily be invoked with arguments from the command line.
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("file_id")
    args = parser.parse_args()
    file_id = args.file_id
    response_json = initiate_transcription(file_id)
    print(response_json)
Run the script with the python command on the initiate_transcription.py file, passing in the unique file identifier you saved in the previous step.
# the FILE_IDENTIFIER is returned in the previous step and will
# look something like this: 463ce27f-0922-4ea9-9ce4-3353d84b5638
python initiate_transcription.py FILE_IDENTIFIER
The API will send back a JSON response, which the script prints to the command line.
{'audio_end_at': None, 'acoustic_model': 'assemblyai_default', 'text': None,
 'audio_url': 'https://cdn.assemblyai.com/upload/463ce27f-0922-4ea9-9ce4-3353d84b5638',
 'speed_boost': False, 'language_model': 'assemblyai_default',
 'redact_pii': False, 'confidence': None, 'webhook_status_code': None,
 'id': 'gkuu2krb1-8c7f-4fe3-bb69-6b14a2cac067', 'status': 'queued',
 'boost_param': None, 'words': None, 'format_text': True,
 'webhook_url': None, 'punctuate': True, 'utterances': None,
 'audio_duration': None, 'auto_highlights': False, 'word_boost': [],
 'dual_channel': None, 'audio_start_from': None}
Note the value of the id key in the JSON response. This is the transcription identifier we need in order to retrieve the transcription result. In this example it is gkuu2krb1-8c7f-4fe3-bb69-6b14a2cac067. Copy the transcription identifier from your own response, as we will need it in the next step to check when the transcription process is complete.
Retrieve transcription results
We have uploaded the file and started the transcription process, so let's retrieve the results as soon as they are ready.
The time it takes to return results depends on the size of the file, so the next script sends an HTTP request to the API and either reports the transcription status or prints the output once the transcription is complete.
Create a third Python file named get_transcription.py and put the following code in it.
import argparse
import os

import requests

API_URL = "https://api.assemblyai.com/v2/"


def get_transcription(transcription_id):
    """Requests the transcription from the API and returns the JSON
    response."""
    endpoint = "".join([API_URL, "transcript/{}".format(transcription_id)])
    headers = {"authorization": os.getenv('ASSEMBLYAI_KEY')}
    response = requests.get(endpoint, headers=headers)
    return response.json()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("transcription_id")
    args = parser.parse_args()
    transcription_id = args.transcription_id
    response_json = get_transcription(transcription_id)
    if response_json['status'] == "completed":
        for word in response_json['words']:
            print(word['text'], end=" ")
    else:
        print("current status of transcription request: {}".format(
            response_json['status']))
The above code has the same imports as the other scripts. In the new get_transcription function, we simply call the AssemblyAI API with our API key and the transcription identifier (not the file identifier) from the previous step, then retrieve and return the JSON response.
In the main block, we read the transcription identifier passed as a command-line argument and hand it to the get_transcription function. If the response JSON from the get_transcription function contains the completed status, we print the transcription result. Otherwise we print the current status, such as queued or processing, until it reaches completed.
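Instead of re-running the script by hand until the status flips to completed, the check could be wrapped in a polling loop. Here is a hedged sketch of that idea; poll_until_complete, fake_fetch, and the interval and max_tries parameters are all hypothetical names introduced for illustration, with fetch standing in for the get_transcription function above:

```python
import time


def poll_until_complete(fetch, transcription_id, interval=5, max_tries=60):
    """Call fetch(transcription_id) until the returned JSON reports a
    'completed' status, sleeping between attempts."""
    for _ in range(max_tries):
        response_json = fetch(transcription_id)
        if response_json["status"] == "completed":
            return response_json
        time.sleep(interval)
    raise TimeoutError("transcription did not complete in time")


# Demonstrate with a fake fetch that reports 'completed' on the third call.
calls = []

def fake_fetch(tid):
    calls.append(tid)
    status = "completed" if len(calls) >= 3 else "processing"
    return {"status": status, "words": []}

result = poll_until_complete(fake_fetch, "abc", interval=0)
print(result["status"])  # completed
print(len(calls))        # 3
```

In a real run you would pass get_transcription as fetch and keep a sleep interval of a few seconds so the API is not hammered with requests.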
Invoke the script using the transcription identifiers from the command line and the previous section:
python get_transcription.py TRANSCRIPTION_ID
If the service has not yet started processing the audio file, it returns queued, as shown below:
Current status of transcription request: queued
When the service is currently processing an audio file, it returns processing:
Current status of transcription request: processing
When this process is complete, our script will return the transcribed text, as you can see here:
An object relational mapper is a code library that automates the transfer of data stored in relational databases into objects that are more commonly used in application code. ORMs are useful because they provide a high level... (output abbreviated)

That's all on how to automatically convert voice to text in Python. I hope the above content is of some help to you. If you think the article is good, you can share it for more people to see.