
How to implement speech and text processing with Python PaddleSpeech

Updated 2025-03-02. From: SLTechnology News&Howtos


This article explains how to implement speech and text processing with Python and PaddleSpeech. It walks through the installation and then verifies each function, in the hope of giving readers who face the same problem a simple, workable approach.

Environment installation

First, let's take a look at the project structure and installation documentation.

PaddleSpeech needs Python 3.7 or above, a C++ build environment, the packages listed in requirements.txt, and so on. I describe the steps below in the order I performed them.

1. Create a Python 3.9 virtual environment with conda

Use conda to create the Python 3.9 environment with the following command.

conda create -n py39 python=3.9
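Once the new environment is activated, a quick check can confirm that the interpreter meets the Python 3.7 minimum mentioned above. This is only a minimal sketch; the function name python_is_supported is made up for illustration and is not part of PaddleSpeech.

```python
import sys

# Minimum version stated in this article's requirements.
MIN_VERSION = (3, 7)


def python_is_supported(version=None, minimum=MIN_VERSION):
    """Return True if `version` (default: the running interpreter) meets `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", python_is_supported())
```

Run inside the py39 environment, this should report the 3.9 interpreter as supported.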

2. Install Visual Studio 2019

Download: Microsoft C++ Build Tools - Visual Studio

Note that the "Desktop development with C++" workload must be selected during installation.

3. Install the packages in requirements.txt

Install the packages listed in requirements.txt with the following command:

pip install -r requirements.txt -i https://pypi.douban.com/simple

Note that it does not matter if the paddlespeech_ctcdecoders installation fails; that package can be skipped.

4. Install paddlepaddle and paddlespeech

The command is as follows:

pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
pip install paddlespeech -i https://pypi.tuna.tsinghua.edu.cn/simple
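After both installs finish, it can help to confirm that the two packages are visible to the interpreter before moving on. The sketch below only checks that the top-level modules paddle and paddlespeech can be located; it does not load them or verify that they work. The helper name missing_packages is hypothetical.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when no importable module of that name exists.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # paddlepaddle installs the module `paddle`; paddlespeech installs `paddlespeech`.
    missing = missing_packages(["paddle", "paddlespeech"])
    print("missing:", ", ".join(missing) if missing else "none")
```

For a deeper check of the PaddlePaddle installation itself, paddle.utils.run_check() can be called from a Python prompt.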

5. Download nltk_data

Follow the instructions in the project installation documentation.
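To confirm that the downloaded data ended up somewhere it can be found, a small check like the following may be useful. It is a sketch under a stated assumption: it looks only at the NLTK_DATA environment variable and at an nltk_data folder in the home directory, which are two common locations and not NLTK's full search path. The function name find_nltk_data is hypothetical.

```python
import os
from pathlib import Path


def find_nltk_data(home=None, env=None):
    """Return the first existing nltk_data directory, or None.

    Assumption: only NLTK_DATA entries and <home>/nltk_data are checked.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # NLTK_DATA may hold several directories separated by the OS path separator.
    candidates = [Path(p) for p in env.get("NLTK_DATA", "").split(os.pathsep) if p]
    candidates.append(home / "nltk_data")
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    print("nltk_data found at:", find_nltk_data())
```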

My local nltk_data directory is as follows

Project verification

Let me verify the TTS, ASR, and punctuation restoration functions in turn.

TTS speech synthesis

Use the command as follows:

paddlespeech tts --input "Nanjing is very cold now. Let's go to the Confucius Temple next time." --output C:\Users\xxx\Desktop\115.wav

Execution process

(dh_partner) D:\spyder\PaddleSpeech>paddlespeech tts --input "Nanjing is very cold now. Let's go to the Confucius Temple next time." --output C:\Users\xxx\Desktop\115.wav
phones_dict: None
[2022-01-05 17:23:43,642] [INFO] [log.py] [L57] - File C:\Users\huyi\.paddlespeech\models\fastspeech2_csmsc-zh\fastspeech2_nosil_baker_ckpt_0.4.zip md5 checking...
[2022-01-05 17:23:44,742] [INFO] [log.py] [L57] - Use pretrained model stored in: C:\Users\huyi\.paddlespeech\models\fastspeech2_csmsc-zh\fastspeech2_nosil_baker_ckpt_0.4
self.phones_dict: C:\Users\huyi\.paddlespeech\models\fastspeech2_csmsc-zh\fastspeech2_nosil_baker_ckpt_0.4\phone_id_map.txt
[2022-01-05 17:23:44,743] [INFO] [log.py] [L57] - C:\Users\huyi\.paddlespeech\models\fastspeech2_csmsc-zh\fastspeech2_nosil_baker_ckpt_0.4
[2022-01-05 17:23:44,744] [INFO] [log.py] [L57] - C:\Users\huyi\.paddlespeech\models\fastspeech2_csmsc-zh\fastspeech2_nosil_baker_ckpt_0.4\default.yaml
[2022-01-05 17:23:44,744] [INFO] [log.py] [L57] - C:\Users\huyi\.paddlespeech\models\fastspeech2_csmsc-zh\fastspeech2_nosil_baker_ckpt_0.4\snapshot_iter_76000.pdz
self.phones_dict: C:\Users\huyi\.paddlespeech\models\fastspeech2_csmsc-zh\fastspeech2_nosil_baker_ckpt_0.4\phone_id_map.txt
[2022-01-05 17:23:44,745] [INFO] [log.py] [L57] - File C:\Users\huyi\.paddlespeech\models\pwgan_csmsc-zh\pwg_baker_ckpt_0.4.zip md5 checking...
[2022-01-05 17:23:44,782] [INFO] [log.py] [L57] - Use pretrained model stored in: C:\Users\huyi\.paddlespeech\models\pwgan_csmsc-zh\pwg_baker_ckpt_0.4
[2022-01-05 17:23:44,783] [INFO] [log.py] [L57] - C:\Users\huyi\.paddlespeech\models\pwgan_csmsc-zh\pwg_baker_ckpt_0.4
[2022-01-05 17:23:44,783] [INFO] [log.py] [L57] - C:\Users\huyi\.paddlespeech\models\pwgan_csmsc-zh\pwg_baker_ckpt_0.4\pwg_default.yaml
[2022-01-05 17:23:44,785] [INFO] [log.py] [L57] - C:\Users\huyi\.paddlespeech\models\pwgan_csmsc-zh\pwg_baker_ckpt_0.4\pwg_snapshot_iter_400000.pdz
vocab_size: 268
frontend done!
encoder_type is transformer
decoder_type is transformer
C:\Users\huyi\.conda\envs\dh_partner\lib\site-packages\paddle\framework\io.py:415: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
  if isinstance(obj, collections.Iterable) and not isinstance(obj
[2022-01-05 17:23:51] [DEBUG] [__init__.py:113] Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\huyi\AppData\Local\Temp\jieba.cache
[2022-01-05 17:23:51] [DEBUG] [__init__.py:132] Loading model from cache C:\Users\huyi\AppData\Local\Temp\jieba.cache
Loading model cost 0.659 seconds.
[2022-01-05 17:23:52] [DEBUG] [__init__.py:164] Loading model cost 0.659 seconds.
Prefix dict has been built successfully.
[2022-01-05 17:23:52] [DEBUG] [__init__.py:166] Prefix dict has been built successfully.
C:\Users\huyi\.conda\envs\dh_partner\lib\site-packages\paddle\fluid\dygraph\math_op_patch.py:251: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.int64, but right dtype is paddle.int32, the right dtype will convert to paddle.int64
  warnings.warn(
[2022-01-05 17:23:58,811] [INFO] [log.py] [L57] - Wave file has been generated: C:\Users\xxx\Desktop\115.wav
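The same synthesis can also be driven from Python instead of the command line. The sketch below uses the TTSExecutor class that PaddleSpeech exposes for this purpose; the wrapper function synthesize and its empty-text check are additions for illustration, not part of the library. The import is done lazily inside the function because loading PaddleSpeech is slow and requires both paddlepaddle and paddlespeech to be installed.

```python
from pathlib import Path


def synthesize(text, output):
    """Synthesize `text` into a WAV file at `output` and return the path."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Lazy import: only needed when synthesis is actually requested.
    from paddlespeech.cli.tts.infer import TTSExecutor

    tts = TTSExecutor()
    # With no model arguments the executor uses its default pretrained models,
    # which are downloaded on first use.
    tts(text=text, output=str(output))
    return Path(output)


# Example call (the output file name is hypothetical):
# synthesize(text, "115.wav")
```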

The generated audio is as follows
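A quick way to confirm that the output is a readable WAV file is to inspect its header with Python's standard wave module. This sketch assumes the file is uncompressed PCM, which is what the wave module can read; the helper name wav_info is hypothetical.

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        # Duration is the number of frames divided by frames per second.
        return wav.getnchannels(), rate, frames / rate


# Example call (path taken from the command above):
# print(wav_info(r"C:\Users\xxx\Desktop\115.wav"))
```

The reported sample rate is worth noting, since it matters when the file is later passed to speech recognition.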

ASR speech recognition

I run ASR on the audio generated by TTS to see how well it works. The command is as follows:

paddlespeech asr --lang zh --input C:\Users\xxx\Desktop\115.wav

The execution result is as follows

You can see that the last line printed is the recognized text, without punctuation, and it is fairly accurate.
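Recognition can likewise be called from Python through PaddleSpeech's ASRExecutor. The wrapper transcribe below, including its file-existence check, is a hypothetical helper written for this sketch. Depending on the sample rate of the input file, PaddleSpeech may ask for confirmation before resampling, so an interactive console is assumed.

```python
from pathlib import Path


def transcribe(audio_path, lang="zh"):
    """Return the text recognized from the audio file at `audio_path`."""
    if not Path(audio_path).is_file():
        raise FileNotFoundError(f"audio file not found: {audio_path}")
    # Lazy import: requires paddlepaddle and paddlespeech to be installed.
    from paddlespeech.cli.asr.infer import ASRExecutor

    asr = ASRExecutor()
    # `lang` mirrors the --lang flag of the command-line tool.
    return asr(audio_file=str(audio_path), lang=lang)


# Example call (path taken from the command above):
# print(transcribe(r"C:\Users\xxx\Desktop\115.wav"))
```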

Punctuation restoration

Try punctuation restoration on this sentence. The command is as follows:

paddlespeech text --task punc --input "Nanjing is very cold now. Go to the Confucius Temple next time."

Execution result
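Punctuation restoration is also available from Python through PaddleSpeech's TextExecutor. Since the model is meant to receive unpunctuated text, such as ASR output, the sketch below first strips any punctuation that is already present. Both helper names are hypothetical, and the stripping step is an assumption about convenient preprocessing, not something the library requires.

```python
import unicodedata


def strip_punctuation(text):
    """Remove every Unicode punctuation character (category P*) from `text`."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def restore_punctuation(text):
    """Return `text` with punctuation predicted by PaddleSpeech."""
    # Lazy import: requires paddlepaddle and paddlespeech to be installed.
    from paddlespeech.cli.text.infer import TextExecutor

    executor = TextExecutor()
    return executor(text=strip_punctuation(text))


# Example call, feeding in text recognized by ASR:
# print(restore_punctuation(recognized_text))
```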

That covers how to implement speech and text processing with Python and PaddleSpeech. I hope the content above is of some help to you.
