How to write a speech synthesis system based on Python 07/02 Update SLTechnology News&Howtos


2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to build a simple speech synthesis system in Python.

Background

I have long been interested in speech synthesis and have wanted to synthesize content for my own use, such as having novels or downloaded e-books read aloud.

Speech synthesis system

Strictly speaking, this is a speech synthesis tool. Because many vendors now offer speech synthesis as an API, building one is much easier: a few API calls are enough to create your own tool. Small as it is, it has all the essential parts, so in the broad sense it is a small speech synthesis system.

Preparatory work

First, install the following on your computer:

Anaconda

Python 3.7

Visual Studio Code

Steps

Here we use the WebAPI of the iFLYTEK Open Platform.

First, let's go to the console to create an application.

Once it is created, click the application to open its details page.

In the menu on the left, click Speech Synthesis, then open its sub-item, Online Speech Synthesis (streaming version).

In the upper right of that page, note down three credentials:

APPID

APISecret

APIKey
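Hard-coding these three credentials in the source, as the `__init__` method in this article does with its `'xxx'` placeholders, makes them easy to leak. A minimal alternative sketch reads them from environment variables instead; the variable names `XFYUN_APPID`, `XFYUN_API_SECRET`, and `XFYUN_API_KEY` and the helper `load_credentials` are hypothetical, not part of the iFLYTEK platform.

```python
import os


def load_credentials(env=os.environ):
    """Return (APPID, APISecret, APIKey) read from environment variables.

    The variable names are hypothetical; choose whatever suits your setup.
    """
    names = ("XFYUN_APPID", "XFYUN_API_SECRET", "XFYUN_API_KEY")
    # Collect every missing or empty variable so the error lists them all at once
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in names)
```

The tuple comes back in the order appid, apiSecret, apiKey, which is the order the demo's `text2pcm` takes its first three arguments.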

Code implementation

Now for the code. First, install the two libraries we need:

```shell
pip install websocket-client
pip install playsound
```

Next, we define a class `play` with four methods:

```python
import tkinter as tk
from tkinter import ttk


class play:
    def __init__(self):
        ...  # initialization: build the window and its widgets

    def play_sound(self):
        ...  # play back the synthesized audio

    def select_vcn(self, *arg):
        ...  # set the speaker chosen in the drop-down box

    def xfyun_tts(self):
        ...  # run speech synthesis
```

In `__init__`, fill in the APPID, APIKey, and APISecret you just obtained from the iFLYTEK Open Platform console.

```python
def __init__(self):
    self.APP_ID = 'xxx'      # fill in your own APPID
    self.API_KEY = 'xxx'     # fill in your own APIKey
    self.SECRET_KEY = 'xxx'  # fill in your own APISecret
    # Default speaker, matching the first drop-down item; select_vcn only
    # runs when the user actually picks an item
    self.vcn = "xiaoyan"
    self.root = tk.Tk()                          # create the main window
    self.root.title("Speech synthesis system")   # window title
    self.root.geometry("600x550")                # window size
    # Fixed size; resizable(width=True, height=True), the default, allows resizing
    self.root.resizable(0, 0)
    self.lb = tk.Label(self.root, text='Please choose a speaker')  # label
    self.tt = tk.Text(self.root, width=77, height=30)             # multi-line text box
    self.cb = ttk.Combobox(self.root, width=36)  # drop-down list box, wide enough for the English labels
    # Contents of the drop-down list (English renderings of the original Chinese
    # labels); they must match the strings compared in select_vcn
    self.cb['values'] = ("Sweet female voice - Xiaoyan",
                         "Kind male voice - Xu Jiu",
                         "Intellectual female voice - Xiaoping",
                         "Lovely child voice - Xu Xiaobao",
                         "Kind female voice - Xiaojing")
    self.cb.current(0)  # select the first item
    self.cb.bind("<<ComboboxSelected>>", self.select_vcn)  # update the speaker on selection
    self.tk_tts_file = tk.Label(self.root, text='Generated file name')
    self.b1 = tk.Button(self.root, text='Synthesize', width=10, height=1,
                        command=self.xfyun_tts)   # synthesis button
    self.tk_play = tk.Button(self.root, text='Play', width=10, height=1,
                             command=self.play_sound)  # playback button
    # Position of each widget
    self.tk_tts_file.place(x=30, y=500)
    self.b1.place(x=300, y=500)
    self.tk_play.place(x=400, y=500)
    self.lb.place(x=30, y=30)
    self.cb.place(x=154, y=30)
    self.tt.place(x=30, y=60)
    self.root.mainloop()
```

When an item is selected in the drop-down list, `select_vcn` sets the corresponding speaker (the `vcn` voice code):

```python
def select_vcn(self, *arg):
    # Map the selected drop-down label to iFLYTEK's vcn voice code
    label = self.cb.get()
    if label == 'Sweet female voice - Xiaoyan':
        self.vcn = "xiaoyan"
    elif label == 'Kind male voice - Xu Jiu':
        self.vcn = "aisjiuxu"
    elif label == 'Intellectual female voice - Xiaoping':
        self.vcn = "aisxping"
    elif label == 'Lovely child voice - Xu Xiaobao':
        self.vcn = "aisbabyxu"
    elif label == 'Kind female voice - Xiaojing':
        self.vcn = "aisjinger"
    print(self.vcn)
```
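The `if`/`elif` chain can also be written as a lookup table, which keeps each label next to its voice code so the drop-down values and the comparisons cannot drift apart. A minimal sketch: the voice codes come from this article, while the English labels and the names `VOICES` and `label_to_vcn` are illustrative.

```python
# Hypothetical lookup table: drop-down label -> iFLYTEK vcn voice code.
# The voice codes come from the article; the English labels are illustrative.
VOICES = {
    "Sweet female voice - Xiaoyan": "xiaoyan",
    "Kind male voice - Xu Jiu": "aisjiuxu",
    "Intellectual female voice - Xiaoping": "aisxping",
    "Lovely child voice - Xu Xiaobao": "aisbabyxu",
    "Kind female voice - Xiaojing": "aisjinger",
}


def label_to_vcn(label, default="xiaoyan"):
    """Return the vcn for a drop-down label, falling back to a default voice."""
    return VOICES.get(label, default)
```

With this table, `__init__` could fill the list with `self.cb['values'] = tuple(VOICES)` (dictionaries keep insertion order) and `select_vcn` could reduce to `self.vcn = label_to_vcn(self.cb.get())`.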

Next, we modify the Python demo provided by iFLYTEK so that it is easier to call from our program.

```python
# -*- coding: utf-8 -*-
# author: iflytek
#
# This demo was tested on Windows + Python 3.7 with these third-party packages:
#   cffi==1.12.3, gevent==1.4.0, greenlet==0.4.15, pycparser==2.19,
#   six==1.12.0, websocket==0.2.1, websocket-client==0.56.0
# To synthesize minor languages, pass text in that language, use a voice (vcn)
# for that language, set tte to "unicode", and change the text encoding.
# Error codes: https://www.xfyun.cn/document/error-code (consult when a call returns one)
import base64
import hashlib
import hmac
import json
import os
import ssl
import wave
import _thread as thread
from datetime import datetime
from time import mktime
from urllib.parse import urlencode
from wsgiref.handlers import format_date_time

import websocket  # provided by the websocket-client package

STATUS_FIRST_FRAME = 0     # first frame
STATUS_CONTINUE_FRAME = 1  # intermediate frame
STATUS_LAST_FRAME = 2      # last frame
PCM_PATH = "./demo.pcm"


class Ws_Param(object):
    def __init__(self):
        pass

    def set_tts_params(self, text, vcn):
        if text != "":
            self.Text = text
        if vcn != "":
            self.vcn = vcn
        # Business parameters; more options are documented on the official website
        self.BusinessArgs = {"bgs": 1, "aue": "raw", "auf": "audio/L16;rate=16000",
                             "vcn": self.vcn, "tte": "utf8"}
        # Minor languages must use "unicode", i.e. UTF-16 little-endian ("UTF-16LE"):
        # self.Data = {"status": 2, "text": str(base64.b64encode(self.Text.encode('utf-16')), "UTF8")}
        self.Data = {"status": 2, "text": str(base64.b64encode(self.Text.encode('utf-8')), "UTF8")}

    def set_params(self, appid, apiSecret, apiKey):
        if appid != "":
            self.APPID = appid
            # Common parameters
            self.CommonArgs = {"app_id": self.APPID}
        if apiKey != "":
            self.APIKey = apiKey
        if apiSecret != "":
            self.APISecret = apiSecret

    # Build the authenticated request URL
    def create_url(self):
        url = 'wss://tts-api.xfyun.cn/v2/tts'
        # Timestamp in RFC 1123 format
        now = datetime.now()
        date = format_date_time(mktime(now.timetuple()))
        # String to sign: host, date and request line
        signature_origin = "host: " + "ws-api.xfyun.cn" + "\n"
        signature_origin += "date: " + date + "\n"
        signature_origin += "GET " + "/v2/tts " + "HTTP/1.1"
        # Sign with HMAC-SHA256
        signature_sha = hmac.new(self.APISecret.encode('utf-8'), signature_origin.encode('utf-8'),
                                 digestmod=hashlib.sha256).digest()
        signature_sha = base64.b64encode(signature_sha).decode(encoding='utf-8')
        authorization_origin = 'api_key="%s", algorithm="%s", headers="%s", signature="%s"' % (
            self.APIKey, "hmac-sha256", "host date request-line", signature_sha)
        authorization = base64.b64encode(authorization_origin.encode('utf-8')).decode(encoding='utf-8')
        # Combine the authentication parameters into a dictionary
        v = {"authorization": authorization, "date": date, "host": "ws-api.xfyun.cn"}
        return url + '?' + urlencode(v)


def on_message(ws, message):
    try:
        message = json.loads(message)
        code = message["code"]
        sid = message["sid"]
        # Check the code first: error responses may carry no audio data
        if code != 0:
            print("sid:%s call error:%s code is:%s" % (sid, message["message"], code))
            ws.close()
            return
        audio = base64.b64decode(message["data"]["audio"])
        status = message["data"]["status"]
        print(code, sid, status)
        # Audio arrives in chunks; append each one to the PCM file
        with open(PCM_PATH, 'ab') as f:
            f.write(audio)
        if status == 2:  # last frame
            print("ws is closed")
            ws.close()
    except Exception as e:
        print("receive msg,but parse exception:", e)


# websocket error handling
def on_error(ws, error):
    print("### error:", error)


# websocket closed (newer websocket-client versions also pass a status code and message)
def on_close(ws, *args):
    print("### closed ###")


# websocket connection established: send the request from a background thread
def on_open(ws):
    def run(*args):
        d = {"common": wsParam.CommonArgs,
             "business": wsParam.BusinessArgs,
             "data": wsParam.Data}
        d = json.dumps(d)
        # Remove output from a previous run before sending, since on_message
        # appends to the file and the first chunk may arrive quickly
        if os.path.exists(PCM_PATH):
            os.remove(PCM_PATH)
        print("------>start sending text data")
        ws.send(d)

    thread.start_new_thread(run, ())


def text2pcm(appid, apiSecret, apiKey, text, vcn, fname):
    wsParam.set_params(appid, apiSecret, apiKey)
    wsParam.set_tts_params(text, vcn)
    websocket.enableTrace(False)
    wsUrl = wsParam.create_url()
    ws = websocket.WebSocketApp(wsUrl, on_message=on_message, on_error=on_error, on_close=on_close)
    ws.on_open = on_open
    # Note: CERT_NONE skips TLS certificate verification, as in the original demo
    ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})
    pcm2wav(PCM_PATH, fname)


def pcm2wav(fname, dstname):
    with open(fname, 'rb') as pcmfile:
        pcmdata = pcmfile.read()
    print(len(pcmdata))
    with wave.open(dstname, "wb") as wavfile:
        # Mono, 16-bit samples, 16 kHz: matches "auf": "audio/L16;rate=16000"
        wavfile.setparams((1, 2, 16000, 0, 'NONE', 'NONE'))
        wavfile.writeframes(pcmdata)


wsParam = Ws_Param()
```
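The least obvious part of the demo is the URL signing in `create_url`. The sketch below isolates that step as a pure function so the result can be inspected for a fixed date. `sign_tts_url` is a hypothetical helper that mirrors the demo's logic (host `ws-api.xfyun.cn` in the signature, request line `GET /v2/tts HTTP/1.1`); it is not an official iFLYTEK API.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlencode

HOST = "ws-api.xfyun.cn"  # host used in the demo's signature
URL = "wss://tts-api.xfyun.cn/v2/tts"


def sign_tts_url(api_key, api_secret, date):
    """Build the authenticated WebSocket URL the same way as the demo's create_url.

    `date` must be an RFC 1123 timestamp, such as the output of
    wsgiref.handlers.format_date_time(time.time()).
    """
    # 1. String to sign: host, date and request line, separated by newlines
    signature_origin = "host: {}\ndate: {}\nGET /v2/tts HTTP/1.1".format(HOST, date)
    # 2. HMAC-SHA256 keyed with the APISecret, then base64-encoded
    digest = hmac.new(api_secret.encode("utf-8"),
                      signature_origin.encode("utf-8"),
                      digestmod=hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("utf-8")
    # 3. Authorization value, base64-encoded again before going into the query string
    authorization_origin = (
        'api_key="{}", algorithm="hmac-sha256", '
        'headers="host date request-line", signature="{}"'.format(api_key, signature)
    )
    authorization = base64.b64encode(authorization_origin.encode("utf-8")).decode("utf-8")
    query = urlencode({"authorization": authorization, "date": date, "host": HOST})
    return URL + "?" + query
```

Passing the date explicitly makes it easy to compare the string being signed with iFLYTEK's documentation when the server rejects a request; in normal use it would be `format_date_time(time.time())` from `wsgiref.handlers`.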

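With these pieces in place, the speech synthesis system is complete. Creating an instance with `play()` opens the window, since `__init__` ends by calling `mainloop()`.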
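The article does not show the bodies of `xfyun_tts` and `play_sound`. One plausible shape, sketched under assumptions: synthesis reads the text box, calls the demo's `text2pcm`, and records the output file name; playback hands that file to `playsound`. To keep the sketch testable without a window or network access, the logic sits in a hypothetical helper `synthesize_to_wav` that takes the synthesis function as a parameter. The timestamped file-name scheme and the attribute `self.wav_file` are also assumptions.

```python
import os
import time


def synthesize_to_wav(text, vcn, synthesize, out_dir="."):
    """Hypothetical core of play.xfyun_tts.

    `synthesize(text, vcn, fname)` is expected to behave like the demo's
    text2pcm with the credentials already bound: it writes a WAV file to fname.
    Returns the path it asked `synthesize` to write.
    """
    text = text.strip()
    if not text:
        raise ValueError("nothing to synthesize: the text box is empty")
    # Assumed naming scheme: <voice>_<unix time>.wav
    fname = os.path.join(out_dir, "{}_{}.wav".format(vcn, int(time.time())))
    synthesize(text, vcn, fname)
    return fname


# How the two missing methods might use it (sketch; assumes text2pcm is imported
# from the modified demo, playsound from the playsound package, and that
# self.vcn already holds a voice code):
#
# def xfyun_tts(self):
#     text = self.tt.get("1.0", "end-1c")  # whole contents of the text box
#     def run(t, v, f):
#         text2pcm(self.APP_ID, self.SECRET_KEY, self.API_KEY, t, v, f)
#     self.wav_file = synthesize_to_wav(text, self.vcn, run)
#     self.tk_tts_file.config(text=self.wav_file)  # show the generated file name
#
# def play_sound(self):
#     playsound(self.wav_file)
```

Because `text2pcm` blocks inside `run_forever` until synthesis finishes, the window stops responding while a request is in flight. Running the synthesis in a `threading.Thread` is one way around that, as long as Tk widgets are still updated only from the main thread.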


