How to Use Java to Implement a Speech Engine

2025-04-06 Update. From: SLTechnology News&Howtos (shulou)

This article shows how to implement a speech engine in Java. It first explains how to use the engine, then walks through its implementation, and finally looks at converting plain text to speech.

I. Trying out the speech engine

To use this speech engine, you must add the javatalk.jar file provided with this article to your CLASSPATH, and then run the com.lotontech.speech.Talker class from the command line (or call it from a Java program). If you run it from the command line, the command is:

    java com.lotontech.speech.Talker "h|e|l|oo"

If you call it from a Java program, the code is:

    com.lotontech.speech.Talker talker = new com.lotontech.speech.Talker();
    talker.sayPhoneWord("h|e|l|oo");

You may be puzzled by the "h|e|l|oo" string supplied on the command line (or passed to the sayPhoneWord() method). Let me explain.

The speech engine works by concatenating short sound samples, each of which represents the smallest unit of human (English-language) speech sound. These sound samples are called allophones. Each allophone corresponds to one, two, or three letters. From the phonetic representation of "hello" above, you can see that some letter combinations have an obvious pronunciation, while others do not:

h: the pronunciation is obvious
e: the pronunciation is obvious
l: the pronunciation is obvious, but notice that the two "l"s are reduced to one
oo: pronounced as in "hello", not as in "bot" or "too"

Here is a list of the valid allophones:

a: as in cat
b: as in cab
c: as in cat
d: as in dot
e: as in bet
f: as in frog
g: as in frog
h: as in hog
i: as in pig
j: as in jig
k: as in keg
l: as in leg
m: as in met
n: as in begin
o: as in not
p: as in pot
r: as in rot
s: as in sat
t: as in sat
u: as in put
v: as in have
w: as in wet
y: as in yet
z: as in zoo
aa: as in fake
ay: as in hay
ee: as in bee
ii: as in high
oo: as in go
bb: variation of b with different stress
dd: variation of d with different stress
ggg: variation of g with different stress
hh: variation of h with different stress
ll: variation of l with different stress
nn: variation of n with different stress
rr: variation of r with different stress
tt: variation of t with different stress
yy: variation of y with different stress
ar: as in car
aer: as in care
ch: as in which
ck: as in check
ear: as in beer
er: as in later
err: as in later (longer sound)
ng: as in feeding
or: as in law
ou: as in zoo
ouu: as in zoo (longer sound)
ow: as in cow
oy: as in boy
sh: as in shut
th: as in thing
dth: as in this
uh: variation of u
wh: as in where
zh: as in Asian
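Before passing a string to the engine, it can help to check that every element is one of the allophones listed above. The following is only a minimal sketch of such a check. The class name AllophoneCheckSketch and the method unknownAllophones() are hypothetical and are not part of javatalk.jar, and ignoring letter case during the comparison is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical helper, not part of javatalk.jar.
class AllophoneCheckSketch {
    // The allophone names from the list above.
    private static final Set<String> VALID = new HashSet<>(Arrays.asList(
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
        "n", "o", "p", "r", "s", "t", "u", "v", "w", "y", "z",
        "aa", "ay", "ee", "ii", "oo",
        "bb", "dd", "ggg", "hh", "ll", "nn", "rr", "tt", "yy",
        "ar", "aer", "ch", "ck", "ear", "er", "err", "ng", "or",
        "ou", "ouu", "ow", "oy", "sh", "th", "dth", "uh", "wh", "zh"));

    // Returns the elements of phoneWord that are not in the list.
    // Case is ignored here (an assumption of this sketch).
    static List<String> unknownAllophones(String phoneWord) {
        List<String> unknown = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(phoneWord, "|", false);
        while (st.hasMoreTokens()) {
            String token = st.nextToken().trim();
            if (!VALID.contains(token.toLowerCase(Locale.ROOT))) {
                unknown.add(token);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        System.out.println(unknownAllophones("h|e|l|oo")); // nothing unknown
        System.out.println(unknownAllophones("h|e|x|oo")); // "x" is not listed
    }
}
```

An empty result means every element of the string names a listed allophone.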

When people speak, their pitch rises and falls over the course of a sentence. This variation in intonation makes speech sound more natural and expressive, and it distinguishes a question from a statement. Consider the following two sentences:

It is fake. -- f|aa|k

Is it fake? -- f|AA|k

As you may have guessed, you raise the intonation by writing the allophone in uppercase letters.
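As an illustration of this notation only, the sketch below rewrites an allophone string so that one chosen element is in uppercase. The class IntonationSketch and its raise() method are hypothetical helpers introduced here. They only manipulate the string and say nothing about how the engine itself produces the raised pitch.

```java
import java.util.Locale;

// Hypothetical helper: marks one allophone for raised intonation.
class IntonationSketch {
    // Returns phoneWord with the element at position index (0-based)
    // in uppercase and every other element in lowercase.
    static String raise(String phoneWord, int index) {
        String[] parts = phoneWord.split("\\|");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.append('|');
            }
            out.append(i == index
                ? parts[i].toUpperCase(Locale.ROOT)
                : parts[i].toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Turns the statement form of "fake" into the question form.
        System.out.println(raise("f|aa|k", 1)); // prints f|AA|k
    }
}
```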

That is all you need to know to use the software. If you are interested in how it is implemented behind the scenes, read on.

II. Implementing the speech engine

The implementation of the speech engine consists of just one class with four methods. It uses the Java Sound API included in J2SE 1.3. I won't give a comprehensive introduction to that API here, but you can learn how it is used from the example. The Java Sound API is not particularly complex, and the comments in the code tell you what you need to know.

The following is the basic definition of the Talker class:

    package com.lotontech.speech;

    import javax.sound.sampled.*;
    import java.io.*;
    import java.util.*;
    import java.net.*;

    public class Talker
    {
        // The output line, opened lazily the first time a sound is read
        private SourceDataLine line = null;
    }

If you execute Talker from the command line, the following main() method serves as the entry point. It takes the first command-line argument and passes it to the sayPhoneWord() method:

    /*
     * Speaks the phonetic word specified on the command line.
     */
    public static void main(String args[])
    {
        Talker player = new Talker();
        if (args.length > 0) player.sayPhoneWord(args[0]);
        System.exit(0);
    }

The sayPhoneWord() method can be called either from the main() method above or directly from a Java program. It looks complex at first glance, but it is not. It simply steps through the allophones of the word (separated by "|" in the input string) and plays them one by one through a sound output channel. To make the speech sound more natural, I merge the end of each sound sample with the beginning of the next one:

    /*
     * Speaks the given phonetic word.
     */
    public void sayPhoneWord(String word)
    {
        // Byte array holding the previous sound
        byte[] previousSound = null;

        // Split the input string into separate allophones
        StringTokenizer st = new StringTokenizer(word, "|", false);
        while (st.hasMoreTokens())
        {
            // Construct the file name for this allophone
            String thisPhoneFile = st.nextToken();
            thisPhoneFile = "/allophones/" + thisPhoneFile + ".au";

            // Read the data from the sound file
            byte[] thisSound = getSound(thisPhoneFile);

            if (previousSound != null)
            {
                // If possible, merge the previous allophone with this one
                int mergeCount = 0;
                if (previousSound.length >= 500 && thisSound.length >= 500)
                    mergeCount = 500;
                for (int i = 0; i < mergeCount; i++)
                {
                    previousSound[previousSound.length - mergeCount + i]
                        = (byte) ((previousSound[previousSound.length
                            - mergeCount + i] + thisSound[i]) / 2);
                }

                // Play the previous allophone
                playSound(previousSound);

                // The truncated current allophone becomes the previous one
                byte[] newSound = new byte[thisSound.length - mergeCount];
                for (int ii = 0; ii < newSound.length; ii++)
                    newSound[ii] = thisSound[ii + mergeCount];
                previousSound = newSound;
            }
            else
                previousSound = thisSound;
        }

        // Play the final allophone (if any) and flush the sound channel
        if (previousSound != null) playSound(previousSound);
        drain();
    }
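The merge step can be tried out in isolation on small arrays. The sketch below is a minimal, standalone version of the averaging loop from sayPhoneWord(). The class CrossFadeSketch, the method mergeTail(), and the window parameter (fixed at 500 in the listing) are names introduced here for illustration.

```java
import java.util.Arrays;

// Standalone illustration of the merge step; names are hypothetical.
class CrossFadeSketch {
    // Averages the last `window` bytes of previous with the first
    // `window` bytes of next, in place. If either array is shorter
    // than the window, nothing is merged. Returns the number of
    // bytes merged, so the caller knows how much of next to skip.
    static int mergeTail(byte[] previous, byte[] next, int window) {
        int mergeCount = 0;
        if (previous.length >= window && next.length >= window) {
            mergeCount = window;
        }
        for (int i = 0; i < mergeCount; i++) {
            int at = previous.length - mergeCount + i;
            previous[at] = (byte) ((previous[at] + next[i]) / 2);
        }
        return mergeCount;
    }

    public static void main(String[] args) {
        byte[] previous = {10, 20, 30, 40};
        byte[] next = {0, 0, 5, 5};
        int merged = mergeTail(previous, next, 2);
        // The last two bytes of previous are now averages: 15 and 20.
        System.out.println(merged + " " + Arrays.toString(previous));
    }
}
```

Like the listing, this averages byte by byte, so the overlap is counted in bytes rather than in audio frames.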

Looking at sayPhoneWord(), you can see that it calls playSound() to output each sound sample (that is, each allophone), and calls drain() at the end to flush the sound channel. Here is the code for playSound():

    /*
     * Plays a single sound sample.
     */
    private void playSound(byte[] data)
    {
        if (data.length > 0) line.write(data, 0, data.length);
    }

Here is the code for drain():

    /*
     * Flushes the sound channel.
     */
    private void drain()
    {
        if (line != null) line.drain();
        try { Thread.sleep(100); } catch (Exception e) {}
    }
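To see the write-then-drain pattern on its own, without any allophone files, the following sketch generates a short tone and sends it to a SourceDataLine. It is a rough illustration under stated assumptions rather than part of the engine. The class LinePlaybackSketch, the methods tone() and play(), and the choice of a 440 Hz tone at 8000 Hz, 16-bit, mono, big-endian are all introduced here.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;

// Hypothetical demo of SourceDataLine.write() followed by drain().
class LinePlaybackSketch {
    // Builds 16-bit signed, big-endian, mono PCM for a sine tone.
    static byte[] tone(double hz, int millis, float sampleRate) {
        int samples = (int) (sampleRate * millis / 1000);
        byte[] pcm = new byte[samples * 2];
        for (int i = 0; i < samples; i++) {
            short s = (short) (Math.sin(2 * Math.PI * hz * i / sampleRate) * 12000);
            pcm[2 * i] = (byte) (s >> 8); // high byte first (big-endian)
            pcm[2 * i + 1] = (byte) s;
        }
        return pcm;
    }

    // Writes the data to an output line and waits until it has played.
    // Returns false if no matching line is available on this machine.
    static boolean play(byte[] pcm, AudioFormat format) {
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
        if (!AudioSystem.isLineSupported(info)) {
            return false;
        }
        try (SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info)) {
            line.open(format);
            line.start();
            line.write(pcm, 0, pcm.length);
            line.drain(); // block until the buffered data has been played
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        AudioFormat format = new AudioFormat(8000f, 16, 1, true, true);
        System.out.println(play(tone(440, 300, 8000f), format));
    }
}
```

Without the drain() call, the line could be closed while data is still sitting in its buffer, which would cut the sound short.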

Looking back at sayPhoneWord(), there is one more method that we haven't examined: getSound().

The getSound() method reads a pre-recorded sound sample from an .au file as byte data. For the details of reading the data, converting the audio format, initializing the sound output line (SourceDataLine), and building the byte array, refer to the comments in the following code:

    /*
     * Reads the file for a single allophone and
     * converts it to a byte array.
     */
    private byte[] getSound(String fileName)
    {
        try
        {
            URL url = Talker.class.getResource(fileName);
            AudioInputStream stream = AudioSystem.getAudioInputStream(url);
            AudioFormat format = stream.getFormat();

            // Convert an ALAW/ULAW sound to PCM for playback
            if ((format.getEncoding() == AudioFormat.Encoding.ULAW) ||
                (format.getEncoding() == AudioFormat.Encoding.ALAW))
            {
                AudioFormat tmpFormat = new AudioFormat(
                    AudioFormat.Encoding.PCM_SIGNED,
                    format.getSampleRate(), format.getSampleSizeInBits() * 2,
                    format.getChannels(), format.getFrameSize() * 2,
                    format.getFrameRate(), true);
                stream = AudioSystem.getAudioInputStream(tmpFormat, stream);
                format = tmpFormat;
            }

            DataLine.Info info = new DataLine.Info(
                Clip.class, format,
                ((int) stream.getFrameLength() * format.getFrameSize()));

            if (line == null)
            {
                // The output line has not been instantiated yet.
                // Can we find a suitable kind of output line?
                DataLine.Info outInfo = new DataLine.Info(SourceDataLine.class,
                    format);
                if (!AudioSystem.isLineSupported(outInfo))
                {
                    System.out.println("No output line matching " + outInfo + " is supported.");
                    throw new Exception("No output line matching " + outInfo + " is supported.");
                }

                // Open the output line
                line = (SourceDataLine) AudioSystem.getLine(outInfo);
                line.open(format, 50000);
                line.start();
            }

            int frameSizeInBytes = format.getFrameSize();
            int bufferLengthInFrames = line.getBufferSize() / 8;
            int bufferLengthInBytes = bufferLengthInFrames * frameSizeInBytes;
            byte[] data = new byte[bufferLengthInBytes];

            // Read the byte data and count the bytes
            int numBytesRead = 0;
            if ((numBytesRead = stream.read(data)) != -1)
            {
                int numBytesRemaining = numBytesRead;
            }

            // Trim the byte data to the size actually read
            byte[] newData = new byte[numBytesRead];
            for (int i = 0; i < numBytesRead; i++)
                newData[i] = data[i];

            return newData;
        }
        catch (Exception e)
        {
            // Any failure (missing file, unsupported line) yields an empty sound
            return new byte[0];
        }
    }
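The format conversion inside getSound() can be examined without any sound files or audio hardware. The sketch below is a minimal illustration that isolates the step deriving the PCM playback format from a ULAW or ALAW source format. The class PcmFormatSketch, the method toPcm(), and the 8000 Hz, 8-bit, mono example format are assumptions made here. The sample files in javatalk.jar may use different parameters.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper isolating the format conversion from getSound().
class PcmFormatSketch {
    // For ULAW/ALAW input, returns a signed PCM format with twice the
    // sample size and frame size, big-endian, as in getSound().
    // Any other format is returned unchanged.
    static AudioFormat toPcm(AudioFormat source) {
        AudioFormat.Encoding encoding = source.getEncoding();
        if (!AudioFormat.Encoding.ULAW.equals(encoding)
                && !AudioFormat.Encoding.ALAW.equals(encoding)) {
            return source;
        }
        return new AudioFormat(
            AudioFormat.Encoding.PCM_SIGNED,
            source.getSampleRate(), source.getSampleSizeInBits() * 2,
            source.getChannels(), source.getFrameSize() * 2,
            source.getFrameRate(), true);
    }

    public static void main(String[] args) {
        // Assumed example: 8000 Hz, 8-bit, mono ULAW, one byte per frame.
        AudioFormat ulaw = new AudioFormat(
            AudioFormat.Encoding.ULAW, 8000f, 8, 1, 1, 8000f, false);
        AudioFormat pcm = toPcm(ulaw);
        System.out.println(pcm.getSampleSizeInBits() + " bits, "
            + pcm.getFrameSize() + " bytes per frame");
    }
}
```

The doubling reflects that each 8-bit companded sample expands to a 16-bit linear PCM sample, so the frame size in bytes doubles as well.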

That is all the code: a speech synthesizer in about 150 lines, including comments.

III. Text-to-speech conversion

Specifying the words to be spoken as strings of allophones is rather cumbersome. If you want to build an application that reads text aloud (such as Web pages or email), you will want to supply the original text directly.

After analyzing this problem in depth, I provide an experimental text-to-speech conversion class in the ZIP file referenced later in this article. When you run this class, it displays the results of its analysis. The class can be executed from the command line, as follows:

    java com.lotontech.speech.Converter "hello there"

The output looks something like this:

    hello -> h|e|l|oo
    there -> dth|aer

If you run this command:

    java com.lotontech.speech.Converter "I like to read JavaWorld"

The output is as follows:

    i -> ii
    like -> l|ii|k
    to -> t|ouu
    read -> r|ee|a|d
    java -> j|a|v|a
    world -> w|err|l|d

How does this conversion class work? My approach is quite simple: the conversion applies a set of text replacement rules in a fixed order. For example, for the words "ant", "want", "wanted", "unwanted", and "unique", the replacement rules we want to apply might be:

1. Replace "*unique*" with "|y|ou|n|ee|k|"
2. Replace "*want*" with "|w|o|n|t|"
3. Replace "*a*" with "|a|"
4. Replace "*e*" with "|e|"
5. Replace "*d*" with "|d|"
6. Replace "*n*" with "|n|"
7. Replace "*u*" with "|u|"
8. Replace "*t*" with "|t|"

For "unwanted", the sequence of results is:

    unwanted
    un[|w|o|n|t|]ed (rule 2)
    [|u|][|n|][|w|o|n|t|][|e|][|d|] (rules 4, 5, 6, 7)
    u|n|w|o|n|t|e|d (after removing redundant characters)

You can see that words containing the letters "want" are pronounced differently from words containing the letters "ant". You can also see that the special-case rule for the complete word "unique" takes precedence over the other rules, so that the word is pronounced "y|ou..." rather than "u|n...".
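The ordered-rule idea can be sketched in a few lines. The following is not the Converter class from the ZIP file. It is a minimal illustration that uses only the eight example rules above, and the class RuleConverterSketch, its Segment helper, and the convert() method are hypothetical. Each rule is applied only to letters that no earlier rule has converted, which is what the square brackets in the trace above indicate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of ordered replacement rules; not the real Converter.
class RuleConverterSketch {
    // A piece of the word: raw letters, or already-converted allophones.
    private static final class Segment {
        final String text;
        final boolean converted;
        Segment(String text, boolean converted) {
            this.text = text;
            this.converted = converted;
        }
    }

    // The example rules, in the order in which they are applied.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("unique", "y|ou|n|ee|k");
        RULES.put("want", "w|o|n|t");
        RULES.put("a", "a");
        RULES.put("e", "e");
        RULES.put("d", "d");
        RULES.put("n", "n");
        RULES.put("u", "u");
        RULES.put("t", "t");
    }

    static String convert(String word) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment(word.toLowerCase(Locale.ROOT), false));
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            List<Segment> next = new ArrayList<>();
            for (Segment s : segments) {
                if (s.converted) {       // earlier rules take precedence
                    next.add(s);
                    continue;
                }
                String rest = s.text;
                int at;
                while ((at = rest.indexOf(rule.getKey())) >= 0) {
                    if (at > 0) {
                        next.add(new Segment(rest.substring(0, at), false));
                    }
                    next.add(new Segment(rule.getValue(), true));
                    rest = rest.substring(at + rule.getKey().length());
                }
                if (!rest.isEmpty()) {
                    next.add(new Segment(rest, false));
                }
            }
            segments = next;
        }
        StringBuilder out = new StringBuilder();
        for (Segment s : segments) {
            if (!s.converted) {
                continue; // letters with no matching rule are dropped here
            }
            if (out.length() > 0) {
                out.append('|');
            }
            out.append(s.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("unwanted")); // prints u|n|w|o|n|t|e|d
        System.out.println(convert("unique"));   // prints y|ou|n|ee|k
    }
}
```

Keeping converted segments separate from raw letters is one way to ensure that a later single-letter rule such as "n" cannot rewrite the output of an earlier rule such as "want".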
