Shulou (Shulou.com), report of 06/02. Updated 2025-04-06. From: SLTechnology News&Howtos.
This article shows how to implement a voice engine in Java. It first explains how to use the engine, then walks through its implementation, and finally looks at converting plain text to speech.
I. Try out the voice engine
To use this voice engine, add the javatalk.jar file provided with this article to your CLASSPATH, then run the com.lotontech.speech.Talker class from the command line or call it from a Java program. From the command line, the command is:
java com.lotontech.speech.Talker "h|e|l|oo"
From a Java program, the code is:
com.lotontech.speech.Talker talker=new com.lotontech.speech.Talker();
talker.sayPhoneWord("h|e|l|oo");
You may be puzzled by the "h|e|l|oo" string given on the command line (or passed to the sayPhoneWord() method). Let me explain.
The speech engine works by concatenating short sound samples, each of which is the smallest unit of spoken (English) language. These sound samples are called allophones. Each allophone corresponds to one, two, or three letters. In the phonetic representation of "hello" above, the pronunciation of some letter combinations is obvious, while that of others is not:
h - the pronunciation is obvious
e - the pronunciation is obvious
l - the pronunciation is obvious, but notice that the two "l"s are reduced to one "l"
oo - pronounced as in "hello", not as in "bot" or "too"
Here is a list of the valid allophones:
a: as in cat
b: as in cab
c: as in cat
d: as in dot
e: as in bet
f: as in frog
g: as in frog
h: as in hog
i: as in pig
j: as in jig
k: as in keg
l: as in leg
m: as in met
n: as in begin
o: as in not
p: as in pot
r: as in rot
s: as in sat
t: as in sat
u: as in put
v: as in have
w: as in wet
y: as in yet
z: as in zoo
aa: as in fake
ay: as in hay
ee: as in bee
ii: as in high
oo: as in go
bb: a variant of b, with different stress
dd: a variant of d, with different stress
ggg: a variant of g, with different stress
hh: a variant of h, with different stress
ll: a variant of l, with different stress
nn: a variant of n, with different stress
rr: a variant of r, with different stress
tt: a variant of t, with different stress
yy: a variant of y, with different stress
ar: as in car
aer: as in care
ch: as in which
ck: as in check
ear: as in beer
er: as in later
err: as in later (longer sound)
ng: as in feeding
or: as in law
ou: as in zoo
ouu: as in zoo (longer sound)
ow: as in cow
oy: as in boy
sh: as in shut
th: as in thing
dth: as in this
uh: a variant of u
wh: as in where
zh: as in Asian
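Before handing a phonetic string to the engine, it can help to check that every token is in the list above. The following is only a sketch: the class PhonemeCheckSketch and the method unknownPhonemes() are hypothetical names that are not part of javatalk.jar, and comparing in lowercase assumes that uppercase tokens differ from lowercase ones only in intonation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical helper, not part of the article's javatalk.jar.
class PhonemeCheckSketch {
    // The allophone names copied from the list above.
    static final Set<String> VALID = new HashSet<>(Arrays.asList(
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
        "n", "o", "p", "r", "s", "t", "u", "v", "w", "y", "z",
        "aa", "ay", "ee", "ii", "oo",
        "bb", "dd", "ggg", "hh", "ll", "nn", "rr", "tt", "yy",
        "ar", "aer", "ch", "ck", "ear", "er", "err", "ng", "or", "ou",
        "ouu", "ow", "oy", "sh", "th", "dth", "uh", "wh", "zh"));

    // Returns the tokens of a phonetic word that are not in the list.
    static List<String> unknownPhonemes(String phoneWord) {
        List<String> unknown = new ArrayList<>();
        // Split on "|" the same way the engine does.
        StringTokenizer st = new StringTokenizer(phoneWord, "|", false);
        while (st.hasMoreTokens()) {
            String token = st.nextToken().trim();
            // Assumption: case only affects intonation, so compare in lowercase.
            if (!VALID.contains(token.toLowerCase(Locale.ROOT))) {
                unknown.add(token);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        System.out.println(unknownPhonemes("h|e|l|oo"));
        System.out.println(unknownPhonemes("h|e|l|xx"));
    }
}
```

An empty result means every token has a matching entry in the list; anything returned would have no corresponding sound sample.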
When people speak, their pitch rises and falls throughout a sentence. This change of intonation makes speech sound more natural and expressive, and it distinguishes a question from a statement. Consider the following two sentences:
It is fake -- f|aa|k
Is it fake? -- f|AA|k
As you may have guessed, the way to raise the intonation of an allophone is to write it in uppercase letters.
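The uppercase convention is easy to apply programmatically. The sketch below uses a hypothetical helper, raise(), which is not part of the engine; it simply uppercases one token of a phonetic string so that the engine would use the raised intonation for it.

```java
import java.util.Locale;

// Hypothetical helper, not part of the article's javatalk.jar.
class IntonationSketch {
    // Returns the phonetic word with the token at the given
    // zero-based position written in uppercase.
    static String raise(String phoneWord, int index) {
        String[] phonemes = phoneWord.split("\\|");
        if (index < 0 || index >= phonemes.length) {
            throw new IllegalArgumentException("No phoneme at position " + index);
        }
        phonemes[index] = phonemes[index].toUpperCase(Locale.ROOT);
        return String.join("|", phonemes);
    }

    public static void main(String[] args) {
        // Turns the statement form of "fake" into the question form.
        System.out.println(raise("f|aa|k", 1)); // prints f|AA|k
    }
}
```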
That is all you need to know to use the software. If you are interested in how it is implemented behind the scenes, read on.
II. Implement the voice engine
The implementation of the speech engine consists of just one class with four methods. It uses the Java Sound API included in J2SE 1.3. I will not give a comprehensive introduction to this API here, but you can learn how it is used from the examples. The Java Sound API is not particularly complex, and the comments in the code tell you what you need to know.
The following is the basic definition of the Talker class:
package com.lotontech.speech;

import javax.sound.sampled.*;
import java.io.*;
import java.util.*;
import java.net.*;

public class Talker
{
  // The output line that all sound samples are written to
  private SourceDataLine line=null;
}
If you execute Talker from the command line, the following main() method runs as the entry point. The main() method takes the first command-line argument and passes it to the sayPhoneWord() method:
/*
 * Speak the phonetic word specified on the command line.
 */
public static void main(String args[])
{
  Talker player=new Talker();
  if (args.length>0) player.sayPhoneWord(args[0]);
  System.exit(0);
}
The sayPhoneWord() method can be called either from the main() method above or directly from a Java program. It looks more complex than it is. It simply steps through the allophones of the word (separated by "|" in the input string) and plays them one by one through a sound output channel. To make the speech sound more natural, I merge the end of each sound sample with the beginning of the next one:
/*
 * Speak the specified phonetic word.
 */
public void sayPhoneWord(String word)
{
  // Byte array holding the previous sound (none yet)
  byte[] previousSound=null;
  // Split the input string into separate allophones
  StringTokenizer st=new StringTokenizer(word,"|",false);
  while (st.hasMoreTokens())
  {
    // Construct the file name for this allophone
    String thisPhoneFile=st.nextToken();
    thisPhoneFile="/allophones/"+thisPhoneFile+".au";
    // Read the data from the sound file
    byte[] thisSound=getSound(thisPhoneFile);
    if (previousSound!=null)
    {
      // Merge the previous allophone with this one if both are long enough
      int mergeCount=0;
      if (previousSound.length>=500 && thisSound.length>=500)
        mergeCount=500;
      for (int i=0; i<mergeCount; i++)
      {
        previousSound[previousSound.length-mergeCount+i]
          =(byte) ((previousSound[previousSound.length
          -mergeCount+i]+thisSound[i])/2);
      }
      // Play the previous allophone
      playSound(previousSound);
      // The truncated current allophone becomes the previous one
      byte[] newSound=new byte[thisSound.length-mergeCount];
      for (int ii=0; ii<newSound.length; ii++)
        newSound[ii]=thisSound[ii+mergeCount];
      previousSound=newSound;
    }
    else
      previousSound=thisSound;
  }
  // Play the final allophone (if there was one) and drain the sound channel
  if (previousSound!=null) playSound(previousSound);
  drain();
}
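The merge step inside sayPhoneWord() can be looked at in isolation. The sketch below pulls it into two hypothetical helpers, blendTail() and dropHead(), which are illustrative names and not part of the Talker class. Like the loop above, it averages raw bytes; for 16-bit samples, averaging whole samples rather than individual bytes would be more accurate.

```java
import java.util.Arrays;

// Hypothetical helpers illustrating the merge step; not part of Talker.
class MergeSketch {
    // Averages the last `overlap` bytes of `previous` with the first
    // `overlap` bytes of `next`, writing the result into `previous`.
    static void blendTail(byte[] previous, byte[] next, int overlap) {
        for (int i = 0; i < overlap; i++) {
            int at = previous.length - overlap + i;
            previous[at] = (byte) ((previous[at] + next[i]) / 2);
        }
    }

    // Returns `sound` without its first `count` bytes, which have
    // already been blended into the previous sample.
    static byte[] dropHead(byte[] sound, int count) {
        return Arrays.copyOfRange(sound, count, sound.length);
    }

    public static void main(String[] args) {
        byte[] previous = {10, 20, 30, 40};
        byte[] next = {0, 0, 50, 60};
        blendTail(previous, next, 2);
        System.out.println(Arrays.toString(previous));          // prints [10, 20, 15, 20]
        System.out.println(Arrays.toString(dropHead(next, 2))); // prints [50, 60]
    }
}
```

In the engine, the overlap is 500 bytes when both samples are at least that long, and 0 otherwise.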
As you can see, sayPhoneWord() calls playSound() to output a single sound sample (that is, an allophone), and at the end it calls drain() to flush the sound channel. Here is the code for playSound():
/*
 * Play a single sound sample.
 */
private void playSound(byte[] data)
{
  // The line is null if no sample could be loaded, so check it first
  if (line!=null && data.length>0) line.write(data, 0, data.length);
}
Here is the code for drain():
/*
 * Flush the sound channel.
 */
private void drain()
{
  if (line!=null) line.drain();
  try {Thread.sleep(100);} catch (Exception e) {}
}
Looking back at sayPhoneWord(), there is one method that we have not yet examined: getSound().
The getSound() method reads a prerecorded sound sample as byte data from an .au file. For the details of reading the data, converting the audio format, initializing the sound output line (SourceDataLine), and building the byte data, refer to the comments in the following code:
/*
 * Read an allophone from a file
 * and convert it into a byte array.
 */
private byte[] getSound(String fileName)
{
  try
  {
    URL url=Talker.class.getResource(fileName);
    AudioInputStream stream = AudioSystem.getAudioInputStream(url);
    AudioFormat format = stream.getFormat();
    // Convert an ALAW/ULAW sound to PCM for playback
    if ((format.getEncoding() == AudioFormat.Encoding.ULAW) ||
        (format.getEncoding() == AudioFormat.Encoding.ALAW))
    {
      AudioFormat tmpFormat = new AudioFormat(
        AudioFormat.Encoding.PCM_SIGNED,
        format.getSampleRate(), format.getSampleSizeInBits() * 2,
        format.getChannels(), format.getFrameSize() * 2,
        format.getFrameRate(), true);
      stream = AudioSystem.getAudioInputStream(tmpFormat, stream);
      format = tmpFormat;
    }
    DataLine.Info info = new DataLine.Info(
      Clip.class, format,
      ((int) stream.getFrameLength() * format.getFrameSize()));
    if (line==null)
    {
      // The output line has not been instantiated yet.
      // Can we find a suitable kind of output line?
      DataLine.Info outInfo = new DataLine.Info(SourceDataLine.class,
        format);
      if (!AudioSystem.isLineSupported(outInfo))
      {
        System.out.println("Line matching " + outInfo + " not supported.");
        throw new Exception("Line matching " + outInfo + " not supported.");
      }
      // Open the output line
      line = (SourceDataLine) AudioSystem.getLine(outInfo);
      line.open(format, 50000);
      line.start();
    }
    // Size calculations for the read buffer
    int frameSizeInBytes = format.getFrameSize();
    int bufferLengthInFrames = line.getBufferSize() / 8;
    int bufferLengthInBytes = bufferLengthInFrames * frameSizeInBytes;
    byte[] data=new byte[bufferLengthInBytes];
    // Read the data bytes and count them
    int numBytesRead = 0;
    if ((numBytesRead = stream.read(data)) != -1)
    {
      int numBytesRemaining = numBytesRead;
    }
    // Truncate the byte array to the number of bytes read
    byte[] newData=new byte[numBytesRead];
    for (int i=0; i<numBytesRead; i++)
      newData[i]=data[i];
    return newData;
  }
  catch (Exception e)
  {
    // On any failure, return an empty sample
    return new byte[0];
  }
}
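The format conversion in getSound() is easier to follow when separated from the file handling. The sketch below is illustrative only: the class PcmFormatSketch and the method toPcm() are hypothetical names. It derives the PCM playback format the same way the code above does, by doubling the sample size and frame size of a ULAW or ALAW source, and it needs no audio hardware because it only constructs AudioFormat objects.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper illustrating the format conversion; not part of Talker.
class PcmFormatSketch {
    // Returns a signed, big-endian PCM format for a ULAW/ALAW source,
    // or the source format itself if no conversion is needed.
    static AudioFormat toPcm(AudioFormat source) {
        AudioFormat.Encoding encoding = source.getEncoding();
        boolean compressed = AudioFormat.Encoding.ULAW.equals(encoding)
            || AudioFormat.Encoding.ALAW.equals(encoding);
        if (!compressed) {
            return source;
        }
        return new AudioFormat(
            AudioFormat.Encoding.PCM_SIGNED,
            source.getSampleRate(),
            source.getSampleSizeInBits() * 2, // 8-bit companded -> 16-bit PCM
            source.getChannels(),
            source.getFrameSize() * 2,        // frame size doubles as well
            source.getFrameRate(),
            true);                            // big-endian, as in getSound()
    }

    public static void main(String[] args) {
        // An assumed example source: 8 kHz, 8-bit, mono ULAW.
        AudioFormat ulaw = new AudioFormat(
            AudioFormat.Encoding.ULAW, 8000f, 8, 1, 1, 8000f, false);
        System.out.println(toPcm(ulaw));
    }
}
```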
That is all the code: a speech synthesizer in about 150 lines, comments included.
III. Text-to-speech conversion
Specifying the words to be spoken as a sequence of allophones is cumbersome. If you want to build an application that reads text aloud (such as Web pages or email), you want to be able to supply the original text directly.
After analyzing this problem in some depth, I have included an experimental text-to-speech class in the ZIP file provided later in this article. When you run this class, it displays the results of its analysis. The text-to-speech class can be run from the command line as follows:
java com.lotontech.speech.Converter "hello there"
The output looks like this:
hello -> h|e|l|oo
there -> dth|aer
If you run this command:
java com.lotontech.speech.Converter "I like to read JavaWorld"
The output is as follows:
i -> ii
like -> l|ii|k
to -> t|ouu
read -> r|ee|a|d
java -> j|a|v|a
world -> w|err|l|d
How does this conversion class work? My approach is quite simple: the conversion applies a set of text replacement rules in a fixed order. For example, for the words "ant", "want", "wanted", "unwanted", and "unique", the replacement rules we want to apply might be:
Replace "*unique*" with "|y|ou|n|ee|k|"
Replace "*want*" with "|w|o|n|t|"
Replace "*a*" with "|a|"
Replace "*e*" with "|e|"
Replace "*d*" with "|d|"
Replace "*n*" with "|n|"
Replace "*u*" with "|u|"
Replace "*t*" with "|t|"
For "unwanted", the sequence of outputs is:
unwanted
un[|w|o|n|t|]ed (rule 2)
[|u|][|n|][|w|o|n|t|][|e|][|d|] (rules 4, 5, 6, 7)
u|n|w|o|n|t|e|d (after removing the surplus characters)
Notice that words containing the letters "want" are pronounced differently from words containing the letters "ant". Notice also that the special-case rule for the complete word "unique" takes precedence over the other rules, so that "unique" is pronounced "y|ou..." rather than "u|n...".
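The ordered-replacement idea can be sketched in a few lines of Java. This is not the Converter class from the ZIP file, whose source is not shown here; RuleConverterSketch, Piece, and convert() are hypothetical names, and the rule table contains only the eight example rules above. As a simplification, the sketch matches "unique" anywhere in a word rather than only as a complete word, and it silently drops letters that no rule covers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of ordered replacement rules; not the article's Converter.
class RuleConverterSketch {
    // Ordered rules: {letters to match, phonemes to emit}. Earlier rules win.
    static final String[][] RULES = {
        {"unique", "y|ou|n|ee|k"},
        {"want", "w|o|n|t"},
        {"a", "a"}, {"e", "e"}, {"d", "d"},
        {"n", "n"}, {"u", "u"}, {"t", "t"}
    };

    // A piece of the word: raw letters still to convert, or finished phonemes.
    static final class Piece {
        final String text;
        final boolean done;
        Piece(String text, boolean done) { this.text = text; this.done = done; }
    }

    static String convert(String word) {
        List<Piece> pieces = new ArrayList<>();
        pieces.add(new Piece(word.toLowerCase(), false));
        for (String[] rule : RULES) {
            List<Piece> next = new ArrayList<>();
            for (Piece p : pieces) {
                // Finished pieces are protected from later rules.
                if (p.done) { next.add(p); continue; }
                String rest = p.text;
                int at;
                while ((at = rest.indexOf(rule[0])) >= 0) {
                    if (at > 0) next.add(new Piece(rest.substring(0, at), false));
                    next.add(new Piece(rule[1], true));
                    rest = rest.substring(at + rule[0].length());
                }
                if (!rest.isEmpty()) next.add(new Piece(rest, false));
            }
            pieces = next;
        }
        // Join the finished pieces; unconverted letters are dropped.
        List<String> phonemes = new ArrayList<>();
        for (Piece p : pieces) {
            if (p.done) phonemes.add(p.text);
        }
        return String.join("|", phonemes);
    }

    public static void main(String[] args) {
        System.out.println(convert("unwanted")); // prints u|n|w|o|n|t|e|d
    }
}
```

Marking converted pieces as finished plays the role of the square brackets in the trace above: once "want" has been replaced, the single-letter rules can no longer touch it.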
That is all for this article on how to implement a voice engine in Java. Thank you for reading; I hope it has been useful.