2025-04-06 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article presents a sample analysis of research on combining the ASV (automatic speaker verification) and TTS (text-to-speech) modules in RTVC.
0. Description
The open question is how to overcome the unseen-speaker problem when the SV vector (SVV) output by ASV is applied to TTS.
Background:
Whether the task is M2VoC or a variant of cross-lingual TTS, ASV can be used to obtain a timbre vector.
This vector does not necessarily have to represent timbre; it only needs to cluster tightly for the same person.
The vector is then combined with the text as input to TTS training, so that the TTS model becomes familiar with it.
For a speaker that was never seen in training, however, ASV needs to extract a more accurate vector, and the TTS model needs to have seen more speakers.
One option is to take the vector that ASV extracts, find the nearest vector to it, and use that one instead.
At training time the extracted vector is the vector of the current sentence, but at inference one can take 20 sentences at random and average their vectors.
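The inference-time averaging described above can be sketched as follows. This is a minimal illustration, not the author's implementation: `extract` is a hypothetical stand-in for whatever ASV model produces a per-sentence vector, and L2-normalizing each vector before and after averaging is an assumption that the notes do not state.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, TypeVar

Utterance = TypeVar("Utterance")
Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]


def average_speaker_vector(
    utterances: Sequence[Utterance],
    extract: Callable[[Utterance], Sequence[float]],  # hypothetical ASV extractor
    num_samples: int = 20,
    seed: Optional[int] = None,
) -> Vector:
    """Average the vectors of up to `num_samples` randomly chosen utterances."""
    if not utterances:
        raise ValueError("need at least one utterance")
    rng = random.Random(seed)
    k = min(num_samples, len(utterances))
    chosen = rng.sample(list(utterances), k)
    # Assumption: normalize each vector so no single sentence dominates the mean.
    vectors = [l2_normalize(extract(u)) for u in chosen]
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / k for i in range(dim)]
    return l2_normalize(mean)
```

If a speaker has fewer than 20 sentences, the sketch simply averages over all of them.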
The next step is to survey the literature and discuss it.
1. Summary of phenomena
Cases where the SVV leads to good results and cases where it leads to bad results are recorded, observed, and sorted into these two categories.
2. Pre-survey thoughts
2.1. More data
Keep the approach unchanged, add more data similar to VCTK, and train carefully.
The main contributions would then be:
Collecting, processing, and using public datasets
Constructing the final test set
2.2. Nearest SVV
Instead of using the extracted SVV itself, look for its nearest neighbor and use that.
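A minimal sketch of this lookup is shown below. It rests on two assumptions that the notes do not spell out: that the candidates are the vectors of speakers the TTS model has already seen, and that "nearest" means highest cosine similarity. The names `known` and `nearest_known_svv` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_known_svv(
    query: Sequence[float],
    known: Dict[str, Sequence[float]],  # assumed: speaker id -> SVV seen in TTS training
) -> Tuple[str, Sequence[float]]:
    """Return the id and vector of the known speaker closest to `query`."""
    if not known:
        raise ValueError("no known speaker vectors to search")
    best_id = max(known, key=lambda sid: cosine_similarity(query, known[sid]))
    return best_id, known[best_id]
```

The returned vector, rather than the query, would then be passed to TTS. This trades speaker similarity for staying inside the region of vector space that the TTS model was trained on.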
2.3. Multiple ASVs
A single extraction is not enough when the reference audio is scarce, so use multiple ASV models.
These can include models for Chinese or for English.
2.4. GST
The SVV is obtained with ASV. Then, instead of feeding the SVV to TTS directly, it is expressed through attention as a weighted sum of several GSTs (global style tokens), and that weighted sum is what TTS receives.
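The weighted-sum idea can be illustrated with a simplified single-head, scaled dot-product attention, where the SVV acts as the query and the tokens act as both keys and values. This is only a sketch under stated assumptions: the tokens here are plain vectors of the same dimension as the SVV, whereas a real GST layer learns its tokens and uses learned projections (typically multi-head), none of which is modeled. The function name `svv_as_token_mixture` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def svv_as_token_mixture(
    svv: Sequence[float],
    tokens: Sequence[Sequence[float]],  # assumed: token vectors, same dimension as svv
) -> Tuple[List[float], List[float]]:
    """Express `svv` as an attention-weighted sum of `tokens`.

    Returns (mixed_vector, attention_weights).
    """
    if not tokens:
        raise ValueError("need at least one token")
    dim = len(svv)
    if any(len(t) != dim for t in tokens):
        raise ValueError("tokens must have the same dimension as the SVV")
    scale = math.sqrt(dim)
    # Scaled dot-product score between the SVV (query) and each token (key).
    scores = [sum(q * k for q, k in zip(svv, t)) / scale for t in tokens]
    weights = softmax(scores)
    # The output is a convex combination of the tokens (values).
    mixed = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
    return mixed, weights
```

Because the output is always a convex combination of the tokens, an unseen speaker's SVV is mapped into the span of what TTS saw during training, which is the motivation for not using the SVV directly.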
2.5. ASV fine-tuning
Allow gradients to backpropagate into the ASV model during TTS training, so that the ASV model is fine-tuned as well.
However, the TTS corpus has only on the order of 100 speakers, while the ASV corpus has on the order of 7,000, so this is not easy to train.
3. LibriSpeech TTS
Good cross-lingual work has been done before without involving this many speakers.
Still, try it first and see whether it works.