Shulou(Shulou.com)11/24 Report--
Recently, INTERSPEECH 2023, a leading international conference on speech and language science and technology, was held in Dublin, Ireland. Four papers co-authored by Yunzhisheng and Shanghai Normal University were accepted at the conference, with results covering research directions including speech enhancement, speech recognition and voiceprint anti-spoofing. Following ACM MM 2023, this marks another recognition of Yunzhisheng's AGI technical strength by a top international conference in 2023.
INTERSPEECH enjoys a high reputation and broad academic influence worldwide. It is a flagship international conference founded by the International Speech Communication Association (ISCA) and one of the top international conferences in the field of speech and language science and technology, with strict entry requirements for participating enterprises and institutions. Previous INTERSPEECH conferences have drawn wide attention from speech researchers around the world.
This recognition by a top international conference reflects not only the continued joint exploration of intelligent speech technology by Yunzhisheng and Shanghai Normal University, but also the strong support of Yunzhisheng's AGI technology architecture.
Yunzhisheng: creating a world of interconnected intuition through general artificial intelligence (AGI)
Yunzhisheng AI technology system and the "U+X" strategy
As one of the pioneers in industrializing AGI technology in China, Yunzhisheng built the Atlas artificial intelligence infrastructure in 2016 and, on top of it, the company's UniBrain technology platform. With the general cognitive model UniGPT at its core, UniBrain includes intelligent components such as multimodal perception and generation, knowledge graph and an IoT platform, and is enhanced with domain-specific capabilities to provide efficient product support for Yunzhisheng's smart living, smart healthcare, IoT and other businesses. This drives the implementation of the "U (UniBrain) + X (application scenario)" strategy and the company's mission of "creating a world of interconnected intuition through general artificial intelligence (AGI)".
As an important component of UniBrain, intelligent speech technology covers speech recognition, voiceprint recognition, speech synthesis and more, and has been widely applied in home, automotive, customer service and other fields. Taking the in-vehicle scenario as an example, Yunzhisheng's intelligent speech technology enables powerful voice capabilities such as multi-sound-zone recognition, continuous voice interaction, personalized voice broadcasting, "what you see is what you can say" and fuzzy instruction matching, bringing users a more intelligent and natural interactive experience. As Yunzhisheng's intelligent speech technology continues to develop, its deployment in various scenarios will be further accelerated. The acceptance of these papers fully confirms Yunzhisheng's technological innovation strength in the field of intelligent speech; at the same time, it will further consolidate its AGI technology base and accelerate the intelligent upgrading of industries across the board.
Next, Yunzhisheng will continue to pursue the "U+X" strategy, work with Shanghai Normal University and other universities and research institutions to strengthen research and development of AI's basic theory and key technologies, continuously expand AGI application scenarios, provide broader and deeper artificial intelligence solutions for the smart living and smart healthcare fields, and strive to realize the vision of empowering all walks of life with artificial intelligence.
The following is an overview of the selected papers:
● Research direction: speech enhancement
Paper title: A Mask Free Neural Network for Monaural Speech Enhancement
Current mainstream time-frequency speech enhancement systems take the complex spectrum as input, but commonly used training toolkits do not natively support complex-valued operations, complex-valued modeling is difficult to train, and masking-based methods cannot, in theory, fully recover clean speech. To address these problems, this paper proposes a mask-free speech enhancement system. The system uses the short-time discrete cosine transform (STDCT) as its feature, which carries the same information as the STFT while remaining a real-valued representation. Building on MetaFormer, the authors design a global-local module that combines a lightweight MobileNet-block architecture with the design concept of NAFNet, and the whole network (MFNet) is stacked from these modules. The results show that, compared with other networks, MFNet reaches state-of-the-art performance with a favorable computational cost.
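For readers unfamiliar with STDCT features, the minimal sketch below shows how a short-time DCT can be computed over framed, windowed audio as a real-valued alternative to the STFT; the frame and hop sizes are illustrative assumptions and this is not the paper's implementation.

```python
# A minimal sketch of short-time DCT (STDCT) feature extraction as a real-valued
# alternative to the STFT; frame/hop sizes are assumed, not the authors' settings.
import numpy as np
from scipy.fft import dct

def stdct(signal: np.ndarray, frame_len: int = 512, hop_len: int = 256) -> np.ndarray:
    """Return a (num_frames, frame_len) matrix of real-valued DCT-II coefficients."""
    window = np.hanning(frame_len)
    num_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    frames = np.stack([
        signal[i * hop_len : i * hop_len + frame_len] * window
        for i in range(num_frames)
    ])
    # DCT-II over each windowed frame yields a purely real spectrum, so a
    # downstream network needs no complex-valued layers or mask estimation.
    return dct(frames, type=2, norm="ortho", axis=-1)

if __name__ == "__main__":
    x = np.random.randn(16000).astype(np.float32)  # 1 s of dummy 16 kHz audio
    feats = stdct(x)
    print(feats.shape)  # (61, 512)
```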
● Research direction: speech recognition
Paper title: Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Low-resource, heavily accented speech recognition is one of the major challenges in the practical application of ASR technology. In this study, the authors propose a Conformer-based architecture called Aformer to exploit acoustic information from a large amount of non-accented training data and a limited amount of accented training data. Aformer contains a general encoder and an accent encoder designed to extract complementary acoustic information. In addition, Aformer is trained with a multi-pass training approach, and three cross-information fusion methods are studied to effectively combine information from the general and accent encoders. The results show that the proposed method outperforms the Conformer baseline on all six in-domain and out-of-domain accent test sets, with relative word/character error rate reductions of 10.2% to 24.5%.
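As a rough illustration of the dual-encoder idea, the sketch below fuses the frame-level outputs of a general encoder and an accent encoder by concatenation and projection; this is only one plausible fusion scheme under assumed dimensions, not necessarily any of the three methods studied in the paper.

```python
# A minimal sketch (not the authors' Aformer) of fusing frame-level features from a
# general encoder and an accent encoder; all sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    def __init__(self, d_general: int = 256, d_accent: int = 256, d_out: int = 256):
        super().__init__()
        self.proj = nn.Linear(d_general + d_accent, d_out)

    def forward(self, h_general: torch.Tensor, h_accent: torch.Tensor) -> torch.Tensor:
        # h_general: (batch, time, d_general), h_accent: (batch, time, d_accent)
        fused = torch.cat([h_general, h_accent], dim=-1)
        return self.proj(fused)  # (batch, time, d_out), fed to the decoder/joint network

if __name__ == "__main__":
    fusion = ConcatFusion()
    hg = torch.randn(2, 100, 256)  # dummy general-encoder output
    ha = torch.randn(2, 100, 256)  # dummy accent-encoder output
    print(fusion(hg, ha).shape)    # torch.Size([2, 100, 256])
```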
● Research direction: speech recognition
Paper title: Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system
In end-to-end automatic speech recognition (ASR), designing effective target modeling units is very important and has long been a topic of close attention. The authors propose a phonetic-assisted multi-target unit (PMU) modeling method to enhance the Conformer-Transducer ASR system through progressive representation learning. Specifically, PMU first uses phonetic-assisted subword modeling (PASM) and byte pair encoding (BPE) to generate phonetic-induced and text-induced target units respectively. On this basis, three enhanced acoustic-encoder frameworks are proposed, including basic PMU, paraCTC and paCTC, which integrate PASM and BPE units at different levels for joint CTC and transducer multi-task training. Experimental results on LibriSpeech and accented ASR test sets show that, compared with traditional BPE, the proposed PMU method significantly reduces the WER on the LibriSpeech clean, other and six accented ASR test sets by 12.7%, 6.0% and 7.7%, respectively.
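To make the multi-target idea concrete, the sketch below attaches two auxiliary CTC heads, one over phonetic-induced units and one over BPE units, to a shared encoder output; the vocabulary sizes, loss weights and the omitted transducer term are assumptions for illustration, not the paper's exact recipe.

```python
# A minimal sketch of multi-task training with two CTC heads over different target-unit
# inventories (e.g. PASM units and BPE units) on a shared encoder output. Vocabulary
# sizes and weights are assumed; the transducer loss of the full system is omitted.
import torch
import torch.nn as nn

class DualUnitCTCHeads(nn.Module):
    def __init__(self, d_model: int = 256, pasm_vocab: int = 300, bpe_vocab: int = 500):
        super().__init__()
        self.pasm_head = nn.Linear(d_model, pasm_vocab)  # phonetic-induced units
        self.bpe_head = nn.Linear(d_model, bpe_vocab)    # text-induced units
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, enc_out, enc_lens, pasm_targets, pasm_lens, bpe_targets, bpe_lens):
        # enc_out: (batch, time, d_model); CTCLoss expects (time, batch, vocab) log-probs.
        pasm_logp = self.pasm_head(enc_out).log_softmax(-1).transpose(0, 1)
        bpe_logp = self.bpe_head(enc_out).log_softmax(-1).transpose(0, 1)
        loss_pasm = self.ctc(pasm_logp, pasm_targets, enc_lens, pasm_lens)
        loss_bpe = self.ctc(bpe_logp, bpe_targets, enc_lens, bpe_lens)
        # In a full system these auxiliary losses would be added to the transducer loss,
        # e.g. loss = loss_transducer + 0.3 * (loss_pasm + loss_bpe); the weight is assumed.
        return loss_pasm, loss_bpe

if __name__ == "__main__":
    heads = DualUnitCTCHeads()
    enc = torch.randn(2, 50, 256)
    enc_lens = torch.tensor([50, 50])
    pasm_t = torch.randint(1, 300, (2, 12)); pasm_l = torch.tensor([12, 10])
    bpe_t = torch.randint(1, 500, (2, 8));   bpe_l = torch.tensor([8, 6])
    print(heads(enc, enc_lens, pasm_t, pasm_l, bpe_t, bpe_l))
```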
● Research direction: voiceprint anti-spoofing
Paper title: Advanced RawNet2 with Attention-based Channel Masking for Synthetic Speech Detection
Automatic speaker verification systems are usually vulnerable to spoofing attacks, especially unseen attacks. Given the diversity of speech synthesis and voice conversion algorithms, improving the generalization ability of synthetic speech detection systems is a challenging problem. To address this, the authors propose an advanced RawNet2 (ARawNet2), which improves RawNet2 by introducing an attention-based channel masking module with three main components: squeeze-and-excitation (SE), channel masking and global-local feature aggregation. The effectiveness of the system is evaluated on the ASVspoof 2019 and ASVspoof 2021 datasets: ARawNet2 achieves 4.61% on the ASVspoof 2019 LA task, and 8.36% and 19.03% on the ASVspoof 2021 LA and DF tasks respectively, which are 12.00% and 14.97% lower than the RawNet2 baseline.
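The sketch below illustrates squeeze-and-excitation style channel attention combined with a simple top-k channel mask, in the spirit of the attention-based channel masking described above; the keep ratio and tensor sizes are assumptions, not the paper's implementation.

```python
# A minimal sketch of SE-style channel attention with a simple channel mask, as a stand-in
# for attention-based channel masking; the keep ratio and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SEChannelMask(nn.Module):
    def __init__(self, channels: int = 64, reduction: int = 8, keep_ratio: float = 0.75):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) feature maps from a raw-waveform front end
        w = self.fc(x.mean(dim=-1))               # squeeze + excitation channel weights
        k = max(1, int(self.keep_ratio * w.size(1)))
        thresh = w.topk(k, dim=1).values[:, -1:]  # per-sample attention threshold
        mask = (w >= thresh).float()              # keep only the most-attended channels
        return x * (w * mask).unsqueeze(-1)       # reweight and mask channels

if __name__ == "__main__":
    layer = SEChannelMask()
    feats = torch.randn(2, 64, 400)
    print(layer(feats).shape)  # torch.Size([2, 64, 400])
```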