2025-01-21 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)12/24 Report--
CTOnews.com, December 14: According to an official announcement from the New Generation Artificial Intelligence Alliance, the AVS3P10 real-time speech coding standard has recently made important progress.
On December 14, 2023, the 87th AVS Working Group meeting opened in Chengdu. At the meeting, WD 1.0 of "Intelligent Media Coding, Part 10: Real-time Voice" (hereinafter AVS3P10) was reviewed by the plenary session, and the technical proposal submitted by Tencent was selected as the RM0 baseline for AVS3P10 real-time voice coding.
Real-time voice communication technology (CTOnews.com note: RTC, Real-time Communication) is widely used in collaborative office work, interactive entertainment, social networking, and other fields. These diverse application scenarios pose a range of technical challenges for real-time voice communication, among which speech coding that combines high quality, low delay, low bandwidth, and high robustness is a very important part.
Traditional speech codecs, including standardized codecs from AVS and ITU-T, can recover high-quality wideband speech at bit rates of around 16-20 kbps, and high-quality ultra-wideband or even full-band speech at 30-35 kbps. However, when the bit rate is reduced further (for example, below 10 kbps), the reconstruction quality of traditional speech codecs drops significantly, which degrades the user experience.
In response to these application demands, at the 84th AVS meeting in March this year, Tencent proposed in the AVS audio group to launch a project on low-bit-rate, high-quality speech coding for real-time voice communication scenarios. After a requirements analysis, AVS formally established the AVS3P10 real-time voice coding project at the 85th AVS meeting and issued a call for technical proposals through the AVS audio group. The AVS3P10 real-time voice coding project will be led and maintained by Xiao Wei from Tencent Conference Teana Lab.
At the 86th AVS meeting, the audio group reviewed proposal M7886, "AVS3P10 speech coding reference model candidate technology proposal", submitted by Tencent Conference Teana Lab.
The review noted that the proposal has the following four characteristics:
It deeply integrates classical signal processing with artificial intelligence techniques such as deep neural networks, which makes it an AI codec.
It supports low-bit-rate, high-quality coding, real-time encoding and decoding, and multi-rate coding.
It is built on a subband, multi-mode coding architecture: features of low-frequency signals are extracted by a deep neural network, features of high-frequency signals are extracted by a bandwidth extension scheme, and the features are then compressed with scalar quantization and entropy coding.
Its encoding neural network architecture is open: the encoding network can be further modified and optimized while preserving forward compatibility of the bitstream.
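The feature compression step named above (scalar quantization followed by entropy coding) can be illustrated with a minimal Python sketch. It is not the AVS3P10 RM0 algorithm, whose details are not given here: the feature values, the step size, and the function names are all hypothetical, and the entropy coder is represented only by the Shannon entropy of the quantized indices, which is the per-symbol bit cost an ideal entropy coder would approach.

```python
import math
from collections import Counter


def scalar_quantize(features, step):
    """Uniform scalar quantization: map each value to the index of the nearest multiple of `step`."""
    return [round(x / step) for x in features]


def dequantize(indices, step):
    """Reconstruct approximate feature values from quantization indices."""
    return [i * step for i in indices]


def empirical_entropy_bits(indices):
    """Shannon entropy (bits per symbol) of the empirical index distribution."""
    counts = Counter(indices)
    total = len(indices)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical feature vector standing in for one frame of extracted features.
features = [0.12, -0.31, 0.05, 0.48, -0.02, 0.11, 0.07, -0.27]
step = 0.25  # hypothetical quantization step size

indices = scalar_quantize(features, step)
reconstructed = dequantize(indices, step)
bits_per_symbol = empirical_entropy_bits(indices)

print("indices:", indices)
print("reconstructed:", reconstructed)
print(f"ideal entropy-coded cost: {bits_per_symbol:.2f} bits/symbol")
```

A coarser step produces fewer distinct indices and a lower entropy, at the cost of a larger reconstruction error; this is the basic rate-quality trade-off that a multi-rate codec exposes.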
On November 1 this year, Tencent Teana Lab submitted the executable files of the AVS3P10 RM0 candidate, which were subjectively tested and cross-validated by the China Electronics Standardization Institute and Huawei respectively. The cross-validation aimed to be comprehensive. Based on the ITU-T P.800 DCR subjective quality evaluation method, the subjective tests covered many scenarios at different bandwidths, including clean speech, speech with packet loss, and mixed speech. For the first time, a test scenario with 3A processing (acoustic echo cancellation, noise suppression, and automatic gain control) was introduced into source coding tests, in order to evaluate the new generation of AI codec technology under conditions close to real use.
In the test scenarios above, AVS3P10 RM0 shows a clear quality advantage. Subjective test results show that AVS3P10 RM0 achieves MOS scores above 4.0 in many of the major test scenarios, including wideband and ultra-wideband, and its lowest bit rate is 5.9 kbps. Because AVS3P10 RM0 uses deep neural network technology, it has built-in resilience to packet loss, which effectively improves codec quality under poor network conditions.
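To put the 5.9 kbps figure in perspective, the short Python sketch below computes the bit budget per frame and the compression ratio relative to uncompressed audio. The 20 ms frame length and the 16 kHz, 16-bit mono PCM reference are assumptions chosen for illustration (both are common in speech coding); they are not stated in the AVS3P10 material.

```python
def bits_per_frame(bitrate_bps, frame_ms):
    """Number of bits available to encode one frame at a given bit rate."""
    return bitrate_bps * frame_ms / 1000


def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


codec_bitrate = 5900   # 5.9 kbps, the lowest rate cited in the article
frame_ms = 20          # assumed frame length, not taken from the standard

frame_budget = bits_per_frame(codec_bitrate, frame_ms)
reference = pcm_bitrate_bps(16000, 16)  # assumed wideband reference: 16 kHz, 16-bit mono
ratio = reference / codec_bitrate

print(f"bits per {frame_ms} ms frame: {frame_budget:.0f}")
print(f"compression ratio vs. 16 kHz 16-bit PCM: {ratio:.1f}x")
```

Under these assumptions each frame must be described in roughly 118 bits, which gives a sense of how aggressively the extracted features have to be quantized and entropy coded.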
In addition, AVS3P10 RM0 shows significant advantages in objective quality evaluation with ITU-T P.863. At all 8 tested bit rates, AVS3P10 RM0 scores above 4.0 MOS, with a maximum of 4.45 MOS. At medium and high bit rates, the quality of AVS3P10 RM0 matches that of traditional signal-processing codecs such as Opus and EVS and reaches operator-grade quality. Among AI codecs, AVS3P10 RM0 leads by more than 0.6 MOS at similar bit rates. These test results indicate that AVS3P10 RM0 represents the highest level among current AI codecs.
The New Generation Artificial Intelligence Alliance said that AVS3P10 real-time speech coding, as a new-generation speech codec standard, is an important addition to the AVS family of standards.
Going forward, the AVS3P10 real-time voice coding project will proceed according to the established plan, and the standardization work is expected to be completed by mid-2024.
© 2024 shulou.com SLNews company. All rights reserved.