Comparison of Core Technologies between H.264 and AVS Video Standard

One of the most important developments in video coding technology in recent years is the H.264/MPEG-4 AVC [8] standard developed by the Joint Video Team (JVT) of the ITU-T and ISO/IEC. During its development the industry attached several different names to the new standard. The ITU began work on H.26L ("long-term") in 1997 using important new coding tools, and the results were encouraging enough that ISO decided to join hands with the ITU, form the JVT and adopt a common standard. As a result, you will sometimes hear the standard called JVT, although that is not an official name. The ITU approved the new H.264 standard in May 2003, and ISO approved it in October 2003 under the name MPEG-4 Part 10, Advanced Video Coding, or AVC.

Improvements in H.264 implementation create new market opportunities

H.264/AVC represents a major breakthrough in compression efficiency, achieving roughly twice the compression of MPEG-2 and the MPEG-4 Simple Profile. In formal tests conducted by the JVT, H.264 delivered more than 1.5 times the coding efficiency in 78% of the 85 test cases, more than 2 times in 77% of them, and as much as 4 times in some cases. These gains create new market opportunities: VHS-quality video at about 600 kbps can be delivered as video-on-demand over ADSL lines, and high-definition movies can fit on an ordinary DVD without requiring a new laser pickup.

The H.264 standard defines three profiles: Baseline, Main and Extended. A later amendment, the Fidelity Range Extensions (FRExt), introduced four additional profiles known as the High profiles. Early interest centered mainly on the Baseline and Main profiles. The Baseline profile reduces computation and system memory requirements and is optimized for low latency; it includes neither B frames (because of their inherent delay) nor CABAC (because of its computational complexity). The Baseline profile is ideal for video telephony and other applications that require low-cost real-time encoding.

The Main profile provides the highest compression efficiency, but it also demands much more processing power than the Baseline profile, which makes it hard to use in low-cost real-time encoding and low-delay applications. Broadcast and content-storage applications are most interested in the Main profile, since they want the highest possible video quality at the lowest possible bit rate.

Although H.264 retains the same basic coding functions as earlier standards, it also introduces many new features that together improve coding efficiency. The main differences are summarized below:

Intra prediction and coding: H.264 uses spatial intra prediction to predict the pixels of an intra macroblock from neighboring pixels of adjacent blocks. It encodes the prediction residual and the prediction mode rather than the actual pixels of the block, which significantly improves the efficiency of intra coding.
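
As a rough illustration of the idea (a simplified sketch, not the normative H.264 prediction process), the Python snippet below predicts a 4×4 block from the reconstructed pixels above and to its left using a small subset of the modes (vertical, horizontal, DC), picks the mode with the smallest residual, and returns the residual that would actually be coded. The function names are ours, purely for illustration.

```python
import numpy as np

def intra_4x4_predict(top, left):
    """Build simple 4x4 intra predictors from neighboring pixels.

    top  : 4 reconstructed pixels above the block
    left : 4 reconstructed pixels to the left of the block
    Returns candidate predictions for a simplified subset of the
    H.264 modes: vertical, horizontal and DC."""
    return {
        "vertical":   np.tile(top, (4, 1)),                  # copy the row downwards
        "horizontal": np.tile(left.reshape(4, 1), (1, 4)),   # copy the column rightwards
        "dc":         np.full((4, 4), (top.sum() + left.sum() + 4) // 8),
    }

def choose_mode(block, top, left):
    """Pick the mode with the smallest sum of absolute residuals.

    The encoder would transmit only the chosen mode and the residual
    block (block - prediction), not the raw pixels."""
    preds = intra_4x4_predict(top, left)
    best = min(preds, key=lambda m: np.abs(block - preds[m]).sum())
    return best, block - preds[best]

# Toy example: a block whose rows repeat the pixels above it.
block = np.array([[10, 11, 12, 13]] * 4)
top   = np.array([10, 11, 12, 13])
left  = np.array([10, 10, 10, 10])
mode, residual = choose_mode(block, top, left)
print(mode, residual.sum())   # 'vertical' with an all-zero residual
```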

Inter prediction and coding: inter coding in H.264 not only retains the main features of earlier standards but also adds flexibility and capability, including several block-size options for motion compensation, 1/4-pixel motion compensation, multiple reference frames, generalized bidirectional prediction and adaptive in-loop deblocking.

Variable motion-vector block size: motion compensation can be performed with different block sizes. A separate motion vector can be transmitted for blocks as small as 4×4, so up to 16 motion vectors can be sent for a single macroblock, or up to 32 in the case of bidirectional prediction. Block sizes of 16×8, 8×16, 8×8, 8×4 and 4×8 are also supported. Smaller blocks improve the handling of fine motion detail and thereby the perceived subjective quality, including the elimination of large blocking artifacts.
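
A minimal counting sketch for the partition sizes listed above, assuming one uniform partition size per 16×16 macroblock (real encoders can mix sizes, e.g. splitting individual 8×8 sub-macroblocks further):

```python
# Motion vectors needed to cover a 16x16 macroblock with one partition size.
MB_W = MB_H = 16

def mv_count(block_w, block_h, bidirectional=False):
    """Number of motion vectors for a macroblock tiled with block_w x block_h
    partitions; bi-prediction doubles the count (one vector per direction)."""
    vectors = (MB_W // block_w) * (MB_H // block_h)
    return vectors * (2 if bidirectional else 1)

for w, h in [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]:
    print(f"{w}x{h}: {mv_count(w, h)} MVs, "
          f"{mv_count(w, h, bidirectional=True)} with bi-prediction")
# 4x4 partitions give 16 vectors per macroblock, or 32 with bi-prediction.
```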

1/4-pixel motion estimation: motion compensation is improved by allowing half-pixel and quarter-pixel motion vector resolution.
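
As a sketch of how the sub-pixel values are produced, the snippet below interpolates a 1-D half-sample position with the 6-tap filter (1, -5, 20, 20, -5, 1)/32 used for H.264 luma, and a quarter-sample position as the rounded average of an integer sample and a neighboring half-sample; the real 2-D process applies the same filter horizontally and vertically.

```python
import numpy as np

def half_pel_1d(samples, i):
    """Half-sample value between samples[i] and samples[i+1], using the
    6-tap filter (1, -5, 20, 20, -5, 1)/32 applied to H.264 luma."""
    window = samples[i - 2:i + 4].astype(int)      # 6 samples around the gap
    taps = np.array([1, -5, 20, 20, -5, 1])
    return np.clip((window @ taps + 16) >> 5, 0, 255)

def quarter_pel_1d(samples, i):
    """Quarter-sample position: rounded average of the integer sample and
    the adjacent half-sample value."""
    return (int(samples[i]) + int(half_pel_1d(samples, i)) + 1) >> 1

row = np.array([10, 20, 40, 80, 120, 160, 180, 190], dtype=np.uint8)
print(half_pel_1d(row, 3), quarter_pel_1d(row, 3))   # 100 90
```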

Multiple reference frame prediction: up to 16 different reference frames can be used for inter coding, which improves both the subjective quality and the coding efficiency. Multiple reference frames also help improve the error resilience of H.264 bitstreams. Note that this feature increases the memory requirements of both the encoder and the decoder, because multiple reference frames must be kept in memory.

Adaptive in-loop deblocking filter: H.264 applies an adaptive deblocking filter inside the prediction loop, processing horizontal and vertical block edges to remove artifacts caused by block prediction errors. The filtering is generally applied on 4×4 block boundaries, where up to 3 pixels on either side of the boundary may be updated by a 4-tap filter.
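
The sketch below is only a simplified illustration of the deblocking idea, not the normative filter: it smooths a vertical edge between two blocks when the step across the edge is small (likely a coding artifact) and leaves large steps (likely genuine image edges) untouched. The threshold and filter weights are illustrative, not the standard's alpha/beta tables.

```python
import numpy as np

def deblock_edge(left_cols, right_cols, threshold=8):
    """Soften a vertical block edge between two blocks.

    left_cols  : right-most column of the left block  (4 pixels)
    right_cols : left-most column of the right block  (4 pixels)
    Only small discontinuities are filtered, so real image edges
    (large steps) pass through unchanged. Illustrative threshold only."""
    p, q = left_cols.astype(int), right_cols.astype(int)
    step = np.abs(p - q)
    filtered_p = np.where(step < threshold, (3 * p + q + 2) >> 2, p)
    filtered_q = np.where(step < threshold, (p + 3 * q + 2) >> 2, q)
    return filtered_p, filtered_q

p = np.array([100, 101, 102, 103])   # edge column of the left block
q = np.array([106, 107, 108, 109])   # edge column of the right block
print(deblock_edge(p, q))            # small step -> pixels pulled toward each other
```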

Integer transform: earlier standards based on the DCT had to define a tolerance for rounding errors in fixed-point implementations of the inverse transform. Drift caused by IDCT precision mismatches between the encoder and the decoder was a source of quality loss. H.264 solves this problem by using an integer 4×4 spatial transform, which is an approximation of the DCT. The small 4×4 blocks also help reduce blocking and ringing artifacts.
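
The 4×4 matrix below is the core transform commonly quoted for H.264; the sketch simply shows that it runs entirely in integer arithmetic and that its rows, once normalized, closely track the 4-point DCT basis (the remaining scaling factors are folded into quantization in the real codec).

```python
import numpy as np

# H.264 4x4 forward core transform (an integer approximation of the DCT).
Cf = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]])

def core_transform(x):
    """2-D integer transform of a 4x4 residual block: W = Cf * X * Cf^T.
    Everything stays in exact integer arithmetic, so encoder and decoder
    cannot drift apart the way mismatched floating-point IDCTs could."""
    return Cf @ x @ Cf.T

# Compare the normalized rows of Cf with the 4-point DCT-II basis.
dct4 = np.array([[np.cos(np.pi * (2 * n + 1) * k / 8) for n in range(4)]
                 for k in range(4)])
print(np.round(Cf / np.linalg.norm(Cf, axis=1, keepdims=True), 3))
print(np.round(dct4 / np.linalg.norm(dct4, axis=1, keepdims=True), 3))

residual = np.arange(16).reshape(4, 4) - 8
print(core_transform(residual))          # integer coefficients only
```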

Quantization and transform-coefficient scanning: the transform coefficients are quantized with a scalar quantizer with no widened dead zone. As in previous standards, a different quantization step size can be chosen for each macroblock, but the step size grows at a compound rate of about 12.5% rather than by a fixed increment. A finer quantization step can also be used for the chroma components, which matters especially when the luma coefficients are coarsely quantized.
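
A quick numeric sketch of that compounding behavior, assuming the commonly quoted base step of 0.625 at QP 0: the step size doubles every 6 QP increments, so each step is a factor of 2^(1/6), roughly a 12% increase over the previous one.

```python
# Quantizer step size grows at a compound rate: it doubles every 6 QP steps.
# The base value 0.625 for QP 0 is the commonly quoted figure for H.264.
def q_step(qp, base=0.625):
    return base * 2 ** (qp / 6)

for qp in range(0, 12):
    ratio = q_step(qp + 1) / q_step(qp)
    print(f"QP {qp:2d}: step {q_step(qp):6.3f}  (+{(ratio - 1) * 100:.1f}% vs previous)")
# Every line shows roughly a 12% increase; QP+6 exactly doubles the step size.
```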

Entropy coding: unlike previous standards, which provided several static VLC tables selected by the type of data being coded, H.264 uses a context-adaptive VLC (CAVLC) for the transform coefficients and a single universal VLC (UVLC) method for all other symbols. The Main profile additionally supports a context-adaptive binary arithmetic coder (CABAC). CAVLC outperforms earlier VLC schemes, but at a higher computational cost.

CABAC processes all syntax elements, including transform coefficients and motion vectors, using probability models maintained identically at the encoder and decoder. To improve the efficiency of arithmetic coding, the underlying probability model adapts to the constantly changing statistics of the video through a technique called context modeling. Context modeling provides conditional probability estimates of the coded symbols: by choosing an appropriate context model, the coder can switch between different probability models according to the already-coded symbols surrounding the symbol to be coded, and thus exploit the redundancy between symbols. Each syntax element can maintain its own set of models (for example, motion vectors and transform coefficients use different models). Compared with the VLC entropy coding methods (UVLC/CAVLC), CABAC saves roughly a further 10% in bit rate.
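
The snippet below is a toy model of context adaptation, not the real CABAC engine (which uses finite-state probability tables and a binary arithmetic coder): each context keeps its own running estimate of P(bit = 1), the context is chosen from already-coded neighboring symbols, and the estimate is updated after every bit, so the model follows the changing statistics.

```python
class ToyContextModel:
    """A toy adaptive binary model: one probability estimate per context.

    Real CABAC uses finite-state probability tables plus an arithmetic
    coder; here we only show the context-selection and adaptation idea."""
    def __init__(self, num_contexts):
        self.ones = [1] * num_contexts    # Laplace-style counts
        self.total = [2] * num_contexts

    def p_one(self, ctx):
        return self.ones[ctx] / self.total[ctx]

    def update(self, ctx, bit):
        self.ones[ctx] += bit
        self.total[ctx] += 1

def context_of(left_bit, top_bit):
    """Pick a context (0..2) from already-coded neighboring symbols."""
    return left_bit + top_bit

model = ToyContextModel(num_contexts=3)
stream = [(1, 1, 1), (1, 0, 1), (0, 0, 0), (1, 1, 1)]  # (left, top, bit)
for left, top, bit in stream:
    ctx = context_of(left, top)
    # an arithmetic coder would code `bit` with probability model.p_one(ctx)
    print(f"ctx={ctx}  P(1)={model.p_one(ctx):.2f}  bit={bit}")
    model.update(ctx, bit)
```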

Weighted prediction: the prediction of a bidirectionally interpolated macroblock is formed as a weighted sum of the forward and backward predictions, which improves coding efficiency when the scene is changing, especially during fades.
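
A minimal sketch of explicit weighted bi-prediction during a fade, with illustrative weights and scaling rather than the standard's exact syntax:

```python
import numpy as np

def weighted_bipred(pred0, pred1, w0, w1, offset, shift=6):
    """Illustrative weighted bi-prediction:
    pred = ((w0*pred0 + w1*pred1 + rounding) >> shift) + offset.
    With w0 = w1 = 2**(shift - 1) and offset = 0 this reduces to the
    ordinary average of the two reference blocks."""
    acc = w0 * pred0.astype(int) + w1 * pred1.astype(int)
    return np.clip(((acc + (1 << (shift - 1))) >> shift) + offset, 0, 255)

# A fade-to-black: the future reference is darker than the past one.
past   = np.full((4, 4), 120)
future = np.full((4, 4), 60)
print(weighted_bipred(past, future, w0=32, w1=32, offset=0)[0, 0])   # 90, plain average
print(weighted_bipred(past, future, w0=48, w1=16, offset=0)[0, 0])   # 105, biased toward `past`
```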

Fidelity Range Extensions: in July 2004 a new amendment, the Fidelity Range Extensions (FRExt) [11], was added to the H.264 standard. The extension adds a full set of tools to H.264 and allows additional color formats, video formats and bit depths. Support for lossless coding and for stereo-view video is also added. The FRExt amendment introduces four new profiles into H.264, namely:

High Profile (HP): standard 4:2:0 chroma sampling with 8 bits per sample. This profile introduces new tools, described in more detail below.

High 10 Profile (Hi10P): standard 4:2:0 chroma sampling with 10-bit samples, for higher-fidelity video.

High 4:2:2 10-bit Profile (H422P): 4:2:2 chroma sampling with 10-bit samples, intended for source editing.

High 4:4:4 12-bit Profile (H444P): for the highest-quality source editing and color fidelity, with support for lossless coding of video regions and a new integer color-space conversion (from RGB to YUV and back).

Among new application areas, H.264 HP is particularly attractive for broadcasting and DVD. Some experiments show the performance of H.264 HP to be about 3 times that of MPEG-2. The main additional tools introduced in H.264 HP are described below.

Adaptive residual block size and integer 8×8 transform: the residual block used for transform coding can switch between 8×8 and 4×4. A new 16-bit integer transform is introduced for 8×8 blocks; smaller blocks continue to use the earlier 4×4 transform.

8×8 luma intra prediction: 8 new modes allow luma macroblocks to perform intra prediction on 8×8 blocks, in addition to the earlier 16×16 and 4×4 block sizes.

Quantization weighting: new quantization weighting matrices for quantizing the 8×8 transform coefficients.

Monochrome: black / white video coding is supported.

AVS

In 2002, the Audio Video coding Standard (AVS) working group, established by China's Ministry of Information Industry, announced that it would prepare a national standard for mobile multimedia, broadcasting, DVD and other applications. The video standard, called AVS [14], consists of two related parts: AVS-M for mobile video applications and AVS1.0 for broadcast and DVD. The AVS standard is similar to H.264.

AVS1.0 supports both interlaced and progressive modes. A P frame in AVS can use up to 2 forward reference frames, while a B frame is allowed one reference frame in each direction. In interlaced mode, four fields can be used as references. Frame/field coding in interlaced mode can be selected only at the frame level, unlike H.264, which allows this choice to adapt at the macroblock level. AVS has a loop filter similar to that of H.264, which can be turned off at the frame level; in addition, B frames do not use the loop filter. Intra prediction is carried out on 8×8 blocks. Motion compensation allows 1/4-pixel accuracy for luma blocks. The motion-estimation block size can be 16×16, 16×8, 8×16 or 8×8. The transform is a 16-bit 8×8 integer transform (similar to WMV9). The VLC is context-based adaptive 2-D run/level coding. Four different Exp-Golomb codes are used, and the code chosen for each quantized coefficient adapts to the preceding symbols within the same 8×8 block. Because the Exp-Golomb codes are parameterized, the tables are small. For progressive video sequences, the video quality of AVS1.0 is slightly lower than that of the H.264 Main profile at the same bit rate.

AVS-M targets mobile video applications and overlaps with the H.264 Baseline profile. It supports only progressive video and I and P frames, not B frames. The main AVS-M coding tools include intra prediction based on 4×4 blocks, 1/4-pixel motion compensation, integer transform and quantization, context-adaptive VLC and a greatly simplified loop filter. As in the H.264 Baseline profile, the motion-vector block size in AVS-M goes down to 4×4, so a macroblock can have up to 16 motion vectors. Multi-frame prediction is used, but only 2 reference frames are supported. In addition, a subset of the H.264 HRD/SEI messages is defined in AVS-M. The coding efficiency of AVS-M is about 0.3 dB below the H.264 Baseline profile under the same settings, while decoder complexity is reduced by about 20%.

Background of H.264 and AVS

H.264/MPEG-4 AVC is a new-generation video coding standard developed jointly by the ITU-T VCEG (Video Coding Experts Group) and the ISO/IEC MPEG (Moving Picture Experts Group). Its applications include video telephony, video conferencing and so on. The main feature of H.264 is a greatly improved compression ratio, more than twice the compression efficiency of MPEG-2 and MPEG-4. Its core technology is the same as in previous standards: it still uses the hybrid coding framework based on prediction and transform, but it differs considerably in the implementation details, and those detailed improvements are what yield the large gain in compression efficiency. The new-generation standard also offers good network adaptability and error resilience.

The birth of AVS can be described as a historic opportunity. Facing the high patent fees of H.264 and MPEG-2, China's digital video industry confronted serious challenges. Committed to improving the core competitiveness of the domestic digital audio and video industry, the Department of Science and Technology of the Ministry of Information Industry approved the establishment of the Digital Audio and Video Coding/Decoding Technology Standardization working group in June 2002, bringing together domestic research institutions and enterprises engaged in digital audio/video codec R&D to meet the needs of China's audio and video industry. The working group put forward a source coding standard with China's own independent intellectual property rights, the "Information Technology Advanced Audio and Video Coding" series of standards, known as AVS (Audio Video coding Standard). The independent AVS standard is at an internationally advanced level in technology and performance; by seizing this opportunity, China can gain comprehensive initiative across the technology-patent-standard-chip-system-industry chain.

Analysis and Comparison of the Core Technologies of H.264 and AVS

Like previous standards, H.264 still uses a hybrid coding framework. The AVS video standard adopts a technical framework similar to H.264, including transform, quantization, entropy coding, intra prediction, inter prediction, loop filtering and other modules. The differences in their core technologies include the following:

I. Transformation and quantization

For residual data, H.264 adopts block-based transform coding, which removes the spatial redundancy of the image so that the image energy is concentrated in a small number of coefficients (the DC coefficient is generally the largest); this improves the compression ratio and enhances robustness. Previous standards generally used the DCT, whose drawback is mismatch: after the data are transformed and inverse-transformed, differences can remain, and because the transform operates on real numbers the amount of computation is relatively large. H.264 instead adopts an integer transform based on 4×4 blocks.

AVS uses an 8 × 8 integer transform that can be implemented without mismatch on a 16-bit processor. For high-resolution video it decorrelates the image more effectively than a 4 × 4 transform, and 64-level quantization is adopted, which can meet the bit-rate and quality requirements of different applications and services.

II. Intra prediction

Both H.264 and AVS use intra prediction, predicting the current block from adjacent pixels with a variety of prediction modes that represent spatial texture. H.264 luma prediction has two block sizes: 4 × 4 and 16 × 16. For 4 × 4 blocks there are nine prediction modes: eight directional modes, from -135 degrees to +22.5 degrees, plus a DC mode; for 16 × 16 blocks there are four prediction modes. Chroma prediction uses 8 × 8 blocks and has four prediction modes, similar to the four 16 × 16 intra modes: DC is mode 0, horizontal is mode 1, vertical is mode 2 and plane is mode 3.

III. Inter prediction

H.264 inter prediction builds block-based motion-compensated predictions from previously encoded video frames. It differs from inter prediction in earlier standards in its wider range of block sizes, its use of sub-pixel motion vectors and its use of multiple reference frames.

H.264 supports 16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8 and 4 × 4 macroblock and sub-macroblock partitions, while AVS supports only the 16 × 16, 16 × 8, 8 × 16 and 8 × 8 macroblock partitions.

H.264 allows inter macroblocks and slices to be predicted from several different reference frames. In AVS, a P frame can use at most 2 forward reference frames, and a B frame uses one reference frame in each direction.

IV. Entropy coding

H.264 provides two entropy coding methods. One is universal variable-length coding (UVLC), applied to all symbols to be coded; the other is context-adaptive binary arithmetic coding (CABAC, Context-Adaptive Binary Arithmetic Coding), which greatly reduces the residual redundancy between coded symbols and improves coding efficiency. UVLC has low computational complexity and is aimed mainly at applications with strict encoding-time constraints, but its drawbacks are lower efficiency and a higher bit rate. CABAC is an efficient entropy coding method whose coding efficiency is roughly 10% better than UVLC/CAVLC, as noted earlier.

AVS entropy coding uses adaptive variable-length coding. In AVS entropy coding, all syntax elements and residual data are mapped to binary bitstreams in the form of Exp-Golomb (exponential Golomb) codes.

The advantages of Exp-Golomb codes are that, on the one hand, their hardware complexity is relatively low and the codes can be parsed from a closed formula without table lookups; on the other hand, the order k of the Exp-Golomb code can be chosen flexibly according to the probability distribution of the elements being coded, and if k is chosen well the coding efficiency approaches the information entropy.
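
To make the "closed formula, no table lookup" point concrete, here is a small k-th order Exp-Golomb encoder and decoder in the commonly used formulation (code n >> k with the order-0 code, then append the k low-order bits of n); the exact binarization in the AVS specification may differ in detail.

```python
def exp_golomb_encode(n, k=0):
    """k-th order Exp-Golomb code of a non-negative integer, as a bit string.
    Common formulation: code (n >> k) with the order-0 code, then append
    the k low-order bits of n. No tables are needed in either direction."""
    q = (n >> k) + 1                       # order-0 part: a unary-length
    prefix = "0" * (q.bit_length() - 1)    # prefix of zeros ...
    body = format(q, "b")                  # ... followed by q in binary
    suffix = format(n & ((1 << k) - 1), f"0{k}b") if k else ""
    return prefix + body + suffix

def exp_golomb_decode(bits, k=0):
    """Inverse of the above, again purely by formula (closed form)."""
    zeros = len(bits) - len(bits.lstrip("0"))
    q = int(bits[zeros:2 * zeros + 1], 2) - 1
    r = int(bits[2 * zeros + 1:], 2) if k else 0
    return (q << k) | r

for n in range(6):
    code = exp_golomb_encode(n, k=1)
    print(n, code, exp_golomb_decode(code, k=1))
```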

For the block transform coefficients of the prediction residual, (level, run) pairs are formed by scanning. Level and run are not independent events but are strongly correlated, so AVS codes them jointly in two dimensions, and the order of the Exp-Golomb code is adapted according to the current probability distribution trends of level and run.
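
A small sketch of how the (level, run) pairs arise, assuming an ordinary zig-zag scan of an 8×8 block of quantized coefficients; the adaptive choice of Exp-Golomb order that AVS applies to each pair is omitted.

```python
import numpy as np

def zigzag_runs(coeffs):
    """Scan a square block of quantized coefficients in zig-zag order and
    emit (level, run) pairs: each non-zero level is paired with the number
    of zeros that preceded it. AVS then codes each pair with an Exp-Golomb
    code whose order adapts to the local statistics (not shown here)."""
    n = coeffs.shape[0]
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else -p[0]))
    pairs, run = [], 0
    for i, j in order:
        if coeffs[i, j]:
            pairs.append((int(coeffs[i, j]), run))
            run = 0
        else:
            run += 1
    return pairs

block = np.zeros((8, 8), dtype=int)
block[0, 0], block[0, 1], block[2, 0], block[3, 3] = 14, 3, -2, 1
print(zigzag_runs(block))
# [(14, 0), (3, 0), (-2, 1), (1, 20)] -- a long zero run collapses into one count
```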

In addition, AVS has no SI or SP frames. It can be said that AVS was developed on the basis of H.264 and absorbs its essence, but to avoid patent trouble it had to give up some of H.264's core algorithms; the price is a slight reduction in coding efficiency in exchange for a large reduction in complexity.

AVS is China's standard with independent intellectual property rights, but it has not yet been used on a large scale and is still in its infancy. Most enterprises are waiting and watching, and there has been no large capital investment, so it faces many difficulties; yet its broad prospects cannot be ignored, and with strong support from the state it will continue to mature.
