Having worked in audio and video development for many years, I have shared quite a few blog series, such as the "FFmpeg Tips" series released a few years ago, the "Android Audio Development" series, and the "Live Streaming Troubleshooting" series. Recently I have wanted to share my experience developing and optimizing players over the years, and I am also considering open-sourcing the FFmpeg-based player kernel I built in my spare time, in the hope of helping beginners in the audio and video field. The first batch of articles focuses on the core technical points of the player, with roughly the following outline:
Player Technology Sharing (1): Architecture Design
Player Technology Sharing (2): Buffer Management
Player Technology Sharing (3): Audio-Video Synchronization
Player Technology Sharing (4): Startup Time
Player Technology Sharing (5): Latency Optimization
This is the first in a series of articles, mainly talking about the architecture design of the player.
1 Overview
First of all, how is a player defined?
"A player is software that can play video or audio files stored in the form of digital signals, or an electronic device with the function of playing video or audio files." -- Baidu Encyclopedia
My own interpretation is: "A player is software, or an electronic product, that can read, parse, and render audio and video files stored locally or on a server."
To sum up, a player has the following three core functional features:
Read (IO): "get" the content, either from "local" storage or from a "server"
Parse (Parser): "understand" the content by referring to "format & protocol" specifications
Render: "present" the content through the speakers / screen
These three functions are strung together to form the data flow of the entire player, as shown in the figure:
IO: responsible for reading data. There are many standard protocols for reading from a data source, such as File, HTTP(S), RTMP, and RTSP.
Parser & Demuxer: responsible for parsing the data. Audio and video container formats follow various industry standards; you only need to refer to those standard documents to parse common container formats such as mp4, flv, m3u8, and avi.
Decoder: strictly speaking also a kind of data parsing, but its job is to decode the compressed audio and video data into raw YUV and PCM data. Common video compression formats include H.264, MPEG4, and VP8/VP9; common audio compression formats include G.711, AAC, and Speex.
Render: responsible for presenting the video and audio data. Rendering is platform-specific; different platforms provide different rendering APIs, such as DDraw/DirectSound on Windows, SurfaceView/AudioTrack on Android, and cross-platform options such as OpenGL and ALSA.
Next, let's walk through the player's data flow module by module, analyze the input and output of each module, and design each module's API along the way.
2 Module Design
2.1 IO Module
Input to the IO module: the address of the data source (URL). This URL can be a local file path or a network stream address.
Output of IO module: binary data, that is, audio and video binary data read through IO protocol.
Some example URLs for video data sources are as follows:
file:///c:/WINDOWS/clock.avi
rtmp://live.hkstv.hk.lxdns.com/live/hks
http://www.w3school.com.cn/i/movie.mp4
http://devimages.apple.com/iphone/samples/bipbop/bipbopall.m3u8
To sum up, the interface design of the player IO module is as follows:
The Open/Close methods open and close the video stream. The player kernel can tell which IO protocol to pull the stream with (e.g. FILE/RTMP/HTTP) from the URL scheme, and concrete subclasses of this interface then implement the actual protocol handling and data reading.
Two methods are defined for reading data: Read reads data sequentially, while ReadAt reads data starting from a specified offset. The latter is mainly used for local files or video-on-demand, to give the player Seek capability.
For network streams the connection may drop, so a separate Reconnect interface provides the ability to reconnect.
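As a concrete illustration (the original interface figure is not reproduced here), a minimal C++ sketch of an IO abstraction along these lines might look as follows; the class name and anything beyond Open/Close/Read/ReadAt/Reconnect are my own assumptions, not the author's actual kernel code:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical IO abstraction: one subclass per URL scheme (file/http/rtmp/...).
class IOStream {
public:
    virtual ~IOStream() = default;

    // Open/close the data source identified by the URL.
    virtual int Open(const std::string& url) = 0;
    virtual void Close() = 0;

    // Sequential read; returns bytes read, 0 on EOF, <0 on error.
    virtual int Read(uint8_t* buf, size_t size) = 0;

    // Random-access read from a given offset; used to support Seek
    // for local files and video-on-demand streams.
    virtual int ReadAt(int64_t offset, uint8_t* buf, size_t size) = 0;

    // Re-establish the connection for network streams after a drop.
    virtual int Reconnect() = 0;

    // Factory: picks the concrete subclass by inspecting the URL scheme
    // (e.g. "file://", "http://", "rtmp://").
    static std::unique_ptr<IOStream> CreateFromUrl(const std::string& url);
};
```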
2.2 Parsing Module
In fact, the binary audio and video data read by the IO module is wrapped in a container format such as mp4, flv, or avi. To separate out the audio and video packets, it needs to be parsed by a Parser & Demuxer module.
Input of the parsing module: the binary data read by the IO module
The output of the parsing module: audio and video media information, undecoded audio packets, undecoded video packets
The media information of audio and video mainly includes the following:
Overall information: duration, bit rate, frame rate, etc.
Audio format: coding algorithm, sampling rate, number of channels, etc.
Video format: coding algorithm, width-height, aspect ratio, etc.
In summary, the interface design of the parsing module is shown in the following figure:
After creating the parsing object, feed it audio and video data and call the Parse function to extract the basic media information, call the Read function to read the separated audio and video packets (which are then sent to the audio and video decoders respectively), and use the Get methods to obtain the various audio and video parameters.
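As a rough C++ sketch of this parsing interface (field and type names are illustrative assumptions, not taken from the original figure):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical media information extracted by the parser.
struct MediaInfo {
    int64_t duration_ms = 0;   // overall duration
    int64_t bitrate_bps = 0;   // overall bit rate
    double  frame_rate  = 0.0;
    // Audio format
    int audio_codec = 0;       // codec id, e.g. AAC, Speex
    int sample_rate = 0;
    int channels    = 0;
    // Video format
    int video_codec = 0;       // codec id, e.g. H.264, VP8/VP9
    int width  = 0;
    int height = 0;
};

// A demuxed but still compressed packet.
struct Packet {
    std::vector<uint8_t> data;
    int64_t pts = 0;
    bool is_video = false;
};

// Hypothetical parser/demuxer interface.
class Demuxer {
public:
    virtual ~Demuxer() = default;
    // Feed raw bytes from the IO module until the container header is understood.
    virtual int Parse(const uint8_t* data, size_t size) = 0;
    // Read the next separated (still compressed) audio or video packet.
    virtual int Read(Packet* packet) = 0;
    // Query the parsed media parameters.
    virtual const MediaInfo& GetMediaInfo() const = 0;
};
```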
2.3 Decoding Module
After the parsing module has separated the audio and video packets, they can be sent to the audio decoder and video decoder respectively.
Input of decoding module: undecompressed audio / video packet
Output of the decoding module: decompressed audio / image raw data, namely PCM and YUV
Since a decoder does not necessarily output one frame for every frame of input (it often needs to cache several reference frames before producing output), the decoder interface is usually designed around a "producer-consumer" model, with a shared buffer queue connecting the two sides, as described in the figure below (taken from the design of the Android MediaCodec codec library):
In summary, the interface design of the decoding module is as follows:
The media information output by the parsing module indicates which type of audio/video decoder is needed and is used to initialize the decoder. The rest of the work is a continuous interaction with the decoder through Queue and Dequeue: feed in undecoded packets and fetch the decoded data.
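A minimal sketch of such a Queue/Dequeue style decoder interface, in the same spirit as MediaCodec but with illustrative names and simplified signatures:

```cpp
#include <cstdint>
#include <vector>

// Packet and MediaInfo are as defined in the parsing-module sketch above.
struct Packet;
struct MediaInfo;

// Decoded output: PCM samples for audio, a YUV image for video.
struct Frame {
    std::vector<uint8_t> data;
    int64_t pts = 0;
};

// Hypothetical decoder interface in the MediaCodec-like producer-consumer style.
class Decoder {
public:
    virtual ~Decoder() = default;

    // Initialize with the codec parameters reported by the parsing module.
    virtual int Init(const MediaInfo& info) = 0;

    // Producer side: queue one undecoded packet into the decoder.
    // May report "try again later" when the internal input buffers are full.
    virtual int Queue(const Packet& packet) = 0;

    // Consumer side: dequeue one decoded frame if available.
    // May report "no output yet" while reference frames are still being cached.
    virtual int Dequeue(Frame* frame) = 0;

    virtual void Flush() = 0;
    virtual void Release() = 0;
};
```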
2.4 Rendering Module
Once the raw image and audio data have been produced, the next step is to send them to the rendering module for display and playback.
In general, video data is rendered by the graphics card and displayed in a window, while audio data is sent to the sound card and played through the speakers. Although the system-level APIs for window drawing and audio playback differ from platform to platform, the flow at the interface level is similar, as shown in the figure:
For video rendering, the flow is: Init (initialization) -> SetView (set the window object) -> SetParam (set rendering parameters) -> Render (perform the rendering)
For audio playback, the flow is: Init (initialization) -> SetParam (set playback parameters) -> Render (perform the playback)
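Expressed as C++ interfaces (again a sketch with assumed parameter lists; the platform-specific backends would hide behind concrete subclasses):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical video renderer: one subclass per platform backend
// (e.g. DDraw, OpenGL, a SurfaceView-backed renderer).
class VideoRender {
public:
    virtual ~VideoRender() = default;
    virtual int Init() = 0;
    // Attach the platform window/surface handle to draw into.
    virtual int SetView(void* window_handle) = 0;
    // Configure width, height, pixel format, etc.
    virtual int SetParam(int width, int height, int pixel_format) = 0;
    // Draw one YUV frame.
    virtual int Render(const uint8_t* yuv, size_t size, int64_t pts) = 0;
};

// Hypothetical audio renderer (e.g. DirectSound, AudioTrack, ALSA backends).
class AudioRender {
public:
    virtual ~AudioRender() = default;
    virtual int Init() = 0;
    // Configure sample rate, channel count, sample format, etc.
    virtual int SetParam(int sample_rate, int channels, int sample_format) = 0;
    // Play one block of PCM samples.
    virtual int Render(const uint8_t* pcm, size_t size, int64_t pts) = 0;
};
```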
2.5 Stringing the Modules Together
As shown in the figure, once the modules are chained together like this, the player's data flow works end to end, but it is a single-threaded structure: as soon as data is read from IO, it is immediately sent through parse -> decode -> render. A player designed around such a single-threaded structure has the following problems:
In the demux -> decode -> render chain there is no place to insert the logic needed for audio-video synchronization
There is no data buffering, so any network or decoding jitter leads to frequent stuttering
Everything runs on a single thread, which does not make full use of multi-core CPUs
To solve the problems of the single-threaded structure, we can add data buffers at the "producer-consumer" boundaries of the data flow, turning the single-threaded model into a multi-threaded model (IO thread, decoding thread, rendering thread), as shown in the figure:
After being transformed into a multithreaded model, the advantages are as follows:
Packet queue (Packet Queue): absorbs network jitter
Frame queue (Frame Queue): absorbs decoding/rendering jitter
Rendering thread: AV Sync logic can be added here to keep audio and video in sync
The threads work in parallel and efficiently, making full use of multi-core CPUs
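As a rough illustration only (the detailed buffer design is the topic of the next article), here is a minimal sketch, with assumed names, of how the three threads could hand data off through two bounded, thread-safe queues:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

// Minimal bounded, thread-safe queue used as the packet/frame buffer
// between the IO, decoding, and rendering threads.
template <typename T>
class BlockingQueue {
public:
    explicit BlockingQueue(size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full, so a fast producer is throttled.
    void Push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push(std::move(item));
        not_empty_.notify_one();
    }

    // Blocks while the queue is empty, so a fast consumer waits for data.
    T Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        T item = std::move(queue_.front());
        queue_.pop();
        not_full_.notify_one();
        return item;
    }

private:
    size_t capacity_;
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable not_full_, not_empty_;
};

// Pipeline wiring (types as in the earlier sketches):
//   IO thread:        read bytes -> demux -> packetQueue.Push(packet)
//   Decoding thread:  packetQueue.Pop() -> decode -> frameQueue.Push(frame)
//   Rendering thread: frameQueue.Pop() -> AV sync -> render
```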
Note: in the next article, we will talk about how these two new buffers should be designed and managed.
3 Player SDK Interface Design
That covers the key architecture and data flow of the player. If you want to package the player kernel as an SDK that provides low-level playback capabilities to an APP, you also need to design an easy-to-use set of APIs. These APIs can be grouped into the following five parts:
Create / destroy player
Configuration parameters (such as window handle, video URL, loop playback, etc.)
Send commands (e.g. initialize, start playback, pause playback, drag, stop, etc.)
Audio and video data callback (such as decoded audio and video data callback)
Message / status callbacks (e.g. buffering start / end, playback complete, etc.)
To sum up, the list of common APIs for players is as follows:
Create/Release/Reset
SetDataSource/SetOptions/SetView/SetVolume
Prepare/Start/Pause/Stop/SeekTo
SetXXXListener/OnXXXCallback
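Gathered into a single facade class, that API list could be sketched roughly as follows; the listener and callback signatures here are my own assumptions and the real SDK may differ:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical player SDK facade exposing the API groups listed above.
class Player {
public:
    // Lifecycle
    static Player* Create();
    void Release();
    void Reset();

    // Configuration
    int SetDataSource(const std::string& url);
    int SetOptions(const std::string& key, const std::string& value);
    int SetView(void* window_handle);
    int SetVolume(float volume);

    // Commands
    int Prepare();
    int Start();
    int Pause();
    int Stop();
    int SeekTo(int64_t position_ms);

    // Callbacks (illustrative signatures)
    using EventListener = std::function<void(int event, int64_t arg)>;
    using FrameListener = std::function<void(const uint8_t* data, size_t size)>;
    void SetEventListener(EventListener listener);
    void SetVideoFrameListener(FrameListener listener);

private:
    Player() = default;
    ~Player() = default;
};
```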
4 Player State Model
Generally speaking, a player is essentially a state machine. After it is created, it switches between states according to the commands sent by the application layer and the events it generates itself, as illustrated in the following figure:
The player has nine states in total. Idle is the initial state reached after creation or reset; End is the terminal state entered when the player is actively destroyed; Error is entered when an error occurs (from Error the player can return to Idle via Reset).
The other state transitions and how each state is reached are clearly marked in the diagram, so I will not repeat them here.
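Since the state diagram itself is not reproduced here, the following enum is only a guess at a plausible set of nine states, modeled on the Android MediaPlayer state machine rather than copied from the original figure:

```cpp
// Hypothetical set of nine player states, assuming a MediaPlayer-like model;
// the author's actual states appear only in the original figure.
enum class PlayerState {
    kIdle,        // initial state after Create()/Reset()
    kInitialized, // after SetDataSource()
    kPrepared,    // after Prepare()
    kStarted,     // after Start()
    kPaused,      // after Pause()
    kStopped,     // after Stop()
    kCompleted,   // playback finished
    kError,       // an error occurred; Reset() returns the player to kIdle
    kEnd          // after Release(); terminal state
};
```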
5 Summary
So much for the architecture design of the player. Some topics have not been covered, but the key points should now be clear. If you have any questions, feel free to write to lujun.hust@gmail.com. You are also welcome to follow my Sina Weibo @Lu_Jun or my WeChat official account @Jhuster for the latest articles and updates.