How to analyze the source code of ffplay player


Today I will talk about how to analyze the source code of the ffplay player. Many people may not know much about it, so I have summarized the following content in the hope that you will get something out of this article.

In fact, the principle of most video players is much the same: it comes down to controlling the sequence of audio and video frames. Some players simply apply more sophisticated frame-prediction techniques during audio-video synchronization to keep the two better aligned.

ffplay is the player that ships with FFmpeg. It uses the FFmpeg decoding libraries and the SDL library for video rendering and display, and it was the design reference that players in the industry originally followed. This article analyzes the ffplay source code and tries, in a basic and systematic way, to unlock how the player synchronizes audio and video and how it controls play/pause and fast forward/rewind.

Thanks to FFmpeg's cross-platform nature, viewing and debugging the code in VS on a PC to study the player's principles is far more efficient than reading audio and video code on a mobile device.

Because the ffplay shipped with FFmpeg runs in a console and is not intuitive to use, this article directly analyzes the code of a port of ffplay to VC (ffplay for MFC) published on CSDN.

Table of contents:

1. A preliminary study of MP4 files

2. Start with the simplest player: FFmpeg decoding + SDL display

3. Five questions to start with

4. The overall structure of the ffplay code

5. Operation control of the video player

5.1 VideoState, the key structure defined by ffplay

5.2 Additional basic knowledge: PTS and DTS

5.3 How to control audio and video synchronization

5.4 How to control the playback and pause of the video?

5.5 How is frame-by-frame playback done?

5.6 Fast forward and rewind

6. Reflections and summary of this ffplay code analysis

1. A preliminary study of MP4 files

To give you a first impression of what a video file contains, let's start with a simple analysis of an MP4 file, as shown in figure 1.

Figure 1: Parsed parameters of an MP4 file

From figure 1 we can see that every video file has a specific container format, bit rate, duration, and other information. After demultiplexing, the file splits into video_stream and audio_stream, corresponding to the video stream and the audio stream respectively.

The demultiplexed audio and video have their own independent parameters: the video parameters include the coding format, sample rate, picture size, and so on, while the audio parameters include the sample rate, coding format, number of channels, and so on.

After the demultiplexed audio and video packets (AVPacket) are decoded, they become raw audio (PCM) and video (YUV/RGB) data, which can then be displayed and played.

2. Start with the simplest player: FFmpeg decoding + SDL display

This already covers most of the video decoding and playback process; the whole playback flow is shown in figure 3.

Figure 3: Player flow chart (see watermark for source)

The flow chart is explained as follows:

1. The FFmpeg initialization code is fairly fixed. Its main purpose is to fill in the relevant member variables of the AVFormatContext instance by calling functions such as av_register_all, avformat_open_input, av_find_stream_info, and avcodec_find_decoder (a minimal sketch of this sequence follows figure 4).

Figure 4 shows the concrete values in an initialized AVFormatContext instance; calling av_find_stream_info finds the audio and video stream data in the file and initializes the streams variable (covering both the audio and the video stream).

Figure 4: AVFormatContext initialization example
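To make the initialization step more concrete, here is a minimal sketch of that fixed sequence, written against the old FFmpeg API generation used by this version of ffplay (AVStream.codec, and avformat_find_stream_info as the later name of the av_find_stream_info call mentioned above); the function name open_and_prepare and the bare-bones error handling are assumptions for illustration, not ffplay's actual code:

#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>

static int open_and_prepare(const char *filename, AVFormatContext **out_ic, int *video_index)
{
    AVFormatContext *ic = NULL;
    unsigned i;

    av_register_all();                                    /* register all formats and codecs */
    if (avformat_open_input(&ic, filename, NULL, NULL) < 0)
        return -1;                                        /* read the header, fill the context */
    if (avformat_find_stream_info(ic, NULL) < 0)
        return -1;                                        /* probe streams, fill ic->streams[] */

    *video_index = -1;
    for (i = 0; i < ic->nb_streams; i++)                  /* locate the video stream */
        if (ic->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO)
            *video_index = i;
    if (*video_index < 0)
        return -1;

    /* find and open a decoder for the video stream */
    AVCodecContext *avctx = ic->streams[*video_index]->codec;
    AVCodec *codec = avcodec_find_decoder(avctx->codec_id);
    if (!codec || avcodec_open2(avctx, codec, NULL) < 0)
        return -1;

    *out_ic = ic;
    return 0;
}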

2. av_read_frame continuously reads the next frame from the stream and demultiplexes it into a video AVPacket; avcodec_decode_video2 then decodes that AVPacket into an image frame, an AVFrame (a rough sketch of this loop is shown below).
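As a rough illustration of this read-and-decode loop, here is a sketch using the same old decode API (avcodec_decode_video2); the function name decode_loop and the minimal error handling are assumptions for illustration, and the setup (ic, avctx, video_index) is assumed to come from the initialization step above:

static void decode_loop(AVFormatContext *ic, AVCodecContext *avctx, int video_index)
{
    AVPacket packet;
    AVFrame *pFrameYUV = av_frame_alloc();   /* older code used avcodec_alloc_frame() */
    int got_picture = 0;

    while (av_read_frame(ic, &packet) >= 0) {        /* demux: next AVPacket from the file */
        if (packet.stream_index == video_index) {
            /* decode the video packet into an image frame */
            avcodec_decode_video2(avctx, pFrameYUV, &got_picture, &packet);
            if (got_picture) {
                /* hand pFrameYUV to SDL for rendering (see the SDL snippet below) */
            }
        }
        av_free_packet(&packet);                     /* old API call for releasing a packet */
    }
    av_frame_free(&pFrameYUV);
}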

3. Once the AVFrame is obtained, the next step is to hand it to SDL for rendering and display, which is also very simple. See the code comments below for details:

SDL_Overlay *bmp;

// copy the decoded AVFrame data into the SDL_Overlay instance
SDL_LockYUVOverlay(bmp);
bmp->pixels[0] = pFrameYUV->data[0];
bmp->pixels[2] = pFrameYUV->data[1];
bmp->pixels[1] = pFrameYUV->data[2];
bmp->pitches[0] = pFrameYUV->linesize[0];
bmp->pitches[2] = pFrameYUV->linesize[1];
bmp->pitches[1] = pFrameYUV->linesize[2];
SDL_UnlockYUVOverlay(bmp);

// set up the SDL_Rect: it carries the top-left corner and the display size
SDL_Rect rect;
rect.x = 0;
rect.y = 0;
rect.w = pCodecCtx->width;
rect.h = pCodecCtx->height;

// blit the SDL_Overlay data onto the SDL_Surface
SDL_DisplayYUVOverlay(bmp, &rect);

// delay 40 ms, leaving enough time for ffmpeg to fetch and decode the next frame,
// then go back and read the next frame
SDL_Delay(40);

Following this pattern, AVPackets are read from the stream one after another, decoded into AVFrames, and rendered to the SDL window.

Figure 5: Video playback state diagram

The video playback process can be summarized as: read the next frame -> decode -> display -> repeat; the state diagram is shown in figure 5.

3. Five questions to start with

This article again proceeds by raising questions first and then analyzing the principle behind each one, to deepen the understanding of audio and video decoding and playback. The following questions are also the basic problems and principles every player has to face:

1. When watching a movie, we find that the film can switch between different subtitles and even different audio tracks, such as Chinese and English subtitles and dubbing, all of which end up composited into the same picture. How are the picture, subtitles, and sound of a video combined?

In fact, when any video file is read, its contents are separated into different streams. To make this concrete, take the code in FFmpeg as an example: AVMediaType defines the specific stream types:

enum AVMediaType {
    AVMEDIA_TYPE_VIDEO,    // video stream
    AVMEDIA_TYPE_AUDIO,    // audio stream
    AVMEDIA_TYPE_SUBTITLE, // subtitle stream
};

After reading an audio or video frame with av_read_frame, avcodec_decode_video2 is called to decode the video, or avcodec_decode_audio4 to decode the audio, producing the raw audio and video data that can be rendered and displayed.

Images and subtitles are handled as Surfaces or textures; much like SurfaceFlinger on Android, the output of the different layers is composited into a new image that is displayed on the video screen.

2. Since video has the concept of a frame rate and audio has the concept of a sample rate, can audio-video synchronization be controlled directly by the frame rate?

Each video frame and audio frame corresponds to a point in time, so in principle, as long as the playback time of every audio and video frame is controlled, synchronization should be achievable.

In practice, however, it is difficult to control the timing of each frame precisely, not to mention that the decoding times of audio and video differ, which easily pushes them out of sync.

So, how exactly does the player synchronize audio and video?

3. Are the audio, video, and subtitle streams of a video continuous or discrete in time? Do the different streams have the same number of frames?

Since computers can only approximate the world with discrete digital samples, the streams must be discrete in time. Given that they are discrete, do they have the same number of frames?

A video can be understood as a time sequence of many audio frames, video frames, and subtitle frames, whose total length equals the video's total duration. But because the decoding time of each frame differs, the time interval between consecutive frames inevitably differs as well.

Raw audio data is itself sampled data, so it has a fixed clock period. Video, however, may have to skip frames to stay synchronized with the audio, so the display-time difference between video frames fluctuates rather than being a constant period.

So the conclusion is that the number of frames each stream plays over the total duration of the video is definitely not the same.

4. Video playback is a continuous series of rendered frames, and the player's control operations include pause/play and fast forward/rewind. Have you ever considered whether the step of each fast forward/rewind is better measured in time or in number of frames skipped, that is, how long or how many frames each jump should be? Time versus frame count?

From the analysis of the previous question, we know that a video is streamed as an audio stream, a video stream, and a subtitle stream. If the step were based on frame count, then because the different streams do not necessarily have the same number of frames, the three streams could easily end up playing inconsistently.

Therefore it is better to use time as the measure: seek directly in the MP4 file stream to the time point ahead of or behind the current playback time, then re-parse the file stream from there, so that audio and video remain synchronized after the fast forward or rewind.

Indeed, most players implement fast forward/rewind in terms of a time span. Later we will see how ffplay behaves and how it implements this.

5. In the simple player implemented in the previous section, decoding and playback run in the same thread, so the decoding speed directly affects the playback speed, which directly causes choppy playback. So how can the video be played smoothly when the decoding speed may be uneven?

It is easy to think of introducing a buffer queue and splitting video rendering/display and video decoding into two threads: the decoding thread writes data into the queue, and the rendering thread reads data from the queue for display. This keeps the video playing as a steady stream (a minimal sketch of such a queue follows).
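ffplay uses exactly this pattern, both between the demux thread and the decoders (its PacketQueue) and between the video decoder and the display (its picture queue). A minimal thread-safe queue in that spirit, holding AVPackets and guarded by SDL primitives, could look like the sketch below; the names MyQueue/MyPacketNode and the blocking get are simplified assumptions for illustration, not ffplay's exact definitions:

#include <SDL.h>
#include <libavcodec/avcodec.h>
#include <libavutil/mem.h>

typedef struct MyPacketNode {
    AVPacket pkt;
    struct MyPacketNode *next;
} MyPacketNode;

typedef struct MyQueue {
    MyPacketNode *first, *last;
    SDL_mutex *mutex;
    SDL_cond  *cond;
} MyQueue;

/* producer (decode side): append a packet and wake up the consumer */
static void queue_put(MyQueue *q, AVPacket *pkt)
{
    MyPacketNode *node = av_malloc(sizeof(*node));
    node->pkt = *pkt;
    node->next = NULL;
    SDL_LockMutex(q->mutex);
    if (!q->last)
        q->first = node;
    else
        q->last->next = node;
    q->last = node;
    SDL_CondSignal(q->cond);
    SDL_UnlockMutex(q->mutex);
}

/* consumer (render side): block until a packet is available, then take it */
static void queue_get(MyQueue *q, AVPacket *pkt)
{
    SDL_LockMutex(q->mutex);
    while (!q->first)
        SDL_CondWait(q->cond, q->mutex);
    MyPacketNode *node = q->first;
    q->first = node->next;
    if (!q->first)
        q->last = NULL;
    *pkt = node->pkt;
    av_free(node);
    SDL_UnlockMutex(q->mutex);
}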

This means three buffer queues are needed, one each for audio frames, video frames, and subtitle frames. How, then, is the synchronization of audio and video playback guaranteed?

PTS is the presentation timestamp of a video or audio frame. How exactly is it used to control the display time of video frames, audio frames, and subtitle frames?

From there we can explore exactly how ffplay implements its buffer queue control.

We will gradually find more specific answers to all of the above five questions in our exploration of the ffplay source code.

4. The overall structure of the ffplay code

Figure 6: Overall flow of the ffplay code

Someone online has drawn an overall flow chart of ffplay, shown in figure 6. With this picture, the code becomes much easier to follow. The details of the flow are as follows:

1. Start a timer that fires every 40 ms and uses the SDL event mechanism to trigger reading a frame from the image frame queue for rendering and display (a sketch of this timer/event mechanism follows this list).

2. In the stream_component_open function, av_read_frame() reads an AVPacket and puts it into the audio, video, or subtitle packet queue.

3. The video_thread takes an AVPacket from the video packet queue, decodes it into an AVFrame image frame, and puts it into the VideoPicture queue.

4. The audio_thread, like video_thread, decodes the audio packets.

5. The subtitle_thread, like video_thread, decodes the subtitle packets.
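A sketch of the timer + event refresh mechanism from step 1 is shown below. The 40 ms period follows the description above, but the exact wiring differs between ffplay versions, so the FF_REFRESH_EVENT name, the one-shot re-arming, and the event_loop function are illustrative assumptions rather than ffplay's verbatim code:

#include <SDL.h>

#define FF_REFRESH_EVENT (SDL_USEREVENT + 1)

static void video_refresh(void *opaque);   /* ffplay's refresh function, shown in section 5.3 */

static Uint32 refresh_timer_cb(Uint32 interval, void *opaque)
{
    SDL_Event event;
    event.type = FF_REFRESH_EVENT;
    event.user.data1 = opaque;
    SDL_PushEvent(&event);      /* hand control back to the event loop */
    return 0;                   /* one-shot timer: it is re-armed after each refresh */
}

static void schedule_refresh(void *is, int delay_ms)
{
    SDL_AddTimer(delay_ms, refresh_timer_cb, is);
}

/* event loop: every time the timer fires, pull one picture from pictq and display it */
static void event_loop(void *is)
{
    SDL_Event event;
    schedule_refresh(is, 40);
    for (;;) {
        SDL_WaitEvent(&event);
        if (event.type == FF_REFRESH_EVENT) {
            video_refresh(event.user.data1);       /* render one frame from the queue */
            schedule_refresh(event.user.data1, 40);
        }
    }
}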

5. Operation control of the video player

The operations of a video player include play/pause, fast forward/rewind, frame-by-frame playback, and so on. What is the implementation principle behind these operations? Let's analyze them one by one at the code level.

5.1 VideoState, the key structure defined by ffplay

Similar to how FFmpeg decoding defines an AVFormatContext structure to hold the file name, audio and video streams, decoders, and other fields for global access,

ffplay defines its own structure, VideoState. Analyzing VideoState reveals the basic implementation principle of the player.

typedef struct VideoState {
    // demux thread: reads the video file stream, gets AVPackets, and pushes them into the packet queues
    SDL_Thread *read_tid;
    // video decode thread: takes an AVPacket, decodes it into an AVFrame, and queues it
    SDL_Thread *video_tid;
    // video refresh thread: periodically plays the next frame
    SDL_Thread *refresh_tid;

    int paused;                       // flag controlling video pause or playback
    int seek_req;                     // seek (progress control) request flag
    int seek_flags;

    AVStream *audio_st;               // audio stream
    PacketQueue audioq;               // audio packet queue
    double audio_current_pts;         // display time of the current audio frame

    AVStream *subtitle_st;            // subtitle stream
    PacketQueue subtitleq;            // subtitle packet queue

    AVStream *video_st;               // video stream
    PacketQueue videoq;               // video packet queue
    double video_current_pts;         // pts of the current video frame
    double video_current_pts_drift;

    VideoPicture pictq[VIDEO_PICTURE_QUEUE_SIZE];   // queue of decoded image frames
} VideoState;

As can be seen from the VideoState structure:

1. Demultiplexing, video decoding, and video refresh/playback are controlled in parallel by three threads.

2. The audio stream, video stream, and subtitle stream each have their own buffer queue, read and written by different threads, and each keeps the PTS of its current frame.

3. The decoded image frames are separately placed in the pictq queue and displayed by SDL.

So what is PTS? It is a very important concept in audio and video that directly determines the display time of a video or audio frame; it is described in detail below.

5.2 Additional basic knowledge: PTS and DTS

Figure 7: Audio and video decoding analysis

Figure 7 shows the output sequence of audio and video frames, each carrying PTS and DTS tags. What exactly do these two tags mean?

DTS (Decode Time Stamp) and PTS (Presentation Time Stamp) are both timestamps: the former is the decoding time, the latter the display time. Both are time tags attached to video and audio frames so that the upper-layer application's synchronization mechanism can work more effectively.

In other words, the DTS records when a video or audio frame should be decoded, while the playback time of the frame depends on its PTS.

For audio the two tags are identical, but for some video coding formats that use bidirectional prediction (B-frames), the decoding order differs from the display order, which causes DTS and PTS to diverge.
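A commonly used illustration of this (a hypothetical group of pictures, not taken from figure 7): with B-frames, the display order and the decode order differ, so PTS and DTS diverge.

Display order (PTS): I1  B2  B3  P4
Decode order (DTS):  I1  P4  B2  B3

P4 must be decoded before B2 and B3, because they are predicted from both I1 and P4, yet it is displayed after them. For audio frames the two orders coincide, so PTS and DTS are equal.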

5.3 How to control audio and video synchronization

We already know that the playback time of a video frame is effectively determined by its pts field, and that audio and video each have their own independent pts. But how exactly is pts generated? And if audio and video drift out of sync, does pts have to be adjusted dynamically to restore synchronization?

Let's first analyze how to control the display time of video frames:

static void video_refresh(void *opaque)
{
    // fetch the VideoPicture according to the read index
    VideoPicture *vp = &is->pictq[is->pictq_rindex];

    if (is->paused)
        goto display;   // when paused, just redisplay the current image

    // subtract the previous frame's pts from the current frame's pts to get the time difference
    last_duration = vp->pts - is->frame_last_pts;
    // check whether the difference is within a reasonable range: the pts gap between two
    // consecutive frames should be neither too large nor too small
    if (last_duration > 0 && last_duration < 10.0) {
        /* if duration of the last frame was sane, update last_duration in video state */
        is->frame_last_duration = last_duration;
    }

    // to synchronize audio and video, either the video or the audio must serve as the reference,
    // and the delay is then adjusted to keep them in sync; compute_target_delay does this,
    // and we analyze how below
    delay = compute_target_delay(is->frame_last_duration, is);

    // get the current time
    time = av_gettime() / 1000000.0;
    // if the current time is earlier than frame_timer + delay, it is not yet time to show the
    // next frame, so return directly
    if (time < is->frame_timer + delay)
        return;

    // following the audio clock, as long as a delay is required (delay > 0), it has to be
    // accumulated into frame_timer
    if (delay > 0)
        // update frame_timer; frame_timer is the accumulated delay
        is->frame_timer += delay * FFMAX(1, floor((time - is->frame_timer) / delay));

    SDL_LockMutex(is->pictq_mutex);
    // update the pts of the current frame in is, i.e. video_current_pts, video_current_pos, etc.
    update_video_pts(is, vp->pts, vp->pos);
    SDL_UnlockMutex(is->pictq_mutex);

display:
    /* display picture */
    if (!display_disable)
        video_display(is);
}

The function compute_target_delay recalculates the delay according to the audio clock, adjusting the video display time to follow the audio and thereby achieving audio-video synchronization.

static double compute_target_delay(double delay, VideoState *is)
{
    double sync_threshold, diff;

    // audio is sampled data with a fixed sample period, driven by the master system clock,
    // so delaying audio playback is hard to adjust; in practice, synchronizing video to audio
    // is easier to achieve than synchronizing audio to video
    if ((is->av_sync_type == AV_SYNC_AUDIO_MASTER && is->audio_st) ||
        is->av_sync_type == AV_SYNC_EXTERNAL_CLOCK) {
        // take the playback time (pts) of the current video frame and subtract the master
        // clock time to get the difference
        diff = get_video_clock(is) - get_master_clock(is);
        sync_threshold = FFMAX(AV_SYNC_THRESHOLD, delay);
        // only adjust when the difference is within the no-sync threshold
        if (fabs(diff) < AV_NOSYNC_THRESHOLD) {
            // the current frame lags behind the master clock: show it immediately
            if (diff <= -sync_threshold)
                delay = 0;
            // the current frame is ahead of the master clock: double the delay
            else if (diff >= sync_threshold)
                delay = 2 * delay;
        }
    }
    return delay;
}

Figure 8: Audio and video frame display sequence

So the process here is quite simple. Figure 8 sketches a sequence of audio and video frames; it shows that the number of audio frames and the number of video frames are not necessarily equal. Moreover, the display times of the audio frames are spaced almost evenly, while each video frame may be displayed with an extra delay depending on the situation; that delay is calculated by the compute_target_delay function above.

After the delay has been calculated, the code that updates the pts is as follows:

static void update_video_pts(VideoState *is, double pts, int64_t pos)
{
    double time = av_gettime() / 1000000.0;
    /* update current video pts */
    is->video_current_pts = pts;
    is->video_current_pts_drift = is->video_current_pts - time;
    is->video_current_pos = pos;
    is->frame_last_pts = pts;
}

The whole process can be summarized as follows:

1. Display the first video frame.

2. Based on the audio clock, calculate the delay for the second frame and update that frame's pts.

3. When the pts time arrives, display the second video frame.

4. Repeat the above steps until the last frame.

One point may still be confusing here: why can the delay needed before playing the next frame be derived from the master clock alone?

A video is really a set of streams with a certain playback length: an audio stream, a video stream, and a subtitle stream, which are played together to form the video. Naturally, their total playback time equals that of the video file.

Because the audio stream is PCM sampled data played back at a fixed frequency, which is the same as the master clock or a division of it, each audio frame passes by naturally and evenly in time.

So for audio, it is enough to simply follow the master clock or its divided frequency.

The video, on the other hand, compares its own display time, that is its pts, with the current time of the master clock to determine whether it is ahead of or behind the system clock, derives the delay from that, and then displays the frame at exactly the right moment, which keeps audio and video synchronized.
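As a rough numeric illustration of compute_target_delay above (the numbers are made up): suppose the nominal delay between two video frames is 40 ms, so sync_threshold = FFMAX(AV_SYNC_THRESHOLD, 0.040) = 40 ms. If diff = video clock - master (audio) clock = -60 ms, the video lags beyond the threshold and delay is forced to 0, so the next frame is shown immediately to catch up. If diff = +60 ms, the video is ahead and delay is doubled to 80 ms, letting the audio catch up. If |diff| stays within the threshold, the delay is left unchanged.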

Then there is another question: after the delay has been calculated, is a sleep needed to implement the delayed display?

Actually, no. From the analysis above we know that the delay is folded into the timing bookkeeping of the current video frame (video_current_pts and frame_timer). Before displaying the current AVFrame, its pts time is checked first; if it has not yet arrived, the frame is not displayed and the function simply returns, checking again at the next refresh (ffplay uses a 40 ms timed refresh).

The code is as follows: it returns directly until the updated pts time (is->frame_timer + delay) has been reached:

if (av_gettime() / 1000000.0 < is->frame_timer + delay)
    return;

The next step is to analyze how a video frame is actually displayed. This is very simple, except that the subtitle stream is also handled here:

static void video_image_display(VideoState *is)
{
    VideoPicture *vp;
    SubPicture *sp;
    AVPicture pict;
    SDL_Rect rect;
    int i;

    vp = &is->pictq[is->pictq_rindex];
    if (vp->bmp) {
        // subtitle handling
        if (is->subtitle_st) {
            ...
        }
    }
    // calculate the display area of the image
    calculate_display_rect(&rect, is->xleft, is->ytop, is->width, is->height, vp);
    // display the image
    SDL_DisplayYUVOverlay(vp->bmp, &rect);
    // advance the read pointer of the picture queue by one position
    pictq_next_picture(is);
}

VIDEO_PICTURE_QUEUE_SIZE is set to only 4, so the queue fills up quickly. How is it refreshed once it is full?

Once the queue is detected to have reached its size limit, the decoder waits until pictures have been taken out of pictq and consumed. This avoids decoding the entire file as soon as the player is opened, which would eat up all the memory.

static int queue_picture(VideoState *is, AVFrame *src_frame, double pts1, int64_t pos)
{
    /* keep the last already displayed picture in the queue */
    SDL_LockMutex(is->pictq_mutex);
    while (is->pictq_size >= VIDEO_PICTURE_QUEUE_SIZE - 2 &&
           !is->videoq.abort_request) {
        SDL_CondWait(is->pictq_cond, is->pictq_mutex);
    }
    SDL_UnlockMutex(is->pictq_mutex);
    ...
}

5.4 How to control the playback and pause of the video?

static void stream_toggle_pause(VideoState *is)
{
    if (is->paused) {
        // frame_timer records the time the video has played from the start up to the current
        // frame; when resuming, the paused interval must be added back, together with the
        // drift (is->video_current_pts_drift - is->video_current_pts)
        is->frame_timer += av_gettime() / 1000000.0 +
                           is->video_current_pts_drift - is->video_current_pts;
        if (is->read_pause_return != AVERROR(ENOSYS)) {
            // and update video_current_pts
            is->video_current_pts = is->video_current_pts_drift + av_gettime() / 1000000.0;
        }
        // drift is the difference between the current frame's pts and the current time
        is->video_current_pts_drift = is->video_current_pts - av_gettime() / 1000000.0;
    }
    // toggle paused; the paused flag also gates the display of image frames:
    // press the space bar once to pause, press it again to resume playback
    is->paused = !is->paused;
}

A special note: the paused flag controls whether the video plays. When playback resumes, the pts bookkeeping of the current frame must be updated, because the paused time has to be added back.
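As a rough numeric illustration (with made-up numbers): drift was computed as pts minus the wall-clock time at which the frame was shown, so av_gettime() / 1000000.0 + drift - pts works out to the wall-clock time that has passed since that moment. If the user paused for about 3 seconds, frame_timer is pushed forward by roughly those 3 seconds, so the next frame is not treated as hopelessly late when playback resumes.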

5.5 How is frame-by-frame playback done?

In the video decoding thread, the video is alternately resumed and paused via stream_toggle_pause, which achieves frame-by-frame playback:

static void step_to_next_frame(VideoState *is)
{
    // for frame-by-frame playback, make sure playback is running first, then set the step
    // variable to drive the single-frame step
    if (is->paused)
        stream_toggle_pause(is);   // toggles paused back off
    is->step = 1;
}

The principle is to let playback continue briefly and then pause it again, so that exactly one frame is played at a time:

static int video_thread(void *arg)
{
    ...
    if (is->step)
        stream_toggle_pause(is);
    ...
    if (is->paused)
        goto display;   // display the video frame
    ...
}

5.6 Fast forward and rewind

Regarding fast forward and rewind, two questions first:

1. Does fast forward control the playback progress in terms of time or in terms of frame count?

2. Once the position changes, do the current frame and the AVFrame queues need to be cleared, and does the whole stream pipeline need to be driven again?

ffplay uses time as the control dimension. Fast forward and rewind are controlled by setting the seek_req, seek_pos, and related variables of VideoState:

do_seek:
    // get_master_clock actually computes is->audio_current_pts_drift + av_gettime() / 1000000.0,
    // i.e. the time of the frame currently being played
    pos = get_master_clock(cur_stream);
    // incr is the step of each fast forward; adding it gives the target time point
    pos += incr;
    stream_seek(cur_stream, (int64_t)(pos * AV_TIME_BASE), (int64_t)(incr * AV_TIME_BASE), 0);

The code of stream_seek is shown below. It simply sets the relevant variables of VideoState, and read_thread then drives the fast forward or rewind flow:

/* seek in the stream */
static void stream_seek(VideoState *is, int64_t pos, int64_t rel, int seek_by_bytes)
{
    if (!is->seek_req) {
        is->seek_pos = pos;
        is->seek_rel = rel;
        is->seek_flags &= ~AVSEEK_FLAG_BYTE;
        if (seek_by_bytes)
            is->seek_flags |= AVSEEK_FLAG_BYTE;
        is->seek_req = 1;
    }
}

Once the seek_req flag has been set in stream_seek, the forward/rewind flow is entered directly. The principle is to call avformat_seek_file, which locates the index point by timestamp and thereby determines the next frame to be displayed:

static int read_thread(void *arg)
{
    ...
    // after the playback position has been adjusted
    if (is->seek_req) {
        int64_t seek_target = is->seek_pos;
        int64_t seek_min = is->seek_rel > 0 ? seek_target - is->seek_rel + 2 : INT64_MIN;
        int64_t seek_max = is->seek_rel < 0 ? seek_target - is->seek_rel - 2 : INT64_MAX;

        // locate the index point by timestamp; after the seek, reading of the next frame
        // starts from there, which is how fast forward / rewind is implemented
        ret = avformat_seek_file(is->ic, -1, seek_min, seek_target, seek_max, is->seek_flags);
        if (ret < 0) {
            fprintf(stderr, "%s: error while seeking\n", is->ic->filename);
        } else {
            // after a successful seek, the current packet queues (audio, video, subtitle)
            // must be flushed
            if (is->audio_stream >= 0) {
                packet_queue_flush(&is->audioq);
                packet_queue_put(&is->audioq, &flush_pkt);
            }
            if (is->subtitle_stream >= 0) {   // handle the subtitle stream
                packet_queue_flush(&is->subtitleq);
                packet_queue_put(&is->subtitleq, &flush_pkt);
            }
            if (is->video_stream >= 0) {
                packet_queue_flush(&is->videoq);
                packet_queue_put(&is->videoq, &flush_pkt);
            }
        }
        is->seek_req = 0;
        eof = 0;
    }
    ...
}

In addition, the code above shows that after every fast forward or rewind, audioq, videoq, and subtitleq are flushed, which amounts to a fresh start and guarantees that the data in the buffer queues stays correct.
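For reference, a flush of this kind essentially just frees every queued packet and resets the queue counters. The sketch below mirrors ffplay's internal PacketQueue list (first_pkt, last_pkt, nb_packets, size) but is reproduced from memory, so treat it as an approximation rather than the verbatim source:

static void packet_queue_flush(PacketQueue *q)
{
    MyAVPacketList *pkt, *pkt1;

    SDL_LockMutex(q->mutex);
    for (pkt = q->first_pkt; pkt; pkt = pkt1) {
        pkt1 = pkt->next;
        av_free_packet(&pkt->pkt);   /* release the packet's payload */
        av_freep(&pkt);              /* release the list node itself */
    }
    q->last_pkt = NULL;
    q->first_pkt = NULL;
    q->nb_packets = 0;
    q->size = 0;
    SDL_UnlockMutex(q->mutex);
}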

As for audio, there was some initial confusion: during a pause, no explicit control of the audio is visible, so how is it controlled?

It turns out that the is->paused variable is set during the pause, and demultiplexing as well as audio decoding and playback all depend on is->paused, so audio and video playback both stop.
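In the demux thread this gating looks roughly like the fragment below, a sketch based on ffplay's read loop; av_read_pause and av_read_play are the FFmpeg calls that pause and resume demuxing (mainly relevant for network streams such as RTSP):

for (;;) {
    if (is->paused != is->last_paused) {
        is->last_paused = is->paused;
        if (is->paused)
            is->read_pause_return = av_read_pause(ic);   /* pause demuxing */
        else
            av_read_play(ic);                            /* resume demuxing */
    }
    /* ... seek handling and av_read_frame() follow here ... */
}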

6. Reflections and summary of this ffplay code analysis

1. Accumulate the basic concepts and principles. On first contact with FFmpeg, so many concepts are involved that it can feel like there is nowhere to start. At that point you have to begin with the basic modules and gradually understand more; with enough accumulation there will be a qualitative change and a much better understanding of the video codec mechanism.

2. Be sure to understand the overall structure and flow of the code first, and only then analyze each detail in depth; this greatly improves the efficiency of reading code. Drawing some block diagrams, like the ones used here, is very important: a brief flow chart is much more convenient than a detail-oriented UML diagram.

3. When reading FFmpeg code, debugging on the PC is much faster; trying to read the code through JNI calls on Android is very inefficient.

After reading the above, do you have a better understanding of how to analyze the source code of the ffplay player?
