What is SDP like in WebRTC?

2025-04-20 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

What is SDP in WebRTC? This article analyzes the question in detail, in the hope of giving readers who face it a simpler and more workable way forward.

WebRTC (Web Real-Time Communication) is a Web implementation of real-time communication (RTC). The project was open-sourced by Google and has been standardized together with the IETF and W3C. In China, more and more vendors support WebRTC and its application prospects keep broadening, so we set up this column to share the WebRTC research work done within Aliyun.

Overview

In a narrow sense, WebRTC refers to the browser side. How do browsers exchange data directly with each other? They cannot do it completely on their own and have to rely on servers, generally of the following types:

The signaling server exchanges room and meeting media information, as well as messages sent during the meeting. The media description uses the SDP protocol, which is the focus of this article.

ICE servers are divided into STUN servers, which help two clients punch holes through NAT to establish a P2P connection, and TURN servers, which relay the traffic when a direct connection cannot be established. The address information in ICE is called a Candidate and can be exchanged inside SDP or separately through Trickle.

SFU or MCU servers. In a multi-party meeting, the topology in which each endpoint sends data directly to every other attendee is called MESH, but MESH has obvious limitations. With an SFU, each client sends only one upstream stream, which the SFU forwards to the other clients. An MCU is more powerful: each client has only one upstream and one downstream stream.

Note: in addition to transmission, another important feature of WebRTC is security, that is, DTLS. Some of the DTLS information is carried in SDP. DTLS itself will be covered in separate technical articles.

Next, we formally introduce the SDP protocol.

What's SDP

The diagram of key SDP attributes at the beginning of this article gives a global view of SDP. SDP describes media sessions, network information, security features, transmission strategies, and so on. Each SDP attribute in the diagram plays a different role in different application scenarios, and none of them should be underestimated.

Next, the official definition: SDP (Session Description Protocol) is a text-based session description protocol. It is not a transport protocol itself and relies on other protocols (such as SIP and HTTP) to carry the media information needed for media negotiation between two session entities.

WebRTC's Offer and Answer both contain SDP. Related RFCs include:

1998, RFC2327

2006, RFC4566

A good SDP example analysis of WebRTC

Offer and Answer

WebRTC uses the Offer-Answer model to exchange SDP: the Offer carries an SDP, and so does the Answer. For example, Alice and Bob communicate over WebRTC:

// Alice Offer
v=0
o=- 2397106153131073818 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE video
a=msid-semantic: WMS gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS
m=video 9 UDP/TLS/RTP/SAVPF 96 97
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:l5KU
a=ice-pwd:+Sxmm3PoJUERpeHYL0HW4/T9
a=ice-options:trickle
a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE:DA:64:A0:37:7E:61:CB:9D:91:9B:44:F6:C9:AC:3B:37:1C:00:15:4C:5A:B5:67:74
a=setup:actpass
a=mid:video
a=sendrecv
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=ssrc-group:FID 2527104241
a=ssrc:2527104241 cname:JPmKBgFHH5YVFyaJ
a=ssrc:2527104241 msid:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS c7072509-df47-4828-ad03-7d0274585a56
a=ssrc:2527104241 mslabel:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS
a=ssrc:2527104241 label:c7072509-df47-4828-ad03-7d0274585a56

// Bob Answer
v=0
o=- 5443219974135798586 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE video
a=msid-semantic: WMS uiZ7cB0hsFDRGgTIMNp6TajUK9dOoHi43HVs
m=video 9 UDP/TLS/RTP/SAVPF 96 97
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:MUZf
a=ice-pwd:4QhikLcmGXnCfAzHDB++ZjM5
a=ice-options:trickle
a=fingerprint:sha-256 2A:5A:B8:43:66:05:B3:6A:E9:46:36:DF:DF:20:11:6A:F6:11:EA:D9:4E:26:E3:CE:5A:3A:C6:8D:03:49:7B:DE
a=setup:active
a=mid:video
a=sendrecv
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=ssrc-group:FID 3587783331
a=ssrc:3587783331 cname:INxZnBV2Sty1zlmN
a=ssrc:3587783331 msid:uiZ7cB0hsFDRGgTIMNp6TajUK9dOoHi43HVs a3b297e7-cdbe-464e-a32c-347465ace055
a=ssrc:3587783331 mslabel:uiZ7cB0hsFDRGgTIMNp6TajUK9dOoHi43HVs
a=ssrc:3587783331 label:a3b297e7-cdbe-464e-a32c-347465ace055

Remark: in the Chrome browser, first open webrtc-internals, then open the Alice page and click the Share button, then open the Bob page and click Share. The Offer and Answer above then appear in webrtc-internals.

After the SDP is exchanged, the Candidate is exchanged:

// Alice Candidate
candidate: candidate:1912876010 1 udp 2122260223 30.2.220.94 52832 typ host generation 0 ufrag l5KU network-id 1 network-cost 10
candidate: candidate:1015535386 1 tcp 1518280447 30.2.220.94 9 typ host tcptype active generation 0 ufrag l5KU network-id 1 network-cost 10

// Bob Candidate
candidate:1912876010 1 udp 2122260223 30.2.220.94 51551 typ host generation 0 ufrag MUZf network-id 1 network-cost 10

Finally, the Candidate pair over which Alice and Bob communicate uses the UDP channel:

Video message sent by Alice:

Video message received by Alice (from Bob):

Generally speaking, the pushing side initiates the Offer and the receiving side replies with the Answer. For example, when a client pushes a stream to an SFU, the client sends the Offer, the SFU replies with the Answer, and the client then pushes the stream to the SFU, which forwards it to other clients. Both Licode and Janus work this way. In this model, if a client needs to pull streams from other clients, it generally uses another PeerConnection to receive an Offer from the SFU, generate an Answer, and return it to the SFU.

However, the pushing side does not have to be the one that initiates the Offer. The receiving side can send the Offer and the pushing side can reply with the Answer. For example, with an SFU such as MediaSoup, the client first sends an Offer to the SFU, which uses it only to inspect the client's media capabilities. The SFU then generates its own Offer (containing the streams of the other clients in the meeting; if nobody else is present, it contains no SSRC) and sends it to the client, and the client replies to the SFU with an Answer. The advantage of this approach is that when other clients join or streams change (for example, video is turned off and on again), the SFU can simply issue a new Offer (a Reoffer), so there is only one interaction pattern between the SFU and the client.

SDP Structure

An SDP description is divided into two parts: the session-level description and the media-level description. The exact composition is defined in RFC 4566. Lines marked with an asterisk (*) are optional. The common contents are as follows:

Session description (session level description)
  v=  (protocol version)
  o=  (originator and session identifier)
  s=  (session name)
  c=* (connection information -- not required if included in all media)
  One or more Time descriptions ("t=" and "r=" lines; see below)
  a=* (zero or more session attribute lines)
  Zero or more Media descriptions

Time description
  t=  (time the session is active)

Media description (media level description), if present
  m=  (media name and transport address)
  c=* (connection information -- optional if included at session level)
  a=* (zero or more media attribute lines)

Compare this with Alice's Offer (only video is enabled, no audio):

// Session description
v=0
o=- 2397106153131073818 2 IN IP4 127.0.0.1
s=-
c=IN IP4 0.0.0.0

// Time description
t=0 0

// Session Attributes
a=group:BUNDLE video
a=msid-semantic: WMS gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS

// Media description
m=video 9 UDP/TLS/RTP/SAVPF 96 97
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:l5KU
a=ice-pwd:+Sxmm3PoJUERpeHYL0HW4/T9
a=ice-options:trickle
a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE:DA:64:A0:37:7E:61:CB:9D:91:9B:44:F6:C9:AC:3B:37:1C:00:15:4C:5A:B5:67:74
a=setup:actpass
a=mid:video
a=sendrecv
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=ssrc-group:FID 2527104241
a=ssrc:2527104241 cname:JPmKBgFHH5YVFyaJ
a=ssrc:2527104241 msid:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS c7072509-df47-4828-ad03-7d0274585a56
a=ssrc:2527104241 mslabel:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS
a=ssrc:2527104241 label:c7072509-df47-4828-ad03-7d0274585a56
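The split between session-level and media-level lines can be sketched in a few lines of Python. This is only an illustration, not how a browser parses SDP: the function name split_sdp is made up for this example, and the sample text is a shortened copy of Alice's Offer.

```python
def split_sdp(sdp: str):
    """Split SDP text into session-level lines and a list of media sections."""
    session, media = [], []
    for raw in sdp.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("m="):
            media.append([line])      # every m= line opens a new media description
        elif media:
            media[-1].append(line)    # lines after an m= line belong to that section
        else:
            session.append(line)      # everything before the first m= line is session level
    return session, media


# Shortened copy of Alice's Offer, used only as sample input.
offer = """v=0
o=- 2397106153131073818 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE video
m=video 9 UDP/TLS/RTP/SAVPF 96 97
c=IN IP4 0.0.0.0
a=mid:video
a=sendrecv
"""

session, media = split_sdp(offer)
print(len(session), "session-level lines,", len(media), "media section(s)")
```

Because a c= line can appear at both levels, a parser has to decide which level a line belongs to from its position, which is what the m= boundary gives us here.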

SDP lines are order-sensitive. For example, a=rtpmap:96 is followed by its related settings until the next a=rtpmap line or another attribute appears.

SDP lines have no unified schema, that is, there is no single fixed rule that can parse every line. The SDP grammar only describes the general form of attributes, and the format of each attribute value has to be looked up in that attribute's own definition, such as the one in RFC 4566 for rtpmap:

a=rtpmap:<payload type> <encoding name>/<clock rate> [/<encoding parameters>]

When parsing SDP, each SDP line is first parsed in the form key=... . When the key is a, the rest of the line takes one of two forms. For more information, see RFC 4566:

a=<attribute>
a=<attribute>:<value>

For example, in c=IN IP4 0.0.0.0 the key is c. In a=rtcp-mux the key is a, the attribute is rtcp-mux, and there is no value. In a=rtpmap:96 VP8/90000 the key is a, the attribute is rtpmap, and the value is 96 VP8/90000.
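A minimal sketch of this two-step parse is shown below. The function name parse_line and the dictionary layout are choices made for this example only. The attribute is split at the first colon only, since an attribute value may itself contain colons.

```python
def parse_line(line: str) -> dict:
    """Parse one SDP line into its key and, for a= lines, attribute and value."""
    key, _, rest = line.strip().partition("=")
    if key != "a":
        return {"key": key, "value": rest}
    # Split at the FIRST colon only; the value may contain further colons.
    name, sep, value = rest.partition(":")
    return {"key": "a", "attribute": name, "value": value if sep else None}


for sample in ("c=IN IP4 0.0.0.0", "a=rtcp-mux", "a=rtpmap:96 VP8/90000"):
    print(parse_line(sample))
```

The value is returned as an opaque string. Interpreting it (for example, splitting 96 VP8/90000 into payload type, encoding name and clock rate) is left to attribute-specific code.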

Note that a colon (:) does not always act as the separator: the value itself can contain colons, for example:

a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=ssrc:2527104241 msid:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS

Session Level Field

The SDP description fields at the session level include: v, o, s, c, b, t.

v (version) is the SDP protocol version, with a fixed value of 0.

o (origin) identifies the initiator of the session.

s (session name) is the name of the session. Each SDP can contain only one s line, and its value cannot be empty.

c (connection data) carries the connection information of the session, which in practice is the IP address. The session-level description can contain this field, and each media-level description can contain it as well. If a c line appears at both the session level and the media level, the media-level c line takes precedence. Because WebRTC uses ICE candidates to exchange address information, the c line is not used there, but that does not make it useless: in SIP video conference scenarios the c line is indispensable. This field comes up again at the end of the article.

b (bandwidth) gives the recommended bandwidth for the session or media.

t (timing) specifies the start and end times of the session. If both are 0, the session is permanent.

For a more detailed description of the session level field, refer to RFC 4566.

Media Codecs

When the session level description is complete, it is followed by zero or more media level descriptions, such as:

// Session Description
v=0
...
// Audio Media Description
m=audio 9 UDP/TLS/RTP/SAVPF 111
...
// Video Media Description
m=video 9 UDP/TLS/RTP/SAVPF 96 97
...

This SDP describes one audio stream and one video stream. The format of the m line is defined in RFC 4566:

m=<media> <port> <proto> <fmt> ...

The trailing numbers, 111 for audio and 96 97 for video, are the fmt values, which identify the audio and video media codecs. They are followed by rtpmap, rtcp-fb, fmtp and other attributes that describe each codec in more detail.

m=audio 9 UDP/TLS/RTP/SAVPF 111
a=mid:audio
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
m=video 9 UDP/TLS/RTP/SAVPF 96 97
a=mid:video
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96

Remark: the media type of an m line is not limited to audio and video. There are also application (for example BFCP), text and other media types.

Remark: the a=mid attribute can be thought of as a unique ID for each m description. For example, with a=mid:audio the string audio is the ID of that m description. The mid value can also be a number, such as a=mid:0, in which case 0 is the ID of the m description. The mid value is generally used together with the BUNDLE grouping attribute, such as a=group:BUNDLE audio video, which means that the session multiplexes the m descriptions whose mid values are audio and video.

Remark: the number 9 in the m line is the transport port of this media type. In RTC scenarios the address information of ICE candidates is used for data transmission, so the port in the m line is not used. In SIP scenarios, however, the port in the m line is very important: it is the RTP port and should be an even number. Combined with the IP address in the c line, it gives the transport address of this media stream in SIP.

Remark: RTX means retransmission. For example, video payload type 97 is the retransmission format for apt=96. In other words, payload type 97 adds retransmission on top of 96 (VP8).
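The relationship between the fmt list, rtpmap and the apt parameter can be sketched as follows. The helper codec_table is a hypothetical name used only for this illustration. It takes the lines of one media section (the m= line first) and records, for each payload type, its codec and, for rtx, the payload type it retransmits.

```python
def codec_table(media_lines):
    """Map each payload type of one media section to its codec and optional apt."""
    fmts = media_lines[0].split()[3:]          # m=<media> <port> <proto> <fmt> ...
    table = {pt: {} for pt in fmts}
    for line in media_lines[1:]:
        if line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            if pt in table:
                table[pt]["codec"] = encoding
        elif line.startswith("a=fmtp:"):
            pt, params = line[len("a=fmtp:"):].split(" ", 1)
            if pt in table:
                for item in params.split(";"):
                    name, _, value = item.strip().partition("=")
                    if name == "apt":          # rtx: associated payload type
                        table[pt]["apt"] = value
    return table


video = [
    "m=video 9 UDP/TLS/RTP/SAVPF 96 97",
    "a=mid:video",
    "a=rtpmap:96 VP8/90000",
    "a=rtcp-fb:96 nack pli",
    "a=rtpmap:97 rtx/90000",
    "a=fmtp:97 apt=96",
]
table = codec_table(video)
print(table)
```

Payload types are kept as strings here because they are only used as lookup keys. Attributes other than rtpmap and fmtp are ignored by this sketch.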

The total number of media streams is specified through SSRC:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 13 110 112 113 126
a=ssrc:2582129002 cname:8Y1pmIKBijmWeALu
a=ssrc:2582129002 msid:34fD1qguf2v79436S1khLkth8Nb6LbedcF9H bab38910-40cd-4581-9a20-e3f558abb397
a=ssrc:2582129002 mslabel:34fD1qguf2v79436S1khLkth8Nb6LbedcF9H
a=ssrc:2582129002 label:bab38910-40cd-4581-9a20-e3f558abb397
m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 101 102 127 125 107 109 124
a=ssrc:565530905 cname:8Y1pmIKBijmWeALu
a=ssrc:565530905 msid:34fD1qguf2v79436S1khLkth8Nb6LbedcF9H 2c533cfe-b6bf-41a8-93f0-1ca031436702
a=ssrc:565530905 mslabel:34fD1qguf2v79436S1khLkth8Nb6LbedcF9H
a=ssrc:565530905 label:2c533cfe-b6bf-41a8-93f0-1ca031436702

Remark: the SSRC lines list the media streams that will be sent, and SSRCs can appear in both the Offer and the Answer. For example, when a client communicates with MediaSoup, MediaSoup always sends the Offer, and that Offer includes the SSRCs of the media streams MediaSoup will send (the streams of other clients that it forwards to this client). The client's Answer in turn contains the SSRCs of the streams it will push, and the direction of all of them is sendrecv.

Remark: msid corresponds to MediaStream.id and identifies the media source. Different SSRCs can belong to different media sources.

How is the final codec determined? The other party states it in the Answer. For example, the Offer above lists multiple codecs, and the Answer selects among them:

m=audio 9 UDP/TLS/RTP/SAVPF 111
m=video 9 UDP/TLS/RTP/SAVPF 100 102 127 125 108 124
a=rtpmap:100 H264/90000
a=rtpmap:102 H264/90000
a=rtpmap:127 H264/90000
a=rtpmap:125 H264/90000
a=rtpmap:108 red/90000
a=rtpmap:124 ulpfec/90000

Although the video payload types 100, 102, 127 and 125 are different, they are all H.264. Payload types 108 (red) and 124 (ulpfec) are FEC formats used on top of H.264.

PlanB and UnifiedPlan

The Media Codecs section above does not say how multiple streams are described. In practice, both audio and video can have multiple SSRCs, and the encoding of each SSRC may be the same or different. For example, when all participants join an Internet video conference from mobile terminals, every stream may be H.264, but other terminals may use other encodings.

If the SSRCs use different encodings, putting them all into the same m description becomes a problem, and this is the key difference between PlanB and UnifiedPlan. With PlanB there is only one m (audio) and one m (video), and the streams in each should use the same encoding. Multiple media streams are distinguished by SSRC. With UnifiedPlan there can be multiple m (audio) and m (video) descriptions, and each stream has its own m description, so different encodings can be supported.

PlanB and UnifiedPlan are two different SDP negotiation styles that WebRTC uses in scenarios with multiple media sources. In terms of Stream and Track: a Stream may contain an AudioTrack and a VideoTrack, and with multiple Streams there are more Tracks. If each Track corresponds to its own m description, this is UnifiedPlan. If one m line describes multiple Tracks (track ids), this is PlanB.

Note: when there is only one audio stream and one video stream, the formats of Plan B and UnifiedPlan are compatible.
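The difference can be made concrete with a rough heuristic, sketched below. It assumes that every sent track is announced with an a=ssrc ... msid:<stream id> <track id> line, as in the listings earlier in this article, and it treats an m section that announces more than one distinct track as PlanB style. The function name and both sample SDP fragments are invented for this illustration, and real SDPs may need a more careful check.

```python
def looks_like_plan_b(sdp: str) -> bool:
    """Heuristic: True if any m section announces more than one msid track."""
    tracks_per_section = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            tracks_per_section.append(set())
        elif line.startswith("a=ssrc:") and " msid:" in line and tracks_per_section:
            # Keep "<stream id> <track id>"; an rtx SSRC repeats the same pair.
            tracks_per_section[-1].add(line.split(" msid:", 1)[1])
    return any(len(tracks) > 1 for tracks in tracks_per_section)


# Hypothetical fragments: two video tracks in one m section vs. one per section.
plan_b_sdp = """m=video 9 UDP/TLS/RTP/SAVPF 96
a=ssrc:1001 msid:streamA trackA
a=ssrc:2002 msid:streamB trackB
"""
unified_sdp = """m=video 9 UDP/TLS/RTP/SAVPF 96
a=mid:0
a=ssrc:1001 msid:streamA trackA
m=video 9 UDP/TLS/RTP/SAVPF 96
a=mid:1
a=ssrc:2002 msid:streamB trackB
"""
print(looks_like_plan_b(plan_b_sdp), looks_like_plan_b(unified_sdp))
```

With a single audio track and a single video track the function returns False for both styles, which matches the note above that the two formats are compatible in that case.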

Remark: Chrome supported PlanB in its early days, and recent versions also support UnifiedPlan. See "Need to implement WebRTC "UnifiedPlan" for multistream".

PlanB is illustrated in the following figure:

UnifiedPlan is illustrated in the following figure:

Candidate

A Candidate is a candidate transport address. The client generates multiple Candidates, such as host, relay, UDP and TCP, as shown below:

sdpMid: audio, sdpMLineIndex: 0, candidate:2213672593 1 udp 2122260223 30.2.228.19 51068 typ host
sdpMid: video, sdpMLineIndex: 1, candidate:2213672593 1 udp 2122260223 30.2.228.19 55061 typ host
sdpMid: audio, sdpMLineIndex: 0, candidate:3446803041 1 tcp 1518280447 30.2.228.19 9 typ host
sdpMid: video, sdpMLineIndex: 1, candidate:3446803041 1 tcp 1518280447 30.2.228.19 9 typ host
sdpMid: video, sdpMLineIndex: 1, candidate:150963819 1 udp 41885439 182.92.80.26 54400 typ relay raddr 42.120.74.91 rport 37714
sdpMid: audio, sdpMLineIndex: 0, candidate:150963819 1 udp 41885439 182.92.80.26 59241 typ relay raddr 42.120.74.91 rport 49618

Remark: trailing attributes such as generation 0 ufrag kce9 network-id 1 network-cost 10 have been removed from the listing. They are part of the Candidate description and relate to connectivity checking.

The client generated 6 Candidates: 3 for audio and 3 for video, 2 TCP and 4 UDP, 4 host and 2 relay. The other party also has its own Candidates. The local Candidates are then paired with the remote Candidates and checked for connectivity (ICE Connectivity Checks), and a pair that works forms a CandidatePair, that is, the transmission channel. Candidates also carry network properties. For example, network-cost is used in ICE Connectivity Checks.

Remark: besides host and relay, there are also the srflx and prflx Candidate types. Their definitions and differences will be described later in ICE-related technical articles.

Remark: a detailed analysis of ICE Connectivity Checks, which involves the STUN protocol, will follow later. The ICE-related information in SDP is summarized below.

Both SDP and Candidates are exchanged through signaling. Suppose the other party offers only a relay Candidate, for example:

sdpMid: audio, sdpMLineIndex: 0, candidate:150963819 1 udp 41885439 182.92.80.26 51542 typ relay raddr 42.120.74.91 rport 56380

In this case, the CandidatePair that finally connects must be Relay to Relay, as shown in the following figure:

The figure shows the send and receive bitrates of this transmission channel, the number of packets, the RTT, the packet loss rate and other information.

In fact, since our client also has a Candidate of type host, it will try to connect directly to the other party's relay using the Candidate of host:

sdpMid: audio, sdpMLineIndex: 0, candidate:2213672593 1 udp 2122260223 30.2.228.19 51068 typ host
Statistics Conn-audio-1-1
googActiveConnection false

Of course, this CandidatePair is not available because there is no connectivity.

Remark: WebRTC has the ability to switch between multiple Candidate, which we will analyze in ICE Connectivity Checks.

The Candidates above include two relay Candidates, one for audio and one for video, so why is only the audio one used? This is what BUNDLE, described next, takes care of.

Bundle and RTCP-MUX

During transmission, media channels can be multiplexed in two ways: audio and video can share a channel, and RTCP and RTP can share a channel.

RTCP and RTP multiplexing means that the sender uses one transport channel (a single port) to send both RTP and RTCP:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=rtcp-mux
m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=rtcp-mux

At this point, the Receiver must be ready to receive RTCP data on the RTP port and need to reserve some resources, such as RTCP bandwidth.

When audio and video are multiplexed, only one Candidate is used in the end. Take the client's own SDP Offer and its two relay Candidates:

a=group:BUNDLE audio video
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=mid:audio
m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=mid:video

sdpMid: video, sdpMLineIndex: 1, candidate:150963819 1 udp 41885439 182.92.80.26 54400 typ relay raddr 42.120.74.91 rport 37714
sdpMid: audio, sdpMLineIndex: 0, candidate:150963819 1 udp 41885439 182.92.80.26 59241 typ relay raddr 42.120.74.91 rport 49618

Although audio and video each have their own Candidate here, if the other party also uses BUNDLE, only one Candidate is used in the end. For example, if the other party's Answer is:

a=group:BUNDLE audio video
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=mid:audio
m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=mid:video

sdpMid: audio, sdpMLineIndex: 0, candidate:150963819 1 udp 41885439 182.92.80.26 51542 typ relay raddr 42.120.74.91 rport 56380

In the end, they will only be transmitted using one Candidate. As shown in the following figure:

rtcp-mux multiplexes RTP and RTCP onto a single port, which simplifies NAT traversal. BUNDLE multiplexes multiple media streams onto the same port, which not only simplifies the ICE-related parts of SDP, such as candidate harvesting, but also further simplifies NAT traversal.

rtcp-mux is an important SDP attribute for RTC transport. The rules for negotiating it in SDP are as follows:

If the Offer carries the rtcp-mux attribute and the answerer wants to multiplex RTP and RTCP on a single port, the Answer must also carry this attribute.

If the Offer does not carry the rtcp-mux attribute, the Answer must not carry it either, and the answerer must not multiplex RTP and RTCP on a single port.

rtcp-mux must be negotiated and used in both directions.
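These rules reduce to a simple decision on the answering side, sketched below. The function negotiate_rtcp_mux and its return shape are assumptions made for this illustration. It takes the lines of one offered media section and reports whether the Answer should carry a=rtcp-mux and which transport channels then have to be set up.

```python
def negotiate_rtcp_mux(offer_section, answerer_wants_mux=True):
    """Decide whether the Answer carries a=rtcp-mux and which channels are needed."""
    offered = "a=rtcp-mux" in offer_section
    # The Answer may carry rtcp-mux only if the Offer carried it.
    use_mux = offered and answerer_wants_mux
    channels = ["rtp+rtcp"] if use_mux else ["rtp", "rtcp"]
    return use_mux, channels


with_mux = ["m=audio 9 UDP/TLS/RTP/SAVPF 111", "a=rtcp-mux"]
without_mux = ["m=audio 9 UDP/TLS/RTP/SAVPF 111"]

print(negotiate_rtcp_mux(with_mux))     # one shared channel
print(negotiate_rtcp_mux(without_mux))  # separate RTP and RTCP channels
```

When the attribute is missing from the Offer, the answerer falls back to two channels, and each of them has to be established before media can flow.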

For instance, suppose a client subscribes to a stream and its Offer does not carry the rtcp-mux attribute. The server then assumes that the client does not support rtcp-mux and does not follow the multiplexing process. Instead, it creates two transport channels, one for RTP and one for RTCP. Only when ICE and DTLS have succeeded on both channels does the server consider the transport for this subscription established, and only then does it send streams to the client.

Just imagine: if the Offer omits the rtcp-mux attribute through carelessness, you will wait forever for the server to become ready. SDP looks like nothing more than simple text, but only after falling into a few such pits in real projects do we gain a deeper understanding of what the SDP attributes mean and how they take effect in RTC scenarios.

Remark: for more detailed negotiation details of rtcp-mux, please refer to RFC 8035.

Remark: for more information on how to distinguish rtp from rtcp through header fields in rtcp-mux scenarios, please refer to RFC 5761.

ICE Connectivity

Here we will only explain the information related to ICE Connectivity Checks in SDP, and we will analyze the specific process separately in other articles.

Information related to ICE in SDP includes:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=ice-ufrag:kce9
a=ice-pwd:M31WxfrwmrFvPws4+tPdbsCE
a=ice-options:trickle
m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=ice-ufrag:kce9
a=ice-pwd:M31WxfrwmrFvPws4+tPdbsCE
a=ice-options:trickle

ufrag and pwd are the username fragment and password used by the ICE short-term credential mechanism. trickle indicates that the SDP does not contain candidate information and that Candidates are exchanged separately through signaling, so that connectivity checks and candidate harvesting can run in parallel and session establishment is faster.
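Pulling these three attributes out of a media section is straightforward. The helper below, ice_parameters, is a hypothetical name used only for this sketch, and the sample input is the audio section shown above.

```python
def ice_parameters(section_lines):
    """Collect the ICE-related attributes of one media section."""
    params = {"ufrag": None, "pwd": None, "options": []}
    for line in section_lines:
        if line.startswith("a=ice-ufrag:"):
            params["ufrag"] = line.split(":", 1)[1]
        elif line.startswith("a=ice-pwd:"):
            params["pwd"] = line.split(":", 1)[1]
        elif line.startswith("a=ice-options:"):
            # Options are a space-separated list, for example "trickle".
            params["options"] = line.split(":", 1)[1].split()
    return params


audio = [
    "m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126",
    "a=ice-ufrag:kce9",
    "a=ice-pwd:M31WxfrwmrFvPws4+tPdbsCE",
    "a=ice-options:trickle",
]
ice = ice_parameters(audio)
print(ice["ufrag"], "trickle" in ice["options"])
```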

DTLS

Here we will only explain the information about DTLS in SDP, and the specific DTLS handshake process will be analyzed separately in DTLS-related technical articles.

Information related to DTLS in SDP includes:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=fingerprint:sha-256 B0:A2:B3:AB:0B:A3:44:22:B1:C8:69:52:ED:04:E8:5A:A4:C3:7A:A6:55:F3:BA:76:62:26:4B:F7:9F:DD:F1:BD
a=setup:actpass
m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=fingerprint:sha-256 B0:A2:B3:AB:0B:A3:44:22:B1:C8:69:52:ED:04:E8:5A:A4:C3:7A:A6:55:F3:BA:76:62:26:4B:F7:9F:DD:F1:BD
a=setup:actpass

fingerprint is the hash of the certificate used in the DTLS handshake. It lets each side verify that the certificate presented by the client or the server has not been tampered with.

setup describes the DTLS role, that is, who is the DTLS Client (active) and who is the DTLS Server (passive). A side that can take either role uses actpass. Here we offer actpass, so it is up to the other party to decide the final DTLS roles in its Answer:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=fingerprint:sha-256 B1:FD:D6:2D:94:4E:33:A1:8C:9D:EF:ED:EB:AC:CC:2D:E2:37:15:9B:24:8C:BF:F2:7D:6A:B3:81:23:AA:13:54
a=setup:active

If the other party is active, that is, DTLS Client, then you can only be DTLS Server, and the other party will initiate DTLS ClientHello to start the DTLS process.
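The role decision can be written down as a small lookup table. This is a sketch under the assumption that only active, passive and actpass are used. The function name dtls_roles and the returned (offerer role, answerer role) tuple are choices made for this example.

```python
def dtls_roles(offer_setup: str, answer_setup: str):
    """Return (offerer_role, answerer_role), each 'client' or 'server'."""
    valid = {
        ("actpass", "active"): ("server", "client"),   # the answerer sends ClientHello
        ("actpass", "passive"): ("client", "server"),
        ("active", "passive"): ("client", "server"),
        ("passive", "active"): ("server", "client"),
    }
    try:
        return valid[(offer_setup, answer_setup)]
    except KeyError:
        raise ValueError(
            f"invalid setup combination: offer={offer_setup}, answer={answer_setup}"
        ) from None


# The case from the text: we offered actpass and the Answer says active.
print(dtls_roles("actpass", "active"))
```

An Answer must settle on a concrete role, so combinations such as actpass answered with actpass are rejected here.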

Stream Direction

There are four directions of media streams, namely, sendonly, recvonly, sendrecv, and inactive, which can appear in both session-level and media-level descriptions.

sendonly means that data is only sent. For example, a client that pushes a stream to an SFU carries the sendonly attribute in its own Offer (or Answer).

recvonly means that data is only received. For example, a client that subscribes to a stream from an SFU carries the recvonly attribute in its own Offer (or Answer).

sendrecv means that data can flow in both directions. For example, a client that joins a video conference to publish its own stream and subscribe to other people's streams needs to carry the sendrecv attribute in its own Offer (or Answer).

inactive means that no data is sent. For example, in an RTP-based video conference, if the host temporarily mutes user A, the media-level description of user A's audio should carry the inactive attribute, indicating that audio data can no longer be sent.

NOTE (RFC 4566): the sendonly and recvonly attributes apply only to the media, not to the associated media control protocol. For example, in an RTP-based media session, RTCP packets are still sent in recvonly mode, and RTCP packets are still received and processed normally in sendonly mode.

The four attributes of the media stream direction are very important, which should be carefully checked when assembling the SDP to ensure the correctness of the stream direction.

For example, take a client that subscribes to a stream. If the client's Offer carries sendonly instead of recvonly, then even though the signaling-level semantics are indeed those of a subscription, a server that validates SDP attributes comprehensively and strictly (as it should) will not send media streams to the client in this scenario, and the Answer returned by the server may not carry any SSRC at all.
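The way a strict answerer derives its own direction from the offered one can be sketched as below. The function answer_direction and its want_send / want_recv parameters are assumptions made for this illustration. The rule it encodes is that the answerer may send only if the offerer is willing to receive, and may receive only if the offerer is willing to send.

```python
def answer_direction(offer_direction: str, want_send: bool, want_recv: bool) -> str:
    """Pick the answerer's direction attribute for one media section."""
    offerer_sends = offer_direction in ("sendonly", "sendrecv")
    offerer_recvs = offer_direction in ("recvonly", "sendrecv")
    send = want_send and offerer_recvs   # we may send only if the offerer receives
    recv = want_recv and offerer_sends   # we may receive only if the offerer sends
    if send and recv:
        return "sendrecv"
    if send:
        return "sendonly"
    if recv:
        return "recvonly"
    return "inactive"


# A subscribing client that correctly offers recvonly gets a sending server:
print(answer_direction("recvonly", want_send=True, want_recv=False))
# The mistake described above: the client offers sendonly, so the server cannot send.
print(answer_direction("sendonly", want_send=True, want_recv=False))
```

In the second call the server ends up with nothing it is allowed to send, which matches the symptom described above: no media and possibly no SSRC in the Answer.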

RTCP Feedback

Next, let's talk about rtcp-fb, a media-level SDP attribute that tells us which RTCP feedback messages the media session supports. It is an important SDP attribute related to QoS.

m=video 9 UDP/TLS/RTP/SAVPF 96
a=mid:video
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack pli

As the SDP above shows, this is the m description of a video stream with VP8 encoding and payload type 96. The last three rtcp-fb attributes show that the session supports TWCC (transport-cc) for network congestion control, nack handling as used in ARQ to retransmit lost RTP packets, and FIR and PLI requests for key frames, that is, it can send key frames on request.

When I was working with SIP, I fell into a pit: after sending a PLI request to a certain type of SIP device, I did not receive a key frame. After a lot of trouble, I finally found that the rtcp-fb description of this device was as follows:

m=video 16402 RTP/AVP 34
a=rtpmap:34 H263/90000
a=fmtp:34 CIF4=1;CIF=1;QCIF=1;SQCIF=1
a=sendrecv
a=rtcp-fb:* ccm tmmbr
a=rtcp-fb:* ccm fir

In other words, the device supports only FIR requests and cannot handle PLI requests (PS: why didn't I check the rtcp feedback in the SDP earlier? Tearful eyes). I would like to emphasize this again: for professional and rigorous systems or devices, SDP fully reflects the capabilities they have and also lets us discover the capabilities they lack. Every SDP attribute exists for a reason and cannot be ignored.

Note: rtcp-fb cannot be used in session-level descriptions, only media-level descriptions, and the proto field of its M description must specify AVPF.

Note: there is also the form a=rtcp-fb:* ccm fir, where the asterisk is a wildcard, indicating that all media codecs under this m description support FIR processing and key frame feedback.
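Checking a peer's feedback capabilities before relying on them can be sketched as follows. The helper rtcp_feedback is a hypothetical name for this example. It takes the lines of one media section, expands the * wildcard to every payload type on the m line, and returns the feedback types per payload type. The sample input is the SIP device description shown above.

```python
def rtcp_feedback(media_lines):
    """Map each payload type of one media section to its set of rtcp-fb values."""
    fmts = media_lines[0].split()[3:]          # payload types listed on the m= line
    feedback = {pt: set() for pt in fmts}
    for line in media_lines[1:]:
        if not line.startswith("a=rtcp-fb:"):
            continue
        pt, fb = line[len("a=rtcp-fb:"):].split(" ", 1)
        targets = fmts if pt == "*" else [pt]  # "*" applies to every payload type
        for target in targets:
            if target in feedback:
                feedback[target].add(fb.strip())
    return feedback


sip_device = [
    "m=video 16402 RTP/AVP 34",
    "a=rtpmap:34 H263/90000",
    "a=sendrecv",
    "a=rtcp-fb:* ccm tmmbr",
    "a=rtcp-fb:* ccm fir",
]
fb = rtcp_feedback(sip_device)
print("FIR supported:", "ccm fir" in fb["34"])
print("PLI supported:", "nack pli" in fb["34"])
```

For this device the check reports FIR but not PLI, so a key frame should be requested with FIR rather than PLI.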

Compare with SIP SDP

The difference of SDP description between RTC scenario and SIP scenario is shown in three aspects: transmission, media and signaling.

Transmission Level

Connection establishment. In RTC scenarios, the audio / video media connection is generally established through ICE + DTLS. SIP scenarios have no such process, so there are no ICE/DTLS-related SDP attributes such as ufrag, pwd, setup and fingerprint.

Port reuse. In RTC scenarios, audio and video streams as well as RTP/RTCP are multiplexed onto a single port. Streams are distinguished by SSRC, and RTP is distinguished from RTCP by a field value in the packet header. In SIP scenarios, ports are not multiplexed, so there is no rtcp-mux attribute and no grouping attribute such as BUNDLE. Audio and video RTP and RTCP each use an independent port, four ports in total, so the port itself distinguishes streams and RTP from RTCP. Therefore, there is no SSRC attribute either.

Link detection. In RTC scenarios, the STUN checks of ICE are generally used to discover the peer's NAT-mapped egress address, which is called srflx. In SIP scenarios, you need to implement peer address discovery yourself to obtain the NAT-mapped egress address of SIP devices.

Address information. In RTC scenarios, peer address information is exchanged through the candidates in SDP, while in SIP scenarios it is exchanged through the IP address in the c line and the port in the m line.

// RTC scene
a=candidate:1 1 udp 2013266431 30.136.138 14306 typ host

// SIP scene
c=IN IP4 30.41.5.131
m=audio 2352 RTP/AVP 107 114 104 105 9 18 8 0 101 123
m=video 2374 RTP/AVP 97 126 96 34 123

Media Level

Screen sharing. In SIP scenarios, screen sharing is negotiated through the BFCP protocol, and the main stream (main) and the shared stream (slides) are distinguished by the a=content attribute. In RTC scenarios, screen sharing is negotiated through external / business signaling, and the SDP descriptions of the main stream and the shared stream are the same and are not distinguished.

Media Codec. At present, audio and video coding in RTC scenarios is generally Opus + H.264/VP8. For audio, many SIP devices do not support Opus and use older codecs such as G722, PCMA and PCMU. For video, H.264 is generally supported, but VP8 generally is not.

Signaling Level

SDP exchange. Both scenarios use the Offer/Answer model. In RTC scenarios, SDP is mainly exchanged over HTTP/TCP, and the SDP is usually carried in the HTTP body. In SIP scenarios, SDP can be exchanged over UDP/TCP/TLS, and the SDP is carried in the INVITE and 200 OK messages.

Summary

The text-based format of SDP itself is very simple. Its difficulty lies in the numerous and complicated attributes, and the meanings they take on when extended for different application scenarios (such as traditional SIP video conferencing or RTC). These SDP attributes are scattered across a large number of RFCs and drafts, so fully understanding and mastering them takes real effort. (PS: saying this always stirs up mixed feelings. WebRTC has too many RFCs and they are all interrelated. After reading them, be prepared for your eyesight to drop by 0.2.)

