This article explains how HTTP/2 implements header compression. It is quite practical, so it is shared here as a reference.
As web pages grow more complex, the number of requests per page keeps rising, and an ever-larger share of traffic goes to headers; after the initial request, every subsequent request on the same connection still retransmits User-Agent and other unchanging information, which is wasteful. HTTP/2 therefore compresses request and response headers as well, not just the body. This compression scheme is called HPACK.
Why compress?
In HTTP/1, a request or response consists of three parts: the status line (or request line), the headers, and the message body. The body is typically compressed with gzip, or is itself an already-compressed binary (such as an image or audio), but the status line and headers are transmitted as plain text with no compression at all. As web functionality becomes more complex, the number of requests per page keeps growing; according to HTTP Archive, an average page issues hundreds of requests. More requests mean more traffic consumed by headers, and retransmitting rarely-changing content such as User-Agent and Cookie on every request is pure waste.
The following is a packet capture of a page I opened at random. As you can see, the network overhead of transmitting headers exceeds 100 KB, more than the HTML itself:
[Figure: packet capture showing total header overhead for the page]
The following shows the details of one of those requests. As you can see, to fetch 58 bytes of data, several times that much traffic was spent on header transfer:
[Figure: request details showing header traffic several times the 58-byte body]
In the HTTP/1 era, many optimization schemes were tried to reduce the traffic consumed by headers, such as merging requests and serving assets from cookie-free domains, but these solutions all introduce new problems of their own and are not discussed here.
The effect of compression
First, a picture. The Stream selected in the figure below is the request header the browser sent on its first visit to this site:
[Figure: Wireshark capture of the first request's HEADERS frame]
As the figure shows, the HEADERS frame is 206 bytes long, while the decoded headers total 451 bytes: compression cut the header size by more than half.
But is that all? One more picture. The Stream selected in the following figure is the request header issued by the browser after clicking the link on this site:
[Figure: Wireshark capture of the second request's HEADERS frame]
You can see that this time the HEADERS frame is only 49 bytes, while the decoded headers total 470 bytes. This time the compressed headers are barely 1/10 of the original size.
Why such a big gap between the two requests? Let's expand both sets of header details and compare how many bytes the same fields occupy in each transmission:
[Figure: byte counts of the same header fields across the two requests]
Comparing the two, the second request's headers are so small because most key-value pairs occupy only a single byte. In particular, headers such as User-Agent and Cookie take many bytes in the first request but only one byte in subsequent requests.
Technical principle
The following slide, taken from "HTTP/2 is here, let's optimize!" shared by Google performance expert Ilya Grigorik at the Velocity 2015 (Santa Clara) conference, neatly illustrates the principle of header compression in HTTP/2:
[Figure: HTTP/2 header compression principle, from Ilya Grigorik's slides]
In plain terms, HTTP/2 header compression requires browsers and servers to do three things:
1) maintain an identical static dictionary (Static Table), containing common header names and especially common name-value combinations;
2) maintain an identical dynamic dictionary (Dynamic Table), to which content can be added dynamically;
3) support Huffman coding based on a static Huffman table (Huffman Coding).
The static dictionary serves two purposes: 1) for fully matched header key-value pairs, such as :method: GET, a single byte can represent the whole pair; and 2) for pairs whose header name is in the table, such as cookie: xxxxxxx, a single byte can represent the name. The static dictionary in HTTP/2 looks like this:
Index  Header Name        Header Value
1      :authority
2      :method            GET
3      :method            POST
4      :path              /
5      :path              /index.html
6      :scheme            http
7      :scheme            https
8      :status            200
...
32     cookie
...
60     via
61     www-authenticate
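To make the two kinds of match concrete, here is a small JavaScript sketch. The table fragment and helper names are illustrative only, not any real HPACK library's API; a `null` entry pads position 0 so array positions line up with HPACK's 1-based indexes:

```javascript
// A fragment of the HPACK static table (indexes start at 1, hence the null).
const STATIC_TABLE = [
  null,
  [':authority', ''],
  [':method', 'GET'],
  [':method', 'POST'],
  [':path', '/'],
  [':path', '/index.html'],
  [':scheme', 'http'],
  [':scheme', 'https'],
  [':status', '200'],
];

// Full match: both name and value are in the table, so one index suffices.
function findFullMatch(name, value) {
  return STATIC_TABLE.findIndex(e => e && e[0] === name && e[1] === value);
}

// Name-only match: the index covers the name; the value is sent as a literal.
function findNameMatch(name) {
  return STATIC_TABLE.findIndex(e => e && e[0] === name);
}

console.log(findFullMatch(':method', 'GET')); // → 2
console.log(findNameMatch(':path'));          // → 4 (value still transmitted)
```

A full match collapses the whole pair to one byte, while a name-only match still has to carry the value as a (possibly Huffman-coded) literal.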
At the same time, the browser can tell the server to add cookie: xxxxxxx to the dynamic dictionary, so that subsequent occurrences of the whole key-value pair can be represented by a single byte. The server can update the other side's dynamic dictionary in the same way. Note that dynamic dictionaries are context-dependent: a separate pair of dictionaries must be maintained for each HTTP/2 connection. Using dictionaries greatly improves compression, and the static dictionary is usable from the very first request.

For content found in neither dictionary, Huffman coding can still reduce its size. HTTP/2 uses a static Huffman code table (defined in RFC 7541), which must also be built into both client and server.

Incidentally, the HTTP/1 status-line information (Method, Path, Status, etc.) is split into key-value pairs in HTTP/2 (the headers starting with a colon), so it too benefits from dictionary and Huffman compression. Also, all header names in HTTP/2 must be lowercase.
Implementation details
After understanding the basic principles of HTTP/2 header compression, let's take a look at the specific implementation details. The header key-value pair of HTTP/2 has the following situations:
1) The whole header key-value pair is in the dictionary
  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 1 |        Index (7+)         |
+---+---------------------------+
This is the simplest case: the header is represented by a single byte whose leftmost bit is fixed at 1, with the remaining seven bits storing the key-value pair's index in the static or dynamic dictionary. For example, in the figure below the index value is 2 (0000010), which the static dictionary resolves to :method: GET.
[Figure: an indexed header field, index 2, decoding to :method: GET]
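A minimal JavaScript sketch of decoding this case, assuming a stub static table with just two entries (a real table has 61):

```javascript
// Stub static table: HPACK index → [name, value].
const STATIC_TABLE = { 2: [':method', 'GET'], 7: [':scheme', 'https'] };

// Decode a case-1 (fully indexed) header field from its single byte.
function decodeIndexed(byte) {
  if ((byte & 0x80) === 0) throw new Error('not an indexed representation');
  const index = byte & 0x7f;   // low 7 bits hold the dictionary index
  return STATIC_TABLE[index];
}

console.log(decodeIndexed(0x82)); // 0x82 = 1 0000010 → [':method', 'GET']
```

A real decoder would consult the dynamic table for indexes above 61 and apply the prefix-integer rule described later for index values that overflow the 7-bit prefix.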
2) The header name is in the dictionary; update the dynamic dictionary
  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 |      Index (6+)       |
+---+---+-----------------------+
| H |     Value Length (7+)     |
+---+---------------------------+
| Value String (Length octets)  |
+-------------------------------+
In this case, one byte first represents the header name: its left two bits are fixed at 01, and the remaining six bits hold the name's index in the static or dynamic dictionary. In the next byte, the first bit H indicates whether the header value uses Huffman coding, and the remaining seven bits give the value's length L; the following L bytes are the value itself. For example, in the figure below the index value is 32 (100000), which the static dictionary resolves to cookie; the header value uses Huffman encoding (H = 1) and has length 28 (0011100). The next 28 bytes are the cookie value, which Huffman decoding recovers.
[Figure: a literal header field with indexed name (cookie) and Huffman-coded value]
When the client or server sees a header key-value pair in this format, it adds the pair to its own dynamic dictionary; subsequent transmissions of the same content then fall under the first case.
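A rough JavaScript sketch of parsing this case. It assumes the value is not Huffman-coded (H = 0) and is short enough that its length fits entirely in the 7-bit prefix; the example bytes and the header value "/api" are made up for illustration:

```javascript
// Parse a case-2 field: indexed name, literal value, incremental indexing.
// Simplification: no Huffman coding, single-byte lengths.
function parseLiteralWithIndexing(bytes) {
  if ((bytes[0] & 0xc0) !== 0x40) throw new Error('not a 01-prefixed field');
  const nameIndex = bytes[0] & 0x3f;        // low 6 bits: name's table index
  const huffman = (bytes[1] & 0x80) !== 0;  // H bit of the length byte
  const length = bytes[1] & 0x7f;           // low 7 bits: value length
  const value = Buffer.from(bytes.slice(2, 2 + length)).toString();
  return { nameIndex, huffman, length, value };
}

// 0x44 = 01 000100 → name index 4 (:path); value "/api" as a 4-byte literal
const field = parseLiteralWithIndexing([0x44, 0x04, 0x2f, 0x61, 0x70, 0x69]);
console.log(field); // { nameIndex: 4, huffman: false, length: 4, value: '/api' }
```

After parsing, a real decoder would append (:path, /api) to its dynamic table so the pair can later be sent as a single index byte.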
3) The header name is not in the dictionary; update the dynamic dictionary
  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 |           0           |
+---+---+-----------------------+
| H |     Name Length (7+)      |
+---+---------------------------+
|  Name String (Length octets)  |
+-------------------------------+
| H |     Value Length (7+)     |
+---+---------------------------+
| Value String (Length octets)  |
+-------------------------------+
This case is similar to case 2, except that since the header name is not in the dictionary, the first byte is fixed at 01000000; then come the name's Huffman flag and length followed by the name itself, and finally the value's Huffman flag and length followed by the value itself. For example, in the figure below the name length is 5 (0000101) and the value length is 6 (0000110); Huffman decoding yields pragma: no-cache.
[Figure: a literal header field with new name, decoding to pragma: no-cache]
When the client or server sees a header key-value pair in this format, it adds the pair to its own dynamic dictionary; subsequent transmissions of the same content then fall under the first case.
4) The header name is in the dictionary; updating the dynamic dictionary is not allowed
  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 0 | 0 | 1 |  Index (4+)   |
+---+---+---+---+---------------+
| H |     Value Length (7+)     |
+---+---------------------------+
| Value String (Length octets)  |
+-------------------------------+
This situation is very similar to the second case, except that the left four bits of the first byte are fixed at 0001, and only four bits are left to store the index, as shown in the following figure:
[Figure: a never-indexed literal header field with a 4-bit index prefix]
Here is another piece of knowledge: integer decoding. The first byte in the figure above is 00011111, which does not mean the header name's index is 15 (1111). With the fixed 0001 prefix removed, only four bits remain; call that count N. An N-bit prefix can directly represent only integers smaller than 2^N - 1 = 15. For larger values of I, decoding follows this rule (pseudocode adapted from RFC 7541):
decode I from the next N bits
if I < 2^N - 1, return I          # I fits entirely in the prefix
else
    M = 0
    repeat
        B = next octet            # read the next byte
        I = I + (B & 127) * 2^M   # add its low seven bits, shifted by M
        M = M + 7
    while B & 128 == 128          # high bit set: more octets follow
    return I
For the data in the figure above, this rule yields an index of 32 (00011111 00010001: 15 + 17), which represents cookie. Note that every number written as (N+) in the protocol, such as Index (4+) and Name Length (7+), is encoded and decoded according to this rule.
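The decoding rule above can be turned into a few lines of JavaScript (a sketch of RFC 7541's prefix-integer decoding; the function name is ours):

```javascript
// Decode an HPACK prefix integer (RFC 7541 §5.1) from an array of bytes.
function decodeInteger(bytes, prefixBits) {
  const max = (1 << prefixBits) - 1;   // e.g. 15 for a 4-bit prefix
  let i = bytes[0] & max;
  if (i < max) return i;               // the value fits in the prefix
  let m = 0, pos = 1, b;
  do {
    b = bytes[pos++];
    i += (b & 127) * Math.pow(2, m);   // low 7 bits carry the payload
    m += 7;
  } while ((b & 128) === 128);         // high bit set → more octets follow
  return i;
}

// 00011111 00010001 with a 4-bit prefix: 15 + 17 = 32, the cookie index
console.log(decodeInteger([0b00011111, 0b00010001], 4)); // → 32
```

The same function handles RFC 7541's worked example: encoding 1337 with a 5-bit prefix produces the octets 0x1f 0x9a 0x0a.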
Header key-value pairs in this format are not allowed to be added to the dynamic dictionary (but Huffman coding can be used). For some very sensitive headers, such as Cookie used for authentication, this can improve security.
5) The header name is not in the dictionary; updating the dynamic dictionary is not allowed
  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 0 | 0 | 1 |       0       |
+---+---+---+---+---------------+
| H |     Name Length (7+)      |
+---+---------------------------+
|  Name String (Length octets)  |
+-------------------------------+
| H |     Value Length (7+)     |
+---+---------------------------+
| Value String (Length octets)  |
+-------------------------------+
This case closely parallels case 3, except that the first byte is fixed at 00010000. It is relatively rare, so there is no screenshot; picture it from the layout above. Likewise, header key-value pairs in this format are never added to the dynamic dictionary and can only be shrunk with Huffman coding.
In fact, the protocol specifies two more formats, nearly identical to 4 and 5: take the first byte of format 4 or 5 and change its fourth bit from 1 to 0. Those mean "do not update the dynamic dictionary this time", whereas 4 and 5 mean "never allow this field to be indexed, even when re-encoded by an intermediary". The difference is small, so we skip them here.
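For reference, all these representation types can be told apart by the leading bits of the first byte alone. The sketch below also includes the 001xxxxx pattern, which RFC 7541 assigns to dynamic-table size updates (not covered in this article); the string labels are our own:

```javascript
// Classify an HPACK field representation by its first byte (RFC 7541 §6).
function representationType(firstByte) {
  if (firstByte & 0x80) return 'indexed';                      // 1xxxxxxx: case 1
  if (firstByte & 0x40) return 'literal-incremental-indexing'; // 01xxxxxx: cases 2, 3
  if (firstByte & 0x20) return 'dynamic-table-size-update';    // 001xxxxx
  if (firstByte & 0x10) return 'literal-never-indexed';        // 0001xxxx: cases 4, 5
  return 'literal-without-indexing';                           // 0000xxxx
}

console.log(representationType(0x82)); // 'indexed'
console.log(representationType(0x1f)); // 'literal-never-indexed'
console.log(representationType(0x00)); // 'literal-without-indexing'
```

A decoder loops over the header block, dispatching on this type byte before reading the indexes, lengths, and literals that follow.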
Knowing these technical details, it is in principle straightforward to write an HTTP/2 header decoder. Rather than write one from scratch, we can use compressor.js from node-http2 to verify:
var Decompressor = require('./compressor').Decompressor;
var testLog = require('bunyan').createLogger({ name: 'test' });
var decompressor = new Decompressor(testLog, 'REQUEST');

var buffer = new Buffer('820481634188353daded6ae43d3f877abdd07f66a281b0dae053fad0321aa49d13fda992a49685340c8a6adca7e28102e10fda9677b8d05707f6a62293a9d810020004015309ac2ca7f2c3415c1f53b0497ca589d34d1f43aeba0c41a4c7a98f33a69a3fdf9a68fa1d75d0620d263d4c79a68fbed00177febe58f9fbed00177b518b2d4b70ddf45abefb4005db901f1184ef034eff609cb60725034f48e1561c8469669f081678ae3eb3afba465f7cb234db9f4085aec1cd48ff86a8eb10649cbf', 'hex');
console.log(decompressor.decompress(buffer));

decompressor._table.forEach(function (row, index) {
  console.log(index + 1, row[0], row[1]);
});
The raw header data comes from the third screenshot in this article. The output is as follows (the static-dictionary portion is truncated):
{ ':method': 'GET',
  ':path': '/',
  ':authority': 'imququ.com',
  ':scheme': 'https',
  'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:41.0) Gecko/20100101 Firefox/41.0',
  accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'accept-language': 'en-US,en;q=0.5',
  'accept-encoding': 'gzip, deflate',
  cookie: 'v=47; u=6f048d6e-adc4-4910-8e69-797c399ed456',
  pragma: 'no-cache' }
1 ':authority' ''
2 ':method' 'GET'
3 ':method' 'POST'
4 ':path' '/'
5 ':path' '/index.html'
6 ':scheme' 'http'
7 ':scheme' 'https'
8 ':status' '200'
...
32 'cookie' ''
...
60 'via' ''
61 'www-authenticate' ''
62 'pragma' 'no-cache'
63 'cookie' 'u=6f048d6e-adc4-4910-8e69-797c399ed456'
64 'accept-language' 'en-US,en;q=0.5'
65 'accept' 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
66 'user-agent' 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:41.0) Gecko/20100101 Firefox/41.0'
67 ':authority' 'imququ.com'
As you can see, the header data copied from Wireshark can be decoded normally, and the dynamic dictionary has been updated (62-67).
Summary
When optimizing the performance of an HTTP/2 site, an important rule is "use as few connections as possible". The header compression described in this article is one big reason: the more requests and responses a single connection carries, the richer the accumulated dynamic dictionary, and the better the headers compress. So for HTTP/2 sites, the best practice is not to merge resources and not to shard domain names.
By default, browsers use the same connection for these situations:
resources under the same domain name;
resources under different domain names that meet two conditions: 1) they resolve to the same IP; 2) they use the same certificate.
The first point is easy to understand; the second is easy to overlook. Google has in fact done this for a long time: a whole series of Google sites share one certificate, which can be verified like this:
$ openssl s_client -connect google.com:443 | openssl x509 -noout -text | grep DNS
depth=2 C = US, O = GeoTrust Inc., CN = GeoTrust Global CA
verify error:num=20:unable to get local issuer certificate
verify return:0
DNS:*.google.com, DNS:*.android.com, DNS:*.appengine.google.com, DNS:*.cloud.google.com, DNS:*.google-analytics.com, DNS:*.google.ca, DNS:*.google.cl, DNS:*.google.co.in, DNS:*.google.co.jp, DNS:*.google.co.uk, DNS:*.google.com.ar, DNS:*.google.com.au, DNS:*.google.com.br, DNS:*.google.com.co, DNS:*.google.com.mx, DNS:*.google.com.tr, DNS:*.google.com.vn, DNS:*.google.de, DNS:*.google.es, DNS:*.google.fr, DNS:*.google.hu, DNS:*.google.it, DNS:*.google.nl, DNS:*.google.pl, DNS:*.google.pt, DNS:*.googleadapis.com, DNS:*.googleapis.cn, DNS:*.googlecommerce.com, DNS:*.googlevideo.com, DNS:*.gstatic.cn, DNS:*.gstatic.com, DNS:*.gvt1.com, DNS:*.gvt2.com, DNS:*.metric.gstatic.com, DNS:*.urchin.com, DNS:*.url.google.com, DNS:*.youtube-nocookie.com, DNS:*.youtube.com, DNS:*.youtubeeducation.com, DNS:*.ytimg.com, DNS:android.com, DNS:g.co, DNS:goo.gl, DNS:google-analytics.com, DNS:google.com, DNS:googlecommerce.com, DNS:urchin.com, DNS:youtu.be, DNS:youtube.com, DNS:youtubeeducation.com
Deploying a Web service across multiple domain names on the same IP with a shared certificate has a special benefit: clients that support HTTP/2 can coalesce everything onto a single connection, while clients that only support HTTP/1.1 establish multiple connections to gain more concurrency. This is a good interim choice until HTTP/2 is fully adopted.