A Preliminary Study of the HTTP Protocol: Historical Evolution and Design Ideas

2025-07-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

HTTP is not only a foundational protocol of the Internet but also essential knowledge for web development, and HTTP/2 has made it a hot technical topic again.

This article introduces the historical evolution and design ideas of the HTTP protocol.

Introduction to HTTP

HTTP is an application-layer protocol built on top of TCP/IP. It is not concerned with how data packets are transmitted; it mainly specifies the communication format between the client and the server, and it uses port 80 by default.

HTTP stands for HyperText Transfer Protocol. It is the transfer protocol used to deliver hypertext from a WWW server to a local browser. It can make browsers more efficient and reduce network transmission.

HTTP not only ensures that hypertext documents are transferred correctly and quickly, but also determines which part of a document is transferred and which part of the content is displayed first (for example, text before graphics). HTTP consists of requests and responses and follows the standard client-server model. HTTP is a stateless protocol.

-- position in the TCP/IP stack

HTTP is usually carried directly over TCP. It can also be carried over a TLS or SSL layer, in which case it becomes what we usually call HTTPS, as shown in the following figure:

The default port for HTTP is 80, and the default port for HTTPS is 443.

-- HTTP's request response model

In the HTTP protocol, the client always initiates the request and the server sends back the response. See the following figure:

This limits how HTTP can be used: the server cannot push a message to the client unless the client has first sent a request. HTTP is also stateless, so a request has no built-in relationship to the previous request from the same client.

-- Workflow

An HTTP operation is called a transaction, and its working process can be divided into four steps:

1) First, a connection is established between the client and the server. Clicking a hyperlink is enough to start HTTP's work.

2) After the connection is established, the client sends a request to the server. The request consists of the Uniform Resource Locator (URL) and the protocol version number, followed by MIME-style information that includes request modifiers, client information, and possibly content.

3) After receiving the request, the server sends back a response. The response consists of a status line, which includes the protocol version number and a success or error code, followed by MIME-style information that includes server information, entity information, and possibly content.

4) The client receives the information returned by the server and displays it on the user's screen through the browser, and then the client disconnects from the server. If an error occurs in any of the steps above, the error information is returned to the client and displayed. For users, all of this is handled by HTTP itself; they only need to click with the mouse and wait for the information to appear.
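Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PAGES` table, `build_request`, and `handle` are hypothetical names invented here, and the "server" is an in-memory function rather than a real network service.

```python
# Hypothetical in-memory "site"; a real server would read files or run code.
PAGES = {"/index.html": b"<html><body>Hello World</body></html>"}


def build_request(method, path, version="HTTP/1.0", headers=None) -> bytes:
    """Step 2: serialize the request line and headers, ending with a blank line."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in (headers or {}).items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def handle(request: bytes) -> bytes:
    """Step 3: read the request line and answer with a status line, headers, body."""
    request_line = request.split(b"\r\n", 1)[0].decode("ascii")
    method, path, _version = request_line.split(" ")
    if method != "GET":
        return b"HTTP/1.0 501 Not Implemented\r\nContent-Length: 0\r\n\r\n"
    body = PAGES.get(path)
    if body is None:
        return b"HTTP/1.0 404 Not Found\r\nContent-Length: 0\r\n\r\n"
    head = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n\r\n"
    )
    return head.encode("ascii") + body


request = build_request("GET", "/index.html", headers={"Accept": "*/*"})
response = handle(request)
```

In a real transaction the request and response bytes would travel over the TCP connection opened in step 1 and closed in step 4.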

The main features of the HTTP protocol can be summarized as follows:

1. Support client / server mode.

2. Simple and fast: when a client requests a service from the server, it only needs to send the request method and path. The commonly used request methods are GET, HEAD, and POST. Each method specifies a different kind of interaction between the client and the server. Because the HTTP protocol is simple, an HTTP server program can be small, so communication is fast.

3. Flexibility: HTTP allows the transfer of any type of data object. The type being transmitted is marked by Content-Type.

4. Connectionless: connectionless means that each connection handles only one request. After the server has processed the client's request and the client has received the response, the connection is closed. This saves transmission time.

5. Stateless: the HTTP protocol is stateless, which means the protocol keeps no memory of earlier transactions. If later processing needs earlier information, that information must be retransmitted, which may increase the amount of data sent per connection. On the other hand, the server responds faster when it does not need earlier information.

The following is the part of "Historical Evolution and Design ideas of HTTP Protocol"

I. HTTP/0.9

The earliest version was version 0.9, released in 1991. This version is extremely simple, with only one command, GET.

GET /index.html

The above command indicates that after the TCP connection (connection) is established, the client requests (request) the web page index.html from the server.

According to the protocol, the server can only respond to strings in HTML format, not in other formats.

<html><body>Hello World</body></html>

When the server has finished sending, it closes the TCP connection.

II. HTTP/1.0

2.1 Brief introduction

In May 1996, the HTTP/1.0 version was released and the content was greatly increased.

First of all, content in any format can be sent. This makes it possible for the Internet to transmit not only text, but also images, videos and binary files. This has laid the foundation for the great development of the Internet.

Secondly, in addition to the GET command, the POST command and HEAD command are introduced to enrich the interaction between the browser and the server.

Third, the format of HTTP requests and responses has also changed. In addition to the data part, each communication must include header information (HTTP header) to describe some metadata.

Other new features include status code (status code), multi-character set support, multi-part transmission (multi-part type), permissions (authorization), cache (cache), content encoding (content encoding), and so on.

2.2 request format

Here is an example of a version 1.0 HTTP request.

GET / HTTP/1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)
Accept: */*

As you can see, this format is a big change from version 0.9.

The first line is the request command, and the protocol version (HTTP/1.0) must be added at the end. It is followed by several lines of header information that describe the client.

2.3 response format

The server's response is as follows.

HTTP/1.0 200 OK
Content-Type: text/plain
Content-Length: 137582
Expires: Thu, 05 Dec 1997 16:00:00 GMT
Last-Modified: Wed, 5 August 1996 15:55:28 GMT
Server: Apache 0.84

Hello World

The format of the response is "header information + a blank line (\r\n) + data". The first line is "protocol version + status code + status description".
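The "header information + blank line + data" layout can be taken apart mechanically. The following Python sketch assumes a complete response is already in memory; `parse_response` is a hypothetical helper written for this illustration, not part of any library.

```python
def parse_response(raw: bytes) -> dict:
    """Split a raw HTTP/1.0 response into status line, headers, and body."""
    # The first blank line (CRLF CRLF) separates the header block from the data.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")

    # First line: protocol version + status code + status description.
    version, code, reason = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive

    return {
        "version": version,
        "status": int(code),
        "reason": reason,
        "headers": headers,
        "body": body,
    }


raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Server: Apache 0.84\r\n"
    b"\r\n"
    b"Hello World"
)
parsed = parse_response(raw)
```

The sketch keeps only the last value when a header name repeats, which is a simplification.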

2.4 Content-Type Field

With regard to character encoding, version 1.0 stipulates that the header information must be ASCII, while the data that follows can be in any format. Therefore, when the server responds, it must tell the client what format the data is in, which is what the Content-Type field is for.

Here are some common values for the Content-Type field.

text/plain
text/html
text/css
image/jpeg
image/png
image/svg+xml
audio/mp4
video/mp4
application/javascript
application/pdf
application/zip
application/atom+xml

These data types are collectively called MIME type, and each value includes a primary type and a secondary type, separated by a slash.

In addition to predefined types, vendors can also customize types.

application/vnd.debian.binary-package

The above type indicates that a binary package for the Debian system is being sent.

MIME type can also use a semicolon at the end to add parameters.

Content-Type: text/html; charset=utf-8

The above type indicates that an HTML page is being sent and that it is encoded in UTF-8.
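As a rough illustration of the "primary type / secondary type; parameters" structure, a Content-Type value can be split as follows. `parse_content_type` is a hypothetical helper, and the sketch deliberately ignores edge cases such as quoted parameter values that contain semicolons.

```python
def parse_content_type(value: str):
    """Return (primary type, secondary type, parameters) for a Content-Type value."""
    # Everything before the first ";" is the MIME type; the rest are parameters.
    media_type, *params = [part.strip() for part in value.split(";")]
    primary, _, secondary = media_type.lower().partition("/")

    parameters = {}
    for param in params:
        if not param:
            continue  # tolerate a trailing semicolon
        key, _, val = param.partition("=")
        parameters[key.strip().lower()] = val.strip().strip('"')
    return primary, secondary, parameters


result = parse_content_type("text/html; charset=utf-8")
vendor = parse_content_type("application/vnd.debian.binary-package")
```

Here `result` separates the page type from its `charset` parameter, while `vendor` shows a vendor-defined type with no parameters.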

When making a request, the client can use the Accept field to declare which data formats it can accept.

Accept: */*

In the above code, the client declares that it can accept data in any format.

MIME type is used not only in the HTTP protocol, but also in other places, such as HTML pages.

2.5 Content-Encoding field

Because the data sent can be in any format, it can be compressed and then sent. The Content-Encoding field describes how the data is compressed.

Content-Encoding: gzip
Content-Encoding: compress
Content-Encoding: deflate
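A minimal Python sketch of how a receiver might undo the compression named in Content-Encoding, using only the standard `gzip` and `zlib` modules. The helper names `encode_body` and `decode_body` are hypothetical. The `compress` (LZW) coding is left out because the Python standard library has no codec for it.

```python
import gzip
import zlib


def encode_body(body: bytes, encoding: str) -> bytes:
    """Compress a body the way a server might before sending it."""
    if encoding == "gzip":
        return gzip.compress(body)
    if encoding == "deflate":
        # In HTTP, "deflate" means the zlib container around DEFLATE data.
        return zlib.compress(body)
    raise ValueError(f"unsupported encoding: {encoding}")


def decode_body(body: bytes, headers: dict) -> bytes:
    """Undo the compression announced in the Content-Encoding header."""
    encoding = headers.get("Content-Encoding", "identity")
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        return zlib.decompress(body)
    if encoding == "identity":
        return body  # no compression applied
    raise ValueError(f"unsupported encoding: {encoding}")


payload = b"Hello World " * 100
compressed = encode_body(payload, "gzip")
restored = decode_body(compressed, {"Content-Encoding": "gzip"})
```

Repetitive payloads such as the one above shrink considerably, which is why text resources are commonly compressed.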

When requesting, the client uses the Accept-Encoding field to indicate which compression methods it can accept.

Accept-Encoding: gzip, deflate

2.6 disadvantages

The main disadvantage of the HTTP/1.0 version is that only one request can be sent per TCP connection. When the data is sent, the connection is closed, and if additional resources are requested, a new connection must be created.

New TCP connections are expensive because they require a three-way handshake between the client and the server, and the initial sending rate is low (slow start). As a result, the performance of HTTP version 1.0 is poor. As web pages load more and more external resources, this problem becomes more and more prominent.

To solve this problem, some browsers use a non-standard Connection field when requesting.

Connection: keep-alive

This field asks the server not to close the TCP connection so that it can be reused by other requests. The server responds with the same field.

Connection: keep-alive

A reusable TCP connection is established until the client or server actively closes the connection. However, this is not a standard field, and the behavior of different implementations may be inconsistent, so it is not the fundamental solution.

III. HTTP/1.1

In January 1997, the HTTP/1.1 version was released, only half a year after version 1.0. It further refined the HTTP protocol and was still in widespread use some 20 years later.

3.1 persistent connection

The biggest change in version 1.1 is the introduction of persistent connections (persistent connection), that is, TCP connections are not closed by default and can be reused by multiple requests without declaring Connection: keep-alive.

When the client and server find that each other is not active for a period of time, they can actively close the connection. However, the standard practice is that the client sends Connection: close on the last request, explicitly asking the server to close the TCP connection.

Connection: close

Currently, for the same domain name, most browsers allow six persistent connections to be established at the same time.

3.2 Pipeline mechanism

Version 1.1 also introduces pipelining, in which the client can send multiple requests within the same TCP connection without waiting for each response. This further improves the efficiency of the HTTP protocol.

For example, suppose the client needs to request two resources. Previously, within the same TCP connection, it would send request A, wait for the server's response, and only then send request B. Pipelining allows the browser to send request A and request B together, but the server still responds to A first and responds to B only after A is complete.

3.3 Content-Length field

A TCP connection can now carry multiple responses, so there must be a mechanism to tell which response each piece of data belongs to. This is what the Content-Length field does: it declares the data length of the response.

Content-Length: 3495

The above code tells the browser that the length of this response is 3495 bytes, and the following bytes belong to the next response.
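To make the role of Content-Length concrete, the sketch below separates two responses that arrive back to back on one connection. It is a simplified illustration: `split_responses` is a hypothetical helper, the whole stream is assumed to be in memory already, and a missing Content-Length is treated as a zero-length body.

```python
def split_responses(stream: bytes) -> list:
    """Cut a byte stream into (header block, body) pairs using Content-Length."""
    responses = []
    while stream:
        head, separator, rest = stream.partition(b"\r\n\r\n")
        if not separator:
            raise ValueError("incomplete header block")

        length = 0
        for line in head.split(b"\r\n")[1:]:  # skip the status line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip().decode("ascii"))

        if len(rest) < length:
            raise ValueError("body is shorter than Content-Length")

        # Exactly `length` bytes belong to this response; the rest is the next one.
        responses.append((head, rest[:length]))
        stream = rest[length:]
    return responses


stream = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nfirst"
    b"HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nsecond"
)
bodies = [body for _, body in split_responses(stream)]
```

Without the declared lengths there would be no way to know that `first` ends where the next status line begins.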

In version 1.0, the Content-Length field is not required, because when the browser sees that the server has closed the TCP connection, it knows that all the data has been received.

3.4 Chunked transfer encoding

The prerequisite for using the Content-Length field is that the server must know the data length of the response before sending it.

For some time-consuming dynamic operations, this means that the server has to wait until all operations are complete before sending any data, which is obviously inefficient. A better approach is to send each piece of data as soon as it is generated, using a "stream" instead of a "buffer".

For this reason, version 1.1 allows chunked transfer encoding to be used instead of the Content-Length field. If the header information of a request or response contains the Transfer-Encoding field below, the body will consist of an undetermined number of chunks.

Transfer-Encoding: chunked

Before each non-empty block, there is a hexadecimal value indicating the length of the block. Finally, there is a block of size 0, which means that the data for this response has been sent. Here is an example.
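Before looking at a raw message, the framing just described can be sketched in Python. `encode_chunked` and `decode_chunked` are hypothetical helpers written for this illustration; the decoder assumes the complete body is already in memory and ignores trailers.

```python
def encode_chunked(pieces) -> bytes:
    """Frame each non-empty piece as '<hex length>CRLF<data>CRLF', then a 0 chunk."""
    out = b""
    for piece in pieces:
        if piece:
            out += f"{len(piece):X}".encode("ascii") + b"\r\n" + piece + b"\r\n"
    return out + b"0\r\n\r\n"  # the zero-size chunk marks the end of the body


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the original body from a chunked byte string."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line is hexadecimal; drop any ';extension' that follows it.
        size = int(data[pos:line_end].split(b";")[0].decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            return body
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


pieces = [
    b"This is the data in the first chunk\r\n",
    b"and this is the second one\r\n",
    b"con",
    b"sequence",
]
wire = encode_chunked(pieces)
decoded = decode_chunked(wire)
```

The first piece is 37 bytes long including its line ending, so its size line reads `25` in hexadecimal.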

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk

1C
and this is the second one

3
con

8
sequence

0

3.5 other features

Version 1.1 also adds several new methods, including PUT, PATCH, OPTIONS, and DELETE.

In addition, the Host field is added to the header information requested by the client, which is used to specify the domain name of the server.

Host: www.example.com

With the Host field, requests can be sent to different Web sites on the same server, laying the foundation for the rise of virtual hosting.
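The virtual hosting idea can be illustrated with a small lookup keyed on the Host field. The site table, directory paths, and `document_root` helper below are all hypothetical and exist only for this sketch; IPv6 literals in the Host value are not handled.

```python
# Hypothetical mapping from host name to the directory that serves it.
SITES = {
    "www.example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}
DEFAULT_ROOT = "/var/www/default"  # assumed fallback for unknown hosts


def document_root(request: bytes) -> str:
    """Pick a site directory based on the Host header of an HTTP/1.1 request."""
    head = request.split(b"\r\n\r\n", 1)[0].decode("ascii")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower().split(":")[0]  # drop an optional port
            return SITES.get(host, DEFAULT_ROOT)
    raise ValueError("no Host header found")


main_site = document_root(b"GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n")
blog_site = document_root(b"GET / HTTP/1.1\r\nHost: blog.example.com:8080\r\n\r\n")
```

Both requests could arrive at the same IP address and port; only the Host value tells the server which site is meant.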

3.6 shortcomings

Although version 1.1 allows TCP connections to be reused, all data communication within the same TCP connection takes place sequentially. The server does not start on the next response until it has finished the current one. If one response is particularly slow, many requests end up waiting in line behind it. This is called head-of-line blocking.

There are only two ways to avoid this problem: reduce the number of requests, or open more persistent connections at the same time. This has led to many page optimization techniques, such as merging scripts and stylesheets, embedding images in CSS code, and domain sharding. This extra work could have been avoided if the HTTP protocol had been better designed.

IV. SPDY protocol

In 2009, Google released the self-developed SPDY protocol, which mainly solves the problem of low efficiency of HTTP/1.1.

After the protocol proved feasible in the Chrome browser, it was used as the basis for HTTP/2, which inherited its main features.

V. HTTP/2

In 2015, HTTP/2 was released. It is not called HTTP/2.0 because the standards committee does not plan to issue minor versions; the next new version will be HTTP/3.

5.1 binary protocol

In HTTP/1.1, the header information must be text (ASCII encoded), while the data body can be text or binary. HTTP/2 is a fully binary protocol: both the header information and the data body are binary, and both are called "frames", namely header frames and data frames.

One benefit of a binary protocol is that additional frame types can be defined. HTTP/2 defines about ten frame types, laying the foundation for future advanced applications. Implementing the same functionality with text would make parsing the data very cumbersome, whereas binary parsing is much more convenient.
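As an illustration of why binary parsing is convenient, every HTTP/2 frame begins with a fixed 9-byte header: a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a reserved bit followed by a 31-bit stream identifier. The sketch below packs and unpacks that header with Python's `struct` module. The function names are hypothetical, and the sketch does no validation of flags or payload sizes.

```python
import struct

# Frame type codes defined by the HTTP/2 specification.
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


def pack_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Prefix a payload with the 9-byte frame header."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,   # 24-bit length split into 8 + 16 bits
        frame_type,
        flags,
        stream_id & 0x7FFFFFFF,          # top bit is reserved and sent as 0
    )
    return header + payload


def unpack_frame_header(data: bytes):
    """Return (payload length, type name, flags, stream id) from the first 9 bytes."""
    high, low, frame_type, flags, stream_id = struct.unpack(">BHBBI", data[:9])
    return (
        (high << 16) | low,
        FRAME_TYPES.get(frame_type, "UNKNOWN"),
        flags,
        stream_id & 0x7FFFFFFF,
    )


frame = pack_frame(0x0, 0x1, 1, b"hello")
header_fields = unpack_frame_header(frame)
```

A fixed-size header like this can be read with a single `struct.unpack` call, with no scanning for line endings or delimiters.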

5.2 Multiplexing

HTTP/2 reuses TCP connections. Within one connection, both the client and the server can send multiple requests or responses at the same time, and they do not have to correspond one by one in order, which avoids head-of-line blocking.

For example, in a TCP connection, the server receives both request A and request B. It starts responding to request A and finds that processing is very time-consuming, so it sends the part of response A that is already done, then responds to request B, and sends the rest of response A once B is finished.

Such two-way, real-time communication is called Multiplexing.

5.3 data flow

Because the packets of HTTP/2 are sent out of order, successive packets in the same connection may belong to different responses. Therefore, the packet must be marked to indicate which response it belongs to.

HTTP/2 calls all the packets of a given request or response a data stream (stream). Each data stream has a unique number. When a packet is sent, it must be marked with the data stream ID to show which stream it belongs to. In addition, data streams initiated by the client have odd IDs, and data streams initiated by the server have even IDs.
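The effect of tagging packets with a stream ID can be sketched as follows. This is a simplified model, not a real HTTP/2 implementation: frames are represented as hypothetical `(stream_id, data)` tuples, and `reassemble` and `initiated_by` are names invented for the illustration.

```python
from collections import defaultdict


def reassemble(frames) -> dict:
    """Group interleaved (stream_id, data) frames back into per-stream bodies."""
    streams = defaultdict(bytes)
    for stream_id, data in frames:
        streams[stream_id] += data  # frames of one stream stay in order
    return dict(streams)


def initiated_by(stream_id: int) -> str:
    """Odd IDs are client-initiated, even IDs server-initiated; 0 is the connection."""
    if stream_id == 0:
        return "connection"
    return "client" if stream_id % 2 else "server"


# Response A (stream 1) is slow, so response B (stream 3) is sent in between.
frames = [(1, b"A-part1 "), (3, b"B-complete"), (1, b"A-part2")]
bodies = reassemble(frames)
```

Because each frame names its stream, the receiver can rebuild both responses even though their packets were interleaved on the wire.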

While a data stream is in progress, both the client and the server can send a signal (an RST_STREAM frame) to cancel it. In version 1.1, the only way to cancel a data stream is to close the TCP connection. In other words, HTTP/2 can cancel a request while keeping the TCP connection open for use by other requests.

The client can also specify the priority of the data flow. The higher the priority, the sooner the server will respond.

5.4 header information compression

The HTTP protocol is stateless, so all information must be attached to every request. As a result, many request fields are repeated, such as Cookie and User-Agent: exactly the same content must be attached to each request, which wastes a lot of bandwidth and slows things down.

HTTP/2 optimizes this by introducing a header compression mechanism (header compression). On the one hand, header information is compressed before it is sent (HTTP/2 uses the HPACK format with Huffman coding for this, rather than gzip or compress). On the other hand, the client and the server both maintain a header information table: fields are stored in this table and given an index number, and when the same field appears again only the index number is sent, which increases speed.
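The index-table idea can be shown with a toy model. This is not the real HPACK format used by HTTP/2, which also has a predefined static table, size limits, and Huffman coding; the `HeaderTable` class below is a hypothetical simplification that only demonstrates replacing repeated fields with index numbers.

```python
class HeaderTable:
    """Toy header table kept in sync by a sender and a receiver."""

    def __init__(self):
        self._entries = []  # index -> (name, value)
        self._index = {}    # (name, value) -> index

    def _remember(self, field):
        self._entries.append(field)
        self._index[field] = len(self._entries) - 1

    def encode(self, headers):
        """Replace fields seen before with their index; send new ones in full."""
        encoded = []
        for field in headers:
            if field in self._index:
                encoded.append(self._index[field])
            else:
                self._remember(field)
                encoded.append(field)
        return encoded

    def decode(self, items):
        """Rebuild the header list, learning new fields along the way."""
        headers = []
        for item in items:
            if isinstance(item, int):
                headers.append(self._entries[item])
            else:
                self._remember(item)
                headers.append(item)
        return headers


headers = [("user-agent", "demo/1.0"), ("cookie", "session=abc")]
sender, receiver = HeaderTable(), HeaderTable()
first = sender.encode(headers)   # full fields on the first request
second = sender.encode(headers)  # only index numbers on the second
```

After the first request both sides hold the same table, so the second request carries two small integers instead of the full Cookie and User-Agent strings.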

5.5 Server push

HTTP/2 allows the server to actively send resources to the client without request, which is called server push (server push).

The common scenario is that the client requests a web page that contains a lot of static resources. Normally, the client must receive the web page, parse the HTML source code, find that there are static resources, and then issue a static resource request. In fact, the server can expect that after the client requests a web page, it is likely to request static resources again, so it takes the initiative to send these static resources to the client along with the web page.
