What are the basics of HTTP

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article shares the basics of HTTP. The material is practical, and I hope you take something away from reading it.

What is HTTP?

HTTP stands for Hypertext Transfer Protocol. It defines how a client and a server carry out the sequence of steps from request to response. A protocol is an agreed set of rules. In short, the Web communicates on top of the HTTP protocol.

The birth of HTTP

It helps to know the history of a technology before learning it, so let's take a look at the history of HTTP.

HTTP was born in March 1989. It was proposed by Tim Berners-Lee, and the basic idea was to link documents to one another through hypertext so that they could reference each other, forming the WWW (World Wide Web), or Web for short.

HTTP 0.9 came out in 1990. At that time, HTTP had not yet been established as a formal standard.

HTTP 1.0 was officially adopted as a standard in May 1996. The protocol standard is still widely used on the server side.

HTTP 1.1 was announced in January 1997 and is the current mainstream version of the HTTP protocol.

HTTP 2.0 solicited suggestions in March 2012.

HTTP 2.0 released its first draft in September of the same year.

HTTP 2.0 was published as a standard (RFC 7540) in May 2015.

Learn about TCP/IP

Before we look at HTTP, let's take a brief look at the TCP/IP protocol family. Commonly used networks, including the Internet, operate on the basis of TCP/IP, and HTTP is one member of that family.

TCP/IP protocol family

When computers and network devices communicate with each other, both sides must follow the same method. Some rules have to be settled first: how to find the communication target, which side starts the communication, what language to use, how to end the communication, and so on. Communication between different hardware and operating systems all requires such rules, and these rules are called a protocol.

Protocols cover everything from cable specifications to the method of choosing IP addresses, the method of finding the remote party, the order in which the two parties establish communication, and the steps for displaying a Web page. The collection of these related protocols is collectively called TCP/IP.

The function of each layer of TCP/IP model

The important point of TCP/IP is layering. There are four layers: application layer, transport layer, network layer and data link layer.

[Figure: the four layers of TCP/IP]

Let's introduce the role of each layer.

Application layer: the application layer determines the communication activity involved in providing application services to users. Examples are FTP (File Transfer Protocol) and DNS (Domain Name System). The HTTP protocol is also at this layer.

Transport layer: the transport layer serves the application layer above it by providing data transfer between two computers on a network connection. There are two different protocols in this layer: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).

Network layer: the network layer handles packets as they move across the network. A packet is the smallest unit of data transmitted over a network. The function of the network layer is to select one transmission route among the many possible routes.

Link layer: used to deal with the hardware part of the network. Device drivers in the operating system, network interface cards, cables and other physical equipment all belong to this layer.

The advantage of layering is this: if the Internet were governed by a single monolithic protocol, a design change anywhere would force the whole thing to be replaced. With layers, only the layer that changed needs to be replaced. Once the interfaces between layers are fixed, the internal design of each layer can change freely. For example, an application at the application layer only has to think about its own task, without worrying about how the data is routed or delivered.

TCP/IP communication transport stream

When communicating over TCP/IP, data passes through the layers in order. On the client the data travels down from the application layer, and on the server it travels up from the link layer.

[Figure: TCP/IP communication flow]

First, the client issues an HTTP request at the application layer.

Then the transport layer receives the application-layer data, splits it into segments, marks each segment with a sequence number and port number, and forwards it to the network layer.

At the network layer, the IP address of the communication destination is added and the packet is forwarded to the link layer, where the MAC address of the next hop is added.

The server at the receiving end (also known as the server side) receives data at the link layer and sends it sequentially to the upper layer all the way to the application layer. The HTTP request sent by the client is not really received until it is transmitted to the application layer.
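
The flow above can be sketched in a few lines of Python. This is only a toy model: the bracketed header strings and the function names send_down and receive_up are invented for illustration and bear no relation to real TCP, IP, or Ethernet header layouts.

```python
# Toy model of TCP/IP encapsulation. Header contents are illustrative
# placeholders, not real protocol header formats.
LAYERS = ["TCP", "IP", "ETH"]  # transport, network, link


def send_down(http_message: bytes) -> bytes:
    """Client side: each lower layer prepends its own header."""
    frame = http_message
    for layer in LAYERS:
        frame = f"[{layer}]".encode() + frame
    return frame


def receive_up(frame: bytes) -> bytes:
    """Server side: each layer strips its header, link layer first."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


request = b"GET / HTTP/1.1\r\nHost: www.tutu.com\r\n\r\n"
frame = send_down(request)
print(frame[:14])                    # the three headers, outermost first
print(receive_up(frame) == request)  # the application layer sees the original request
```

The point of the sketch is that the application layer on either side only ever sees the HTTP message itself; everything the lower layers add is removed again before the data reaches it.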

Protocols related to HTTP

Before the HTTP client sends messages to the server, IP, TCP and DNS, which are closely related to HTTP, need to be used.

IP network protocol

IP (Internet Protocol) sits at the network layer. Its job is to deliver packets to the other party. To make sure a packet actually reaches the other party, two things are needed: the IP address and the MAC address. Think of them as a home address or a phone number.

The IP address is the address assigned to a node, and the MAC address is the fixed address that belongs to the network card. An IP address can be mapped to a MAC address. IP addresses can change, while the MAC address basically does not.

Don't confuse IP with the IP address. IP is a protocol, and the IP address identifies a computer on the network.

ARP protocol

IP communication relies on MAC addresses. The two parties communicating over a network are rarely on the same local area network; usually the data is relayed through several computers or network devices before it arrives. At each relay, the MAC address of the next relay device is used to find the next hop. This is where ARP (Address Resolution Protocol) comes in. ARP resolves addresses: given the IP address of the other party, it finds the corresponding MAC address.

While the data is in transit, no computer or router knows the complete path. Each one only knows roughly where to send the data next. This mechanism is called routing.

It works the same way as ordering something on Taobao. If you buy a dress, the courier company delivers it to your address, but not in one direct trip. The parcel goes through, say, a transfer station in Hangzhou, then a transfer station in Shenzhen, and only then reaches your hands.

TCP protocol

TCP sits at the transport layer, and its main job is to provide a reliable byte stream service. Byte stream service means that large chunks of data are split into packets called segments to make them easier to transmit. Reliable means that the data is delivered to the other party accurately and completely.

To make sure the data reaches the other party, TCP uses a three-way handshake. The following figure shows this process.

[Figure: TCP three-way handshake]

First handshake: the client sends a packet with the SYN flag to the server.

Second handshake: the server receives it and sends back a packet with the SYN/ACK flags to confirm receipt.

Third handshake: finally, the client returns a packet with the ACK flag, which completes the handshake.
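
Application code never sends SYN or ACK packets itself; the operating system performs the handshake when a program opens a TCP connection. The following minimal sketch, which assumes a loopback interface is available and uses a made-up helper name handshake_demo, shows where the handshake happens from a program's point of view.

```python
import socket


def handshake_demo() -> bytes:
    # Listening socket on an ephemeral loopback port (port 0 lets the OS choose).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    # create_connection() returns once the operating system has completed
    # the SYN -> SYN/ACK -> ACK exchange with the listening socket.
    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    conn, _addr = server.accept()
    try:
        client.sendall(b"ping")  # data can flow only after the handshake
        return conn.recv(4)
    finally:
        conn.close()
        client.close()
        server.close()


print(handshake_demo())
```

To see the individual SYN, SYN/ACK and ACK packets you would need a packet capture tool; the socket API only exposes the result.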

DNS service

The DNS service, like the HTTP protocol, is at the application layer. Its main function is to resolve domain names to IP addresses. DNS can look up the IP address for a domain name, or do a reverse lookup of the domain name from an IP address.

The relationship between each protocol and the HTTP protocol is shown below.

What are URL and URI?

URL stands for Uniform Resource Locator. It is the address you enter to access a Web site, for example http://www.tutu.com.

URI stands for Uniform Resource Identifier. Its function is to distinguish between different resources on the Internet, such as HTML documents, images, video clips, and programs. URL is a subset of URI.

URI format

The following figure shows the format of URI.

[Figure: URI format]

Protocol name: http: or https: indicates the protocol. It is case-insensitive and is followed by //.

Login information: user:pass@ represents the user name and password used to obtain the server resource. It is not recommended because it is not safe.

Server address: there are three kinds of server address:

A domain name such as www.tutu.com

An IPv4 address such as 192.168.0.1

An IPv6 address enclosed in square brackets.

Server port number: :8080 represents the port number.

File path: /html/index.html indicates the file path on the server, that is, where the resource is located.

Query string: ?userId=1 passes parameters to the resource at the file path. The ? is followed by key=value pairs, and further parameters are joined with &.

Fragment identifier: #cn1 indicates a location inside the file. This is the usual Web page anchor.
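
The parts listed above can be pulled out of a URI with Python's standard urllib.parse module. The URI below is assembled from the example fragments in this section and is not a real address.

```python
from urllib.parse import urlsplit, parse_qs

# Example URI built from the pieces described above (not a real address).
uri = "http://user:pass@www.tutu.com:8080/html/index.html?userId=1#cn1"
parts = urlsplit(uri)

print(parts.scheme)                    # protocol name
print(parts.username, parts.password)  # login information
print(parts.hostname)                  # server address
print(parts.port)                      # server port number
print(parts.path)                      # file path
print(parse_qs(parts.query))           # query string as a dict of lists
print(parts.fragment)                  # fragment identifier
```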

HTTP Foundation

HTTP is a stateless protocol that does not persist requests / responses that have been sent.

Persistent connection

In HTTP 1.1 all connections are persistent by default (keep-alive), while in HTTP 1.0 persistent connections are off by default (close). The Connection field in the request / response header shows whether the persistent connection is enabled (the values of this field are described later).

With a persistent connection, the TCP connection is kept open as long as neither the client nor the server asks to disconnect. The advantage is that the extra overhead of repeatedly establishing and closing TCP connections is avoided, which reduces the load on the server. HTTP requests and responses complete sooner, and pages display faster.
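
As a rough illustration of connection reuse, the sketch below starts a small local server with Python's http.server and sends two requests over a single http.client connection. The handler class HelloHandler and the function fetch_twice_on_one_connection are names made up for this example, and the local client port is used only as evidence that the same TCP connection served both requests.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_twice_on_one_connection():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    results = []
    try:
        for path in ("/a", "/b"):
            conn.request("GET", path)
            resp = conn.getresponse()
            body = resp.read()
            # The local port stays the same if the TCP connection was reused.
            results.append((resp.status, body, conn.sock.getsockname()[1]))
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return results


for status, body, local_port in fetch_twice_on_one_connection():
    print(status, body, local_port)
```

Sending Content-Length matters here: without it the client could not tell where the first response ends, and the connection could not be reused for the second request.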

Pipelining

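Pipelining is the ability to send the next request without waiting for the response to the previous one, so several requests can be in flight on one connection at the same time. Because requests no longer wait for responses one by one, pipelining can be faster than a persistent connection alone. The server still returns the responses in the order in which the requests were sent.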

HTTP message

There are two kinds of messages in HTTP: request messages and response messages. A message is divided into the message header and the message body, and the body is optional. A message contains the following three parts.

Start line (start line). There are two types.

Request line: request method, request URL, HTTP version

Status line: HTTP version, status code, reason phrase

Header fields (header): header information in the form key: value.

Body: the data that is sent.

[Figure: structure of an HTTP message]

This picture takes the request message as an example.
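
The three parts of a message can be separated with a few string operations. The parser below is a simplified sketch written for this article (parse_request is a made-up name); it assumes a well-formed request and ignores details such as repeated header fields.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into start line, header fields and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the header
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)  # the request line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return (method, target, version), headers, body


raw = (
    b"POST /html/index.html HTTP/1.1\r\n"
    b"Host: www.tutu.com\r\n"
    b"Content-Length: 8\r\n"
    b"\r\n"
    b"userId=1"
)
start_line, headers, body = parse_request(raw)
print(start_line)
print(headers)
print(body)
```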

Request method of HTTP

GET: get server resources.

POST: submit information to the server.

PUT: transfer files.

HEAD: same as the GET method, but only the response header is returned. It is used to check whether a URL is valid and when the resource was last updated.

DELETE: deletes the specified resource.

OPTIONS: asks the server which methods the specified resource supports.

TRACE: echoes the request back so the client can see what happened to it along the path to the server.

CONNECT: asks a proxy server to establish a tunnel to the destination.

HTTP status code

1xx informational

1xx indicates that the request has been received and is being processed.

2xx success

200 OK: indicates that the request sent by the client was processed normally on the server side.

204 No Content: indicates that the request was processed successfully, but there is no content to return.

206 Partial Content: indicates that the client requested only part of the resource with a range request, and the server successfully executed that partial GET request. The response contains the part of the entity specified by Content-Range.

3xx redirection

301 Moved Permanently: permanent redirection. Indicates that the requested resource has been assigned a new URL, and the new URL should be used from now on.

302 Found: temporary redirection. Indicates that the requested resource has been temporarily assigned a new URL.

303 See Other: indicates that the requested resource has another URL, and that it should be obtained using the GET method.

304 Not Modified: indicates that the resource was found but the condition in the request was not met, so the client can use its cached copy. The negotiation cache returns this status code.

307 Temporary Redirect: temporary redirection, similar to 302, but the request method is not changed.

When a 301, 302 or 303 response status code is returned, almost all browsers change POST to GET, delete the body of the request message, and automatically send the request again. The original HTTP 1.1 standard forbade changing POST to GET for 301 and 302, but in practice everyone does it.

4xx client error

400 Bad Request: indicates a syntax error in the request message.

401 Unauthorized: indicates that the request requires HTTP authentication. If the request already carried credentials, it means that user authentication failed.

403 Forbidden: indicates that the server refuses access to the requested resource.

404 Not Found: indicates that the requested resource cannot be found on the server.

5xx server error

500 Internal Server Error: indicates that an error occurred on the server side while executing the request.

503 Service Unavailable: indicates that the server is temporarily overloaded or is undergoing downtime maintenance.
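
Python's standard library knows the registered status codes, which makes it easy to check a code against the classes above. In this small sketch the CLASSES table and the describe function are written for this article; only HTTPStatus comes from the standard library.

```python
from http import HTTPStatus

# The first digit of a status code gives its class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    phrase = HTTPStatus(code).phrase  # standard reason phrase for the code
    return f"{code} {phrase} ({CLASSES[code // 100]})"


for code in (200, 206, 301, 304, 404, 503):
    print(describe(code))
```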

Web servers related to HTTP

When HTTP communicates, besides the client and the server there are also applications that forward the communication data, such as proxies, gateways, tunnels, and caches.

Proxy

A proxy is an application with a forwarding function that sits between the client and the server, acting as a middleman. It forwards requests from the client to the server and forwards responses from the server back to the client.

[Figure: proxy server]

Each time a request or response is forwarded through a proxy server, the Via field appears in the header.

Gateway

A gateway is a special server that acts as an intermediary for other servers. It is used to convert HTTP requests into communication over other protocols. When a gateway receives a request, it handles the request as if it were itself the origin server that holds the resource.

[Figure: gateway]

Tunnel

A tunnel establishes a communication line with another server on request, and the communication over that line is then encrypted with means such as SSL. The purpose of the tunnel is to ensure secure communication between the client and the server.

[Figure: tunnel]

Caching

A cache is a copy of a resource saved on the local disk of a proxy server or client. Caching reduces access to the source server, which saves network bandwidth and communication time.

A cache server is a type of proxy server that keeps a copy of the resource when it forwards the response returned from the source server. Its advantage is that the same resource does not have to be forwarded from the source server again and again. The client can get resources from the nearest cache server, and the source server does not have to process the same request multiple times.

Validity period of the cache

When the resource on the source server is updated, a cache that keeps serving its stored copy returns the old version from before the update.

So even when a cached copy exists, the cache may confirm the validity of the resource with the source server, depending on the client's request and the validity period of the cache. If the cached resource has expired, the cache server fetches the new resource from the source server.

Client cache

The client cache here refers to the cache in the browser. If the cached copy has not expired, the browser does not request the same resource from the source server and reads it directly from the local disk. When the copy expires, the browser confirms the validity of the resource with the source server, and if the cached resource is no longer valid it requests the resource from the source server again.

Content negotiation

Content negotiation means that the client and the server negotiate over the content of the response, and the server then provides the resource that best suits the client. Negotiation is based on criteria such as language, character set, and encoding.

The main header fields involved are:

Accept-Charset

Accept-Language

Content-Language

There are three types of content negotiation.

Server-driven negotiation (Server-driven Negotiation)

The content is chosen by the server.

Client-driven negotiation (Agent-driven Negotiation)

The content is chosen by the client.

Transparent negotiation (Transparent Negotiation)

A combination of server-driven and client-driven negotiation, in which the server and the client each take part in negotiating the content.

End-to-end header and Hop-by-hop header

HTTP header fields fall into two types, defined by how caching proxies and non-caching proxies treat them.

End-to-end header (End-to-end)

A header in this category is forwarded to the final recipient of the request or response. It must be stored in responses generated by a cache, and it must be forwarded.

Hop-by-hop header (Hop-by-hop)

Headers in this category are valid for a single hop only, and caches and proxies do not forward them. In HTTP 1.1 and later, a hop-by-hop header that is used must also be listed in the Connection header field.

The following 8 header fields are hop-by-hop; all other fields are end-to-end headers.

Connection

Keep-Alive

Proxy-Authenticate

Proxy-Authorization

Trailer

TE

Transfer-Encoding

Upgrade

HTTP general header fields

The fields listed below can appear in both request and response headers, and all of them carry important information.

Cache-Control

Cache-Control controls how a resource is cached. Its directives are optional, and multiple directives are separated by commas.

Request header

When you use the Cache-Control field in the request header, its value is as follows:

no-cache: do not use the strong cache; force revalidation with the source server to check whether the cached resource has expired (that is, use the negotiated cache).

no-store: do not use any cache; get the latest resource from the source server every time.

max-age: in seconds. If the cached resource is no older than the specified time, the client accepts the resource from the cache.

min-fresh: in seconds. Asks the proxy server to return a cached resource that will stay fresh for at least the specified time.

max-stale: accept a cached resource even if it has expired.

only-if-cached: ask the proxy server to return the resource only if it is in the cache.

no-transform: the resource must not be transformed, which prevents operations such as a cache or proxy compressing images.

Response header

When you use the Cache-Control field in the response header, its value is as follows:

public: the resource can be cached by browsers and proxy servers.

private: the resource can only be cached by the browser, not by shared caches such as proxy servers.

no-cache: the resource can be cached, but the cache must confirm with the source server that it is still valid before each use.

s-maxage: only applies to proxy servers (shared caches) and indicates how long the resource stays fresh there. When s-maxage is present, a proxy server ignores the max-age and Expires fields.

max-age: in seconds. Sets the cache time. Within that time the resource does not have to be requested from the server again; after it, the resource has expired. If the Expires field also appears in the response header, HTTP 1.1 gives priority to max-age, while HTTP 1.0 does the opposite and gives priority to Expires.

must-revalidate: the resource can be cached, but once it has expired it must be verified again with the source server. If the source server cannot be reached, a 504 status code is returned. This directive overrides max-stale.

proxy-revalidate: requires the cache server to confirm the validity of a cached response with the source server before returning it.

no-transform: the resource must not be transformed, which prevents operations such as a cache or proxy compressing images.
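
Since a Cache-Control value is just a comma-separated list of directives, some with an =value argument, it can be turned into a dictionary with very little code. The function below is a simplified sketch written for this article (parse_cache_control is a made-up name); it does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or True}."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        # Directives without an argument (e.g. no-cache) are stored as True.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else True
    return directives


print(parse_cache_control("public, max-age=3600"))
print(parse_cache_control("no-cache, no-store, must-revalidate"))
```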

Connection

The Connection field determines whether the current TCP connection is closed after the request completes. There are two values.

keep-alive: keep the connection open (persistent connection).

close: close the TCP connection as soon as the request completes.

Date

The value of the Date field is in the GMT time date format, indicating the time and date on which the HTTP message was created.

Date: Tue, 13 Apr 2021 12:35:41 GMT

Pragma

Pragma is used for backward compatibility with caching servers that only support the HTTP1.0 protocol. It has the same effect as Cache-Control.

Pragma: no-cache

Upgrade

Upgrade is used to check whether communication can switch to a later version of HTTP or to another protocol. Because Upgrade is a hop-by-hop header, it is sent together with Connection: Upgrade.

Upgrade: HTTP/2.0
Connection: Upgrade

Via

Via tracks the path taken by request and response messages between the client and the server, and helps avoid request loops.

Via: 1.0 gw.hackr.jp (Squid/3.1)
Via: 1.0 gw.hackr.jp (Squid/3.1), 1.1 al.example.com (Squid/2.7)

When the message passes through proxy server A, a string such as "1.0 gw.hackr.jp (Squid/3.1)" is added to the Via header. The leading 1.0 is the HTTP version used by the server that received the request. If the message passes through several proxy servers, each one appends its own entry.

Warning

The Warning field tells the user some warnings related to caching.

Request header field

Accept

The Accept request header tells the server which content types the client can handle. Several media types are listed below.

Text files: text/html, text/plain, text/css, application/xhtml+xml, application/xml, etc.

Picture files: image/jpeg, image/gif, image/png

Video files: video/mpeg, video/quicktime

Binary files: application/octet-stream, application/zip

When the value is */*, the client accepts any content type. The value image/* represents any image type.

To give priority to certain media types, add a weight with q= after a semicolon (;). The weight ranges from 0 to 1, with up to 3 decimal places, and 1 is the highest. When no weight is specified, the default is q=1.0.

Accept: text/html, application/json;q=0.9
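
To make the weighting rule concrete, here is a small sketch that parses an Accept value and orders the media types by q value. The function name parse_accept is made up for this article, and the parser ignores media type parameters other than q.

```python
def parse_accept(value: str) -> list:
    """Return (media_type, q) pairs ordered from most to least preferred."""
    entries = []
    for item in value.split(","):
        media_type, *params = [p.strip() for p in item.split(";")]
        if not media_type:
            continue
        q = 1.0  # default weight when no q= is given
        for param in params:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal weight keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


print(parse_accept("text/html, application/json;q=0.9"))
print(parse_accept("image/*;q=0.5, */*;q=0.1, text/plain"))
```

The same q= syntax is used by Accept-Charset, Accept-Encoding and Accept-Language, so the function works on those header values as well.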

Accept-Charset

The Accept-Charset request header tells the server which character sets the client can handle. Multiple character sets can be specified at once and, as with Accept, priority is expressed with a q value. This header is used in server-driven content negotiation.

Accept-Charset: iso-8859-1

Accept-Encoding

The Accept-Encoding request header tells the server which content encodings the client understands. Multiple content encodings can be specified at once, including the following.

gzip: the encoding format generated by the file compression program gzip, using the Lempel-Ziv algorithm and a 32-bit cyclic redundancy check (CRC).

compress: the encoding generated by the UNIX file compression program compress, using the Lempel-Ziv-Welch algorithm.

deflate: the zlib format combined with the deflate compression algorithm.

identity: the default encoding, which performs no compression or transformation.

Accept-Encoding: gzip, deflate

As with Accept, priority is set with a q value. An asterisk (*) matches any encoding format.

Accept-Language

The Accept-Language request header tells the server which natural languages (such as Chinese or English) the client can understand and their priority. As with Accept, multiple languages can be specified, with q values setting the priority.

Accept-Language: zh-CN,zh;q=0.9,en;q=0.8

With this header, the server returns the Chinese version if it has one, and otherwise returns the English version.

Authorization

Authorization tells the server the authentication information (credentials) of the user agent. It is usually added to the request after the server has returned a 401 status code response.

Authorization: Basic dWVub3NlbjpwYXNzd29yZA==
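
For the Basic scheme, the value after the word Basic is simply user:password encoded with Base64. The sketch below (basic_auth_header and decode_basic are names made up for this article) builds and decodes such a value; decoding the example above yields the user name uenosen and the password password. Base64 is an encoding, not encryption, so Basic authentication offers no confidentiality unless the connection itself is encrypted.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an Authorization header for the Basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic(header_value: str) -> tuple:
    """Recover (user, password) from a Basic Authorization value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


print(decode_basic("Basic dWVub3NlbjpwYXNzd29yZA=="))
print(basic_auth_header("uenosen", "password"))
```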

Expect

Expect tells the server that the client expects a particular behavior, and the request is to be processed only if the server can meet that expectation. If the server cannot, it returns a 417 status code. At present only 100-continue is defined.

Expect: 100-continue

From

The From field gives the email address of the person using the user agent. Its usual purpose is to show contact information for whoever is responsible for a user agent such as a search engine crawler.

From: info@hackr.jp

Host

The Host request header indicates the host name and port number of the server on which the requested resource is located. If the server has no host name, the header is sent with an empty value.

Host: www.tutu.com

If-Match

Request header fields of the form If-xxxx, like this one, make a request conditional. When the server receives a conditional request, it executes the request only if the condition is true.

If-Match compares its value with the ETag value of the server resource, and the request is processed only when the two are equal. Otherwise a 412 status code is returned. An asterisk (*) can be used instead, in which case the server ignores the ETag value and processes the request as long as the resource exists.

If-Match: "123456"

If-Modified-Since

If-Modified-Since is used to confirm whether the copy of a resource held by a proxy server or client is still valid. The request is processed if the requested resource has changed after the specified time. If the resource has not changed, a 304 status code is returned.

If-Modified-Since: Tue, 13 Apr 2021 12:35:41 GMT

If-None-Match

If-None-Match, in contrast to If-Match, processes the request only if the value of the server resource's ETag is different from that of the If-None-Match. Add this field to the GET and HEAD request methods to get the latest resources.
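
The following sketch shows how a server might answer a GET that carries If-None-Match. Everything here is an assumption made for illustration: make_etag derives the ETag from a hash of the body (servers are free to generate ETags any way they like), and conditional_get ignores the W/ prefix of weak ETags.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    # One possible way to derive an ETag: a truncated hash of the content.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a GET with optional If-None-Match."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            # The client's copy is current: no body, just 304 Not Modified.
            return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


status, headers, _ = conditional_get(b"version 1")
print(status, headers["ETag"])
print(conditional_get(b"version 1", headers["ETag"])[0])  # resource unchanged
print(conditional_get(b"version 2", headers["ETag"])[0])  # resource changed
```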

If-Range

If-Range tells the server that if the value of the If-Range field matches the ETag value or modification time of the requested resource, the request should be handled as a range request (the Range field specifies which bytes are requested). Otherwise the server ignores the range and returns the whole resource.

If-Range: "123456"
Range: bytes=5001-10000

If-Unmodified-Since

The If-Unmodified-Since field is used to tell the server that the request will be processed only if the requested resource has not been modified after the specified time. If a modification occurs after the specified time, a 412 status code is returned.

If-Unmodified-Since: Tue, 13 Apr 2021 12:35:41 GMT

Proxy-Authorization

The Proxy-Authorization field contains the credentials that the user agent provides to the proxy server for authentication.

Proxy-Authorization: Basic dGlwOjkpNLAGfFY5

Range

The Range request header field indicates which part of the resource to get. When the server can satisfy a request with a Range field, it returns a 206 status code with the requested part. If it cannot handle the range request, it returns a 200 status code with the whole resource.

Range: bytes=5001-10000
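
A server-side view of a single byte range can be sketched as follows. The function serve_range is a made-up name; it only handles the simple first-last form shown above and falls back to a full 200 response for anything else, such as suffix ranges or multiple ranges.

```python
from typing import Optional


def serve_range(resource: bytes, range_header: Optional[str] = None):
    """Return (status, headers, body) for an optional single byte range."""
    total = len(resource)
    if not range_header or not range_header.startswith("bytes="):
        return 200, {}, resource
    spec = range_header[len("bytes="):]
    first_text, _, last_text = spec.partition("-")
    if "," in spec or not first_text.isdigit():
        return 200, {}, resource  # forms this sketch does not handle
    first = int(first_text)
    # An open-ended range such as bytes=500- runs to the end of the resource.
    last = int(last_text) if last_text.isdigit() else total - 1
    last = min(last, total - 1)
    if first > last:
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    headers = {"Content-Range": f"bytes {first}-{last}/{total}"}
    return 206, headers, resource[first:last + 1]  # both ends are inclusive


data = bytes(range(256)) * 50  # 12800 bytes of sample content
status, headers, body = serve_range(data, "bytes=5001-10000")
print(status, headers, len(body))
```

Byte positions count from 0 and both ends are inclusive, which is why bytes=5001-10000 selects 5000 bytes.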

Referer

The Referer field indicates the Web page from which the requested URL is initiated. The server identifies the access source through the Referer field for statistical analysis, logging and cache optimization.

Referer: www.tutu.com

TE

TE indicates the transfer encodings the client can handle in the response and their priority.

TE: gzip, deflate;q=0.5

User-Agent

The User-Agent field passes information such as the browser and user agent name that made the request to the server.

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36

Response header field

Accept-Ranges

Accept-Ranges indicates that the server can handle resources within a specified range. There are two values: bytes and none. None indicates that requests within the specified scope cannot be processed.

Age

Age indicates how long ago the source server returned the resource. In seconds.

Age: 3600

ETag

ETag is an identifier for a specific version of a resource, and the server assigns an ETag value to each resource. When the resource changes, the ETag value changes too. For example, for a site that serves Chinese and English at the same URL, switching to Chinese returns the Chinese resource (ETag: user-chi), while switching to English returns the English resource (ETag: user-us).

Strong and weak ETag

Strong ETag: the value changes whenever the resource changes, no matter how small the change.

ETag: "user-123456"

Weak ETag: only indicates whether the resource is essentially the same. The value changes only when the resource changes fundamentally. W/ is added at the beginning of the field value.

ETag: W/"user-123456"

Location

The Location field gives the address to which the client should be redirected. It is normally used with 3xx response codes.

Location: www.baidu.com

Proxy-Authenticate

The Proxy-Authenticate field tells the client which authentication scheme the proxy server requires.

Retry-After

The Retry-After field indicates how long the client should wait before making the request again. It is used with 503 and 3xx status code responses.

Retry-After: 120

Server

The Server field represents the software-related information used by the server that processed the request.

Server: Apache/2.2.17

Vary

The Vary field controls caching. When a proxy server receives a response with a Vary header from the source server and caches it, it may return the cached copy only for later requests whose values for the header fields named in Vary are the same. Even for the same resource, if a header field named in Vary has a different value, the resource must be fetched from the source server.

Vary: Accept-Language

Entity header field

Allow

Allow tells the client which HTTP methods the resource supports. If the server receives a request with an unsupported HTTP method, it responds with a 405 status code.

Allow: GET, DELETE

Content-Encoding

The Content-Encoding field indicates which content encoding the server applied to the entity body. The content encodings are the four introduced under Accept-Encoding.

Content-Encoding: gzip

Content-Language

The Content-Language field represents the natural language used by the entity body.

Content-Language: zh-CN

Content-Length

Content-Length represents the size, in bytes, of the body of the entity.

Content-Length: 1500

Content-Location

The Content-Location field represents the address of the data to be returned.

Content-Location: https://www.tutu.com/index.html

Content-Range

The Content-Range field indicates where the returned part sits within the whole resource and the total size of the resource.

Content-Range: bytes 5001-10000/10000

Content-Type

The Content-Type field represents the media type of the object in the entity body.

Content-Type: text/html; charset=UTF-8

Expires

The Expires field tells the client the date when the resource expired. After this date, the resource expires. That is, resources can be obtained from the browser cache within the specified date. If this date is exceeded, a resource request must be made to the server. If there is a Cache-Control: max-age in the header, the max-age instruction is processed first.

Expires: Tue, 13 Apr 2021 12:35:41 GMT

Last-Modified

The Last-Modified field indicates when the resource was last modified.

Last-Modified: Tue, 13 Apr 2021 12:35:41 GMT

HTTP cache

HTTP cache can be divided into strong cache and negotiation cache. It is mainly used to speed up resource loading, improve the user experience, reduce network traffic, and relieve server load.

Strong cache

For strong caching, the browser determines whether the requested resource is within the validity period. If it is within the validity period, the resource is read directly from the cache without sending a resource request to the server. Strong caching is set through the three header fields Expires, Cache-Control, and Pragma.

Cache-Control

The Cache-Control header field and its directives were described in detail above. Here are some of the most common values.

public: this resource can be cached by both browsers and proxy servers.

private: this resource can be cached only by the browser, not by proxy servers.

no-cache: the cached copy must be revalidated with the origin server before it is used, so the strong cache is skipped. This value means the negotiation cache is used.

no-store: do not cache anything; fetch the latest resource from the origin server every time.

max-age: as long as the cached resource is younger than the specified time, the client takes the resource from the cache. The unit is seconds.

s-maxage: applies only to proxy servers and indicates how long the resource stays fresh in the proxy server. When s-maxage is present, proxy servers ignore the max-age and Expires fields.
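The directives above can be turned into a small freshness check. This is only a sketch under simplifying assumptions: the function names and the shared_cache flag are made up for the example, and Expires and heuristic freshness are deliberately left out.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a dict of lowercase directive -> argument (or None)."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, argument = part.partition("=")
        directives[name.strip().lower()] = argument.strip().strip('"') or None
    return directives

def can_use_without_revalidation(cache_control, age_seconds, shared_cache=False):
    """Decide whether a stored response may be served straight from the cache."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False  # no-store: nothing is kept; no-cache: revalidate first
    if shared_cache and "private" in directives:
        return False  # proxy servers must not reuse private responses
    lifetime = directives.get("max-age")
    if shared_cache and directives.get("s-maxage") is not None:
        lifetime = directives["s-maxage"]  # s-maxage wins for proxy servers
    if lifetime is None or not lifetime.isdigit():
        return False  # simplification: Expires is not modelled here
    return age_seconds < int(lifetime)
```

For example, can_use_without_revalidation("public, max-age=600", 120) is true, while the same call with an age of 900 seconds is false.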

Expires

The value of the Expires field is a time date in GMT format, which tells the client the date when the resource expires, and the client receives the response body with the field and caches it. When the client initiates the same resource request, the value of Expires is compared with the local time. If the local time of the request is less than the value of Expires, the resource in the cache is directly used without initiating the request to the server.

The value of Expires creates a problem. If you modify the local time, it will cause the client-side and server-side time to be inconsistent, so the judgment of cache expiration will not be as expected.

Expires has the lowest priority among the three.

Pragma

Pragma was introduced above, so it is not explained again here.

When the strong cache is hit in Chrome, a 200 status code is shown, and there are two cases.

Memory cache: resources are fetched from browser memory as long as the page is not closed.

Disk cache: reads cache resources from disk.

After using a strong cache, if the resource on the server side is updated, the client does not know it, and the resource in the cache is used before it expires. You can force a refresh through Ctrl + F5.

Negotiation cache

Before using the local cache, the negotiation cache sends a GET request to the server to verify whether the resource saved locally by the browser is still valid.

Last-Modified and If-Modified-Since

In general, validity is judged by the timestamp of the last modification of the requested resource. Take an example: suppose the client requests a file from the server and wants to use the local cache through the negotiation cache mechanism when the resource is requested again. The response header that returns the resource for the first time contains a Last-Modified field whose value indicates when the resource was last modified. When the page is refreshed, the resource goes through the negotiation cache: the browser cannot confirm by itself whether the local copy has expired, so it sends a GET request to the server to validate it. The request header of this request contains an If-Modified-Since field whose value is the Last-Modified value from the previous response. If the resource has not been modified since that time, the server answers with a 304 status code and the browser reuses its cached copy.
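On the server side, the time-based check can be sketched roughly as follows. The function name respond_to_conditional_get is hypothetical, and the sketch assumes the resource's modification time is available as a UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def respond_to_conditional_get(last_modified, if_modified_since=None):
    """Return (status, headers) for a GET request.

    last_modified must be a UTC datetime; if_modified_since is the raw header
    value sent by the browser, or None on the first request.
    """
    # HTTP dates have one-second resolution, so drop the microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        try:
            cached_at = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            cached_at = None  # unparseable date: ignore the condition
        if cached_at is not None:
            if cached_at.tzinfo is None:
                cached_at = cached_at.replace(tzinfo=timezone.utc)
            if last_modified <= cached_at:
                return 304, headers  # not modified: browser reuses its copy
    return 200, headers  # first request, or the resource has changed
```

The first response carries Last-Modified; when the browser echoes that value back in its validation request, the function answers 304 as long as the resource has not changed.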

Inadequacies of last-modified

Last-modified has two drawbacks:

It relies only on the last modification time of the resource. If the file is edited and saved without its content actually changing, the modification time is still updated. Validation then fails during cache negotiation, and the whole resource has to be requested again unnecessarily.

The modification time of a file resource has a resolution of one second. If the file is modified very frequently, for example several times within a few hundred milliseconds, changes made within the same second cannot be detected.

ETag and if-none-match

To make up for the lack of time judgment, HTTP 1.1 adds the header information of ETag (entity tag).

ETag is a unique identifier for a specific version of a resource, similar to a file fingerprint. Its role was described above, so it is not explained again here.

When both the Last-Modified and ETag fields exist in the response header, ETag takes precedence. When the resource is requested again, the ETag value from the previous response header is sent to the server as the value of the If-None-Match field in the request, for cache validation. If the cache is verified to be valid, the server returns a 304 status code and the browser loads the resource from its local cache.

Inadequacies of ETag

The emergence of ETag is not a substitute for last-modified, but a supplementary scheme, which still has its drawbacks.

If the resources are large, the number is large, and the modifications are frequent, then the process of generating ETag will affect the performance of the server.

It is also mentioned above that ETag is also divided into strong ETag and weak ETag.

Strong ETag values are generated based on the content of the resource, ensuring that each byte is the same.

Weak ETag values are generated based on some property values of the resource and are generated quickly but there is no guarantee that every byte is the same.

If the browser is in the negotiation cache and the resource has not changed, the server will return a 304 status response code to tell the browser to obtain the locally cached resource.
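To make the strong and weak variants concrete, here is a minimal sketch of how a server might generate and compare ETags. The hashing scheme, the size-and-mtime format, and all function names are assumptions for illustration; real servers choose their own formats.

```python
import hashlib

def strong_etag(body):
    """Strong validator: derived from every byte of the content."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'

def weak_etag(size, mtime_seconds):
    """Weak validator: derived from cheap metadata, marked with the W/ prefix."""
    return 'W/"%x-%x"' % (size, mtime_seconds)

def _opaque(tag):
    """Drop the weak marker so that tags can be compared loosely."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag

def etag_matches(if_none_match, current_etag):
    """Compare an If-None-Match value (possibly a list) with the current ETag."""
    if if_none_match.strip() == "*":
        return True
    # Simplification: assumes the tags themselves contain no commas.
    candidates = [_opaque(tag) for tag in if_none_match.split(",")]
    return _opaque(current_etag) in candidates

def status_for(if_none_match, current_etag):
    """304 when the browser's copy is still current, otherwise 200."""
    if if_none_match is not None and etag_matches(if_none_match, current_etag):
        return 304
    return 200
```

The strong variant has to read the whole body, which is the performance cost mentioned above; the weak variant only needs file metadata.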

Shortcomings of HTTP

The main shortcomings of HTTP protocol are as follows.

The communication is in clear text, so the content may be eavesdropped on.

The identity of the communicating party is not verified, so impersonation is possible.

The integrity of the message cannot be proved, so it may have been tampered with.

The communication is in clear text and the content will be eavesdropped.

The HTTP protocol itself has no encryption function, so it is impossible to encrypt the communication request and response content.

TCP/IP is a network that can be eavesdropped on.

Because of how the TCP/IP protocol family works, communication content can be observed at any point along the communication line. Wherever the server and client are, some of the devices on the line between them are not under your control, so malicious snooping at some point cannot be ruled out. Even encrypted communication can be captured; encryption only means that the captured content cannot be read. Eavesdropping on communication on the same network segment is not difficult: it is enough to collect the data packets flowing over the network, which can be done with packet capture and sniffer tools.

Solution: encryption processing to prevent eavesdropping

The two most common encryption methods are communication encryption and content encryption.

Communication encryption

The HTTP protocol has no encryption mechanism, but it can be combined with SSL (Secure Sockets Layer) or TLS (Transport Layer Security) to encrypt the content of HTTP communication. After a secure communication line is established with SSL, HTTP communication is carried out over that line. HTTP combined with SSL is called HTTPS (HTTP Secure) or HTTP over SSL.

Content encryption

Since the HTTP protocol has no encryption mechanism, the transmitted content itself can be encrypted instead, that is, the content contained in the HTTP message. In this case, the client encrypts the HTTP message body before sending the request. For this to work, both the client and the server must implement the encryption and decryption mechanism. This approach is mainly used in Web services. Unlike SSL and TLS, it does not encrypt the entire communication line, so the content can still be tampered with.

The identity of the communicating party is not verified, so impersonation is possible.

Neither requests nor responses in the HTTP protocol verify the identity of the communicating party.

Anyone can make a request.

In HTTP communication there is no step that identifies the communicating party, so anyone can initiate a request. As soon as the server receives a request, it returns a response no matter who sent it (provided the Web server does not restrict the sender's IP address or port number). This openness brings the following risks.

The server that returns the response may be an impersonated server.

The client that sends the request may be an impersonated client.

It is not possible to determine whether the other party has access rights. Some Web servers hold important information and should communicate only with specific authorized users.

It is impossible to tell where a request came from or who made it.

Even meaningless requests are accepted, so DoS (Denial of Service) attacks using a large number of requests cannot be prevented.

Solution: verify the other party's certificate

Although the HTTP protocol cannot identify the communicating party, SSL can. In addition to encryption, SSL uses certificates to identify the communicating party. Certificates are issued by trusted third parties to prove that the server and client really are who they claim to be.

A certificate proves that the communicating party is the expected server, which reduces the risk of personal information being leaked. In addition, a client that holds a certificate can confirm its own identity, which can be used for authentication on Web sites.

Unable to prove the integrity of the message, may have been tampered with

The content received may have been altered.

There is no way to confirm that the request or response received is the same as the one that was sent. The content may be tampered with in transit, and even if it really has been changed, the receiver will not know.

Solution: MD5 and SHA-1

You can check the content with hash values such as MD5 and SHA-1, or with a digital signature (such as a PGP signature) used to confirm a file. But these methods alone cannot guarantee correctness, because the user has no way of knowing whether the published MD5 value or PGP signature has itself been replaced.
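A hash check of this kind can be sketched with Python's standard hashlib module. The helper names digest_of and is_intact are made up for the example; the algorithms are the two named in the text.

```python
import hashlib
import hmac

def digest_of(data, algorithm="sha1"):
    """Hex digest of data; "md5" and "sha1" are the algorithms named in the text."""
    return hashlib.new(algorithm, data).hexdigest()

def is_intact(data, published_digest, algorithm="sha1"):
    """Compare the digest of the received data with the published digest."""
    return hmac.compare_digest(digest_of(data, algorithm), published_digest.lower())

original = b"file contents"
published = digest_of(original)  # what the site would publish next to the download
```

The check only tells you that the data matches the published digest. If an attacker can replace both the file and the digest, it still passes, which is the limitation described above.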

HTTPS

HTTP plus encryption and authentication and integrity protection mechanism is HTTPS

HTTPS is HTTP wrapped in an SSL shell.

HTTPS is not a new application layer protocol. The part of the HTTP communication interface that talks to TCP is simply replaced by SSL and TLS. Normally HTTP communicates directly with TCP; with SSL, HTTP communicates with SSL first, and SSL then communicates with TCP.

With SSL, HTTP gains the encryption, certificate, and integrity protection capabilities of HTTPS.

Encryption mode

SSL uses public key encryption. The encryption algorithm in the encryption method is public, and the key is confidential. In this way, the security of the encryption method can be maintained.

A key is used for both encryption and decryption. Without the key, the ciphertext cannot be decrypted; conversely, anyone who has the key can decrypt it. If the key is obtained by an attacker, the encryption is meaningless.

Symmetrical encryption

Encryption and decryption using the same key is called shared key encryption, also known as symmetric key encryption. In other words, the client and server share a common key to encrypt the message. When the client sends a request, it encrypts the message with a key. After the server receives it, it decrypts the message with the key.

Shortcoming

Although symmetric encryption keeps messages confidential, the client and the server use the same key, so the key itself has to be delivered to the other party. If there is a man in the middle or another attacker on the transmission path, the key may fall into the attacker's hands, and encrypting the message then serves no purpose.

Asymmetric encryption

Asymmetric encryption solves the shortcomings of symmetric encryption. Asymmetric encryption uses a pair of asymmetric keys. One is called a private key and the other is called a public key. The private key can only be owned by yourself, while the public key can be obtained by anyone.

Before the client sends the message, it is encrypted with a public key, and after the server receives the message, it is decrypted with a private key.

Shortcoming

With asymmetric encryption, the sender must encrypt the message with the receiver's public key. But the public key is available to anyone, including a man in the middle. Although the man in the middle does not know the receiver's private key, he can intercept the public key while it is being delivered, replace it with a public key of his own or tamper with it, and pass that key on to the sender. In addition, asymmetric encryption is computationally more complex than symmetric encryption, so it is slower.

Hybrid encryption mechanism

HTTPS uses a mixture of symmetric and asymmetric encryption. Symmetric encryption has the advantage of fast encryption and decryption, while asymmetric encryption has the advantage that the key cannot be recovered in transit: even if the data is intercepted, it cannot be decrypted without the corresponding private key. HTTPS therefore uses asymmetric encryption to exchange the session key and symmetric encryption for the messages that follow.
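The hybrid idea can be shown with a toy model. Everything here is an assumption made for illustration: the tiny RSA key built from two well-known Mersenne primes, the XOR keystream cipher, and the names. None of it is secure, and none of it is what real TLS implementations do internally.

```python
import hashlib
import math
import secrets

# Toy RSA key pair for the "server". A key this small offers no real security.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537            # public key (N, E)
PHI = (P - 1) * (Q - 1)
assert math.gcd(E, PHI) == 1
D = pow(E, -1, PHI)            # private exponent (needs Python 3.8+)

def keystream_xor(key, data):
    """Toy symmetric cipher: XOR the data with a SHA-256 based keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)

# Client: generate a random session key and wrap it with the server's public key.
session_key = secrets.token_bytes(16)
wrapped_key = pow(int.from_bytes(session_key, "big"), E, N)

# Server: unwrap the session key with its private key.
server_key = pow(wrapped_key, D, N).to_bytes(16, "big")

# Both sides now share the key and use the fast symmetric cipher for the data.
ciphertext = keystream_xor(session_key, b"GET /account HTTP/1.1")
plaintext = keystream_xor(server_key, ciphertext)
```

Only the short session key goes through the slow asymmetric step; the message itself is handled by the symmetric cipher.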

Digest algorithm

A digital digest uses a hash function to condense plaintext into a fixed-length value (for example, 128 bits), also known as a digital fingerprint. The length is fixed, different plaintexts practically always produce different digests, and the same plaintext always produces the same digest. The digital digest is the fundamental reason HTTPS can ensure data integrity and detect tampering.

Digital signature

A digital signature combines asymmetric encryption with a digital digest. The sender encrypts the digest with its private key and sends the result to the receiver together with the original text. The receiver decrypts the encrypted digest with the sender's public key, computes a digest of the received text with the same hash function, and compares the two. If they are the same, the information received is intact; otherwise, it has been modified. A digital signature can therefore verify the integrity of the information.

A digital signature is a special encrypted check code attached to a message. There are two advantages to using digital signatures.

The signature proves that the message was indeed signed by the sender, because no one else can forge the sender's signature.

The signature proves the integrity of the message, that is, that the data has not been tampered with.

The process of digital signature is as follows: plaintext -> hash operation -> digest -> private key encryption -> digital signature
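The signing process can be followed step by step in a toy model. The small RSA key (built from two well-known Mersenne primes), the reduction of the digest modulo N, and the function names are all assumptions for illustration; real signature schemes add padding and use much larger keys.

```python
import hashlib
import math

# Toy RSA key pair for the sender; far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537            # public key (N, E)
PHI = (P - 1) * (Q - 1)
assert math.gcd(E, PHI) == 1
D = pow(E, -1, PHI)            # private exponent (needs Python 3.8+)

def digest_as_int(message):
    """plaintext -> hash operation -> digest, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message):
    """digest -> private key operation -> digital signature."""
    return pow(digest_as_int(message), D, N)

def verify(message, signature):
    """Recover the digest with the public key and compare it with a fresh digest."""
    return pow(signature, E, N) == digest_as_int(message)

message = b"transfer 100 to Alice"
signature = sign(message)
```

verify(message, signature) succeeds for the original text and fails if the text is changed after signing.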

Digital certificate

A digital certificate is like an ID card: the information in it is unique. It is issued by a trusted third-party organization called a certificate authority (CA). The certificate contains the following information.

CA, the issuer of the certificate

The validity period of the certificate

Public key

Certificate owner

Signature

The digital certificate also includes the public key of the subject, a description of the subject, and the signature algorithm used. Anyone can create a digital certificate, but only a trusted CA can vouch for the certificate information by signing the certificate with its private key.

The workflow of HTTPS

First, the client initiates a HTTPS request to the server.

The server returns the public key certificate to the client.

After receiving the public key certificate, the client verifies the certificate's digital signature with the public key of the CA that issued it, to confirm that the server's public key is authentic.

The client uses a random number generator to generate a temporary session key, then encrypts the session key with the server's public key and sends it to the server.

When the server receives it, it decrypts the session key with its own private key.

Then HTTPS communication begins between the client and the server.
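In practice a client rarely performs these steps by hand; a TLS library runs them during the handshake. The sketch below uses Python's standard ssl module. The function name fetch_peer_certificate is hypothetical, and note that modern TLS versions usually agree on the session key with a Diffie-Hellman exchange rather than by encrypting it with the server's public key as in the simplified workflow above.

```python
import socket
import ssl

# The default context loads the system's trusted CA certificates and turns on
# certificate and host name verification.
context = ssl.create_default_context()

def fetch_peer_certificate(host, port=443, timeout=5.0):
    """Run the handshake against host and return (TLS version, certificate subject)."""
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        # wrap_socket performs the handshake: the server sends its certificate,
        # the client verifies it, and both sides agree on a session key.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            certificate = tls_sock.getpeercert()
            return tls_sock.version(), certificate.get("subject")
```

If the certificate cannot be verified against a trusted CA, or does not match the host name, wrap_socket raises an error instead of returning a connection.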

SSL and TLS

HTTPS uses two protocols, SSL (Secure Sockets Layer) and TLS (Transport Layer Security). SSL was first promoted by Netscape; as Netscape declined, control of the protocol was handed over to the IETF. Taking SSL 3.0 as the basis, the IETF went on to define TLS 1.0, TLS 1.1, and TLS 1.2. TLS is thus a protocol developed on the basis of SSL, and the two are sometimes collectively referred to as SSL.

Why not use HTTPS all the time

Everything has two sides, and HTTPS is no exception: it still has some problems. When SSL is used, processing slows down. There are two reasons: the communication itself is slower, and every exchange must be encrypted and decrypted, which consumes a lot of CPU and memory resources.

In addition to connecting to TCP and sending requests and responses, you also need to communicate with SSL.

In addition, SSL needs to be encrypted, and both the server and the client have to carry out encryption and decryption operations.

To use HTTPS to communicate, it is necessary to purchase a certificate.

Of course, SSL accelerator hardware (a dedicated server) can be used to improve efficiency. It speeds up SSL computation and takes load off the server. However, an SSL accelerator helps only with SSL processing. Another way to save resources is to use HTTP for non-sensitive information and HTTPS only for sensitive information.

The difference between HTTP and HTTPS

HTTP is transmitted in clear text, while HTTPS is a secure SSL encrypted transmission protocol.

HTTP and HTTPS connect in two different ways and have different port numbers. The former is 80 and the latter is 443.

To use HTTPS, you have to apply for a certificate from a CA. Free certificates are generally rare, so you usually need to pay a fee.

HTTPS is more friendly to search engines and is good for SEO, giving priority to indexing HTTPS pages.

The connection of HTTP is simple and stateless. HTTPS is a network protocol constructed by SSL + HTTP protocol, which can carry out encrypted transmission and identity authentication, which is more secure than HTTP.

SPDY: solving the bottleneck of HTTP 1.x

Disadvantages of HTTP 1.x

HTTP 1.x has the following main disadvantages:

1. HTTP 1.0 allows only one request per TCP connection, while HTTP 1.1 allows a TCP connection to be reused for multiple requests by default. But within the same TCP connection, all data communication is carried out sequentially, and the server usually finishes one response before moving on to the next. This leads to head-of-line blocking.

2. Requests can only be initiated by the client, and the client cannot receive anything other than responses.

3. The request / response header is sent without compression. The more header information, the greater the delay.

4. Sending lengthy headers, each time sending the same headers to each other leads to a waste of resources.

5. The data compression format can be chosen freely, and compression is not enforced.

SPDY

SPDY is an application layer protocol developed by Google on top of TCP. Its goal is to optimize the performance of HTTP, shortening web page load times and improving security through compression, multiplexing, and prioritization. The core idea of SPDY is to minimize the number of TCP connections. SPDY is not a replacement for HTTP but an enhancement of it.

Instead of rewriting the HTTP protocol, SPDY operates in the form of a new session layer between the application layer and the transport layer of TCP/IP. At the same time, for security reasons, SPDY prescribes the use of SSL in communications.

SPDY is added in the form of session layer to control the flow of data, but HTTP is still used to establish communication. Therefore, you can use HTTP's request method, Cookie, HTTP messages, and so on as usual.

HTTP 2.0

HTTP 2.0 can be said to be an upgraded version of SPDY (actually designed based on SPDY), but there are some differences between HTTP 2.0 and SPDY. There are two main points:

HTTP 2.0 supports clear text transmission, while SPDY enforces the use of HTTPS.

The compression algorithm for HTTP 2.0 headers uses HPACK, while SPDY uses DEFLATE.

Here is a brief introduction to the new features of HTTP 2.0. Because HTTP 2.0 involves so much, I will cover it separately in a later article.

Binary framing layer: the core of HTTP 2.0 performance enhancement is the new binary framing layer. HTTP 1.x uses newline characters as plain text separators, while HTTP 2.0 splits all transmitted information into smaller messages and frames and encodes them in binary format.

Multiplexed requests and responses: with the binary framing layer at its core, HTTP 2.0 breaks HTTP messages down into independent frames and sends them interleaved. They are then reassembled on the other side based on the stream identifier in each frame header. This solves the head-of-line blocking problem of HTTP 1.x.

Request priority: after the HTTP message is decomposed into several separate frames, the performance can be further optimized by optimizing the interleaving and transmission order of those frames.

Server push: the server can send multiple responses to a client request. The server can also push resources to the client without an explicit request from the client.

Header compression: HTTP 2.0 uses the HPACK (HTTP/2 header compression) format to encode transmitted headers, which reduces their size. Both ends maintain an index table that records the headers already seen, so that later only the index of a recorded header needs to be transmitted. After receiving the data, the peer looks up the corresponding header in its table by that index.
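Binary framing and multiplexing can be sketched in a few lines. The 9-byte frame header layout (24-bit length, 8-bit type, 8-bit flags, 31-bit stream identifier) follows the HTTP/2 specification; the helper names and the payloads are made up, and header compression, settings, and flow control are left out.

```python
import struct

FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS"}  # two of the HTTP/2 frame types
END_STREAM = 0x1                             # flag marking a stream's last frame

def pack_frame(frame_type, flags, stream_id, payload):
    """Build one frame: 9-byte header followed by the payload."""
    length = len(payload)
    header = struct.pack(">BHBBI", length >> 16, length & 0xFFFF,
                         frame_type, flags, stream_id & 0x7FFFFFFF)
    return header + payload

def unpack_frames(data):
    """Split a byte stream into (type name, flags, stream id, payload) tuples."""
    frames, pos = [], 0
    while pos < len(data):
        high, low, frame_type, flags, stream_id = struct.unpack_from(">BHBBI", data, pos)
        length = (high << 16) | low
        payload = data[pos + 9:pos + 9 + length]
        frames.append((FRAME_TYPES.get(frame_type, hex(frame_type)), flags,
                       stream_id & 0x7FFFFFFF, payload))
        pos += 9 + length
    return frames

# Two responses (streams 1 and 3) interleaved on one connection.
wire = (pack_frame(0x0, 0, 1, b"<html>") + pack_frame(0x0, 0, 3, b"body{")
        + pack_frame(0x0, END_STREAM, 1, b"</html>")
        + pack_frame(0x0, END_STREAM, 3, b"}"))

# The receiver regroups the payloads by stream identifier.
streams = {}
for _, _, stream_id, payload in unpack_frames(wire):
    streams[stream_id] = streams.get(stream_id, b"") + payload
```

Because every frame names its stream, a slow response on one stream no longer holds up the frames of another stream at the HTTP level.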

If you want to know more about HTTP 2.0, you can read High Performance Browser Networking (published in Chinese as the "Authoritative Guide to Web Performance"), which covers it in great detail.

Supplementary OSI model

In addition to the TCP/IP model, the network architecture model also has the OSI model. The OSI model actually has three more layers.

That is, SSL and SPDY are added to the above two protocols (both at the application layer).

The link layer of the TCP/IP model is subdivided into two layers:

Data link layer: provide reliable data transmission services on unreliable physical links. It includes framing, physical addressing, flow control, error control, access control and so on.

Physical layer: the main function is to connect network devices.

Cookie

As mentioned earlier, HTTP is a stateless protocol that does not keep track of previously sent requests and responses. Suppose a user sends a request from the client and the server wants to know who sent it; some state must then be managed. Cookie exists to solve exactly this kind of problem.

The response header returned by the server contains a Set-Cookie field that tells the client to save the Cookie. The next time the client sends a request to the server, it automatically adds the Cookie value to the request header.

After receiving the Cookie sent by the client, the server checks which client sent the request, compares it with the records on the server, and recovers the previous state information.

Set-Cookie

Set-Cookie is a field that belongs to the response header and contains the following values.

NAME=VALUE: the name and value of Cookie.

Expires=DATE: the validity period of Cookie.

Path=PATH: the directory on the server to which the Cookie applies. If it is not set, the default is the directory of the requested document.

Domain=DOMAIN: the domain name to which the Cookie applies. If it is not set, it defaults to the domain name of the server that created the Cookie.

Secure: the Cookie is sent only over HTTPS.

HttpOnly: JavaScript cannot access the Cookie. The main purpose is to prevent the Cookie from being stolen through cross-site scripting (XSS) attacks.
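The attributes above can be assembled into a header value with Python's standard http.cookies module. The helper name build_set_cookie and the sample cookie name and domain are assumptions made for the example.

```python
from http.cookies import SimpleCookie

def build_set_cookie(name, value, path="/", domain=None, expires=None,
                     secure=True, http_only=True):
    """Return the value of a Set-Cookie header for one cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["path"] = path
    if domain:
        morsel["domain"] = domain
    if expires:
        morsel["expires"] = expires  # an HTTP date string
    morsel["secure"] = secure        # flag attribute: sent only over HTTPS
    morsel["httponly"] = http_only   # flag attribute: hidden from JavaScript
    return morsel.OutputString()

header_value = build_set_cookie("sid", "abc123", domain="example.com")
```

The result is the part that follows "Set-Cookie: " in the response; the order in which the attributes appear is decided by the library.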

The Cookie field in the request header

Cookie is a field in the request header that contains the values the server previously stored on the client through the Set-Cookie header. If several Cookies were received, they are all sent back together in the Cookie field.
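On the server side, reading the Cookie field and recovering the state might look like the following sketch. The SESSIONS table, the sid cookie name, and the function names are hypothetical and stand in for whatever session store the server actually uses.

```python
from http.cookies import SimpleCookie

# Hypothetical server-side record of earlier state, keyed by session id.
SESSIONS = {"abc123": {"user": "alice", "cart_items": 2}}

def parse_cookie_header(value):
    """Turn a Cookie request header value into a plain dict."""
    cookie = SimpleCookie()
    cookie.load(value)
    return {name: morsel.value for name, morsel in cookie.items()}

def state_for_request(cookie_header):
    """Look up the state saved for the client that sent this Cookie header."""
    cookies = parse_cookie_header(cookie_header)
    return SESSIONS.get(cookies.get("sid"))

cookies = parse_cookie_header("sid=abc123; theme=dark")
state = state_for_request("sid=abc123; theme=dark")
```

A request without a known sid simply gets no state back, which is how the server treats a client it has not seen before.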

These are the basic knowledge of HTTP, and the editor believes that there are some knowledge points that we may see or use in our daily work. I hope you can learn more from this article. For more details, please follow the industry information channel.
