Analysis of the HTTP protocol

2025-07-19 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

Introduction to the HTTP protocol

HTTP (HyperText Transfer Protocol) is one of the most widely used network protocols on the Internet and is mainly used for Web services. The text it carries is processed by computers and is usually formatted as HTML (HyperText Markup Language).

Versions of the HTTP protocol

HTTP 0.9: transfers only HTML documents to users.

HTTP 1.0

1. The MIME (Multipurpose Internet Mail Extensions) mechanism is introduced. With MIME, HTTP is no longer limited to HTML and can carry other formats, including multimedia such as audio and video.

2. A keep-alive mechanism is introduced to support persistent connections. In HTTP 1.0 this is not native: it is enabled by adding a header field to the message.

3. Caching support is introduced.

HTTP 1.1

Supports more request methods, finer-grained cache control, and native persistent connections.

HTTP 2.0

Keeps HTTP semantics while optimizing the transport. It builds on SPDY, a technology introduced by Google to accelerate HTTP data exchange, in particular together with SSL. At the time of writing, SPDY itself was not widely used.

At present, the commonly used versions are HTTP 1.0 and HTTP 1.1.

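Introduction to HTML text

HTML document structure

An HTML document consists of a head, which carries the title ("TITLE" in the example), and a body, which carries the content (in the example, the visible text "ToGoogle").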

How HTML documents are generated

Static

Edited and defined in advance.

Dynamic

A program written in a dynamic language is executed, and its result is output in HTML format.

Dynamic languages include PHP, JSP, ASP, and .NET.

Note: these scripts need a corresponding interpreter; for example, PHP needs a PHP interpreter.

How static and dynamic requests are handled

Static

1. The Web server registers a socket with the kernel.

2. The client sends a request to the Web server through the browser.

3. The Web server receives the request from the client.

4. If the requested resource is local to the server, the HTTP service asks the kernel for it through a system call.

5. The kernel reads the data from the local disk and hands it to the HTTP service.

6. The HTTP service places the requested resource in a response message and sends it to the client.

Dynamic

The difference is that when the user requests dynamic content, the HTTP service calls the back-end interpreter, and the program written in the dynamic language processes the request. If data is needed, the program asks the kernel to read it from disk. The interpreter runs the program, and the result is usually a document in HTML format. The response message is then built and sent back to the client.
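The difference between the two paths can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `STATIC_FILES` stands in for files on disk, `render_hello` stands in for a script run by an interpreter, and the paths are hypothetical.

```python
# Hypothetical in-memory "disk" holding content prepared in advance.
STATIC_FILES = {"/index.html": b"<html><body>static page</body></html>"}


def render_hello(query: str) -> bytes:
    """Stand-in for a dynamic script (PHP, JSP, ...) run by an interpreter."""
    return f"<html><body>query was: {query}</body></html>".encode("utf-8")


# Hypothetical table mapping dynamic paths to the program that generates them.
DYNAMIC_HANDLERS = {"/hello.php": render_hello}


def handle(path: str, query: str = "") -> tuple[int, bytes]:
    """Return (status code, body) for a request path."""
    if path in DYNAMIC_HANDLERS:
        # Dynamic: run the back-end program and use the HTML it generates.
        return 200, DYNAMIC_HANDLERS[path](query)
    if path in STATIC_FILES:
        # Static: return the stored content unchanged.
        return 200, STATIC_FILES[path]
    return 404, b"not found"
```

Calling `handle("/index.html")` returns the stored bytes, while `handle("/hello.php", "name=a")` returns HTML produced at request time.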

HTTP protocol

HTTP messages

An HTTP message consists of multiple lines, generally ASCII strings, and its fields have no fixed length. HTTP messages fall into two types: request messages and response messages.

1. Request message

Client → server

The client sends a request to the server; different URLs are used to request different resources (for example, HTML documents).

2. Response message

Server → client

The server's reply to the client's request.

Format of the request message

Request line + request headers + blank line + request entity

1. Request line

It consists of the request method, the request URL, and the HTTP protocol version, separated by single spaces. It identifies the method of the request, the requested resource, and the protocol version in use.

Method: which method this request uses.

URL: which resource is requested. It can be a relative path, such as /images/log.jpg, or an absolute URL, such as http://www.baidu.com/images/banner.jpg

Version: the HTTP protocol version, in the format HTTP/<major>.<minor>, for example HTTP/1.0 or HTTP/1.1.

When an HTTP request message is captured with Wireshark, the "\r\n" shown at the end of each header line is a carriage return plus a line feed, which separates one header from the next.

The request message can also be displayed with the curl command.

2. Request headers

Each header consists of a keyword and its value, separated by ":" in the format Name:Value. Request headers tell the server about the client's request. A request can carry more than one header, and many kinds of header information can be used.

3. Blank line

The request headers are followed by a blank line (a carriage return and a line feed), which tells the server that no more request headers follow.

4. Request entity

The request entity is the actual content of the request, that is, the data the client sends to the server.
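To make the four parts concrete, here is a minimal sketch that splits a raw request into request line, headers, and entity. The sample request, the host www.example.com, and the function name `parse_request` are assumptions made for illustration; a real server has to handle many more cases.

```python
def parse_request(raw: bytes) -> dict:
    """Split a raw HTTP request into request line, headers, and entity."""
    # The blank line (CRLF CRLF) marks the end of the headers.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: method, URL, and version separated by single spaces.
    method, url, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        # Each header has the form Name:Value.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "url": url, "version": version,
            "headers": headers, "body": body}


# Hypothetical request used only as sample input.
raw = (b"POST /login HTTP/1.1\r\n"
       b"Host: www.example.com\r\n"
       b"Content-Type: application/x-www-form-urlencoded\r\n"
       b"Content-Length: 9\r\n"
       b"\r\n"
       b"user=demo")
request = parse_request(raw)
```

Header names are lower-cased here because HTTP header names are case-insensitive.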

Format of the response message

Start line + response headers + blank line + response entity

1. Start line

Also known as the status line, it is the server's answer to the client's request. It consists of the version number, the status code, and the reason phrase, for example "HTTP/1.1 200 OK".

Version: the server answers with the protocol version that matches the client's request.

Status code: the result of the request, for example 202 or 403.

Reason phrase: a human-readable description of what the status code means.

The start line is followed by the response headers (often many) and then the response body.

2. Response headers

As in a request message, the start line is usually followed by several header fields. Each header field contains a name and a value separated by a colon, in the format Name:Value.

For example:

Content-Type: text/html; charset=utf-8

Content-Length: 78

3. Blank line

The last response header is followed by a blank line (a carriage return and a line feed), which tells the client that no more headers follow.

4. Response entity

The response entity carries the data to be returned to the client. The data can be text or binary (for example, pictures or videos).
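The same layout can be sketched from the server's side. The function below assembles a response message from its four parts; `build_response` is a hypothetical helper written for this illustration, and the reason phrase is looked up in Python's standard `http.HTTPStatus` table.

```python
from http import HTTPStatus


def build_response(status: int, body: bytes, content_type: str) -> bytes:
    """Assemble start line + headers + blank line + entity."""
    reason = HTTPStatus(status).phrase           # e.g. 200 -> "OK"
    head = (f"HTTP/1.1 {status} {reason}\r\n"    # start (status) line
            f"Content-Type: {content_type}\r\n"  # response headers
            f"Content-Length: {len(body)}\r\n"
            "\r\n")                              # blank line ends the headers
    return head.encode("ascii") + body           # entity follows the blank line


response = build_response(200, b"<html><body>hello</body></html>",
                          "text/html; charset=utf-8")
```

Content-Length is computed from the body so that the header and the entity always agree.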

HTTP request methods

In HTTP communication, every HTTP request message contains an HTTP request method, which tells the server what operation the client wants it to perform. The commonly used HTTP request methods are listed below.

GET: used by the client to request the specified resource; the server returns the resource entity.

HEAD: similar to GET, but the server returns only the response headers and not the resource itself. It tells the client whether the resource exists without transferring it.

POST: submits data to the server, typically from an HTML form. The server usually stores the data, often in a relational database such as MySQL.

PUT: the opposite of GET. A resource is sent to the server, which usually stores it (typically in the file system).

DELETE: asks the server to delete the resource specified by the URL.

MOVE: asks the server to move the specified page to another network address.

OPTIONS: probes which request methods the server supports for the requested URL.

TRACE: traces the proxy servers, firewalls, or gateways that a request passes through on its way to the server.

The most commonly used HTTP request methods are GET, POST, and HEAD.

HTTP status codes

Status code: description

1XX: informational. Tells the client how to proceed with certain actions.

2XX: success. The requested resource exists and the request succeeded.

3XX: redirection. The server may return a new address instead of the result.

4XX: client error. The requested resource does not exist, or access to it is denied because the client lacks permission.

5XX: server error. The server failed while handling the request. For example, the server had to run a script through an interpreter and something went wrong during the call, or the script contains a syntax error.

Description of common status codes

Status code: description

200: OK. The server successfully returned the page; this is the standard status code of a successful HTTP request.

201: Created. Returned after a file is uploaded successfully.

301: Moved Permanently. A permanent redirect: the response carries a new address and tells the client that the requested address has moved there for good.

302: Found. A temporary redirect: the resource is temporarily somewhere else, and the response message carries "Location: new location".

304: Not Modified. The resource has not been modified.

403: Forbidden. The request is denied.

404: Not Found. The requested resource does not exist.

405: Method Not Allowed. The method used is not allowed or not supported.

500: Internal Server Error. The server hit an internal error.

502: Bad Gateway. The proxy server received an invalid response from the upstream server. The upstream server returned something the proxy could not understand, so the proxy reports an error.

503: Service Unavailable. The service is temporarily unavailable.
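The class of a status code follows from its first digit, which a short sketch can show. The helper `describe` and the `STATUS_CLASSES` table are names invented for this example; the reason phrases come from Python's standard `http.HTTPStatus`.

```python
from http import HTTPStatus

# First digit of the status code -> class of the response.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    family = STATUS_CLASSES[code // 100]
    phrase = HTTPStatus(code).phrase
    return f"{code} {phrase} ({family})"
```

For instance, `describe(404)` yields "404 Not Found (client error)".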

Introduction to HTTP headers

General headers

Request headers

Response headers

Entity headers: describe the type, length, encoding format, etc., of the resource carried in the entity.

Extension headers: non-standard headers, which can be created by programmers.

General headers

Connection: defines options for the connection between client and server (C/S) during request and response.

In HTTP 1.0, a persistent connection is requested by setting:

Connection: keep-alive

Cache-Control: cache control, for finer-grained control of caching. It is mostly used with HTTP 1.1.

Request headers

Client-IP: the client's IP address

Host: the requested host, which is needed when implementing name-based virtual hosts

Referer: the URL of the page from which the current resource was requested. Hotlink protection can be implemented with Referer.

User-Agent: the user agent, generally a browser

Accept headers: what the client is able to accept

§Accept: the media types the server may send

§Accept-Charset: acceptable character sets

§Accept-Encoding: acceptable content encodings

§Accept-Language: acceptable languages

Conditional request headers (HTTP 1.1 only):

The request carries a condition that the server checks first. If the condition is met, the request is carried out; otherwise it is not.

Security-related request headers:

§Authorization

§Cookie

Response headers

Age: how long (in seconds) the response has been held in a cache since it left the origin server. Caches use it to judge how much longer the response may be used.

Server: tells the client the name and version of the server program

Negotiation headers:

§Vary: a list of headers; the server uses them to select the most suitable version to send to the client

Security-related headers:

§WWW-Authenticate

§Set-Cookie

Entity headers

Location: the new location of the resource, usually used together with the 302 response code

Allow: the request methods allowed for this resource

Content headers:

§Content-Encoding

§Content-Language

§Content-Length

§Content-Location: where the content is located

§Content-Type

Cache-related headers:

§ETag: entity tag

§Expires: expiration time

§Last-Modified: last modified time

ETag explained:

There are cache servers on the network, and the browser itself also has a cache.

The premise is that a picture does not change often. When the server returns the picture with status code 200, it also returns the picture's fingerprint, the ETag. When the browser requests the picture again, it asks the server to verify that fingerprint. If the picture has not changed, the browser uses the copy in its cache, which reduces the load on the server and saves the time needed to transfer the picture over the network.
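The validation step can be sketched as follows. This is an assumption-laden illustration: the fingerprint is taken to be a truncated SHA-256 of the content (real servers may derive it differently, for example from size and modification time), the browser is assumed to send the stored tag back in the standard If-None-Match request header, and `make_etag` and `respond` are hypothetical helpers.

```python
import hashlib


def make_etag(content: bytes) -> str:
    # Assumption: the fingerprint is a hash of the content, in quotes.
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(content: bytes, if_none_match=None):
    """Return (status, headers, body) for a possibly conditional request."""
    etag = make_etag(content)
    if if_none_match == etag:
        # Fingerprint unchanged: no body, the client reuses its cached copy.
        return 304, {"ETag": etag}, b""
    # First request, or the content changed: send the full entity.
    return 200, {"ETag": etag}, content


picture = b"...image bytes..."
status1, headers1, body1 = respond(picture)                    # first visit
status2, headers2, body2 = respond(picture, headers1["ETag"])  # revisit
status3, _, _ = respond(picture + b"changed", headers1["ETag"])
```

The second call returns 304 with an empty body, while the third returns 200 again because the content no longer matches the stored tag.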

Attached:

The most common HTTP request headers are as follows:

Accept: MIME types acceptable to the browser

Accept-Charset: character sets acceptable to the browser

Accept-Encoding: content encodings the browser can decode, such as gzip

Accept-Language: the languages the browser prefers

Authorization: authorization information, usually sent in reply to a WWW-Authenticate header from the server

Connection: indicates whether a persistent connection is wanted. With the value "Keep-Alive", or when the request uses HTTP 1.1 (which defaults to persistent connections), the connection can be reused, which significantly reduces download time when a page contains multiple elements (such as applets and images).

Content-Length: the length of the body of the request message

Cookie: one of the most important request headers

Cookie-related HTTP extension headers

1) Cookie: the client returns to the server the cookie that the server set

2) Set-Cookie: the server sets a cookie on the client

3) Cookie2 (RFC2965): the client tells the server which cookie version it supports

4) Set-Cookie2 (RFC2965): the server sets a cookie on the client

How cookies work

The server sends the cookie to the client in the Set-Cookie header of a response message, and the client carries the same content in the Cookie header of its following requests.

Host: the host and port from the original URL

If-Modified-Since: the requested content is returned only if it has been modified after the specified date; otherwise a 304 "Not Modified" reply is returned.

Referer: the URL of the page from which the user reached the currently requested page

User-Agent: the browser type
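The cookie exchange described in this list (Set-Cookie from the server, Cookie from the client) can be sketched with Python's standard `http.cookies` module. The cookie name `session_id` and its value are hypothetical, and real browsers apply further rules (domain, path, expiry) that this sketch leaves out.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header for the response message.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"   # hypothetical name and value
set_cookie_header = server_cookie.output(header="Set-Cookie:")

# Client side: store what the server sent ...
client_jar = SimpleCookie()
client_jar.load(set_cookie_header.split(":", 1)[1].strip())

# ... and carry the same content in the Cookie header of the next request.
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in client_jar.items()
)
```

Here `set_cookie_header` is what travels in the response, and `cookie_header` is what the client sends back afterwards.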

The most common HTTP response headers

The most common HTTP response headers are as follows:

Allow: which request methods the server supports (such as GET, POST, etc.)

Content-Encoding: the encoding method of the document

Content-Length: the length of the content. This data is needed only if the browser uses a persistent HTTP connection.

Content-Type: the MIME type of the document that follows

Accept-Ranges: bytes. This response header indicates that the server supports Range requests and that the supported unit is bytes (the only available unit). It also means that the server supports resumable downloads and that several parts of a file can be downloaded at the same time, so a download tool can use range requests to speed up a download. The response header Accept-Ranges: none indicates that the server does not support range requests.

Date: the current GMT time

Expires: when the document should be considered out of date and no longer be served from cache

Last-Modified: the last time the document was changed

Location: where the client should fetch the document

Refresh: after how many seconds the browser should refresh the document

The most common HTTP entity headers

Entity headers are meta-information about the entity content. They describe its attributes, including the type, length, compression method, last modification time, and validity of the entity.

Allow: the request methods allowed for the resource, for example: GET, POST

Content-Encoding: the encoding method of the document, for example: gzip

Content-Language: the language of the content, for example: zh-cn

Content-Length: the content length, for example: 80. See the response headers above for details.

Content-Location: where the client should fetch the document, for example: http://www.dfdf.org/dfdf.html

Content-MD5: an MD5 digest of the entity, used as a checksum. Both the sender and the receiver calculate the MD5 digest, and the receiver compares its calculated value with the value passed in this header.

Content-Type: the MIME type of the entity that is sent or received, in the form main type/subtype, for example: text/html; charset=GB2312
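The Content-MD5 check from this list can be sketched as follows. The header value is the base64 encoding of the 128-bit MD5 digest of the entity; `content_md5` and `verify` are hypothetical helper names, and the sample body is invented.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Compute the Content-MD5 value: base64 of the raw MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def verify(body: bytes, header_value: str) -> bool:
    """Receiver side: recompute the digest and compare with the header."""
    return content_md5(body) == header_value


body = b"<html><body>hello</body></html>"
header_value = content_md5(body)          # sender puts this in Content-MD5
intact = verify(body, header_value)       # unchanged entity
corrupted = verify(body + b"!", header_value)  # entity altered in transit
```

MD5 serves here only as a checksum against accidental corruption, as the text describes, not as a security measure.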

HTTP transactions

An HTTP request together with the response to it is called an HTTP transaction. In other words, an HTTP transaction is one complete cycle of HTTP request and HTTP response.

By default, the HTTP protocol opens and closes a new connection for every transaction, which costs both time and bandwidth. Because of TCP slow start, each new connection performs poorly at first, and the number of parallel connections that can be opened is limited. Persistent connections therefore perform better than the default behavior: they save the time spent establishing and tearing down TCP connections.

HTTP resources

Resources are the content that users can request and obtain from the server through a browser or other user agent over HTTP, such as an HTML document or a picture.

Resource type: identified by a MIME type

Format: major/minor (main type / subtype)

Commonly used MIME types

MIME type: file type

text/html: HTML documents (.html, .htm)

text/plain: plain text

image/jpeg: JPEG image

image/gif: GIF image

video/mpeg4: video type

application/vnd.ms-powerpoint: the way dynamic (application) resources are labeled
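As a small illustration of the major/minor format, Python's standard `mimetypes` module can map a file name to its MIME type. The file names are hypothetical, and `mime_of` is a helper written for this sketch with a fallback type chosen as an assumption.

```python
import mimetypes


def mime_of(filename: str) -> str:
    """Guess the MIME type from the file extension."""
    mime, _encoding = mimetypes.guess_type(filename)
    # Assumption: unknown extensions fall back to a generic binary type.
    return mime or "application/octet-stream"


# Split a type into its main type and subtype.
major, minor = mime_of("index.html").split("/")
```

With this, `mime_of("photo.jpg")` gives "image/jpeg" and `major, minor` become "text" and "html".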

URI and URL

URI (Uniform Resource Identifier)

A string that identifies the name of an Internet resource and allows users to interact with the resource through a specific protocol. Every resource available on the Web, including HTML documents, images, video clips, and programs, is identified by a URI. So a URI can be used to name each resource.

URL (Uniform Resource Locator)

Describes the specific location of a resource on a particular server.

For example: http://www.baidu.com:80/download/bash-4.3.1-1.rpm

A URL consists of three parts:

i. Scheme (also known as protocol): http://

ii. Internet address, which generally refers to the server: www.baidu.com:80

iii. Resource on that server: download/bash-4.3.1-1.rpm
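The three parts can be extracted with `urllib.parse.urlsplit` from the Python standard library, shown here on the example URL from the text. The variable names are chosen for this sketch only.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.baidu.com:80/download/bash-4.3.1-1.rpm")

scheme = parts.scheme      # i.   the scheme (protocol)
host = parts.hostname      # ii.  the server's name ...
port = parts.port          #      ... and port
path = parts.path          # iii. the resource on that server
```

Note that `urlsplit` reports the path with its leading slash.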

CGI

CGI (Common Gateway Interface): when the Web server finds that a script has to be executed, it uses the CGI protocol to hand the user's request to the back-end application. The application's result is returned to the HTTP server over the same CGI protocol.

Other things you need to know

The specific process of a Web resource request

1. The user enters the address to visit in the Web browser.

2. The Web browser asks the DNS server to resolve the domain name to the IP address of the Web server.

3. The client establishes a connection with the Web server (TCP three-way handshake).

4. After the TCP connection is established, the client sends an HTTP request.

5. The server receives the client's HTTP request and starts processing it.

6. The server processes the resource specified by the client.

7. The server builds a response message and sends it to the client.

8. The server records the transaction in its log.
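Steps 3 to 8 can be tried out on the local machine with the Python standard library. The sketch below is an illustration under assumptions: it starts a throwaway server on the loopback address, so the DNS lookup of step 2 is skipped, and `DemoHandler`, `access_log`, and `fetch_once` are hypothetical names.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

access_log = []   # stands in for the server's log file


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):                                 # steps 5 and 6
        body = b"<html><body>hello</body></html>"
        self.send_response(200)                       # step 7: build the response
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):             # step 8: record the request
        access_log.append(format % args)


def fetch_once(path="/"):
    """Serve exactly one request on loopback and return (status line, body)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)   # port 0: any free port
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    host, port = server.server_address
    with socket.create_connection((host, port)) as conn:   # step 3: TCP connection
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        conn.sendall(request.encode("ascii"))               # step 4: HTTP request
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:            # the server closed the connection
                break
            chunks.append(data)
    worker.join()
    server.server_close()
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    return head.decode("iso-8859-1").split("\r\n")[0], body


status_line, body = fetch_once()
```

Because the request uses HTTP/1.0 without keep-alive, the server closes the connection after one transaction, which is how the client knows the response is complete.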

How an HTTP server receives multiple user requests concurrently

By default an HTTP server works in a blocking model: it receives one request at a time and accepts the next one only after the current request has been processed, so requests are handled one by one.

To respond to user requests concurrently, a multi-process model is needed. The Web server spawns child processes to answer user requests. When a user request reaches the Web server, the main process does not answer it directly but creates a child process to do so. Once the child process has taken over the connection with the user, the main process waits for the next user's request. When a second request arrives, it creates another child process to answer it, and so on. Each user request is thus processed by its own child process.
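The one-child-per-request model can be sketched with `socketserver.ForkingMixIn` from the Python standard library, which is available only on platforms that provide `os.fork` (not on Windows). `PidHandler`, `ForkingServer`, and `serve_one_client` are hypothetical names, and the one-line protocol is invented for the demonstration; it is not HTTP.

```python
import os
import socket
import socketserver


class PidHandler(socketserver.StreamRequestHandler):
    def handle(self):
        self.rfile.readline()                                # read the client's line
        self.wfile.write(str(os.getpid()).encode("ascii"))   # reply with this process's PID


class ForkingServer(socketserver.ForkingMixIn, socketserver.TCPServer):
    """The main process accepts; each connection is handled in a forked child."""


def serve_one_client():
    """Return (PID of the main process, PID of the process that answered)."""
    with ForkingServer(("127.0.0.1", 0), PidHandler) as server:
        with socket.create_connection(server.server_address) as client:
            client.sendall(b"hello\n")
            server.handle_request()                # main process: accept, fork, return
            reply = client.makefile("rb").read()   # read until the child closes
    return os.getpid(), int(reply.decode("ascii"))
```

The two PIDs returned by `serve_one_client()` differ, which shows that the main process only accepted the connection while a child produced the answer.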

Extended knowledge point 1: using Wireshark to analyze the HTTP protocol

Experimental procedure

I. Clear the cache

Before capturing, first empty the Web browser's cache so that the Web page is fetched from the network rather than from the cache. Then clear the DNS cache on the client so that the mapping from the Web server's domain name to its IP address is also requested from the network.

II. Start Wireshark

III. Begin the capture

1) Select Capture > Options from the menu, select the network interface, and click Start.

2) Enter www.baidu.com in the browser address bar, then stop the capture.

3) Enter http as the filter and click Apply so that only HTTP packets are shown.

IV. Analysis of data

Select the packet that shows "GET / HTTP/1.1" and you will see this basic request line followed by a series of additional request headers. The "\r\n" at the end of each header is a carriage return plus a line feed, which separates one header from the next.

The Host header is required in HTTP 1.1. It carries the domain name of the machine from the URL, which in this test is www.baidu.com. This allows one Web server to serve many different domain names at the same time. With this header, the Web server can tell which site the client is trying to reach and return different content accordingly.

The User-Agent header describes the Web browser and client machine that made the request.

Then comes a series of Accept headers, including Accept, Accept-Language, Accept-Encoding, and Accept-Charset. They tell the Web server what kinds of data the client's Web browser can process. Web servers can deliver data in different languages and formats, and these headers state the client's capabilities and preferences.

The Keep-Alive and Connection headers describe the TCP connection over which HTTP requests and responses are sent. They indicate whether the connection stays open after the request is sent, and for how long. Most HTTP 1.1 connections are persistent, meaning that the TCP connection is not closed after each request but is kept open so that further requests to the same server can reuse it.

Now that we have looked at the request sent by the Web browser, let's look at the response from the Web server. The response starts with "HTTP/1.1 200 OK", indicating that the page is sent using HTTP 1.1. As in the request, this line is followed by some headers, and finally the actual requested data is sent. The Cache-Control header describes whether a copy of the data may be stored or cached for later use. In general, a Web browser caches recently visited pages on the local computer, and when the same page is visited again while it is still in the cache, the browser does not request the data from the server again. In the HTTP request, the Web browser lists the content types and content encodings it can accept. In this example, the Web server chose to send content of type text/html.

Extended knowledge point 2: using curl to view HTTP response header information

First, here are the basic steps by which the client (browser) requests data from the server:

1. The user initiates an HTTP request. The cache takes the URL and looks for a matching copy, which may be in memory or on the local disk.

2. If the request hits the local cache, a copy of the corresponding resource is taken from the local cache.

3. Check whether the copy has expired. If it has not, return it directly; if it has, forward the request to the server. In HTTP, the expiration time of a document is specified by the Cache-Control header and the Expires header, and by checking it the cache knows whether the document is still valid. Both the Expires header and the Cache-Control: max-age header tell the cache whether the document has expired, so why are two response headers needed for this simple job? The reason is historical. Expires dates from HTTP 1.0 and uses an absolute date, so if the server and client clocks are out of sync (which is very common), the cache may misjudge whether the document has expired. HTTP 1.1 therefore added Cache-Control: max-age, which uses a relative time in seconds.

4. The server receives the request and determines whether the resource has changed. If it has, the new content is returned. Otherwise, 304 is returned and the expiration time of the cached copy is updated.
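The expiry check of step 3 can be sketched as a small function. It assumes the cache remembers when it stored the copy (`stored_at`) and gives max-age priority over Expires when both are present; `is_fresh` and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def is_fresh(headers: dict, stored_at: datetime, now: datetime) -> bool:
    """Decide whether a cached copy may be reused without asking the server."""
    cache_control = headers.get("Cache-Control", "")
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            # Relative lifetime: not affected by clock differences.
            return now - stored_at < timedelta(seconds=int(value))
    if "Expires" in headers:
        # Absolute date: relies on server and client clocks agreeing.
        return now < parsedate_to_datetime(headers["Expires"])
    # No expiry information: treat the copy as stale and ask the server.
    return False


stored = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
expires = format_datetime(stored + timedelta(hours=1), usegmt=True)
fresh_by_max_age = is_fresh({"Cache-Control": "max-age=5"}, stored,
                            stored + timedelta(seconds=3))
fresh_by_expires = is_fresh({"Expires": expires}, stored,
                            stored + timedelta(minutes=30))
```

When the function returns False, the cache goes on to step 4 and lets the server decide between new content and a 304 reply.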

Information in the HTTP response headers

(1). HTTP status code:

1xx: the client's request has been received by the server and is being processed

2xx: the client's request has been received, understood, and processed successfully by the server

3xx: the client's request is redirected to another server (another URL)

4xx: the client's request is incorrect and cannot be handled by the server

5xx: the service on the server is abnormal

(2). Cache-Control:

Caching settings for Web sites: Cache-Control specifies the caching mechanism that requests and responses follow.

Cache classification

1) Private cache: typically the cache built into the browser.

2) Public cache: typically a proxy cache.

The optional parameters of Cache-Control include private, public, no-cache, max-age, must-revalidate, and others.

no-cache: a cached copy must not be used without first revalidating it with the server.

no-store: the response is never cached and never written to the client's disk. It is used for sensitive responses, for security reasons.

private: all or part of the response message is intended for a single user and must not be stored by a shared cache. This lets the server mark a response as valid only for the current user; it is not valid for requests from other users and cannot be shared among users.

public: the response may be cached and shared among multiple users. Normally, if HTTP authentication is required, the response is automatically treated as private.

max-age: the maximum time (in seconds) for which the response is considered fresh. For example, Cache-Control: max-age=5 means that visiting the page again within 5 seconds does not go to the server.

must-revalidate: the response may be reused to satisfy a later request, but once it has expired the cache must verify with the server that it is still up to date (this applies to all caches).

proxy-revalidate: similar to must-revalidate, but it applies only to public (shared) caches.

(3). Connection:

Indicates whether the server supports persistent connections; the value keep-alive means that the Web server supports them.

A persistent TCP connection involves both sides: both the client and the server must support persistent connections before one can be established.

Generally speaking, clients (browsers) support persistent connections by default, so as long as the server supports them as well, a persistent connection can be established.

With curl's -w parameter, the output of curl can be customized; %{http_code} stands for the HTTP status code.
