Shulou (Shulou.com), SLTechnology News&Howtos, Network Security. Updated 2025-04-01.
Introduction to the HTTP protocol
HTTP (HyperText Transfer Protocol) is one of the most widely used network protocols on the Internet and is mainly used for Web services. The text it carries is processed by the computer and is formatted as HTML (HyperText Markup Language).
Versions of the HTTP protocol
HTTP 0.9: transfers only HTML documents to users
HTTP 1.0
1. The MIME (Multipurpose Internet Mail Extensions) mechanism is introduced. With it, HTTP can send multimedia (such as video and audio) and other formats, not only HTML.
2. The keep-alive mechanism is introduced to support persistent connections (not natively, but by adding a header field).
3. Caching support is introduced.
HTTP 1.1
Supports more request methods, finer-grained cache control, and native persistent connections.
HTTP 2.0
Provides an optimized transport for HTTP semantics. It grew out of SPDY, a technology Google introduced to accelerate HTTP data exchange, in particular over SSL; SPDY itself was never widely used.
At the time of writing, the commonly used versions are HTTP 1.0 and HTTP 1.1.
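Introduction to HTML text
HTML text structure
A basic HTML document contains a title (TITLE), headings (H1, H2), and a hyperlink (here with the link text ToGoogle).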
How HTML documents are generated
Static
Edited and defined in advance
Dynamic
A program written in a programming language is run, and its result is output in HTML format
Dynamic languages include PHP, JSP, ASP, and .NET
Note: these scripts need a corresponding interpreter; for example, PHP needs a PHP interpreter.
Static and dynamic request handling
Static
1. The Web server registers a socket with the kernel
2. The client sends a request to the Web server through the browser
3. The Web server receives the request from the client
4. If the requested resource is local to the server, the HTTP service issues a system call to the kernel
5. The kernel reads the data from the local disk and passes it to the HTTP service
6. The HTTP service puts the requested resource in a response message and sends it to the client
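The six static-request steps above can be sketched in Python. This is only an illustration under stated assumptions, not how a production Web server is written: the names serve_static and run_server, the document root, and port 8080 are made up for the example, and the sketch assumes a small request that arrives in a single recv() call.

```python
import socket
from pathlib import Path


def serve_static(conn: socket.socket, docroot: Path) -> None:
    """Handle one request on an accepted connection (steps 3 to 6)."""
    # Step 3: receive the request; a small GET is assumed to fit in one recv().
    request_line = conn.recv(4096).split(b"\r\n", 1)[0].decode("ascii")
    _method, target, _version = request_line.split(" ")
    # Steps 4 and 5: ask the kernel to read the file from the local disk.
    # NOTE: no protection against ".." in the path; illustration only.
    path = docroot / target.lstrip("/")
    if path.is_file():
        status, body = "200 OK", path.read_bytes()
    else:
        status, body = "404 Not Found", b"not found"
    # Step 6: build the response message and send it to the client.
    head = f"HTTP/1.0 {status}\r\nContent-Length: {len(body)}\r\n\r\n"
    conn.sendall(head.encode("ascii") + body)
    conn.close()


def run_server(docroot: Path, port: int = 8080) -> None:
    """Steps 1 and 2: register a listening socket, then accept clients."""
    with socket.create_server(("127.0.0.1", port)) as listener:
        while True:
            conn, _addr = listener.accept()
            serve_static(conn, docroot)
```

Calling run_server(Path("/var/www")) would serve files one request at a time, which matches the blocking behavior described later in the article.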
Dynamic
The difference is that when the user requests dynamic content, the HTTP service calls a back-end interpreter, and the dynamic language processes the request. If data is needed, the interpreter asks the kernel to read the user-specified data from disk. The result of running the script is usually a document in HTML format. The response message is then built and sent back to the client.
HTTP protocol
HTTP messages
An HTTP message consists of multiple lines, generally ASCII strings, and the fields have variable length. There are two types of HTTP message: request messages and response messages.
1. Request message
Client → server
The client sends a request to the server; different URLs are used to request different resources (such as HTML documents)
2. Response message
Server → client
The server's response to the client's request
Format of the request message
Request line + request headers + blank line + request entity
[Example request message; the line after the last request header must be a blank line]
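To make the four-part layout concrete, here is a small Python sketch that assembles a request message by hand. The helper name build_request, the host www.example.com, the path /login, and the form data are assumptions chosen for illustration; they do not come from the article.

```python
def build_request(method, target, headers, body=b"", version="HTTP/1.1"):
    """Assemble request line + request headers + blank line + request entity."""
    lines = [f"{method} {target} {version}"]            # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"              # blank line ends the headers
    return head.encode("ascii") + body                  # entity follows the blank line


message = build_request(
    "POST", "/login",
    {"Host": "www.example.com", "Content-Length": "9"},
    body=b"user=mary",
)
print(message.decode("ascii"))
```

Every line of the head ends with "\r\n", and the empty line between the last header and the entity is what tells the server that the headers are finished.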
1. Request line
It consists of the request method, the request URL, and the HTTP protocol version, separated by single spaces. It identifies the request method, the requested resource, and the protocol version.
Request method: which method this request uses
Request URL: which resource is requested. It can be a relative path, such as /images/log.jpg, or an absolute URL, such as http://www.baidu.com/images.banner.jpg
Protocol version: the HTTP version, in the format HTTP/<major>.<minor>, for example HTTP/1.0 or HTTP/1.1
When an HTTP request message is captured with the Wireshark tool, the "\r\n" shown after each header is a carriage return and a line feed, which separate that header from the next one.
The curl command can also be used to view an HTTP request message.
2. Request headers
Each header is a keyword and a value separated by ":", in the format Name: Value. Request headers tell the server about the client and the request, and there can be more than one.
3. Blank line
A blank line (a carriage return and a line feed) follows the request headers to tell the server that no more header fields follow.
4. Request entity
The request entity carries the actual content of the request, if any.
Format of the response message
Start line + response headers + blank line + response entity
[Example response message; the line after the last response header must be a blank line]
1. Start line
Also known as the status line, it is the server's answer to the client request. It consists of the version number, the status code, and the reason phrase, such as "HTTP/1.1 200 OK"
Version: the server responds with the protocol version the client requested
Status code: the status of the request, such as 202 or 403
Reason phrase: a human-readable explanation of the status code
The start line is followed by the response headers and then the response body.
2. Response headers
As in a request message, the start line is usually followed by several header fields. Each header field consists of a name and a value separated by a colon, in the format Name: Value.
For example:
Content-Type: text/html; charset=utf-8
Content-Length: 78
3. Blank line
The last response header is followed by a blank line (a carriage return and a line feed), which tells the client that no more headers follow.
4. Response entity
The response entity carries the data returned to the client. The data can be text or binary (for example, pictures or videos).
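The response layout can be illustrated by taking a message apart in Python. This is a simplified sketch: parse_response and the sample message are assumptions for the example, and real parsers handle more cases (repeated headers, chunked bodies, and so on).

```python
def parse_response(raw: bytes):
    """Split a response into (version, status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")          # blank line ends the headers
    start_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = start_line.split(" ", 2)     # e.g. HTTP/1.1 200 OK
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=utf-8\r\n"
       b"Content-Length: 11\r\n"
       b"\r\n"
       b"<h1>Hi</h1>")
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason, headers, body)
```

Header names are lower-cased here because HTTP header names are case-insensitive.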
HTTP request methods
Every HTTP request message contains a request method, which tells the server which operation the client wants it to perform. The commonly used methods are listed below as method: description.
GET: the client requests the specified resource, and the server returns the resource entity
HEAD: like GET, but the server returns only the response headers, not the resource itself (enough to learn whether the resource exists)
POST: submits data to the server, typically from an HTML form; the server usually stores the data, often in a relational database such as MySQL
PUT: the opposite of GET; a resource is sent to the server, which usually stores it (typically in the file system)
DELETE: asks the server to delete the resource specified by the URL
MOVE: asks the server to move the specified page to another network address (a WebDAV extension method)
OPTIONS: probes which request methods the server supports for the requested URL
TRACE: traces the proxy servers, firewalls, or gateways that the request passes through
The most commonly used HTTP request methods are GET, POST, and HEAD.
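The difference between GET, HEAD, and POST can be observed with Python's standard library. The sketch below starts a throwaway server on the local machine and sends it one request per method; DemoHandler, try_methods, and the page content are assumptions made for the example, and the POST handler simply echoes the submitted data instead of storing it.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    PAGE = b"<h1>hello</h1>"

    def _send_head(self, body: bytes) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()

    def do_GET(self):             # response headers and entity
        self._send_head(self.PAGE)
        self.wfile.write(self.PAGE)

    def do_HEAD(self):            # same headers as GET, but no entity
        self._send_head(self.PAGE)

    def do_POST(self):            # echo the submitted entity back
        data = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self._send_head(data)
        self.wfile.write(data)

    def log_message(self, *args):  # keep the demo quiet
        pass


def try_methods():
    """Send GET, HEAD and POST to a local server; return {method: (status, body)}."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)   # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = {}
    try:
        for method, body in (("GET", None), ("HEAD", None), ("POST", b"a=1")):
            conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
            conn.request(method, "/", body=body)
            resp = conn.getresponse()
            results[method] = (resp.status, resp.read())
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results
```

Calling try_methods() returns the status and body per method; the HEAD entry has an empty body even though its Content-Length header matches the GET response.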
HTTP status codes
Status codes fall into five classes, listed below as class: description.
1XX: informational; tells the client about an interim state of the request
2XX: success; the requested resource exists and the request succeeded
3XX: redirection; the server may return a new address instead of the result
4XX: client error; the requested resource does not exist, or access is denied because the client lacks permission
5XX: server error; for example, the server has to run a script through an interpreter and the call fails, or the script contains a syntax error
Common status codes
200: OK; the server returned the page successfully. This is the standard status code for a successful HTTP request
201: Created; returned after a file is uploaded successfully
301: Moved Permanently; a permanent redirect. The response carries the new address, to which the requested resource has moved for good
302: Found; a temporary redirect. The resource is temporarily elsewhere, and the response gives the new address as "Location: new location"
304: Not Modified; the resource has not changed
403: Forbidden; the request was denied
404: Not Found; the requested resource does not exist
405: Method Not Allowed; the request method is not allowed for this resource
500: Internal Server Error; an error inside the server
502: Bad Gateway; the proxy server received an invalid response from the upstream server and reports the error
503: Service Unavailable; the service is temporarily unavailable
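The class of a status code is just its first digit, which a short Python sketch makes explicit. The standard library's http.HTTPStatus supplies the reason phrases; the CLASSES table and the describe helper are assumptions for the example.

```python
from http import HTTPStatus

# First digit of the status code -> class of the response.
CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    phrase = HTTPStatus(code).phrase          # standard reason phrase
    return f"{code} {phrase} ({CLASSES[code // 100]})"


for code in (200, 301, 404, 503):
    print(describe(code))
```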
Introduction to HTTP headers
General headers
Request headers
Response headers
Entity headers: describe the type, length, encoding, and similar properties of the resource in the entity
Extension headers: non-standard headers, which programmers can define
General headers
Connection: defines options for the connection between client and server (C/S)
In HTTP 1.0, a persistent connection is requested with
Connection: keep-alive
Cache-Control: enables finer-grained cache control; it is mostly used with HTTP 1.1.
Request headers
Client-IP: the client's IP address
Host: the requested host; needed for name-based virtual hosts
Referer: the URL of the page from which the current resource was requested. It can be used for hotlink protection.
User-Agent: the user agent, generally a browser
Accept headers: what the client is able to accept
- Accept: the media types the client accepts from the server
- Accept-Charset: the character sets accepted
- Accept-Encoding: the content encodings accepted
- Accept-Language: the languages accepted
Conditional request headers (HTTP 1.1 only):
The request asks the server to check a condition first; the full response is sent only if the condition is met.
Security-related request headers:
- Authorization
- Cookie
Response headers
Age: how long, in seconds, the response has been held in a cache
Server: tells the client the name and version of the server software
Negotiation headers:
- Vary: a list of request headers the server used to select the most suitable version to send to the client
Security-related:
- WWW-Authenticate
- Set-Cookie
Entity headers
Location: the new location of the resource; usually used with the 302 response code
Allow: the request methods allowed for this resource
Content headers:
- Content-Encoding
- Content-Language
- Content-Length
- Content-Location: where the content is located
- Content-Type
Cache-related:
- ETag: entity tag
- Expires: expiration time
- Last-Modified: last modified time
ETag explained:
There are cache servers on the network, and the browser has its own cache as well.
The premise is that a picture rarely changes. When the server returns the picture with status code 200, it also returns the picture's fingerprint, the ETag. When the browser needs the picture again, it asks the server to verify the fingerprint. If the picture has not changed, the browser uses the copy in its cache, which reduces the load on the server and saves the time needed to transfer the picture over the network.
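The ETag check described above can be sketched from the server's point of view. In HTTP the browser sends the stored fingerprint back in the If-None-Match request header. Everything else here is an assumption for illustration: the function names, and the choice of a SHA-256 hash as the fingerprint (servers are free to compute ETags in other ways).

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Derive a fingerprint from the content (one possible scheme)."""
    return '"' + hashlib.sha256(content).hexdigest() + '"'


def respond(content: bytes, if_none_match=None):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(content)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""      # unchanged: client reuses its copy
    return 200, {"ETag": etag}, content      # first visit, or content changed


picture = b"fake image bytes"
status, headers, body = respond(picture)                     # first visit
status_again, _, body_again = respond(picture, headers["ETag"])
print(status, status_again, len(body), len(body_again))
```

On the second request the server answers 304 with an empty body, so the picture is not sent over the network again.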
Appendix:
The most common HTTP request headers are as follows:
Accept: the MIME types the browser accepts
Accept-Charset: the character sets the browser accepts
Accept-Encoding: the content encodings the browser can decode, such as gzip
Accept-Language: the languages the browser prefers
Authorization: authorization information, usually sent in reply to a WWW-Authenticate header from the server
Connection: indicates whether a persistent connection is wanted. With the value "Keep-Alive", or with HTTP 1.1 (where persistent connections are the default), the connection is reused, which significantly reduces download time when a page contains multiple elements (such as applets and images).
Content-Length: the length of the request body
Cookie: one of the most important request headers
Cookie-related HTTP extension headers
1) Cookie: the client returns the cookie set by the server to the server
2) Set-Cookie: the server sets a cookie on the client
3) Cookie2 (RFC 2965): the client tells the server which cookie version it supports
4) Set-Cookie2 (RFC 2965): the server sets a cookie on the client
The cookie process
The server sends the cookie to the client in the Set-Cookie header of a response message, and the client carries the same content in the Cookie header of each new request
Host: the host and port from the original URL
If-Modified-Since: the requested content is returned only if it has been modified after the specified date; otherwise a 304 "Not Modified" reply is returned
Referer: the URL of the page from which the user reached the currently requested page
User-Agent: the browser type
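The cookie process in the list above (Set-Cookie in the response, Cookie in later requests) can be traced with Python's http.cookies module. The cookie name session and the value abc123 are assumptions for the example, and a real browser applies further rules (domain, path, expiry) before sending a cookie back.

```python
from http.cookies import SimpleCookie

# Server side: put a cookie into the response via Set-Cookie.
outgoing = SimpleCookie()
outgoing["session"] = "abc123"
outgoing["session"]["path"] = "/"
set_cookie_header = outgoing.output()        # a full "Set-Cookie: ..." header line

# Client side: remember the name=value pair and send it back via Cookie.
jar = SimpleCookie()
jar.load(set_cookie_header.split(":", 1)[1].strip())
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in jar.items()
)

print(set_cookie_header)
print(cookie_header)
```

Attributes such as Path stay on the client; only the name=value pair travels back in the Cookie header.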
The most common HTTP response headers
Allow: the request methods the server supports (such as GET and POST)
Content-Encoding: the encoding method of the document
Content-Length: the length of the content. It is needed only if the browser uses a persistent HTTP connection.
Content-Type: the MIME type of the document that follows
Accept-Ranges: the value bytes indicates that the server supports Range requests in units of bytes (the only available unit). It also means the server supports resumable downloads and that several parts of a file can be downloaded at the same time, so a download tool can use range requests to speed up a download. Accept-Ranges: none indicates that the server does not support range requests.
Date: the current GMT time
Expires: when the document should be considered out of date and no longer served from cache
Last-Modified: when the document was last changed
Location: where the client should fetch the document
Refresh: after how many seconds the browser should refresh the document
The most common HTTP entity headers
Entity headers are meta-information about the entity content. They describe its attributes, including the type, length, and compression method of the entity, its last modification time, and its validity.
Allow: the allowed request methods, for example: GET, POST
Content-Encoding: the encoding method of the document, for example: gzip
Content-Language: the language of the content, for example: zh-cn
Content-Length: the content length, for example: 80. See also the response headers above.
Content-Location: where the client should fetch the document, for example: http://www.dfdf.org/dfdf.html
Content-MD5: an MD5 digest of the entity, used as a checksum. Both the sender and the receiver calculate the MD5 digest, and the receiver compares its value with the value in this header.
Content-Type: the MIME type of the entity being sent or received, in the form main type/subtype, for example: text/html; charset=GB2312
HTTP transactions
An HTTP request together with the response to it is called an HTTP transaction; in other words, an HTTP transaction is one complete cycle of HTTP request and HTTP response.
By default, HTTP opens a new connection for each transaction and closes it afterwards, which costs time and bandwidth. Because of TCP slow start, every new connection begins with reduced performance, and the number of parallel connections that can be opened is limited. Persistent connections therefore perform better than the default behavior: less time is spent setting up and tearing down TCP connections.
HTTP resources
Resources are the content that a user can request from the server over HTTP through a browser or another user agent, such as an HTML document or a picture.
Resource type: tagged with a MIME type
Format: major/minor (primary and secondary type)
Commonly used MIME types, listed as MIME type: file type
text/html: HTML text (.html, .htm)
text/plain: plain text
image/jpeg: JPEG image
image/gif: GIF image
video/mpeg4: MPEG-4 video
application/vnd.ms-powerpoint: PowerPoint file; the application major type is how dynamic (application-specific) resources are tagged
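Python's mimetypes module maps file names to MIME types, which makes the major/minor format easy to see. The file names below are invented for the example.

```python
import mimetypes

# Guess the MIME type from the file extension and split it into its two parts.
for name in ("index.html", "notes.txt", "logo.jpg", "banner.gif"):
    mime, _encoding = mimetypes.guess_type(name)
    major, minor = mime.split("/")
    print(f"{name}: {mime} (major={major}, minor={minor})")
```

A Web server performs the same kind of lookup to fill in the Content-Type header of a static response.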
URI and URL
URI (Uniform Resource Identifier)
A string that identifies the name of an Internet resource and lets users interact with the resource through a specific protocol. Every resource available on the Web, including HTML documents, images, video clips, and programs, is identified by a URI.
URL (Uniform Resource Locator)
Describes the specific location of a resource on a particular server.
For example: http://www.baidu.com:80/download/bash-4.3.1-1.rpm
A URL has three parts
i. Scheme (also known as the protocol): http://
ii. Internet address, which generally refers to the server: www.baidu.com:80
iii. The resource on that server: download/bash-4.3.1-1.rpm
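The three parts can be extracted from the example URL with urllib.parse from the Python standard library:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.baidu.com:80/download/bash-4.3.1-1.rpm")
print(parts.scheme)                  # i.   scheme (protocol)
print(parts.hostname, parts.port)    # ii.  server address and port
print(parts.path)                    # iii. resource on that server
```

Note that urlsplit reports the path with its leading slash, while the scheme is returned without the "://" separator.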
CGI
CGI (Common Gateway Interface): when the Web server finds that a script needs to be executed, it hands the user's request to the back-end application through the CGI protocol. The application's result is returned to the HTTP server through the CGI protocol as well.
Other things worth knowing
The process of a Web resource request
1. The user enters the address to visit in the Web browser
2. The Web browser asks a DNS server to resolve the domain name to the IP address of the Web server
3. The client establishes a connection with the Web server (TCP three-way handshake)
4. Once the TCP connection is established, the client sends an HTTP request
5. The server receives the HTTP request and processes it
6. The server fetches the resource specified by the client
7. The server builds a response message and sends it to the client
8. The server records the request in its log
How an HTTP server handles multiple user requests concurrently
In the default blocking model, the HTTP server accepts only one request at a time and accepts the next one only after it has finished the current one, so requests are handled one by one.
To respond to users concurrently, a multi-process model is needed. The Web server spawns child processes to answer requests: when a user request arrives, the main process does not answer it directly but creates a child process, which takes over the connection with that user. The main process then waits for the next request, and when a second user's request arrives, it creates another child process for it, and so on. Each user request is thus handled by its own child process.
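The multi-process model can be sketched with socketserver.ForkingMixIn from the Python standard library, which forks one child process per connection. This is an illustration, not how any particular Web server is implemented; it only works on systems that provide fork (such as Linux), and the class names, port 8080, and the response text are assumptions for the example.

```python
import os
import socketserver


class PidHandler(socketserver.StreamRequestHandler):
    """Runs inside the child process created for each connection."""

    def handle(self):
        for line in self.rfile:                  # read the request up to the blank line
            if line in (b"\r\n", b"\n"):
                break
        body = f"handled by pid {os.getpid()}\n".encode("ascii")
        self.wfile.write(b"HTTP/1.0 200 OK\r\n"
                         b"Content-Length: %d\r\n\r\n" % len(body) + body)


class ForkingWebServer(socketserver.ForkingMixIn, socketserver.TCPServer):
    """The main process accepts; ForkingMixIn forks one child per request."""
    allow_reuse_address = True


def main(port: int = 8080) -> None:
    with ForkingWebServer(("127.0.0.1", port), PidHandler) as server:
        print("main process pid", os.getpid())
        server.serve_forever()                   # the main process only accepts
```

Each response reports the process ID of the child that produced it, which differs from the main process ID printed at startup.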
Extended knowledge point 1: using Wireshark to analyze the HTTP protocol
Experimental procedure
I. Clear the cache
Before tracing, empty the Web browser's cache so that the Web page is fetched from the network rather than from the cache. Then clear the DNS cache on the client so that the mapping from the Web server's domain name to its IP address is also requested from the network.
II. Start Wireshark
III. Capture
1) Select Capture > Options from the menu, select the network interface, and click Start.
2) Enter www.baidu.com in the browser address bar, then stop the capture.
3) Enter http in the filter and click Apply.
IV. Analyze the data
Select the packet that contains "GET / HTTP/1.1", and you will see this request line followed by a series of request headers. The "\r\n" after each header is a carriage return and a line feed, which separate that header from the next one.
The Host header is required in HTTP 1.1. It carries the domain name from the URL, which in this test is www.baidu.com. This lets one Web server host many different domain names at the same time: with this header, the Web server can tell which site the client is trying to reach and return different content for each.
The User-Agent header describes the Web browser and client machine that made the request.
Then comes a series of Accept headers, including Accept, Accept-Language, Accept-Encoding, and Accept-Charset. They tell the Web server which types of data the client's browser can process. A Web server can deliver data in different languages and formats; these headers express the client's capabilities and preferences.
The Keep-Alive and Connection headers describe the TCP connection over which HTTP requests and responses are sent. They indicate whether the connection stays open after the request is sent, and for how long. Most HTTP 1.1 connections are persistent, meaning that the TCP connection is not closed after each request but is kept open to carry further requests to the same server.
Now that we have looked at the request sent by the Web browser, let's look at the response from the Web server. The response starts with "HTTP/1.1 200 OK", indicating that the server uses HTTP 1.1 and that the request succeeded. As in the request, this line is followed by some headers, and finally the requested data is sent. The Cache-Control header describes whether a copy of the data may be stored, or cached, for later use. In general, a Web browser caches recently visited pages on the local computer; when the same page is visited again and is still in the cache, the browser does not request the data from the server again. In the HTTP response, the Web server also states the content type and content encoding. In this example, the content type the Web server chose to send is text/html.
Extended knowledge point 2: using curl to view HTTP response header information
First, a client (browser) requests data from the server through the following basic steps:
1. The user initiates an HTTP request. The cache takes the URL and looks for a matching copy, which may be in memory or on the local disk.
2. If the request hits the local cache, the "copy" of the resource is taken from the local cache.
3. The cache checks whether the "copy" has expired. If not, the copy is returned directly; if so, the request is forwarded to the server. In HTTP, the expiration time of a document is specified by the Cache-Control header and the Expires header, from which the cache can tell whether the document is still fresh. Both the Expires header and the Cache-Control: max-age header say when the document expires, so why are there two response headers for this simple task? The reasons are historical. Expires was introduced in HTTP 1.0 and uses an absolute date, so if the server and client clocks are out of sync (which is very common), the cache may wrongly conclude that the document has expired. Cache-Control: max-age gives a relative lifetime in seconds and avoids this problem.
4. The server receives the request and determines whether the resource has changed. If it has, the new content is returned; otherwise 304 is returned and the expiration time is updated.
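The expiry check in step 3 can be sketched as a small Python function. It is a simplification made for illustration: is_fresh and its parameters are assumed names, only max-age and Expires are considered, and max-age is checked first because it takes precedence when both headers are present.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers: dict, stored_at: datetime, now: datetime) -> bool:
    """Decide whether a cached copy may be used without asking the server."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            # Relative lifetime: unaffected by clock differences between hosts.
            return now - stored_at <= timedelta(seconds=int(value))
    if "Expires" in headers:
        # Absolute date: only correct if the clocks of both sides agree.
        return now < parsedate_to_datetime(headers["Expires"])
    return False        # no expiry information: revalidate with the server


stored = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_fresh({"Cache-Control": "max-age=5"}, stored, stored + timedelta(seconds=3)))
print(is_fresh({"Cache-Control": "max-age=5"}, stored, stored + timedelta(seconds=6)))
```

When the function returns False, the cache forwards the request to the server, which leads to step 4.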
Information in the HTTP response headers
(1) HTTP status code:
1xx: the client's request has been received by the server and is being processed
2xx: the client's request has been received, understood, and processed successfully by the server
3xx: the client's request is redirected to another server (another URL)
4xx: the client's request is incorrect and cannot be handled by the server
5xx: the service on the server is abnormal
(2) Cache-Control:
The caching settings of a website: Cache-Control specifies the caching mechanism that requests and responses follow
Cache classification
1) Private cache: typically the cache built into the browser
2) Public cache: typically a proxy cache
The optional parameters of Cache-Control include private, public, no-cache, max-age, must-revalidate, and others.
no-cache: a cached response must not be reused without first revalidating it with the server
no-store: the response is never cached and never written to the client's disk; it is used for sensitive responses for security reasons
private: all or part of the response is intended for a single user and must not be stored by a shared cache. The server can thus mark a response as valid only for the current user, and it is not shared among users.
public: the response may be cached and shared among multiple users. Normally, if HTTP authentication is required, the response is automatically treated as private.
max-age: the maximum time, in seconds, for which the response may be served from cache. For example, Cache-Control: max-age=5 means that visiting the page again within 5 seconds does not go to the server.
must-revalidate: the response may be reused to satisfy later requests, but once it is stale the cache must check with the server that it is still up to date (this forces all caches to validate the response)
proxy-revalidate: like must-revalidate, but it applies only to public (shared) caches
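As a rough illustration of how a cache might read these parameters, the following Python sketch parses a Cache-Control value and decides whether a response may be stored at all. The function names are assumptions, and only no-store and private are evaluated here.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into a dict of directive -> argument or None."""
    directives = {}
    for item in value.split(","):
        name, _, arg = item.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def may_store(directives: dict, shared_cache: bool) -> bool:
    """Whether a cache may keep the response at all."""
    if "no-store" in directives:
        return False                      # never written to any cache
    if shared_cache and "private" in directives:
        return False                      # only the user's own cache may keep it
    return True


parsed = parse_cache_control("private, max-age=5")
print(parsed, may_store(parsed, shared_cache=True), may_store(parsed, shared_cache=False))
```

A proxy (shared cache) would skip this response, while the browser's private cache could keep it for up to 5 seconds.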
(3) Connection:
Whether the server supports persistent connections; the value keep-alive indicates that the Web server does.
A persistent TCP connection is two-way, however: both client and server must support persistent connections before one can be established.
Clients (browsers) generally support persistent connections by default, so as long as the server supports them as well, a persistent connection can be established.
With curl's -w option, the output of curl can be customized; %{http_code} stands for the HTTP status code.