2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains the transmission process of the HTTP protocol: what the protocol is, how a request and its response travel between client and server, and the request methods, status codes, and URL format involved.
A brief introduction to the HTTP protocol
The Hypertext Transfer Protocol (HTTP) is an application-layer protocol for distributed, collaborative, hypermedia information systems. HTTP is the basis of data communication on the World Wide Web.
The development of HTTP was initiated by Tim Berners-Lee at the European Organization for Nuclear Research (CERN) in 1989. The standardization of HTTP was coordinated by the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF), which eventually published a series of RFCs. The best known is RFC 2616, which defines a widely used version of the protocol, HTTP/1.1.
In December 2014, the Hypertext Transfer Protocol Bis (httpbis) working group of the IETF submitted the HTTP/2 standard proposal to the IESG for discussion, and it was approved on February 17, 2015. The HTTP/2 standard was officially published as RFC 7540 in May 2015, succeeding HTTP/1.1 as the implementation standard of HTTP.
Overview of HTTP Protocol
HTTP is a standard for requests and responses between a client (the user side) and a server (the website), usually carried over TCP. Using a web browser, web crawler, or other tool, the client initiates an HTTP request to a specified port on the server (the default port is 80). We call this client the user agent. Resources such as HTML files and images are stored on the responding server, which we call the origin server. There may be several intermediaries between the user agent and the origin server, such as proxy servers, gateways, or tunnels.
Although the TCP/IP protocol suite is the most widely used on the Internet, HTTP does not require it or the layers it provides. In fact, HTTP can be implemented on top of any Internet protocol, or on any other network. HTTP only assumes that its underlying protocol provides reliable transmission, so any protocol that offers such a guarantee can be used. Within the TCP/IP protocol family, HTTP therefore uses TCP as its transport layer.
Typically, the HTTP client initiates a request by creating a TCP connection to a designated port on the server (the default is port 80). The HTTP server listens for client requests on that port. Once a request is received, the server returns a status line, such as "HTTP/1.1 200 OK", to the client, together with the returned content, such as the requested file, an error message, or other information.
How HTTP works
The HTTP protocol defines how a Web client requests Web pages from a Web server and how the server delivers those pages to the client. HTTP uses a request/response model. The client sends the server a request message containing the request method, URL, protocol version, request headers, and request data. The server responds with a status line (the protocol version and a success or error code), followed by server information, response headers, and response data.
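To make the message layout concrete, here is a minimal sketch in Python that assembles a request message and splits a response into its parts. The function names, the host example.com, and the sample response text are illustrative assumptions, and the parsing is deliberately simplified (text only, no chunked bodies or repeated headers).

```python
CRLF = "\r\n"


def build_request(method, target, headers, body=""):
    """Join the request line, headers, blank line, and body into one message."""
    request_line = f"{method} {target} HTTP/1.1"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # The empty string produces the blank line that separates headers from body.
    return CRLF.join([request_line, *header_lines, "", body])


def parse_response(raw):
    """Split a response into (version, status code, reason, headers, body)."""
    head, _, body = raw.partition(CRLF + CRLF)
    status_line, *header_lines = head.split(CRLF)
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


# Hypothetical sample data, not taken from a real server.
request = build_request("GET", "/index.html", {"Host": "example.com"})
response = parse_response(
    "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello"
)
```

Header names are lowercased on parsing because HTTP header names are case-insensitive, which makes later lookups simpler.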
The following are the steps for the HTTP request / response:
1. Client connects to Web server
An HTTP client, usually a browser, establishes a TCP socket connection to the HTTP port of the Web server (80 by default).
2. Send the HTTP request
Through the TCP socket, the client sends a text request message to the Web server. A request message consists of four parts: the request line, the request header, the blank line and the request data.
3. The server accepts the request and returns an HTTP response
The Web server parses the request and locates the request resource. The server writes a copy of the resource to the TCP socket and the client reads it. A response consists of four parts: the status line, the response header, the blank line and the response data.
4. Release the TCP connection
If the connection mode is close, the server actively closes the TCP connection and the client closes its side passively, releasing the TCP connection. If the connection mode is keep-alive, the connection stays open for a period of time, during which it can continue to receive requests.
5. Client browser parses HTML content
The client browser first parses the status line to read the status code, which indicates whether the request succeeded. It then parses each response header; the headers tell the browser that what follows is an HTML document of a given number of bytes, and which character set it uses. The browser reads the HTML response data, formats it according to HTML syntax, and displays it in the browser window.
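The steps above can be sketched end to end on one machine. The following Python example is a minimal illustration, not a production client: it starts a throwaway server from the standard library on the loopback address, then connects with a raw TCP socket, sends a request, reads until the server closes the connection, and separates the status line from the body. The handler class, the fetch helper, and the HTML payload are hypothetical names and values chosen for the demo.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # the default for this class is HTTP/1.0

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Connection", "close")  # connection mode: close
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(host, port, path):
    # Step 1: open a TCP connection to the server.
    with socket.create_connection((host, port), timeout=5) as sock:
        # Step 2: send the request line, headers, and blank line.
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # Steps 3 and 4: read the response until the server closes the connection.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Step 5: split the status line and headers from the body.
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n")[0].decode("ascii")
    return status_line, body


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status_line, body = fetch("127.0.0.1", server.server_port, "/")
server.shutdown()
server.server_close()
```

Reading until the peer closes works here only because both sides ask for "Connection: close"; with keep-alive, a client has to rely on Content-Length (or chunked encoding) to know where the response ends.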
For example, typing a URL into the browser address bar and pressing Enter triggers the following process:
The browser asks a DNS server to resolve the domain name in the URL to an IP address.
Once the IP address is resolved, the browser establishes a TCP connection with the server using that address and the default port 80.
The browser sends an HTTP request to read the file (the part of the URL after the domain name). The request message can be carried as the data of the third segment of the TCP three-way handshake.
The server responds to the browser's request and sends the corresponding HTML text to the browser.
The TCP connection is released.
The browser renders the HTML text and displays the content.
HTTP is an application-layer protocol built on top of TCP/IP.
Based on the request-response pattern
The HTTP protocol stipulates that a request is sent by the client and that the server then responds to that request. In other words, communication must start from the client.
Even after communication has been established, the server does not send a response unless it has received a request.
Stateless
HTTP is a stateless protocol. The protocol itself does not save the state of communication between requests and responses. That is, at the HTTP level, the protocol does not persist requests or responses that have already been sent.
With HTTP, every new request produces a corresponding new response, and the protocol itself retains no information about previous request or response messages. HTTP was deliberately designed to be this simple so that it can handle a large number of transactions quickly and scale well. However, as the Web has developed, statelessness has made more and more business cases awkward to handle. For example, a user who logs in to a shopping site needs to stay logged in after moving to other pages of the site. To do this, the website has to save the user's state so that it knows who sent each request. Although HTTP/1.1 is a stateless protocol, Cookie technology was introduced to provide the desired state-keeping function. By communicating over HTTP with cookies, state can be managed. The details of cookies will be explained later.
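As a rough illustration of how cookies carry state on top of a stateless protocol, the sketch below uses Python's standard http.cookies module. The server side produces the value for a Set-Cookie response header, and later reads the Cookie request header that the browser sends back. The cookie name session_id, its value, and the helper function names are assumptions made for the example.

```python
from http.cookies import SimpleCookie


def make_set_cookie(session_id):
    """Build the value of a Set-Cookie response header for a new session."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"
    return cookie["session_id"].OutputString()


def read_cookie_header(header_value):
    """Parse the Cookie request header a browser sends on later requests."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


# First response: the server hands the client an identifier.
set_cookie = make_set_cookie("abc123")

# Later request: the client returns it, so the server can recognize the user.
sent_back = read_cookie_header("session_id=abc123; theme=dark")
```

The protocol stays stateless: the state lives in the cookie that travels with each request, and in whatever the server stores under that identifier.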
Connectionless
Connectionless means that each connection handles only one request. After the server has handled the client's request and the client has received the response, the connection is closed. This saves transmission time and improves concurrency, because the server does not hold a long-lived connection for each user: one request, one response, and then the connection between server and client is closed.
In practice there are two flavors of this behavior. Early versions of HTTP disconnected immediately after one request and one response. HTTP/1.1 does not disconnect immediately but waits for a short time in case the user has a follow-up operation. If a new request arrives within that time, it is sent and answered over the existing connection. If no new request arrives, the connection is closed. This improves efficiency by reducing the number of connections established in a short period, since establishing a connection also takes time. The default wait is said to be about 3 seconds, but the value is configurable on the server side, and a site can tune it based on the behavior of its own users.
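The two behaviors can be summarized in a small decision function. This is a simplified sketch, assuming only the protocol version and the Connection header matter; real servers also apply idle timeouts and request limits. The function name is hypothetical.

```python
def connection_persists(version, headers):
    """Return True if the connection should stay open after this exchange."""
    lowered = {name.lower(): value for name, value in headers.items()}
    tokens = {t.strip().lower() for t in lowered.get("connection", "").split(",")}
    if "close" in tokens:
        return False  # either side asked to close
    if version == "HTTP/1.1":
        return True  # HTTP/1.1 connections are persistent by default
    return "keep-alive" in tokens  # HTTP/1.0 must opt in explicitly
```

For example, connection_persists("HTTP/1.1", {}) is True, while connection_persists("HTTP/1.0", {}) is False unless the request carries "Connection: keep-alive".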
HTTP request method
The HTTP/1.1 protocol defines eight methods (also known as "actions") to manipulate specified resources in different ways:
GET
Requests a representation of the specified resource. GET should only be used to read data and should not be used for operations that produce "side effects" in a web application. One reason is that GET requests may be issued at will by web spiders and similar tools.
HEAD
Like GET, it sends the server a request for the specified resource, but the server does not return the body of the resource. The advantage is that you can obtain "information about the resource" (meta-information, or metadata) without transferring all of the content.
POST
Submits data to the specified resource and asks the server to process it (for example, submitting a form or uploading a file). The data is included in the request body. The request may create a new resource, modify an existing resource, or both.
PUT
Uploads the latest content to the specified resource location.
DELETE
Requests that the server delete the resource identified by the Request-URI.
TRACE
Echoes back the request as received by the server, mainly for testing or diagnostics.
OPTIONS
Asks the server to return all HTTP request methods supported by the resource. Sending an OPTIONS request with '*' in place of the resource name tests whether the server is functioning properly.
CONNECT
Reserved in HTTP/1.1 for proxy servers that can switch the connection into a tunnel. Typically used for connections to SSL-encrypted servers through an unencrypted HTTP proxy.
Note:
Method names are case-sensitive. When the target resource does not support the request method, the server should return status code 405 (Method Not Allowed). When the server does not recognize or support the request method at all, it should return status code 501 (Not Implemented).
An HTTP server should implement at least the GET and HEAD methods; the other methods are optional. Whatever methods it supports, the implementation should match the semantics described above. Beyond these methods, a particular HTTP server can also add custom methods. For example, PATCH (specified in RFC 5789) applies partial modifications to a resource.
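The difference between 405 and 501 can be shown with a small lookup. The sketch below is illustrative only: the resource table, the paths, and the function name are hypothetical, and a real server would do this inside its routing layer.

```python
# Methods this hypothetical server recognizes at all.
KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "TRACE", "OPTIONS", "CONNECT"}

# Hypothetical resources and the methods each one supports.
ALLOWED = {
    "/articles": {"GET", "HEAD", "POST"},
    "/about": {"GET", "HEAD"},
}


def check_method(method, path):
    """Return (status code, extra response headers) for a method/path pair."""
    if method not in KNOWN_METHODS:
        return 501, {}  # the server does not recognize the method
    allowed = ALLOWED.get(path)
    if allowed is None:
        return 404, {}  # no such resource
    if method not in allowed:
        # The method is known, but this resource does not support it.
        return 405, {"Allow": ", ".join(sorted(allowed))}
    return 200, {}
```

Because method names are case-sensitive, check_method("get", "/about") falls into the 501 branch, while check_method("DELETE", "/about") returns 405 together with an Allow header listing the supported methods.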
Request methods: GET and POST requests (which we can write and observe using an HTML form)
Data submitted by GET is placed after the URL, that is, in the request line. A "?" separates the URL from the transmitted data, and parameters are joined with "&", for example EditBook?name=test1&id=123456. (The Content-Type request header describes this parameter format; more on that later.) The POST method puts the submitted data in the request body of the HTTP message.
The size of data submitted by GET is limited (because browsers limit the length of a URL), while the protocol places no limit on data submitted by POST, although servers may impose their own.
On the server side, GET and POST data are read in different ways: GET parameters come from the query string of the URL, and POST parameters come from the request body.
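The following Python sketch shows the same form data sent both ways and how a server might read it back. The host example.com and the helper name form_data are assumptions; the parameters reuse the EditBook example from the text, and the parsing handles only simple text messages.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = {"name": "test1", "id": "123456"}
encoded = urlencode(params)  # "name=test1&id=123456"

# GET: the data rides in the request line, after the "?".
get_request = f"GET /EditBook?{encoded} HTTP/1.1\r\nHost: example.com\r\n\r\n"

# POST: the same data goes in the body, described by Content-Type.
post_request = (
    "POST /EditBook HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(encoded.encode('ascii'))}\r\n"
    "\r\n"
    f"{encoded}"
)


def form_data(request_text):
    """Read form fields from the query string (GET) or the body (POST)."""
    head, _, body = request_text.partition("\r\n\r\n")
    method, target, _version = head.split("\r\n")[0].split(" ")
    source = urlsplit(target).query if method == "GET" else body
    return {key: values[0] for key, values in parse_qs(source).items()}
```

Both form_data(get_request) and form_data(post_request) recover the same dictionary; only the place where the data travels differs.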
HTTP status code
The first line of every HTTP response is the status line, which consists of the current HTTP version, a three-digit status code, and a phrase describing the status, separated by spaces.
The first digit of the status code indicates the class of the response:
1xx informational: the request has been received by the server and processing continues
2xx success: the request has been successfully received, understood, and accepted by the server
3xx redirection: further action is required to complete the request
4xx request error: the request contains a syntax error or cannot be fulfilled
5xx server error: the server failed while processing a valid request
Although RFC 2616 recommends phrases to describe each status, such as "200 OK" and "404 Not Found", web developers can still choose their own phrases to show localized status descriptions or custom information.
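As a small illustration, the class of a status code can be derived from its first digit, and Python's standard http.HTTPStatus enum supplies the recommended phrase for registered codes. The describe function and the wording of the class labels are assumptions for the example.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "request error",
    5: "server error",
}


def describe(code):
    """Return 'code phrase: class' for a three-digit status code."""
    category = STATUS_CLASSES[code // 100]  # the first digit selects the class
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes outside the registry are still valid within their class.
        phrase = "(no standard phrase)"
    return f"{code} {phrase}: {category}"
```

For instance, describe(404) gives "404 Not Found: request error". A client that does not recognize a specific code can still act on its class, which is why the first digit matters.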
URL
A uniform resource locator (URL) for the Hypertext Transfer Protocol (HTTP) packs the basic elements needed to obtain information from the Internet into a simple address:
Transport protocol.
Hierarchical URL marker ("//", fixed).
Credentials required to access the resource (can be omitted).
Server. (usually a domain name, sometimes an IP address)
Port number. (numeric; the default value ":80" for HTTP can be omitted)
Path. (directory names in the path are separated by the "/" character)
Query. (form parameters in GET mode, starting with the "?" character; parameters are separated by "&", and each parameter name is separated from its value by "="; values are usually URL-encoded as UTF-8 to avoid character conflicts)
Fragment. Starts with the "#" character.
Take http://www.luffycity.com:80/news/index.html?id=250&page=1 as an example, where:
http is the protocol
www.luffycity.com is the server
80 is the default network port number on the server and is not displayed by default
/news/index.html is the path (URI: navigates directly to the corresponding resource)
?id=250&page=1 is the query
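The same breakdown can be reproduced with urlsplit from Python's standard library, shown here as a short sketch on the example URL. The variable names and the small default-port table are illustrative assumptions.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://www.luffycity.com:80/news/index.html?id=250&page=1"
parts = urlsplit(url)

scheme = parts.scheme      # transport protocol
server = parts.hostname    # domain name (or IP address)
path = parts.path          # path to the resource
query = parse_qs(parts.query)  # each query parameter maps to a list of values

# When the URL omits the port, fall back to the scheme's default.
DEFAULT_PORTS = {"http": 80, "https": 443}
port = parts.port or DEFAULT_PORTS[parts.scheme]
```

parse_qs returns lists because a parameter name may appear more than once in a query string.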
Most web browsers do not require users to type the "http://" part of a web address, because the vast majority of web content is served over the Hypertext Transfer Protocol. Similarly, "80" is the default port number for HTTP, so it generally does not need to be specified. In general, users only need to type part of the uniform resource locator (www.luffycity.com:80/news/index.html?id=250&page=1).
Because HTTP allows the server to redirect the browser to another web address, many servers let users omit parts of the address, such as "www". Technically, the shortened address is a different web address, and the browser cannot determine on its own whether it is reachable; the server must handle the redirection.
HTTP request format (request protocol)
The URL part of the request line contains the path and the parameters, for example /index/index2?a=1&b=2.
Taking the request headers as an example: the length field (Content-Length) gives the length of the data in the request body, and the other key-value pairs in the request headers will be discussed one by one later; a rough idea is enough for now. One header worth remembering is User-Agent, which tells the server what software was used to send the request.
Take JD.com as an example and look at its User-Agent.
Consider a crawler: it has no trouble fetching JD.com, but it must send a User-Agent when crawling Chouti, because Chouti checks the User-Agent to decide whether the request looks normal. This can be regarded as a kind of anti-scraping mechanism.
Open the demo.html file we saved in a browser to see how the page renders.
The point of the above is to show that such request headers exist and that some of them carry meaning. We can also define request headers ourselves by adding them to the headers={} argument in the requests module.
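Custom headers can be sketched without any third-party package using urllib.request from the standard library; with the requests module mentioned above, the equivalent is to pass the same dictionary as requests.get(url, headers=headers). The URL, the User-Agent string, and the X-Demo-Header name below are placeholder values, and the example only builds the request object without sending it.

```python
from urllib.request import Request

# Illustrative values; a real crawler would use a genuine browser User-Agent.
headers = {
    "User-Agent": "Mozilla/5.0 (illustrative value)",
    "X-Demo-Header": "hello",
}

# Building the Request does not open a connection or send anything.
request = Request("http://example.com/", headers=headers)

# urllib stores header names in capitalized form ("User-agent").
user_agent = request.get_header("User-agent")
has_custom = request.has_header("X-demo-header")
```

Passing this object to urllib.request.urlopen would send it with the custom headers attached; header names are case-insensitive on the wire, so the capitalization change is harmless.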
HTTP response format (response protocol)