
What is the request method of Python crawler

2025-01-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article introduces the request methods used by Python crawlers through a practical walkthrough. The steps are simple, quick, and practical, and I hope "What is the request method of a Python crawler" helps you solve your problem.

1. Request target (URL)

A URL, or uniform resource locator, fully describes the address of a web page or other resource on the Internet, much like a file path on Windows.

2. The composition of a URL (using http://mail.163.com/index.html as an example):

1. http://: the protocol, i.e., HTTP, the hypertext transfer protocol used to transmit web pages over the Internet.

2. mail: the server name; here it indicates a mail server.

3. 163.com: the domain name, a unique name used to locate the website.

4. mail.163.com: the host name of the website, composed of the server name and the domain name.

5. /: the root directory; once the host name locates the server, the pages are stored under this root directory on the server.

6. index.html: a web page under the root directory.

7. http://mail.163.com/index.html: the full URL, a uniform resource locator, a global address used to locate a resource on the Internet.
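The breakdown above can be reproduced with Python's standard urllib.parse module; a minimal sketch using the same example URL:

```python
from urllib.parse import urlsplit

# split the example URL into its components
parts = urlsplit("http://mail.163.com/index.html")

print(parts.scheme)  # the protocol: "http"
print(parts.netloc)  # the host name: "mail.163.com"
print(parts.path)    # the page under the root directory: "/index.html"
```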

3. The composition of a request

Just as a phone call follows a certain script, what exactly does an HTTP client say to the server to get the correct response back? In fact, the client's request consists of four parts: a request line, request headers, a blank line, and the request data (body).
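Those four parts can be seen by assembling a raw HTTP request by hand; a sketch, reusing the example host and path from above:

```python
# a raw HTTP GET request, built from its four parts
request_text = (
    "GET /index.html HTTP/1.1\r\n"   # request line: method, path, protocol version
    "Host: mail.163.com\r\n"         # request headers, one per line
    "User-Agent: MyCrawler/1.0\r\n"  # (UA string made up for illustration)
    "\r\n"                           # blank line: separates headers from the data
)                                    # request data (body): empty for this GET

print(request_text)
```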

4. Request method (Method)

HTTP supports a variety of request methods, but crawlers mainly use two: GET and POST.

GET request: used when you only retrieve data from the server and the request has no effect on server resources.

POST request: used when you send data to the server (e.g., logging in), upload files, or otherwise change server resources.

These two methods cover most website development, and sites generally follow this convention. However, some websites and servers deliberately break it as an anti-crawler measure: a request that should use GET may have to be sent as a POST instead, so it depends on the specific site.
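With the standard library's urllib.request, the method is chosen automatically from whether a body is attached; a minimal sketch (the URLs are placeholders and no request is actually sent):

```python
from urllib.request import Request

# no data attached -> urllib will use GET
get_req = Request("http://example.com/search")

# a body attached -> urllib will use POST
post_req = Request("http://example.com/login", data=b"user=alice&pwd=secret")

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```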

The difference between the GET and POST methods:

1. GET retrieves data from the server; POST transmits data to the server.

2. GET request parameters are displayed in the browser's URL, i.e., the parameters are part of the URL. For example: http://www.baidu.com/s?wd=Chinese

3. POST request parameters are carried in the request body; the message length is unlimited and the parameters are sent implicitly, so POST is usually used to deliver large amounts of data to the HTTP server. The "Content-Type" header declares the format of the data being submitted.
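The difference in where the parameters travel can be shown with urllib.parse.urlencode; a sketch using the Baidu example from above:

```python
from urllib.parse import urlencode

params = urlencode({"wd": "Chinese"})

# GET: the parameters become part of the URL itself
get_url = "http://www.baidu.com/s?" + params

# POST: the same parameters go into the request body instead,
# with Content-Type declaring their format
post_body = params.encode("utf-8")
post_headers = {"Content-Type": "application/x-www-form-urlencoded"}

print(get_url)    # http://www.baidu.com/s?wd=Chinese
print(post_body)  # b'wd=Chinese'
```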

Note:

Website developers generally do not use GET to submit forms because it can cause security problems. For example, if a login form uses GET, the user name and password entered by the user are exposed in the address bar, and the browser records them in its history, leaving the account insecure.

5. Commonly used request headers

The request headers describe things such as the encoding the client uses when sending the request, the length of the content sent, whether the client is logged in, which browser is making the request, and so on.

1. Accept: the browser tells the server which data types it accepts (text, images, etc.).

2. Accept-Charset: the browser declares the character sets it accepts.

3. Accept-Encoding: the browser declares the content encodings it accepts, usually compression methods, i.e., whether compression is supported and which methods (gzip, deflate, br).

4. Accept-Language: the browser declares the languages it accepts.

5. Authorization: authorization credentials, usually sent in reply to a WWW-Authenticate header from the server.

6. Content-Length: the length of the request message body.

7. Origin: declares where the request originated.

8. Connection: whether to close or keep the connection alive after this request is processed.

9. Cookie: cookie content sent to the web server, often used to determine whether the client has logged in.

10. Host: the domain name/IP address and port number of the web server the client wants to access.

11. If-Modified-Since: the client uses this header to tell the server when its cached copy of the resource was obtained. The content is returned only if it has been modified after that time; otherwise the server replies with 304 Not Modified.

12. Pragma: a value of "no-cache" means the server must return a fresh document, even if it is a proxy server that already has a local copy of the page.

13. Referer: tells the server which page the request was linked from.

14. From: the email address of the requester; used by some special web client programs, not by browsers.

15. User-Agent: the browser identifies itself (which kind of browser it is).

16. Upgrade-Insecure-Requests: declares that the browser supports automatically upgrading HTTP requests to HTTPS, so later requests are sent over HTTPS.

17. UA-Pixels, UA-Color, UA-OS, UA-CPU: non-standard request headers sent by some versions of Internet Explorer, indicating screen size, color depth, operating system, and CPU type.
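Several of these headers can be attached to a request with the standard library; a minimal sketch (the URL and User-Agent string are made up for illustration):

```python
from urllib.request import Request

# build a request carrying a few of the headers described above
req = Request(
    "http://mail.163.com/index.html",
    headers={
        "Accept": "text/html",               # data types the client accepts
        "Accept-Encoding": "gzip, deflate",  # compression methods supported
        "User-Agent": "MyCrawler/1.0",       # the client identifies itself
    },
)

# note: urllib normalizes header names with str.capitalize(),
# so "User-Agent" is stored and looked up as "User-agent"
print(req.get_header("Accept"))      # text/html
print(req.get_header("User-agent"))  # MyCrawler/1.0
```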

6. Viewing the request with the requests module

When we request data with the requests module, we carry the header fields described above to disguise our crawler. After such a disguised request is sent, we can also inspect the request itself through code. There are several common attributes:

# view the URL of the request
response.request.url
# view the headers of the request
response.request.headers
# view the method of the request
response.request.method

That's all for "What is the request method of a Python crawler"; thank you for reading. If you want to know more about the industry, you can follow the industry information channel, where the editor updates different knowledge points every day.
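Assuming the third-party requests library is installed, the same attributes can be inspected without sending anything over the network by preparing a request by hand; response.request exposes exactly such a PreparedRequest:

```python
import requests

# build and prepare a request without sending it;
# a real response.request has these same attributes
req = requests.Request(
    "GET",
    "http://mail.163.com/index.html",
    headers={"User-Agent": "MyCrawler/1.0"},  # UA string made up for illustration
)
prepared = req.prepare()

print(prepared.url)      # http://mail.163.com/index.html
print(prepared.method)   # GET
print(prepared.headers)  # case-insensitive dict of the headers
```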
