
What is the function of Request in Python


In this issue, the editor walks you through the role of Requests in Python. The article is rich in content and analyzes the topic from a professional point of view. I hope you get something out of it after reading.

Introduction to Requests

In the introductory tutorial, we covered the use of the urllib and urllib2 libraries, learned some crawler basics, and gained a general understanding of them. In a production environment, however, the Requests library is more convenient and practical: just a few lines of code take care of a great deal.

Installation of Request

We installed the pip package manager in the Python introduction and environment configuration. If you are on version 2.x you can also use easy_install to install the Requests library; either tool makes it easy to install third-party libraries:

Install Request using pip

# pip 2.x: install requests
pip install requests
# pip 3.x: install requests
pip3 install requests

Install Request using easy_install

easy_install requests

The use of Requests

Introducing a third-party module in Python is very easy; you only need an import statement:

import requests

req = requests.get("https://ptorch.com")
print(req.text)

In this way, we can quickly retrieve the source of the target web page, which is very convenient!

Basic request methods of Requests

You can send all types of HTTP requests through the requests library:

Requests.get ("http://httpbin.org/get") # GET request requests.post (" http://httpbin.org/post") # POST request requests.put ("http://httpbin.org/put") # PUT request requests.delete (" http://httpbin.org/delete") # DELETE request requests.head ("http://httpbin.org/get") # HEAD request requests.options (" http://httpbin.org/get") # OPTIONS request) "

Send a GET request using Request

If you want a crawler to fetch a target web page, you can send an HTTP GET request directly with the get method:

req = requests.get("http://httpbin.org/get")

In general, we do not only visit basic web pages; especially when crawling dynamic pages, we need to pass different parameters to get different content. GET can pass parameters in two ways: you can append them directly to the link, or use params to add them:

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
req = requests.get("http://httpbin.org/get", params=payload)
# method two
# req = requests.get("http://httpbin.org/get?key2=value2&key1=value1")
print(req.url)

Send a POST request using Request

In fact, sending a POST request is very similar to GET, except that the parameters need to be defined in data:

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
req = requests.post("http://httpbin.org/post", data=payload)
print(req.text)

POST sends JSON data

Often the data you want to send is not form-encoded; you run into this especially when crawling many Java-backed sites. If you pass a string instead of a dict, the data is posted directly as-is. We can use json.dumps() to convert a dict to a str; besides encoding the dict yourself, you can also pass the json parameter directly and it will be encoded automatically.

import json
import requests

url = 'http://httpbin.org/post'
payload = {'some': 'data'}
req1 = requests.post(url, data=json.dumps(payload))
req2 = requests.post(url, json=payload)
print(req1.text)
print(req2.text)

POST file upload

If we want to use a crawler to upload files, we can use the files parameter:

url = 'http://httpbin.org/post'
files = {'file': open('test.xlsx', 'rb')}
req = requests.post(url, files=files)
req.text

If you are familiar with web development, you know that when sending a very large file as a multipart/form-data request, you may want to stream the request. Requests does not support this by default; you can use the requests-toolbelt third-party library, as sketched below.
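A minimal sketch of a streaming upload with requests-toolbelt's MultipartEncoder (the file name and content type here are assumptions, not from the article):

from requests_toolbelt.multipart.encoder import MultipartEncoder
import requests

# the encoder reads the file lazily, so the whole upload is never held in memory
encoder = MultipartEncoder(
    fields={'file': ('report.xlsx', open('report.xlsx', 'rb'),
                     'application/octet-stream')}
)
req = requests.post('http://httpbin.org/post', data=encoder,
                    headers={'Content-Type': encoder.content_type})
print(req.status_code)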

Request session

In many cases, the crawler we develop needs to log in, and after logging in we need to keep that login state; otherwise we cannot crawl pages that are only reachable after login. Requests provides the requests.Session() class for this:

import requests

s = requests.Session()
s.get('http://httpbin.org/get')

This way, as long as we call the login endpoint once, the session automatically keeps the cookies that record our login state, and we can then use the same session to access pages that require login, as sketched below.
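A hypothetical sketch of keeping a login state with a Session (the login URL and form fields are placeholders, not from the article):

import requests

s = requests.Session()
# log in once; the session stores whatever cookies the server sets
s.post('http://example.com/login', data={'username': 'alice', 'password': 'secret'})
# later requests on the same session automatically send those cookies
profile = s.get('http://example.com/profile')
print(profile.status_code)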

Cookie acquisition

We can use the cookies attribute to read the cookies in a response: if a response contains cookies, you can access them quickly:

Req = requests.get ("https://ptorch.com")req = requests.get (" https://ptorch.com")print(req.cookies)print(req.cookies['laravel_session']))

To send your cookies to the server, use the cookies parameter:

cookies = dict(cookies_are='working Test')
req = requests.get("http://httpbin.org/cookies", cookies=cookies)
print(req.text)
# '{"cookies": {"cookies_are": "working Test"}}'

The cookies returned are a RequestsCookieJar object. It behaves like a dictionary but offers a more complete interface, suitable for use across domains and paths. You can also pass a CookieJar to Requests:

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
url = 'http://httpbin.org/cookies'
req = requests.get(url, cookies=jar)
print(req.text)
# '{"cookies": {"tasty_cookie": "yum"}}'

To save cookies for the next visit, we need to convert a CookieJar into a dictionary, or a dictionary back into a CookieJar:

# convert CookieJar to dictionary:
cookies = requests.utils.dict_from_cookiejar(r.cookies)
# convert dictionary to CookieJar:
cookies = requests.utils.cookiejar_from_dict(cookie_dict, cookiejar=None, overwrite=True)
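A minimal sketch of persisting cookies between runs with these two helpers (the file name cookies.json and the httpbin URLs are assumptions):

import json
import requests

# make a request that sets a cookie, then dump the jar to disk as a dict
s = requests.Session()
s.get("http://httpbin.org/cookies/set?session=abc123")
with open('cookies.json', 'w') as f:
    json.dump(requests.utils.dict_from_cookiejar(s.cookies), f)

# on the next run, rebuild a CookieJar from the saved dict and reuse it
with open('cookies.json') as f:
    jar = requests.utils.cookiejar_from_dict(json.load(f))
req = requests.get("http://httpbin.org/cookies", cookies=jar)
print(req.text)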

Timeout configuration

You can tell Requests to stop waiting for a response after the number of seconds set with the timeout parameter. Essentially all production code should use this parameter; without it, your program may hang indefinitely:

requests.get('http://github.com', timeout=0.001)

Note: timeout limits how long Requests waits while connecting and waiting for a response; it is not a limit on the total download time.

That is, this value only limits the waiting time of the request itself; if the returned response has a lot of content, downloading it still takes time and the timeout does not help there. You can also pass a (connect, read) tuple to bound the two phases separately, as sketched below.
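A small sketch of the (connect, read) tuple form and of catching the timeout exception (the values are arbitrary assumptions):

import requests

try:
    # 3.05 seconds to connect, 10 seconds for the server to start sending data
    req = requests.get('http://github.com', timeout=(3.05, 10))
    print(req.status_code)
except requests.exceptions.Timeout:
    print("request timed out")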

Proxies

In many cases, websites have anti-crawler mechanisms that kick in once a single IP generates enough traffic; for example, many people need a proxy when crawling WeChat articles. You can set a proxy for any request with the proxies parameter. Free proxies can be found through a search engine; they are not very fast, but they are enough for practice.

import requests

proxies = {"https": "http://127.0.0.1:4433"}
req = requests.post("http://httpbin.org/post", proxies=proxies)
print(req.text)

We can also configure proxies through the HTTP_PROXY and HTTPS_PROXY environment variables.

export HTTP_PROXY="http://127.0.0.1:2091"
export HTTPS_PROXY="http://127.0.0.1:2092"

Request header settings

In crawlers, we need to customize request headers to modify our HTTP requests; in particular, many sites block script access. We can set the headers parameter to simulate browser access, and we can also pass cookies through headers to keep our login state:

headers = {'user-agent': 'my-app/0.0.1'}
req = requests.get("https://api.github.com/some/endpoint", headers=headers)
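A minimal sketch of carrying a cookie in the headers themselves, as mentioned above (the cookie name and value are placeholders, not from the article):

import requests

headers = {
    'user-agent': 'my-app/0.0.1',
    # a raw Cookie header keeps the login state without using the cookies parameter
    'cookie': 'laravel_session=abc123',
}
req = requests.get("https://ptorch.com", headers=headers)
print(req.status_code)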

Download images

Sometimes we want to download an image from the page we crawled. You can use Requests to request the image; response.content is actually the binary content of the image, which we can then save:

import requests

response = requests.get("https://cache.yisu.com/upload/information/20210521/347/241257.png")
img = response.content
open('logo.jpg', 'wb').write(response.content)

If you want to download a CAPTCHA, you can use the session request described above so that the downloaded image matches your login session, as sketched below.
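A hypothetical sketch of downloading a CAPTCHA on the same session that will later submit the login form (the URLs are placeholders, not from the article):

import requests

s = requests.Session()
# fetching the CAPTCHA on the session keeps its cookies, so the image stays
# valid when the same session submits the login form afterwards
captcha = s.get('http://example.com/captcha.png')
open('captcha.png', 'wb').write(captcha.content)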

Get Request response

After sending a request in the crawler, we can use the following attributes to inspect the response for analysis and debugging:

# response status code
req.status_code
# response headers
req.headers
# request URL
req.url
# page encoding
req.encoding
# cookies
req.cookies
# page source
req.text

The above is the editor's sharing of what the role of Requests in Python is. If you happen to have similar doubts, you can refer to the analysis above. If you want to know more, you are welcome to follow the industry information channel.
