This article introduces the basics of writing a Python crawler with the requests library. Each point is demonstrated with a short, practical example that is simple and fast to run, and I hope this "Python crawler basic case analysis" helps you solve your problem.
First of all, you need to install the requests library, and before that you need a working Python environment. If you have not set one up yet, see the latest Python 3.9.0 installation tutorial.
After installing the Python environment, Windows users can open a cmd window and enter the following command (installation on other systems is roughly the same).
pip install requests
Linux users:
sudo pip install requests
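To confirm that the installation succeeded, you can run a quick check from Python (a minimal sketch; any recent requests version will do):

import requests  # if this import fails, the installation did not succeed
print(requests.__version__)  # print the installed requests version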
What follows is a series of worked examples; do try them out yourself!
1. Crawl the Baidu home page and get the page information
Example
# crawl the Baidu home page
import requests  # import the requests library
resp = requests.get('http://www.baidu.com')  # send a GET request and get a response object
resp.encoding = 'utf-8'  # set the encoding format to utf-8
print(resp.status_code)  # print the status code
print(resp.text)  # print the crawled page content
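In practice it is a good idea to set a timeout and check for HTTP errors explicitly. Below is a minimal sketch of the same request with those two safeguards added (the 5-second timeout is an arbitrary choice):

import requests

resp = requests.get('http://www.baidu.com', timeout=5)  # give up if no response arrives within 5 seconds
resp.raise_for_status()  # raise an exception for 4xx/5xx status codes
resp.encoding = 'utf-8'
print(resp.text)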
2. An example of the get method in the requests library
Before that, let me introduce a useful site: httpbin.org. It lets you test all kinds of HTTP request and response information, such as cookies, IP, headers, and login authentication, and it supports GET, POST, and other methods, which makes it very helpful for web development and testing. It is written in Python with Flask and is an open source project.
Official website: http://httpbin.org/
Open source address: https://github.com/Runscope/httpbin
Example
# get method example
import requests  # import the requests library
resp = requests.get("http://httpbin.org/get")  # send a GET request
print(resp.status_code)  # print the status code
print(resp.text)  # print the response body
3. An example of the post method in the requests library
Example
# post method example
import requests  # import the requests library
resp = requests.post("http://httpbin.org/post")  # send a POST request
print(resp.status_code)  # print the status code
print(resp.text)  # print the response body
4. An example of the put method in the requests library
Example
# put method example
import requests  # import the requests library
resp = requests.put("http://httpbin.org/put")  # send a PUT request
print(resp.status_code)  # print the status code
print(resp.text)  # print the response body
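The other common HTTP verbs work through the same interface; for example, delete and head are called just like get, post, and put. A brief sketch against httpbin:

import requests

resp = requests.delete("http://httpbin.org/delete")  # send a DELETE request
print(resp.status_code)  # print the status code
resp = requests.head("http://httpbin.org/get")  # HEAD returns headers only, no body
print(resp.headers["Content-Type"])  # inspect a response header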
5. Passing parameters with the get method in the requests library
There are two ways to pass parameters with the get method:
Append the parameters directly to the URL after a "?", joining each name and value with an "=" sign and separating pairs with an "&" symbol.
Use a params dictionary to pass multiple parameters. Examples of both follow:
Example
# get parameter passing, method 1
import requests  # import the requests library
resp = requests.get("http://httpbin.org/get?name=w3cschool&age=100")  # parameters appended to the URL
print(resp.status_code)  # print the status code
print(resp.text)  # print the response body
Example
# get parameter passing, method 2
import requests  # import the requests library
data = {
    "name": "w3cschool",
    "age": 100
}  # use a dictionary to store the parameters to pass
resp = requests.get("http://httpbin.org/get", params=data)  # pass the parameters via params
print(resp.status_code)  # print the status code
print(resp.text)  # print the response body
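Either way, requests builds the final URL for you and URL-encodes the values; you can confirm this by printing resp.url after the request. A quick sketch:

import requests

data = {"name": "w3cschool", "age": 100}
resp = requests.get("http://httpbin.org/get", params=data)
print(resp.url)  # http://httpbin.org/get?name=w3cschool&age=100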
6. Passing parameters with the post method in the requests library
Passing parameters with the post method looks much like the second get example, except that the parameters are sent in the request body via the data argument. An example follows:
Example
# post parameter passing example
import requests  # import the requests library
data = {
    "name": "w3cschool",
    "age": 100
}  # use a dictionary to store the parameters to pass
resp = requests.post("http://httpbin.org/post", data=data)  # send the parameters as form data in the request body
print(resp.status_code)  # print the status code
print(resp.text)  # print the response body
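If the server expects a JSON body instead of form fields, requests can serialize the dictionary for you with the json argument. A short sketch (httpbin echoes the parsed body back under the "json" key):

import requests

data = {"name": "w3cschool", "age": 100}
resp = requests.post("http://httpbin.org/post", json=data)  # send the dict as a JSON body
print(resp.json()["json"])  # httpbin echoes the parsed JSON body back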
7. How to get past the basic anti-crawler measures of major websites, taking the Maoyan box office page as an example:
Example
import requests  # import the requests library
url = 'http://piaofang.maoyan.com/dashboard'  # Maoyan box office address
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}  # set header information to disguise the request as a browser
resp = requests.get(url, headers=headers)  # send the request with the custom headers
print(resp.status_code)  # print the status code
print(resp.text)  # print the page information
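If you make several requests with the same disguised headers, a requests.Session saves you from repeating them and also reuses the underlying connection. A minimal sketch, assuming the same User-Agent as above:

import requests

session = requests.Session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
})  # headers set once apply to every request made through the session
resp = session.get('http://piaofang.maoyan.com/dashboard')
print(resp.status_code)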
8. Crawl an image from a web page and save it locally.
First create a directory named crawler on the E: drive so there is somewhere to save the file; you can of course choose a different directory and change the corresponding path in the code.
Example
import requests  # import the requests library
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}  # set header information to disguise the request as a browser
resp = requests.get('http://7n.yisu.com/statics/img/logo/indexlogo@2x.png', headers=headers)  # request the image
file = open("E:\\crawler\\test.png", "wb")  # open a file; "wb" opens it in binary format for writing only
file.write(resp.content)  # write the raw image bytes to the file
file.close()  # close the file
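For larger files it is safer to stream the download in chunks instead of loading the whole body into memory; the sketch below also uses a with block so the file is closed automatically (the 8 KB chunk size is an arbitrary choice):

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}
resp = requests.get('http://7n.yisu.com/statics/img/logo/indexlogo@2x.png', headers=headers, stream=True)  # defer reading the body
with open("E:\\crawler\\test.png", "wb") as file:
    for chunk in resp.iter_content(chunk_size=8192):  # read the body 8 KB at a time
        file.write(chunk)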
This is the end of this introduction to basic Python crawler examples. Thank you for reading.