What are some entry-level examples of Python crawlers? Many inexperienced readers are at a loss about this, so this article walks through a few simple, runnable examples; I hope they answer the question.
To help friends learning Python crawlers, I have prepared a few simple getting-started examples to share.
The main knowledge points are:
How the web interacts (HTTP requests and responses)
The get and post functions of the requests library
Common functions and properties of the response object
Opening and saving files in Python
Comments are given in the code, and every example can be run directly.
How to install the requests library (friends who already have Python installed can refer to this directly; if not, it is recommended to set up a Python environment first):
Windows users: open cmd and enter the command below. If the Python environment is installed under the C: drive and you are told that permissions are insufficient, simply run the cmd window as administrator.
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests
Linux users are similar (Ubuntu as an example): if permissions are insufficient, prefix the command with sudo.
sudo pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests
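To confirm the installation worked, a quick check (a minimal sketch; any Python interpreter will do) is to import the library and print its version:

# verify that requests is importable and see which version was installed
import requests
print(requests.__version__)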
1. Crawl the powerful Baidu homepage and print the page information

# the first crawler example: crawl the Baidu page
import requests  # import the crawler's library first, otherwise the crawler functions cannot be called
response = requests.get("http://www.baidu.com")  # generate a response object
response.encoding = response.apparent_encoding  # set the encoding format
print("status code: " + str(response.status_code))  # print the status code
print(response.text)  # output the crawled information

2. A get method example of the common methods (examples with parameters follow later)

# the second example: get method
import requests  # import the crawler's library first, otherwise the crawler functions cannot be called
response = requests.get("http://httpbin.org/get")  # get method
print(response.status_code)  # status code
print(response.text)

3. A post method example of the common methods (examples with parameters follow later)

# the third example: post method
import requests  # import the crawler's library first, otherwise the crawler functions cannot be called
response = requests.post("http://httpbin.org/post")  # post method access
print(response.status_code)  # status code
print(response.text)

4. A put method example

# the fourth example: put method
import requests  # import the crawler's library first, otherwise the crawler functions cannot be called
response = requests.put("http://httpbin.org/put")  # put method access
print(response.status_code)  # status code
print(response.text)
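requests exposes the remaining common HTTP verbs in the same style; a quick sketch (assuming httpbin.org is reachable, as in the examples above):

# delete and head follow the same pattern as get/post/put
import requests
print(requests.delete("http://httpbin.org/delete").status_code)  # delete method access
print(requests.head("http://httpbin.org/get").status_code)  # head returns headers only, no body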
5. An example of passing parameters with the get method (1)

If you need to pass multiple parameters, simply join them with the & symbol in the URL, as follows:

# the fifth example: passing parameters with get
import requests  # import the crawler's library first, otherwise the crawler functions cannot be called
response = requests.get("http://httpbin.org/get?name=hezhi&age=20")  # get with parameters in the URL
print(response.status_code)  # status code
print(response.text)

6. An example of passing parameters with the get method (2)

Multiple parameters can also be passed as a dictionary via params:

# the sixth example: passing parameters with get
import requests  # import the crawler's library first, otherwise the crawler functions cannot be called
data = {"name": "hezhi", "age": 20}
response = requests.get("http://httpbin.org/get", params=data)  # get with a params dictionary
print(response.status_code)  # status code
print(response.text)
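Since httpbin.org echoes the request back as JSON, the response object's json() method (one of the response properties mentioned in the knowledge points) can parse it directly; a small sketch:

# parse the JSON body returned by httpbin and pick out the echoed parameters
import requests
response = requests.get("http://httpbin.org/get", params={"name": "hezhi", "age": 20})
body = response.json()  # decode the JSON response into a Python dict
print(body["args"])  # httpbin echoes query parameters under the "args" key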
7. An example of passing parameters with the post method, very similar to the previous one

# the seventh example: post method
import requests  # import the crawler's library first, otherwise the crawler functions cannot be called
data = {"name": "hezhi", "age": 20}
response = requests.post("http://httpbin.org/post", params=data)  # post with parameters (params puts them in the URL query string)
print(response.status_code)  # status code
print(response.text)
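Note that params= appends the values to the URL even for post. To send them in the request body as form data, which is the more common pattern for post, requests accepts a data= argument; a sketch of that variant:

# post the same dictionary as form data in the request body instead of the URL
import requests
data = {"name": "hezhi", "age": 20}
response = requests.post("http://httpbin.org/post", data=data)  # form-encoded body
print(response.json()["form"])  # httpbin echoes form fields under the "form" key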
8. Bypassing a basic anti-crawling mechanism, taking Zhihu as an example

# the eighth example: setting header information
import requests  # import the crawler's library first, otherwise the crawler functions cannot be called
response = requests.get("http://www.zhihu.com")  # first visit to Zhihu, without setting header information
print("first attempt, no header information, status code: " + str(response.status_code))  # without headers the page cannot be crawled normally and the status code is not 200
# the difference below is that the User-Agent field is set, which allows normal crawling
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
}  # set header information to disguise the request as a browser
response = requests.get("http://www.zhihu.com", headers=headers)  # get method access, passing in the headers parameter
print(response.status_code)  # 200! access succeeded
print(response.text)
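When many pages need the same disguise, it can be tidier to attach the headers to a requests.Session once, so every request reuses them (and cookies persist between requests); a minimal sketch:

# reuse the same User-Agent across requests with a session
import requests
session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"})  # applied to every request
response = session.get("http://www.zhihu.com")  # the headers are sent automatically
print(response.status_code)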
print(response.text)

9. Crawl a page and save it locally

Because of the save path used below, first create a folder named crawler on drive D, then save the information there. Pay attention to the encoding settings when saving the file.

# crawl an html page and save it
import requests
url = "http://www.baidu.com"
response = requests.get(url)
response.encoding = "utf-8"  # set the encoding format for the received content
print("\ntype of the response object: " + str(type(response)))
print("\nstatus code: " + str(response.status_code))
print("\nheader information: " + str(response.headers))
print("\nresponse content:")
print(response.text)
# save the file
file = open("D:\\crawler\\baidu.html", "w", encoding="utf-8")  # open a file with w: if the file does not exist, it is created; wb is not used here because the content does not have to be saved as binary
file.write(response.text)
file.close()
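As a side note, the with statement closes the file automatically even if an error occurs mid-write, so the save step above can also be written as the following sketch (same D:\crawler path as above):

# equivalent save using a context manager, which closes the file automatically
import requests
response = requests.get("http://www.baidu.com")
response.encoding = "utf-8"
with open("D:\\crawler\\baidu.html", "w", encoding="utf-8") as file:  # file is closed on exiting the block
    file.write(response.text)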
10. Crawl a picture and save it locally

# save a Baidu logo image to the local disk
import requests  # import the crawler's library first, otherwise the crawler functions cannot be called
response = requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif")  # get method response for the picture
file = open("D:\\crawler\\baidu_logo.gif", "wb")  # open a file; wb means open the file in binary format for writing only
file.write(response.content)  # write the binary content to the file
file.close()  # close the file; after running, check your directory to see whether the image was saved successfully

For larger files it is safer not to load the whole body into memory at once; a streaming variant is sketched below.
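A minimal sketch, assuming the same URL and save path as above; stream=True defers the download so iter_content can read it in chunks:

# stream the download in chunks instead of reading it all at once
import requests
response = requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif", stream=True)  # body not downloaded yet
with open("D:\\crawler\\baidu_logo.gif", "wb") as file:
    for chunk in response.iter_content(chunk_size=8192):  # read 8 KB at a time
        file.write(chunk)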
After reading the above, have you mastered the entry-level methods of Python crawlers? Thank you for reading!