

What are the examples of getting started with Python crawlers


What are some getting-started examples for Python crawlers? Many newcomers are at a loss here, so to help friends learn Python crawling, this article prepares a few simple entry-level examples to share. I hope that by the end you will be able to write a basic crawler yourself.

The main points of knowledge are:

How web pages interact (requests and responses)

Use of the get and post functions in the requests library

Useful functions and properties of the response object

Opening and saving files in Python

Comments are given in the code, and every example can be run directly.

How to install the requests library (friends who already have Python installed can follow along directly; if not, it is recommended to set up a Python environment first)

Windows users: open cmd and enter the command below. If the Python environment is installed under the C: drive and you are told the permissions are not enough, just run the cmd window as an administrator and try again.

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests

Linux users are similar (taking Ubuntu as an example): if the permissions are insufficient, add sudo before the command.

sudo pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests
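If the install succeeded, a quick way to check is to import the library and print its version. A minimal sketch (the exact version string will vary with your install):

import requests  # if this import fails, the install did not succeed
print(requests.__version__)  # prints the installed version, e.g. 2.x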

1. Crawl the Baidu page and print the page information

# the first crawler example: crawl the Baidu page
import requests  # import the crawler library first; otherwise the crawler functions cannot be called
response = requests.get("http://www.baidu.com")  # generate a response object
response.encoding = response.apparent_encoding  # set the encoding format
print("Status code: " + str(response.status_code))  # print the status code

print(response.text)  # output the crawled page content
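A side note on the encoding line above: response.encoding is what requests inferred from the HTTP headers, while response.apparent_encoding is guessed from the response body itself, which is why copying the latter into the former fixes garbled text. A small sketch to compare the two (the exact values depend on the server):

import requests

response = requests.get("http://www.baidu.com")
print(response.encoding)           # encoding inferred from the HTTP headers
print(response.apparent_encoding)  # encoding guessed from the body content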

2. A get method example from the commonly used methods (parameter-passing examples follow below)

# the second example: the get method
import requests  # import the crawler library first; otherwise the crawler functions cannot be called
response = requests.get("http://httpbin.org/get")  # get method access
print(response.status_code)  # status code

print(response.text)
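Since httpbin.org answers with JSON, the response object's json() method can parse the body into a dictionary instead of reading the raw response.text. A minimal sketch (assuming httpbin.org is reachable):

import requests

response = requests.get("http://httpbin.org/get")
data = response.json()  # parse the JSON body into a dict
print(data["url"])      # httpbin echoes back the requested URL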

3. A post method example from the commonly used methods (parameter-passing examples follow below)

# the third example: the post method
import requests  # import the crawler library first; otherwise the crawler functions cannot be called
response = requests.post("http://httpbin.org/post")  # post method access
print(response.status_code)  # status code

print(response.text)

4. A put method example

# the fourth example: the put method
import requests  # import the crawler library first; otherwise the crawler functions cannot be called
response = requests.put("http://httpbin.org/put")  # put method access
print(response.status_code)  # status code

print(response.text)
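Besides get, post, and put, the requests library exposes the other common HTTP verbs in the same style, such as delete and head. A small sketch against httpbin:

import requests

response = requests.delete("http://httpbin.org/delete")  # delete method access
print(response.status_code)  # status code
response = requests.head("http://httpbin.org/get")  # head fetches only the headers, no body
print(response.headers)  # header information is still available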

5. Passing parameters with the get method (1)

If you need to pass multiple parameters, just join them with the & symbol, like this:

# the fifth example: get with parameters in the URL
import requests  # import the crawler library first; otherwise the crawler functions cannot be called
response = requests.get("http://httpbin.org/get?name=hezhi&age=20")  # get with parameters
print(response.status_code)  # status code

print(response.text)

6. Passing parameters with the get method (2)

Multiple parameters can also be passed as a dictionary through the params argument.

# the sixth example: get with a parameter dictionary
import requests  # import the crawler library first; otherwise the crawler functions cannot be called
data = {"name": "hezhi", "age": 20}
response = requests.get("http://httpbin.org/get", params=data)  # get, passing the parameters
print(response.status_code)  # status code

print(response.text)
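One way to confirm that the params dictionary builds the same request as the manual & string from example 5 is the response object's url attribute, which shows the final URL that requests assembled. A minimal sketch:

import requests

data = {"name": "hezhi", "age": 20}
response = requests.get("http://httpbin.org/get", params=data)
print(response.url)  # prints http://httpbin.org/get?name=hezhi&age=20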

7. Passing parameters with the post method, which is very similar to the previous example

# the seventh example: post with a parameter dictionary
import requests  # import the crawler library first; otherwise the crawler functions cannot be called
data = {"name": "hezhi", "age": 20}
response = requests.post("http://httpbin.org/post", params=data)  # post, passing the parameters
print(response.status_code)  # status code

print(response.text)
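Note that passing params to post() still puts the values into the URL query string, exactly as with get. To send them in the request body as form data, which is the more typical POST usage, requests accepts a data argument instead. A variant sketch:

import requests

data = {"name": "hezhi", "age": 20}
response = requests.post("http://httpbin.org/post", data=data)  # values go into the form body
print(response.status_code)  # status code
print(response.text)  # httpbin echoes the submitted fields under "form"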

8. Bypassing a basic anti-crawling mechanism, taking Zhihu as an example

# the eighth example: setting header information
import requests  # import the crawler library first; otherwise the crawler functions cannot be called
response = requests.get("http://www.zhihu.com")  # visit Zhihu for the first time, without setting header information
print("First attempt, no header information, status code: " + str(response.status_code))  # without headers the page cannot be crawled normally; the status code is not 200
# the difference below is what makes the crawl work: setting the User-Agent field
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
}  # set header information to disguise the request as a browser
response = requests.get("http://www.zhihu.com", headers=headers)  # get method access, passing in the headers parameter
print(response.status_code)  # 200, the success status code

print(response.text)
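If several requests need the same disguise, a requests.Session object can carry the headers for every call instead of repeating the headers argument each time. A small sketch under the same User-Agent assumption as above:

import requests

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
})  # every request made through this session now sends the header
response = session.get("http://www.zhihu.com")
print(response.status_code)  # should again be 200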

9. Crawl a page and save it locally

The save path below assumes a folder called crawler on the D: drive, so create it first. Pay attention to the encoding settings when saving the file.

# crawl an html page and save it
import requests
url = "http://www.baidu.com"
response = requests.get(url)
response.encoding = "utf-8"  # set the encoding format for the received content
print("\nThe type of the response: " + str(type(response)))
print("\nThe status code is: " + str(response.status_code))
print("\nHeader information: " + str(response.headers))
print("\nResponse content:")
print(response.text)
# save the file
file = open("D:\\crawler\\baidu.html", "w", encoding="utf-8")  # open a file; with "w", a new file is created if it does not exist. "wb" is not used here because the content does not have to be saved as binary
file.write(response.text)

file.close()
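An alternative to the explicit open/close pair is Python's with statement, which closes the file automatically even if an error occurs partway through. A minimal sketch, assuming the same D:\crawler folder exists:

import requests

response = requests.get("http://www.baidu.com")
response.encoding = "utf-8"  # set the receiving encoding format
with open("D:\\crawler\\baidu.html", "w", encoding="utf-8") as file:
    file.write(response.text)  # the file is closed automatically when the block ends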

10. Crawl a picture and save it locally

# save the Baidu logo picture locally
import requests  # import the crawler library first; otherwise the crawler functions cannot be called
response = requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif")  # get method access to the picture
file = open("D:\\crawler\\baidu_logo.gif", "wb")  # open a file; "wb" means open in binary format for writing only
file.write(response.content)  # write the binary content to the file

file.close()  # close the file; after running, check your folder to confirm the picture was saved successfully
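For files larger than this small logo, requests can also stream the download in chunks instead of holding the whole body in memory at once. A sketch using stream=True and iter_content (same hypothetical D:\crawler folder):

import requests

response = requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif", stream=True)
with open("D:\\crawler\\baidu_logo.gif", "wb") as file:
    for chunk in response.iter_content(chunk_size=1024):  # read the body 1 KB at a time
        file.write(chunk)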

After reading the above, have you mastered the methods for getting started with Python crawlers? If you want to learn more skills or dig deeper into the topic, you are welcome to follow the industry information channel. Thank you for reading!
