

Python Crawler Basics: A Case Study

2025-01-19 Update From: SLTechnology News&Howtos

This article introduces the basics of Python crawlers through a series of hands-on cases. Each example shows the actual steps, and they are all simple, quick, and practical to run; I hope this article helps you get started.

Before following along, you need to install the requests library, and before that you need a working Python environment. If you have not set one up yet, see this installation guide: Python's latest 3.9.0 compiler installation tutorial.

Once Python is installed, Windows users can open a cmd window and run the following command (the steps on other systems are much the same):

pip install requests

Linux users:

sudo pip install requests
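
To confirm the installation worked, you can print the library's version from Python (a quick sanity check, not part of the original steps):

# Verify that requests is importable and print its version
import requests

print(requests.__version__)  # e.g. 2.25.1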

Next come the examples; the more you practice along with them, the better!

1. Crawl the Baidu home page and get the page information

Example

# Crawl the Baidu home page
import requests  # import the requests library

resp = requests.get('http://www.baidu.com')  # send the request; returns a response object
resp.encoding = 'utf-8'  # set the encoding to utf-8
print(resp.status_code)  # print the status code
print(resp.text)  # print the crawled page content
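
If you want the script to fail loudly on HTTP errors instead of printing an error page, a slightly more defensive variant (a sketch, not from the original example) raises on bad status codes and lets requests guess the encoding:

# Defensive variant: raise on HTTP errors, detect the encoding automatically
import requests

resp = requests.get('http://www.baidu.com')
resp.raise_for_status()                 # raises requests.HTTPError for 4xx/5xx responses
resp.encoding = resp.apparent_encoding  # guess the encoding from the response body
print(resp.text)                        # print the decoded page content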

2. An example of the get method in the requests library

Before that, a quick introduction to a handy site: httpbin.org. It lets you inspect every part of an HTTP request and response, such as cookies, IP, headers, and login authentication, and it supports GET, POST, and the other HTTP methods, which makes it very useful for web development and testing. It is written in Python with Flask and is an open-source project.

Official website: http://httpbin.org/

Open source address: https://github.com/Runscope/httpbin

Example

# Example of the get method
import requests  # import the requests library

resp = requests.get("http://httpbin.org/get")  # send a GET request
print(resp.status_code)  # print the status code
print(resp.text)  # print the response body
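
Because httpbin.org replies with JSON, you can also parse the body directly instead of printing raw text; a minimal sketch:

# Parse the JSON body that httpbin returns
import requests

resp = requests.get("http://httpbin.org/get")
data = resp.json()      # parse the JSON body into a Python dict
print(data["headers"])  # the request headers httpbin echoed back
print(data["origin"])   # the client IP address as seen by the server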

3. An example of the post method in the requests library

Example

# Example of the post method
import requests  # import the requests library

resp = requests.post("http://httpbin.org/post")  # send a POST request
print(resp.status_code)  # print the status code
print(resp.text)  # print the response body

4. An example of the put method in the requests library

Example

# Example of the put method
import requests  # import the requests library

resp = requests.put("http://httpbin.org/put")  # send a PUT request
print(resp.status_code)  # print the status code
print(resp.text)  # print the response body
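
The remaining HTTP verbs follow exactly the same pattern, and httpbin exposes matching test endpoints for them; a quick sketch in the same style as the examples above:

# The other HTTP verbs work the same way
import requests

print(requests.delete("http://httpbin.org/delete").status_code)  # DELETE request
print(requests.patch("http://httpbin.org/patch").status_code)    # PATCH request
print(requests.head("http://httpbin.org/get").status_code)       # HEAD request (no body)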

5. Passing parameters with the get method in the requests library

There are two ways to pass parameters using the get method:

Append the parameters to the URL itself, joining each name and value with "=" and separating pairs with "&".

Pass a dictionary of parameters via the params argument. Examples of both follow:

Example

# get parameter passing, method 1
import requests  # import the requests library

resp = requests.get("http://httpbin.org/get?name=w3cschool&age=100")  # parameters appended to the URL
print(resp.status_code)  # print the status code
print(resp.text)  # print the response body

Example

# get parameter passing, method 2
import requests  # import the requests library

data = {
    "name": "w3cschool",
    "age": 100
}  # store the parameters in a dictionary
resp = requests.get("http://httpbin.org/get", params=data)  # pass the dictionary via params
print(resp.status_code)  # print the status code
print(resp.text)  # print the response body
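
Both forms produce the same request: with the params dictionary, requests builds the query string for you, and resp.url shows the final URL it actually requested. A small sketch to confirm this:

# Inspect the URL that requests built from the params dictionary
import requests

data = {"name": "w3cschool", "age": 100}
resp = requests.get("http://httpbin.org/get", params=data)
print(resp.url)  # http://httpbin.org/get?name=w3cschool&age=100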

6. Passing parameters with the post method in the requests library

Passing parameters with the post method is similar to the second get approach: build a dictionary and hand it to requests.post, here via the data argument so the values travel in the request body as form data. An example follows:

Example

# post parameter passing example
import requests  # import the requests library

data = {
    "name": "w3cschool",
    "age": 100
}  # store the parameters in a dictionary
resp = requests.post("http://httpbin.org/post", data=data)  # send the dictionary as form data in the body
print(resp.status_code)  # print the status code
print(resp.text)  # print the response body
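
requests can also send the parameters as a JSON body instead of form data: the json keyword serializes the dictionary and sets the Content-Type header for you, which many modern APIs expect. A sketch (not part of the original example):

# Send a JSON body instead of form data
import requests

payload = {"name": "w3cschool", "age": 100}
resp = requests.post("http://httpbin.org/post", json=payload)
print(resp.json()["json"])  # httpbin echoes the parsed JSON body back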

7. Bypassing the basic anti-crawler measures of major websites, using the Maoyan box office page as an example:

Example

import requests  # import the requests library

url = 'http://piaofang.maoyan.com/dashboard'  # Maoyan box office dashboard
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}  # set a User-Agent header to masquerade as a normal browser
resp = requests.get(url, headers=headers)  # send the request with the browser-like headers
print(resp.status_code)  # print the status code
print(resp.text)  # print the page content
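
When crawling a real site, it is also courteous and more robust to set a request timeout and pause between repeated requests. A sketch of both (the timeout and delay values here are illustrative choices, not from the original):

# Sketch: add a timeout and a polite pause between repeated requests
import time

import requests

url = 'http://piaofang.maoyan.com/dashboard'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}
for _ in range(3):
    resp = requests.get(url, headers=headers, timeout=10)  # give up if the site hangs for 10s
    print(resp.status_code)
    time.sleep(2)  # pause so we do not hammer the server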

8. Crawl an image from a web page and save it locally

Before running this, create a crawler directory on the E: drive so the file has somewhere to go; you can pick any directory you like, as long as you update the path in the code to match.

Example

import requests  # import the requests library

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}  # set a User-Agent header to masquerade as a normal browser
resp = requests.get('http://7n.yisu.com/statics/img/logo/indexlogo@2x.png', headers=headers)  # request the image
file = open("E:\\crawler\\test.png", "wb")  # "wb" opens the file for binary writing
file.write(resp.content)  # write the raw image bytes to the file
file.close()  # close the file
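
A more idiomatic way to write the file is a with block, which closes the file automatically even if an exception occurs; a sketch using the same URL and path:

# Sketch: save the image using a context manager instead of open/close
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}
resp = requests.get('http://7n.yisu.com/statics/img/logo/indexlogo@2x.png', headers=headers)
with open("E:\\crawler\\test.png", "wb") as f:  # "wb": write in binary mode
    f.write(resp.content)  # the file is closed automatically when the block exits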

That concludes this basic introduction to Python crawlers. Thank you for reading; if you want to learn more about the industry, follow the industry news channel, where new material is posted every day.
