

What are the basic knowledge points of Python3 web crawler

2025-01-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article introduces the basic knowledge points of a Python3 web crawler. Many people run into difficulties with this in practice, so let the editor walk you through how to handle these situations. I hope you read it carefully and get something out of it!

A brief introduction to web crawlers

A web crawler, also known as a web spider (Web Spider), crawls web content according to the web address (URL), that is, the link to a site that we enter in the browser's address bar.
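A URL is made up of parts such as a scheme, a host, and a path, and Python's standard library can split one apart. A minimal sketch, using an illustrative address:

```python
from urllib.parse import urlparse

# Split an illustrative URL into its components.
parts = urlparse('https://www.example.com/index.html')
print(parts.scheme)  # 'https'
print(parts.netloc)  # 'www.example.com'
print(parts.path)    # '/index.html'
```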

1. Inspect element

Enter the URL in the browser's address bar, then right-click on the web page and choose the inspect option. (Different browsers use different names: Chrome calls it "Inspect" and Firefox calls it "Inspect Element", but the function is the same.)

We can see a large block of code on the right, which is called HTML. What is HTML? Take an easy-to-understand analogy: our genes determine our original appearance, and the HTML returned by the server determines the original appearance of a website.

Why the original appearance? Because people can have plastic surgery! Can a website be given "plastic surgery" too? Sure. The original article showed a screenshot here of an account balance edited to display a huge number. Could I really have that much money? Obviously not. So how do you give a website a "facelift"? By modifying the HTML that the server returned. Each of us can be a "plastic surgeon" and change the page: right-click an element on the page and choose inspect, and the browser will jump to the corresponding place in the HTML, where we can edit the HTML locally.

Another small example: we all know that when we use the browser's "remember password" function, the password shows up as a row of black dots that cannot be read. Can we make the password visible? Yes, just give the page a "minor operation"! Take Taobao as an example: right-click on the password input box and click inspect.

As you can see, the browser automatically navigates to the corresponding HTML for us. Change the value of the input element's type attribute from password to text (edit it directly in the code on the right):

The password the browser remembered is now displayed in plain text.

What is the point of all this? The browser, as the client, receives information from the server, parses it, and shows it to us. We can modify the HTML locally to give the page a "facelift", but our changes are not sent back to the server, and the HTML stored on the server is unchanged. Refresh the page and it returns to its original appearance. It is like plastic surgery: we can change something superficial, but we cannot change our genes.

2. Simple examples

The first step of a web crawler is to obtain the HTML of a web page from its URL. In Python3, you can use urllib.request or requests to crawl web pages.
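As a point of comparison with the requests example below, here is a minimal urllib.request sketch. To stay self-contained it only builds the request object; calling urllib.request.urlopen(req) would actually fetch the page. The URL and User-Agent value are illustrative:

```python
import urllib.request

# Build a GET request with a custom User-Agent header (not yet sent).
req = urllib.request.Request(
    'https://www.example.com/',
    headers={'User-Agent': 'Mozilla/5.0'},
)
print(req.get_method())  # 'GET'
print(req.full_url)      # 'https://www.example.com/'
# html = urllib.request.urlopen(req).read().decode('utf-8')  # would fetch the page
```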

The urllib library is built into Python, so no additional installation is required; as long as Python is installed, you can use it. The requests library is a third-party library that we need to install ourselves.

The requests library is powerful and easy to use, so this article uses requests to obtain the HTML of web pages. GitHub address of the requests library:

(1) requests installation


In cmd, install requests using the following command:

pip install requests

Or:

easy_install requests

(2) simple examples

The basic methods of the requests library are requests.request(), requests.get(), requests.head(), requests.post(), requests.put(), requests.patch(), and requests.delete(), one for each HTTP verb (the original article showed them in a table here).

First, let's look at the requests.get() method, which sends a GET request to the server. It doesn't matter if you don't know what a GET request is; we can understand it this way: get means to fetch or grab, so the requests.get() method fetches, or grabs, data from the server. Let's look at an example to deepen our understanding:

# -*- coding:UTF-8 -*-
import requests

if __name__ == '__main__':
    target = ''  # the crawled URL was elided in the original
    req = requests.get(url=target)
    print(req.text)

The one parameter that the requests.get() method must be given is url, because the GET request has to be told who its target is, that is, whose information we want to get.
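requests.get() can also take a params dictionary, which it encodes and appends to the url as a query string after a "?". A minimal sketch of what that encoding looks like, using only the standard library (the keys and values are illustrative):

```python
from urllib.parse import urlencode

# Encode GET parameters the way they appear after '?' in a URL.
query = urlencode({'q': 'python crawler', 'page': 1})
print(query)  # 'q=python+crawler&page=1'
```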

This concludes "what are the basic knowledge points of Python3 web crawler". Thank you for reading. If you want to learn more about the industry, you can follow this site, where the editor will publish more high-quality practical articles for you!




