Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How does Python climb the Shangdao Network Project

2025-04-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

The main content of this article is to explain "how to climb the Python project", interested friends may wish to have a look. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn "Python how to climb the Shangdao project".

I. achieving the goal

Get the corresponding company name and save the document.

II. Project preparation

Software: PyCharm

Required library: requests,fake_useragent,time

III. Project analysis

How to access the web page?

Http://www.daogame.cn/qudao-p-2.html?s=/qudao-p-1.htmlhttp://www.daogame.cn/qudao-p-2.html?s=/qudao-p-2.htmlhttp://www.daogame.cn/qudao-p-2.html?s=/qudao-p-3.htmlhttp://www.daogame.cn/qudao-p-2.html?s=/qudao-p-4.html

When you click on the next page, add 1 per page to p-{}. Html, replace the changed variable with {}, and then use the for loop to traverse the URL to achieve multiple URL requests.

IV. Project realization

1. Define a class class to inherit object, define the init method to inherit self, and the main function main to inherit self. Import the required library and request address.

Import requestsfrom lxml import etreefrom fake_useragent import UserAgentimport timeclass Shangdao (object): def _ _ init__ (self): self.url = "http://www.daogame.cn/qudao-p-2.html?s=/qudao-p-{}.html" # website def main (self): passif _ name__ = ='_ main__': Siper = Shangdao () Siper.main ()

two。 Randomly generate UserAgent to prevent anti-crawling.

For i in range (1,50): self.headers = {'User-Agent': ua.random,}

3. Send a request to get a response, and the page is called back to facilitate the next request.

Def get_page (self, url): res = requests.get (url=url, headers=self.headers) html = res.content.decode ("utf-8") return html

4. Get the company name, for traversal.

Def page_page (self, html): parse_html = etree.HTML (html) one = parse_html.xpath ('/ / h3/a/text ()') for i in one: print (I)

5. Write to the document.

F = open ('company .doc', 'asides, encoding='utf-8') # Open the file f.write (str (I)) in' w' mode

6. Call the method to realize the function.

Def main (self): stat = int (input ("input start (2):") end = int (input ("input junction:") for page in range (stat) End + 1): url = self.url.format (page) print (url) html = self.get_page (url) self.page_page (html) print ("=% s page crawled successfully! = "% page)

Project Optimization:

Set the time delay.

Time.sleep (1. 4) 5. Effect demonstration

Click the small green triangle to run the input start page and end the page (starting from page 0).

The name of the channel company, and the result shows the console.

Save the document.

At this point, I believe you have a deeper understanding of "Python how to climb the Shangdao Network Project". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report