2025-04-02 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to scrape the Shangdao project with Python. The method introduced here is simple, fast, and practical, so interested readers may wish to follow along and learn how to scrape the Shangdao project step by step.
I. Goal
Get the channel company names and save them to a document.
II. Project preparation
Software: PyCharm
Required libraries: requests, fake_useragent, lxml, time
III. Project analysis
How to access the web page?
http://www.daogame.cn/qudao-p-2.html?s=/qudao-p-1.html
http://www.daogame.cn/qudao-p-2.html?s=/qudao-p-2.html
http://www.daogame.cn/qudao-p-2.html?s=/qudao-p-3.html
http://www.daogame.cn/qudao-p-2.html?s=/qudao-p-4.html
When you click "next page", the number in p-{}.html increases by 1 per page. Replace the changing number with {}, then use a for loop to build each URL and request multiple pages.
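A minimal stdlib-only sketch of building the paginated URLs from the pattern quoted above; the page range 1–4 is illustrative:

```python
# Build the paginated URLs by substituting the page number into the pattern.
base = "http://www.daogame.cn/qudao-p-2.html?s=/qudao-p-{}.html"
urls = [base.format(page) for page in range(1, 5)]
for u in urls:
    print(u)
```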
IV. Project realization
1. Define a class that inherits from object, give it an __init__ method and a main method, import the required libraries, and set the request URL.
import requests
from lxml import etree
from fake_useragent import UserAgent
import time


class Shangdao(object):
    def __init__(self):
        self.url = "http://www.daogame.cn/qudao-p-2.html?s=/qudao-p-{}.html"  # target URL

    def main(self):
        pass


if __name__ == '__main__':
    spider = Shangdao()
    spider.main()
2. Randomly generate a User-Agent to avoid being blocked by anti-crawling measures.
ua = UserAgent()
self.headers = {'User-Agent': ua.random}
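fake_useragent is a third-party package; as a dependency-free stand-in, the same idea can be sketched with random.choice over a small hand-written pool (the UA strings below are illustrative, not a current or exhaustive list):

```python
import random

# Stand-in for fake_useragent: pick a random User-Agent string from a
# hand-written pool. The strings are illustrative examples only.
AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]
headers = {"User-Agent": random.choice(AGENTS)}
print(headers["User-Agent"])
```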
3. Send the request, get the response, and return the page so it can be used by the next step.
def get_page(self, url):
    res = requests.get(url=url, headers=self.headers)
    html = res.content.decode("utf-8")
    return html
4. Parse the page, extract the company names with XPath, and print them in a for loop.
def page_page(self, html):
    parse_html = etree.HTML(html)
    one = parse_html.xpath('//h3/a/text()')
    for i in one:
        print(i)
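The real code uses lxml's etree.HTML with the //h3/a/text() XPath. As a dependency-free illustration of the same extraction idea, here is a sketch using the stdlib ElementTree on a toy, well-formed fragment (the tag layout and company names are made up):

```python
import xml.etree.ElementTree as ET

# Toy fragment mimicking the page structure: names sit in <h3><a> tags.
html = (
    "<div>"
    "<h3><a>Alpha Games</a></h3>"
    "<h3><a>Beta Channel</a></h3>"
    "</div>"
)
root = ET.fromstring(html)
names = [a.text for a in root.findall(".//h3/a")]
print(names)
```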
5. Write to the document.
f = open('company.doc', 'a', encoding='utf-8')  # open the file in append mode
f.write(str(i))
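A slightly safer variant of the write step uses a context manager so the file is closed automatically; 'company.doc' is the article's filename, and the names list here is illustrative stand-in data:

```python
# Append each extracted name on its own line; `with` closes the file
# automatically even if an error occurs mid-write.
names = ["Alpha Games", "Beta Channel"]
with open("company.doc", "a", encoding="utf-8") as f:
    for name in names:
        f.write(name + "\n")
```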
6. Call the methods to tie the steps together.
def main(self):
    stat = int(input("input start page (2): "))
    end = int(input("input end page: "))
    for page in range(stat, end + 1):
        url = self.url.format(page)
        print(url)
        html = self.get_page(url)
        self.page_page(html)
        print("==== page %s crawled successfully! ====" % page)
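Because main() reads its page range from input() and hits the network, it is awkward to demo offline. Here is a dry-run sketch of the same control flow with the fetch stubbed out; the class name ShangdaoDemo, the parameters, and the returned list are my additions:

```python
class ShangdaoDemo:
    """Dry-run of the main() loop: the network call is stubbed out so the
    page-range control flow can be run without internet access."""

    url = "http://www.daogame.cn/qudao-p-2.html?s=/qudao-p-{}.html"

    def get_page(self, url):
        # Stub: the real method would call requests.get(url, headers=...)
        return "<html></html>"

    def main(self, start, end):
        visited = []
        for page in range(start, end + 1):
            u = self.url.format(page)
            self.get_page(u)
            visited.append(u)
        return visited

pages = ShangdaoDemo().main(2, 4)
print(pages)
```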
Project Optimization:
Set a time delay between requests.
time.sleep(1.4)

V. Effect demonstration
Click the small green triangle to run, then enter the start page and end page (pages start from page 0).
The channel company names are printed to the console.
The names are also saved to the document.
At this point, you should have a deeper understanding of how to scrape the Shangdao network project with Python. You might as well try it in practice; follow us for more related content and keep learning!