How to use Python to crawl CSDN popular article URLs and save them to Redis

2025-03-01 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article walks through how to use Python to crawl the URLs of popular CSDN articles and save them to Redis, demonstrated with a working example. The approach is simple, fast, and practical, and I hope it helps you solve the same problem.

1. Configure webdriver

Download the Chrome browser driver (chromedriver) and configure it.

import time
import random
from PIL import Image
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

if __name__ == '__main__':
    options = webdriver.ChromeOptions()
    # Path to the Chrome binary; adjust to your own install location
    options.binary_location = r'C:\Users\hhh\AppData\Local\Google\Chrome\Application\chrome.exe'
    # driver = webdriver.Chrome(executable_path=r'D:\360Chrome\chromedriver\chromedriver.exe')
    driver = webdriver.Chrome(options=options)
    # Take the Java module as an example
    driver.get('https://www.csdn.net/nav/java')
    # Scroll to the bottom repeatedly so more entries are lazy-loaded
    for i in range(1, 20):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(2)

2. Get the URLs

from bs4 import BeautifulSoup
from lxml import etree

html = etree.HTML(driver.page_source)
# soup = BeautifulSoup(driver.page_source, 'lxml')
# soup_herf = soup.select("#feedlist_id > li:nth-child(1) > div > div > h2 > a")
# soup_herf
title = html.xpath('//*[@id="feedlist_id"]/li/div/div/h2/a/@href')

As you can see, a single pass collects a large number of URLs, and it runs very quickly.
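The XPath extraction step can be checked in isolation against a small inline HTML sample. The markup below is illustrative only, mimicking CSDN's old `feedlist_id` feed structure rather than real CSDN output:

```python
# Minimal sketch of the XPath extraction step, run against a hand-written
# HTML sample that mimics CSDN's old feed structure (illustrative markup,
# not real CSDN output).
from lxml import etree

sample = """
<ul id="feedlist_id">
  <li><div><div><h2><a href="https://blog.csdn.net/a/1">Post 1</a></h2></div></div></li>
  <li><div><div><h2><a href="https://blog.csdn.net/a/2">Post 2</a></h2></div></div></li>
</ul>
"""

html = etree.HTML(sample)
# Same expression the article applies to driver.page_source
title = html.xpath('//*[@id="feedlist_id"]/li/div/div/h2/a/@href')
print(title)  # ['https://blog.csdn.net/a/1', 'https://blog.csdn.net/a/2']
```

Testing the expression this way avoids launching a browser while you iterate on the XPath.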

3. Write to Redis

After importing the redis package, configure the Redis host, port, and database, then write the URLs with the rpush function.
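One detail of the configuration worth noting is `decode_responses=True`: by default redis-py returns raw bytes, and with the flag set it decodes values to str for you. A quick illustration of the difference, using a sample URL (no Redis server needed):

```python
# Why decode_responses=True matters: redis-py returns raw bytes by default,
# so values read back would need an explicit .decode(). With
# decode_responses=True the client decodes them to str automatically.
raw = b"https://blog.csdn.net/a/1"   # what rpop would return without the flag
decoded = raw.decode("utf-8")        # what rpop returns with the flag set

print(type(raw).__name__)      # bytes
print(type(decoded).__name__)  # str
print(decoded)
```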

Start the Redis server.

import redis

r_link = redis.Redis(port=6379, host='localhost', decode_responses=True, db=1)

for u in title:
    print("ready to write {}".format(u))
    r_link.rpush("csdn_url", u)
    print("{} write succeeded!".format(u))

print('=' * 30, '\n', "total url: {}".format(len(title)), '\n', '=' * 30)

And with that, the job is done!

As you can see in Redis Desktop Manager, crawling and writing are very fast.

To consume the URLs later, simply pop them off the list with rpop.
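Since rpush appends on the right and rpop also removes from the right, the list behaves as a stack when produced and consumed this way. A pure-Python model of these semantics, requiring no Redis server:

```python
# Pure-Python model of the rpush/rpop semantics used in this article
# (no Redis server needed): rpush appends on the right end of the list,
# rpop removes from the right end, so consumption is last-in, first-out.
queue = []

def rpush(lst, value):
    """Models r_link.rpush("csdn_url", u): append on the right."""
    lst.append(value)

def rpop(lst):
    """Models r_link.rpop("csdn_url"): pop from the right, None if empty."""
    return lst.pop() if lst else None

for u in ["url1", "url2", "url3"]:
    rpush(queue, u)

popped = []
one = rpop(queue)
while one is not None:
    popped.append(one)
    one = rpop(queue)

print(popped)  # ['url3', 'url2', 'url1'] — right-end pop reverses order
```

If first-in, first-out order matters for your crawler, popping from the opposite end with lpop instead gives queue behavior.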

one_url = r_link.rpop("csdn_url")
while one_url:
    print("{} is popped!".format(one_url))
    one_url = r_link.rpop("csdn_url")

This is the end of the content on "how to use Python to crawl CSDN popular article URLs and save them to Redis". Thank you for reading.
