This article explains how to use Python to crawl the URLs of CSDN's popular posts and save them in Redis, walking through a working example. The method is simple, fast, and practical, and the steps below should be enough to solve the problem on your own.
1. Configure webdriver
Download the Chrome driver (chromedriver) that matches your browser version and configure it:
import time
import random
from PIL import Image
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

if __name__ == '__main__':
    options = webdriver.ChromeOptions()
    # Point Selenium at the local Chrome binary
    options.binary_location = r'C:\Users\hhh\AppData\Local\Google\Chrome\Application\chrome.exe'
    # driver = webdriver.Chrome(executable_path=r'D:\360Chrome\chromedriver\chromedriver.exe')
    driver = webdriver.Chrome(options=options)
    # Take the Java module as an example
    driver.get('https://www.csdn.net/nav/java')
    # Scroll to the bottom repeatedly so more posts get lazy-loaded
    for i in range(1, 20):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(2)

2. Get URL

from bs4 import BeautifulSoup
from lxml import etree

html = etree.HTML(driver.page_source)
# soup = BeautifulSoup(html, 'lxml')
# soup_herf = soup.find_all("#feedlist_id > li:nth-child(1) > div > div > h2 > a")
# soup_herf
title = html.xpath('//*[@id="feedlist_id"]/li/div/div/h2/a/@href')
As you can see, it collects a large batch of URLs in one pass, and it is very fast.
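Before moving on, it is worth a quick sanity check that the XPath actually matched something. The sketch below is minimal and simply reuses the title list and driver from the code above; nothing new is assumed.

# Sanity check: inspect a few of the scraped links (reuses `title` and `driver`)
print("collected {} links".format(len(title)))
for link in title[:5]:
    print(link)
driver.quit()  # close the browser once the page source has been parsed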
3. Write to Redis
After importing the redis package, configure the Redis host, port, and database, and write the URLs with the rpush function.
Start the Redis server first.
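If you want to confirm the server is reachable before writing anything, a ping is enough. This is a minimal sketch; the host, port, and db are assumptions that simply mirror the connection code below.

import redis

# Assumes a local Redis server on the default port; db=1 mirrors the code below
check = redis.Redis(host='localhost', port=6379, db=1, decode_responses=True)
print(check.ping())  # prints True when the server answers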
import redis

r_link = redis.Redis(host='localhost', port=6379, decode_responses=True, db=1)
for u in title:
    print("ready to write {}".format(u))
    r_link.rpush("csdn_url", u)
    print("{} write succeeded!".format(u))
print('=' * 30, '\n', "total url: {}".format(len(title)), '\n', '=' * 30)
And with that, the job is done!
As you can see in Redis Desktop Manager, crawling and writing are very fast.
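You can also verify the result from Python instead of the GUI. A minimal sketch, reusing the r_link connection and the csdn_url key from above:

# llen returns how many entries the list holds; lrange peeks without removing them
print("csdn_url holds {} entries".format(r_link.llen("csdn_url")))
print(r_link.lrange("csdn_url", 0, 4))  # the first five entries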
To consume the URLs, simply pop them with rpop. Note that since they were pushed with rpush, rpop takes them from the same end, so the list behaves like a stack (last in, first out); use lpop instead if you want first-in-first-out order.
one_url = r_link.rpop("csdn_url")
while one_url:
    print("{} is popped!".format(one_url))
    one_url = r_link.rpop("csdn_url")

This is the end of the content on "how to use Python to crawl the URLs of CSDN's popular posts and save them in Redis". Thank you for reading.