How to Crawl Little Red Book Data Using Coroutines

2025-01-19 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

The editor would like to share how to use coroutines to crawl Little Red Book data. I hope you will get something out of this article after reading it; let's look at it together.
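The core idea of the crawler below is to run blocking fetches as cooperative tasks that a small pool of workers pulls from a shared queue. That pattern can be sketched with the standard library's asyncio as a gevent-free stand-in (the example.com URLs, the worker names, and the `asyncio.sleep(0)` standing in for a real network request are all illustrative, not part of the original crawler):

```python
import asyncio

async def worker(name, queue, results):
    # Each worker cooperatively pulls URLs until the queue is drained,
    # mirroring execute_task() in the gevent version of the crawler.
    while True:
        try:
            url = queue.get_nowait()
        except asyncio.QueueEmpty:
            break
        await asyncio.sleep(0)  # stand-in for a real (blocking) network request
        results.append((name, url))
        queue.task_done()

async def main(pages, workers=3):
    queue = asyncio.Queue()
    results = []
    # Same range as get_url() below: pages 1 .. pages-1.
    for page in range(1, pages):
        await queue.put(f'https://example.com/page/{page}')
    # Run a small pool of workers concurrently, like gevent's Pool(5).
    await asyncio.gather(*(worker(f'w{i}', queue, results) for i in range(workers)))
    return results

results = asyncio.run(main(4))
print(len(results))  # → 3
```

The key property in both versions is the same: while one task waits on I/O, control yields to another task, so a single thread keeps several requests in flight.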

Little Red Book data scraping tutorial: use coroutines to crawl the data under Little Red Book's trending page.

```python
from gevent import monkey
monkey.patch_all()  # monkey patch: make blocking stdlib I/O cooperative
from gevent.pool import Pool
from queue import Queue
import requests
import json
from lxml import etree


class RedBookSpider():
    """Little Red Book crawler"""

    def __init__(self, pages):
        """Initialize"""
        self.url = 'https://www.xiaohongshu.com/web_api/sns/v2/trending/page/brand?page={}&page_size=20'
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/74.0.3729.131 Mobile Safari/537.36"
        }
        self.url_queue = Queue()
        self.pool = Pool(5)
        self.pages = pages

    def get_url(self):
        """Build the listing URLs and enqueue them"""
        for page in range(1, self.pages):
            url = self.url.format(page)
            self.url_queue.put(url)

    def save_data(self, items):
        """Append one record to a text file"""
        with open('./redbook.txt', 'a', encoding='utf-8') as f:
            f.write(str(items) + '\n')

    def deal_detail(self, detail_url, items, data):
        """Detail page content extraction"""
        resp = requests.get(url=detail_url, headers=self.headers)
        eroot = etree.HTML(resp.text)
        items['fans'] = eroot.xpath('//div[@data-v-64bff0ce]/div[@class="extra"]/text()')
        items['articles'] = eroot.xpath('//div/span[@class="stats"]/text()')
        items['introduce'] = eroot.xpath('//div[@class="desc"]/div[@class="content"]/text()')
        items['detail_url'] = detail_url
        items['image'] = data['page_info']['banner']
        print(items)
        self.save_data(items)

    def deal_response(self, resp):
        """Data extraction from the listing JSON"""
        dict_data = json.loads(resp.text)
        dict_data = dict_data['data']
        for data in dict_data:
            items = {}
            items['name'] = data['page_info']['name']
            detail_url = 'https://www.xiaohongshu.com/page/brands/' + data['page_id']
            self.deal_detail(detail_url, items, data)

    def execute_task(self):
        """Process one response: take a URL from the queue and handle it"""
        url = self.url_queue.get()
        resp = requests.get(url=url, headers=self.headers)
        # print(resp.text)
        self.deal_response(resp)
        self.url_queue.task_done()

    def execute_task_finished(self, result):
        """Task callback: schedule the next task"""
        self.pool.apply_async(self.execute_task, callback=self.execute_task_finished)

    def run(self):
        """Start the program"""
        self.get_url()
        for i in range(3):
            self.pool.apply_async(self.execute_task, callback=self.execute_task_finished)
        self.url_queue.join()


if __name__ == '__main__':
    user = RedBookSpider(4)  # change this to however many pages of data you need to crawl
    user.run()
```

After reading this article, I believe you have a certain understanding of how to crawl Little Red Book data with coroutines. If you want to know more, you are welcome to follow the industry information channel. Thank you for reading!
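As a quick sanity check on the pagination logic: get_url() fills the listing-URL template with page numbers 1 through pages-1, so RedBookSpider(4) enqueues three listing URLs. This can be verified with plain string formatting, no network required (the list comprehension here is an illustration, not part of the spider itself):

```python
# The same template the spider uses for the trending-page listing API.
url_template = 'https://www.xiaohongshu.com/web_api/sns/v2/trending/page/brand?page={}&page_size=20'

# range(1, 4) mirrors get_url() when pages == 4: pages 1, 2, and 3.
urls = [url_template.format(page) for page in range(1, 4)]
print(urls[0])  # → https://www.xiaohongshu.com/web_api/sns/v2/trending/page/brand?page=1&page_size=20
```

Note the off-by-one worth knowing: because the loop starts at 1 and `range` excludes its upper bound, passing `pages=4` crawls three pages, not four.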

