Many inexperienced readers do not know how to approach an example analysis of the WeiboUserScrapy crawler, so this article works through one, summarizing the problems you are likely to run into and how to solve them. I hope it helps you get past them.
I have stopped maintaining the all-in-one GUI version and now focus on developing and maintaining the GUI-free, feature-independent version, where each feature is a minimum viable product and does not interfere with the others. However, the feature-independent version has long had one outstanding gap: the "crawl all Weibo posts of a given user" feature from the all-in-one version had never been split out.
Stripping it out is fairly easy, because every feature in the all-in-one version already lives in a relatively independent class; the per-user Weibo crawler is the WeiboUserScrapy class. The only complication is that, in the all-in-one version, this class uses PyQt5 signals and some shared configuration variables to communicate and coordinate with the other modules, and in the standalone version those can simply be removed.
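For illustration, here is a minimal sketch of what that stripping amounts to. This is not the project's actual code; the signal and attribute names below are made up.

# In the all-in-one version the class is roughly shaped like this (illustrative only):
#
#     from PyQt5.QtCore import QObject, pyqtSignal
#
#     class WeiboUserScrapy(QObject):
#         progress_signal = pyqtSignal(int)      # reports progress to the GUI / other modules
#
#         def get_one_page(self, page):
#             ...
#             self.progress_signal.emit(page)
#
# In the standalone version the QObject base class, the signal and the shared
# config variables are dropped, and progress is simply logged:

class WeiboUserScrapy:
    def __init__(self, user_id, cookie):
        self.user_id = user_id   # pure-number Weibo user id
        self.cookie = cookie     # copied from a logged-in browser session

    def get_one_page(self, page):
        ...                      # fetch and save one page of posts
        print(f'[{self.user_id}] finished page {page}')  # plain logging replaces the signal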
After you get the code, you only need to do two things:
Change the Cookie in the code, set user_id to the numeric id of the user you want to crawl, and run it. Before long you will find a file named in the format {user_id}_{nickname}_{weibo_num}posts_{followers}fans_{following}follows.csv in the user folder under the project root, which holds all the crawled Weibo posts. One question readers may well ask: what if a blogger has 40,000 posts and, 20,000 posts in, the network suddenly drops or the Cookie expires? As a conscientious blogger I naturally added breakpoint resume, and it is not hard: every time a page is written to the csv, the paging parameter page is also saved and updated in a configuration file. The core code is as follows:
# Runs inside the WeiboUserScrapy class (hence self); requires os, json and random to be imported.
user_page_config = 'user_page.json'
# First run: start from page 1 and create the progress file; otherwise resume from the saved page.
if not os.path.exists(user_page_config):
    page = 1
    with open(user_page_config, 'w', encoding='utf-8-sig') as f:
        f.write(json.dumps({f'{self.user_id}': page}, indent=2))
else:
    with open(user_page_config, 'r', encoding='utf-8-sig') as f:
        page = json.loads(f.read())[f'{self.user_id}']

random_pages = random.randint(1, 5)  # random page interval (not used in this excerpt)

for page in range(page, page_num + 1):
    self.get_one_page(page)  # get all Weibo on this page
    # After each page is written to the csv, record the page number so an interrupted crawl can resume here.
    with open(user_page_config, 'r', encoding='utf-8-sig') as f:
        old_data = json.loads(f.read())
    old_data[f'{self.user_id}'] = page
    with open(user_page_config, 'w', encoding='utf-8-sig') as f:
        f.write(json.dumps(old_data, indent=2))
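To make the resume behavior concrete: suppose the crawl for a user stops after page 253 has been written (network drop, expired Cookie, Ctrl+C). user_page.json in the project root will then contain something like the following (the user id here is made up):

{
  "1234567890": 253
}

On the next run the else branch above reads this value back into page, so the for loop continues from the recorded page instead of starting again at page 1. Note that, because the value is only updated after a page completes, the last recorded page is fetched once more on resume, so at most one page of work is repeated.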
This way you can start the code and happily go do something else, without having to watch it the whole time. After reading the above, have you got the hang of analyzing the WeiboUserScrapy crawler example? If you want to learn more skills or find out more, you are welcome to follow the industry information channel. Thanks for reading!