
How to Crawl the Weibo "Tree Hole" with Python

2025-01-16 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02 report

This article shows you how to crawl the Weibo "tree hole" with Python. The content is concise and easy to follow, and I hope the detailed walkthrough gives you something useful.

Suitable data is hard to come by, especially for a project like mine, which needs a great deal of it. Fortunately, a breakthrough was eventually found: the "Weibo Tree Hole". The "Weibo Tree Hole" is the Weibo post of a user who announced their suicide there before dying. Its comment section has become a place where thousands of depressed or desperate people leave messages, and many of those messages are full of negative emotion, some even declaring a wish to die.

For example, the Weibo of 走饭 (Zoufan):

1. Find the Weibo comment data interface

Weibo comments are served through two data interfaces: a mobile version and a PC version. The mobile version only lets you crawl 15 pages of data, so we start with the PC version. Let's see how to find its interface and what it looks like.

First, on the current Weibo page, right-click and choose Inspect (or press F12) to open the developer tools, then follow the steps in the figure below: select Network, select XHR, click another page of comments, and view the new request on the right.

Looking at the new request, you will see formatted data in the Preview tab, and it contains a block of HTML. Look closely and you will find that this HTML is the comment list, so all we need to do is parse it.

Next, look at the URL of the GET request:

https://weibo.com/aj/v6/comment/big?ajwvr=6&id=3424883176420210&page=2&__rnd=1573219876141

ajwvr is a fixed value, 6; id is the ID of the Weibo post whose comments you want to crawl; page is the comment page number; and __rnd is the request's timestamp in milliseconds.
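A minimal sketch of building this URL with the Python standard library. The endpoint, parameter names and the fixed ajwvr value come from the request above; the function name build_comment_url is hypothetical, and Weibo may have changed this endpoint since the article was written.

```python
import time
from urllib.parse import urlencode

# Endpoint observed in the browser's developer tools (may have changed since).
BASE_URL = "https://weibo.com/aj/v6/comment/big"


def build_comment_url(weibo_id, page, rnd=None):
    """Return the PC comment-API URL for one page of comments (hypothetical helper).

    weibo_id -- ID of the Weibo post whose comments we want
    page     -- comment page number, starting at 1
    rnd      -- millisecond timestamp; defaults to the current time
    """
    if rnd is None:
        rnd = int(time.time() * 1000)  # current time in milliseconds
    params = {
        "ajwvr": 6,        # fixed value
        "id": weibo_id,
        "page": page,
        "__rnd": rnd,
    }
    return BASE_URL + "?" + urlencode(params)


print(build_comment_url("3424883176420210", 2, rnd=1573219876141))
# https://weibo.com/aj/v6/comment/big?ajwvr=6&id=3424883176420210&page=2&__rnd=1573219876141
```

With the ID and timestamp from the example request, it reproduces the URL above; in normal use you would leave rnd out so the current time is filled in.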

However, Weibo requires you to log in to see more comments, so before we start crawling we need to log in to Weibo and get the cookie value.

2. Write the crawler

Follow the Python practical treasure book at the bottom of the article and reply to Weibo comment crawler to get the complete source code of this project.

Set four parameters:

Set the cookie:

Send a request and parse the data:
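The original code for these steps is not reproduced here. As a rough standard-library sketch of setting the cookie and sending the request: COOKIE is a placeholder you replace with the Cookie header copied from your logged-in browser session, the User-Agent string is an assumption, and the function names are hypothetical. It also assumes the comment HTML sits under data -> html in the JSON response, as the Preview tab suggests.

```python
import json
import urllib.request

# Placeholder: paste the Cookie header value from your logged-in browser session.
COOKIE = "PASTE_YOUR_COOKIE_HERE"

HEADERS = {
    "Cookie": COOKIE,
    # A browser-like User-Agent; the exact value is an assumption.
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}


def extract_html(body):
    """Pull the comment-list HTML out of a JSON response body.

    Assumes the HTML is stored under data -> html; returns "" if it is missing.
    """
    payload = json.loads(body)
    return (payload.get("data") or {}).get("html", "")


def fetch_comment_html(url, timeout=10):
    """Send a GET request carrying the login cookie and return the comment HTML."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return extract_html(body)
```

Keeping extract_html separate from the network call means the parsing side can be checked against a saved response without logging in.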

To extract the data we need from this HTML, we use XPath. If you are new to XPath, you can read the article "Learn XPath for crawlers: this one article is enough":

https://zhuanlan.zhihu.com/p/29436838
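A sketch of the parsing step. It swaps in the standard library's xml.etree.ElementTree, which supports only a subset of XPath, and runs on a hypothetical, simplified fragment: the class names and the comment_id attribute are assumptions, not Weibo's actual markup. Real Weibo HTML is usually not well-formed XML, so for live pages lxml.html with its full xpath() method is the more common choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified comment-list markup for illustration only.
SAMPLE_HTML = """
<div>
  <div class="list_li" comment_id="1">
    <div class="WB_text"><a usercard="id=111">userA</a>:first comment</div>
  </div>
  <div class="list_li" comment_id="2">
    <div class="WB_text"><a usercard="id=222">userB</a>:second comment</div>
  </div>
</div>
"""


def parse_comments(html):
    """Return a list of {comment_id, user, text} dicts from the comment HTML."""
    root = ET.fromstring(html.strip())
    comments = []
    # ElementTree XPath subset: descendant divs whose class attribute matches exactly
    for item in root.findall(".//div[@class='list_li']"):
        text_div = item.find("./div[@class='WB_text']")
        if text_div is None:
            continue
        user = text_div.find("a")
        # The comment text follows the user link; strip the leading
        # ASCII or full-width colon.
        text = (user.tail if user is not None else text_div.text) or ""
        comments.append({
            "comment_id": item.get("comment_id"),
            "user": user.text if user is not None else None,
            "text": text.lstrip(":：").strip(),
        })
    return comments


for comment in parse_comments(SAMPLE_HTML):
    print(comment)
```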

The functions for writing files and downloading images are as follows:
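The original functions are not shown here. A minimal standard-library sketch of both, assuming each comment is a dict with the hypothetical fields comment_id, user and text (as produced by a parser like the one sketched for the XPath step); the function names are also hypothetical.

```python
import csv
import os
import urllib.request

FIELDNAMES = ["comment_id", "user", "text"]  # hypothetical comment fields


def write_comments(path, comments):
    """Append comment dicts to a UTF-8 CSV file, writing a header for a new file."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if is_new:
            writer.writeheader()
        writer.writerows(comments)


def download_image(url, path, timeout=10):
    """Save the image at url to path, e.g. a picture attached to a comment."""
    with urllib.request.urlopen(url, timeout=timeout) as response, open(path, "wb") as f:
        f.write(response.read())
```

Opening the CSV in append mode lets repeated runs add to the same file, and writing UTF-8 keeps the Chinese comment text intact.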

That is the code we used. Reply "Weibo comment crawler" in the official account's chat to download the complete source code (including the mobile-version crawler).

3. Scheduled crawler

Still, this does not give us enough data. On the PC site, Weibo comments can only be crawled up to page 50; from page 51 on, no data is returned, as shown in the figure:
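Since nothing comes back after page 50, the page loop can simply stop there. A sketch, where fetch_page is a hypothetical callable (for example, one that builds the page URL, sends the request and parses the HTML) and the one-second default delay is an arbitrary choice to keep the request rate modest:

```python
import time

MAX_PAGE = 50  # the PC comment interface returns nothing beyond page 50


def crawl_pages(fetch_page, max_page=MAX_PAGE, delay=1.0):
    """Call fetch_page(page) for pages 1..max_page and collect the results.

    Stops early as soon as a page returns no comments, and sleeps `delay`
    seconds between requests.
    """
    all_comments = []
    for page in range(1, max_page + 1):
        comments = fetch_page(page)
        if not comments:
            break  # empty page: no more data
        all_comments.extend(comments)
        if page < max_page:
            time.sleep(delay)
    return all_comments
```

Passing the fetch function in keeps the loop independent of the network code, so it can be tried with a stub first.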

However, so many people reply under this Weibo that roughly 50 pages of comments accumulate each day, so we can collect the data by crawling 50 pages once a day. On Linux this can be done with a crontab scheduled job, and on Windows with Task Scheduler:

https://blog.csdn.net/wwy11/article/details/51100432

Here we cover the crontab approach.

Suppose Python is installed in /usr/bin/ and the script is named weibo.py and stored in /home. Run crontab -e in a terminal and add this line at the end:

00 00 * * * /usr/bin/python /home/weibo.py

The five leading fields (minute, hour, day of month, month, day of week) make the job run once a day at 00:00, so each run crawls the latest 50 pages of comments.
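One practical point when running under cron: jobs run with a minimal environment, and the working directory is usually the user's home directory, so relative output paths may not end up where you expect. A hypothetical skeleton for weibo.py that places its output next to the script (using the script path that cron passes to python, available as sys.argv[0]) and writes one file per day; the file naming scheme is an assumption.

```python
# weibo.py: hypothetical entry point for the crontab job.
import datetime
import sys
from pathlib import Path

# Directory of the script path given to python (e.g. /home for /home/weibo.py),
# so output does not depend on cron's working directory.
SCRIPT_DIR = Path(sys.argv[0]).resolve().parent


def output_path(today=None):
    """One CSV per day next to the script, e.g. comments_2019-11-08.csv."""
    today = today or datetime.date.today()
    return SCRIPT_DIR / f"comments_{today.isoformat()}.csv"


def main():
    path = output_path()
    # Hypothetical: crawl pages 1-50 here and write the comments to `path`.
    print(f"writing comments to {path}")


if __name__ == "__main__":
    main()
```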

© 2024 shulou.com SLNews company. All rights reserved.
