This article explains how to crawl Sina Weibo data with Python. The content is quite detailed; interested readers can use it as a reference. I hope it is helpful to you.
Now let me teach you how to crawl Weibo data in batches and greatly speed up data collection!
We will use weiboSpider, a crawler library developed by a third-party author (when a ready-made tool exists, of course we should use it). This guide assumes you already have Python installed.
1. Download the project
Go to the URL below and click Download ZIP to download the project file:
https://github.com/dataabc/weiboSpider
Or
If you have git, you can instead run the following command in cmd/terminal:
git clone https://github.com/dataabc/weiboSpider.git
2. Install dependencies
After unzipping the project package, open your cmd/Terminal, enter the project directory, and run the following command:
pip install -r requirements.txt
This installs the project dependencies; wait for the installation to complete.
3. Set up the cookie
Open the weibospider.py file in the weibospider folder and replace "your cookie" with the cookie of the Weibo account used for crawling; it sits at about line 22 of weibospider.py. To obtain the cookie:
3.1 Log in to Weibo
3.2 Press F12, or right-click a blank area of the page and choose Inspect, to open the developer tools
3.3 Select the Network tab, press F5 to refresh, select the first request, and find the Cookie field in the panel on the right
Then replace the cookie at about line 22 of the weibospider.py file, as in the sketch below. (The original post showed before-and-after screenshots of this line.)
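A minimal sketch of the edit, assuming the cookie is stored in a dict near the top of the file as in older versions of weiboSpider (the exact variable name and line may differ in your copy):

# Before replacement (placeholder shipped with the project):
cookie = {'Cookie': 'your cookie'}

# After replacement (paste the cookie string copied from the browser):
cookie = {'Cookie': '<cookie string copied from the browser>'}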
4. Set the user_id to crawl
4.1 Get the user_id
Open the home page of the user you want to crawl and look at its URL:
You will find a string of digits in the link; that is the user_id we need, so just copy it.
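If you would rather extract the id programmatically, here is a minimal sketch, assuming the profile URL has the common form https://weibo.cn/<digits> or https://weibo.com/u/<digits>:

import re

def extract_user_id(profile_url):
    # Pull the first run of digits out of a profile URL,
    # e.g. 'https://weibo.cn/1669879400' -> '1669879400'
    match = re.search(r'/(\d+)', profile_url)
    return match.group(1) if match else None

print(extract_user_id('https://weibo.cn/1669879400'))  # 1669879400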
4.2 Set the user_id to be crawled
Open the weibospider.py file in the weibospider folder and assign the user_id of the one or more Weibo users we want to crawl to user_id_list.
The user_id setting code is located in the main function of weibospider.py. The specific code is as follows:
# Crawl a single Weibo user; change to any valid user id
user_id_list = ['1669879400']
Or
# Crawl multiple Weibo users; change to any valid user ids
user_id_list = ['1223178222', '1669879400', '1729370543']
Or
"""You can also read user_id_list from a file, which can contain many user_ids.
Each user_id occupies one line; the file name is arbitrary, the type is txt,
and the file sits in the same directory as this program,
e.g. a file called user_id_list.txt"""
user_id_list = wb.get_user_list('user_id_list.txt')
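For intuition, a helper like get_user_list only needs to read one id per line; a minimal sketch of such a function (an illustration, not weiboSpider's actual implementation):

def get_user_list(file_name):
    # Read one user_id per line, ignoring blank lines and stray whitespace
    with open(file_name, 'r', encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]

user_id_list = get_user_list('user_id_list.txt')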
With this, our basic setup is complete. Of course, you can also configure writes to a MySQL or MongoDB database if needed; if not, the results are written to txt and csv files by default.
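For a rough idea of what the default file output amounts to, here is a minimal sketch of writing records to a csv file with Python's standard library (the records and field names below are made up for illustration; weiboSpider's actual columns differ):

import csv

# Hypothetical records; the real crawler output has many more fields
weibo_records = [
    {'user_id': '1669879400', 'text': 'example post', 'created_at': '2020-01-01'},
]

with open('weibo.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.DictWriter(f, fieldnames=['user_id', 'text', 'created_at'])
    writer.writeheader()
    writer.writerows(weibo_records)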
5. Run the crawler
Open cmd/terminal, enter the project directory, and run:
python weibospider.py
The crawl then starts. How convenient is that? You can also customize what is crawled, such as the start date of the Weibo posts and whether to write to a database, and even add new features to the code (for example, a cookie pool or a proxy pool)!
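As one example of such an extension, a bare-bones proxy pool is just a rotating list of proxies applied to each request; a minimal sketch using the requests library (the proxy addresses are placeholders, and this is not part of weiboSpider itself):

import itertools
import requests

# Placeholder proxy addresses; substitute real, working proxies
proxy_pool = itertools.cycle([
    'http://127.0.0.1:8001',
    'http://127.0.0.1:8002',
])

def fetch(url):
    # Each call rotates to the next proxy in the pool
    proxy = next(proxy_pool)
    return requests.get(url, proxies={'http': proxy, 'https': proxy}, timeout=10)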
That is all for how Python crawls Sina Weibo data. I hope the above content is helpful and that you learned something new. If you found the article useful, feel free to share it so more people can see it.