This article explains how to crawl Sina Weibo data with Python. The content is quite detailed; interested readers can use it as a reference. I hope it is helpful to you.
Now let me teach you how to crawl Weibo data in batches and greatly speed up data collection!
We will use weiboSpider, a crawler library developed by a third-party author (when a ready-made tool exists, of course we should use it). This guide assumes you already have Python installed.
1. Download the project
Go to the URL below and click Download ZIP to download the project file:
https://github.com/dataabc/weiboSpider
Or
If you have git, you can instead run the following command in cmd/terminal:
git clone https://github.com/dataabc/weiboSpider.git
2. Install dependencies
After unzipping the project package, open your cmd/Terminal, enter the project directory, and run the following command:
pip install -r requirements.txt
This installs the project dependencies; wait for the installation to complete.
3. Set up the cookie
Open the weibospider.py file in the weibospider folder and replace "your cookie" with the cookie of the Weibo account used for crawling; it sits at about line 22 of weibospider.py. To obtain the cookie:
3.1 Log in to Weibo
3.2 Press F12, or right-click a blank area of the page and choose Inspect, to open the developer tools
3.3 Select the Network tab, press F5 to refresh, select the first request, and find the Cookie field in the panel on the right
Then replace the cookie at about line 22 of the weibospider.py file, as in the sketch below. (The original post showed before-and-after screenshots of this line.)
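A minimal sketch of the edit, assuming the cookie is stored in a dict near the top of the file as in older versions of weiboSpider (the exact variable name and line may differ in your copy):

# Before replacement (placeholder shipped with the project):
cookie = {'Cookie': 'your cookie'}

# After replacement (paste the cookie string copied from the browser):
cookie = {'Cookie': '<cookie string copied from the browser>'}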
4. Set the user_id to crawl
4.1 Get the user_id
Open the home page of the user you want to crawl and look at its URL:
You will find a string of digits in the link; that is the user_id we need, so just copy it.
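If you would rather extract the id programmatically, here is a minimal sketch, assuming the profile URL has the common form https://weibo.cn/<digits> or https://weibo.com/u/<digits>:

import re

def extract_user_id(profile_url):
    # Pull the first run of digits out of a profile URL,
    # e.g. 'https://weibo.cn/1669879400' -> '1669879400'
    match = re.search(r'/(\d+)', profile_url)
    return match.group(1) if match else None

print(extract_user_id('https://weibo.cn/1669879400'))  # 1669879400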
4.2 Set the user_id to be crawled
Open the weibospider.py file in the weibospider folder and assign the user_id of the one or more Weibo users we want to crawl to user_id_list.
The user_id setting code is located in the main function of weibospider.py. The specific code is as follows:
# Crawl a single Weibo user; change to any valid user id
user_id_list = ['1669879400']
Or
# Crawl multiple Weibo users; change to any valid user ids
user_id_list = ['1223178222', '1669879400', '1729370543']
Or
"""You can also read user_id_list from a file, which can contain many user_ids.
Each user_id occupies one line; the file name is arbitrary, the type is txt,
and the file sits in the same directory as this program,
e.g. a file called user_id_list.txt"""
user_id_list = wb.get_user_list('user_id_list.txt')
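For intuition, a helper like get_user_list only needs to read one id per line; a minimal sketch of such a function (an illustration, not weiboSpider's actual implementation):

def get_user_list(file_name):
    # Read one user_id per line, ignoring blank lines and stray whitespace
    with open(file_name, 'r', encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]

user_id_list = get_user_list('user_id_list.txt')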
With this, our basic setup is complete. Of course, you can also configure writes to a MySQL or MongoDB database if needed; if not, the results are written to txt and csv files by default.
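For a rough idea of what the default file output amounts to, here is a minimal sketch of writing records to a csv file with Python's standard library (the records and field names below are made up for illustration; weiboSpider's actual columns differ):

import csv

# Hypothetical records; the real crawler output has many more fields
weibo_records = [
    {'user_id': '1669879400', 'text': 'example post', 'created_at': '2020-01-01'},
]

with open('weibo.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.DictWriter(f, fieldnames=['user_id', 'text', 'created_at'])
    writer.writeheader()
    writer.writerows(weibo_records)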
5. Run the crawler
Open cmd/terminal, enter the project directory, and run:
python weibospider.py
The crawl then starts. How convenient is that? You can also customize what is crawled, such as the start date of the Weibo posts and whether to write to a database, and even add new features to the code (for example, a cookie pool or a proxy pool)!
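As one example of such an extension, a bare-bones proxy pool is just a rotating list of proxies applied to each request; a minimal sketch using the requests library (the proxy addresses are placeholders, and this is not part of weiboSpider itself):

import itertools
import requests

# Placeholder proxy addresses; substitute real, working proxies
proxy_pool = itertools.cycle([
    'http://127.0.0.1:8001',
    'http://127.0.0.1:8002',
])

def fetch(url):
    # Each call rotates to the next proxy in the pool
    proxy = next(proxy_pool)
    return requests.get(url, proxies={'http': proxy, 'https': proxy}, timeout=10)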
That is all for how Python crawls Sina Weibo data. I hope the above content is helpful and that you learned something new. If you found the article useful, feel free to share it so more people can see it.