Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use Python to crawl the data of Weibo's hot search list

2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/02 Report--

This article introduces the relevant knowledge of "how to use Python to climb Weibo hot search list data". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

Weibo's hot search list is of great value to the study of public traffic. Today's tutorial is about how to climb Weibo's most searched list. The link to the hot search list is:

Https://s.weibo.com/top/summary/

Browse with a browser and find that you can view it normally without logging in, which is much easier. Use the developer tool (F12) to view the page logic and get the CSS location of each hot search, as follows:

According to this method, the selector that gets the td tag is:

Pl_top_realtimehot > table > tbody > tr:nth-child (3) > td.td-02

Where nth-child (3) refers to the third tr tag, because this hot search is in the third place, but we have to climb all the hot searches, so: nth-child (3) can be removed. Also note that pl_top_realtimehot is the id,id of the tag, which needs to be preceded by a # sign, and finally becomes:

# pl_top_realtimehot > table > tbody > tr > td.td-02

You can customize the information you want to climb. The information I need here is: the link and title of the hot search, the popularity of the hot search. Their corresponding CSS selectors are:

Link and title: # pl_top_realtimehot > table > tbody > tr > td.td-02 > a

Heat: # pl_top_realtimehot > table > tbody > tr > td.td-02 > span

It is worth noting that the link and the title are in the same place, the link is in the href attribute of the a tag, and the title is in the text of a, there is a way to get it with beautifulsoup, please see the later code.

Now that we know the location of this information, we can start writing programs. By default, you have installed python and can use cmd's pip, if not, see this tutorial: python installation. The python packages that need to be used are:

BeautifulSoup4:

Cmd/Terminal installation instruction: pip install beautifulsoup4.

Lxml parser:

Cmd/Terminal installation instructions: pip install lxmllxml is a package in python that contains tools to convert html text into xml objects to locate tags. The package that can be used to identify the location of these tags in xml objects is Beautifulsoup4.

Write code:

For a description of the code, please see the comments

Results:

Please see the comments for the code description, but by doing so, you will just save the result to an array, which is very difficult to view, so let's save it as a csv file.

The effect is as follows, how is it, is it much better:

The complete code is as follows. Please read the original text and go to the website to read:

This is the end of the content of "how to climb Weibo Hot search list data using Python". Thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report