Crawling fans' comments with a Python crawler

2025-02-22 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 Report

Many beginners are unsure how to use a Python crawler to collect fans' comments. To help, this article walks through the process in detail; I hope readers who need it find it useful.

This time, let's use a Python crawler to crawl something fun.

The NBA conference finals are on right now, and fans certainly won't want to miss them. This year's Western Conference finals pit the Rockets against the Warriors. The Rockets are very strong this year, because someone always steps up at the key moments. Of course, the Warriors are strong too; after all, you can't underestimate their Big Four, led by Curry and Durant.

I don't know much about the Eastern Conference finals. I expected the Celtics to put up a hard fight against the Cavaliers, but the Celtics, despite missing two key players, have been very strong and lead the Cavaliers 2-0. The Cavaliers may be in trouble this time. Will the Celtics get their revenge? Let's wait and see!

Where there is a live broadcast there are comments, so I want to crawl the fans' comments and see what they are talking about!

Preparations

Libraries needed:

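requests: sending the HTTP requests

jieba: Chinese word segmentation

wordcloud: generating the word cloud

numpy: converting the background image into an array used as the word cloud mask

Pillow (PIL): opening and displaying images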

Word cloud background image:

All of the above libraries can be installed directly with pip, but installing wordcloud this way may fail with an error.

In that case, download a prebuilt .whl file and install it manually.

Download page (Christoph Gohlke's unofficial Windows binaries): www.lfd.uci.edu/~gohlke/pythonlibs/

Find the wordcloud wheel that matches your Python version and download it.

Finally, install it from the command line:

pip install "<path to the downloaded .whl file>"
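To choose the right wheel you need your interpreter's version and whether it is a 32-bit or 64-bit build. Below is a minimal sketch for checking both; the helper name wheel_tags is hypothetical, and the cpXY, win32 and win_amd64 tags follow the standard wheel file-naming convention used by Windows wheels.

```python
import struct
import sys


def wheel_tags():
    """Return the CPython tag (e.g. 'cp36') and the interpreter's pointer size in bits."""
    python_tag = "cp%d%d" % (sys.version_info.major, sys.version_info.minor)
    bits = struct.calcsize("P") * 8  # 64 on a 64-bit interpreter, 32 on a 32-bit one
    return python_tag, bits


if __name__ == "__main__":
    tag, bits = wheel_tags()
    platform_tag = "win_amd64" if bits == 64 else "win32"
    print("Look for a wheel whose file name contains %s and %s" % (tag, platform_tag))
```

A wheel whose name contains a different cpXY tag or the wrong platform tag will normally be rejected by pip as unsupported on your interpreter.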

Next, find the target page.

Live text broadcast page: www.zhibo8.cc/zhibo/nba/2018/0517123898.htm?redirect=zhibo

Capturing this page's network traffic (press F12 to open the browser's developer tools) shows that the following link returns the comment data as JSON:

Link: cache.zhibo8.cc/json/2018/nba/0517123898_384.htm?key=0.6512348313080727

Comparing several of these requests shows that 2018/nba/0517123898 identifies the live broadcast room, and the number after the underscore (384 here) is the comment page number. The final key parameter is a random number; the request works whether or not it is included.
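The URL pattern described above can be captured in a small helper. This is a sketch under the stated analysis: the helper name build_comment_url is hypothetical, and the random key is only appended on request because the article found it unnecessary.

```python
import random

BASE = "https://cache.zhibo8.cc/json"


def build_comment_url(room_path, page, with_key=False):
    """Build <BASE>/<room_path>_<page>.htm, e.g. room_path='2018/nba/0517123898'."""
    url = "%s/%s_%d.htm" % (BASE, room_path, page)
    if with_key:
        # Optional: imitate the browser's random key (the article reports it is not required)
        url += "?key=%s" % random.random()
    return url
```

For example, build_comment_url("2018/nba/0517123898", 384) reproduces the link above without the key.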

Fetch the comments with code:

import requests

def __get_json(self, index):
    # Request comment page number index of live room 0517123898 (the key parameter is optional)
    url = 'https://cache.zhibo8.cc/json/2018/nba/0517123898_%d.htm?key=0.1355540028791382' % index
    response = requests.get(url)
    if response.status_code == 200:
        for item in response.json():
            # Write each comment's text to the file
            self.__write_file(item['content'])
            self.num += 1
        return 1
    else:
        return 0
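The method above fetches a single page, and the file-writing helper it calls is not shown. Below is a sketch of the surrounding loop, assuming pages are numbered consecutively from start and that a missing page comes back with a non-200 status (the case where the original returns 0). The names fetch_page and crawl_comments are hypothetical, and the writer is passed in as a function so the loop can be tried without network access.

```python
def fetch_page(index):
    """Fetch comment page index; return the decoded JSON list, or None if the page is missing."""
    import requests  # imported here so crawl_comments can be used without requests installed
    url = "https://cache.zhibo8.cc/json/2018/nba/0517123898_%d.htm" % index
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        return None
    return response.json()


def crawl_comments(fetch, write, start=1, max_pages=1000):
    """Fetch consecutive pages until fetch() returns None, passing each comment line to write().

    Returns the number of comments written.
    """
    count = 0
    for index in range(start, start + max_pages):
        items = fetch(index)
        if items is None:  # assumption: the first missing page marks the end
            break
        for item in items:
            # One comment per line; embedded newlines are flattened to spaces
            write(item["content"].replace("\n", " ") + "\n")
            count += 1
    return count
```

To mirror the original, open comments.txt in append mode with encoding='utf-8' and call crawl_comments(fetch_page, f.write) on that file object f.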

With the comments saved, the next step is to build a word cloud:

import jieba
import numpy as np
from PIL import Image
from wordcloud import WordCloud

def __get_wordcloud(self):
    with open('comments.txt', 'r', encoding='utf-8') as comments:
        text = comments.read()  # Load the saved comments
    words = ' '.join(jieba.cut(text, cut_all=True))  # Segment the text with jieba in full mode
    image = np.array(Image.open('1.jpg'))  # Background image used as the mask
    # Initialize the word cloud; a Chinese font is needed to render Chinese words (Windows path)
    wc = WordCloud(font_path=r'C:\Windows\Fonts\simkai.ttf',
                   background_color='white', mask=image,
                   max_font_size=100, max_words=2000)
    wc.generate(words)  # Generate the word cloud
    wc.to_file('img.png')  # Save it as an image
    image_file = Image.open('img.png')  # Open the saved image
    image_file.show()
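The wordcloud documentation states that pure-white mask pixels are treated as masked out, so a photo whose background is only nearly white may leave words drawn over it. The sketch below, with a hypothetical make_mask helper and an assumed threshold of 240, snaps near-white pixels to 255 and everything else to 0.

```python
import numpy as np


def make_mask(image, threshold=240):
    """Turn an RGB(A) or grayscale image array into a mask for WordCloud.

    Pixels whose color channels are all >= threshold become 255 (left blank);
    every other pixel becomes 0 (area where words may be drawn).
    """
    arr = np.asarray(image)
    if arr.ndim == 3:
        arr = arr[..., :3]  # drop an alpha channel if present
        near_white = np.all(arr >= threshold, axis=-1)
    else:
        near_white = arr >= threshold
    return np.where(near_white, 255, 0).astype(np.uint8)
```

It would be used as mask=make_mask(np.array(Image.open('1.jpg'))) in place of the raw array.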

The code is complete; here is the result:

The word cloud shows at a glance what fans are talking about. I crawled Game 2, the Rockets' home game against the Warriors, and the most discussed words are "Warriors" and "Rockets", followed by Durant, nicknamed "the God of Death". Durant scored 38 points but still lost to the Rockets in this game, so naturally he was discussed the most. Tucker, who stepped up in this game, went 5 of 6 from three-point range and set a personal playoff scoring high, so it is natural that fans discussed him too. Another conspicuous term is "third quarter": the Warriors are known for exploding in the third quarter, and many fans expected them to break out then in this game. In fact, this season's Rockets are also very strong in the third quarter, no weaker than the Warriors.
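To check the visual impression with numbers, you can count the segmented words yourself. This is a sketch: top_words and its stop-word list are hypothetical, and dropping single-character tokens is an assumption of this example. Note that jieba's full mode (cut_all=True) also emits overlapping fragments, so counts from the default accurate mode are usually easier to interpret.

```python
from collections import Counter

# Hypothetical stop-word list; extend it with filler words that show up in your data
STOPWORDS = {"我们", "这个", "就是", "什么", "没有"}


def top_words(tokens, n=10, stopwords=STOPWORDS, min_len=2):
    """Count tokens and return the n most common as (word, count) pairs."""
    stripped = (t.strip() for t in tokens)
    counts = Counter(w for w in stripped if len(w) >= min_len and w not in stopwords)
    return counts.most_common(n)
```

Passing the tokens from jieba.cut(text) gives a ranked list; the full count dictionary could also be handed to WordCloud.generate_from_frequencies instead of generate.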

