This article explains how to use Python to crawl the comments on Tencent Video's Running Man and do a simple text visualization analysis, in the hope of giving readers facing the same problem a simpler, easier approach.
The fifth season of Keep Running (Running Man) has aired two episodes so far. Set in the regions along the Yellow River ecological economic belt, the show uses innovative game segments, live broadcasts, poverty-alleviation tie-ins and other new formats to highlight the importance of the Yellow River basin and portray the "cultural beauty" of the cities along the belt.
Netizens, however, do not seem to buy it. After Deng Chao and Zheng Kai left the show, "the ratings are clearly not what they used to be", while complaints seem to have increased. To find out what viewers think of Running Man, I crawled its comments from Tencent Video and did a simple text "visual analysis".
Data acquisition
Tencent Video only loads more comments after you click "View more comments", so this is clearly a dynamic page: the comment content is fetched with Ajax. We therefore need to find the real request URL and pull the data from it. Once we can obtain the two query parameters cursor=? and _=? carried by that real URL and update them on each request, the rest is straightforward.
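The request and parsing helpers called by the main loop below (get_content, get_comment, get_lastId) are not shown in the article. The following is a minimal sketch of what they might look like, assuming the commonly used video.coral.qq.com comment endpoint and its JSON field names; the article id, headers and CSV path are placeholders of my own, not values from the original:

import time
import json
import csv
import requests

# Assumed endpoint and article id; the real values must be taken from the
# "real URL" found in the browser's developer tools.
ARTICLE_ID = '0000000000'  # placeholder for the target video's comment article id
BASE_URL = ('https://video.coral.qq.com/varticle/' + ARTICLE_ID +
            '/comment/v2?orinum=10&oriorder=o&cursor={cursor}&_={page}')

def get_content(page, lastId):
    # request one page of comments; cursor and _ come from the previous round
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(BASE_URL.format(cursor=lastId, page=page), headers=headers)
    response.encoding = 'utf-8'
    return json.loads(response.text)

def get_comment(html):
    # pull the comment text out of the JSON payload (field names are assumptions)
    return [item['content'] for item in html['data']['oriCommList']]

def get_lastId(html):
    # the "last" field is the cursor used to load the next page
    return html['data']['last']

def save_comment(commentlist, path='paonan.csv'):
    # append one round of comments to a CSV file for the analysis below
    with open(path, 'a', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        for comment in commentlist:
            writer.writerow([comment])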
The core code of the paging loop is as follows:

def main():
    # _=? of the initial page
    page = 1607948139253
    # cursor=? of the first page to be refreshed
    lastId = "0"
    for i in range(1, 1000):
        time.sleep(1)
        html = get_content(page, lastId)
        # get the comment data
        commentlist = get_comment(html)
        print("----- round " + str(i) + " of page comments -----")
        k = 0
        for j in range(1, len(commentlist)):
            comment = commentlist[j]
            k += 1
            print("comment %s: %s" % (k, comment))
        # get the ID used to refresh the next round of pages
        lastId = get_lastId(html)
        page += 1

if __name__ == '__main__':
    main()

Data processing

Import related packages

import jieba
import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pyecharts.charts import *
from pyecharts import options as opts
from pyecharts.globals import ThemeType
import stylecloud
from IPython.display import Image

Import comment data
The comments were crawled in two batches, so the two files need to be read separately and then merged.
df1 = pd.read_csv('/Tencent comment/paonan.csv', names=['comment content'])
df2 = pd.read_csv('/Tencent comment/paonan1.csv', names=['comment content'])
df = pd.concat([df1, df2])
df.head(10)
Data preview
View data information

print('total number of comments:', df.shape[0], 'messages')
total number of comments: 21307 messages
df.info()
df['comment content'] = df['comment content'].astype('str')

Int64Index: 21307 entries, 0 to 11833
Data columns (total 1 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   comment content  21199 non-null  object
dtypes: object(1)
memory usage: 332.9+ KB

Delete duplicate comments

df = df.drop_duplicates()

Delete missing data

df = df.dropna()
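A quick check, not shown in the original, is to count the rows again after deduplication and dropping missing values to see how much was removed:

print('comments remaining after cleaning:', df.shape[0], 'messages')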
Add comment type

Comments are manually divided into three types by length: short comments (fewer than 20 characters), medium comments (20 to 50 characters), and long comments (more than 50 characters).
cut = lambda x: 'short comment' if len(x) < 20 else ('medium comment' if len(x) <= 50 else 'long comment')
df['comment type'] = df['comment content'].map(cut)
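The article is cut off at this point. As one possible continuation of the "visual analysis", here is a minimal sketch, not from the original, that counts the comment types and renders the distribution as a pie chart with the pyecharts package already imported above; the chart title and output file name are my own choices:

from pyecharts.charts import Pie
from pyecharts import options as opts
from pyecharts.globals import ThemeType

# count how many comments fall into each length category
counts = df['comment type'].value_counts()

# render the distribution as a pie chart (hypothetical example, not the author's code)
pie = (
    Pie(init_opts=opts.InitOpts(theme=ThemeType.LIGHT))
    .add('', [(name, int(value)) for name, value in counts.items()])
    .set_global_opts(title_opts=opts.TitleOpts(title='Comment type distribution'))
    .set_series_opts(label_opts=opts.LabelOpts(formatter='{b}: {c} ({d}%)'))
)
pie.render('comment_type_pie.html')  # or pie.render_notebook() in Jupyter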