2025-03-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to use Python to crawl the on-screen comments (the "bullet screen") of the show "Ha" on Tencent Video. The content is straightforward, and I hope it clears up your questions. Let's work through it together.
Data acquisition
At present, "Ha" has aired 10 episodes, and this article crawls the two sets of on-screen comments for episode 10. The on-screen comment crawler was explained in detail in earlier articles, so it is not repeated here. The complete code is given below:
    # -*- coding: utf-8 -*-
    import requests
    import json
    import time
    import pandas as pd

    df = pd.DataFrame()
    for page in range(15, 3973, 30):
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
        url = 'https://mfm.video.qq.com/danmu?otype=json&timestamp={}&target_id=6384458060%26vid%3Dd0035ka5c02&count=80'.format(page)
        print("extracting page " + str(page))
        html = requests.get(url, headers=headers)
        # strict=False works around JSON parsing errors in some responses
        bs = json.loads(html.text, strict=False)
        time.sleep(1)
        # iterate over the comments and pick out the target fields
        for i in bs['comments']:
            name = i['opername']  # nickname
            content = i['content']  # on-screen comment
            upcount = i['upcount']  # likes
            user_degree = i['uservip_degree']  # membership level
            timepoint = i['timepoint']  # release time
            comment_id = i['commentid']  # on-screen comment id
            cache = pd.DataFrame({'user name': [name],
                                  'on-screen comment': [content],
                                  'membership level': [user_degree],
                                  'release time': [timepoint],
                                  'like on-screen comment': [upcount],
                                  'on-screen comment id': [comment_id]})
            df = pd.concat([df, cache])
    df.to_csv('-1.csv', encoding='utf-8')
    print(df.shape)

Data processing
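As a side note on the crawler above: the `strict=False` argument matters because comment text can contain raw control characters (tabs, newlines), which the JSON parser rejects by default. The following is a minimal sketch using a made-up payload shaped like the `comments` list the crawler reads. The helper names `parse_comments` and `strict_parse_fails` and all sample values are illustrative assumptions, not part of the original code.

```python
import json

# Hypothetical payload with the same field names the crawler reads.
# The "\t" below becomes a raw tab inside the JSON string, which is
# exactly the kind of character that breaks strict parsing.
SAMPLE = '{"comments": [{"opername": "", "content": "ha\tha", "upcount": 3, "uservip_degree": 0, "timepoint": 15, "commentid": "1"}]}'


def strict_parse_fails(text):
    """Return True if the default (strict) parser rejects the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return True
    return False


def parse_comments(text):
    """Return one flat dict per comment, keeping only the fields used later."""
    payload = json.loads(text, strict=False)  # tolerate control characters
    rows = []
    for item in payload.get("comments", []):
        rows.append({
            "user name": item.get("opername"),
            "on-screen comment": item.get("content"),
            "membership level": item.get("uservip_degree"),
            "release time": item.get("timepoint"),
            "like on-screen comment": item.get("upcount"),
            "on-screen comment id": item.get("commentid"),
        })
    return rows


rows = parse_comments(SAMPLE)
print(strict_parse_fails(SAMPLE), len(rows))
```

Collecting plain dicts like this and building the DataFrame once with `pd.DataFrame(rows)` also avoids calling `pd.concat` for every single comment.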
1. Reading and previewing the data
First, the two on-screen comment CSV files are merged with the concat method.

    import pandas as pd
    import numpy as np

    df1 = pd.read_csv("/on-screen comment/Tencent/ha/-1.csv")
    df1["number of periods"] = "10 issues"
    df2 = pd.read_csv("/on-screen comment/Tencent/ha/-2.csv")
    df2["number of periods"] = "10 issues"
    df = pd.concat([df1, df2])
    df.sample(10)
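One detail worth noting about the merge: `pd.concat` keeps each file's original row labels by default, so the merged frame contains duplicate index values. The small sketch below uses two stand-in frames (the data is invented for illustration) to show how `ignore_index=True` produces a clean running index instead.

```python
import pandas as pd

# Stand-in frames for the two CSV files; the contents are made up.
part1 = pd.DataFrame({"on-screen comment": ["a", "b"]})
part2 = pd.DataFrame({"on-screen comment": ["c"]})
part1["number of periods"] = "10 issues"
part2["number of periods"] = "10 issues"

# Default behaviour: row labels restart for the second frame.
kept = pd.concat([part1, part2])

# ignore_index=True renumbers the rows from 0 to n-1.
merged = pd.concat([part1, part2], ignore_index=True)

print(kept.index.tolist())
print(merged.index.tolist())
```

A unique index makes later label-based selection and sorting less surprising, although the analysis in this article works either way.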
Sample 10 on-screen comments, and the preview results are as follows:
2. View data information

    df.info()

    Int64Index: 47687 entries, 0 to 21820
    Data columns (total 8 columns):
     #   Column                  Non-Null Count  Dtype
    ---  ------                  --------------  -----
     0   Unnamed: 0              47687 non-null  int64
     1   user name               13433 non-null  object
     2   on-screen comment       47687 non-null  object
     3   membership level        47687 non-null  int64
     4   release time            47687 non-null  int64
     5   like on-screen comment  47687 non-null  int64
     6   on-screen comment id    47687 non-null  int64
     7   number of periods       47687 non-null  object
    dtypes: int64(5), object(3)
    memory usage: 3.3+ MB
The following problems are found in the data:
(1) the field names can be adjusted
(2) Unnamed: 0 and on-screen comment id are redundant fields
(3) the user name field has missing values, which can be filled in
(4) the type of the release time field needs to be adjusted
3. Data cleaning
    # rename fields
    df = df.rename(columns={'user name': 'user nickname',
                            'on-screen comment': 'on-screen comment content',
                            'release time': 'send time'})
    # keep only the fields that are needed
    df = df[["user nickname", "on-screen comment content", "membership level",
             "send time", "like on-screen comment", "number of periods"]]
    # fill in missing values
    df["user nickname"] = df["user nickname"].fillna("John Doe")
After cleaning, the data preview is as follows:
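Problem (4), the type of the send-time field, is not handled in the cleaning code above. One possible approach is sketched here, assuming that `timepoint` is the offset in seconds from the start of the video (an assumption; the article does not state the unit). The helper `seconds_to_clock` and the sample values are hypothetical.

```python
import pandas as pd


def seconds_to_clock(seconds):
    """Format a number of seconds as HH:MM:SS (assumes a non-negative value)."""
    seconds = int(seconds)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Invented sample offsets, in seconds.
demo = pd.DataFrame({"send time": [15, 75, 3972]})
demo["send time"] = demo["send time"].map(seconds_to_clock)
print(demo["send time"].tolist())
```

Keeping the original integer column alongside the formatted one may be useful if the comments are later grouped by minute.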
    import stylecloud
    from IPython.display import Image

    # draw word cloud
    # get_cut_words is a word-segmentation helper that is not shown in this article
    text1 = get_cut_words(content_series=df['on-screen comment content'])
    stylecloud.gen_stylecloud(text=' '.join(text1),
                              max_words=200,
                              collocations=False,
                              font_path='simhei.ttf',
                              icon_name='fas fa-video',
                              size=653,
                              # palette='matplotlib.Inferno_9',
                              output_name='./.png')
    Image(filename='./.png')
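The helper `get_cut_words` is not defined in this article. One plausible shape, offered purely as an assumption, is to join the comments, split them into tokens, and drop stop words and single-character tokens. In this sketch the tokenizer is a parameter: for Chinese text a segmenter such as `jieba.lcut` would typically be passed in, while the default is plain whitespace splitting so the example runs without extra packages. The `tokenize` and `stop_words` parameters are hypothetical.

```python
def get_cut_words(content_series, tokenize=str.split, stop_words=frozenset()):
    """Hypothetical stand-in for the article's undefined helper.

    content_series: any iterable of comment strings (for example a Series).
    tokenize: function mapping one string to a list of tokens.
    stop_words: tokens to discard.
    """
    text = " ".join(str(item) for item in content_series)
    words = tokenize(text)
    # Keep tokens of at least two characters that are not stop words.
    return [w for w in words if len(w) >= 2 and w not in stop_words]


comments = ["so funny today", "Lu Han is so funny"]
words = get_cut_words(comments, stop_words={"is", "so"})
print(words)
```

The returned list can be joined with spaces and handed to the word-cloud call in the same way as `text1`.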
2. Who is mentioned in the on-screen comments?
Lu Han was mentioned 7329 times by the audience, Wang Chenyi 3222 times, and Zhang Yanqi 1632 times. Lu Han's knack for attracting fans has brought the variety show a great deal of traffic, while Chen He seems to have been forgotten by some viewers in the latest episode.
    from pyecharts import options as opts
    from pyecharts.charts import Bar
    from pyecharts.globals import ThemeType

    df8 = df["people mentioned"].value_counts()[1:11]
    print(df8.index.to_list())
    print(df8.to_list())
    c = (
        Bar(init_opts=opts.InitOpts(theme=ThemeType.WALDEN))
        .add_xaxis(df8.index.to_list())
        .add_yaxis("", df8.to_list())
        .set_global_opts(
            title_opts=opts.TitleOpts(title="number of people mentioned",
                                      subtitle="data Source: Tencent Video\t drawing: dish J learn Python",
                                      pos_left='left'),
            # change the x-axis label font size
            xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(font_size=13)),
            # change the y-axis label font size
            yaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(font_size=13)),
        )
        .set_series_opts(label_opts=opts.LabelOpts(font_size=16, position='top'))
    )
    c.render_notebook()
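The `people mentioned` column is used above but never created in this article. Here is a hedged sketch of how such a column might be derived: scan each comment for a list of names and record the first match. The name list, the `"none"` placeholder, the helper `first_mentioned`, and the sample comments are all assumptions for illustration.

```python
import pandas as pd

NAMES = ["Lu Han", "Chen He", "Wang Mian"]  # illustrative subset of names


def first_mentioned(comment, names=NAMES, default="none"):
    """Return the first name found in the comment, or the placeholder."""
    for name in names:
        if name in comment:
            return name
    return default


# Invented comments standing in for the cleaned data.
demo = pd.DataFrame({"on-screen comment content": [
    "Lu Han is hilarious", "hahaha", "Wang Mian sings well",
    "Lu Han again", "so funny", "lol",
]})
demo["people mentioned"] = demo["on-screen comment content"].map(first_mentioned)

counts = demo["people mentioned"].value_counts()
# Remove the placeholder by name instead of by its position in the ranking.
mentioned_only = counts.drop("none", errors="ignore")
print(mentioned_only.to_dict())
```

If comments with no match are the most frequent value, the `[1:11]` slice in the code above presumably skips exactly that row; dropping the placeholder by name gives the same result without relying on its rank.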
We drew word clouds for the six main cast members and found that they are genuinely popular, with almost no negative comments. Chen He has many nicknames, such as Hercules and Brother Hege; I would not have known without making the word clouds. It seems Brother J has some catching up to do.
3. Who sends the most on-screen comments?
The user "Every day is Xiao Chunri and" sent 158 on-screen comments in total, far ahead of everyone else and fully deserving the title of top sender. "Thinking too much, the de cat" followed with 97; if you remember, he or she was also the top sender for season 2 of "captivating offer".
    df8 = df["user nickname"].value_counts()[1:11]
    df8 = df8.sort_values(ascending=True)
    df8 = df8.tail(10)
    print(df8.index.to_list())
    print(df8.to_list())
    c = (
        Bar(init_opts=opts.InitOpts(theme=ThemeType.WALDEN, width="1100px", height="500px"))
        .add_xaxis(df8.index.to_list())
        .add_yaxis("", df8.to_list())
        .reversal_axis()  # swap the x and y axes
        .set_global_opts(
            title_opts=opts.TitleOpts(title="number of on-screen comments sent TOP10",
                                      subtitle="data Source: Tencent Video\t drawing: vegetable J learn Python",
                                      pos_left='left'),
            # change the x-axis label font size
            xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(font_size=13)),
            # change the y-axis label font size
            yaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(font_size=13)),
            # yaxis_opts=opts.AxisOpts(axislabel_opts={"rotate": 40})  # rotate the y-axis labels
        )
        .set_series_opts(label_opts=opts.LabelOpts(font_size=16, position='right'))
    )
    c.render_notebook()
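The ranking step can be tried on its own, without the chart. The sketch below assumes the placeholder nickname `John Doe` from the cleaning step and excludes it by name rather than by position, takes the top entries, and then sorts ascending so that the largest bar ends up on top once the axes are swapped. The helper `top_senders` and the sample nicknames are invented for illustration.

```python
import pandas as pd


def top_senders(nicknames, n=10, placeholder="John Doe"):
    """Count comments per nickname, skipping the fill-in placeholder."""
    counts = nicknames[nicknames != placeholder].value_counts()
    # head(n) keeps the n most frequent; ascending order suits a horizontal bar chart.
    return counts.head(n).sort_values(ascending=True)


demo = pd.Series(["John Doe"] * 5 + ["A"] * 3 + ["B"] * 2 + ["C"],
                 name="user nickname")
top = top_senders(demo, n=2)
print(top.index.to_list(), top.to_list())
```

The resulting index and values can be passed to `add_xaxis` and `add_yaxis` in the same way as `df8`.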
Let's take a look at what the top sender talked about. Sorting this user's comments by likes in descending order gives the 10 most-liked comments. Almost all of them are about Wang Mian, so this is clearly a loyal fan.

    df_first = df[df["user nickname"] == "Every day is Xiao Chunri and"]
    df_first = df_first.sort_values('like on-screen comment', ascending=False)
    df_first[:10]
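An alternative sketch for the same top-N lookup: `nlargest` returns the rows with the highest values in a column, already sorted in descending order, and does not modify a filtered slice in place. The column names follow the cleaned data; the rows and the nicknames `u1` and `u2` are invented.

```python
import pandas as pd

# Invented rows with the cleaned column names.
demo = pd.DataFrame({
    "user nickname": ["u1", "u1", "u2", "u1"],
    "on-screen comment content": ["a", "b", "c", "d"],
    "like on-screen comment": [5, 20, 99, 1],
})

# Filter to one sender, then take that sender's two most-liked comments.
mask = demo["user nickname"] == "u1"
top = demo[mask].nlargest(2, "like on-screen comment")
print(top["on-screen comment content"].tolist())
```

For the article's data, the same call with `10` in place of `2` would return the ten most-liked comments of the chosen user.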
That is all for "how to use Python to crawl Tencent Video's on-screen comments". Thank you for reading! I hope this walkthrough was useful. If you want to learn more, you are welcome to follow the industry information channel.
© 2024 shulou.com SLNews company. All rights reserved.