2025-03-28 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com)
This article walks through "JavaScript visual display data example analysis" in detail, step by step. The charts are built in Python with pyecharts and written out as HTML pages. Hopefully it answers your questions; follow along and learn something new.
Topics: Douyin data collection and analysis, data visualization, fan portraits, and a comment word cloud.
The data mainly contains each account's nickname, gender, location, category, likes, fans, videos, comments, shares, following count, graduation school, certification, profile, and other information.
Among them, "People's Daily" has the most fans, nearly 120 million. "CCTV News" has also passed 100 million; I remember it made the trending searches when it passed that mark.
Even the blogger with the fewest followers has about 1.5 million (150w). The more than 5,000 Big Vs have 23.65 billion fans in total, more than three times the population of the Earth!
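As a quick sanity check on these totals, the arithmetic can be sketched in a few lines. The world population value is an assumption for illustration only (roughly 7.8 billion); it is not given in the article.

```python
# Figures quoted in the text
total_fans = 23.65e9      # combined fans of all Big Vs
account_count = 5000      # "more than 5000" accounts, so this is a lower bound

# Assumption for illustration only: world population of about 7.8 billion
world_population = 7.8e9

# Because account_count is a lower bound, this average is an upper bound
avg_fans = total_fans / account_count
ratio = total_fans / world_population

print(f"average fans per account: at most about {avg_fans / 1e6:.2f} million")
print(f"total fans / world population: {ratio:.2f}")
```

Under that assumption the ratio comes out just above 3, which matches the "more than three times" claim.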
Data visualization
Import the required libraries, then read the data.

    from pyecharts.charts import Pie, Bar, TreeMap, Map, Geo
    from wordcloud import WordCloud, ImageColorGenerator
    from pyecharts import options as opts
    import matplotlib.pyplot as plt
    from PIL import Image
    import pandas as pd
    import numpy as np
    import jieba

    df = pd.read_csv('douyin.csv', header=0, encoding='utf-8-sig')
    print(df)

Gender distribution
Overall, the ratio of men to women differs little. Excluding the accounts whose gender is unknown, it is basically 1:1. The visualization code is as follows.

    def create_gender(df):
        df = df.copy()
        # Replace the gender codes with readable labels
        # (the comparison assumes the gender column holds strings)
        df.loc[df.gender == '0', 'gender'] = 'unknown'
        df.loc[df.gender == '1', 'gender'] = 'male'
        df.loc[df.gender == '2', 'gender'] = 'female'
        # Group by gender
        gender_message = df.groupby(['gender'])
        # Count the rows in each group
        gender_com = gender_message['gender'].agg(['count'])
        gender_com.reset_index(inplace=True)

        # Pie chart data
        attr = gender_com['gender']
        v1 = gender_com['count']

        # Initialization configuration
        pie = Pie(init_opts=opts.InitOpts(width="800px", height="400px"))
        # Add the data and set the inner and outer radius
        pie.add("", [list(z) for z in zip(attr, v1)], radius=["40%", "75%"])
        # Global configuration: title, legend, toolbox (download picture)
        pie.set_global_opts(
            title_opts=opts.TitleOpts(title="Gender distribution of Douyin Big V", pos_left="center", pos_top="top"),
            legend_opts=opts.LegendOpts(orient="vertical", pos_left="left"),
            toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
        )
        # Series configuration: label style
        pie.set_series_opts(label_opts=opts.LabelOpts(is_show=True, formatter="{b}: {d}%"))
        pie.render("gender distribution of Big V in Douyin.html")

Number of likes
In the likes TOP10, apart from "small group" and "poisonous tongue", all the others are news-media Big Vs.
This year, because of the epidemic, a lot of news broke first on Douyin, so these accounts have considerable influence and therefore more likes.
I remember that commenters teased "Sichuan Watch" as "watching around", meaning that it releases news very quickly.
Xiao F was curious why the data includes Big Vs with only 1 million likes, when even Xiao F's own Douyin account has 200,000+ (20w+).
It turned out to be a problem with which accounts the third-party monitoring service includes; this batch of data can simply be deleted next time.
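One way to delete such a batch is a simple threshold filter on the `likes` column. This is only a sketch: the article does not say which rows to drop, so the `MIN_LIKES` cut-off and the helper name `drop_low_like_accounts` are hypothetical.

```python
import pandas as pd

# Hypothetical cut-off: the article only says that accounts with
# unexpectedly few likes came from the third-party monitoring data.
MIN_LIKES = 2_000_000


def drop_low_like_accounts(df, min_likes=MIN_LIKES):
    """Return a copy of df without accounts below the likes threshold."""
    return df[df['likes'] >= min_likes].copy()


# Toy data for illustration, not real accounts
demo = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'likes': [1_000_000, 35_000_000, 120_000_000],
})
cleaned = drop_low_like_accounts(demo)
print(cleaned['name'].tolist())
```

Returning a copy keeps the original DataFrame intact, so the filtered and unfiltered charts can be compared side by side.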
More than 500 Big Vs have over 100 million likes, and the range from 10 million to 50 million likes contains the most Big Vs.
The visualization code is as follows.
    def create_likes(df):
        # Sort by likes, descending
        df = df.sort_values('likes', ascending=False)
        # Take the TOP10 rows; likes are expressed in units of 100 million
        attr = df['name'][0:10]
        v1 = [float('%.1f' % (float(i) / 100000000)) for i in df['likes'][0:10]]

        # Initialization configuration
        bar = Bar(init_opts=opts.InitOpts(width="800px", height="400px"))
        # x-axis data
        bar.add_xaxis(list(reversed(attr.tolist())))
        # y-axis data
        bar.add_yaxis("", list(reversed(v1)))
        # Global configuration: title, toolbox (download picture), split line
        bar.set_global_opts(
            title_opts=opts.TitleOpts(title="Douyin big V TOP10", pos_left="center", pos_top="18"),
            toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
            xaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)),
        )
        # Series configuration: label style
        bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="right", color="black"))
        # Flip the axes to get a horizontal bar chart
        bar.reversal_axis()
        bar.render("Douyin big V likes TOP10 (100 million).html")


    def create_cut_likes(df):
        # Segment the data; bin edges are raw like counts, labels are in units of ten thousand
        Bins = [0, 1000000, 5000000, 10000000, 25000000, 50000000, 100000000, 5000000000]
        Labels = ['0-100', '100-500', '500-1000', '1000-2500', '2500-5000', '5000-10000', '10000 and above']
        len_stage = pd.cut(df['likes'], bins=Bins, labels=Labels).value_counts().sort_index()
        # Get the data
        attr = len_stage.index.tolist()
        v1 = len_stage.values.tolist()

        # Generate the bar chart
        bar = Bar(init_opts=opts.InitOpts(width="800px", height="400px"))
        bar.add_xaxis(attr)
        bar.add_yaxis("", v1)
        bar.set_global_opts(
            title_opts=opts.TitleOpts(title="Douyin big V likes distribution (ten thousand)", pos_left="center", pos_top="18"),
            toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
            yaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)),
        )
        bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="top", color="black"))
        bar.render("Douyin big V likes distribution (10,000).html")

Number of fans
Both "People's Daily" and "CCTV News" have more than 100 million fans.
Compared with last year's Douyin figures, Reba has lost several hundred thousand fans, while Chen He has gained many.
Live streaming is very popular this year, so it is no surprise that Li Jiaqi is in the top ten. After all, he sells products in his live streams.
Let's look at the distribution of Big V fan counts.
56 accounts have more than 50 million fans, real heavyweights.
The 2 million to 5 million (200w~500w) range has the most bloggers. Many bloggers who were popular for a while have barely gained fans since.
They probably all end up in this range, such as the "three flowers" whose videos Xiao F used to come across, who were all quite popular at the time.
The visualization code here is similar to the above, so it is omitted.
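Since the code for this chart is omitted, here is a minimal sketch of how the fan-count buckets could be computed, following the same `pd.cut` pattern as `create_cut_likes`. The bin edges, the labels, the `fans` column name, and the helper name `cut_fans` are all assumptions; the article only mentions the 2 million to 5 million range and the "over 50 million" group.

```python
import pandas as pd

# Hypothetical bin edges in raw fan counts; labels are in units of ten thousand
FAN_BINS = [0, 2_000_000, 5_000_000, 10_000_000, 50_000_000, 5_000_000_000]
FAN_LABELS = ['0-200', '200-500', '500-1000', '1000-5000', '5000 and above']


def cut_fans(df, column='fans'):
    """Count accounts per fan-count bucket, in bucket order."""
    stage = pd.cut(df[column], bins=FAN_BINS, labels=FAN_LABELS)
    counts = stage.value_counts().sort_index()
    return counts.index.tolist(), counts.values.tolist()


# Toy data for illustration only
demo = pd.DataFrame({'fans': [1_500_000, 3_000_000, 4_000_000, 60_000_000]})
labels, counts = cut_fans(demo)
print(list(zip(labels, counts)))
```

The two returned lists can be passed to `add_xaxis` and `add_yaxis` in the same way as in `create_cut_likes`.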
Number of comments TOP10
The comment section of Douyin videos is also an interesting place.
For example, viewers following a series urge the creator to post more: "Hurry up and update. It has been more than ten minutes; even the production team's donkeys would not dare to rest that long."
The five cats shaking their heads wildly also took over the comment sections for a while.
In short, it is all rather magical.
Generally speaking, media-category videos get many comments.
Number of shares TOP10
Sharing is how Douyin videos spread to the outside world, so that more people can see them.
Statistically, people still prefer to share news and food videos.
Perhaps that is because, stuck at home for a month during the Lunar New Year epidemic, there was little to do besides slouching on the sofa ("Ge You lying down") and watching the news.
And everyone dreams of becoming a chef.
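The comment and share TOP10 rankings above boil down to the same sort-and-slice step used for likes. A minimal sketch follows; the helper name `top_n` and the `comments` and `shares` column names are assumptions, since the article does not show the code for these two charts.

```python
import pandas as pd


def top_n(df, column, n=10):
    """Return (names, values) for the n largest entries of `column`."""
    top = df.sort_values(column, ascending=False).head(n)
    return top['name'].tolist(), top[column].tolist()


# Toy data for illustration only; column names are hypothetical
demo = pd.DataFrame({
    'name': ['news', 'food', 'pets'],
    'comments': [900, 300, 500],
    'shares': [120, 480, 60],
})
print(top_n(demo, 'comments', n=2))
print(top_n(demo, 'shares', n=2))
```

Keeping the ranking in one helper means each TOP10 chart only differs in the column name and the chart title.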
Summary distribution of likes / fans by category
I remember a big name once saying that Douyin as a product is meant to kill your time (Kill Time), not save it (Save Time), so slightly more sophisticated videos basically cannot survive there.
As the treemap above shows, everyone likes "beauty" videos. After all, who doesn't like pretty girls?
There are plenty of viral examples: the girl gazing affectionately at the bronze statue, the girl who sends stars into the sky for the college entrance examination, "knives", and so on.
In addition, "funny", "game", and "plot" videos also attract a lot of attention, proper Kill Time.
The visualization code is as follows.

    def create_type_likes(df):
        # Group by category and sum the likes
        likes_type_message = df.groupby(['category'])
        likes_type_com = likes_type_message['likes'].agg(['sum'])
        likes_type_com.reset_index(inplace=True)
        # Build the list of {name, value} items for the treemap
        dom = []
        for name, num in zip(likes_type_com['category'], likes_type_com['sum']):
            data = {}
            data['name'] = name
            data['value'] = num
            dom.append(data)
        print(dom)

        # Initialization configuration
        treemap = TreeMap(init_opts=opts.InitOpts(width="800px", height="400px"))
        # Add the data
        treemap.add('', dom)
        # Global configuration: title, toolbox (download picture)
        treemap.set_global_opts(
            title_opts=opts.TitleOpts(title="Summary Map of Big V likes of various types of Douyin", pos_left="center", pos_top="5"),
            toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
            legend_opts=opts.LegendOpts(is_show=False),
        )

        treemap.render("summary picture of big V likes of various types of Douyin.html")

Average number of likes / fans TOP10
"Li Xian", the top-traffic star of 2019, ranks first, which is no surprise.
Xiao F does not follow any of the other bloggers on the list.
A search showed that most of these accounts have only one or two videos.
The comment sections revealed that the original accounts had been sold, possibly because the Big V had split from their company. After all, many influencer companies simply move on to the next person once someone stops being popular.
The other possibility is that individuals sell their accounts, take the money, and leave.
The visualization code is as follows.
    def create_avg_likes(df):
        # Keep only accounts with at least one video; copy to avoid modifying a slice
        df = df[df['videos'] > 0].copy()
        # Average likes per video, in units of ten thousand
        df.eval('result = likes / (videos * 10000)', inplace=True)
        df['result'] = df['result'].round(decimals=1)
        df = df.sort_values('result', ascending=False)

        # Take the TOP10
        attr = df['name'][0:10]
        v1 = ['%.1f' % (float(i)) for i in df['result'][0:10]]

        # Initialization configuration
        bar = Bar(init_opts=opts.InitOpts(width="800px", height="400px"))
        # Add the data
        bar.add_xaxis(list(reversed(attr.tolist())))
        bar.add_yaxis("", list(reversed(v1)))
        # Global configuration: title, toolbox (download picture), split line
        bar.set_global_opts(
            title_opts=opts.TitleOpts(title="Douyin V average likes TOP10 (ten thousand)", pos_left="center", pos_top="18"),
            toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
            xaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)),
        )
        # Series configuration
        bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="right", color="black"))
        # Flip the x and y axes
        bar.reversal_axis()
        bar.render("Douyin Big V average number of likes TOP10 (ten thousand).html")

Douyin Big V distribution
Having looked at the provinces, let's look at the city TOP10.
Beijing, the gathering place of Big Vs, is far ahead.
Hangzhou, a city known for producing online celebrities, ranks second.
The visualization code is as follows.
    def create_city(df):
        # Keep domestic accounts only
        df1 = df[df["country"] == "China"]
        df1 = df1.copy()
        # Strip the "city" suffix from city names (the literal appears here in
        # translated form; use the suffix that actually occurs in your data)
        df1["city"] = df1["city"].str.replace("city", "")

        # Count accounts per city, largest first
        df_num = df1.groupby("city")["city"].agg(count="count").reset_index().sort_values(by="count", ascending=False)
        df_city = df_num[:10]["city"].values.tolist()
        df_count = df_num[:10]["count"].values.tolist()

        bar = Bar(init_opts=opts.InitOpts(width="800px", height="400px"))
        bar.add_xaxis(df_city)
        bar.add_yaxis("", df_count)
        bar.set_global_opts(
            title_opts=opts.TitleOpts(title="Douyin Big V City Distribution TOP10", pos_left="center", pos_top="18"),
            toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
            yaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)),
        )
        bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="top", color="black"))
        bar.render("Douyin Big V Urban Distribution TOP10.html")
Having looked at the domestic distribution, let's turn to overseas.
There are many "foreigners" on Douyin who speak Chinese very well.
The United States ranks first; many Chinese people living there share things about their life in the United States.
Some people at home are also curious about this and want to see whether "the moon abroad is rounder".
Just joking. In fact, it lets us learn about life abroad.
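The overseas ranking can be produced with the same group-and-count approach as `create_city`, this time excluding the home country. A minimal sketch, assuming the `country` column used in the city code; the helper name `overseas_top` and the sample values other than "China" are hypothetical.

```python
import pandas as pd


def overseas_top(df, home='China', n=10):
    """Count accounts per country, excluding the home country."""
    abroad = df[df['country'] != home]
    counts = abroad.groupby('country').size().sort_values(ascending=False)
    return counts.head(n)


# Toy data for illustration only
demo = pd.DataFrame({
    'country': ['China', 'USA', 'USA', 'Japan', 'China', 'USA', 'Japan', 'Korea'],
})
result = overseas_top(demo)
print(result.to_dict())
```

The index and values of the returned Series can be converted with `tolist()` and passed to a bar chart as in `create_city`.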
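Douyin Big V graduation school TOP10
Beijing Film Academy, Communication University of China, Communication University of Zhejiang, the Central Academy of Drama, Shanghai Theatre Academy, and the Central Academy of Fine Arts: proper heavyweights of the entertainment world.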
Check the certification of the Big Vs with the following code.

    # Drop empty and unknown certifications
    df1 = df[(df["custom_verify"] != "") & (df["custom_verify"] != "unknown")]
    df1 = df1.copy()
    # Count accounts per certification, largest first
    df_num = df1.groupby("custom_verify")["custom_verify"].agg(count="count").reset_index().sort_values(by="count", ascending=False)
    print(df_num[:20])
The results are as follows.
All of these certifications call for a talent for performing and expressing oneself.
Word cloud of Douyin Big V profiles
It can be seen that most Big Vs leave a business-cooperation message in their profile, which is good for content creators and makes it a win-win.
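An observation like this comes from counting how often each word appears in the profile text. The counting step can be sketched with the standard library alone. This is an illustration only: whitespace splitting stands in for `jieba.cut`, which Chinese text actually needs, and the helper name `word_frequencies` and the sample profiles are hypothetical.

```python
from collections import Counter


def word_frequencies(signatures, stopwords, top=3):
    """Count words across profile texts, skipping stop words."""
    counter = Counter()
    for line in signatures:
        # Whitespace split is a stand-in for jieba.cut on Chinese text
        for word in str(line).split():
            if word not in stopwords:
                counter[word] += 1
    return counter.most_common(top)


# Toy profiles for illustration only
demo = ['business cooperation via email', 'cooperation welcome', 'daily food videos']
print(word_frequencies(demo, stopwords={'via'}, top=1))
```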
According to statistics, more than 22 million creators on Douyin have earned a combined income of more than 41.7 billion yuan.
From creating content to creating income: Douyin lives up to this slogan. The visualization code is as follows.
    def create_wordcloud(df, picture):
        words = pd.read_csv('chineseStopWords.txt', encoding='gbk', sep='\t', names=['stopword'])
        # Word segmentation
        text = ''
        df1 = df[df["signature"] != ""]
        df1 = df1.copy()
        for line in df1['signature']:
            # The trailing space keeps the last word of one profile from
            # fusing with the first word of the next
            text += ' '.join(jieba.cut(str(line).replace(" ", ""), cut_all=False)) + ' '
        # Stop words
        stopwords = set()
        stopwords.update(words['stopword'])
        backgroud_Image = plt.imread('douyin.png')
        # Use the colours of the Douyin image
        alice_coloring = np.array(Image.open(r"douyin.png"))
        image_colors = ImageColorGenerator(alice_coloring)
        wc = WordCloud(
            background_color='white',
            mask=backgroud_Image,
            # Font file name as given (translated) in the source; a font with
            # Chinese glyphs is needed to render Chinese words
            font_path='founder Orchid Pavilion published black.TTF',
            max_words=2000,
            max_font_size=70,
            min_font_size=1,
            prefer_horizontal=1,
            color_func=image_colors,
            random_state=50,
            stopwords=stopwords,
            margin=5,
        )
        wc.generate_from_text(text)
        # See which words have the highest frequency
        process_word = WordCloud.process_text(wc, text)
        sort = sorted(process_word.items(), key=lambda e: e[1], reverse=True)
        print(sort[:50])
        plt.imshow(wc)
        plt.axis('off')
        wc.to_file(picture)
        print('word cloud generated successfully!')

That concludes "JavaScript Visualization data instance Analysis". To really master the points covered here, you still need to practice and use them yourself.