This article shows how to use Python to scrape the on-screen comments (danmaku) of a video from a certain video site and draw a word cloud from them. The content is easy to understand and clearly organized; I hope it resolves your doubts. Follow along below to study and learn how it is done.
Preface
[Lesson topic]:
Use Python to scrape the on-screen comments of a video from a certain video site or from Tencent Video, and draw a word cloud from them.
[Knowledge points]:
1. The basic crawler workflow
2. Regular expressions
3. requests >>> pip install requests
4. jieba >>> pip install jieba
5. imageio >>> pip install imageio
6. wordcloud >>> pip install wordcloud
[Development environment]:
Python 3.8
PyCharm
Press Win + R, type cmd and press Enter, then run the installation command: pip install <module name>. If installation fails, it is often due to a network connection timeout; switch to a domestic mirror source.
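For example, to install requests from the Tsinghua mirror (one commonly used domestic source; the same -i option works for any of the modules listed above):

pip install requests -i https://pypi.tuna.tsinghua.edu.cn/simple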
Basic workflow for scraping on-screen comments
1. Data source analysis
1. Determine what data we want:
scrape the on-screen comment data from the site and save it as txt text.
2. Capture and analyze the traffic with the browser's developer tools.
The address of the video's on-screen comment data can be found directly through its interface, as illustrated in the sketch below.
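For reference, here is a minimal sketch of how the data returned by that interface is typically structured and how the regular expression used later extracts the comment text. The sample XML string below is an assumption made for illustration, not data taken from the real interface:

import re

# Hypothetical sample of the XML the on-screen comment interface returns (assumed format)
sample = '<d p="23.826,1,25,16777215">first comment</d><d p="24.105,1,25,16777215">second comment</d>'

# Each comment sits between <d ...> and </d>; re.findall() pulls out just the text
print(re.findall('<d p=".*?">(.*?)</d>', sample))
# ['first comment', 'second comment']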
2. Crawler code implementation steps
1. Send a request: send a request to the on-screen comment data address.
Things to pay attention to:
- determining the request method
- the request header parameters
2. Get the data: receive the data returned by the server.
3. Parse the data: extract the content we want, the on-screen comments.
4. Save the data: write the acquired data to a txt text file.
Emulate a browser when sending the request to the server.
Import modules

import requests  # data request module, third-party module: pip install requests
import re        # regular expression module, built in, no installation needed

Code

# 1. Send a request
# url = '(the on-screen comment data address)'
# headers: the request header, used to disguise the Python code as a browser when sending the request
# user-agent: the browser's basic identity
# headers is dictionary data
# headers = {
#     'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
# }
# Send a get request to the url address through the requests module, carrying the headers request header,
# and receive the returned data with the response variable
# response = requests.get(url=url, headers=headers)
# response.encoding = response.apparent_encoding
# A 200 status code on the response object means the request was successful
# To get the same content as the page source code, take the text data of the response body
# If the server does not return complete json (dictionary) data, calling response.json() directly will raise an error

# 2. Get the data: response.text returns the html string data
# print(response.text)

# 3. Parse the data. Parsing methods: re (extracts data directly from the string), css / xpath (extract data mainly by tag attributes / nodes)
# () marks the exact data we want; .*? is a universal non-greedy match; the metacharacter . matches any character except the newline \n
# data_list = re.findall('<d p=".*?">(.*?)</d>', response.text)
# for index in data_list:
#     # mode: the save mode, encoding: the character encoding
#     # pprint.pprint() pretty-prints json dictionary data
#     with open('danmaku.txt', mode='a', encoding='utf-8') as f:
#         f.write(index)
#         f.write('\n')
#     print(index)

# Example of a real request: send a GET request with referer and user-agent headers, then print the returned text
url = 'https://mapi.vip.com/vips-mobile/rest/shopping/pc/search/product/rank?callback=getMerchandiseIds&app_name=shop_pc&app_version=4.0&warehouse=VIP_NH&fdc_area_id=104104101&client=pc&mobile_platform=1&province_id=104104&api_key=70f71280d5d547b2a7bb370a529aeea1&user_id=&mars_cid=1634797375792_17a23bdc351b36f2915c2f7ec16dc88e&wap_consumer=a&standby_id=nature&keyword=%E5%8F%A3%E7%BA%A2&lv3CatIds=&lv2CatIds=&lv1CatIds=&brandStoreSns=&props=&priceMin=&priceMax=&vipService=&sort=0&pageOffset=0&channelId=1&gPlatform=PC&batchSize=120&_=1639640088314'
headers = {
    'referer': 'https://category.vip.com/',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
print(response.text)
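Because the tutorial leaves the actual on-screen comment address as a placeholder, here is a minimal uncommented sketch of the four steps above. DANMAKU_URL is a placeholder you must replace with the interface address found in developer tools, and the sketch assumes that interface returns XML in the <d> format shown earlier:

import re
import requests

# Placeholder: replace with the on-screen comment interface address found in developer tools
DANMAKU_URL = 'https://example.com/danmaku.xml'

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
}

# 1. Send the request and decode the response with its detected encoding
response = requests.get(url=DANMAKU_URL, headers=headers)
response.encoding = response.apparent_encoding

# 2. Get the data (the xml string), then 3. parse out each comment
data_list = re.findall('<d p=".*?">(.*?)</d>', response.text)

# 4. Save the comments to a txt file, one per line
with open('danmaku.txt', mode='a', encoding='utf-8') as f:
    for comment in data_list:
        f.write(comment)
        f.write('\n')
        print(comment)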
Making the word cloud
Import modules

import jieba      # jieba word segmentation: pip install jieba
import wordcloud  # word cloud image: pip install wordcloud
import imageio    # read a local image to shape the word cloud: pip install imageio

Read the image that shapes the word cloud

img = imageio.imread('Apple.png')

Read the on-screen comment data

f = open('danmaku.txt', encoding='utf-8')
text = f.read()
# print(text)
Split the sentences into individual words

text_list = jieba.lcut(text)
print(text_list)
# convert the list to a string
text_str = ' '.join(text_list)
print(text_str)
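The list is joined back into a single space-separated string because wordcloud builds its word frequencies from whitespace-separated text; without jieba's segmentation and the space join, whole Chinese comments would be counted as single tokens instead of meaningful words.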
Word cloud configuration

wc = wordcloud.WordCloud(
    width=500,                 # width of the image
    height=500,                # height of the image
    background_color='white',  # background color
    mask=img,                  # mask image that defines the shape
    stopwords={'every', 'one', 're', 'of', 'dream', 'help'},  # words to exclude
    font_path='msyh.ttc'       # font file (Microsoft YaHei, needed to render Chinese)
)
wc.generate(text_str)
wc.to_file('word cloud 1.png')
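If you want to preview the result without opening the saved image file, a minimal sketch using matplotlib works (matplotlib is not part of the original tutorial, so treat pip install matplotlib as an extra assumption):

import matplotlib.pyplot as plt

# wc is the WordCloud object generated above
plt.imshow(wc, interpolation='bilinear')  # display the generated word cloud
plt.axis('off')                           # hide the axes
plt.show()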
That is all the content of the article "How to scrape a video's on-screen comments with Python and draw a word cloud". Thank you for reading! I hope what was shared here has helped you; if you want to learn more, you are welcome to follow the industry information channel.