2025-04-02 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Today we will look at how to use Python to crawl the on-screen comments (danmaku) of a video. Many people may not be familiar with this, so the editor has summarized the following content. I hope you get something out of this article.
Preface
The hit drama "Son-in-Law", which streamed exclusively on iQiyi, was very popular a while ago. The author had been following it and wanted to use the tools at hand to crawl its on-screen comments and analyze both the show and viewers' reactions.
To help beginners learn how to crawl iQiyi's on-screen comments with Python, this article describes the crawling process in detail; a follow-up article will analyze the data.
Analyzing the packets
1. Find the packet
Press F12 in the browser to open the developer tools, then look for a URL of this kind:
https://cmts.iqiyi.com/bullet/54/00/7973227714515400_60_2_5f3b2e24.br
2. Analyze the on-screen comment link
In that URL, the /54/00/7973227714515400 part is the useful one.
iQiyi's on-screen comments are fetched from an address of the following form:
https://cmts.iqiyi.com/bullet/<parameter 1>_300_<parameter 2>.z
Parameter 1 is: 54/00/7973227714515400
Parameter 2 is: the segment number 1, 2, 3, ...
iQiyi loads a new batch of on-screen comments every 5 minutes, and each episode is 46 minutes long. 46 divided by 5 is 9.2, which rounds up to 10 segments.
So the on-screen comment links are as follows:
https://cmts.iqiyi.com/bullet/54/00/7973227714515400_300_1.z
https://cmts.iqiyi.com/bullet/54/00/7973227714515400_300_2.z
https://cmts.iqiyi.com/bullet/54/00/7973227714515400_300_3.z
......
https://cmts.iqiyi.com/bullet/54/00/7973227714515400_300_10.z
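The segment arithmetic and the URL pattern above can be sketched in a few lines. This is only an illustration: the function name `bullet_urls` and its parameters are made up for this example, and the URL pattern is the one observed in this article, which may differ for other videos or change over time.

```python
import math

def bullet_urls(param1, episode_minutes, segment_minutes=5):
    """Build one on-screen comment URL per segment of an episode.

    bullet_urls is a hypothetical helper; the URL pattern is taken
    from the links listed in the article.
    """
    # 46 minutes / 5 minutes per segment = 9.2, rounded up to 10 segments
    segments = math.ceil(episode_minutes / segment_minutes)
    base = "https://cmts.iqiyi.com/bullet/{}_300_{}.z"
    return [base.format(param1, n) for n in range(1, segments + 1)]

urls = bullet_urls("54/00/7973227714515400", 46)
print(len(urls))
print(urls[0])
print(urls[-1])
```

Rounding up rather than down matters here: the last partial segment (minutes 45 to 46) still has its own file.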
3. Decode the binary packet
The file downloaded from an on-screen comment link has the suffix .z. It is zlib-compressed and must be decompressed before it can be read.
import zlib

def zipdecode(bulletold):
    """Decode zlib-compressed binary content into text."""
    # wbits = 15 + 32 lets zlib auto-detect a zlib or gzip header
    decode = zlib.decompress(bytearray(bulletold), 15 + 32).decode('utf-8')
    return decode
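To check the decoding step without touching the network, the function can be exercised on data compressed locally. This is a minimal sketch; the sample string is invented for the test and is not real iQiyi data.

```python
import gzip
import zlib

def zipdecode(bulletold):
    """Decode zlib- or gzip-compressed bytes into UTF-8 text."""
    # 15 + 32: maximum window size plus automatic header detection
    return zlib.decompress(bytearray(bulletold), 15 + 32).decode("utf-8")

# Invented sample text standing in for a downloaded .z payload
sample = "<content>hello</content>"

from_zlib = zipdecode(zlib.compress(sample.encode("utf-8")))
from_gzip = zipdecode(gzip.compress(sample.encode("utf-8")))
print(from_zlib == sample, from_gzip == sample)
```

Because of the `15 + 32` argument, the same call handles both zlib and gzip containers, so the function does not need to know which one the server used.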
After decoding, save the data in XML format.
# Write the decoded content to an XML file (a plain-text file, like .txt)
# so that the data can be extracted later
with open('./lyc/zx' + str(x) + '.xml', 'a', encoding='utf-8') as f:
    f.write(xml)
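The snippets so far leave `x` and `xml` undefined, so here is one way the download, decode, and save steps might fit together. It is a sketch under assumptions: `fetch` and `save_segments` are hypothetical names, the standard-library `urlopen` stands in for whatever HTTP client the author used, and the server may require headers or reject the request, which is not handled here.

```python
import os
import zlib
from urllib.request import urlopen

# URL pattern taken from the article; {} is the segment number
BASE = "https://cmts.iqiyi.com/bullet/54/00/7973227714515400_300_{}.z"

def fetch(url):
    """Download raw bytes from a URL (needs network access)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()

def save_segments(out_dir, segments=10, fetcher=fetch):
    """Download, decompress, and save each segment as zx<N>.xml."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for x in range(1, segments + 1):
        raw = fetcher(BASE.format(x))
        xml = zlib.decompress(raw, 15 + 32).decode("utf-8")
        path = os.path.join(out_dir, "zx" + str(x) + ".xml")
        # "w" overwrites on re-runs; the article's snippet appends with "a"
        with open(path, "w", encoding="utf-8") as f:
            f.write(xml)
        paths.append(path)
    return paths

# Usage (performs real requests, so it is left commented out):
# save_segments("./lyc")
```

Passing the downloader in as `fetcher` is a design choice for this sketch: it lets the pipeline be tested offline with fake data.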
Parsing the XML
1. Extract data
Looking at the XML file, we need to extract three fields: 1. the user ID (uid), 2. the comment text (content), and 3. the number of likes on the comment (likeCount).
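To make the target structure concrete, the sketch below parses a small hand-written sample. The sample is an assumption: it uses only the tag names mentioned above (`entry`, `uid`, `content`, `likeCount`), the root tag name and values are invented, and the real file contains more fields and nesting. `extract_rows` is a hypothetical helper.

```python
from xml.dom.minidom import parseString

# Hand-written sample; root tag and values are invented for illustration
SAMPLE = """<danmu>
  <entry><uid>1001</uid><content>Great episode</content><likeCount>12</likeCount></entry>
  <entry><uid>1002</uid><content>Ha ha</content><likeCount>0</likeCount></entry>
</danmu>"""

def _text(entry, tag):
    """Return the text of the first <tag> under entry, or '' if empty."""
    nodes = entry.getElementsByTagName(tag)
    # An empty element has no child text node, so guard against it
    if nodes and nodes[0].firstChild is not None:
        return nodes[0].firstChild.data
    return ""

def extract_rows(xml_text):
    """Collect (uid, content, likeCount) tuples from every <entry>."""
    root = parseString(xml_text).documentElement
    rows = []
    for entry in root.getElementsByTagName("entry"):
        likes = int(_text(entry, "likeCount") or 0)
        rows.append((_text(entry, "uid"), _text(entry, "content"), likes))
    return rows

rows = extract_rows(SAMPLE)
print(rows)
```

Returning a list of tuples, rather than printing inside the loop, keeps parsing separate from whatever storage step comes next.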
# Read the on-screen comment data from the XML file
from xml.dom.minidom import parse
import xml.dom.minidom

def xml_parse(file_name):
    DOMTree = xml.dom.minidom.parse(file_name)
    collection = DOMTree.documentElement
    # Get all entry elements in the collection
    entrys = collection.getElementsByTagName("entry")
    print(entrys)
    result = []
    for entry in entrys:
        uid = entry.getElementsByTagName('uid')[0]
        content = entry.getElementsByTagName('content')[0]
        likeCount = entry.getElementsByTagName('likeCount')[0]
        print(uid.childNodes[0].data)
        print(content.childNodes[0].data)
        print(likeCount.childNodes[0].data)
Saving the data
1. Preparation before saving
import xlwt

# Create a workbook and set its encoding
workbook = xlwt.Workbook(encoding='utf-8')
# Create a worksheet
worksheet = workbook.add_sheet('sheet1')
# Write to Excel; the arguments are row, column, value
worksheet.write(0, 0, label='uid')
worksheet.write(0, 1, label='content')
worksheet.write(0, 2, label='likeCount')
Import the xlwt library (which writes Excel .xls files) and define the header row (uid, content, likeCount).
2. Write the data
# count is the next row to write; it is assumed to be initialized to 1
# before this loop, because row 0 holds the header
for entry in entrys:
    uid = entry.getElementsByTagName('uid')[0]
    content = entry.getElementsByTagName('content')[0]
    likeCount = entry.getElementsByTagName('likeCount')[0]
    print(uid.childNodes[0].data)
    print(content.childNodes[0].data)
    print(likeCount.childNodes[0].data)
    # Write to Excel; the arguments are row, column, value
    worksheet.write(count, 0, label=str(uid.childNodes[0].data))
    worksheet.write(count, 1, label=str(content.childNodes[0].data))
    worksheet.write(count, 2, label=str(likeCount.childNodes[0].data))
    count = count + 1
Finally, the data is saved to the file 'on-screen comment dataset-Li Yunchen .xls'.
# Parse the 10 saved XML files, one per 5-minute segment
for x in range(1, 11):
    l = xml_parse("./lyc/zx" + str(x) + ".xml")
# Save the workbook
workbook.save('on-screen comment dataset-Li Yunchen .xls')
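If xlwt is not installed, the same three columns can be written with the standard-library `csv` module instead. This is a swapped-in alternative, not the author's method; `write_rows_csv` is a hypothetical helper and the rows are invented sample data.

```python
import csv
import io

def write_rows_csv(rows, fileobj):
    """Write a header plus (uid, content, likeCount) rows as CSV."""
    writer = csv.writer(fileobj)
    writer.writerow(["uid", "content", "likeCount"])
    writer.writerows(rows)

# Invented sample rows; the comma in "Ha, ha" forces CSV quoting
rows = [("1001", "Great episode", "12"), ("1002", "Ha, ha", "0")]

# An in-memory buffer stands in for a real file in this sketch
buffer = io.StringIO()
write_rows_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open(path, "w", newline="", encoding="utf-8-sig")`; `newline=""` is what the `csv` module expects, and the `utf-8-sig` encoding helps Excel recognize non-ASCII comment text.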
After reading the above, you should have a better understanding of how to crawl video on-screen comments with Python.