How to use Python to crawl video on-screen comments

Updated 2025-04-02. From: SLTechnology News&Howtos (shulou)


Shulou (Shulou.com), 06/02

Today I will walk you through how to use Python to crawl the on-screen comments (danmaku) of a video. Many people are not familiar with the process, so the editor has summarized the steps below. I hope you find the article useful.

Preface

The hit drama "Son-in-Law", broadcast exclusively on iQIYI, has been very popular, and the author has been following it. With the right tools at hand, the author wanted to crawl the on-screen comments to analyze the show and what viewers are saying about it.

To help beginners learn how to crawl iQIYI's on-screen comments with Python, this article describes the crawling process in detail. A follow-up article will analyze the data.

Analyzing the packets

1. Find the packet

Press F12 in the browser to open the developer tools.

Look for a request URL of this form:

https://cmts.iqiyi.com/bullet/54/00/7973227714515400_60_2_5f3b2e24.br

2. Analyze the on-screen comment links

In that URL, the /54/00/7973227714515400 part is what we need.

iQIYI's on-screen comments are fetched from an address of this form:

https://cmts.iqiyi.com/bullet/{parameter 1}_300_{parameter 2}.z

Parameter 1 is: 54/00/7973227714515400

Parameter 2 is the segment number: 1, 2, 3, and so on.

iQIYI loads a new batch of on-screen comments every 5 minutes, and each episode runs about 46 minutes. 46 divided by 5 is 9.2, which rounds up to 10 segments.

So the on-screen comment links are as follows:

https://cmts.iqiyi.com/bullet/54/00/7973227714515400_300_1.z
https://cmts.iqiyi.com/bullet/54/00/7973227714515400_300_2.z
https://cmts.iqiyi.com/bullet/54/00/7973227714515400_300_3.z
......
https://cmts.iqiyi.com/bullet/54/00/7973227714515400_300_10.z
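
The segment count and the list of links can be reproduced with a short sketch. The names BASE_URL, segment_count and bullet_urls are hypothetical helpers introduced here for illustration; the 5-minute segment length, the 46-minute runtime and the URL pattern are taken from the text above.

```python
import math

# Base address taken from the article; everything else is built from it.
BASE_URL = "https://cmts.iqiyi.com/bullet/"


def segment_count(episode_minutes, segment_minutes=5):
    """Number of comment segments, rounding a partial segment up."""
    return math.ceil(episode_minutes / segment_minutes)


def bullet_urls(video_path, episode_minutes):
    """Build one link per segment; segment numbers start at 1."""
    total = segment_count(episode_minutes)
    return [f"{BASE_URL}{video_path}_300_{i}.z" for i in range(1, total + 1)]


urls = bullet_urls("54/00/7973227714515400", 46)
print(len(urls))
print(urls[0])
print(urls[-1])
```

Rounding up with math.ceil matters here: plain integer division would give 9 and drop the comments from the last minute of the episode.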

3. Decode the binary packet

The package downloaded from each on-screen comment link is a compressed binary file with the suffix .z, so it has to be decompressed before it can be read.

    import zlib

    def zipdecode(bulletold):
        """Decode zlib-compressed binary content into text."""
        decode = zlib.decompress(bytearray(bulletold), 15 + 32).decode('utf-8')
        return decode
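
The wbits value 15 + 32 tells zlib to detect the header automatically, so the same call accepts both zlib-wrapped and gzip-wrapped data. The sketch below checks this offline with locally compressed bytes rather than a real download; decode_bullet and the sample string are made up for illustration.

```python
import gzip
import zlib


def decode_bullet(raw):
    """Decompress a payload with automatic zlib/gzip header detection."""
    return zlib.decompress(raw, 15 + 32).decode("utf-8")


# Invented sample text, standing in for a downloaded comment package.
sample = "<danmu><entry>hello</entry></danmu>"

from_zlib = decode_bullet(zlib.compress(sample.encode("utf-8")))
from_gzip = decode_bullet(gzip.compress(sample.encode("utf-8")))
print(from_zlib == sample, from_gzip == sample)
```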

After decoding, save the data in XML format.

    # Write the decoded content to an XML file (much like a txt file)
    # so the data can be read back later
    with open('./lyc/zx' + str(x) + '.xml', 'a+', encoding='utf-8') as f:
        f.write(xml)
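
One thing to watch when saving: a file opened in append mode grows every time the script is rerun, which duplicates comments. A minimal sketch that overwrites instead is shown below. save_segment is a hypothetical helper, the zx<number>.xml naming follows the article, and a temporary directory stands in for ./lyc so the example leaves nothing behind.

```python
import tempfile
from pathlib import Path


def save_segment(xml_text, index, out_dir="./lyc"):
    """Write one decoded segment to <out_dir>/zx<index>.xml, replacing old content."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    target = folder / f"zx{index}.xml"
    target.write_text(xml_text, encoding="utf-8")
    return target


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    save_segment("<danmu>old</danmu>", 1, tmp)
    path = save_segment("<danmu/>", 1, tmp)  # second write replaces the first
    saved_name = path.name
    saved_text = path.read_text(encoding="utf-8")

print(saved_name)
```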

Parsing the XML

1. Extract the data

Looking at the XML file, we need to extract three fields: 1. the user id (uid), 2. the comment text (content), and 3. the number of likes on the comment (likeCount).
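
As a minimal sketch of that extraction, the example below parses an in-memory string and returns the three fields as tuples instead of printing them. The sample XML is invented for illustration and only assumes that each comment sits in an entry element with uid, content and likeCount children; text_of and extract_rows are hypothetical helper names.

```python
from xml.dom.minidom import parseString

# Invented sample, not real iQIYI data.
SAMPLE_XML = """<danmu>
  <entry><uid>1001</uid><content>Great episode!</content><likeCount>12</likeCount></entry>
  <entry><uid>1002</uid><content>Ha ha</content><likeCount>0</likeCount></entry>
</danmu>"""


def text_of(entry, tag):
    """Return the text of the first <tag> child, or '' if it is missing or empty."""
    nodes = entry.getElementsByTagName(tag)
    if len(nodes) == 0 or nodes[0].firstChild is None:
        return ""
    return nodes[0].firstChild.data


def extract_rows(xml_text):
    """Collect (uid, content, likeCount) for every entry element."""
    dom = parseString(xml_text)
    rows = []
    for entry in dom.getElementsByTagName("entry"):
        rows.append((text_of(entry, "uid"),
                     text_of(entry, "content"),
                     text_of(entry, "likeCount")))
    return rows


rows = extract_rows(SAMPLE_XML)
print(rows)
```

The guard in text_of avoids an IndexError when a comment has an empty field, which indexing childNodes[0] directly would raise.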

    # Read on-screen comment data from the XML file
    from xml.dom.minidom import parse
    import xml.dom.minidom

    def xml_parse(file_name):
        DOMTree = xml.dom.minidom.parse(file_name)
        collection = DOMTree.documentElement
        # Get all entry elements in the collection
        entrys = collection.getElementsByTagName("entry")
        print(entrys)
        result = []
        for entry in entrys:
            uid = entry.getElementsByTagName('uid')[0]
            content = entry.getElementsByTagName('content')[0]
            likeCount = entry.getElementsByTagName('likeCount')[0]
            print(uid.childNodes[0].data)
            print(content.childNodes[0].data)
            print(likeCount.childNodes[0].data)

Saving the data

1. Preparation before saving

    import xlwt

    # Create a workbook and set its encoding
    workbook = xlwt.Workbook(encoding='utf-8')
    # Create a worksheet
    worksheet = workbook.add_sheet('sheet1')
    # Write to Excel; the arguments are row, column, value
    worksheet.write(0, 0, label='uid')
    worksheet.write(0, 1, label='content')
    worksheet.write(0, 2, label='likeCount')

This imports the xlwt library (which writes Excel .xls files) and defines the header row (uid, content, likeCount).

2. Write the data

    # count is the next row to write. Initialise it to 1 once, at module
    # level, before the first file is processed (row 0 holds the header),
    # and declare it with "global count" if this loop lives inside xml_parse.
    for entry in entrys:
        uid = entry.getElementsByTagName('uid')[0]
        content = entry.getElementsByTagName('content')[0]
        likeCount = entry.getElementsByTagName('likeCount')[0]
        print(uid.childNodes[0].data)
        print(content.childNodes[0].data)
        print(likeCount.childNodes[0].data)
        # Write to Excel; the arguments are row, column, value
        worksheet.write(count, 0, label=str(uid.childNodes[0].data))
        worksheet.write(count, 1, label=str(content.childNodes[0].data))
        worksheet.write(count, 2, label=str(likeCount.childNodes[0].data))
        count = count + 1

Finally, everything is saved to the file "on-screen comment dataset-Li Yunchen.xls".

    for x in range(1, 11):
        l = xml_parse("./lyc/zx" + str(x) + ".xml")

    # Save
    workbook.save('on-screen comment dataset-Li Yunchen.xls')
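
If xlwt is not installed, or the comment count might exceed the 65,536-row limit of the .xls format, the same three columns can be written with the standard library's csv module. This is a substitute technique, not the article's method; write_rows_csv and the sample row are hypothetical, and an in-memory buffer stands in for a real file.

```python
import csv
import io


def write_rows_csv(rows, stream):
    """Write the header row followed by (uid, content, likeCount) rows."""
    writer = csv.writer(stream)
    writer.writerow(["uid", "content", "likeCount"])
    writer.writerows(rows)


# Invented sample row, written to memory for demonstration.
buffer = io.StringIO()
write_rows_csv([("1001", "Great episode!", "12")], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, open the target with open(path, "w", newline="", encoding="utf-8") and pass that file object as the stream.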

That covers how to use Python to crawl on-screen comments from a video. If you want to learn more about this or related topics, please follow the industry information channel. Thank you for your support.
