How to build a Python downloader that automatically crawls and collects bilibili's on-screen comments


2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Many newcomers are unsure how to write a Python downloader that automatically crawls and collects bilibili's on-screen comments. This article describes the problem and one way to solve it.

An earlier article briefly introduced the method for crawling bilibili's on-screen comments: you only need to find the video's cid parameter to capture all of the on-screen comments under that video. The idea is simple, but the process is still tedious. Whenever I later want to capture the comments of another bilibili video, I have to start from scratch: find the cid parameter, write the code, and repeat the same monotonous steps.

So I wondered whether the on-screen comments of a video could be captured in one step: enter the link of the video you want to crawl, and let the program identify and download the comments automatically.

The result

With that in mind, I used PyQt5 to write a small tool that needs only the URL of the target video and the path of the target txt file. The program automatically collects the on-screen comments under the video and saves them to that file. First, a preview of the result:

PS: WeChat official accounts limit the number of frames in an animated image, so I cut some content when making the animation, and it may not look smooth.

The tool is implemented in two parts, the UI interface and data acquisition, and uses the following Python libraries:

    import requests
    import re
    from PyQt5.QtWidgets import *
    from PyQt5 import QtCore
    from PyQt5.QtGui import *
    from PyQt5.QtCore import QThread, pyqtSignal
    from bs4 import BeautifulSoup

UI interface

Built with PyQt5, the UI interface contains two buttons ("start downloading" and "Save to"), a line-edit control for entering the video link, and a debugging window.

The code is as follows:

    def __init__(self, parent=None):
        super(Ui_From, self).__init__(parent=parent)
        self.setWindowTitle("bilibili on-screen comment Collection")
        self.setWindowIcon(QIcon('pic.jpg'))  # window icon
        self.top_label = QLabel("author: Xiao Zhang\n Wechat official account: Xiao Zhang Python")
        self.top_label.setAlignment(QtCore.Qt.AlignHCenter)
        self.top_label.setStyleSheet('color:red;font-weight:bold')
        self.label = QLabel("bilibili Video url")
        self.label.setAlignment(QtCore.Qt.AlignHCenter)
        self.editline1 = QLineEdit()
        self.pushButton = QPushButton("start downloading")
        self.pushButton.setEnabled(False)  # disabled until a URL is entered
        self.Console = QListWidget()
        self.saveButton = QPushButton("Save to")
        self.layout = QGridLayout()
        # The grid position of top_label is garbled in the source;
        # row 0 spanning two columns is an assumption.
        self.layout.addWidget(self.top_label, 0, 0, 1, 2)
        self.layout.addWidget(self.label, 1, 0)
        self.layout.addWidget(self.editline1, 1, 1)
        self.layout.addWidget(self.pushButton, 2, 0)
        self.layout.addWidget(self.saveButton, 3, 0)
        # The spans of Console are garbled in the source; 3 x 3 is an assumption.
        self.layout.addWidget(self.Console, 2, 1, 3, 3)
        self.setLayout(self.layout)
        self.savepath = None
        self.pushButton.clicked.connect(self.downButton)
        self.saveButton.clicked.connect(self.savePushbutton)
        self.editline1.textChanged.connect(self.syns_lineEdit)

Once the URL field is not empty and the storage path of the target text file has been set, the program can move on to the data acquisition module.

The code that implements this feature:

    def syns_lineEdit(self):
        if self.editline1.text():
            self.pushButton.setEnabled(True)  # enable the download button

    def savePushbutton(self):
        savePath = QFileDialog.getSaveFileName(self, 'Save Path', '/', 'txt(*.txt)')
        if savePath[0]:  # a txt file path was selected
            self.savepath = str(savePath[0])  # remember where to save the collected data

After the program gets the URL, the first step is to request it and extract the cid parameter (a string of digits) of the video on the current page.
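As a rough illustration of this extraction step, the sketch below pulls a string of digits out of page source with a regular expression. It assumes the page embeds the id as a JSON-style field of the form "cid":123456789. That layout, the `extract_cid` helper and the sample fragment are all hypothetical and are not taken from the author's code.

```python
import re
from typing import Optional

# Assumption: the page source contains a JSON-style field such as "cid":123456789.
CID_PATTERN = re.compile(r'"cid":\s*(\d+)')


def extract_cid(page_html: str) -> Optional[str]:
    """Return the first cid found in the page source, or None if there is none."""
    match = CID_PATTERN.search(page_html)
    return match.group(1) if match else None


# Made-up page fragment, used only for illustration.
sample_html = '<script>window.__INITIAL_STATE__={"aid":1,"cid":123456789,"title":"demo"}</script>'
print(extract_cid(sample_html))
```

Returning None instead of indexing the match list directly lets the caller show a message in the debugging window when a page has no recognizable cid.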

The cid parameter is then used to build the URL of the API endpoint that serves the video's on-screen comments, and the comment text is fetched and parsed with the usual requests and bs4 packages.

Data acquisition part code:

    res = requests.get(url)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'lxml')
    items = soup.find_all('d')  # each d tag holds one on-screen comment
    # The file mode is garbled in the source; 'w' is an assumption.
    with open(self.savepath, 'w', encoding='utf-8') as f:  # open the txt file
        for item in items:
            f.write(item.text)
            f.write('\n')
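The parse-and-save step can be tried offline with a made-up response body. The sketch below swaps bs4 for the standard library's xml.etree.ElementTree so that it runs without extra packages. The URL template is an assumption (the article does not state the endpoint), and `build_comment_url`, `parse_comments` and `save_comments` are hypothetical helper names.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Assumption: this template is not given in the article; the author's
# endpoint may differ.
COMMENT_URL_TEMPLATE = "https://comment.bilibili.com/{cid}.xml"


def build_comment_url(cid: str) -> str:
    """Fill the cid into the (assumed) endpoint template."""
    return COMMENT_URL_TEMPLATE.format(cid=cid)


def parse_comments(xml_text: str) -> list:
    """Return the text of every <d> element in the XML response."""
    root = ET.fromstring(xml_text)
    return [d.text or "" for d in root.iter("d")]


def save_comments(comments, path: str) -> None:
    """Write one comment per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        for text in comments:
            f.write(text + "\n")


# Made-up response body standing in for the real network reply.
sample_xml = '<i><d p="1.0,1">first</d><d p="2.5,1">second</d></i>'
comments = parse_comments(sample_xml)
out_path = os.path.join(tempfile.mkdtemp(), "danmaku.txt")
save_comments(comments, out_path)
print(comments)
```

Keeping the network request separate from parsing and saving makes each piece easy to check on its own before wiring it to the download button.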

The cid parameter is not stored in a regular HTML tag, so I used a regular expression (re) to extract it. This step is relatively memory-intensive, so to reduce its impact on the responsiveness of the UI interface it runs in a separate thread.

    class Parsetext(QThread):
        trigger = pyqtSignal(str)  # signal that carries the result back to the UI

        def __init__(self, text, parent=None):
            super(Parsetext, self).__init__()
            self.text = text

        def __del__(self):
            self.wait()

        def run(self):
            print('parse-{}'.format(self.text))
            result_url = re.findall('.*?"baseUrl":"(.*?)","base_url".*?', self.text)[0]
            self.trigger.emit(result_url)
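The same idea, running the regular-expression match off the main thread and handing the result back, can be tried without a GUI. The sketch below swaps QThread and pyqtSignal for the standard library's threading and queue modules, so it is an analogy to the class above rather than the author's implementation. The `parse_in_background` helper and the sample text are hypothetical.

```python
import queue
import re
import threading

# Captures the value of "baseUrl", as the Parsetext class does.
BASE_URL_PATTERN = re.compile(r'"baseUrl":"(.*?)","base_url"')


def parse_in_background(text: str, results: "queue.Queue") -> threading.Thread:
    """Run the regex in a worker thread and put the match (or None) on the queue."""
    def work():
        found = BASE_URL_PATTERN.findall(text)
        results.put(found[0] if found else None)

    worker = threading.Thread(target=work)
    worker.start()
    return worker


# Made-up page fragment, used only for illustration.
sample_text = '{"baseUrl":"https://example.com/video.m4s","base_url":"https://example.com/video.m4s"}'
results = queue.Queue()
worker = parse_in_background(sample_text, results)
worker.join()  # a GUI would not block here; it would react to a signal instead
parsed = results.get()
print(parsed)
```

Unlike the `[0]` lookup in the class above, this sketch puts None on the queue when nothing matches, which avoids an IndexError inside the worker thread.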

That covers how to build a downloader in Python that automatically crawls and collects bilibili's on-screen comments. Thank you for reading!
