How to use Python to quickly crawl bilibili videos in batches

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to use Python to quickly crawl bilibili videos in batches. The editor finds it very practical and shares it here for reference.

I. Overview of the project

1. Project background

Bilibili (https://www.bilibili.com/) is a treasure house of content. Whether you want to watch animation, dramas, games, remix (guichu) videos, animal clips, or technology and teaching videos, you can find almost anything you can think of there. For programmers and people learning to program, bilibili has a huge supply of programming tutorials, but it does not provide a download function, which is a problem if you want to save videos and watch them later. I ran into this problem myself, so I studied how to download videos with one click and implemented it in Python.

2. Environment configuration

This project needs little environment configuration. The main requirement is ffmpeg (a set of open source programs for recording and converting digital audio and video and turning them into streams), with its location added to the environment variables. ffmpeg is used here to merge the downloaded video and audio into one complete video.

Download ffmpeg

You can download ffmpeg from https://download.csdn.net/download/CUFEECR/12234789 or from the official website http://ffmpeg.org/download.html, then unzip it to the directory where you want to keep it.

Set environment variables

Copy the bin path of ffmpeg, such as xxx\ffmpeg-20190921-ba24b24-win64-shared\bin

Right-click This PC and choose Properties to open Control Panel\System and Security\System

Click Advanced system settings → the System Properties window opens → click Environment Variables → the Environment Variables window opens → select Path under System variables → click Edit → the Edit environment variable window opens

Click New → paste the bin path you copied earlier.

Click OK in each window in turn to save and exit.

[Animation: step-by-step demonstration of setting the environment variable]
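Once the environment variable is saved, it can help to confirm that ffmpeg is actually reachable before running the downloader. The following is a minimal sketch, not part of the original project; the helper name find_ffmpeg is hypothetical, and it relies only on the standard-library function shutil.which, which looks a program up on PATH.

```python
import shutil


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path of the executable if it is on PATH, otherwise None.

    find_ffmpeg is a hypothetical helper name used for illustration only.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    ffmpeg_path = find_ffmpeg()
    if ffmpeg_path is None:
        print("ffmpeg was not found on PATH; check the environment variable.")
    else:
        print("Found ffmpeg at:", ffmpeg_path)
```

If the path is not found, open a new terminal window first, since terminals that were already open do not pick up the changed PATH.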

In addition to ffmpeg, you also need to install the pyinstaller library for program packaging. You can install it with the following command:

pip install pyinstaller

If you encounter installation failure or slow download speed, you can change the source:

pip install pyinstaller -i https://pypi.doubanio.com/simple/

II. Project implementation

1. Import required libraries

import json
import os
import re
import shutil
import ssl
import time
import requests
from concurrent.futures import ThreadPoolExecutor
from lxml import etree

The imported libraries cover fetching and parsing web pages, creating thread pools, and other processing. Most of them ship with Python; requests and lxml are third-party packages. Any library that is not installed can be installed with pip install xxx.
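Since the thread pool is the least familiar of these imports, here is a minimal sketch of how ThreadPoolExecutor can run several tasks and hand each result to a callback. The callback receives a Future, so it reads the task's return value through future.result(), which is the same shape the batch merge function later in the article expects. The names fake_download and run_batch are hypothetical stand-ins, and no network access is involved.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_download(video_name, index):
    """Hypothetical stand-in for a download task; returns (video_name, index)."""
    return video_name, index


def run_batch(names, max_workers=3):
    """Submit one task per name and collect results through a done-callback."""
    finished = []

    def on_done(future):
        # The callback is given the Future, not the raw return value.
        video_name, index = future.result()
        finished.append((index, video_name))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for index, name in enumerate(names, start=1):
            future = pool.submit(fake_download, name, index)
            future.add_done_callback(on_done)
    # Leaving the with-block waits for every submitted task to finish.
    return sorted(finished)


if __name__ == "__main__":
    print(run_batch(["part-1", "part-2", "part-3"]))
```

Callbacks may run in any order because tasks finish at different times, which is why the sketch sorts the collected results by index.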

2. Set request parameters

# Set request headers and other parameters to reduce the chance of being blocked by anti-crawling measures
headers = {
    'Accept': '*/*',
    'Accept-Language': 'en-US,en;q=0.5',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36'
}
params = {
    'from': 'search',
    'seid': '9698329271136034665'
}

Setting the request headers and other parameters reduces the possibility of being blocked by anti-crawling measures.
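To make clear what these two dictionaries do, the sketch below builds a request object without sending it: the params end up in the query string and the headers travel with the request. It uses only the standard library so that it runs offline; the article itself uses requests, where the roughly equivalent call would be requests.get(url, headers=headers, params=params). The URL and the helper name build_request are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

headers = {
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36",
}
params = {"from": "search", "seid": "9698329271136034665"}


def build_request(url, headers, params):
    """Attach params as a query string and headers to a request (no network I/O)."""
    full_url = url + "?" + urlencode(params)
    return Request(full_url, headers=headers)


if __name__ == "__main__":
    # Hypothetical video page URL, used only to show the resulting query string.
    request = build_request("https://www.bilibili.com/video/av0", headers, params)
    print(request.full_url)
    print(request.header_items())
```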

3. Basic processing

def re_video_info(text, pattern):
    '''Use a regular expression to match video information and convert it to JSON'''
    match = re.search(pattern, text)
    return json.loads(match.group(1))


def create_folder(aid):
    '''Create the folder if it does not exist'''
    if not os.path.exists(aid):
        os.mkdir(aid)


def remove_move_file(aid):
    '''Delete temporary files and move the final video files'''
    file_list = os.listdir('./')
    for file in file_list:
        # Remove the temporary video-only and audio-only files
        if file.endswith('_video.mp4'):
            os.remove(file)
        elif file.endswith('_audio.mp4'):
            os.remove(file)
        # Save the final video file, replacing any older copy in the folder
        elif file.endswith('.mp4'):
            if os.path.exists(aid + '/' + file):
                os.remove(aid + '/' + file)
            shutil.move(file, aid)

This part covers two basic operations that prepare for the actual crawling and downloading:

Extracting information with regular expressions: the page fetched through the requests library is plain text, and a regular expression pulls out the useful information about the video to be downloaded so that the next step can process it.

File processing: after a video download is complete, the related files are cleaned up, which includes deleting the temporary separate audio and video files and moving the final video file into the specified folder.
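The regular-expression step can be illustrated on a small piece of text. The sketch below is an assumption-laden example: the variable name window.__playinfo__, the JSON layout, and the example.com URLs are all hypothetical and are not taken from the article, and real pages may be structured differently. It only shows the technique of capturing a JSON object embedded in a script tag and parsing it with json.loads.

```python
import json
import re

# Hypothetical page fragment; a real page is far larger and may be laid out differently.
SAMPLE_HTML = (
    '<script>window.__playinfo__={"data": {"dash": {'
    '"video": [{"baseUrl": "https://example.com/v.m4s"}], '
    '"audio": [{"baseUrl": "https://example.com/a.m4s"}]}}}</script>'
)

# Assumed pattern: capture everything between the assignment and the closing tag.
PLAYINFO_PATTERN = r'window\.__playinfo__=(.*?)</script>'


def extract_json(text, pattern):
    """Return the JSON captured by group 1 of pattern, parsed into Python objects."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError("pattern not found in page text")
    return json.loads(match.group(1))


if __name__ == "__main__":
    info = extract_json(SAMPLE_HTML, PLAYINFO_PATTERN)
    print(info["data"]["dash"]["video"][0]["baseUrl"])
    print(info["data"]["dash"]["audio"][0]["baseUrl"])
```

Unlike the article's re_video_info, this version raises a clear error when the pattern does not match, instead of failing on match.group with an AttributeError.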

4. Download video

def download_video_batch(referer_url, video_url, audio_url, video_name, index):
    '''Batch download a series of videos'''
    # Update the request header
    headers.update({"Referer": referer_url})
    # Get the file name
    short_name = video_name.split('/')[2]
    print("%d.\tVideo download start: %s" % (index, short_name))
    # Download and save the video
    video_content = requests.get(video_url, headers=headers)
    print('%d.\t%s\tvideo size:' % (index, short_name),
          round(int(video_content.headers.get('content-length', 0)) / 1024 / 1024, 2), '\tMB')
    received_video = 0
    with open('%s_video.mp4' % video_name, 'ab') as output:
        headers['Range'] = 'bytes=' + str(received_video) + '-'
        response = requests.get(video_url, headers=headers)
        output.write(response.content)
    # Download and save the audio
    audio_content = requests.get(audio_url, headers=headers)
    print('%d.\t%s\taudio size:' % (index, short_name),
          round(int(audio_content.headers.get('content-length', 0)) / 1024 / 1024, 2), '\tMB')
    received_audio = 0
    with open('%s_audio.mp4' % video_name, 'ab') as output:
        headers['Range'] = 'bytes=' + str(received_audio) + '-'
        response = requests.get(audio_url, headers=headers)
        output.write(response.content)
        received_audio += len(response.content)
    return video_name, index


def download_video_single(referer_url, video_url, audio_url, video_name):
    '''Download a single video'''
    # Update the request header
    headers.update({"Referer": referer_url})
    print("Video download start: %s" % video_name)
    # Download and save the video
    video_content = requests.get(video_url, headers=headers)
    print('%s\tvideo size:' % video_name,
          round(int(video_content.headers.get('content-length', 0)) / 1024 / 1024, 2), '\tMB')
    received_video = 0
    with open('%s_video.mp4' % video_name, 'ab') as output:
        headers['Range'] = 'bytes=' + str(received_video) + '-'
        response = requests.get(video_url, headers=headers)
        output.write(response.content)
    # Download and save the audio
    audio_content = requests.get(audio_url, headers=headers)
    print('%s\taudio size:' % video_name,
          round(int(audio_content.headers.get('content-length', 0)) / 1024 / 1024, 2), '\tMB')
    received_audio = 0
    with open('%s_audio.mp4' % video_name, 'ab') as output:
        headers['Range'] = 'bytes=' + str(received_audio) + '-'
        response = requests.get(audio_url, headers=headers)
        output.write(response.content)
        received_audio += len(response.content)
    print("Video download end: %s" % video_name)
    video_audio_merge_single(video_name)

This part contains the batch download for a series of videos and the download of a single video. The two work on the same principle, but because their parameters differ they are implemented as separate functions. Each one first updates the request header, then requests the video link and saves the video stream (which has no sound), then requests the audio link and saves the audio, printing the size of the video and audio files along the way.
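The download functions set a Range header of the form bytes=0-, which asks for the whole file in one response. As a variation rather than the article's method, the sketch below requests a file in fixed-size byte ranges and appends each piece, which keeps memory use bounded for large files. The fetch argument is a hypothetical callable that takes a Range header value and returns bytes; with requests it could wrap a call such as requests.get(url, headers={..., 'Range': value}).content, assuming the server honors range requests.

```python
import io


def range_header(start, end=None):
    """Build a Range header value such as 'bytes=0-' or 'bytes=0-1023'."""
    return "bytes=%d-%s" % (start, "" if end is None else end)


def download_in_chunks(fetch, output, total_size, chunk_size=1024 * 1024):
    """Write total_size bytes to output, requesting one byte range at a time.

    fetch is a hypothetical callable: Range header value in, bytes out.
    """
    received = 0
    while received < total_size:
        end = min(received + chunk_size, total_size) - 1
        data = fetch(range_header(received, end))
        if not data:
            break  # stop instead of looping forever if nothing comes back
        output.write(data)
        received += len(data)
    return received


if __name__ == "__main__":
    payload = b"0123456789" * 100  # stands in for the remote file

    def fake_fetch(value):
        # Parse 'bytes=START-END' and return that slice of the payload.
        start, end = value[len("bytes="):].split("-")
        return payload[int(start):int(end) + 1]

    buffer = io.BytesIO()
    print(download_in_chunks(fake_fetch, buffer, len(payload), chunk_size=300))
```

In real use, total_size would come from the content-length value that the article's functions already read, and output would be a file opened in binary mode.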

5. Merge video and audio into a complete video

def video_audio_merge_batch(result):
    '''Use ffmpeg to merge video and audio in batches'''
    video_name = result.result()[0]
    index = result.result()[1]
    import subprocess
    video_final = video_name.replace('video', 'video_final')
    command = 'ffmpeg -i "%s_video.mp4" -i "%s_audio.mp4" -c copy "%s.mp4" -y -loglevel quiet' % (
        video_name, video_name, video_final)
    subprocess.Popen(command, shell=True)
    print("%d.\tVideo download end: %s" % (index, video_name.split('/')[2]))


def video_audio_merge_single(video_name):
    '''Use ffmpeg to merge the video and audio of a single video'''
    print("Video merge start: %s" % video_name)
    import subprocess
    command = 'ffmpeg -i "%s_video.mp4" -i "%s_audio.mp4" -c copy "%s.mp4" -y -loglevel quiet' % (
        video_name, video_name, video_name)
    subprocess.Popen(command, shell=True)
    print("Video merge end: %s" % video_name)

Merging is also split into a batch version and a single-video version that work on roughly the same principle: each calls the subprocess module and uses the Popen class to start a child process that executes a shell command. Because ffmpeg has been added to the environment variables, the shell command can call ffmpeg directly to merge audio and video.
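One detail worth knowing is that Popen returns as soon as the child process starts, so the "end" message can be printed before ffmpeg has finished writing the file. As an alternative sketch, not the article's implementation, the merge can be run with subprocess.run, which waits for ffmpeg to exit, and with an argument list instead of a shell string, which avoids quoting problems in file names. The helper names build_merge_command and merge are hypothetical; the ffmpeg options are the ones used in the article.

```python
import subprocess


def build_merge_command(video_name, output_name=None):
    """Build the ffmpeg argument list that copies both streams into one MP4."""
    if output_name is None:
        output_name = video_name
    return [
        "ffmpeg",
        "-i", "%s_video.mp4" % video_name,   # video-only input
        "-i", "%s_audio.mp4" % video_name,   # audio-only input
        "-c", "copy",                        # copy streams without re-encoding
        "-y",                                # overwrite the output if it exists
        "-loglevel", "quiet",
        "%s.mp4" % output_name,
    ]


def merge(video_name, output_name=None):
    """Run ffmpeg and wait for it; return True if it exited with status 0."""
    completed = subprocess.run(build_merge_command(video_name, output_name))
    return completed.returncode == 0


if __name__ == "__main__":
    # Prints the command only; call merge("demo") once the two input files exist.
    print(build_merge_command("demo"))
```

Waiting for ffmpeg to finish also makes it safe to delete the temporary _video.mp4 and _audio.mp4 files right after merge returns True.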

III. Project analysis and description

1. Result test

The results of the three tests are as follows:

Thank you for reading! This is the end of the article on how to use Python to quickly crawl bilibili videos in batches. I hope it is of some help; if you find it useful, feel free to share it so more people can see it.
