
How to download and synthesize videos with Python

2025-07-11 Update, from SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains in detail how to download and synthesize videos with Python. The steps are laid out in order, and I hope the article helps resolve your questions.

Modules used

requests: third-party module for sending data requests; install it with pip install requests

re: built-in regular expression module, used to match and extract data

json: built-in module for parsing JSON data

Development environment

Python 3.8 interpreter

PyCharm 2021.2 (recommended)

To install a module, press Win + R, type cmd, and run pip install followed by the module name. If the installation fails, the cause may be a network connection timeout; switching to a domestic mirror source can help.

Case implementation

1. Define the requirements

Before collecting any content, first work out where the video data comes from.

Use the browser's developer tools to capture and inspect the network traffic and find where the video data is served. On this site the video content is in m3u8 format.

When a site serves video as m3u8, the m3u8 file is a playlist that lists all of the ts video segments.
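To make the m3u8 idea concrete, here is a minimal sketch of reading the segment entries out of a playlist. In an m3u8 file, lines beginning with # are tags and the remaining non-blank lines name the media segments. The playlist text and the function name extract_segments are made up for illustration and do not come from the site discussed here.

```python
def extract_segments(playlist_text):
    """Return the media segment entries listed in an m3u8 playlist."""
    segments = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; blank lines are skipped.
        if not line or line.startswith("#"):
            continue
        segments.append(line)
    return segments


# Hypothetical playlist content, invented for this example.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
segment_000.ts
#EXTINF:10.0,
segment_001.ts
#EXTINF:4.5,
segment_002.ts
#EXT-X-ENDLIST
"""

print(extract_segments(SAMPLE_PLAYLIST))
```

Filtering line by line keeps the segment order intact, which matters because the segments have to be written back in the same order.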

2. Code implementation steps

Send a request

Get data

Parsing data

Save data
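As a small illustration of the parsing step, the sketch below pulls a title and an embedded JSON object out of a page's HTML. The sample page, the marker string and the helper extract_embedded_json are hypothetical. It uses json.JSONDecoder().raw_decode, which decodes one JSON value and ignores whatever follows, as one possible alternative to matching the end of the object with a regular expression.

```python
import json
import re

# Hypothetical page source, invented for illustration; a real page is far larger.
SAMPLE_HTML = """<html><head><title>Demo clip - Example Site</title></head>
<body><script>window.videoInfo = {"title": "Demo clip", "playlist": "https://example.com/v/index.m3u8"};</script>
</body></html>"""


def extract_embedded_json(html, marker):
    """Decode the JSON value that directly follows `marker` in the page source."""
    start = html.index(marker) + len(marker)
    # raw_decode does not accept leading whitespace, so strip it first.
    obj, _end = json.JSONDecoder().raw_decode(html[start:].lstrip())
    return obj


# re.findall returns a list, so take the first match.
title = re.findall(r"<title>(.*?) - Example Site</title>", SAMPLE_HTML, re.S)[0]
info = extract_embedded_json(SAMPLE_HTML, "window.videoInfo = ")

print(title)
print(info["playlist"])
```

Because the decoder stops at the end of the JSON value, this approach does not depend on what character follows the object in the page.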

1. Send a request to the URL of the video playback page

2. Get the data: the response returned by the server

3. Parse the data: extract what we need, the video title and the m3u8 link

4. Send a request to the m3u8 link

5. Get the data: the response returned by the server

6. Parse the data: extract the URLs of all ts files (the video segments)

7. Save the data: save all the segments and merge them into one complete video
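The final step, merging the segments, amounts to writing each segment's bytes to one file in playlist order. The sketch below shows that idea with stand-in bytes instead of real video data. The helper names safe_filename and merge_segments are hypothetical, and the file name cleanup is an added precaution that assumes titles may contain characters Windows rejects in file names.

```python
import os
import re
import tempfile


def safe_filename(title):
    """Replace characters that Windows does not allow in file names."""
    return re.sub(r'[\\/:*?"<>|]', "_", title).strip()


def merge_segments(segments, out_path):
    """Write each segment's bytes to out_path in order; return the byte count."""
    total = 0
    with open(out_path, "wb") as f:
        for chunk in segments:
            f.write(chunk)
            total += len(chunk)
    return total


# Stand-in bytes, not real video data.
fake_segments = [b"AAAA", b"BBBB", b"CC"]
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, safe_filename("demo: clip?") + ".ts")
written = merge_segments(fake_segments, out_path)

print(written, os.path.basename(out_path))
```

Opening the file once in "wb" mode and writing every segment avoids duplicated content when the script is run twice, which can happen when each segment is appended with "ab" to a file left over from an earlier run.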

Implementation code

    import json      # built-in module for parsing JSON
    import os        # built-in module, used to create the output folder
    import pprint    # built-in module for formatted output
    import re        # built-in regular expression module
    import requests  # third-party data request module: pip install requests

    # The request headers disguise the Python code as a browser so that the
    # server does not identify it as a crawler. Adding a User-Agent is the
    # simplest countermeasure. If the data is only available after login,
    # also add the cookie, which carries the user information.
    headers = {
        # 'Cookie': 'your cookie',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36'
    }

    # The output folder must exist before files are opened inside it.
    os.makedirs('video', exist_ok=True)

    for page in range(1, 17):
        print(f'----- collecting the data content of page {page} -----')
        list_url = 'https://www.acfun.cn/u/45321802'
        data = {
            'quickViewId': 'ac-space-video-list',
            'reqID': page + 1,
            'ajaxpipe': '1',
            'type': 'video',
            'order': 'newest',
            'page': page,
            'pageSize': '20',
            't': '1649944573765',  # timestamp parameter; the key name 't' is assumed
        }
        # A GET request passes its query through params; a POST request uses data.
        response = requests.get(url=list_url, params=data, headers=headers)
        # print(response.text)
        id_list = re.findall('a href=.*?ac(.*?)"', response.text)
        for index in id_list:
            # Remove the escaping backslash left in the matched id.
            video_id = index.replace('\\', '')

            # 1. Send a request to the URL of the video playback page.
            url = f'https://www.acfun.cn/v/ac{video_id}'
            response = requests.get(url=url, headers=headers)

            # 2. Get the data.
            # print(response.text)

            # 3. Parse the data. re.findall returns a list; re.S lets '.' match
            #    newlines. How you extract the data is not important: any method
            #    works as long as it gets the data.
            # Assumes the page <title> reads '<video title> - AcFun...'.
            title = re.findall(r'<title\s*>(.*?) - AcFun', response.text, re.S)[0]
            # Assumes the assignment ends at the first ';'.
            video_info = re.findall('window.pageInfo = window.videoInfo = (.*?);', response.text)[0]
            # print(video_info)
            # Convert the string into a dictionary; use type() to check a data type.
            json_data = json.loads(video_info)
            # pprint.pprint(json_data)
            # Dictionary lookup: use the key (left of the colon) to get the value.
            m3u8_url = json.loads(json_data['currentVideoInfo']['ksPlayJson'])['adaptationSet'][0]['representation'][0]['backupUrl'][0]
            # print(title)
            # print(m3u8_url)

            # 4-5. Request the m3u8 link and take the response body as text.
            m3u8_data = requests.get(url=m3u8_url, headers=headers).text

            # 6. Remove the '#E...' tag lines, then split() into ts entries.
            m3u8_data = re.sub('#E.*', '', m3u8_data).split()
            # print(m3u8_data)

            # 7. Download every ts segment and append it to a single file.
            for ts in m3u8_data:
                ts_url = 'https://ali-safety-video.acfun.cn/mediacloud/acfun/acfun_video/' + ts
                ts_content = requests.get(url=ts_url, headers=headers).content
                # mode 'ab': a = append, b = binary
                with open(os.path.join('video', title + '.mp4'), mode='ab') as f:
                    f.write(ts_content)
            print('video saved:', title)

That concludes this introduction to downloading and synthesizing videos with Python. To master the points covered here, you still need to practice and use them yourself.

