In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-27 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/02 Report--
How to crawl web Douyin popular videos through Python, I believe that many inexperienced people are at a loss about this, so this paper summarizes the causes of the problem and solutions, through this article I hope you can solve this problem.
Preface
Tik Tok believes that everyone has heard of it, and it is no stranger, right? You can see a large number of short videos covering all major industries. Personally, I think Douyin is toxic. I can't stop brushing. It's 3: 00 or 4: 00 in the morning. Today, I will take you to climb the video data of Douyin web page! Let's see it.
1. Systematic analysis of the nature of web pages.
2. Regular extraction of data (difficulty)
3. Save massive audio data
Environment introduction:
Python 3.6
Pycharm
Requests
Re
General ideas of reptiles
1. Analyze the target web page and determine the crawled url path and headers parameters.
2. Send a request-- requests simulates the browser to send a request to obtain response data
3. Parsing data-regular expression
4. Save the data-- save it in the destination folder
Steps:
1. Import tool
Base_url = 'http://douyin.bm8.com.cn/d_1.html'headers = {' User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}
2. Analyze the target web page to determine the crawled url path and headers parameters
Base_url = 'http://douyin.bm8.com.cn/d_1.html'headers = {' User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}
3. Send a request-- requests simulates the browser to send a request to obtain response data
Response = requests.get (url=base_url, headers=headers) html_data = response.text
4. Parsing data-regular expression
Pattern = re.compile ('onclick= "open1\ (\' (. *?)\',\'(. *?)\',\'\') result = pattern.findall (html_data) print (result)
5. Build a for loop
For page in range (8,10): print ('= fetching page {} data = '.format (page)) # 1. Analyze the target web page to determine the crawled url path. The headers parameter base_url =' http://douyin.bm8.com.cn/d_{}.html'.format(page) headers = {'User-Agent':' Mozilla/5.0 (Windows NT 10.0; Win64) X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}
6. Deal with illegal characters of file name
Def change_title (title): pattern = re.compile (r "[[\ /\\:\ *\"\ |] ") #'/\: *?" >
7. Save the data-- save it in the destination folder
For title, url in result: # request Douyin video data data = requests.get (url=url, headers=headers). Content new_title = change_title (title) with open ('videos\\' + new_title + '.mp4', mode='wb') as f: f.write (data) print ('save complete:', title)
After watching the above content, do you know how to crawl the popular Douyin videos on the web page through Python? If you want to learn more skills or want to know more about it, you are welcome to follow the industry information channel, thank you for reading!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.