2025-02-27 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to use Python to crawl bilibili's short videos. The walkthrough is fairly detailed; I hope it is helpful to interested readers.
I used Python to crawl bilibili's short videos. Because they are short, each file averages less than 5 MB, so I can watch them without an Internet connection while waiting in line. Cool. The method of obtaining the source code is given at the end of the article.
Bilibili's short video address:
http://vc.bilibili.com/p/eden/rank#/?tab=all
I crawled the daily short-video ranking. Once you can crawl the daily one, this week's and this month's are just as easy: only the tag changes, which we will come back to in the detailed analysis. The following is the crawl result.
Project environment
Language: Python 3
Tool: PyCharm
Program structure
It is mainly composed of three parts:
get_json(): extracts the JSON data of the target web page.
downloader(): downloads a short video and shows the download progress.
main function: downloads the videos in a loop until all downloads are complete.
Code analysis
Open the website and scroll down: the videos are loaded dynamically. Open the browser's debugging tools, scroll down again to trigger another load, and look at the request URL under Headers. The leading part of the URL never changes, so extract it:
http://api.vc.bilibili.com/board/v1/ranking/top?
Watching how the query parameters change from one request to the next, only the next_offset field changes, and it grows by 10 each time.
That makes things easy: we pull the parameters out separately, pass next_offset in as a variable, and return the JSON data of the target page.
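The request described above can be sketched roughly as follows. This is a minimal sketch, not the author's original code: it uses the standard-library urllib rather than whichever HTTP library the author used, build_url is a hypothetical helper name, and the tag value "daily" is a placeholder (the article only says that a tag parameter selects the ranking). The real request may carry further parameters that are not listed here.

```python
import json
import urllib.parse
import urllib.request

# Fixed leading part of the URL, as given in the article.
API_URL = "http://api.vc.bilibili.com/board/v1/ranking/top?"


def build_url(next_offset, tag="daily"):
    """Build the request URL for one page of the ranking.

    The tag value is a placeholder (assumption); copy the real one
    from the request shown in your browser's debugging tools.
    """
    query = urllib.parse.urlencode({"tag": tag, "next_offset": next_offset})
    return API_URL + query


def get_json(next_offset, headers=None, timeout=10):
    """Fetch one page of the ranking and return the decoded JSON."""
    request = urllib.request.Request(build_url(next_offset), headers=headers or {})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling build_url(0), build_url(10), build_url(20) and so on produces the successive page URLs, since next_offset grows by 10 per request.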
Next comes downloading the short videos. To make the output look nicer, I wrote a downloader that displays the download progress. The effect is as follows.
There is one thing to note here. When you request the target page, you must send the headers of this page. The website has anti-crawling measures in place, and without the headers the downloaded video is empty. (PS: when you run the code, replace the headers with the ones your own browser sends for this page.)
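A downloader along these lines might look like the sketch below. It is an illustration under stated assumptions, not the author's code: the header values are placeholders that should be replaced with the ones from your own browser, copy_with_progress is a hypothetical helper name, and progress is printed as a simple percentage rather than the display used in the original post.

```python
import urllib.request

# Placeholder headers (assumption): replace them with the headers your
# browser sends for this page, as the article advises.
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "http://vc.bilibili.com/p/eden/rank",
}


def copy_with_progress(source, target, total_bytes, chunk_size=64 * 1024, report=print):
    """Copy source to target in chunks, reporting percent done after each chunk.

    Returns the number of bytes copied. Nothing is reported when the
    total size is unknown (total_bytes == 0).
    """
    done = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        target.write(chunk)
        done += len(chunk)
        if total_bytes:
            report(f"{done * 100 // total_bytes}%")
    return done


def downloader(url, path, headers=None):
    """Download url to path, sending the page headers with the request."""
    request = urllib.request.Request(url, headers=headers or HEADERS)
    with urllib.request.urlopen(request, timeout=30) as response, open(path, "wb") as out:
        total = int(response.headers.get("Content-Length") or 0)
        return copy_with_progress(response, out, total)
```

Keeping the copy loop separate from the network call makes it possible to check the progress logic against an in-memory stream without touching the site.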
To extract more videos, the main function loops over next_offset and then extracts each video's title and download link from the JSON data. Once you look at the page's JSON structure, the video title and download link are easy to get.
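The extraction step could be sketched as below. The article does not show the JSON layout, so the nesting and the key names used here ("data", "items", "item", "description", "video_playurl") are assumptions for illustration only; check them against the response shown in your browser's debugging tools. extract_videos is a hypothetical helper name.

```python
def extract_videos(payload):
    """Return a list of (title, download_url) pairs from one page of JSON.

    Assumed layout (not confirmed by the article):
    payload["data"]["items"][i]["item"] holds "description" (the title)
    and "video_playurl" (the download link). Entries missing either
    field are skipped.
    """
    videos = []
    for entry in payload.get("data", {}).get("items", []):
        item = entry.get("item", {})
        title = item.get("description")
        url = item.get("video_playurl")
        if title and url:
            videos.append((title, url))
    return videos


# Offsets for ten pages: next_offset grows by 10 per request.
offsets = list(range(0, 100, 10))
```

Using .get with defaults means a page with an unexpected shape yields an empty list instead of raising.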
Some videos do not provide a download link, so I added exception handling. Careful readers may have noticed that the result screenshot earlier in the article shows only 84 videos; this is why. Finally, to keep the IP from being blocked, a random wait time is set. Even so, 100 videos can be downloaded in less than 5 minutes overall.
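The main loop with exception handling and a random wait might be sketched like this. It is a sketch under assumptions, not the author's code: crawl is a hypothetical name, the page-fetching and download functions are passed in as arguments so the loop can be tried without network access, and the 1 to 3 second wait range and the choice to wait after every video are placeholders, since the article does not state them.

```python
import random
import time


def crawl(fetch_page, download, pages=10, step=10, wait_range=(1.0, 3.0), sleep=time.sleep):
    """Download every video on the requested pages.

    fetch_page(next_offset) must return (title, url) pairs and
    download(url, title) must save one video. A failing download
    (for example, a missing link) is skipped instead of stopping the run.
    Returns (saved, skipped) counts.
    """
    saved = skipped = 0
    for next_offset in range(0, pages * step, step):
        for title, url in fetch_page(next_offset):
            try:
                download(url, title)
                saved += 1
            except Exception as error:
                print(f"skipped {title!r}: {error}")
                skipped += 1
            # Random pause (assumed range) to reduce the risk of an IP block.
            sleep(random.uniform(*wait_range))
    return saved, skipped
```

The skipped count plays the role of the gap the article mentions, where only 84 of the ranked videos ended up on disk.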
That concludes this walkthrough of using Python to crawl bilibili's short videos. I hope it is of some help to you.