2025-01-16 Update | From: SLTechnology News&Howtos > Development
Shulou(Shulou.com)06/03 Report--
This article explains in detail how to use Python to scrape the video data of the bilibili Top 100 ranking. The method is practical, so it is shared here for reference; hopefully you will get something out of reading it.
1. Third-party library imports

from bs4 import BeautifulSoup        # parse the web page
import re                            # regular expressions, for text matching
import urllib.request, urllib.error  # request data as a browser would
import sqlite3                       # lightweight database
import time                          # get the current time

2. The program's main function
The crawling process consists of three steps: declare the page to crawl -> fetch the page and parse the data -> save the data.
def main():
    # declare the website to crawl
    baseurl = "https://www.bilibili.com/v/popular/rank/all"
    # crawl the web page
    datalist = getData(baseurl)
    # print(datalist)
    # save the data
    dbname = time.strftime("%Y-%m-%d", time.localtime())
    dbpath = "BiliBiliTop100" + dbname
    saveData(datalist, dbpath)
(1) During crawling, the request is disguised as coming from a browser (by setting request headers) before the data is requested.
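A minimal sketch of this disguising step, using only the standard library already imported above. The User-Agent string is an assumption chosen for illustration; any common browser value works.

```python
import urllib.request

def make_request(url):
    # Servers often reject the default Python-urllib User-Agent,
    # so we send a typical browser User-Agent string instead (value assumed)
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    return urllib.request.Request(url, headers=headers)

# build a disguised request for the ranking page (not yet sent)
req = make_request("https://www.bilibili.com/v/popular/rank/all")
```

Passing this `Request` object to `urllib.request.urlopen` then fetches the page with the browser-like headers attached.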
(2) When parsing the crawled page source, BeautifulSoup is used to extract the required sections, and re regular expressions are used to match out the individual fields.
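As an illustration of step (2), here is a sketch of the regex-matching half: pulling fields out of one entry's HTML with `re`. The HTML fragment and the patterns are invented for the example and are not bilibili's real markup; in the real script, BeautifulSoup would first narrow the page down to each ranking entry.

```python
import re

# an invented entry fragment, standing in for one item of the parsed page
sample = '<a href="//www.bilibili.com/video/BV1xx411c7mD" class="title">Demo video</a>'

findLink = re.compile(r'href="(.*?)"')               # video link (non-greedy match)
findTitle = re.compile(r'class="title">(.*?)</a>')   # video title

link = findLink.search(sample).group(1)
title = findTitle.search(sample).group(1)
```

The same compile-once / search-per-entry pattern is repeated for each of the fields the program collects.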
(3) When saving the data, since the bilibili ranking is refreshed every day, the current date can be used in the database name.
3. The result of running the program
The database contains seven fields per video: rank, video link, title, view count, comment count, author, and composite score.
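The save step can be sketched as a SQLite table holding the seven fields listed above, written into a database file named with the current date. The table and column names here are assumptions for illustration; the article does not show its actual schema.

```python
import sqlite3
import time

def saveData(datalist, dbpath):
    # create (or open) the database and the results table
    conn = sqlite3.connect(dbpath)
    cur = conn.cursor()
    cur.execute("""
        CREATE TABLE IF NOT EXISTS Top100 (
            rank     INTEGER,   -- position in the ranking
            link     TEXT,      -- video link
            title    TEXT,      -- video title
            views    TEXT,      -- number of views
            comments TEXT,      -- number of comments
            author   TEXT,      -- uploader
            score    TEXT       -- composite score
        )
    """)
    # one row per video, inserted with parameter substitution
    cur.executemany("INSERT INTO Top100 VALUES (?, ?, ?, ?, ?, ?, ?)", datalist)
    conn.commit()
    conn.close()

# name the database after today's date, as the article suggests
dbpath = "BiliBiliTop100" + time.strftime("%Y-%m-%d", time.localtime())
```

Because the file name changes daily, each run of the crawler leaves that day's snapshot of the ranking in its own database.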
4. Program source code

from bs4 import BeautifulSoup        # parse the web page
import re                            # regular expressions, text matching
import urllib.request, urllib.error
import sqlite3
import time

def main():
    # declare the website to crawl
    baseurl = "https://www.bilibili.com/v/popular/rank/all"
    # crawl the web page
    datalist = getData(baseurl)
    # print(datalist)
    # save the data
    dbname = time.strftime("%Y-%m-%d", time.localtime())
    dbpath = "BiliBiliTop100" + dbname
    saveData(datalist, dbpath)

# re regular expressions
findLink = re.compile(r'
© 2024 shulou.com SLNews company. All rights reserved.