Updated 2025-01-30 | Source: SLTechnology News & Howtos (shulou)
This article explains how to use Python to crawl data from a bilibili ranking list, including each video's play count and comment count. The method is simple, fast, and practical; follow along to see how it works.
Project background
Xiao Q noticed that Xiao P spends a long time on bilibili every day and wanted to have an in-depth chat with him about it. But Xiao Q, who has been under heavy academic pressure lately, never watches bilibili and has no idea what is popular there right now. Can we help him find out?
Project goal
Crawl the contents of a current bilibili ranking list (any single list will do), including each video's rank, BV number, cover, play count, comment count, and the uploader's name.
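The BV number mentioned in the goal is embedded in each video's URL path (e.g. `/video/BV...`). As a hedged illustration (the helper name and regex are my own, not from the original article), it can be pulled out like this:

```python
import re

def extract_bv(url):
    """Return the BV id found in a bilibili video URL, or None.

    A BV id is 'BV' followed by 10 alphanumeric characters,
    e.g. /video/BV1xx411c7mD (a made-up example id).
    """
    m = re.search(r'(BV[0-9A-Za-z]{10})', url)
    return m.group(1) if m else None
```

For example, `extract_bv('https://www.bilibili.com/video/BV1xx411c7mD')` returns `'BV1xx411c7mD'`, while a URL with no video path returns `None`.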
Target web page analysis
Get data content
Title
Play count
Danmaku (on-screen comment) count
Author
Comprehensive score
Details page URL
Open the developer tools and take one look: is that all there is? When the page is this straightforward, there is really nothing to analyze; you can write the code from start to finish right away.
It is just the classic three-step crawler routine:
1. Simulate the browser to request the website to obtain web page data
2. Parse the web page data and extract the desired content
3. Save data
Complete code

```python
import requests
import parsel
import csv

# Open the output file in append mode; utf-8-sig keeps Excel happy.
f = open('bilibili_ranking_data.csv', mode='a', encoding='utf-8-sig', newline='')
csv_writer = csv.DictWriter(
    f,
    fieldnames=['title', 'views', 'danmaku', 'author', 'score', 'video_url'],
)
csv_writer.writeheader()

url = 'https://www.bilibili.com/v/popular/rank/all?spm_id_from=333.851.b_7072696d61727950616765546162.3'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
}

# Step 1: request the page like a browser.
response = requests.get(url=url, headers=headers)

# Step 2: parse the HTML and pull out each ranking entry.
selector = parsel.Selector(response.text)
lis = selector.css('.rank-list li')
for li in lis:
    title = li.css('.info a::text').get()  # video title
    bf_info = li.css('div.content > div.info > div.detail > span:nth-child(1)::text').get().strip()  # play count
    dm_info = li.css('div.content > div.info > div.detail > span:nth-child(2)::text').get().strip()  # danmaku count
    bq_info = li.css('div.content > div.info > div.detail > a > span::text').get().strip()  # uploader name
    score = li.css('.pts div::text').get()  # comprehensive score
    page_url = li.css('.img a::attr(href)').get()  # video URL
    dit = {
        'title': title,
        'views': bf_info,
        'danmaku': dm_info,
        'author': bq_info,
        'score': score,
        'video_url': page_url,
    }
    # Step 3: save one row per video.
    csv_writer.writerow(dit)
    print(dit)

f.close()
```
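Step 3 (saving) hinges on `csv.DictWriter`, which maps dictionary keys onto CSV columns. The self-contained round-trip below, using an in-memory buffer and made-up sample values rather than live scraped data, shows that what the writer emits can be read back with `csv.DictReader` unchanged:

```python
import csv
import io

# A sample row with the same keys as the scraper's dict (values are invented).
row = {
    'title': 'Sample video',
    'views': '1,234,567',
    'danmaku': '8,900',
    'author': 'some-uploader',
    'score': '2,000,000',
    'video_url': '//www.bilibili.com/video/BV1xx411c7mD',
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(row))
writer.writeheader()   # column names come from the dict keys
writer.writerow(row)   # values are looked up by key, not position

buf.seek(0)
rows = list(csv.DictReader(buf))  # read the CSV back into dicts
```

Because `DictWriter` matches values by key, the fieldnames passed to it must be exactly the keys of the dicts you write, which is why the scraper's `fieldnames` list and its `dit` dictionary have to stay in sync.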
At this point you should have a deeper understanding of how Python crawls bilibili ranking data such as play counts and comment counts. Why not try running it yourself?