Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How does Python climb bilibili's video on-screen comment

2025-04-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

This article mainly introduces "how Python crawls bilibili video barrage". In daily operation, I believe many people have doubts about how Python crawled bilibili video barrage. The editor consulted all kinds of materials and sorted out simple and easy-to-use methods of operation. I hope it will be helpful for you to answer the doubt of "how Python crawls bilibili video barrage". Next, please follow the editor to study!

Basic development environment

Python 3.6

Pycharm

Use of related modules

Requests

Re

Install Python and add it to the environment variable, and pip installs the relevant modules you need.

First, define the needs

Find a video crawl with more on-screen comments.

Second, web page data analysis

The previous bilibili on-screen comment video, click to view the history of the on-screen comment, will return you a json data, including all the on-screen comments.

Now click on the historical barrage data, there is also data loaded out, but the inside is garbled.

If you request this link, you will still get the data content you want.

You only need to use regular expressions to match Chinese characters.

Third, parse the data and crawl multiple pages

On-screen comment paging is based on the date. When you click on the use of 2021-01-01, the data returned to me is not the on-screen comment data, but all the dates.

Then someone will ask if I want to climb the on-screen comment data of 2021-01-01.

The url addresses of the two are different, and seg.so is the url address of the on-screen comment data.

Import requestsimport redef get_response (html_url): headers = {'cookie':' your own cookie', 'origin':' https://www.bilibili.com', 'referer':' https://www.bilibili.com/video/BV19E41197Kc', 'user-agent':' Mozilla/5.0 (Windows NT 10.0 WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',} response = requests.get (url=html_url) Headers=headers) return responsedef get_date (html_url): response = get_response (html_url) json_data = response.json () date = json_data ['data'] print (date) return date if _ _ name__ =' _ _ main__': one_url = 'https://api.bilibili.com/x/v2/dm/history/index?type=1&oid=120004475&month=2021-01' get_date (one_url)

The returned data is json data, and the relevant data can be obtained according to the dictionary key-value pair.

4. Save data (data persistence) def main (html_url): data = get_date (html_url) for date in data: url = f 'https://api.bilibili.com/x/v2/dm/web/history/seg.so?type=1&oid=120004475&date={date}' html_data = get_response (url). Text result = re.findall (". *? ([\ u4E00 -\ u9FA5] +). *?" Html_data) for i in result: with open ('bilibili on-screen comment. Txt', mode='a', encoding='utf-8') as f: f.write (I) f.write ('\ n') 5. Complete code import requestsimport redef get_response (html_url): headers = {'cookie':' your own cookie' 'origin': 'https://www.bilibili.com',' referer': 'https://www.bilibili.com/video/BV19E41197Kc',' user-agent': 'Mozilla/5.0 (Windows NT 10.0 WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',} response = requests.get (url=html_url, headers=headers) return responsedef get_date (html_url): response = get_response (html_url) json_data = response.json () date = json_data ['data'] print (date) return datedef save (content): for i in content: with open (' bilibili on-screen comment .txt' Mode='a' Encoding='utf-8') as f: f.write (I) f.write ('\ n') print (I) def main (html_url): data = get_date (html_url) for date in data: url = f 'https://api.bilibili.com/x/v2/dm/web/history/seg.so?type=1&oid=120004475&date={date}' html _ data = get_response (url). Text result = re.findall (". *? ([\ u4E00 -\ u9FA5] +). *?" Html_data) save (result) if _ _ name__ = ='_ _ main__': one_url = 'https://api.bilibili.com/x/v2/dm/history/index?type=1&oid=120004475&month=2021-01' main (one_url) so far On the "Python how to climb bilibili video barrage" study is over, I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report