Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to collect Weibo Video data by Python Crawler

2025-04-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/02 Report--

This article mainly introduces how the Python crawler collects Weibo video data, which has a certain reference value, interested friends can refer to, I hope you can learn a lot after reading this article, let the editor take you to know about it.

Knowledge point

Requests

Pprint

Development environment

Version: python 3.8

-Editor: pycharm 2021.2

Crawler principle

Function: bulk access to Internet data (text, pictures, audio, video)

Essence: repeated requests and responses

Case realization

1. Import required modules

Import requestsimport pprint

two。 Find the target URL

Open the developer tool, select Fetch/XHR, select the label where the data is located, and find the target url

Https://www.weibo.com/tv/api/component?page=/tv/channel/4379160563414111/editor

3. Send a network request

Headers= {'cookie':', 'referer':' https://weibo.com/tv/channel/4379160563414111/editor', 'user-agent':',} data = {'data':' {"Component_Channel_Editor": {"cid": "4379160563414111", "count": 9}} url= 'https://www.weibo.com/tv/api/component?page=/tv/channel/4379160563414111/editor'json_data = requests.post (url=url, headers=headers) Data=data) .json ()

4. Get data

Json_data_2 = requests.post (url=url_1, headers=headers, data=data_1). Json ()

5. Filter data

Dict_urls = json_data_2 ['data'] [' Component_Play_Playinfo'] ['urls'] video_url = "https:" + dict_ urls [list (dict_urls.keys ()) [0]] print (title + "\ t" + video_url)

6. Save data

Video_data = requests.get (video_url). Contentwith open (f'video\ {title} .mp4', mode='wb') as f: f.write (video_data) print (title, "crawled successfully.")

Complete code

Import requestsimport pprintheaders = {'cookie':' add your own', 'referer':' https://weibo.com/tv/channel/4379160563414111/editor', 'user-agent':',} data = {'data':' {"Component_Channel_Editor": {"cid": "4379160563414111" "count": 9}'} url= 'https://www.weibo.com/tv/api/component?page=/tv/channel/4379160563414111/editor'json_data = requests.post (url=url, headers=headers) Data=data) .json () print (json_data) ccs_list = json_data ['data'] [' Component_Channel_Editor'] ['list'] next_cursor = json_data [' data'] ['Component_Channel_Editor'] [' next_cursor'] for ccs in ccs_list: oid = ccs ['oid'] title = ccs [' title'] data_1 = {'data':' {"Component_Play_Playinfo": { "oid": "'+ oid +'"}}'} url_1 = 'https://weibo.com/tv/api/component?page=/tv/show/' + oid json_data_2 = requests.post (url=url_1) Headers=headers, data=data_1). Json () dict_urls = json_data_2 ['data'] [' Component_Play_Playinfo'] ['urls'] video_url = "https:" + dict_ urls [list (dict_urls.keys ()) [0]] print (title + "\ t" + video_url) video_data = requests.get (video_url). Content with open (f'video\ {title} .mp4' Mode='wb') as f: f.write (video_data) print (title, "crawled successfully") Thank you for reading this article carefully. I hope the article "how the Python crawler collects Weibo video data" shared by the editor will be helpful to you. At the same time, I also hope you will support us and follow the industry information channel. More related knowledge is waiting for you to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report