2025-04-02 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 Report
This article explains how to collect Weibo video data with a Python crawler. It walks through the process step by step and should be a useful reference for interested readers; I hope you learn a lot from it.
Knowledge points
requests
pprint
Development environment
Version: Python 3.8
Editor: PyCharm 2021.2
Crawler principle
Function: bulk collection of Internet data (text, images, audio, video)
Essence: repeatedly sending requests and parsing the responses
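The request-and-response loop described above can be sketched as follows. This is a minimal illustration of the batch-collection pattern only; fetch_page is a hypothetical stand-in for a real network call such as requests.get:

```python
# Minimal sketch of a crawler loop: repeat request -> parse response -> store.
# fetch_page is a made-up stand-in for a real HTTP call (e.g. requests.get).

def fetch_page(page):
    # Pretend each "page" of the site returns three item IDs.
    return [f"item-{page}-{i}" for i in range(3)]

def crawl(pages):
    collected = []
    for page in range(pages):
        response = fetch_page(page)   # the "request" step
        collected.extend(response)    # the "response" step: parse and store
    return collected

print(crawl(2))  # six items collected across two pages
```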
Case realization
1. Import required modules
import requests
import pprint
2. Find the target URL
Open the browser's developer tools, switch to the Network panel, filter by Fetch/XHR, select the request that carries the data, and find the target URL
https://www.weibo.com/tv/api/component?page=/tv/channel/4379160563414111/editor
3. Send a network request
headers = {
    'cookie': '',  # add your own
    'referer': 'https://weibo.com/tv/channel/4379160563414111/editor',
    'user-agent': '',  # add your own
}
data = {'data': '{"Component_Channel_Editor":{"cid":"4379160563414111","count":9}}'}
url = 'https://www.weibo.com/tv/api/component?page=/tv/channel/4379160563414111/editor'
json_data = requests.post(url=url, headers=headers, data=data).json()
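The POST above returns JSON whose structure the rest of the tutorial relies on. As a sketch of how to navigate it, here is a hand-made sample payload shaped like the response the article describes (the field values are illustrative, not real API output; the real data comes from requests.post(...).json()):

```python
import pprint

# Illustrative sample shaped like the channel-editor response described in
# the article; oid/title/next_cursor values here are made up.
sample = {
    "data": {
        "Component_Channel_Editor": {
            "list": [{"oid": "1034:0001", "title": "demo video"}],
            "next_cursor": "abc123",
        }
    }
}

ccs_list = sample["data"]["Component_Channel_Editor"]["list"]
next_cursor = sample["data"]["Component_Channel_Editor"]["next_cursor"]
pprint.pprint(ccs_list)   # pprint makes nested JSON easier to inspect
print(next_cursor)
```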
4. Get data
data_1 = {'data': '{"Component_Play_Playinfo":{"oid":"' + oid + '"}}'}
url_1 = 'https://weibo.com/tv/api/component?page=/tv/show/' + oid
json_data_2 = requests.post(url=url_1, headers=headers, data=data_1).json()
5. Filter data
dict_urls = json_data_2['data']['Component_Play_Playinfo']['urls']
video_url = "https:" + dict_urls[list(dict_urls.keys())[0]]
print(title + "\t" + video_url)
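The expression list(dict_urls.keys())[0] simply takes the first key of the urls mapping; since Python 3.7 dicts preserve insertion order, so this picks whichever quality the API lists first. A sketch with made-up key names and CDN paths (the real dict comes from the Component_Play_Playinfo response):

```python
# Illustrative sample of the 'urls' mapping; key names and paths are invented
# for demonstration, not taken from the real Weibo API.
dict_urls = {
    "mp4_720p_mp4": "//f.video.weibocdn.com/demo_720.mp4",
    "mp4_hd_mp4": "//f.video.weibocdn.com/demo_hd.mp4",
}

# Dicts keep insertion order (Python 3.7+), so [0] is the first quality listed.
first_key = list(dict_urls.keys())[0]
video_url = "https:" + dict_urls[first_key]
print(video_url)  # https://f.video.weibocdn.com/demo_720.mp4
```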
6. Save data
video_data = requests.get(video_url).content
with open(f'video\\{title}.mp4', mode='wb') as f:
    f.write(video_data)
print(title, "crawled successfully")
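One practical caveat the article does not cover: video titles can contain characters that are illegal in file names (/, :, ?, and so on), and the open() call fails if the video folder does not exist. A small sketch of both fixes; safe_filename is a helper I am adding here, not part of the original code:

```python
import os
import re

def safe_filename(title):
    # Replace characters that Windows and most filesystems reject in names.
    return re.sub(r'[\\/:*?"<>|]', '_', title).strip()

# Create the output folder up front so open('video/...') cannot fail on it.
os.makedirs('video', exist_ok=True)

print(safe_filename('AB/CD: test?'))  # AB_CD_ test_
```

With this in place, the save step would open f'video/{safe_filename(title)}.mp4' instead of using the raw title.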
Complete code
import requests
import pprint

headers = {
    'cookie': '',  # add your own
    'referer': 'https://weibo.com/tv/channel/4379160563414111/editor',
    'user-agent': '',  # add your own
}
data = {'data': '{"Component_Channel_Editor":{"cid":"4379160563414111","count":9}}'}
url = 'https://www.weibo.com/tv/api/component?page=/tv/channel/4379160563414111/editor'
json_data = requests.post(url=url, headers=headers, data=data).json()
print(json_data)
ccs_list = json_data['data']['Component_Channel_Editor']['list']
next_cursor = json_data['data']['Component_Channel_Editor']['next_cursor']
for ccs in ccs_list:
    oid = ccs['oid']
    title = ccs['title']
    data_1 = {'data': '{"Component_Play_Playinfo":{"oid":"' + oid + '"}}'}
    url_1 = 'https://weibo.com/tv/api/component?page=/tv/show/' + oid
    json_data_2 = requests.post(url=url_1, headers=headers, data=data_1).json()
    dict_urls = json_data_2['data']['Component_Play_Playinfo']['urls']
    video_url = "https:" + dict_urls[list(dict_urls.keys())[0]]
    print(title + "\t" + video_url)
    video_data = requests.get(video_url).content
    with open(f'video\\{title}.mp4', mode='wb') as f:
        f.write(video_data)
    print(title, "crawled successfully")

Thank you for reading this article carefully. I hope "how the Python crawler collects Weibo video data" has been helpful to you. More related knowledge is waiting for you in the industry information channel!
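One loose end in the complete code: next_cursor is extracted but never used, so only the first batch of nine videos is collected. A sketch of how a cursor could drive pagination is below; fetch_channel_page is a stand-in for the real requests.post call, and whether the Weibo API accepts a cursor this way is an assumption, not something the article confirms:

```python
# Hypothetical cursor-driven pagination. PAGES simulates an API that returns
# a batch of videos plus a next_cursor, with None meaning "no more pages".
PAGES = {
    None: {"list": ["v1", "v2"], "next_cursor": "c1"},
    "c1": {"list": ["v3"], "next_cursor": None},
}

def fetch_channel_page(cursor):
    # Stand-in for requests.post(...).json() with the cursor in the payload.
    return PAGES[cursor]

def crawl_all():
    videos, cursor = [], None
    while True:
        page = fetch_channel_page(cursor)
        videos.extend(page["list"])
        cursor = page["next_cursor"]
        if not cursor:   # no further pages
            break
    return videos

print(crawl_all())  # ['v1', 'v2', 'v3']
```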
© 2024 shulou.com SLNews company. All rights reserved.