
How to grab Alibaba cloud disk resources with Python


This article introduces how to grab Alibaba cloud disk resources with Python, walking through the process with a working example. The method is simple, fast, and practical; I hope it helps you solve the problem.

Web page analysis

The site offers two search routes, search route 1 and search route 2; search route 2 is used in this article.

Open the Network panel in the browser's developer tools and you will see a GET request to search.html.

The request carries several parameters; four of them are key (a small sketch showing how they combine into the request URL follows the list):

page: the page number

keyword: the search keyword

category: the file category: all, video, image, doc, audio, zip, others; the script defaults to all

search_model: which search route to use
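
As a minimal sketch of how these parameters form the search request: the endpoint and parameter names are taken from the full script in the next section, and the keyword value here is just a placeholder.

import requests

# Minimal sketch: combine the four key parameters into the search URL.
# Endpoint and parameter names come from the full script below; the
# keyword value is a placeholder.
params = {'page': 1, 'keyword': 'example', 'category': 'all', 'search_model': 0}
request = requests.Request('GET', 'https://www.alipanso.com/search.html', params=params)
print(request.prepare().url)
# https://www.alipanso.com/search.html?page=1&keyword=example&category=all&search_model=0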

Also in the Network panel, you can see that each result jumps to Alibaba cloud disk, and the real jump link sits above the title. Use bs4 to parse the div (class="resource-item border-dashed-eee") tags on the page to get the jump addresses to the network disk, and parse the p tag under each div to get the resource date.
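
To make the parsing step concrete, here is a small sketch run against a hand-written fragment; the real markup of the results page may differ, and only the class name and tag layout described above are assumed.

from bs4 import BeautifulSoup

# Hand-written fragment mimicking one search result item; the real page
# markup may differ, only the class name and tag layout are assumed.
html = '''
<div class="resource-item border-dashed-eee">
  <a href="/active/xxxx.html">Sample resource title</a>
  <p>time: 2022-03-01</p>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', class_='resource-item border-dashed-eee')
print(div.find('a').get('href'))           # jump address to the network disk
print(div.find('p').get_text(strip=True))  # resource date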

Capture and parse

First install the third-party libraries the script depends on: bs4 (Beautiful Soup) for parsing the pages and requests for fetching them.

pip3 install beautifulsoup4 requests

Below is the script for crawling and parsing the pages, with results sorted in descending order by date. The original listing breaks off at the "if len(divs)" check, so everything after that point is a reconstruction based on the parsing steps described above.

import requests
from bs4 import BeautifulSoup

word = input('Please enter the name of the resource to search for: ')
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/96.0.4664.45 Safari/537.36'
}
result_list = []
for i in range(1, 11):
    print('Searching page {}'.format(i))
    params = {
        'page': i,
        'keyword': word,
        'search_folder_or_file': 0,
        'is_search_folder_content': 0,
        'is_search_path_title': 0,
        'category': 'all',
        'file_extension': 'all',
        'search_model': 0,
    }
    response_html = requests.get('https://www.alipanso.com/search.html',
                                 headers=headers, params=params)
    soup = BeautifulSoup(response_html.content.decode(), 'html.parser')
    divs = soup.find_all('div', class_='resource-item border-dashed-eee')
    if len(divs) <= 0:
        # No result items on this page: stop paging.
        break
    # The source listing breaks off above; the loop body below is
    # reconstructed from the parsing steps described in the article.
    for div in divs:
        link = div.find('a')    # jump link to the network disk
        date_p = div.find('p')  # resource date
        if link is None or date_p is None:
            continue
        result_list.append({
            'title': link.get_text(strip=True),
            'link': link.get('href'),
            'date': date_p.get_text(strip=True),
        })

# Sort in descending order by date (assumes the dates are in a
# sortable format such as YYYY-MM-DD).
result_list.sort(key=lambda item: item['date'], reverse=True)
for item in result_list:
    print(item['date'], item['title'], item['link'])
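
Run the script with python3 and enter a keyword at the prompt: it walks up to ten result pages, stops early once a page returns no items, and prints the date, title, and network-disk jump link of each resource, newest first.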
