This article introduces how to grab Alibaba cloud disk resources with Python. The steps are demonstrated through a practical example; the method is simple, fast, and practical, and I hope it helps you solve this problem.
Web page analysis
The site provides two search routes, search route 1 and search route 2; the latter is the one used in this article.
Open the Network tab in the browser developer tools and you can see a GET request to search.html.
The request carries several parameters, four of which matter (a minimal request using them is sketched after this list):
page: page number
keyword: search keyword
category: file category; one of all, video, image, doc, audio, zip, others (the script uses all by default)
search_model: which search route to use
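For illustration, here is a minimal sketch of that search request using only the four key parameters; the endpoint URL and the search_model value are taken from the full script further below, and the default values are assumptions.
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
}
# Only the four key parameters described above; the endpoint and the
# search_model value follow the full script below (assumption).
params = {
    'page': 1,
    'keyword': 'example keyword',
    'category': 'all',
    'search_model': 0,
}
response = requests.get('https://www.alipanso.com/search.html',
                        headers=headers, params=params)
print(response.status_code, len(response.text))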
Also in the developer tools, you can see that this page jumps to the Alibaba cloud disk page, and the real link sits in the title element. Use bs4 to parse the div (class="resource-item border-dashed-eee") tags on the page to get the jump address of the network disk, and parse the p tag under each div to get the resource date.
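A small sketch of this parsing step, assuming the HTML structure described above; the sample HTML below is a made-up illustration, not a capture of the live page.
from bs4 import BeautifulSoup

# Made-up sample following the structure described above; in the full script
# the real content comes from the search.html response.
html = '''
<div class="resource-item border-dashed-eee">
  <a href="https://www.alipanso.com/link/xxxx" title="Example resource">Example resource</a>
  <p>2022-05-01</p>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
for div in soup.find_all('div', class_='resource-item border-dashed-eee'):
    link = div.find('a')   # jump address to the network disk page
    date = div.find('p')   # resource date
    print(link['href'], link['title'], date.get_text(strip=True))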
Capture and parse
First, install the bs4 third-party library used to parse the pages.
pip3 install bs4
Below is the script that crawls and parses the pages; the results are sorted by date in descending order.
import requests
from bs4 import BeautifulSoup

word = input('Please enter the name of the resource to search for:')
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
}
result_list = []
for i in range(1, 11):
    print('Searching page {}'.format(i))
    params = {
        'page': i,
        'keyword': word,
        'search_folder_or_file': 0,
        'is_search_folder_content': 0,
        'is_search_path_title': 0,
        'category': 'all',
        'file_extension': 'all',
        'search_model': 0
    }
    response_html = requests.get('https://www.alipanso.com/search.html',
                                 headers=headers, params=params)
    response_data = response_html.content.decode()
    soup = BeautifulSoup(response_data, "html.parser")
    divs = soup.find_all('div', class_='resource-item border-dashed-eee')
    if len(divs) == 0:
        # no more results, stop paging
        break
    # The original snippet is truncated from this point on; the lines below are
    # a plausible reconstruction based on the page structure described above.
    for div in divs:
        link = div.find('a')
        p = div.find('p')
        result_list.append({
            'title': link.get('title', link.get_text(strip=True)) if link else '',
            'url': link['href'] if link else '',
            'date': p.get_text(strip=True) if p else ''
        })

# Reconstructed: sort by the date string in descending order (assumes
# YYYY-MM-DD style dates) and print the collected results.
result_list.sort(key=lambda item: item['date'], reverse=True)
for item in result_list:
    print(item['date'], item['title'], item['url'])