2025-04-05 Update From: SLTechnology News & Howtos
Shulou(Shulou.com)06/02 Report--
Many readers don't know how to crawl the material download links of Aitu.com (www.aiimg.com) with Python. This article walks through the problem and its solution step by step; hopefully it helps you solve the same problem.
Preface
Usually we crawl the pictures themselves directly, but sometimes you only want individual materials. What should you do then?
Project goal
Crawl the download addresses of materials on Aitu.com.
Clicking on a material opens its detail page, where you can see the local download address. Copying the download links of several materials gives:
http://www.aiimg.com/sucai.php?open=1&aid=126632&uhash=70a6d2ffc358f79d9cf71392
http://www.aiimg.com/sucai.php?open=1&aid=126630&uhash=99b07c347dc24533ccc1c144
http://www.aiimg.com/sucai.php?open=1&aid=126634&uhash=d7e8f7f02f57568e280190b4
The aid differs between links, so it should be the ID of each material. But what is the uhash that follows it?
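Before hunting for where uhash comes from, it helps to see exactly which query parameters each copied link carries. The sketch below uses Python's standard `urllib.parse` to split one of the links above into its parameters; `parse_sucai_link` is a hypothetical helper name, not part of the site or any library.

```python
from urllib.parse import urlparse, parse_qs

def parse_sucai_link(link):
    """Split a copied download link into its query parameters.

    Returns a dict such as {'open': '1', 'aid': '126632', 'uhash': '...'}.
    """
    query = parse_qs(urlparse(link).query)
    # parse_qs wraps every value in a list; unwrap the single values
    return {key: values[0] for key, values in query.items()}

link = 'http://www.aiimg.com/sucai.php?open=1&aid=126632&uhash=70a6d2ffc358f79d9cf71392'
params = parse_sucai_link(link)
print(params['aid'])    # 126632
print(params['uhash'])  # 70a6d2ffc358f79d9cf71392
```

This confirms that aid is the only parameter we can predict on our own, which is why the next step is to look for the full link inside the page itself.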
My first thought was that some interface data in the page might expose this parameter directly, but searching for it in the developer tools turned up nothing. The next step is to check whether the download link itself appears in the page source; if it does, we can grab it and download directly.
As it turns out, all the data we need is in the tags of the web page, so we simply request the page and work with the returned HTML.
import requests

url = 'http://www.aiimg.com/list.php?tid=1&ext=0&free=2&TotalResult=5853&PageNo=1'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
print(response.text)
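The list URL requested above only covers one page of materials. Judging by the PageNo parameter in that URL, flipping through the list should just mean changing PageNo; the sketch below builds the URLs for several pages under that assumption. `build_list_url` is a hypothetical helper, and the tid and TotalResult values are simply copied from the URL above.

```python
def build_list_url(page_no, tid=1, total_result=5853):
    """Build the URL of one page of the material list.

    tid and TotalResult are copied from the captured list URL; PageNo
    appears to be the only parameter that changes between pages.
    """
    return ('http://www.aiimg.com/list.php'
            '?tid={}&ext=0&free=2&TotalResult={}&PageNo={}').format(tid, total_result, page_no)

# The first three list pages the crawler would request in turn:
for page_no in range(1, 4):
    print(build_list_url(page_no))
```

A multi-page crawl would then call requests.get on each of these URLs with the same headers as above.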
Parsing the crawled data
import parsel

selector = parsel.Selector(response.text)
lis = selector.css('.imglist_d ul li a::attr(href)').getall()
for li in lis:
    num_id = li.replace('.html', '').split('/')[-1]
    new_url = 'http://www.aiimg.com/sucai.php?aid={}'.format(num_id)
    response_2 = requests.get(url=new_url, headers=headers)
    selector_2 = parsel.Selector(response_2.text)
    data_url = selector_2.css('.downlist a.down1::attr(href)').get()
    title = selector_2.css('.toart a::text').get()
    download_url = 'http://www.aiimg.com' + data_url
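The loop above ends with download_url and title but never saves anything to disk. A minimal sketch of that last step is shown below; the folder name, the .rar extension, and both helper names (`safe_filename`, `download`) are assumptions for illustration, not part of the article's original code, and the actual archive format on Aitu.com may differ.

```python
import os
import re

def safe_filename(title, ext='.rar'):
    # Replace characters that are illegal in Windows/Unix filenames
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    return cleaned + ext

def download(session, download_url, title, folder='aiimg'):
    """Fetch one material and write it under `folder`, returning the path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, safe_filename(title))
    resp = session.get(download_url, timeout=30)
    resp.raise_for_status()
    with open(path, 'wb') as f:
        f.write(resp.content)
    return path

print(safe_filename('PSD: poster/template'))
```

Inside the parsing loop, `download(requests.Session(), download_url, title)` would then save each material as it is found; using a Session reuses the TCP connection across the many small requests.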
After reading the above, have you mastered how to crawl Aitu.com material download links with Python? If you want to learn more skills or go deeper into the topic, you are welcome to follow the industry information channel. Thank you for reading!