Shulou (Shulou.com), 06/01 report, updated 2025-01-17. Source: SLTechnology News&Howtos
This article explains how to use Python to crawl images from web pages that load their content asynchronously. The explanation is simple and clear, so follow along step by step.
What is asynchronous loading?
To understand asynchronous loading, first consider how traditional web pages are loaded. A traditional page has to be refreshed and reloaded in full whenever its content changes, which wastes a lot of resources. The now widely used asynchronous loading technique, AJAX (Asynchronous JavaScript and XML), lets an interactive web application update part of a page's data without reloading the whole page. For example, as you keep scrolling down, new content keeps appearing while the page layout and URL stay unchanged. This saves a great deal of network resources.
Besides noticing content that updates while the URL stays the same, you can tell whether a page uses asynchronous loading by checking whether the data appears in the page source (right-click and choose View Page Source).
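The page-source check can be sketched as a plain substring test. This is a minimal illustration, not the article's own code: the function name `data_in_source` and the two sample HTML strings are hypothetical. In practice `html` would be the raw text returned by `requests.get(url).text`, and `keyword` some text you can see in the browser.

```python
def data_in_source(html: str, keyword: str) -> bool:
    """Return True if keyword appears in the raw HTML sent by the server.

    If text visible in the browser is missing from the raw HTML, it was
    most likely inserted later by JavaScript (asynchronous loading).
    """
    return keyword in html


# Hypothetical static page: the text is already in the source
static_page = "<ul id='flow'><li>Book cover photo</li></ul>"
# Hypothetical asynchronously loaded page: the server sends an empty
# container, and JavaScript fills it in after the page loads
ajax_page = "<ul id='flow'></ul><script src='app.js'></script>"

print(data_in_source(static_page, "Book cover photo"))  # True
print(data_in_source(ajax_page, "Book cover photo"))    # False
```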
On an asynchronously loaded page, the text you see is not in the HTML source, so the conventional parsing libraries (for example regular expressions, BeautifulSoup, or lxml) cannot extract it from the downloaded page. Instead, you need to work backwards to find out how the page loads its data. This process is called reverse engineering.
How do you reverse engineer a page?
Taking the Pexels website as an example, let's look at how reverse engineering can be implemented:
Open the Pexels website in Chrome, right-click and choose Inspect to open Developer Tools, then select the Network tab.
Keep scrolling down the page manually and you will see new XHR requests loading continuously. Comparing their URLs reveals a fixed pattern in how they change:
https://www.pexels.com/search/book/?format=js&seed=&page=2&type=
https://www.pexels.com/search/book/?format=js&seed=&page=3&type=
https://www.pexels.com/search/book/?format=js&seed=&page=4&type=
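Spotting which part of the URL changes can also be done programmatically. The sketch below, a hypothetical helper named `varying_params` built only on the standard library's `urllib.parse`, parses the query string of each captured URL and reports the parameters whose values differ between requests.

```python
from urllib.parse import parse_qs, urlsplit


def varying_params(urls):
    """Return the names of query parameters whose values differ across urls."""
    # keep_blank_values=True keeps empty parameters such as "seed=" and "type="
    parsed = [parse_qs(urlsplit(u).query, keep_blank_values=True) for u in urls]
    all_keys = set().union(*parsed)
    return sorted(
        key for key in all_keys
        if len({tuple(p.get(key, [])) for p in parsed}) > 1
    )


urls = [
    "https://www.pexels.com/search/book/?format=js&seed=&page=2&type=",
    "https://www.pexels.com/search/book/?format=js&seed=&page=3&type=",
    "https://www.pexels.com/search/book/?format=js&seed=&page=4&type=",
]
print(varying_params(urls))  # ['page']
```

Only `page` changes between the captured requests, which matches what the XHR list shows by eye.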
Deleting parts of these URLs shows that https://www.pexels.com/search/book/?page=2 also returns a normal web page.
In this way we have worked backwards to the real URL behind the page and learned how it changes from page to page (only the page parameter increases), so data crawling can begin.
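Once the pattern is known, crawling comes down to generating one URL per page number and stopping when a page returns nothing. The following is a minimal sketch under stated assumptions: `page_url` and `crawl_pages` are hypothetical helpers, the "stop on an empty page" rule and the `max_pages` cap are assumptions rather than documented site behavior, and `fetch` stands for any function you supply that downloads one page and returns the items found on it (for example with requests and lxml). A fake in-memory site is used here so the logic can run without network access.

```python
from urllib.parse import urlencode

BASE = "https://www.pexels.com/search/book/"


def page_url(page: int) -> str:
    """Build the simplified page URL found by reverse engineering."""
    return f"{BASE}?{urlencode({'page': page})}"


def crawl_pages(fetch, start: int = 1, max_pages: int = 50) -> list:
    """Call fetch(url) for consecutive pages until one returns no items.

    max_pages is a safety cap (an assumption) so the loop always ends.
    """
    results = []
    for page in range(start, start + max_pages):
        items = fetch(page_url(page))
        if not items:  # assumed: an empty page means there is no more data
            break
        results.extend(items)
    return results


# Hypothetical stand-in for real network fetching
fake_site = {page_url(1): ["a.jpg", "b.jpg"], page_url(2): ["c.jpg"]}
print(page_url(2))  # https://www.pexels.com/search/book/?page=2
print(crawl_pages(lambda url: fake_site.get(url, [])))  # ['a.jpg', 'b.jpg', 'c.jpg']
```

Passing `fetch` in as a parameter keeps the paging logic separate from the HTTP and parsing details, which differ from site to site.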
Because Pexels has a fairly strict anti-hotlinking mechanism (we will discuss how to get around it later), we practice on the hippopx image website instead. It also hosts a large number of attractive, free, copyright-free images that are worth a look.
The full code is as follows:
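import os

import requests
from lxml import etree

# Fill in these values from the request headers shown in Chrome DevTools
headers = {
    "accept": "xxxx",
    "cookie": "xxxx",
    "User-Agent": "xxxx",
    "Referer": "xxxx",  # the HTTP header is spelled "Referer"
}

photo_urls = []
file = 'store path'  # directory where the images will be saved
os.makedirs(file, exist_ok=True)

url = 'https://www.hippopx.com/'
html = requests.get(url, headers=headers)
selector = etree.HTML(html.text)

# Every image sits at <... id="flow"><li><figure><a><img>
imgs = selector.xpath('//*[@id="flow"]/li/figure/a/img')
for img in imgs:
    photo = img.get('src')
    if photo:  # skip <img> tags without a src attribute
        photo_urls.append(photo)

for item in photo_urls:
    print(item)
    data = requests.get(item, headers=headers)
    # Name each file after the last segment of its URL
    with open(os.path.join(file, item.split('/')[-1]), 'wb') as fp:
        fp.write(data.content)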
When run, the script prints the URL of each image it finds and saves every image to the storage directory.
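One fragile spot in the script above is naming files with `item.split('/')[-1]`: if an image URL carries a query string (for example `?w=640`), that text ends up in the filename. Whether hippopx URLs actually include query strings is not stated here, so treat this as an optional hardening step. The sketch below uses a hypothetical helper, `filename_from_url`, that keeps only the last segment of the URL path; the example URL is made up.

```python
import os
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str, default: str = "image.jpg") -> str:
    """Return the last path segment of url, ignoring any query string or fragment."""
    path = urlsplit(url).path          # e.g. "/photos/cat.jpg"
    name = os.path.basename(unquote(path))
    return name or default             # fall back when the path ends in "/"


print(filename_from_url("https://example.com/photos/cat.jpg?w=640"))  # cat.jpg
print(filename_from_url("https://example.com/"))                      # image.jpg
```

In the download loop, `filename_from_url(item)` would replace `item.split('/')[-1]`.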
Thank you for reading. That covers how to crawl images from asynchronously loaded pages with Python. How the technique applies to a specific site still needs to be verified in practice.
© 2024 shulou.com SLNews company. All rights reserved.