In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/01 Report--
How to crawl Ajax data), in view of this problem, this article introduces the corresponding analysis and answer in detail, hoping to help more partners who want to solve this problem to find a more simple and easy way.
About Ajax: in fact, many web pages are not loaded at once. But browse and load at the same time. Like the pictures in Jinri Toutiao, after watching the loaded part, slide down, and then load some pictures, but the url has not changed, at this time, the page loading is Ajax loading. His principle is also to send requests, parse content, and render the page, but he ensures that the page will not be refreshed and the url will not be changed. But the page is updating the data.
1. Analyze Ajax
First, open the web page of Jinri Toutiao, enter street photos in the search bar, and switch to developer mode (F12).
Click XMR, because Ajax implements the XmlHttpRequest object at the bottom, abbreviated XMR
So under XMR, there are all Ajax requests.
Refresh the page again, continue to slide down, and we can see some requests:
When we analyze Request URL, we find that only the offset parameter has changed.
We can just write the other parameters the same.
In this way, we can construct a get_html () method that fetches the page as follows.
If you have read all the previous articles, it is not difficult to understand this method.
2. Find the address of the picture
Knowing where the picture is, we write a get_image_url () method to get a link to the picture.
Here, the last article on json data talked about how to obtain values: the storage of data (1)
3. Download pictures and classify and save them.
Here's what we need to say:
Os.chdir () is to change the current path, because I'm using Notepad++, so I need to change it.
Otherwise, the picture will be in the location where Notepad++ is installed.
Os.path.exists () exists and returns True, otherwise vice versa.
Os.mkdir () is the svse_image () method that creates a folder
Using md5 encryption to achieve different picture names
Download the picture in the previous article also talked about: python the second largest artifact requests
4. Centralized integration to achieve multi-process download
To implement multiple processes, you need to import the Pool library
From multiprocessing.pool import Pool
If you have read all the previous articles, this article should not be difficult. It is important to understand the analysis of Ajax. How to find it.
Finally, the following is the effect picture:
This is the answer to the question about how to crawl Ajax data. I hope the above content can be of some help to you. If you still have a lot of doubts to solve, you can follow the industry information channel for more related knowledge.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.