Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to crawl Ajax data)

2025-02-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

How to crawl Ajax data), in view of this problem, this article introduces the corresponding analysis and answer in detail, hoping to help more partners who want to solve this problem to find a more simple and easy way.

About Ajax: in fact, many web pages are not loaded at once. But browse and load at the same time. Like the pictures in Jinri Toutiao, after watching the loaded part, slide down, and then load some pictures, but the url has not changed, at this time, the page loading is Ajax loading. His principle is also to send requests, parse content, and render the page, but he ensures that the page will not be refreshed and the url will not be changed. But the page is updating the data.

1. Analyze Ajax

First, open the web page of Jinri Toutiao, enter street photos in the search bar, and switch to developer mode (F12).

Click XMR, because Ajax implements the XmlHttpRequest object at the bottom, abbreviated XMR

So under XMR, there are all Ajax requests.

Refresh the page again, continue to slide down, and we can see some requests:

When we analyze Request URL, we find that only the offset parameter has changed.

We can just write the other parameters the same.

In this way, we can construct a get_html () method that fetches the page as follows.

If you have read all the previous articles, it is not difficult to understand this method.

2. Find the address of the picture

Knowing where the picture is, we write a get_image_url () method to get a link to the picture.

Here, the last article on json data talked about how to obtain values: the storage of data (1)

3. Download pictures and classify and save them.

Here's what we need to say:

Os.chdir () is to change the current path, because I'm using Notepad++, so I need to change it.

Otherwise, the picture will be in the location where Notepad++ is installed.

Os.path.exists () exists and returns True, otherwise vice versa.

Os.mkdir () is the svse_image () method that creates a folder

Using md5 encryption to achieve different picture names

Download the picture in the previous article also talked about: python the second largest artifact requests

4. Centralized integration to achieve multi-process download

To implement multiple processes, you need to import the Pool library

From multiprocessing.pool import Pool

If you have read all the previous articles, this article should not be difficult. It is important to understand the analysis of Ajax. How to find it.

Finally, the following is the effect picture:

This is the answer to the question about how to crawl Ajax data. I hope the above content can be of some help to you. If you still have a lot of doubts to solve, you can follow the industry information channel for more related knowledge.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report