
How to grab dynamically loaded data from a web page without using the selenium plug-in

2025-03-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains how to get the dynamically loaded data on a web page without using the selenium plug-in to simulate a browser. It walks through the analysis and the solution in detail, in the hope of giving readers with this problem a simpler way to solve it.

The steps are as follows:

1. Find the correct URL.

2. Fill in the parameters that go with the URL.

3. Convert the parameters to a string that urllib can send.

4. Initialize the Request object.

5. Call urlopen on the Request object to get the data.

# Python 2 (urllib and urllib2)
import urllib
import urllib2

url = 'http://www.*****.*****/*********'

# year, month, day: the values you want to query
formdata = {'year': year,
            'month': month,
            'day': day}

data = urllib.urlencode(formdata)

request = urllib2.Request(url, data=data)  # use urllib2.Request(url) if the URL takes no parameters

r = urllib2.urlopen(request)

html = r.read()  # html is the data you want: HTML, JSON, or another format
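The code above targets Python 2. In Python 3 the same calls live in urllib.parse and urllib.request, so the five steps could be sketched as below. The helper names build_request and fetch, the example.invalid host, and the sample date values are illustrative assumptions, not part of the original article.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(url, formdata=None):
    """Steps 2-4: encode the parameters and wrap them in a Request."""
    if not formdata:
        return Request(url)  # no parameters: a plain GET request
    data = urlencode(formdata).encode("utf-8")  # Python 3 expects bytes
    return Request(url, data=data)  # passing data makes urllib send a POST


def fetch(request, timeout=10):
    """Step 5: open the request and return the raw response body."""
    with urlopen(request, timeout=timeout) as response:
        return response.read()


# example.invalid is a placeholder host, not the article's target site,
# so this demo only builds the request and does not call fetch().
request = build_request(
    "http://example.invalid/query",
    {"year": 2020, "month": 3, "day": 18},
)
print(request.get_method(), request.data)
```

Calling fetch(request) against a real URL returns the body as bytes, which can then be decoded and parsed.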

These steps are always the same; the key is how to find the URL and the parameters. Let's take the COVID-19 epidemic statistics page as an example (https://news.qq.com/zt2020/page/feiyan.htm#/).

If you fetch the URL shown in the browser's address bar directly, you get an HTML page with no data in it. It contains only the title, column names and so on, with no cumulative confirmed cases, cumulative deaths, or other figures. That is because the data on this page is loaded dynamically; it is not a static HTML page. You need to follow the steps above to get the data, and the key is finding the URL and the corresponding formdata parameters. Here is how to find both in Firefox.

Right-click on the epidemic page and choose Inspect Element from the menu that appears.

Click the Network tab at the top of the developer tools, and then refresh the page.

Many network records will appear. Look at the Size column on the far right, which shows how much data each HTTP request transferred. Dynamically loaded data is generally larger than other page elements; here the 119 KB entry stands out against the others, which are measured in bytes. Some decorative images on the page are also large, so use the Type column to screen them out.
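The same screening can be done in code. The sketch below assumes the Network panel's records have been exported as a HAR file and loaded with json.load; it relies on the HAR fields request.url, response.content.size and response.content.mimeType. The function name, the skip list, and the sample records are made up for illustration.

```python
def rank_data_requests(har, skip_prefixes=("image/", "font/", "text/css")):
    """Return (size, mime_type, url) tuples, largest first, skipping decorations."""
    rows = []
    for entry in har["log"]["entries"]:
        content = entry["response"]["content"]
        mime = content.get("mimeType", "")
        if mime.startswith(skip_prefixes):  # drop images, fonts, stylesheets
            continue
        rows.append((content.get("size", 0), mime, entry["request"]["url"]))
    return sorted(rows, reverse=True)


# Made-up records in HAR shape, standing in for a real export.
sample_har = {"log": {"entries": [
    {"request": {"url": "https://example.invalid/logo.png"},
     "response": {"content": {"size": 250000, "mimeType": "image/png"}}},
    {"request": {"url": "https://example.invalid/api/data"},
     "response": {"content": {"size": 119000, "mimeType": "application/json"}}},
    {"request": {"url": "https://example.invalid/index.html"},
     "response": {"content": {"size": 4000, "mimeType": "text/html"}}},
]}}

ranked = rank_data_requests(sample_har)
for size, mime, url in ranked:
    print(size, mime, url)
```

The largest remaining entry is only a candidate; it still has to be checked by hand to confirm it carries the data you want.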

Then click the corresponding row in the Domain column.

The request URL appears under Headers. This is the url. Click the parameters tab to see the parameters that go with the url.

https://view.inews.qq.com/g2/getOnsInfo?name=disease_h6&callback=jQuery341004532487105727312_1584498763134&_=1584498763135

Look at the tail of the url: the parameters are already written out after the question mark.
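Splitting a captured URL into its base address and its parameters does not have to be done by eye. A minimal Python 3 sketch using urllib.parse on the URL above (the variable names are illustrative):

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit

captured = ("https://view.inews.qq.com/g2/getOnsInfo"
            "?name=disease_h6"
            "&callback=jQuery341004532487105727312_1584498763134"
            "&_=1584498763135")

parts = urlsplit(captured)

# Everything before the question mark: scheme, host and path.
base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Everything after it, as a dict; keep_blank_values keeps empty parameters.
formdata = dict(parse_qsl(parts.query, keep_blank_values=True))

print(base_url)
print(formdata)
```

All values come back as strings, so the timestamp in "_" is the string "1584498763135" rather than an integer.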

If we use the url with the parameters already in it, then

request = urllib2.Request(url)

with no data argument.

If you use request = urllib2.Request(url, data=data), then

url = 'https://view.inews.qq.com/g2/getOnsInfo'

formdata = {'name': 'disease_h6',
            'callback': '',
            '_': int(time.time() * 1000)}  # current timestamp in milliseconds (requires import time)

name is disease_h6. callback is the name of the page's callback function; we do not need a callback, so it is set to an empty string. _ is a timestamp (easy to obtain in Python), because the number of cases being queried is closely tied to the time of the query.
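If a callback name is sent anyway, a server of this kind typically wraps the JSON in a function call (JSONP). The sketch below accepts both shapes, assuming the wrapper looks like name(...) with an optional trailing semicolon. The function name and the sample strings are made up; they are not real responses from this site.

```python
import json
import re

# name( ... ) with an optional trailing semicolon; DOTALL lets the payload span lines.
JSONP_PATTERN = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def parse_json_or_jsonp(text):
    """Parse plain JSON, or strip a JSONP wrapper first when one is present."""
    match = JSONP_PATTERN.match(text)
    payload = match.group(1) if match else text
    return json.loads(payload)


wrapped = parse_json_or_jsonp('jQuery123_456({"ret": 0});')
plain = parse_json_or_jsonp('{"ret": 0}')
print(wrapped, plain)
```

Leaving callback empty, as described above, avoids the wrapper in the first place; the helper is only a fallback for URLs copied unchanged from the browser.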

If everything is written into one url, it takes the following form:

stamp = time.time()  # current Unix time in seconds (requires import time)
url = 'https://view.inews.qq.com/g2/getOnsInfo?name=disease_h6&callback=&_=%d' % int(stamp * 1000)
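For comparison, here is a Python 3 sketch that builds the request both ways from the same parameters. The function names are illustrative, and the optional stamp argument exists only so the result is reproducible. One difference worth knowing: urllib sends a GET when the parameters are in the URL and a POST when they are passed as data. Whether this endpoint accepts both, or is still online, is not something the article confirms.

```python
import time
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://view.inews.qq.com/g2/getOnsInfo"


def build_params(stamp=None):
    """Parameters from the article; the timestamp is in milliseconds."""
    stamp = time.time() if stamp is None else stamp
    return {"name": "disease_h6", "callback": "", "_": int(stamp * 1000)}


def request_with_query_string(stamp=None):
    """Option 1: parameters written into the URL (sent as GET)."""
    return Request(BASE_URL + "?" + urlencode(build_params(stamp)))


def request_with_form_data(stamp=None):
    """Option 2: parameters passed as data (sent as POST)."""
    data = urlencode(build_params(stamp)).encode("utf-8")
    return Request(BASE_URL, data=data)


get_request = request_with_query_string(stamp=1584498763)
post_request = request_with_form_data(stamp=1584498763)
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```

Either Request object can then be passed to urllib.request.urlopen to fetch the data.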

Following this approach, the epidemic data can be obtained. Either of the two forms can be used.

Finding the url and parameters takes patience and analytical skill: you have to work out what the url and each parameter mean before you can program against them correctly. Whether a parameter can be empty, whether it can be hard-coded, and whether it has special requirements are all questions that test your experience.

Some urls are very simple and return a .dat file that is directly JSON data; these are the friendliest. Others require a large number of parameters before they return anything, and what they return is HTML that has to be parsed to extract the data.
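The two cases can be handled by one helper that tries JSON first and falls back to HTML parsing. This is a minimal sketch using only the standard library; it assumes the HTML case keeps its values in table cells, and the class name, function name, and sample inputs are made up for illustration.

```python
import json
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def extract(body):
    """Return parsed JSON when possible, otherwise the list of <td> texts."""
    text = body.decode("utf-8") if isinstance(body, bytes) else body
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        parser = TableCellCollector()
        parser.feed(text)
        parser.close()
        return [cell.strip() for cell in parser.cells]


from_json = extract(b'{"confirm": 10}')
from_html = extract("<table><tr><td>confirm</td><td> 10 </td></tr></table>")
print(from_json, from_html)
```

Real pages rarely keep everything in plain table cells, so the HTML branch usually needs to be adapted to the page's actual markup.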

That is how to grab dynamically loaded data from a web page without using the selenium plug-in. I hope the above is of some help to you.
