How to get JS dynamic content in Python

2025-07-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains, in some detail, how to get JS dynamic content in Python. I hope it is a useful reference.

Sometimes the news shown on a web page cannot be found in its HTML source code, because it is all dynamically generated and loaded by JS.

In this case, how should we crawl the web page? There are two ways:

1. Find the JSON data returned by the JS script from the web page response;
2. Use Selenium to simulate the web page visit.

Only the first method is introduced here; the use of Selenium is covered in a separate article.

Find the JSON data returned by the JS script from the web page response

Even when the page content is dynamically generated and loaded by JS, the JS still has to call an interface and then render the page from the JSON data that the interface returns.

So we can find the data interface that the JS calls, and get from it the same data that is finally rendered in the web page.
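One way to narrow down which response is the data interface is to check which response bodies actually parse as JSON. The sketch below assumes the bodies were copied by hand (for example from the browser's network panel); the `example.com` URLs and the helper name `is_json_payload` are hypothetical and only illustrate the idea.

```python
import json


def is_json_payload(body):
    """Return True if body parses as a JSON object or array."""
    try:
        parsed = json.loads(body)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(parsed, (dict, list))


# Hypothetical response bodies keyed by the URL they came from.
captured = {
    "http://example.com/index.html": "<html><body><div id='news'></div></body></html>",
    "http://example.com/static/app.js": "function render(items) { /* ... */ }",
    "http://example.com/api/news/": '{"data": [{"title": "Sample headline"}]}',
}

# Keep only the URLs whose body is JSON: these are the candidate data interfaces.
candidates = [url for url, body in captured.items() if is_json_payload(body)]
print(candidates)
```

Here the HTML shell and the script file are rejected, and only the URL that returns JSON remains as a candidate.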

Take Jinri Toutiao as an example to demonstrate:

1. Find the data interface requested by JS

The data returned is the same as the picture news on the home page, so the data we want should be in this interface.

Check out the other links:

This one should be the hot search keywords.

This one is the news listed below the picture news.

Let's open an interface link and take a look: http://www.toutiao.com/api/pc/focus/

Opened directly, the link returns what looks like a string of garbled characters, but when the same data is viewed from the response it is properly encoded:
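A common reason a JSON body looks garbled when viewed raw is that non-ASCII characters are sent as `\uXXXX` escapes. Whether that is what happens with this particular interface is an assumption; the sample string below is made up to show how `json.loads` turns such escapes back into readable text.

```python
import json

# Hypothetical raw body: the title is written with \uXXXX escapes, not literal characters.
raw = '{"title": "\\u4eca\\u65e5\\u5934\\u6761"}'
print(raw)  # shows the escaped form, which looks like garbled text

# json.loads decodes the escapes into ordinary Python strings.
data = json.loads(raw)
print(data["title"])
```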

Now that we have the data interface, we can request it and get its response in the same way as before.

2. Request the data interface and parse its data
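As a rough sketch of requesting a data interface a little more defensively, the helper below adds a timeout, an HTTP status check, and a browser-like `User-Agent` header. The helper name `fetch_json`, the header value, and the 10-second timeout are assumptions for illustration, not something the interface is known to require.

```python
import requests


def fetch_json(url, session=None, timeout=10):
    """GET url and return its body decoded as JSON, raising on HTTP errors."""
    session = session or requests.Session()
    response = session.get(
        url,
        # Assumed header: some sites reject the default requests User-Agent.
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=timeout,
    )
    response.raise_for_status()  # raise requests.HTTPError on 4xx/5xx
    return response.json()       # parse the body as JSON
```

Calling `fetch_json('http://www.toutiao.com/api/pc/focus/')` would then return the parsed dictionary, provided the interface still responds as described in this article.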

Start with the complete code:

# coding:utf-8
import requests
import json

url = 'http://www.toutiao.com/api/pc/focus/'
wbdata = requests.get(url).text

data = json.loads(wbdata)
news = data['data']['pc_feed_focus']

for n in news:
    title = n['title']
    img_url = n['image_url']
    url = n['media_url']
    print(url, title, img_url)

The result returned is as follows:

As usual, here is a brief explanation of the code.

The code is divided into four parts.

Part one: import the required libraries

# coding:utf-8
import requests
import json

Part two: send an HTTP request to the data interface

url = 'http://www.toutiao.com/api/pc/focus/'
wbdata = requests.get(url).text

Part three: parse the HTTP response as JSON and index to the location of the news data

data = json.loads(wbdata)
news = data['data']['pc_feed_focus']
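The chained indexing `data['data']['pc_feed_focus']` raises a `KeyError` if the interface ever returns a different shape (for example an error message). A minimal sketch of a more tolerant lookup follows; the helper name `get_focus_items` and the sample payloads are hypothetical, and only the key names `data` and `pc_feed_focus` come from the article's code.

```python
def get_focus_items(payload):
    """Return payload['data']['pc_feed_focus'] as a list, or [] if the path is missing."""
    data = payload.get("data") if isinstance(payload, dict) else None
    items = data.get("pc_feed_focus") if isinstance(data, dict) else None
    return items if isinstance(items, list) else []


# Hypothetical payload with the shape the article's code expects.
sample = {
    "data": {
        "pc_feed_focus": [
            {
                "title": "Sample headline",
                "image_url": "http://example.com/a.jpg",
                "media_url": "http://example.com/m/1",
            }
        ]
    }
}

print(len(get_focus_items(sample)))           # one item found
print(get_focus_items({"message": "error"}))  # unexpected shape gives an empty list
```

Returning an empty list keeps the later `for` loop safe to run even when nothing was found.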

Part four: traverse the indexed JSON data and extract the fields

for n in news:
    title = n['title']
    img_url = n['image_url']
    url = n['media_url']
    print(url, title, img_url)

That is all on how to obtain JS dynamic content in Python. I hope the above content is helpful to you. If you think the article is good, you can share it for more people to see.
