In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-31 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/01 Report--
Today, I will talk to you about how to crawl the ajax dynamic website, many people may not know much about it. In order to make you understand better, the editor has summarized the following content for you. I hope you can get something according to this article.
What is ajax, to put it simply, after loading a web page, some information you still can not see, you need to click a button to see the data, or some pages have many pages of data, and when you click on the next page, the url address of the page has not changed, but the content has changed, these can be said to be ajax. If you still don't understand, I'll show you the explanation of Baidu encyclopedia. Here is.
Ajax, namely "Asynchronous Javascript And XML" (Asynchronous JavaScript and XML), refers to a web page development technology that creates interactive web page applications.
Ajax = Asynchronous JavaScript and XML (a subset of the standard generic markup language).
Ajax is a technology for creating fast dynamic web pages.
Ajax is a technology that can update parts of a web page without reloading the entire page. [
By exchanging a small amount of data with the server in the background, Ajax can update web pages asynchronously. This means that some part of the page can be updated without reloading the entire page.
Traditional web pages (without Ajax) if you need to update the content, you must reload the entire web page.
For the following example, the most difficult thing I have crawled on the ajax page is the comments on NetEYun Music. If you are interested, you can take a look at using python to crawl NetEase Yun Music and store the data in mysql.
The comments here are loaded by ajax, and the other one who grabs the picture of Jinri Toutiao Girl is also loaded by ajax, but I simplified it. There are many more, let's not talk about it, tell me about the ajax website I'm going to talk about today.
Http://www.kfc.com.cn/kfccda/storelist/index.aspx
This is the facade information of KFC
There are many pages of data, each of which is loaded by ajax. If you directly use python to request the above url, you probably won't get any data. If you don't believe it, you can try it. At this point, we open the developer tools as usual. First clear all the requests, tick the continuous log, and then click on the next page, you will see
The above request is the web page of the ajax request, and there will be the data we need. Let's see what kind of request it is.
It is a post request. The success status code of the request is 200. it is also available on the request url. The following from data is the data we need for post. It is easy to guess that pageIndex is the number of pages, so we can change this value to turn the page.
This page is analyzed, this is to solve the ajax dynamic web page, do not feel very simple, in fact, it is not, just this page is relatively simple, because the form (from data) data is not encrypted, if encrypted, then it is estimated that you find the js file to see how the parameters are encrypted, this is what I wrote before the NetEase music review crawl. Looking at these confused js looking for encryption methods can sometimes give you a headache, so often people will choose to use selenium to crawl, but using these will degrade the performance of the crawler, so this method is not allowed at work. So you have to learn how to deal with these ajax.
Paste the code
Import requests
Page = 1
While True:
Url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname'
Data = {
'Guangzhou 'cname':
'pid':''
'pageIndex': page
'pageSize': '10'
}
Response = requests.post (url, data=data)
Print (response.json ())
If response.json () .get ('Table1','):
Page + = 1
Else:
Break read the above content, do you have any further understanding of how to crawl the ajax dynamic website? If you want to know more knowledge or related content, please follow the industry information channel, thank you for your support.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.