2025-03-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains in detail how to use a Cookie to simulate login so you can browse pages and resources that require authentication. I hope you will have a solid understanding of the topic after reading it.
In practice, the content of many websites is only visible after logging in, so we need to simulate login and crawl in the logged-in state. That is where the Cookie comes in.
Using a Cookie to simulate login
Most websites today use Cookies to track a user's login status: once the site verifies the login information, it saves that information in the browser's cookie. The browser then sends this cookie back to the server as the authentication credential whenever a page on the site is requested.
Because the cookie is stored locally, it can naturally be tampered with and forged. Let's take a look at what a Cookie looks like.
Open the browser's developer tools, load any web page, and on the "Network" tab select a request and look at the Cookie in its request headers:
Let's copy it and have a look:
_guid=137882464.208312339030071800.1455264073383.613; __huid=10POq43DvPO3U0izV0xej4%2BFDIemVPybee0j1Z1xnJnpQ%3D; __hsid=825c7462cc8195a5; somultiswitch=1; __seedSign=68; count=1; sessionID=132730903.3074093016427610600.1483758834211.764; piao34=1; city_code=101280101; customEng=1-7
It consists of key-value pairs separated by semicolons.
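Because the format is just semicolon-separated pairs, the header is easy to pull apart in code. A minimal sketch (`parse_cookie_header` is a hypothetical helper, not part of any library):

```python
def parse_cookie_header(raw):
    """Split a raw Cookie header string into a dict of name/value pairs."""
    jar = {}
    for pair in raw.split(';'):
        name, sep, value = pair.strip().partition('=')
        if sep:  # skip fragments without an '='
            jar[name] = value
    return jar

# A few of the pairs from the Cookie shown above:
print(parse_cookie_header('__hsid=825c7462cc8195a5; somultiswitch=1; count=1'))
# → {'__hsid': '825c7462cc8195a5', 'somultiswitch': '1', 'count': '1'}
```

The resulting dict is in exactly the shape the `cookies` parameter of requests expects, which we will use later.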
Next, let's use the details page of a book on Kankandou as an example to explain the use of the Cookie.
Kankandou is an e-book download site; most of the books on my Kindle were found there.
The example URL is: https://kankandou.com/book/view/22353.html
Normally, a user who is not logged in will not see the download link; the page hides the download link for the book.
The header information is as follows:
Let's take a look at the page after login:
The download link is now displayed. Looking at the Cookie section of the header, it is clearly different from the Cookie in the logged-out state.
Next, we make an HTTP request for the example URL:
# coding:utf-8
import requests
from bs4 import BeautifulSoup

url = 'https://kankandou.com/book/view/22353.html'
wbdata = requests.get(url).text
soup = BeautifulSoup(wbdata, 'lxml')
print(soup)
The results are as follows:
In the result, we find the HTML for the "Book Guide" section, where the download link would appear:
Book guide: Go to Amazon to buy "The Universe Is a Cat's Dream of Sound Sleep".
Just as when we visit this URL in a browser without logging in, only Amazon's purchase link is shown; there is no download link for the electronic edition.
Now let's try the request again, using the Cookie obtained after login.
There are two ways to use the Cookie.
1. Write the Cookie directly into the request header.
The complete code is as follows:
# coding:utf-8
import requests
from bs4 import BeautifulSoup

cookie = ('cisession=19dfd70a27ec0eecf1fe3fc2e48b7f91c7c83c60;'
          'CNZZDATA1000201968=1815846425-1478580135-'
          'https%253A%252F%252Fwww.baidu.com%252F%7C1483922031')
header = {
    # Any desktop browser User-Agent will do here.
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36',
    'Connection': 'keep-alive',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cookie': cookie,
}

url = 'https://kankandou.com/book/view/22353.html'
wbdata = requests.get(url, headers=header).text
soup = BeautifulSoup(wbdata, 'lxml')
print(soup)
The code above returns the page's response. Searching through the response, we see that the Amazon purchase link is present, the same as in the HTML returned without the Cookie, but now the e-book download link appears as well; it only shows up when the Cookie is sent with the request. This matches the page a logged-in user sees in the browser.
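To check the result programmatically instead of reading through the dumped HTML by eye, one can search the parsed page for a download link. A sketch with hypothetical markup standing in for the two responses (the real page's structure may differ, so the `href` patterns here are assumptions):

```python
from bs4 import BeautifulSoup

def has_download_link(html):
    # True if any anchor's href points at a download URL.
    soup = BeautifulSoup(html, 'html.parser')
    return any('download' in a.get('href', '') for a in soup.find_all('a'))

# Hypothetical fragments standing in for the logged-out/logged-in responses.
logged_out = '<div><a href="https://www.amazon.cn/dp/B00EXAMPLE">Amazon</a></div>'
logged_in = logged_out + '<a href="/book/download/22353">mobi</a>'

print(has_download_link(logged_out))  # → False
print(has_download_link(logged_in))   # → True
```

The stdlib `html.parser` is used here so the check runs even without lxml installed; the article's code uses 'lxml', which works the same way.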
That completes the first method. Let's look at the second.
2. Use the cookies parameter of requests
The complete code is as follows:
# coding:utf-8
import requests
from bs4 import BeautifulSoup

cookie = {
    "cisession": "19dfd70a27ec0eecf1fe3fc2e48b7f91c7c83c60",
    "CNZZDATA100020196": "1815846425-1478580135-https%253A%252F%252Fwww.baidu.com%252F%7C1483922031",
    "Hm_lvt_f805f7762a9a237a0deac37015e9f6d9": "1482722012,1483926313",
    "Hm_lpvt_f805f7762a9a237a0deac37015e9f6d9": "1483926368",
}

url = 'https://kankandou.com/book/view/22353.html'
wbdata = requests.get(url, cookies=cookie).text
soup = BeautifulSoup(wbdata, 'lxml')
print(soup)
The result obtained this way is likewise the HTML shown after login.
Either way, we can easily use the Cookie to fetch web pages and resources that require login to view.
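Beyond passing cookies on every call, requests also offers a Session object, which stores cookies once and sends them on every request made through it, and keeps any Set-Cookie values coming back from the server. A minimal sketch (the network call is commented out; the cookie value is the placeholder used throughout this article):

```python
import requests

session = requests.Session()
# Seed the session with a cookie copied from the browser.
session.cookies.update({'cisession': '19dfd70a27ec0eecf1fe3fc2e48b7f91c7c83c60'})

# Every request made through the session now carries that cookie
# automatically, and cookies set by the server are stored back
# into session.cookies for later requests.
# wbdata = session.get('https://kankandou.com/book/view/22353.html').text

print(session.cookies.get('cisession'))
```

This saves repeating the `cookies=` argument on every call and keeps the login state consistent across a whole crawl.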
As for how to obtain the Cookie in the first place: copying it manually from the browser is one way; to obtain it with code, you can use Selenium.
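A sketch of the Selenium route, assuming selenium and a matching Chrome driver are installed (`fetch_login_cookies` and `selenium_cookies_to_dict` are hypothetical helper names; the selenium import is kept inside the function so the conversion helper works on its own):

```python
def selenium_cookies_to_dict(selenium_cookies):
    # driver.get_cookies() returns a list of dicts such as
    # [{'name': 'cisession', 'value': 'abc123', 'domain': '...'}, ...];
    # requests wants a plain {name: value} mapping.
    return {c['name']: c['value'] for c in selenium_cookies}

def fetch_login_cookies(login_url):
    # Requires selenium and a browser driver to be installed.
    from selenium import webdriver
    driver = webdriver.Chrome()
    driver.get(login_url)
    input('Log in in the opened browser window, then press Enter... ')
    cookies = selenium_cookies_to_dict(driver.get_cookies())
    driver.quit()
    return cookies
```

The returned dict can then be passed straight to requests via the `cookies=` parameter shown above.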
That's all on using a Cookie to simulate login and browse web pages and resources. I hope the content above is helpful to you. If you think the article is good, feel free to share it so more people can see it.