How to analyze distributed session solution and consistent hash

2025-04-05 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains the common solutions to the distributed session problem and then introduces consistent hash, which is often used to route requests in distributed systems.

I. The problem

1. What is a Session?

When users use a website's services, their browser interacts with the Web server many times. HTTP itself is stateless, so a mechanism built on top of HTTP is needed to support session state (Session State). The usual approach is to assign each session a unique session ID (SessionID) and pass it to the browser in a Cookie. The browser then sends this SessionID with every request, telling the Web server which session the request belongs to. On the Web server, each session has its own storage that holds that session's information. If Cookies are disabled, the usual fallback is to put the session ID in a URL parameter.

2. What is the Session consistency problem?

The Session consistency problem appears when a site grows from one Web server to several.

As shown in the figure above, when an HTTP request carrying a session ID reaches a Web server, the server must find the corresponding session data (Session) while processing the request. The problem is this: if my first request to the site lands on the server on the left, my Session is created on that server. Unless we do something about it, there is no guarantee that my later requests will land on the same server. This is the Session consistency problem.

II. Session consistency solutions

1. Session Sticky

With a single server, the session is stored on that server and every request is processed by it, so there is no problem. With multiple Web servers, if we ensure that all requests of the same session are processed by the same Web server, then from that session's point of view nothing has changed.

To do this, the load balancer must forward each request according to the SessionID it carries, as shown in the following figure. This approach is called Session Sticky.

The solution is very simple. For the Web server it is the same as the single-server case; only the load balancer is changed. Because requests of the same Session always reach the same Web server, it is also easy for that server to cache Session data locally.

The problems include:

If one of the Web servers goes down or restarts, session data on that machine will be lost. If there is login status data in the session, the user needs to log in again.

The session ID is application-layer information, so to send all requests of the same session to the same Web server the load balancer must parse requests at the application layer (layer 7), which costs more than switching at layer 4.

The load balancer becomes a stateful node: it has to store the mapping from each session to a specific Web server, so it uses more memory and disaster recovery becomes harder.

As an analogy for Session Sticky: if the Web servers are restaurants and the session data is my own bowl and chopsticks, then to be sure I eat with my own tableware every time, I keep it at one particular restaurant and always eat there.
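The routing rule described above can be sketched as follows. This is a minimal sketch, not a real load balancer: the class name StickyBalancer, the server names, and the round-robin choice for new sessions are all assumptions made for illustration.

```python
import itertools

class StickyBalancer:
    """Toy load balancer that pins each SessionID to one Web server."""

    def __init__(self, servers):
        # Round-robin is an assumed policy for sessions seen for the first time.
        self._next_server = itertools.cycle(servers)
        # SessionID -> server. This table is the state the balancer must keep.
        self._table = {}

    def route(self, session_id):
        if session_id not in self._table:
            self._table[session_id] = next(self._next_server)
        return self._table[session_id]

balancer = StickyBalancer(["web-1", "web-2", "web-3"])
first = balancer.route("session-abc")
# Every later request of the same session reaches the same server.
assert all(balancer.route("session-abc") == first for _ in range(5))
```

The `_table` dictionary makes the drawbacks listed above concrete: it grows with the number of live sessions, and it is lost if the balancer itself fails.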

2. Session Replication

To continue the restaurant analogy: besides the previous approach, I could keep a set of my own tableware in every restaurant, which leaves me free to choose where to eat. Session Replication works this way, as shown in the following figure.

As you can see, with Session Replication the load balancer no longer has to ensure that requests of the same session go to the same Web server. Instead, we add synchronization of session data between the Web servers. Synchronization keeps the Session data consistent across the different Web servers.

However, there are some problems with the Session Replication scheme, including:

Synchronizing Session data costs network bandwidth. Whenever Session data changes, it must be synchronized to all other machines, and the more machines there are, the greater the bandwidth overhead.

Every Web server has to store all the Session data. If the cluster holds a large number of Sessions, each machine spends a great deal of memory on Session data.

That is the Session Replication scheme. It relies on the application container to replicate Sessions, so the Session problem is solved without the application itself having to care about it. However, it does not suit clusters with many machines. With only a few machines it is a workable choice.
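The cost of replication can be illustrated with a small in-process model. This is only a sketch under simplifying assumptions: the class ReplicatedServer and the helper make_cluster are hypothetical, and a direct dictionary write stands in for the network message a real container would send.

```python
class ReplicatedServer:
    """Toy Web server that keeps a full copy of every session."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # SessionID -> session data (a full copy on each server)
        self.peers = []     # all other servers in the cluster

    def write(self, session_id, data):
        """Store the session locally, then push it to every peer.

        Returns the number of synchronization messages sent.
        """
        self.sessions[session_id] = dict(data)
        for peer in self.peers:
            peer.sessions[session_id] = dict(data)  # stands in for a network call
        return len(self.peers)

def make_cluster(size):
    servers = [ReplicatedServer(f"web-{i}") for i in range(1, size + 1)]
    for server in servers:
        server.peers = [p for p in servers if p is not server]
    return servers

cluster = make_cluster(4)
messages = cluster[0].write("session-abc", {"user": "alice"})
# One write on a 4-node cluster costs 3 messages, and every node stores the data.
assert messages == 3
assert all("session-abc" in server.sessions for server in cluster)
```

Each write costs one message per peer and each server stores every session, which is why the scheme stops being attractive as the cluster grows.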

3. Centralized storage of Session data

Here too the goal is that requests of the same session may be sent to different Web servers. Session Replication is one solution. Another is to store Session data centrally, so that all Web servers fetch Sessions from the same place. The approximate structure is shown in the following figure:

As you can see, what this has in common with Session Replication is that requests are not pinned to one Web server after passing through the load balancer. The difference is that Session data is no longer replicated between Web servers or stored locally; it lives in a separate, centralized store. Whichever Web server modifies a Session, the change is made in the centralized store, and Web servers also read Session data from that store. The store can be a database or another distributed storage system. This solves the memory problem of Session Replication, and it also uses less network bandwidth than Session Replication.

However, this scheme still has some problems, including:

Reading and writing Session data now involves network operations. Compared with reading local data, this adds latency and a risk of instability, but because the traffic stays inside the intranet the problem is minor.

If the machine or cluster that stores Sessions centrally has a problem, our application is affected.

Compared with Session Replication, the advantage of centralized storage is clear when both the number of Web servers and the number of Sessions are large.
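A minimal sketch of the centralized approach follows. The SessionStore class is a hypothetical in-memory stand-in for the database or distributed storage system mentioned above, and the WebServer handler names are invented for illustration.

```python
class SessionStore:
    """In-memory stand-in for a central database or distributed store."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        # Return a copy so callers must write changes back explicitly.
        return dict(self._data.get(session_id, {}))

    def put(self, session_id, session):
        self._data[session_id] = dict(session)

class WebServer:
    """Toy Web server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle_login(self, session_id, user):
        session = self.store.get(session_id)
        session["user"] = user
        self.store.put(session_id, session)

    def handle_profile(self, session_id):
        return self.store.get(session_id).get("user")

store = SessionStore()
web1 = WebServer("web-1", store)
web2 = WebServer("web-2", store)

web1.handle_login("session-abc", "alice")
# A different server sees the same session because both read the central store.
assert web2.handle_profile("session-abc") == "alice"
```

In a real deployment every `get` and `put` would be a network round trip, which is the latency cost described above, and the store becomes a component whose failure affects every Web server.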

4. Cookie Based

With the Cookie Based scheme, requests of the same session are again not tied to a specific machine. Unlike Session Replication and centralized Session storage, this scheme carries the Session data in the Cookie itself. The details are shown in the following figure.

As you can see, our Session data is stored in Cookie, and then the corresponding Session data is generated from Cookie on the Web server. It's like taking my own bowls and chopsticks with me every time, so I can choose which restaurant I want to eat in. Compared with the previous centralized storage, this scheme does not rely on an external storage system, so there is no network delay and instability for fetching and writing Session data from external systems.

However, this scheme still has shortcomings, including:

Cookie length limit. Cookies have a length limit, which in turn limits the size of the Session data.

Security. Session data is originally server-side data, but this scheme sends it across the external network to the client, so there is a security risk.

Bandwidth consumption. This does not mean bandwidth between internal Web servers, but the overall external bandwidth of our data center.

Performance cost. Every HTTP request and response carries the Session data, and for a Web server, the smaller each response, the more concurrent requests it can support.
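The length and security points above can be made concrete with a short sketch. Everything here is an assumption for illustration: the function names, the secret key, and the 4096-byte limit, which is a commonly cited per-cookie size. The HMAC signature only detects tampering; the payload is still readable by the client, so this does not hide the data.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"change-me"   # hypothetical server-side secret, never sent to clients
MAX_COOKIE_BYTES = 4096     # assumed per-cookie limit

def encode_session(session):
    """Serialize session data into a signed cookie value."""
    raw = json.dumps(session, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw).decode("ascii")
    signature = hmac.new(SECRET_KEY, payload.encode("ascii"), hashlib.sha256).hexdigest()
    cookie = payload + "." + signature
    if len(cookie) > MAX_COOKIE_BYTES:
        raise ValueError("session data does not fit in a cookie")
    return cookie

def decode_session(cookie):
    """Rebuild session data from a cookie, rejecting modified values."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode("ascii"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("cookie signature mismatch")
    return json.loads(base64.urlsafe_b64decode(payload))

cookie = encode_session({"user": "alice", "cart": [1, 2]})
assert decode_session(cookie) == {"user": "alice", "cart": [1, 2]}
```

Any Web server that knows the secret can decode the cookie, so no shared store is needed, but the whole value travels with every request, which is the bandwidth and performance cost listed above.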

III. Summary

All of the schemes above solve the session problem. For large websites, Session Sticky and centralized Session storage are the better choices.

I. Preface

When load balancing in a distributed system, a hash algorithm can be used to make a fixed share of requests land on the same server, so that each server consistently handles one part of the requests (and maintains the information for those requests).

However, the ordinary remainder hash, hash(user id) % number of servers, scales poorly: when servers are added or taken offline, most of the mappings between user ids and servers become invalid. Consistent hash improves on this by using a hash ring.
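The scale of the remapping is easy to measure. The sketch below is illustrative only: the helper names are hypothetical, and md5 is used merely as a stable stand-in for hash(), since Python's built-in hash() for strings varies between runs.

```python
import hashlib

def stable_hash(key):
    """Deterministic integer hash; md5 is an arbitrary choice for this sketch."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def server_for(user_id, server_count):
    """Ordinary remainder hash: hash(user id) % number of servers."""
    return stable_hash(user_id) % server_count

def moved_fraction(count_before, count_after, user_count=1000):
    """Fraction of users whose server changes when the cluster is resized."""
    users = [f"user{i}" for i in range(user_count)]
    moved = sum(
        server_for(u, count_before) != server_for(u, count_after) for u in users
    )
    return moved / user_count

fraction = moved_fraction(4, 5)
print(f"{fraction:.0%} of users change server when going from 4 to 5 servers")
```

For a uniform hash value h, the mapping survives only when h % 4 equals h % 5, which holds for 4 of the 20 possible values of h % 20. So roughly 80% of users are remapped by adding a single server.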

II. Overview of consistent Hash

To make the principle of consistent hash easy to picture, this article uses a simple example with four servers whose addresses are ip1, ip2, ip3, and ip4.

Consistent hash first computes the hash values hash(ip1), hash(ip2), hash(ip3), and hash(ip4) of the four ip addresses. Each hash value lies between 0 and the largest positive integer. The four values are placed on the consistent hash ring as shown in the following figure:

The hash ring runs clockwise from 0 to the largest positive integer, so the hash value of each of the four ips falls on some point of the ring. This maps the four server ips onto the consistent hash ring. When a user sends a request, the routing key is first computed as hash(user id), and that value is placed on the ring. Starting from its position, the first ip found in the clockwise direction is the server that handles the request.

As shown in the figure above, requests from user1 and user2 are handled by server ip2, requests from user3 by server ip3, requests from user4 by server ip4, and requests from user5 and user6 by server ip1.
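The clockwise lookup can be sketched with a sorted list and binary search. This is one possible implementation, not the only one: the class HashRing, the 32-bit ring size, and the md5-based ring_hash are assumptions, so the actual user-to-server assignments will differ from the ones in the figure.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # assumed ring size; the text only says "largest positive integer"

def ring_hash(key):
    """Map a string to a position on the ring (md5 is an arbitrary choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE

class HashRing:
    def __init__(self, nodes=()):
        self._points = []  # sorted ring positions of all nodes
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Hash collisions between nodes are ignored in this sketch.
        point = ring_hash(node)
        bisect.insort(self._points, point)
        self._owner[point] = node

    def remove(self, node):
        point = ring_hash(node)
        self._points.remove(point)
        del self._owner[point]

    def lookup(self, key):
        """Return the first node at or after the key's position, going clockwise."""
        index = bisect.bisect_left(self._points, ring_hash(key))
        # Past the last node, wrap around to the first node on the ring.
        return self._owner[self._points[index % len(self._points)]]

ring = HashRing(["ip1", "ip2", "ip3", "ip4"])
for user in ["user1", "user2", "user3", "user4", "user5", "user6"]:
    print(user, "->", ring.lookup(user))
```

Keeping the positions sorted makes each lookup a binary search, so routing costs O(log n) in the number of nodes on the ring.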

Let's consider what happens when the ip2 server goes down.

When the server of ip2 goes down, the consistent hash ring looks like the following figure:

By the clockwise rule, requests from user1 and user2 are now handled by server ip3, while the servers for all other users stay the same. In other words, only the mappings of users previously handled by ip2 are broken, and the requests ip2 was responsible for are taken over by the next node clockwise.

Let's consider what happens when new machines are added.

When an ip5 server is added, the consistent hash ring is roughly as shown below:

By the clockwise rule, the user5 request, which was previously handled by server ip1, is now handled by the new server ip5, while the servers for all other users stay the same. In other words, the new server takes over part of the requests previously handled by the nearest server clockwise from it.
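Both cases can be checked with a short experiment. The sketch below uses a deliberately simple linear scan for routing; the helper names and the md5-based hash are assumptions, and the concrete users that move depend on that hash, so they will not match the figures.

```python
import hashlib

RING_SIZE = 2 ** 32  # assumed ring size

def ring_hash(key):
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE

def route(key, nodes):
    """Return the node nearest to the key in the clockwise direction."""
    position = ring_hash(key)
    return min(nodes, key=lambda node: (ring_hash(node) - position) % RING_SIZE)

def assignments(users, nodes):
    return {user: route(user, nodes) for user in users}

users = [f"user{i}" for i in range(1000)]
before = assignments(users, ["ip1", "ip2", "ip3", "ip4"])

# Case 1: ip2 goes down. Only users that were on ip2 are remapped.
after_down = assignments(users, ["ip1", "ip3", "ip4"])
moved_down = [u for u in users if before[u] != after_down[u]]
assert all(before[u] == "ip2" for u in moved_down)

# Case 2: ip5 is added. Every remapped user now belongs to ip5.
after_up = assignments(users, ["ip1", "ip2", "ip3", "ip4", "ip5"])
moved_up = [u for u in users if before[u] != after_up[u]]
assert all(after_up[u] == "ip5" for u in moved_up)

print(len(moved_down), "users moved when ip2 left;", len(moved_up), "moved when ip5 joined")
```

Compared with the remainder hash, where most mappings change on any resize, only the users on the affected arc of the ring are touched.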

III. Characteristics of consistent hash

Monotonicity: if some requests have already been hashed to particular servers and a new server is then added, each existing request should map either to its original server or to the new server, never to a different existing server. The ip5 example above shows this: after ip5 is added, user6, originally handled by ip1, is still handled by ip1, while user5, originally handled by ip1, is now handled by the new ip5.

Spread: in a distributed environment, a client may not know about all servers when it sends a request; it may see only some of them, and from its point of view those servers form a complete hash ring. If several clients each treat a different subset of servers as the complete ring, requests from the same user may be routed to different servers. This should obviously be avoided, because it breaks the guarantee that a user's requests always land on the same server. Spread measures how severe this situation is.

Balance: balance means load balancing, that is, after hashing, client requests should be distributed across the different servers. Consistent hash ensures that every server handles some requests, but it does not guarantee that each server handles roughly the same number, as shown in the following figure.

Servers ip1, ip2, and ip3 are placed on the consistent hash ring by their hash values. From the distribution in the figure, ip1 handles about 80% of the requests, while ip2 and ip3 together handle only about 20%. All three machines are processing requests, but the load is clearly uneven. This is called the tilt (skew) of consistent hash, and virtual nodes were introduced to solve it.

V. Virtual nodes

The tilt problem described in the previous section appears when there are few server nodes. One solution is to add more machines, but machines cost money, so virtual nodes are added instead. Taking the three machines above, the consistent hash ring after each machine gets one virtual node looks like this:

ip1-1 is the virtual node of ip1, ip2-1 is the virtual node of ip2, and ip3-1 is the virtual node of ip3.

If there are M physical machines and each has N virtual nodes, the hash ring actually holds M * (N + 1) nodes. For example, when the hash value computed for a client falls between ip2 and ip3, or between ip2-1 and ip3-1, the request is handled by server ip3.
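A ring with virtual nodes can be sketched as follows. The naming pattern ip1-1 follows the text; the helper names, the md5-based hash, and the 32-bit ring size are assumptions for illustration.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # assumed ring size

def ring_hash(key):
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE

def build_ring(servers, virtual_per_server):
    """Return a sorted list of (ring position, physical server) pairs."""
    ring = []
    for server in servers:
        ring.append((ring_hash(server), server))  # the physical node itself
        for i in range(1, virtual_per_server + 1):
            # Virtual node such as "ip1-1"; it routes to the physical server ip1.
            ring.append((ring_hash(f"{server}-{i}"), server))
    ring.sort()
    return ring

def lookup(ring, key):
    """Find the first node clockwise and return the physical server behind it."""
    positions = [position for position, _ in ring]
    index = bisect.bisect_left(positions, ring_hash(key)) % len(ring)
    return ring[index][1]

servers = ["ip1", "ip2", "ip3"]
ring = build_ring(servers, virtual_per_server=1)
# M = 3 machines with N = 1 virtual node each gives M * (N + 1) = 6 ring nodes.
assert len(ring) == 3 * (1 + 1)
print("user1 is handled by", lookup(ring, "user1"))
```

A virtual node is only an extra position on the ring. The lookup always returns the physical server, so adding virtual nodes changes how the ring is divided without adding machines.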

VI. Uniform hash

In the previous section the ring looked more balanced after virtual nodes were introduced, but if the algorithm that generates the virtual nodes is not good enough, you may well get the following ring:

After one virtual node is added for each server, the situation is better than before, but it is still not balanced.

A balanced consistent hash ring should look like this:

The goal of uniform hash is that with N servers and M client hash values, each server handles about M / N users, so that the load on each server is as balanced as possible. The consistent hash load balancing algorithm provided by Dubbo is not uniform. We implemented a Dubbo SPI extension to achieve a uniform consistent hash.
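How far a given ring is from the M / N target can be measured directly. The sketch below is neither Dubbo's algorithm nor the SPI extension mentioned above; it only counts the load per server on a ring built with a configurable number of points per server. The helper names, the "#" naming of ring points, and the md5-based hash are assumptions.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2 ** 32  # assumed ring size

def ring_hash(key):
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE

def load_per_server(servers, points_per_server, user_count=10000):
    """Count how many of M = user_count users each of the N servers receives."""
    ring = sorted(
        (ring_hash(f"{server}#{i}"), server)
        for server in servers
        for i in range(points_per_server)
    )
    positions = [position for position, _ in ring]
    load = Counter({server: 0 for server in servers})
    for u in range(user_count):
        index = bisect.bisect_left(positions, ring_hash(f"user{u}")) % len(ring)
        load[ring[index][1]] += 1
    return load

def imbalance(load):
    """Busiest server's load divided by the ideal M / N (1.0 is perfectly uniform)."""
    ideal = sum(load.values()) / len(load)
    return max(load.values()) / ideal

for points in (1, 10, 200):
    load = load_per_server(["ip1", "ip2", "ip3"], points)
    print(points, "points per server -> imbalance", round(imbalance(load), 2))
```

With more points per server the ratio tends toward 1.0, but random placement rarely reaches it exactly. A scheme that aims for strictly uniform load has to choose node positions deliberately rather than rely on the hash alone.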

Consistent hash plays an important role in distributed systems. It is used both in distributed caches and in the load balancing strategies of distributed RPC frameworks.
