How to handle session replication and sharing, and how to design a distributed cache


How should session replication and sharing be handled, and how should a distributed cache be designed? This article analyzes the problem in detail and walks through the corresponding solutions, in the hope of helping readers find a simple, workable approach.

1. Session replication and sharing

In a web application, coping with large-scale traffic requires deploying the application as a cluster, and clustering in turn requires a session-sharing mechanism so that sessions stay consistent across the application servers. Mainstream web servers such as Tomcat support this through session replication, but the drawback is obvious:

as the number of nodes grows, the performance cost of session replication rises quickly, especially when sessions hold large objects that change frequently. This limits the horizontal scalability of the web application.

Another approach to session sharing is to manage sessions centrally. The first idea is to store sessions in a database, but a database is an order of magnitude slower than memory and it adds load to the database system. What is needed instead is a fast, remote, centralized store: memcached.

There are several options for storing sessions in memcached:

(1) Use the extension mechanism of Tomcat 6 directly.

(2) Implement it with a custom servlet filter.

Considering future expansion of the system, we adopt the second option, which decouples the session-sharing mechanism from the middleware.

Main ideas:

1) Extend HttpServletRequestWrapper and an HttpSessionWrapper class, overriding the original session-access methods so that they are all implemented through a SessionService class.

2) Use a filter to intercept the sessionId from the cookie, build a new HttpServletRequestWrapper from that sessionId, and pass it on to the downstream application.

3) SessionService connects to the memcached service and, using the sessionId as the key, reads and writes a Map that holds the session's contents.
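A minimal sketch of this filter-based approach is shown below. It assumes the javax.servlet API and the spymemcached client (net.spy.memcached.MemcachedClient); the names SessionService and DistributedSessionFilter, the cookie name DSESSIONID, and the 30-minute expiry are illustrative choices rather than details from the article, and the full HttpSession override is only indicated in a comment.

```java
// Illustrative sketch: names, cookie, and expiry are assumptions, not the article's code.
import net.spy.memcached.MemcachedClient;

import javax.servlet.*;
import javax.servlet.http.*;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Stores each session as a Map in memcached, keyed by sessionId. */
class SessionService {
    private static final int EXPIRY_SECONDS = 30 * 60;
    private final MemcachedClient client;

    SessionService(String host, int port) throws IOException {
        this.client = new MemcachedClient(new InetSocketAddress(host, port));
    }

    @SuppressWarnings("unchecked")
    Map<String, Object> load(String sessionId) {
        Object cached = client.get(sessionId);             // key = sessionId
        return cached != null ? (Map<String, Object>) cached : new HashMap<>();
    }

    void save(String sessionId, Map<String, Object> attributes) {
        client.set(sessionId, EXPIRY_SECONDS, attributes); // writing also refreshes the TTL
    }
}

public class DistributedSessionFilter implements Filter {
    private static final String SESSION_COOKIE = "DSESSIONID";
    private SessionService sessions;

    @Override
    public void init(FilterConfig cfg) throws ServletException {
        try {
            sessions = new SessionService("127.0.0.1", 11211);
        } catch (IOException e) {
            throw new ServletException(e);
        }
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest httpReq = (HttpServletRequest) req;
        HttpServletResponse httpRes = (HttpServletResponse) res;

        // Intercept the sessionId from the cookie, or issue a new one.
        String sessionId = findSessionId(httpReq);
        if (sessionId == null) {
            sessionId = UUID.randomUUID().toString();
            httpRes.addCookie(new Cookie(SESSION_COOKIE, sessionId));
        }

        // Load the session map and hand it to downstream code through a wrapped request.
        // A complete implementation would override getSession() on the wrapper to return
        // an HttpSession backed by this map; as a stand-in, the sketch exposes the map
        // as a request attribute.
        Map<String, Object> attributes = sessions.load(sessionId);
        HttpServletRequestWrapper wrapped = new HttpServletRequestWrapper(httpReq);
        wrapped.setAttribute("distributed.session", attributes);
        chain.doFilter(wrapped, res);

        // Write the session back so every node in the cluster sees the same state.
        sessions.save(sessionId, attributes);
    }

    private String findSessionId(HttpServletRequest req) {
        if (req.getCookies() == null) return null;
        for (Cookie c : req.getCookies()) {
            if (SESSION_COOKIE.equals(c.getName())) return c.getValue();
        }
        return null;
    }

    @Override
    public void destroy() { }
}
```

The filter would be registered ahead of the application's servlets (in web.xml or with @WebFilter) so that every request passes through it before reaching application code.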

Several issues to watch for, and possible improvements:

1. Give memcached enough memory that user sessions are not evicted from the cache (you can also disable memcached's eviction mechanism).

2. If session reads far outnumber writes, add a local cache such as OSCache in front of memcached to reduce reads against memcached, cutting network overhead and improving performance.

3. If there are very many users, use a group of memcached servers and, in the set method, pick a server by hashing the sessionId (its hashCode).
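A minimal sketch of that grouping idea follows, assuming a fixed list of memcached nodes and plain modulo hashing of the sessionId; the SessionCacheGroup class and its node addresses are illustrative. Production deployments usually prefer consistent hashing so that adding or removing a node does not remap most keys.

```java
// Illustrative sketch: pick one memcached node per session id by modulo hashing.
import net.spy.memcached.MemcachedClient;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

public class SessionCacheGroup {
    private final List<MemcachedClient> nodes = new ArrayList<>();

    public SessionCacheGroup(List<InetSocketAddress> addresses) throws IOException {
        for (InetSocketAddress address : addresses) {
            nodes.add(new MemcachedClient(address));
        }
    }

    /** Route a session id to one node; the mask keeps the index non-negative. */
    private MemcachedClient pickNode(String sessionId) {
        int index = (sessionId.hashCode() & 0x7fffffff) % nodes.size();
        return nodes.get(index);
    }

    public void set(String sessionId, int expirySeconds, Object session) {
        pickNode(sessionId).set(sessionId, expirySeconds, session);
    }

    public Object get(String sessionId) {
        return pickNode(sessionId).get(sessionId);
    }
}
```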

(3) Use memcached-session-manager, an off-the-shelf Tomcat session manager backed by memcached.

There are several options for clearing sessions:

(1) Flush memcached once in the early hours of the morning, when the fewest users are online.

(2) Set an expiration time on the cached objects, obtain the sessionId in the filter, and refresh the corresponding objects in memcached on each request; objects that have not been refreshed for a long time expire automatically. (More complex and more resource-intensive.)
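A minimal sketch of the TTL-refresh option, assuming spymemcached's touch() call and a memcached server of version 1.4.8 or later (where the touch command was introduced); the class name and the 30-minute TTL are illustrative.

```java
// Illustrative sketch: keep active sessions alive by refreshing their TTL per request.
import net.spy.memcached.MemcachedClient;

import java.net.InetSocketAddress;

public class SessionTtlRefresher {
    private static final int SESSION_TTL_SECONDS = 30 * 60;
    private final MemcachedClient client;

    public SessionTtlRefresher(MemcachedClient client) {
        this.client = client;
    }

    /** Call once per request (e.g. from the session filter) with the current sessionId. */
    public void refresh(String sessionId) {
        // touch resets the expiry without transferring the cached value over the network;
        // sessions that stop being refreshed simply expire on their own.
        client.touch(sessionId, SESSION_TTL_SECONDS);
    }

    public static void main(String[] args) throws Exception {
        MemcachedClient mc = new MemcachedClient(new InetSocketAddress("127.0.0.1", 11211));
        new SessionTtlRefresher(mc).refresh("example-session-id");
        mc.shutdown();
    }
}
```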

2. The design of a distributed cache: how should the cache, and changes to it, be handled across multiple nodes?

To be continued...

3. Database sharding: as data volumes keep growing and data has to be migrated, how can the business data-access layer adapt to changes in the underlying partitioning into different databases and tables (by region)?

Use a DAL with a sharding expansion scheme: globally incrementing IDs plus local hash distribution.

A large Internet application inevitably goes through the progression from a single DB server, to master/slave, to vertical partitioning (splitting into multiple databases), and then to horizontal partitioning (splitting tables, i.e. sharding). As the number of users keeps growing, some tables, such as the friend-relationship table or the store-configuration table, become extremely large, and both reading and writing them becomes very expensive for the database. In this progression, master/slave and vertical partitioning are relatively easy and have little impact on the application, but splitting tables raises thorny problems, such as being unable to join data across partitions and how to balance load across shards. At that point a general DAL framework is needed to shield the application logic from the underlying data storage, so that access to the underlying data is transparent to the application.
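A minimal sketch of such shard routing, under the assumption that record ids come from a central, globally incrementing sequence and that a hash of the user id selects the database and table; the ShardRouter class, the shard counts, and the order_N table naming are illustrative rather than TDDL's actual scheme.

```java
// Illustrative sketch: "global increment + local hash" routing to a database and table.
public class ShardRouter {
    private final int databaseCount;
    private final int tablesPerDatabase;

    public ShardRouter(int databaseCount, int tablesPerDatabase) {
        this.databaseCount = databaseCount;
        this.tablesPerDatabase = tablesPerDatabase;
    }

    /** Pick the database for a user, so all of one user's rows stay on one shard. */
    public int databaseIndex(long userId) {
        return (int) (hash(userId) % databaseCount);
    }

    /** Pick the table suffix inside that database. */
    public String tableName(long userId) {
        long tableIndex = (hash(userId) / databaseCount) % tablesPerDatabase;
        return "order_" + tableIndex;
    }

    private long hash(long userId) {
        // Deterministic across nodes; the mask keeps the value non-negative.
        return Long.hashCode(userId) & 0x7fffffffL;
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4, 16);
        long userId = 123_456L;
        System.out.println("db_" + router.databaseIndex(userId) + "." + router.tableName(userId));
    }
}
```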

Take Taobao as an example. Taobao has been migrating from expensive high-end storage (minicomputers plus Oracle) to MySQL, and after the switch it inevitably ran into vertical partitioning (splitting databases) and horizontal partitioning (sharding). Taobao therefore developed its own TDDL (Taobao Distributed Data Layer) framework around its business characteristics. The framework mainly makes database and table splitting transparent to the application and handles data replication between heterogeneous databases.

What technical architecture does Taobao use to handle such a high load? Its overall architecture relies on the following measures:

1. Stateless applications (Taobao session framework)

2. Effective use of caching (Tair)

3. Application splitting (HSF)

4. Database splitting (TDDL)

5. Asynchronous communication (Notify)

6. Unstructured data storage (TFS, NoSQL)

7. Monitoring and early-warning system

8. Unified configuration management

4. Why does the Ministry of Railways ticketing website hang at login, but not once you are in?

At login time there is not enough capacity to serve all users' requests; the load balancer and the servers are overwhelmed, so users cannot log in. Relatively few users actually get logged in, and those who do are roughly within the site's capacity, so after login the site is merely slow rather than down.

With a CDN, enough server clusters, load balancing, and cached user information, testing can push system capacity to around the 20-million (2kw) level, letting more users log in. The real problem is not login but the ticket queries and the scramble for tickets afterwards. Queries can be served by a separate query cluster. The hardest part is the competition for extremely limited resources: (1) ticket availability is computed and updated in real time; (2) tickets are scarce, and tens of thousands of offline ticket offices, telephone booking systems, and so on must all be kept mutually exclusive. Each ticket is unique, and online sale is just one of the tens of thousands of ticketing terminals that must stay consistent with the others.

Solution 1: set a tolerance level. Two people booking the same ticket is absolutely unacceptable; but seeing "tickets available", clicking to order, and then being told there are none is an error that can be tolerated.

Solution 2: queue the requests, asynchronously tell each user how many people are ahead of them, and when their turn comes give them a time window to place the order (the requested ticket is looked up and locked for that order; a timeout kicks the user out of the queue).

Solution 3: out of, say, 1 million users who clicked, randomly draw the number the system can actually carry (say 100,000).

After a user clicks to order, the request goes to a front-end dispatcher that knows how many users the machines behind it can handle. For example, if 1 million people click to book at the same time but the back end can only carry 100,000, a random lottery picks those 100,000 and everyone else gets a "the system is busy, please try again later" message. The 100,000 admitted users are spread over, say, 10 machines and can run queries; when a user picks a specific train (marked ClickSelectedTicket), the request is routed to a machine by train, much like the MapReduce idea. If, for instance, 10,000 people are routed to ticket T1, the system releases 900 T1 tickets and holds back 100 for fault tolerance (as the system stabilizes, this reserve can shrink). The users then compete for the tickets using optimistic offline locks, and conflicts are detected when the order is finally submitted.
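A minimal sketch of that final optimistic-lock check, assuming a JDBC DataSource and a hypothetical ticket_inventory(train_id, remaining, version) table; the table and column names are illustrative.

```java
// Illustrative sketch: optimistic offline lock checked only at final order submission.
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class TicketOrderService {
    private final DataSource dataSource;

    public TicketOrderService(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Returns true if a ticket was secured; false means another order won the race. */
    public boolean submitOrder(long trainId) throws SQLException {
        try (Connection conn = dataSource.getConnection()) {
            long version;
            int remaining;
            // Read the current stock and version without holding any database lock.
            try (PreparedStatement read = conn.prepareStatement(
                    "SELECT remaining, version FROM ticket_inventory WHERE train_id = ?")) {
                read.setLong(1, trainId);
                try (ResultSet rs = read.executeQuery()) {
                    if (!rs.next()) return false;
                    remaining = rs.getInt(1);
                    version = rs.getLong(2);
                }
            }
            if (remaining <= 0) return false;

            // The version condition makes the decrement fail if someone else got there first.
            try (PreparedStatement update = conn.prepareStatement(
                    "UPDATE ticket_inventory SET remaining = remaining - 1, version = version + 1 "
                            + "WHERE train_id = ? AND version = ? AND remaining > 0")) {
                update.setLong(1, trainId);
                update.setLong(2, version);
                return update.executeUpdate() == 1; // 0 rows updated = optimistic check failed
            }
        }
    }
}
```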

That concludes this discussion of session replication and sharing and the design of a distributed cache. Hopefully the material above is of some help.
