
If 200,000 users access a hotspot cache at the same time, how do you optimize your cache architecture?


Contents

(1) Why use a cache cluster

(2) The problem of 200,000 users accessing a hotspot cache at the same time

(3) Automatic cache hotspot discovery based on stream computing

(4) Automatically loading the hotspot cache as a JVM local cache

(5) Rate limiting and circuit breaker protection

(6) Summary

(1) Why use a cache cluster

In this article, let's talk about architecture optimization for hotspot caching.

When running a cache cluster, the two things to fear most are hot keys and big values. What are they?

To put it simply, a hot key is a single key in your cache cluster that is suddenly hammered by tens of thousands or even hundreds of thousands of concurrent requests.

A big value means the value stored under a single key may be gigabytes in size, which causes network bandwidth problems every time that value is queried.

This article focuses on the hot key problem.

Suppose you have a system deployed as a cluster, with a cache cluster behind it; it doesn't matter whether that is Redis Cluster, Memcached, or an in-house cache cluster.

So what does this system use the cache cluster for?

It's simple: put rarely changing data in the cache, so that when users query that data in large volumes, the requests can be answered straight from the cache.

A cache cluster can sustain very high concurrency, and cache reads are very fast.

For example, if you have 20,000 requests per second and 90% of them are reads, then 18,000 requests per second are reading data that rarely changes, and only 2,000 are writes.

Does it make sense to keep all of that data in the database and hit the database with 20,000 read/write requests every second?

Of course not. If you want the database to carry 20,000 requests per second, you will probably have to shard it (split databases and tables) and add read/write splitting.

For example, you might split the data across 3 master databases to carry the 2,000 write requests per second, and hang 3 replicas off each master, so that 9 replicas in total serve the 18,000 read requests per second.

That setup needs 12 high-spec database servers in total, which is very expensive and a poor use of resources.


So instead, put the rarely changing data in a cache cluster. The cache cluster might use 2 masters and 2 replicas: the masters take cache writes, and the replicas serve cache reads.

Given the performance of a cache cluster, the two replicas can comfortably absorb the 18,000 reads per second, while the three database masters handle the 2,000 writes per second plus a small number of remaining reads.

The machine count instantly drops to 4 cache machines + 3 database machines = 7 machines, far less resource cost than the previous 12.

Caching is a very important part of system architecture. For data that rarely changes but is read at high concurrency, a cache cluster is an excellent way to absorb the read traffic.

All of the machine counts and request rates here are illustrative; the goal is simply to give readers less familiar with caching a feel for what it means to carry read traffic with a cache cluster.
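To make the read path concrete, here is a minimal cache-aside sketch in Java, assuming a Redis cache accessed through the Jedis client; the key format, host name, TTL, and the loadFromDatabase helper are all illustrative, not a prescribed implementation.

```java
import redis.clients.jedis.Jedis;

public class ProductCache {
    private final Jedis jedis = new Jedis("cache-host", 6379); // assumed cache address

    public String getProduct(String productId) {
        String key = "product:" + productId;
        String cached = jedis.get(key);               // 1. try the cache first
        if (cached != null) {
            return cached;                            // cache hit: no database access
        }
        String fromDb = loadFromDatabase(productId);  // 2. cache miss: read the database
        jedis.setex(key, 3600, fromDb);               // 3. populate the cache with a 1-hour TTL
        return fromDb;
    }

    // Hypothetical database lookup, e.g. via JDBC; details don't matter for the pattern.
    private String loadFromDatabase(String productId) {
        return "...";
    }
}
```

The pattern is what matters: try the cache first, and only fall through to the database on a miss.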

(2) The problem of 200,000 users accessing a hotspot cache at the same time

With that background in place, we can turn to today's key issue: the hotspot cache.

Assume you now have 10 cache nodes absorbing a large volume of read requests. Normally, those reads should fall roughly evenly across the 10 nodes, right?

Each of these 10 cache nodes carrying about 10,000 requests per second would be reasonable.

Then assume 20,000 requests per second is a single node's limit, so you normally keep each node at around 10,000 requests per second, leaving some headroom.

Okay, so what does the so-called hotspot cache problem actually mean?

Very simply: suddenly, for whatever reason, a huge number of users access the same piece of cached data.

For example, when a celebrity suddenly announces a marriage, won't that drive hundreds of thousands of users to read that piece of news within a short window?

Suppose that news item is cached under a single cache key, which lives on one cache machine, and suddenly 200,000 requests per second converge on that one key on that one machine.

What happens then?

Remember, we just assumed a cache replica node tops out at 20,000 requests per second. (In practice a single cache machine might carry 50,000 to 100,000 reads per second; this is just an assumption.)

So what happens when 200,000 requests per second suddenly land on this one machine?

Quite simply, the cache machine those 200,000 requests converge on will be overloaded and go down.

And what happens once machines in the cache cluster start going down?

Read requests that find the data missing will pull it from the database and repopulate it onto the remaining cache machines, but the ongoing 200,000 requests per second will then crush those machines in turn.

And so on, until the cache cluster collapses completely and the whole system goes down.


(3) Automatic cache hotspot discovery based on stream computing

The key point is that your system must detect this kind of cache hotspot the moment it appears, and then rebalance the load automatically within milliseconds.

So first: how do you discover a cache hotspot automatically?

First, understand that when a cache hotspot appears, your per-second concurrency is necessarily very high, possibly hundreds of thousands or even millions of requests per second.

So we can count data accesses in real time using stream computing technology: Storm, Spark Streaming, and Flink are all viable options.

Then, if the real-time statistics show that a piece of data is suddenly accessed more than, say, 1,000 times within one second, it is immediately flagged as hot data, and the discovered hot key can be written to ZooKeeper, for example.

Of course, the exact threshold for what counts as hot should be based on your own business and empirical values.

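As a rough illustration of the detection step, here is a minimal sketch of a global aggregator that flags a key as hot once its per-second count crosses a threshold and announces it in ZooKeeper. The 1,000-hit threshold comes from the example above; the /hot-keys znode path and the surrounding wiring are assumptions, not a fixed design.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class HotKeyDetector {
    private static final long HOT_THRESHOLD = 1000; // assumed: >1,000 hits/sec marks a key as hot
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();
    private final ZooKeeper zk;

    public HotKeyDetector(ZooKeeper zk) { this.zk = zk; }

    // Called for every partial count arriving from upstream workers.
    public void record(String key, long count) throws Exception {
        LongAdder adder = counters.computeIfAbsent(key, k -> new LongAdder());
        adder.add(count);
        if (adder.sum() > HOT_THRESHOLD) {
            publishHotKey(key);
        }
    }

    // Announce the hot key under an assumed /hot-keys znode so app instances can react.
    private void publishHotKey(String key) throws Exception {
        String path = "/hot-keys/" + key;
        if (zk.exists(path, false) == null) {
            zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
    }

    // Reset once per second so counts are per-second rates (scheduling omitted for brevity).
    public void resetWindow() { counters.clear(); }
}
```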

Of course, someone will ask: won't the stream computing system itself face the same problem when counting accesses, with a single machine receiving hundreds of thousands of count events per second for one key?

The answer is no. Stream computing systems, Storm in particular, can take the events for the same key, scatter them across many machines for local pre-aggregation, and then roll the partial results up to one machine for a global summary.

So hundreds of thousands of events can be spread over, say, 100 machines, each counting only a few thousand events for that key.

The 100 partial counts are then rolled up on one machine for the global tally, so the stream computing layer itself has no hotspot problem.
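Here is a minimal sketch of that local pre-aggregation step, under the same assumptions as the detector above: each worker counts accesses locally and ships only one partial sum per key per window to the global aggregator.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalCounter {
    private final Map<String, Long> window = new HashMap<>();

    // Called once per observed access on this worker; cheap and purely local.
    public synchronized void onAccess(String key) {
        window.merge(key, 1L, Long::sum);
    }

    // Called once per second: ship one partial sum per key to the global aggregator,
    // so 100 workers turn 200,000 raw events into at most 100 small messages per key.
    public synchronized void flush(HotKeyDetector global) throws Exception {
        for (Map.Entry<String, Long> e : window.entrySet()) {
            global.record(e.getKey(), e.getValue());
        }
        window.clear();
    }
}
```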

(4) Automatically loading the hotspot cache as a JVM local cache

Our own system can watch the ZooKeeper znode for hot keys, so it senses any change immediately.

At that point, the application layer can immediately load the relevant data from the database and place it in a local cache inside its own process.

For the local cache you can use Ehcache, a plain HashMap, or whatever fits your needs. The point is to turn the centralized cache-cluster entry into a local cache inside each application instance; note that each instance should not cache too much data locally.

A typical single-instance deployment machine might be 4 cores and 8 GB of RAM, leaving little room for a local cache, so reserving it exclusively for this kind of hot data is just right.
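Here is a hedged sketch of that wiring using the raw ZooKeeper client: the instance watches an assumed /hot-keys znode, and whenever a new hot key appears it loads the value from the database (via a hypothetical loadFromDatabase helper) into an in-process map.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class HotKeyLocalLoader implements Watcher {
    private final ConcurrentHashMap<String, String> localCache = new ConcurrentHashMap<>();
    private final ZooKeeper zk;

    public HotKeyLocalLoader(ZooKeeper zk) throws Exception {
        this.zk = zk;
        refresh(); // set the initial watch and load any already-flagged hot keys
    }

    @Override
    public void process(WatchedEvent event) {
        try {
            refresh(); // a child changed under /hot-keys: re-read the list and re-watch
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void refresh() throws Exception {
        List<String> hotKeys = zk.getChildren("/hot-keys", this); // re-registers the watch
        for (String key : hotKeys) {
            localCache.computeIfAbsent(key, this::loadFromDatabase);
        }
    }

    // Hypothetical database lookup; in a real system this would query the primary store.
    private String loadFromDatabase(String key) { return "..."; }

    public String getLocal(String key) { return localCache.get(key); }
}
```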

Suppose your application cluster has 100 machines: all 100 will instantly hold a local copy of the hot data.

From then on, reads of that hot data are served straight from each instance's local cache and returned, without touching the cache cluster at all.

This way, the 200,000 reads per second never converge on one cache machine; instead each of the 100 application machines absorbs roughly 2,000 of them and answers directly from its local cache, which is no problem at all.
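Putting the pieces together, a read might first consult the local hot-data cache and only then fall back to the cache cluster. This sketch reuses the hypothetical HotKeyLocalLoader above and a Jedis client for the cluster; both are illustrative assumptions.

```java
import redis.clients.jedis.Jedis;

public class HotAwareReader {
    private final HotKeyLocalLoader loader;                      // from the watcher sketch above
    private final Jedis jedis = new Jedis("cache-host", 6379);   // assumed cluster entry point

    public HotAwareReader(HotKeyLocalLoader loader) { this.loader = loader; }

    public String get(String key) {
        String local = loader.getLocal(key); // hot keys were preloaded via the ZooKeeper watch
        if (local != null) {
            return local;                    // hot read: answered in-process, cluster untouched
        }
        return jedis.get(key);               // normal read: goes to the cache cluster as before
    }
}
```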


(5) Rate limiting and circuit breaker protection

In addition, each application instance should add rate-limiting and circuit-breaker protection around hot data access.

Inside each instance, add a circuit-breaker mechanism. Suppose the cache cluster can sustain at most 40,000 reads per second and you run 100 application instances.

Then each instance should cap itself at 400 cache-cluster reads per second. Beyond that, the circuit breaker trips: requests are not forwarded to the cache cluster at all, and an empty response is returned so the user can simply refresh the page a moment later.
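One possible implementation of that per-instance cap is Guava's RateLimiter, sketched below. The 400-permit budget mirrors the worked example (100 instances x 400 = 40,000), and the empty-string fallback stands in for whatever degraded response your product calls for.

```java
import com.google.common.util.concurrent.RateLimiter;
import redis.clients.jedis.Jedis;

public class GuardedCacheReader {
    // 400 permits/sec per instance: 100 instances x 400 = the assumed 40,000 req/s cluster ceiling.
    private final RateLimiter limiter = RateLimiter.create(400.0);
    private final Jedis jedis = new Jedis("cache-host", 6379); // assumed cluster entry point

    public String read(String key) {
        if (!limiter.tryAcquire()) {
            return "";             // breaker tripped: return an empty payload, skip the cluster
        }
        return jedis.get(key);     // within budget: forward the read to the cache cluster
    }
}
```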

By adding rate limiting and circuit breaking directly at the application layer, the cache cluster, the database cluster, and everything behind them are well protected from being overwhelmed.

(6) Summary

Should you implement this complex cache hotspot architecture in your system? That depends on whether your system actually faces such a scenario.

If your system does have cache hotspot problems, then build a hotspot-handling architecture along these lines.

But if not, don't over-engineer: your system may not need such a complex architecture at all.

In that case, just treat this article as a way to understand the underlying architectural ideas ^_^

Original link: https://mp.weixin.qq.com/s/RqBla4rg8ut3zEBKhyBo1w
