This article explains how Redis cluster expansion and reduction can be achieved, drawing on Ctrip's practice.
I. Background
Ctrip's Redis cluster size and data volume have grown rapidly over the past few years. We solved rapid deployment of Redis clusters through containerization and, based on actual business needs, made a series of attempts such as secondary scheduling and automatic drift to keep hosts reliable when memory is oversubscribed.
For capacity, we mainly relied on vertical scaling, but as clusters grew this approach gradually hit a bottleneck. On the one hand, an overly large single Redis instance brings greater operational risk and difficulty; on the other, host capacity is finite and cannot be expanded endlessly. Balancing operational convenience against resource utilization, we want a single Redis instance to be capped at 15 GB.
In practice this is hard to enforce: some businesses grow quickly and frequently expand their Redis capacity, pushing single instances far beyond 15 GB; others shrink, and actual usage ends up far below the initial request, wasting resources.
How, then, can we effectively control the size of Redis instances? With this question in mind, this article walks through the evolution of Ctrip's Redis governance and scaling.
II. Horizontal splitting of Redis
For a long time after Ctrip started using Redis, only vertical scaling (up and down) was available, for two reasons:
First, business scale was relatively small at the beginning, and vertical scaling met the demand. For Redis, scaling vertically is just a maxmemory configuration change and is transparent to the business (see the sketch after this list).
Second, horizontal splitting and expansion is difficult and costly.
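To make "just a maxmemory change" concrete, here is a minimal sketch using the redis-py client; the host, port, and limit are placeholder values, not Ctrip's configuration.

```python
# Minimal sketch of vertical scaling: only the memory ceiling changes, so the
# client-side topology (and therefore the business) is untouched.
import redis

def scale_instance_vertically(host: str, port: int, new_limit: str) -> None:
    r = redis.Redis(host=host, port=port)
    r.config_set("maxmemory", new_limit)   # e.g. "15gb"
    r.config_rewrite()                     # persist the change if a config file is in use

if __name__ == "__main__":
    scale_instance_vertically("127.0.0.1", 6379, "15gb")  # placeholder address and limit
```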
As mentioned in the earlier article "The Evolution of Redis Governance at Ctrip", all Redis access at Ctrip goes through the self-developed CRedis client, which is deployed on the application side and reaches the Redis instances that actually hold the data via consistent hashing. Consistent hashing, however, cannot support direct horizontal scaling, because adding or removing a node reshapes the entire hash ring.
Figure 1
As shown in figure 1, assume there are four original shards. When a node is added, some keys that used to hash to nodeC now hash to nodeE, so lookups against the old node miss; from the client's point of view those keys appear to be lost. The more nodes change, the more keys are lost: if a cluster goes directly from 10 shards to 20, roughly 50% of keys are lost. Removing a node causes the same problem.
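To make the failure mode concrete, here is a small self-contained Python sketch (illustrative only, not CRedis code) that builds a plain consistent-hash ring and measures how many keys change owner when the shard count doubles from 10 to 20; the fraction comes out at roughly half, matching the 50% figure above.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    """A plain consistent-hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=100):
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

if __name__ == "__main__":
    keys = [f"key:{i}" for i in range(100_000)]
    old = HashRing([f"node{i}" for i in range(10)])
    new = HashRing([f"node{i}" for i in range(20)])   # scale 10 -> 20 shards
    moved = sum(old.node_for(k) != new.node_for(k) for k in keys)
    print(f"{moved / len(keys):.0%} of keys now map to a different node")  # roughly 50%
```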
So although consistent hashing is a simple and elegant cluster scheme, the inability to scale horizontally has long troubled the operations and architecture teams. To address this, the CRedis team proposed a horizontal split scheme in 2019.
The idea behind the CRedis horizontal split is straightforward: since adding nodes at the same level of the consistent hash ring loses data, leave the hash rules of the existing nodes untouched, take one node as a new hashing root, and apply consistent hashing again beneath it, so the layout evolves into a tree structure (figure 2).
Figure 2
As shown above, the tree grows from one layer to two. If a new leaf Group is split further, the tree extends to three layers, and the scheme can support up to ten layers. A leaf Group is a physical shard that maps directly to Redis instances, while a branch Group is a virtual shard: when the hash lands on a branch Group there is no corresponding Redis instance, so the lookup continues down the tree until a leaf Group is reached (see the sketch below).
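The following is an illustrative sketch (my own simplification, not CRedis source) of that tree lookup: hashing at the root may land on a branch Group, which then re-hashes the key within its own ring of children until a leaf Group, i.e. a physical Redis shard, is found. Group names and addresses are made up.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Group:
    """A node in the split tree: a leaf maps to a Redis instance, a branch holds child Groups."""
    def __init__(self, name, children=None, instance=None):
        self.name, self.instance, self.children = name, instance, children or []
        if self.children:
            # Each branch Group keeps its own consistent-hash ring over its children.
            ring = sorted(((_hash(c.name), c) for c in self.children), key=lambda t: t[0])
            self._points = [p for p, _ in ring]
            self._children_sorted = [c for _, c in ring]

    def route(self, key: str) -> str:
        if not self.children:                          # leaf Group: a physical shard
            return self.instance
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._children_sorted[idx].route(key)   # descend until a leaf is reached

# Example: the original shard "groupC" has been split into two leaf Groups.
root = Group("root", children=[
    Group("groupA", instance="10.0.0.1:6379"),
    Group("groupB", instance="10.0.0.2:6379"),
    Group("groupC", children=[
        Group("groupC-1", instance="10.0.0.3:6379"),
        Group("groupC-2", instance="10.0.0.4:6379"),
    ]),
])
print(root.route("order:42"))   # prints the address of the leaf shard owning this key
```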
Figure 3
After the CRedis horizontal split went live, the DBAs split most existing instances over 15 GB into smaller ones, which relieved the operational pressure from large-memory instances for a while. But as Redis usage kept growing rapidly, large instance clusters kept appearing, and the drawbacks of the CRedis horizontal split gradually surfaced:
The process takes very long. When several Groups are split, the data of each Group must first be copied to several identical instances at the same time. For a 60 GB instance (figure 3), splitting it into 5 GB instances requires 12 child Groups: the data must first be synchronized into 12 instances of 60 GB each, then the keys that do not hash to each instance are purged according to the key routing rules, and only then do they shrink into 12 instances of 5 GB. Splitting a 60 GB Group generally takes 3 to 6 hours; for a cluster with many shards, counting the time needed to observe the business impact, the whole job can take several days or one to two weeks, and it can only be done serially with someone on duty.
Two migrations are required per split. As noted above, the intermediate instances need a lot of memory, and the requirement drops sharply once the split finishes, so each split involves two migrations. Although migrations do not affect the business, they put a heavy mental load on the operators performing the split, and carelessness can cause production incidents.
A split cannot be undone. If the business shrinks afterwards and needs less Redis, the split layout remains and the allocated space is not released, which wastes resources and lowers overall Redis utilization.
It only supports expansion, not reduction. Besides oversized clusters that need splitting, there are also applications whose requested capacity far exceeds actual usage and should be shrunk, and the horizontal split can do nothing about that.
Each split adds a little performance overhead, because one more level of hashing must be computed. It is not expensive, but it still matters for latency-sensitive businesses.
So although the horizontal split solved the problem of oversized instances, the drawback of not being able to scale down became increasingly apparent, especially this year when the epidemic pushed us to cut costs and improve efficiency: hosts have spare resources, yet they are filled with instances that cannot be shrunk. Is there a better solution? The answer is yes.
III. Horizontal expansion and reduction of Redis
1. Design ideas
Figure 4
Since splitting is so difficult, the first idea was business-side double writing: the business writes to an old and a new cluster at the same time. The two clusters have different shard counts and size configurations. For example, a business that originally requested 4 shards and finds resources left over can request a new 2-shard cluster and control, via grayscale, which cluster writes go to (figure 4). Traffic eventually moves entirely to the new cluster, whose size matches current needs, achieving the capacity reduction.
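A hypothetical sketch of that business-side double write, using redis-py; each client below stands in for a whole sharded cluster, and the grayscale knob is an invented parameter for illustration only.

```python
import random
import redis

class DoubleWriteClient:
    """Writes go to both clusters; reads shift to the new cluster as the grayscale ramps up."""
    def __init__(self, old_cluster: redis.Redis, new_cluster: redis.Redis, read_new_ratio: float = 0.0):
        self.old, self.new = old_cluster, new_cluster
        self.read_new_ratio = read_new_ratio      # raised gradually from 0.0 to 1.0

    def set(self, key, value):
        self.old.set(key, value)                  # keep both clusters in step during migration
        self.new.set(key, value)

    def get(self, key):
        source = self.new if random.random() < self.read_new_ratio else self.old
        return source.get(key)

# Once read_new_ratio reaches 1.0 and writes are verified, the old cluster can be retired.
```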
Double writing solves part of the problem, but it intrudes deeply into the business, and because it requires business cooperation and an observation window, the overall process is still long. We therefore needed a better solution.
Since business-side double writing can meet the requirements, why not have the infrastructure do that part instead of the business? Borrowing the double-write idea and the cloud-native notion of immutable infrastructure, our first thought was to replace the old cluster with a new one rather than modify it in place. In addition, from our work to reduce Redis cost on the public cloud we had accumulated practical experience with kvrocks, and on that basis we designed an efficient scheme for horizontal expansion and reduction.
The core of the scheme is an intermediate binlogserver, adapted from kvrocks, which is both a Slave of the old cluster and a client of the new cluster. On one side it replicates full and incremental data from the Redis Master; on the other it writes the replicated data into the new cluster according to the new cluster's consistent hash rules. The general steps are as follows; the detailed flow is shown in figure 5.
Request a number of binlogservers matching the current V1 cluster, and obtain the consistent hash rules and Groups of the V2 cluster.
Each binlogserver becomes the Slave of the Master of one V1 shard. After executing slaveof, it saves and parses the RDB file sent by the V1 Master, restores its contents to Redis commands, and writes them into V2 according to CRedis's consistent hash rules. Commands subsequently propagated by the V1 cluster are synchronized to V2 in the same way.
Once replication has nearly caught up, for the sake of data consistency you can either stop writes to V1 (clients receive errors) and then push the V2 configuration through CRedis, or push it directly (clients see no errors, but data may be lost or inconsistent); applications then switch to V2 one by one. The whole process is transparent to the business, which does not need to do anything.
Figure 5
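Below is a simplified, hypothetical sketch of the routing step the binlogserver performs after becoming a Slave: every write replicated from a V1 Master is replayed against the V2 shard chosen by V2's consistent hash. The real component, adapted from kvrocks, also parses the RDB snapshot and the replication stream, which is omitted here; names, addresses, and the node_for callback (for example the ring from the earlier sketch) are assumptions.

```python
import redis

class V2Router:
    """Replays writes replicated from a V1 shard into the V2 cluster, shard by shard."""
    def __init__(self, node_for, shards):
        self.node_for = node_for                  # maps a key to a V2 shard name (consistent hash)
        self.shards = {name: redis.Redis(host=h, port=p) for name, (h, p) in shards.items()}

    def forward(self, command: str, key: str, *args):
        """Send one replicated write command to the V2 shard that owns the key."""
        target = self.shards[self.node_for(key)]
        return target.execute_command(command, key, *args)

# Example usage (placeholder addresses): a SET replicated from V1 is replayed into V2.
# router = V2Router(ring.node_for, {"v2-shard0": ("10.0.1.1", 6379), "v2-shard1": ("10.0.1.2", 6379)})
# router.forward("SET", "user:1001", "profile-blob")
```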
This horizontal scaling scheme solves several of the earlier pain points:
The duration is greatly shortened. Because the shards are processed concurrently, the time depends essentially on the size of the largest instance in the V1 cluster, not on the number of shards. Operational data shows that a cluster whose instances are 20 GB can be expanded or shrunk within 10 minutes, and one whose instances are under 10 GB within 5 minutes, and the business notices nothing. Since the cluster switchover takes seconds, even if an expansion or reduction does affect the business it can be rolled back quickly, because the rollback only changes which cluster the routing points at.
The process needs only one cluster switchover and zero migrations, has no intermediate state, and does not require large-memory hosts to stage the split.
An expanded cluster can easily be shrunk again by running the same procedure in reverse, and clusters that were previously split horizontally can also be restored this way.
It can expand, shrink, or simply migrate a cluster without changing its capacity, as in the cloud-native network security control pilot mentioned in "Ctrip Cilium+BGP Cloud Native Network Practice". Instances under an existing Redis cluster may sit on both the openstack network and the cilium network, while cloud-native security controls only cover instances on the cilium network, so those Redis instances have to be migrated. Migrating group by group in the old operational mode would take a long time and a lot of manpower, whereas horizontal scaling can move a whole cluster to the cilium network in one pass, saving both time and effort.
There is no performance loss after expansion and reduction.
2. Operation and maintenance data
In the four months since the horizontal scaling scheme went live, it has completed more than 200 expansion and reduction operations. This year one business saw its request volume suddenly surge more than tenfold; the related clusters were expanded many times, mostly within 10 minutes, effectively supporting the business.
Conversely, for clusters that requested far more shards than they actually use, we quickly reduced the shard count and requested capacity with the same capability. These reductions noticeably improved overall resource utilization.
3. Pitfalls
(1) An oversized key causes key eviction
During actual scaling we found that some clusters contain a huge key (larger than 3 GB) in a single instance. Because the size of each V2 instance is calculated in real time as the average of the V1 cluster's size, an oversized instance in V1 can push the V2 instance it is written into above the expected average, causing some keys to be evicted. We therefore took the following measures (a small sketch follows the list):
Strengthened large-key detection: any key over 512 MB triggers an alarm email to the owner.
Before the data copy, the maxmemory of every V2 instance is left effectively unlimited (uniformly set to 60 GB) so that uneven key distribution in V2 cannot trigger eviction.
After the data has been copied, during the V1-to-V2 switchover we check whether any V2 instance has evicted keys; if so, the split is considered failed by default and the switchover is not performed.
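A minimal sketch of the second and third safeguards, assuming a redis-py client and a placeholder list of V2 instances; the 60 GB value mirrors the setting mentioned above, everything else is illustrative.

```python
import redis

V2_INSTANCES = [("10.0.1.1", 6379), ("10.0.1.2", 6379)]   # placeholder V2 shard addresses

def relax_maxmemory(limit: str = "60gb") -> None:
    """Before copying data, lift maxmemory on every V2 instance so uneven key
    distribution cannot trigger eviction mid-split."""
    for host, port in V2_INSTANCES:
        redis.Redis(host=host, port=port).config_set("maxmemory", limit)

def safe_to_switch() -> bool:
    """Refuse the V1 -> V2 switchover if any V2 instance has already evicted keys."""
    for host, port in V2_INSTANCES:
        if redis.Redis(host=host, port=port).info("stats").get("evicted_keys", 0) > 0:
            return False
    return True
```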
(2) mget latency degrades after expansion
In a few scenarios we found that mget latency rose significantly, mainly because an mget touched few instances before expansion and many more afterwards. In general we recommend that the business limit the number of keys in a single mget, or convert the string keys into a hash and fetch them with hmget so that each request touches only one instance; throughput then grows linearly with the shard count while latency stays flat after expansion (see the sketch below).
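A brief sketch of the suggested mitigation, run against a single local instance for simplicity (with CRedis the fan-out happens at the client layer); the key names are examples.

```python
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

# Before: separate string keys, so an MGET may fan out to many shards after expansion.
r.mset({"user:1001:name": "alice", "user:1001:city": "SH", "user:1001:tier": "gold"})
print(r.mget("user:1001:name", "user:1001:city", "user:1001:tier"))

# After: one hash per user, so an HMGET always touches the single shard owning "user:1001".
r.hset("user:1001", mapping={"name": "alice", "city": "SH", "tier": "gold"})
print(r.hmget("user:1001", ["name", "city", "tier"]))
```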
IV. Summary and future planning
1. Xpipe support
Horizontal scaling, drift, secondary scheduling and other governance tools and policies now form a fairly complete closed loop, effectively supporting the operation of a production estate of thousands of hosts and tens of thousands of Redis instances with oversubscribed capacity.
It is still constrained by the XPipe architecture, however. For clusters attached to XPipe, the DR side must be scaled first and the XPipe setup finished manually afterwards, so automation is lacking and building out XPipe takes a long time. For example, an application whose Redis cluster used to be served from the nearest data center may have to read across data centers for a while after expansion, which inevitably raises latency; and that rise in latency makes it harder to judge whether the horizontal scaling logic is correct and whether it needs to be rolled back. In the future we will therefore make XPipe clusters behave like ordinary clusters, i.e. the V2 cluster will already have the DR architecture in place before write traffic is switched during expansion or reduction.
2. Support for persistent KV storage
Besides Redis itself being popular and widely used, we find that some businesses need a KV store more reliable than Redis: data kept on disk rather than in memory; support for inventory increment and decrement logic, i.e. exclusive access to a key with INCRBY-like semantics but actually performing a merge operation on strings; and higher data reliability, such as not losing data when the master goes down.