
How to save more data with Redis

2025-01-26 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article introduces how to save more data with Redis, walking through a real capacity-planning case and the problems you may run into along the way. I hope you find it useful!

I once ran into the following requirement: use Redis to store 50 million key-value pairs, each about 512 B in size. To deploy quickly and start serving traffic, we ran the Redis instance on a cloud host. The question then became: how much memory should the cloud host have?

I roughly calculated that these key-value pairs would occupy about 25 GB of memory (50 million × 512 B). So my first thought was to deploy Redis on a cloud host with 32 GB of memory: 32 GB is enough to hold all the data, with roughly 7 GB to spare for normal system operation. At the same time, I used RDB persistence so that if the Redis instance failed, the data could still be recovered from the RDB file.
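As a sanity check, the arithmetic above can be reproduced in a few lines of Python. Note this is only the raw payload estimate; real Redis memory usage will be somewhat higher, since every key carries per-object overhead on top of its value.

```python
# Back-of-envelope sizing for 50 million key-value pairs of ~512 B each.
num_pairs = 50_000_000
bytes_per_pair = 512

total_bytes = num_pairs * bytes_per_pair
total_gb = total_bytes / 10**9    # decimal gigabytes
total_gib = total_bytes / 2**30   # binary gibibytes

print(f"{total_gb:.1f} GB ({total_gib:.1f} GiB)")  # prints "25.6 GB (23.8 GiB)"
```

Whether you read the result as 25.6 GB or 23.8 GiB, it lands close to the "about 25 GB" used in the text, and either way it fits in a 32 GB host with headroom to spare.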

However, in actual use I found that Redis's responses were sometimes very slow. Using the INFO command, we checked Redis's latest_fork_usec metric (the time taken by the most recent fork), and it turned out to be extremely high, almost at the second level.

This is related to Redis's persistence mechanism. When persisting with RDB, Redis forks a child process to do the work. The time a fork takes is positively correlated with the amount of data in Redis, and the fork blocks the main thread while it executes. The more data there is, the longer the fork blocks the main thread. So when persisting 25 GB of data with RDB, creating the background child process blocked the main thread long enough to make Redis respond slowly.

Clearly, the first option was not feasible, and we had to look for others. That was when we noticed Redis slice clusters. Although setting up a slice cluster is more involved, it can hold a large amount of data with little blocking impact on the Redis main thread.

A slice cluster, also called a sharded cluster, means starting multiple Redis instances to form a cluster and then splitting the received data into multiple parts according to certain rules, with each part stored on one instance. Back to our scenario: if you split the 25 GB of data into 5 equal parts (equal parts are not required) and use 5 instances to store them, each instance only needs to hold 5 GB of data.

Then, when an instance in the slice cluster generates an RDB file for its 5 GB of data, the data volume is much smaller, and forking the child process generally does not block the main thread for long. By using multiple instances to store slices of the data, we stored the full 25 GB without the sudden slow responses caused by fork blocking the main thread.
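To make the idea of "splitting data by certain rules" concrete, here is a deliberately naive sketch that spreads keys across 5 instances by hashing each key and taking it modulo the instance count. This is for illustration only; as described later in this article, Redis Cluster actually maps keys to 16384 hash slots rather than directly to instances, which makes it easier to move data when instances are added or removed.

```python
import binascii

NUM_INSTANCES = 5

def instance_for(key: bytes, n: int = NUM_INSTANCES) -> int:
    # Naive placement: hash the key (CRC-16 via crc_hqx) and take it
    # modulo the instance count. Real Redis Cluster inserts a layer of
    # 16384 hash slots between keys and instances instead.
    return binascii.crc_hqx(key, 0) % n

# Spread some sample keys over the 5 instances and count the distribution.
keys = [f"user:{i}".encode() for i in range(10_000)]
counts = [0] * NUM_INSTANCES
for k in keys:
    counts[instance_for(k)] += 1
```

With 10,000 keys, each instance ends up holding a comparable share, which is the property that lets 5 instances each carry roughly 5 GB of the original 25 GB.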

In real deployments, as users or the business grow, it is often unavoidable to store large amounts of data, and slice clusters are a very good solution. That is what this lesson covers.

How do I save more data?

In the case above, we used two approaches to store a large amount of data: a large-memory cloud host and a slice cluster. These correspond to Redis's two general ways of coping with growing data volumes: scale up and scale out.

Scale-up (vertical scaling): upgrade the resource configuration of a single Redis instance, such as increasing memory capacity, increasing disk capacity, or using a higher-spec CPU. For example, an instance might go from 8 GB of memory and a 50 GB disk to 24 GB of memory and a 150 GB disk.

Scale-out (horizontal scaling): increase the number of Redis instances. For example, instead of one instance with 8 GB of memory and a 50 GB disk, use three instances with the same configuration.

So, what are the advantages and disadvantages of these two methods?

First, the advantage of vertical scaling is that it is simple and straightforward to implement. However, it faces two potential problems.

The first problem is that when persisting data with RDB, a larger data set means more memory, and the main thread may block for longer while forking the child process (as in the example above). If you do not need to persist your Redis data, however, scaling up can be a good option.

The second problem is that vertical scaling is limited by hardware and cost. This is easy to understand: scaling memory from 32 GB to 64 GB is easy, but scaling to 1 TB runs into hardware capacity and cost constraints.

Horizontal scaling is more scalable than vertical scaling: to store more data, you simply add Redis instances, without worrying about the hardware and cost limits of a single instance. For workloads at the scale of millions or tens of millions of users, a scale-out Redis slice cluster is a good choice.

However, with a single instance it is always clear where the data lives and where the client should connect, whereas a slice cluster inevitably involves the distributed management of multiple instances. To make slice clusters work, we need to solve two major problems:

After slicing, how is the data distributed among multiple instances?

How does the client determine on which instance the data it wants to access resides?

Next, we'll address them one by one.

Correspondence between data slices and instances

In a slice cluster, data must be distributed across different instances, so how do data and instances correspond to each other? This is related to the Redis Cluster scheme I'm about to discuss. First, though, we need to understand the connection and the difference between slice clusters and Redis Cluster.

In fact, the slice cluster is a general mechanism for storing large amounts of data, and this mechanism can be implemented in different ways. Before Redis 3.0, there was no official solution for slice clusters. Starting with 3.0, Redis officially provides a solution called Redis Cluster for implementing slice clusters. The Redis Cluster scheme specifies the rules for mapping data to instances.

Specifically, the Redis Cluster scheme uses hash slots (which I'll simply call slots from here on) to handle the mapping between data and instances. In Redis Cluster, a slice cluster has 16384 hash slots in total. These slots are similar to data partitions, and each key-value pair is mapped to a hash slot based on its key.

The mapping process is divided into two steps:

First, a 16-bit value is computed from the key-value pair's key using the CRC16 algorithm;

then this 16-bit value is taken modulo 16384, yielding a number in the range 0 to 16383, where each number identifies one hash slot.

If you are interested in the details of the CRC16 algorithm, you can look it up on Google.
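The two-step mapping above is simple enough to sketch directly. The snippet below implements the CRC-16/XMODEM variant (polynomial 0x1021), which is the CRC16 flavor the Redis Cluster specification describes, and then takes the result modulo 16384:

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021), the CRC16 variant used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            # Shift left; XOR in the polynomial whenever the top bit was set.
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF  # keep it a 16-bit value
    return crc

def key_slot(key: bytes) -> int:
    # Step 1: CRC16 of the key; step 2: modulo 16384 to pick a hash slot.
    return crc16(key) % 16384
```

One caveat: a real Redis Cluster client also honors hash tags (a `{...}` substring in the key), hashing only the tag so that related keys land in the same slot; this sketch omits that detail.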

So, how are these hash slots mapped to specific Redis instances?

When deploying the Redis Cluster solution, we can use the cluster create command to create the cluster, and Redis will automatically distribute the slots evenly across the cluster's instances. For example, if there are N instances in the cluster, each instance holds 16384/N slots.
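To see what "evenly distributed" looks like, here is a small sketch of one plausible assignment scheme: divide 16384 by N and hand the remainder out one slot at a time to the first few instances. (The exact balancing done by the real cluster creation tooling may differ; this just illustrates the arithmetic.)

```python
TOTAL_SLOTS = 16384

def split_slots(n_instances: int):
    """Assign contiguous slot ranges as evenly as possible, giving the
    earlier instances one extra slot when 16384 is not divisible by N."""
    base, extra = divmod(TOTAL_SLOTS, n_instances)
    ranges, start = [], 0
    for i in range(n_instances):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(split_slots(3))  # [(0, 5461), (5462, 10922), (10923, 16383)]
```

With 3 instances, one instance gets 5462 slots and the other two get 5461 each; every slot is owned by exactly one instance.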

Of course, we can also use the cluster meet command to manually connect instances into a cluster, and then use the cluster addslots command to specify the hash slots assigned to each instance.

How does the client locate data?

When locating key-value data, the hash slot it belongs to can be calculated, and this calculation can be done by the client when it sends the request. However, to locate the actual instance, the client also needs to know which instance each hash slot lives on.

In general, once a client establishes a connection with a cluster instance, the instance sends the hash slot allocation information to the client. However, when the cluster is first created, each instance only knows which hash slots it has been assigned; it does not know which slots the other instances own.

Why, then, can the client obtain all the hash slot information by connecting to any instance? Because Redis instances propagate their hash slot information to the other instances they are connected to, spreading the slot allocation information across the cluster. Once the instances are interconnected, every instance holds the full mapping of all hash slots.

After receiving the hash slot information, the client caches it locally. When the client requests a key-value pair, it computes the key's hash slot and then sends the request to the instance that holds that slot.
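The client-side lookup just described can be sketched as a local table mapping slot ranges to instance addresses. The addresses and slot ranges below are hypothetical, and the CRC16 here comes from the standard library's `binascii.crc_hqx` (the same XMODEM variant Redis Cluster uses); a production client would also refresh this cache when the cluster tells it a slot has moved.

```python
import binascii

def key_slot(key: bytes) -> int:
    # CRC16 (XMODEM variant, as in Redis Cluster) modulo 16384.
    return binascii.crc_hqx(key, 0) % 16384

class SlotRoutingClient:
    """Minimal sketch of a cluster-aware client's cached slot map."""

    def __init__(self, slot_ranges):
        # slot_ranges: list of ((first_slot, last_slot), instance_address),
        # as learned from any instance after connecting to the cluster.
        self.slot_ranges = slot_ranges

    def instance_for(self, key: bytes) -> str:
        slot = key_slot(key)
        for (lo, hi), addr in self.slot_ranges:
            if lo <= slot <= hi:
                return addr
        raise RuntimeError("no instance owns this slot")

# Hypothetical 3-instance cluster with evenly split slot ranges.
client = SlotRoutingClient([
    ((0, 5461), "10.0.0.1:6379"),
    ((5462, 10922), "10.0.0.2:6379"),
    ((10923, 16383), "10.0.0.3:6379"),
])
addr = client.instance_for(b"user:1000")  # the request would be sent here
```

Because the slot map is cached locally, routing a key to an instance costs only a hash and a range lookup, with no extra network round trip.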

That concludes "How to save more data with Redis". Thank you for reading!
