How to build a highly available Redis service architecture for three machines

2025-07-08 Update From: SLTechnology News&Howtos

This article explains how to build a highly available Redis service architecture with three machines. It is quite practical, so it is shared here for reference.

Redis, the in-memory store, is probably the most commonly used key-value database in web development.

We often use it to store user login state (session storage), to speed up queries on hot data (roughly an order of magnitude faster than MySQL), to implement simple message queues (LPUSH and BRPOP), to build publish/subscribe (PUB/SUB) systems, and so on.

Large Internet companies generally have a dedicated team that provides Redis storage to the various business teams as an infrastructure service.

But any provider of an infrastructure service will be asked one question by its callers: is your service highly available? Don't let my business suffer because your service keeps having problems.

Recently I built a small "highly available" Redis service for my own project. Here is my summary and some reflections.

First, we need to define what high availability means for a Redis service: the service keeps working normally under various exceptions, or recovers to normal within a very short time after an exception occurs.

The exceptions should include at least the following three possibilities:

Exception 1: a process on a node server suddenly goes down. For example, a clumsy developer kills the redis-server process on a server.

Exception 2: a whole node server goes down, which means every process on that node stops. For example, a clumsy operator unplugs a server, or an old machine suffers a hardware failure.

Exception 3: communication between any two node servers is interrupted. For example, a clumsy temporary worker cuts the optical cable that connects two data centers.

In fact, each of the exceptions above is a low-probability event, and the guiding idea behind high availability is that the probability of several low-probability events happening at the same time is negligible. As long as the system we design can tolerate a single point of failure for a short time, it achieves high availability.
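To make the "low-probability events rarely coincide" argument concrete, here is a small arithmetic sketch. It assumes that node failures are independent and uses a made-up per-node failure probability of 0.1% within one recovery window. Both the independence and the number are illustrative assumptions, not measurements from this article.

```python
def prob_all_fail(p_single: float, nodes: int) -> float:
    """Probability that `nodes` independent nodes all fail in the same window."""
    return p_single ** nodes


# Assumed (hypothetical) chance that one node fails during a recovery window.
p = 0.001

single = prob_all_fail(p, 1)
double = prob_all_fail(p, 2)

print(f"one node down:   {single:.6%}")
print(f"both nodes down: {double:.6%}")
```

Under these assumptions a double failure is a thousand times less likely than a single one, which is why tolerating one failure at a time is usually treated as good enough. Correlated failures (shared power, shared switch) would break the independence assumption.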

There are many solutions on the Internet for building a highly available Redis service, such as Keepalived, Codis, Twemproxy, and Redis Sentinel.

Codis and Twemproxy are mainly used for large-scale Redis clusters. They are open-source solutions released by Wandoujia (Pea Pod) and Twitter, respectively, before Redis officially released Redis Sentinel.

The amount of data in my business is not large, so running a cluster would be a waste of machines. In the end I chose between Keepalived and Redis Sentinel, and went with the official solution, Redis Sentinel.

Redis Sentinel can be understood as a process that monitors whether the Redis Server service is healthy. Once it detects an anomaly, it automatically promotes the standby (slave) Redis Server, so that external users are unaware of the exception inside the Redis service.

Let's build the smallest highly available Redis service step by step, from simple to complex.

Solution 1: stand-alone Redis Server, no Sentinel

Usually, when we build a personal website or do everyday development, we run a single Redis Server instance.

The caller connects directly to the Redis service; the client and Redis may even run on the same server.

This setup is only suitable for personal learning and experimentation, because it always has a single point of failure that cannot be solved.

Once the Redis service process dies, or server 1 goes down, the service is unavailable. And if Redis data persistence is not configured, the data already stored in Redis is lost.

Solution 2: master-slave synchronous Redis Server, single instance Sentinel

To achieve high availability and solve the single point of failure described in solution 1, we must add a backup service: we start a Redis Server process on each of two servers. Normally the master serves requests, and the slave is only responsible for synchronization and backup.

At the same time, we start an additional Sentinel process to monitor the availability of the two Redis Server instances, so that when the master dies the slave is promoted to master in time and continues to provide service. This gives us a highly available Redis Server.

This rests on the design principle of highly available services: a single point of failure is itself a low-probability event, so several single points failing together (that is, master and slave down at the same time) can be considered (basically) impossible.

For the caller of the Redis service, what it now connects to is the Redis Sentinel service, not the Redis Server directly.

A common call flow is: the client first connects to Redis Sentinel and asks which of the current Redis Servers is the master and which are slaves, and then connects to the appropriate Redis Server to operate.
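The discovery step in this call flow can be sketched with nothing but a TCP socket. The sketch below sends the Sentinel command `SENTINEL get-master-addr-by-name` encoded in the Redis wire protocol (RESP) and parses the two-element reply. The function names, the master name passed in, and the single `recv` call are simplifying assumptions for illustration; a production client library handles partial reads, errors, and reconnection.

```python
import socket
from typing import Optional, Tuple


def encode_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_master_addr(reply: bytes) -> Optional[Tuple[str, int]]:
    """Parse a two-element array reply into (host, port).

    Returns None for anything else, for example the null reply
    Sentinel sends when it does not know the requested master name.
    """
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2":
        return None
    # Layout: *2, $<len>, host, $<len>, port, trailing empty string
    return lines[2].decode(), int(lines[4])


def ask_sentinel(host: str, port: int, master_name: str,
                 timeout: float = 0.5) -> Optional[Tuple[str, int]]:
    """Ask one Sentinel which address currently holds the master role."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        # Simplification: assume the short reply arrives in one read.
        return parse_master_addr(conn.recv(4096))
```

After `ask_sentinel` returns an address, the client opens its normal Redis connection to that host and port.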

Of course, current third-party libraries generally implement this call flow already, so we no longer need to implement it by hand (for example, Node.js's ioredis, PHP's predis, Golang's go-redis/redis, Java's jedis, and so on).

However, once we have master-slave switching for the Redis Server, a new problem appears: Redis Sentinel itself is a single point. Once the Sentinel process dies, the client can no longer connect to Sentinel. Therefore, the configuration in solution 2 cannot achieve high availability.

Solution 3: master-slave synchronous Redis Server, dual-instance Sentinel

To solve the problem in solution 2, we start an additional Redis Sentinel process, so that two Sentinel processes provide service discovery to the client at the same time.

The client can connect to either Redis Sentinel to get basic information about the current Redis Server instances.

Usually we configure several Redis Sentinel addresses on the client side. Once the client finds that one address cannot be reached, it tries the other Sentinel instances.

Of course, we don't need to implement this by hand either; the popular Redis client libraries in each language already provide it.
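For readers curious what those libraries do internally, the fallback across Sentinel addresses amounts to a loop like the following. This is a simplified sketch, not the code of any particular library: `discover_master` and `fake_query` are hypothetical names, the IP addresses are invented, and the `query` callable stands in for whatever actually talks to a Sentinel.

```python
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def discover_master(
    sentinels: Iterable[Address],
    query: Callable[[Address], Optional[Address]],
) -> Address:
    """Return the master address from the first Sentinel that answers."""
    errors = []
    for sentinel in sentinels:
        try:
            master = query(sentinel)
        except OSError as exc:  # refused connection, timeout, unreachable host
            errors.append((sentinel, exc))
            continue
        if master is not None:
            return master
    raise ConnectionError(f"no Sentinel returned a master: {errors}")


def fake_query(sentinel: Address) -> Optional[Address]:
    """Stand-in for a real Sentinel query: the Sentinel on server 1 is down."""
    if sentinel == ("10.0.0.1", 26379):
        raise ConnectionRefusedError("sentinel on server 1 is down")
    return ("10.0.0.2", 6379)


master = discover_master([("10.0.0.1", 26379), ("10.0.0.2", 26379)], fake_query)
print(master)
```

The first address fails, the loop moves on, and the second Sentinel supplies the master address.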

Our expectation is that even if one of the Redis Sentinel instances goes down, the other Sentinel can still provide the service.

However, the ideal is beautiful and the reality is cruel. Under this architecture it is still impossible to achieve a highly available Redis service.

In the diagram of solution 3, the red line is the communication between the two servers. The exception we consider (exception 2) is that a whole server goes down. Let's assume server 1 is down, so only the Redis Sentinel and slave Redis Server processes on server 2 are left.

In this case, Sentinel will not promote the remaining slave to master, so the Redis service becomes unavailable. The reason is that Redis only allows a master-slave switch when more than 50% of the Sentinel processes are reachable and vote for the new master.

In this example, only one of the two Sentinels is reachable, which is exactly 50%, so the condition for a master-slave switch is not met.
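The "more than 50%" rule is a strict-majority check, which can be written in one line. The function name `has_majority` is an illustrative assumption; it models only the vote count described here, not the rest of Sentinel's failover logic.

```python
def has_majority(reachable: int, total: int) -> bool:
    """True when strictly more than half of all Sentinels can be reached."""
    return 2 * reachable > total


for total, reachable in [(2, 1), (3, 2), (3, 1)]:
    verdict = "failover allowed" if has_majority(reachable, total) else "no failover"
    print(f"{reachable} of {total} Sentinels reachable: {verdict}")
```

With two Sentinels, losing either one leaves exactly half, which is not a majority. With three, losing one still leaves two out of three.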

You might ask why Redis has this 50% rule. What if we allowed a master-slave switch when 50% or fewer of the Sentinels are reachable?

Consider exception 3: the network between server 1 and server 2 is interrupted, but both servers themselves keep running, as shown in the following figure:

In fact, from server 2's point of view, server 1 going down and server 1 becoming unreachable over the network look exactly the same: all of a sudden it cannot communicate with server 1 at all.

Suppose we allowed the Sentinel on server 2 to promote its slave to master when the network is interrupted. The result is that there are now two Redis Servers accepting writes.

Any insert, delete, or update from a client may land on the Redis on server 1 or on the Redis on server 2 (depending on which Sentinel the client is connected to), which leaves the data inconsistent.

Even after the network between server 1 and server 2 is restored, we cannot reconcile the data (there are two different copies; which should we trust?). Data consistency is completely broken.
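The divergence can be illustrated with two plain dictionaries standing in for the two Redis Servers. This is a toy model under the hypothetical assumption that failover was allowed without a majority; the key names and values are invented.

```python
# Both servers start from the same replicated state.
server1 = {"counter": "10"}   # old master, still accepting writes
server2 = dict(server1)       # slave, (wrongly) promoted during the partition

# During the partition, each client writes to whichever master it can reach.
server1["counter"] = "11"     # client A reached the Sentinel on server 1
server1["cart:42"] = "book"
server2["counter"] = "25"     # client B reached the Sentinel on server 2

# After the network heals, list every key on which the two copies disagree.
conflicts = {
    key: (server1.get(key), server2.get(key))
    for key in sorted(server1.keys() | server2.keys())
    if server1.get(key) != server2.get(key)
}
print(conflicts)
```

Neither copy is a superset of the other, so there is no automatic way to decide which value is correct. That is the situation the majority rule is designed to prevent.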

Solution 4: master-slave synchronous Redis Server, three-instance Sentinel

Since solution 3 cannot achieve high availability, our final version is solution 4, shown in the figure above. This is the architecture we actually built.

We introduce server 3 and run one more Redis Sentinel process on it, so that three Sentinel processes now manage two Redis Server instances.
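As a rough idea of what each of the three Sentinel processes is configured with, the sketch below renders a minimal `sentinel.conf`. The directives (`sentinel monitor`, `sentinel down-after-milliseconds`, `sentinel failover-timeout`, `sentinel parallel-syncs`) are real Sentinel options, but the master name `mymaster`, the IP address, and the timeout values are assumptions chosen for illustration, not the configuration used for this article.

```python
def render_sentinel_conf(master_name: str, master_ip: str,
                         master_port: int, quorum: int) -> str:
    """Render a minimal sentinel.conf for one monitored master."""
    lines = [
        "port 26379",
        # quorum: how many Sentinels must agree that the master is down
        f"sentinel monitor {master_name} {master_ip} {master_port} {quorum}",
        # illustrative timings in milliseconds, tune for your environment
        f"sentinel down-after-milliseconds {master_name} 5000",
        f"sentinel failover-timeout {master_name} 60000",
        f"sentinel parallel-syncs {master_name} 1",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical address of the master on server 1; quorum 2 of 3 Sentinels.
conf = render_sentinel_conf("mymaster", "10.0.0.1", 6379, quorum=2)
print(conf)
```

The same file can be deployed on all three servers. The slave on server 2 additionally needs a replication directive in its own `redis.conf` pointing at the master (`replicaof`, or `slaveof` in Redis versions before 5).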

In this solution, whether a single process fails, a single machine fails, or the network between two machines fails, the Redis service continues to be provided.
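This claim can be checked by enumerating every single-server loss for the two-server and three-server layouts. The model below is a deliberate simplification and an assumption of this sketch: a server is either fully reachable or fully lost, the master starts on server 1, and a failover needs a surviving Redis Server plus a strict majority of all Sentinels.

```python
from typing import Dict, Set

Layout = Dict[str, Set[str]]


def survives(layout: Layout, master: str, lost: Set[str]) -> bool:
    """Can the servers outside `lost` still provide a Redis master?"""
    alive = {name: roles for name, roles in layout.items() if name not in lost}
    if master in alive:
        return True  # the master is still reachable, no failover needed
    total = sum("sentinel" in roles for roles in layout.values())
    votes = sum("sentinel" in roles for roles in alive.values())
    has_replica = any("redis" in roles for roles in alive.values())
    return has_replica and 2 * votes > total


solution3: Layout = {
    "server1": {"redis", "sentinel"},
    "server2": {"redis", "sentinel"},
}
solution4: Layout = {**solution3, "server3": {"sentinel"}}

results = {
    name: {lost: survives(layout, "server1", {lost}) for lost in layout}
    for name, layout in [("solution 3", solution3), ("solution 4", solution4)]
}
for name, outcome in results.items():
    print(name, outcome)
```

In this model solution 3 fails exactly when server 1 is lost, while solution 4 survives the loss of any one server.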

In fact, if your machines are fairly idle, you can also start a Redis Server on server 3 to form a 1 master + 2 slaves architecture.

Every piece of data then has two backups, and availability improves a little. Of course, more slaves is not always better, since master-slave synchronization also has a cost.

In solution 4, once the communication between server 1 and the other servers is completely interrupted, servers 2 and 3 promote the slave to master.

From the client's point of view there are two masters at that moment, and once the network is restored, all new data written to server 1 during the outage is lost.

If you want to partially solve this problem, you can configure the Redis Server process to stop accepting writes as soon as it detects a problem with its network, so that no new data comes in during a network failure (see Redis's min-slaves-to-write and min-slaves-max-lag configuration items).
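The rule behind these two settings can be modelled as a small predicate: the master accepts writes only while enough replicas have acknowledged recently. The function below is a sketch of that rule, not Redis source code; its name and parameters are illustrative. Note that from Redis 5 onward the options are named `min-replicas-to-write` and `min-replicas-max-lag`, with the old names kept as aliases.

```python
from typing import Sequence


def master_accepts_writes(replica_lags: Sequence[float],
                          min_replicas: int, max_lag: float) -> bool:
    """Model of min-slaves-to-write / min-slaves-max-lag.

    `replica_lags` holds, for each connected replica, the seconds since
    the master last heard from it.
    """
    if min_replicas == 0:  # 0 disables the check
        return True
    good = sum(1 for lag in replica_lags if lag <= max_lag)
    return good >= min_replicas


# Healthy: the only slave acknowledged 1 second ago.
print(master_accepts_writes([1.0], min_replicas=1, max_lag=10))
# Isolated master: the slave has not been heard from for 30 seconds.
print(master_accepts_writes([30.0], min_replicas=1, max_lag=10))
```

With `min_replicas=1`, an isolated old master stops taking writes once its slave's lag exceeds `max_lag`, so the window in which writes can be lost is bounded by roughly that many seconds rather than by the length of the outage.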

At this point, we have built a highly available Redis service with three machines. In fact, there is an approach on the Internet that saves even more machines: put one Sentinel process on the client machine instead of on the service provider's machines.

However, inside a company the service provider and the caller usually do not belong to the same team. When two teams operate the same machine, communication problems easily lead to operational mistakes, so for this human reason we still adopted the architecture of solution 4.

And because server 3 runs only a Sentinel process, which consumes few server resources, server 3 can also be used to run some other services.

Ease of use: use Redis Sentinel like a stand-alone version of Redis

As service providers, we always talk about user experience. In the solutions above, there is always something that makes the client side less comfortable.

With stand-alone Redis, the client connects directly to the Redis Server; we only need to give the client an IP and a port for it to use our service.

After switching to Sentinel mode, the client has to adopt an external dependency that supports Sentinel mode and modify its own Redis connection configuration, which is clearly unacceptable to "picky" users.

Is there a way to offer the client just a fixed IP and port, as with stand-alone Redis?

The answer is, of course, yes. This involves introducing a virtual IP (VIP), as shown in the figure above.

We point the virtual IP at the server where the Redis Server master resides. When a Redis master-slave switch occurs, a callback script is triggered that moves the VIP to the server where the former slave (now the new master) resides.
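One way to wire up such a callback is Sentinel's `client-reconfig-script` option, which runs a script after a failover with the arguments `<master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>`. The sketch below only decides which `ip addr` command a host should run; it does not execute anything. The VIP `10.0.0.100/24`, the interface `eth0`, the server addresses, and the function name are all hypothetical, and this is not necessarily the script the author used.

```python
from typing import List, Sequence


def plan_vip_command(args: Sequence[str], local_ip: str,
                     vip_cidr: str, iface: str) -> List[str]:
    """Decide how this host should treat the VIP after a failover.

    `args` are the seven arguments Sentinel passes to the script:
    <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
    """
    if len(args) != 7:
        raise ValueError("expected 7 arguments from Sentinel")
    new_master_ip = args[5]
    # The host that now runs the master takes the VIP, every other host drops it.
    action = "add" if new_master_ip == local_ip else "del"
    return ["ip", "addr", action, vip_cidr, "dev", iface]


# Example call: the master moved from 10.0.0.1 to 10.0.0.2.
# The role and state fields are left as placeholders because this sketch ignores them.
argv = ["mymaster", "<role>", "<state>", "10.0.0.1", "6379", "10.0.0.2", "6379"]
cmd = plan_vip_command(argv, local_ip="10.0.0.2", vip_cidr="10.0.0.100/24", iface="eth0")
print(" ".join(cmd))
```

A real script would run the resulting command with sufficient privileges and would usually also announce the move (for example with a gratuitous ARP) so that neighbours update their caches promptly.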

So from the client's point of view, it still appears to be using a stand-alone Redis, only now a highly available one.
