How to build a highly available Redis Service Architecture

2026-03-20 Update From: SLTechnology News&Howtos

This article explains in detail how to build a highly available Redis service architecture. It is practical material, shared here as a reference.

Redis is an open-source key-value database written in ANSI C. It is accessed over the network, is memory-based with optional persistence, and provides APIs for multiple languages. Since March 15, 2010, the development of Redis has been sponsored by VMware. Since May 2013, it has been sponsored by Pivotal.

One question that every provider of a basic service gets asked by its callers is: is your service highly available? Callers do not want their business to suffer because your service keeps having problems. I recently built a small "highly available" Redis service for my project, and this article is my summary and reflection.

First of all, we need to define what high availability means for a Redis service: the service can still be provided normally when various exceptions occur. Or, a little more relaxed: when an exception occurs, normal service is restored after only a very short time. The exceptions should include at least the following:

[exception 1] A process on a node server suddenly goes down (for example, a developer accidentally kills the redis-server process on a server).

[exception 2] A node server goes down, so all processes on that node stop (for example, an operator accidentally unplugs a server, or an old machine has a hardware failure).

[exception 3] The communication between any two node servers is interrupted (for example, a temporary worker accidentally cuts the optical cable used for communication between two data centers).

In fact, each of the above exceptions is a low-probability event, and the basic guiding idea of high availability is that the probability of several low-probability events occurring at the same time is negligible. As long as the system we design can tolerate a single point of failure for a short period of time, high availability is achieved.
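The reasoning above can be made concrete with a little arithmetic. The following is a minimal sketch that assumes the failures are independent and uses a made-up failure probability of 0.1% per time window; neither the figure nor the function name comes from the original setup.

```python
def joint_failure_probability(p_single: float, failures: int = 2) -> float:
    """Probability that `failures` independent events, each with
    probability `p_single`, all occur in the same time window."""
    return p_single ** failures


# Hypothetical figure: each exception has a 0.1% chance per window.
p = 0.001
print(f"one failure:        {p:.6f}")
print(f"two at once:        {joint_failure_probability(p):.6f}")
print(f"three at once:      {joint_failure_probability(p, 3):.9f}")
```

If real failures are correlated (for example, two servers sharing one power supply), the independence assumption breaks down and the joint probability is higher than this estimate.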

For building highly available Redis services, there are many solutions on the Internet, such as Keepalived, Codis, Twemproxy, and Redis Sentinel. Codis and Twemproxy are mainly used in large-scale Redis clusters; they are open-source solutions provided by Wandoujia and Twitter, respectively, before Redis officially released Redis Sentinel. The amount of data in my business is not large, so running a cluster would be a waste of machines. In the end, I chose between Keepalived and Redis Sentinel and went with the official solution, Redis Sentinel.

Redis Sentinel can be understood as a process that monitors whether the Redis Server service is healthy. Once it detects a failure, it automatically promotes the slave Redis Server, so that external users are unaware of the exception that occurred inside the Redis service. We proceed from simple to complex to build the smallest highly available Redis service.

Solution 4: master-slave synchronous Redis Server, three-instance Sentinel

Since solution 3 cannot achieve high availability, our final version is solution 4, shown in the figure above. This is the architecture we actually built. We introduced server 3 and started another Redis Sentinel process on it, so that three Sentinel processes now manage two Redis Server instances. In this setup, whether a single process fails, a single machine fails, or the network between two machines fails, the Redis service continues to be provided.

In fact, if server 3 has spare capacity, you can also start a Redis Server on it to form a 1 master + 2 slave architecture, with two backup copies of all data, which improves availability further. Of course, more slaves are not always better. After all, master-slave synchronization also takes time.

In solution 4, once the communication between server 1 and the other servers is completely interrupted, servers 2 and 3 promote the slave to master. For the client, there are then two masters serving at the same moment, and once the network is restored, all new data that landed on server 1 during the outage is lost. To partially solve this problem, you can configure the Redis Server process to stop accepting writes as soon as it detects a problem with its own network, so that no new data comes in during the network failure (see Redis's min-slaves-to-write and min-slaves-max-lag configuration items).
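To illustrate what these two configuration items decide, here is a minimal sketch of the rule in plain Python. It is a model of the documented behavior, not Redis code: the function name and the way lag is represented (seconds since each connected slave last acknowledged) are assumptions for illustration. In Redis 5 and later the same options are named min-replicas-to-write and min-replicas-max-lag.

```python
from typing import Iterable


def accepts_writes(slave_lags: Iterable[float],
                   min_slaves_to_write: int,
                   min_slaves_max_lag: float) -> bool:
    """Model of the write guard: the master keeps accepting writes only
    while at least `min_slaves_to_write` slaves are connected with a lag
    of at most `min_slaves_max_lag` seconds."""
    healthy = sum(1 for lag in slave_lags if lag <= min_slaves_max_lag)
    return healthy >= min_slaves_to_write


# Normal operation: one slave, acknowledged 1 second ago.
print(accepts_writes([1.0], min_slaves_to_write=1, min_slaves_max_lag=10))

# Server 1 is cut off from the network: no slave has answered for 30 seconds.
print(accepts_writes([30.0], min_slaves_to_write=1, min_slaves_max_lag=10))
```

With the guard enabled, an isolated old master refuses writes after the lag threshold passes, which narrows (but does not eliminate) the window in which data can be written to it and later lost.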

At this point, we have built a highly available Redis service with three machines. There are also approaches online that use fewer machines, namely putting a Sentinel process on the client machine rather than on the service provider's machine. However, within a company the service provider and the caller usually do not belong to the same team. When two teams operate the same machine, miscommunication easily leads to misoperation, so for this human-factor reason we still adopted the architecture of solution 4. And because only one Sentinel process runs on server 3, it does not consume many server resources, so server 3 can also be used to run other services.

Ease of use: use Redis Sentinel like a stand-alone version of Redis

As service providers, we always talk about user experience, and in the scheme above there is still something uncomfortable for the client. With stand-alone Redis, the client connects directly to Redis Server, so we only need to give the client an IP and port to use our service. After switching to Sentinel mode, the client has to adopt external dependency packages that support Sentinel mode and modify its own Redis connection configuration, which is obviously unacceptable to "picky" users. Is there a way to provide the service to the client with only a fixed IP and port, just like a stand-alone Redis?

The answer is, of course, yes. This involves introducing a virtual IP (VIP), as shown in the figure above. We point the virtual IP to the server where the Redis Server master resides. When a Redis master/slave switch occurs, a callback script is triggered that moves the VIP to the server where the newly promoted master (the former slave) resides. In this way, from the client's point of view, it is still using a stand-alone Redis service, only a highly available one.
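The callback can be sketched as follows. This is a hedged illustration, not the script used in the original setup: it assumes the callback is registered through Sentinel's client-reconfig-script option, which passes the arguments <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>. The VIP address, the interface name, and the helper names are hypothetical, and the sketch only builds the `ip` command instead of executing it.

```python
from typing import Dict, List

# Hypothetical values for illustration; adjust to your own network.
VIP_CIDR = "192.0.2.100/24"
INTERFACE = "eth0"


def parse_sentinel_args(argv: List[str]) -> Dict[str, object]:
    """Parse the arguments Sentinel passes to a client-reconfig-script:
    <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>."""
    name, role, state, from_ip, from_port, to_ip, to_port = argv[1:8]
    return {
        "master": name,
        "role": role,
        "state": state,
        "old_master": (from_ip, int(from_port)),
        "new_master": (to_ip, int(to_port)),
    }


def vip_command(local_ip: str, new_master_ip: str) -> List[str]:
    """Build the command this server should run after a failover:
    claim the VIP if it hosts the new master, release it otherwise."""
    action = "add" if local_ip == new_master_ip else "del"
    return ["ip", "addr", action, VIP_CIDR, "dev", INTERFACE]


# Example: the script runs on 10.0.0.2, which has just been promoted.
args = parse_sentinel_args(
    ["vip.py", "mymaster", "leader", "failover",
     "10.0.0.1", "6379", "10.0.0.2", "6379"])
print(vip_command("10.0.0.2", args["new_master"][0]))
```

A real script would run the returned command (for example with subprocess.run) and usually also needs to announce the new VIP location to the network, for instance with a gratuitous ARP.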

Solution 1: stand-alone Redis Server, no Sentinel

In general, when we build a personal website or do everyday development, we run a single Redis Server instance. The caller connects directly to the Redis service, and the client and Redis may even be on the same server. This setup is only suitable for personal learning and entertainment, because it always has a single point of failure that cannot be solved. Once the Redis service process dies, or server 1 goes down, the service is unavailable. And if Redis data persistence is not configured, the data already stored in Redis is lost.

Solution 2: master-slave synchronous Redis Server, single-instance Sentinel

To achieve high availability and solve the single point of failure described in solution 1, we must add a backup service: we start a Redis Server process on each of two servers. Normally the master serves requests, and the slave is only responsible for synchronization and backup. At the same time, we start an additional Sentinel process to monitor the availability of the two Redis Server instances, so that when the master dies, the slave is promoted to master and continues to provide service, thus achieving high availability of Redis Server. This rests on the design principle of highly available services: a single point of failure is itself a low-probability event, while multiple simultaneous failures (that is, master and slave down at the same time) can be considered (basically) impossible.

For the caller of the Redis service, what it connects to is now the Redis Sentinel service, not the Redis Server. A common invocation process is that the client first connects to Redis Sentinel and asks which Redis Server is currently the master and which is the slave, and then connects to the corresponding Redis Server to operate. Of course, current third-party libraries generally implement this process already, so we no longer need to implement it by hand (for example, ioredis for Node.js, predis for PHP, go-redis/redis for Go, jedis for Java, etc.).
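For readers curious what the libraries do under the hood, the discovery step can be sketched with the standard library only. The sketch sends Sentinel's SENTINEL get-master-addr-by-name command using the Redis wire protocol (RESP) and parses the two-element reply. The function names, the master name "mymaster", and the simplified single-recv parsing are assumptions for illustration; in practice, use one of the client libraries listed above.

```python
import socket
from typing import Optional, Tuple


def encode_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    chunks = [b"*%d\r\n" % len(parts)]
    for part in parts:
        data = part.encode()
        chunks.append(b"$%d\r\n%b\r\n" % (len(data), data))
    return b"".join(chunks)


def parse_master_addr(reply: bytes) -> Optional[Tuple[str, int]]:
    """Parse the reply to get-master-addr-by-name: an array of two bulk
    strings (ip, port). Any other reply (e.g. a null reply for an
    unknown master name) yields None."""
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2":
        return None
    return lines[2].decode(), int(lines[4])


def ask_sentinel(host: str, port: int, master_name: str,
                 timeout: float = 1.0) -> Optional[Tuple[str, int]]:
    """Ask one Sentinel for the current master address.
    Simplification: assumes the whole reply arrives in one recv call."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_command(
            "SENTINEL", "get-master-addr-by-name", master_name))
        return parse_master_addr(conn.recv(4096))


# The request a client would send for a master named "mymaster":
print(encode_command("SENTINEL", "get-master-addr-by-name", "mymaster"))
```

After obtaining the address, the client opens an ordinary Redis connection to that host and port.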

However, after implementing master-slave switching for the Redis Server service, we have introduced a new problem: Redis Sentinel itself is a single point of failure, and once the Sentinel process dies, the client can no longer connect to Sentinel. Therefore, the configuration of solution 2 does not achieve high availability.

Solution 3: master-slave synchronous Redis Server, dual-instance Sentinel

To solve the problem of solution 2, we start an additional Redis Sentinel process, so that two Sentinel processes provide service discovery to the client at the same time. The client can connect to either Redis Sentinel to get basic information about the current Redis Server instances. Usually, we configure multiple Redis Sentinel addresses on the client side. Once the client finds that one address cannot be reached, it tries the other Sentinel instances. Of course, we do not need to implement this by hand; the popular Redis connection libraries in the various development languages implement it for us. Our expectation is that even if one Redis Sentinel is down, the other Sentinel can still provide the service.
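The fallback behavior described here can be sketched as a simple loop. This is an illustrative model, not the code of any particular library: `discover_master` and the injected `ask` callable are hypothetical names, and the addresses are made up. The `ask` callable stands for whatever function queries a single Sentinel and returns the master address.

```python
from typing import Callable, List, Optional, Tuple

Address = Tuple[str, int]


def discover_master(sentinels: List[Address],
                    ask: Callable[[str, int], Optional[Address]]) -> Address:
    """Try each configured Sentinel in order and return the first
    master address obtained. Unreachable Sentinels are skipped."""
    for host, port in sentinels:
        try:
            addr = ask(host, port)
        except OSError:
            continue  # this Sentinel is down or unreachable, try the next
        if addr is not None:
            return addr
    raise ConnectionError("no Sentinel could report the current master")


def fake_ask(host: str, port: int) -> Optional[Address]:
    """Stand-in for a real network query: the Sentinel on 10.0.0.1 is down."""
    if host == "10.0.0.1":
        raise ConnectionRefusedError("sentinel down")
    return ("10.0.0.2", 6379)


print(discover_master([("10.0.0.1", 26379), ("10.0.0.2", 26379)], fake_ask))
```

Injecting `ask` keeps the fallback logic testable without a running Sentinel.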

However, the vision is beautiful and the reality is cruel. Under this architecture, it is still impossible to achieve high availability of the Redis service. In the diagram of solution 3, the red line is the communication between the two servers, and the exception we envision ([exception 2]) is that a server goes down. Assume that server 1 is down, so only the Redis Sentinel and slave Redis Server processes on server 2 are left. In this case, Sentinel will not promote the remaining slave to master, which makes the Redis service unavailable. The reason is that Redis only performs a master-slave switch when more than 50% of the Sentinel processes can be reached and vote for a new master. In this example, only one of the two Sentinels is reachable, which is exactly 50% and therefore not enough to trigger a switch.
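The majority rule described above can be written as a one-line check. This sketch models only the "more than 50%" condition from the text; the function name is hypothetical, and real Sentinel additionally has a configurable quorum for agreeing that the master is down, which is ignored here.

```python
def can_failover(total_sentinels: int, reachable_sentinels: int) -> bool:
    """A master-slave switch is allowed only if a strict majority
    (more than 50%) of all configured Sentinels is reachable."""
    return reachable_sentinels > total_sentinels / 2


# Two Sentinels, server 1 is down: 1 of 2 is exactly 50%, not a majority.
print(can_failover(total_sentinels=2, reachable_sentinels=1))  # False

# Three Sentinels, one server is down: 2 of 3 is a majority.
print(can_failover(total_sentinels=3, reachable_sentinels=2))  # True
```

This is why an odd number of Sentinels on separate machines is the smallest arrangement that survives the loss of one machine.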

You might ask: why does Redis have this 50% rule? Suppose we allowed master-slave switching when 50% or fewer of the Sentinels are reachable. Now imagine [exception 3], that is, a network outage between server 1 and server 2 while both servers themselves keep running. As shown in the following figure:

From the point of view of server 2, server 1 going down and server 1 being unreachable over the network look the same: either way, communication suddenly becomes impossible. Suppose we allowed server 2's Sentinel to promote the slave to master when the network goes down. The result would be two Redis Servers providing service at the same time. Any insert, delete, or update from a client may land on the Redis of server 1 or on the Redis of server 2 (depending on which Sentinel the client is connected to), causing data confusion. Even after the network between server 1 and server 2 is restored, we cannot reconcile the data (with two different copies, which one should we trust?). Data consistency is completely broken.

Conclusion

It is actually very simple to build any service and make it "usable", just like running a stand-alone Redis. But once you want it to be "highly available", things get complicated. Our setup uses two additional servers, running 3 Sentinel processes + 1 slave process, just to ensure that the service stays available in the event of a low-probability accident. In the actual business, we also use supervisor to monitor the processes. If a process exits unexpectedly, supervisor automatically tries to restart it.
