What is the meaning of high availability and high concurrency mechanism in Redis 07/13 Update SLTechnology News&Howtos


2025-07-13 Update From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)05/31 Report--

This article introduces the high availability and high concurrency mechanisms in Redis. It should serve as a useful reference for interested readers.

I. High concurrency mechanism

Redis executes commands on a single thread, so a single stand-alone instance can typically handle only tens of thousands of requests per second. To serve hundreds of thousands of concurrent requests, Redis relies on a master-slave architecture with read-write separation.

1. Master-slave replication

This section does not cover how to configure master-slave replication; it focuses on how replication works. In a master-slave setup, one master serves several slaves. When a slave starts, it sends the PSYNC command to the master. If the slave is reconnecting, the master sends only the data the slave is missing; if this is the slave's first connection, a full resynchronization is triggered. During a full resynchronization, the master forks a background process to generate an RDB snapshot file and buffers in memory the write commands it receives in the meantime. Once the RDB file is ready, the master sends it to the slave, which first writes it to disk and then loads it into memory. Finally, the master sends the buffered write commands to the slave. If several slaves need a full resynchronization at the same time (for example, after a master-slave network failure), the master generates a single RDB file and uses it to serve all of them.

Resumable replication: both the master and the slave track a replication offset and the master's run ID, and the master keeps recent replication data in the backlog. When the connection is restored after a network failure, the slave resumes copying from its last replication offset; if that offset can no longer be found in the backlog, a full resynchronization is triggered.

① Complete replication process

(1) When the slave node starts, it only saves the master node's information (IP address and port); replication does not start yet.

The master's address comes from the slaveof directive in redis.conf (called replicaof since Redis 5.0).

(2) A scheduled task inside the slave node checks every second whether there is a new master to connect to and replicate from. If there is, the slave establishes a socket connection with the master.

(3) The slave node sends a PING command to the master node.

(4) Password authentication: if requirepass is set on the master, the slave must authenticate by sending the password configured in masterauth.

(5) On the first connection, the master performs a full copy and sends all of its data to the slave.

(6) Afterwards, the master keeps sending new write commands to the slave asynchronously.
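The handshake steps above can be sketched as the ordered list of commands a slave sends to its master. This is a simplified illustration, not Redis source code: the function `handshake_commands` and its parameters are hypothetical. It also includes `REPLCONF listening-port`, which real slaves send to tell the master their port, while other `REPLCONF` options that real slaves send are omitted.

```python
from typing import List, Optional


def handshake_commands(listening_port: int,
                       masterauth: Optional[str] = None,
                       master_runid: Optional[str] = None,
                       resume_offset: Optional[int] = None) -> List[List[str]]:
    """Return, in order, the commands a slave sends when connecting (simplified)."""
    commands = [["PING"]]  # step (3): make sure the master answers
    if masterauth is not None:
        # step (4): only needed when the master has requirepass set
        commands.append(["AUTH", masterauth])
    # Tell the master which port this slave listens on.
    commands.append(["REPLCONF", "listening-port", str(listening_port)])
    if master_runid is None or resume_offset is None:
        # First connection: nothing to resume, so ask for a full sync.
        commands.append(["PSYNC", "?", "-1"])
    else:
        # Reconnection: ask to continue from where replication stopped.
        commands.append(["PSYNC", master_runid, str(resume_offset)])
    return commands


print(handshake_commands(6380, masterauth="secret"))
# [['PING'], ['AUTH', 'secret'], ['REPLCONF', 'listening-port', '6380'], ['PSYNC', '?', '-1']]
```

The `PSYNC ? -1` form is what a slave with no cached master information sends, which is why the first connection always ends in a full copy.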

② Core mechanisms of data synchronization

These are the mechanisms behind the full copy performed when a slave connects to the master for the first time, along with some details of that process.

(1) Both the master and the slave maintain an offset

The master keeps accumulating its own offset, and the slave keeps accumulating its own offset.

The slave reports its offset to the master every second, and the master records the offset of each slave.

The offset is not used only for full replication. Its main purpose is to let the master and the slave each know the offset of their own data, so that they can detect inconsistencies between them.
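A minimal sketch of this offset bookkeeping, assuming offsets are counted in bytes of the replication stream: the master adds to its offset as it propagates writes, records the offset each slave reports, and the difference shows how far each slave is behind. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MasterOffsets:
    """Hypothetical model of the master's view of replication offsets."""
    master_offset: int = 0  # bytes of replication stream the master has produced
    slave_offsets: Dict[str, int] = field(default_factory=dict)

    def propagate(self, command_bytes: int) -> None:
        # Every write sent down the replication stream advances the master's offset.
        self.master_offset += command_bytes

    def report(self, slave_id: str, offset: int) -> None:
        # Each slave periodically reports the offset it has processed.
        self.slave_offsets[slave_id] = offset

    def lag_bytes(self, slave_id: str) -> int:
        # How many bytes this slave is behind; a never-reported slave counts from 0.
        return self.master_offset - self.slave_offsets.get(slave_id, 0)


offsets = MasterOffsets()
offsets.propagate(100)
offsets.propagate(50)
offsets.report("slave-1", 120)
print(offsets.lag_bytes("slave-1"))  # 30
```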

(2) backlog

The master node has a backlog, which is 1 MB by default.

When the master copies data to a slave, it also writes a copy of the same data to the backlog.

The backlog is mainly used for incremental replication after the master-slave connection is interrupted and later restored.
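The backlog behaves like a fixed-size circular buffer. The sketch below is a simplified model under our own assumptions: the class name `ReplBacklog` is hypothetical, and here an offset means the number of bytes the slave already has. It only shows why a slave can resume from the backlog while its offset is still inside the buffer, and needs a full copy once that data has been overwritten.

```python
from typing import Optional


class ReplBacklog:
    """Hypothetical fixed-size circular buffer of the most recent replication bytes."""

    def __init__(self, size: int = 1024 * 1024):  # 1 MB, the default size in the text
        self.size = size
        self.buf = bytearray(size)
        self.master_offset = 0  # total bytes ever written to the stream

    def append(self, data: bytes) -> None:
        # Byte by byte for clarity; old bytes are overwritten once the buffer is full.
        for b in data:
            self.buf[self.master_offset % self.size] = b
            self.master_offset += 1

    def first_offset(self) -> int:
        # Oldest offset whose byte is still in the buffer.
        return max(0, self.master_offset - self.size)

    def read_from(self, offset: int) -> Optional[bytes]:
        """Bytes the slave is missing, or None if they were already overwritten."""
        if offset < self.first_offset() or offset > self.master_offset:
            return None  # caller must fall back to full replication
        return bytes(self.buf[i % self.size] for i in range(offset, self.master_offset))


backlog = ReplBacklog(size=8)
backlog.append(b"abcdef")
backlog.append(b"ghij")
print(backlog.read_from(4))  # b'efghij'
print(backlog.read_from(1))  # None: bytes at offsets 0 and 1 were overwritten
```

This is also why the backlog size matters: the longer a slave may stay disconnected, and the more writes arrive meanwhile, the larger the backlog has to be for incremental replication to remain possible.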

(3) master run id

The master's run ID can be seen in the output of the INFO server command (the run_id field).

Locating the master by host and IP alone is unreliable. If the master restarts or its data changes, the slave must be able to tell the difference by the run ID; if the run ID is different, a full replication is performed.

To restart Redis without changing the run ID, use the redis-cli DEBUG RELOAD command.

(4) psync

The slave replicates from the master using PSYNC, in the form psync runid offset.

The master responds according to its own state: either FULLRESYNC runid offset, which triggers full replication, or CONTINUE, which triggers incremental replication.
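The master's choice between the two replies can be sketched as a small decision function. This is a hypothetical simplification, not the Redis implementation: it assumes the master knows the range of offsets still held in its backlog. Redis 4.0 and later use a replication ID in place of the run ID for this check, but the logic is the same in spirit. The run ID helper mimics Redis in producing a random 40-character hex string, so a restarted master no longer matches the ID the slave cached.

```python
import secrets
from typing import Tuple


def new_run_id() -> str:
    # Random 40-character hex string; a restarted master gets a fresh one.
    return secrets.token_hex(20)


def psync_reply(req_runid: str, req_offset: int, master_runid: str,
                backlog_first: int, master_offset: int) -> Tuple[str, ...]:
    """Choose incremental (CONTINUE) or full (FULLRESYNC) replication, simplified."""
    same_master = req_runid == master_runid
    in_backlog = backlog_first <= req_offset <= master_offset
    if same_master and in_backlog:
        return ("CONTINUE",)
    # Unknown run ID (first sync, or master restarted) or data no longer in backlog.
    return ("FULLRESYNC", master_runid, str(master_offset))


runid = new_run_id()
print(psync_reply(runid, 500, runid, 100, 1000))  # ('CONTINUE',)
print(psync_reply("?", -1, runid, 100, 1000)[0])  # FULLRESYNC
```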

③ Full replication

(1) The master executes BGSAVE to generate an RDB snapshot file locally.

(2) The master sends the RDB snapshot file to the slave. If transferring the RDB takes longer than 60 seconds (repl-timeout), the slave considers the replication failed; adjust this parameter as needed.

(3) A machine with a gigabit network card typically transfers about 100 MB per second, so transferring a 6 GB file can easily take longer than 60 seconds.

(4) While the master is generating the RDB, it buffers all new write commands in memory. After the slave has saved the RDB, the master sends those buffered write commands to the slave.

(5) client-output-buffer-limit slave 256mb 64mb 60: if the memory buffer stays above 64 MB for 60 seconds during replication, or exceeds 256 MB at any moment, replication stops and fails.

(6) After receiving the RDB, the slave clears its old data and loads the RDB into its own memory; until then, it keeps serving requests from the old version of the data.

(7) If AOF is enabled on the slave, it immediately runs BGREWRITEAOF to rewrite the AOF.
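The client-output-buffer-limit rule in item (5) combines a hard limit with a time-limited soft limit. A rough model of that rule, assuming the buffer size is sampled periodically; the class is hypothetical and only approximates the check Redis performs internally.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024


@dataclass
class OutputBufferLimit:
    """Hypothetical model of: client-output-buffer-limit slave 256mb 64mb 60."""
    hard: int = 256 * MB      # reaching this disconnects the slave immediately
    soft: int = 64 * MB       # staying at or above this ...
    soft_seconds: float = 60  # ... for longer than this also disconnects it
    soft_since: Optional[float] = None  # when the buffer last crossed the soft limit

    def should_disconnect(self, used: int, now: float) -> bool:
        if used >= self.hard:
            return True
        if used >= self.soft:
            if self.soft_since is None:
                self.soft_since = now  # start the soft-limit timer
            return now - self.soft_since > self.soft_seconds
        self.soft_since = None  # dropped below the soft limit: reset the timer
        return False
```

A brief spike between 64 MB and 256 MB is tolerated, but a buffer that stays large for over a minute, as can happen during a long RDB transfer on a busy master, ends the replication.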

Generating the RDB, copying it over the network, clearing the slave's old data, and rewriting the slave's AOF all take time.

If 4 GB to 6 GB of data has to be replicated, a full replication is likely to take one and a half to two minutes.
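The timing concerns in items (2) and (3) come down to simple arithmetic. This back-of-the-envelope sketch assumes a steady 100 MB/s and counts only network time, ignoring RDB generation, loading, and AOF rewrite, so real totals will be higher.

```python
REPL_TIMEOUT = 60  # seconds; the repl-timeout value quoted in the text


def transfer_seconds(size_gb: float, throughput_mb_s: float = 100.0) -> float:
    """Network time only; assumes a steady throughput and ignores overhead."""
    return size_gb * 1024 / throughput_mb_s


for size in (2, 4, 6):
    t = transfer_seconds(size)
    print(f"{size} GB -> {t:.1f} s, exceeds repl-timeout: {t > REPL_TIMEOUT}")
# 2 GB -> 20.5 s, exceeds repl-timeout: False
# 4 GB -> 41.0 s, exceeds repl-timeout: False
# 6 GB -> 61.4 s, exceeds repl-timeout: True
```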

④ incremental replication

(1) If the master-slave network connection drops during replication, incremental replication is triggered when the slave reconnects to the master.

(2) The master takes the missing portion of the data directly from its own backlog and sends it to the slave. The backlog is 1 MB by default.

(3) The master locates the data in the backlog using the offset the slave sent in PSYNC.

⑤ heartbeat

Master and slave nodes send heartbeat messages to each other.

By default, the master sends a heartbeat every 10 seconds, and the slave sends a heartbeat every second.
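A small sketch of how these heartbeats are used: each side records when it last heard from its peer and treats the link as broken after a period of silence. The 10-second and 1-second intervals are from the text; using 60 seconds as the silence limit is our assumption, chosen to mirror the repl-timeout default, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

MASTER_PING_PERIOD = 10  # master -> slave heartbeat interval in seconds (default)
SLAVE_ACK_PERIOD = 1     # slave -> master heartbeat interval in seconds


def heartbeat_times(period: int, until: int) -> List[int]:
    # Seconds at which a heartbeat is sent within [0, until).
    return list(range(0, until, period))


@dataclass
class LinkMonitor:
    """Hypothetical: declares the link down if the peer is silent for too long."""
    timeout: float = 60.0  # assumed to mirror the repl-timeout default
    last_seen: float = 0.0

    def on_heartbeat(self, now: float) -> None:
        self.last_seen = now

    def link_down(self, now: float) -> bool:
        return now - self.last_seen > self.timeout


print(len(heartbeat_times(MASTER_PING_PERIOD, 60)))  # 6 master heartbeats per minute
print(len(heartbeat_times(SLAVE_ACK_PERIOD, 60)))    # 60 slave heartbeats per minute
```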

⑥ asynchronous replication

Every time the master receives a write command, it first writes the data internally and then sends it to the slave asynchronously.
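A toy simulation of asynchronous replication, with hypothetical class names: the master applies a write and replies to the client immediately, while delivery to the slave happens separately. It also shows the window in which acknowledged writes exist only on the master.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple


class Master:
    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        # Writes accepted locally but not yet delivered to the slave.
        self.pending: Deque[Tuple[str, Any]] = deque()

    def set(self, key: str, value: Any) -> str:
        self.data[key] = value             # 1. apply the write locally
        self.pending.append((key, value))  # 2. queue it for the slave
        return "OK"                        # 3. reply without waiting for the slave


class Slave:
    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}


def replicate(master: Master, slave: Slave, max_items: int) -> None:
    """Deliver up to max_items queued writes, independently of client requests."""
    for _ in range(min(max_items, len(master.pending))):
        key, value = master.pending.popleft()
        slave.data[key] = value


m, s = Master(), Slave()
for i, k in enumerate("abc"):
    m.set(k, i)
replicate(m, s, max_items=2)
print(s.data)  # {'a': 0, 'b': 1}; 'c' is lost if the master crashes now
```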

2. Read-write separation

The master is responsible for write operations, and the slaves serve read queries to reduce the load on the master.
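Read-write separation is often done by the client or a proxy. A minimal routing sketch, assuming a hypothetical `ReadWriteRouter` with made-up addresses and only a small illustrative set of write commands: writes always go to the master, and reads rotate round-robin over the slaves.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Hypothetical client-side router: writes to the master, reads spread over slaves."""

    # Illustrative subset only; a real router needs the full list of write commands.
    WRITE_COMMANDS = {"SET", "DEL", "INCR", "EXPIRE", "LPUSH", "HSET"}

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, command: str) -> str:
        if command.upper() in self.WRITE_COMMANDS or self._slaves is None:
            return self.master
        return next(self._slaves)  # round-robin over the slaves


router = ReadWriteRouter("10.0.0.1:6379", ["10.0.0.2:6379", "10.0.0.3:6379"])
print(router.route("set"))  # 10.0.0.1:6379
print(router.route("GET"))  # 10.0.0.2:6379
print(router.route("GET"))  # 10.0.0.3:6379
```

Because replication is asynchronous, a read routed to a slave may briefly return data older than what was just written to the master.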

II. High availability mechanism

Under high concurrency, a deployment with one master and multiple slaves can handle the request load, but there is still only one master. If the master goes down, nothing can be written, the slaves can no longer synchronize data, and the whole system becomes unavailable. Redis provides high availability through the Sentinel mechanism. Sentinel is an important component of a Redis deployment and is responsible for cluster monitoring, message notification, failover, and acting as a configuration center.

(1) Cluster monitoring: checks whether the Redis master and slave processes are working properly.

(2) Message notification: if a Redis instance fails, the sentinel sends an alarm notification to the administrator.

(3) Failover: if the master fails, the sentinels automatically promote a slave to master.

(4) Configuration center: when a failover occurs, the sentinels notify clients of the new master's address.

Sentinel is itself distributed: it runs as a cluster of sentinels that work together.

Declaring the master down and carrying out a failover requires the agreement of a majority of the sentinels, which involves a distributed election.

The Sentinel deployment needs at least three nodes to be robust. Suppose a test setup has only two nodes, one running the master and one running a slave, with a sentinel on each (call them S1 and S2). If the master machine goes down, S1 goes down with it, leaving only S2. Even if the quorum is set to 1, so that S2 alone can declare the master down, the failover itself must be authorized by a majority of all sentinels, and with two sentinels that majority is two. S2 on its own is not a majority, so no failover takes place. That is why at least three nodes are needed.
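The two-node versus three-node argument is just a majority count. A simplified sketch with hypothetical function names: it assumes every surviving sentinel agrees the master is down and ignores timeouts and leader-election details.

```python
def majority(total_sentinels: int) -> int:
    # More than half of all configured sentinels.
    return total_sentinels // 2 + 1


def can_failover(total_sentinels: int, alive_sentinels: int, quorum: int) -> bool:
    """Simplified: assumes every surviving sentinel sees the master as down."""
    master_marked_down = alive_sentinels >= quorum
    failover_authorized = alive_sentinels >= majority(total_sentinels)
    return master_marked_down and failover_authorized


print(can_failover(total_sentinels=2, alive_sentinels=1, quorum=1))  # False
print(can_failover(total_sentinels=3, alive_sentinels=2, quorum=2))  # True
```

With two sentinels, losing one machine always leaves less than a majority; with three, one machine can fail and the remaining two still form a majority.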

III. Data loss in high availability and high concurrency setups

(1) data loss caused by asynchronous replication

Because replication from the master to the slave is asynchronous, the master may go down before some data has been replicated to the slave, and that data is lost.

(2) data loss caused by split brain

Split brain means that a master suddenly loses its normal network connection and cannot reach the other slaves, even though the master is actually still running.

At this point, the sentinels may conclude that the master is down, start an election, and promote one of the slaves to master.

The cluster now has two masters; this is the so-called split brain.

Although a slave has been promoted to master, clients may not have switched to the new master yet, so data they continue to write to the old master may be lost.

This is because when the old master recovers, it is attached to the new master as a slave, its own data is cleared, and it copies the data again from the new master.

Resolving data loss caused by asynchronous replication and split brain

min-slaves-to-write 1
min-slaves-max-lag 10

At least one slave is required, and its replication and synchronization lag must not exceed 10 seconds.

If the replication lag of every slave exceeds 10 seconds, the master stops accepting write requests.

These two settings reduce data loss caused by asynchronous replication and split brain.
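The effect of these two settings can be sketched as a check the master makes before accepting a write. This is a hypothetical model, not Redis code: it assumes the master knows how many seconds have passed since each slave's last ack and counts a slave as usable when that lag is within min-slaves-max-lag.

```python
from typing import Dict


def writes_allowed(slave_ack_lags: Dict[str, float],
                   min_slaves_to_write: int = 1,
                   min_slaves_max_lag: float = 10.0) -> bool:
    """slave_ack_lags maps a slave id to seconds since its last ack."""
    good_slaves = sum(1 for lag in slave_ack_lags.values() if lag <= min_slaves_max_lag)
    return good_slaves >= min_slaves_to_write


print(writes_allowed({"slave-1": 2.0}))                    # True
print(writes_allowed({"slave-1": 12.0, "slave-2": 15.0}))  # False: reject writes
```

An isolated old master in a split brain receives no acks, so after about 10 seconds it falls into the second case and refuses writes.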

(1) Reducing data loss from asynchronous replication

The min-slaves-max-lag setting ensures that if a slave takes too long to replicate data and return its ack, the master assumes it could lose too much data if it went down, and rejects write requests. This keeps the amount of data not yet synchronized to a slave when the master goes down within a controllable range.

(2) Reducing data loss from split brain

If a master is cut off by a split brain and loses its connection to the slaves, these two settings ensure that once it cannot send data to the required number of slaves, and no slave has sent it an ack for more than 10 seconds, client write requests are rejected outright.

In this way, the old master stops accepting new data from clients after a split brain, which limits data loss.

In short, if the master loses its connection to the slaves and no slave has sent an ack for 10 seconds, new write requests are rejected.

Thank you for reading. I hope this article on the high availability and high concurrency mechanisms in Redis has been helpful.
