You call this a sentry?

Updated 2025-07-03, SLTechnology News&Howtos

This article comes from the WeChat official account Low Concurrency Programming (ID: dibingfa). Author: flash.

I am a miserable ops engineer. One day the boss came to see me.

Boss: There are four Redis nodes in front of you, one master and three slaves. Your job is to keep an eye on them. If the master goes down, quickly promote one of the slaves to take its place. It's all yours!

Easy enough!

First, I connected to each of the four Redis nodes.

redis-cli -h 10.232.0.0 -p 6379
redis-cli -h 10.232.0.1 -p 6379
redis-cli -h 10.232.0.2 -p 6379
redis-cli -h 10.232.0.3 -p 6379

Then I sent the Redis PING command to each of them once a second.

I just kept sending PING commands day after day.

Finally, one day, the PING command sent to the master node received an invalid reply!
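As a rough illustration of this manual health check (not the real Sentinel code), the loop could be sketched in Python as follows. The function name `find_unresponsive` and the `ping` callable are made up for this example; `ping` stands in for whatever actually sends PING to a node, and treating only "PONG" as a valid reply is a simplifying assumption.

```python
from typing import Callable, Iterable, List, Optional


def find_unresponsive(nodes: Iterable[str],
                      ping: Callable[[str], Optional[str]]) -> List[str]:
    """Return the nodes whose PING did not come back as a valid reply.

    `ping` is a hypothetical callable that sends PING to one node and
    returns the reply text, or raises OSError if the node is unreachable.
    """
    unresponsive = []
    for node in nodes:
        try:
            reply = ping(node)
        except OSError:
            # Connection refused, timeouts and similar socket errors
            # are treated the same as getting no reply at all.
            reply = None
        if reply != "PONG":  # assumption: only PONG counts as valid
            unresponsive.append(node)
    return unresponsive
```

Calling this once a second over the four addresses would reproduce the routine described above; any node it returns is a candidate for the failover steps that follow.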

I immediately perked up and got ready to act.

I didn't panic, though, and quickly sorted out the three things I had to do.

Select a slave node and turn it into the master node

Which node should I choose? Never mind for now, just pick one. I'll take this one: 10.232.0.3.

I sent a command to this node.

10.232.0.3:6379> slaveof no one
OK

I think this node should have become the master node, but I'm not sure, so I sent another command to confirm.

10.232.0.3:6379> info
role:slave

Hmm, it hasn't become the master node yet, so I'll give it a little more time. A second later, I checked again.

10.232.0.3:6379> info
role:master

Good, this time it has successfully become the master node. On to the next step!

Point the other slave nodes at the new master

This one is easy: send a command to each of the other two slave nodes.

10.232.0.1:6379> slaveof 10.232.0.3 6379
OK
10.232.0.2:6379> slaveof 10.232.0.3 6379
OK

Turn the dead master node into a slave node

This step fully reflects my many years of ops experience; it is something many people don't think of.

I can't ignore the original master node. If it comes back to life, it has to become a slave of the new master node.

10.232.0.0:6379> slaveof 10.232.0.3 6379

But I can't send this command to it right away, because it is still down, so I save the command and send it as soon as the node comes back to life.

The whole three steps look like this.
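The three manual steps can also be written down as a small planning function. This is only a sketch of my hand-typed routine, not how Redis itself implements failover; the name `plan_failover` and the split into "send now" and "send later" lists are assumptions made for this example.

```python
from typing import List, Tuple

Command = Tuple[str, str]  # (target node IP, command text)


def plan_failover(old_master: str, new_master: str, slaves: List[str],
                  port: int = 6379) -> Tuple[List[Command], List[Command]]:
    """Return (commands to send now, commands to send once the old master is back)."""
    # Step 1: promote the chosen slave.
    send_now: List[Command] = [(new_master, "SLAVEOF NO ONE")]
    # Step 2: point every other slave at the new master.
    for slave in slaves:
        if slave != new_master:
            send_now.append((slave, f"SLAVEOF {new_master} {port}"))
    # Step 3: the old master is still down, so its command has to wait.
    send_later: List[Command] = [(old_master, f"SLAVEOF {new_master} {port}")]
    return send_now, send_later
```

With the addresses from this story, `plan_failover("10.232.0.0", "10.232.0.3", ["10.232.0.1", "10.232.0.2", "10.232.0.3"])` yields the same commands I typed by hand, with the one for 10.232.0.0 held back until it comes back to life.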

After doing this many times, I finally became familiar with the whole process.

To free up my hands, I turned this fixed procedure into a program.

This program monitors the status of these Redis nodes in real time, and automatically reports and handles emergencies. I named it the Sentinel program.

I deployed the Sentinel program on a separate server, which I call the Sentinel node.

The Sentinel connected to the four Redis nodes from the start and carried on doing what I had just been doing by hand.

Optimization

I also found a small optimization. I don't need to be told about all four nodes; I only need to know the master node.

Information about the slave nodes can be obtained by sending the info command to the master node, and it can be refreshed continually.

10.232.0.0:6379> info
...
role:master
...
slave0:ip=10.232.0.1,port=6379,state=online
...
slave1:ip=10.232.0.2,port=6379,state=online
...
slave2:ip=10.232.0.3,port=6379,state=online

This way, when I start the Sentinel I only need to know the master node. The slave information obtained this way is also more accurate and up to date, so I don't have to keep asking the boss.
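Pulling the slave list out of that info reply is mostly string handling. Here is a minimal Python sketch, assuming the reply is available as plain text and that slave lines have the `slaveN:ip=...,port=...,state=...` shape shown above; the function name `parse_slaves` is made up for this example.

```python
import re
from typing import Dict, List, Union

# Matches lines such as "slave0:ip=10.232.0.1,port=6379,state=online".
_SLAVE_LINE = re.compile(r"^slave\d+:(.*)$")


def parse_slaves(info_text: str) -> List[Dict[str, Union[str, int]]]:
    """Extract ip, port and state for every slaveN line in an INFO reply."""
    slaves = []
    for line in info_text.splitlines():
        match = _SLAVE_LINE.match(line.strip())
        if match is None:
            continue  # not a slaveN line (e.g. "role:master")
        fields = dict(
            pair.split("=", 1) for pair in match.group(1).split(",") if "=" in pair
        )
        if "ip" in fields and "port" in fields:
            slaves.append({
                "ip": fields["ip"],
                "port": int(fields["port"]),
                "state": fields.get("state", ""),
            })
    return slaves
```

Re-running this on every info reply keeps the Sentinel's view of the slaves current without anyone having to tell it about them.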

Even with my hands free, I was on a roll and didn't stop there.

Earlier, after the master node went down, I picked one of the three slave nodes at random to be the new master. I might as well make this choice smarter; picking at random feels too crude.

First, I list the key information for all the slave nodes (assuming more nodes here, to make the analysis easier).

| Node | Status | Time since last reply (s) | Replication offset | Uid |
|---|---|---|---|---|
| 1 | DISCONNECTED | 8 | 50 | 12345 |
| 2 | DOWN | 8 | 50 | 12346 |
| 3 | √ | 7 | 50 | 12347 |
| 4 | √ | 1 | 50 | 12348 |
| 5 | DOWN | 8 | 50 | 12349 |
| 6 | √ | 1 | 50 | 12350 |

Remove all disconnected or offline nodes first.

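| Node | Status | Time since last reply (s) | Replication offset | Uid | Result |
|---|---|---|---|---|---|
| 1 | DISCONNECTED | 8 | 50 | 12345 | removed |
| 2 | DOWN | 8 | 50 | 12346 | removed |
| 3 | √ | 7 | 50 | 12347 | kept |
| 4 | √ | 1 | 50 | 12348 | kept |
| 5 | DOWN | 8 | 50 | 12349 | removed |
| 6 | √ | 1 | 50 | 12350 | kept |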

Then remove the nodes that have gone more than 5 s without replying to a PING.

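| Node | Status | Time since last reply (s) | Replication offset | Uid | Result |
|---|---|---|---|---|---|
| 1 | DISCONNECTED | 8 | 50 | 12345 | removed |
| 2 | DOWN | 8 | 50 | 12346 | removed |
| 3 | √ | 7 | 50 | 12347 | removed |
| 4 | √ | 1 | 50 | 12348 | kept |
| 5 | DOWN | 8 | 50 | 12349 | removed |
| 6 | √ | 1 | 50 | 12350 | kept |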

The two that remain are at least healthy nodes, so the selection continues with them.

We compare their replication offsets. The offset represents how much data a node has successfully replicated from the master, so we choose the node with the largest replication offset, that is, the one most closely in sync with the master.

| Node | Status | Time since last reply (s) | Replication offset | Uid |
|---|---|---|---|---|
| 4 | √ | 1 | 50 | 12348 |
| 6 | √ | 1 | 50 | 12350 |

But we find that the offsets are the same.

So far the two nodes are identical in both health and synchronization, and there is no way to tell which is better. What should we do?

No problem, there is one last resort: the unique identifier, uid. The two uids are guaranteed to differ from the moment the nodes start, so we pick the smaller one.

| Node | Status | Time since last reply (s) | Replication offset | Uid |
|---|---|---|---|---|
| 4 | √ | 1 | 50 | 12348 |

OK, that finally pins down a single slave node. Promote it to master!

I wrote this whole procedure as a method, sentinelSelectSlave(), in the Sentinel program to select a slave node.
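The table walk-through above boils down to a filter followed by a two-key comparison. Below is a minimal Python sketch of just those rules as told in this story; it is not the Redis implementation. The `Slave` class, its field names, the status string "OK" (standing for √) and the function name `select_slave` are all made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    node: int
    status: str                 # "OK", "DOWN" or "DISCONNECTED" (hypothetical labels)
    seconds_since_reply: float  # time since the last valid PING reply
    repl_offset: int            # replication offset
    uid: str                    # unique identifier of the node


def select_slave(slaves: List[Slave], max_silence: float = 5.0) -> Optional[Slave]:
    """Pick the slave to promote, or None if no slave qualifies."""
    # Drop nodes that are down, disconnected, or silent for too long.
    candidates = [
        s for s in slaves
        if s.status == "OK" and s.seconds_since_reply <= max_silence
    ]
    if not candidates:
        return None
    # Largest replication offset first; ties broken by the smaller uid.
    return min(candidates, key=lambda s: (-s.repl_offset, s.uid))
```

Fed the six rows from the first table, this keeps nodes 4 and 6 after filtering, finds their offsets equal, and settles on node 4 because its uid is smaller.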

Well, now this program looks just about perfect!

Relieved, I started the Sentinel program. For a long time afterwards I relied on it to handle many emergencies automatically; once it even detected and fixed a problem shortly after two o'clock in the morning.

The boss kept praising me for sticking to my post and being so responsible even in the middle of the night. I was quickly promoted.

Until one day, while I was happily slacking off, the boss stormed in.

Boss: Redis has been down for an hour! Why haven't you dealt with it? Huh? What are you looking at? LeetCode? Are you getting ready to change jobs?!

Confused, I hurried to check my Sentinel process. Damn, the Sentinel server itself had gone down!

I was demoted, but I was still responsible for watching these Redis nodes, and this time I didn't dare slack off.

I kept using the Sentinel program to monitor whether the Redis nodes were alive, but now I had one more task: monitoring the Sentinel node itself. Overnight, I was back to square one.

How can I free up my hands again and have a program monitor and handle the health of the Sentinel node for me?

I had an idea: deploy multiple Sentinel nodes to form a Sentinel cluster! As long as a Sentinel is still alive, monitoring continues, and the probability of all of them dying at the same time is very small.

Of course, once there are three Sentinels, no single Sentinel can act on its own; each has to follow the arrangements of the group.

Subjective and objective

For example, when Sentinel 1 thinks the master node is dead, that alone does not mean the master node is really dead. This judgment is called subjectively down.

When Sentinel 1 subjectively considers the master node down, it needs to ask the other Sentinels whether they also consider the master node down.

Suppose Sentinel 2 replies that the master node is down, and Sentinel 3 replies that the master node is not down.

At this point, a total of two Sentinels in the cluster subjectively consider the master node down.

When the number of subjectively-down judgments reaches a certain value, for example >= 2, we can consider the master node objectively down.

Once the master node is objectively down, the earlier failover procedure can proceed: select a slave node and promote it to master.
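The counting rule above fits in a few lines. This is a sketch under the assumptions of this story: the asking Sentinel counts its own opinion plus the yes/no answers it collected, and the threshold of 2 is just the example value used here. The function name `is_objectively_down` is made up.

```python
from typing import Iterable


def is_objectively_down(own_opinion: bool, replies: Iterable[bool],
                        quorum: int = 2) -> bool:
    """True if enough Sentinels subjectively consider the master down.

    `own_opinion` is the asking Sentinel's own judgment; `replies` holds the
    answers of the other Sentinels (True means "I also think it is down").
    """
    votes = int(own_opinion) + sum(1 for reply in replies if reply)
    return votes >= quorum
```

In the example above, Sentinel 1 says down, Sentinel 2 agrees and Sentinel 3 disagrees, so `is_objectively_down(True, [True, False])` is true and the failover can start.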

Electing a leader

Next, a slave node has to be promoted to master. But which Sentinel should carry out the rest of the failover procedure?

They can't all do it at the same time.

So a leader has to be elected to do it.

How do we elect a leader? I can't bring in yet another Sentinel to do it, or it would be nesting dolls all the way down. The best way is to let the three of them decide among themselves.

This part is a little complicated and it isn't appropriate to expand on it here; it deserves a separate article. Interested readers can look into the Raft algorithm, which is what the Sentinel cluster uses to elect its leader.

OK, I have finally freed up my hands again!

I call this crummy thing the Sentinel system, or Sentinel Cluster!

I also gave the sentry an English name: Sentinel.

Postscript

The Redis source code referenced in this article is redis-3.0.0.

The Sentinel can be written from the "I" perspective precisely because everything the Sentinel program does can be done simply by repeatedly typing Redis commands, with no other protocol needed.

For example, ping to judge the health of a node, info to get node information, slaveof to set up master and slave nodes, and even sentinel is-master-down-by-addr, the command used to ask other Sentinel nodes whether they consider the master down, are all client commands supported by Redis, which makes them very user-friendly.

The Redis source code is also very clean and elegantly designed, so it is not hard for interested readers to dig into it.

For example, how to select one of a group of slave nodes as the master node.

If you search for this topic online you will find plenty of hazy explanations, but if you look at the source code you will find that the process is very clear.

sentinelRedisInstance *sentinelSelectSlave() {
    ...
    // remove some nodes
    while ((de = dictNext(di)) != NULL) {
        ...
        if (slave->flags & (DOWN | DISCONNECTED)) continue;
        if (mstime() - slave->last_avail_time > 5000) continue;
        if (slave->slave_priority == 0) continue;
        if (...) continue;
        ...
    }
    // sort the remaining nodes
    qsort(..., compareSlavesForPromotion);
    // take the first one
    return instance[0];
}

// how are they sorted?
int compareSlavesForPromotion(const void *a, const void *b) {
    // first, by priority
    if ((*sa)->slave_priority != (*sb)->slave_priority)
        return (*sa)->slave_priority - (*sb)->slave_priority;
    // same priority: by replication offset, larger first
    if ((*sa)->slave_repl_offset > (*sb)->slave_repl_offset) {
        return -1;
    } else if ((*sa)->slave_repl_offset < (*sb)->slave_repl_offset) {
        return 1;
    }
    ...
    // same offset: sort by the unique identifier
    return strcasecmp(sa_runid, sb_runid);
}

I believe that if you stop and read it carefully for a few seconds, even if you are not familiar with C, you will get a very intuitive picture, especially when you combine it with the descriptions of this part found online or in books.

For an in-depth study of the Redis source code, I suggest first reading Huang Jianhong's "Redis Design and Implementation". The book contains little code, but its explanations follow exactly the way the code is thought out; you will see once you have read it.

After reading this book, start reading the Redis source directly. You can pick redis-1.0.0, which has very little code, and focus on its overall network I/O and command-processing flow.

Then, starting from redis-3.0.0, study specific features such as master-slave replication, cluster, and Sentinel.

This way you will get a real grasp of Redis, and it will no longer feel hazy.
