2025-04-01 update. Source: SLTechnology News & Howtos, Database. Shulou (Shulou.com), 06/01 report.
Introduction:
Redis officially claims to handle around 110,000 concurrent read operations and 80,000 write operations per second. Given its excellent performance and ease of use, many projects rely on Redis. To prevent applications from depending too heavily on the Redis service, we use Redis only to improve application concurrency and reduce response time; even if Redis fails, the application should still be able to serve requests. Paipaixin recently ran a Redis Cluster downtime drill across all environments.
This article was written by Zhu Rongsong, head of architecture at Paipaixin, and Xu Bin, architecture development engineer at Paipaixin, and was published with their authorization by "Technical Lock Words".
I. Exercise Process
Redis Cluster environments:
1. Test environment: Redis Cluster with 3 masters and 3 slaves, 6 nodes in total.
2. Pre-release environment: Redis Cluster with 3 masters and 3 slaves, 6 nodes in total.
The following is the timeline of our operation:
First day
An arbitrary slave node was shut down while the applications were running; no exception appeared during a full day of testing.
The second day
An arbitrary slave node was shut down again while the applications were running; neither the applications nor a full day of testing showed any exception.
The third day
An application released a new version in the pre-release environment; it threw an exception and could not start.
……
II. Problem Description
Let's start with several premises:
1. In both the test and pre-release environments, an arbitrary Redis slave node had been shut down.
2. The shutdown was tested repeatedly in the test environment before nodes were shut down in the pre-release environment.
3. The exception disappeared once the closed Redis node in the pre-release environment was restarted.
4. The Redis client is Jedis, the most widely used client in the Java language.
So why did repeated testing reveal no problem in the test environment, while the pre-release environment failed?
III. Principle
Before analyzing the problem, let's briefly review how Redis Cluster works. In short, Redis Cluster has 16384 built-in hash slots. When a key is accessed in Redis Cluster, the Redis client first computes a CRC16 checksum of the key and then takes the result modulo 16384 (slot = crc16(key) mod 16384), so every key maps to a hash slot numbered 0 to 16383. It is worth noting that this slot calculation is performed by the Redis client. The most commonly used client in Java is Jedis, which is also the client recommended by Spring.
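The key-to-slot calculation described above can be sketched in plain Java. This is a minimal version under assumptions of our own: the class and method names are illustrative, and it omits Redis's hash-tag ({...}) handling.

```java
import java.nio.charset.StandardCharsets;

public class ClusterSlot {
    // CRC-16/XMODEM (polynomial 0x1021, initial value 0x0000), the
    // checksum variant Redis Cluster uses for key-to-slot mapping.
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF; // keep it a 16-bit value
            }
        }
        return crc;
    }

    // slot = crc16(key) mod 16384, computed on the client side.
    static int slot(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        System.out.println(slot("foo")); // a value in the range 0..16383
    }
}
```

The CRC-16/XMODEM check value for the string "123456789" is 0x31C3, which is a convenient way to verify the checksum implementation.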
Note: if you are wondering why Redis Cluster uses 16384 (2^14) slots, see the author's explanation on GitHub: https://github.com/antirez/redis/issues/2576
IV. Analysis
The first step was to examine the exception thrown at program startup. Figure 1 below shows the exception information.
Figure 1. The exception clearly shows a connection failure.
Looking at the Jedis source code, the exception occurs in the initializeSlotsCache() method, which initializes the slot information of the Redis Cluster. Figure 2 shows the implementation. The purpose of this code is to cache the cluster's slot information; because of the break in the loop, it only needs to connect to one Redis node successfully to fetch it. A closer look reveals a bug: the scope of the try block does not cover the creation of the Jedis connection. If the Jedis connection fails, the connection exception propagates directly and the loop exits immediately, instead of falling through to try the next node as the code clearly intends.
Figure 2
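The failure mode can be reproduced without Jedis itself. Below is a minimal, self-contained sketch of the corrected pattern: the connection attempt sits inside the try, so a refused connection on one node falls through to the next. All names here are illustrative, not Jedis' actual API.

```java
import java.util.List;

public class SlotCacheInit {
    // Simulated "connect": throws for nodes that are down.
    static String connect(String node, List<String> downNodes) {
        if (downNodes.contains(node)) {
            throw new RuntimeException("connection refused: " + node);
        }
        return "slots-from-" + node;
    }

    // Corrected pattern: the connection attempt is INSIDE the try, so a
    // failure on one node moves on to the next instead of aborting startup.
    static String initSlotsCache(List<String> nodes, List<String> downNodes) {
        for (String node : nodes) {
            try {
                String slots = connect(node, downNodes); // may throw
                return slots; // success: one node is enough (Jedis uses break)
            } catch (RuntimeException e) {
                // swallow and try the next node
            }
        }
        throw new RuntimeException("no reachable cluster node");
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("redis-06:6379", "redis-04:6379", "redis-01:6379");
        // The first node in iteration order is down; the buggy version
        // would have thrown here and the program would fail to start.
        System.out.println(initSlotsCache(nodes, List.of("redis-06:6379")));
    }
}
```

With the original (buggy) placement of the try, the first line of the loop body could throw before the try was entered, which is exactly the startup failure observed.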
This raises another question: did the node I shut down happen to be the first node in the loop? After shutting down a different slave node instead, the program started normally. So what determines the order in which Jedis loads the nodes? It looks as if Jedis reorders them. Reading the source code shows that Jedis overrides the hashCode method of its Redis node configuration class, and the nodes are kept in a hash-based collection.
Figure 3
Figure 4
Here is a simple test of the output order when the nodes are configured as: redis-01.test.com, redis-02.test.com, redis-03.test.com, redis-04.test.com, redis-05.test.com, redis-06.test.com.
Figure 5
Output result:
[redis-06.test.com:6379, redis-04.test.com:6379, redis-01.test.com:6379, redis-03.test.com:6379, redis-02.test.com:6379, redis-05.test.com:6379]
In other words, if you shut down the redis-06.test.com:6379 node, the program will fail to start.
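The hash-driven ordering can be reproduced with a small stand-alone sketch. The hashCode formula below is an assumption for illustration (not necessarily Jedis' exact implementation), and the concrete iteration order depends on the JDK's HashSet internals.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class NodeOrder {
    // Minimal stand-in for Jedis' HostAndPort node class.
    static final class HostAndPort {
        final String host;
        final int port;
        HostAndPort(String host, int port) { this.host = host; this.port = port; }
        @Override public boolean equals(Object o) {
            return o instanceof HostAndPort
                    && ((HostAndPort) o).port == port
                    && ((HostAndPort) o).host.equals(host);
        }
        @Override public int hashCode() { return 31 * Objects.hashCode(host) + port; }
        @Override public String toString() { return host + ":" + port; }
    }

    // Nodes are inserted in configuration order, but a HashSet iterates
    // them in hash-bucket order, so the "first" node tried is arbitrary.
    static List<String> iterationOrder() {
        Set<HostAndPort> nodes = new HashSet<>();
        for (int i = 1; i <= 6; i++) {
            nodes.add(new HostAndPort(String.format("redis-%02d.test.com", i), 6379));
        }
        List<String> order = new ArrayList<>();
        for (HostAndPort n : nodes) {
            order.add(n.toString());
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(iterationOrder()); // not configuration order
    }
}
```

The practical consequence: which node a connection-ordering bug bites depends on hash codes, not on your configuration file, which is why the failure seemed to appear at random.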
V. Solution
After locating the problem, the first step was to check GitHub to see whether it had already been reported. It turned out that a PR fixing the problem had been submitted in November of the previous year:
https://github.com/xetorthio/jedis/pull/1633
Jedis has released 2.10.0-m1 and 3.0.0-m1 containing the fix, but note that these are milestone builds rather than release versions. The fix is shown in Figure 6; compared with Figure 2, you can see that Figure 6 also wraps the instantiation of the Jedis connection in the try/catch.
Figure 6
VI. Reflections
Redis Cluster adopts a decentralized design (Figure 7 shows the state of the cluster), so the failure of some nodes can make part or all of the cluster abnormal.
Figure 7
The question, then, is how many node failures it takes before the program's read and write operations start to fail. We ran a simple test that counted program errors after shutting down Redis nodes; Table 1 below is for reference only.
All rows were measured while the program was running.

| Operation (multiple nodes shut down at the same time) | Total Redis writes | Total Redis reads | Errors | Total time (s) | Error rate |
| --- | --- | --- | --- | --- | --- |
| Shut down one master (any master) | 100,000 | 100,000 | 3,084 | 100 | 0.031 |
| Shut down one master (any master) | 100,000 | 100,000 | 1,482 | 102 | 0.015 |
| Shut down one master (any master) | 100,000 | 100,000 | 3,053 | 97.6 | 0.031 |
| Shut down one slave (any slave) | 100,000 | 100,000 | 0 | 109.2 | 0 |
| Shut down one slave (any slave) | 100,000 | 100,000 | 0 | 90.1 | 0 |
| Shut down one slave (any slave) | 100,000 | 100,000 | 0 | 88.9 | 0 |
| Shut down a master-slave pair (any pair) | 100,000 | 100,000 | 32,613 | 210.1 | 0.326 |
| Shut down a master-slave pair (any pair) | 100,000 | 100,000 | 29,148 | 169.8 | 0.291 |
| Shut down a master-slave pair (any pair) | 100,000 | 100,000 | 32,410 | 173.7 | 0.324 |
| Shut down all masters | 100,000 | 100,000 | 100,000 | 353.4 | 1 |
| Shut down all slaves | 100,000 | 100,000 | 0 | 87.7 | 0 |
| Keep only one master | 100,000 | 100,000 | 100,000 | 357.1 | 1 |

Table 1
The test results show that only Master nodes participate in the cluster's Master election.
1. If more than half of the masters are shut down, the entire cluster becomes unavailable.
2. Shutting down any master-slave pair causes partial failures (roughly 1/3 of the cluster).
3. Shutting down any master causes some write operations to fail, because slave nodes cannot serve writes and a small number of requests fail during the Slave-to-Master promotion.
4. Shutting down slave nodes alone has no effect on the cluster.
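The majority rule in point 1 can be sketched as a simple quorum check. This is a deliberate simplification of our own: real Redis Cluster failure handling also depends on slot coverage and on whether each failed master has a promotable replica.

```java
public class Quorum {
    // A cluster keeps serving only while a majority of its masters are
    // still reachable (simplified model of Redis Cluster failure detection).
    static boolean hasQuorum(int totalMasters, int aliveMasters) {
        return aliveMasters > totalMasters / 2;
    }

    public static void main(String[] args) {
        System.out.println(hasQuorum(3, 2)); // one of three masters down: still available
        System.out.println(hasQuorum(3, 1)); // two of three masters down: cluster fails
    }
}
```

This matches the test results above: with 3 masters, losing one is survivable, while losing two (more than half) takes the whole cluster down.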