
Redis's lesser-known data types and cluster knowledge


This article walks you through Redis's lesser-known data types and some cluster knowledge. The content is concise and easy to follow, and hopefully the detailed introduction will leave you with something useful.

Multiple data types

The string type is simple and convenient. It supports space pre-allocation: each time a string grows, a little more space than needed is allocated, so the next time the string gets longer there is no need to request memory again, as long as the spare space still suffices.
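As a small illustration (the key name greeting is made up), appending can reuse that pre-allocated space rather than triggering a fresh allocation:

SET greeting "hello" // create the string
APPEND greeting " world" // grow in place; pre-allocated SDS space may absorb the growth
STRLEN greeting // 11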

The List type can implement a simple message queue, but beware: messages may be lost, because it has no ACK mechanism.
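A minimal producer/consumer sketch (the key name queue is made up); note that once BRPOP returns, the message is gone, so a consumer that crashes right after popping loses it:

LPUSH queue job1 // producer pushes a job
BRPOP queue 5 // consumer blocks up to 5 seconds for a job; nothing is acknowledged afterwards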

The Hash type is a bit like a row in a relational database, but when a hash grows large, be careful with commands such as HGETALL: requesting a huge amount of data in one go can block Redis, and all the other clients have to wait.
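A hedged sketch of the safer pattern (the key user:1 is made up): iterate a big hash piecewise with HSCAN instead of fetching everything at once:

HSET user:1 name bob age 20 // store a few fields
HGETALL user:1 // fine for small hashes, risky for huge ones
HSCAN user:1 0 COUNT 100 // walk the hash cursor by cursor instead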

The Set type helps with statistics. For example, to count the active users on a given day, just throw the user IDs into a set. Sets also support some fancy operations: SDIFF computes the difference between sets, SUNION the union, and so on. The catch is that powerful features have a price: these operations eat CPU and IO resources and may cause blocking, so fancy operations between large sets should be used with caution.
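A small sketch (the key names are made up) of the set algebra between two days of active users:

SADD active:day1 1 2 3 // users 1, 2, 3 were active on day 1
SADD active:day2 2 3 4 // users 2, 3, 4 were active on day 2
SDIFF active:day1 active:day2 // {1}: active on day 1 but not day 2
SUNION active:day1 active:day2 // {1, 2, 3, 4}: active on either day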

Zset may be the brightest star of them all: it keeps members sorted, and because it can sort, it fits many scenarios, such as the top-N most-liked users, delay queues, and so on.
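A leaderboard sketch (the key and member names are made up):

ZADD likes 100 user:1 80 user:2 120 user:3 // store like counts as scores
ZREVRANGE likes 0 1 WITHSCORES // the top 2 most-liked users

For a delay queue the same idea applies: store each job's due timestamp as the score and poll with ZRANGEBYSCORE for everything due up to now.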

The advantage of bitmap is saving space, especially for statistics such as how many users checked in on a given day, or whether a particular user has checked in. Without bitmap, a set might come to mind:

SADD day 1234 // user 1234 checks in: add it to the set
SISMEMBER day 1234 // check whether user 1234 has checked in
SCARD day // count how many users have checked in

Functionally a set is enough, but compared with bitmap it burns more storage. Under the hood a set is backed either by an intset (integer set) or by a hashtable. The intset is used only when the data volume is very small, generally fewer than 512 elements, and every element must be an integer; its data is compact and contiguous in memory, and lookups can only use binary search, with time complexity O(log N). The hashtable is different: it is the same hashtable behind the hash among Redis's five classic data types, except that there is no value (the value points to null) and, being a set, no conflicts to resolve, though rehash-related issues still have to be considered. OK, that was a bit of a detour; back to the user check-in problem. With many users the set will certainly use the hashtable encoding, and in a hashtable every element is a dictEntry structure:

typedef struct dictEntry {
    void *key;                  // the key
    union {                     // the value
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    struct dictEntry *next;     // points to the next hash node, forming a chain
} dictEntry;

What does this structure tell us? Even though the value union (no value) and next (no conflicts) stay empty, the structure itself still takes space, and a real key has to be stored on top of that, which costs real memory. With bitmap, by contrast, a single bit represents one number, which saves a great deal of space. Let's see how bitmap handles setting and counting:

SETBIT day 1234 1 // user 1234 checks in: set bit 1234 to 1
GETBIT day 1234 // check whether user 1234 has checked in
BITCOUNT day // count how many users have checked in

bf is the Bloom filter that Redis has supported since 4.0 through RedisBloom, a module that must be loaded separately. We could of course build our own Bloom filter on top of the bitmap above, but since Redis already supports one, RedisBloom saves development time. What a Bloom filter is for will not be belabored here; let's just look at how RedisBloom is used.

# pull the image with docker to try it out quickly
docker run -p 6379:6379 --name redis-redisbloom redislabs/rebloom:latest
docker exec -it redis-redisbloom bash
redis-cli
# related operations
bf.reserve sign 0.001 10000 // create the filter: error rate 0.001, capacity 10000
bf.add sign 99 // user 99 joins
bf.exists sign 99 // check whether 99 exists

Because a Bloom filter can misjudge (false positives), bf supports a custom error rate: 0.001 above is the error rate, and 10000 is the number of elements the filter is sized for. Once the number of elements actually stored exceeds that value, the error rate climbs.

HyperLogLog is also for statistics, and its advantage is a tiny storage footprint: a mere 12KB of memory suffices to count 2^64 elements. What does it count? Mainly cardinality, such as UV. Functionally, UV could be kept in a set or a hash, but those consume storage and easily turn into a big key. If saving space is the goal, bitmap also works, but a 12KB bitmap (12 * 1024 * 8 = 98,304 bits) can only count 98,304 elements, while HyperLogLog counts up to 2^64. Such powerful technology does come with error, though: HyperLogLog is probability-based, with a standard error rate of 0.81%. For counting massive data where precision is not critical, HyperLogLog saves space brilliantly.

PFADD uv 1 2 3 // users 1, 2, 3 are active
PFCOUNT uv // approximate count of distinct users

GEO suits geographic business, such as WeChat's "people nearby" or nearby vehicles. First, without the GEO structure, how would you find the people near you? Everyone reports their position, say longitude 116.397128 and latitude 39.916527, which could be stored in a string or a hash; but to find nearby people, string and hash are not up to the job, since you cannot walk through all the data on every query, which costs far too much time. Nor can you simply use a zset with raw longitude and latitude as the score. But if longitude and latitude could somehow be converted into a single number, that number would work as the score, and then ZRANGEBYSCORE key v1 v2 would find the people nearby. Do we really have to go to all that trouble ourselves? No, GEO exists for this, and GEO's way of turning longitude and latitude into a number is "bisect the interval, encode each half". What does that mean? Take longitude as an example: its range is [-180, 180]. For a 3-bit code, bisect three times; whenever the value falls in the left half write down a 0, and in the right half write down a 1. For longitude 121.48941: the first bisection puts it in [0, 180], so write 1; the second puts it in [90, 180], so write 1; the third puts it in [90, 135], so write 0, giving 110. Latitude follows the same logic; suppose its code works out to 010. Finally the longitude and latitude codes are merged, and note how: each longitude bit sits at an even position and each latitude bit at an odd position.

longitude: 1 1 0
latitude: 0 1 0
interleaved: 101100 // longitude bits on even positions, latitude bits on odd positions

That is the principle. Let's look at how Redis actually uses GEO:

GEOADD location 112.123456 41.112345 99 // report user 99's geographic position
GEORADIUS location 112.123456 41.112345 1 km ASC COUNT 10 // the 10 nearest people within 1 km

Understanding the cluster

A single-instance Redis should rarely appear in a production environment. The risks of a single instance are:

A single point of failure: when the service fails, there is no backup.

A single instance serves both reads and writes, so it bears a lot of pressure.

So the first thing that comes to mind is the classic master-slave mode, usually one master with several slaves, because most applications read far more than they write. The master handles updates and the slaves serve reads; even if the master goes down, we can promote a slave to take over as master, so the application as a whole keeps serving.
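A minimal sketch of wiring up a replica (the address is made up); on versions before Redis 5 the command is SLAVEOF instead of REPLICAOF:

REPLICAOF 192.168.1.10 6379 // make this instance a replica of the master at 192.168.1.10:6379
REPLICAOF NO ONE // later, promote this replica back to a standalone master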

Details of the replication process

When a Redis instance becomes someone's slave for the first time, the master has to send it the data, i.e. an rdb file. For this the master forks a child process, which executes bgsave to dump the current data and prepares to send it to the new slave. The essence of bgsave is reading the data currently in memory and saving it into an rdb file, a process that involves a lot of IO; handled directly in the main process, it would very likely block normal requests, so a child process is the wise choice.

What if new write requests arrive during bgsave, after the child process has been forked?

Strictly speaking, from the moment the child process comes into existence, the data to be saved should be a snapshot of exactly that point in time. So is the memory simply copied outright at that moment? And if nothing is copied, what about changes made in the meantime? This is where the copy-on-write (COW) mechanism comes in. From the outside, memory looks like one whole block of space, but that would be hard to manage, so the operating system divides it into small pieces, i.e. paged memory management; a page is typically 4K, 8K, or 16K. Redis data is spread across these pages. For efficiency, the forked child process shares the same memory with the main process instead of copying it. If the main process changes some data during this period, then, to keep the two views apart, the quickest way is to copy just the affected data page and apply the change on the new page rather than the original one, which guarantees that the child process still works against the snapshot as it stood at fork time.

The changes above were viewed from the snapshot's perspective. From the angle of data consistency: after the slave has applied the snapshot rdb, how do the changes from that period get synchronized to it? The answer is a buffer, called the replication buffer. After receiving the request to synchronize, the master saves all changes made during this period into that buffer; after the rdb has been sent to the slave, it then sends the replication buffer's contents, and in the end master and slave converge.

Replication buffer is not a panacea.

Let's look at how long writes keep flowing into the replication buffer.

We know that during master-slave sync the master forks a child process to do the work, so from the moment bgsave starts until the child process finishes, changes are written into the replication buffer.

After the rdb is generated it has to be sent to the slave. This network transfer also takes time, and during it writes continue to land in the replication buffer.

After receiving the rdb, the slave has to load it into memory. During this period the slave is blocked and cannot serve requests, so writes during this period also go into the replication buffer.

Since the replication buffer is a buffer, its size is limited. If any one of the three steps above drags on, the replication buffer grows rapidly (assuming normal writes continue). Once it exceeds its limit, the connection between master and slave is cut; after the disconnect, when the slave reconnects, replication starts all over again, repeating the same lengthy steps. So the size of the replication buffer really matters, and it generally has to be judged from factors such as write speed, write volume per second, and network transfer speed.
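On the master, this limit is the client output buffer setting for replicas; a hedged redis.conf sketch with illustrative numbers (the Redis default is 256mb 64mb 60):

client-output-buffer-limit replica 512mb 128mb 60 // drop a replica whose buffer tops 512mb, or stays above 128mb for 60 seconds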

What if the slave's network is poor and its connection to the master drops?

Normally, once the master-slave connection is established, subsequent changes on the master can be sent to the slave directly for replay; but we cannot guarantee the network is 100% clear, so a disconnect between slave and master also has to be considered.

Before roughly redis 2.8, whenever the slave's connection dropped, even briefly, the master would blindly run a full resync when the slave reconnected. From version 2.8 on, incremental replication is supported. The principle of incremental replication is that there must be a buffer holding the record of changes; here it is called repl_backlog_buffer. Logically it is a ring buffer: once full, it wraps around and overwrites from the start, so it has a size limit. When the slave reconnects, it tells the master, "I have replicated up to position xx." On hearing this, the master checks whether the data at position xx is still in repl_backlog_buffer. If it is, it simply sends everything after xx to the slave; if not, there is nothing for it but another full resync.
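The backlog size is configurable in redis.conf; the value below is illustrative (the default, 1mb, is often too small for a busy master):

repl-backlog-size 64mb // keep about 64mb of recent changes around for reconnecting slaves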

A manager is needed

In master-slave mode, if the master goes down, we can promote a slave to master, but that process is manual, and manual handling cannot keep the damage to a minimum. What is needed is an automatic management and election mechanism, and that is Sentinel. Sentinel is itself a service, but it does not handle data reads and writes; it is only responsible for managing all the Redis instances. At regular intervals each sentinel communicates with every Redis instance (a ping), and as long as an instance replies in time within the specified window, it is considered alive and well. Of course, a sentinel itself can go down or lose its network, so sentinels are usually deployed as a cluster too, preferably with an odd number of members such as 3 or 5; the odd count mainly serves elections (the minority yields to the majority).
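A hedged sentinel.conf sketch (the master name mymaster and its address are made up):

sentinel monitor mymaster 192.168.1.10 6379 2 // watch this master; 2 sentinels must agree before it counts as down
sentinel down-after-milliseconds mymaster 5000 // mark it subjectively down after 5 seconds without a valid reply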

When a sentinel's ping receives no timely pong, it marks that Redis instance as offline. At that point, whether the instance is really down or not, the other sentinels also judge for themselves whether it is truly offline. When most sentinels agree that the instance is offline, it is kicked out of the cluster. If the evicted node was a slave, kicking it out is the end of the matter; if it was the master, an election is triggered. The election is not blind: it must pick the slave most suitable to serve as the new master, generally decided by the following priorities:

Weight: each slave can be assigned a weight, and the slave with the higher weight is preferred.

Replication progress: each slave's replication progress can differ, and the one with the smallest data gap from the master is preferred.

Service ID: every Redis instance has an ID; if all of the above are equal, the slave with the smallest ID is chosen as the new master.

Greater horizontal scalability

Master-slave mode solves the single point of failure, and read-write separation lets the application carry much more load; sentinel mode supervises the cluster automatically, electing a new master and removing faulty nodes without human intervention.

Normally, as read pressure keeps growing, we can add another slave to relieve it. But what if the master itself is under heavy pressure? That brings us to the next topic, sharding: we just cut the master's data into several pieces and deploy them on different machines. Sharding in Redis is built on the concept of slots. By default Redis divides the key space into 16384 slots and spreads those slots evenly across the shard nodes to balance the load. Which slot a given key falls into is computed by first taking CRC16 of the key, yielding a 16-bit number, and then taking it modulo 16384:

slot = CRC16(key) % 16384
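You can also ask a cluster node directly which slot a key maps to (the key name user:1 is made up):

CLUSTER KEYSLOT user:1 // returns an integer in [0, 16383]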

The client then caches the slot mapping, so whenever a key arrives, a quick local computation tells it which instance should handle the request. But the slot information cached on the client is not immutable: for example, adding an instance triggers resharding, and the client's cached mapping goes stale. Two common "errors" appear at that point, which strictly speaking are not errors at all but more like notifications: one is MOVED and the other is ASK. MOVED means the data instance A used to own has finished migrating to instance B. ASK means the migration is still in progress: part of the data instance A owned has moved to instance B, and the rest is still waiting to move. Once the migration completes, ASK turns into MOVED. On receiving MOVED, the client updates its local cache, so that neither error occurs for that slot again.
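For illustration, this is roughly what the two redirections look like in redis-cli (the slot number and address are made up):

GET user:1
(error) MOVED 5474 192.168.1.11:6379 // the slot has moved for good: retry there and refresh the local slot cache
(error) ASK 5474 192.168.1.11:6379 // migration in flight: send ASKING and then the command to that node, without touching the cache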

That covers Redis's lesser-known data types and cluster knowledge. Hopefully you have learned something, or at least topped up your knowledge reserve, along the way.
