
How to analyze Redis knowledge points

2025-07-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Many newcomers are not clear about how the key concepts of Redis fit together. To help with that, this article explains them in detail; I hope you gain something from it.

Data structures rather than data types

Many articles say that Redis supports five commonly used data types, which is actually misleading. What Redis stores is binary data, that is, byte arrays (byte[]). These bytes have no data type of their own; only after being decoded according to an agreed format do they become a string, an integer, or an object.

Keep this firmly in mind. Anything that can be converted into a byte array (byte[]) can be stored in Redis: a string, a number, an object, a picture, a sound, a video, or a file. Just convert it to a byte array.

So String in Redis does not really mean a string. It represents the simplest data structure, in which one key corresponds to exactly one value. Both the key and the value are byte arrays; the key is usually a byte array converted from a string, while the value depends on actual needs.

In certain cases there are requirements on the value. For increment or decrement operations, for example, the byte array of the value must decode to a number; otherwise an error is reported.
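The point about bytes and increment can be illustrated with a small in-memory model. This is only a sketch in Python: the class ToyStringStore and its methods are hypothetical names invented for illustration, not part of Redis or of any client library, and Python's int() is somewhat more lenient than Redis about what counts as a number.

```python
class ToyStringStore:
    """In-memory stand-in for the String structure: one key, one value, both bytes."""

    def __init__(self):
        self._data = {}

    def set(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes):
        return self._data.get(key)

    def incr(self, key: bytes) -> int:
        # A missing key counts as 0; an existing value must decode to an integer.
        raw = self._data.get(key, b"0")
        try:
            number = int(raw.decode("ascii"))
        except (UnicodeDecodeError, ValueError):
            raise ValueError("value is not an integer") from None
        number += 1
        self._data[key] = str(number).encode("ascii")
        return number


store = ToyStringStore()
store.set(b"counter", b"41")
print(store.incr(b"counter"))            # 42, stored back as the bytes b"42"
store.set(b"photo", bytes([0xFF, 0xD8]))  # arbitrary binary data is a valid value
```

Calling store.incr(b"photo") raises ValueError, because those bytes do not decode to a number.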

The List structure means that one key can correspond to multiple values, the values are ordered, and values may repeat.

The Set structure means that one key can correspond to multiple values, the values are unordered, and values cannot repeat.

The Hash structure means that one key can correspond to multiple key-value pairs. The order of these pairs generally does not matter; it is a structure accessed by name rather than by position.

The Sorted Set structure means that one key can correspond to multiple values, the values are kept sorted, and values cannot repeat. Each value is associated with a floating-point number called its score. The sorting rule is: sort by score first, and then by value.
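As a rough analogy only, the five structures can be pictured with ordinary Python containers. The key names and the helper sorted_members are made up for illustration, and Redis's real internal encodings are different.

```python
# One dictionary plays the role of the keyspace; every key and value is bytes.
store = {
    b"page:title": b"hello",                                # String: one key, one value
    b"queue": [b"a", b"b", b"a"],                          # List: ordered, repeats allowed
    b"tags": {b"x", b"y"},                                 # Set: unordered, no repeats
    b"user:1": {b"name": b"ann", b"age": b"30"},           # Hash: accessed by field name
    b"board": {b"bob": 2.0, b"amy": 2.0, b"cat": 1.0},     # Sorted Set: value -> score
}


def sorted_members(zset: dict) -> list:
    """Order the values of a toy sorted set by score, then by value."""
    return [member for member, _ in sorted(zset.items(), key=lambda kv: (kv[1], kv[0]))]


print(sorted_members(store[b"board"]))  # [b'cat', b'amy', b'bob']
```

Here cat comes first because it has the lowest score, while amy and bob share a score and are ordered by value.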

I believe you now have a clearer understanding of these five data structures, so their corresponding commands should be a piece of cake for you.

Problems brought by clusters and their Solutions

The benefits of clustering are obvious, such as increased capacity and enhanced processing capacity, as well as dynamic capacity expansion and reduction as needed. But at the same time, some new problems will be introduced, at least the following two.

The first is data allocation: which node data should be placed on when storing it, and which node to go to when reading it. The second is data movement: when the cluster is expanded and a new node is added, where does that node's data come from; when the cluster is shrunk and a node is removed, where does that node's data go.

What the above two problems have in common is how to describe and store the mapping relationship between data and nodes. And because the location of the data is determined by the key, the problem evolves into how to establish the relationship between each key and all nodes in the cluster.

The nodes of a cluster are relatively fixed and few, even though nodes are occasionally added and removed. The keys stored in the cluster, however, are completely random, irregular, unpredictable, numerous, and trivial.

This is like the relationship between a university and all its students. If the university is directly linked to the students, it will be quite chaotic. The reality is that several layers have been added between them, first, there are departments, then there are majors, and then there are grades, and classes. After these four layers of mapping, the relationship is much cleaner.

This points to a very important principle: there is no problem that cannot be solved by adding a layer. If there is, add another layer. It is the same with computers.

Redis adds a layer between the data and the nodes, called a slot. Because slots are closely tied to hashing, they are also called hash slots.

In the end, nodes hold slots and slots hold data. Slots solve the granularity problem: they make the unit of movement coarser, so data can be moved easily. Hashing solves the mapping problem: the hash value of a key determines its slot, which makes data allocation straightforward.

It can be understood this way, your study desk is full of books, it is very messy, it is very difficult to find a book. So you buy several large storage boxes, put the books in different boxes according to the length of the title, and then put these boxes on the table.

Now there are storage boxes on the table and books in the storage boxes. Moving books is very convenient: just pick up a box and go. Finding a book is also very convenient: count the length of its title and look in the corresponding box.

In fact, we did not do much. We just bought a few boxes and packed the books into them according to a rule. Such a simple move completely changed the original disorder. Isn't that a little magical?

A cluster has exactly 16384 slots, numbered 0 to 16383. These slots are distributed among all the primary nodes in the cluster, and no particular allocation policy is imposed: you can specify which slot numbers are assigned to which primary node. The cluster records the correspondence between nodes and slots.

Next, hash the key and take the remainder modulo 16384; the remainder is the number of the slot the key falls into: slot = CRC16(key) % 16384.
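The slot formula above can be sketched in a few lines of Python. The CRC16 variant assumed here is CRC-16/XMODEM (polynomial 0x1021, initial value 0), which is the one the Redis Cluster specification describes; the function names are chosen for illustration, and hash tags are ignored in this sketch.

```python
SLOT_COUNT = 16384


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC-16/XMODEM: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """slot = CRC16(key) % 16384, applied to the whole key."""
    return crc16_xmodem(key) % SLOT_COUNT


print(hex(crc16_xmodem(b"123456789")))  # 0x31c3, the standard check value
print(key_slot(b"foo"))                 # 12182
```

A table-driven CRC would be faster; the bitwise loop is used here only because it is easy to read.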

Data is moved in units of slots. Because the number of slots is fixed, this is easy to manage, so the data movement problem is solved.

Use the hash function to calculate the hash value of the key and from it the corresponding slot; then use the slot-to-node mapping stored in the cluster to look up the node that holds the slot. Data and nodes are now mapped to each other, so the data allocation problem is solved.
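A minimal sketch of the two lookups described above, and of moving a slot, might look like this. The three node names and the slot ranges are a hypothetical layout chosen for illustration; binascii.crc_hqx with an initial value of 0 is used as the CRC16 function.

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key: bytes) -> int:
    # binascii.crc_hqx(data, 0) computes the CRC-16/XMODEM checksum.
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


def build_slot_table(ranges):
    """Expand (first_slot, last_slot, node) ranges into a slot -> node table."""
    table = [None] * SLOT_COUNT
    for first, last, node in ranges:
        for slot in range(first, last + 1):
            table[slot] = node
    return table


# Hypothetical layout: three primary nodes sharing the 16384 slots.
slot_table = build_slot_table([
    (0, 5460, "node-a"),
    (5461, 10922, "node-b"),
    (10923, 16383, "node-c"),
])


def node_for_key(key: bytes) -> str:
    """Data allocation: key -> slot -> node."""
    return slot_table[key_slot(key)]


def move_slot(slot: int, target_node: str) -> None:
    """Data movement: reassign one whole slot to another node."""
    slot_table[slot] = target_node


print(node_for_key(b"foo"))  # node-c in this layout
```

Because only the slot table changes when a slot moves, the keys themselves never need to be rehashed.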

What I want to say is that the average person will only learn all kinds of technology, and the master is more concerned about how to jump out of the technology and find a solution or train of thought. If you go along this direction, you will almost be able to find the answer you want.

How the cluster chooses to handle commands

As soon as the client establishes a connection with any one node in the cluster, it can obtain information about all the nodes in the cluster. It also obtains the mapping between all hash slots and nodes, and because this information is very useful it is cached on the client side.

The client can send a request to any node, so which node should it send a request to once it has a key? The answer is simply to carry the cluster's key-to-node mapping logic over to the client.

Therefore, the client needs to implement the same hash function as the cluster: first calculate the hash value of the key, then take the remainder modulo 16384 to find the key's hash slot, and finally use the client's cached slot-to-node mapping to find the node for that key.

Then send the request. You can also cache the mapping relationship between the key and the node, and the next time you request the key, you will directly get its corresponding node without having to calculate it again.

There is always a gap between theory and reality. When the cluster has changed but the client cache has not yet been updated, the client will inevitably send a request for some key to the node it believes owns it, while the key is in fact no longer on that node. What should that node do?

The node could fetch the data from the node where the key actually resides and return it to the client, or it could simply tell the client that the key is no longer here, attaching the address of the node that now holds the key so that the client can send the request again, similar to an HTTP 302 redirect.

This is a question of choice, and also a philosophical one. Redis Cluster chooses the latter. A node only processes the keys it owns; for a key it does not own it returns a redirect error of the form -MOVED <slot> <ip>:<port>, for example -MOVED 3999 127.0.0.1:6381, and the client sends the request again to the new node.
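On the client side, handling such a redirect comes down to parsing the error line and updating the cached slot-to-node mapping. The following is a sketch only: Moved, parse_moved and apply_moved are hypothetical names, and a real client would also reconnect and resend the command.

```python
from typing import NamedTuple, Optional


class Moved(NamedTuple):
    slot: int
    host: str
    port: int


def parse_moved(error_line: str) -> Optional[Moved]:
    """Return the redirect target for a MOVED error line, or None for anything else."""
    parts = error_line.strip().lstrip("-").split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    host, _, port = parts[2].rpartition(":")
    return Moved(int(parts[1]), host, int(port))


def apply_moved(slot_cache: dict, moved: Moved) -> None:
    """Refresh the client's cached owner for the redirected slot."""
    slot_cache[moved.slot] = (moved.host, moved.port)


slot_cache = {3999: ("127.0.0.1", 6380)}          # stale entry (assumed for the demo)
moved = parse_moved("-MOVED 3999 127.0.0.1:6381\r\n")
if moved is not None:
    apply_moved(slot_cache, moved)
print(slot_cache[3999])  # ('127.0.0.1', 6381)
```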

So choice is not only a kind of philosophy, but also a wisdom. We will talk about this later. Let's take a look at another situation, which has something in common with this problem.

Redis has commands that can operate on more than one key at a time, such as MGET; I call these multi-key commands. A multi-key command is sent to a single node, which raises a potential problem: have you considered whether all the keys in the command must be on the same node?

There are two cases. If the keys are not on the same node, the node could only return a redirect error, but the keys may be spread over several different nodes, so the redirect errors would be very messy. Redis Cluster therefore chooses not to support this case.

If the keys are all on the same node, there is no problem in theory. In practice Redis Cluster applies a stricter rule: a multi-key command is accepted only when all of its keys hash to the same slot. Test the behavior on the version you use.

From this we discover something useful: it pays to map a group of related keys to the same node, which improves efficiency and lets us fetch several values at once with multi-key commands.

So the question is how to name these keys so that they land on the same node. Do we really have to compute a hash and take the remainder by hand every time? That would be far too much trouble. Of course not; Redis has already worked this out for us.

By simple reasoning, for two keys to be guaranteed to land on the same node, their hash values must be the same. For the hash values to be the same, the strings passed to the hash function must be the same. But if we pass in two identical strings, they are the same key, and the later write overwrites the earlier data.

The problem here is that we all use the entire key to calculate the hash value, which causes key to be coupled with the string involved in calculating the hash value, and we need to decouple them, that is, key is related to but different from the string involved in calculating the hash value.

Redis provides a solution based on this principle, called key hash tags. Look at the example {user1000}.following and {user1000}.followers. As you have probably guessed, only the string between { and } in the key takes part in calculating the hash value.

This ensures that the hash values are the same, so the keys fall on the same node, yet the keys themselves are different and do not overwrite each other. Hash tags associate a group of related keys, and the problem is solved neatly.
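The hash tag rule can be sketched as follows. The details assumed here follow the Redis Cluster specification: only the first { and the first } after it count, and an empty {} means the whole key is hashed. The function names are illustrative.

```python
import binascii


def hashed_part(key: bytes) -> bytes:
    """Return the part of the key that takes part in the slot calculation."""
    start = key.find(b"{")
    if start == -1:
        return key                    # no tag: hash the whole key
    end = key.find(b"}", start + 1)
    if end == -1 or end == start + 1:
        return key                    # no closing brace, or empty {}: whole key
    return key[start + 1:end]


def key_slot(key: bytes) -> int:
    # binascii.crc_hqx(data, 0) computes the CRC-16/XMODEM checksum.
    return binascii.crc_hqx(hashed_part(key), 0) % 16384


print(hashed_part(b"{user1000}.following"))  # b'user1000'
print(key_slot(b"{user1000}.following") == key_slot(b"{user1000}.followers"))  # True
```

Both keys hash only user1000, so they share a slot while remaining distinct keys.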

I believe you have noticed that the solution depends on an ingenious idea rather than on a powerful algorithm. Small, but mighty.

Finally, let's talk about the philosophy of choice. The core of Redis is key/value access to common data structures, and the operations around those data structures, at the fastest possible speed. Anything that is unrelated to the core, or that would drag the core down, Redis chooses to weaken or not handle at all, in order to keep the core simple, fast, and stable.

In other words, between breadth and depth, Redis chose depth. So a node does not handle keys it does not own, and the cluster does not support multi-key commands that span nodes. On the one hand this lets it respond to the client quickly; on the other it avoids large amounts of data transfer and merging inside the cluster.

Single-threaded model

In each node of a Redis cluster there is only one thread responsible for receiving and executing all the requests sent by clients. Technically, I/O multiplexing, using the epoll facility on Linux, allows a single thread to manage many socket connections.

In addition, there are the following reasons for choosing single threading:

1. Redis operates on memory, which is extremely fast (100,000+ QPS).

2. The overall time is mainly spent on network transmission.

3. If multithreading were used, thread synchronization would be required, so the implementation would become more complex.

4. The time spent acquiring a lock can be even longer than the memory operation itself.

5. Frequent context switching between threads consumes more CPU time.

6. In addition, a single thread naturally supports atomic operations, and single-threaded code is easier to write.
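To make the idea of one thread serving many connections concrete, here is a toy event loop built on Python's selectors module (which uses epoll on Linux). It is a sketch under stated assumptions: the text commands are a simplified stand-in rather than the real Redis protocol, and socket pairs simulate two clients inside one process.

```python
import selectors
import socket


def handle(conn: socket.socket, store: dict) -> None:
    """Read one small text command from a ready socket and reply to it."""
    parts = conn.recv(1024).decode().split()
    if len(parts) == 3 and parts[0] == "SET":
        store[parts[1]] = parts[2]
        conn.sendall(b"+OK\r\n")
    elif len(parts) == 2 and parts[0] == "GET":
        conn.sendall(("+" + store.get(parts[1], "") + "\r\n").encode())
    else:
        conn.sendall(b"-ERR unknown command\r\n")


def run_once(selector: selectors.BaseSelector, store: dict) -> int:
    """One pass of the loop: serve, one after another, every socket with data."""
    ready = selector.select(timeout=1)
    for key, _events in ready:
        handle(key.fileobj, store)
    return len(ready)


selector = selectors.DefaultSelector()
store = {}
client_a, server_a = socket.socketpair()
client_b, server_b = socket.socketpair()
for server_side in (server_a, server_b):
    selector.register(server_side, selectors.EVENT_READ)

client_a.sendall(b"SET name redis")
run_once(selector, store)
reply_a = client_a.recv(1024)

client_b.sendall(b"GET name")
run_once(selector, store)
reply_b = client_b.recv(1024)

print(reply_a, reply_b)

selector.close()
for sock in (client_a, server_a, client_b, server_b):
    sock.close()
```

Since commands run one at a time in a single thread, no lock is needed around store, which is the atomicity benefit listed above.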

Transactions

A transaction, as we know, bundles multiple operations together so that either all of them are performed (success) or none of them are (rollback). Redis also supports transactions, but they may not be what you expect. Let's take a look.

A Redis transaction has two steps: defining the transaction and executing it. Start a transaction with the MULTI command, then list in order all the commands to be executed; this defines the transaction. Then use the EXEC command to execute the transaction, or the DISCARD command to abandon it.

You may want the keys you care about not to be modified before your transaction executes. Use the WATCH command to monitor those keys: if any of them is modified by another command before execution begins, the transaction is cancelled. You can use the UNWATCH command to stop monitoring them.

Redis transactions have the following characteristics:

1. If an error occurs before execution starts (for example while the commands are being queued), none of the commands will be executed

2. Once started, ensure that all commands are executed sequentially at once without interruption

3. If errors are encountered in the execution process, the execution will continue and will not stop.

4. Errors encountered during execution will not be rolled back
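These four characteristics can be mimicked with a toy model. The class below is purely illustrative: ToyTransaction, its method names, and its reply strings are invented for this sketch and only imitate the behavior described above; it does not talk to a Redis server and it leaves out WATCH.

```python
class ToyTransaction:
    """Queue commands first, then run them all; no rollback on runtime errors."""

    KNOWN_COMMANDS = {"SET", "INCR"}

    def __init__(self, store: dict):
        self.store = store
        self.queue = []
        self.queue_error = False

    def queue_command(self, name: str, *args: str) -> str:
        if name not in self.KNOWN_COMMANDS:
            self.queue_error = True          # error before execution starts
            return "-ERR unknown command"
        self.queue.append((name, args))
        return "+QUEUED"

    def exec(self):
        if self.queue_error:
            return "-EXECABORT"              # characteristic 1: nothing runs
        results = []
        for name, args in self.queue:        # characteristic 2: run in order
            try:
                results.append(self._run(name, args))
            except ValueError:
                # characteristics 3 and 4: record the error, keep going, no rollback
                results.append("-ERR value is not an integer")
        return results

    def _run(self, name: str, args: tuple):
        if name == "SET":
            key, value = args
            self.store[key] = value
            return "+OK"
        (key,) = args                        # INCR
        value = int(self.store.get(key, "0")) + 1
        self.store[key] = str(value)
        return value


store = {}
tx = ToyTransaction(store)
tx.queue_command("SET", "a", "1")
tx.queue_command("SET", "b", "hello")
tx.queue_command("INCR", "b")     # fails at execution time
tx.queue_command("SET", "c", "3")  # still runs
print(tx.exec())
print(store)                       # a, b and c are all set
```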

After reading this, I really want to ask: can this be called a transaction? Clearly it is not what we usually think of as a transaction, because it does not even guarantee atomicity. Atomicity is not guaranteed because Redis does not support rollback, though Redis does give its reasons for not supporting it.

Reasons why rollback is not supported:

1. Redis believes that the failure is caused by the improper use of commands.

2. Redis does this in order to keep the internal implementation simple and fast

3. Redis also believes that rollback can not solve all problems.

Haha, that reads like a take-it-or-leave-it clause, so it seems that not many people use Redis transactions.

Pipeline

The interaction between the client and the cluster is serial and blocking: the client must wait for the response to come back before sending the next command, so every command costs one round trip. If you have many commands and execute them one by one, it becomes very slow.

Redis provides pipelining, which allows the client to send multiple commands at once without waiting for the server's response to each; once all the commands have been sent, the client reads the responses to those commands in order. This saves a great deal of time and improves efficiency.
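A back-of-the-envelope model shows where the saving comes from. The numbers below (1000 commands, a 1000 microsecond round trip, 10 microseconds of server work per command) are assumptions picked for illustration, not measurements, and the model ignores bandwidth and buffering.

```python
def sequential_time_us(commands: int, rtt_us: int, exec_us: int) -> int:
    """Each command pays a full round trip plus its own execution time."""
    return commands * (rtt_us + exec_us)


def pipelined_time_us(commands: int, rtt_us: int, exec_us: int) -> int:
    """All commands share one round trip; execution time still adds up."""
    return rtt_us + commands * exec_us


print(sequential_time_us(1000, 1000, 10))  # 1010000 microseconds
print(pipelined_time_us(1000, 1000, 10))   # 11000 microseconds
```

Under these assumptions the round trip dominates the sequential case, which matches the earlier point that most time is spent on the network.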

Have you noticed another problem? Multiple commands mean multiple keys. Isn't this the multi-key situation mentioned above? So the question is how to ensure that these keys are all on the same node. Redis Cluster therefore gives up supporting pipelines.

However, a pipeline can be simulated on the client side: use multiple connections to send commands to multiple nodes at the same time, wait for all the nodes to respond, then put the responses back in the order in which the commands were sent and return them to the user code. Quite a hassle.
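The client-side simulation can be sketched as grouping commands by node, sending one batch per node, and restoring the original order. Everything here is hypothetical: the even slot split in node_for_key, the send_batch callback standing in for a per-node connection, and the fake replies. For simplicity the batches are sent one after another, whereas a real client would send them concurrently.

```python
import binascii
from collections import defaultdict


def node_for_key(key: bytes, node_count: int) -> int:
    """Hypothetical layout: the 16384 slots are split evenly across the nodes."""
    slot = binascii.crc_hqx(key, 0) % 16384
    return slot * node_count // 16384


def pipeline_across_nodes(commands, node_count, send_batch):
    """Send each node its own batch, then return replies in the original order."""
    batches = defaultdict(list)
    for position, command in enumerate(commands):
        key = command[1]                      # assumes the key is the first argument
        batches[node_for_key(key, node_count)].append((position, command))

    replies = [None] * len(commands)
    for node, batch in batches.items():
        batch_replies = send_batch(node, [command for _, command in batch])
        for (position, _), reply in zip(batch, batch_replies):
            replies[position] = reply
    return replies


def fake_send_batch(node, batch):
    # Stand-in for a real connection: reply with the key that was asked for.
    return [b"value-of-" + command[1] for command in batch]


commands = [(b"GET", b"a"), (b"GET", b"b"), (b"GET", b"c")]
print(pipeline_across_nodes(commands, 3, fake_send_batch))
```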

Protocol

A brief look at the Redis protocol shows the format in which Redis transmits data.

The protocol for sending the request:

*<number of parameters> CRLF $<number of bytes of parameter 1> CRLF <parameter 1 data> CRLF ... $<number of bytes of parameter N> CRLF <parameter N data> CRLF

For example, SET name lixinjie, the actual data sent is:

*3\r\n$3\r\nSET\r\n$4\r\nname\r\n$8\r\nlixinjie\r\n
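The request format is simple enough to encode by hand. The helper below is a minimal sketch with an illustrative name, encode_command; it only builds the bytes and does not send them anywhere.

```python
def encode_command(*args) -> bytes:
    """Encode a command as: parameter count, then length and data of each parameter."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


print(encode_command("SET", "name", "lixinjie"))
# b'*3\r\n$3\r\nSET\r\n$4\r\nname\r\n$8\r\nlixinjie\r\n'
```

Lengths are counted in bytes rather than characters, which is why each parameter is encoded before it is measured.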

The protocol for receiving the response:

Single-line reply: the first byte is +

Error message: the first byte is -

Integer: the first byte is :

Bulk reply: the first byte is $

Multi-bulk reply: the first byte is *

For example,

+OK\r\n

-ERR Operation against\r\n

:1000\r\n

$6\r\nfoobar\r\n

*2\r\n$3\r\nfoo\r\n$3\r\nbar\r\n
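The five reply kinds can be decoded by dispatching on the first byte. This is a sketch, not a full client: parse_reply and ReplyError are illustrative names, the parser assumes the whole reply is already in memory, and the handling of a length of -1 as a null reply goes beyond what the article covers.

```python
class ReplyError(Exception):
    """Represents an error reply (first byte -)."""


def parse_reply(data: bytes, pos: int = 0):
    """Parse one reply starting at pos; return (value, position after the reply)."""
    end = data.index(b"\r\n", pos)
    kind, line = data[pos:pos + 1], data[pos + 1:end]
    pos = end + 2
    if kind == b"+":                       # single-line reply
        return line.decode(), pos
    if kind == b"-":                       # error message
        return ReplyError(line.decode()), pos
    if kind == b":":                       # integer
        return int(line), pos
    if kind == b"$":                       # bulk reply: length, then that many bytes
        length = int(line)
        if length == -1:
            return None, pos               # null reply (assumption beyond the article)
        return data[pos:pos + length], pos + length + 2
    if kind == b"*":                       # multi-bulk reply: count, then nested replies
        count = int(line)
        if count == -1:
            return None, pos
        items = []
        for _ in range(count):
            item, pos = parse_reply(data, pos)
            items.append(item)
        return items, pos
    raise ValueError("unknown reply type: %r" % kind)


print(parse_reply(b"$6\r\nfoobar\r\n")[0])                    # b'foobar'
print(parse_reply(b"*2\r\n$3\r\nfoo\r\n$3\r\nbar\r\n")[0])    # [b'foo', b'bar']
```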

It can be seen that the design of redis protocol is very simple.
