What is the underlying data structure of redis 07/19 Update SLTechnology News&Howtos

What is the underlying data structure of redis

2025-07-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

What is the underlying data structure of redis? In response to this problem, the editor summed up this article today, hoping to help more friends who want to solve this problem to find a more simple and feasible way.

1. Overview

I believe all of you who have used Redis are well aware that Redis is a distributed storage system based on key-value pairs (key-value), which is similar to Memcached but superior to Memcached's high-performance key-value database.

Described in "Redis Design and implementation":

Each key-value in the Redis database is made up of objects (object):

Database key is always a string object (string object)

The value of the database can be one of five objects: string object, list object (list), hash object (hash), collection object (set), and ordered collection (sort set) object.

Why do we say that Redis is better than Memcached? because the emergence of Redis enriches the insufficient storage of key-value in memcached, and in some cases can play a good complementary role to the relational database, and these data types support push/pop, add/remove and take intersection union and difference sets and richer operations, and these operations are atomic.

What we're talking about today is not the data types of value in Redis, but their implementation-- the underlying data types.

The underlying data structure of Redis has the following data types:

1. Simple dynamic string

2. Linked list

3. Dictionary

4. Jump table

5. Set of integers

6. Compress the list

7. Object

2. Simple dynamic string (simple dynamic string) SDS

2.1 Overview

Redis is an open source key-value database written in ANSI C language. We may subjectively think that the string in Redis is represented by the traditional string in C language, but in fact, Redis does not directly use the traditional string representation in C language. Instead, Redis constructs an abstract type called simple dynamic string (simple dynamic string SDS) and uses SDS as the default string representation of Redis:

Redis > SET msg "hello world" OK

Set a new key-value pair of key= msg,value = hello world, whose underlying data structure will be:

The key is a string object, and the underlying implementation of the object is a SDS that holds the string "msg"

The value is also a string object, and the underlying implementation of the object is a SDS that holds the string "hello world"

From the above example, we can see very intuitively what kind of data type the string is created when we usually use redis. In addition to saving strings, SDS is also used as an AOF buffer in the buffer (buffer) AOF module.

2.2 definition of SDS

Define the structure of dynamic strings in Redis:

/ * * Save the structure of the string object * / struct sdshdr {/ / length of occupied space in int len; / / buf length of remaining available space in int free; / / data space char buf [];}

1. The len variable is used to record the length of space that has been used in buf (here, the length of Redis is 5).

2. Free variable, which is used to record the space that is still free in the buf (when space is allocated for the first time, there is usually no space available, and when the string is modified, the remaining space will appear)

3. Buf character array, used to record our string (record Redis)

2.3 the difference between SDS and C string

Traditional C strings use an array of strings of length Number1 to represent strings of length N, which is inefficient in obtaining string length, string extension and other operations. C language uses this simple string representation, which can not meet the security, efficiency and functional requirements of Redis for strings.

2.3.1 get string length (SDS O (1) / C string O (n))

Traditional C strings use an array of strings of length Number1 to represent strings of length N, so in order to get the length of a C string, you must traverse the entire string.

Unlike the C string, SDS's data structure has variables specifically used to hold the string length, and we can directly know the string length by getting the value of the len attribute.

2.3.2 eliminate buffer overflow

The C string does not record the length of the string. In addition to the high complexity of getting it, it is also easy to cause a buffer overflow.

Suppose there are two strings S1 and S2 next to each other in memory, where S1 holds the string "redis" and S2 saves the string "MongoDb":

If we change the content of S1 to redis cluster now, but forget to re-allocate enough space for S1, it will appear that the content in S2 has been occupied by the content of S1, and S2 is now cluster, not "Mongodb".

The space allocation strategy of SDS in Redis completely eliminates the possibility of buffer overflow:

When we need to modify a SDS, redis will check whether the given SDS space is sufficient before performing the splicing operation. If not, it will expand the SDS space before performing the stitching operation.

2.3.3 reduce the number of memory redistributions caused by string modification

When C language string expands and shrinks, it will face the problem of memory space reallocation.

1. String concatenation will result in the expansion of the memory space of the string. In the process of concatenation, the size of the original string is likely to be smaller than that of the concatenated string, so that once you forget to apply for allocation space, it will lead to memory overflow.

two。 When the string shrinks, the memory space shrinks accordingly, and if the memory space is not reallocated when the string is cut, then the extra space becomes a memory leak.

For example, if we need to expand the following SDS, we need to expand the space. In this case, redis will change the length of the SDS to 13 bytes and the unused space to 1 byte.

Because the space has been expanded the last time you modified the string, you will find that there is enough space when you modify the string again, so there is no need for space expansion.

Through this pre-allocation strategy, SDS reduces the number of memory reallocations required to grow a string for N consecutive times from a certain N to a maximum of N.

2.3.4 release of inert space

When we look at the structure of the SDS, we can see that the free property is used to record the free space. In addition to using free to record the free space when expanding the string, we can also use the free attribute to record the remaining space when shrinking the string, which has the advantage of avoiding the need to expand the space of the string the next time we modify the string.

However, we are not saying that we cannot release the free space in SDS. SDS provides the corresponding API so that we can release the free space of SDS on our own when necessary.

Through lazy space release, SDS avoids the memory reallocation operations required to shorten strings and does not provide optimizations for possible future growth operations

2.3.5 binary security

The characters in the C string must conform to a certain encoding, and the string cannot contain empty characters except at the end of the string, otherwise the empty characters first read by the program will be mistaken for the end of the string. These restrictions make the C string can only save text data, but not binary data such as pictures, audio, video, compressed files.

But in Redis, the end of a string is not judged by an empty character, but by the attribute len. Then, even if there is a null character in the middle, it is still possible for SDS to read that character.

For example:

2.3.6 compatible with some C string functions

Although SDS's API is binary secure, they also follow the convention that C strings end with an empty string.

2.3.7 Summary

3. Linked list

3.1 Overview

The linked list provides efficient node rearrangement ability and sequential node access, and the length of the linked list can be flexibly adjusted by adding and deleting nodes.

Linked lists are widely used in Redis. For example, one of the underlying implementations of list keys is linked lists. When a list key contains a large number of elements, or when the elements in the list are long strings, Redis uses the linked list as the underlying implementation of the list key.

3.2 data structure of linked list

Each linked list node is represented by a listNode structure (adlist.h/listNode):

Typedef struct listNode {struct listNode * prev; struct listNode * next; void * value;}

A double-ended linked list consisting of multiple linked list nodes:

It is more convenient for us to manipulate linked lists by directly manipulating list:

Typedef struct list {/ / header node listNode * head; / / footer node listNode * tail; / / list length unsigned long len; / / Node value replication function void * (* dup) (void * ptr); / / Node value release function void (* free) (void * ptr); / / Node value comparison function int (* match) (void * ptr, void * key);}

Structure diagram composed of list:

3.3 characteristics of linked lists

Double-ended: the linked list node has prev and next pointers, and the time complexity of getting the front and back nodes of a node is O (N).

Acyclic: both the prev pointer of the header node and the next of the footer node point to NULL, and the access to the filing table is cut off by NULL.

Header and footer: because the linked list has head pointer and tail pointer, the time complexity of getting the header node and tail node of the linked list is O (1)

Length counter: the property len that records the length of the linked list is stored in the linked list.

Polymorphism: linked list nodes use void* pointers to hold node values, and you can set type-specific functions for node values through the dup, free, and match attributes of the list structure.

4. Dictionary

4.1 Overview

A dictionary, also known as a symbol table (symbol table), an associative array (associative array), or a mapping (map), is an abstract data structure used to hold key-value pairs.

In a dictionary, a key can be associated with a value, and each key in the dictionary is unique. There is no such data structure in C, but its own dictionary implementation is built in Redis.

Take a simple example:

Redis > SET msg "hello world" OK

Creating such key-value pairs ("msg", "hello world") is stored in the database as a dictionary

4.2 definition of the dictionary

4.2.1 Hash table

The hash table used by the Redis dictionary is defined by the dict.h/dictht structure:

Typedef struct dictht {/ / Hash table array dictEntry * * table; / / Hash table size unsigned long size; / / Hash table size mask, used to calculate the index value unsigned long sizemask; / / the number of nodes in the hash table unsigned long used;}

The structure of an empty dictionary is as follows:

We can see that there is a pointer to the dictEntry array in the structure, and the space we use to store the data is both dictEntry

4.2.2 Hash table node (dictEntry)

DictEntry structure definition:

Typeof struct dictEntry {/ / key void * key; / / value union {void * val; uint64_tu64; int64_ts64;} struct dictEntry * next;}

In the data structure, we know that key is unique, but the key we put into it is not a direct string, but a hash value. Through the hash algorithm, the string is converted to the corresponding hash value, and then the corresponding location is found in the dictEntry.

At this point, we will find a problem, what if the hash value is the same? Redis uses the chain address method:

When the hash values of K1 and K0 are the same, point the next in K1 to K0 as a linked list.

4.2.3 Dictionary

Typedef struct dict {/ / Type-specific function dictType * type; / / Private data void * privedata; / / Hash table dictht ht [2]; / / rehash index in trehashidx;}

The type property and the privdata property are set for different types of key-value pairs to create polymorphic dictionaries.

The ht property is an array of two items (two hash tables)

A dictionary in a normal state:

4.3 resolve hash conflicts

In the above analysis of the hash node, we mentioned that when inserting a new piece of data, the hash value will be calculated. If the hash value is the same, the separate chaining method is used in Redis to solve the key conflict.

Each hash table node has a next pointer, multiple hash table nodes can use next to form an one-way linked list, and multiple nodes assigned to the same index can use this one-way linked list to solve the problem of hash value conflicts.

For example:

Now the hash table has the following data: K0 and K1

We are now going to insert K2, which is calculated by the hash algorithm that the hash value of K2 is 2, that is, we need to insert K2 into dictEntry [2]:

After the insertion, we can see that dictEntry points to K2 next of K2 and points to K1, thus completing an insert operation (header insertion is selected here because there is no record chain footer node location in the hash table node)

4.4 Rehash

With the continuous operation of the hash table, the key-value pairs stored in the hash table will gradually change. In order to keep the load factor of the hash table within a reasonable range, we need to expand or compress the size of the hash table accordingly. At this time, we can do it through rehash (re-hashing) operation.

4.4.1 current hash table status:

We can see that each node in the hash table has been used, so we need to extend the hash table.

4.4.2 allocate space for the hash table

Hash table space allocation rules:

If an extension operation is performed, then the size of ht [1] is the first greater than or equal to the n power of 2 of ht [0].

If the contraction operation is performed, then the size of ht [1] is the first one greater than or equal to the n-power of 2 of ht [0].

So here we allocate space 8 for ht [1].

4.4.3 data transfer

Transfer the data from ht [0] to ht [1]. In the process of transfer, it is necessary to re-calculate the hash value of the data of the hash table node.

Results of data transfer:

4.4.4 release ht [0]

Release ht [0], then set ht [1] to ht [0], and finally assign a blank hash table to ht [1]:

4.4.5 Progressive rehash

As mentioned above, when expanding or compressing, we can directly rehash all the key-value pairs into ht [1] because the amount of data is relatively small. In the actual development process, this rehash operation is not an one-time, centralized completion, but multiple, gradual completion.

Detailed steps for progressive rehash:

1. Allocate space for ht [1] so that the dictionary holds both ht [0] and ht [1] hash tables.

2. What time to maintain an index counter variable rehashidx, and set its value to 0, indicating that rehash starts

3. During the rehash, each time the CRUD operation is performed on the dictionary, the program will, in addition to performing the specified operation, rehash the data in ht [0] to the ht [1] table and add one to the rehashidx.

4. When all the data in ht [0] is transferred to ht [1], set rehashidx to-1, indicating the end of rehash.

The advantage of adopting progressive rehash is that it adopts divide-and-conquer approach and avoids the huge amount of computation brought by centralized rehash.

So much for sharing the underlying data structure of redis. I hope the above content can be helpful to you and learn more knowledge. If you think the article is good, you can share it for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.