How the Underlying Redis Data Structures Are Implemented


This article explains in detail how the underlying Redis data structures are implemented. It is shared as a reference, and I hope you will come away with a solid understanding of the topic after reading it.

Redis is a popular interview topic. What I am covering here is the underlying data structures of Redis, not the five high-level data types you already know. Have you ever wondered what Redis's underlying data structures actually look like, and how they differ from the data structures used by HashMap, List, and so on in Java?

1. String (SDS)

We all know that Redis is written in C, but handling strings and character arrays in plain C is costly. Let me give you a few examples.

A few problems you run into without a dedicated string data structure:


It is extremely easy to cause buffer overflows. For example, strcat() requires that the destination buffer already has enough space before the call; otherwise it overflows.

Getting the length of a string requires traversing it, which is O(N) without extra bookkeeping.

Memory reallocation: every change to a C string (growing or shrinking) triggers a reallocation of the underlying array. Likewise, if a string is shortened and the now-unused space is not handled properly, memory is leaked.

So Redis built its own string type called Simple Dynamic String (SDS), which addresses each of these problems. Let's first look at its structure in the source code:

struct sdshdr {
    // Number of bytes used in the buf array, i.e. the length of the stored string
    int len;
    // Number of unused bytes in the buf array
    int free;
    // Byte array used to hold the string
    char buf[];
};

Let's talk about its advantages:


Developers no longer have to worry about buffer overflows caused by string changes.

The string length is available in constant time via the len field.

Space is pre-allocated and tracked in the free field, which keeps enough slack by default to avoid reallocating memory on every change (a minimal sketch of the idea follows below).
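To make the pre-allocation idea concrete, here is a minimal sketch of an SDS-style append. It is not the real sds.c code; the helper name sds_cat and the simple doubling policy are illustrative (the real SDS caps pre-allocation at 1 MB).

#include <stdlib.h>
#include <string.h>

struct sdshdr {
    int len;     /* bytes of buf currently in use */
    int free;    /* bytes of buf still available */
    char buf[];  /* the string itself, kept NUL-terminated */
};

/* Append t to s. The length is tracked in len (O(1) strlen), and buf is only
 * reallocated when free space runs out, with extra slack pre-allocated. */
struct sdshdr *sds_cat(struct sdshdr *s, const char *t) {
    size_t addlen = strlen(t);
    if ((size_t)s->free < addlen) {
        size_t newcap = ((size_t)s->len + addlen) * 2;  /* pre-allocate slack */
        struct sdshdr *ns = realloc(s, sizeof(*ns) + newcap + 1);
        if (ns == NULL) return NULL;
        s = ns;
        s->free = (int)(newcap - (size_t)s->len);
    }
    memcpy(s->buf + s->len, t, addlen);
    s->len  += (int)addlen;
    s->free -= (int)addlen;
    s->buf[s->len] = '\0';   /* no overflow: space was checked first */
    return s;
}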

Learn more: https://redis.io/topics/internals-sds

SDS is not only the underlying implementation of the string type; it is how Redis handles string data everywhere (SDS is nested inside the other data structures as well).

2. Linked list

Redis's linked list is a doubly linked list extended with attributes such as a head pointer, a tail pointer, and an element count.

2.1 Source Code

listNode data structure:

typedef struct listNode {
    // Previous node
    struct listNode *prev;
    // Next node
    struct listNode *next;
    // Value of the node
    void *value;
} listNode;

Linked list data structure:

typedef struct list {
    // Head node
    listNode *head;
    // Tail node
    listNode *tail;
    // Node value copy function
    void *(*dup)(void *ptr);
    // Node value free function
    void (*free)(void *ptr);
    // Node value comparison function
    int (*match)(void *ptr, void *key);
    // Number of nodes in the list
    unsigned long len;
} list;

As you can see from the above, the linked list of Redis has these characteristics:


The head and tail nodes can be obtained directly.

The length of the linked list is available in constant time.

It is a doubly linked list (the sketch below shows how these properties are maintained).
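Here is a minimal sketch of a tail insertion on such a doubly linked list. The struct is simplified (the dup/free/match function pointers are left out), and list_push_tail is an illustrative name, not the actual adlist.c API.

#include <stdlib.h>

typedef struct node {
    struct node *prev;
    struct node *next;
    void *value;
} node;

typedef struct dlist {
    node *head;          /* O(1) access to the first node */
    node *tail;          /* O(1) access to the last node */
    unsigned long len;   /* O(1) length */
} dlist;

/* Append a value in O(1): only the old tail and the new node are touched. */
int list_push_tail(dlist *l, void *value) {
    node *n = malloc(sizeof(*n));
    if (n == NULL) return -1;
    n->value = value;
    n->prev = l->tail;
    n->next = NULL;
    if (l->tail != NULL) l->tail->next = n;
    else l->head = n;    /* list was empty */
    l->tail = n;
    l->len++;
    return 0;
}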

3. Dictionary (Hash)

The Redis hash is essentially an array plus linked lists, with some optimizations such as rehashing built on top.

3.1 data structure source code

Hash table:

typedef struct dictht {
    // Hash table array
    dictEntry **table;
    // Hash table size
    unsigned long size;
    // Hash table size mask, used to compute a key's index; always equal to size - 1
    unsigned long sizemask;
    // Number of nodes already stored in the hash table
    unsigned long used;
} dictht;

Hash Node:

typedef struct dictEntry {
    // Key
    void *key;
    // Value
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    // Points to the next hash table node, forming a singly linked list
    struct dictEntry *next;
} dictEntry;

Dictionary:

typedef struct dict {
    // Type-specific functions
    dictType *type;
    // Private data
    void *privdata;
    // Hash tables
    dictht ht[2];
    // Rehash index; -1 when no rehash is in progress
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */
} dict;

As can be seen:


Redis's hash resolves collisions with separate chaining, and it does not add a red-black tree optimization the way Java's HashMap does.

Each hash table node uses a singly linked list structure (see the lookup sketch after this list).

It applies a rehash optimization.
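A minimal lookup sketch against the dictht and dictEntry layouts shown above illustrates both points: the bucket index is computed as hash & sizemask (cheap because size is always a power of two, so sizemask == size - 1), and collisions are resolved by walking the bucket's singly linked chain. The hashKey and keyCompare callbacks stand in for the dict's type-specific functions and are assumptions of this sketch.

/* Requires <stdint.h> plus the dictht/dictEntry definitions above. */
dictEntry *dictht_find(dictht *ht, const void *key,
                       uint64_t (*hashKey)(const void *),
                       int (*keyCompare)(const void *, const void *)) {
    unsigned long idx = (unsigned long)(hashKey(key) & ht->sizemask); /* sizemask == size - 1 */
    dictEntry *he = ht->table[idx];
    while (he != NULL) {                 /* separate chaining: walk the bucket */
        if (keyCompare(he->key, key)) return he;
        he = he->next;
    }
    return NULL;                         /* key not present */
}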

Let's talk about its rehash optimization.

3.2 rehash

When the hash table contains too many or too few keys, it needs to be resized. How does Redis adjust it?


Recall the dictht ht[2] field in the dict structure, which holds two hash tables. The first step is to allocate space for the ht[1] hash table; its size depends on how much of ht[0] is currently in use.

Rehash the data stored in ht[0] (recomputing each key's hash and index) into ht[1].

When all key-value pairs in ht[0] have been migrated to ht[1], release ht[0], make ht[1] the new ht[0], and initialize a fresh ht[1] in preparation for the next rehash.

3.3 Progressive rehash

We saw the overall rehash flow in 3.2, but how exactly does Redis migrate the data?

This is where progressive rehash comes in. Redis wants to avoid the CPU spike that migrating a large amount of data at once would cause (the service could stall for a while), so it rehashes progressively. The steps are as follows:


Allocate space for ht[1], so that the dict holds two hash tables (one empty, one with data).

Maintain a counter rehashidx with an initial value of 0.

Every time the dictionary is added to, deleted from, or updated, a slice of the data in ht[0] is migrated to ht[1] and rehashidx is incremented (note: during this phase ht[0] only ever shrinks and never grows).

When the rehash operation is complete, rehashidx is set back to -1.

Its advantage: it applies a divide-and-conquer idea, spreading the huge migration workload across every CRUD operation and avoiding a long pause in service.
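Below is a simplified sketch of a single progressive-rehash step in the spirit of the steps above. It is not the actual dict.c implementation: the hashKey callback stands in for the dict's hash function, and the real code also bounds how many empty buckets one step may scan.

/* Requires <stdlib.h>, <string.h> and the dict/dictht/dictEntry definitions above.
 * Called piecemeal from add/delete/update operations while rehashidx != -1. */
void dict_rehash_step(dict *d, uint64_t (*hashKey)(const void *)) {
    if (d->rehashidx == -1) return;                  /* no rehash in progress */

    if (d->ht[0].used > 0) {
        /* Buckets below rehashidx are already empty, so skip forward to the
         * next non-empty one and migrate its whole chain into ht[1]. */
        while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        dictEntry *he = d->ht[0].table[d->rehashidx];
        while (he != NULL) {
            dictEntry *next = he->next;
            unsigned long idx = (unsigned long)(hashKey(he->key) & d->ht[1].sizemask);
            he->next = d->ht[1].table[idx];          /* head-insert into ht[1] */
            d->ht[1].table[idx] = he;
            d->ht[0].used--;
            d->ht[1].used++;
            he = next;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    if (d->ht[0].used == 0) {                        /* migration finished */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];                         /* ht[1] becomes ht[0] */
        memset(&d->ht[1], 0, sizeof(d->ht[1]));      /* reset ht[1] for next time */
        d->rehashidx = -1;
    }
}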

4. Skip list (skiplist)

This data structure comes up most often in interviews, and it is actually quite simple. Anyone who has studied it knows that its performance is similar to that of a balanced tree, so why does Redis use a skiplist instead of a balanced tree?

4.1 Choices between skiplist and AVL


In terms of implementation difficulty, a skiplist is much simpler than a balanced tree.

Insertion and deletion in a balanced tree may force adjustments of whole subtrees, which is logically complex, whereas insertion and deletion in a skiplist only need to update the pointers of adjacent nodes, which is simple and fast.

For looking up a single key, both a skiplist and a balanced tree are O(log n), roughly the same.

For range lookups, balanced trees require more complicated operations than a skiplist.

The elements of both a skiplist and the various balanced trees (AVL, red-black tree, etc.) are kept in sorted order.

Because the elements in a skiplist are ordered, Redis uses skiplists for sorted set keys and as an internal data structure within cluster nodes.

4.2 Source Code

Skip list node:

typedef struct zskiplistNode {
    // Backward pointer
    struct zskiplistNode *backward;
    // Score
    double score;
    // Member object
    robj *obj;
    // Levels
    struct zskiplistLevel {
        // Forward pointer
        struct zskiplistNode *forward;
        // Span
        unsigned int span;
    } level[];
} zskiplistNode;

Skip list:

typedef struct zskiplist {
    // Header node and tail node
    struct zskiplistNode *header, *tail;
    // Number of nodes in the table
    unsigned long length;
    // Level count of the node with the most levels in the table
    int level;
} zskiplist;

It has several concepts:

4.2.1 Levels (level[])

Levels, i.e. the level[] field: the more levels a search can use, the faster a node can be reached. Each level acts like an index, and more levels mean a finer-grained index, so the target value can be located quickly.

4.2.2 Forward pointer (forward)

Each level has a forward field used to traverse from the header toward the tail.

4.2.3 Span (span)

Used to record the distance between two nodes.

4.2.4 Backward pointer (backward)

Used to traverse from the tail back toward the header.

Example

level0: 1 ---------------------------> 5
level1: 1 --------> 3 ---------------> 5
level2: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8

For example, to find the element with key 6, you can jump straight to 5 along level0 and then move forward one more element at the bottom level to reach it.
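Here is a minimal search sketch over the zskiplist/zskiplistNode layout above, looking a node up by score. Note that in the struct, level[0] is the densest bottom level and the search descends from the highest level; zsl_find is an illustrative name, and the real t_zset.c code additionally compares member objects when scores are equal.

/* Find the node with exactly this score, or return NULL. */
zskiplistNode *zsl_find(zskiplist *zsl, double score) {
    zskiplistNode *x = zsl->header;
    for (int i = zsl->level - 1; i >= 0; i--) {       /* start at the top level */
        while (x->level[i].forward != NULL &&
               x->level[i].forward->score < score) {
            x = x->level[i].forward;                  /* skip ahead on this level */
        }
        /* then drop down one level and keep going */
    }
    x = x->level[0].forward;                          /* candidate at the bottom level */
    if (x != NULL && x->score == score) return x;
    return NULL;
}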

5. Set of integers (intset)

Redis specifically optimizes integer storage. The intset is the collection data structure Redis uses to hold integer values: when a set contains only integer elements, Redis stores it this way.

127.0.0.1:6379[2]> sadd number 1
127.0.0.1:6379[2]> object encoding number
"intset"

Source code

Intset data structure:

typedef struct intset {
    // Encoding
    uint32_t encoding;
    // Number of elements in the set
    uint32_t length;
    // Array holding the elements
    int8_t contents[];
} intset;

You must be curious about what the encoding field is for.

If the value of the encoding attribute is INTSET_ENC_INT16, then contents is an array of type int16_t, and each item in the array is an integer value of type int16_t (minimum value -32768, maximum value 32767).

If the value of the encoding attribute is INTSET_ENC_INT32, then contents is an array of type int32_t, and each item in the array is an integer value of type int32_t (minimum value -2147483648, maximum value 2147483647).

If the value of the encoding attribute is INTSET_ENC_INT64, then contents is an array of type int64_t, and each item in the array is an integer value of type int64_t (minimum value -9223372036854775808, maximum value 9223372036854775807).

In plain terms, the encoding field determines which int type the contents array uses, so integer storage is kept as compact as possible.

Speaking of this optimization, how does Redis handle a value that no longer fits? That is where upgrading comes in.

5.1 encoding upgrade

Suppose we have a set of int16 integers and we now add 65535 (which needs int32) to it. int16 cannot hold it, so the integer set has to be upgraded.

How does the upgrade work (the process)?

Say we currently have two int16 elements, 1 and 2, and we add a new element 65535 that needs 32 bits (see the sketch after these steps).


Reallocate memory: after the addition there will be 3 elements, so allocate 3 x 32 = 96 bits (bit positions 0 through 95).

Take the largest number, 65535, and place it in the last 32-bit slot (bits 64 to 95), then place 2 in the slot before it (bits 32 to 63), and so on, working from back to front.
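The following is a simplified sketch of the upgrade, under the assumption that we only ever go from int16_t to int32_t and that the new value is larger than every existing element (as 65535 is in the example). The real intset.c handles all three encodings and works directly on the raw contents buffer; the function name intset_upgrade16to32 is made up for illustration.

#include <stdint.h>
#include <stdlib.h>

/* Reallocate for the wider type, then move the old int16_t elements from back
 * to front so nothing is overwritten, and finally append the new element. */
int32_t *intset_upgrade16to32(int16_t *old, uint32_t len, int32_t value,
                              uint32_t *newlen) {
    int32_t *set = realloc(old, ((size_t)len + 1) * sizeof(int32_t));
    if (set == NULL) return NULL;
    for (uint32_t i = len; i > 0; i--) {
        /* The first len*2 bytes still hold the old int16_t values. */
        set[i - 1] = ((int16_t *)set)[i - 1];
    }
    set[len] = value;     /* value > INT16_MAX, so it sorts after everything else */
    *newlen = len + 1;
    return set;
}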

What are the benefits of upgrading?


The flexibility of the set of integers is improved.

It saves as much memory as possible (if a small type is enough, there is no need to use a bigger one).

5.2 Downgrading is not supported

Following the example above, if we delete 65535 again, will the encoding go back to int16? The answer is no. The official docs do not give a reason; my guess is that it avoids extra performance cost, since each adjustment is an O(N) operation.

6. Compressed list (ziplist)

Ziplist is a sequential, memory-saving data structure developed for Redis. It is used underneath list keys and hash keys, generally for small amounts of data.

Refer to two diagrams in https://segmentfault.com/a/1190000016901154:

6.1 Source Code

The ziplist source does not define an explicit struct for the whole layout, so the following is only a rough illustration.

typedef struct entry {
    /* Length of the previous entry (and the space needed to record it) */
    unsigned int prevlengh;
    /* Encoding of the entry's content */
    unsigned char encoding;
    /* Actual content of the entry */
    unsigned char *data;
} zlentry;

typedef struct ziplist {
    /* Total memory allocated to the ziplist */
    uint32_t zlbytes;
    /* Offset to the tail entry */
    uint32_t zltail;
    /* Number of stored entries */
    uint16_t zllen;
    /* The stored entries themselves */
    unsigned char *entry[];
    /* End-of-list marker */
    unsigned char zlend;
} ziplist;

It may look confusing at first glance, but it will make sense once you read the following section carefully.

Analysis of Entry

There are three important fields in the entry structure:


previous_entry_length: this field records the length of the previous node in the ziplist. With it, a pointer can be moved backwards, so the list can be traversed from the tail toward the head. There is also a big catch with this field, discussed below.

encoding: records the data type of the content (int16? string?) and its length.

data/content: the actual data.

Cascading updates

Analysis of previous_entry_length field

As mentioned above, previous_entry_length stores the length of the previous node. So how much space is allocated for it? Redis splits it like this: if the previous node is shorter than 254 bytes, 1 byte is allocated; otherwise 5 bytes are allocated. And that is where the problem arises.

Suppose the previous node was originally shorter than 254 bytes and later grows to 254 bytes or more: the 1-byte previous_entry_length of the following node no longer has room to store the new length. It has to be expanded, and changing just that one node is not enough, because the sizes of the following nodes change too. Memory must be reallocated, and the update cascades through every node behind the affected one.

Besides inserting new nodes, deleting nodes can also trigger cascading updates.
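A minimal sketch of how previous_entry_length is encoded makes the 1-byte/5-byte split concrete: lengths below 254 fit in a single byte, anything longer is stored as a 0xFE marker byte followed by a 4-byte length. The helper name prevlen_encode is illustrative, not the actual ziplist.c function.

#include <stdint.h>
#include <string.h>

/* Write the previous-entry length at p and return how many bytes were used. */
unsigned int prevlen_encode(unsigned char *p, uint32_t prevlen) {
    if (prevlen < 254) {
        p[0] = (unsigned char)prevlen;        /* 1 byte is enough */
        return 1;
    }
    p[0] = 254;                               /* marker: a 4-byte length follows */
    memcpy(p + 1, &prevlen, sizeof(uint32_t));
    return 5;                                 /* 1 marker byte + 4 length bytes */
}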

7. Quick list (quicklist)

A quicklist is a doubly linked list whose nodes are ziplists. A single quicklist can hold many quicklist nodes, a bit like how a B-tree stores data. It is a data structure added in Redis 3.2 and is used as the underlying implementation of lists.

Struct source code

Header structure:

typedef struct quicklist {
    // Pointer to the head (leftmost) quicklist node
    quicklistNode *head;
    // Pointer to the tail (rightmost) quicklist node
    quicklistNode *tail;
    // Total number of entries across all ziplists
    unsigned long count;          /* total count of all entries in all ziplists */
    // Number of quicklistNode nodes
    unsigned int len;             /* number of quicklistNodes */
    // Per-node ziplist size limit, set in the config file; 16 bits
    int fill : 16;                /* fill factor for individual nodes */
    // Compression depth, set in the config file; 16 bits
    unsigned int compress : 16;   /* depth of end nodes not to compress; 0=off */
} quicklist;

Quicklist node structure:

typedef struct quicklistNode {
    // Previous node pointer
    struct quicklistNode *prev;
    // Next node pointer
    struct quicklistNode *next;
    // Points to the node's ziplist; when the node is compressed it points to a quicklistLZF structure instead
    unsigned char *zl;
    // Total size of the ziplist in bytes
    unsigned int sz;              /* ziplist size in bytes */
    // Number of items packed into this node's ziplist; 16 bits
    unsigned int count : 16;      /* count of items in ziplist */
    // Whether the node is compressed with the LZF algorithm: RAW==1 means not compressed, LZF==2 means compressed; 2 bits
    unsigned int encoding : 2;    /* RAW==1 or LZF==2 */
    // Whether the node stores its data in a ziplist: NONE==1 or ZIPLIST==2, with 2 as the default; 2 bits
    unsigned int container : 2;   /* NONE==1 or ZIPLIST==2 */
    // Set to 1 when the node's ziplist was temporarily decompressed and is waiting to be compressed again; 1 bit
    unsigned int recompress : 1;  /* was this node previously compressed? */
    // Set when the node is too small to be worth compressing; 1 bit
    unsigned int attempted_compress : 1; /* node can't compress; too small */
    // Extra reserved bits for future use; 10 bits
    unsigned int extra : 10;      /* more bits to steal for future usage */
} quicklistNode;

Related configuration

In the ADVANCED CONFIG section of redis.conf:

list-max-ziplist-size -2
list-compress-depth 0

The list-max-ziplist-size parameter

Let's explain in detail what list-max-ziplist-size means. It can take either a positive or a negative value.

When positive, it limits the ziplist on each quicklist node by number of entries. For example, when the parameter is set to 5, the ziplist of each quicklist node may contain at most five entries.

When negative, it limits the ziplist on each quicklist node by the number of bytes it occupies. In that case it may only take one of the five values -1 to -5, with the following meanings:

-5: the ziplist on each quicklist node may not exceed 64 KB (note: 1 KB = 1024 bytes).

-4: the ziplist on each quicklist node may not exceed 32 KB.

-3: the ziplist on each quicklist node may not exceed 16 KB.

-2: the ziplist on each quicklist node may not exceed 8 KB (-2 is the Redis default).

-1: the ziplist on each quicklist node may not exceed 4 KB.

The list-compress-depth parameter

This parameter is the number of nodes at each end of the quicklist that are left uncompressed. Note: "nodes" here means nodes of the quicklist doubly linked list, not data items inside a ziplist. The ziplist on a quicklist node, if compressed, is compressed as a whole.

The value of parameter list-compress-depth is as follows:

0: a special value meaning no compression at all. This is the Redis default.

1: one node at each end of the quicklist is left uncompressed; the nodes in the middle are compressed.

2: two nodes at each end of the quicklist are left uncompressed; the nodes in the middle are compressed.

3: three nodes at each end of the quicklist are left uncompressed; the nodes in the middle are compressed.

And so on.

For compressing quicklist nodes, Redis uses LZF, a lossless compression algorithm.

That is all on how the underlying Redis data structures are implemented. I hope the content above is of some help and lets you learn something new. If you think the article is good, feel free to share it for more people to see.
