What is the function of the internal data structure ziplist in Redis 04/21 Update SLTechnology News&Howtos

What is the function of the internal data structure ziplist in Redis

2025-04-21 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article shows you what the role of the internal data structure ziplist in Redis is, the content is concise and easy to understand, it can definitely brighten your eyes. I hope you can get something through the detailed introduction of this article.

What is ziplist?

Redis's official definition of ziplist is (from the file header comments of ziplist.c):

The ziplist is a specially encoded dually linked list that is designed to be very memory efficient. It stores both strings and integer values, where integers are encoded as actual integers instead of a series of characters. It allows push and pop operations on either side of the list in O (1) time.

Ziplist is a specially coded bi-directional linked list, which is designed to improve storage efficiency. Ziplist can be used to store strings or integers, where integers are encoded in a true binary representation rather than into a sequence of strings. It can provide push and pop operations at both ends of the table with O (1) time complexity.

In fact, ziplist fully embodies Redis's pursuit of storage efficiency. An ordinary two-way linked list in which each item occupies a separate piece of memory and the items are connected by address pointers (or references). This approach results in a lot of memory fragmentation, and address pointers take up extra memory. On the other hand, ziplist stores each item in the table in a contiguous address space, and a ziplist takes up a large chunk of memory. It is a list, but not really a linked list (linked list).

In addition, in order to save memory in detail, ziplist uses a variable length encoding for the storage of values, which roughly means that for large integers, more bytes are used to store, while for small integers, fewer bytes are used. We will discuss these implementation details soon.

Data structure definition of ziplist

The data structure composition of ziplist is the focus of this paper. In fact, ziplist is a little more complex, and its complexity lies in its data structure definition. Once you understand the data structure, some of its operations are easier to understand.

Let's start with a general introduction to the definition of the data structure of ziplist, and then give a practical example to explain the composition of ziplist. If you understand this part, the task of this article is more than half complete.

From a macro point of view, the memory structure of ziplist is as follows:

...

Each part is adjacent to each other in memory, and their meanings are as follows:

: 32bit, which represents the total number of bytes occupied by ziplist (including 4 bytes occupied by itself)

: 32bit, which represents the offset bytes of the last item in the ziplist table (entry) in ziplist. Makes it easy to find the last item (without having to traverse the entire ziplist), so that you can quickly perform push or pop operations at the end of the ziplist.

16bit, which represents the number of data items (entry) in the ziplist Because there is only 16bit in the zllen field, the maximum value that can be expressed is 2 ^ 16-1. It is important to note that if the number of data items in ziplist exceeds the maximum value that 16bit can express, ziplist can still express it. How do you say that? It is stipulated here that if it is less than or equal to 2 ^ 16-2 (that is, not equal to 2 ^ 16-1), it means the number of data items in ziplist; otherwise, if the 16bit is all 1, then it does not represent the number of data items. At this time, if you want to know the total number of data items in ziplist, you must traverse each data item for ziplist from beginning to end before counting it.

Represents the data item that actually stores the data, with varying lengths A data item (entry) also has its own internal structure, which will be explained later.

The last byte of the ziplist, which is a closing tag, is always equal to 255.

It is also worth noting in the above definition that since it occupies multiple bytes, there is a difference between big endian and little endian when storing. Ziplist uses a small-end mode for storage, which will be explained in more detail when we introduce specific examples below.

Let's take a look at the composition of each data item:

We see that there are two more fields in front of the real data ():

Represents the total number of bytes occupied by the previous data item The purpose of this field is to enable ziplist to traverse from back to front (from the position of the latter item, just offset prevrawlen bytes forward to find the previous item). This field is encoded with variable length.

Represents the data length of the current data item (that is, the length of the part). Variable length coding is also used.

So how do you and how do you encode the variable length? Readers cheer up, and we finally come to the most tedious part of the definition of ziplist.

Go first. It has two possibilities, either 1 byte or 5 bytes:

If the previous data item occupies less than 254 bytes, it is represented by only one byte, and the value of this byte is the number of bytes occupied by the previous data item.

If the previous data item occupies more than or equal to 254 bytes, it is represented by 5 bytes, where the value of the first byte is 254 (as a token of this case), and the next four bytes form an integer value. to actually store the number of bytes occupied by the previous data item.

Some people may ask, why is there no case of 255?

This is because: 255 has been defined as the value of the ziplist closing tag. In the implementation of many operations in ziplist, it is determined whether the first byte of the data item has reached the end of ziplist according to whether the first byte of the data item is 255. therefore, the first byte of a normal data (that is, the first byte) cannot take the value of 255, otherwise it will conflict.

The field is even more complex, which is divided into nine cases according to the first byte (the following representation is in binary):

| 00pppppp |-1 byte. The first byte has a maximum of two bit of 00, so the field has only one byte, and the remaining six bit are used to represent the length value, up to 63 (2 ^ 6-1).

| 01pppppp | qqqqqqqq |-2 bytes. The first byte has a maximum of two bit of 01, so the field occupies 2 bytes, and a total of 14 bit are used to represent the length value, up to 16383 (2 ^ 14-1).

| 10 minutes _ | qqqqqqqq | rrrrrrrr | ssssssss | tttttttt |-5 bytes. The first byte has a maximum of two bit of 10, so the len field occupies 5 bytes, using a total of 32 bit to represent the length value (6 bit discarded), and up to 2 ^ 32-1. It is important to note that in the first three cases, it is stored as a string; starting from the fourth case below, it begins to be stored as an integer.

| 11000000 |-1 byte. The field occupies 1 byte, the value is 0xC0, and the following data is stored as a 2-byte int16_t type.

| 11010000 |-1 byte. The field occupies 1 byte, the value is 0xD0, and the following data is stored as a 4-byte int32_t type.

| 11100000 |-1 byte. The field occupies 1 byte, the value is 0xE0, and the following data is stored as an 8-byte int64_t type.

| 11110000 |-1 byte. The field occupies 1 byte, the value is 0xF0, and the following data is stored as 3-byte integers.

| 11111110 |-1 byte. The field occupies 1 byte, the value is 0xFE, and the following data is stored as a 1-byte integer.

| 1111xxxx |-(the value of xxxx is between 0001 and 1101). This is a special case, where xxxx has a total of 13 values from 1 to 13, and these 13 values are used to represent the real data. Note that this represents the real data, not the length of the data. That is, in this case, a separate field is no longer needed to represent the real data, but is merged into one. In addition, because xxxx can only take the 13 values of 0001 and 1101 (other possible values conflict with other situations, such as 0000 and 1110 conflict with the eighth case of the seventh above, respectively, and 1111 conflicts with the closing tag), and the decimal value should start at 0, so these 13 values represent 0 to 12, that is, the value of xxxx minus 1 is the value of the integer data it wants to represent.

All right, we're done with the definition of the data structure of ziplist. Now let's look at a concrete example.

The picture above is a real ziplist data. Let's read it item by item:

This ziplist contains 33 bytes in total. The byte number ranges from byte [0] to byte [32]. The value of each byte in the figure is represented in hexadecimal.

The first 4 bytes (0x21000000) are fields stored in little endian mode. What is Xiao Duan? It means that the low bytes of data are stored in the low address of memory (see Wikipedia entry Endianness). Therefore, the value here should be parsed to 0x00000021, which is exactly 33 in decimal terms.

The next 4 bytes (byte [4... 7]) are interpreted in small-end storage mode, whose value is 0x0000001D (value 29), indicating the location of the last data item in byte [29] (that data item is 0x05FE14).

The next 2 bytes (byte [8... 9]) have a value of 0x0004, indicating that there are four pieces of data in the ziplist.

The next 6 bytes (byte [10.. 15]) is the first data item. Where prevrawlen=0, because there are no data items in front of it, and len=4, equivalent to the first of the nine cases defined earlier, means that the next four bytes store data as a string, and the value of the data is "name".

The next 8 bytes (byte [16. 23]) is the second data item, which is similar to the previous data item storage format, storing a string "tielei".

The next five bytes (byte [24. 28]) is the third data item, which is similar to the previous data item storage format, storing a string "age".

The next 3 bytes (byte [29. 31]) is the last data item, which is in a different format from the previous data item storage format. The first byte prevrawlen=5 indicates that the previous data item occupies 5 bytes, and the second byte = FE, which is equivalent to the eighth of the nine cases defined earlier, so there is another byte to represent the real data, which is expressed as an integer. Its value is 20 (0x14).

The last byte (byte [32]) represents a fixed value of255 (0xFF).

To sum up, there are four data items stored in this ziplist, which are:

String: "name"

String: "tielei"

String: "age"

Integer: 20

(well, you found out ~ ~ tielei is not actually 20 years old, of course, he is not so young. )

In fact, this ziplist is created with two hset commands. We will talk about this again in the second half.

Well, now that you've read this, it means you're still very patient (in fact, I'm very tired when I write about it). You can collect this article first, have a rest, and then look back at the second half.

Next I'm going to post some code.

Interface of ziplist

Let's not rush to look at the implementation, let's pick a few important interfaces of ziplist and see what they look like:

Unsigned char * ziplistNew (void); unsigned char * ziplistMerge (unsigned char * * first, unsigned char * * second); unsigned char * ziplistPush (unsigned char * zl, unsigned char * s, unsigned int slen, int where); unsigned char * ziplistIndex (unsigned char * zl, int index); unsigned char * ziplistNext (unsigned char * zl, unsigned char * p); unsigned char * ziplistPrev (unsigned char * zl, unsigned char * p); unsigned char * ziplistInsert (unsigned char * zl, unsigned char * p, unsigned char * s, unsigned char) Unsigned char * ziplistDelete (unsigned char * zl, unsigned char * * p); unsigned char * ziplistFind (unsigned char * p, unsigned char * vstr, unsigned int vlen, unsigned int skip); unsigned int ziplistLen (unsigned char * zl)

We can roughly guess their functions from the names of these interfaces. Here is a brief explanation:

The data type of ziplist is not expressed in terms of custom struct and the like, but simply unsigned char *. This is because ziplist is essentially a piece of continuous memory, and the internal structure is a highly dynamic design (variable length coding), which can not be expressed by a fixed data structure.

ZiplistNew: create an empty ziplist (contains only).

ZiplistMerge: merge two ziplist into a new ziplist.

ZiplistPush: insert a piece of data (to generate a new data item) at the head or tail of the ziplist. Notice that the return value of this interface is a new ziplist. The caller must replace the old ziplist variable passed in with the new ziplist returned here, and after being processed by this function, the old ziplist variable becomes invalid. Why would a simple insert result in a new ziplist? This is because ziplist is a contiguous space, and appending operations to it will cause realloc of memory, so the memory location of ziplist may change. In fact, we mentioned a similar interface usage pattern in our previous article introducing sds (see the description of the sdscatlen function).

ZiplistIndex: returns the memory location of the data item specified by the index parameter. Index can be a negative number, indicating that it is indexed forward from the trailing end.

ZiplistNext and ziplistPrev return the latter and previous items of the specified data item p in a ziplist, respectively.

ZiplistInsert: insert a new data item in front of any data item in ziplist.

ZiplistDelete: deletes the specified data item.

ZiplistFind: finds the given data (specified by vstr and vlen). Notice that it has a skip parameter that indicates that several data items are skipped between each comparison when looking up. Why is there such a parameter? In fact, the main purpose of this parameter is that when using ziplist to represent the hash structure, it is stored in ziplist according to a field and a value in turn. That is, the data items of the even index store field, and the data items of the odd index store value. When looking up according to the value of field, you need to skip the odd items.

ZiplistLen: calculates the length of the ziplist (that is, the number of items containing data).

Analysis of insertion Logic of ziplist

The specific implementation of the relevant interfaces of ziplist is still somewhat complicated, and because of the limited space, we only combine the code here to explain the logic of insertion. Insertion is a very representative operation, through this part to take a look at the internal implementation of ziplist, other parts of the implementation will be easy to understand.

Both ziplistPush and ziplistInsert are inserts, but there are different restrictions on where to insert. Their internal implementations all rely on an internal function named _ _ ziplistInsert with the following code (from ziplist.c):

Static unsigned char * _ ziplistInsert (unsigned char * zl, unsigned char * p, unsigned char * s, unsigned int slen) {size_t curlen = intrev32ifbe (ZIPLIST_BYTES (zl)), reqlen; unsigned int prevlensize, prevlen = 0; size_t offset; int nextdiff = 0; unsigned char encoding = 0; long long value = 123456789; / * initialized to avoid warning. Using a value that is easy to see if for some reason we use it uninitialized. * / zlentry tail; / * Find out prevlen for the entry that is inserted. * / if (p [0]! = ZIP_END) {ZIP_DECODE_PREVLEN (p, prevlensize, prevlen);} else {unsigned char * ptail = ZIPLIST_ENTRY_TAIL (zl); if (ptail [0]! = ZIP_END) {prevlen = zipRawEntryLength (ptail) }} / * See if the entry can be encoded * / if (zipTryEncoding) {/ * 'encoding' is set to the appropriate integer encoding * / reqlen = zipIntSize (encoding);} else {/ *' encoding' is untouched, however zipEncodeLength will use the * string length to figure out how to encode it. * / reqlen = slen;} / * We need space for both the length of the previous entry and * the length of the payload. * / reqlen + = zipPrevEncodeLength (NULL,prevlen); reqlen + = zipEncodeLength (NULL,encoding,slen); / * When the insert position is not equal to the tail, we need to * make sure that the next entry can hold this entry's length in * its prevlen field. * / nextdiff = (p [0]! = ZIP_END)? ZipPrevLenByteDiff (pforce reqlen): 0; / * Store offset because a realloc may change the address of zl. * / offset = pMurzl; zl = ziplistResize (zl,curlen+reqlen+nextdiff); p = zl+offset; / * Apply memory move when necessary and update tail offset. * / if (p [0]! = ZIP_END) {/ * Subtract one because of the ZIP_END bytes * / memmove. / * Encode this entry's raw length in the next entry. * / zipPrevEncodeLength (paired reqlenlast reqlen); / * Update offset for tail * / ZIPLIST_TAIL_OFFSET (zl) = intrev32ifbe (intrev32ifbe (ZIPLIST_TAIL_OFFSET (zl)) + reqlen); / * When the tail contains more than one entry, we need to take * "nextdiff" in account as well. Otherwise, a change in the * size of prevlen doesn't have an effect on the * tail* offset. * / zipEntry (p+reqlen, & tail); if (p [reqlen+tail.headersize+tail.len]! = ZIP_END) {ZIPLIST_TAIL_OFFSET (zl) = intrev32ifbe (intrev32ifbe (ZIPLIST_TAIL_OFFSET (zl)) + nextdiff);} else {/ * This element will be the new tail. * / ZIPLIST_TAIL_OFFSET (zl) = intrev32ifbe (p-zl);} / * When nextdiff! = 0, the raw length of the next entry has changed, so * we need to cascade the update throughout the ziplist * / if (nextdiff! = 0) {offset = pMuzl; zl = _ _ ziplistCascadeUpdate (zl,p+reqlen); p = zl+offset;} / * Write the entry * / p+ = zipPrevEncodeLength (pMagneLen) P + = zipEncodeLength; if (ZIP_IS_STR (encoding)) {memcpy;} else {zipSaveInteger (zl,1); return zl;}

Let's parse this code briefly:

This function inserts a new piece of data at the specified location p. The address pointer of the data to be inserted is s and the length is slen. After insertion, a new data item is formed, which occupies the configuration of the original p. The data item originally located at p position and all the data items behind need to move backward uniformly to make room for the newly inserted data item. The parameter p points to the starting position of a data item in the ziplist, or to the end tag of the ziplist when inserted toward the tail.

The function starts by calculating the length prevlen of a data item before the position to be inserted. This length is to be stored in the field of the newly inserted data item.

Then calculate the total number of bytes occupied by the current data item reqlen, which consists of three parts:, and the real data. The data part will try to convert to an integer first by calling zipTryEncoding.

Due to the new memory requirements of ziplist caused by insertion, in addition to the reqlen occupied by the data items to be inserted, we should also consider the changes in the fields of the data items in the original p position (now after the data items to be inserted). Instead of saving the total length of the previous item, it now saves the total length of the currently inserted data item. In this way, the storage space required by its field itself may also change, which may be larger or smaller. The value nextdiff that has changed is calculated by calling zipPrevLenByteDiff. If it gets bigger, nextdiff is positive, otherwise it's negative.

Now it's easy to figure out how many bytes the new ziplist needs after insertion, and then call ziplistResize to resize. Allocator's zrealloc is called in the implementation of ziplistResize, which may cause a copy of the data.

Now that there is extra space, the next step is to move the data item in the original p position and all the data behind it back and set a new field for it. In addition, you may need to adjust the fields of ziplist.

Finally, assemble the new data item to be inserted and place it in position p.

Hash and ziplist

Hash is an ideal data type in Redis that can be used to store an object structure. The properties of an object correspond to each field of a hash structure.

We can easily find some technical articles on the Internet that say that storing an object and using hash saves more memory than string. In fact, there is a premise to say this, depending on how the object is stored. If you store multiple properties of an object on multiple key (each property value is stored as string), it certainly takes up more memory. But if you use some serialization methods, such as Protocol Buffers, or Apache Thrift, to serialize the object into a byte array and then store it in Redis's string, it is uncertain which is more memory-efficient than hash.

Of course, hash has an advantage over serialized and then stored in string in terms of supported operation commands: it supports both multiple field simultaneous access (hmset/hmget) and individual access according to a particular field (hset/hget).

In fact, with the increase of data, the implementation of the underlying data structure of hash will change, of course, the storage efficiency is also different. When field is small and each value is small, hash is implemented by ziplist; with the increase of field and value, hash may become dict. When the underlying hash becomes dict to implement, its storage efficiency can't be compared with those serialization methods.

When we execute the hset key field value command for a key for the first time, Redis creates a hash structure, and the underlying layer of the newly created hash is a ziplist.

Robj * createHashObject (void) {unsigned char * zl = ziplistNew (); robj * o = createObject (OBJ_HASH, zl); o-> encoding = OBJ_ENCODING_ZIPLIST; return o;}

The above createHashObject function, from object.c, is responsible for creating a new hash structure. As you can see, it creates a robj object with type = OBJ_HASH but encoding = OBJ_ENCODING_ZIPLIST.

In fact, the ziplist instance given earlier in this article is built from the following two commands.

Hset user:100 name tieleihset user:100 age 20

Each time the hset command is executed, the inserted field and value are inserted into the ziplist as a new data item (that is, each hset produces two data items).

As the data is inserted, the underlying ziplist of the hash may be converted to dict. So how much is inserted before it will turn?

Remember the two Redis configurations mentioned at the beginning of this article?

Hash-max-ziplist-entries 512hash-max-ziplist-value 64

This configuration means that ziplist will be converted to dict when one of the following two conditions is met:

When the number of data items (that is, field-value pairs) in hash exceeds 512, that is, when the ziplist data item exceeds 1024 (please refer to the hashTypeSet function in t_hash.c).

When any value inserted in hash is longer than 64 (refer to the hashTypeTryConversion function in t_hash.c).

Redis's hash is designed this way because when ziplist becomes very large, it has the following disadvantages:

The realloc operation triggered by each insert or modification has a higher probability of causing a memory copy, which degrades performance.

Once a memory copy occurs, the cost of memory copying increases accordingly, because a larger piece of data is copied.

When there are too many ziplist data items, the performance of finding specified data items on it becomes very low, because the lookup on ziplist needs to be traversed.

The above is what is the function of the internal data structure ziplist in Redis. Have you learned the knowledge or skills? If you want to learn more skills or enrich your knowledge reserve, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.