Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Example Analysis of SDS simple dynamic string of redis Internal data structure

2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

Editor to share with you the redis internal data structure of SDS simple dynamic string example analysis, I believe that most people do not know much about it, so share this article for your reference, I hope you will learn a lot after reading this article, let's go to know it!

Preface

Instead of directly using the traditional string representation of C language (an array of characters ending with null characters), reids constructs an abstract type called simple dynamic string, which is represented as the default string of redis, because C string does not meet the security, efficiency and functional requirements of redis for strings.

1. SDS definition

In C, strings are stored as an array of characters ending with the'\ 0' character (NULL Terminator), usually expressed as a character pointer (char *). It does not allow byte 0 to appear in the middle of the string, so it cannot be used to store arbitrary binary data.

Type definition of sds

Typedef char * sds

Each sds.h/sdshdr structure represents a value of SDS struct sdshdr {/ / record the number of bytes used in the buf array / / equal to the length of the string saved by sds int len; / / record the unused data int free in the buf / / character array The value used to save the string} * free attribute is 0, which means that the SDS has not allocated any unused space * the length of the len attribute is 5, which means that the SDS saves a five-byte string * the buf attribute is an array of type char, and the first five bytes of the array hold the five characters of 'Renewable, grammatical, and grammatical, respectively. The last byte holds the empty string'\ 0'.

Someone must be confused that sds is synonymous with char *?

Sds and traditional C language strings keep type compatibility, so their type definitions are the same, both char *, and in some cases, where you need to pass in a C language string, you can indeed pass in a sds.

But sds is not the same as char *. Sds is Binary Safe, it can store any binary data, and it can't mark the end of a string with the character'\ 0' like a C language string, so it must have a length field, which is in header.

Header structure of sds

/ * Note: sdshdr5 is never used, we just access the flags byte directly. * However is here to document the layout of type 5 SDS strings. * / struct _ attribute__ ((_ packed__)) sdshdr5 {unsigned char flags; / * 3 lsb of type, and 5 msb of string length * / char buf [];}; struct _ attribute__ ((_ _ packed__)) sdshdr8 {uint8_t len; / * used * / uint8_t alloc; / * excluding the header and null terminator * / unsigned char flags; / * 3 lsb of type, 5 unused bits * / char buf [];} Struct _ _ attribute__ ((_ _ packed__)) sdshdr16 {uint16_t len; / * used * / uint16_t alloc; / * excluding the header and null terminator * / unsigned char flags; / * 3 lsb of type, 5 unused bits * / char buf [];}; struct _ attribute__ ((_ _ packed__)) sdshdr32 {uint32_t len; / * used * / uint32_t alloc; / * excluding the header and null terminator * / unsigned char flags / * 3 lsb of type, 5 unused bits * / char buf [];}; struct _ attribute__ ((_ packed__)) sdshdr64 {uint64_t len; / * used * / uint64_t alloc; / * excluding the header and null terminator * / unsigned char flags; / * 3 lsb of type, 5 unused bits * / char buf [];}

There are five types of header in SDS. The goal is to save memory.

The complete structure of an SDS string, consisting of two adjacent parts on the memory address:

A header. Typically contains the length of the string (len), maximum capacity (alloc), and flags. Sdshdr5 is different.

An array of characters. The length of this character array is equal to the maximum capacity + 1. The length of truly valid string data is usually less than the maximum capacity. After the real string data, there are free unused bytes (usually filled with byte 0), which allows the string data to be extended back a limited amount without reallocating memory. After the actual string data, there is a NULL Terminator, the'\ 0' character with ASCII code 0. This is for compatibility with traditional C strings. The reason why the length of the character array is 1 byte longer than the maximum capacity is that there is still 1 byte for the NULL Terminator when the string length reaches the maximum capacity.

Except for sdshdr5, the structure of the other four header contains three fields:

Len: represents the true length of the string (excluding the NULL Terminator).

Alloc: represents the maximum capacity of the string (excluding the last extra byte).

Flags: always takes up one byte. The lowest three bit are used to represent the type of header.

There are a few other things we need to pay attention to in the type definition of each header:

_ _ attribute__ ((packed)) is used in the definition of each header to allow the compiler to allocate memory in compact mode. Without this property, the compiler may optimize the alignment of struct fields by filling them with empty bytes. In that case, there is no guarantee that the data parts of header and sds are closely adjacent, nor can you get the flags field by a fixed offset of 1 byte to the low address.

There is a char buf [] at the end of each header definition. We notice that this is an unspecified character array, which is a special way to define character arrays in C language, called flexible arrays (flexible array member), which can only be defined on the last field of a structure. It just acts as a marker here, indicating that the flags field is followed by an array of characters, or that is, it indicates the offset of the character array immediately following the flags field in the structure. When the program allocates memory for header, it does not take up memory space. If you calculate the value of sizeof (struct sdshdr16), the result is 5 bytes, where there is no buf field.

Unlike several other header structures, sdshdr5 does not contain alloc fields, and the length is stored using the top 5 bits of flags. Therefore, it cannot allocate free space for a string. If the string needs to grow dynamically, then it must reallocate memory. Therefore, this type of sds string is more suitable for storing static short strings (less than 32 in length).

So far, we have seen very clearly that the header of the sds string is actually hidden in front of the real string data (in the low address direction). Such a definition has the following advantages:

The header is adjacent to the data and does not have to be allocated separately into two blocks of memory. This helps to reduce memory fragmentation and improve storage efficiency (memory efficiency).

Although there are many types of header, sds can be expressed as a unified char *. And it maintains type compatibility with traditional C language strings. If a sds stores a printable string, we can pass it directly to the C function, such as using strcmp to compare string sizes, or using printf to print.

With a clear understanding of the data structure of sds, its specific operating functions are easier to understand.

Some basic functions of sds

Sdslen (const sds s): gets the length of the sds string.

Sdssetlen (sdss, size_t newlen): sets the length of the sds string.

Sdsinclen (sds s, size_t inc): increase the sds string length.

Sdsalloc (const sds s): gets the capacity of the sds string.

Sdssetalloc (sdss, size_t newlen): sets the capacity of the sds string.

Sdsavail (const sds s): gets the sds string free space (that is, alloc-len).

SdsHdrSize (char type): gets the header size based on the header type.

SdsReqType (size_t string_size): calculates the required header type based on the length of the string data.

2. Dynamic allocation strategy of SDS array

With so many fields defined in header information, one of the most important functions is to achieve flexible manipulation of strings and minimize memory reallocation and recycling operations.

The memory allocation strategy for redis is as follows

When the length of the len attribute of SDS is less than 1MB, redis allocates free space of the same length as len. As for why this allocation, the last time the use of len length of space, then the next time the program may also use len length of space, so redis will pre-allocate so much space for you.

But when the len attribute of SDS is longer than 1MB, the program allocates an extra 1m of unused space. At this time, if I allocate it according to this inertia prediction, the loss will outweigh the gain. So redis sets 1MB as a value at risk, and I'll give you as much as you use it. After that, the value at risk is that I can give you a critical value.

The memory recovery strategy of reids is as follows

Redis memory recovery uses lazy recycling, that is, if you shorten the string, then I will not return the extra memory space to the operating system, but save it in case it is used again soon. Holding resources for a short time can not only make full use of resources, but also do not waste resources. This is a very good idea.

To sum up, the result of the high-performance string implemented by redis is that N times of string operations will inevitably lead to N times of memory reallocation to a maximum of N times of reallocation when the character is the worst.

/ * Enlarge the free space at the end of the sds string so that the caller * is sure that after calling this function can overwrite up to addlen * bytes after the end of the string, plus one more byte for nul term. * * Note: this does not change the * length* of the sds string as returned * by sdslen (), but only the free buffer space we have. * / sds sdsMakeRoomFor (sds s, size_t addlen) {void * sh, * newsh; size_t avail = sdsavail (s); size_t len, newlen; char type, oldtype = s [- 1] & SDS_TYPE_MASK; int hdrlen; / * Return ASAP if there is enough space left. * / if (avail > = addlen) return s; len = sdslen (s); sh = (char*) s-sdsHdrSize (oldtype); newlen = (len+addlen); if (newlen)

< SDS_MAX_PREALLOC) newlen *= 2; else newlen += SDS_MAX_PREALLOC; type = sdsReqType(newlen); /* Don't use type 5: the user is appending to the string and type 5 is * not able to remember empty space, so sdsMakeRoomFor() must be called * at every appending operation. */ if (type == SDS_TYPE_5) type = SDS_TYPE_8; hdrlen = sdsHdrSize(type); if (oldtype==type) { newsh = s_realloc(sh, hdrlen+newlen+1); if (newsh == NULL) return NULL; s = (char*)newsh+hdrlen; } else { /* Since the header size changes, need to move the string forward, * and can't use realloc */ newsh = s_malloc(hdrlen+newlen+1); if (newsh == NULL) return NULL; memcpy((char*)newsh+hdrlen, s, len+1); s_free(sh); s = (char*)newsh+hdrlen; s[-1] = type; sdssetlen(s, len); } sdssetalloc(s, newlen); return s;} /* Reallocate the sds string so that it has no free space at the end. The * contained string remains not altered, but next concatenation operations * will require a reallocation. * * After the call, the passed sds string is no longer valid and all the * references must be substituted with the new pointer returned by the call. */sds sdsRemoveFreeSpace(sds s) { void *sh, *newsh; char type, oldtype = s[-1] & SDS_TYPE_MASK; int hdrlen; size_t len = sdslen(s); sh = (char*)s-sdsHdrSize(oldtype); type = sdsReqType(len); hdrlen = sdsHdrSize(type); if (oldtype==type) { newsh = s_realloc(sh, hdrlen+len+1); if (newsh == NULL) return NULL; s = (char*)newsh+hdrlen; } else { newsh = s_malloc(hdrlen+len+1); if (newsh == NULL) return NULL; memcpy((char*)newsh+hdrlen, s, len+1); s_free(sh); s = (char*)newsh+hdrlen; s[-1] = type; sdssetlen(s, len); } sdssetalloc(s, len); return s;} 三、SDS的特点 sds正是在Redis中被广泛使用的字符串结构,它的全称是Simple Dynamic String。与其它语言环境中出现的字符串相比,它具有如下显著的特点: 可动态扩展内存。SDS表示的字符串其内容可以修改,也可以追加。在很多语言中字符串会分为mutable和immutable两种,SDS属于mutable类型的。 二进制安全(Binary Safe)。sds能存储任意二进制数据。 与传统的C语言字符串类型兼容。 预分配空间,可以懒惰释放,在内存紧张的时候也可以缩减不需要的内存 常数复杂度获取字符串长度 杜绝缓冲区溢出,边界检查 四、浅谈SDS与string的关系 127.0.0.1:6379>

Set test testOK127.0.0.1:6379 > append test "test" (integer) 9127.0.0.1 integer 6379 > get test "test test" 127.0.0.1 get test 6379 > setbit test 36 1 (integer) 0127.0.0.1 integer 6379 > get test "test (test" 127.0.0.1purl 6379 > getrange test-5-1 "(test")

The append operation is implemented using SDS's sdscatlen.

Both setbit and getrange fetch the entire sds string based on key, and then select or modify the specified part from the string. Because SDS is just an array of characters, it seems easy to manipulate some part of it.

However, in addition to these operations, string also supports operations such as incr, decr, and so on, when the value it stores is a number. Its internal storage is not SDS, in which case the implementation of setbit and getrange will be different.

The above is all the contents of the article "sample Analysis of SDS simple dynamic strings of redis Internal data structures". Thank you for reading! I believe we all have a certain understanding, hope to share the content to help you, if you want to learn more knowledge, welcome to follow the industry information channel!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report