Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What is the function of the internal data structure sds in Redis

2025-01-31 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

Redis what is the role of the internal data structure sds, many novices are not very clear about this, in order to help you solve this problem, the following editor will explain in detail for you, people with this need can come to learn, I hope you can gain something.

Data structure definition of sds

We know that in C, strings are stored in an array of characters ending with the'\ 0' character (NULL Terminator), usually expressed as a character pointer (char *). It does not allow byte 0 to appear in the middle of the string, so it cannot be used to store arbitrary binary data.

We can find the type definition of sds in sds.h:

Typedef char sds

Someone must be confused that sds is synonymous with char? As we mentioned earlier, sds and traditional C language strings keep type compatibility, so their type definitions are the same, both char. In some cases, where you need to pass in a C language string, you can actually pass in a sds. However, sds and char are not the same. Sds is Binary Safe, it can store arbitrary binary data, can not be like the C language string with the character'\ 0' to identify the end of the string, so it must have a length field. But where is this length field? In fact, sds also contains a header structure:

Struct _ attribute__ ((_ packed__)) sdshdr5 {unsigned char flags; / * 3 lsb of type, and 5 msb of string length * / char buf [];}; struct _ attribute__ ((_ _ packed__)) sdshdr8 {uint8_t len; / * used * / uint8_t alloc; / * excluding the header and null terminator * / unsigned char flags; / * 3 lsb of type, 5 unused bits * / char buf [];} Struct _ _ attribute__ ((_ _ packed__)) sdshdr16 {uint16_t len; / * used * / uint16_t alloc; / * excluding the header and null terminator * / unsigned char flags; / * 3 lsb of type, 5 unused bits * / char buf [];}; struct _ attribute__ ((_ _ packed__)) sdshdr32 {uint32_t len; / * used * / uint32_t alloc; / * excluding the header and null terminator * / unsigned char flags / * 3 lsb of type, 5 unused bits * / char buf [];}; struct _ attribute__ ((_ packed__)) sdshdr64 {uint64_t len; / * used * / uint64_t alloc; / * excluding the header and null terminator * / unsigned char flags; / * 3 lsb of type, 5 unused bits * / char buf [];}

There are five types of header in sds. The reason why there are five is to enable strings of different lengths to use header of different sizes. In this way, short strings can use a smaller header, saving memory.

The complete structure of an sds string, consisting of two adjacent parts on the memory address:

A header. Typically contains the length of the string (len), maximum capacity (alloc), and flags. Sdshdr5 is different.

An array of characters. The length of this character array is equal to the maximum capacity + 1. The length of truly valid string data is usually less than the maximum capacity. After the real string data, there are free unused bytes (usually filled with byte 0), which allows the string data to be extended back a limited amount without reallocating memory. After the actual string data, there is a NULL Terminator, the'\ 0' character with ASCII code 0. This is for compatibility with traditional C strings. The reason why the length of the character array is 1 byte longer than the maximum capacity is that there is still 1 byte for the NULL Terminator when the string length reaches the maximum capacity.

Except for sdshdr5, the structure of the other four header contains three fields:

Len: represents the true length of the string (excluding the NULL Terminator). Alloc: represents the maximum capacity of the string (excluding the last extra byte). Flags: always takes up one byte. The lowest three bit are used to represent the type of header. There are five types of header, which are defined by constants in sds.h. # define SDS_TYPE_5 0#define SDS_TYPE_8 1#define SDS_TYPE_16 2#define SDS_TYPE_32 3#define SDS_TYPE_64 4

The data structure of sds, we need to parse it very carefully.

An example of Redis dict structure

The figure above is an example of the internal structure of sds. The figure shows the memory structure of two sds strings S1 and S2, one using header of type sdshdr8 and the other using header of type sdshdr16. But they all express the same value of a string of length 6: "tielei". Let's combine the code to explain the composition of each part.

Sds's character pointers (S1 and S2) point to the beginning of the real data (character array), while header is located in the direction of the lower memory address. There are some macro definitions in sds.h related to parsing header:

# define SDS_TYPE_MASK 7#define SDS_TYPE_BITS 3#define SDS_HDR_VAR (Tjuns) struct sdshdr##T * sh = (void*) ((s)-(sizeof (struct sdshdr##T); # define SDS_HDR (TMague s) ((struct sdshdr##T *) ((s)-(sizeof (struct sdshdr##T) # define SDS_TYPE_5_LEN (f) ((f) > > SDS_TYPE_BITS)

Where SDS_HDR is used to get the pointer to the starting position of header from the sds string, for example, SDS_HDR (8, S1) represents the header pointer of S1, and SDS_HDR (16, S2) represents the header pointer of S2.

Of course, before using SDS_HDR, we must first know which kind of header it is, so that we know what the first parameter of SDS_HDR should pass. The way to get the header type from the sds character pointer is to offset the position of 1 byte to the low address direction to get the flags field. For example, S1 [- 1] and S2 [- 1] obtain the flags values of S1 and S2, respectively. Then take the lowest 3 bit of flags to get the type of header.

Because S1 [- 1] = 0x01 = = SDS_TYPE_8, the header type of S1 is sdshdr8.

Because S2 [- 1] = 0x02 = = SDS_TYPE_16, the header type of S2 is sdshdr16.

With the header pointer, you can quickly navigate to its len and alloc fields:

In the header of S1, the value of len is 0x06, indicating that the length of the string data is 6 and the value of alloc is 0x80, indicating that the maximum capacity of the character array is 128.

In the header of S2, the value of len is 0x0006, indicating that the length of the string data is 6 and the value of alloc is 0x03E8, indicating that the maximum capacity of the character array is 1000. (note: the figure is made up of small-end addresses)

There are a few other things we need to pay attention to in the type definition of each header:

Attribute ((packed)) is used in the definition of each header to allow the compiler to allocate memory in a compact mode. Without this property, the compiler may optimize the alignment of struct fields by filling them with empty bytes. In that case, there is no guarantee that the data parts of header and sds are closely adjacent, nor can you get the flags field by a fixed offset of 1 byte to the low address.

There is a char buf [] at the end of each header definition. We notice that this is an unspecified character array, which is a special way to define character arrays in C language, called flexible arrays (flexible array member), which can only be defined on the last field of a structure.

It just acts as a marker here, indicating that the flags field is followed by an array of characters, or that is, it indicates the offset of the character array immediately following the flags field in the structure. When the program allocates memory for header, it does not take up memory space.

If you calculate the value of sizeof (struct sdshdr16), the result is 5 bytes, where there is no buf field.

Unlike several other header structures, sdshdr5 does not contain alloc fields, and the length is stored using the top 5 bits of flags.

Therefore, it cannot allocate free space for a string. If the string needs to grow dynamically, then it must reallocate memory. Therefore, this type of sds string is more suitable for storing static short strings (less than 32 in length).

So far, we have seen very clearly that the header of the sds string is actually hidden in front of the real string data (in the low address direction). Such a definition has the following advantages:

The header is adjacent to the data and does not have to be allocated separately into two blocks of memory. This helps to reduce memory fragmentation and improve storage efficiency (memory efficiency).

Although there are many types of header, sds can be expressed as a unified char *. And it maintains type compatibility with traditional C language strings.

If a sds stores a printable string, we can pass it directly to the C function, such as using strcmp to compare string sizes, or using printf to print.

With a clear understanding of the data structure of sds, its specific operating functions are easier to understand.

Some of the basic functions of sds sdslen (const sds s): get the sds string length. Sdssetlen (sdss, size_t newlen): sets the length of the sds string. Sdsinclen (sds s, size_t inc): increase the sds string length. Sdsalloc (const sds s): gets the capacity of the sds string. Sdssetalloc (sdss, size_t newlen): sets the capacity of the sds string. Sdsavail (const sds s): gets the sds string free space (that is, alloc-len). SdsHdrSize (char type): gets the header size based on the header type. SdsReqType (size_t string_size):

Calculates the required header type based on the length of the string data.

Here we pick the code for sdslen and sdsReqType and take a look.

Static inline size_t sdslen (const sds s) {unsigned char flags = s [- 1]; switch (flags&SDS_TYPE_MASK) {case SDS_TYPE_5: return SDS_TYPE_5_LEN (flags); case SDS_TYPE_8: return SDS_HDR (8)-> len; case SDS_TYPE_16: return SDS_HDR (16)-> len Case SDS_TYPE_32: return SDS_HDR (32)-> len; case SDS_TYPE_64: return SDS_HDR (64)-> len;} return 0;} static inline char sdsReqType (size_t string_size) {if (string_size)

< 1= addlen) return s; len = sdslen(s); sh = (char*)s-sdsHdrSize(oldtype); newlen = (len+addlen); if (newlen < SDS_MAX_PREALLOC) newlen *= 2; else newlen += SDS_MAX_PREALLOC; type = sdsReqType(newlen); /* Don't use type 5: the user is appending to the string and type 5 is * not able to remember empty space, so sdsMakeRoomFor() must be called * at every appending operation. */ if (type == SDS_TYPE_5) type = SDS_TYPE_8; hdrlen = sdsHdrSize(type); if (oldtype==type) { newsh = s_realloc(sh, hdrlen+newlen+1); if (newsh == NULL) return NULL; s = (char*)newsh+hdrlen; } else { /* Since the header size changes, need to move the string forward, * and can't use realloc */ newsh = s_malloc(hdrlen+newlen+1); if (newsh == NULL) return NULL; memcpy((char*)newsh+hdrlen, s, len+1); s_free(sh); s = (char*)newsh+hdrlen; s[-1] = type; sdssetlen(s, len); } sdssetalloc(s, newlen); return s;} sdscatlen将t指向的长度为len的任意二进制数据追加到sds字符串s的后面。本文开头演示的string的append命令,内部就是调用sdscatlen来实现的。 在sdscatlen的实现中,先调用sdsMakeRoomFor来保证字符串s有足够的空间来追加长度为len的数据。sdsMakeRoomFor可能会分配新的内存,也可能不会。 sdsMakeRoomFor是sds实现中很重要的一个函数。关于它的实现代码,我们需要注意的是: 如果原来字符串中的空余空间够用(avail >

= addlen), then it does nothing and returns directly.

If you need to allocate space, it allocates a little more than the actual request, in case further additions are made. It allocates at least more SDS_MAX_PREALLOC bytes if the string is already long, which is defined in sds.h as (1024, 1024) = 1MB.

Depending on the size of the allocated space, you may need to change the header type (the alloc field of the original header is too short to express the increased capacity).

If you need to change the header, then the entire string space (including header) needs to be reallocated (s_malloc) and the original data is copied to the new location.

If you don't need to change the header (the original header is enough), call a special s_realloc to try to reallocate space on the original address. The implementation of s_realloc depends on which allocator is chosen when Redis compiles (jemalloc is used by default on Linux).

But no matter which realloc implementation, its meaning is basically the same: it tries to redistribute in the original address location, if the original address location has enough free space to complete the reallocation, then the new address it returns is the same as the old address passed in; otherwise, it allocates a new address block and relocates the data. See http://man.cx/realloc.

From the functional interface of sdscatlen, we can see a usage pattern: when calling it, pass in an old sds variable, and then it returns a new sds variable. Because its internal implementation may cause address changes, after the caller has called, the old variables are invalidated and should be replaced with the new returned variables. Not only sdscatlen functions, but other functions in sds (such as sdscpy, sdstrim, sdsjoin, etc.), as well as other data structures in Redis that automatically expand memory (such as ziplist), are also used in the same pattern.

On the relationship between sds and string

Now let's go back to the example of string operation given at the beginning of this article.

The append operation is implemented using sds's sdscatlen. It has been mentioned earlier.

Both setbit and getrange fetch the entire sds string based on key, and then select or modify the specified part from the string. Because sds is just an array of characters, it seems easy to manipulate some part of it.

Is it helpful for you to read the above content? If you want to know more about the relevant knowledge or read more related articles, please follow the industry information channel, thank you for your support.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report