In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-22 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/01 Report--
This article mainly explains the "Redis integer set of the use of what", the article explains the content is simple and clear, easy to learn and understand, the following please follow the editor's ideas slowly in-depth, together to study and learn "what is the use of Redis integer set" bar!
I. Overview of the collection
STL's set is no stranger to collections, and its underlying implementation is a red-black tree. Insertion, deletion and search are all O (log n) time complexity. Of course, if you use a hash table to implement a collection, insert, delete, and lookup can all reach O (1). So why do collections use red-black trees and no hash tables? I think the greatest possibility is based on the characteristics of the set itself, and the set has its own unique operations: intersection, union, and difference. All three operations are O (n) for the hash table. Based on this, it is more appropriate to use an ordered red-black tree than an unordered hash table.
2. Redis integer set (intset)
The set of integers we are going to talk about today, also known as intset, is a unique data structure of Redis. Its implementation is neither a red-black tree nor a hash table. It's just a simple array plus memory coding. A collection of integers is used when there are fewer storage elements (the upper limit of the number of elements is defined at 512 in the OBJ_SET_MAX_INTSET_ENTRIES macro definition of server.h) and all are integers. Its lookup is O (log n), insertion and deletion are O (n). However, due to the relatively few storage elements, the difference between O (log n) and O (n) is not very large, but with this set of integers in Redis, memory can be greatly reduced compared to red-black trees and hash tables.
Therefore, the existence of intset, a collection of integers in Redis, is mainly to save memory.
1. Intset structure definition
The intset structure is defined in intset.h:
# define INTSET_ENC_INT16 (sizeof (int16_t)) # define INTSET_ENC_INT32 (sizeof (int32_t)) # define INTSET_ENC_INT64 (sizeof (int64_t)) typedef struct intset {uint32_t encoding; / * a * / uint32_t length; / * b * / int8_t contents []; / * c * /} intset
A) encoding specifies the encoding method, including INTSET_ENC_INT16, INTSET_ENC_INT32 and INTSET_ENC_INT64. As you can see from the macro definition, these three values are 2, 4, and 8, respectively. From the literal meaning, we can see that the range that the three can represent is 16-bit integer, 32-bit integer and 64-bit integer.
B) length stores the number of elements in a collection of integers.
C) contents is a flexible array of integers, and the element type is not necessarily of type int8_t. Contents does not occupy the size of the structure, it only serves as the first pointer to the collection of integers. The elements in the set of integers are arranged in contents in order from smallest to largest.
2. Coding mode
First of all, let's understand the meaning of the encoding mode encoding. To be clear, for a set of integers, the encoding of all elements must be consistent (otherwise each number must be stored in an encoding, rather than storing it in the intset structure), then the encoding of the entire set of integers depends on the number with the largest "absolute value" in the set (the reason is absolute, because integers contain positive and negative numbers).
Get the code from the integer with the highest absolute value, as follows:
Static uint8_t _ intsetValueEncoding (int64_t v) {if (v)
< INT32_MIN || v >INT32_MAX) return INTSET_ENC_INT64; else if (v
< INT16_MIN || v >INT16_MAX) return INTSET_ENC_INT32; else return INTSET_ENC_INT16;}
The implication of this code is that if the integer v cannot be represented as a 32-bit integer, then it needs to be encoded in INTSET_ENC_INT64; if it cannot be expressed as a 16-bit integer, then it needs to be encoded in INTSET_ENC_INT32; otherwise, it can be encoded in INTSET_ENC_INT16. The core is: if you can represent it with 2 bytes, you don't need 4 bytes, and if you can use 4 bytes, you don't need 8 bytes, and if you can save, you save.
Several macros are defined in stdint.h, as follows:
/ * Minimum of signed integral types. * / # define INT16_MIN (- 32767-1) # define INT32_MIN (- 2147483647-1) / * Maximum of signed integral types. * / # define INT16_MAX (32767) # define INT32_MAX (2147483647) 3, Encoding upgrade
When the current encoding is not sufficient to store a larger number of integers, you need to upgrade the encoding. For example, the four numbers shown in the following figure are all in the range of [- 32768, 32767], so you can use INTSET_ENC_INT16 coding. The array length of contents is sizeof (int16_t) * 4 = 2 * 4 = 8 bytes (that is, 64 binary bits).
Then we insert a number whose value is 32768, which is 1 greater than INT16_MAX, so it needs to be encoded in INTSET_ENC_INT32, and all the numbers in the set of integers need to be encoded consistently. Then, all numbers need to be converted to INTSET_ENC_INT32 coding. This is called "upgrade". As shown in the figure:
After the upgrade, the length of the contents array becomes sizeof (int32_t) * 5 = 4 * 5 = 20 bytes (that is, 160bits). And the memory occupied by each element has doubled, and the relative position has changed, so that all elements need to be migrated to higher memory.
It would be nice for us to code all the sets of integers in INTSET_ENC_INT64 at the beginning, and save the trouble. The reason is that Redis designed intset to save memory, assuming that the elements of a collection will never exceed 16-bit integers, so using 64-bit integers is equivalent to wasting three times as much memory.
Third, the collection of integers common operations 1, create a collection
Create a collection of integers intsetNew, which is implemented in intset.c:
Intset * intsetNew (void) {intset * is = zmalloc (sizeof (intset)); is- > encoding = intrev32ifbe (INTSET_ENC_INT16); is- > length = 0; return is;}
The set of integers created initially is an empty set. After the memory is allocated with zmalloc, the code is defined as INTSET_ENC_INT16, which keeps the memory as small as possible. It should be noted here that the storage of intset is directly related to memory encoding, so you need to consider the byte order of the host (see byte order for more information).
Intrev32ifbe means int32 reversal if big endian. That is, if the current host byte order is a large end order, then its memory storage is flipped. In short, all members of intset are stored in small end order. So create an empty set of integers and the memory distribution is as follows:
Now that we understand the memory encoding of a collection of integers, let's take a look at its settings (set) and get (get).
2. Element settings
Setting means that given a set of integers and a location and value, set the value to the corresponding position of the set of integers. _ intsetSet is implemented as follows:
Static void _ intsetSet (intset * is, int pos, int64_t value) {uint32_t encoding = intrev32ifbe (is- > encoding); / * a * / if (encoding = = INTSET_ENC_INT64) {((int64_t*) is- > contents) [pos] = value; / * b * / memrev64ifbe ((int64_t*) is- > contents) + pos) / * c * /} else if (encoding = = INTSET_ENC_INT32) {((int32_t*) is- > contents) [pos] = value; memrev32ifbe (int32_t*) is- > contents) + pos);} else {((int16_t*) is- > contents) [pos] = value; memrev16ifbe ((int16_t*) is- > contents) + pos);}}
A) large end sequence and small end sequence are only storage methods. Encoding performs an intrev32ifbe conversion when it is stored, and needs another intrev32ifbe conversion (serialization and deserialization in fact) when it is taken out and used.
B) according to the type of encoding, convert contents to a pointer of the specified type, then use pos to index to find the corresponding memory location, and then set the value of value to the corresponding memory.
C) for the implementation of memrev64ifbe, see the memrev64 function in byte order, which converts the value of the corresponding memory into small end-order storage.
3. Element acquisition
The meaning of getting is to return the value of an element at a given location, given a set of integers and a location. _ intsetGet is implemented as follows:
Static int64_t _ intsetGetEncoded (intset * is, int pos, uint8_t enc) {int64_t v64; int32_t v32; int16_t v16; if (enc = = INTSET_ENC_INT64) {memcpy (& v64, ((int64_t*) is- > contents) + pos,sizeof (v64)); / * a * / memrev64ifbe (& v64); / * b * / return v64 } else if (enc = = INTSET_ENC_INT32) {memcpy (& v32, ((int32_t*) is- > contents) + pos,sizeof (v32)); memrev32ifbe (& v32); return v32;} else {memcpy (& v16, ((int16_t*) is- > contents) + pos,sizeof (v16)); memrev16ifbe (& v16); return v16 }} static int64_t _ intsetGet (intset * is, int pos) {return _ intsetGetEncoded (is,pos,intrev32ifbe (is- > encoding);}
A) convert contents to a pointer of the specified type according to the type of encoding, then use pos to index to find the corresponding memory location, and copy the value on the memory location to the temporary variable
B) because it is a direct memory copy, so the value extracted is still in small end order, so the value obtained on the host with large end sequence is wrong, so you need to do another memrev64ifbe conversion to restore the value.
4. Element search
Because the set of integers is an ordered set, Redis uses a binary lookup to find out whether an element is in the set of integers. The intsetSearch implementation is as follows:
Static uint8_t intsetSearch (intset * is, int64_t value, uint32_t * pos) {int min = 0, max = intrev32ifbe (is- > length)-1, mid =-1; int64_t cur =-1; if (intrev32ifbe (is- > length) = = 0) {if (pos) * pos = 0; / * a * / return 0 } else {/ * b * / if (value > _ intsetGet (is,intrev32ifbe (is- > length)-1) {if (pos) * pos = intrev32ifbe (is- > length); return 0;} else if (value)
< _intsetGet(is,0)) { if (pos) *pos = 0; return 0; } } while(max >= min) {mid = (unsigned int) min + (unsigned int) max) > 1; / * c * / cur = _ intsetGet (is,mid); if (value > cur) {min = mid+1;} else if (value)
< cur) { max = mid-1; } else { break; } } if (value == cur) { /* d */ if (pos) *pos = mid; return 1; } else { if (pos) *pos = min; return 0; }} a) 整数集合为空,返回0表示查找失败; b) value 的值比整数集合中的最大值还大,或者比最小值还小,则返回0表示查找失败; c) 执行二分查找,将找到的值存在 cur 中; d) 如果找到则返回1,表示查找成功,并且将 pos 设置为 mid 并返回;如果没找到则返回一个需要插入的位置。 5、内存重分配 由于 contents 的内存是动态分配的,所以每次进行元素插入或者删除的时候,都需要重新分配内存,这个实现放在 intsetResize 中,实现如下: static intset *intsetResize(intset *is, uint32_t len) { uint32_t size = len*intrev32ifbe(is->Encoding); is = zrealloc (is,sizeof (intset) + size); return is;}
Encoding itself represents the number of bytes, so multiplying the number of sets len is the total number of bytes needed for the contents array. Call zrealloc to reallocate memory, and then return the reallocated address.
Note: the return value of zrealloc must be returned, because the address of intset may change after memory reallocation. That is, is = zrealloc (is,...) This is is not the other is. Therefore, all functions that call intsetResize need to return a new intset pointer along with it.
6. Code upgrade
The encoding upgrade must occur when the element is inserted and the absolute value of the inserted element is larger than that of the element in the integer set, so we put the upgraded element insertion and coding upgrade into a function implementation called intsetUpgradeAndAdd, which is implemented as follows:
Static intset * intsetUpgradeAndAdd (intset * is, int64_t value) {uint8_t curenc = intrev32ifbe (is- > encoding); uint8_t newenc = _ intsetValueEncoding (value); int length = intrev32ifbe (is- > length); int prepend = value
< 0 ? 1 : 0; /* a */ is->Encoding = intrev32ifbe (newenc); is = intsetResize (is,intrev32ifbe (is- > length) + 1); / * b * / while (length--) _ intsetSet (is,length+prepend,_intsetGetEncoded (is,length,curenc)); / * c * / if (prepend) _ intsetSet (is,0,value); else _ intsetSet (is,intrev32ifbe (is- > length), value) / * d * / is- > length = intrev32ifbe (intrev32ifbe (is- > length) + 1); return is;}
A) curenc records the pre-upgrade code, and newenc records the upgrade code
B) memory reallocation after setting the encoding of the set of integers is to the new encoding
C) get the data from the original memory and set it to the new memory (Note: since the two segments of memory overlap and the length of the new memory must be larger than the original memory, it needs to be copied from back to front)
D) when the inserted value value is negative, in order to ensure the order of the set, it needs to be inserted into the head of the contents; otherwise, it needs to be inserted into the tail; when the value is negative, the prepend is 1, so that the 0th position can be left blank when the memory is copied.
The figure shows the complete process of upgrading a collection of integers (- 32768, 0, 1, 32767) after inserting the number 32768:
The time complexity of upgrading a collection of integers is O (n), but the upgrade occurs at most twice during the lifetime of the set of integers (from INTSET_ENC_INT16 to INTSET_ENC_INT32 and from INTSET_ENC_INT32 to INTSET_ENC_INT64).
7. Memory migration
In the vast majority of cases, insert, delete, and find operations are performed. Inserting and deleting involves the movement of contiguous memory. In the internal implementation of Redis, there is a function intsetMoveTail that is used to achieve memory movement.
Static void intsetMoveTail (intset * is, uint32_t from, uint32_t to) {void * src, * dst; uint32_t bytes = intrev32ifbe (is- > length)-from; / * a * / uint32_t encoding = intrev32ifbe (is- > encoding); if (encoding = = INTSET_ENC_INT64) {src = (int64_t*) is- > contents+from; dst = (int64_t*) is- > contents+to Bytes * = sizeof (int64_t); / * b * /} else if (encoding = = INTSET_ENC_INT32) {src = (int32_t*) is- > contents+from; dst = (int32_t*) is- > contents+to; bytes * = sizeof (int32_t);} else {src = (int16_t*) is- > contents+from Dst = (int16_t*) is- > contents+to; bytes * = sizeof (int16_t);} memmove (dst,src,bytes); / * c * /}
A) count how many elements there are from from to the end
B) according to different encodings, calculate the number of memory bytes to be copied bytes, and copy source location src, copy destination location dst
C) memmove is a function in string.h: the memory area pointed to by src copies bytes bytes to the memory area pointed to by dst. This function supports memory overlap.
8. Element insertion
Finally, let's talk about the insertion and deletion of the set of integers. The insert calls intsetAdd, which is implemented in intset.c:
Intset * intsetAdd (intset * is, int64_t value, uint8_t * success) {uint8_t valenc = _ intsetValueEncoding (value); uint32_t pos; if (success) * success = 1; if (valenc > intrev32ifbe (is- > encoding)) {/ * a * / return intsetUpgradeAndAdd (is,value) } else {if (intsetSearch (is,value,&pos)) {if (success) * success = 0; / * b * / return is;} is = intsetResize (is,intrev32ifbe (is- > length) + 1) / * c * / if (pos
< intrev32ifbe(is->Length)) intsetMoveTail (is,pos,pos+1); / * d * /} _ intsetSet (is,pos,value); is- > length = intrev32ifbe (intrev32ifbe (is- > length) + 1); / * e * / return is;}
A) the memory coding of the inserted numeric value is larger than that of the existing collection, and the intsetUpgradeAndAdd is called directly to upgrade the coding.
B) the collection element is not duplicated. If intsetSearch can find it, set success to 0, indicating that the insertion failed.
C) if the intsetSearch cannot be found, the intset is reallocated, that is, the length is increased by 1.
D) pos is the location where the value will be inserted during the intsetSearch process, and we move the memory after the pos back 1 unit (the 1 unit here may be 2 bytes, 4 bytes, or 8 bytes, depending on the memory encoding of the current set of integers).
E) call _ intsetSet to set the value of value to the location of pos, and then add 1 to the member variable length. Finally, the first address of the intset pointer is returned, because there is an intsetResize during it, so the intset pointer passed in may not be the same as the one returned.
9. Element deletion
IntsetRemove is called to delete the element, and the implementation is as follows:
Intset * intsetRemove (intset * is, int64_t value, int * success) {uint8_t valenc = _ intsetValueEncoding (value); uint32_t pos; if (success) * success = 0; if (valenc encoding) & & intsetSearch (is,value,&pos) {/ * a * / uint32_t len = intrev32ifbe (is- > length); if (success) * success = 1; if (pos)
< (len-1)) intsetMoveTail(is,pos+1,pos); /* b */ is = intsetResize(is,len-1); /* c */ is->Length = intrev32ifbe (len-1);} return is;}
A) deletion can be performed only if the element value exists in the set of integers
B) if the element can be found through intsetSearch, then its location is on pos, which moves the memory forward through intsetMoveTail
C) intsetResize reallocates memory and subtracts the set length by 1
Thank you for your reading, the above is the "Redis integer set of the use of what" the content, after the study of this article, I believe you have a deeper understanding of the use of Redis integer set of what this problem, the specific use of the situation also needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.