In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article mainly explains the "redis bitmap use case analysis", the article explains the content is simple and clear, easy to learn and understand, the following please follow the editor's ideas slowly in depth, together to study and learn "redis bitmap use case analysis" bar!
1. Introduction to bitmap
How do we accomplish this if we need to record whether a user logs in to our system every day of the year? If you use KV storage, each user needs to record 365, and when there are hundreds of millions of users, the storage space required is amazing.
Redis provides us with the data structure of bitmap, each user's login record occupies only one bit per day, 365 days is 365 bits, and only 46 bytes are needed to store, which greatly saves storage space.
Bitmap data structure is not a whole new thing, we can simply think of it as an array, but the contents can only be 0 or 1 (binary array).
two。 Command actual combat
Redis provides four common commands, SETBIT, GETBIT, BITCOUNT, and BITOP, for dealing with binary arrays.
SETBIT: specifies the set value of the binary bit on the offset for the bit array, which is counted from 0, and the value of the binary bit can only be 0 or 1. Return to the in-situ value.
GETBIT: gets the value of the binary bit on the specified offset.
BITCOUNT: counts the number of binary bits with a value of 1 in the digit array.
BITOP: performs bitwise and, OR, XOR operations on multiple bit arrays.
127.0.0.1 SETBIT first 01 # 0000 (integer) 0127.0.0.1 SETBIT first 3 1 # 0000 1001 (integer) 0127.0.0.1 SETBIT first 6379 > SETBIT first 00 # 0000 1000 (integer) 1127.0.0.1 integer > GETBIT first 0 (integer) 0127.0.1 0.1 purl 6379 > GETBIT first 3 (integer) 1127.0.1 1000 (integer) 1127.0.0.1 : 6379 > SETBIT first 01 # 0000 1001 (integer) 0127.0.0.1 BITCOUNT first # 1001 (integer) 2127.0.1 BITCOUNT first 6379 > SETBIT first 11 # 0000 1011 (integer) 0127.0.0.1 BITCOUNT first 6379 > BITCOUNT first # 00001011 (integer) 3127.0.0.1 SETBIT x 31 (integer) 0127.0.0.1 SETBIT x 11 (integer) 0127.0.1 .1SETBIT x 01 # 0000 1011 (integer) 0127.0.0.1 SETBIT y 2 1 (integer) 0127.0.0.1 SETBIT 6379 > SETBIT y 11 # 0000 0110 (integer) 0127.0.0.1 SETBIT z 2 1 (integer) 0127.0.0.1 BITOP AND andRes x y z > BITOP AND andRes x y z z 01 # 0000 0101 (integer) 0127.0.1 # 0000 0000 (integer) 1127.0.0.1 integer 6379 > BITOP OR orRes x y z # 0000 1111 (integer) 1127.0.0.1 integer 6379 > BITOP XOR x y z # 0000 1000 (integer) bit-by-bit inverse 127.0.0.16379 > SETBIT value 01 (integer) 0127.0.0.1integer 6379 > SETBIT value 3 "0000 1001 (integer) 0127.0.0.1 : 6379 > BITOP NOT notValue value # 1111 0110 (integer) 13.BitMap source code analysis 3.1 data structure
An one-byte (8-bit) bitmap in SDS is shown below:
Extension: each object in Redis is represented by a redisObject structure.
Typedef struct redisObject {/ / Type unsigned type:4;// Encoding unsigned encoding:4;unsigned lru:REDIS_LRU_BITS; / * lru time (relative to server.lruclock) * / / reference count int refcount;// performs a pointer to the underlying implemented data structure void * ptr;} robj
The value REDIS_STRING of type indicates that this is a string object
A value of 1 for sdshdr.len means that the SDS holds a 1-byte bit array.
Buf [0] in the buf array actually saves the bit array.
Buf [1] in the buf array is an automatically appended\ 0 character
For our observation, each byte of the buf array is represented by a row, buf [I] indicates that this is the I byte of the buf array, and the eight squares after buf [I] represent the 8 bits on this byte.
To bring your thoughts together again, another digit array is shown below:
The bit array is saved by three bytes buf [0], buf [1] and buf [2]. The real data is 1111 0000 1100 0011 1010 0101.
3.2 GETBIT
GETBIT is used to return the value of the binary bit of the bit array on the offset. It is worth noting that the time complexity of GETBIT is O (1).
The execution of the GETBIT command is as follows:
Calculate $byte =\ lfloor offset\ p8\ rfloor $(that is, > > 3), and the byte value indicates which byte of the specified offset is in the bit array (in which line of the calculation)
Now that you specify the i i i in buf [I], which bit out of 8 bytes will be calculated next? Calculated using $bit = (offset\%\ 8) + 1 $
Locate the target value in the bit array and return it according to byte and bit.
The source code of the GETBIT command is as follows:
Void getbitCommand (client * c) {robj * o; char llbuf [32]; uint64_t bitoffset; size_t byte, bit; size_t bitval = 0; / / get offset if (getBitOffsetFromArgument (cMagne c-> argv [2], & bitoffset,0,0)! = C_OK) return / / to find the corresponding bitmap object if ((o = lookupKeyReadOrReply (cjournal c-> argv [1], shared.czero)) = = NULL | | checkType (cmemogramme OBJ string) return; / / calculate which line of the bitmap offset is located in the bit array byte = bitoffset > > 3; / / calculate the number of bits of offset in a line, which is equivalent to taking the module bit = 7-(bitoffset & 0x7) / / # define sdsEncodedObject (objptr) (objptr- > encoding = = OBJ_ENCODING_RAW | | objptr- > encoding = = OBJ_ENCODING_EMBSTR) if (sdsEncodedObject (o)) {/ / SDS is RAW or EMBSTR type if (byte)
< sdslen(o->Ptr)) / / get the value of the specified location / / Note that it is not a true two-dimensional array and cannot be obtained with ((uint8_t*) o-> ptr) [byte] [bit]. ~ bitval = ((uint8_t*) o-> ptr) [byte] & (1 ptr) bitval = llbuf [byte] & (1 argv [2], & bitoffset) 0pl 0)! = C_OK) return / / get the value we need to set if (getLongFromObjectOrReply (cMagnec-> argv [3], & on,err)! = C_OK) return; / * to determine whether the specified value is 0 or 1 * / if (on & ~ 1) {/ / set a value other than 0 and 1, and directly report an error addReplyError (cMagneur); return } / / query the SDS object based on key (will automatically expand) if ((o = lookupStringForBitCommand (c Magnebitoffset)) = NULL) return; / * to get the current value * / byte = bitoffset > > 3; byteval = ((uint8_t*) o-> ptr) [byte]; bit = 7-(bitoffset & 0x7); bitval = byteval & (1 argv [1]) NotifyKeyspaceEvent (NOTIFY_STRING, "setbit", c-> argv [1], c-> db- > id); server.dirty++; addReply (c, bitval? Shared.cone: shared.czero);}
Take a chestnut 1.0.
Array represents the three-byte array in the figure above. Take SETBIT array 101 as an example:
$len =\ lfloor10room8\ rfloor + 1$ gets a value of 2, indicating that a bit array of at least 2 bytes is required
Check if you need to expand your capacity. No.
Calculate $byte = 1, calculate, calculate, calculate bit = 3, save, save, save the new value of oldValue$, setting
Return to oldValue oldValue oldValue
Take a chestnut 2.0.
Assume that the target bit array is 1 byte long. Execute SETBIT array 12 1, as follows:
Len= ⌊ 12 / 8 ⌋ + 1 gets a value of 2, indicating that a 2-byte SDS is required.
Check to see if you need to expand your capacity, yes! As can be seen from the automatic expansion mechanism of SDS, SDS will be expanded at twice the new length.
Calculate $byte = 1 $
Calculate $bit = 5 $
Save oldValue and set new values
Return to oldValue oldValue oldValue
3.4 BITCOUNT
The BITCOUNT command is used to count the number of binary bits with a value of 1 in the positioning array. The function does not seem complicated, but in fact it is not easy to implement this command efficiently, and some delicate algorithms are needed.
Counting the number of non-zero binary bits in a bit array is mathematically called "calculating hamming weight".
3.4.1 violent traversal
The easiest and most straightforward way to implement the BITCOUNT command is to iterate through each binary bit in the bit array and increment the counter when it encounters a binary bit with a value of 1.
Small amount of data is OK, large amount of data directly PASS!
3.4.2 look-up table method
For a finite set, the arrangement of set elements is limited, and for a finite length array of bits, the binary permutation it can represent is also limited. According to this principle, we can create a table whose key is an array of bits, and the value of the table is the number of binary bits with a value of 1 in the corresponding array.
For an 8-bit long array, we can create the following table, through which we can read 8 bits from the array at a time, and then look up the table based on the 8-bit values to directly know how many 1s the value contains.
Unfortunately, the table lookup method consumes memory!
3.4.3 binary statistical algorithm: variable-precision SWAR
At present, the most efficient general algorithm is the variable-precision SWAR algorithm, which can calculate the hamming weight of multiple bytes in constant time through a series of displacement and bit operations, and does not need to use any extra memory.
The SWAR algorithm code is as follows:
Uint32_t swar (uint32_t I) {/ 5 binary: 0101 I = (I & 0x55555555) + ((I > > 1) & 0x55555555); / / 3 binary: 0011 I = (I & 0x33333333) + ((I > 2) & 0x33333333); I = (I & 0x0F0F0F0F) + ((I > 4) & 0x0F0F0F0F); I = (i* (0x01010101) > 24); return I I = I-((I > > 1) & 0x55555555); I = (I & 0x33333333) + ((I > > 2) & 0x33333333); return (I + (I > > 4) & 0x0F0F0F0F) * 0x01010101) > > 24;}
Here's a description of what these steps have been done:
The binary representation of the value I calculated in step 1 can be grouped into a group of every two binary bits, and the decimal representation of each group is the number of 1 of the group.
The binary representation of the value I calculated in step 2 can be grouped into a group of every four binary bits, and the decimal representation of each group is the number of 1 of the group.
The binary representation of the value I calculated in step 3 can be grouped into a group of eight binary bits, and the decimal representation of each group is the number of 1 of the group.
The i*0x01010101 statement in step 4 calculates the number of 1 in bitarray and records it in the highest eight bits of the binary bit, while the > > 24 statement moves the hamming weight of bitarray to the lowest eight digits by moving to the right, and the result is the hamming weight of bitarray.
Take a chestnut.
For calling swar (0xFBB4080B), step 1 calculates the value 0xA6640406, and the decimal representation of every two binary bits of the value table records the hamming weight of every two binary bits of the 0xFBB4080B.
Step 2 calculates the value 0x43310103, and the decimal representation of every four binary bits of the value table records the hamming weight of every four binary bits of 0xFBB4080B.
Step 3 calculates the value 0x7040103, the decimal representation of every eight binary bits of the value table records the hamming weight of every eight binary bits of 0xFBB4080B.
Step 4 first calculates 0x7040103 * 0x01010101 = 0xF080403, aggregating the hamming weight to the highest eight digits of the binary bit.
After calculating 0xF080403 > > 24, move the hamming weight to the lower eight bits to get the final value 0x1111, that is, decimal 15.
If you are a Java programmer, you can take a look at the Integer.bitCount method, which is also based on the idea of SWAR algorithm!
You can also take a look at the explanation of it on StackFlow: [How does this algorithm to count the number of set bits in a 32-bit integer work?] (https://stackoverflow.com/questions/22081738/how-does-this-algorithm-to-count-the-number-of-set-bits-in-a-32-bit-integer-work)3.4.4 source code analysis)
The hamming weight is calculated by calling the redisPopcount method in Redis. The source code is as follows:
Long long redisPopcount (void * s, long count) {long long bits = 0; unsigned char * p = s; uint32_t * p4 Table static const unsigned char bitsinbyte for the look-up table method: / / Table static const unsigned char bitsinbyte = {0pr 1] = {0meme1pyrum 2pyrus 2pyrus 3pyrus 2pyrus 3pyrus 3pyrus 3pyrus 4pyrus 4pime 4pyrus 3pyrus 3pyrus 4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,3,4,4,5,4,5,5,6,4,5,5,6,5,6 6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8} / / CPU reads 8 bytes at a time. If 4 bytes span two 8 bytes, you need to read it twice / / so consider 4-byte alignment. You can read while ((unsigned long) p & 3 & & count) {bits + = bitsinbyte [* paired +]; count-- only once. } / / one-time processing of 28 bytes is easy to understand by looking at an aux alone. In fact, it is SWAR algorithm / / uint32_t:4 bytes p4 = (uint32_t*) p; while (count > = 28) {uint32_t aux1, aux2, aux3, aux4, aux5, aux6, aux7; aux1 = * p4 bytes read at a time aux2 = * p4 bytes read at a time aux2 = * p4 bytes; aux3 = * p4 bytes + Aux4 = * p4 cycles; aux5 = * p4 cycles; aux6 = * p4 cycles; aux7 = * p4 cycles; count-= 28 / 28 bytes processed, so count needs to subtract 28 aux1 = aux1-((aux1 > 1) & 0x55555555); aux1 = (aux1 & 0x33333333) + (aux1 > 2) & 0x33333333); aux2 = aux2-(aux2 > 1) & 0x55555555) Aux2 = (aux2 & 0x33333333) + ((aux2 > 2) & 0x33333333); aux3 = aux3-((aux3 > > 1) & 0x55555555); aux3 = (aux3 & 0x33333333) + ((aux3 > > 2) & 0x33333333); aux4 = aux4-((aux4 > > 1) & 0x55555555); aux4 = (aux4 & 0x33333333) + (aux4 > > 2) & 0x33333333); aux5 = aux5-(aux5 > 1) & 0x55555555) Aux5 = (aux5 & 0x33333333) + ((aux5 > > 2) & 0x33333333); aux6 = aux6-((aux6 > > 1) & 0x55555555); aux6 = (aux6 & 0x33333333) + ((aux6 > > 2) & 0x33333333); aux7 = aux7-(aux7 > > 1) & 0x55555555); aux7 = (aux7 & 0x33333333) + (aux7 > > 2) & 0x33333333) Bits + = (aux1 + (aux1 > 4)) & 0x0F0F0F0F) + ((aux2 + (aux2 > 4)) & 0x0F0F0F0F) + ((aux3 + (aux3 > 4)) & 0x0F0F0F0F) + ((aux4 + (aux4 > 4)) & 0x0F0F0F0F) + ((aux5 + (aux5 > 4)) & 0x0F0F0F0F) + ((aux6 + (aux6 > 4)) & 0x0F0F0F0F) + ((aux7 + (aux7 > 4)) & 0x0F0F0F0F)) * 0x01010101) > > 24 } / * the remaining less than 28 bytes, use the look-up table method to count * / p = (unsigned char*) p4; while (count--) bits + = bitsinbyte [* paired +]; return bits;}
It is not difficult to find that the look-up table method and SWAR algorithm are used to complete the BITCOUNT function in Redis.
4. Interview question: 4 billion QQ to repeat
If there is no memory limit for 1GB, we can use sorting and Set to complete this algorithm:
Sort: ① first sorts 4 billion QQ numbers; ② traverses from small to large, skipping repeating elements and taking only the first element.
Set: put all 4 billion QQ numbers into the Set collection, automatically remove the weight, Perfect
The answer is to GG's rhythm!
How long does it take to sort 4 billion QQ numbers? The capacity of the array that stores 4 billion QQ numbers has exceeded that of 1GB, and the same way that the Set collection stores so many QQ numbers leads to memory overruns.
BitMap weight removal
* * isn't that a coincidence? we can use the BITMAP we just learned to go back and forth. * * A byte can record the existence of 8 numbers (similar to counting sorting). Setting the value of offset corresponding to QQ number to 1 means that this number exists. After traversing 4 billion QQ numbers, you can directly count the offset with a value of 1 on BITMAP to complete the de-duplication of QQ number.
If it is to sort 4 billion QQ numbers, it can also be done with a bitmap.
5. Bitmap actual combat
Now that we have an in-depth understanding of BITMAP, it is not reasonable not to carry out an actual combat project.
We use BITMAP to implement this small function of counting the number of submissions per day in GITHUB, which is based on SpringBoot+Echarts.
If it is to record the login status, we can easily use 0 and 1 records. If it is to record the number of submissions, it looks like BITMAP is useless. It doesn't matter, we can use one byte to record the number of submissions, but we just need to deal with the direct conversion between decimal and binary in business.
Generate simulation data public void genTestData () {if (redisUtils.isExist (CommonConstant.KEY)) {return;} / / get the total number of days in the current year int days = getDays (); for (int I = 0; I < days; iTunes +) {int random = ThreadLocalRandom.current (). NextInt (64); / / generate random numbers to represent the number of PR times per day String binaryString = Integer.toBinaryString (random) If (binaryString.length () < 8) {/ / fill 0 if (binaryString.length () = = 0) {binaryString = "00000000";} else if (binaryString.length () = = 1) {binaryString = "0000000" + binaryString;} else if (binaryString.length () = = 2) {binaryString = "000000" + binaryString;} else if (binaryString.length () = 3) {binaryString = "00000" + binaryString } else if (binaryString.length () = = 4) {binaryString = "0000" + binaryString;} else if (binaryString.length () = = 5) {binaryString = "0000" + binaryString;} else if (binaryString.length () = = 6) {binaryString = "00" + binaryString;} else if (binaryString.length () = 7) {binaryString = "0" + binaryString;}} char [] chars = binaryString.toCharArray () For (int j = 0; j < chars.length; jacks +) {/ / set BitMap redisUtils.setBit (CommonConstant.KEY,i*8+j,chars [j]);} / * * get the total number of days in the current year * @ return days total days * / private int getDays () {Calendar calOne = Calendar.getInstance (); int year = calOne.get (Calendar.YEAR) System.out.println (year); Calendar calTwo = new GregorianCalendar (year, 11,31); return calTwo.get (Calendar.DAY_OF_YEAR);} get data public List getPushData () {List res = new ArrayList (366); / / create data genTestData () without data; int days = getDays (); for
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.