Redis data structure and memory Management Strategy (part I) 07/12 Update SLTechnology News&Howtos

Redis data structure and memory Management Strategy (part I)

2025-07-12 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Redis data structure and memory Management Strategy (part I)

Tag: Redis Redis data structure Redis memory management policy Redis data type Redis type mapping

Redis data type characteristics and usage scenarios String, List, Hash, Set, Zset case: Hujiang group buying system greatly promotes hot-top interface cache to design Redis memory data structure and coding OBJECT encoding key, DEBUG OBJECT key simple dynamic string (simple dynamic string) linked list (dict) jump table (skip list) integer set (zip list) Redis Object type and mapping Redis memory management policy key expiration time, Lifetime expiration key deletion policy AOF, RDB processing expiration key policy Redis LRU algorithm Redis persistence mode AOF (Append-only file) RDB (Redis DataBase) Redis data type characteristics and usage scenarios

Redis provides us with five data types. Basically, the one we use most frequently is string, while the other four data types are slightly less frequently used than string.

On the one hand, because string is relatively simple to use, it is convenient to store complex large objects, and there are many scenes to use. Another reason is that because redis expire time can only be set on key, such as list, hash, set, and zset belong to the collection type and will manage a set of item, we cannot set the expiration time on the item of these collections, so using expire time to deal with the cache failure of the collection becomes a little more complicated. However, it is easier for string to use expire time to manage expiration policies because it contains fewer items. The set we are talking about here is a broad similar set.

Another deep-seated reason for our habitual use of string while ignoring the other four data types is mostly because we don't know much about the use and principles of the other four data types. At this time, it is often overlooked that using a certain data type in a particular scenario may have much higher performance than string, such as using hash structure to improve the modification of an item of an entity.

Here we do not intend to list the use of these five data types, there are many of these data on the Internet. We mainly discuss the functional features of these five data types, which are suitable for dealing with real business scenarios, and the most important thing is how we combine these five data types to solve complex cache problems.

String 、 List 、 Hash 、 Set 、 ZsetString

String is the string type provided by redis. You can set expire time independently for the string type. It is commonly used to store long string data, such as the json string of an object.

The most ingenious thing about using the string type is that we can splice key dynamically. Usually, we can put a set of id in set, and then dynamically check whether the string still exists. If it does not exist, it means that it has expired or the active delete is due to data modification, we need to do the cache data load again.

Although set cannot set the expiration time of item, we can associate set item with string key to achieve the same effect.

On the left in the figure above is a set collection with key as set:order:ids, which may be a full collection or a collection obtained by a query condition.

Sometimes complex scenarios require multiple set collections to support computing, and there may be many such collections in the redis server.

These sets can be called functional data, which are used to assist cache calculation. After performing various set operations, we will get the subset that the current query needs to return, and finally we will get the real data of a certain order.

These string:order: {orderId} string key are not necessarily intended to serve a scenario, but rather the lowest-level data of the entire system, which is eventually needed in various scenarios. Those set collections can be thought of as query condition data, which are used to assist in the calculation of query conditions.

Redis provides us with TYPE commands to view the data type of a key, such as: string type:

SET string:order:100 order-100TYPE string:order:100stringList

List is very suitable for improving throughput scenarios, because its unique LPUSH, RPUSH, LPOP, RPOP functions can seamlessly support producer and consumer architecture patterns.

This is ideal for implementing work-stealing algorithms similar to those in the Java Concurrency Fork/Join framework (work theft).

Java fork/join framework uses parallelism to improve performance, but it will bring race condition (race condition) problems caused by concurrent take task, so work-stealing algorithm is used to solve the performance loss caused by competition.

A typical payment callback peak scenario is simulated in the figure above. Where the peak occurs, we usually use the method of adding buffer to speed up the request processing speed, so as to improve the concurrent processing capacity and throughput.

After the payment gateway receives the callback, it will be handed over to the distributor without any processing. The dispenser is a stateless cluster, and each node gets the message channel that the downstream processor registers with the registry by pull handler queue list.

Each dispatcher node maintains a local queue list and then pushes messages sequentially to these queue list. There will be a small problem here, that is, how to do load balance when paying gateway to call the dispenser. If it is not the average load, some queue list may be higher than other queue list.

The dispenser does not need to do soft load balance, because it does not matter if one queue list is more than other queue list, because the downstream message handler will steal other slow-consuming queue list according to the work-stealing algorithm.

Redis list's LPUSH, RPUSH, LPOP, and RPOP features can indeed improve this scale-out computing power in many scenarios.

Hash

The hash data type is obviously based on the hash algorithm, and the time complexity of searching items is O (1). In extreme cases, the problem of item hash conflict may occur. Redis is solved by linked list plus key judgment. The specific data structure within redis will be introduced later, so we won't expand it here.

The characteristics of hash data types can usually be used to solve the mapping relationship, while some items need to be updated or deleted. If it is not an item that needs to be maintained, it can generally be solved by using string.

If there is a need to modify a field, using string will obviously add a lot of overhead, you need to read it and deserialize it into an object, then operate, and then serialize it back to redis, which may have concurrency problems.

Then we can use the entity attribute hash storage feature provided by redis hash, we can think of hash value as a hash table, and each attribute of the entity gets the final data index of the attribute through hash.

The image above uses the hash data type to record the a metrics b metrics of the page. On the left is the statistics of each area of the home index, and on the right is the statistics of each area of the marketing marketing.

In the program, we can easily use the atomic feature of redis to accumulate an item in hash.

HMSET hash:mall:page:ab:metrics:index topbanner 10 leftbanner 5 rightbanner 8 bottombanner 20 productmore 10 topshopping 8OKHGETALL hash:mall:page:ab:metrics:index 1) "topbanner" 2) "10" 3) "leftbanner" 4) "5" 5) "rightbanner" 6) "8" 7) "bottombanner" 8) "20" 9) "productmore" 10) "10" 11) "topshopping" 12) "8" HINCRBY hash:mall:page:ab:metrics:index topbanner 1 (integer) 11

Use redis hash increment for atomic increment operations. The HINCRBY command can increment any given integer, or you can increment floating-point type data through HINCRBYFLOAT.

Set

The set collection data type can support collection operations and cannot store duplicate data.

The biggest feature of set is the computing power of sets, inter intersection, union union, diff difference sets, these characteristics can be used to do high-performance cross computing or eliminate data.

Set collections are relatively numerous and free to use in scenarios. To take a simple example, commodity and activity scenarios are more common in application systems. A valid collection of items is cached with a set, and a collection of active items is cached with a set. If the goods appear on and off the shelves, you only need to maintain the effective goods set, each time you get the active goods, you need to filter whether there are products off the shelves, and if so, you need to remove them from the active goods.

Of course, you can delete the cached active items directly when you take them off the shelves, but the activity is load from the marketing system. Even if I delete the active items in cache, there will still be items off the shelves the next time I load them from the marketing system. Of course, this is just an example, a scenario has different implementation methods.

In the figure above, there are two different sets on the left and right. On the left is the ids collection of available goods in the marketing domain, and on the right is the ids collection of active goods in the marketing domain. The intersection of the two sets is calculated in the middle.

SADD set:marketing:product:available:ids 1000100 1000120 1000130 1000140 1000150 1000160SMEMBERS set:marketing:product:available:ids1) "1000100" 2) "1000120" 3) "1000130" 4) "1000140" 5) "1000150" 6) "1000160" SADD set:marketing:activity:product:ids 1000120 1000120 1000130 1000140 1000200 1000300SMEMBERS set:marketing:activity:product:ids1) "1000100" 2) "1000120" 3) "1000130" 4) "1000140" 5) "1000200" 6) "1000300" SINTER set:marketing:product:available:ids set:marketing:activity:product:ids1) "1000100" 2) "1000120" 3) "1000130" 4) "1000140"

In some complex scenarios, you can also use the SINTERSTORE command to store the result of the intersection calculation in a target set. This is particularly useful in pipelines using pipeline commands, where wrapping SINTERSTORE commands in a pipeline command string allows you to reuse the calculated result set.

Because redis is a Signle-Thread single-threaded model, based on this feature, we can use the pipeline pipeline provided by redis to submit a series of logical commands that will not be disturbed by commands from other clients during processing.

Zset

Zset sort collections are similar to set collections, but zset provides sorting capabilities. When we introduce the set collection, we know that the members of the set collection are unordered, and zset fills the gap that the collection can be sorted.

The most powerful feature of zset is that it can be sorted according to a certain score ratio, which is urgently needed in many business scenarios. For example, in promotional activities, goods are sorted according to the number of goods sold, and popular scenic spots are sorted according to the number of inflows in tourist attractions.

Basically everything people do needs to be sorted according to certain conditions.

In fact, zset can be used everywhere in our application system. Here we give a simple example. In the group purchase system, we usually need to sort the group list according to the number of participants, and everyone wants to join the group that is about to form a group.

The figure above shows a zset,score score created according to the group purchase code, which is the sum of the number of participants.

ZADD zset:marketing:groupon:group:codes 5 G_PXYJY9QQFA 8 G_4EXMT6NZJQ 20 G_W7BMF5QC2P 10 G_429DHBTGZX 8 G_KHZGH9U4PPZREVRANGEBYSCORE zset:marketing:groupon:group:codes 1000 01) "G_W7BMF5QC2P" 2) "G_ZMZ69HJUCB" 3) "G_429DHBTGZX" 4) "G_KHZGH9U4PP" 5) "G_4EXMT6NZJQ" 6) "G_PXYJY9QQFA" ZREVRANGEBYSCORE zset:marketing:groupon:group:codes 1000 0 withscores 1) "G_W7BMF5QC2P" 2) "20" 3) "G" _ ZMZ69HJUCB "4)" 10 "5)" G_429DHBTGZX "6)" 10 "7)" G_KHZGH9U4PP "8)" 8 "9)" G_4EXMT6NZJQ "10)" 8 "11)" G_PXYJY9QQFA "12)" 5 "

Zset itself provides many ways to sort collections, and if you need a score score, you can use the withscore sentence to bring out the score of each item.

In some special situations, combinatorial sorting may be needed, and multiple zset may be used to sort the same entity in different dimensions, by time, by number of people, and so on. At this time, you can combine the convenience brought by zset, and use pipeline to combine multiple zset to get a combinatorial sort set.

Case: cache Design of hot-top Interface promoted by Hujiang Group purchase system

We summarize the characteristics and general usage scenarios of the five data types provided by redis. But not only can we use these data types separately, we can use these data types together to complete complex cache scenarios.

Let's share an example of using multiple zset and string to optimize the foreground interface of a group purchase system. Due to space and time constraints, only the information related to this case is introduced here.

Hot-top interface refers to hot spots and ranking interfaces, indicating that it has relatively high pageviews and concurrency. Generally speaking, there are several APIs with high performance requirements when it is greatly promoted.

Let's first analyze the general information contained in a query interface.

First, a query interface must have query condition query conditions, then sort sorting information, and finally page paging information. This is the basic responsibility of general APIs. Of course, in special scenarios, you also need to support the data session consistency requirements of master/slave replication, and you need to provide trace tags to query data back and forth, so we will not expand here.

We can abstract the information from these dimensions:

Query condition

Query conditions, companyid=100,sellerid=1010101, and so on.

Sort

Sorting information usually defaults to a column sort, but in complex scenarios, it is possible to allow interface users to customize sorting fields, such as some tenant information columns.

Page

Paging information is simply understood as the number of lines to lines after the data records have been sorted.

Since we only use redis to improve cache capabilities here, it doesn't involve any search capabilities, so we ignore other complex queries here. In fact, we use elastcsearch in complex places to improve our search ability.

Above we analyze and summarize the basic information of a query interface, and another design principle about high concurrency interface is to separate the hot-top interface from the general search interface, because only divide and conquer can we choose different technologies according to their characteristics. If we encapsulate all query scenarios in one interface regardless of responsibilities, it will be very troublesome to optimize the performance of the interface later. Some scenarios cannot or are difficult to be solved with cache, because various scenario logic is coupled in the interface, and the performance will not be high even if it can be implemented barely.

The purpose of laying the groundwork is to reach a basic consensus when introducing the case. Now let's look at the specific logic of the hot-top interface of this group buying system.

During the promotion, you need to show the group purchase list. The traffic of this API is very large. The group purchase activity needs to be sorted in reverse order according to the number of participants, and the specified number of group list is returned by page.

Let's assume that this interface is called getTopGroups (getTopGroupsRequest request)

Query condition query condition problem

Let's carefully analyze, first of all, different query conditions from the DB query data is not the same, that is to say, the query group list is not the same, there may be company companies, channel channels and other filtering conditions. Since there will not be too many groups under a group purchase activity, hundreds at most is the limit, so there will be no more than dozens of group lists for a query condition, and no more than 10 hot query conditions will be analyzed according to the scenario. Therefore, we choose to hash the query condition into a code to cache the full group list collection of this query condition, but these result sets do not have any sort.

Sort scheduling problem

If we look at the ranking problem according to the number of participants, we can immediately think of using zset to deal with the group sorting problem, because there is only one sorting dimension, so one zset is enough. We use a zset to cache the number of participants of all groups, which is a full group sort set.

So how do we sort the group list of users' query conditions according to the number of participants, we can use the intersection operation of zset to directly calculate the zset subset of the current set.

Page paging problem

Get the paging collection by using zrange on the sorted group list zset.

Let's take a look at the complete process, how to deal with query, sorting, paging.

The above figure calculates hash code from query condition, and then queries the list of full groups of current conditions through DB.

Zset:marketing:groupon:hottop:available:group key represents the number of participants in the full group, which is cached by a zset. Then, by intersecting the two zset calculations, you can get the zset with the number of participants needed for the current query, and finally use zrevrange to get the paging interval.

ZADD zset:marketing:groupon:hottop:condition:2986080 0 G4ZD5732YZQ 0 G5VW3YF42UC 0 GF773FEJ7CC 0 GFW8DUEND8S 0 GKPKKW8XEY9 0 GL324DGWMZM (integer) 6ZADD zset:marketing:groupon:hottop:available:group 5 GN7KQH36ZWK 10 GS7VB22AWD4 15 GF773FEJ7CC 17 G5VW3YF42UC 18 G4ZD5732YZQ 32 GTYJKCEJBRR 40 GKPKKW8XEY9 45 GL324DGWMZM 50 GFW8DUEND8S 60 GYTKY4ACWLT (integer) 10ZINTERSTORE zset:marketing:groupon:hottop:condition:interstore 2 zset:marketing:groupon:hottop:condition:2986080 zset:marketing:groupon:hottop:available:group (integer) 6ZRANGE zset:marketing:groupon:hottop:condition:interstore 0-1 withscores 1) " GF773FEJ7CC "2)" 15 "3)" G5VW3YF42UC "4)" 17 "5)" G4ZD5732YZQ "6)" 18 "7)" GKPKKW8XEY9 "8)" 40 "9)" GL324DGWMZM "10)" 45 "11)" GFW8DUEND8S "12)" 50 "ZREVRANGE zset:marketing:groupon:hottop:condition:interstore 24 withscores1)" GKPKKW8XEY9 "2)" 40 "3)" G4ZD5732YZQ "4)" 18 "5)" G5VW3YF42UC "6)" 17 "

Once you have the returned regiment code collection, you can obtain the group details of string type in batches through mget, and the code will not be posted here.

Due to space and time constraints, we will not introduce too many business scenarios here. It also involves the issue of calculating the expiration time of cache, which is also related to the operational rules of promotions, and possible query condition hash conflicts, but these are no longer relevant to the topics of this section.

Redis memory data structure and coding

We have learned about the five data types provided by redis, so how does redis support these five data types, that is, what kind of data structure does redis use to store and find the data we set in memory?

Although we use five data types to cache data, redis chooses different data structures and encodings depending on how we store the data.

We use the five data types provided by redis on a daily basis, but there are many data structures and encodings of these five data types in memory. With the different types of data we store and the different size of the amount of data, it will cause the dynamic adjustment of the data structure in memory.

This section is only a general introduction to the data structure and coding, without too much detail discussion, on the one hand, there are many information about redis source code analysis on the web, and another reason is that the implementation of each version of redis is very different, once the details are discussed in each point, each data structure will be very complex, so we will not discuss these here, just play a role.

OBJECT encoding key 、 DEBUG OBJECT key

We know that we can use the type command to see if a key is one of five data types, but we can use the OBJECT encoding command when we want to see which data structures and encodings are used at the bottom of a key.

SET string:orderid:10101010 10101010OKOBJECT encoding string:orderid:10101010 "int" SET string:orderid:10101010 "orderid:10101010" OKOBJECT encoding string:orderid:10101010 "embstr"

The same key, but redis chooses different in-memory data structures and encodings because of the different values we set. Although the string data type provided by redis, redis will automatically identify whether our cache data type is int or string.

If we set a string that is no longer than 39 bytes, it will be encoded using embstr, and if it is greater than 39 bytes, it will be encoded using raw. Redis 4. 0 expands this threshold by 45 bytes.

In addition to using the OBJECT encoding command, we can also use the DEBUG OBJECT command to view more details.

DEBUG OBJECT string:orderid:10101010Value at:0x7fd190500210 refcount:1 encoding:int serializedlength:5 lru:6468044 lru_seconds_idle:8DEBUG OBJECT string:orderid:10101010Value at:0x7fd19043be60 refcount:1 encoding:embstr serializedlength:17 lru:6465804 lru_seconds_idle:1942

DEBUG OBJECT can see the refcount reference count, serializedlength length, and _ lru_secondsidle time of the object, which determines the key cache cleanup policy.

Simple dynamic string (simple dynamic string)

The simple dynamic string is abbreviated to SDS, and all the strings involved in redis are implemented in SDS, of course, not literally. The difference between SDS and traditional C string is that SDS is structured, which can efficiently deal with allocation, recovery, length calculation and so on.

Struct sdshdr {unsigned int len; unsigned int free; char buf [];}

This is the definition of the sds.h header file in redis version 3.0, which has changed a lot since 3.0.0. Len represents the string length, free represents the space length, and the buf array represents the string.

SDS has many advantages, for example, the time complexity of getting the length is O (1), there is no need to traverse all the char buf [] groups, and the len value is returned directly.

Static inline size_t sdslen (const sds s) {struct sdshdr * sh = (void*) (s-(sizeof (struct sdshdr); return sh- > len;}

Of course, there are also space allocation checking, space pre-allocation, space laziness release, and so on, which are the powerful extensibility brought by SDS structured strings.

Linked list (linked list)

We are familiar with the linked list data structure, and the biggest feature is that the addition and deletion of nodes are very flexible. The underlying redis List data type is implemented based on linked lists. This is an implementation of redis 3.0.

Typedef struct list {listNode * head; listNode * tail; void * (* dup) (void * ptr); void (* free) (void * ptr); int (* match) (void * ptr, void * key); unsigned long len;} list;typedef struct listNode {struct listNode * prev; struct listNode * next; void * value;} listNode

The quicklist linked list structure was introduced in redis 3.2.0, which combines the advantages of linkedlist and ziplist.

Typedef struct quicklist {quicklistNode * head; quicklistNode * tail; unsigned long count; / * total count of all entries in all ziplists * / unsigned int len; / * number of quicklistNodes * / int fill: 16; / * fill factor for individual nodes * / unsigned int compress: 16; / * depth of end nodes not to compress;0=off * /} quicklist;typedef struct quicklistNode {struct quicklistNode * prev; struct quicklistNode * next; unsigned char * zl Unsigned int sz; / * ziplist size in bytes * / unsigned int count: 16; / * count of items in ziplist * / unsigned int encoding: 2; / * RAW==1 or LZF==2 * / unsigned int container: 2; / * NONE==1 or ZIPLIST==2 * / unsigned int recompress: 1; / * was this node previous compressed? * / unsigned int attempted_compress: 1; / * node can't compress; too small * / unsigned int extra: 10 / * more bits to steal for future usage * /} quicklistNode

Quicklist provides flexibility while taking into account the compression capabilities of ziplist, and quicklist- > encoding specifies two compression algorithms. Quicklist- > compress indicates that we can implement the deep compression capability of quicklist node. Redis provides two configurations for compression.

List-max-ziplist-size:ziplist length control

List-compress-depth: controls the compression number of nodes at both ends of the linked list. The closer the nodes are, the more likely they are to be accessed, so the nodes with high access probability can be uncompressed, and other nodes can be compressed.

Comparing quicklist of redis 3.2 with redis 3.0, it is clear that quicklist provides richer compression capabilities. The version of redis 3.0 is the direct cache value for each listnode, while quicklistnode also has a strong ability to compress.

LPUSH list:products:mall 100200 300 (integer) 3OBJECT encoding list:products:mall "quicklist"

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.