How to solve the uneven Hash Distribution in Redis 07/11 Update SLTechnology News&Howtos

How to solve the uneven Hash Distribution in Redis

2025-07-11 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article shows you how to solve the uneven distribution of hashes in Redis. The content is concise and easy to understand, which will definitely brighten your eyes. I hope you can get something through the detailed introduction of this article.

Preface

Redis is a database of key-value pairs whose keys are stored by hashing. The whole Redis can be thought of as an outer hash, which is called an outer hash because a hash type is also provided inside the Redis, which can be called an internal hash. When we use hash objects for data storage, there are two layers of hash storage for the whole Redis.

Hash object

The hash object itself is also a key-value storage structure, and the underlying storage structure can also be divided into two types: ziplist (compressed list) and hashtable (hash table). The two storage structures are also distinguished by coding:

Encoding attribute description object encoding command return value OBJ_ENCODING_ZIPLIST uses compressed list to implement hash object ziplistOBJ_ENCODING_HT uses dictionary to implement hash object hashtablehashtable

The key-value in Redis is wrapped by the dictEntry object, and the hash table is obtained by wrapping the dictEntry object again, which is the hash table object dictht:

Typedef struct dictht {dictEntry * * table;// hash table array unsigned long size;// hash table size unsigned long sizemask;// mask size, used to calculate the index value, always equal to the number of existing nodes in the size-1 unsigned long used;// hash table} dictht

Note: the table in the structure definition above is an array, each element of which is a dictEntry object.

Dictionaries

A dictionary, also known as a symbol table (symbol table), an associative array (associative array), or a mapping (map), has a hash table dictht object nested inside the dictionary. Here is the definition of a dictionary ht:

Some specific functions of typedef struct dict {dictType * type;// dictionary type void * privdata;// private data, specific functions in type may need to use dictht ht [2]; / / hash table (note that there are two hash tables here) long rehashidx; / / rehash index, when not in rehash, the value is-1 unsigned long iterators; / / the number of iterators in use} dict

Some common functions are defined inside dictType, and their data structures are defined as follows:

Typedef struct dictType {uint64_t (* hashFunction) (const void * key); / / calculate hash function void * (* keyDup) (void * privdata, const void * key); / / copy key function void * (* valDup) (void * privdata, const void * obj); / / copy value function int (* keyCompare) (void * privdata, const void * key1, const void * key2) / / compare key function void (* keyDestructor) (void * privdata, void * key); / / destroy key function void (* valDestructor) (void * privdata, void * obj); / / destroy value function} dictType

When we create a hash object, we get the following diagram (some properties are omitted):

Rehash operation

An array ht [2] is defined in dict, and two hash tables are defined in ht [2]: ht [0] and ht [1]. By default, Redis only uses ht [0], does not use ht [1], and does not initialize the allocation of space for ht [1].

When setting a hash object, which subscript will fall into the hash array (dictEntry [3] in the figure above) is determined by calculating the hash value. If a hash collision occurs (the calculated hash values are the same), then there will be multiple dictEntry for the same subscript, forming a linked list (the rightmost point to the NULL in the figure above), but it is important to note that the last inserted element always falls at the front of the linked list (that is, when a hash conflict occurs, the node is always placed at the head of the linked list).

When you encounter a node with multiple elements when reading data, you need to traverse the linked list, so the longer the linked list, the worse the performance. To ensure the performance of the hash table, you need to rehash (re-hash) the hash table when one of the following two conditions is met:

When the load factor is greater than or equal to 1 and dict_can_resize is 1.

When the load factor is greater than or equal to the safety threshold (dict_force_resize_ratio=5).

PS: load factor = number of nodes used in the hash table / hash table size (that is, h [0] .used / h [0] .size).

Rehash step

Both extended hashing and shrinking hashing are accomplished by executing rehash, which involves the allocation and release of space, mainly through the following five steps:

1. Allocate space for the ht [1] hash table of the dictionary dict, whose size depends on the number of saved nodes in the current hash table (that is, ht [0] .used):

If it is an extension operation, the first of the n powers of the size 2 of ht [1] is greater than or equal to the value of the ht [0] .used * 2 attribute (for example, used=3, where ht [0] .used * 2 is 6, so the third power of 2 is 8, which is the first value greater than used * 2 (2 to the power of 2).

< 6 且 2 的 3 次方 >

6)).

In the case of a shrink operation, the first of the n-th powers of ht [1] size 2 is greater than or equal to the value of ht [0] .used.

two。 Setting the value of the attribute rehashix in the dictionary to 0 indicates that a rehash operation is in progress.

3. Recalculate the hash value of all the key-value pairs in ht [0] in turn, and put them in the corresponding position of the ht [1] array. The rehashix value needs to be incremented by 1 after completing the rehash of a key-value pair.

4. After all the key-value pairs in ht [0] have been migrated to ht [1], release ht [0], change ht [1] to ht [0], and then create a new ht [1] array in preparation for the next rehash.

5. Setting the attribute rehashix in the dictionary to-1 means that this rehash operation ends and waits for the next rehash.

Progressive rehash

This re-hashing operation in Redis is not all rehash at once, but slowly rehash the key-value pairs in ht [0] to ht [1], so this operation is also called progressive rehash. Progressive rehash can avoid the huge amount of computation brought by centralized rehash, and it is an idea of divide and conquer.

In the progressive rehash process, because new key-value pairs may be saved, the practice of * * Redis is to put the newly added key-value pairs into ht [1], which ensures that the number of ht [0] key-value pairs will only be reduced * *.

When the rehash operation is being performed, if the server receives a command request from the client, it will query ht [0] first, and then query ht [1] if no result is found.

Ziplist

Some of the features of ziplist have been analyzed separately in previous articles. For more information, please click here. However, it should be noted that the difference between the ziplist in the hash object and the ziplist in the list object is that the hash object is in the form of key-value, so its ziplist also shows that key-value,key and value are close together:

Transcoding between ziplist and hashtable

When a hash object can meet either of the following two conditions, the hash object chooses to use ziplist encoding for storage:

The total length of all key-value pairs (including keys and values) in the hash object is less than or equal to 64 bytes (this threshold can be controlled by the parameter hash-max-ziplist-value).

The number of key-value pairs in the hash object is less than or equal to 512 (this threshold can be controlled by the parameter hash-max-ziplist-entries).

Once either of these two conditions is not met, the hash object chooses to use hashtable encoding for storage.

Common commands for hash objects

Hset key field value: sets a single field (the key value of the hash object).

Hmset key field1 value1 field2 value2: sets multiple field (the key value of the hash object).

Hsetnx key field value: set the value of the field field in the hash table key to value, and do nothing if field already exists.

Hget key field: gets the value corresponding to the domain field in the hash table key.

Hmget key field1 field2: gets the value corresponding to multiple fields field in the hash table key.

Hdel key field1 field2: delete one or more field from the hash table key.

Hlen key: returns the number of fields in the hash table key.

Hincrby key field increment: add an incremental increment to the value of the field field in the hash table key. Increment can be negative, and an error will be reported if field is not a number.

Hincrbyfloat key field increment: add an incremental increment,increment to the value of the field field in the hash table key can be negative, and an error will be reported if field is not of type float.

Hkeys key: gets all the fields in the hash table key.

Hvals key: gets the values of all fields in the hash table.

Knowing the common commands for manipulating hash objects, we can verify the type and encoding of the hash objects mentioned earlier. In order to prevent the interference of other key values, let's execute the flushall command to empty the Redis database before testing.

Then execute the following command in turn:

Hset address country china type address object encoding address

The results are as follows:

You can see that when there is only one key-value pair in our hash object, the underlying code is ziplist.

Now let's change the hash-max-ziplist-entries parameter to 2, then restart Redis, and finally enter the following command to test:

Hmset key field1 value1 field2 value2 field3 value3 object encoding key

After the output, the following results are obtained:

As you can see, the coding has become hashtable.

The above content is how to solve the uneven distribution of hashes in Redis. Have you learned any knowledge or skills? If you want to learn more skills or enrich your knowledge reserve, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.