What are the data structures in Redis

Updated 2025-02-24. From: SLTechnology News&Howtos

Shulou (Shulou.com), 05/31

This article introduces the data structures in Redis: the underlying structures, the data types built on them, and how to choose a data type for a given scenario.

Redis is used frequently in real development, so how do we choose the right data type, and which data types suit which scenarios? Interviewers also often ask about Redis data structures:

Why is Redis fast?

Why do operations sometimes become slower?

How does the Redis hash table rehash?

Why does Redis use a hash table as its index?

Once we understand the Redis data structures, we can choose the right data type when using Redis and improve the performance of the system.

Redis underlying data structure

Redis is an in-memory key-value database. The key-value pairs are stored in memory, so data operations in Redis are efficient and fast.

The key is a String, and the value types supported by Redis include String, List, Hash, Set, Sorted Set, BitMap, and so on. This variety of value types is the reason Redis can be applied to so many business scenarios.

The data types of Redis values are built on redisObject, the object system that Redis defines for itself.

typedef struct redisObject {
    // type
    unsigned type:4;
    // encoding
    unsigned encoding:4;
    // pointer to the underlying implementation data structure
    void *ptr;
    // ...
} robj;

In addition to recording the actual data, a redisObject needs extra memory for metadata such as data length and space usage. It consists of 8 bytes of metadata and an 8-byte pointer to the location of the actual data for the specific data type.

That pointer refers to data stored in one of Redis's underlying data structures: SDS (simple dynamic string), doubly linked list, skip list, hash table, compressed list, and integer set.

So how are these underlying data structures implemented?

Implementation of Redis underlying data structure

Let's first look at the relatively simple ones: SDS, the doubly linked list, and the integer set.

SDS, doubly linked lists, and integer sets

SDS uses a len field to record the number of bytes in use, which reduces the complexity of getting the string length to O(1). SDS also frees space lazily: when a string is shortened, the freed bytes are not returned to the system but recorded as available, so the next time space is needed it can be reused directly without a new allocation.
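The two SDS properties described above can be illustrated with a small Python model. This is only a sketch: the class name `SimpleDynamicString` and its methods are hypothetical, and a `bytearray` stands in for the C buffer that Redis manages.

```python
class SimpleDynamicString:
    """Toy model of an SDS: a buffer plus explicit length bookkeeping."""

    def __init__(self, initial: bytes = b""):
        self.buf = bytearray(initial)  # allocated bytes
        self.len = len(initial)        # bytes actually in use

    @property
    def free(self) -> int:
        # Allocated but unused bytes, kept around for later reuse.
        return len(self.buf) - self.len

    def length(self) -> int:
        # O(1): read the stored field instead of scanning the buffer.
        return self.len

    def truncate(self, new_len: int) -> None:
        # Lazy free: shrink the logical length, keep the allocation.
        if not 0 <= new_len <= self.len:
            raise ValueError("new_len out of range")
        self.len = new_len

    def append(self, data: bytes) -> bool:
        """Append data; return True if the buffer had to grow."""
        grew = len(data) > self.free
        if grew:
            self.buf.extend(bytes(len(data) - self.free))
        self.buf[self.len:self.len + len(data)] = data
        self.len += len(data)
        return grew

    def value(self) -> bytes:
        return bytes(self.buf[:self.len])


s = SimpleDynamicString(b"hello world")
s.truncate(5)                 # "hello", 6 bytes stay allocated
reused = not s.append(b" redis")
print(s.value(), "reused free space:", reused)
```

After the truncate, the append fits entirely into the bytes that were kept, so no new allocation is needed.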

The integer set allocates one contiguous block of memory, and its elements are stored next to each other without additional pointers. It is compact in memory; reading an element at a known position is O(1), while the other operations are O(N).

The doubly linked list can occupy non-contiguous, non-sequential memory; the order of the elements is maintained by pointers, which cost extra space.

Its characteristic is that inserting or updating at a known node is O(1), which is efficient, while a query is O(N).

Hash table

A hash table is essentially an array. Each element of the array is called a hash bucket, and the entries in a bucket use the dictEntry structure to hold key-value pairs.

A bucket entry does not store the key and value themselves but pointers to them, so every key-value pair costs at least 24 bytes of additional space. This matters most for pairs whose value is a String, since each one needs its own 24 extra bytes. When the stored data is small and this overhead is larger than the data itself, consider a different data structure to save space.

[Figure: overview of the global hash table]

Hash table operations are fast, but as the amount of data in Redis grows there is a potential risk: hash collisions and the cost of rehashing. This explains why hash table operations can become slower.

Hash collisions are inevitable as more data is written to the hash table. Redis resolves them by chaining: the elements that fall into the same hash bucket are stored in a linked list, connected one after another by pointers.

As collisions accumulate, some chains become too long, and looking up an element on such a chain becomes slow and inefficient.
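A minimal sketch of chained hashing, written to make the long-chain problem visible. The class `ChainedHashTable` and its method names are hypothetical, and a Python list stands in for the linked list of dictEntry nodes. Integer keys are used so that the bucket index is easy to predict.

```python
class ChainedHashTable:
    """Fixed-size hash table that resolves collisions by chaining."""

    def __init__(self, num_buckets: int = 4):
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, key):
        # Keys with the same hash modulo the table size share a bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value) -> None:
        chain = self._chain(key)
        for entry in chain:
            if entry[0] == key:      # key already present: overwrite
                entry[1] = value
                return
        chain.append([key, value])   # collision or empty bucket: append

    def get(self, key):
        # Lookup cost grows with the length of the chain.
        for k, v in self._chain(key):
            if k == key:
                return v
        return None

    def longest_chain(self) -> int:
        return max(len(chain) for chain in self.buckets)


table = ChainedHashTable(num_buckets=4)
for key in (0, 4, 8, 12):            # all map to bucket 0
    table.put(key, f"v{key}")
print(table.get(8), table.longest_chain())
```

With only four buckets, the four keys above all land in the same bucket, so a lookup degenerates into walking one chain.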

To keep chains from growing too long, Redis rehashes: it increases the number of hash buckets so that fewer elements share a single bucket. So how does the rehash process work?

Rehash

To make rehashing more efficient, Redis uses two global hash tables, hash table 1 and hash table 2. Data starts out in hash table 1, and a rehash proceeds in three steps:

Allocate more space to hash table 2

Remap and copy the data from hash table 1 to hash table 2

Free up space in hash table 1

However, the amount of data to remap and copy from table 1 to table 2 can be large. If all the data in hash table 1 were migrated at once, the Redis thread would be blocked and unable to serve other requests.

To avoid this and keep handling client requests normally, Redis uses progressive rehash.

Each time Redis processes a request, it copies all the entries at one index position from hash table 1 to hash table 2, moving on to the next index position with the next request. This spreads the cost of one large copy over the handling of many requests, which avoids a single time-consuming operation and keeps data access fast.
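The progressive rehash described above can be sketched as follows. This is an illustrative model, not the Redis implementation: the class `ProgressiveDict`, its attribute names, and the choice to double the table size are assumptions, and the rehash is started manually rather than by a load-factor check.

```python
class ProgressiveDict:
    """Two hash tables; each request migrates one bucket while rehashing."""

    def __init__(self, size: int = 4):
        self.ht1 = [[] for _ in range(size)]
        self.ht2 = None        # allocated only while a rehash is running
        self.rehash_idx = -1   # -1 means no rehash in progress

    def start_rehash(self) -> None:
        # Step 1: give hash table 2 more space (doubling is an assumption).
        self.ht2 = [[] for _ in range(len(self.ht1) * 2)]
        self.rehash_idx = 0

    def _rehash_step(self) -> None:
        if self.ht2 is None:
            return
        # Step 2: remap and copy every entry at the current index position.
        for key, value in self.ht1[self.rehash_idx]:
            self.ht2[hash(key) % len(self.ht2)].append((key, value))
        self.ht1[self.rehash_idx] = []
        self.rehash_idx += 1
        if self.rehash_idx == len(self.ht1):
            # Step 3: release table 1; table 2 becomes the main table.
            self.ht1, self.ht2, self.rehash_idx = self.ht2, None, -1

    def _tables(self):
        return [t for t in (self.ht1, self.ht2) if t is not None]

    def put(self, key, value) -> None:
        self._rehash_step()
        for table in self._tables():   # drop any older copy of the key
            bucket = table[hash(key) % len(table)]
            bucket[:] = [(k, v) for k, v in bucket if k != key]
        target = self.ht2 if self.ht2 is not None else self.ht1
        target[hash(key) % len(target)].append((key, value))

    def get(self, key):
        self._rehash_step()
        for table in self._tables():   # the key may live in either table
            for k, v in table[hash(key) % len(table)]:
                if k == key:
                    return v
        return None


d = ProgressiveDict(size=4)
for i in range(8):
    d.put(i, f"v{i}")
d.start_rehash()
print(d.get(5), "buckets migrated so far:", d.rehash_idx)
```

Each `get` or `put` moves exactly one bucket, so with four buckets the migration finishes after four requests, and every key stays readable throughout.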

With the hash table covered, let's look at the less familiar compressed list and skip list.

Compressed list and skip list

The compressed list is array-like. Its header has three fields, zlbytes, zltail, and zllen, which record the length of the list in bytes, the offset of the tail of the list, and the number of entries in the list, respectively. The compressed list also ends with a zlend marker that indicates the end of the list.

Advantages: it is compact and saves memory, because a contiguous block of memory is allocated, elements are stored next to each other, and no additional pointers add space overhead. The first and the last element can be located directly from the three header fields, so finding them is O(1).
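The header layout can be sketched with Python's `struct` module. The field widths used here (a 32-bit zlbytes, a 32-bit zltail, a 16-bit zllen, and a one-byte 0xFF end marker) are stated as assumptions about the layout, and the entry encoding is deliberately simplified to a one-byte length followed by the payload, which is not the real entry format. The helper names are hypothetical.

```python
import struct

HEADER = struct.Struct("<IIH")   # zlbytes, zltail, zllen (assumed widths)
ZLEND = 0xFF                     # end-of-list marker


def build_ziplist(items):
    """Pack byte strings into one contiguous buffer (simplified entries)."""
    body = bytearray()
    tail_offset = HEADER.size
    for item in items:
        if len(item) >= 255:
            raise ValueError("this sketch only supports short entries")
        tail_offset = HEADER.size + len(body)   # where this entry starts
        body += bytes([len(item)]) + item       # 1-byte length + payload
    total = HEADER.size + len(body) + 1
    return HEADER.pack(total, tail_offset, len(items)) + bytes(body) + bytes([ZLEND])


def read_entry(zl: bytes, offset: int) -> bytes:
    length = zl[offset]
    return zl[offset + 1:offset + 1 + length]


def first_entry(zl: bytes) -> bytes:
    # The first entry sits right after the fixed-size header: O(1).
    return read_entry(zl, HEADER.size)


def last_entry(zl: bytes) -> bytes:
    # zltail stores the offset of the last entry: O(1), no traversal.
    _, zltail, _ = HEADER.unpack_from(zl, 0)
    return read_entry(zl, zltail)


zl = build_ziplist([b"a", b"bc", b"def"])
print(HEADER.unpack_from(zl, 0), first_entry(zl), last_entry(zl))
```

Reaching an entry in the middle still requires walking the entries one by one, which is why the general lookup cost is O(N). The helpers assume a non-empty list.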

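The skip list adds multi-level indexes on top of an ordered linked list, so that an element can be located quickly with a few jumps between index positions.

For example, to query 33, the search starts at the highest index level, moves forward while the next value is still smaller than 33, and drops down one level whenever the next jump would go past it.

Features: the search complexity of the skip list is O(logN), which pays off when the amount of data is very large.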
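A compact skip list sketch that follows the search just described. The class and method names are hypothetical, and the promotion probability of 0.5 and the maximum of 8 levels are arbitrary choices made for the illustration, not values taken from Redis.

```python
import random


class Node:
    def __init__(self, value, level: int):
        self.value = value
        self.forward = [None] * level   # forward[i]: next node on level i


class SkipList:
    MAX_LEVEL = 8

    def __init__(self, seed=None):
        self.head = Node(None, self.MAX_LEVEL)
        self.level = 1                  # number of levels currently in use
        self.rng = random.Random(seed)

    def _random_level(self) -> int:
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < 0.5:
            level += 1
        return level

    def insert(self, value) -> None:
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].value < value:
                node = node.forward[i]
            update[i] = node            # last node before value on level i
        level = self._random_level()
        self.level = max(self.level, level)
        new = Node(value, level)
        for i in range(level):          # splice the node into each level
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, value) -> bool:
        node = self.head
        # Start at the top index level and drop down when we would overshoot.
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].value < value:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.value == value


sl = SkipList(seed=1)
for v in (1, 5, 11, 20, 27, 33, 35, 40):
    sl.insert(v)
print(sl.contains(33), sl.contains(34))
```

The bottom level is an ordinary sorted linked list; the upper levels only exist to skip over parts of it during a search.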

To sum up, the time complexity of the underlying data structures is:

Data structure        Time complexity
Hash table            O(1)
Integer array         O(N)
Doubly linked list    O(N)
Compressed list       O(N)
Skip list             O(logN)

The object types defined by Redis are the data types of its values, and each data type is built on the underlying data structures. So what are the data types?

Redis data type

String, List, Hash, Sorted Set, and Set are the common types. They correspond to the underlying data structures as follows:

Data type     Data structure
String        SDS (simple dynamic string)
List          Doubly linked list, compressed list
Hash          Compressed list, hash table
Sorted Set    Compressed list, skip list
Set           Hash table, integer array

The characteristics of each data type follow from the underlying data structure that implements it:

String is implemented on SDS. It suits simple key-value storage, distributed locks (SETNX key value), atomic counters, and globally unique distributed IDs.

List keeps elements in the order in which they were added, following the FIFO (first-in, first-out) rule. It is generally used for sorted statistics and simple message queues.

Hash is a mapping from string fields to string values and is well suited to representing the information of an object. Adding and deleting a field is O(1).

Set is an unordered collection of String elements. Its members are unique, so the collection cannot contain duplicate data. Because it is based on the hash table, adding, deleting, and looking up a member is O(1).

Sorted Set is an extension of Set in which each element is associated with a score of type double. Elements are ordered by score, which allows range queries.
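The score-ordered range query can be sketched with the standard `bisect` module. This is a simplified stand-in: the class `SortedSet` and its methods `zadd` and `zrangebyscore` borrow the command names only for readability, and a sorted Python list replaces the skip list or compressed list used by Redis.

```python
import bisect


class SortedSet:
    """Members are unique; each carries a score that defines the order."""

    def __init__(self):
        self.scores = {}     # member -> score
        self.ordered = []    # sorted list of (score, member)

    def zadd(self, member: str, score: float) -> None:
        if member in self.scores:
            # Remove the old (score, member) pair before re-inserting.
            old = (self.scores[member], member)
            self.ordered.pop(bisect.bisect_left(self.ordered, old))
        self.scores[member] = score
        bisect.insort(self.ordered, (score, member))

    def zrangebyscore(self, lo: float, hi: float):
        # (lo,) sorts before every (lo, member), so this finds the first
        # element whose score is >= lo.
        start = bisect.bisect_left(self.ordered, (lo,))
        result = []
        for score, member in self.ordered[start:]:
            if score > hi:
                break
            result.append(member)
        return result


board = SortedSet()
board.zadd("alice", 90.0)
board.zadd("bob", 72.5)
board.zadd("carol", 85.0)
print(board.zrangebyscore(80, 100))
```

Updating a score moves the member to its new position, so range queries always see the current order.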

Next, let's look at three more data types: Redis Geo, HyperLogLog, and BitMap.

Redis Geo treats the earth as an approximate sphere and uses GeoHash to convert two-dimensional longitude and latitude into a string, which makes it possible to divide locations into areas and to query by distance. It is generally used in location-related applications.
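To show how two coordinates become one string, here is a sketch of the classic base32 GeoHash encoding: the longitude and latitude ranges are halved alternately, each halving yields one bit, and every five bits become one character. The function name `geohash_encode` is hypothetical, and the sketch covers only this text encoding, not how Redis stores or queries the result.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Interleave longitude and latitude bits into a base32 string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True                       # bits alternate, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1       # upper half of the range
            rng[0] = mid
        else:
            bits <<= 1                   # lower half of the range
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:               # five bits per base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744))
print(geohash_encode(57.65, 10.41))
```

Points that fall in the same cell share the same string, and a longer common prefix means a smaller shared cell, which is what makes area division possible.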

HyperLogLog is a probabilistic data structure that estimates the cardinality of a set, with a standard error of about 0.81%. Even when the set has a very large number of elements, the space needed for the calculation stays fixed and small, so it is well suited to UV (unique visitor) statistics.
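The 0.81% figure can be reproduced from the usual HyperLogLog error estimate, 1.04 divided by the square root of the number of registers. The register count of 16384 used below is stated here as an assumption about the Redis implementation rather than something the text above specifies, and the function name is hypothetical.

```python
import math

REDIS_HLL_REGISTERS = 16384   # assumed register count (2 ** 14)


def hll_standard_error(num_registers: int) -> float:
    """Typical relative error of a HyperLogLog estimate."""
    return 1.04 / math.sqrt(num_registers)


error = hll_standard_error(REDIS_HLL_REGISTERS)
print(f"{error:.4%}")
```

Since the square root of 16384 is 128, the result is 1.04 / 128, which rounds to the 0.81% quoted above.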

BitMap uses one bit to represent the state of one element. A bit has only the two states 0 and 1, so this is a typical binary-state type, implemented with the String type as its underlying data structure. Its advantage is that it saves a great deal of memory, and it fits binary-state statistics scenarios.

With this background, let's discuss how to select the Redis data type for a given application scenario.

Strategies for selecting the appropriate Redis data type

Redis can be applied to many business scenarios in practice, but how do we choose the data type to store the data in?

The choice mainly comes down to time and space complexity. In actual development, consider the following points:

The size of the data itself.

Collection type statistical model

Support single point query / range query

Special usage scenario

The size of the data itself.

When there are many entries but each piece of data is small, storing them as Strings adds a lot of extra space. Redis keeps key-value pairs in a hash table, and each pair is held in a dictEntry structure, so every pair pays for the three pointers of a dictEntry. The data itself is then smaller than the overhead, and the memory actually used ends up much larger than the size of the original data.
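A back-of-the-envelope calculation makes the overhead concrete. It counts only the costs named in this article (three 8-byte dictEntry pointers and the 16-byte redisObject) and ignores the key's own storage and allocator rounding, so the numbers are illustrative rather than measured. The constant and function names are hypothetical.

```python
POINTER_BYTES = 8
DICT_ENTRY_BYTES = 3 * POINTER_BYTES   # three pointers per dictEntry
REDIS_OBJECT_BYTES = 8 + 8             # 8 bytes metadata + 8-byte pointer
PER_KEY_OVERHEAD = DICT_ENTRY_BYTES + REDIS_OBJECT_BYTES


def overhead_ratio(payload_bytes: int) -> float:
    """Overhead bytes per byte of real data for one String key."""
    return PER_KEY_OVERHEAD / payload_bytes


def per_element_overhead(elements_per_key: int) -> float:
    """Overhead per element when one key holds a whole collection."""
    return PER_KEY_OVERHEAD / elements_per_key


print(overhead_ratio(8))            # an 8-byte value stored as its own key
print(per_element_overhead(1000))   # 1000 elements behind a single key
```

An 8-byte value stored under its own key carries five times its size in overhead, while 1000 elements behind one key share that cost at a small fraction of a byte each.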

In that case you can use collection types such as List, Hash, and Sorted Set, which can be backed by integer arrays and compressed lists. Both structures allocate one contiguous block of memory and place the elements of the collection next to each other in it, which is very compact and removes the space overhead of the pointers that would otherwise link the elements. In addition, with a collection type one key maps to a whole collection, so a lot of data can be stored behind a single dictEntry, which saves memory.

Collection type statistical model

Common statistical patterns for Redis collection types are:

Aggregate statistics (intersection, difference, and union): when aggregating multiple sets, choose Set.

Sorted statistics (the collection must keep its elements in order): List and Sorted Set are the ordered collections in Redis. List orders elements by the order in which they were added, and Sorted Set orders elements by their weight (score).

Binary-state statistics (elements take only the values 0 and 1): Bitmap is a data type for binary states that uses the String type as its underlying data structure. BITOP performs bitwise AND, OR, and XOR across Bitmaps, and BITCOUNT counts the number of 1 bits in the result.

Cardinality statistics (counting the distinct elements in a set): HyperLogLog is a collection type for counting cardinality. Its results carry some error, with a standard error of 0.81%. For exact results, use the Set or Hash type.

The Set type suits aggregate statistics over users, friends, followers, fans, or interested people, such as:

Counting the number of new users of a mobile app per day

Finding the common friends of two users
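Both examples reduce to set operations, which Python's built-in `set` can illustrate. The helper names `sinter` and `sdiff` mirror the Redis command names only for readability, and the user IDs are invented sample data.

```python
def sinter(*sets):
    """Intersection of one or more sets: members present in all of them."""
    return set.intersection(*sets)


def sdiff(first, *others):
    """Members of the first set that appear in none of the others."""
    return first.difference(*others)


# Common friends of two users: intersection of their friend sets.
friends_of_amy = {"u1", "u2", "u3", "u4"}
friends_of_ben = {"u3", "u4", "u5"}
common = sinter(friends_of_amy, friends_of_ben)

# New users today: today's users minus everyone seen before.
all_users_so_far = {"u1", "u2", "u3"}
users_today = {"u2", "u3", "u8", "u9"}
new_today = sdiff(users_today, all_users_so_far)

print(sorted(common), len(new_today))
```

The accumulated set is then updated with today's users (a union) so that tomorrow's difference is computed against everyone seen so far.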

List and Sorted Set are the ordered collections in Redis and are used when the elements of a collection must be kept in order, such as:

List of latest comments

Leaderboards

Bitmap binary-state statistics suit large amounts of data that can be represented as a binary state, such as:

Check-ins, for example the number of users who signed in on a given day

Weekly active users

User online status
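A check-in scenario can be modelled with one bit per user ID. The sketch below imitates the behavior of SETBIT, GETBIT, BITCOUNT, and BITOP AND in plain Python; the class `Bitmap` and the function `bitop_and` are hypothetical stand-ins, and numbering bits from the most significant bit of each byte is an assumption made to resemble the Redis convention.

```python
class Bitmap:
    """One bit per element, stored in a growable byte buffer."""

    def __init__(self):
        self.bits = bytearray()

    def setbit(self, offset: int, value: int) -> int:
        """Set the bit at offset and return its previous value."""
        byte, bit = divmod(offset, 8)
        if byte >= len(self.bits):
            self.bits.extend(bytes(byte - len(self.bits) + 1))
        mask = 0x80 >> bit                   # most significant bit first
        old = 1 if self.bits[byte] & mask else 0
        if value:
            self.bits[byte] |= mask
        else:
            self.bits[byte] &= ~mask & 0xFF
        return old

    def getbit(self, offset: int) -> int:
        byte, bit = divmod(offset, 8)
        if byte >= len(self.bits):
            return 0
        return 1 if self.bits[byte] & (0x80 >> bit) else 0

    def bitcount(self) -> int:
        return sum(bin(b).count("1") for b in self.bits)


def bitop_and(a: Bitmap, b: Bitmap) -> Bitmap:
    """Bitwise AND of two bitmaps; the shorter one is zero-padded."""
    n = max(len(a.bits), len(b.bits))
    out = Bitmap()
    out.bits = bytearray(
        x & y for x, y in zip(a.bits.ljust(n, b"\0"), b.bits.ljust(n, b"\0"))
    )
    return out


monday, tuesday = Bitmap(), Bitmap()
for user_id in (1, 5, 9):
    monday.setbit(user_id, 1)
for user_id in (5, 9, 12):
    tuesday.setbit(user_id, 1)
print(monday.bitcount(), bitop_and(monday, tuesday).bitcount())
```

Counting one day's bitmap gives that day's check-ins, and the AND of several days gives the users who checked in on every one of them.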

HyperLogLog is a collection type for cardinality statistics, that is, counting the distinct elements in a set, such as:

Page UV (unique visitors), where multiple visits by the same user in one day count only once.

Support single point query / range query

In Redis, List and Sorted Set are ordered collections that support range queries, whereas Hash does not support range queries.

Special usage scenario

Message queue: when Redis is used to implement a message queue, the basic requirements are preserving message order, handling duplicate messages, and ensuring message reliability. There are two solutions:

Message queue solution based on List

Message queue solution based on Streams

Requirement                  List-based                                       Streams-based
Message ordering             LPUSH/RPOP                                       XADD/XREAD
Blocking read                BRPOP                                            XREAD with BLOCK
Duplicate message handling   Producer generates a globally unique ID itself   Globally unique ID is generated automatically
Message reliability          BRPOPLPUSH                                       PENDING list retains messages automatically
Applicable scenario          Small total message volume                       Large total message volume; messages read by consumer groups
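The List-based column can be modelled in memory to show how the three requirements fit together. This is only a simulation with `collections.deque`: the class `ListQueue` is hypothetical, its methods imitate LPUSH and RPOPLPUSH without blocking, and the message IDs are produced by the caller, as the table suggests for the List-based solution.

```python
from collections import deque


class ListQueue:
    """In-memory imitation of a List-based queue with a backup list."""

    def __init__(self):
        self.queue = deque()     # main list
        self.backup = deque()    # messages taken but not yet acknowledged
        self.processed_ids = set()

    def lpush(self, msg_id: str, body: str) -> None:
        # Producer side: push on the left, carrying a producer-made ID.
        self.queue.appendleft((msg_id, body))

    def rpoplpush(self):
        # Consumer side: pop from the right (oldest first) and keep a
        # copy in the backup list until the message is acknowledged.
        if not self.queue:
            return None
        msg = self.queue.pop()
        self.backup.appendleft(msg)
        return msg

    def handle(self, msg) -> bool:
        """Process a message once; return False for a duplicate."""
        msg_id, _ = msg
        duplicate = msg_id in self.processed_ids
        self.processed_ids.add(msg_id)
        self.backup.remove(msg)          # acknowledge
        return not duplicate


q = ListQueue()
q.lpush("id-1", "first")
q.lpush("id-2", "second")
q.lpush("id-1", "first")                 # producer retry: duplicate ID
results = []
while (msg := q.rpoplpush()) is not None:
    results.append((msg[1], q.handle(msg)))
print(results)
```

If a consumer crashes between `rpoplpush` and `handle`, the message is still in the backup list and can be picked up again, which is the role BRPOPLPUSH plays in the table.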

Location-based services (LBS): the Redis GEO data type records geographic location as longitude and latitude, so it is widely used in LBS applications. One example is how ride-hailing software provides services based on location.

Redis is fast because its data operations happen in memory and because it uses a hash table as its index, both of which are efficient. The variety of its underlying data structures also makes it suitable for many scenarios, and choosing the appropriate data type for each scenario improves query performance.
