
Why Redis uses a skip list instead of a red-black tree to implement SortedSet

Updated 2025-02-25. Source: SLTechnology News&Howtos (Shulou.com), Development

This article explains why Redis uses a skip list rather than a red-black tree to implement its Sorted Set, and walks through how a skip list works and how to implement one.

Contents

What is a skip list?

What is the point of a skip list?

Search time complexity of the skip list

Does a skip list cost a lot of memory?

Time complexity of inserts and deletions

Insert

Delete

Dynamic update of the skip list index

Code implementation of the skip list (Java version)

Data structure definition

Search algorithm

Insert and delete algorithm

Insert

Delete

What is a skip list?

The skip list was invented by William Pugh. His paper "Skip lists: a probabilistic alternative to balanced trees" describes the data structure and its insertion, deletion, and other operations in detail.

Skip lists are a data structure that can be used in place of balanced trees. Skip lists use probabilistic balancing rather than strictly enforced balancing and as a result the algorithms for insertion and deletion in skip lists are much simpler and significantly faster than equivalent algorithms for balanced trees.

In other words, a skip list can be used in place of a red-black tree, and its probabilistic balancing makes insertion and deletion simpler and faster. Let's first look at a figure from the paper:

In the figure:

a: a sorted linked list. To find a node, you need to examine at most N nodes.

b: every second node also has a pointer to the node two ahead of it, so finding a node requires examining at most ceil(N/2) + 1 nodes.

c: every fourth node also has a pointer to the node four ahead of it, so finding a node requires examining at most ceil(N/4) + 2 nodes.

d: if every (2^i)-th node has a pointer to the node 2^i ahead of it, the number of nodes examined drops to ceil(log2 N). Search in such a structure is fast, but insertion and deletion are difficult.

A node with k forward pointers is called a level k node. In the structure above, 50% of the nodes are level 1, 25% are level 2, 12.5% are level 3, and so on. What if the level of each node is chosen at random, but in the same proportions (figure e, compared with figure d)?

Let the i-th forward pointer of a node point to the next node of level i or higher, instead of to the node exactly 2^(i-1) positions ahead. Insertion and deletion then require only local modifications; the level of a node is chosen at random when it is inserted and never changes afterwards. Because the structure is a linked list whose extra pointers skip over intermediate nodes, the author named it the skip list.

Binary search relies on random access into an array, so it can only be implemented on arrays. Does that mean binary search is impossible when the data is stored in a linked list?

In fact, with a small modification the linked list can support a search similar to binary search. The modified structure is the skip list, which supports fast insertion, deletion, and lookup.

The Sorted Set in Redis is implemented with a skip list. Red-black trees also support fast insertion, deletion, and lookup, so why doesn't Redis use a red-black tree?

What is the point of a skip list?

Even if the data in a singly linked list is sorted, finding an element still means traversing from the head. Search is inefficient, with an average time complexity of O(n).

Programmers who chase every bit of performance start to wonder: how can search in a linked list be made faster?

As shown in the following figure, build one level of "index" on top of the linked list: promote one node out of every two to the upper level, which we call the index or index level. The down in the figure is the down pointer, which points to the corresponding node one level below.

For example, to search for 16:

First traverse the index level. On reaching 13 in the index level, the next node is 17, so the target must lie between these two nodes.

Then follow the down pointer to the original linked list and continue traversing.

From there, only 2 more nodes need to be traversed to find 16!

Previously, the singly linked list required traversing 10 nodes; now only 7 are needed. Adding one level of index reduces the number of nodes traversed and improves search efficiency.

Would one more level of index make the search even more efficient? Promote one node out of every two from the first-level index to a second-level index. Searching for 16 now takes only 6 nodes!

There is not much data here, so the gain may not seem impressive.

Now make the data larger: a linked list with 64 nodes and a five-level index.

Without the index, searching for 62 in the singly linked list means traversing 62 nodes!

And now? Only 11! So when the list length n is large, building an index improves search performance significantly.
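To make the 64-node example concrete, here is a minimal Java sketch that counts the nodes touched with and without the index. It assumes an idealized layout in which the list holds the keys 1..n and index level k keeps every 2^k-th node, and it counts the starting node, every move to the right, and every move down. The class and method names are illustrative and do not come from the original article.

```java
class IndexedSearchDemo {
    // Nodes touched when scanning a plain sorted list 1..n from the head.
    static int linearVisits(int n, int target) {
        int visits = 0;
        for (int key = 1; key <= n; key++) {
            visits++;
            if (key >= target) {
                break;
            }
        }
        return visits;
    }

    // Nodes touched with `levels` index levels above the list 1..n.
    // Assumption: index level k contains the nodes at positions 0, 2^k, 2*2^k, ...
    static int indexedVisits(int n, int levels, int target) {
        int pos = 0;     // position in the bottom list; the key at pos is pos + 1
        int visits = 1;  // the first node of the top index level
        for (int level = levels; level >= 0; level--) {
            int step = 1 << level;
            // Move right while the next node on this level is not past the target.
            while (pos + step < n && pos + step + 1 <= target) {
                pos += step;
                visits++;
            }
            if (level > 0) {
                visits++;  // follow the down pointer to the next level
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        System.out.println("without index: " + linearVisits(64, 62));
        System.out.println("with 5-level index: " + indexedVisits(64, 5, 62));
    }
}
```

Under these assumptions the program prints 62 and 11, in line with the figures quoted above. A different counting convention would shift the second number slightly without changing the overall picture.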

This linked list with a multi-level index, which speeds up lookups, is the skip list that has recently become a popular interview topic.

As serious programmers, we are curious again.

Search time complexity of the skip list

We all know that searching a singly linked list takes O(n) time. What about the skip list?

If the linked list has n nodes, how many index levels are there? Assuming one node out of every two is promoted to the next index level:

The first-level index has about n/2 nodes.

The second-level index has about n/4 nodes.

The third-level index has about n/8 nodes.

...

The k-th level index has about n/(2^k) nodes.

Assuming the index has h levels and the top level has 2 nodes, we get n/(2^h) = 2.

So: h = log2(n) - 1

Including the original linked list, the height of the whole skip list is log2(n). If a query traverses m nodes at each level, the time complexity of a query in the skip list is O(m * log n).
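The relation h = log2(n) - 1 can be checked with a few lines of Java. This is a minimal sketch that assumes n is a power of two and that halving stops once an index level has 2 nodes, as in the derivation above; the names are illustrative.

```java
class IndexHeightDemo {
    // Number of index levels when each level keeps half the nodes of the
    // level below and the top level is left with 2 nodes.
    static int indexLevels(int n) {
        int levels = 0;
        int nodes = n;
        while (nodes > 2) {
            nodes /= 2;
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        for (int n : new int[] {64, 1024}) {
            int h = indexLevels(n);
            // The total height adds the original linked list to the h index levels.
            System.out.println("n = " + n + ": index levels = " + h + ", height = " + (h + 1));
        }
    }
}
```

For n = 64 this gives 5 index levels and a height of 6, and for n = 1024 it gives 9 index levels and a height of 10, matching log2(n) - 1 and log2(n).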

What is the value of m? With the index structure above, we traverse at most 3 nodes at each index level, that is, m = 3. Why 3? Let me explain.

Suppose we are looking for x. In the k-th level index, on reaching node y we find that x is greater than y and smaller than the next node z, so we follow y's down pointer from the k-th level to the (k-1)-th level. In the (k-1)-th level index there are only 3 nodes from y to z (including y and z), so we traverse at most 3 nodes in the (k-1)-th level index. By the same reasoning, each index level requires traversing at most 3 nodes.

From this analysis m = 3 is a constant, so the time complexity of a query in the skip list is O(log n). This matches binary search. In other words, we have effectively implemented binary search on a singly linked list. Isn't that amazing? However, there is no free lunch: the faster queries depend on building multiple index levels, which is the classic design idea of trading space for time.

Does a skip list cost a lot of memory?

Because the skip list stores multi-level indexes, it is bound to use more space than a singly linked list. How much more?

If the original linked list has n nodes:

The first-level index has about n/2 nodes.

The second-level index has about n/4 nodes.

...

The top-level index has 2 nodes.

The total number of index nodes is:

n/2 + n/4 + n/8 + ... + 8 + 4 + 2 = n - 2

So the space complexity is O(n). That is still a sizable amount. Can the memory used by the index be reduced?

What if only one node is promoted to the upper index for every three or five nodes? Taking every three nodes as an example:

The first-level index needs about n/3 nodes.

The second-level index needs about n/9 nodes.

At each level up, the number of index nodes is divided by 3.

Assuming the top-level index has 1 node, the total number of index nodes is: n/3 + n/9 + n/27 + ... + 9 + 3 + 1 = n/2

Although the space complexity is still O(n), compared with promoting one node out of every two, the index storage is cut roughly in half.
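The two sums can be checked numerically. The following Java sketch adds up the index nodes for a given promotion stride, assuming n is an exact power of the stride and that the topmost level holds topSize nodes (2 for stride 2 and 1 for stride 3, as in the text). The method and parameter names are illustrative.

```java
class IndexSpaceDemo {
    // Total number of index nodes when every `stride`-th node is promoted
    // and the smallest index level kept has `topSize` nodes.
    static long indexNodes(long n, int stride, long topSize) {
        long total = 0;
        for (long nodes = n / stride; nodes >= topSize && nodes > 0; nodes /= stride) {
            total += nodes;
        }
        return total;
    }

    public static void main(String[] args) {
        // Stride 2 with n = 64: 32 + 16 + 8 + 4 + 2 = n - 2
        System.out.println("stride 2, n = 64: " + indexNodes(64, 2, 2));
        // Stride 3 with n = 3^10 = 59049: the sum is (n - 1) / 2, about n / 2
        System.out.println("stride 3, n = 59049: " + indexNodes(59049, 3, 1));
    }
}
```

The first call yields 62 (that is, n - 2) and the second yields 29524 (about n/2), so the wider stride roughly halves the index size relative to n.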

We need not worry too much about the extra space used by the index. In real development, the objects stored in the original linked list may be large, while an index node only needs to store the key and a few pointers, not the object itself. So when the objects are much larger than the index nodes, the extra space used by the index is negligible.

Time complexity of inserts and deletions

Insert

Inserting an element into a skip list takes only O(log n) time.

In a singly linked list, once the insertion position is known, the insertion itself is O(1). But to keep the data in the original linked list sorted, the insertion position has to be found first, and that search is the expensive part.

In a plain singly linked list, finding the insertion position means traversing node by node. In a skip list, searching for a node takes O(log n), so finding where an element should be inserted also takes O(log n).

Delete

If the node also appears in the index, remove it from the index levels as well as from the original linked list.

Deleting from a singly linked list requires the predecessor of the node being deleted, whose pointer is then updated to complete the removal. So when searching for the node to delete, be sure to keep track of its predecessor. With a doubly linked list this is not an issue.
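The role of the predecessor is easiest to see on a single level. Below is a minimal Java sketch of deletion from a sorted singly linked list, assuming a sentinel head node so that every real node has a predecessor. It shows only the bottom level of a skip list, and all names are illustrative.

```java
class SinglyLinkedDeleteDemo {
    static final class Node {
        final int key;
        Node next;

        Node(int key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Remove `key` from a sorted list that starts with a sentinel head node.
    // The loop stops at the predecessor of the first node whose key is >= key.
    static boolean delete(Node head, int key) {
        Node prev = head;
        while (prev.next != null && prev.next.key < key) {
            prev = prev.next;
        }
        if (prev.next == null || prev.next.key != key) {
            return false;  // key is not in the list
        }
        prev.next = prev.next.next;  // unlink through the predecessor
        return true;
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head.next; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node head = new Node(Integer.MIN_VALUE, new Node(1, new Node(3, new Node(5, null))));
        System.out.println(delete(head, 3) + " " + render(head));  // true 1 -> 5
        System.out.println(delete(head, 4) + " " + render(head));  // false 1 -> 5
    }
}
```

In a skip list the same unlinking step is repeated on every level where the node appears, which is why the search has to remember one predecessor per level.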

Dynamic update of the skip list index

If data is inserted into the skip list continuously without updating the index, a large number of nodes may pile up between two index nodes. In the extreme case, the skip list degenerates into a singly linked list.

As a dynamic data structure, the skip list needs some way to keep the index in balance with the size of the original linked list: as the list gains nodes, the index should gain nodes too, so that search, insertion, and deletion do not degrade.

Balanced binary trees such as red-black trees and AVL trees keep the left and right subtrees balanced through left and right rotations, whereas skip lists maintain the "balance" described above through a random function.

When inserting an element into a skip list, you can also insert it into some of the index levels at the same time.

So how do we choose which index levels to add it to?

A random function decides. For example, if the random function returns K, the node is added to every index level from level 1 to level K.
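One common way to write such a random function is repeated coin flipping: start at level 1 and keep going up while a random draw falls below a probability p. The sketch below is a minimal Java version under that assumption; the seed, the value p = 0.5, and the cap of 16 levels are arbitrary choices for illustration.

```java
import java.util.Random;

class RandomLevelDemo {
    // Returns a level in [1, maxLevel]; each further level is reached with probability p.
    static int randomLevel(Random rng, double p, int maxLevel) {
        int level = 1;
        while (level < maxLevel && rng.nextDouble() < p) {
            level++;
        }
        return level;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);  // fixed seed so that runs are repeatable
        int maxLevel = 16;
        int draws = 100_000;
        int[] counts = new int[maxLevel + 1];
        for (int i = 0; i < draws; i++) {
            counts[randomLevel(rng, 0.5, maxLevel)]++;
        }
        // With p = 0.5 the shares should be close to 50%, 25%, 12.5%, ...
        for (int level = 1; level <= 5; level++) {
            System.out.printf("level %d: %.1f%%%n", level, 100.0 * counts[level] / draws);
        }
    }
}
```

With p = 0.5, about half of the nodes stay at level 1, a quarter reach level 2, and so on, which reproduces in expectation the "one node out of every two" promotion rule without any rebalancing.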

Why does Redis use a skip list instead of a red-black tree to implement Sorted Set?

The core operations supported by the Sorted Set in Redis are:

Insert an element

Delete an element

Find an element

Iterate over the elements in sorted order

Find elements by range

A red-black tree can also perform insertion, deletion, lookup, and ordered iteration, with the same time complexity as a skip list.

For range lookups, however, the red-black tree is less efficient than the skip list. The skip list can locate the start of the range in O(log n) and then simply walk forward along the original linked list.
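A range lookup on a skip-list-backed map can be tried out directly in Java. The sketch below uses the JDK's java.util.concurrent.ConcurrentSkipListMap as a stand-in; it is not Redis's own implementation (Redis is written in C), and the keys and the helper name keysInRange are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

class RangeQueryDemo {
    // Returns the keys in [from, to], in ascending order.
    static List<Integer> keysInRange(ConcurrentSkipListMap<Integer, String> map, int from, int to) {
        // subMap(fromKey, fromInclusive, toKey, toInclusive) is a view over the sorted keys.
        return new ArrayList<>(map.subMap(from, true, to, true).keySet());
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, String> map = new ConcurrentSkipListMap<>();
        for (int key : new int[] {20, 5, 10, 1, 100, 80, 60, 30}) {
            map.put(key, "v" + key);
        }
        System.out.println(keysInRange(map, 10, 60));  // [10, 20, 30, 60]
    }
}
```

The result comes back already sorted because the bottom level of a skip list is an ordered linked list, which is the property the range-lookup argument above relies on.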

Besides performance, there are other reasons:

The code is much easier to understand and write than a red-black tree. Simplicity means better readability and fewer mistakes.

The skip list is more flexible: changing the index-building strategy lets you trade execution efficiency against memory consumption.

Skip lists are not better in every respect, though. Red-black trees appeared earlier than skip lists, and the Map types in many programming languages are implemented with red-black trees (for example, JDK's TreeMap, and the collision bins of HashMap since Java 8), so they can be used directly in business development. Ready-made skip list implementations are less common (the JDK does ship ConcurrentSkipListMap), so in many environments you have to implement one yourself.

Code implementation of the skip list (Java version)

Data structure definition

Each element in the list is represented by a node. A node's level is chosen at random when it is inserted, regardless of how many nodes are already in the list.

A level i node has i forward pointers (represented in Java by the node array forward), numbered from 1 to i (array indices 0 to i-1 in the Java code). MaxLevel records the maximum number of levels the skip list may have.

The level of the skip list is the maximum level among all nodes currently in it (1 if the list is empty).

The list header has forward pointers at levels 1 through MaxLevel:

    public class SkipList<T> {
        // maximum number of levels
        private final int MAX_LEVEL;
        // current number of levels
        private int listLevel;
        // head sentinel
        private SkipListNode<T> listHead;
        // tail sentinel
        private SkipListNode<T> NIL;
        // probability used by randomLevel
        private final double P;
        // the optimal probability value given in the paper
        private static final double OPTIMAL_P = 0.25;

        public SkipList() {
            // 0.25, 15
            this(OPTIMAL_P, (int) Math.ceil(Math.log(Integer.MAX_VALUE) / Math.log(1 / OPTIMAL_P)) - 1);
        }

        public SkipList(double probability, int maxLevel) {
            P = probability;
            MAX_LEVEL = maxLevel;
            listLevel = 1;
            listHead = new SkipListNode<T>(Integer.MIN_VALUE, null, maxLevel);
            NIL = new SkipListNode<T>(Integer.MAX_VALUE, null, maxLevel);
            // every level of the header initially points to the tail sentinel
            for (int i = listHead.forward.length - 1; i >= 0; i--) {
                listHead.forward[i] = NIL;
            }
        }

        // node class
        static class SkipListNode<T> {
            int key;
            T value;
            SkipListNode<T>[] forward;

            @SuppressWarnings("unchecked")
            public SkipListNode(int key, T value, int level) {
                this.key = key;
                this.value = value;
                this.forward = new SkipListNode[level];
            }
        }
    }

Search algorithm

Search by key: return the value stored under the key, or null if the key is not found.

To find a particular searchKey, traverse the forward arrays. Keys in the skip list are ordered from smallest to largest. Start from the current highest level of the skip list, listLevel. On each level, move forward until the next node's key is not less than searchKey, then drop down one level and continue, until the bottom level. The node right after the final stopping position tells us whether searchKey is in the skip list.

The process of finding 8 in the skip list:

Insert and delete algorithm

Both work by searching and then splicing:

Maintain an update array. When the search ends, update[i] holds the node immediately to the left of the insertion/deletion position at level i.

Insert

If the key does not exist, insert the key and its value. If the key already exists, update the value.

If the level of the node to be inserted is higher than the current level of the skip list, listLevel, update listLevel.

The level of the node to be inserted is chosen by randomLevel:

randomLevel depends only on the maximum level of the skip list and the probability value p.

An alternative implementation: if the generated randomLevel is greater than the current level of the skip list, listLevel, set randomLevel to listLevel + 1. This makes later searches more convenient and is acceptable in engineering practice, but it compromises the randomness of the algorithm.
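The alternative amounts to a one-line cap applied to the random result. A minimal Java sketch, with an illustrative method name:

```java
class CappedLevelDemo {
    // Let a new node raise the list by at most one level at a time.
    static int cappedLevel(int randomLevel, int listLevel) {
        return Math.min(randomLevel, listLevel + 1);
    }

    public static void main(String[] args) {
        System.out.println(cappedLevel(5, 2));  // 3: a draw of 5 is cut down to listLevel + 1
        System.out.println(cappedLevel(2, 4));  // 2: draws at or below listLevel + 1 are unchanged
    }
}
```

The cap avoids a node that towers far above every other node in a short list, at the price that node levels no longer follow exactly the distribution produced by randomLevel.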

Delete

Delete the given key and its value. If the deleted node was the only node at the highest level of the skip list, update listLevel after the deletion.

    public class SkipList<T> {
        // maximum number of levels
        private final int MAX_LEVEL;
        // current number of levels
        private int listLevel;
        // head sentinel
        private SkipListNode<T> listHead;
        // tail sentinel
        private SkipListNode<T> NIL;
        // probability used by randomLevel
        private final double P;
        // the optimal probability value given in the paper
        private static final double OPTIMAL_P = 0.25;

        public SkipList() {
            // 0.25, 15
            this(OPTIMAL_P, (int) Math.ceil(Math.log(Integer.MAX_VALUE) / Math.log(1 / OPTIMAL_P)) - 1);
        }

        public SkipList(double probability, int maxLevel) {
            P = probability;
            MAX_LEVEL = maxLevel;
            listLevel = 1;
            listHead = new SkipListNode<T>(Integer.MIN_VALUE, null, maxLevel);
            NIL = new SkipListNode<T>(Integer.MAX_VALUE, null, maxLevel);
            // every level of the header initially points to the tail sentinel
            for (int i = listHead.forward.length - 1; i >= 0; i--) {
                listHead.forward[i] = NIL;
            }
        }

        // node class
        static class SkipListNode<T> {
            int key;
            T value;
            SkipListNode<T>[] forward;

            @SuppressWarnings("unchecked")
            public SkipListNode(int key, T value, int level) {
                this.key = key;
                this.value = value;
                this.forward = new SkipListNode[level];
            }
        }

        public T search(int searchKey) {
            SkipListNode<T> curNode = listHead;
            for (int i = listLevel - 1; i >= 0; i--) {
                while (curNode.forward[i].key < searchKey) {
                    curNode = curNode.forward[i];
                }
            }
            // curNode.key < searchKey <= curNode.forward[0].key
            curNode = curNode.forward[0];
            if (curNode != NIL && curNode.key == searchKey) {
                return curNode.value;
            } else {
                return null;
            }
        }

        @SuppressWarnings("unchecked")
        public void insert(int searchKey, T newValue) {
            // update[i] is the rightmost node at level i whose key is less than searchKey
            SkipListNode<T>[] update = new SkipListNode[MAX_LEVEL];
            SkipListNode<T> curNode = listHead;
            for (int i = listLevel - 1; i >= 0; i--) {
                while (curNode.forward[i].key < searchKey) {
                    curNode = curNode.forward[i];
                }
                // curNode.key < searchKey <= curNode.forward[i].key
                update[i] = curNode;
            }
            curNode = curNode.forward[0];
            if (curNode != NIL && curNode.key == searchKey) {
                // key already present: only update the value
                curNode.value = newValue;
            } else {
                int lvl = randomLevel();
                if (listLevel < lvl) {
                    // new levels start at the header
                    for (int i = listLevel; i < lvl; i++) {
                        update[i] = listHead;
                    }
                    listLevel = lvl;
                }
                SkipListNode<T> newNode = new SkipListNode<T>(searchKey, newValue, lvl);
                // splice the new node in on every level it occupies
                for (int i = 0; i < lvl; i++) {
                    newNode.forward[i] = update[i].forward[i];
                    update[i].forward[i] = newNode;
                }
            }
        }

        @SuppressWarnings("unchecked")
        public void delete(int searchKey) {
            SkipListNode<T>[] update = new SkipListNode[MAX_LEVEL];
            SkipListNode<T> curNode = listHead;
            for (int i = listLevel - 1; i >= 0; i--) {
                while (curNode.forward[i].key < searchKey) {
                    curNode = curNode.forward[i];
                }
                // curNode.key < searchKey <= curNode.forward[i].key
                update[i] = curNode;
            }
            curNode = curNode.forward[0];
            if (curNode != NIL && curNode.key == searchKey) {
                // unlink the node on every level where it appears
                for (int i = 0; i < listLevel; i++) {
                    if (update[i].forward[i] != curNode) {
                        break;
                    }
                    update[i].forward[i] = curNode.forward[i];
                }
                // drop empty top levels, keeping at least one level
                while (listLevel > 1 && listHead.forward[listLevel - 1] == NIL) {
                    listLevel--;
                }
            }
        }

        private int randomLevel() {
            int lvl = 1;
            while (lvl < MAX_LEVEL && Math.random() < P) {
                lvl++;
            }
            return lvl;
        }

        public void print() {
            for (int i = listLevel - 1; i >= 0; i--) {
                SkipListNode<T> curNode = listHead.forward[i];
                while (curNode != NIL) {
                    System.out.print(curNode.key + "->");
                    curNode = curNode.forward[i];
                }
                System.out.println("NIL");
            }
        }

        public static void main(String[] args) {
            SkipList<Integer> sl = new SkipList<Integer>();
            sl.insert(20, 20);
            sl.insert(5, 5);
            sl.insert(10, 10);
            sl.insert(1, 1);
            sl.insert(100, 100);
            sl.insert(80, 80);
            sl.insert(60, 60);
            sl.insert(30, 30);
            sl.print();
            System.out.println("---");
            sl.delete(20);
            sl.delete(100);
            sl.print();
        }
    }
