In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-05 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/02 Report--
This article mainly explains the basic principles and applications of LSM-tree. The content of the explanation is simple and clear, and it is easy to learn and understand. Please follow the editor's train of thought to study and learn the basic principles and applications of LSM-tree.
LSM-tree is very common in NoSQL systems and has basically become a must-have option. Today, let's introduce the main ideas of LSM-tree, and give an example of LevelDB.
LSM-tree
Originated in 1996 a paper "The Log-Structured Merge-Tree (LSM-Tree)", this paper 32 pages, I have not read, the study of LSM is basically from the background knowledge of the top paper and open source system documentation. Today's content and pictures are mainly from FAST'16 's "WiscKey: Separating Keys from Values in SSD-conscious Storage".
First look at the name, log-structured, log structure, the log is typed by the software system, just like people keep a diary, write down page by page, and the system writes the log will not be wrong, so there is no need to change, just add after it. The pre-write logs of various databases are also appended, so the log structure basically refers to appending. Notice that he is also a "Merge-tree", that is, "merge-tree", merging is to combine multiple into one.
All right, let's cut the bullshit and get to the text.
LSM-tree is specially designed for key-value storage systems. Key-value type storage systems have two main functions, put (kline v): write a (kline v), get (k): given a k to find v.
The most important feature of LSM-tree is its fast writing speed, which mainly makes use of the sequential writing of the disk, which competes for the B-tree that needs to be written randomly. About the order and random writing of disks, please refer to "various Concepts of hard disk".
The following figure is part of LSM-tree, which is a multi-tier structure, just like a tree, small at the top and large at the bottom. The first is the C0 layer of the memory, which holds all the recently written (kjournal v). This memory structure is orderly and can be updated in place at any time, while supporting query at any time. The remaining C1 to Ck layers are on disk, and each layer is an ordered structure on key.
Writing process: a put (KQuery v) operation comes, first appended to the pre-write log (Write Ahead Log, that is, the log actually written before), and then added to the C0 layer. When the data of layer C0 reaches a certain size, layer C0 and layer C1 are merged, similar to merging and sorting, and this process is called Compaction (merging). The merged new new-C1 will write to the disk sequentially, replacing the original old-C1. When C1 layer reaches a certain size, it will continue to merge with the lower layer. After the merger, all the old files can be deleted, leaving the new ones.
Note that data writes may be duplicated, and the new version needs to overwrite the old version. What is a new version? I will write it first (axi1) and then (axiom 233). 233 is the new version. If the old version of a has already reached the Ck layer, at this time there is a new version of the C0 layer. At this time, we will not care whether there is an old version of the file below. The cleaning of the old version is done during the merge.
The writing process basically uses only the memory structure, and Compaction can be completed asynchronously in the background without blocking writes.
Query process: in the writing process, you can see that the latest data is in the C0 layer, and the oldest data is in the Ck layer, so the query is also to check the C0 layer first, if there is no k to check, then check C1, layer by layer.
A query may require multiple single-point queries, slightly slower. Therefore, LSM-tree is mainly aimed at writing-intensive scenarios with a small number of queries.
LSM-tree is used in a variety of key-value databases, such as LevelDB,RocksDB, and the distributed row storage database Cassandra also uses LSM-tree 's storage architecture.
LevelDB
In fact, just looking at the above model is still a bit of a problem, such as after merging C0 and C1, what about the new writes? In addition, every time to merge C0 and C1, this background arrangement is also very troublesome. Take LevelDB as an example to see how the actual system uses the idea of LSM-tree.
The following figure shows the architecture of LevelDB. First, the LSM-tree is divided into three files, the first is two memtable in memory, one is the normal memtable that receives write requests, and the other is the immutable immutable memtable.
The other part is the SStable (Sorted String Table) on disk, the ordered string table, which is the key of the data. SStable has a total of seven layers (L0 to L6). The total size limit of the next layer is 10 times that of the previous layer.
Write process: first add the write operation to the pre-write log, and then write the data to the memtable. When the memtable is full, switch the memtable to an immutable immutable memtable, and open a new memtable to receive new write requests. And this immutable memtable can swipe the disk. Here the disk is directly brushed into the L0 layer of the SSTable file, and not directly merged with the L0 layer file.
The total size of all files on each layer is limited, with each layer ten times larger. Once the total size of a layer exceeds the threshold, select a file to merge with the files of the next layer. Just like playing 2048, every merge can be triggered, which is the best in 2048, but it is troublesome in the system, because there is a lot of data to be flipped around, but it is not a bad thing, because it can speed up the query.
Note here that all files affected by the next layer will participate in the Compaction. After the merge, ensure that the data of each layer from L1 to L6 is globally ordered on the key. The L0 layer can overlap.
The above figure is an example. After an immutable memtable is brushed to the L0 layer, the merge of L0 and L1 is triggered. If the yellow file is related to this merge, the L0 layer is deleted, the L1 layer is updated, the L1 layer is still globally ordered, and the data order of the three files is abcdef.
Although multiple files in the L0 layer are in the same layer, they are also related in order, and the data of the same key will also overwrite the previous one. How can you tell the difference here? Add a version number to each key-value. So only the latest version should be left on Compaction.
Query process: first check memtable, then check immutable memtable, and then check all the files on the L0 layer, and look down on the last layer.
LSM-tree read and write magnification
Read-write magnification (read and write amplification) is the main problem of LSM-tree, which is defined as: read-write magnification = the actual amount of data read and written on disk / the amount of data that users need. Note that it is the amount of data that interacts with the disk, and it doesn't matter how many times this data has been calculated in memory. For example, the user was supposed to write 1KB data, but as a result, you calculated it in memory for an hour, and finally you wrote the 10KB data to disk. The write magnification is 10, and the reading is similar.
Write magnification: let's take RocksDB's Level Style Compaction mechanism as an example, this merging mechanism takes all the files in the upper layer and merges the next layer, and the size of the lower layer is r times larger than that of the previous layer. In this way, the write magnification of a single merge is r times, and whether it is r times or 1 times is related to the specific implementation. Let's give an example.
If there are three layers now, the file sizes are: 9, 9, 90, 90, 900, and 10. Write another 1, at this time will continue to merge, 1, 9, 10, 10, 10, 90, 100, 100, 900, 1000. A total of 10 '100' 1000 was written. Normally, the magnification should be 1110 level 1, but this is not what is said in various papers. What is said in the paper is that the sum on the right side of the equal sign is greater than the left sum of the plus sign, that is, 10pm 1 + 100max 10 + 1000max 100 = 30 = r * prime. I feel that writing magnification is a process, it is not accurate to measure with a number, and this is only the worst-case scenario.
Read magnification: to query the data of an 1KB. At worst, you need to read 8 files in L0 layer, and then read each file from L1 to L6, a total of 14 files. Inside each file, you need to read 16KB's index, 4KB's Bloom filter, and 4KB's data blocks (it doesn't matter if you don't understand, as long as you know how to look up a key from a SSTable, you need to read so much). A total of 24,14ppm, 1x and 336 times. The smaller the key-value, the larger the magnification.
Summary
The content of LSM-tree and the design idea of LevelDB are introduced, which mainly includes three parts: WAL,memtable,SStable before writing. Merge layer by layer and search layer by layer. The main disadvantage of LSM-tree is read-write amplification, which can be reduced by some other strategies.
Thank you for your reading, the above is the content of "the basic principles and applications of LSM-tree". After the study of this article, I believe you have a deeper understanding of the basic principles and applications of LSM-tree, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.