This article introduces the functional features of LevelDB, walking through its main API and configuration options.
Open and close
LevelDB stores its data in a dedicated directory containing many data files, log files, and so on. You open this directory through the LevelDB API and get back a db reference, which is then used for all read and write operations. The code below is pseudocode written in a Java style.
class LevelDB {
    public static LevelDB open(String dbDir, Options options);
    void close();  // close the database
}
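A minimal usage sketch against this pseudo-API (the directory path is illustrative):

Options options = new Options();
LevelDB db = LevelDB.open("/tmp/ldb", options);
// ... read and write through the db reference ...
db.close();  // release the directory so another process can open it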
There are many options to configure when opening the database, such as setting block cache size, compression, etc.
Basic API
LevelDB works much like a HashMap, but it is slightly weaker: the put method does not return the old value, and the delete operation does not tell you whether the key actually existed.
class LevelDB {
    byte[] get(byte[] key);
    void put(byte[] key, byte[] value);
    void delete(byte[] key);
    ...
}
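A simple read/write sketch using these methods; the keys and values here are illustrative UTF-8 strings, and get is assumed to return null for a missing key:

byte[] key = "counter".getBytes();
db.put(key, "1".getBytes());
byte[] value = db.get(key);  // null if the key does not exist
db.delete(key);              // no indication of whether the key was present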
Atomic batch processing
With multiple consecutive writes, a crash may leave only some of them completed. For this reason, LevelDB provides batch processing. A batch works like a transaction: LevelDB guarantees that the whole series of operations is applied atomically, either all or none.
class WriteBatch {
    void put(byte[] key, byte[] value);
    void delete(byte[] key);
}

class LevelDB {
    ...
    void write(WriteBatch wb);
}
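For example, moving a value from one key to another can be expressed as a single atomic batch (the key names and payload are illustrative):

WriteBatch wb = new WriteBatch();
wb.delete("old-key".getBytes());
wb.put("new-key".getBytes(), "payload".getBytes());
db.write(wb);  // both operations are applied together, or not at all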
Log file
When we call LevelDB's put method, the data is first recorded in memory and only later persisted to disk according to a specific strategy. This raises a problem: if the machine suddenly goes down, data not yet written to disk is lost. LevelDB therefore adopts a strategy similar to Redis's AOF log: it first appends a log of the modification to a disk file, and only then carries out the actual write processing.
In this way, even if an outage occurs, the database can be restored from the log file when it restarts.
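A rough sketch of this write path, assuming an append-only log that is written before the in-memory table is updated (logFile, memTable and encodeRecord are illustrative names, not LevelDB's real internals):

// inside put, roughly:
void put(byte[] key, byte[] value) {
    logFile.append(encodeRecord(key, value));  // 1. record the operation in the write-ahead log
    memTable.put(key, value);                  // 2. apply the change to the in-memory table
}
// on restart, log records that never reached the data files are replayed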
Secure write (synchronous write)
Readers familiar with Redis know that its AOF write strategy has several configurations controlling how often the log file is synced to disk: the higher the frequency, the less data is lost in an outage. Syncing dirty file data from the kernel to disk requires disk IO and hurts access performance, so it is usually not done too frequently.
LevelDB is similar. With the ordinary (non-synchronous) writes above, the API call returns success, but if the machine goes down the corresponding operation log may still be lost. LevelDB therefore provides safe, synchronous write operations, at the cost of worse performance.
class LevelDB {
    ...
    void putSync(byte[] key, byte[] value);
    void deleteSync(byte[] key);
    void writeSync(WriteBatch wb);
}
There is always a tradeoff between safety and performance, so a common compromise is to issue a synchronous write every few milliseconds or every N ordinary writes. This bounds how much data can be lost while keeping write performance acceptable.
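A sketch of such a compromise, forcing one synchronous write out of every 100 (the entries collection and Entry type are illustrative, not part of the pseudo-API):

int counter = 0;
for (Entry e : entries) {
    if (++counter % 100 == 0) {
        db.putSync(e.key, e.value);  // periodically force the log onto disk
    } else {
        db.put(e.key, e.value);      // ordinary fast write
    }
}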
Concurrency
LevelDB's disk files live in a single directory containing many data and log files. Multiple processes cannot open this directory at the same time for read/write access through the LevelDB API, but within a single process the API is safe for concurrent use by multiple threads; LevelDB uses internal locks to coordinate concurrent operations.
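A minimal sketch of sharing one db reference between threads in the same process (the thread bodies are illustrative):

LevelDB db = LevelDB.open("/tmp/ldb", new Options());
Thread writer = new Thread(() -> db.put("k".getBytes(), "v".getBytes()));
Thread reader = new Thread(() -> db.get("k".getBytes()));
writer.start();
reader.start();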
Traversal
Keys in LevelDB are ordered, arranged from small to large in lexicographic order. LevelDB provides a traversal API for visiting all key-value pairs in order, and you can start the traversal from any position in the middle.
class LevelDB {
    ...
    Iterator scan(byte[] startKey, byte[] endKey, int limit);
}
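In the same pseudocode style, a range traversal might look like this (the range bounds and limit are illustrative, and KeyValue and handle are assumed helper names):

Iterator it = db.scan("user:0000".getBytes(), "user:9999".getBytes(), 100);
while (it.hasNext()) {
    KeyValue kv = it.next();
    handle(kv.key, kv.value);  // keys arrive in ascending order
}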
Snapshot isolation
LevelDB supports multi-threaded concurrent reads and writes, which means two consecutive reads of the same key may return different data, because another thread may modify it in between. In database theory this is called a non-repeatable read. LevelDB provides a snapshot mechanism to ensure that, within the same snapshot, consecutive read and write operations are not affected by modifications made by other threads.
class Snapshot {
    byte[] get(byte[] key);
    void put(byte[] key, byte[] value);
    void delete(byte[] key);
    void write(WriteBatch wb);
    ...
    void close();  // close the snapshot
}

class LevelDB {
    ...
    Snapshot getSnapshot();
}
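A brief usage sketch against this pseudo-API: reads taken through the snapshot all see the same frozen view, regardless of concurrent writes (the key name is illustrative):

Snapshot snapshot = db.getSnapshot();
byte[] before = snapshot.get("balance".getBytes());
// ... other threads may modify "balance" here ...
byte[] after = snapshot.get("balance".getBytes());  // same value as before
snapshot.close();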
Although the snapshot is magical, its principle is actually very simple, which we will explain later.
Custom Key comparator
LevelDB's keys are sorted lexicographically by default, but it also lets you register a custom collation, for example one that sorts keys numerically. You must make sure the collation stays the same for the entire life of the database, because the ordering determines how key-value pairs are laid out on disk, and that layout cannot be changed dynamically.
Options options = new Options();
options.comparator = new CustomComparator();
LevelDB db = LevelDB.open("/tmp/ldb", options);
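As an illustration of what such a comparator could look like, the sketch below orders keys by their numeric value; the Comparator interface shape is an assumption for this pseudocode, not LevelDB's real API:

class CustomComparator implements java.util.Comparator<byte[]> {
    @Override
    public int compare(byte[] a, byte[] b) {
        // treat both keys as decimal numbers encoded as ASCII text
        long x = Long.parseLong(new String(a));
        long y = Long.parseLong(new String(b));
        return Long.compare(x, y);
    }
}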
Custom comparators are dangerous and should be used with caution: a badly chosen comparison algorithm seriously hurts storage efficiency. If you really do have to change the collation, you need to plan ahead. There is a special technique for this, but it requires understanding the details of disk storage, so we will discuss it in detail later.
Data block
LevelDB stores data on disk in data blocks, with a default block size of 4 KB. Moderately increasing the block size benefits large batch traversal operations; if random reads are more frequent, smaller blocks perform slightly better, so a compromise is needed.
Options options = new Options();
options.blockSize = 8192;  // 8 KB
LevelDB db = LevelDB.open("/tmp/ldb", options);
The block should not be made too small (below 1 KB) or too large (several MB). Such extreme settings bring little performance improvement and greatly increase the variance of the database's performance across different read/write patterns. Take the middle road and stay close to the default block size. Once the database is initialized, the block size cannot be changed.
Compression
Compression of LevelDB's disk storage is enabled by default, using the Snappy algorithm that is common in the industry. Its compression is very fast, so there is little need to worry about performance loss. If you do not want compression, you can also turn it off, but doing so usually brings no significant performance improvement, so it is best left alone.
Options options = new Options();
options.compression = CompressionType.kSnappyCompression;
// options.compression = CompressionType.kNoCompression;  // turn off compression
LevelDB db = LevelDB.open("/tmp/ldb", options);
Block caching
LevelDB keeps recently read and written hot data in memory. If the requested data is not found there, it has to be looked up in disk files, which is much slower. To reduce the number of disk file lookups, LevelDB adds a block cache, which holds the decompressed contents of recently and frequently used blocks.
Options options = new Options();
options.blockCache = LevelDB.NewLRUCache(100 * 1024 * 1024);  // 100 MB
LevelDB db = LevelDB.open("/tmp/ldb", options);
The block cache is not enabled by default; it can be set manually in the options when opening the database. The block cache takes up some memory, but it usually does not need to be large: around 100 MB is enough, and making it much bigger brings no obvious further improvement.
You also need to pay attention to the effect of traversal on the cache. To prevent a traversal from flushing a lot of cold data into the block cache, you can set the fill_cache option when traversing, which controls whether blocks read from disk during the traversal are also placed in the cache.
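A sketch of how such a read option might be passed, assuming a ReadOptions-style holder and a scan overload that accepts it; these shapes are assumptions in the spirit of the pseudo-API above, not LevelDB's exact interface:

ReadOptions readOptions = new ReadOptions();
readOptions.fillCache = false;  // blocks read by this scan will not evict hot cache entries
Iterator it = db.scan(startKey, endKey, limit, readOptions);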
Bloom filter
A disk lookup caused by a miss in memory is a time-consuming operation. To further reduce the number of disk reads, LevelDB can add a Bloom filter to each disk file. The filter costs a certain amount of disk space, but in return it can greatly reduce the number of disk reads. The Bloom filter data is stored after the data blocks in the disk file.
LevelDB stores its disk files in levels. A lookup first searches level 0; if the key is not found, it continues to level 1, and so on down to the bottom level. Looking up a key that does not exist can therefore touch many disk files and be very time-consuming. The Bloom filter can save more than 95% of the time spent searching disk files.
The Bloom filter is similar to an in-memory Set: it stores fingerprint information for all keys within the range covered by a given disk file. When the fingerprint of a key cannot be found in the set, the key certainly does not exist.
If the fingerprint is found, the key only might exist, because different keys can produce the same fingerprint; this is the Bloom filter's false positive rate. The lower the false positive rate, the more fingerprint information is needed per key and the more memory is consumed.
If the Bloom filter could tell exactly whether a key exists, there would be no false positives and no wasted disk reads. That extreme form of a Bloom filter is a HashSet with every key stored in memory, whose memory cost is of course unacceptable.
Options options = new Options();
// each key gets a 10-bit fingerprint
options.filterPolicy = LevelDB.NewBloomFilterPolicy(10);
LevelDB db = LevelDB.open("/tmp/ldb", options);
When using Bloom filters, we trade memory consumption against performance. If you want to understand the principle of the Bloom filter in depth, the book Redis Deep Adventure has a separate chapter on its internals.
The Bloom filter is not enabled by default; you need to set the filter_policy option when opening the database for it to take effect. The Bloom filter is the last line of defense for reducing disk reads. Its bitmap data is stored in the disk file, but it is cached in memory while in use.
Data verification
LevelDB has a strict data verification mechanism: checksums are computed at the granularity of 4 KB data blocks. The checksums cost a little storage space and computation time, but when a block is corrupted they allow the healthy data to be recovered more precisely.
class LevelDB {
    ...
    public static void repairDB(String dbDir, Options options);
}
The mandatory verification option is not enabled by default when opening the database. If it is enabled, an error is reported whenever a checksum mismatch is encountered. If the data really is damaged, LevelDB also provides the repairDB() method to help recover as much data as possible.
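A sketch of enabling strict verification and falling back to repair; the paranoidChecks field is an assumption modeled on LevelDB's paranoid_checks option, and the exception type is illustrative:

Options options = new Options();
options.paranoidChecks = true;  // report checksum errors instead of silently ignoring them
LevelDB db;
try {
    db = LevelDB.open("/tmp/ldb", options);
} catch (Exception corruption) {
    LevelDB.repairDB("/tmp/ldb", options);  // salvage as much data as possible
    db = LevelDB.open("/tmp/ldb", options);
}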