
LevelDB code up!


We have finished covering the general principles of LevelDB. In this section we will get hands-on with the various features of LevelDB using the Java third-party library leveldbjni. This library uses Java Native Interface (JNI) technology to wrap the C++ implementation of LevelDB into an API for the Java platform. Bindings in other languages wrap LevelDB with similar native-interface techniques.

Maven dependency

Add the Maven dependency below. LevelDB is compiled into a different dynamic link library on each operating system platform, so leveldbjni publishes per-platform artifacts (the one below targets 64-bit Linux); there is also an all-platform jar, leveldbjni-all, that bundles the dynamic link libraries for every supported platform.

<dependency>
    <groupId>org.fusesource.leveldbjni</groupId>
    <artifactId>leveldbjni-linux64</artifactId>
    <version>1.8</version>
</dependency>

Put, get and delete APIs

In this example we will create a LevelDB database automatically, insert one million key-value pairs into it, read them all back, and then delete them all. The example takes about 10 seconds to run on my Mac. In other words, the average read/write throughput is roughly 300,000 operations per second, which looks comparable to Redis. But this is probably because the key-value pairs are small; in a real production environment the performance would not be this high, and it should be at least a little slower than Redis.

import static org.fusesource.leveldbjni.JniDBFactory.factory;

import java.io.File;
import java.io.IOException;

import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

public class Sample {
    public static void main(String[] args) throws IOException {
        Options options = new Options();
        options.createIfMissing(true);
        DB db = factory.open(new File("/tmp/lvltest"), options);
        try {
            for (int i = 0; i < 1000000; i++) {
                byte[] key = new String("key" + i).getBytes();
                byte[] value = new String("value" + i).getBytes();
                db.put(key, value);
            }
            for (int i = 0; i < 1000000; i++) {
                byte[] key = new String("key" + i).getBytes();
                byte[] value = db.get(key);
                String targetValue = "value" + i;
                if (!new String(value).equals(targetValue)) {
                    System.out.println("something wrong!");
                }
            }
            for (int i = 0; i < 1000000; i++) {
                byte[] key = new String("key" + i).getBytes();
                db.delete(key);
            }
        } finally {
            db.close();
        }
    }
}

Now let's look at the database directory and see what LevelDB has created.
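A freshly populated database directory typically contains files along the following lines (illustrative only; the numeric file names are assigned internally by LevelDB and will differ from run to run):

000005.sst
000013.sst
000027.sst
...
000031.log
CURRENT
LOCK
LOG
MANIFEST-000002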

We can see many files with the sst extension in this directory; these are LevelDB's on-disk data files. Some versions of LevelDB use the extension ldb instead, but the internal format is the same, only the extension differs. There is also a file with the log extension, which is the operation log; it records the write operations of the most recent period. The remaining files with uppercase names can be left alone for now; we will explain them in detail later.

Delete all the files in this directory and the library will be completely empty.

You might wonder: all the data in the example above was eventually deleted, so why are there still so many sst data files? This is because LevelDB's delete operation does not remove the key-value pair right away. Instead it turns the deletion into another write, storing a special key-value pair whose value part is a special deletion marker (a tombstone).

The corresponding key-value pair will not be deleted until LevelDB triggers data merging (compact) under certain conditions.
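To make the tombstone idea concrete, here is a minimal toy sketch of an in-memory table (my own illustration under simplified assumptions, not LevelDB's actual code): a delete is recorded as just another write whose value is a special marker, reads treat that marker as "not found", and only a later compaction physically drops the entry.

import java.util.TreeMap;

// Toy model of delete-as-tombstone; illustration only, not LevelDB internals.
class TinyLsm {
    private static final byte[] TOMBSTONE = new byte[0];        // special "deleted" marker
    private final TreeMap<String, byte[]> table = new TreeMap<>();

    void put(String key, byte[] value) { table.put(key, value); }

    // delete is just another write: the key stays, its value becomes a tombstone
    void delete(String key) { table.put(key, TOMBSTONE); }

    byte[] get(String key) {
        byte[] v = table.get(key);
        return (v == null || v == TOMBSTONE) ? null : v;        // tombstone reads as "missing"
    }

    // only a compaction finally drops the tombstoned entries
    void compact() { table.values().removeIf(v -> v == TOMBSTONE); }
}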

Data merging

LevelDB provides an API for triggering data merging (compaction) manually. Let's compact the database by hand and see what the directory looks like afterwards.

public void compactRange(byte[] begin, byte[] end)

This API compacts the data within a key range. If the begin parameter is null, compaction starts from the very first key; if end is null, it continues all the way to the last key.

public static void main(String[] args) throws IOException {
    Options options = new Options();
    options.createIfMissing(true);
    DB db = factory.open(new File("/tmp/lvltest"), options);
    try {
        db.compactRange(null, null);
    } finally {
        db.close();
    }
}

After running for a little over a second, we can see that the sst files in the directory are gone.

Even if we do not call the compaction API manually, LevelDB has its own strategy for compacting data periodically.

Read performance

If we add timing to the code above and compare read and write performance, you will notice an interesting phenomenon: writes are faster than reads, even though every read in this example is a hit.

Put: 3150ms

Get: 4128ms

Delete: 1983ms

This is because a write only needs to append to the operation log and update the in-memory table, while a read may miss in memory and then have to go to disk. The gap is not dramatic here because LevelDB caches the most recently read block and because my machine's disk is an SSD with good read performance; on an ordinary spinning disk the difference would be much more obvious.

Next, if we change the reads to random reads, the gap between read and write performance becomes much larger.

for (int i = 0; i < 1000000; i++) {
    int index = ThreadLocalRandom.current().nextInt(1000000);
    byte[] key = new String("key" + index).getBytes();
    db.get(key);
}

-

Put: 3094ms

Get: 9781ms

Delete: 1969ms

At this point, block caching can be used to improve read performance.

// set a 100MB block cache
options.cacheSize(100 * 1024 * 1024);

-

Put: 2877ms

Get: 4758ms

Delete: 1981ms

Synchronous vs async

As mentioned in the previous section, LevelDB also provides an API for synchronous writes, which guarantees that put does not return until the operation log has been flushed to disk. Its performance is significantly worse than that of a normal write, so let's compare the two.

public static void main(String[] args) throws IOException {
    long start = System.currentTimeMillis();
    Options options = new Options();
    options.createIfMissing(true);
    DB db = factory.open(new File("/tmp/lvltest"), options);
    try {
        for (int i = 0; i < 1000000; i++) {
            byte[] key = new String("key" + i).getBytes();
            byte[] value = new String("value" + i).getBytes();
            WriteOptions wo = new WriteOptions();
            wo.sync(true);
            db.put(key, value, wo);
        }
    } finally {
        db.close();
    }
    long end = System.currentTimeMillis();
    System.out.println(end - start);
}

The synchronous-write version above took more than 90 seconds. With the sync option removed it takes only a bit over 3 seconds, a performance gap of roughly 30x. Let's modify the code slightly to sync at intervals, that is, to perform a synchronous write once every N writes, with N = 100.

WriteOptions wo = new WriteOptions();
wo.sync(i % 100 == 0);

The running time drops to under 5 seconds. Changing N to 10 brings it to just under 12 seconds. Even at 12 seconds the average write throughput is about 80,000 writes per second, which is still quite respectable.

Normal write vs batch write

LevelDB provides a batch write operation. Will it speed up execution the way Redis pipelining does? Let's use WriteBatch and compare it against normal writes to see how large the performance gap is.

public static void main(String[] args) throws IOException {
    long start = System.currentTimeMillis();
    Options options = new Options();
    options.createIfMissing(true);
    DB db = factory.open(new File("/tmp/lvltest"), options);
    try {
        WriteBatch batch = db.createWriteBatch();
        for (int i = 0; i < 1000000; i++) {
            byte[] key = new String("key" + i).getBytes();
            byte[] value = new String("value" + i).getBytes();
            batch.put(key, value);
            if (i % 100 == 0) {
                db.write(batch);
                batch.close();
                batch = db.createWriteBatch();
            }
        }
        db.write(batch);
        batch.close();
    } finally {
        db.close();
    }
    long end = System.currentTimeMillis();
    System.out.println(end - start);
}

With the batch size N set to 10, 100, and 1000, the running time turns out to be roughly the same, a little over 2 seconds. This means that making the batches larger does not keep increasing write throughput. However, if N is set to 1, the running time is about the same as with normal writes, roughly 3 seconds; with N = 2, 5, and so on, the time shrinks and then stabilizes at a bit over 2 seconds, after which raising N has no obvious effect. So batch writes are indeed somewhat faster than normal writes, but the difference is not huge. This is unlike Redis pipelining, which brings a dramatic improvement by eliminating network round-trips; LevelDB is a purely local, embedded database with no network overhead at all.

So why are batch writes still faster than normal writes? To answer this we need to look at the LevelDB source code. The relevant logic is simple enough that everyone should be able to follow it, so it is posted directly below.

Status DB::Put(const WriteOptions& opt, const Slice& key, const Slice& value) {
  WriteBatch batch;
  batch.Put(key, value);
  return Write(opt, &batch);
}

Clearly, every normal write is ultimately converted into a batch write with N = 1. This explains why a batch write with N = 1 performs about the same as a normal write.

Tracing further into the WriteBatch code, we find that every batch write must acquire a mutex. When the batch size N is larger, the lock is acquired fewer times per key-value pair on average, so overall performance improves; it does not improve by much, though, because the cost of the lock itself is not very high. This also means that write performance degrades under multiple threads, since contention for this lock adds overhead.

Why can a batch write guarantee that its internal sequence of operations is atomic? Precisely because this mutex makes the write path effectively single-threaded. And it is this coarse-grained lock that severely limits LevelDB's write performance, which became a key optimization target for RocksDB, its successor.
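The effect of that lock can be seen with a small standalone sketch (my own illustration, not LevelDB's actual write path): with N = 1 every record pays for a lock/unlock pair, while a batch pays for it only once, so the per-record locking cost shrinks as the batch grows, and concurrent writers serialize on the same mutex.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Toy illustration of how batching amortizes lock overhead.
class BatchedWriter {
    private final ReentrantLock mutex = new ReentrantLock();
    private final List<byte[]> log = new ArrayList<>();

    // normal write: one lock/unlock per record (N = 1)
    void putOne(byte[] record) {
        mutex.lock();
        try { log.add(record); } finally { mutex.unlock(); }
    }

    // batch write: one lock/unlock for the whole group, however large it is
    void putBatch(List<byte[]> batch) {
        mutex.lock();
        try { log.addAll(batch); } finally { mutex.unlock(); }
    }
}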

Snapshots and traversal

LevelDB provides snapshot reads, which guarantee that reads of the same key within the same snapshot return consistent data, avoiding "non-repeatable reads". Let's traverse the database through a snapshot while modifying the values of the traversed keys, and see whether snapshot reads are isolated from those writes.

public static void main(String[] args) throws IOException {
    Options options = new Options();
    options.createIfMissing(true);
    DB db = factory.open(new File("/tmp/lvltest"), options);
    try {
        for (int i = 0; i < 10000; i++) {
            String padding = String.format("%04d", i);
            byte[] key = new String("key" + padding).getBytes();
            byte[] value = new String("value" + padding).getBytes();
            db.put(key, value);
        }
        Snapshot ss = db.getSnapshot();
        // scan
        scan(db, ss);
        // modify
        for (int i = 0; i < 10000; i++) {
            String padding = String.format("%04d", i);
            byte[] key = new String("key" + padding).getBytes();
            byte[] value = new String("!value" + padding).getBytes(); // modified value
            db.put(key, value);
        }
        // rescan
        scan(db, ss);
        ss.close();
    } finally {
        db.close();
    }
}

private static void scan(DB db, Snapshot ss) throws IOException {
    ReadOptions ro = new ReadOptions();
    ro.snapshot(ss);
    DBIterator it = db.iterator(ro);
    int k = 0;
    // it.seek(someKey); // start traversing from a specified position
    it.seekToFirst(); // traverse from the beginning
    while (it.hasNext()) {
        Map.Entry<byte[], byte[]> entry = it.next();
        String key = new String(entry.getKey());
        String value = new String(entry.getValue());
        String padding = String.format("%04d", k);
        String targetKey = new String("key" + padding);
        String targetVal = new String("value" + padding);
        if (!targetKey.equals(key) || !targetVal.equals(value)) {
            System.out.printf("something wrong");
        }
        k++;
    }
    System.out.printf("total %d\n", k);
    it.close();
}

-

Total 10000

Total 10000

The two traversals return exactly the same data; the writes in between have no effect at all on what the snapshot sees, which is exactly what we want. So how do snapshots work?

The principle behind snapshots is surprisingly simple. Because of updates, each key-value pair in the database can have multiple versions of its value. Before the data files are compacted and merged, the same key may exist in several files, each holding a different version. Each version is tagged with a globally unique, monotonically increasing sequence number. A snapshot simply records the current value of that counter, and a read through the snapshot only considers versions whose sequence number is no greater than the snapshot's recorded value.
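Here is a rough sketch of that idea (my own simplification, not LevelDB's real data structures): every write is stamped with a global sequence number, a snapshot is nothing more than the counter value at the moment it was taken, and a snapshot read returns the newest version stamped no later than that value.

import java.util.ArrayList;
import java.util.List;

// Toy sketch of sequence-number-based snapshot visibility for a single key.
class VersionedKey {
    static long globalSeq = 0;                        // global, monotonically increasing counter

    static class Version {
        final long seq;
        final String value;
        Version(long seq, String value) { this.seq = seq; this.value = value; }
    }

    private final List<Version> versions = new ArrayList<>();   // newest last

    void put(String value) { versions.add(new Version(++globalSeq, value)); }

    static long takeSnapshot() { return globalSeq; }  // a snapshot is just the current counter value

    // read under a snapshot: newest version whose seq is not greater than the snapshot's counter
    String get(long snapshotSeq) {
        for (int i = versions.size() - 1; i >= 0; i--) {
            if (versions.get(i).seq <= snapshotSeq) return versions.get(i).value;
        }
        return null;
    }
}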

Since a key can have multiple versions, how does a traversal avoid returning the same key more than once? We will look into this question in depth later.

Bloom filter

leveldbjni does not wrap the Bloom filter feature that LevelDB provides, so to try out the Bloom filter we have to use another language; here I use the levigo library for Go.

# install the leveldb and snappy libraries
$ brew install leveldb

# then install levigo
$ go get github.com/jmhodges/levigo

In this example we will write much more data, 10 million key-value pairs, so that LevelDB builds a deeper level structure as the data volume grows. To construct read misses, we write only even-numbered keys and then randomly read odd-numbered keys. We then compare read performance before and after enabling the Bloom filter.

package main

import (
	"fmt"
	"math/rand"
	"time"

	"github.com/jmhodges/levigo"
)

func main() {
	options := levigo.NewOptions()
	options.SetCreateIfMissing(true)
	// each key occupies 10 bits in the Bloom filter
	// options.SetFilterPolicy(levigo.NewBloomFilter(10))
	db, _ := levigo.Open("/tmp/lvltest", options)

	start := time.Now().UnixNano()
	for i := 0; i < 10000000; i++ {
		key := []byte(fmt.Sprintf("key%d", i*2))
		value := []byte(fmt.Sprintf("value%d", i*2))
		wo := levigo.NewWriteOptions()
		db.Put(wo, key, value)
	}
	duration := time.Now().UnixNano() - start
	fmt.Println("put:", duration/1e6, "ms")

	start = time.Now().UnixNano()
	for i := 0; i < 10000000; i++ {
		index := rand.Intn(10000000)
		key := []byte(fmt.Sprintf("key%d", index*2+1))
		ro := levigo.NewReadOptions()
		db.Get(ro, key)
	}
	duration = time.Now().UnixNano() - start
	fmt.Println("get:", duration/1e6, "ms")

	start = time.Now().UnixNano()
	for i := 0; i < 10000000; i++ {
		key := []byte(fmt.Sprintf("key%d", i*2))
		wo := levigo.NewWriteOptions()
		db.Delete(wo, key)
	}
	duration = time.Now().UnixNano() - start
	fmt.Println("delete:", duration/1e6, "ms")
}

-

Put: 61054ms

Get: 104942ms

Delete: 47269ms

Then we uncomment the filter-policy line to enable the Bloom filter and observe the results.

Put: 57653ms

Get: 36895ms

Delete: 57554ms

It is clear that read performance improved by nearly a factor of three, which is a remarkable gain. Keeping the Bloom filter on under these read misses, let's now also enable the block cache and see whether read performance improves further.

Put: 57022ms

Get: 37475ms

Delete: 58999ms

The conclusion is that the block cache barely helps for read misses once the Bloom filter is enabled. That does not mean the block cache is useless: for reads that hit, it still plays a very important role.

Bloom filters significantly improve read performance, but they also cost a certain amount of extra disk space, since LevelDB has to store the filter's binary data in its data files. The filter's share of the total space is not high, though, and is well within an acceptable range.
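For a rough sense of the cost, assume the common setting of 10 bits per key (the value used in the levigo example above): 10 million keys then need about 10,000,000 × 10 / 8 bytes, roughly 12 MB of filter data spread across the sst files, and at 10 bits per key the false-positive rate is on the order of 1%. Compared with the size of the key-value data itself, that overhead is usually easy to accept.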

Compression

LevelDB uses Snappy as its compression algorithm: its compression ratio is modest, but compression and decompression are fast and consume little CPU. The official recommendation is not to turn compression off, but my tests found that disabling compression noticeably improves read performance. On the other hand, disabling compression means your data takes several times more disk space, which is not cheap either.

public static void main(String[] args) throws IOException {
    Options options = new Options();
    options.createIfMissing(true);
    options.compressionType(CompressionType.NONE);
    DB db = factory.open(new File("/tmp/lvltest"), options);
    try {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            byte[] key = new String("key" + 2 * i).getBytes();
            byte[] value = new String("value" + 2 * i).getBytes();
            db.put(key, value);
        }
        long duration = System.currentTimeMillis() - start;
        System.out.printf("put:%dms\n", duration);

        start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            int index = ThreadLocalRandom.current().nextInt(1000000);
            byte[] key = new String("key" + (2 * index + 1)).getBytes();
            db.get(key);
        }
        duration = System.currentTimeMillis() - start;
        System.out.printf("get:%dms\n", duration);

        start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            byte[] key = new String("key" + 2 * i).getBytes();
            db.delete(key);
        }
        duration = System.currentTimeMillis() - start;
        System.out.printf("delete:%dms\n", duration);
    } finally {
        db.close();
    }
}

-

Put:3785ms

Get:6475ms

Delete:1935ms

Next, let's turn compression back on and compare the results; the read performance gap is nearly a factor of two.

options.compressionType(CompressionType.SNAPPY);

-

Put:3804ms

Get:11644ms

Delete:2750ms

The next section will delve into the principle of LevelDB implementation, starting with the macro structure of LevelDB.
