2025-02-24 Update From: SLTechnology News&Howtos > Database
Today I will talk to you about what the Bitcask model is, since many people may not know much about it. To help you understand it better, I have summarized the following content; I hope you get something out of this article.
Bitcask is a log-structured, hash-indexed key-value storage model, notable for its simple and efficient design. Below I will explain what the Bitcask model is.
What is the Bitcask model?
1. Log data file
What does "log type" mean? It means append-only: all write operations only append new data and never modify old data, much like a server log. In the Bitcask model, data is written to files in this log style, and each file has a size limit. When a file grows to that limit, a new file is created; the old file is then only read, never written. At any point in time only one file is writable; the Bitcask model calls it the active data file, while the files that have reached the size limit are called older data files, as shown in the following figure:
The structure inside a file is very simple: it is a sequence of write records, each laid out as follows:
Each record contains a CRC checksum of the remaining fields, a timestamp, the size of the key, the size of the value, the key, and the value. (A delete does not remove the old entry; instead it writes the key with a special tombstone value as a marker.)
The data file contains successive records in this format, as shown in the following figure:
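The record layout above can be sketched in code. This is a minimal illustration, not Bitcask's actual wire format: it assumes big-endian 4-byte fields in the order crc | timestamp | key_sz | value_sz | key | value, with the CRC covering everything after it.

```python
import struct
import time
import zlib

def encode_entry(key: bytes, value: bytes, timestamp=None) -> bytes:
    """Serialize one append-only log record; the CRC covers all later fields."""
    ts = int(time.time()) if timestamp is None else timestamp
    body = struct.pack(">III", ts, len(key), len(value)) + key + value
    crc = zlib.crc32(body)
    return struct.pack(">I", crc) + body

def decode_entry(buf: bytes):
    """Parse one record, verifying the CRC; returns (timestamp, key, value)."""
    crc, = struct.unpack_from(">I", buf, 0)
    body = buf[4:]
    if zlib.crc32(body) != crc:
        raise ValueError("corrupt entry")
    ts, ksz, vsz = struct.unpack_from(">III", body, 0)
    key = body[12:12 + ksz]
    value = body[12 + ksz:12 + ksz + vsz]
    return ts, key, value
```

Because the CRC is computed over the timestamp, sizes, key, and value, a torn or corrupted record at the tail of a file can be detected and discarded on recovery.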
That covers the log data files. If the data files just kept accumulating, they would grow without bound; to solve this problem, like other log-structured storage systems, Bitcask has a periodic merge operation.
The merge operation periodically scans all the records in the older data files and generates a new data file (the active data file is excluded because it is still being written). The merge collapses multiple writes of the same key, keeping only the latest one, so after each merge the newly generated data file contains no redundant data.
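The merge described above can be sketched as follows. For clarity this model represents each data file as an in-memory list of (key, value) pairs rather than on-disk records, and assumes a hypothetical tombstone marker for deletions:

```python
TOMBSTONE = b"__bitcask_tombstone__"  # hypothetical deletion marker

def merge(older_files):
    """Merge older data files (given oldest first): keep only the newest
    entry per key, and drop keys whose newest entry is a tombstone."""
    latest = {}
    for data_file in older_files:      # oldest file first...
        for key, value in data_file:   # ...records in append order
            latest[key] = value        # later writes overwrite earlier ones
    # The merged output contains exactly one live record per key.
    return [(k, v) for k, v in latest.items() if v != TOMBSTONE]
```

Note that tombstones can only be dropped during a full merge; until then the deletion marker must survive so that older versions of the key are not resurrected.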
2. Index data based on a hash table
The files above are the data files. Log-style data files make our write operations very fast (one advantage of the log style is that the disk is used like a tape: sequential reads and writes are very efficient). But looking up a key in such log data would be very inefficient, so we need some way to speed up lookups.
For example, Bigtable maintains a Bloom filter block for each data file, which is used to determine whether a value might be in that file.
The Bitcask model takes a different approach: an index data structure based on a hash table.
In the Bitcask model, besides the data files stored on disk, there is one more piece of data: a hash table kept in memory. Its job is to locate a value quickly given its key. The structure of the hash table is roughly as shown in the following figure:
Each hash-table entry contains three pieces of information used to locate the value: the file id (file_id), the position of the value in that file (value_pos), and the size of the value (value_sz). To get a value, we read value_sz bytes starting at value_pos in the file identified by file_id. The whole process is shown in the following figure:
Because of this extra hash table, every write must also update the corresponding hash-table entry. So a write operation costs one sequential disk write plus one in-memory operation.
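A minimal sketch of this read/write path, under simplifying assumptions (one active file, record headers omitted, the in-memory hash table modeled as a plain dict; the class and field names are illustrative, not Bitcask's actual API):

```python
import os
from dataclasses import dataclass

@dataclass
class KeydirEntry:
    file_id: int    # which data file holds the value
    value_pos: int  # byte offset of the value inside that file
    value_sz: int   # length of the value in bytes

class Bitcask:
    """Toy model: one active data file plus an in-memory hash table."""
    def __init__(self, directory):
        self.dir = directory
        self.file_id = 0
        self.active = open(os.path.join(directory, "0.data"), "wb")
        self.keydir = {}

    def put(self, key: bytes, value: bytes) -> None:
        pos = self.active.tell()
        self.active.write(key + value)   # one sequential append (headers omitted)
        self.active.flush()
        # ...plus one in-memory hash-table update.
        self.keydir[key] = KeydirEntry(self.file_id, pos + len(key), len(value))

    def get(self, key: bytes) -> bytes:
        e = self.keydir[key]
        with open(os.path.join(self.dir, f"{e.file_id}.data"), "rb") as f:
            f.seek(e.value_pos)          # one seek, then read value_sz bytes
            return f.read(e.value_sz)
```

This is why a read costs at most one disk seek: the hash table resolves the key entirely in memory, and the file/offset/size triple pinpoints the value directly.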
3. Useful hintfile
At this point the Bitcask model is essentially complete. The hint file described in this section is a useful technique, though in my view not necessarily a mandatory feature of the model.
As we saw above, the hash table, which we call the index, lives in memory. Although individual implementations may add persistence guarantees of their own, the Bitcask model itself does not guarantee that the hash table survives a power failure or restart.
So, without extra work, rebuilding the hash table at startup requires scanning the data files in full, which can be very time-consuming when they are large. The Bitcask model therefore includes a component called the hint file, whose purpose is to speed up hash-table reconstruction.
We mentioned above that merging the older data files produces a new data file; at that point the Bitcask model also encourages generating a hint file. Each record in the hint file is very similar to a data-file record, except that instead of storing the value itself it stores the value's location (as in the hash table), as shown below:
This way, rebuilding the hash table no longer requires scanning all the data files; reading the hint file record by record and replaying it is enough, which greatly speeds up restarting the database from its data files.
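The rebuild step can be sketched as below. It assumes a hypothetical hint-row layout of (timestamp, key, file_id, value_pos, value_sz), mirroring the hash-table entry itself; only the location is stored, never the value:

```python
def rebuild_keydir_from_hints(hint_rows):
    """Rebuild the in-memory index from hint-file rows instead of
    rescanning full data files. hint_rows is an iterable of
    (timestamp, key, file_id, value_pos, value_sz) tuples."""
    keydir = {}
    for ts, key, file_id, value_pos, value_sz in hint_rows:
        current = keydir.get(key)
        # If the same key appears in several hint files, keep the newest row.
        if current is None or ts >= current[0]:
            keydir[key] = (ts, file_id, value_pos, value_sz)
    return keydir
```

Startup cost thus scales with the number of live keys (one small hint row each) rather than with the total bytes ever written to the data files.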
After reading the above, do you have a better understanding of what the Bitcask model is? Thank you for your support.