Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What is the reason for using B-tree in MongoDB

2025-04-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article is to share with you about the reasons for the use of B-tree in MongoDB, the editor thinks it is very practical, so I share it with you to learn. I hope you can get something after reading this article.

MongoDB is a general-purpose, document-oriented distributed database [^ 1], which is an official introduction to MongoDB. Different from the traditional relational database MySQL, Oracle and SQL Server,MongoDB, one of the most important features is "document-oriented". Due to the different ways of data storage, the external interface is no longer the well-known SQL, so it is divided into NoSQL,NoSQL relative to SQL, and many storage systems that we are familiar with are divided into NoSQL, such as Redis, DynamoDB [^ 2] and Elasticsearch.

Sql-and-nosq

NoSQL is often understood as there is no SQL (Non-SQL) or non-relational (Non-Relational) [^ 3], but some people understand it as not just SQL (Not Only SQL) [^ 4], digging into the meaning and origin of the word may not have much meaning, this second interpretation is often for marketing services, we just need to know that MongoDB stores data in a completely different way from traditional relational databases.

The architecture of MongoDB is very similar to that of MySQL. They both use pluggable storage engines to meet the different needs of users. Users can choose different storage engines according to data characteristics. The latest version of MongoDB uses WiredTiger as the default storage engine [^ 5].

Mongodb-architecture

As the default storage engine for MongoDB, WiredTiger uses B-tree as the underlying data structure for indexing, but in addition to B-tree, it also supports LSM tree as an optional underlying storage structure. The full name of LSM tree is Log-structured merge-tree. You can create a collection based on LSM tree (Collection) [^ 6] in MongoDB using the following command:

Db.createCollection ("posts", {storageEngine: {wiredTiger: {configString: "type=lsm"})

In this article, we will not only introduce why MongoDB's default storage engine WiredTiger chooses to use B-tree instead of B + tree, but also compare the performance and application scenarios between B-tree and LSM tree to help readers understand today's problems in a more comprehensive way.

Design

Now that we want to compare the differences between two different data structures and B-trees, here we will explain why B+ trees and LSM trees have not become the default data structures for WiredTiger in two sections:

As a non-relational database, MongoDB does not need to traverse data as strongly as a relational database. It pursues the performance of reading and writing individual records.

Most OLTP databases are faced with the scenario of reading more and writing less. B-tree and LSM-tree have greater advantages in this scenario.

Both of the above scenarios need to be faced and solved by MongoDB, so we will compare different data structures in these two common scenarios.

Non-relational type

In fact, we have mentioned many times that MongoDB is a non-relational document database. After abandoning the relational database system completely, it is very free in design and implementation. It no longer needs to follow the system of SQL and relational database, and it is more free to optimize specific scenarios, while traversing data in MongoDB hypothetical scenarios is not a common requirement.

Mysql-innodb-b-plus-tree

The B + tree is used in MySQL because only the leaf nodes of the B + tree store data, and each leaf node in the tree can be traversed sequentially by connecting each leaf node through a pointer, and traversing data is very common in relational databases, so there is no problem with this choice [^ 7].

The ultimate goal of MongoDB and MySQL choosing between multiple different data structures is to reduce the number of random IO required by queries. MySQL believes that queries traversing data are common, so it chooses B+ tree as the underlying data structure instead of storing data through non-leaf nodes, but MongoDB faces different problems:

Mongodb-wiredtiger-b-tree

Although queries traversing data are relatively common, MongoDB believes that querying a single data record is far more common than traversing data. Because non-leaf nodes of B-tree can also store data, the average number of random IO needed to query a piece of data will be less than that of B + tree, and MongoDB using B-tree will query faster than MySQL in similar scenarios. This is not to say that MongoDB cannot traverse the data. We can also use range in MongoDB to query a batch of records that meet the corresponding criteria, but it will take longer than MySQL.

SELECT * FROM comments WHERE created_at > '2019-01-01'

Many people who see queries that traverse data may think of scope queries as shown above, but what is more common in relational databases is the SQL shown below-- query foreign keys or all records where a field equals a value:

SELECT * FROM comments WHERE post_id = 1

The above query is not a scope query, it does not use >, < and other expressions, but it will query a series of records in the comments table. If there is an index post_id on the comments table, then the query may traverse the corresponding index in the index to find the comment that meets the conditions. This query will also benefit from the interconnected leaf nodes of the MySQL B+ tree, because it can reduce the number of random IO on disks.

MongoDB, as a non-relational database, uses a completely different approach to the design of collections. If we still use the table design idea of traditional relational databases to think about the design of collections in MongoDB, writing queries similar to those shown above will bring relatively poor performance:

Db.comments.find ({post_id: 1})

Because all nodes of the B-tree can store data, and there is no good way to connect successive nodes through pointers, the performance of the above query in the B-tree will be much worse than that of the B + tree, but this is not a recommended design method in MongoDB. It is more appropriate to use embedded documents to store post and all the comments that belongs to it:

{"_ id": "...", "title": "Why does MongoDB use B-tree", "author": "draven", "comments": [{"_ id": "...", "content": "you can't write this"} {"_ id": "...", "content": "the first floor is right"}]}

When using the above method to store data, we will not encounter queries like db.comments.find ({post_id: 1}). We just need to take out the post and get all the relevant comments. This way of design, which is different from the traditional relational database, needs to be reconsidered by all developers using MongoDB. This is also the biggest reason why many people use MongoDB only to find that the performance is not as good as MySQL-- using the wrong posture.

Some readers may wonder, since MongoDB believes that querying individual data records is far more common than queries that traverse data, why not use hashes as the underlying data structure?

Datastructures-and-query

If we use hashing, the complexity of the query for all individual records will be O (1), but the complexity of traversing data will be O (n). If you use B + tree, the complexity of single record query is O (log n), and the complexity of traversing data is O (log n) + X. one of these two different data structures provides the best performance of single record query, and the other provides the best performance of traversing data, but this does not satisfy the scenario faced by MongoDB-single record query is very common. However, relatively good performance support is also needed for traversing data. Hash, which is a data structure with extreme performance, can only be used in simple and extreme scenarios.

Read more and write less

The LSM tree is a disk-based data structure designed to provide a low-cost indexing mechanism for files that require high-frequency writes for a long time [^ 8]. Whether it is B-tree or B + tree, random disk writes are needed to write records to the index files composed of these data structures. The optimization logic of LSM tree is to convert random writes into sequential writes to optimize data writing at the expense of part of the read performance.

We will not elaborate on why LSM trees have good write performance in this article, but we will just analyze why WiredTiger uses B-trees as the default data structure. WiredTiger conducted a benchmark test on the read and write throughput of LSM tree and B tree [^ 9]. Through the benchmark test, we get the results shown in the following figure, from which we can find:

LSM_btree_Throughput

Without restricting writes

The write performance of LSM tree is 1.5 ~ 2 times higher than that of B tree.

The read performance of LSM tree is from 1 to 3 of B tree.

In the case of write restrictions

The write performance of LSM tree is almost the same as that of B tree.

The read performance of the LSM tree is 1 to 2 of the B tree.

In the case of write restriction, 30000 pieces of data are written per second, and from the analysis results here, the read performance of B-tree is much better than that of LSM-tree in either case. For most OLTP systems, queries are many times more written, so the excellent performance of the LSM tree in writing doesn't make it the default data format for MongoDB.

These are the reasons for the use of B-tree in MongoDB. The editor believes that there are some knowledge points that we may see or use in our daily work. I hope you can learn more from this article. For more details, please follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report