Talk about B + Tree 07/06 Update SLTechnology News&Howtos


2025-07-06 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

Two properties of the B+ tree data structure (from Wikipedia):

1. The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context - in particular, filesystems.

2. B+ trees have very high fanout (the number of pointers to child nodes in a node, typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree.

For point 2, see the diagram below. Each node contains pointers to the nodes on the next level. The more pointers a node holds, the lower the tree, and the lower the tree, the fewer I/Os are needed to retrieve data from a block device (usually a disk).
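To make the fanout/height relationship concrete, here is a minimal Python sketch under a simplified model (an assumption, not how any real engine sizes its pages): every page is full, each leaf holds `leaf_capacity` records, and each non-leaf node holds `fanout` child pointers. The function name `min_height` is hypothetical.

```python
def min_height(num_records: int, fanout: int, leaf_capacity: int) -> int:
    """Smallest B+ tree height able to hold num_records, assuming every page is full."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    height = 1
    capacity = leaf_capacity      # a single leaf page is a height-1 tree
    while capacity < num_records:
        capacity *= fanout        # each extra level multiplies capacity by the fanout
        height += 1
    return height


print(min_height(10_000_000, fanout=2, leaf_capacity=40))     # 19
print(min_height(10_000_000, fanout=1000, leaf_capacity=40))  # 3
```

With 10 million records and 40 records per leaf, a fanout of 2 needs 19 levels, while a fanout of 1000 needs only 3, i.e. 3 page reads per lookup instead of 19.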

In MySQL, different storage engines build their own ways of storing data on top of the B+ tree. InnoDB stores table data in a clustered index (called an index-organized table in Oracle), while MyISAM stores table data as a heap table, with B+ tree indexes pointing into it. The following figure shows the two organizations more intuitively.

In the clustered index of the upper-left figure:

a. Non-leaf nodes store <primary key, child page pointer> pairs.

b. Leaf nodes store the complete records, row by row.

In the secondary index of the lower-left figure:

a. Non-leaf nodes store <secondary key, child page pointer> pairs.

b. Leaf nodes store <secondary key, primary key> pairs.

In the MyISAM index structure on the right:

a. Non-leaf nodes store <key, child page pointer> pairs.

b. Leaf nodes store <key, record pointer> pairs, where the pointer points to the record in the data file.
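All three structures are searched the same way and differ only in what the leaf entries hold. Below is a minimal in-memory Python sketch, assuming simplified nodes (no sibling links, no page directory, no on-disk layout). `Leaf`, `Internal` and `search` are hypothetical names, and the separator convention (each separator key is the smallest key of its right-hand child) is an assumption of this sketch.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple, Union


@dataclass
class Leaf:
    keys: List[int]
    values: List[Any]        # full rows, primary keys, or record pointers, depending on the index


@dataclass
class Internal:
    keys: List[int]          # separator keys
    children: List["Node"]   # len(children) == len(keys) + 1


Node = Union[Leaf, Internal]


def search(root: Node, key: int) -> Tuple[Optional[Any], int]:
    """Return (value or None, number of pages visited on the root-to-leaf path)."""
    node, pages = root, 1
    while isinstance(node, Internal):
        # keys >= keys[i] live in children[i + 1], so bisect_right picks the child
        node = node.children[bisect_right(node.keys, key)]
        pages += 1
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i], pages
    return None, pages


leaf1 = Leaf([1, 2], ["row1", "row2"])
leaf2 = Leaf([3, 4], ["row3", "row4"])
root = Internal([3], [leaf1, leaf2])
print(search(root, 4))  # ('row4', 2)
```

Each loop iteration corresponds to reading one page, so the page count equals the tree height. For the upper-left clustered index, `values` would be full rows; for the lower-left secondary index, primary keys; for the MyISAM index on the right, record pointers.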

Let's look at how the efficient retrieval and high fanout of the B+ tree show up in the InnoDB storage engine, taking the upper-left figure as an example. Suppose the primary key is a BIGINT (8 bytes), each record is 400 bytes, the page size is 16 KB, and a pointer is 6 bytes. How many records can the index tree hold when its height is 1, 2, and 3?
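Under the stated assumptions the answer is direct arithmetic: a non-leaf page holds 16384 // (8 + 6) = 1170 <key, pointer> entries, and a leaf page holds 16384 // 400 = 40 records. A small Python sketch of that calculation; it ignores page headers, per-row overhead and reserved free space, so the results are upper bounds and real figures will be somewhat lower. `max_records` is a hypothetical helper name.

```python
PAGE_SIZE = 16 * 1024   # 16 KB page (assumption: no header or row overhead)
KEY_SIZE = 8            # BIGINT primary key
POINTER_SIZE = 6        # child page pointer
RECORD_SIZE = 400       # one row


def max_records(height: int) -> int:
    """Upper bound on rows a full clustered index of the given height can hold."""
    entries_per_nonleaf = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)  # 1170
    records_per_leaf = PAGE_SIZE // RECORD_SIZE                   # 40
    return entries_per_nonleaf ** (height - 1) * records_per_leaf


for h in (1, 2, 3):
    print(f"height {h}: {max_records(h):,} records")
```

This gives 40 records at height 1, 46,800 at height 2 and 54,756,000 at height 3, consistent with the "tens of millions" in the next paragraph.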

Even an ordinary SAS disk can complete about 200 I/Os per second. Retrieving one record from tens of millions of rows takes only 3 I/Os, i.e., about 0.015 seconds, which shows how efficient this is. With the SSDs now in general use, it is at least 50 times faster.
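The timing estimate is simple division. A sketch, assuming every one of the 3 page reads is a random disk I/O (in practice the upper levels of the tree usually stay cached in memory, for example in InnoDB's buffer pool, so real lookups often need fewer disk reads). The 200 IOPS and 50x figures are the article's; `lookup_seconds` is a hypothetical name.

```python
def lookup_seconds(ios: int, iops: float) -> float:
    """Time for `ios` random reads on a device that sustains `iops` random I/Os per second."""
    return ios / iops


SAS_IOPS = 200            # ordinary SAS disk, per the text
SSD_IOPS = SAS_IOPS * 50  # "at least 50 times faster", per the text

print(lookup_seconds(3, SAS_IOPS))  # 0.015
print(lookup_seconds(3, SSD_IOPS))  # 0.0003
```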

Finally, let's compare the pros and cons of the two storage methods.

1. Look at the second figure. When InnoDB retrieves data through a secondary index, the leaf node stores only the primary key, so after obtaining the primary key it must search the clustered index again to get the record. This is the table-return operation. In MyISAM, the primary index and the secondary indexes have equal status (except that primary key values cannot be NULL): every leaf stores a record pointer that leads directly to the row, so no table return is needed. From this point of view, MyISAM may be better suited to queries.
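A toy Python model of the two lookup paths: each dict stands in for one B+ tree, and each dict lookup for one root-to-leaf descent. The table, its contents and the function names are hypothetical. The point is only the number of index descents; the MyISAM path still reads the row from the data file, it just skips a second index.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str]  # (id, name): hypothetical two-column table

# InnoDB-style: the clustered index holds rows; the secondary index holds primary keys.
innodb_clustered: Dict[int, Row] = {1: (1, "alice"), 2: (2, "bob")}
innodb_by_name: Dict[str, int] = {"alice": 1, "bob": 2}

# MyISAM-style: rows live in a heap file; the index holds record pointers (here, list positions).
myisam_heap: List[Row] = [(1, "alice"), (2, "bob")]
myisam_by_name: Dict[str, int] = {"alice": 0, "bob": 1}


def innodb_get_by_name(name: str) -> Tuple[Optional[Row], int]:
    """Return (row, index descents). The second descent is the table return."""
    pk = innodb_by_name.get(name)       # descent 1: secondary index -> primary key
    if pk is None:
        return None, 1
    return innodb_clustered[pk], 2      # descent 2: clustered index -> row


def myisam_get_by_name(name: str) -> Tuple[Optional[Row], int]:
    """Return (row, index descents). The record pointer leads straight to the row."""
    ptr = myisam_by_name.get(name)      # descent 1: index -> record pointer
    if ptr is None:
        return None, 1
    return myisam_heap[ptr], 1          # direct read from the heap, no second index


print(innodb_get_by_name("bob"))  # ((2, 'bob'), 2)
print(myisam_get_by_name("bob"))  # ((2, 'bob'), 1)
```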

2. For DML operations, suppose a record grows from 400 bytes to 600 bytes and cannot be updated in place, so it has to move. In MyISAM, every index leaf node stores a pointer to the record's location, so every index entry pointing to the moved record must be changed. In InnoDB, secondary indexes store the primary key rather than a location, so they are unaffected by the move, and less has to change. From this point of view, InnoDB may be better suited to frequent changes. Of course, both engines leave free space ("holes") in their pages to reduce the impact of non-in-place updates.
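A toy sketch of the same argument: count how many index leaf entries must be rewritten when a row moves. The data and `entries_to_rewrite` are hypothetical. The model deliberately leaves out the cost of moving the row itself inside InnoDB's clustered index (for example a page split), and it assumes the primary key value is unchanged; an update that changes the primary key would touch InnoDB's secondary indexes too.

```python
from typing import Any, Dict, List

Index = Dict[Any, Any]  # index key -> locator stored in the leaf entry


def entries_to_rewrite(indexes: List[Index], before: Any, after: Any) -> int:
    """Count leaf entries that must be rewritten when a row's stored locator changes
    from `before` to `after`. If the locator is unchanged, nothing needs rewriting."""
    if before == after:
        return 0
    return sum(1 for index in indexes for loc in index.values() if loc == before)


# MyISAM-style: two indexes (on id and on name) store the row's byte offset in the data file.
myisam = [{1: 0, 2: 400}, {"alice": 0, "bob": 400}]
# InnoDB-style: the secondary index on name stores the primary key.
innodb_secondary = [{"alice": 1, "bob": 2}]

# Row id=2 ("bob") grows from 400 to 600 bytes and is moved to offset 800.
print(entries_to_rewrite(myisam, before=400, after=800))        # 2
print(entries_to_rewrite(innodb_secondary, before=2, after=2))  # 0
```

In the MyISAM model the number of rewrites grows with the number of indexes on the table; in the InnoDB model it stays at zero as long as the primary key does not change.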

If you are interested, follow the subscription account DB Best Practice.

