Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How much row data is stored in a B+ tree in InnoDB

2025-02-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/03 Report--

This article introduces the relevant knowledge of "how many rows of data are stored in a B+ tree in InnoDB". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

1. InnoDB how many rows of data can a B+ tree hold?

InnoDB how many rows of data can a B+ tree hold? The simple answer to this question is: about 20 million. Why so many? Because this can be calculated, to figure out this problem, let's start with the InnoDB index data structure and the way the data is organized.

We all know that when computers store data, they have the smallest storage unit, which is like the smallest unit of cash flow we have today. In the computer, the smallest unit of disk data storage is the sector, the size of a sector is 512 bytes, and the file system (such as XFS/EXT4) his smallest unit is the block, the size of a block is 4k, and for our InnoDB storage engine also has its own minimum storage unit-Page, the size of a page is 16K.

Second, the following pictures can help you understand the minimum storage unit:

The size of a file in a file system is only 1 byte, but it has to take up 4KB space on disk.

All the data files of innodb (files with the suffix ibd) are always an integer multiple of 16384 (16k).

Disk sectors, file systems, and InnoDB storage engines all have their own minimum storage units.

In MySQL, the default size of our InnoDB page is 16k, which can also be set by parameter:

The data in the data table is stored in the page, so how many rows of data can be stored in a page? Assuming that the size of a row of data is 1k, then a page can hold 16 rows of such data.

If the database is only stored in this way, then how to find the data becomes a problem, because we do not know which page the data we are looking for is in, and it is impossible to traverse all the pages, which is too slow. So people came up with a way to organize the data in a B+ tree way. As shown in the figure:

We first sort the data records by the primary key and store them in different pages (in order to understand that we only store 3 records in a page here, the actual situation can store a lot of). In addition to the page that stores the data, there are pages that store key values + pointers, such as the page number=3 page in the figure, which stores key values and pointers to the data page. Of course, it is also in order. This form of data organization is called an index organization table. Now let's take a look. If you want to find a piece of data, how to check it?

Such as select * from user where id=5

Here id is the primary key, we look through this B+ tree, first find the root page, how do you know where the root page of the user table is? In fact, the root page location of each table is fixed in the table space file, that is, the page number=3 page (which we will further prove below). After finding the root page, through the binary search method, the data located to the id=5 should be in the page pointed to by the pointer P5. Then go further to the page number=5 page, and you can also find the id=5 record through the binary query method:

| | 5 | zhao2 | 27 | |

Now that we know how the primary key index B + tree in InnoDB organizes and queries data, let's summarize:

1. The minimum storage unit of InnoDB storage engine is page, which can be used to store data or key values + pointers. Leaf nodes store data in B + tree, and non-leaf nodes store keys + pointers.

2. The index organization table determines which page the data is in through the binary search method of the non-leaf node and the pointer, and then finds the needed data in the data page.

3. So back to the question we started with, how many rows of data can a B+ tree usually hold?

Here we assume that the height of the B+ tree is 2, that is, there is a root node and several leaf nodes, then the total number of records stored in the B+ tree is: the number of pointers of the root node * the number of rows recorded by a single leaf node.

As we have explained above, the number of records in a single leaf node (page) = 16K/1K=16. (it is assumed that the data size of a row of records is 1k. In fact, many Internet business data records are usually about 1K.)

So now we need to figure out how many pointers non-leaf nodes can hold?

In fact, this is also very easy to calculate, we assume that the primary key ID is the bigint type, with a length of 8 bytes, and the pointer size is set to 6 bytes in the InnoDB source code, so a total of 14 bytes, how many such units we can store in a page actually represents how many pointers there are, that is, 16384CP1401170. Then it can be calculated that a B + tree with a height of 2 can hold 1170, 16,18,720 such data records.

According to the same principle, we can calculate that a B+ tree with a height of 3 can store: 1170117016021902400 such records.

Therefore, in InnoDB, the height of B + tree is generally 1-3 layers, so it can satisfy tens of millions of data storage. A page lookup represents an IO when looking for data, so it usually takes only 1-3 IO operations to find the data through a primary key index query.

Fourth, how to get the height of the InnoDB primary key index B + tree?

Above we infer that the height of the B+ tree is usually 1-3. Let's prove this conclusion from the other side. In the tablespace file of InnoDB, the root page of the primary key index is represented by the convention page number of 3, while the page level of the B+ tree is stored at the root page offset of 64. If page level is 1 and tree height is 2, page level is 2, then tree height is 3. That is, the height of the B+ tree = page level+1; below we will try to find this page level from the real environment.

Before the actual operation, you can confirm that the page number of the root page of the primary key index is 3 through the InnoDB metadata table, or you can get it from the book InnoDB Storage engine.

Execution result:

You can see that the page number of the customer table and the lineitem table primary key index root page under the database dbt3 is 3, while the other secondary indexes page number is 4. For the difference between a secondary index and a primary key index, please refer to MySQL related books, which are not covered here.

Let's parse the database tablespace file:

Because the root page of the primary key index B+ tree starts on the third page of the entire tablespace file, its offset in the file can be calculated: 16384, 16384, 49152 (page size).

In addition, according to the "InnoDB storage engine" described in the root page 64 offset position of the first 2 bytes, saved the value of page level, so we want the page level value in the whole file offset is: 16384 "3" 6464 "49152" 49216, the first two bytes.

Next, using the hexdump tool, look at the data on the specified offset of the tablespace file:

The page level of linetem table is 2, the height of B + tree is page level+1=3;region, the height of page level is 0, the height of B + tree is page level+1=1;customer, the height of page level is 2, the height of B + tree is page level+1=3.

The amount of data in these three tables is as follows:

5. Summary

The number of data rows in the lineitem table is more than 6 million, the height of the B+ tree is 3, the number of data rows in the customer table is only 150000, and the height of the B+ tree is 3. You can see that despite the large differences in the amount of data, the height of the two table trees is 3, in other words, the query efficiency of the two tables through the index is not very different, because they only need to do IO three times. So if there is a table with 10 million rows, then its B + tree height is still 3, and the query efficiency will not be much different.

The region table has only five rows of data, and of course his B+ tree is 1 in height.

This is the end of the content of "how many rows of data are stored in a B+ tree in InnoDB". Thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report