Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

InnoDB how many rows of data can be stored in a B+ tree

2025-02-25 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

InnoDB how many rows of data can be stored in a B + tree, this article introduces the corresponding analysis and solution in detail, hoping to help more partners who want to solve this problem to find a more simple and feasible method.

A question? InnoDB how many rows of data can a B+ tree hold? The simple answer to this question is: about 20 million. Why so many? Because this can be calculated, to figure out this problem, let's start with the InnoDB index data structure and the way the data is organized. We all know that when computers store data, they have the smallest storage unit, which is like the smallest unit of cash flow we have today. In the computer, the smallest unit of disk data storage is the sector, the size of a sector is 512 bytes, and the file system (such as XFS/EXT4) his smallest unit is the block, the size of a block is 4k, and for our InnoDB storage engine also has its own minimum storage unit-Page, the size of a page is 16K. The following pictures can help you understand the minimum storage unit: a file in the file system is only 1 byte in size, but has to take up 4KB space on disk. All the data files of imginnodb (files with the suffix ibd) are always an integer multiple of 16384 (16k). Img disk sectors, file systems, and InnoDB storage engines all have their own minimum storage units. Img in MySQL, the default size of our InnoDB page is 16k. Of course, you can also set the parameter: mysql > show variables like 'innodb_page_size'. +-+-+ | Variable_name | Value | +-+-+ | innodb_page_size | 16384 | +-+-+ 1 row in set (0.00 sec) data table is stored in the page So how many rows of data can be stored in a page? Assuming that the size of a row of data is 1k, then a page can hold 16 rows of such data. If the database is only stored in this way, then how to find the data becomes a problem, because we do not know which page the data we are looking for is in, and it is impossible to traverse all the pages, which is too slow. So people came up with a way to organize the data in a B+ tree way. As shown in the figure: img, we first sort the data records by the primary key and store them in different pages (in order to make it easy to understand that we only store 3 records in a page, the actual situation can store a lot of). In addition to the page that stores data, there are pages that store key values + pointers, such as the page number=3 page in the figure, which stores key values and pointers to the data page. Of course, it is also in order. This form of data organization is called an index organization table. Now let's take a look. If you want to find a piece of data, how to check it? Such as select * from user where id=5; where id is the primary key, we use this B+ tree to find, first find the root page, how do you know where the root page of the user table is? In fact, the root page location of each table is fixed in the table space file, that is, the page number=3 page (which we will further prove below). After finding the root page, through the binary search method, the data located to the id=5 should be in the page pointed to by the pointer P5, then go further to the page number=5 page, and you can also find the id=5 record through the binary query method: 5

Zhao2

twenty-seven

Now we know how the primary key index B + tree in InnoDB organizes data and queries data, let's sum up: 1, the smallest storage unit of InnoDB storage engine is page, which can be used to store data or key value + pointer, leaf node stores data in B+ tree, non-leaf node stores key value + pointer. 2. The index organization table determines which page the data is in through the binary search method of non-leaf nodes and pointers, and then finds the needed data in the go data page. So back to our initial question, how many rows of data can be stored in a Btree? Here we assume that the height of the B+ tree is 2, that is, there is a root node and several leaf nodes, then the total number of records stored in the B+ tree is: the number of pointers of the root node * the number of rows recorded by a single leaf node. As we have explained above, the number of records in a single leaf node (page) = 16K/1K=16. (it is assumed that the data size of a row of records is 1k. In fact, many Internet business data records are usually about 1K.) So now we need to calculate the number of pointers that can be stored in non-leaf nodes, in fact, this is also very easy to calculate, we assume that the primary key ID is the bigint type, with a length of 8 bytes, and the pointer size is set to 6 bytes in the InnoDB source code, so that a total of 14 bytes, how many such units we can store in a page actually represents how many pointers there are, that is, 16384CPU 1401170. Then it can be calculated that a B + tree with a height of 2 can hold 1170, 16,18,720 such data records. According to the same principle, we can calculate that a B+ tree with a height of 3 can store: 1170117016021902400 such records. Therefore, in InnoDB, the height of B + tree is generally 1-3 layers, so it can satisfy tens of millions of data storage. A page lookup represents an IO when looking for data, so it usually takes only 1-3 IO operations to find the data through a primary key index query. How do I get the height of the InnoDB primary key index B + tree? Above we infer that the height of the B+ tree is usually 1-3. Let's prove this conclusion from the other side. In the tablespace file of InnoDB, the root page of the primary key index is represented by the convention page number of 3, while the page level of the B+ tree is stored at the root page offset of 64. If page level is 1 and tree height is 2, page level is 2, then tree height is 3. That is, the height of the B+ tree = page level+1; below we will try to find this page level from the real environment. Before the actual operation, you can confirm that the page number of the root page of the primary key index is 3 through the InnoDB metadata table, or you can get it from the book InnoDB Storage engine. SELECT b.name, a.name, index_id, type, a.space, a.PAGE_NO FROM information_schema.INNODB_SYS_INDEXES a, information_schema.INNODB_SYS_TABLES b WHERE a.table_id = b.table_id AND a.space 0; execution result: img can see that the page number of the root page of customer table and lineitem table primary key index under database dbt3 is 3, while other secondary indexes page number is 4. For the difference between a secondary index and a primary key index, please refer to MySQL related books, which are not covered here. Let's do some parsing of the database tablespace file: img because the root page of the primary key index B+ tree starts on the third page of the entire tablespace file, so we can calculate its offset in the file: 16384 inch 3 inch 49152 (16384 is the page size). In addition, according to the "InnoDB storage engine" described in the root page 64 offset position of the first 2 bytes, saved the value of page level, so we want the page level value in the whole file offset is: 16384 "3" 6464 "49152" 49216, the first two bytes. Next, we use the hexdump tool to view the data on the specified offset of the tablespace file: the page level of the imglinetem table is 2MagneB+ tree height is page level+1=3;region table, the page level of the Btree height is 0Magi B+ tree height is the page level+1=1;customer table, the page level of the B+ tree height is the page level+1=3; table, the amount of data of the three tables is as follows: img summary: the number of data rows of the lineitem table is more than 6 million, the height of the Btree is only 150000 rows of the customer table, and the height of the Btree is only 3. You can see that despite the large differences in the amount of data, the height of the two table trees is 3, in other words, the query efficiency of the two tables through the index is not very different, because they only need to do IO three times. So if there is a table with 10 million rows, then its B + tree height is still 3, and the query efficiency will not be much different. The region table has only five rows of data, and of course his B+ tree is 1 in height. Finally, to review an interview question with a MySQL interview question, why should the MySQL index use a B + tree instead of other tree structures? Like B-tree? Now the complex version of this question can refer to this article; his simple version answers: because the B-tree saves data regardless of leaf nodes or non-leaf nodes, this results in a smaller number of pointers that can be saved in non-leaf nodes (some data are also called fan out). In the case of fewer pointers, saving a large amount of data can only increase the height of the tree, resulting in more IO operations and lower query performance.

Starting from a question, this paper gradually introduces the principle and query mode of InnoDB index organization table, and combines the existing knowledge to answer the question and prove it with practice. Of course, in order to make the expression simple and easy to understand, some details are ignored, such as not all the space in a page can be used to store data, it will also store a small number of other fields such as page level,index number, and so on, and the fill factor of the page also makes it impossible for a page to save all data. About the secondary index data access way can refer to MySQL related books, his main point is to combine the primary key index back to the table query. The answer to the question about how many rows of data can be stored in an InnoDB B + tree is shared here. I hope the above content can be of some help to you. If you still have a lot of doubts to be solved, you can follow the industry information channel for more related knowledge.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report