Overview of mysql btree Index 07/19 Update SLTechnology News&Howtos

Overview of mysql btree Index

2025-07-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Today, under the study of B-tree index in mysql, through this article you can learn about the principle of btree index in mysql, the process of retrieving data, the difference between btree index in innodb and myisam engine, and the benefits and limitations of btree index. B-Tree index is the most frequently used index type in MySQL database, and all storage engines except Archive storage engine support B-Tree index. This is not only true in MySQL, but also in many other database management systems, B-Tree index is also the most important index type, mainly because the storage structure of B-Tree index has very excellent performance in database data retrieval. It is worth noting that B+ tree index is used in innodb and myisam engine in mysql.(That is, each leaf node contains a pointer to the next leaf node, thus facilitating range traversal of leaf nodes, and all nodes except leaf nodes store only keys and pointers). Generally speaking, most of the physical files of B-Tree index in MySQL are stored in the structure of B+tree, that is, all the actual data needed are stored in the Leaf Node of Tree, and the length of the shortest path to any Leaf Node is exactly the same, and various databases (or MySQL storage engines) may store their own B-Tree index. For example, the storage structure actually used by the B-Tree index of the Innodb storage engine is B+Tree, that is, a small modification is made on the basis of the B-Tree data structure. In addition to storing the relevant information of the index key value and the primary key on each Leaf Node, B+Tree also stores the pointer information pointing to the next Leaf Node adjacent to the Leaf Node. This is mainly to speed up the efficiency of retrieving multiple adjacent Leaf Nodes. One: The following focuses on the different implementation principles of b-tree index in mysql innodb and myisam; 1) MyISAM index implementation MyISAM engine uses B+Tree as index structure, the data field of leaf node only stores the address pointing to data record (also called row pointer), in MyISAM, there is no difference in structure between primary index and secondary index (Secondary key), but the primary index requires that the key is unique, and the key of secondary index can be repeated. 2) InnoDB index implementation Although InnoDB also uses B+Tree as the index structure, the specific implementation is completely different from MyISAM. As mentioned earlier, MyISAM index files and data files are separate, and the index file only holds the address of the data row record (row pointer). However, in innodb engine, btree index is divided into two kinds, 1, clustered index (primary index), 2. secondary index, or auxiliary index. The primary key index in InnoDB is a clustered index. The table data file itself is an index structure organized by B+Tree. The leaf node data field of this tree stores complete data records (whole row data). The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary key index. However, innodb's secondary index stores index column values and pointers to primary keys, so we use overlay indexes to optimize the index for mysql innodb. To sum it up: MyISAM engine leaf node stored content: primary index: only row pointers stored; secondary index: only row pointers stored; leaf node stored content in InnoDB engine primary index: clustered index stored complete data (whole row data) secondary index: stored index column values + primary key information The following figure shows the principle of index implementation in mysql innodb and myisam engines

Two: Next, let's talk about the process of retrieving data through btree index: Myisam and innodb engines both use B+tree to implement btree index. In the process of retrieving data, the process of finding data stored in leaf nodes from root nodes to child nodes is the same, but because the contents stored in data in leaf nodes in myisam and innodb engines are different (as described above), the process of finding real data after finding data in leaf nodes is different. Then, according to the different data stored in the leaf node data in the different storage engines described above, for example, the primary key index in innodb stores the complete data row, so when traversing the data according to the primary key index in innodb, you can find the data of the leaf node. As for the leaf node data stored in myisam, it is the row pointer, that is, after finding the data of the leaf node, you can find the real data row according to the row pointer. The following focuses on the process of finding the data domain in the leaf node from the root node: For comparison, you can first look at the B-tree implementation principle:

B+tree implementation principle is as follows:

It can be seen from the two graphs that, compared with B-tree, the root node and child nodes of B+Tree only store key values and pointers, and all data record nodes are stored on leaf nodes of the same layer according to the order of key values. This can greatly increase the number of key values stored in each node and reduce the height of B+Tree. Moreover, leaf nodes in B+Tree store more pointers to the next leaf node than B-tree nodes, which is more convenient for leaf node range traversal. Each node takes up one disk block of disk space, and there are n ascending keywords and (n+1) pointers pointing to the root node of the child tree. This pointer stores the address of the disk block where the child node is located (note that n here is calculated according to the amount of data when creating the index. If the amount of data is too large, the three-layer B+tree may not be satisfied, and a four-layer B+tree or more layers are needed). Then n keywords are divided into (n+1) range fields, and each range field corresponds to a pointer to point to the child node. The child nodes are again divided according to the keywords, and then the pointer points to the leaf nodes. The implementation principle of B+tree index below is explained in detail for the following figure (modified from network):

For the above figure, each node occupies a disk space of a disk block, and one node has two keywords sorted in ascending order and three pointers pointing to subtree root nodes. The three scope fields divided by the two keywords correspond to the scope fields of the data of the subtree pointed to by the three pointers. Taking the root node as an example, the keywords are 17 and 35, the data range of the subtree pointed to by the P1 pointer is less than 17, the data range of the subtree pointed to by the P2 pointer is 17~35, and the data range of the subtree pointed to by the P3 pointer is greater than 35. Then simulate the specific process below where id=29 for the above figure: (first mysql reads data in blocks (pages)). First find disk block 1 according to the root node and read it into memory. [Disk I/O operation 1] Compare key 29 in interval (17, 35) and find pointer P2 of disk block 1. According to the P2 pointer to find disk block 3, read into memory. [Disk I/O operation 2] Compare key 29 in interval (26, 30) and find pointer P2 of disk block 3. According to the P2 pointer to find disk block 8, read into memory. [Disk I/O operation number 3] Keyword 29 is found in the keyword list in disk block 8. Analyzing the above procedure, we find that it requires 3 disk I/O operations and 3 memory lookup operations. Because the keywords in memory are an ordered list structure, binary search can be used to improve efficiency. And 3 disk I/O operations are the decisive factor affecting the efficiency of the whole B-Tree lookup. Compared with AVLTree, B-Tree reduces the number of nodes, so that the data fetched from memory every time disk I/O plays a role, thus improving query efficiency. In contrast to B-trees, B+Tree root nodes and child nodes only store keys and pointers. Check the page size in mysql: MySQL [meminfo]> show variables like 'innodb_page_size';+------------------| Variable_name | Value | +------------------+-------+ | innodb_page_size | 16384 |+------------------+-------+ 1 row in set (0.00 sec) The page size in the InnoDB storage engine is 16KB, the primary key type of the table is INT (4 bytes) or BIGINT (8 bytes), and the pointer type is usually 4 or 8 bytes, that is, a page.(A node in the B+Tree) stores approximately 16KB/(8B+8B)= 1K key values (because it is an estimate, for convenience of calculation, K here is <$10 <$^3). That is, a B+Tree index with depth 3 can maintain 10^3 * 10^3 * 10^3 = 1 billion records. In practice, each node may not be filled, so in the database, the height of the B+Tree is generally 2~4 layers. MySQL's InnoDB storage engine is designed to have the root node resident in memory, which means that it only needs 1~3 disk I/O operations to find the row record of a certain key value. III: The following is the difference between the traversal data of the clustered table in the innodb engine in mysql and the non-clustered table in myisam, as shown in the following figure (from the network):

It seems that clustered indexes are significantly less efficient than non-clustered indexes, because every time you use an auxiliary index to retrieve, you have to go through two B+ tree lookups. Isn't this unnecessary? What are the advantages of clustered indexing? 1 Because row data and leaf nodes are stored together, so the primary key and row data are loaded into memory together. If you find the leaf node, you can immediately return the row data. If you organize the data according to the primary key Id, it is faster to obtain the data. 2. The advantage of using primary key as "pointer" instead of row address value as pointer is that it reduces the maintenance work of secondary index when row movement or data page split occurs. Using primary key value as pointer will make secondary index occupy more space. The advantage in exchange is that InnoDB does not need to update this "pointer" in secondary index when moving rows. Using clustered index can ensure that no matter how the nodes of this primary key B+ tree change, Secondary index trees are not affected. Four: Finally, the benefits and limitations of B+tree index in MySQL (extracted from the third edition of high-performance MySQL)(a) can be used: you can use the query type of btree index, btree index is used for full key value, key value range, or key prefix lookup, where key prefix lookup is only suitable for lookup based on the leftmost prefix. The multi-column index created in the previous example is valid for the following types of queries: 1) All-value matchingAll-value matching refers to matching all columns in the index, that is, finding names and birth dates2) Matching leftmost prefixes such as: finding only last names, that is, using only the first column of the index 3) Matching column prefixes can also match only the beginning of a column value, such as: matching last names that start with J.(like 'J %'), where only the first column of the index is used, and only part of the first column is used 4) Match range values such as finding people whose last name is between allen and barrymore, where only the first column of the index is used 5) Match exactly one column and range matches another column such as finding all people whose last name is allen and whose first name starts with K, i.e., the first column last_name matches exactly, The second column first_name range matches 6) index-only queries btree can usually support index-only queries, that is, queries only need to access the index, without accessing the data rows, that is, this is the concept of covering the index. The data that needs to be accessed is taken directly from the index, which is for the btree index in innodb. Because the nodes in the index tree are ordered, in addition to finding by value, the index can also be used for order by operations in queries. Generally speaking, if the btree can find values in some way, it can also be used for sorting in this way. Therefore, if the order by clause satisfies several query types listed above, the index can also meet the corresponding sorting requirements. (2) The following are the restrictions on btree index: 1) If the index is not searched according to the leftmost column of the index, the index cannot be used (note that this does not refer to the order of the where condition, that is, in the where condition, no matter what the order of the condition is, as long as the columns appearing in the where can be connected from the leftmost in the multi-column index, the multi-column index can be used) 2) Columns in the index cannot be skipped, such as creating a multi-column index (last name, first name, date of birth): The query criteria are surname and date of birth, skipping the first name column, so that the multi-column index can only use the last name column. 3) If there is a range query for a column in the query, all columns on the right cannot use the index optimization query, such as: where last_name=xxx and first_name like 'xxx%' and dob='xxx'; In this way, the first_name column can use the index, and the dob column after this column cannot use the index. Summary: Common engines in mysql are innodb and myisam. The default indexes created in these two engines are B-tree indexes, which are implemented by B+tree structure, and the contents stored in the specific leaf nodes of innodb and myisam are different. Then the overlay index is for the index of innodb engine, because the leaf nodes of b-tree index in myisam engine store only row pointers.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.