What is the underlying data structure of the index in Mysql 07/04 Update SLTechnology News&Howtos

2025-07-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

What is the underlying data structure of an index in MySQL? This article analyzes the question in detail. It aims to help readers who want a simpler, more practical way to understand it.

Index data structure versus binary tree

In a binary search tree, the key in a node's left child is smaller than the parent's key, and the key in its right child is larger. Suppose col2 is indexed and we look for the row whose key is 89. Only two node visits are needed to reach the disk pointer (address) of that row.

Now suppose col1 is indexed and we look for the row whose key is 6. Six node visits are needed before reaching that row's disk pointer. Because col1's values only ever increase, every new node is attached to the right of the previous one, and the tree degenerates into a linked list. A binary search tree is therefore unsuitable for columns whose values grow in one direction, such as an auto-increment sequence: a lookup degrades to something close to a full table scan.
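
The contrast can be reproduced with a minimal Python sketch of a plain, unbalanced binary search tree. The `Node`, `insert`, and `search_visits` names are hypothetical helpers for illustration. Inserting the increasing keys 1 to 7 produces a right-leaning chain. Inserting the same keys in a balanced order keeps the tree shallow.

```python
class Node:
    """A plain binary search tree node: no balancing of any kind."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the root (a new node if the tree was empty).
    Smaller keys go left, all others go right."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def search_visits(root, key):
    """Count the nodes visited while searching for key."""
    visits, cur = 0, root
    while cur is not None:
        visits += 1
        if key == cur.key:
            break
        cur = cur.left if key < cur.key else cur.right
    return visits


# Increasing keys (like an auto-increment column): each node hangs off the previous one's right.
chain = None
for k in range(1, 8):
    chain = insert(chain, k)

# The same keys inserted in a balanced order.
balanced = None
for k in [4, 2, 6, 1, 3, 5, 7]:
    balanced = insert(balanced, k)

print(search_visits(chain, 6))     # 6
print(search_visits(balanced, 6))  # 2
```

Only the insertion order decides the shape here. Nothing rebalances the tree, and that is the weakness a self-balancing tree addresses.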

Red-black tree

A red-black tree is still a binary tree: it is a self-balancing binary search tree. JDK 1.8's HashMap uses it for buckets with many colliding entries. Balancing stops the tree from degenerating into a list. However, each node still has only two children, so the height grows with the amount of data. For 5 million rows, $2^{n} \geq 5{,}000{,}000$ gives $n = 23$. Here $n$ is the tree height, which is roughly the number of node visits (potentially disk reads) per lookup.
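
The height estimate can be reproduced with a short Python helper. `min_height` is a hypothetical name. It finds the smallest height h with fanout^h ≥ rows, which is the best case for a perfectly balanced tree. A red-black tree can be up to about twice as tall as this bound, since its height is at most 2·log2(n + 1).

```python
def min_height(rows, fanout):
    """Smallest height h with fanout**h >= rows: the best case for a balanced
    tree whose nodes each have `fanout` children."""
    height, capacity = 0, 1
    while capacity < rows:
        capacity *= fanout
        height += 1
    return height


rows = 5_000_000
print(min_height(rows, 2))     # 23: a binary tree, even a perfectly balanced one
print(min_height(rows, 1170))  # 3: a multi-way tree with 1,170 children per node
```

The second call shows why wide, multi-way nodes matter. Raising the number of children per node shrinks the height, and therefore the number of disk reads, far more than any rebalancing of a binary tree can.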

Hash table

A hash index turns a key directly into the disk pointer of its row, so looking up a specific key is very fast. However, hashing does not preserve key order. As a result, a hash index cannot support range lookups such as `id > 100`.
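
A Python dict can stand in for a hash index in a quick sketch. The keys and the fake "disk pointers" are made up for illustration. An equality lookup is a single probe. A range query has to examine every entry, because the hash layout keeps no key order.

```python
# Hypothetical hash index: key -> "disk pointer" (a fake byte offset for illustration).
hash_index = {key: key * 1024 for key in [42, 7, 1001, 365, 88, 150]}

# Equality lookup: a single hash probe, however many keys there are.
print(hash_index[365])  # 373760


def range_lookup(index, low, high):
    """Range query against a hash index. Hashing keeps no key order, so every
    entry has to be examined: effectively a full scan."""
    examined, hits = 0, []
    for key, pointer in index.items():
        examined += 1
        if low <= key <= high:
            hits.append((key, pointer))
    return sorted(hits), examined


hits, examined = range_lookup(hash_index, 50, 400)
print(hits)      # [(88, 90112), (150, 153600), (365, 373760)]
print(examined)  # 6: every entry, even though only 3 match
```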

B-tree

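A B-tree is a multi-way balanced search tree rather than a binary one: each node holds several keys and can have several children. Its main properties are:
All leaf nodes are at the same depth, and their child pointers are empty.
Each index key appears only once in the tree.
The keys within a node increase from left to right.
Every node, not just the leaves, stores data alongside its keys.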
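
The search rule can be sketched in Python. The sketch assumes a hypothetical `BTreeNode` class with sorted `keys`, a parallel `values` list, and `len(keys) + 1` children for internal nodes. Inside each node, `bisect_left` performs the binary search. If the key is not in the node, the search descends into the child between the two neighboring keys.

```python
from bisect import bisect_left


class BTreeNode:
    """Hypothetical B-tree node: sorted keys, one value per key, and
    len(keys) + 1 children (an empty list for a leaf)."""

    def __init__(self, keys, values, children=None):
        self.keys = keys
        self.values = values
        self.children = children or []


def btree_search(node, key):
    """Return (value, nodes_visited), or (None, nodes_visited) if the key is absent."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)  # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i], visited  # data can live in any node of a B-tree
        if not node.children:
            return None, visited  # reached a leaf without finding the key
        node = node.children[i]  # descend between keys[i-1] and keys[i]
    return None, visited


leaves = [
    BTreeNode([1, 5], ["r1", "r5"]),
    BTreeNode([12, 15], ["r12", "r15"]),
    BTreeNode([25, 30, 40], ["r25", "r30", "r40"]),
]
root = BTreeNode([10, 20], ["r10", "r20"], leaves)

print(btree_search(root, 20))  # ('r20', 1): found in the root itself
print(btree_search(root, 30))  # ('r30', 2)
print(btree_search(root, 13))  # (None, 2)
```

Because values sit in internal nodes too, a lookup can finish at the root. The trade-off is that those values take up room in internal nodes that could otherwise hold more keys.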

B+ tree (a variant of the B-tree)

A B+ tree differs from a B-tree in three ways:
Non-leaf nodes store no row data, only redundant copies of index keys plus child pointers. Each node therefore holds more keys, and the tree stays shallower.
The leaf nodes contain every index key.
The leaf level carries links that a B-tree lacks. Each leaf points to both its previous and its next neighbor, forming a doubly linked list that makes interval access and range searches efficient.
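
A minimal Python sketch shows why the linked leaf level helps range searches. It assumes a hypothetical `Leaf` class with `prev`/`next` links and omits the internal levels. The scan starts at the leaf a root-to-leaf descent would reach for the lower bound, then walks sideways along `next` pointers instead of climbing back up the tree.

```python
from bisect import bisect_left


class Leaf:
    """Hypothetical B+ tree leaf: sorted keys with their rows, plus links
    to the previous and next leaf (the doubly linked leaf level)."""

    def __init__(self, keys, rows):
        self.keys = keys
        self.rows = rows
        self.prev = None
        self.next = None


def link(leaves):
    """Chain leaves left to right in both directions."""
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a


def range_scan(start_leaf, low, high):
    """Collect rows with low <= key <= high, starting from the leaf that
    a descent for `low` would reach, then following next pointers."""
    out = []
    leaf, i = start_leaf, bisect_left(start_leaf.keys, low)
    while leaf is not None:
        while i < len(leaf.keys):
            if leaf.keys[i] > high:
                return out
            out.append(leaf.rows[i])
            i += 1
        leaf, i = leaf.next, 0  # hop to the neighboring leaf, no trip back up the tree
    return out


leaves = [Leaf([1, 3, 5], ["a", "b", "c"]),
          Leaf([7, 9, 11], ["d", "e", "f"]),
          Leaf([13, 15], ["g", "h"])]
link(leaves)

# Assume the descent for key 4 lands on leaves[0] (internal levels omitted).
print(range_scan(leaves[0], 4, 13))  # ['c', 'd', 'e', 'f', 'g']
```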

Why does the MySQL page file default to 16KB?

In InnoDB, each B+ tree node is one page, 16KB by default, which holds pointers, keys, and data.

Leaf nodes: assume a row of data is 1KB. One page, and therefore one leaf node, can store 16 rows.

Non-leaf nodes: assume the primary key ID is a BIGINT, which takes 8B. In the InnoDB source code a pointer takes 6B, for a total of 14B per entry. One non-leaf page can then hold $16384 / 14 \approx 1170$ (primary key + pointer) entries.

Capacity: a B+ tree of height 2 can store $1170 \times 16 = 18{,}720$ rows. A B+ tree of height 3 can store $1170 \times 1170 \times 16 = 21{,}902{,}400$ rows, which is tens of millions.
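
The arithmetic can be checked with a few lines of Python. The constants mirror the assumptions above: 16KB pages, 8B BIGINT keys, 6B pointers, and 1KB rows. The sketch ignores page headers, per-record overhead, and free space, so real capacities are somewhat lower.

```python
PAGE_SIZE = 16 * 1024  # default InnoDB page size in bytes
KEY_SIZE = 8           # BIGINT primary key
POINTER_SIZE = 6       # pointer size assumed in the text
ROW_SIZE = 1024        # assumed 1KB per row

entries_per_internal_page = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)  # 16384 // 14
rows_per_leaf = PAGE_SIZE // ROW_SIZE                              # 16


def capacity(height):
    """Rows addressable by a B+ tree of the given height (height 1 = a single leaf)."""
    return entries_per_internal_page ** (height - 1) * rows_per_leaf


print(entries_per_internal_page)  # 1170
print(capacity(2))                # 18720
print(capacity(3))                # 21902400
```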

You can check the page size with `SHOW GLOBAL STATUS LIKE 'Innodb_page_size';` (16384 bytes by default).

Therefore, even a table with tens of millions of rows can be read efficiently when stored in a B+ tree, because a lookup touches only about three pages. This is why MySQL uses the B+ tree as its index data structure.

Storage engine

A storage engine applies to individual tables, not to whole databases. The data directory under the MySQL installation's root directory holds the data of all tables.

MyISAM: index files and data files are stored separately (a non-clustered index). The primary key index and secondary indexes are stored the same way: their leaf nodes hold the address of the row in the data file, not the row itself.

.frm file: stores the table structure
.MYD file: stores all data rows of the table
.MYI file: stores the table's indexes

InnoDB (clustered):

The table data file itself is an index structure file organized as a B+ tree.
.frm file: stores the table structure
.ibd file: stores all data rows and index fields of the table
Clustered index: the leaf nodes contain the complete data records
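
As a rough Python sketch, dicts can stand in for the B+ trees. This is a simplification: real engines search pages, not hash maps. All names here are hypothetical. The MyISAM-style lookup needs a second step into the separate data file. The InnoDB-style clustered lookup returns the row straight from the index.

```python
# MyISAM-style (non-clustered): the index maps key -> position in a separate data file.
myisam_data_file = ["row for id 10", "row for id 20", "row for id 30"]  # stands in for .MYD
myisam_index = {10: 0, 20: 1, 30: 2}                                     # stands in for .MYI

# InnoDB-style (clustered): the primary key index leaf holds the whole row (.ibd).
innodb_clustered = {10: "row for id 10", 20: "row for id 20", 30: "row for id 30"}


def myisam_get(key):
    offset = myisam_index[key]       # 1) search the index file
    return myisam_data_file[offset]  # 2) follow the pointer into the data file


def innodb_get(key):
    return innodb_clustered[key]     # the index lookup already yields the row


print(myisam_get(20))
print(innodb_get(20))
```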

Why must an InnoDB table have a primary key, and why is an auto-increment integer primary key recommended?

First, InnoDB stores the table data itself as a B+ tree organized by the primary key. Every InnoDB table therefore needs a primary (clustered) index, and indexing on it also makes queries efficient. If you do not define a primary key, InnoDB uses the first UNIQUE index whose columns are all NOT NULL as the clustered index. If no such unique column exists, InnoDB adds a hidden row-ID column in the background and uses it as the primary key index.

Second, the key should be an integer. Integers take up less disk and memory space than strings, so more keys fit in each page. Comparing integers is also faster than comparing strings, which must be compared character by character according to the column's character set and collation.
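
The space argument can be quantified with the same page arithmetic used in the page-size section: a smaller key means more entries per internal page. The sketch compares an 8-byte BIGINT key with a hypothetical 36-character string key in a single-byte character set, such as a UUID stored as text. It keeps the 6B pointer and 1KB row assumptions and ignores page overhead.

```python
PAGE_SIZE = 16 * 1024  # default InnoDB page size in bytes
POINTER_SIZE = 6       # pointer size assumed in the page-size calculation
ROWS_PER_LEAF = 16     # 1KB rows in a 16KB leaf, as assumed there


def fanout(key_size):
    """(key + pointer) entries per internal page, ignoring page overhead."""
    return PAGE_SIZE // (key_size + POINTER_SIZE)


def height3_capacity(key_size):
    """Rows addressable by a three-level B+ tree with this key size."""
    return fanout(key_size) ** 2 * ROWS_PER_LEAF


print(fanout(8), height3_capacity(8))    # 1170 21902400  (8-byte BIGINT key)
print(fanout(36), height3_capacity(36))  # 390 2433600  (hypothetical 36-byte string key)
```

Under these assumptions, the wider key cuts a three-level tree's capacity by roughly a factor of nine. A table of tens of millions of rows would then need a fourth level, and one more page read per lookup.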

Finally, a B+ tree is a multi-way tree that keeps its keys in sorted order. If the primary key is not increasing, newly inserted keys often land in the middle of pages that are already full. Those pages must then split and the tree must rebalance, which slows down inserts. With an auto-increment primary key, each new entry is simply appended to the last leaf node.
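
The effect can be illustrated with a toy Python model of the leaf level only. This is an assumption-laden sketch, not InnoDB's actual split algorithm. It uses a hypothetical capacity of 4 keys per page. When a full page receives a key in its middle, the page is split in half and entries are moved. When the last page is full and the new key is larger than everything in it, a fresh page is started and nothing moves. Sequential keys cause no half-splits, while descending keys cause repeated ones.

```python
from bisect import bisect_right, insort

CAPACITY = 4  # hypothetical tiny page: at most 4 keys per leaf


def insert_key(pages, key):
    """Insert key into a list of sorted leaf pages. Return True if an existing
    page had to be split in half (moving entries to a new page)."""
    if not pages:
        pages.append([key])
        return False
    # Choose the last page whose first key is <= key (page 0 if key is the smallest).
    firsts = [p[0] for p in pages]
    idx = max(bisect_right(firsts, key) - 1, 0)
    page = pages[idx]
    if len(page) < CAPACITY:
        insort(page, key)
        return False
    if idx == len(pages) - 1 and key > page[-1]:
        pages.append([key])  # append-only growth: start a fresh page, move nothing
        return False
    insort(page, key)
    mid = len(page) // 2
    pages.insert(idx + 1, page[mid:])  # split: move the upper half to a new page
    del page[mid:]
    return True


def count_splits(keys):
    pages = []
    splits = sum(insert_key(pages, k) for k in keys)
    return splits, pages


seq_splits, seq_pages = count_splits(range(1, 21))      # auto-increment order
rev_splits, rev_pages = count_splits(range(20, 0, -1))  # non-increasing order
print(seq_splits)      # 0
print(rev_splits > 0)  # True
```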

Why do the leaf nodes of a non-primary-key (secondary) index store primary key values?

The primary key index and each secondary index are separate B+ trees, but the row data is stored only once, in the leaves of the primary key (clustered) index. A lookup through a secondary index works in two steps. First it finds the primary key value in the secondary index's leaf. Then it searches the primary key index for the full row. Storing only the primary key instead of a copy of the row saves space.
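
A minimal Python sketch of this two-step lookup follows. Dicts stand in for the two B+ trees, and the table, column names, and rows are hypothetical. For simplicity the sketch assumes names are unique.

```python
# Clustered (primary key) index: primary key -> full row (rows are stored only here).
primary_index = {
    1: {"id": 1, "name": "alice", "dept": "sales"},
    2: {"id": 2, "name": "bob", "dept": "ops"},
    3: {"id": 3, "name": "carol", "dept": "sales"},
}

# Secondary index on `name`: leaf entries hold only the primary key value, not the row.
name_index = {row["name"]: pk for pk, row in primary_index.items()}


def find_by_name(name):
    pk = name_index[name]     # 1) search the secondary index -> primary key
    return primary_index[pk]  # 2) search the primary key index -> full row


print(find_by_name("bob"))
```

Because the secondary index holds only `pk`, an update to `dept` touches the row in one place. No second copy has to be kept in sync.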

Suppose the secondary index leaves also stored a full copy of the row. Every insert or update would then have to keep all copies synchronized, which creates data-consistency problems. Transactions could keep the copies consistent, but transactions cost performance.

Composite (joint) index

What does the underlying storage structure of a composite index look like?

Take a composite index on (employee level, employee name, employee date of birth). Entries are placed in the nodes in index-column order:
First, entries are compared by employee level.
If the levels are equal, they are compared by employee name.
If both level and name are equal, they are compared by date of birth (year and month).
In the figure, reading from top to bottom and left to right, the first B+ tree node is ordered by employee level. The second node contains entries with the same level, so it is ordered by name. The third node contains entries with the same level and name, which are ordered by date of birth.
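
Python's tuple comparison follows the same left-to-right rule, so sorting tuples gives a quick sketch of how composite index entries are ordered. The entries, primary keys, and "YYYY-MM" birth format are hypothetical. Python's default string ordering stands in for the column's collation.

```python
# Hypothetical composite index entries: ((level, name, birth), primary key).
entries = [
    ((3, "Lilei", "1990-05"), 101),
    ((1, "Zhang", "1988-01"), 102),
    ((3, "Lilei", "1985-11"), 103),
    ((2, "Wang", "1992-07"), 104),
    ((3, "Han", "1995-03"), 105),
]

# Tuples compare lexicographically: by level, then by name on a tie, then by birth
# when both level and name are equal. Zero-padded "YYYY-MM" strings sort chronologically.
ordered = sorted(entries)
print([pk for _, pk in ordered])  # [102, 104, 105, 103, 101]
```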

This concludes the answer to the question of what data structure underlies indexes in MySQL. I hope the content above is of some help.
