The underlying structure, usage principles and characteristics of B-Tree indexes in MySQL

2025-07-06 Update From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/02 Report

This article introduces the underlying structure, usage principles and characteristics of B-Tree indexes in MySQL, a topic that often raises questions in day-to-day work.

MySQL is one of the most popular relational databases in the industry, and index optimization is one of the keys to database performance. A solid understanding of MySQL indexes therefore helps developers tune the databases they work with.

MySQL supports several index types, each suited to different scenarios. The B-Tree index is the most common one: when people talk about a MySQL index without further qualification, they mean a B-Tree index. This article explains the underlying structure, usage principles and characteristics of the B-Tree index in detail.

To save you time, the main contents of this article are as follows:

The underlying structure of the B-Tree index

Rules for using B-Tree indexes

Clustered index

Index differences between InnoDB and MyISAM

Loose index scan

Covering index

B-Tree index

B-Tree indexes use a B-Tree to store data, although each storage engine implements it differently. The general idea of a B-Tree is that all values are stored in order and every leaf page is the same distance from the root.

The underlying data structure of a B-Tree index is generally a B+ tree; its specific structure and advantages are not described in detail here. The following figure shows an abstract representation of the B-Tree index, which roughly reflects how a MyISAM index works, while the structure used by InnoDB is different.

MySQL can build a B-Tree index on a single column or on multiple columns. In a multi-column index, the column values are combined and stored in the B-Tree pages in the order in which the columns are declared in the index. Suppose you have a data table of people with a multi-column index on last_name, first_name and birthday.

For each row in the table, the index contains the values of the last_name, first_name and birthday columns, and the following figure shows how the index organizes the stored data.
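The ordering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the index is modeled as a flat sorted list of (last_name, first_name, birthday) keys, each paired with a row id, and the sample people, row ids and the helper name build_index are invented here. It is not how MySQL lays out pages on disk.

```python
# Hypothetical sample rows: (row_id, last_name, first_name, birthday).
rows = [
    (1, "Basinger", "Viven", "1979-01-24"),
    (2, "Allen", "Kim", "1930-07-12"),
    (3, "Astaire", "Angelina", "1980-03-04"),
    (4, "Allen", "Cuba", "1960-01-01"),
    (5, "Barrymore", "Julia", "2000-05-16"),
]


def build_index(rows):
    """Return index entries sorted by (last_name, first_name, birthday)."""
    entries = [((last, first, birthday), row_id)
               for row_id, last, first, birthday in rows]
    # Tuples compare column by column, left to right, which mirrors the
    # declared column order of a multi-column index.
    return sorted(entries)


index = build_index(rows)
for key, row_id in index:
    print(key, "-> row", row_id)
```

Entries are ordered by last_name first; first_name only decides the order among entries that share a last_name, and birthday only among entries that share both.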

Because a B-Tree index stores its entries in a B-Tree, the kinds of queries it can serve are fixed by that structure. In general, B-Tree indexes suit lookups by full key value, by key range, and by key prefix, where prefix lookups only work on the leftmost prefix of the index. The query types supported by the B-Tree index are as follows:

Full-value matching: full-value matching refers to matching with all columns in the index.

Match the leftmost prefix: the index mentioned earlier can be used to find all people with the last name Allen, that is, to use only the first column in the index.

Match column prefix: you can also match only the beginning of the value of a column. For example, the index mentioned earlier can be used to find all people with the last name starting with J. Only the first column of the index is used here.

Match range values: for example, the index mentioned earlier can be used to find people with last names between Allen and Barrymore. Only the first column of the index is used here.

Match one column exactly and a range on another: the index mentioned earlier can also be used to find all people whose last name is Allen and whose first name starts with the letter K (such as Kim, Karl, etc.). That is, the first column last_name is matched exactly, and the second column first_name is matched by range.
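The query types listed above all reduce to finding one contiguous slice of the sorted index. The following is a minimal sketch, assuming the index is a sorted Python list of key tuples and using binary search from the standard bisect module in place of a tree descent. The sample names, the HIGH sentinel and the helper range_scan are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index keys, sorted by (last_name, first_name, birthday).
keys = sorted([
    ("Allen", "Cuba", "1960-01-01"),
    ("Allen", "Kim", "1930-07-12"),
    ("Astaire", "Angelina", "1980-03-04"),
    ("Barrymore", "Julia", "2000-05-16"),
    ("Basinger", "Viven", "1979-01-24"),
    ("Jones", "Karl", "1975-09-30"),
])

HIGH = "\uffff"  # sentinel assumed to sort after any character in the data


def range_scan(keys, low, high):
    """Return the keys k with low <= k <= high using two binary searches."""
    return keys[bisect_left(keys, low):bisect_right(keys, high)]


# Leftmost prefix: last_name = 'Allen'
allens = range_scan(keys, ("Allen",), ("Allen", HIGH))
# Column prefix: last_name starts with 'J'
j_names = range_scan(keys, ("J",), ("J" + HIGH,))
# Range: last_name between 'Allen' and 'Barrymore'
a_to_b = range_scan(keys, ("Allen",), ("Barrymore", HIGH))
# Exact first column, range on the second: 'Allen' and first_name starts with 'K'
allen_k = range_scan(keys, ("Allen", "K"), ("Allen", "K" + HIGH))

print(len(allens), len(j_names), len(a_to_b), allen_k)
```

Each lookup touches only the slice between two positions found by binary search, which is why these patterns are cheap on an ordered structure.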

Because the nodes of the index tree are ordered, the index can serve ORDER BY operations (reading entries in order) as well as lookups by value. If an ORDER BY clause fits one of the query types listed above, the index can also satisfy that sorting requirement.

Here are some restrictions on B-Tree indexes:

If the search does not start from the leftmost column of the index, the index cannot be used. For example, the index in the example above cannot be used to find people with the first name Bill or people born on a particular date, because neither of these columns is the leftmost column.

If the query contains a range condition on one column, the columns to its right in the index cannot be used to narrow the lookup.
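The first restriction follows directly from the sort order. A small sketch, again assuming the index is a sorted list of key tuples with invented sample data: the first_name values are not in order when read in index order, so there is nothing to binary-search on, and every entry has to be examined.

```python
# Hypothetical index keys, sorted by (last_name, first_name, birthday).
keys = sorted([
    ("Allen", "Cuba", "1960-01-01"),
    ("Allen", "Kim", "1930-07-12"),
    ("Astaire", "Bill", "1980-03-04"),
    ("Barrymore", "Julia", "2000-05-16"),
    ("Basinger", "Bill", "1979-01-24"),
])

# Read in index order, the second column is not sorted on its own.
first_names = [first for _, first, _ in keys]
second_column_sorted = first_names == sorted(first_names)


def find_by_first_name(keys, first_name):
    """Without the leftmost column, every entry must be examined."""
    examined = 0
    matches = []
    for key in keys:
        examined += 1
        if key[1] == first_name:
            matches.append(key)
    return matches, examined


matches, examined = find_by_first_name(keys, "Bill")
print(second_column_sorted, len(matches), examined)
```

The two matching entries sit at unrelated positions, which is the reason a search on first_name alone cannot start from a single point in the index.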

Clustered index

A clustered index is not a separate index type but a way of storing data. The details depend on the implementation, but InnoDB's clustered index keeps the B-Tree index and the data rows in the same structure.

When a table has a clustered index, its data rows are stored in the leaf pages of the index, which means that rows with adjacent key values are stored close together.

The following figure shows how the records in a clustered index are stored. Notice that the leaf pages contain all the columns of each row, while the non-leaf node pages contain only the index columns.

Clustered indexes can help performance, but they can also cause serious performance problems. Clustered data has some important advantages:

Data access is faster. A clustered index keeps the index and the data in the same B-Tree, so fetching rows through the clustered index is usually faster than finding them through a non-clustered index.

Queries that use covering index scans can use the primary key values stored in the leaf nodes directly.

If you can take full advantage of the above advantages when designing tables and queries, you can greatly improve performance. At the same time, clustered indexes have some disadvantages:

Insert speed depends heavily on insertion order. Inserting rows in primary key order is the fastest way to load data into an InnoDB table. Avoid clustered indexes on random primary key values (non-sequential and spread over a wide range), such as UUIDs, and prefer auto-incrementing columns such as AUTO_INCREMENT.

Updating the clustered index column is expensive because it forces InnoDB to move each updated row to a new location.

Tables based on clustered indexes may face the problem of "page splitting" when inserting new rows, or when the primary key is updated and the rows need to be moved. When the primary key value of a row requires that the row must be inserted into a full page, the storage engine splits the page into two pages to accommodate the row, which is a page split operation. Page splitting can cause tables to take up more disk space.

The secondary index may be larger than expected because the leaf node in the secondary index contains the primary key column that references the row.

Secondary index access requires two index lookups instead of one.
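The effect of insertion order on page splits can be sketched with a toy model. Everything here is an assumption for illustration: pages are plain Python lists holding at most PAGE_CAPACITY keys, a split moves the upper half of a full page to a new page, and an insert past the end of the last page simply starts a new page. Real InnoDB pages and split heuristics are considerably more involved.

```python
from bisect import bisect_right, insort

PAGE_CAPACITY = 4  # hypothetical; real pages hold far more rows


def insert_key(pages, key):
    """Insert key into a list of sorted pages; return True if a page split."""
    if not pages:
        pages.append([key])
        return False
    # Find the page whose key range should hold the new key.
    firsts = [page[0] for page in pages]
    i = max(bisect_right(firsts, key) - 1, 0)
    page = pages[i]
    if len(page) < PAGE_CAPACITY:
        insort(page, key)
        return False
    if i == len(pages) - 1 and key > page[-1]:
        # Appending past the last key: start a new page, keep the full one.
        pages.append([key])
        return False
    # Page split: move the upper half to a new page, then insert the key.
    mid = len(page) // 2
    new_page = page[mid:]
    del page[mid:]
    pages.insert(i + 1, new_page)
    insort(new_page if key >= new_page[0] else page, key)
    return True


def load(keys):
    """Insert keys one by one; return the pages and the number of splits."""
    pages, splits = [], 0
    for key in keys:
        splits += insert_key(pages, key)
    return pages, splits


seq_pages, seq_splits = load(range(1, 17))
rnd_pages, rnd_splits = load([8, 3, 12, 1, 15, 6, 10, 2, 14, 5, 9, 16, 4, 11, 7, 13])
print(len(seq_pages), seq_splits, len(rnd_pages), rnd_splits)
```

In this model the sequential load fills every page completely and never splits, while the same keys in scattered order trigger splits and leave half-empty pages behind, which matches the point above about extra disk space.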

Index differences between InnoDB and MyISAM

Clustered and non-clustered indexes lay out data differently, and so do the corresponding primary key and secondary indexes, which is often a source of confusion. The following figure shows the different indexing and data storage methods for MyISAM and InnoDB.

The data distribution of MyISAM is very simple and is stored on disk in the order in which the data is inserted. The leaf nodes of the primary key index and the secondary index store pointers to the corresponding data rows.

In InnoDB, the clustered index is the table, so there is no separate row storage as in MyISAM. Each leaf node of the clustered index contains the primary key value and all remaining columns (in this case, col2).

InnoDB's secondary indexes are very different from the clustered index. The leaf nodes of an InnoDB secondary index do not store a row pointer; they store the primary key value, which acts as the "pointer" to the row.
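The two layouts can be contrasted with a small sketch. It assumes a hypothetical two-column table (col1 as primary key, col2 with a secondary index) and uses Python dictionaries as stand-ins for the B-Trees, so it only illustrates what each index points to, not how the engines store pages.

```python
# Hypothetical rows in insertion order: (col1 primary key, col2 indexed).
inserted = [(99, 8), (12, 56), (3000, 62), (4700, 13)]

# MyISAM-style: rows are kept in insertion order and both indexes
# store the position of the row.
myisam_rows = list(inserted)
myisam_pk = {col1: pos for pos, (col1, _) in enumerate(myisam_rows)}
myisam_col2 = {col2: pos for pos, (_, col2) in enumerate(myisam_rows)}

# InnoDB-style: the clustered index is the table, ordered by primary key;
# the secondary index stores primary key values instead of row positions.
innodb_clustered = {col1: (col1, col2) for col1, col2 in sorted(inserted)}
innodb_col2 = {col2: col1 for col1, col2 in inserted}


def myisam_lookup_by_col2(value):
    """One index lookup, then a direct fetch by row position."""
    return myisam_rows[myisam_col2[value]]


def innodb_lookup_by_col2(value):
    """Two lookups: secondary index first, then the clustered index."""
    pk = innodb_col2[value]
    return innodb_clustered[pk]


print(myisam_lookup_by_col2(56), innodb_lookup_by_col2(56))
```

Both calls return the same row, but the InnoDB-style path passes through the primary key, which is why a secondary index lookup costs two index traversals there.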

Loose index scan

In general, MySQL does not support loose index scans, that is, it cannot scan an index in a non-contiguous manner. An index scan usually has a defined starting point and end point, and even if only a few entries in that range are needed, MySQL still has to scan every entry in it.

Let's illustrate this with an example, assuming that we have an index on (a, b) and a query that filters only on column b.

Because the leading column of the index is a but the query specifies only column b, MySQL cannot use the index, so matching rows can only be found by a full table scan, as shown in the following figure.

If you understand the physical structure of the index, it is not hard to see a faster way to execute this query. The physical structure of the index (though not the storage engine API) makes it possible to scan the range of b values under the first value of column a, then jump to the next distinct value of column a and scan the corresponding range of b again. The following figure shows what this would look like if MySQL implemented it.

Notice that there is no need to use the where clause filtering at this point, because the loose index scan has skipped all unnecessary records.
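The jump-and-scan idea can be sketched as follows. This is an illustration only: the (a, b) index is modeled as a sorted list of tuples with invented values, binary search stands in for repositioning within the tree, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (a, b) index entries in sorted order.
index = sorted((a, b) for a in (1, 2, 3) for b in (1, 2, 3, 4))


def full_scan(index, b_low, b_high):
    """Examine every entry and filter on b."""
    matches = [entry for entry in index if b_low <= entry[1] <= b_high]
    return matches, len(index)


def loose_scan(index, b_low, b_high):
    """For each distinct a, jump to its b range, then jump to the next a."""
    matches, examined, pos = [], 0, 0
    while pos < len(index):
        a = index[pos][0]
        start = bisect_left(index, (a, b_low), pos)
        end = bisect_right(index, (a, b_high), pos)
        matches.extend(index[start:end])
        examined += end - start
        # Skip all remaining entries for this value of a.
        pos = bisect_right(index, (a, float("inf")), end)
    return matches, examined


full_matches, full_examined = full_scan(index, 2, 3)
loose_matches, loose_examined = loose_scan(index, 2, 3)
print(full_examined, loose_examined)
```

Both functions return the same rows, but the loose variant only reads entries that satisfy the condition on b, so no filtering step is needed afterwards.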

MySQL 5.0 and later versions can use loose index scans in some special scenarios, such as finding the maximum and minimum values of each group in a grouping query.

The Extra field in EXPLAIN displays "Using index for group-by", indicating that loose index scanning will be used here.
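Why grouping queries suit this technique can be seen in a short sketch. It assumes an index on (a, b) modeled as a sorted list of tuples with invented values, and a query that asks for the smallest b in each group of a. The function name min_b_per_a is hypothetical.

```python
from bisect import bisect_right

# Hypothetical (a, b) index entries in sorted order.
index = sorted([(1, 5), (1, 2), (1, 9), (2, 7), (2, 3), (3, 4)])


def min_b_per_a(index):
    """Read one entry per group: the first entry of each a holds its smallest b."""
    result, reads, pos = {}, 0, 0
    while pos < len(index):
        a, b = index[pos]
        result[a] = b
        reads += 1
        # Jump straight to the first entry of the next distinct a.
        pos = bisect_right(index, (a, float("inf")), pos)
    return result, reads


minimums, reads = min_b_per_a(index)
print(minimums, reads)
```

Because entries within each group are already ordered by b, one read per group is enough; the maximum could be found the same way by reading the last entry before each jump.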

Covering index

An index is not only an efficient way to find rows, it can also supply column data directly. MySQL can read the values of indexed columns from the index itself, without reading the data row. If an index contains the values of all the fields a query needs, we call it a covering index.

Covering indexes are a very useful tool that can greatly improve performance. A query that only needs to scan the index, without going back to the table, brings several benefits:

The number and size of index entries are usually much smaller than the entries and sizes of data rows, so if you only need to read the index, MySQL will greatly reduce data access.

Because index entries are stored in order of their column values, an I/O-intensive range lookup needs far less I/O than reading each row from disk at random.

Because of InnoDB's clustered indexes, covering indexes are particularly useful for InnoDB tables. An InnoDB secondary index stores the row's primary key in its leaf nodes, so if the secondary index covers the query, the second lookup in the primary key index is avoided.

When a query is served by a covering index (also known as an index-covered query), the Extra column of EXPLAIN shows "Using index". For example, the table sakila.inventory has a multi-column index on (store_id, film_id). If MySQL only needs to access these two columns, it can use this index as a covering index.
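The difference between a covered and a non-covered query can be sketched by counting row fetches. The model below is an assumption for illustration: the table is a dictionary keyed by inventory_id, the secondary index is a sorted list of (store_id, film_id, inventory_id) tuples, and the sample rows and the select helper are invented.

```python
# Hypothetical inventory rows: inventory_id -> (film_id, store_id, last_update).
table = {
    1: (1, 1, "2006-02-15"),
    2: (1, 2, "2006-02-15"),
    3: (2, 2, "2006-02-15"),
    4: (3, 1, "2006-02-15"),
}

# Secondary index on (store_id, film_id); each entry also carries the primary key.
index = sorted((store, film, pk) for pk, (film, store, _) in table.items())

INDEX_COLUMNS = {"store_id": 0, "film_id": 1, "inventory_id": 2}
TABLE_COLUMNS = {"film_id": 0, "store_id": 1, "last_update": 2}


def select(columns):
    """Return (rows, row_fetches); row_fetches stays 0 when the index covers."""
    covered = all(column in INDEX_COLUMNS for column in columns)
    rows, row_fetches = [], 0
    for entry in index:
        if covered:
            # Every requested value is already present in the index entry.
            rows.append(tuple(entry[INDEX_COLUMNS[c]] for c in columns))
        else:
            # At least one column is missing: go back to the table by primary key.
            row_fetches += 1
            row = table[entry[2]]
            rows.append(tuple(
                entry[2] if c == "inventory_id" else row[TABLE_COLUMNS[c]]
                for c in columns))
    return rows, row_fetches


covered_rows, covered_fetches = select(["store_id", "film_id"])
other_rows, other_fetches = select(["store_id", "last_update"])
print(covered_fetches, other_fetches)
```

Asking only for store_id and film_id never touches the table, while adding last_update forces one row fetch per index entry.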

This concludes the overview of the underlying structure, usage principles and characteristics of B-Tree indexes in MySQL. Combining the theory with hands-on practice is the best way to learn it, so go and try it.