Internal implementation and algorithm Analysis of mysql Index 04/21 Update SLTechnology News&Howtos

Internal implementation and algorithm Analysis of mysql Index

2025-04-21 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article focuses on "mysql index internal implementation and algorithm analysis", interested friends may wish to take a look. The method introduced in this paper is simple, fast and practical. Next let the editor to take you to learn "mysql index internal implementation and algorithm analysis" it!

The storage engine searches from the root node of the index, looks for the lower layer through the pointer in the node slot, and compares the value of the node page with the value to find the appropriate pointer to enter the lower child node. The storage engine eventually either finds the corresponding value, or the record does not exist.

The pointer to the leaf node points to the data being indexed, not to other node pages.

In addition to looking up by value, the index can also be used for order by operations in the query (reason: the nodes in the index tree are ordered)

Restrictions on B+tree indexes:

1. If you do not start the search by the leftmost column of the index, you cannot use the index

two。 Columns in the index cannot be skipped

3. If there is a range lookup for a column in the query, all columns on the right cannot be optimized using the index

Insert operation of B+ tree

The insertion of B + tree must ensure that the records in the leaf node are still sorted after insertion.

Three cases of inserting B+ Tree

In the first case, insert 28J Leaf page and index page into the picture, which are not full. Insert them directly.

In the second case, inserting 70 page into the picture is full, but the index page is not full.

Description: the use of rotation operation to reduce a page split operation

In the third case, insert the 95J Leaf page and index page in the picture.

Note: in order to keep the balance, the B+ tree may need a lot of page splitting operations for the newly inserted keys, but the B+ tree is mainly used for disk, so we should reduce the page split as much as possible through the rotation function (leaf page is full, but the left and right sibling nodes are not full)

Delete operation of B+ tree

The B + tree uses the fill factor to control the deletion change of the tree, and 50% is the minimum capital that the fill factor can be set. You must also ensure that the records in the deleted leaf node are still sorted.

Three cases of deleting a B+ tree (measured by changes in the filling factor)

Delete 70 in the figure

Delete 25 in the figure, and 28 of the sibling nodes of 25 are updated to the page index

Delete 60 in the figure, and the fill factor is less than 50%. You need to merge.

B+ tree index

The essence of B + tree index is the implementation characteristic of B + tree in database: fan out.

The B + tree index in the database is divided into clustered index (clustered index) and secondary clustered index (secondary index).

Index organization table: the data in the table is stored in the order of primary key

Heap table: stored in the order in which the data is inserted. The indexes on the heap table are nonclustered, and the heap table has no primary key.

Clustered index (only one per table): a B + tree is constructed according to the primary key of each table, and the leaf node stores the row record data of the whole table, so the leaf node of the clustered index also becomes the data page.

The leaf node of the secondary clustered index stores only the key value and the offset to the data page, not a complete row record.

A clustered index is not a separate type of index, but a way to store data. Innodb's clustered index actually holds the B+tree index and data rows in the same structure.

When a table has a clustered index, the rows of data are actually stored in the leaf pages of the index

Description: the storage of clustered indexes is physically discontinuous and logically continuous: 1. Pages are linked through a two-way linked list, and pages are sorted in the order of primary keys; 2. The records in each page are also maintained through a two-way linked list, and physical storage can also be stored without primary keys.

Advantages of clustered indexes:

The relevant data can be saved together.

Faster data access (clustered indexes save indexes and data in the same b+tree)

Queries that use override index scanning can directly use the primary key values in the page node

Disadvantages of clustered indexes:

Aggregating data improves IO performance, and if the data is all in memory, the order of access is less important

The insertion speed is heavily dependent on the insertion order. Inserting in the order of the primary key is the fastest. However, if the data is not loaded in the primary key order, it is best to reorganize the table using optimize table after the load is complete.

Updating clustered index columns is expensive. Because it forces innod to move each updated row to a new location

A table based on a clustered index may face the problem of page splitting when a new row is inserted, or when the primary key is updated so that the row needs to be moved. Page splitting can cause tables to take up more disk space.

Clustered indexes can cause full table scans to slow down, especially when rows are sparse, or data storage is discontinuous due to page fragmentation

The nonclustered index is larger than expected because the leaf node of the secondary index contains the primary key column that references the row

Nonclustered index access requires two index lookups (the row pointer held by the leaf node in the nonclustered index points to the primary key value of the row). For innodb adaptive hash indexing, this repetition can be reduced.

Secondary index: the leaf node contains key values, and each leaf-level index row also contains a bookmark (clustered index of the corresponding row data)

Diagrams of secondary and clustered indexes

Note: the existence of secondary indexes does not affect the organization of data in clustered indexes, so each table can have multiple secondary indexes.

Principle: the process of finding data through a secondary index: innodb traverses the secondary index and obtains the primary key to the primary key index through a leaf-level pointer, and then finds a complete row record through the primary key index.

Management of B+ Tree Index

The method of creating and deleting indexes: alter table;create / drop index

The process of creating a primary key index: first create a new temporary table, then import the data into the temporary table, delete the original table, and rename the temporary table to the original table name

The process of creating a secondary index: there is no need to rebuild the table during the creation process, only an S lock is added to the table, which is extremely fast.

You can view the information of indexes in a table through show index from table_name

Description:

Table: the name of the table in which the index is located

Non_unique: non-unique index, primary key is 0

Key_name: the name of the index

Seq_in_index: the position of the column in the index

Column_name: columns of the index

Collation: how columns are stored in the index, the B+ tree index is always A, that is, sorting; if the heap storage engine is used and the hash index is established, the NULL will be displayed because hash stores the index data through hash buckets instead of sorting the data

Cardinality: an estimate of the number of unique values in the index. The number of rows in the Cardinality/ table is as close to 1 as possible. If it is too small, the index needs to be rebuilt.

Sub_part: whether the part of a column is indexed, for example, only the first number of characters of a column is indexed, and if the entire column is indexed, the field is NULL

Packed: how keywords are compressed. NULL if it is not compressed

NULL: whether the indexed column contains null values

Index_type: type of index. Innodb only supports B+ indexes.

Comment: comment

The use of B+ Tree Index

When accessing a highly selective field and fetching a small number of rows from the table, you need to add a Btree index to the field

Note: when the amount of data fetched exceeds 20% of the data in the table, the optimizer will not use the index, but will scan the table throughout the table. And the estimated number of rows returned is not accurate.

Sequential read, random read, pre-read

Sequntial read: reads blocks on disk sequentially (block)

Random read (random read): the accessed block is not contiguous and requires the head of the disk to move continuously.

Pre-read (read ahead): prereads multiple pages into the buffer pool with a single IO request

Random pre-read (random read ahead): when 13 pages in an area (64 consecutive pages) are also in the buffer and at the front of the LRU list (that is, the pages are accessed frequently), innodb prereads all remaining pages in the area into the buffer

Linear pre-read (linear read ahead): based on the pattern of pages in the buffer pool, not the number of pages. If all 24 pages in a zone are accessed sequentially, innodb will read all pages in the next zone

Innodb_read_ahead_threshold parameter: indicates how many pages in an area are accessed sequentially before innodb enables pre-reading. The default value is 56, that is, when 56 pages in an area are accessed sequentially, all pages in the next area are pre-read.

Federated index: it is still a B + tree, except that the number of keys of the federated index is not 1, but greater than or equal to 2.

Hash algorithm

Adaptive hash indexing uses a hash table (hash table) data structure

Hash table (hash table): improved from direct addressing table

Hash indexing is based on hash table implementation, and only queries that exactly match all columns of the index are valid.

Principle: for each row of data, the storage engine calculates a hash code for all index columns (rows with very small and different key values calculate different hash codes), and the hash index stores all hash codes in the index. it also holds the pointer to each data row. If multiple columns have the same hash value, the index stores multiple record pointers into the same hash entry as a linked list.

In mysql, only the memory engine shows support for hash index (note: non-unique hash index), NDB supports unique hash index, and innodb supports adaptive hash index

Restrictions on hash indexing:

1. The hash index contains only hash values and row pointers, but does not store field values. The index cannot use the values of the index to avoid reading rows.

two。 Hash index data is not stored in the order of index values, so it cannot be used for sorting

3. Hash indexing does not support partial index matching lookups (because hash indexing always uses the entire contents of the index column to calculate the hash value)

4. Hash indexes only support equivalent queries, including =, in

5. When there is a hash conflict (different index column values have the same hash value), the storage engine must traverse all row pointers in the linked list, comparing them one by one until they find a qualified row

6. If there are many hash conflicts, the cost of index maintenance operations can be high (to avoid hash conflicts, you must include hash values and corresponding column values in the where condition).

Adaptive hash indexing

The adaptive hash index can be enabled through the Innodb_adaptive_hash_index parameter, and innodb_buffer_pool_size/256 hash tables with the number of slots will be automatically created when the database is started.

You can view the current usage of adaptive hash indexing through show engine innodb status

Description: including the size of the hash index, usage, and the use of adaptive hash indexing search per second

Note: hash index can only be used to search for equivalent queries, other types cannot use hash indexing.

At this point, I believe that everyone on the "mysql index internal implementation and algorithm analysis" have a deeper understanding, might as well to the actual operation of it! Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.