A First Look at MySQL Indexes

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article summarizes my own understanding after studying Lin Xiaobin's "MySQL 45 Lectures" course on Geek Time.

What is an index?

When we look up a word in a Chinese dictionary, we first find its page number in the pinyin directory and then turn directly to that page. This is much faster than reading the dictionary from the first page. A database index works the same way: it is like the table of contents of a book, and it lets a query locate data without scanning every row.

Index implementations

In databases, the common ways to implement an index are hash tables, ordered arrays, and search trees.

Hash table

A hash table is an index implementation that stores data as key-value pairs. You can think of it as an array: a hash function computes the position of each key in the array, and the data is saved at that position. This raises an obvious problem: what if the hash function computes the same position for two different keys? To handle this, each array slot holds a linked list, each element of the list is one piece of data, and new data is added directly to the end of the list.

So a query proceeds as follows: compute the array position of the key with the hash function -> traverse the linked list at that position to find the data that meets the condition.

Note that the elements of a linked list are not kept in order. New data is always appended to the end of the list, which makes inserts cheap.

Hash tables are not good at range queries and are generally used for equality queries, for two reasons:

1. Two adjacent index values do not necessarily map to adjacent array positions after hashing.

2. The data in each linked list is unordered.
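The behaviour described above can be sketched in a few lines of Python. This is only a toy model, not how any storage engine implements its hash index: the class name HashIndex, the bucket count and the sample keys are all made up for illustration.

```python
class HashIndex:
    """Toy hash index: an array of buckets, each bucket a list used as a chain."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _slot(self, key):
        # The hash function decides which array position a key belongs to.
        return hash(key) % len(self.buckets)

    def put(self, key, row):
        # Colliding keys share a bucket; new entries go to the end of the chain.
        self.buckets[self._slot(key)].append((key, row))

    def get(self, key):
        # Equality query: hash to one bucket, then walk only that chain.
        return [row for k, row in self.buckets[self._slot(key)] if k == key]

    def range(self, low, high):
        # Range query: neighbouring keys can sit in any bucket and chains are
        # unordered, so every entry has to be inspected and sorted afterwards.
        hits = [(k, row) for bucket in self.buckets for k, row in bucket
                if low <= k <= high]
        return [row for k, row in sorted(hits, key=lambda kv: kv[0])]


index = HashIndex(bucket_count=4)
for user_id in [10, 3, 7, 14, 2]:
    index.put(user_id, f"user{user_id}")

print(index.get(7))        # touches one bucket
print(index.range(3, 10))  # touches every bucket
```

With four buckets, the keys 10, 14 and 2 all land in the same chain, which is the collision case the text describes.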

Ordered array

As the name implies, an ordered array stores data in an array sorted by index value. Because the array is ordered, a position can be found quickly by binary search. After finding the first matching position, it is easy to get the data of the desired range by traversing to the left or right. Both equality queries and range queries are therefore very efficient.
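A minimal sketch of both query types on an ordered array, using Python's standard bisect module for the binary search. The function names and the sample data are assumptions made for this illustration.

```python
import bisect


def find_equal(keys, rows, key):
    """Equality query: binary search for the key in the sorted array."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return rows[i]
    return None


def find_range(keys, rows, low, high):
    """Range query: binary search for the first key >= low, then walk right."""
    i = bisect.bisect_left(keys, low)
    result = []
    while i < len(keys) and keys[i] <= high:
        result.append(rows[i])
        i += 1
    return result


keys = [2, 3, 7, 10, 14]            # index values, kept sorted
rows = [f"user{k}" for k in keys]   # row data stored at the same positions

print(find_equal(keys, rows, 7))
print(find_range(keys, rows, 3, 10))
```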

But the drawback is also obvious: when a piece of data is inserted at position n in the middle of the array, all the data after n has to be moved back. This kind of index is therefore generally used only in static storage engines.

Search tree

Binary search tree: either an empty tree, or a binary tree with the following properties: if its left subtree is not empty, the values of all nodes in the left subtree are less than the value of the root node; if its right subtree is not empty, the values of all nodes in the right subtree are greater than the value of the root node; and the left and right subtrees are themselves binary search trees.

Balanced binary tree: a binary search tree with an additional balance condition: for every node, the depths of the left subtree and the right subtree differ by at most 1.

Multi-tree (N-ary tree): each node can have multiple child nodes, and the values of the child nodes increase from left to right.

When a balanced binary tree is used to implement the index, the structure is as follows

From the figure, it can be seen that each query needs to access at most 4 nodes to get the desired data. For example, when querying user2, the query path is: userA -> userC -> userF -> user2.

So queries are fast. At the same time, because of the ordering property of the search tree (everything in the left subtree is smaller than everything in the right subtree), range queries are also convenient.
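As an illustration of point lookups and range queries on a search tree, here is a small binary search tree in Python. It is a sketch under assumptions: the tree is not self-balancing (the keys are simply inserted in an order that happens to produce a balanced shape), and the names Node, insert, search and range_query are invented for this example.

```python
class Node:
    def __init__(self, key, row):
        self.key, self.row = key, row
        self.left = self.right = None


def insert(root, key, row):
    """Plain BST insert (no rebalancing); returns the subtree root."""
    if root is None:
        return Node(key, row)
    if key < root.key:
        root.left = insert(root.left, key, row)
    elif key > root.key:
        root.right = insert(root.right, key, row)
    else:
        root.row = row
    return root


def search(root, key):
    """Return (row, number of nodes visited) for an equality query."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return node.row, visited
        node = node.left if key < node.key else node.right
    return None, visited


def range_query(node, low, high):
    """In-order walk that skips subtrees which cannot contain matches."""
    if node is None:
        return []
    result = []
    if low < node.key:
        result += range_query(node.left, low, high)
    if low <= node.key <= high:
        result.append(node.row)
    if node.key < high:
        result += range_query(node.right, low, high)
    return result


root = None
for key in [8, 4, 12, 2, 6, 10, 14]:   # insertion order chosen to keep the tree balanced
    root = insert(root, key, f"user{key}")

print(search(root, 6))            # path 8 -> 4 -> 6
print(range_query(root, 4, 10))   # rows come back in key order
```

The number of nodes visited by search is bounded by the height of the tree, which is why keeping the tree balanced matters.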

If the search tree lived only in memory, a binary tree would give the most efficient search compared with a multi-tree. In practice, however, databases use an N-ary tree instead of a binary tree, for the following reasons:

1. The index is not only kept in memory, it is also written to disk.

2. Each node of the search tree corresponds to a data block on disk.

3. A multi-tree can have many child nodes under each node, so for the same amount of data its height is smaller than that of a binary tree. Fewer nodes have to be visited to find a piece of data, which means the query reads fewer data blocks from disk and is therefore faster.
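The effect of fan-out on tree height can be checked with a little arithmetic. The sketch below counts how many levels a perfectly filled tree needs before it can address a given number of rows. The row count of 100 million and the fan-out of 1200 are illustrative assumptions, not figures from this article, and the function name levels_needed is made up.

```python
def levels_needed(row_count, fanout):
    """Smallest number of levels such that fanout ** levels >= row_count."""
    levels, reachable = 0, 1
    while reachable < row_count:
        reachable *= fanout
        levels += 1
    return levels


rows = 100_000_000
print(levels_needed(rows, 2))      # binary tree
print(levels_needed(rows, 1200))   # N-ary tree with an assumed fan-out of 1200
```

If every level costs one data block read, the difference between the two results is the difference in disk accesses per lookup.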

The index model of InnoDB

InnoDB uses the B+ tree as its index structure.

In a B+ tree, nodes are divided into leaf nodes and non-leaf nodes. Index keys are stored in the non-leaf nodes, and one node can hold many keys; all the data is stored in the leaf nodes. According to the content of the leaf nodes, InnoDB indexes are divided into primary key indexes and non-primary key indexes. A non-primary key index is also called a secondary index.

The leaf nodes of the primary key index store the entire row, while the leaf nodes of a non-primary key index store the value of the primary key.

Primary key index graph

Non-primary key index graph

When querying by primary key, we only need to search the primary key index tree to get the data. When querying through a non-primary key index, we first find the primary key value in the non-primary key index tree and then search the primary key index tree again. This second lookup is called going back to the table. In other words, a query on a non-primary key index searches one more index tree than a query on the primary key, so we should query by primary key where possible.
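The two lookup paths can be modelled roughly as follows. This is a simplified sketch and not InnoDB's actual structures: a Python dict stands in for the primary key index tree, a sorted list of (k, id) pairs stands in for the secondary index tree, and the table columns id, k and name are hypothetical.

```python
import bisect

rows = [
    {"id": 100, "k": 1, "name": "a"},
    {"id": 200, "k": 2, "name": "b"},
    {"id": 300, "k": 3, "name": "c"},
    {"id": 500, "k": 5, "name": "e"},
]

# Primary key index: the "leaf" holds the entire row.
primary_index = {row["id"]: row for row in rows}

# Secondary index on k: the "leaf" holds only the primary key value.
secondary_index = sorted((row["k"], row["id"]) for row in rows)


def select_by_id(pk):
    """Primary key query: one index is searched."""
    return primary_index.get(pk)


def select_by_k(k):
    """Secondary index query: find primary keys first, then go back to the table."""
    i = bisect.bisect_left(secondary_index, (k,))
    result = []
    while i < len(secondary_index) and secondary_index[i][0] == k:
        pk = secondary_index[i][1]
        result.append(primary_index[pk])   # second lookup, in the primary key index
        i += 1
    return result


print(select_by_id(300))
print(select_by_k(5))
```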

Index maintenance

When a new row is added, a corresponding entry is added to the index. If index values are inserted in increasing order, the new entry is appended after the current maximum value and does not affect other data in the tree. If the new value falls in the middle of a node, some existing entries have to be moved to keep the index tree ordered.

Moreover, adjacent entries are stored on the same data page. If an entry has to be inserted into a data page that is already full, a new data page is allocated and part of the data is moved to it. This process is called page splitting. It hurts performance and also lowers space utilization, because data that used to fit in one page is now spread over two. Inserting data in irregular order causes frequent page splits.

When two adjacent pages have very low utilization because data was deleted, the pages are merged.

Therefore, an auto-increment primary key is generally used, so that new rows are inserted in increasing order.
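To make the cost of out-of-order inserts concrete, here is a toy simulation of page splits. It is a deliberately simplified model, not InnoDB's algorithm: a page is a sorted Python list holding at most PAGE_CAPACITY keys (a made-up value), a full page receiving a key in its middle is split in half, and a key larger than everything stored so far simply starts a new page.

```python
import bisect
import random

PAGE_CAPACITY = 4  # hypothetical; real pages hold far more entries


def insert_keys(keys, capacity=PAGE_CAPACITY):
    """Insert keys one by one; return (pages, number of page splits)."""
    pages, splits = [], 0
    for key in keys:
        if not pages:
            pages.append([key])
            continue
        # Locate the page whose key range should contain the new key.
        first_keys = [page[0] for page in pages]
        i = max(bisect.bisect_right(first_keys, key) - 1, 0)
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            # Appending past the current maximum: new page, nothing is moved.
            pages.append([key])
        else:
            # Full page and the key belongs inside it: split, moving half the entries.
            bisect.insort(page, key)
            mid = len(page) // 2
            pages[i:i + 1] = [page[:mid], page[mid:]]
            splits += 1
    return pages, splits


def utilization(pages, capacity=PAGE_CAPACITY):
    """Fraction of the allocated page slots that actually hold a key."""
    return sum(len(page) for page in pages) / (len(pages) * capacity)


sequential = list(range(1000))
shuffled = random.Random(0).sample(sequential, len(sequential))

for label, keys in [("sequential", sequential), ("random", shuffled)]:
    pages, splits = insert_keys(keys)
    print(label, "splits:", splits, "utilization:", round(utilization(pages), 2))
```

In this model the sequential run never splits a page and leaves every page full, while the shuffled run splits repeatedly and leaves pages partly empty.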

What are the advantages and disadvantages of using business fields as primary keys?

1. With a business field it is hard to guarantee that entries are inserted into the index tree in order, so the write cost is higher.

2. An auto-increment primary key is usually an integer type, so it is short. The primary key value is stored in the leaf nodes of every secondary index, so the shorter the primary key, the less space the leaf nodes of the secondary indexes occupy.

3. Using a business field as the primary key also has an advantage: queries on that field do not have to go back to the table, because a single search of the primary key index tree is enough.

To sum up, in terms of performance and storage space, an auto-increment primary key is often the more reasonable choice. Using a business field as the primary key is more appropriate when the business scenario has only one index and that index is unique.
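A rough back-of-the-envelope estimate shows how the length of the primary key feeds into secondary index size. The numbers are invented for illustration (10 million rows, 3 secondary indexes, an 8-byte integer key versus a 20-byte business key), and the estimate counts only the bytes of the primary key values stored in secondary index leaves, ignoring page and record overhead.

```python
def pk_bytes_in_secondary_indexes(row_count, secondary_index_count, pk_length_bytes):
    """Bytes spent storing primary key values in secondary index leaf entries."""
    return row_count * secondary_index_count * pk_length_bytes


rows, index_count = 10_000_000, 3
print(pk_bytes_in_secondary_indexes(rows, index_count, 8))    # 8-byte integer key
print(pk_bytes_in_secondary_indexes(rows, index_count, 20))   # 20-byte business key
```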

Because of updates, deletes, page splits and similar operations, the space utilization of data pages drops over time. In that case you can consider rebuilding the index: the new index is built by inserting the data in order, which improves disk space utilization. Rebuilding the primary key index and rebuilding an ordinary index have different effects, though. Rebuilding an ordinary index improves space utilization and does not affect other indexes. Rebuilding the primary key index by dropping and re-creating it is unreasonable: whether a primary key is dropped or created, the entire table is rebuilt, and every ordinary index is affected as well, so the performance impact is much greater. To rebuild the whole table, use the statement alter table T engine=InnoDB instead.

View index usage

Query the performance_schema.table_io_waits_summary_by_index_usage table to see how much each index is actually used.
