What are the MySQL indexing mechanisms? 07/16 Update SLTechnology News&Howtos

What are the MySQL indexing mechanisms?

2025-07-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article mainly explains "what are the MySQL indexing mechanisms". Interested friends may wish to take a look. The method introduced in this paper is simple, fast and practical. Next, let the editor take you to learn what the MySQL indexing mechanism has.

What is the index

MySQL's official definition of index is: index (Index) is a data structure that helps MySQL obtain data efficiently, while the data structure used by MYSQL is B+ tree.

Here, I recommend you to read a book, "A Book of in-depth understanding of computer Systems."

1.1 locality principle

Both program and data access tend to gather in groups, and only a small part of them are used in a period of time. The information that will be used in the near future is likely to be close to the spatial address of the information being used now (called spatial locality), or the recently accessed program code and data are very likely to be accessed again soon (called time locality).

1.2 disk pre-read

The length of pre-read is generally an integer multiple of pages (page). Pages are logical blocks of memory. The operating system often divides main memory and disk storage into continuous blocks of equal size, each of which is called a page (in many operating systems, the page size is usually 4K). Main memory and disk exchange data on a page-by-page basis

1.3 introduction

In the use of database, database query is usually one of the most important functions of the database. However, each search algorithm can only be applied to specific data structures.

For example, binary search requires that the retrieved data be ordered.

The binary tree search can only be applied to the binary search tree, but the organizational structure of the data itself cannot fully satisfy all kinds of data structures (for example, it is theoretically impossible to organize both columns in order at the same time), so, in addition to the data, the database system also maintains data structures that satisfy specific lookup algorithms that reference (point to) data in some way. In this way, advanced search algorithms can be implemented on these data structures. This kind of data structure is the index. The index is generally stored on disk in the form of a file, and index retrieval requires disk Icano operation. Therefore, the most important index to evaluate the quality of a data structure as an index is the progressive complexity of the number of disk Icano operations in the search process.

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

Index is a data structure that helps MYSQL to obtain data efficiently.

The index is stored in the file system

The file storage form of the index is related to the storage engine

Structure of index file: hash, binary tree, B tree, B + tree

II. Classification of the index 2.1 hash

There is a mysql data file with two columns Id and name. If we store it in hash format (hash table), as long as we calculate the hash value of a column and take a module according to the length of the array, we can take it to the position of 0-7n subscript. In fact, the efficiency is relatively high, but using hash table storage, it has some disadvantages:

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

If you use hash storage, you need to add all the data files to memory, which costs more memory space.

If all queries are equivalent queries, then hash is really fast, but more data is found in the enterprise or in the actual work environment, rather than equivalent queries, because hash is not suitable, so the format 2.2 binary tree stored by hash is not selected in mysql

Index format:

For trees, there is an updated order in which they have fallen. Don't look at the structure as soon as you come up. First, you know what trees are made up of one root, and then there are more than n branches. These branches are some tree-shaped structures. When you have more than one tree branch (multi-element), the search efficiency will be relatively low, so there is something about the binary tree. Why is the binary tree easier to use? Because the binary tree has two branches, but if there are two branches, it will lead to an effect, that is, every time we look for data, it is similar to binary search, but the binary tree also has its own shortcomings. You can look at the index format of the binary tree in our picture above. The node on the left will be shorter (only need to be read three times), while the node on the right will grow a lot (need to be read five times). It will lead to the depth of the tree is relatively deep, every time the tree node reads, there will be an IO, the higher the depth, the higher the IO, which will affect the efficiency of our data reading, so there are (balanced binary tree) and (red-black tree)

Balanced binary tree: maintain a balance, that is, the difference between the height of the left subtree and the right subtree can not be greater than 1, but it is not suitable for our above format, because it is already more than 1, but the AVL tree will also have a problem is that the number of adjustments is too frequent, it involves an operation is rotation, a left-handed, a right-hand rotation, in order to maintain the balance requires N times of rotation This kind of rotation is actually a waste of time. Every time you add or delete it, you have to go through N times of rotation. The efficiency is too low.

Recommend a website, you can directly see the AVL tree operation process, there are students who do not understand can take a look, very image: AVL Trees (Balanced binary search trees)

The red-black tree itself is also a balanced tree, but it makes a tradeoff from the middle, that is, it loses part of the balanced performance, but maintains the relative balance. It makes such an operation, that is, the height of the longest subtree, as long as it is not more than twice the height of the shortest subtree. At the same time, it introduces red and black node information in the red-black tree. With these information, it can help us to make a balance. In the AVL tree, there is rotation to keep the balance, while the red-black tree has rotation and discoloration to keep the balance. The red-black tree is the advanced AVL tree, which loses part of the balance performance, but maintains our efficiency of inserting and deleting data. Although it loses part of the performance, it is still a balance tree. Since it is a balance tree, its longest subtree is no more than twice that of the shortest subtree, which means that if the shortest subtree is 4. Then the longest subtree is 8, so when you look for data, it is not a binary search, and the efficiency will become lower.

Whether it is a binary tree or a red-black tree, the depth of the tree will cause more IO times, which will affect the efficiency of data reading, and the most important thing is to reduce IO.

IO is a bottleneck in our IT industry, one is disk IO and the other is network IO. As software developers, we have no way to adjust the hardware bottleneck. We can only reduce our IO from the program. We have two directions, one is to reduce the number of IO, and the other is to reduce the amount of IO. For example, we used to read data 10 times, but now we only need to read it once. The amount of IO is 10 times less. Originally, we need to read 1MB data, but now we only need to read 1KB data, which is why we do not recommend using select * from when writing mysql query statements, because such a query will query more than N fields. Originally, I only need two fields, but I have given me 30 fields, which will lead to an increase in the amount of IO, so we will consider whether the number of indexes can be reduced. So this leads to our B-tree.

2.3 B-tree

The characteristics of B-tree:

All the keys are distributed in the whole tree.

The search may end at a non-leaf node and do a search in the full set of keywords, and the performance is close to binary search.

Each node has a maximum of m subtrees

The root node has at least 2 subtrees

The branch node has at least 2 subtrees (all branch nodes except root and leaf nodes)

All leaf nodes are on the same layer, and each node can have up to 1 key in ascending order.

Description of B-tree structure:

The example diagram shows that each node occupies a disk block, and on one node there are two keywords sorted in ascending order and three pointers to the root node of the subtree. The pointer stores the address of the disk block where the child node resides. The three scope fields divided by the two keywords correspond to the range fields of the data of the subtree pointed to by the three pointers. With the root node as the column, the data range of the subtree pointed to by the 16 and 34 pointer is smaller than that pointed to by the 16 pointer P2 pointer, and the data range of the subtree pointed to by the 16-34 pointer is larger than that pointed to by the 34 lookup keyword (28) process:

According to the node to find disk block 1, read the memory [the first time of disk Imaco operation]

The comparison keyword 28 finds the pointer P2 of disk block 1 in the interval (162.34).

Find disk block 3 according to P2 pointer and read it into memory [disk Icano operation for the second time]

The comparison keyword 28 is in the interval (255.31), find the pointer P2 of disk block 3.

Find disk block 8 according to P2 pointer, read memory, [disk Icano operation for the third time]

Find keyword 28 in the keyword list in block 8

Disadvantages:

Each node has key, but also contains data, and the storage space of each page is limited. If the data is relatively large, it will cause the number of key stored in each node to become smaller.

When a large amount of data is stored, it will lead to a large depth, and increase the number of disk IO when querying, thus affecting the query performance.

2.4 B+ tree

B+Tree is an optimization based on BTree, with the following changes:

B+Tree can contain more nodes per node for two reasons. The first reason is to reduce the height of the tree, and the second reason is to change the data range into multiple intervals. The more intervals, the faster the data retrieval.

Non-leaf nodes store key (all disks are stored key), and leaf nodes store key and data

Leaf node two pointers are connected to each other (in line with the pre-read characteristics of the disk) sequential query performance is higher if there are no other nodes under the current disk block, that is, leaf nodes, and vice versa, non-leaf nodes

Structure diagram:

Note: there are two header pointers on the B+Tree, one pointing to the root node, the other pointing to the leaf node with the smallest keyword, and all the leaf nodes (that is, data nodes) are chained ring structure, so you can do two kinds of query operations on B+Tree, one is for the primary key range search and paging search, the other is to start from the root node, random search.

3. Mysql storage engine 3.1mysql innoDB (leaf nodes directly place data)

3.1 mysql innoDB (leaf nodes directly place data)

The corresponding row records are stored.

1. InnoDB creates an index on the primary key through the B+Tree structure, and then the records are stored in the leaf node. If there is no primary key, a unique key is selected. If there is no unique key, a 6-bit row_id is generated as the primary key.

2. If the key that creates the index is another field, then the primary key of the record is stored in the leaf node, and then the corresponding record is found through the primary key index

Build an index on name

The ID is stored on the name column, and then the corresponding key and data are found through ID.

3.1 mysql MyISAM

The following 0X0022 is actually the address, which shows that according to our ID, find our address, and then find the corresponding data in the corresponding table through the address.

IV. Classification of indexes

There are five types of mysql indexes: primary key index, unique index, general index and full-text index, and combined index. By adding an index to the field, the reading speed of the data can be improved, and the concurrency and pressure resistance of the project can be improved.

Primary key index: > Primary key is a unique index, but it must be specified as PRIMARY KEY, and there can be only one primary key per table.

All values of the unique index > index column can only appear once, that is, they must be unique, and the value can be empty.

General index > basic index type, the value can be empty, there is no restriction of uniqueness.

Full-text index > the index type of the full-text index is FULLTEXT, and the full-text index can be created on columns of varchar, char, and text types.

A composite index > an index consisting of multiple columns of values, dedicated to combinatorial search.

Storage engine of mysql

At this point, I believe you have a deeper understanding of "what is the MySQL index mechanism?" you might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.