What is the cause of index failure?

2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains what causes an index to fail in MySQL. The approach is simple and practical: once you see how the data and the indexes are stored, the failure cases follow naturally.

How is MySQL data stored?

Clustered index

Let's build the following table first.

CREATE TABLE `student` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'student number',
  `name` varchar(10) NOT NULL COMMENT 'student name',
  `age` int(11) NOT NULL COMMENT 'student age',
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`)
) ENGINE=InnoDB;

Insert the following rows

insert into student (`name`, `age`) value ('a', 10);
insert into student (`name`, `age`) value ('c', 12);
insert into student (`name`, `age`) value ('b', 9);
insert into student (`name`, `age`) value ('d', 15);
insert into student (`name`, `age`) value ('h', 17);
insert into student (`name`, `age`) value ('l', 13);
insert into student (`name`, `age`) value ('k', 12);
insert into student (`name`, `age`) value ('x', 9);

The data are as follows

[Figure: contents of the student table]

MySQL stores data page by page, and each page is 16 KB in size by default.

In MySQL, you can see the size of a page by executing the following statement

Show global status like 'innodb_page_size'

The result is 16384 bytes, or 16 KB.

In the InnoDB storage engine, data is organized with the primary key as the index. Within a page, the records are joined in a singly linked list in ascending order of the primary key.

Some friends may ask, what if the primary key is not specified when the table is created?

If no primary key is explicitly defined when the table is created, the InnoDB storage engine selects or creates one as follows.

First, it checks whether the table has a unique index whose columns are all NOT NULL. If so, that column becomes the primary key. If there are several such indexes, InnoDB picks the first one defined when the table was created.

If no such index exists, InnoDB automatically creates a hidden 6-byte row ID and uses it as the clustered index key.

Pages are linked to each other in a doubly linked list, and the primary key values of the user records in the next data page must be greater than those of the user records in the previous data page.

Assuming that a page can hold only three pieces of data, the data storage structure is as follows.

You can see that when we want to query a piece of data or insert a piece of data, we need to start from the beginning of the page and traverse the linked list of each page in turn, which is not efficient.

We can build a directory for these pages that maps a primary key to a page number, and then use binary search to quickly find the page where the data is located. But binary search only works if the mapping is stored in contiguous space, such as an array. If you do so, there will be the following problems

With the increase of data, the contiguous space needed by the directory becomes larger and larger, which is not realistic.

When all the data on a page is deleted, the corresponding directory items should also be deleted, and the directory entries behind it will be moved forward, and the cost is too high.
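The directory lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not InnoDB's implementation: the keys and page numbers are made up, and each directory entry is assumed to hold the smallest primary key on its page.

```python
from bisect import bisect_right

# Hypothetical page directory: the smallest primary key on each page and the
# page number that holds it, kept sorted by key (three records per page).
directory_keys = [1, 4, 7]
directory_pages = [10, 28, 9]

def find_page(primary_key):
    """Return the page number whose key range contains primary_key."""
    # bisect_right gives the count of directory keys <= primary_key;
    # the entry just before that position owns the key.
    slot = bisect_right(directory_keys, primary_key) - 1
    if slot < 0:
        return None  # smaller than every key in the directory
    return directory_pages[slot]

print(find_page(5))  # 28, the page that holds keys 4..6
```

The lookup is fast, but it relies on the directory living in one contiguous, sorted array, which is exactly what causes the two problems listed above.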

We can put the directory data in a structure similar to the user data, as shown below. A directory entry has two columns, a primary key and a page number.

When there is a lot of data, there must be many directory entries. Since a page is only 16 KB, the directory entries themselves will spread over several pages, so we build another level of directory entries on top of the directory pages, as shown in the following figure.

This is actually a B+ tree, and it is also a clustered index, that is, the data and the index are stored together. The leaf nodes hold all the column values

Take an InnoDB index on an integer column as an example. The fan-out N of each node (the number of child pointers one page can hold) is about 1200. When the height of the tree is 4, it can hold 1200 to the third power values, which is already about 1.7 billion. Since the root block of the tree is always in memory, finding a value in an index on an integer column of a table with 1 billion rows requires at most 3 disk accesses. In fact, the second level of the tree is also very likely to be in memory, so the average number of disk accesses is even lower.
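The arithmetic behind this estimate can be checked with a short sketch. The fan-out of 1200 is the approximate figure from the text, and the function names are only illustrative.

```python
FAN_OUT = 1200  # approximate child pointers per index page (figure from the text)

def max_entries(height, fan_out=FAN_OUT):
    """Upper bound on the values reachable through a tree of the given height.

    Follows the estimate in the text: a tree of height h has h - 1 levels
    of branching below the root.
    """
    return fan_out ** (height - 1)

def disk_reads(height, levels_in_memory=1):
    """Disk accesses per lookup when the top levels are cached in memory."""
    return height - levels_in_memory

print(max_entries(4))  # 1728000000, about 1.7 billion
print(disk_reads(4))   # 3 when only the root is cached
```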

Nonclustered index

Clustered and nonclustered indexes are very similar, with the following differences

The value of the clustered index leaf node is all column values. The value of the nonclustered index leaf node is the index column + primary key.

When we query the student information (student number, name, age) for the name h, the index is built on name, so MySQL first finds the corresponding primary key id in the name nonclustered index, and then finds the corresponding record in the clustered index by that primary key id.

This process of finding the primary key value in the nonclustered index and then finding the record in the clustered index is called returning to the table (a table lookup).
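The two steps can be modeled with two dictionaries. This is a toy model rather than real B+ trees, and the ids assume AUTO_INCREMENT assigned 1, 2, 3, ... in insertion order, so the student named h has id 5.

```python
# Clustered index: primary key -> full row (all column values).
clustered = {
    1: {"id": 1, "name": "a", "age": 10},
    5: {"id": 5, "name": "h", "age": 17},
    8: {"id": 8, "name": "x", "age": 9},
}
# Nonclustered index on name: index column value -> primary key values.
idx_name = {"a": [1], "h": [5], "x": [8]}

def select_all_by_name(name):
    rows = []
    for pk in idx_name.get(name, []):  # step 1: search the nonclustered index
        rows.append(clustered[pk])     # step 2: return to the table by primary key
    return rows

print(select_all_by_name("h"))  # [{'id': 5, 'name': 'h', 'age': 17}]
```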

Joint index / covering index

Assume that the teacher table is defined as follows, with a joint index on the name and age columns

CREATE TABLE `teacher` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'teacher number',
  `name` varchar(10) NOT NULL COMMENT 'teacher name',
  `age` int(11) NOT NULL COMMENT 'teacher age',
  `ismale` tinyint(3) NOT NULL COMMENT 'whether male',
  PRIMARY KEY (`id`),
  KEY `idx_name_age` (`name`, `age`)
) ENGINE=InnoDB;

Insert the following rows

insert into teacher (`name`, `age`, `ismale`) value ('aa', 10, 1);
insert into teacher (`name`, `age`, `ismale`) value ('dd', 12, 0);
insert into teacher (`name`, `age`, `ismale`) value ('cb', 9, 1);
insert into teacher (`name`, `age`, `ismale`) value ('cb', 15, 1);
insert into teacher (`name`, `age`, `ismale`) value ('bc', 17, 0);
insert into teacher (`name`, `age`, `ismale`) value ('bb', 15, 1);
insert into teacher (`name`, `age`, `ismale`) value ('dd', 15, 1);
insert into teacher (`name`, `age`, `ismale`) value ('dd', 12, 0);

Create a joint index on name and age columns

A directory page entry consists of three parts: the name column, the age column and the page number. Directory entries are sorted by the name column first, and by the age column only when the name values are equal.

A data page entry consists of three parts: the name column, the age column and the primary key value. Similarly, data page entries are sorted by the name column first, and then by the age column when the name values are equal.
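The ordering of the data page entries can be reproduced by sorting tuples. This sketch assumes the ids were assigned 1 to 8 by AUTO_INCREMENT in insertion order, and uses plain byte-wise string comparison, which is enough for these lowercase names.

```python
# (id, name, age) rows of the teacher table; ids assume AUTO_INCREMENT from 1.
teachers = [
    (1, "aa", 10), (2, "dd", 12), (3, "cb", 9), (4, "cb", 15),
    (5, "bc", 17), (6, "bb", 15), (7, "dd", 15), (8, "dd", 12),
]

# Leaf entries of idx_name_age: (name, age, primary key), ordered by name,
# then by age, then by primary key.
leaf_entries = sorted((name, age, pk) for pk, name, age in teachers)

for entry in leaf_entries:
    print(entry)
```

Note that the age values are only ordered within one name: read across the whole index, they come out as 10, 15, 17, 9, 15, 12, 12, 15.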

A return to the table occurs when the following statement is executed

Select * from teacher where name = 'aa'

No return to the table occurs when the following statement is executed

Select name, age from teacher where name = 'aa'

Why is there no need to return to the table?

Because the leaf nodes of the idx_name_age index store the primary key value, the name value and the age value, the requested columns can be read directly from the idx_name_age index without returning to the table. This is a covering index.

If you take a closer look at the joint index diagram, you can basically understand why indexes that do not meet the leftmost prefix principle will fail.

Index push-down

When executing the following statement

Select * from teacher where name like 'Zhang%' and age = 10 and ismale = 1

Before MySQL 5.6, the execution process is as follows: first find the matching primary key values in the idx_name_age index, then return to the table for each of them to fetch the row and check whether the values of the other fields meet the conditions.

Index push-down optimization was introduced in MySQL 5.6. While traversing the index, MySQL can evaluate the conditions on the fields contained in the index, filter out the entries that do not meet them, and so reduce the number of returns to the table, as shown in the following figure.
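The difference can be made concrete by counting table lookups in a toy model. The four rows below are invented for illustration, and the loop only imitates the idea of filtering on age inside the index; it is not how the storage engine is implemented.

```python
# Hypothetical rows: primary key -> (name, age, ismale).
table = {
    1: ("Zhang San", 10, 1),
    2: ("Zhang Si", 10, 0),
    3: ("Zhang Wu", 20, 1),
    4: ("Li Si", 10, 1),
}
# idx_name_age leaf entries: (name, age, primary key), in index order.
index = sorted((name, age, pk) for pk, (name, age, _) in table.items())

def query(pushdown):
    """Run name LIKE 'Zhang%' AND age = 10 AND ismale = 1; count table lookups."""
    lookups, result = 0, []
    for name, age, pk in index:
        if not name.startswith("Zhang"):
            continue  # outside the range scanned for name LIKE 'Zhang%'
        if pushdown and age != 10:
            continue  # filtered inside the index, no table lookup needed
        lookups += 1  # return to the table for the full row
        row = table[pk]
        if row[1] == 10 and row[2] == 1:
            result.append(pk)
    return result, lookups

print(query(pushdown=False))  # ([1], 3)
print(query(pushdown=True))   # ([1], 2)
```

Both runs return the same row, but with push-down the entry with age 20 is rejected from the index alone. The ismale condition still needs the full row, because ismale is not part of the index.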

Leftmost prefix principle

Speed up query

The principle applies mainly to joint indexes. A query satisfies the leftmost prefix principle if it meets the following two conditions

The queried columns follow the same order as the columns of the joint index, starting from the leftmost column.

The query does not skip a column in the index

The test table is defined as follows, with a joint index on name, address, country

CREATE TABLE `people` (
  `name` varchar(50) NOT NULL,
  `address` varchar(50) NOT NULL,
  `country` varchar(50) NOT NULL,
  KEY `idx_name_addr_country` (`name`, `address`, `country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Here are a few examples. They involve some knowledge of EXPLAIN, which will be covered in detail in a later article

Example one

Explain select * from people where name = "jack" and address = "beijing" and country = "china"

type is ref and key_len is 456 = (50 × 3 + 2) × 3, so all columns of the joint index are used

Example two

Explain select * from people where name = "jack"

type is ref and key_len is 152 = 50 × 3 + 2, so only the name column of the joint index is used.
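The key_len values in the two examples above can be derived from the column definitions. The helper below is an illustrative sketch: the utf8 charset in this table uses up to 3 bytes per character, a VARCHAR adds 2 bytes for its length, and a nullable column would add 1 more byte (none of the columns here is nullable).

```python
def varchar_key_len(chars, bytes_per_char=3, nullable=False):
    """key_len contribution of one VARCHAR column.

    bytes_per_char=3 matches the utf8 charset of the people table;
    2 bytes store the variable length, and a nullable column adds 1 more.
    """
    return chars * bytes_per_char + 2 + (1 if nullable else 0)

one_column = varchar_key_len(50)
print(one_column)      # 152, only name is used
print(one_column * 3)  # 456, name, address and country are all used
```

Comparing key_len against these per-column sizes is a quick way to tell how many columns of a joint index a query actually uses.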

Example three

Explain select * from people where address = "beijing"

Type is index, indicating that the entire index is scanned during the query and does not speed up the search.

Suppose you have the following joint index: key idx_a_b_c (a, b, c)

| SQL | Uses the index? |
| --- | --- |
| where a = x and b = x and c = x | Yes |
| where a = x and b = x | Yes, part of the index |
| where a = x | Yes, part of the index |
| where b = x | No, does not include the leftmost column |
| where b = x and c = x | No, does not include the leftmost column |

If you look carefully at how the joint index is stored, you will see why each case above does or does not use the index.

Directory pages are sorted in ascending order of the columns a, b, c: by column a first, then by column b if the values of a are equal, and then by column c if the values of b are equal

So a query on columns a, b and c can use this sort order, that is, it uses the index. A query on column b alone cannot use this sort order, so it has to traverse all the records
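The rule can be written as a small function: walk the index columns from the left and stop at the first one the query has no equality condition on. This is a simplified model for equality predicates only, not the optimizer's actual logic.

```python
INDEX_COLUMNS = ["a", "b", "c"]  # KEY idx_a_b_c (a, b, c)

def usable_prefix(equality_columns, index_columns=INDEX_COLUMNS):
    """Return the leading index columns that equality predicates can use."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break  # a gap ends the usable prefix
        used.append(column)
    return used

for where in (["a", "b", "c"], ["a", "b"], ["a"], ["b"], ["b", "c"]):
    print(where, "->", usable_prefix(where) or "index not usable for lookup")
```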

Accelerated sorting

The leftmost prefix principle can be used not only in queries, but also in sorting. In MySQL, there are two ways to generate ordered result sets:

Return ordered data directly by sequential scanning of ordered index

Filesort sort, sort the returned data

Because the index is a B+ tree and its entries are kept in sorted order, a sorting query that can use the index avoids the extra sorting operation. In that case EXPLAIN does not show Using filesort in the Extra column. In the examples below it shows Using index, because the index also contains every column the query reads.

Any sort whose result cannot be returned directly in index order is a Filesort, that is, an additional sorting operation is performed. When EXPLAIN analyzes such a query, Extra shows Using filesort. Using filesort usually costs more, so try to avoid it.

Let's look at two examples first, and then summarize

Explain select * from people order by name

The Extra column shows only Using index, so the rows are read in index order without an extra sort.

Explain select * from people order by address

The Extra column shows Using filesort

Summary: suppose there is a joint index as follows, key idx_a_b_c (a, b, c)

Order by can use the index for sorting

order by a
order by a, b
order by a, b, c
order by a desc, b desc, c desc
where a = const order by b, c
where a = const and b = const order by c
where a = const and b > const order by b, c

Order by cannot use the index for sorting

order by b
order by c
order by b, c
order by a asc, b desc, c desc // inconsistent sort directions
where g = const order by b, c // column a is missing
where a = const order by c // column b is missing
where a = const order by a, d // d is not part of the index
where a in (...) order by b, c // range query

These cases all follow from the order in which the joint index is stored, so I won't explain them one by one.
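For readers who want to check the cases mechanically, here is a simplified model of the rule. It assumes an index with all columns ascending, treats only `= const` conditions as fixing a column, and is a sketch of the reasoning rather than what the MySQL optimizer does.

```python
INDEX_COLUMNS = ["a", "b", "c"]  # KEY idx_a_b_c (a, b, c), all ascending

def can_sort_with_index(order_by, const_columns=()):
    """order_by: list of (column, direction); const_columns: columns fixed by '= const'."""
    directions = {direction for _, direction in order_by}
    if len(directions) > 1:
        return False  # mixed ASC/DESC does not match a single scan direction
    position = 0
    for column, _ in order_by:
        # Skip leading index columns pinned to a constant; they do not affect the order.
        while (position < len(INDEX_COLUMNS)
               and INDEX_COLUMNS[position] != column
               and INDEX_COLUMNS[position] in const_columns):
            position += 1
        if position >= len(INDEX_COLUMNS) or INDEX_COLUMNS[position] != column:
            return False
        position += 1
    return True

# where a = const order by b, c
print(can_sort_with_index([("b", "asc"), ("c", "asc")], const_columns={"a"}))  # True
# where a in (...) order by b, c: a is not pinned to a single constant
print(can_sort_with_index([("b", "asc"), ("c", "asc")]))                       # False
```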

Benefits of joint indexes

A covering index removes many returns to the table and improves query efficiency.

With index push-down, the more columns the index contains, the fewer rows remain after filtering through the index. Take a table with 10 million rows and the following sql: select * from table where col1=1 and col2=2 and col3=3. Assume that each condition keeps 10% of the rows. With a single-column index, the index narrows the data down to 10 million × 10% = 1 million rows, and MySQL then returns to the table to find the rows with col2=2 and col3=3 among those 1 million. With a joint index, the index alone narrows the data down to 10 million × 10% × 10% × 10% = 10,000 rows, so you can imagine the improvement in efficiency!
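The comparison comes down to simple arithmetic. The sketch below restates it with the numbers from the text; the 10% selectivity per condition is the text's assumption, not a measured value.

```python
TOTAL_ROWS = 10_000_000  # 1000W rows
KEEP_ONE_IN = 10         # assumption from the text: each condition keeps 10% of rows

def rows_after_index_filter(indexed_conditions):
    """Rows left to look up in the table after filtering on the indexed columns."""
    return TOTAL_ROWS // KEEP_ONE_IN ** indexed_conditions

print(rows_after_index_filter(1))  # 1000000 table lookups with a single-column index
print(rows_after_index_filter(3))  # 10000 table lookups with a joint index on all three columns
```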

Why does the index fail?

When asked under what conditions an index fails, many people can recite a long list of rules.

Do not perform operations on index columns or apply functions to them

Leading fuzzy queries do not use indexes, such as like '%Li'

Negative conditions do not use indexes, and it is recommended to use in instead. Negative conditions include !=, not in, not exists, not like, etc.

The index is sorted according to certain rules. If you apply a function to the index column, or use like '%Li', the specific value to compare against the sorted entries is unknown, so how could the B+ tree speed up the query?
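The point can be demonstrated on a sorted list, which stands in for the ordered index entries. The names are made up for illustration: a prefix search can jump to the first match by binary search, while a suffix search has no starting point and must check every entry.

```python
from bisect import bisect_left

names = sorted(["Li Lei", "Li Ming", "Wang Li", "Zhang Li", "Zhang San"])

def prefix_search(prefix):
    """LIKE 'prefix%': binary search to the first match, then read neighbours."""
    start = bisect_left(names, prefix)
    matches = []
    for name in names[start:]:
        if not name.startswith(prefix):
            break  # sorted order guarantees no later match
        matches.append(name)
    return matches

def suffix_search(suffix):
    """LIKE '%suffix': matches are scattered, so every entry must be checked."""
    return [name for name in names if name.endswith(suffix)]

print(prefix_search("Li"))  # ['Li Lei', 'Li Ming']
print(suffix_search("Li"))  # ['Wang Li', 'Zhang Li']
```

Applying a function to the column has the same effect as the suffix search: the entries are sorted by the stored value, not by the function's result, so the sort order no longer helps.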
