Analysis of the implementation principle of Mysql Index 07/19 Update SLTechnology News&Howtos

Analysis of the implementation principle of Mysql Index

2025-07-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article shows you the implementation principle of Mysql index analysis, the content is concise and easy to understand, absolutely can make your eyes bright, through the detailed introduction of this article, I hope you can get something.

One: Mysql principle and slow query

With its excellent performance, low cost and rich resources, MySQL has become the first choice of relational database for most Internet companies. Although the performance is excellent, the so-called "good horse and saddle", how to make better use of it, has become a required course for development engineers. We often see requirements such as "proficient in MySQL", "SQL statement optimization", "understanding database principles" and so on from the job description. We know that in general application systems, the read-write ratio is about 10:1, and insert operations and general update operations rarely have performance problems, and those that encounter the most and are most prone to problems are some complex query operations. so the optimization of query statements is obviously the top priority.

Since July 13, I have been working in Meituan's core business system department to optimize slow queries, totaling more than ten systems, and accumulating hundreds of slow query cases. As the complexity of the business increases, the problems encountered are bizarre, varied and unimaginable. The purpose of this paper is to explain the principle of database index and how to optimize slow query from the point of view of development engineer.

Thinking caused by a slow query

Select count (*) from task where status=2 and operator_id=20839 and operate_time > 1371169729 and operate_timeconst > eq_ref > range > index > All (full table scan) generally at least range level, preferably ref

System: the table has only one row of records. This is a special case of const type. Const usually does not appear (ignored): it means that it is found once through the index. Const is used to compare primary key or unique indexes. Because only one row of data is matched, it is very fast. If the primary key is placed in the where list, MySQL can convert the query to a constant.

Eq_ref: unique index scan, with only one record in the table matching for each index key. Common in primary key or unique index scans.

Ref: a non-unique index scan that returns rows that match a single value, which is essentially an index access, which returns all rows that match a single value, but it may find multiple qualified rows, so it should belong to a mixture of lookup and scan range: retrieve only a given range of rows, using a single index to select rows.

The key column shows which index is used, which generally means that queries such as between, in, and so on appear in your where statement. This range scan index is better than a full table scan because it only needs to start at one point of the index and end at another point without scanning all the indexes. Index:FULL INDEX SCAN,index differs from all in that the index type only traverses the index tree. This is usually faster than all because index files are usually smaller than data files.

Extra

Contains additional information that is not suitable for display in other columns but is very important: * * (dangerous!) * * Using

Filesort: indicates that mysql will sort the data using an external index instead of reading it according to the index order within the table. The sorting operation in MYSQL that cannot be done by index is called "file sorting" * * (especially dangerous!) * * Using

Temporary: temporary tables are used to hold intermediate results, and MYSQL uses temporary tables when sorting query results. Common in sorting order by and grouping query group by Using

Index: indicates that the overlay index is used in the corresponding select operation to avoid accessing the data rows of the table, which is very efficient. If using appears at the same time

Where, indicating that the index is used to perform lookups of index key values; if using where does not appear at the same time, the index is used to read data rather than perform lookup operations.

Possible_keys

Displays one or more indexes that may be applied to this table. If there is an index on the fields involved in the query, the index will be listed, but not necessarily used by the query

Key

The index actually used, if NULL, is not used. If an overlay index is used in a query, the index only appears in the key list, and the key parameter can be used as a criterion for judging the index used.

Key_len

Indicates the number of bytes used in the index, which can be used to calculate the length of the index in the query. Without losing accuracy, the shorter the length, the better. The value displayed by key_len is the maximum possible length of the index field, not the actual used length, that is, key_len is calculated according to the table definition, not retrieved within the table.

Ref

Shows which column of the index is used and, if possible, a constant. Which columns or constants are used to find values on the index.

Rows

According to the statistical information of the table and the selection of the index, we roughly estimate the number of rows that need to be read to find the required records.

Four: slow query optimization

About the principle of MySQL index is a relatively boring thing, we only need to have a perceptual understanding, and do not need to understand very thoroughly and deeply. Let's go back to the slow query we talked about at the beginning. After understanding the principle of indexing, do you have any ideas? First, summarize the basic principles of the index.

Several principles of indexing

1. The leftmost prefix matching principle, a very important principle, mysql will always match to the right until it encounters a range query (>, 3 and d = 4). If you build an index in the order of (ameme bforce c), d does not need an index, if you build an index (a meme bdre c), then you can use it, and the order of aforce b d can be adjusted at will.

2. = and in can be out of order, such as a = 1 and b = 2 and c = 3. Indexes can be built in any order, and mysql's query optimizer will help you optimize it into a form that the index can recognize.

3. Try to select a highly differentiated column as the index. The formula for distinguishing degree is count (distinct col) / count (*), indicating the proportion of non-repetitive fields. The larger the proportion, the less the number of records we scan, and the differentiation of the unique key is 1. While some status and gender fields may be 0 in front of big data, then some people may ask, is there any empirical value for this ratio? With different scenarios, this value is also difficult to determine. Generally, we need more than 0.1 for the fields that need join, that is, an average of 10 records are scanned.

4. Index columns can not participate in the calculation, keep the column "clean", for example, from_unixtime (create_time) = '2014-05-29' can not use the index, the reason is very simple, the b + tree is stored in the data table field values, but for retrieval, all elements need to be compared with the application function, obviously the cost is too high. So the statement should be written as create_time = unix_timestamp ('2014-05-29'); 5. Expand the index as much as possible, do not create a new index. For example, if you already have an index of an in the table, and now you want to add the index of (aforme b), you only need to modify the original index.

Go back to the initial slow query

According to the leftmost matching principle, the index of the first sql statement should be the joint index of status, operator_id, type and operate_time; the order of status, operator_id and type can be reversed, which is why I will say that finding all the relevant queries of this table will be analyzed comprehensively.

For example, there is also the following query

Select * from task where status = 0 and type = 12 limit 10

Select count (*) from task where status = 0

Then the index status,type,operator_id,operate_time is very correct, because it can cover all cases. This makes use of the leftmost matching principle of the index.

Query optimization artifact-explain command

I believe you are no stranger to the explain command. Please refer to the official website explain-output for specific usage and field meaning. It needs to be emphasized that rows is the core indicator, and most statements with small rows must be executed quickly (there are exceptions, which will be discussed below). So optimization statements are basically optimizing rows.

Basic steps of slow query optimization

0. First run to see if it is really slow, pay attention to set the SQL_NO_CACHE 1.where condition sheet look-up, lock the minimum return record table. This sentence means that the where of the query statement is applied to the table with the smallest number of records returned in the table, and each field in a single table is queried separately to see which field has the highest degree of differentiation 2.explain to check the execution plan, and whether it is consistent with the expectation of 1 (starting with locking the table with fewer records) the sql statement in the form of 3.order by limit makes the sorted table first look up 4. Understand the business side usage scenario 5. Refer to several principles of indexing when adding an index

6. Observation results, not in line with expectations, continue to analyze from zero

Fifth: leftmost prefix principle and related optimization

The first condition for efficient use of indexes is to know what kind of query will use indexes. This problem is related to the "leftmost prefix principle" in B+Tree. Here is an example to illustrate the leftmost prefix principle.

Let's start with the concept of federated indexes. In the above, we all assume that an index references only a single column. In fact, an index in MySQL can refer to multiple columns in a certain order, which is called a federated index. In general, a federated index is an ordered tuple in which each element is a column of a data table. In fact, relational algebra is needed to strictly define an index, but I don't want to discuss too much about relational algebra here. Because that would seem boring, so there is no strict definition here. In addition, a single-column index can be regarded as a special case where the number of elements in the federated index is 1.

Take the employees.titles table as an example, let's first look at what indexes are on it:

SHOW INDEX FROM employees.titles

+-+

+-+

| | titles | 0 | PRIMARY | 3 | from_date | A | 443308 | | BTREE | |

| | titles | 1 | emp_no | 1 | emp_no | A | 443308 | | BTREE | |

+-+

From the result, the primary index of the titles table is, and there is a secondary index. To avoid complicating things with multiple indexes (MySQL's SQL optimizer behaves more complex when it comes to multiple indexes), here we will drop the secondary index drop:

ALTER TABLE employees.titles DROP INDEX emp_no

This allows you to concentrate on analyzing the behavior of the index PRIMARY.

Case 1: all columns match.

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND title='Senior Engineer' AND from_date='1986-06-26'

+-- +

+-- +

+-- +

Obviously, an index can be used when an exact match is performed against all the columns in the index (in this case, an "=" or "IN" match). It is important to note that indexes are theoretically sensitive to order, but because MySQL's query optimizer automatically adjusts the conditional order of where clauses to use the appropriate index, for example, we reverse the order of conditions in where:

EXPLAIN SELECT * FROM employees.titles WHERE from_date='1986-06-26 'AND emp_no='10001' AND title='Senior Engineer'

+-- +

+-- +

+-- +

The effect is the same.

Case 2: the leftmost prefix matches.

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001'

+-- +

+-- +

| | 1 | SIMPLE | titles | ref | PRIMARY | PRIMARY | 4 | const | 1 |

+-- +

When the query condition exactly matches one or more consecutive columns on the left side of the index, such as or, so it can be used, but only part of it, that is, the leftmost prefix composed of the condition. The above query uses the PRIMARY index from the analysis results, but the key_len is 4, indicating that only the first column prefix of the index is used.

Case 3: the query condition uses an exact match of the columns in the index, but an intermediate condition is not provided.

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND from_date='1986-06-26'

+-- +

+-- +

+-- +

At this time, the use of the index is the same as in case 2. Because title is not provided, the query only uses the first column of the index. Although the latter from_date is also in the index, it cannot be connected with the left prefix because title does not exist, so the result needs to be scanned and filtered from_date (here, because emp_no is unique, there is no scanning).

If you want from_date to also use indexes instead of where filtering, you can add a secondary index, which is used by the above query. In addition, you can use an optimization method called "isolated columns" to fill in the "pit" between emp_no and from_date.

First of all, let's take a look at the different values of title:

SELECT DISTINCT (title) FROM employees.titles

+-+

| | title |

+-+

| | Senior Engineer |

| | Staff |

| | Engineer |

| | Senior Staff |

| | Assistant Engineer |

| | Technique Leader |

| | Manager |

+-+

There are only seven. In cases where there are few values for columns that are called "pits", consider using "IN" to fill the "pits" to form the leftmost prefix:

EXPLAIN SELECT * FROM employees.titles

WHERE emp_no='10001'

AND title IN ('Senior Engineer',' Staff', 'Engineer',' Senior Staff', 'Assistant Engineer',' Technique Leader', 'Manager')

AND from_date='1986-06-26'

+-- +

+-- +

+-- +

This time the key_len is 59, indicating that the index is fully used, but you can see from type and rows that IN actually executed a range query, and here seven key are checked. Take a look at the performance comparison of the two queries:

SHOW PROFILES

+-- +

+-- +

| | 10 | 0.00058000 | SELECT * FROM employees.titles WHERE emp_no='10001' AND from_date='1986-06-26' | |

| | 11 | 0.00052500 | SELECT * FROM employees.titles WHERE emp_no='10001' AND title IN. | |

+-- +

The performance has been improved a little after "filling the hole". If there is a lot of data left after emp_no filtering, the performance advantage of the latter will be more obvious. Of course, if the title has a lot of values, it is not appropriate to fill the pit, and a secondary index must be established.

Case 4: the query condition does not specify the first column of the index.

EXPLAIN SELECT * FROM employees.titles WHERE from_date='1986-06-26'

+-- +

+-- +

+-- +

Since it is not the leftmost prefix, a query such as an index obviously does not need an index.

Case 5: matches the prefix string of a column.

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND title LIKE 'Senior%'

+-- +

+-- +

+-- +

Indexes can be used at this time, but cannot be used if wildcards do not appear only at the end. (the original statement is incorrect. If the wildcard% does not appear at the beginning, the index can be used, but only one of the prefixes may be used depending on the situation.)

Case 6: scope inquiry.

Unfortunately, if the query condition contains a function or expression, MySQL will not use an index for this column (although some can be used mathematically). For example:

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND left (title, 6) = 'Senior'

+-- +

+-- +

+-- +

Although this query has the same functionality as in case 5, due to the use of the function left, you cannot apply an index to the title column, while in case 5, you can use LIKE. Another example is:

EXPLAIN SELECT * FROM employees.titles WHERE emp_no-1 million

+-- +

+-- +

+-- +

Obviously this query is equivalent to querying a function with an emp_no of 10001, but because the query condition is an expression, MySQL cannot use the index for it. It seems that MySQL is not intelligent enough to automatically optimize constant expressions, so when writing query statements, try to avoid expressions in the query, but first manual private algebraic operation, converted to non-expression query statements.

Index selectivity and prefix index

Since the index can speed up the query, is it necessary to build the index as long as the query statement needs it? The answer is no. Because although the index speeds up the query speed, the index also has a price: the index file itself consumes storage space, and the index increases the burden of inserting, deleting, and modifying records. In addition, MySQL also consumes resources to maintain the index at run time, so the more indexes the better. Generally speaking, indexing is not recommended in two cases.

The first case is that there are relatively few table records, such as a table with one or two thousand or even a few hundred records, there is no need to build an index, just let the query do a full table scan. As for how many records are considered, this individual has his own opinion. My personal experience is to take 2000 as the dividing line. No index can be considered if the number of records does not exceed 2000, and an index can be considered if there are more than 2000 records.

Another situation where indexing is not recommended is that the selectivity of the index is low. The so-called index selectivity (Selectivity) refers to the ratio of the non-repetitive index value (also known as cardinality, Cardinality) to the number of table records (# T):

Index Selectivity = Cardinality / # T

Obviously, the value range of selectivity is (0,1), and the higher the selectivity, the greater the index value, which is determined by the nature of B+Tree. For example, in the employees.titles table used above, if the title field is often queried separately, whether it needs to be indexed, let's take a look at its selectivity:

SELECT count (DISTINCT (title)) / count (*) AS Selectivity FROM employees.titles

+-+

| | Selectivity |

+-+

| | 0.0000 |

+-+

The selectivity of title is less than 0.0001 (the exact value is 0.00001579), so there is really no need to index it separately.

There is an index optimization strategy related to index selectivity called prefix index, which uses the prefix of the column instead of the whole column as the index key. When the current prefix length is appropriate, it can make the selectivity of the prefix index close to the full-column index, and reduce the size and maintenance overhead of the index file because the index key becomes shorter. Let's take the employees.employees table as an example to introduce the selection and use of prefix indexes.

You can see from figure 12 that the employees table has only one index, so if we want to search for a person by name, we have to scan the whole table:

EXPLAIN SELECT * FROM employees.employees WHERE first_name='Eric' AND last_name='Anido'

+-- +

+-- +

+-- +

If you search for employees by name frequently, this is obviously inefficient, so we can consider indexing. There are two options, build or, look at the selectivity of the two indexes:

SELECT count (DISTINCT (first_name)) / count (*) AS Selectivity FROM employees.employees

+-+

| | Selectivity |

+-+

| | 0.0042 |

+-+

SELECT count (DISTINCT (concat (first_name, last_name)) / count (*) AS Selectivity FROM employees.employees

+-+

| | Selectivity |

+-+

| | 0.9313 |

+-+

Obviously, the selectivity is too low and the selectivity is very good, but the combined length of first_name and last_name is 30. Is there a way to strike a balance between length and selectivity? Consider indexing the first few characters of first_name and last_name, for example, to see their options:

SELECT count (DISTINCT (concat (first_name, left (last_name, 3) / count (*) AS Selectivity FROM employees.employees

+-+

| | Selectivity |

+-+

| | 0.7879 |

+-+

The choice is good, but it's still a little far from 0.9313, so add the last_name prefix to 4:

SELECT count (DISTINCT (concat (first_name, left (last_name, 4) / count (*) AS Selectivity FROM employees.employees

+-+

| | Selectivity |

+-+

| | 0.9007 |

+-+

At this point, the selectivity is ideal, and the length of this index is only 18, which is nearly half shorter, so let's build this prefix index:

ALTER TABLE employees.employees

ADD INDEX first_name_last_name4 (first_name, last_name (4))

At this point, run the query by name again to compare and analyze the results before indexing:

SHOW PROFILES

+-- +

+-- +

| | 87 | 0.11941700 | SELECT * FROM employees.employees WHERE first_name='Eric' AND last_name='Anido' |

| | 90 | 0.00092400 | SELECT * FROM employees.employees WHERE first_name='Eric' AND last_name='Anido' |

+-- +

The performance improvement is significant, and the query speed is increased by more than 120 times.

The prefix index takes into account both index size and query speed, but its disadvantage is that it cannot be used for ORDER BY and GROUP BY operations, nor can it be used for Covering index (that is, when the index itself contains all the data needed for the query, the data file itself is no longer accessed.

Six: primary key selection and insertion optimization of InnoDB

When using the InnoDB storage engine, if there is no special need, always use a business-independent self-increment field as the primary key.

We often see posts or blogs discussing the selection of primary keys. some people suggest using self-increasing primary keys that have nothing to do with business. Some people think that it is not necessary to use unique fields such as student number or ID card number as primary keys. No matter which argument is supported, most of the arguments are at the business level. From the perspective of database index optimization, using the InnoDB engine instead of using self-incrementing primary keys is definitely a bad idea.

The above is the analysis of the implementation principle of Mysql index. have you learned any knowledge or skills? If you want to learn more skills or enrich your knowledge reserve, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.