Detailed Analysis of MySQL Index and Query Optimization

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article walks through MySQL indexing and query optimization: a simple comparison test, the common index types, the B+ tree structure behind them, and a summary of common optimizations.

A simple comparison test

In this case, the c2c_zwdb.t_file_count table has only an auto-increment id, and the FFileName field has no index. The query is executed as follows:

In the execution plan, type=ALL, key=NULL, rows=33777. The SQL does not use an index and performs a very inefficient full table scan. If join queries and other constraints are added, the database will consume a large amount of memory and slow down the front-end program.

Add an index to the FFileName field:

ALTER TABLE c2c_zwdb.t_file_count ADD INDEX index_title (FFileName);

Execute the same query again, and the difference is obvious:

In the execution plan, type=ref, key=index_title, rows=1. The SQL uses the index index_title and performs an index lookup (ref) that examines only one row.

Compared with the case without an index, query efficiency improves markedly once the index is added.

MySQL index

As the comparison test shows, indexing is the key to fast lookups. Building suitable indexes is very important for MySQL to run efficiently. For a small amount of data, the lack of a suitable index has little impact, but as the data grows, performance degrades sharply. If you index multiple columns (a composite index), the order of the columns matters, because MySQL can only search efficiently on a leftmost prefix of the index.

Here are several common MySQL index types.

Indexes are divided into single-column indexes and composite indexes. A single-column index contains only one column; a table can have several single-column indexes, but together they do not form a composite index. A composite index is a single index that contains multiple columns.

1. MySQL index types

(1) Primary key index PRIMARY KEY

It is a special unique index in which no null values are allowed. The primary key index is usually created together with the table.

It can also be added afterwards with the ALTER command. Remember: a table can have only one primary key.

(2) Unique index UNIQUE

The values in a unique index column must be unique, but null values are allowed. For a composite index, the combination of column values must be unique. You can define it when creating the table, or add it later by altering the table structure, such as:

ALTER TABLE table_name ADD UNIQUE (column)

(3) Ordinary index INDEX

This is the most basic index and has no restrictions. You can define it when creating the table, or add it later by altering the table structure, such as:

ALTER TABLE table_name ADD INDEX index_name (column)

(4) Composite index INDEX

A composite index is a single index that contains multiple columns. You can define it when creating the table, or add it later by altering the table structure, such as:

ALTER TABLE table_name ADD INDEX index_name (column1, column2, column3)

(5) Full-text index FULLTEXT

Full-text indexing (also known as full-text retrieval) is a key technology used by search engines. It applies techniques such as word segmentation to analyze the frequency and importance of keywords in the text, and then ranks and filters the search results according to those measures.

You can define it when creating the table, or add it later by altering the table structure, such as:

ALTER TABLE table_name ADD FULLTEXT (column)

2. Index structure and principle

The B+ tree is widely used as the index structure in MySQL. Its implementation differs between clustered and non-clustered indexes, which is not discussed in this article.

Introduction to the B+ tree

The following picture of a B+ tree appears in many places; it is used here because it explains the index search process well.

The picture shows a B+ tree. Each light blue block is called a disk block, and each disk block contains several data items (shown in dark blue) and pointers (shown in yellow). For example, disk block 1 contains data items 17 and 35 and pointers P1, P2 and P3: P1 points to the block with values less than 17, P2 to the block with values between 17 and 35, and P3 to the block with values greater than 35.

The real data exists only in the leaf nodes, that is, 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99. Non-leaf nodes do not store real data; they store only data items that guide the search direction. For example, 17 and 35 do not actually exist in the data table.

Search process

In the figure, to find data item 29, disk block 1 is first loaded from disk into memory; this is the first IO. A binary search in memory determines that 29 lies between 17 and 35, which selects the P2 pointer of disk block 1. The time spent in memory is negligible compared with disk IO. Disk block 3 is then loaded from disk through the disk address in the P2 pointer of disk block 1; this is the second IO. Since 29 lies between 26 and 30, the P2 pointer of disk block 3 is selected and disk block 8 is loaded into memory through it; this is the third IO. A binary search in memory then finds 29 and the query ends, after three IOs in total.

In practice, a 3-level B+ tree can represent millions of rows. If a lookup among millions of rows needs only three IOs, the performance gain is huge. Without an index, each data item could require its own IO, for a total of millions of IOs, which is obviously very expensive.
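The search walk described above can be sketched in a few lines of Python. This is only an illustration: it models just the three disk blocks the text names (blocks 1, 3 and 8), leaves every other child as `None`, and treats each dictionary lookup as one simulated disk read. The names `BLOCKS` and `search` are hypothetical.

```python
from bisect import bisect_right

# Only the blocks named in the text are modeled (an assumption of this sketch);
# children that the text does not describe are left as None.
BLOCKS = {
    1: {"leaf": False, "keys": [17, 35], "children": [None, 3, None]},
    3: {"leaf": False, "keys": [26, 30], "children": [None, 8, None]},
    8: {"leaf": True, "keys": [28, 29]},
}


def search(key, root=1):
    """Return (found, number_of_simulated_disk_reads)."""
    reads = 0
    block_id = root
    while True:
        block = BLOCKS[block_id]  # simulated disk IO: load one block
        reads += 1
        if block["leaf"]:
            return key in block["keys"], reads
        # In-memory binary search picks the pointer (P1, P2 or P3) to follow.
        pointer = bisect_right(block["keys"], key)
        block_id = block["children"][pointer]
        if block_id is None:
            raise KeyError("this block is not modeled in the sketch")


found, reads = search(29)
print(found, reads)
```

The number of reads equals the number of levels visited, which is why the height of the tree determines the IO cost of a lookup.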

Properties

(1) the index field should be as small as possible.

From the B+ tree search process above, and from the fact that the real data exists only in the leaf nodes, the number of IOs depends on the height h of the B+ tree.

Assume the amount of data in the table is N and the number of data items in each disk block is m. Then the height of the tree is $h = \log_{m+1} N$. When N is constant, the larger m is, the smaller h is.

Here m = size of the disk block / size of a data item. The size of a disk block, that is, the size of a data page, is fixed. The less space each data item occupies, the more data items fit in a block and the lower the height h of the tree. This is why each data item, that is, the index field, should be as small as possible. For example, int occupies 4 bytes, half of the 8 bytes occupied by bigint.
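To make the relationship between item size and tree height concrete, here is a small Python calculation of the smallest h with (m + 1)^h >= N. It is an idealized sketch: the 16 KB block size and the 10 million rows are assumptions chosen for illustration, and the space taken by pointers and page headers is ignored, so real trees will be somewhat taller.

```python
def btree_height(n_rows, block_bytes, item_bytes):
    """Smallest h such that (m + 1) ** h >= n_rows, with m = block_bytes // item_bytes."""
    m = block_bytes // item_bytes  # data items per disk block
    height, reachable = 1, m + 1
    while reachable < n_rows:
        reachable *= m + 1
        height += 1
    return height


N_ROWS = 10_000_000      # assumed table size
BLOCK_BYTES = 16 * 1024  # assumed size of one data page

for type_name, item_bytes in (("int", 4), ("bigint", 8), ("64-byte string", 64)):
    print(type_name, btree_height(N_ROWS, BLOCK_BYTES, item_bytes))
```

Under these assumptions the 4-byte key needs one level fewer than the 8-byte key, which means one IO fewer per lookup.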

(2) the leftmost matching property of the index.

When the data item of the B+ tree is a compound structure, such as (name, age, sex), the B+ tree builds the search tree by comparing the fields from left to right. For example, when data such as (Zhang San, 20, F) is retrieved, the B+ tree first compares name to determine the next search direction. If name is the same, it compares age and then sex in turn, and finally reaches the requested data. But when the search has no name, as in (20, F), the B+ tree does not know which node to check next, because name is the first comparison factor used when the search tree was built, and the search must start from name to know where to go next. When retrieving data such as (Zhang San, F), the B+ tree can use name to choose the search direction, but the next field, age, is missing, so it can only find all the entries whose name is Zhang San and then filter those whose sex is F. This very important property is the leftmost matching feature of the index.
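The leftmost matching property can be imitated with a sorted list of tuples, which orders entries the same way a (name, age, sex) index does. The sketch below is an analogy rather than MySQL's implementation; the sample rows and the helper names `prefix_scan` and `full_scan` are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical rows, kept sorted by (name, age, sex) like a composite index.
INDEX = sorted([
    ("Li Si", 25, "M"),
    ("Wang Wu", 20, "F"),
    ("Zhang San", 20, "F"),
    ("Zhang San", 20, "M"),
    ("Zhang San", 30, "F"),
])


def prefix_scan(prefix):
    """Binary-search to the first entry matching a leftmost prefix, then read forward."""
    width = len(prefix)
    pos = bisect_left(INDEX, prefix)
    matches, examined = [], 0
    while pos < len(INDEX):
        examined += 1
        if INDEX[pos][:width] != prefix:
            break  # sorted order guarantees there are no later matches
        matches.append(INDEX[pos])
        pos += 1
    return matches, examined


def full_scan(predicate):
    """Without a leftmost prefix, every entry has to be examined."""
    return [entry for entry in INDEX if predicate(entry)], len(INDEX)


# (Zhang San, 20, F): all three fields narrow the search.
exact, _ = prefix_scan(("Zhang San", 20, "F"))

# (Zhang San, F): only name narrows the search; sex is filtered afterwards.
by_name, _ = prefix_scan(("Zhang San",))
name_then_sex = [entry for entry in by_name if entry[2] == "F"]

# (20, F): no name, so the sorted order gives no starting point.
no_name, examined_all = full_scan(lambda e: e[1] == 20 and e[2] == "F")

print(exact, name_then_sex, no_name, examined_all)
```

The first two lookups start from a binary search; the last one has to look at every entry, which is the list equivalent of a full scan.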

Several principles of indexing

(1) the principle of leftmost prefix matching

For a multi-column index, matching always starts from the first column of the index, and columns in the middle cannot be skipped. For example, if a multi-column index (name, age, sex) is created, MySQL first matches the name field, then the age field, and then the sex field; none of them can be skipped. MySQL keeps matching to the right until it encounters a range condition (>, <, BETWEEN, LIKE), after which matching stops.

(4) index columns should not take part in calculations

Keep the indexed column on its own in the condition, rather than comparing an expression on the column with a value such as '2000000608201108010831508721'. The reason is simple: if the index column is involved in a calculation, every retrieval must first compute the expression for each entry and then compare, which is obviously too expensive.
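The cost of involving the index column in a calculation can be illustrated with a sorted Python list standing in for the index. This is a hedged analogy, not MySQL internals: `COLUMN`, `computed_filter` and `plain_filter` are hypothetical names, and the data is invented. A condition of the form `col + 1 > x` forces the expression to be evaluated for every entry, while the equivalent `col > x - 1` can use a single binary search on the stored values.

```python
from bisect import bisect_right

# Hypothetical indexed column: stored values kept in sorted order.
COLUMN = list(range(1, 1001))


def computed_filter(threshold):
    """WHERE col + 1 > threshold: the expression is evaluated for every entry."""
    hits, evaluated = [], 0
    for value in COLUMN:
        evaluated += 1
        if value + 1 > threshold:
            hits.append(value)
    return hits, evaluated


def plain_filter(threshold):
    """WHERE col > threshold - 1: one binary search, then read the tail."""
    start = bisect_right(COLUMN, threshold - 1)
    return COLUMN[start:]


hits, evaluated = computed_filter(990)
print(len(hits), evaluated, hits == plain_filter(990))
```

Both functions return the same rows; the difference is that the first touches all 1000 entries while the second jumps straight to the matching range.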

(5) extend an existing index where possible instead of creating a new one.

For example, if the table already has an index on a and you now want an index on (a, b), you only need to modify the original index.

The deficiency of the index

Although indexes improve query efficiency, they also have shortcomings.

Additional overhead of an index:

(1) space: the index takes up storage space.

(2) time: it takes time to query the index.

(3) maintenance: the index must be maintained whenever the data changes.

Indexes are not recommended when:

(1) the table holds only a small amount of data.

(2) storage space is tight.

Summary of common optimization

There are many ways to optimize statements and many details to watch for. The points below summarize the most common situations:

1. Cases where an index exists but is not used (not recommended)

(1) when the LIKE pattern begins with a wildcard

Try to avoid LIKE patterns that start with a wildcard, otherwise the database engine will give up using the index and perform a full table scan.

A SQL statement whose pattern begins with a wildcard, for example: select * from t_credit_detail where Flistid like '%0'\G

This is a full table scan; no index is used, and it is not recommended.

A SQL statement whose pattern does not begin with a wildcard, for example: select * from t_credit_detail where Flistid like '2%'\G

This query uses the index as a range lookup, which is much more efficient than a pattern that starts with a wildcard.
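Why a prefix pattern can use the index while a leading wildcard cannot is easy to see on a sorted list of strings. The following Python sketch is an analogy only; the values and the function names `like_prefix` and `like_suffix` are invented for illustration and are not data from t_credit_detail.

```python
from bisect import bisect_left

# Hypothetical indexed string values, kept in sorted order.
VALUES = sorted(["1000", "2001", "2050", "2999", "3000", "4120"])


def like_prefix(prefix):
    """LIKE 'prefix%': jump to the first candidate and stop at the first non-match."""
    pos = bisect_left(VALUES, prefix)
    matches = []
    while pos < len(VALUES) and VALUES[pos].startswith(prefix):
        matches.append(VALUES[pos])
        pos += 1
    return matches


def like_suffix(suffix):
    """LIKE '%suffix': sorted order does not help, so every value is checked."""
    return [value for value in VALUES if value.endswith(suffix)]


print(like_prefix("2"))
print(like_suffix("0"))
```

Values sharing a prefix sit next to each other in sorted order, so the prefix search reads one contiguous range; values sharing a suffix are scattered, so nothing short of a full pass can find them.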

(2) when the where condition does not conform to the leftmost prefix principle

Examples have been given in the content of the leftmost prefix matching principle.

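(3) when using the != or <> operator

Try to avoid the != or <> operator, otherwise the database engine may give up using the index and perform a full table scan. Using > or < is more efficient.

(4) when the index column takes part in a calculation

As noted in the indexing principles, a condition that computes on the index column before comparing it with a value such as '2000000608201108010831508722' cannot use the index.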

(5) when testing a field for null

Try to avoid testing a field for null in the where clause, as it may cause the engine to give up using the index and perform a full table scan, for example:

Inefficient: select * from t_credit_detail where Flistid is null

You can set a default value of 0 on Flistid to ensure the column contains no null values, and then query it like this:

Efficient: select * from t_credit_detail where Flistid = 0

(6) when using or to connect conditions

Try to avoid using or to join conditions in the where clause, as it may cause the engine to give up using the index and perform a full table scan, for example:

Inefficient: select * from t_credit_detail where Flistid = '2000000608201108010831508721' or Flistid = '10000200001'

You can replace the or query above with the following query:

Efficient: select * from t_credit_detail where Flistid = '2000000608201108010831508721' union all select * from t_credit_detail where Flistid = '10000200001'

2. Avoid select *

During parsing, the '*' is expanded into all the column names in turn by querying the data dictionary, which takes extra time.

Therefore, make a habit of selecting only the columns you need.

3. ORDER BY statement optimization

Any non-indexed item or evaluated expression in the ORDER BY clause will slow down the query.

Methods:

(1) Rewrite the ORDER BY statement so that it uses an index.

(2) Create another index for the columns used.

(3) Avoid using expressions in the ORDER BY clause.

4. GROUP BY statement optimization

To improve the efficiency of GROUP BY statements, filter out unwanted records before the GROUP BY.

Inefficient:

SELECT JOB, AVG(SAL)
FROM EMP
GROUP BY JOB
HAVING JOB = 'PRESIDENT'
OR JOB = 'MANAGER'

Efficient:

SELECT JOB, AVG(SAL)
FROM EMP
WHERE JOB = 'PRESIDENT'
OR JOB = 'MANAGER'
GROUP BY JOB
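The difference between the two statements can be mimicked in Python by counting how many rows go through the grouping step. This is a simplified model of the idea, not of MySQL's executor; the `EMP` rows are invented sample data, and `avg_by_job` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical (JOB, SAL) rows standing in for the EMP table.
EMP = [
    ("PRESIDENT", 5000),
    ("MANAGER", 2975),
    ("MANAGER", 2850),
    ("CLERK", 800),
    ("CLERK", 1100),
    ("ANALYST", 3000),
]
WANTED = {"PRESIDENT", "MANAGER"}


def avg_by_job(rows):
    """Group rows by job and average the salaries; also report how many rows were grouped."""
    groups = defaultdict(list)
    for job, sal in rows:
        groups[job].append(sal)
    averages = {job: sum(sals) / len(sals) for job, sals in groups.items()}
    return averages, len(rows)


# HAVING style: group every row, then throw away the unwanted groups.
all_groups, grouped_with_having = avg_by_job(EMP)
having_result = {job: avg for job, avg in all_groups.items() if job in WANTED}

# WHERE style: discard unwanted rows first, then group what is left.
where_result, grouped_with_where = avg_by_job([row for row in EMP if row[0] in WANTED])

print(having_result == where_result, grouped_with_having, grouped_with_where)
```

Both approaches give the same averages, but filtering first means only half of these sample rows ever reach the grouping step.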

5. Replace in with exists

In many cases, using exists instead of in is a good choice. For example, select num from a where num in (select num from b) can be replaced with: select num from a where exists (select 1 from b where b.num = a.num)

6. Use varchar/nvarchar instead of char/nchar

Use varchar/nvarchar instead of char/nchar as much as possible. First, variable-length fields take up less storage space; second, for queries, searching within a smaller field is more efficient.

7. If you can use DISTINCT, you don't need GROUP BY

SELECT OrderID FROM Details WHERE UnitPrice > 10 GROUP BY OrderID

Can be changed to:

SELECT DISTINCT OrderID FROM Details WHERE UnitPrice > 10

8. Do not use UNION if you can use UNION ALL

UNION ALL does not perform the duplicate elimination (SELECT DISTINCT) step, which avoids a lot of unnecessary resource use.

9. Use columns of the same type when joining tables, and index them

If your application has many JOIN queries, make sure the join fields in both tables are indexed. MySQL can then optimize the JOIN statement internally.

Moreover, the fields used for the join should be of the same type. For example, if you join a DECIMAL field with an INT field, MySQL may not be able to use their indexes. For string types, the character set must also be the same (the character sets of the two tables may differ).
