Example Analysis of Index and Optimization in MySQL 07/04 Update SLTechnology News&Howtos

Example Analysis of Index and Optimization in MySQL

2025-07-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article mainly introduces the example analysis of indexing and optimization in MySQL, which is very detailed and has certain reference value. Friends who are interested must read it!

Index and optimization

1. Select the data type of the index

MySQL supports many data types, and choosing the right data type to store data has a great impact on performance. In general, you can follow the following guidelines:

(1) smaller data types are usually better: smaller data types usually require less space in disk, memory, and CPU caches and are faster to process.

(2) simple data types are better: integer data is less expensive to process than characters, because the comparison of strings is more complex. In MySQL, you should use built-in date and time data types instead of strings to store time; and use integer data types to store IP addresses.

(3) avoid NULL as much as possible: you should specify the column NOT NULL unless you want to store NULL. In MySQL, columns with null values are difficult to optimize because they make indexes, index statistics, and comparison operations more complex. You should replace null values with 0, a special value, or an empty string.

1.1. Select an identifier

It is very important to choose the appropriate identifier. When choosing, you should consider not only the storage type, but also how MySQL is calculated and compared. Once the data type is selected, you should ensure that all relevant tables use the same data type.

(1) Integer: it is usually the best choice as an identifier because it can be processed faster and can be set to AUTO_INCREMENT.

(2) Strings: try to avoid using strings as identifiers, they consume more space and are slower to process. And, in general, strings are random, so their position in the index is random, which leads to page splitting, random disk access, and clustered index splitting (for storage engines that use clustered indexes).

2. Introduction to indexing

For any DBMS, the index is the most important factor in optimization. For a small amount of data, the impact of not having a suitable index is not great, but when the amount of data increases, the performance will degrade sharply.

If you index multiple columns (combined indexes), the order of the columns is so important that MySQL can only effectively find the leftmost prefix of the index. For example:

Assuming that there is a composite index it1c1c2 (c1Powerc2), the query statement select * from T1 where c1index 1 and c2 can use this index. The index can also be used by the query statement select * from T1 where c1room1. However, the query statement select * from T1 where c2 cannot use this index because there is no boot column for the combined index, that is, in order to use the c2 column for lookup, C1 must be equal to some value.

2.1. Type of index

Indexing is implemented in the storage engine, not in the server layer. Therefore, the indexes of each storage engine are not necessarily the same, and not all storage engines support all index types.

2.1.1, B-Tree index

Suppose you have the following table:

CREATE TABLE People (

Last_name varchar (50) not null

First_name varchar (50) not null

Dob date not null

Gender enum ('masked,' f') not null

Key (last_name, first_name, dob)

);

Its index contains the last_name, first_name, and dob columns for each row in the table. Its structure is roughly as follows:

The values stored in the index are arranged in the order in the index column. You can use B-Tree indexes to query full keywords, keyword ranges, and keyword prefixes. Of course, if you want to use an index, you must ensure that you query by the leftmost prefix (leftmost prefix of the index) of the index.

(1) match full value (Match the full value): specify specific values for all columns in the index. For example, the index in the image above can help you find Cuba Allen born in 1960-01-01.

(2) match the leftmost prefix (Match a leftmost prefix): you can use the index to find people whose last name is Allen, using only the first column in the index.

(3) match column prefix (Match a column prefix): for example, you can use the index to find people whose last name starts with J, which only uses the first column in the index.

(4) range query of matching values (Match a range of values): you can use the index to find people whose last name is between Allen and Barrymore, using only the first column in the index.

(5) the matching part is accurate and the other parts do range matching (Match one part exactly and match a range on another part): you can use the index to find people whose last name is Allen and first name starts with the letter K.

(6) query the index only (Index-only queries): if the columns of the query are in the index, you do not need to read the values of the tuples.

Because the nodes in the B-tree are stored sequentially, you can use the index to find (find some values), or you can ORDER BY the query results. Of course, there are some limitations to using B-tree indexes:

(1) the query must start with the leftmost column of the index. This point has been mentioned many times. For example, you can't use an index to find people born on a certain day.

(2) an index column cannot be skipped. For example, you can't use an index to find people whose last name is Smith and who was born one day.

(3) the storage engine cannot use the column to the right of the scope condition in the index. For example, if your query statement is WHERE last_name= "Smith" AND first_name LIKE'J% 'AND dob='1976-12-23, then the query will only use the first two columns in the index because LIKE is a range query.

2.1.2, Hash index

In MySQL, only the Memory storage engine shows that hash indexes are supported, which is the default index type for Memory tables, although Memory tables can also use B-Tree indexes. The Memory storage engine supports non-unique hash indexes, which is rare in the database world. If multiple values have the same hash code, the index stores their row pointers in a linked list to the same hash entry.

Suppose you create the following table:

CREATE TABLE testhash (

Fname VARCHAR (50) NOT NULL

Lname VARCHAR (50) NOT NULL

KEY USING HASH (fname)

) ENGINE=MEMORY

The data contained are as follows:

Suppose the index uses the hash function f (), as follows:

F ('Arjen') = 2323

F ('Baron') = 7437

F ('Peter') = 8784

F ('Vadim') = 2458

At this point, the structure of the index is roughly as follows:

The Slots is ordered, but the records are not ordered. When you execute

Mysql > SELECT lname FROM testhash WHERE fname='Peter'

MySQL calculates the hash value of 'Peter' and then uses it to query the row pointer of the index. Because f ('Peter') = 8784 MySQL will look for 8784 in the index and get a pointer to record 3.

Because the index itself stores only very short values, the index is very compact. The Hash value does not depend on the data type of the column, and the index of a TINYINT column is as large as the index of a long string column.

Hash indexes have the following limitations:

(1) because the index contains only hash code and record pointers, MySQL cannot avoid reading records by using the index. But accessing records in memory is very fast and doesn't have much impact on sex.

(2) hash index sorting cannot be used.

(3) the Hash index does not support partial matching of keys because the hash value is calculated through the entire index value.

(4) Hash indexes only support equivalent comparisons, such as using =, IN () and. It does not speed up the query for WHERE price > 100.

2.1.3, R-Tree index

MyISAM supports spatial indexing and is mainly used for geospatial data types such as GEOMETRY.

2.1.4. Full-text (Full-text) index

Full-text index is a special index type of MyISAM, which is mainly used for full-text retrieval.

3. High-performance indexing strategy

3.1Cluster index (Clustered Indexes)

Clustering index ensures that tuples with similar keyword values have the same physical location (so string types should not be clustered, especially random strings, which will cause the system to do a lot of moving operations). And a table can only have one cluster index. Because indexing is implemented by storage engines, not all engines support clustered indexing. Currently, only solidDB and InnoDB support it.

The structure of the clustered index is roughly as follows:

Note: the leaf page contains the complete tuple, while the inner node page contains only the columns of the index (the index is listed as an integer). Some DBMS allow users to specify clustered indexes, but MySQL's storage engine has so far not supported it. InnoDB builds a clustered index on the primary key. If you do not specify a primary key, InnoDB will replace it with an index with a unique and non-null value. If such an index does not exist, InnoDB defines a hidden primary key and then clusters it. Generally speaking, DBMS stores the actual data in the form of clustered indexes, which is the basis of other secondary indexes.

3.1.1. Comparison of data layout between InnoDB and MyISAM

To better understand clustered and non-clustered indexes, or primary and second indexes (MyISAM does not support clustered indexes), compare the data layout of InnoDB and MyISAM for the following tables:

CREATE TABLE layout_test (

Col1 int NOT NULL

Col2 int NOT NULL

PRIMARY KEY (col1)

KEY (col2)

);

Assume that the value of the primary key is between 1murmur10000 and inserted in random order, and then optimized with OPTIMIZE TABLE. Col2 randomly assigns values between 1MMI 100, so there will be a lot of duplicate values.

(1) data layout of MyISAM

The layout is very simple, and MyISAM stores data on disk in the order in which it is inserted, as follows:

Note: on the left is the line number (row number), starting with 0. Because the tuple is fixed in size, MyISAM can easily find the position of a byte from the beginning of the table.

According to these, the index structure of the primary key is roughly as follows:

Note: MyISAM does not support clustered indexes, each leaf node in the index contains only the line number (row number), and the leaf nodes are stored in the order of col1.

Take a look at the index structure of col2:

In fact, in MyISAM, primary key is no different from other indexes. Primary key is just a unique, non-empty index called PRIMARY.

(2) data layout of InnoDB

InnoDB stores data in the form of clustered indexes, so its data layout is very different. The structure of the storage table is roughly as follows:

Note: each leaf node in the clustered index contains the value of primary key, transaction ID and rollback pointer (rollback pointer)-- for transactions and MVCC, and the remaining columns (such as col2).

Compared with MyISAM, secondary indexes are very different from clustered indexes. The leaves of InnoDB's secondary index contain the value of primary key instead of the row pointer (row pointers), which reduces the overhead of maintaining the secondary index when moving data or splitting data pages, because InnoDB does not need to update the row pointer of the index. Its structure is roughly as follows:

Comparison of clustered index and non-clustered index table:

3.1.2. Insert rows in the order of primary key (InnoDB)

If you use InnoDB and do not need a special clustered index, it is a good practice to use a proxy primary key (surrogate key)-independent of the data in your application. The easiest thing to do is to use a column of AUTO_INCREMENT, which ensures that records are inserted sequentially and improves the performance of queries that are joined using primary key. Random clustering of primary keys should be avoided as much as possible. for example, a string primary key is a bad choice, which makes the insert operation random.

3.2.The overlay index (Covering Indexes)

If the index contains all the data that satisfies the query, it is called an override index. Overlay index is a very powerful tool that can greatly improve query performance. Just reading the index without reading the data has the following advantages:

(1) Index entries are usually smaller than records, so MySQL accesses less data

(2) Indexes are stored in the order of values, which requires less Icano than random access records.

(3) most data engines can cache indexes better. For example, MyISAM caches only indexes.

(4) override indexes are particularly useful for InnoDB tables because InnoDB uses a clustered index to organize data, and if the secondary index contains the data needed for the query, it is no longer necessary to look in the clustered index.

The override index cannot be any index, only the B-TREE index stores the corresponding value. And different storage engines implement overriding indexes in different ways, and not all storage engines support overriding indexes (Memory and Falcon do not).

For index coverage queries (index-covered query), when using EXPLAIN, you can see "Using index" in the Extra column. For example, in the inventory table of sakila, there is a composite index (store_id,film_id), and MySQL can use the index for queries that only need to access these two columns, as follows:

Mysql > EXPLAIN SELECT store_id, film_id FROM sakila.inventory/G

* * 1. Row *

Id: 1

Select_type: SIMPLE

Table: inventory

Type: index

Possible_keys: NULL

Key: idx_store_id_film_id

Key_len: 3

Ref: NULL

Rows: 5007

Extra: Using index

1 row in set (0.17 sec)

In most engines, the index is overwritten only if the column accessed by the query statement is part of the index. However, InnoDB is not limited to this, and the secondary index of InnoDB stores the value of primary key in the leaf node. Therefore, the sakila.actor table uses InnoDB, and there is an index on last_name, so the index can override queries that access actor_id, such as:

Mysql > EXPLAIN SELECT actor_id, last_name

-> FROM sakila.actor WHERE last_name = 'HOPPER'/G

* * 1. Row *

Id: 1

Select_type: SIMPLE

Table: actor

Type: ref

Possible_keys: idx_actor_last_name

Key: idx_actor_last_name

Key_len: 137

Ref: const

Rows: 2

Extra: Using where; Using index

3.3. Sort using the index

In MySQL, there are two ways to generate ordered result sets: one is to use filesort, and the other is to scan in index order. Sorting with an index is very fast, and you can use the same index to find and sort at the same time. You can use the index to sort when the order of the index is the same as that of the columns in ORDER BY and all columns are in the same direction (all ascending or all descending). If the query joins multiple tables, the index is used only if all columns in the ORDER BY are the columns of the first table. Filesort is used in all other cases.

Create table actor (

Actor_id int unsigned NOT NULL AUTO_INCREMENT

Name varchar (16) NOT NULL DEFAULT'

Password varchar (16) NOT NULL DEFAULT'

PRIMARY KEY (actor_id)

KEY (name)

) ENGINE=InnoDB

Insert into actor (name,password) values ('cat01','1234567')

Insert into actor (name,password) values ('cat02','1234567')

Insert into actor (name,password) values ('ddddd','1234567')

Insert into actor (name,password) values ('aaaaa','1234567')

Mysql > explain select actor_id from actor order by actor_id / G

* * 1. Row *

Id: 1

Select_type: SIMPLE

Table: actor

Type: index

Possible_keys: NULL

Key: PRIMARY

Key_len: 4

Ref: NULL

Rows: 4

Extra: Using index

1 row in set (0.00 sec)

Mysql > explain select actor_id from actor order by password / G

* * 1. Row *

Id: 1

Select_type: SIMPLE

Table: actor

Type: ALL

Possible_keys: NULL

Key: NULL

Key_len: NULL

Ref: NULL

Rows: 4

Extra: Using filesort

1 row in set (0.00 sec)

Mysql > explain select actor_id from actor order by name / G

* * 1. Row *

Id: 1

Select_type: SIMPLE

Table: actor

Type: index

Possible_keys: NULL

Key: name

Key_len: 18

Ref: NULL

Rows: 4

Extra: Using index

1 row in set (0.00 sec)

When MySQL cannot use the index for sorting, it will use its own sorting algorithm (quick sort algorithm) to sort the data in memory (sort buffer). If the memory cannot be loaded, it will divide the data on the disk into blocks, sort each data block, and then merge each block into an ordered result set (actually an external sort). There are two sorting algorithms for filesort,MySQL.

(1) two-pass scan algorithm (Two passes)

The way of realization is to take out the fields to be sorted and the pointer information that can be directly located to the relevant row data, then sort them in the set memory (set by the parameter sort_buffer_size), and then retrieve the required Columns again through the row pointer information after the sorting is completed.

Note: this algorithm is the algorithm used before 4.1, it requires two accesses to the data, especially the second read operation will result in a large number of random Imax O operations. On the other hand, the memory overhead is small.

(3) one scan algorithm (single pass)

The algorithm takes out all the required Columns at one time, sorts the results in memory and outputs the results directly.

Note: this algorithm has been used since MySQL version 4.1. It reduces the number of Imax O and is more efficient, but the memory cost is also higher. If we take out the Columns that we don't need, we will waste a lot of memory needed for the sorting process. In versions later than MySQL 4.1, you can control whether MySQL chooses the first sorting algorithm or the second by setting the max_length_for_sort_data parameter. When the total size of all large fields fetched is larger than the max_length_for_sort_data setting, MySQL chooses the first sorting algorithm, and vice versa, the second. In order to improve sorting performance as much as possible, we naturally prefer to use the second sorting algorithm, so it is necessary to take out only the required Columns in Query.

When sorting join operations, if ORDER BY only references the columns of the first table, MySQL filesort the table and then perform join processing. At this point, EXPLAIN outputs "Using filesort"; otherwise, MySQL must generate a temporary table of the query result set, and filesort operation is performed after the join is complete, and EXPLAIN outputs "Using temporary;Using filesort".

3.4. Indexing and locking

An index is not always important for InnoDB because it allows queries to have fewer tuples to lock. This is important because in MySQL 5.0, InnoDB is not unlocked until the transaction commits. There are two reasons: first, even though the overhead of InnoDB row-level locks is very efficient and memory overhead is small, it is still stored in overhead anyway. Secondly, the locking of unwanted tuples will increase the overhead of locking and reduce concurrency.

InnoDB locks only the tuples that need to be accessed, while the index reduces the number of tuples accessed by InnoDB. However, this can only be achieved by filtering out unwanted data at the storage engine layer. Once the index does not allow InnoDB to do that (that is, it does not achieve the purpose of filtering), the MySQL server can only WHERE the data returned by InnoDB, and it is inevitable to lock those tuples: InnoDB has locked those tuples and the server cannot unlock them.

Let's look at an example:

Create table actor (

Actor_id int unsigned NOT NULL AUTO_INCREMENT

Name varchar (16) NOT NULL DEFAULT'

Password varchar (16) NOT NULL DEFAULT'

PRIMARY KEY (actor_id)

KEY (name)

) ENGINE=InnoDB

Insert into actor (name,password) values ('cat01','1234567')

Insert into actor (name,password) values ('cat02','1234567')

Insert into actor (name,password) values ('ddddd','1234567')

Insert into actor (name,password) values ('aaaaa','1234567')

SET AUTOCOMMIT=0

BEGIN

SELECT actor_id FROM actor WHERE actor_id

< 4 AND actor_id 1 FOR UPDATE; 该查询仅仅返回2---3的数据，实际已经对1---3的数据加上排它锁了。InnoDB锁住元组1是因为MySQL的查询计划仅使用索引进行范围查询（而没有进行过滤操作，WHERE中第二个条件已经无法使用索引了）： mysql>

EXPLAIN SELECT actor_id FROM test.actor

-> WHERE actor_id

< 4 AND actor_id 1 FOR UPDATE /G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: actor type: index possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 4 Extra: Using where; Using index 1 row in set (0.00 sec) mysql>

Indicates that the storage engine starts at the beginning of the index and fetches all rows until actor_id

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.