How to create a high-performance index for MySQL

2025-05-07 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article summarizes how to create high-performance indexes for MySQL. It covers index basics, high-performance index strategies, best practices for index optimization, and index monitoring.

1 Index basics

1.1 What an index does

In MySQL, a lookup first finds the matching value in the index and then uses the matching index record to locate the corresponding data row. Suppose you run the following query:

SELECT * FROM USER WHERE uid = 5

If there is an index on uid, MySQL first uses the index to find the entries where uid is 5. In other words, MySQL looks up by value in the index and then returns all data rows that contain that value.

1.2 Common data structures for MySQL indexing

MySQL indexes are implemented at the storage engine layer, not at the server layer. As a result, there is no uniform index standard: indexes work differently in different storage engines.

1.2.1 B-Tree

Most MySQL storage engines support B-Tree indexes. Even when several storage engines support the same index type, the underlying implementation may differ. For example, InnoDB uses a B+Tree.

Storage engines use B-Tree indexes in different ways, and each approach has its own performance trade-offs. For example, MyISAM uses prefix compression to make indexes smaller, while InnoDB stores values in their original format. MyISAM indexes reference indexed rows by their physical storage location, while InnoDB references them by primary key.

All values in a B-Tree are stored in order, and every leaf page is the same distance from the root. The following figure roughly reflects how an InnoDB index works. MyISAM uses a different structure, but the basic implementation is similar.

The example diagram (image not reproduced here) shows:

Each node occupies one disk block. A node holds two keywords sorted in ascending order and three pointers to the roots of its subtrees; each pointer stores the address of the disk block where the child node resides. The two keywords divide the key space into three ranges, which correspond to the data ranges of the subtrees that the three pointers point to. Taking the root node as an example, with keywords 16 and 34: the subtree pointed to by P1 holds values less than 16, the subtree pointed to by P2 holds values between 16 and 34, and the subtree pointed to by P3 holds values greater than 34.

The process of finding keyword 28:

Find disk block 1 from the root node and read it into memory. [1st disk I/O]

Keyword 28 falls in the interval (16, 34), so take pointer P2 of disk block 1.

Locate disk block 3 through the P2 pointer and read it into memory. [2nd disk I/O]

Keyword 28 falls in the interval (25, 31), so take pointer P2 of disk block 3.

Locate disk block 8 through the P2 pointer and read it into memory. [3rd disk I/O]

Find keyword 28 in the keyword list of disk block 8.
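The lookup steps above can be sketched in Python. This is only a rough model, not how any storage engine is implemented: the `BLOCKS` dictionary and the function name `btree_search` are hypothetical, and apart from the root keywords (16, 34), the keywords of block 3 (25, 31) and the presence of 28 in block 8, all tree contents are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical disk blocks. Only the keys of block 1 and block 3 and the
# presence of 28 in block 8 come from the walkthrough; the rest is invented.
BLOCKS = {
    1: {"keys": [16, 34], "children": [2, 3, 4]},
    2: {"keys": [7, 11], "children": []},
    3: {"keys": [25, 31], "children": [7, 8, 9]},
    4: {"keys": [60, 80], "children": []},
    7: {"keys": [20, 23], "children": []},
    8: {"keys": [27, 28], "children": []},
    9: {"keys": [32, 33], "children": []},
}


def btree_search(key, root=1):
    """Return (found, blocks_read); each entry in blocks_read is one simulated disk I/O."""
    blocks_read = []
    block_id = root
    while True:
        node = BLOCKS[block_id]           # simulated read of one disk block
        blocks_read.append(block_id)
        i = bisect_left(node["keys"], key)
        if i < len(node["keys"]) and node["keys"][i] == key:
            return True, blocks_read      # key is stored in this node
        if not node["children"]:
            return False, blocks_read     # reached a leaf without a match
        block_id = node["children"][i]    # i selects P1, P2 or P3


found, blocks = btree_search(28)
print(found, blocks)
```

Each element of the returned list corresponds to one disk read, so the depth of the tree directly bounds the number of I/O operations per lookup.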

Disadvantages:

Each node stores data as well as keys, and the storage space of each page is limited. If the rows are large, fewer keys fit in each node.

When the table holds a large amount of data, the tree becomes deep, which increases the number of disk I/Os per query and hurts query performance.

1.2.2 B+Tree Index

The B+ tree is a variant of the B-tree. Unlike the B-tree, the B+ tree stores data only in leaf nodes; non-leaf nodes store only key values and pointers.

A B+ tree keeps two head pointers: one points to the root node and the other points to the leaf node with the smallest keyword. All leaf nodes (that is, the data nodes) are linked together in a chain. As a result, a B+ tree supports two kinds of search: a range search along the leaf chain by primary key, and a random search that starts from the root node.

The B* tree is similar to the B+ tree, except that the non-leaf nodes of a B* tree are also linked together in a chain.
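To illustrate why the linked leaf nodes make range searches cheap, here is a minimal Python sketch. It models only the leaf level of a B+ tree: the class `Leaf` and the helpers `build_leaves` and `range_scan` are hypothetical names, and the scan starts from the smallest leaf rather than descending from the root, which a real implementation would normally do to find the starting leaf.

```python
class Leaf:
    """One leaf node: sorted keys, their rows, and a link to the next leaf."""

    def __init__(self, keys, rows):
        self.keys = keys
        self.rows = rows
        self.next = None


def build_leaves(pairs, per_leaf=2):
    """Chain sorted (key, row) pairs into linked leaves and return the first leaf."""
    pairs = sorted(pairs)
    leaves = []
    for i in range(0, len(pairs), per_leaf):
        chunk = pairs[i:i + per_leaf]
        leaves.append(Leaf([k for k, _ in chunk], [r for _, r in chunk]))
    for left, right in zip(leaves, leaves[1:]):
        left.next = right                 # the chain between leaf nodes
    return leaves[0] if leaves else None


def range_scan(first_leaf, low, high):
    """Collect rows with low <= key <= high by walking the leaf chain in key order."""
    result = []
    leaf = first_leaf
    while leaf is not None:
        for key, row in zip(leaf.keys, leaf.rows):
            if key > high:
                return result             # keys are sorted, so we can stop early
            if key >= low:
                result.append(row)
        leaf = leaf.next
    return result


first = build_leaves([(5, "e"), (1, "a"), (3, "c"), (2, "b"), (4, "d")])
print(range_scan(first, 2, 4))
```

Because the leaves are ordered and linked, the scan never has to go back up to a non-leaf node once it has started.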

1.2.3 Hash Index

A hash index is built on a hash table and is only useful for queries that exactly match all columns of the index. For each row of data, the storage engine computes a hash code over all indexed columns. The hash code is a small value, and rows with different key values will usually produce different hash codes. The hash index stores all the hash codes in the index and keeps a pointer to each data row in the hash table.

In MySQL, only the Memory engine uses hash indexes as its default index type; Memory also supports B-Tree indexes. The Memory engine also supports non-unique hash indexes: if multiple values have the same hash code, the index stores multiple row pointers in the same hash entry as a linked list. This is similar to HashMap.
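The behaviour described above can be sketched in Python. This is an illustrative model only and not the Memory engine's implementation: the class `HashIndex`, the bucket count, and the use of `zlib.crc32` as the hash function are all assumptions made for the example.

```python
import zlib
from collections import defaultdict


class HashIndex:
    """Toy hash index: hash code -> list of (key, row pointer) entries."""

    def __init__(self, buckets=8):
        self.buckets = buckets
        self.table = defaultdict(list)

    def _code(self, key):
        # The hash is computed over ALL indexed columns (the whole key tuple).
        return zlib.crc32(repr(key).encode()) % self.buckets

    def add(self, key, row_ptr):
        # Entries that share a hash code are chained in the same bucket.
        self.table[self._code(key)].append((key, row_ptr))

    def lookup(self, key):
        # Only an exact match on the full key can be answered.
        bucket = self.table.get(self._code(key), [])
        return [ptr for k, ptr in bucket if k == key]


idx = HashIndex()
idx.add(("zhang", 18), 0)
idx.add(("li", 20), 1)
idx.add(("zhang", 18), 2)        # non-unique key: second pointer in the same entry
print(idx.lookup(("zhang", 18)))
print(idx.lookup(("zhang",)))     # partial key: the hash index cannot help
```

The buckets have no ordering, which is why this structure cannot serve range queries or sorting.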

Advantages:

The index itself only needs to store the hash values, so the index structure is very compact and hash lookups are very fast.

Disadvantages:

With hash storage, all the data files need to be loaded into memory, which consumes more memory.

Hash index data is not stored in order, so it cannot be used for sorting.

If every query is an equality lookup, hashing is indeed fast. In real production workloads, however, range queries are more common than equality lookups, so hash indexes are often a poor fit.

If there are many hash collisions, index maintenance also becomes expensive. This is the same collision problem that HashMap later addressed by adding red-black trees.

2 High-performance index strategies

2.1 Clustered and non-clustered indexes

Clustered index

A clustered index is not a separate index type but a way of storing data. In the InnoDB storage engine, a clustered index keeps the key values and the data rows in the same structure. When a table has a clustered index, its data rows are stored in the leaf pages of the index. Because a data row cannot be stored in two places at once, a table can have only one clustered index (covering indexes can simulate multiple clustered indexes).

Advantages of clustered indexing:

Related data is stored together.

Data access is faster, because the index and the data are stored in the same tree.

Queries that use a covering index scan can use the primary key values stored in the leaf nodes directly.

Disadvantages:

Clustering gives the biggest improvement for I/O-bound workloads. If all the data fits in memory, a clustered index offers little advantage.

Insert speed depends heavily on insertion order; inserting in primary key order is fastest.

Updating clustered index columns is expensive, because each updated row has to be moved to a new location.

A table built on a clustered index may suffer page splits when new rows are inserted, or when a primary key update forces a row to move.

Clustered indexes can make full table scans slower, especially when rows are sparse or when page splits leave the data stored non-contiguously.

Non-clustered index

The data file is stored separately from the index file.

2.2 Prefix index

Sometimes you need to index a very long string, which makes the index large and slow. You can usually index only the first part of the column instead, which saves a lot of index space and improves index efficiency, but it also reduces the selectivity of the index. Index selectivity is the ratio of the number of distinct index values (also known as the cardinality) to the total number of rows in the table (#T), and ranges from 1/#T to 1. The more selective the index, the more efficient the query, because a highly selective index lets MySQL filter out more rows during the search.

In general, the selectivity of a column prefix is high enough for good query performance. For BLOB, TEXT and very long VARCHAR columns you must use a prefix index, because MySQL does not allow indexing the full length of these columns. The trick is to choose a prefix that is long enough to give high selectivity, but not so long that it wastes space.

Give an example

Table structure and data can be downloaded from the MySQL official website or GitHub.

City table columns

Field name     Meaning
city_id        city primary key ID
city           city name
country_id     country ID
last_update    time of creation or last update

-- compare the selectivity of prefixes of different lengths
select count(distinct left(city,3))/count(*) as sel3,
       count(distinct left(city,4))/count(*) as sel4,
       count(distinct left(city,5))/count(*) as sel5,
       count(distinct left(city,6))/count(*) as sel6,
       count(distinct left(city,7))/count(*) as sel7,
       count(distinct left(city,8))/count(*) as sel8
from citydemo;

You can see that once the prefix length reaches 7, increasing it further improves the selectivity very little. Therefore, the best prefix index length is 7.
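The same selectivity calculation can be sketched in Python to show the reasoning behind picking a prefix length. The function names, the `tolerance` parameter and the sample city names below are hypothetical and are not taken from the citydemo table.

```python
def prefix_selectivity(values, length):
    """Distinct prefixes of the given length divided by the row count.

    Mirrors count(distinct left(col, length)) / count(*).
    """
    if not values:
        return 0.0
    return len({v[:length] for v in values}) / len(values)


def shortest_good_prefix(values, tolerance=0.01):
    """Smallest prefix length whose selectivity is within tolerance of the full column.

    Assumes values is a non-empty list of strings.
    """
    longest = max(len(v) for v in values)
    full = prefix_selectivity(values, longest)
    for length in range(1, longest + 1):
        if full - prefix_selectivity(values, length) <= tolerance:
            return length
    return longest


# Hypothetical sample data, for illustration only.
cities = ["London", "Londrina", "Lonavla", "Lima", "Lagos", "Lahore"]
for n in range(1, 6):
    print(n, round(prefix_selectivity(cities, n), 3))
print("chosen prefix length:", shortest_good_prefix(cities))
```

On a real table the values would come from the column itself, and the tolerance is a judgment call between index size and selectivity.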

2.3 Returning to the table

To understand returning to the table, you first need to understand the clustered index and the ordinary (secondary) index. The clustered index is the primary key index defined when the table is created. If no primary key is defined, InnoDB automatically uses the first non-null unique index as the clustered index; if there is none either, InnoDB creates a hidden row ID to serve as the index (in Oracle, the row ID is exposed explicitly and can be used for paging). An ordinary index is an index created on ordinary columns. The leaf nodes of an ordinary index store the primary key rather than the whole data row. When searching through an ordinary index, MySQL first finds the primary key in the B+ tree of that index, and then looks up the full data row in the B+ tree of the primary key. This second lookup is called returning to the table.

2.4 Covering index

Covering indexes are particularly useful in InnoDB. In MySQL, column data can be read directly from an index. If the leaf nodes of the index already contain the data being queried, there is no need to return to the table. If an index contains (covers) the values of all the fields being queried, it is called a covering index. Put simply: finding the column data through the index without returning to the table is a covering index.

Table information

CREATE TABLE `t_user` (
  `uid` int(11) NOT NULL AUTO_INCREMENT,
  `uname` varchar(255) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `update_time` datetime DEFAULT NULL,
  PRIMARY KEY (`uid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

Give an example

With uid set as the primary key index, the following SQL query shows "Using index" in the Extra column of explain:

explain select uid from t_user where uid = 1;

Covering indexes are often used together with composite indexes, for example:

explain select age,uname from t_user where age = 10;

When no composite index exists, this query has to return to the table.

Create the composite index and run the query again:

create index index_user on t_user (age,uname);
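The difference between a covered query and one that returns to the table can be modelled in Python. This is a toy sketch under stated assumptions: the sample rows, the dictionaries `CLUSTERED` and `SECONDARY`, and the function `select_by_age` are hypothetical, and the model only mimics an index on (age, uname) whose entries also carry the primary key uid.

```python
# Clustered index: primary key uid -> full row (hypothetical sample rows).
CLUSTERED = {
    1: {"uid": 1, "uname": "zhang", "age": 10, "update_time": "2022-06-01"},
    2: {"uid": 2, "uname": "li", "age": 10, "update_time": "2022-06-02"},
    3: {"uid": 3, "uname": "wang", "age": 25, "update_time": "2022-06-03"},
}

# Secondary index on (age, uname); each entry also stores the primary key.
INDEX_COLS = ("age", "uname")
SECONDARY = sorted((r["age"], r["uname"], r["uid"]) for r in CLUSTERED.values())


def select_by_age(columns, age):
    """Model SELECT <columns> WHERE age = ?; return (rows, clustered_lookups)."""
    covered = set(columns) <= (set(INDEX_COLS) | {"uid"})
    rows, lookups = [], 0
    for entry_age, uname, uid in SECONDARY:
        if entry_age != age:
            continue
        if covered:
            source = {"age": entry_age, "uname": uname, "uid": uid}
        else:
            source = CLUSTERED[uid]       # return to the table by primary key
            lookups += 1
        rows.append(tuple(source[c] for c in columns))
    return rows, lookups


print(select_by_age(("age", "uname"), 10))          # covered by the index
print(select_by_age(("uname", "update_time"), 10))  # needs the clustered index
```

The second call needs one extra clustered-index lookup per matching entry, which is the cost a covering index avoids.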

2.5 Index matching

2.5.1 Leftmost matching

With a composite index, for example one on (age,uname), a query can use the index by filtering on the leftmost column alone, but it cannot use the index if the leftmost column is not part of the condition. For example, the following SQL uses the index:

explain select * from t_user where age=10 and uname='zhang'

The following SQL does not use the index:

Explain select * from t_user where uname='zhang'

2.5.2 Matching a column prefix

You can match the beginning of a column's value, such as like 'abc%'.

2.5.3 Matching a range of values

You can find a certain range of data.

Explain select * from t_user where age > 18

2.5.4 Exactly matching one column and range matching another

You can match the first column exactly and match the second column by range (here, by prefix).

Explain select * from t_user where age=18 and uname like 'zhang%'
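The matching rules in sections 2.5.1 to 2.5.4 can be summarized with a small Python sketch. It is a deliberately simplified model, not the MySQL optimizer: the function `usable_index_columns` and the labels "eq" and "range" are hypothetical, and a prefix match such as like 'zhang%' is treated as a range condition.

```python
def usable_index_columns(index_cols, conditions):
    """Return the leading index columns a query can use, in index order.

    conditions maps column name -> "eq" (equality) or "range" (<, >, between,
    like 'abc%'). Matching stops at the first column without a condition and
    right after the first range condition.
    """
    used = []
    for col in index_cols:
        kind = conditions.get(col)
        if kind is None:
            break                 # gap in the leftmost prefix: stop here
        used.append(col)
        if kind == "range":
            break                 # columns after a range column cannot be used
    return used


index = ("age", "uname")
print(usable_index_columns(index, {"age": "eq", "uname": "eq"}))     # 2.5.1
print(usable_index_columns(index, {"uname": "eq"}))                  # no leftmost column
print(usable_index_columns(index, {"age": "range"}))                 # 2.5.3
print(usable_index_columns(index, {"age": "eq", "uname": "range"}))  # 2.5.4
```

An empty result means the condition gives the index nothing to match on, which corresponds to the query on uname alone in section 2.5.1.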

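2.5.5 Queries that access only the index

The query only needs to access the index, not the data rows, which is essentially a covering index. This only holds when the index contains every column the query references; the query below is index-only only if the index also includes update_time, for example an index on (age, uname, update_time).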

Explain select age,uname,update_time from t_user where age=18 and uname= 'zhang' and update_time='123'

3 Best practices for index optimization

1. When querying on indexed columns, avoid wrapping them in expressions; do the computation in the business tier rather than the database tier.

select uid,age,uname from t_user where uid=1; -- recommended

select uid,age,uname from t_user where uid+9=10; -- not recommended

2. Prefer primary key queries over queries on other indexes, because primary key queries do not trigger a return to the table.

3. Use prefix indexes; see section 2.2 Prefix index.

4. Use index scans for sorting. MySQL has two ways to produce ordered results: a sort operation, or scanning in index order. If the type column of explain shows index, MySQL is using an index scan for sorting.

Scanning the index itself is fast, because it only needs to move from one index record to the next. However, if the index does not cover all the columns the query needs, MySQL has to return to the table for every index record it scans. This is essentially random I/O, so reading data in index order is usually slower than a sequential full table scan.

MySQL can use the same index both to sort and to find rows. If possible, design the index to serve both tasks.

MySQL can use an index to sort the results only when the column order of the index exactly matches the order by clause and all columns are sorted in the same direction. If the query joins multiple tables, the index can be used for sorting only when every field referenced in the order by clause belongs to the first table. The order by clause has the same restriction as a lookup query: it must satisfy the leftmost prefix of the index. Otherwise MySQL has to perform a sort operation and cannot use the index for sorting.

Sample table structure and data can be downloaded from the MySQL official website or GitHub.

CREATE TABLE `rental` (
  `rental_id` int(11) NOT NULL AUTO_INCREMENT,
  `rental_date` datetime NOT NULL,
  `inventory_id` mediumint(8) unsigned NOT NULL,
  `customer_id` smallint(5) unsigned NOT NULL,
  `return_date` datetime DEFAULT NULL,
  `staff_id` tinyint(3) unsigned NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`rental_id`),
  UNIQUE KEY `rental_date` (`rental_date`, `inventory_id`, `customer_id`),
  KEY `idx_fk_inventory_id` (`inventory_id`),
  KEY `idx_fk_customer_id` (`customer_id`),
  KEY `idx_fk_staff_id` (`staff_id`),
  CONSTRAINT `fk_rental_customer` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`customer_id`) ON UPDATE CASCADE,
  CONSTRAINT `fk_rental_inventory` FOREIGN KEY (`inventory_id`) REFERENCES `inventory` (`inventory_id`) ON UPDATE CASCADE,
  CONSTRAINT `fk_rental_staff` FOREIGN KEY (`staff_id`) REFERENCES `staff` (`staff_id`)
) ENGINE=InnoDB AUTO_INCREMENT=16050 DEFAULT CHARSET=utf8mb4;

The rental table has an index named rental_date on (rental_date, inventory_id, customer_id). The following queries are sorted with the help of the rental_date index.

-- This query supplies a constant condition for the first column of the index and sorts by the second column; taken together, the two columns form a leftmost prefix of the index
explain select rental_id,staff_id from rental where rental_date='2005-05-25' order by inventory_id desc;

-- In the author's example the following query does not take advantage of the index (the optimizer's choice can depend on the data distribution)
explain select rental_id,staff_id from rental where rental_date > '2005-05-25' order by rental_date,inventory_id;

5. union all, in and or can all use indexes, but in is recommended.

explain select * from actor where actor_id = 1 union all select * from actor where actor_id = 2;

explain select * from actor where actor_id in (1,2);

explain select * from actor where actor_id = 1 or actor_id = 2;

6. Range columns can use an index. Range conditions are: <, <=, >, >=, between. An index can be used for a range column, but not for the columns that follow the range column, and an index can serve at most one range column.

7. Fields that are updated very frequently, and fields with low selectivity, are not suitable for indexing.

Updates modify the B+ tree, so indexing frequently updated fields greatly reduces database performance.

For low-selectivity attributes such as gender, an index is pointless because it cannot filter the data effectively.

As a rule of thumb, an index is worth creating when the selectivity is above 80%. The selectivity can be calculated with count(distinct(column name))/count(*).

8. Indexed columns should not allow NULL; otherwise you may get results that do not match expectations.

9. When joining tables, it is best not to join more than three. The data types of the join fields must be the same.

10. Use limit whenever you can.

11. It is recommended to keep the number of indexes on a single table within 5.

12. No more than 5 fields per composite index.

13. Avoid the following misconceptions when creating indexes:

The more indexes, the better.

Optimizing prematurely, without understanding the system.

4 Index Monitoring

show status like 'Handler_read%'

Parameter                Meaning
Handler_read_first       number of times the first entry of an index was read
Handler_read_key         number of times data was fetched through an index
Handler_read_last        number of times the last entry of an index was read
Handler_read_next        number of times the next row was read through an index
Handler_read_prev        number of times the previous row was read through an index
Handler_read_rnd         number of times data was read from a fixed position
Handler_read_rnd_next    number of times the next row was read from the data file
