In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article focuses on "performance analysis and summary of MySQL index optimization". Interested friends may wish to take a look. The method introduced in this paper is simple, fast and practical. Next, let the editor take you to learn the performance analysis and summary of MySQL index optimization.
Case analysis
Let's first take a brief look at the difference between non-relational databases and relational databases.
MongoDB is a kind of NoSQL. The full name of NoSQL is Not only SQL, non-relational database. It is characterized by high performance, strong expansibility and flexible mode, especially in high concurrency scenarios. But at present, it is only a supplement to the relational database, and there is still a certain gap between it and the relational database in terms of data consistency, data security and query complexity.
MySQL is a kind of relational database, which has strong query function, high data consistency, high data security and supports secondary index. However, the performance is slightly lower than that of MongoDB, especially for data with a level of more than one million, it is easy to query slowly. At this time, we need to analyze the reasons for the slow query, which is usually caused by the poor writing of the programmer sql, or the lack of key index, or the index failure and so on.
The company's ERP system database is mainly MongoDB (NoSQL, which is closest to relational data), followed by Redis,MySQL. Now MySQL is back in use, thanks to Alibaba's Qimen system and Jishi Tower system. Considering that the number of orders is more than one million, the performance analysis of MySQL is particularly important.
Let's get started with two simple examples. The role and significance of each parameter will be described in detail later.
Description: the sql you need has been put on github. If you like, you can click star. Haha. Https://github.com/ITDragonBlog/daydayup/tree/master/MySQL/
Scenario 1: import the order and avoid repeating the order through the transaction number
Business logic: when an order is imported, in order to avoid repeating the guide order, it is generally used to query the database through the transaction number to determine whether the order already exists.
The most basic sql statement
Mysql > select * from itdragon_order_list where transaction_id = "81X97310V32236260E"
+- -+
| | id | transaction_id | gross | net | stock_id | order_status | descript | finance_descript | create_type | order_level | input_user | input_date |
+- -+
| | 10000 | 81X97310V32236260E | 6.66.13 | 1 | 10 | ok | ok | auto | 1 | itdragon | 2017-08-18 17:01:49 |
+- -+
Mysql > explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E"
+-- + -+
| | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+-- + -+
| | 1 | SIMPLE | itdragon_order_list | NULL | ALL | NULL | 3 | 33.33 | Using where |
+-- + -+
There is nothing wrong with the query itself, and there is nothing wrong with the offline test environment. However, once the function is online, the problem of slow query is coming. Millions of orders are scanned with a full table? Huh? Huh!
How do you know that the sql is a full table scan? The explain command makes it clear how MySQL handles sql statements. The printed contents are as follows:
Id: the query sequence number is 1.
Select_type: the query type is a simple query, and simple select statements do not have union and subqueries.
Table: the table is itdragon_order_list.
Partitions: no partitions.
Type: connection type. All means full table scan.
Possible_keys: the index may be null.
Key: the index actually used is null.
Key_len: the index length is, of course, null.
Ref: no column or parameter is used with key.
Extra: where query is used.
Because there are only three pieces of data in the database, the information of rows and filtered is of little use. The important thing to understand here is that type is ALL, and the performance of full table scanning is the worst, assuming that there are millions of pieces of data in the database, which will cause stutters without the help of an index.
Preliminary optimization: create an index for transaction_id
Mysql > create unique index idx_order_transaID on itdragon_order_list (transaction_id)
Mysql > explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E"
+-+- -- +
| | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+-+- -- +
| | 1 | SIMPLE | itdragon_order_list | NULL | const | idx_order_transaID | idx_order_transaID | 453 | const | 1 | 100 | NULL |
+-+- -- +
The index created here is a unique index, not a normal index.
The type value printed by the unique index is const. Indicates that it can be found once through the index. As soon as the value is found, the scan ends and returns the query result.
The type value printed by a normal index is ref. Represents a non-unique index scan. Find the value and continue to scan until the index file is scanned. (there is no code posted here), it is obvious that the performance of const is much higher than ref. And according to the business logic, it is reasonable to create a unique index.
Optimize again: overwrite the index
Mysql > explain select transaction_id from itdragon_order_list where transaction_id = "81X97310V32236260E"
+-+- -- +
| | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+-+- -- +
| | 1 | SIMPLE | itdragon_order_list | NULL | const | idx_order_transaID | idx_order_transaID | 453 | const | 1 | 100 | Using index |
+-+- -- +
After changing select * from to select transaction_id from, Extra displays Using index, indicating that the query uses an overlay index, which is a very good news that the performance of the sql statement is good. Prompting for Using filesort (using internal sorting) and Using temporary (using temporary tables) indicates that the sql needs to be optimized immediately.
According to the business logic, the query structure returns transaction_id can meet the business logic requirements.
Scenario 2: order management page, sorted by order level and order entry time
Business logic: give priority to orders with high level and long entry time.
Since it is sorting, the first thing that comes to mind is order by, and there is a terrible Using filesort waiting for you.
The most basic sql statement
Mysql > explain select * from itdragon_order_list order by order_level,input_date
+-- + -+
| | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+-- + -+
| | 1 | SIMPLE | itdragon_order_list | NULL | ALL | NULL | 3 | 100 | Using filesort |
+-- + -+
First of all, it doesn't make sense to use a full table scan, and file sorting Using filesort is used, which slows down performance even more.
Before version 4.1, MySQL used a two-way sorting algorithm, which took too long because it scanned the disk twice. Then it is optimized into an one-way sorting algorithm. Its essence is to trade space for time, but if the amount of data is too large and the space of buffer is insufficient, it will lead to many times of Imano. The effect is even worse. Instead of asking operation and maintenance colleagues to modify the MySQL configuration, you might as well build the index obediently.
Preliminary optimization: create a composite index for order_level,input_date
Mysql > create index idx_order_levelDate on itdragon_order_list (order_level,input_date)
Mysql > explain select * from itdragon_order_list order by order_level,input_date
+-- + -+
| | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+-- + -+
| | 1 | SIMPLE | itdragon_order_list | NULL | ALL | NULL | 3 | 100 | Using filesort |
+-- + -+
After creating a composite index, you will be surprised to find that it is the same as not creating an index. All are full table scans, and file sorting is used. Is the index invalid? Or did the index creation fail? Let's try to take a look at the following print.
Mysql > explain select order_level,input_date from itdragon_order_list order by order_level,input_date
+- -+
| | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+- -+
| | 1 | SIMPLE | itdragon_order_list | NULL | index | NULL | idx_order_levelDate | 68 | NULL | 3 | 100 | Using index |
+- -+
After replacing select * from with select order_level,input_date from. Type is upgraded from all to index, which means (full index scan) a full index scan, and Extra also shows that an overwritten index is used. But it's not right! Although the retrieval is fast, there are only two fields returned: order_level and input_date. How can business colleagues use them? Is it possible to create a composite index for each field?
MySQL is not that stupid, and you can use force index to force indexes to be specified. Modify force index (idx_order_levelDate) on the original sql statement.
Mysql > explain select * from itdragon_order_list force index (idx_order_levelDate) order by order_level,input_date
+- -+
| | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+- -+
| | 1 | SIMPLE | itdragon_order_list | NULL | index | NULL | idx_order_levelDate | 68 | NULL | 3 | 100 | NULL |
+- -+
Optimize again: do you really want to sort the order level?
In fact, it doesn't make much sense to sort the order level, nor does it make much sense to add an index to the order level. Because the values of order_level may only be, low, medium, high, accelerated, these four. Sorting and indexing are of little use for such repetitive and evenly distributed fields.
Can we fix the value of order_level and then sort input_date? If the query effect is obvious, you can recommend business colleagues to use this query method.
Mysql > explain select * from itdragon_order_list where order_level=3 order by input_date
+-+ -+-+
| | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+-+ -+-+
| | 1 | SIMPLE | itdragon_order_list | NULL | ref | idx_order_levelDate | idx_order_levelDate | 5 | const | 1 | 100 | Using index condition |
+-+ -+-+
Compared to the previous sql, type is upgraded from index to ref (non-unique index scan). The length of the index has changed from 68 to 5, indicating that only one index is used. Ref is also a constant. Extra indicates that Using index condition automatically selects an index scan or a full table scan based on the critical value. Overall, the performance is much better than the previous sql.
The above two cases are just a quick start, and we need to keep in mind that optimization is based on business logic. Never modify the business logic for the sake of optimization. Of course it's best if you can modify it.
Brief introduction to Index
Official definition: index (Index) is a data structure that helps MySQL obtain data efficiently.
You must be curious about why an index is a data structure and how it improves the speed of queries. We use the most commonly used binary tree to analyze how the index works. Look at the picture below:
Advantages of creating an index:
1. Improve the speed of data retrieval and reduce the cost of database IO: the meaning of using an index is to speed up the search by reducing the number of records in the table that need to be queried.
two。 Reduce the cost of data sorting and reduce CPU consumption: the index is searched quickly because the data is sorted first. If this field happens to need sorting, it is good to reduce the cost of sorting.
Disadvantages of creating an index:
1. Take up storage space: an index is actually a table that records primary keys and index fields, usually stored on disk in the form of an index file.
two。 Slow down the speed of updating the table: the data of the table has changed, and the corresponding index needs to be changed together, thus reducing the update speed. Otherwise, the physical data that the index points to may be incorrect, which is one of the reasons why the index fails.
3. It is difficult to create a quality index: the creation of an index is not done in a day, nor is it constant all the time. It is necessary to create the best index frequently according to the user's behavior and specific business logic.
Index classification
The index we often talk about generally refers to the index organized by the BTree (Multipath search Tree) structure. There are aggregate index, secondary index, composite index, prefix index, unique index, collectively referred to as index, and of course, in addition to B + tree, there is also hash index and so on.
Single-valued index: an index contains only a single column, and a table can have multiple single-column indexes
Unique index: the value of the index column must be unique, but null values are allowed
Composite index: an index contains multiple columns, which is recommended in actual development
Compound indexes are recommended in actual development, and it is recommended that the number of indexes created by a single table should not exceed five.
Basic syntax:
Create:
Create [unique] index indexName on tableName (columnName...)
Alter tableName add [unique] index [indexName] on (columnName...)
Delete:
Drop index [indexName] on tableName
View:
Show index from tableName
What situations need to be indexed:
1. Primary key, unique index
two。 Fields that are often used as query criteria need to be indexed
3. Fields that often require sorting, grouping, and statistics need to be indexed
4. The fields associated with other tables in the query are indexed by foreign key relationships
Under what circumstances should not be indexed:
1. There are too few records in the table, and there is no need to create an index for data below the million level.
two。 Tables that are often added, deleted and modified do not need to be indexed
3. Fields with duplicate and evenly distributed data do not need to be indexed, such as true,false.
4. Fields that are updated frequently are not suitable for index creation
5. Fields that are not used in where conditions do not need to be indexed.
Performance analysis.
MySQL's own bottleneck
The performance problems referred to by MySQL itself are insufficient disk space, too large disk Imax O, and low server hardware performance.
1. When CPU:CPU is saturated, it usually occurs when the data is loaded into memory or read from disk.
2. IO: disk Ihambo bottleneck occurs when the load data is much larger than the memory capacity
3. Server hardware performance bottlenecks: top,free,iostat and vmstat to view the performance status of the system
Explain parses sql statements
Use the explain keyword to simulate the optimizer's execution of sql query statements to see how MySQL handles sql statements.
+-- +
| | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+-- +
Id
The sequence number of the select query, containing a set of repeatable numbers that represent the order in which the sql statements are executed in the query. There are generally three situations:
The first one: the id is all the same, and the execution order of sql is from top to bottom.
The second kind: id is all different, and the execution order of sql is based on the priority of id.
The third kind: id has both the same and different ones. First according to the priority of id, and then according to the same id from top to bottom.
Select_type
The type of select query, mainly used to distinguish between ordinary queries, federated queries, and nested complex queries
Simple: a simple select query that does not include subqueries or union
Primary: if the query contains any complex subqueries, the outermost query is marked as primary
Subquery: subqueries are included in the select or where list
Derived: subqueries included in the from list are marked as derived (derivative) MySQL. These subqueries are executed recursively, placing the results in a temporary table.
Union: if the second select appears after union, it will be marked as union. If union is included in the subquery of the from clause, the outer select will be marked as: derived
Union result: the select that gets the result from the union table
Partitions
For the partition used in the table, if you want to count the amount of company orders over the past ten years, the data can be divided into ten regions, representing one area each year. This can greatly improve the query efficiency.
Type
This is a very important parameter, connection type, the common are: all, index, range, ref, eq_ref, const, system, null eight levels. The order of performance from best to worst is system > const > eq_ref > ref > range > index > all.
For java programmers, it is an excellent and responsible programmer to ensure that the query reaches at least range level or preferably ref.
All: (full table scan) full table scanning is undoubtedly the worst, if the amount of data in the millions of levels, full table scanning will be very slow.
Index: (full index scan) full index file scanning is much better than all. After all, finding data from an index tree is faster than finding data from a whole table.
Range: retrieves only a given range of rows, using an index to match rows. The scope has been narrowed down, which is certainly faster than full table scans and full index file scans. There are usually between,in, >, < and other queries in sql statements.
Ref: a non-unique index scan, essentially an index access, that returns all rows that match a single value. For example, if you query all the colleagues in the company who belong to the R & D team, the matching result is that multiple values are not unique.
Eq_ref: unique index scan, with one record in the table matching for each index key. For example, if you query the company's CEO, the matching result can only be one record.
Const: indicates that it can be found once through the index, and const is used to compare primary key or unique indexes. Because only one row of data is matched, soon, if you put the primary key in the where list, MySQL can convert the query to a constant.
System: the table has only one record (equal to the system table). This is a special column of const type, which usually does not appear. Just understand it.
Possible_keys
Displays the indexes (one or more or null) that may be used by the query statement and is not necessarily used by the query. For reference only.
Key
Displays the index actually used by the query statement. If null, the index is not used.
Key_len
Displays the number of bytes used in the index, and the index length used in the query can be calculated by key_len. The shorter the index length, the better without losing accuracy. The value displayed by key_len is the most likely length of the index field, not the actual used length, that is, the key_len is calculated based on the table definition and not retrieved within the table.
Ref
Displays which column or constant of the index is used to find the value on the index column.
Rows
According to the statistical information of the table and the selection of the index, we roughly estimate the number of rows that need to be read to find the required records. The larger the value, the worse.
Extra
Using filesort: indicates that MySQL sorts the data using an external index instead of reading it in the order in which it is indexed within the table. The sort operation that cannot be done with indexes in MySQL is called "file sorting". When this occurs, you need to optimize sql immediately.
Using temporary: temporary tables are used to hold intermediate results, and MySQL uses temporary tables when sorting query results. It is common in sorting order by and grouping query group by. When this occurs, you need to optimize sql immediately.
Using index: indicates that the overlay index (Covering index) is used in the corresponding select operation to avoid accessing the data rows of the table, which works well! If Using where appears at the same time, the index is used to perform the lookup of the index key value. If Using where does not appear at the same time, the index is used to read data rather than perform lookup actions.
Override index (Covering Index): also known as index overwrite, that is, the data columns of select can be obtained only from the index without having to read data rows. MySQL can use the index to return fields in the select list without having to read the data file again according to the index.
Using index condition: a new feature added after version 5.6, the optimizer will choose whether to use the index or full table traversal based on the proportion of the number of entries and the total number of entries in the RANGE range when the index exists.
Using where: indicates that where filtering is used.
Using join buffer: indicates that connection caching is used.
Impossible where: the value of the where statement is always false, not available, and cannot be used to get any elements.
Distinct: optimize the distinct operation and stop looking for the same value after finding the first matching tuple.
Filtered
A percentage value, used with the value of the rows column, can estimate the result set of the previous table in the query execution plan (QEP), thus determining the number of cycles for the join operation. A small table drives a large table to reduce the number of connections.
Through the introduction of the parameters of explain, we can know:
1. Read order of the table (id)
two。 Operation type of data read operation (type)
3. Which indexes are actually used (key)
4. References between tables (ref)
5. How many rows per table are queried by the optimizer (rows)
Reasons for performance degradation
From the programmer's point of view
1. The query statement is not well written.
two。 No index, unreasonable index or invalid index
3. Associative queries have too many join
From the server's point of view
1. The server does not have enough disk space
two。 Unreasonable setting of server tuning configuration parameters
Summary
1. An index is a data structure that is sorted and looked up quickly. Its purpose is to improve the efficiency of the query.
two。 After the index is created, the query data becomes faster, but the update data becomes slower.
3. The performance degradation is likely to be caused by index failure.
4. According to the principle of index creation, the fields that are queried frequently are suitable for index creation, and the data that need to be updated frequently is not suitable for index creation.
5. Index fields are updated frequently, or table data is physically deleted, which can easily lead to index failure.
6. Using explain to analyze sql statements
7. In addition to optimizing sql statements, you can also optimize the design of tables. For example, try to make a single table query to reduce the association between tables. Design archive table and so on.
At this point, I believe that everyone on the "MySQL index optimization performance analysis and summary" have a deeper understanding, might as well to the actual operation of it! Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.