In this article, the editor shares a sample analysis of MySQL query optimization. I hope you gain something after reading it; let's discuss it together!
I. What are the ideas and principles of optimization?
1. Optimize queries that need to be optimized more
2. Locate the performance bottleneck of the optimized object
3. Define the goal of optimization.
4. Start with Explain
5. Use profile more often
6. Always use small result sets to drive large result sets
7. Complete sorting in the index as much as possible
8. Take out only the fields you need (Columns)
9. Use only the most effective filter conditions
10. Avoid complex joins as much as possible
1. Optimize queries that need to be optimized more
Queries with high concurrency and (relatively) low cost have a much greater impact on the whole system than queries with low concurrency and high cost, so they deserve optimization effort first.
2. Locate the performance bottleneck of the optimized object
When we get a query that needs to be optimized, we first need to determine whether the bottleneck of the query is IO or CPU, that is, whether database access or data operations (such as grouping and sorting) consume more.
3. Define the goal of optimization.
By understanding the current overall state of the database, we can know the maximum pressure the database can withstand, that is, the most pessimistic situation.
By grasping the database object information related to the query, we can know how many resources need to be consumed in the best and worst cases.
By knowing the status of the query in the application system, we can analyze what proportion of system resources the query may occupy, and how much its efficiency affects the user experience.
4. Start with Explain
Explain tells you what execution plan the database uses for a query. First we need a goal; then, by constantly adjusting and retrying and verifying the results with Explain, we iterate until we get the expected result. A minimal example is sketched below.
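As a minimal, hedged sketch (the user table and nick_name column are illustrative assumptions, matching the example used later in this article):
# Inspect the execution plan instead of guessing
EXPLAIN SELECT id, nick_name FROM user WHERE nick_name = 'zs';
# Check the type, key, rows and Extra columns, adjust indexes or the SQL, then run EXPLAIN again.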
5. Always use small result sets to drive large result sets
Many people like to say "small tables drive large tables" when optimizing SQL, but this is not rigorous: the result set a large table returns after WHERE filtering is not necessarily larger than that of a small table, and letting the larger result set drive the smaller one has the opposite performance effect.
The reason is easy to understand. MySQL implements Join in only one way, the Nested Loop: joins are executed through nested loops. The larger the driving result set, the more loop iterations are needed and the more often the driven table is accessed; even if each access requires very little logical IO, the total number of loops is large, and each loop consumes CPU, so the amount of CPU computation grows. Therefore, if we use table size alone to choose the driving table and the filtered result set of the small table turns out to be much larger than that of the large table, the nested loop needs more iterations; conversely, fewer iterations mean less total IO and CPU work. Even for non-Nested-Loop join algorithms, such as Hash Join in Oracle, having the small result set drive the large one is still the best choice.
So when optimizing a Join query, the most basic principle is "small result sets drive large result sets". Following this principle reduces the number of iterations in the nested loop, and thus the total amount of IO and the number of CPU operations. (Completing the sort within the index as much as possible is covered in the ORDER BY section below.) A sketch follows.
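As a hedged sketch of checking and, if necessary, forcing the driving order (the orders table and its columns are assumptions for illustration):
# Let the optimizer choose the join order, then inspect it with Explain
EXPLAIN SELECT o.id, u.nick_name
FROM orders o JOIN user u ON o.user_id = u.id
WHERE o.created_at >= '2021-01-01';

# STRAIGHT_JOIN forces the tables to be joined in the order they are listed;
# use it only when you are sure the left table yields the smaller filtered result set
SELECT STRAIGHT_JOIN o.id, u.nick_name
FROM orders o JOIN user u ON o.user_id = u.id
WHERE o.created_at >= '2021-01-01';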
6. Take out only the fields you need (Columns)
For any query, the returned data must be transmitted to the client through network packets. The more columns you take out, the more data has to be transferred, which wastes network bandwidth and network transmission buffers. A small sketch follows.
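A minimal sketch, again assuming the user table from this article's examples:
# Wasteful: transfers every column even though only two are needed
SELECT * FROM user WHERE id = 1;
# Better: take out only the fields you actually need
SELECT id, nick_name FROM user WHERE id = 1;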
7. Use only the most effective filter conditions
For example, a user table user has fields such as id and nick_name, with indexes on both id and nick_name. Here are two query statements:
# 1
select * from user where id = 1 and nick_name = 'zs';
# 2
select * from user where id = 1;
The two queries return the same result, but the index used by the first statement takes up much more space than that of the second, and taking up more space means more data to read. In other words, the second statement is the optimal query.
8. Avoid complex join queries
The more tables our query involves, the more resources we need to lock. That is, the more complex the Join statement, the more resources are locked and the more other threads are blocked. Conversely, if we split a complex query into several simpler queries executed step by step, each step locks far fewer resources and blocks fewer other threads.
Many people may wonder: after splitting a complex Join into several simple queries, won't there be more network round trips and more total network latency, making the whole query take longer? Yes, that is possible, but not certain. A complex query needs to lock more resources while it runs, so it is more likely to be blocked by others; a simple query locks fewer resources, so the probability of being blocked is much lower, whereas a complex join query may be blocked before it even executes, wasting more time. Moreover, the database serves not only this query but many other requests; in a highly concurrent system, it is worthwhile to sacrifice a little single-query response time to improve overall throughput. Optimization is an art of balance and trade-off: only by knowing how to choose and balance for the whole system can we make it better. A sketch of the splitting idea follows.
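A hedged sketch of the splitting idea (the user_group table and the group_id column are assumptions for illustration):
# One complex join
SELECT u.id, u.nick_name, g.name
FROM user u JOIN user_group g ON u.group_id = g.id
WHERE u.id = 1;

# Split into two simple queries executed step by step
SELECT id, nick_name, group_id FROM user WHERE id = 1;
SELECT name FROM user_group WHERE id = 10;  # 10 stands for the group_id value returned by the previous query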
II. Using Explain and Profiling
1. Explain usage
The fields Explain displays are described below:
ID: the serial number of the query in the execution plan
Select_type: the type of the query:
DEPENDENT SUBQUERY: the first SELECT in the inner layer of a subquery, depending on the external query result set
DEPENDENT UNION: all subsequent SELECT in the UNION in the subquery starting from the second SELECT, also dependent on the external query result set
PRIMARY: the outermost query in a subquery, not a primary key query
SUBQUERY: the first SELECT of the inner query of the subquery, and the result does not depend on the external result set
UNCACHEABLE SUBQUERY: subqueries for which result sets cannot be cached
UNION: all SELECTs in a UNION statement starting from the second one; the first SELECT is PRIMARY
UNION RESULT: the merged result of a UNION
Table: the name of the table being accessed
Type: the access type:
ALL: full table scan
Const: constant. At most one record matches. Because it is a constant, it only needs to be read once.
Eq_ref: there is at most one matching result, which is usually accessed by the primary key or unique index
Index: full index scan
Range: index range scan
Ref: index reference lookups on the driven table in join statements
System: a system table that contains only one row of data
Possible_keys: indexes that may be used
Key: the index actually used
Key_len: the length of the index used
Rows: the estimated number of records in the result set
Extra: additional information
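As a hedged illustration of reading these fields (tables from the earlier sketches are assumed; the actual plan depends on your data and indexes):
# One row is shown per table in the join plan;
# check that type is not ALL on large tables and that key shows the expected index
EXPLAIN SELECT u.id, u.nick_name
FROM user u JOIN user_group g ON u.group_id = g.id
WHERE g.name = 'admins';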
2. Use of Profiling
This tool can obtain the resource consumption of a Query during its entire execution, such as CPU, IO, IPC, SWAP, as well as PAGE FAULTS and CONTEXT SWITCHES, and can even show where each function called by MySQL during the execution of the Query is located in the source files.
1. Enable the profiling parameter (1 - enable, 0 - disable)
set profiling = 1;
SHOW VARIABLES LIKE '%profiling%';
2. Then execute a Query
3. Obtain the profiling information saved by the system
SHOW PROFILES;
4. Obtain the details of profile through QUERY_ID (take obtaining CPU and IO as an example)
SHOW PROFILE CPU, BLOCK IO FOR QUERY 7;
III. Rational use of the index
1. What is an index
Simply put, in a relational database an index is a separate, physical storage structure that sorts the values of one or more columns of a table. Like the table of contents of a book, it lets you quickly find what you need by page number.
There are four main types of indexes in MySQL: B-Tree indexes, Hash indexes, FullText indexes, and R-Tree indexes. The following mainly covers the commonly used B-Tree index; you can look up the others yourself. A minimal example of creating one follows.
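A minimal, hedged sketch of creating an ordinary B-Tree index (the index name is an assumption; the user table follows this article's examples):
# Create a secondary B-Tree index on a single column
CREATE INDEX idx_user_nick_name ON user (nick_name);
# Equivalent alternative: ALTER TABLE user ADD INDEX idx_user_nick_name (nick_name);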
2. The data structure of the index
Generally speaking, the physical files of a B-Tree index in MySQL are mostly stored as a balanced tree: all the data that actually needs to be stored lives in the leaf nodes of the tree, and the shortest path to any leaf node has exactly the same length. Storage engines in MySQL modify this slightly; for example, the B-Tree index of the Innodb storage engine actually uses a B+Tree structure, which stores on each leaf node, in addition to the index key information, pointers to adjacent leaf nodes in order to speed up retrieval of multiple adjacent leaves.
In Innodb there are two forms of index: the clustered primary key index, and the ordinary B-Tree index, which is basically the same as in other storage engines (such as MyISAM) and is called a secondary index in the Innodb storage engine.
A primary key stored in Clustered form and an ordinary B-Tree index are exactly the same at the root and branch nodes; the leaf nodes differ. In the primary key index, the leaf node stores the actual data of the row, including not only the primary key field but also the other fields. The secondary index is not much different from an ordinary B-Tree index except that its leaf nodes store, besides the index key information, the primary key value of the Innodb row.
Therefore, in Innodb it is very efficient to access data through the primary key, whereas accessing data through a secondary index means Innodb first finds the leaf node via the secondary index and the corresponding index key, and then fetches the corresponding data row through the primary key value stored in that leaf node.
There is little difference between the primary key index and the non-primary key index of the MyISAM storage engine, except that the index key of the primary key index is a unique and non-empty key. And the storage structure of the index of the MyISAM storage engine is basically the same as that of the secondary index of Innodb. The main difference is that the MyISAM storage engine not only stores the index key information on the leaf node, but also stores the information that can directly locate the corresponding data rows in the MyISAM data file (such as Row Number), but does not store the key value information of the primary key.
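A hedged sketch of the practical consequence (the idx_user_nick_name index from the earlier sketch is assumed):
# Primary key lookup: the clustered index leaf already holds the whole row
SELECT * FROM user WHERE id = 1;

# Secondary index lookup: find the entry in idx_user_nick_name,
# then fetch the full row via the primary key value stored in its leaf
SELECT * FROM user WHERE nick_name = 'zs';

# If the query only needs columns contained in the secondary index plus the primary key,
# the extra lookup back to the clustered index can be avoided
SELECT id, nick_name FROM user WHERE nick_name = 'zs';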
3. Advantages and disadvantages of index
Advantages: improve the speed of data retrieval and reduce the IO cost of the database
Disadvantages: data modifications must also update the index information, which causes additional resource consumption, and indexes take up additional storage space.
4. How to judge whether it is necessary to establish an index
Having covered the pros and cons of indexes, we know that more indexes are not always better: indexes also bring side effects. So how do we judge whether an index is needed?
1. Fields that are more frequently used as query criteria should be indexed.
2. Fields that are updated frequently are not suitable for indexing
3. Those with poor uniqueness are not suitable for creating indexes, such as status fields.
4. Fields that do not appear in where are not suitable for index creation.
5. Single index or combined index?
In general application scenarios, as long as it is not the case that one of the filter fields can filter out more than 90% of the data in most scenarios while the other filter fields are frequently updated, I generally prefer to create a composite index, especially in high-concurrency scenarios. Under high concurrency, even if each query saves only a small amount of IO, the total amount of resources saved is still very large because of the huge number of executions.
But creating a composite index does not mean putting every field in the query condition into one index. We should let one index serve multiple queries and keep the number of indexes as small as possible, so as to reduce update cost and storage cost.
MySQL also provides a feature that reduces the size of the index itself: the "prefix index". That is, we can index only the first part of a field's value as the index key, reducing the space the index occupies and improving index access efficiency. Of course, a prefix index is only suitable for fields whose prefix is selective enough (not heavily repeated). A sketch follows.
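A hedged sketch (the status, created_at and email columns are assumptions for illustration):
# A composite index that can serve queries filtering on status,
# or on status plus created_at
ALTER TABLE user ADD INDEX idx_status_created (status, created_at);

# A prefix index on the first 10 characters of email
# (appropriate only if that prefix is selective enough)
ALTER TABLE user ADD INDEX idx_email_prefix (email(10));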
6. The choice of index
1. For single-column indexes, try to choose the index with the best filtering power for the current query.
2. When selecting a composite index, place the fields with the best filtering power for the current query as early as possible in the index field order.
3. When selecting a composite index, try to choose an index that covers as many of the fields in the current query's WHERE clause as possible.
4. As far as possible, select the appropriate index by analyzing statistics and adjusting how the query is written, and avoid choosing indexes through manual Hint control, since the later maintenance cost will be very high. (A hint example is sketched below for reference.)
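For reference, a hedged sketch of what manual index control looks like (the index name is an assumption); prefer letting the optimizer choose whenever possible:
# FORCE INDEX pins the plan to a specific index,
# which is exactly the manual control the guideline above recommends minimizing
SELECT id, nick_name
FROM user FORCE INDEX (idx_user_nick_name)
WHERE nick_name = 'zs';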
7. Restrictions on MySQL index
1. The total index key length of MyISAM storage engine cannot exceed 1000 bytes.
2. BLOB and TEXT type fields can only create prefix indexes
3. MySQL does not support functional indexes.
4. When != or <> is used, the MySQL index cannot be used.
5. When a function is applied to the filter field, the MySQL index cannot be used.
6. When the types of the joined fields in a join statement are inconsistent, the MySQL index cannot be used.
7. When LIKE uses a leading wildcard (e.g. '%aaa'), the MySQL index cannot be used.
8. MySQL cannot use a HASH index for non-equality queries.
9. When a character-type column is compared with a number, write = '1' rather than = 1 (implicit conversion prevents index use).
10. Avoid OR; use IN or UNION ALL instead. (A few of these index-defeating patterns are sketched below.)
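A hedged sketch of a few of these patterns (the created_at and phone columns are assumptions for illustration); in each pair the first statement defeats the index and the second keeps it usable:
# A function applied to the filter field vs. a range condition on the raw column
SELECT * FROM user WHERE DATE(created_at) = '2021-05-31';
SELECT * FROM user WHERE created_at >= '2021-05-31' AND created_at < '2021-06-01';

# Implicit type conversion on a character column vs. matching the column type
SELECT * FROM user WHERE phone = 13800000000;
SELECT * FROM user WHERE phone = '13800000000';

# Leading-wildcard LIKE vs. prefix match
SELECT * FROM user WHERE nick_name LIKE '%zs';
SELECT * FROM user WHERE nick_name LIKE 'zs%';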
8. Join principle and optimization
Join principle: MySQL has only one join algorithm, the famous Nested Loop. It takes the result set of the driving table as the base data of the loop, uses each row of that result set in turn as a filter condition to query data from the next table, and then merges the results. If more tables participate in the join, the previous join's result set is used as the base data and the loop is repeated, and so on.
Optimization:
1. Minimize the total number of loops in the Join statement (remember the small result set that drives the large result set)
2. Give priority to optimizing the inner loop
3. Make sure that the Join conditional field on the driven table in the Join statement has been indexed
4. When you cannot guarantee that the Join condition field of the driven table is indexed and memory resources are sufficient, do not be stingy with the Join Buffer setting (the join buffer can only be used for All, index, and range scans). A sketch follows.
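A hedged sketch (the orders table from earlier sketches is assumed; the index name and buffer size are illustrative):
# Make sure the Join condition field on the driven table is indexed
ALTER TABLE orders ADD INDEX idx_orders_user_id (user_id);

# If the driven table cannot be indexed, give the join buffer room to work
SET SESSION join_buffer_size = 4194304;  # 4MB, adjust to available memory

EXPLAIN SELECT u.id, o.id
FROM user u JOIN orders o ON o.user_id = u.id
WHERE u.id < 100;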
9. ORDER BY optimization
In MySQL, there are only two types of implementations for ORDER BY:
1. Get the ordered data directly through the ordered index, so that the ordered data required by the client can be obtained without any sort operation.
2. Sort the data returned in the stored engine through the MySQL sorting algorithm, and then return the sorted data to the client.
Using an index for sorting is the best method, but when the index cannot be used for sorting, MySQL mainly implements two algorithms:
1. Take out the sort fields that meet the filter criteria together with row pointer information that can directly locate the row data, perform the actual sort in the Sort Buffer, then go back to the table according to the row pointers to fetch the other fields requested by the client, and return the result to the client.
2. Take out, in one pass according to the filter conditions, the sort fields and all the other fields requested by the client, store the fields that do not need sorting in a memory area, sort the sort fields plus row pointer information in the Sort Buffer, and finally use the sorted row pointers to match the other fields held in the memory area, merge them into the result set, and return it to the client in order.
Compared with the first algorithm, the second mainly reduces secondary access to the data: after sorting there is no need to go back to the table to fetch data, saving IO. Of course, the second algorithm consumes more memory, a typical space-for-time optimization.
For multi-table Joins, MySQL sorts by first storing the result set of the Join in a temporary table, and then fetching the temporary table's data into the Sort Buffer for the sort operation.
For non-index sorting, try to get MySQL to choose the second algorithm by the following means:
1. Increase the parameter setting of max_length_for_sort_data:
Which algorithm MySQL uses is determined by the parameter max_length_for_sort_data: when the maximum length of the returned fields is less than this parameter, MySQL chooses the second algorithm, otherwise the first. So when there is enough memory, increasing this parameter's value lets MySQL choose the second algorithm.
2. Reduce unnecessary return fields
For the same reason as above: with fewer returned fields, the total row length is more likely to stay below the max_length_for_sort_data parameter.
3. Increase the sort_buffer_size parameter setting:
Increasing sort_buffer_size is not to let MySQL choose the improved sorting algorithm, but to minimize how often MySQL has to split the data to be sorted into chunks during the sort, because splitting forces MySQL to use temporary files for a merge sort. (A sketch of these settings follows.)
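A hedged sketch of these settings, assuming the composite index on (status, created_at) from the earlier sketch exists (parameter values are illustrative; tune them to your workload):
# With an index matching the WHERE and ORDER BY, the sort can be done in the index;
# Extra in the Explain output should not show "Using filesort"
EXPLAIN SELECT id, nick_name FROM user
WHERE status = 1 ORDER BY created_at;

# When a sort is unavoidable, encourage the second (single-pass) algorithm
SET SESSION max_length_for_sort_data = 4096;  # rows shorter than this use the second algorithm
SET SESSION sort_buffer_size = 2097152;       # 2MB per-session sort buffer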
IV. Finally
Tuning is actually difficult, and it is not limited to the query tuning above: there is also table design optimization, database parameter tuning, application tuning (reducing looped database calls, batch inserts, connection pools, caching), and so on. Of course, many tuning techniques can only be learned in practice. Only by taking theory as the foundation and facts as the basis, and by constantly trying and improving, can we become real tuning masters.
After reading this article, I believe you have some understanding of "a sample analysis of MySQL query optimization". Thank you for reading!