2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article walks through SQL optimization for MySQL database performance in detail, with examples. It is practical material, shared here for reference, and I hope you get something out of it.
Note: this article is based on MySQL. Much of the content also applies to other relational databases. Some knowledge of indexing is assumed.
Optimization goal
1. Reduce the number of IO operations
IO is always the most likely bottleneck in a database, which follows from what a database is responsible for. In most database operations, more than 90% of the time is spent on IO. Reducing the number of IO operations is therefore the first priority in SQL optimization, and it is also the most effective optimization.
2. Reduce CPU computation
Besides the IO bottleneck, SQL optimization also has to consider CPU work. Operations such as order by, group by and distinct are heavy CPU consumers (they are essentially comparisons that the CPU performs on data in memory). Once IO optimization has reached a certain stage, reducing CPU computation becomes an important goal of SQL optimization.
Optimization method
Change the SQL execution plan
With the goals defined, we need to decide how to reach them. For a SQL statement there is only one way to achieve both goals: change its execution plan so that it "takes fewer detours" and finds the data we need through "shortcuts" wherever possible. That is what reduces the number of IO operations and the amount of CPU computation.
Common misunderstandings
1. count(1) and count(primary_key) are better than count(*)
To count records, many people use count(1) or count(primary_key) instead of count(*), believing them to be faster. This is a misunderstanding. In some scenarios they can even perform worse, because the database applies special optimizations to count(*).
2. count(column) and count(*) are the same
This misunderstanding is common even among senior engineers and DBAs, and many people take it for granted. In fact, count(column) and count(*) are different operations with different meanings.
count(column) returns the number of records in the result set whose column value is not NULL
count(*) returns the number of records in the entire result set
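The difference is easy to check. The sketch below is written in Python and uses an in-memory SQLite database as a stand-in for MySQL (an assumption made so the example is self-contained; the users table and its rows are hypothetical). The count semantics shown are standard SQL and apply to MySQL as well.

```python
import sqlite3

# SQLite stands in for MySQL here; the table and data are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, None)],
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


count_star = scalar("SELECT COUNT(*) FROM users")       # every row
count_one = scalar("SELECT COUNT(1) FROM users")        # every row, same as COUNT(*)
count_pk = scalar("SELECT COUNT(id) FROM users")        # id is never NULL, so every row
count_email = scalar("SELECT COUNT(email) FROM users")  # only rows where email IS NOT NULL

print(count_star, count_one, count_pk, count_email)  # prints: 4 4 4 2
```

The first three forms return the same number, so there is nothing to gain from avoiding count(*); count(email) answers a different question.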
3. select a, b from ... is better than select a, b, c from ... because it lets the database access less data
This misunderstanding is widespread among developers, mainly because they do not know much about how a database stores data.
Most relational databases store data by row, while data access works on a fixed-size IO unit (called a block or page), commonly 4KB or 8KB (InnoDB's default page size is 16KB). In most cases each IO unit holds several rows, and each row holds all the fields of that row (except special field types such as lob).
So whether we select one field or several, the amount of data the database has to read from the table is the same.
There is one exception: when the query can be answered from the index alone. If fields a and b are both in the index being used, selecting only a and b requires no lookup back to the table. Field c is not in that index, so selecting it forces a lookup back to the table for each row. In that case the amount of IO differs considerably between the two queries.
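The exception can be seen in an execution plan. The following is a minimal sketch in Python using SQLite as a stand-in for MySQL (an assumption; the table t, its columns and the index idx_a_b are hypothetical). SQLite reports an index-only read as "COVERING INDEX"; in MySQL the comparable signal is "Using index" in the Extra column of explain.

```python
import sqlite3

# SQLite stands in for MySQL; table, columns and index are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_a_b ON t (a, b)")


def plan(sql):
    """Return the planner's description of how it would execute sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


plan_ab = plan("SELECT a, b FROM t WHERE a = 3")      # both fields are in the index
plan_abc = plan("SELECT a, b, c FROM t WHERE a = 3")  # c forces a lookup in the table

covered_ab = "COVERING INDEX" in plan_ab
covered_abc = "COVERING INDEX" in plan_abc

print(plan_ab)
print(plan_abc)
```

Only the first query is answered from the index alone; adding c to the select list is what brings the table rows back into play.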
4. order by always requires a sort operation
Index data is stored in order. If the order we need matches the order of an index, and the query is executed through that index, the database will generally skip the sort and return the rows directly, because it knows they already satisfy the requested order.
In fact, using indexes to optimize SQL that needs sorted output is a very important optimization technique.
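A small sketch of this effect, again in Python with SQLite standing in for MySQL (an assumption; the orders table and the index name are hypothetical). SQLite marks an explicit sort step as "USE TEMP B-TREE FOR ORDER BY"; MySQL reports the same situation as "Using filesort" in the Extra column.

```python
import sqlite3

# SQLite stands in for MySQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "created_at TEXT, amount REAL)"
)
conn.execute("CREATE INDEX idx_customer_created ON orders (customer_id, created_at)")


def plan(sql):
    """Return the planner's description of how it would execute sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# The index order (customer_id, created_at) already matches the requested order.
plan_index_order = plan(
    "SELECT * FROM orders WHERE customer_id = 7 ORDER BY created_at"
)
# amount is not part of the index, so the rows have to be sorted after retrieval.
plan_extra_sort = plan(
    "SELECT * FROM orders WHERE customer_id = 7 ORDER BY amount"
)

sort_needed_for_created_at = "TEMP B-TREE" in plan_index_order
sort_needed_for_amount = "TEMP B-TREE" in plan_extra_sort

print(plan_index_order)
print(plan_extra_sort)
```

Both queries use the same index to find the rows; only the second one pays for a separate sort.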
Further reading: the articles "Implementation analysis of MySQL ORDER BY", "The basic implementation principle of GROUP BY in MySQL" and "The basic implementation principle of MySQL DISTINCT" analyze this in more depth, especially the first one.
5. If filesort appears in the execution plan, the sort is done in disk files
We are not really to blame for this misunderstanding; it comes from the wording chosen by MySQL's developers. "Using filesort" is something we may see in the "Extra" column when we view the execution plan of a statement with the explain command.
In fact, "Using filesort" is displayed whenever a statement needs a sort that an index cannot satisfy. It does not mean the sort is done in a file: the sort runs in memory when the data fits in the sort buffer, and temporary files are used only when it does not.
Further reading: the article on understanding filesort in the output of the MySQL explain command covers this in more detail.
Basic principles
1. Use as few joins as possible
MySQL's strength is its simplicity, which in some respects is also its weakness. The MySQL optimizer is efficient, but because it has limited statistics to work with, it is more likely to pick a poor plan. For complex multi-table joins the optimizer is limited and comparatively little effort has gone into join execution, so performance still lags well behind established relational databases such as Oracle. For simple single-table queries, however, the gap is very small, and in some scenarios MySQL even does better.
2. Sort as little as possible
Sorting consumes a lot of CPU, so when the cache hit rate is high and IO capacity is sufficient, reducing sorting can noticeably improve SQL response time.
For MySQL, there are several ways to reduce sorting, such as:
Use an index to satisfy the sort, as described in the misunderstandings above
Reduce the number of records that take part in the sort
Do not sort data unless it is necessary
...
3. Try to avoid select *
Many people find this hard to accept. Didn't the misunderstandings section just say that the number of fields in the select clause does not affect the data read?
Yes, most of the time it does not affect the amount of IO. But when the statement also has an order by, the fields in the select clause strongly affect sorting efficiency. My earlier article on the implementation analysis of MySQL ORDER BY describes this in more detail.
In addition, the misunderstandings section only said "most of the time". When the query can be answered from the index alone, selecting only the indexed fields greatly reduces the amount of IO.
4. Replace subqueries with join where possible
Although join performance is not great, it still has a significant advantage over MySQL's subqueries. The execution plans MySQL produces for subqueries have long been a problem. It was present in every stable release available when this was written, and although the problem was acknowledged officially long ago, with a promise to fix it, no release at that time had handled it well. Later versions (MySQL 5.6 and newer) added semi-join and materialization strategies that improve many subqueries, so it is worth comparing both forms with explain on the version you run.
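As an illustration of the rewrite, here is a minimal sketch in Python with SQLite standing in for MySQL (an assumption; the customers and orders tables are hypothetical). It only shows that the two forms return the same rows; which one is faster has to be checked with explain on the actual MySQL version.

```python
import sqlite3

# SQLite stands in for MySQL; schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'DE'), (2, 'FR'), (3, 'DE');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 2, 7.5), (12, 3, 1.0), (13, 1, 9.9);
""")

# Subquery form: orders placed by customers in a given country.
subquery_sql = """
    SELECT o.id FROM orders o
    WHERE o.customer_id IN (SELECT c.id FROM customers c WHERE c.country = 'DE')
    ORDER BY o.id
"""
# Join form. customers.id is unique, so the join cannot duplicate order rows.
join_sql = """
    SELECT o.id FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE c.country = 'DE'
    ORDER BY o.id
"""

subquery_ids = [row[0] for row in conn.execute(subquery_sql)]
join_ids = [row[0] for row in conn.execute(join_sql)]

print(subquery_ids, join_ids)  # prints: [10, 12, 13] [10, 12, 13]
```

The rewrite is only safe as written because the joined column is unique. If the subquery could return the same value more than once, the join form would need distinct to avoid duplicated rows.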
5. Use as few or conditions as possible
When several conditions in the where clause are combined with "or", the MySQL optimizer does not do a good job of optimizing the execution plan. Combined with MySQL's layered architecture, which separates the SQL layer from the storage engine, this makes such statements relatively slow. It is often more effective to replace "or" with union all, or with union when duplicates have to be removed.
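The rewrite needs some care, because a row that matches both conditions is returned by both branches. A minimal sketch in Python, with SQLite standing in for MySQL (an assumption; the users table, its indexes and its rows are hypothetical):

```python
import sqlite3

# SQLite stands in for MySQL; schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, phone TEXT);
    CREATE INDEX idx_users_email ON users (email);
    CREATE INDEX idx_users_phone ON users (phone);
    INSERT INTO users VALUES
        (1, 'a@example.com', '111'),
        (2, 'b@example.com', '111'),
        (3, 'c@example.com', '333');
""")
params = ("a@example.com", "111")  # user 1 matches both conditions


def ids(sql):
    """Run sql with the shared parameters and return the ids in sorted order."""
    return sorted(row[0] for row in conn.execute(sql, params))


or_ids = ids("SELECT id FROM users WHERE email = ? OR phone = ?")
union_ids = ids(
    "SELECT id FROM users WHERE email = ? "
    "UNION SELECT id FROM users WHERE phone = ?"
)
union_all_ids = ids(
    "SELECT id FROM users WHERE email = ? "
    "UNION ALL SELECT id FROM users WHERE phone = ?"
)

print(or_ids)         # prints: [1, 2]
print(union_ids)      # prints: [1, 2]
print(union_all_ids)  # prints: [1, 1, 2]
```

union all is the cheaper replacement only when the branches cannot overlap or duplicates do not matter; otherwise union (or an extra condition that excludes the overlap) is needed to keep the result identical to the "or" form.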
6. Prefer union all to union
The main difference between union and union all is that union has to merge the two (or more) result sets and then filter out duplicates. That filtering involves sorting, which adds a lot of CPU work and increases resource consumption and latency. So when we know the result sets cannot contain duplicates, or we do not care about duplicates, we should use union all instead of union.
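A typical case where duplicates are impossible is data split across tables by period. The sketch below uses Python with SQLite as a stand-in for MySQL (an assumption; the orders_2023 and orders_2024 tables are hypothetical) and shows that both operators return the same rows there, so the deduplication done by union is wasted work.

```python
import sqlite3

# SQLite stands in for MySQL; the yearly tables are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_2023 (order_no TEXT PRIMARY KEY);
    CREATE TABLE orders_2024 (order_no TEXT PRIMARY KEY);
    INSERT INTO orders_2023 VALUES ('2023-0001'), ('2023-0002');
    INSERT INTO orders_2024 VALUES ('2024-0001'), ('2024-0002'), ('2024-0003');
""")

# Order numbers carry the year, so the two tables cannot share a value.
union_rows = sorted(conn.execute(
    "SELECT order_no FROM orders_2023 UNION SELECT order_no FROM orders_2024"
))
union_all_rows = sorted(conn.execute(
    "SELECT order_no FROM orders_2023 UNION ALL SELECT order_no FROM orders_2024"
))

same_result = union_rows == union_all_rows
print(len(union_rows), len(union_all_rows), same_result)  # prints: 5 5 True
```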
7. Filter as early as possible
This strategy is most familiar from index design, where the more selective fields are placed first.
The same principle can be applied when writing SQL, to optimize some join statements. For example, when a paged query spans several tables, it is best to filter and page the data on one table first and then join the paged result set with the other table. This avoids unnecessary IO and can greatly reduce the time spent on it.
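The paging case can be sketched as follows, in Python with SQLite standing in for MySQL (an assumption; the posts and authors tables are hypothetical). The rewrite assumes every post has exactly one matching author, so the join neither drops nor multiplies rows and paging before the join yields the same page.

```python
import sqlite3

# SQLite stands in for MySQL; schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER NOT NULL, title TEXT);
""")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?)",
    [(i, f"author-{i}") for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(i, (i % 5) + 1, f"post-{i}") for i in range(1, 101)],
)

# Join first, then page: every post is joined before the page is cut out.
join_then_page = """
    SELECT p.id, a.name FROM posts p
    JOIN authors a ON a.id = p.author_id
    ORDER BY p.id LIMIT 10 OFFSET 20
"""
# Page first, then join: only the 10 rows of the page take part in the join.
page_then_join = """
    SELECT p.id, a.name
    FROM (SELECT id, author_id FROM posts ORDER BY id LIMIT 10 OFFSET 20) AS p
    JOIN authors a ON a.id = p.author_id
    ORDER BY p.id
"""

rows_late = conn.execute(join_then_page).fetchall()
rows_early = conn.execute(page_then_join).fetchall()
page_ids = [row[0] for row in rows_early]

print(page_ids)  # prints: [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
```

Both statements return the same page, but the second one hands only the paged rows to the join.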
8. Avoid type conversion
"Type conversion" here means the conversion that happens when the type of a column differs from the type of the parameter passed in the where clause:
Manual conversion with a conversion function applied to column_name
This directly prevents MySQL from using the index (other databases have the same problem). If a conversion is unavoidable, apply it to the parameter that is passed in, not to the column.
Conversion done by the database itself
If the type of the value we pass differs from the type of the field and we do no conversion ourselves, MySQL may convert the data on its own, or pass it to the storage engine unprocessed. Either way the index may become unusable, which leads to a poor execution plan.
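The manual-conversion case can be sketched like this, in Python with SQLite standing in for MySQL (an assumption; the orders table and index are hypothetical, and SQLite's date() plays the role of MySQL's DATE()). Implicit conversion rules differ between the two systems, so only the explicit case is shown.

```python
import sqlite3

# SQLite stands in for MySQL; schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_orders_created ON orders (created_at)")
conn.executemany(
    "INSERT INTO orders (created_at, amount) VALUES (?, ?)",
    [(f"2024-03-{day:02d} {hour:02d}:00:00", 10.0)
     for day in range(1, 6) for hour in (8, 20)],
)


def plan(sql):
    """Return the planner's description of how it would execute sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Conversion applied to the column: the index on created_at cannot be used.
on_column = "SELECT * FROM orders WHERE date(created_at) = '2024-03-01'"
# Conversion moved to the parameter side: a plain range on the indexed column.
on_param = (
    "SELECT * FROM orders WHERE created_at >= '2024-03-01 00:00:00' "
    "AND created_at < '2024-03-02 00:00:00'"
)

plan_on_column = plan(on_column)
plan_on_param = plan(on_param)
same_rows = sorted(conn.execute(on_column)) == sorted(conn.execute(on_param))

print(plan_on_column)
print(plan_on_param)
print(same_rows)  # prints: True
```

Both statements return the same two rows, but only the second plan mentions idx_orders_created.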
9. Optimize high-concurrency SQL before low-frequency "large" SQL
In terms of destructive potential, high-concurrency SQL always outweighs low-frequency SQL. If a high-concurrency statement has a problem, it can bring the system down without giving us any time to react. A statement that consumes a lot of IO and responds slowly but runs rarely will at worst slow the whole system down; the system can still hold on for a while, which gives us time to respond.
10. Optimize from a global view, not with one-sided adjustments
A SQL statement should not be optimized in isolation. All the SQL in the system has to be taken into account, especially when an execution plan is changed by adjusting an index, because the same index change can affect other statements.
11. Run explain on every SQL statement in the database wherever possible
To optimize a statement we need to know its execution plan, so that we can judge whether there is room for improvement and whether the plan has a problem. After the SQL running in a database has been optimized for some time, few obviously problematic statements remain. Most remaining problems have to be dug out, which takes a large number of explain runs to collect execution plans and decide whether optimization is needed.
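Collecting plans in bulk is easy to script. The following is a minimal sketch in Python with SQLite standing in for MySQL (an assumption; the users table, the workload list and the helper names explain and full_scans are hypothetical). It flags statements whose plan contains a full scan; against MySQL the same idea would run explain and look for type = ALL in its output.

```python
import sqlite3

# SQLite stands in for MySQL; schema and statements are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")


def explain(sql):
    """Return the plan steps for sql as a list of description strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


def full_scans(statements):
    """Return the statements whose plan contains a full scan step."""
    return [
        sql for sql in statements
        if any(step.startswith("SCAN") for step in explain(sql))
    ]


workload = [
    "SELECT * FROM users WHERE email = 'a@example.com'",  # served by idx_users_email
    "SELECT * FROM users WHERE name = 'Alice'",           # no index on name
]
suspects = full_scans(workload)

for sql in suspects:
    print("needs a closer look:", sql)
```

A full scan is not always a problem, so the output is a list of candidates to inspect, not a verdict.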
© 2024 shulou.com SLNews company. All rights reserved.