2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article covers query optimization methods for MySQL performance tuning. Slow queries are a common problem in practice, and the sections below walk through why they happen and how to deal with them.
I. Reasons for slow queries
1. Network
2. CPU
3. IO
4. Context switching
5. System calls
6. Generating statistics
7. Lock wait time
II. Optimizing data access
1. The most common reason for poor query performance is accessing too much data. Some queries unavoidably have to sift through a large amount of data, but most can be optimized by reducing the amount of data they access. Two things to check:
(1) Whether the application is retrieving more data than it needs.
(2) Whether the MySQL server layer is analyzing more rows than it needs.
2. Check whether the application requests data it does not need
(1) Querying unneeded records. It is easy to assume that MySQL returns only the data that is needed, but MySQL actually returns the full result set first, and only then is it processed. A common development habit is to run a select that returns a large number of rows, fetch the first N rows, and then close the result set. The fix is to add limit to the query.
(2) Returning all columns from a multi-table join. For example, instead of
select * from actor inner join film_actor using (actor_id) inner join film using (film_id) where film.title = 'Academy Dinosaur';
select only the columns that are needed:
select actor.* from actor...;
(3) Always fetching all columns. Many companies' coding standards prohibit select *. It simplifies development, but it hurts query performance, so avoid it where possible.
(4) Querying the same data repeatedly. If the same query is executed repeatedly and returns exactly the same data each time, cache that data so that later requests do not have to query the database again.
III. Optimizing the execution process
1. Query cache
Before parsing a query, if the query cache is enabled, MySQL first checks whether the query hits the cache. On a hit, it checks the user's permissions before returning the result; if the permissions are fine, MySQL skips all remaining stages and returns the result to the client straight from the cache. Note that the query cache was removed in MySQL 8.0, so this step applies only to earlier versions.
2. Query optimization processing
After the query cache check, MySQL goes through the following steps: parsing the SQL, preprocessing, and optimizing the SQL execution plan. An error in any of these steps may terminate the query.
(1) Parser and preprocessor
MySQL parses the SQL statement by keyword and builds a parse tree. The parser validates the query against MySQL's syntax rules, for example checking whether a wrong keyword is used or whether keywords appear in the wrong order. The preprocessor then checks whether the parse tree is valid: whether the tables and columns exist, whether any names are ambiguous, and whether the user has the required permissions.
(2) Query optimizer
Once the parse tree is valid, the optimizer converts it into an execution plan. A query can usually be executed in many ways that all return the same result but with different efficiency, and the optimizer's job is to choose the best of these plans.
MySQL uses a cost-based optimizer: it tries to predict the cost of each candidate execution plan and picks the one with the lowest cost.
A. select count(*) from film_actor; show status like 'last_query_cost';
The result shows that this query needs to read about 1104 data pages to find the data. The figure is computed from a series of statistics:
(a) the number of pages in each table or index
(b) the cardinality of the index
(c) the length of the indexes and data rows
(d) the distribution of the index
B. In many cases MySQL chooses the wrong execution plan, for the following reasons:
(a) Inaccurate statistics. InnoDB cannot maintain an exact row count for a table because of its MVCC architecture.
(b) The estimated cost of a plan is not the same as its actual cost. Sometimes a plan that reads more pages is still cheaper, because pages that are read sequentially or are already in memory cost very little to access. MySQL does not know which pages are in memory and which are on disk, so it cannot know how many IO operations a query will actually need.
(c) MySQL's idea of optimal may differ from yours. MySQL optimizes on its cost model, and the lowest-cost plan is not necessarily the fastest.
(d) MySQL does not take concurrently executing queries into account.
(e) MySQL does not account for costs outside its control, such as executing stored procedures or user-defined functions.
C. Optimization strategies of the optimizer
(a) Static optimization: analyze the parse tree directly and complete the optimization.
(b) Dynamic optimization: depends on the context of the query, and may also depend on the values involved and the number of rows the index holds for them.
(c) MySQL needs to perform static optimization only once per query, whereas dynamic optimization is re-evaluated on every execution.
D. Optimization types of the optimizer
(a) Reordering joined tables. Tables are not always joined in the order written in the query; choosing the join order is an important function of the optimizer.
(b) Converting outer joins to inner joins where the two are equivalent, since an inner join is more efficient than an outer join.
(c) Applying equivalence transformation rules. MySQL uses equivalent transformations to simplify and normalize expressions.
(d) Optimizing count(), min(), and max(). Indexes and column nullability often help MySQL optimize these expressions. For example, to find the minimum value of a column, MySQL only needs to read the leftmost record of the index; no full table scan is required.
(e) Evaluating and reducing constant expressions. When MySQL detects that an expression can be reduced to a constant, it always treats the expression as a constant. For example:
explain select film.film_id, film_actor.actor_id from film inner join film_actor using (film_id) where film.film_id = 1;
(f) Covering index scans, which can be used when the index contains all the columns the query needs.
(g) Subquery optimization. In some cases MySQL can transform a subquery into a more efficient form, reducing the number of separate queries that access the data, for example by caching frequently queried data.
(h) Equality propagation. If two columns are related by an equality, MySQL can pass a where condition on one column to the other:
explain select film.film_id from film inner join film_actor using (film_id) where film.film_id > 500;
Here the film_id column is used for the equi-join, so the condition on film_id applies not only to the film table but also to the film_actor table, as if the query had been written:
explain select film.film_id from film inner join film_actor using (film_id) where film.film_id > 500 and film_actor.film_id > 500;
E. Join queries
Join queries are very important in MySQL, but the execution strategy is fairly simple: MySQL performs every join as a nested-loop join. It loops over one table reading a row at a time, looks in the next table for rows that match, and continues until matching rows have been found in all tables. It then builds the result from the columns the query needs in each matched row. MySQL tries to find all matching rows in the last joined table; when that table has no more matches, it backtracks to the previous table to see whether more matching rows can be found there. That is the overall idea, though the actual implementation has several variants.
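The nested-loop strategy described above can be sketched in a few lines. This is a simplified illustration of the idea only, not MySQL's implementation: it works on in-memory lists of rows, and the function name `nested_loop_join` and the sample rows are assumptions.

```python
def nested_loop_join(outer_rows, inner_rows, key):
    """Join two lists of dict rows on `key`, one outer row at a time."""
    for outer in outer_rows:          # loop over the first table
        for inner in inner_rows:      # probe the next table for matching rows
            if outer[key] == inner[key]:
                yield {**outer, **inner}
        # no more matches in the inner table: back to the outer loop


# Sample rows for illustration.
film = [
    {"film_id": 1, "title": "ACADEMY DINOSAUR"},
    {"film_id": 2, "title": "ACE GOLDFINGER"},
]
film_actor = [
    {"film_id": 1, "actor_id": 1},
    {"film_id": 1, "actor_id": 10},
    {"film_id": 2, "actor_id": 19},
]

joined = list(nested_loop_join(film, film_actor, "film_id"))
for row in joined:
    print(row)
```

The inner loop here scans every inner row for each outer row. An index on the join column of the inner table plays the role of replacing that scan with a direct lookup, which is why the join column of the second table is the one worth indexing.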
F. Sorting optimization
Sorting is costly in any case, so for performance you should avoid sorting where possible, and avoid sorting large amounts of data.
Sorting through an index is preferred. When no index can be used, MySQL has to sort the rows itself: in memory if the data set is small, on disk if it is large. MySQL calls this process filesort.
If the data to be sorted fits in the sort buffer (show variables like '%sort_buffer_size%';), MySQL sorts it in memory with quicksort. If it does not fit, MySQL splits the data into blocks, sorts each block with quicksort, writes each sorted block to disk, then merges the sorted blocks and returns the result. There are two sorting algorithms:
(a) Two-pass sort
The first pass reads only the columns needed for sorting and sorts them; the second pass reads the full rows in sorted order.
This method is relatively inefficient: the second pass reads rows in sorted order rather than storage order, so it has to read all the records with mostly random IO, which makes reading expensive.
The advantage of the two-pass sort is that it stores as little data as possible while sorting, so the sort buffer can hold as many rows as possible.
(b) Single-pass sort
Read all the columns the query needs, sort by the given columns, and return the result directly. This method reads all the data once with sequential IO and needs no random IO. The drawback is that when the query has many columns, each row takes a lot of space, so fewer rows fit in the sort buffer.
(c) How to choose
When the total size of the columns to be sorted exceeds max_length_for_sort_data bytes, MySQL uses the two-pass sort; otherwise it uses the single-pass sort. You can adjust this parameter to influence which algorithm is chosen.
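The difference between the two algorithms, and the threshold that selects between them, can be sketched as follows. This is a toy model under stated assumptions, not MySQL's filesort: the table is a dict of row id to row, row width is approximated by the text length of the needed columns, and the function names and sample rows are made up for illustration.

```python
def two_pass_sort(table, sort_col):
    # Pass 1: read only (sort key, row id) pairs and sort these small tuples.
    pairs = sorted((row[sort_col], row_id) for row_id, row in table.items())
    # Pass 2: fetch the full rows in sorted order (random access by row id).
    return [table[row_id] for _, row_id in pairs]


def single_pass_sort(table, sort_col, needed_cols):
    # Read every needed column once, sort the wider rows, return them directly.
    rows = [{col: row[col] for col in needed_cols} for row in table.values()]
    return sorted(rows, key=lambda row: row[sort_col])


def choose_sort(table, sort_col, needed_cols, max_length_for_sort_data):
    # Widest row, approximated by the total text length of the needed columns.
    widest = max(
        sum(len(str(row[col])) for col in needed_cols) for row in table.values()
    )
    if widest > max_length_for_sort_data:
        return "two-pass", two_pass_sort(table, sort_col)
    return "single-pass", single_pass_sort(table, sort_col, needed_cols)


# Sample rows for illustration.
film = {
    1: {"film_id": 1, "title": "CHAMBER ITALIAN", "description": "x" * 40},
    2: {"film_id": 2, "title": "ACADEMY DINOSAUR", "description": "y" * 40},
    3: {"film_id": 3, "title": "BASIC EASY", "description": "z" * 40},
}
cols = ["film_id", "title", "description"]

small_limit = choose_sort(film, "title", cols, max_length_for_sort_data=16)
large_limit = choose_sort(film, "title", cols, max_length_for_sort_data=1024)
print(small_limit[0], [r["title"] for r in small_limit[1]])
print(large_limit[0], [r["title"] for r in large_limit[1]])
```

Both paths return the rows in the same order; what differs is how much data is carried through the sort and how the full rows are read afterwards.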
IV. Optimizing specific types of queries
1. Optimizing count() queries
count() is a special function with two different jobs: counting the number of non-NULL values in a column, and counting the number of rows.
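The two meanings of count() are easy to see side by side. The sketch below uses an in-memory SQLite database as a stand-in for MySQL (an assumption made so it runs anywhere; the NULL handling shown is standard SQL behavior), with a made-up rental table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, return_date TEXT)")
conn.executemany(
    "INSERT INTO rental VALUES (?, ?)",
    [(1, "2005-05-26"), (2, None), (3, "2005-05-28"), (4, None)],
)

# COUNT(*) counts rows; COUNT(return_date) counts only non-NULL values.
row_count, value_count = conn.execute(
    "SELECT COUNT(*), COUNT(return_date) FROM rental"
).fetchone()
print(row_count, value_count)
```

If the intent is to count rows, writing count(*) states that intent directly and avoids accidentally skipping rows where the named column is NULL.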
(1) Many people believe that count() is fast on MyISAM. That is true only under one precondition: count(*) without any where clause.
(2) Use approximate values
Some application scenarios do not need an exact count, and an approximate value can be used instead. For example, the row estimate from explain can serve as an approximation.
Many OLAP applications need the cardinality of a column's values, and there is an algorithm called HyperLogLog for computing an approximation.
(3) More complex optimization
In general, count() has to scan a large number of rows to return an exact result, so it is hard to optimize. In practice, consider covering index scans, summary tables, or an external caching system.
2. Optimizing join queries
(1) Make sure the columns in the on or using clause are indexed, and take the join order into account when creating indexes.
When table A and table B are joined on column c and the optimizer's join order is B, A, there is no need to index the column on table B; an unused index is only extra overhead. In general, you only need an index on the join column of the second table in the join order.
(2) Make sure every expression in group by and order by refers to columns from a single table, so that MySQL can use an index to optimize the operation.
3. Optimizing subqueries
The most important recommendation for subqueries is to use a join instead wherever possible.
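As an illustration of the rewrite, the sketch below runs an IN subquery and the corresponding join against an in-memory SQLite database, which stands in for MySQL here (an assumption; the table contents are sample data). The two forms are equivalent in this case because (actor_id, film_id) is unique, so the join cannot produce duplicate films.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE film_actor (
        actor_id INTEGER, film_id INTEGER, PRIMARY KEY (actor_id, film_id)
    );
    INSERT INTO film VALUES
        (1, 'ACADEMY DINOSAUR'), (2, 'ACE GOLDFINGER'), (3, 'ADAPTATION HOLES');
    INSERT INTO film_actor VALUES (1, 1), (1, 3), (2, 2);
    """
)

# Subquery form: films in which actor 1 appears.
subquery_form = """
    SELECT title FROM film
    WHERE film_id IN (SELECT film_id FROM film_actor WHERE actor_id = 1)
    ORDER BY title
"""

# Join form of the same question.
join_form = """
    SELECT film.title FROM film
    INNER JOIN film_actor USING (film_id)
    WHERE film_actor.actor_id = 1
    ORDER BY film.title
"""

subquery_rows = conn.execute(subquery_form).fetchall()
join_rows = conn.execute(join_form).fetchall()
print(subquery_rows == join_rows, join_rows)
```

Whether the join is actually faster depends on the MySQL version and the available indexes, so it is worth comparing both forms with explain on real data.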
4. Optimizing limit paging
Many applications need to page through data, usually with limit and an offset plus a suitable order by clause. With a supporting index this is usually efficient; without one it requires a lot of filesort work. A separate problem is a very large offset: most of the rows read are discarded, which is very expensive.
To optimize such queries, either cap the number of pages that can be requested or improve performance for large offsets.
The simplest way is to use a covering index as much as possible rather than querying all columns. For example, the query
select film_id, description from film order by title limit 50, 5;
can be rewritten as a deferred join, where the subquery reads only the index:
explain select film.film_id, film.description from film inner join (select film_id from film order by title limit 50, 5) as lim using (film_id);
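The following sketch checks that the deferred-join rewrite returns the same page as the plain query. It uses an in-memory SQLite database as a stand-in for MySQL (an assumption, so it shows equivalence of results rather than MySQL's performance), generated sample rows, and a hypothetical index name `idx_title`. It writes the page as LIMIT 5 OFFSET 50, which both databases accept, and adds an outer ORDER BY so the final row order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, description TEXT)"
)
conn.execute("CREATE INDEX idx_title ON film (title)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?, ?)",
    [(i, f"TITLE {i:03d}", f"description {i}") for i in range(1, 101)],
)

# Plain paging: every skipped row is read in full before being discarded.
plain = """
    SELECT film_id, description FROM film
    ORDER BY title LIMIT 5 OFFSET 50
"""

# Deferred join: the subquery pages over (title, film_id) only, and the
# wide description column is fetched just for the 5 surviving rows.
deferred = """
    SELECT film.film_id, film.description FROM film
    INNER JOIN (
        SELECT film_id FROM film ORDER BY title LIMIT 5 OFFSET 50
    ) AS lim USING (film_id)
    ORDER BY film.title
"""

plain_rows = conn.execute(plain).fetchall()
deferred_rows = conn.execute(deferred).fetchall()
print(plain_rows == deferred_rows, [r[0] for r in deferred_rows])
```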
5. Optimizing union queries
MySQL generally executes union queries by creating and filling a temporary table, so many optimization strategies do not work well for them. It is often necessary to push where, limit, order by, and other clauses down into each subquery by hand so that the optimizer can make full use of those conditions.
Unless you really need the server to eliminate duplicate rows, be sure to use union all. Without the all keyword, MySQL adds the distinct option to the temporary table, and checking every row for uniqueness is expensive.
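The behavioral difference between the two forms is shown below, again with an in-memory SQLite database standing in for MySQL (an assumption; the duplicate-elimination semantics are standard SQL) and made-up sample names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE actor (first_name TEXT);
    CREATE TABLE customer (first_name TEXT);
    INSERT INTO actor VALUES ('MARY'), ('NICK');
    INSERT INTO customer VALUES ('MARY'), ('LINDA');
    """
)

# UNION removes duplicates, which requires comparing rows across the result.
union_rows = conn.execute(
    "SELECT first_name FROM actor UNION SELECT first_name FROM customer"
).fetchall()

# UNION ALL simply concatenates the two result sets.
union_all_rows = conn.execute(
    "SELECT first_name FROM actor UNION ALL SELECT first_name FROM customer"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

When the two branches cannot overlap, or duplicates are acceptable, union all gives the same useful result without the uniqueness check.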
6. Using user-defined variables
User-defined variables are an easily forgotten MySQL feature, but used well they let you write very efficient queries in some scenarios. They are especially useful when a query mixes procedural and relational logic.
A user-defined variable is a temporary container for a value, and it exists for the lifetime of the connection to MySQL.
(1) Using custom variables
set @one := 1;
set @min_actor := (select min(actor_id) from actor);
set @last_week := current_date - interval 1 week;
(2) Restrictions on custom variables
A. Queries that use them cannot use the query cache.
B. They cannot be used where a constant or identifier is required, such as a table name, a column name, or a limit clause.
C. Their lifetime is a single connection, so they cannot be used for communication between connections.
D. Their type cannot be declared explicitly.
E. The MySQL optimizer may optimize them away in some scenarios, so code may not run as expected.
F. The assignment operator := has very low precedence, so use explicit parentheses in assignment expressions.
G. Using an undefined variable does not produce any syntax error.
(3) Use cases for custom variables
A. Optimizing ranking statements
Assign to a variable and use it in the same statement:
select actor_id, @rownum := @rownum + 1 as rownum from actor limit 10;
Get the 10 actors who have appeared in the most films, to be ranked by the number of films they have appeared in:
select actor_id, count(*) as cnt from film_actor group by actor_id order by cnt desc limit 10;
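The ranking step itself is procedural: walk the sorted rows and carry a running rank from one row to the next, which is the kind of logic user-defined variables make possible inside a query. The sketch below shows that logic in plain Python rather than SQL. The variable roles (`rank`, `prev_cnt`), the tie-handling rule (equal counts share a rank), and the sample rows are assumptions for illustration, not taken from the article.

```python
def rank_by_count(rows):
    """rows: (actor_id, cnt) pairs already sorted by cnt descending."""
    rank = 0         # running rank, the role a variable like @rank would play
    prev_cnt = None  # count seen on the previous row
    ranked = []
    for actor_id, cnt in rows:
        if cnt != prev_cnt:  # bump the rank only when the count changes
            rank += 1
        prev_cnt = cnt
        ranked.append((actor_id, cnt, rank))
    return ranked


# Made-up (actor_id, cnt) rows, sorted by cnt descending.
top = [(1, 42), (2, 41), (3, 41), (4, 39)]
ranked = rank_by_count(top)
print(ranked)
```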
B. Avoiding a second query for newly updated data
Suppose you need to update a record's timestamp efficiently and also want to know what timestamp is now stored in the record. The straightforward version takes two statements that both access the table:
update T1 set lastUpdated = now() where id = 1;
select lastUpdated from T1 where id = 1;
With a variable, the second statement no longer needs to access the table:
update T1 set lastUpdated = now() where id = 1 and @now := now();
select @now;
C. Determining the order of evaluation
Assigning a variable and reading it may happen at different stages of the query.
(a) set @rownum := 0;
select actor_id, @rownum := @rownum + 1 as cnt from actor where @rownum