How to optimize sql statement in mysql

2025-03-29 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

How do you optimize SQL statements in MySQL? This article analyzes the question in detail and presents corresponding solutions, in the hope of helping readers find a simpler, more practical approach to the problem.

Optimization goal

1. Reduce the number of IO operations

IO is almost always the most likely bottleneck of a database; that follows from what a database does. In most database operations, more than 90% of the time is spent on IO. Reducing the number of IO operations is therefore the first priority in SQL optimization, and it is also the most effective optimization.

2. Reduce CPU computation

Besides the IO bottleneck, SQL optimization must also consider CPU cost. ORDER BY, GROUP BY, DISTINCT and the like are all heavy CPU consumers (these operations are essentially data comparisons performed by the CPU on in-memory data). Once IO optimization has reached a certain stage, reducing CPU computation becomes the next important goal of SQL optimization.

Optimization method

Change the SQL execution plan

After defining the optimization goals, we need to decide how to reach them. For a SQL statement there is only one way to achieve the two goals above: change the execution plan of the SQL. Make it "take fewer detours" and find the data we need through "shortcuts" wherever possible, thereby achieving both "fewer IO operations" and "less CPU computation".

Common misunderstandings

1. count(1) and count(primary_key) are better than count(*)

To count records, many people use count(1) or count(primary_key) instead of count(*), believing it performs better. This is a misunderstanding: in some scenarios it can actually be slower, because the database applies special optimizations to the count(*) operation.

2. count(column) and count(*) are the same

This misunderstanding is common even among senior engineers and DBAs, and many people take it for granted. In fact, count(column) and count(*) are completely different operations with completely different meanings.

count(column) returns the number of rows in the result set whose column value is not NULL.

count(*) returns the total number of rows in the result set.
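The difference can be sketched with a hypothetical users table whose email column allows NULL:

```sql
-- Hypothetical table: suppose 5 rows, 2 of which have email = NULL
SELECT COUNT(*)     FROM users;   -- counts all 5 rows
SELECT COUNT(email) FROM users;   -- counts only the 3 rows where email IS NOT NULL
```

If email were declared NOT NULL, the two queries would return the same number, but they are still logically different operations.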

3. select a, b from ... is better than select a, b, c from ... because it lets the database access less data

This misunderstanding exists mainly among developers, largely because they are not very familiar with how databases store data.

In fact, most relational databases store data by row (row storage), and data access works in fixed-size IO units (called blocks or pages), usually 4KB or 8KB. Each IO unit usually holds multiple rows, and each row stores all of that row's fields (except special types such as LOB columns).

So whether we fetch one field or several, the amount of table data the database has to access is actually the same.

Of course there is an exception: when the query can be completed entirely within an index. If we select only the fields a and b and both are in the index being used, there is no need to go back to the table; if we also select c, which is not in that index, the database must return to the table to fetch it. In that case the amount of IO differs greatly between the two queries.
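The covering-index case above can be sketched as follows (table and index names are hypothetical):

```sql
-- Suppose a composite index covering columns a and b:
CREATE INDEX idx_a_b ON t (a, b);

SELECT a, b    FROM t WHERE a = 1;  -- satisfied from the index alone, no table access
SELECT a, b, c FROM t WHERE a = 1;  -- c is not in the index: each match needs a table lookup
```

In EXPLAIN output, the first query would typically show "Using index" in the Extra column.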

4. order by always requires a sort operation

Index data is in fact ordered. If the data we need happens to be in the same order as some index, and our query is executed through that index, the database can generally skip the sort operation and return the data directly, because it knows the data already satisfies our ordering requirement.

In fact, using indexes to optimize SQL with sorting requirements is a very important optimization means.

Extended reading: deeper analyses are available in the articles on the implementation of MySQL ORDER BY, the basic implementation principle of GROUP BY in MySQL, and the basic implementation principle of MySQL DISTINCT; the first one is especially recommended.

5. If the execution plan contains filesort, files on disk will be sorted

This misunderstanding is not really our fault; it comes from the word the MySQL developers chose. filesort is what we may see in the "Extra" column when inspecting a SQL statement's execution plan with the explain command.

In fact, whenever a SQL statement requires a sort operation, "Using filesort" is shown. It does not mean files will be sorted on disk.
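A quick way to see this for yourself (table and index names hypothetical):

```sql
-- No usable index on the sort column: Extra shows "Using filesort",
-- even though the sort may happen entirely in memory
EXPLAIN SELECT * FROM logs ORDER BY message;

-- Sorting on an indexed column: the filesort typically disappears
CREATE INDEX idx_log_date ON logs (log_date);
EXPLAIN SELECT * FROM logs ORDER BY log_date LIMIT 10;
```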

Basic principles

1. Use as few joins as possible

MySQL's strength is simplicity, which in some respects is also its weakness. The MySQL optimizer is efficient, but because the statistics available to it are limited, it is more likely to go astray. For complex multi-table joins, the optimizer is limited and its join handling is less mature, so performance still lags well behind older relational databases such as Oracle. For simple single-table queries, however, the gap is very small, and in some scenarios MySQL even does better than those veterans.

2. Sort as little as possible

Sorting consumes a lot of CPU resources, so in scenarios where the cache hit rate is high and IO capacity is sufficient, reducing sorting can significantly improve SQL response time.

For MySQL, there are several ways to reduce sorting, such as:

Use indexes to sort, as described in misunderstanding 4 above

Reduce the number of records participating in sorting

Do not sort the data without necessity

...

3. Try to avoid select *

Many people find this hard to understand: didn't we just say, in the misunderstandings, that the number of fields in the select clause does not affect the data read?

Yes, most of the time it does not affect the amount of IO. But when the query also contains an order by, the fields in the select clause can greatly affect sorting efficiency; see the earlier article on the implementation of MySQL ORDER BY for details.

In addition, as noted in the misunderstandings above, it is only "most of the time" that the field list does not affect IO: when the query can be satisfied entirely from an index, dropping unneeded fields greatly reduces the amount of IO.

4. Replace subqueries with joins whenever possible

Although join performance is not great, it still has a clear advantage over MySQL's subqueries. MySQL's subquery execution plans have had serious problems for many years; the issue exists in all stable versions released so far and has not been significantly improved. Although the MySQL team acknowledged the problem long ago and promised to fix it, no released version has yet solved it well.

5. Reduce the use of or

When multiple conditions in a where clause are combined with "or", MySQL's optimizer does not handle the execution plan well; combined with MySQL's particular layered SQL-and-storage-engine architecture, performance suffers. It is often more efficient to rewrite "or" as union all (or union, when deduplication is needed).
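A sketch of the rewrite, assuming a hypothetical table with separate indexes on columns a and b:

```sql
-- Original form; the optimizer often cannot use both indexes well:
SELECT * FROM t WHERE a = 1 OR b = 2;

-- Rewritten with UNION ALL; each branch can use its own index.
-- The extra condition in the second branch avoids duplicate rows
-- (assuming a is NOT NULL):
SELECT * FROM t WHERE a = 1
UNION ALL
SELECT * FROM t WHERE b = 2 AND a <> 1;
```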

6. Use union all rather than union when possible

The main difference between union and union all is that the former must merge the two (or more) result sets and then perform duplicate elimination, which involves sorting, adds a large amount of CPU work, and increases resource consumption and latency. So when we can be sure duplicates are impossible, or we don't care about them, use union all instead of union.
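For example (hypothetical archive tables whose rows are known not to overlap):

```sql
-- UNION would sort and deduplicate the combined result;
-- UNION ALL simply concatenates the two result sets:
SELECT id, amount FROM orders_2023
UNION ALL
SELECT id, amount FROM orders_2024;
```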

7. Filter as early as possible

This optimization principle is most commonly seen in index design (put the most selective fields first).

It can also be applied when writing SQL, to optimize some join queries. For example, when paging over data joined from multiple tables, it is better to filter and page on one table first, and only then join the paged result set with the other table. This avoids unnecessary IO as much as possible and greatly reduces the time spent on IO.
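The paging pattern described above can be sketched like this (table and column names are hypothetical):

```sql
-- Page on the driving table first, then join only the 20 surviving rows:
SELECT u.name, o.total
FROM (SELECT id FROM users ORDER BY id LIMIT 20 OFFSET 100) AS page
JOIN users  u ON u.id = page.id
JOIN orders o ON o.user_id = u.id;
```

Without the inner subquery, the join would be computed for far more rows and the LIMIT applied only at the end.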

8. Avoid type conversion

"Type conversion" here means a conversion that occurs because the type of a column differs from the type of the value passed in the where clause:

Conversion applied explicitly with a conversion function on column_name

This directly prevents MySQL (and, in fact, any other database) from using the index on that column. If conversion is unavoidable, apply it to the passed-in value instead.

Conversion performed by the database itself

If the value we pass does not match the field's type and we do no conversion ourselves, MySQL may convert the data itself, or hand it to the storage engine unconverted. Either way the index may become unusable, causing execution plan problems.
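Both cases can be illustrated with a hypothetical indexed VARCHAR phone column:

```sql
-- Database-side conversion: the numeric literal forces a cast on every row,
-- so the index on phone cannot be used:
SELECT * FROM users WHERE phone = 13800000000;

-- Function on the column: same problem, the index is bypassed:
SELECT * FROM users WHERE CAST(phone AS UNSIGNED) = 13800000000;

-- Correct: pass a value of the column's own type:
SELECT * FROM users WHERE phone = '13800000000';
```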

9. Prioritize optimizing high-concurrency SQL over infrequently run "large" SQL

In terms of destructive potential, high-concurrency SQL always outweighs low-frequency SQL: if a high-concurrency statement goes wrong, it can crush the system before we get any breathing room. A slow, IO-heavy statement that runs rarely will at worst slow the whole system down, but the system will hold on for a while and give us a chance to respond.

10. Optimize globally, not locally

SQL optimization should never be done in isolation: all of the SQL in the system must be taken into account, especially when tuning execution plans by adjusting indexes.

11. Run explain on every SQL statement in the database whenever possible

To optimize a SQL statement we need to know its execution plan, so that we can judge whether there is room for optimization and whether the plan has problems. After the database's SQL has been optimized for a while, obviously problematic statements become rare and the rest must be hunted down, so we need to run explain extensively to collect execution plans and decide what needs optimizing.

Optimize GROUP BY statements

By default, MySQL sorts all GROUP BY col1, col2, ... queries as if ORDER BY col1, col2, ... had been specified. If you explicitly add an ORDER BY clause with the same columns, MySQL optimizes it away without a speed penalty, although the sort is still performed. If a query includes GROUP BY but you want to avoid the cost of sorting the result, you can specify ORDER BY NULL to disable the sort.
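For example, with a hypothetical products table (note that the implicit GROUP BY sort applies to older MySQL versions; MySQL 8.0 removed it):

```sql
-- Implicitly sorts the grouped result by category:
SELECT category, COUNT(*) FROM products GROUP BY category;

-- Skips that sort when the order of the groups does not matter:
SELECT category, COUNT(*) FROM products GROUP BY category ORDER BY NULL;
```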

Optimize ORDER BY statements

In some cases, MySQL can use an index to satisfy an ORDER BY clause without an additional sort: the where condition and the order by use the same index, the order by columns appear in the same order as in the index, and all order by columns are sorted in the same direction (all ascending or all descending).

Optimize INSERT statements

If you are inserting many rows from the same client at the same time, use a multi-row INSERT statement. This is faster than separate single-row INSERT statements (several times faster in some cases), e.g.: INSERT INTO test VALUES (1,2),(1,3),(1,4);

If you are inserting many rows from different clients, you can get higher speed with INSERT DELAYED. DELAYED means the statement returns immediately while the rows are queued in memory and not yet written to disk; this is much faster than executing each insert separately. LOW_PRIORITY is the opposite: the insert is performed only after all other clients have finished reading from and writing to the table.

When creating a table, place the index file and the data file on different disks (using table-creation options).

For bulk inserts, you can increase speed by raising the value of the bulk_insert_buffer_size variable; this applies only to MyISAM tables.

When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using many INSERT statements.

Depending on the application, use REPLACE statements instead of INSERT.

Depending on the application, use the IGNORE keyword to skip duplicate records.
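A sketch of the LOAD DATA INFILE usage mentioned above (the file path and table are hypothetical):

```sql
LOAD DATA INFILE '/tmp/users.csv'
INTO TABLE users
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```

The server must be permitted to read the file (see the secure_file_priv setting).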

Insert data in bulk

1. For MyISAM tables, you can import a large amount of data quickly as follows:

ALTER TABLE tblname DISABLE KEYS;

loading the data

ALTER TABLE tblname ENABLE KEYS;

These two commands turn updates to the MyISAM table's non-unique indexes off and on. When importing a large amount of data into a non-empty MyISAM table, using them can improve import efficiency. When importing a large amount of data into an empty MyISAM table, MySQL by default imports the data first and builds the indexes afterwards, so no setting is needed.

2. For InnoDB tables, this approach does not improve import efficiency. For InnoDB tables, the following methods do help:

a. Because InnoDB tables are stored in primary-key order, sorting the imported data by primary key beforehand can effectively improve import efficiency. If an InnoDB table has no primary key, the system creates a hidden internal column as the primary key by default, so if you can define a primary key on the table, you can take advantage of this to speed up the import.

b. Run SET UNIQUE_CHECKS=0 before the import to turn off uniqueness checking, and SET UNIQUE_CHECKS=1 after it finishes; this can improve import efficiency.

c. If the application uses autocommit, it is recommended to run SET AUTOCOMMIT=0 before the import to disable autocommit, and SET AUTOCOMMIT=1 afterwards to re-enable it; this also improves import efficiency.
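Putting points b and c together, a bulk InnoDB import might be wrapped like this:

```sql
SET UNIQUE_CHECKS=0;
SET AUTOCOMMIT=0;

-- ... bulk INSERT / LOAD DATA statements here, ideally in primary-key order ...

COMMIT;
SET UNIQUE_CHECKS=1;
SET AUTOCOMMIT=1;
```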

Query optimization

On a master that also serves reads, you can set low_priority_updates=1 to lower the priority of writes and tell MySQL to handle read requests first whenever possible.

Optimize your queries for the query cache

Most MySQL servers have the query cache enabled. It is one of the most effective ways to improve performance, and it is handled by MySQL's database engine: when the same query is executed many times, the result is placed in a cache so that subsequent identical queries read the cached result directly instead of touching the table.

The main problem is that this is easy for programmers to overlook, because some of the ways we write queries prevent MySQL from using the cache. Consider the following example:

// The query cache does NOT work here: CURDATE() is non-deterministic
$r = mysql_query("SELECT username FROM user WHERE signup_date >= CURDATE()");

// The query cache works: the date is a constant string in the SQL
$today = date("Y-m-d");
$r = mysql_query("SELECT username FROM user WHERE signup_date >= '$today'");

Split large DELETE or INSERT statements

If you need to run a large DELETE or INSERT on a live site, be very careful, or you can bring the whole site to a halt. Both operations lock the table, and once the table is locked, no other operation can get in.

Apache runs many child processes or threads, which makes it quite efficient, but our server does not want too many child processes, threads and database connections, because they consume a great deal of server resources, especially memory.

If the table stays locked for a while, say 30 seconds, then on a high-traffic site the access processes/threads, database connections and open file handles accumulated in those 30 seconds may not only crash the web service but take the whole server down with it.

So if you have a big task, split it up; using a LIMIT clause is a good way to do that. Here is an example:

while (1) {
    // delete only 1000 rows at a time ('2012-11-01' is an example cutoff date)
    mysql_query("DELETE FROM logs WHERE log_date <= '2012-11-01' LIMIT 1000");
    if (mysql_affected_rows() == 0) {
        break;  // finished deleting
    }
    usleep(50000);  // pause briefly so other queries can get through
}
