2025-02-27 Update From: SLTechnology News&Howtos > Database
Shulou (Shulou.com) 05/31 Report
In this article, the editor shares the key knowledge points of MySQL database optimization. I hope you gain something after reading it; let's discuss it together!
MySQL database optimization
Reduce data access: set reasonable field types, enable compression, reduce disk I/O through index access, etc.
Return less data: return only the required fields and use data paging to reduce disk I/O and network I/O
Reduce the number of interactions: batch DML operations, stored procedures and functions, etc. reduce the number of round trips to the database
Reduce server CPU overhead: minimize database sorting operations and full table queries to reduce CPU and memory overhead
Use more resources: table partitioning can increase parallel operations and make greater use of CPU resources
When it comes to SQL optimization, there are three points:
Maximize the use of index
Avoid full table scanning as much as possible
Reduce queries for invalid data
To understand the principle of SQL optimization, you must first understand the order in which SQL is executed:
SELECT statements - syntax order:
1. SELECT 2. DISTINCT 3. FROM 4. JOIN 5. ON 6. WHERE 7. GROUP BY 8. HAVING 9. ORDER BY 10. LIMIT
SELECT statements - execution order:
FROM
# select the tables and combine multiple tables into a single virtual table via Cartesian product
ON
# filter the Cartesian-product virtual table
JOIN
# apply the join type to add data to the virtual table produced after ON; for example, a LEFT JOIN adds the remaining rows of the left table back into the virtual table
WHERE
# filter the virtual table above
GROUP BY
# group the rows
# aggregate functions are computed here for use by the HAVING clause; in the written query they appear inside the HAVING condition
HAVING
# filter the grouped results with aggregate conditions
SELECT
# each returned plain column must appear in the GROUP BY clause, except for aggregate functions
DISTINCT
# de-duplicate the data
ORDER BY
# sort
LIMIT
# limit the number of rows returned
SQL optimization strategy
Disclaimer: the following SQL optimization strategies are aimed at scenarios with large data volumes. If the data volume is small, there is no need to apply them rigidly; doing so may be counterproductive.
I. Avoid scenarios where the index is not used
1. Try to avoid leading-wildcard fuzzy queries, which cause the database engine to abandon the index and perform a full table scan. As follows:
SELECT * FROM t WHERE username LIKE '%Chen%';
Optimization method: use a trailing-wildcard fuzzy query instead. As follows:
SELECT * FROM t WHERE username LIKE 'Chen%';
If the requirement is a leading-wildcard fuzzy query:
Use the MySQL built-in function INSTR(str, substr) to match; it is similar to indexOf() in Java and returns the position of the substring within the string. See "MySQL Fuzzy query usage (regular, wildcard, built-in function, etc.)"
Use a FULLTEXT full-text index and search with MATCH ... AGAINST
When the amount of data is large, consider ElasticSearch or Solr, which can search hundreds of millions of rows in seconds
When the table is small (a few thousand rows), don't over-engineer: just use LIKE '%xx%' directly
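A minimal sketch of the INSTR alternative mentioned above (table and column names are illustrative; note that this still examines every row, it merely avoids the leading-wildcard LIKE):

```sql
-- match rows whose username contains 'Chen' anywhere
SELECT * FROM t WHERE INSTR(username, 'Chen') > 0;
```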
2. Avoid using IN and NOT IN where possible, which may cause the engine to perform a full table scan. As follows:
SELECT * FROM t WHERE id IN (2, 3);
Optimization method: if it is a continuous value, you can use between instead. As follows:
SELECT * FROM t WHERE id BETWEEN 2 AND 3
If it is a subquery, you can use exists instead. For more information, please see "how to replace in with exists in MySql" as follows:
-- does not use the index
select * from A where A.id in (select id from B);
-- uses the index
select * from A where exists (select * from B where B.id = A.id);
3. Avoid using or as much as possible, which will cause the database engine to abandon the index for a full table scan. As follows:
SELECT * FROM t WHERE id = 1 OR id = 3
Optimization: union can be used instead of or. As follows:
SELECT * FROM t WHERE id = 1
UNION
SELECT * FROM t WHERE id = 3;
4. Try to avoid the judgment of null value, which will cause the database engine to abandon the index for full table scan. As follows:
SELECT * FROM t WHERE score IS NULL
Optimization method: give the field a default value of 0 and test against 0 instead. As follows:
SELECT * FROM t WHERE score = 0
5. Try to avoid the operation of expressions and functions on the left side of the equal sign in the where condition, which will cause the database engine to abandon the index for full table scanning.
You can move expressions and function operations to the right of the equal sign. As follows:
-- full table scan
SELECT * FROM t WHERE score / 10 = 9;
-- uses the index
SELECT * FROM t WHERE score = 10 * 9;
6. When the data volume is large, avoid the where 1=1 condition. It is usually added for convenience when assembling query conditions, but it causes the database engine to abandon the index and perform a full table scan. As follows:
SELECT username, age, sex FROM t WHERE 1 = 1;
Optimization method: build the condition in code when assembling the SQL: omit WHERE when there are no conditions, and join conditions with AND when there are.
7. Avoid <> or != in query conditions
When querying on an indexed column, avoid non-equality conditions such as <> or !=. If a non-equality comparison is truly needed, reconsider the indexing: avoid building the index on this field, and filter on other indexed fields in the query conditions instead.
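A hedged illustration of the rewrite, assuming a hypothetical status column that only takes the values 0 through 3:

```sql
-- non-equality condition: the index on status is unlikely to be used
SELECT * FROM t WHERE status != 2;
-- enumerate the wanted values instead, so an index range/ref access is possible
SELECT * FROM t WHERE status IN (0, 1, 3);
```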
8. The WHERE condition contains only non-prefix columns of a composite index
As follows: the composite (federated) index contains the three columns key_part1, key_part2, key_part3, but the SQL statement does not include the index prefix column key_part1. According to MySQL's leftmost-prefix matching principle for composite indexes, the composite index will not be used. For more information, refer to the usage principles of composite indexes.
select col1 from table where key_part2 = 1 and key_part3 = 2;
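For contrast, a sketch of a query that does satisfy the leftmost-prefix rule (the value 0 for key_part1 is illustrative):

```sql
-- includes the leading column key_part1, so the composite index can be used
select col1 from table where key_part1 = 0 and key_part2 = 1 and key_part3 = 2;
```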
9. Implicit type conversion causes no index to be used
The following SQL statement cannot use the index correctly because the indexed column's type is varchar but the given value is numeric, which triggers an implicit type conversion.
select col1 from table where col_varchar = 123;
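A possible fix, assuming col_varchar is indeed a varchar column: pass the value as a string literal so no implicit conversion is needed and the index can be used:

```sql
select col1 from table where col_varchar = '123';
```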
10. The order by condition must be the same as that in where, otherwise order by will not use the index for sorting
-- does not use the age index
SELECT * FROM t ORDER BY age;
-- uses the age index
SELECT * FROM t WHERE age > 0 ORDER BY age;
For the above statements, the order in which the database is processed is:
The first step: generate an execution plan based on where conditions and statistics to get the data.
Step 2: sort the resulting data. When ORDER BY is processed, the database first checks the execution plan from step 1 to see whether the ORDER BY fields already benefit from an index there. If they do, the already-sorted data can be read directly in index order; if not, an extra sort operation is performed.
Step 3: return the sorted data.
When the fields in the ORDER BY appear in the WHERE condition, the index is used and no secondary sort is needed; more precisely, the ORDER BY fields take advantage of the index in the execution plan, so no sorting operation is required.
This conclusion is valid not only for order by, but also for other operations that require sorting. Such as group by, union, distinct and so on.
11. Correct use of hint optimization statements
You can use hints in MySQL to make the optimizer select or ignore specific indexes at execution time. Generally, when table structure or indexes change across versions, it is better to avoid hints and instead collect fresh statistics with ANALYZE TABLE. In certain situations, however, a hint can eliminate interference from other indexes and pin a better execution plan.
USE INDEX: add USE INDEX after the table name to provide a list of indexes you want MySQL to consider, so that MySQL no longer considers other available indexes. Example: SELECT col1 FROM table USE INDEX (mod_time, name)...
IGNORE INDEX: if you simply want MySQL to ignore one or more indexes, use IGNORE INDEX as the hint. Example: SELECT col1 FROM table IGNORE INDEX (priority)...
FORCE INDEX: to force MySQL to use a specific index, use FORCE INDEX as the hint in the query. Example: SELECT col1 FROM table FORCE INDEX (mod_time)...
When querying, the database automatically analyzes the statement and selects the most appropriate index. In many cases, however, the query optimizer may not choose the optimal index. If we know which index is better, we can use FORCE INDEX to force the query to use it.
For example:
SELECT * FROM students FORCE INDEX (idx_class_id) WHERE class_id = 1 ORDER BY id DESC;
II. Other optimizations of SELECT statements
1. Avoid select *
First of all, select * operations are not a good SQL writing habit in any type of database.
Using select * to fetch all columns prevents the optimizer from applying optimizations such as covering-index scans, affects the optimizer's choice of execution plan, increases network bandwidth consumption, and causes additional I/O, memory and CPU consumption.
It is recommended to specify only the columns the business actually needs, by name, instead of select *. For more information, see "Why does everyone say that SELECT * is inefficient".
2. Avoid functions with non-deterministic results
This matters especially for business scenarios such as master-slave replication. Because statements executed on the master are in principle replayed on the slave, using functions with non-deterministic results such as now(), rand(), sysdate(), current_user() and so on can easily lead to data inconsistency between the master and the slave. In addition, SQL statements containing non-deterministic functions cannot take advantage of the query cache.
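A sketch under the article's assumptions (statement-based replication; t_log is a hypothetical table): compute the value once in the application and bind it as a literal instead of calling the function inside the statement:

```sql
-- non-deterministic: the function is evaluated independently wherever the statement runs
insert into t_log (msg, created_at) values ('ok', sysdate());
-- deterministic alternative: the application supplies a concrete timestamp
insert into t_log (msg, created_at) values ('ok', '2024-01-01 12:00:00');
```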
3. When you associate a query with multiple tables, the small table comes first and the large table comes after.
In MySQL, the table joins after FROM are executed from left to right (the opposite of Oracle), and the first table may involve a full table scan; so put the small table first, scan it first, and then scan the large table efficiently. Often only the first hundred or so rows of the large table need to satisfy the condition before the result can be returned.
For example: table 1 has 50 rows and table 2 has 3 billion rows; a full scan of table 2 would take an extremely long time.
4. Use the alias of the table
When joining multiple tables in a SQL statement, use table aliases and prefix each column name with its alias. This reduces parsing time and avoids syntax errors caused by ambiguous column names.
5. Use WHERE clauses instead of HAVING clauses
Avoid HAVING where possible: HAVING filters the result set only after all records have been retrieved, while WHERE filters records before aggregation, so restricting rows with WHERE reduces that overhead. Conditions in HAVING are generally for filtering on aggregate functions; all other conditions should be written in the WHERE clause.
The difference between WHERE and HAVING: aggregate (group) functions cannot be used in WHERE.
6. Adjust the condition order in the WHERE clause
MySQL parses the WHERE clause from left to right, top to bottom. Following this principle, put the conditions that filter out the most data first, so the result set shrinks as quickly as possible.
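A small illustration following this guidance, with a hypothetical table where the deleted flag eliminates far more rows than the type value:

```sql
-- put the most selective condition first
SELECT * FROM t WHERE deleted = 0 AND type = 2;
```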
III. Optimization of DML statements (insert, delete, update)
1. Insert data in bulk
If you perform a large number of inserts at the same time, it is recommended that you use INSERT statements with multiple values (method 2). This is faster than using separate INSERT statements (method 1). In general, the efficiency of bulk insertion is several times different.
Method 1:
insert into T values (1, 2);
insert into T values (1, 3);
insert into T values (1, 4);
Method 2:
insert into T values (1, 2), (1, 3), (1, 4);
There are three reasons for choosing the latter method.
It reduces SQL parsing work: MySQL has no shared pool like Oracle's, and with method 2 the statement is parsed only once to insert all the data.
The number of connections to DB can be reduced in specific scenarios.
The SQL statement is short, which can reduce the IO of network transmission.
2. Use COMMIT appropriately
Proper use of commit can release the resources occupied by transactions and reduce consumption. The resources that can be released after commit are as follows:
Undo data blocks occupied by transactions
Blocks of data recorded by a transaction in redo log
Locks held by the transaction, reducing lock contention and its impact on performance. In particular, when deleting a large amount of data with DELETE, break the deletion into batches and commit regularly.
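A sketch of batched deletion (the table name, condition, and batch size are illustrative); each batch is committed before the next one runs:

```sql
-- repeat until no rows remain to delete, driven by application code or a stored procedure
DELETE FROM t_history WHERE created_at < '2020-01-01' LIMIT 10000;
COMMIT;
```

The loop can stop when ROW_COUNT() reports that the last DELETE affected zero rows.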
3. Avoid repeatedly querying updated data
MySQL does not support the UPDATE ... RETURNING syntax that PostgreSQL has; the common business need to update a row and also obtain the changed value can be met with user variables.
For example, to update the timestamp of a row of records and query what the timestamp is stored in the current record, a simple method is implemented:
update t1 set time = now() where col1 = 1;
select time from t1 where col1 = 1;
Using variables, you can rewrite them in the following ways:
update t1 set time = now() where col1 = 1 and @now := now();
select @now;
Both approaches require two round trips, but the variable version avoids accessing the data table again; when t1 has a large amount of data, the latter is much faster than the former.
4. Query priority or update (insert, update, delete) priority
MySQL also allows you to change the scheduling priority of statements, which lets queries from multiple clients cooperate better so that a single client is not blocked for a long time by locking, and ensures that certain types of queries are processed faster. First determine the type of application: is it query-heavy or update-heavy, and should query efficiency or update efficiency be guaranteed? That decides whether queries or updates get priority. The scheduling-policy changes below mainly apply to storage engines that only have table locks, such as MyISAM, MEMORY, and MERGE; for the InnoDB storage engine, statement execution is determined by the order in which row locks are acquired. MySQL's default scheduling policy is summarized as follows:
1) write operations take precedence over read operations.
2) write operations to a data table can only occur once at a time, and write requests are processed in the order in which they arrive.
3) multiple reads of a data table can happen simultaneously.
MySQL provides several statement modifiers that allow you to adjust its scheduling policy:
The LOW_PRIORITY keyword is applied to DELETE, INSERT, LOAD DATA, REPLACE and UPDATE
The HIGH_PRIORITY keyword is applied to SELECT and INSERT statements
The DELAYED keyword applies to INSERT and REPLACE statements.
If a write operation is a LOW_PRIORITY (low priority) request, the system will not let it take precedence over reads. In that case, if a second reader arrives while the writer is waiting, the second reader is allowed in ahead of the writer; the writer starts only when there are no more readers. With this modification, LOW_PRIORITY writes may be blocked permanently.
The HIGH_PRIORITY (high priority) keyword for SELECT queries is similar: it allows a SELECT to jump ahead of a waiting write operation, even though the write would normally have higher priority. Another effect is that a high-priority SELECT executes before normal SELECT statements that are blocked by write operations. If you want all statements that support the LOW_PRIORITY option to be processed at low priority by default, start the server with the --low-priority-updates option. Using INSERT HIGH_PRIORITY raises an INSERT back to its normal write priority, overriding this option for a single INSERT statement.
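Hedged examples of the modifiers above (t_log and t_config are hypothetical tables; note that INSERT DELAYED was deprecated and later removed in modern MySQL versions):

```sql
-- write that yields to pending reads
INSERT LOW_PRIORITY INTO t_log (msg) VALUES ('audit');
-- read that jumps ahead of waiting writes
SELECT HIGH_PRIORITY * FROM t_config WHERE id = 1;
-- deferred write (legacy MyISAM-era feature)
INSERT DELAYED INTO t_log (msg) VALUES ('audit');
```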
IV. Optimization of query conditions
1. For complex queries, you can use intermediate temporary tables to temporarily store data
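A minimal sketch of the temporary-table approach (orders and customers are hypothetical tables):

```sql
-- stage the expensive intermediate result once
CREATE TEMPORARY TABLE tmp_totals AS
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id;

-- then join against the staged result
SELECT c.name, t.total
FROM customers c
JOIN tmp_totals t ON t.customer_id = c.id;

DROP TEMPORARY TABLE tmp_totals;
```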
2. Optimize GROUP BY statements
By default, MySQL sorts all values of the GROUP BY grouping: a query with "GROUP BY col1, col2, ...;" behaves as if it also specified "ORDER BY col1, col2, ...;". If the query explicitly includes an ORDER BY clause with the same columns, MySQL can optimize it with no slowdown, although the sort is still performed.
Therefore, if the query includes GROUP BY and you do not want to sort the grouped values, you can specify ORDER BY NULL to disable sorting. For example:
SELECT col1, col2, COUNT(*) FROM table GROUP BY col1, col2 ORDER BY NULL;
3. Optimize join statement
In MySQL, a subquery can produce a single-column result that is then used as a filter condition in another query. Subqueries can accomplish in one statement SQL operations that logically require multiple steps, avoid transactions or table locks, and are easy to write. However, in some cases a subquery can be replaced by a more efficient JOIN.
Example: suppose you want to take out all users who do not have an order record, you can complete it with the following query:
SELECT col1 FROM customerinfo WHERE CustomerID NOT IN (SELECT CustomerID FROM salesinfo);
If you use a JOIN to complete this query instead, the speed improves noticeably, especially when there is an index on CustomerID in the salesinfo table:
SELECT col1 FROM customerinfo
LEFT JOIN salesinfo ON customerinfo.CustomerID = salesinfo.CustomerID
WHERE salesinfo.CustomerID IS NULL;
The JOIN is more efficient because MySQL does not need to create a temporary table in memory to complete this logically two-step query.
4. Optimize union query
MySQL executes union queries by creating and populating temporary tables. Use UNION ALL unless you really need to eliminate duplicate rows: without the ALL keyword, MySQL adds the DISTINCT option to the temporary table, which performs a uniqueness check on all the temporary table's data and is quite expensive.
Efficient:
SELECT COL1, COL2, COL3 FROM TABLE WHERE COL1 = 10
UNION ALL
SELECT COL1, COL2, COL3 FROM TABLE WHERE COL3 = 'TEST';
Inefficient:
SELECT COL1, COL2, COL3 FROM TABLE WHERE COL1 = 10
UNION
SELECT COL1, COL2, COL3 FROM TABLE WHERE COL3 = 'TEST';
5. Split a complex SQL into multiple small SQL to avoid large transactions
Simple SQL statements can make better use of MySQL's query cache.
Table lock time is reduced, especially for tables using the MyISAM storage engine.
Multi-core CPUs can be utilized.
6. Use truncate instead of delete
When deleting all records in a table, a DELETE statement records the operation in undo blocks and writes the deleted rows to the binlog. When it is certain that the whole table should be emptied, this generates a lot of binlog and occupies many undo blocks, making it both inefficient and resource-heavy.
TRUNCATE instead records no recoverable information, so the data cannot be recovered; as a result, a TRUNCATE operation consumes very few resources and is extremely fast. In addition, TRUNCATE resets the table's high-water mark and resets the auto-increment counter to zero.
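For emptying a whole table, the statement is simply (assuming, as above, that the data genuinely does not need to be recoverable):

```sql
-- fast, minimal logging, resets AUTO_INCREMENT; cannot be rolled back
TRUNCATE TABLE t;
```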
7. Use reasonable paging methods to improve paging efficiency
Choose a paging method appropriate to the presentation requirements; a suitable method can significantly improve paging efficiency.
Case 1:
select * from t where thread_id = 10000 and deleted = 0 order by gmt_create asc limit 0, 15;
The example above sorts and returns all fields at once according to the filter criteria. Data access cost = index I/O + table data I/O for all records matched by the index. Consequently, the further back you page with this method, the worse the execution efficiency and the longer it takes, especially when the table is very large.
Applicable scenarios: when the intermediate result set is small (under 10,000 rows) or the query conditions are complex (involving several different query fields or multi-table joins).
Case 2:
select t.* from (select id from t where thread_id = 10000 and deleted = 0 order by gmt_create asc limit 0, 15) a, t where a.id = t.id;
The example above requires that the primary key of table t is the id column and that there is a covering secondary index (thread_id, deleted, gmt_create). Using the filter conditions, the covering index is used to extract the primary key ids in sorted order, and a join then fetches the other fields. Data access cost = index I/O + table data I/O for the rows on the requested page (15 rows in the example). Thus every page turn consumes roughly the same resources and time as turning the first page.
Applicable scenario: when the query and sorting fields (that is, the fields involved in the where clause and the order by clause) have corresponding overlay indexes, and the intermediate result set is very large.
V. Optimization of table building
1. Create indexes on the table, prioritizing the fields used by WHERE and ORDER BY.
2. Prefer numeric fields (for example gender, male: 1, female: 2); avoid designing character fields that hold only numeric information, as they reduce query and join performance and increase storage overhead.
This is because the engine compares strings character by character when processing queries and joins, while numeric types need only a single comparison.
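A hedged DDL sketch of this advice (table and column names are illustrative):

```sql
-- store gender as a TINYINT code rather than a character column
CREATE TABLE user_demo (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  gender TINYINT NOT NULL COMMENT '1 = male, 2 = female'
);
```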
3. Queries on tables with very large data volumes are slow, mainly because too many rows are scanned. In that case you can query in segments in application code, loop over the segments, and merge the results for display. For example, to query rows 100,000 to 100,050:
SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY ID ASC) AS rowid, infoTab.* FROM infoTab) t WHERE t.rowid > 100000 AND t.rowid <= 100050;