Homemade gadgets greatly accelerate the optimization of MySQL SQL statements (with source code) 07/06 Update SLTechnology News&Howtos

Homemade gadgets greatly accelerate the optimization of MySQL SQL statements (with source code)

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Introduction

Optimizing SQL is one of the common tasks of DBA. How to optimize a statement efficiently and quickly is a problem that every DBA often faces. In the daily optimization work, I find that there are many operations that are essential steps in the optimization process. However, the repetitive implementation of these steps will cost DBA a lot of energy. So I came up with the idea of writing my own gadgets to improve the efficiency of optimization.

So what language do you choose to develop tools?

For a DBA, it is necessary to master a language to match his work. Compared with the simplicity of shell and the elegance of perl, Python is a rigorous high-level language. It has many advantages, such as quick start, simple syntax, rich expansion, cross-platform and so on. Many people call it a "glue" language, through a large number of rich class libraries, modules, you can quickly build the tools you need.

As a result, this gadget became my first assignment to learn Python, and I call it the "MySQL statement optimization aid." And from then on, I fell in love with Python and developed a lot of database-related gadgets, which I have the opportunity to introduce to you later.

I. means and steps of optimization

Before introducing the use of the tool, we first explain the common means, methods and problems that should be paid attention to in sentence optimization in MySQL. This is also what we need to understand and master in the daily manual optimization.

1. Execute the plan-EXPLAIN command

The execution plan is the main entry point of sentence optimization, and the execution process of the statement is understood through the interpretation of the execution plan. In terms of execution plan generation, MySQL is significantly different from Oracle in that it does not cache the execution plan and performs "hard parsing" each time. The way to see the execution plan is to use the EXPLAIN command.

1) basic usage

EXPLAIN QUERY

When the keyword EXPLAIN is used before a Select statement, MySQL explains how the Select statement is about to be run, showing how the tables are joined, the order in which they are joined, and so on.

EXPLAIN EXTENDED QUERY

When using the EXTENDED keyword, EXPLAIN generates additional information that can be browsed with SHOW WARNINGS. This information shows what the optimizer qualifies the table and column names in the SELECT statement, what the SELECT statement looks like after rewriting and executing the optimization rules, and may include other comments about the optimization process. It can be used in both MySQL5.0 and newer versions, and it adds an additional filter column (filtered) in MySQL5.1.

EXPLAIN PARTITIONS QUERY

Displays the data shards to be accessed by the query-if any. It can only be used in MySQL5.1 and newer versions.

EXPLAIN FORMAT=JSON (what's new in 5.6)

Another format displays the execution plan. You can see information such as how tables are associated.

2) output field

Let's explain the field meaning of the EXPLAIN output and learn how to judge an execution plan.

The sequence number of the query in the execution plan selected by MySQL. If there is no subquery and so on in the statement, there is only one SELECT in the entire output, so that each row displays a 1 on this column. If subqueries, collection operations, temporary tables, and so on are used in the statement, it will bring a lot of complexity to the ID column. As in the example above, the WHERE part uses a subquery whose id=2 row represents an associated subquery.

Select_type

The type of query used by the statement. Whether it is a simple SELECT or a complex SELECT (if the latter, show which complex type it belongs to). The following types of tags are commonly used.

DEPENDENT SUBQUERY

The first SELECT in the inner layer of the subquery depends on the result set of the external query.

DEPENDENT UNION

The UNION in the subquery, and all subsequent SELECT in the UNION starting from the second SELECT, also depends on the result set of the external query.

PRIMARY

The outermost query in the subquery, note that it is not a primary key query.

SIMPLE

Queries other than subqueries or UNION.

SUBQUERY

The first SELECT of the inner query of the subquery, and the result does not depend on the external query result set.

UNCACHEABLE SUBQUERY

Subqueries for which the result set cannot be cached.

UNION

The second SELECT in the UNION statement starts all subsequent SELECT, and the first SELECT is PRIMARY.

UNION RESULT

The result of the merge in UNION. The SELECT that gets the result from the UNION temporary table.

DERIVED

Derived table queries (subqueries in the FROM clause). MySQL recursively executes these subqueries and puts the results in a temporary table. Internally, the server refers to it as a "derived table" because temporary tables are derived from subqueries.

Table

The name of the table in the database accessed in this step or an alias table specified in the SQL statement. This value may be a table name, an alias for the table, or an identifier for a temporary table generated for the query, such as a derived table, subquery, or collection.

Type

The access method of the table. The following lists the different types of table joins, from the best to the worst.

System

The system table has only one row of records. This is a special case of the const table join type.

Const

Read constant, there is only one row of matching records at most. Since there is only one row of records, the field value of the row record in the optimizer can be treated as a constant value. Const is used in situations where there is a fixed value comparison with PRIMARY KEY or UNIQUE indexes.

Eq_ref

There will be at most one match, usually accessed through a primary key or a unique key index. A row of records is read from the table to join with the records read from the previous table. Unlike the const type, this is the best connection type. It is used in all parts of the index for joins and the index is of type PRIMARY KEY or UNIQUE. Eq_ref can be used to retrieve fields when making "=" comparisons. The value of the comparison can be a fixed value or an expression, and the fields in the table can be used in the representation, which are ready before reading the table.

Ref

The query in the JOIN statement that drives the table index reference. All records in the table that match the retrieval values are taken out and combined with the records taken from the previous table. Ref is used by the linker to use the leftmost prefix of the key or when the key is not a PRIMARY KEY or UNIQUE index (in other words, the linker cannot get only one record based on the key value). This is a good connection type when only a few matching records are queried based on the key value. Ref can also be used when retrieving fields are compared using the "=" operator.

Ref_or_null

The only difference from ref is to add a null value to the query referenced by the index. This connection type is similar to ref, except that MySQL searches for additional records that contain null values when retrieving. The optimization of this join type starts with MySQL 4.1.1 and is often used in subqueries.

Index_merge

Use two (or more) indexes at the same time in the query, then merge the index results, and then read the table data. This connection type means that the Index Merge optimization method is used.

Unique_subquery

The combination of the returned result fields in the subquery is the primary key or the only constraint.

Index_subquery

The returned result field combination in a subquery is an index (or index combination), but not a primary key or unique index. This connection type is similar to unique_subquery. It replaces IN with a subquery, but it is used when there is no unique index in the subquery.

Range

Index range scan. Only records in a given range are taken out and an index is used to obtain a record.

Index

Full index scan. The connection type is the same as ALL, except that it only scans the index tree. It is usually faster than ALL because index files are usually smaller than data files. MySQL uses this join type in cases where the field knowledge of the query is part of a separate index.

Fulltext

Full-text index scan.

All

Full table scan.

Possible_keys

This field refers to which index MySQL might use when searching for table records. If no index is available, it will be displayed as null.

Key

The index selected by the query optimizer from the possible_keys. The key field shows the index that MySQL actually uses. When no indexes are used, the value of this field is NULL.

Key_len

The key length of the index selected to use the index. The key_len field shows the length of the index used by MySQL. When the value of the key field is NULL, the length of the index is NULL.

Ref

Whether the list is filtered by a constant or a field in a table. The ref field shows which fields or constants are used to query records from the table in conjunction with key.

Rows

This field shows the number of result set records estimated by the query optimizer based on the statistics collected by the system.

Extra

This field displays additional information about the MySQL in the query.

Filtered

This column is newly added to MySQL5.1 and appears only when EXPLAIN EXTENDED is used. It shows a pessimistic estimate of the percentage of records in the table that meet a condition (WHERE clause or join condition).

3) SQL rewriting

In addition to displaying the execution plan, EXPLAIN can also display SQL rewriting. The so-called SQL rewriting means that before MySQL optimizes the SQL statement, it will rewrite the statement based on some principles in order to facilitate the optimizer to generate a better execution plan. This feature is used in conjunction with EXPLAIN EXTENDED+SHOW WARNINGS. Here is an example to illustrate.

From the above example, you can see that the in subquery in the original statement is rewritten as a way of associating between tables.

2. Statistical information

Viewing statistics is also an essential step in optimizing statements. Through statistical information, you can quickly understand the storage characteristics of objects. The two main types of statistics-tables and indexes-are described below.

1) Table Statistics-SHOW TABLE STATUS

Name: table name

Engine: the type of storage engine for the table (ISAM, MyISAM, or InnoDB)

Row_format: row storage format (Fixed- fixed, Dynamic- dynamic, or Compressed- compressed) Rows: number of rows. In some storage engines, such as MyISAM and ISAM, they store the exact number of records. However, in other storage engines, it may just be an approximation. Avg_row_length: average row length. Data_length: the length of the data file. Max_data_length: the maximum length of the data file. Index_length: the length of the index file. Data_free: number of bytes allocated but not used. Auto_increment: the next autoincrement (auto plus 1) value. Create_time: the time when the table was created. Update_time: the time when the data file was last updated. Check_time: the last time a check was run on the table. Update after executing the mysqlcheck command, valid only for MyISAM. Create_options: an extra option for CREATE TABLE. Comment: comments used when creating tables (or why MySQL cannot access some information about table information). Version: the '.frm' file version number of the datasheet. Collation: the character set and correction character set of the table. Checksum: real-time checksum value, if any. 3. Index statistics-SHOW INDEX

Table: table name.

Non_unique:0, if the index cannot contain duplicates.

Key_name: index name

Seq_in_index: the column sequence number in the index, starting with 1.

Column_name: column name.

Collation: how columns are sorted in the index. In MySQL, this can have a value of A (ascending order) or NULL (unsorted).

Cardinality: the number of unique values in the index.

Sub_part: if the column is only partially indexed, the number of indexed characters. When the entire field is indexed, its value is NULL.

Packed: indicates how the key is compressed, and NULL indicates that there is no compression.

Null: when the record that includes NULL in the field is YES, its value is, and vice versa.

Index_type: which indexing algorithm is used (BTREE, FULLTEXT, HASH, RTREE).

Comment: remarks.

System parameters: system parameters will also affect the execution efficiency of the statement. To view the system parameters, use the SHOW VARIABLES command. 1) Parameter description

There are many system parameters, a few of which are introduced below.

Sort_buffer_size

Sort area size. Its size directly affects the algorithm used for sorting. If the sorting in the system is large, the memory is sufficient, and the concurrency is not very large, you can increase this parameter appropriately. This parameter is for a single Thead.

Join_buffer_size

The Join operation uses the memory area size. Join Buffer is used only if Join is ALL, index, range, or index_merge. If there are many join statements, you can increase the join_buffer_size appropriately. Note that this value is for a single Thread. Each Thread will create its own independent Buffer, rather than the Buffer shared by the whole system, do not set too large to cause the system to run out of memory.

Tmp_table_size

If the temporary table in memory exceeds this value, MySQL automatically converts it to a MyISAM table on the hard disk. If you perform many advanced GROUP BY queries and have a lot of memory, you can increase the value of tmp_table_size.

Read_buffer_size

The size of the buffer that can be used by read query operations. This parameter is for a single Thead.

4. Optimizer switch

In MySQL, there are also parameters that can be used to control the optimizer behavior.

1) Parameter description

Optimizer_search_depth

This parameter controls the limits of the optimizer's exhaustive execution of the plan. If the query has been in the "statistics" state for a long time, consider lowering this parameter.

Optimizer_prune_level

The default is on, which allows the optimizer to decide whether to skip some execution plans based on the number of rows that need to be scanned.

Optimizer_switch

This variable contains some flag bits to turn the optimizer feature on / off.

Example-interfere with optimizer behavior (ICP feature)

By default, the ICP feature is on. Check out the optimizer behavior.

Filtering queries based on secondary indexes, using the ICP feature, are visible from the "Using index condition" in Extra. What if you interfere with the optimizer's behavior through the optimizer switch?

As can be seen from Extra, the ICP feature has been disabled.

5. System status (SHOW STATUS)

Some states are also built into MySQL, and these state variables can also reflect the execution of the statement, which is convenient to locate the problem. If you execute it manually, you can execute the SHOW STATUS command before and after the statement is executed to see the state change. Of course, because there are a lot of state variables, it is not very convenient to compare. The gadget I introduced later can solve this problem.

1) State variable

There are many state variables, several of which are introduced here.

Sort_merge_passes

The number of merges that have been performed by the sorting algorithm. If the value of this variable is large, consider increasing the value of the sort_buffer_size system variable.

Sort_range

The number of sorts performed within the range.

Sort_rows

Number of rows that have been sorted.

Sort_scan

The number of sorts completed by scanning the table.

Handler_read_first

The number of times the first item in the index has been read. The number of times the index header is read. If this value is high, there are a lot of full index scans.

Handler_read_key

The number of requests to read a row according to the key. If high, the query and table are indexed correctly.

Handler_read_next

The number of requests to read the next row in key order. If you use a range constraint or if you perform an index scan to query the index column, this value is increased.

Handler_read_prev

The number of requests that read the previous line in key order.

Handler_read_rnd

The number of requests to read a line according to a fixed position. This value is higher if you execute a large number of queries and need to sort the results. It is possible that a large number of queries or joins that require MySQL to scan the entire table do not use keys correctly.

Handler_read_rnd_next

The number of requests to read the next line in the data file. This value is high if a large number of table scans are being performed. It usually indicates that the table index is incorrect or that the query written does not take advantage of the index.

6. SQL performance Analyzer (Query Profiler)

MySQL's Query Profiler is a very convenient Query diagnosis and analysis tool. Through this tool, we can obtain the consumption of a variety of resources in the whole execution process of a Query, such as CPU, IO, IPC, SWAP, etc., as well as the occurrence of PAGE FAULTS, CONTEXT SWITCHE and so on. At the same time, we can also get the location of each function called by the MySQL in the source file.

1) usage

open

Mysql > select @ @ profiling;mysql > set profiling=1

By default, a value of 0 for profiling indicates that MySQL SQL Profiler is in the OFF state, and the value of profiling is 1 when SQL profiler is turned on.

Execute SQL statement

Mysql > select count (*) from T1

Get summary information

Use the "show profile" command to get a summary of the profile of multiple Query saved on the current system.

Mysql > show profiles +-- + | Query_ID | Duration | Query | +-- + | 1 | 0.00039300 | select count (*) from T1 | +-- +

Get detailed profile information for a single Query

After obtaining the profile information, you can obtain the detailed profile information during the execution of a Query according to the Query_ID of the profile information.

Mysql > show profile for query 1 * MySQL > show profile cpu,block io for query 1; II. Tool description

A variety of tools have been mentioned earlier, all of which are helpful for tuning SQL statements. Through the following gadget, you can automatically call the command to push the above to DBA at one time, greatly speeding up the optimization process.

1. Preparation conditions

Module-MySQLDB

Module-sqlparse

Python version = 2.7.3 (2.6.x version should be fine, 3.x version is not tested)

2. Parameter description of calling method python mysql_tuning.py-p tuning_sql.ini-s' select xxx'1)

-p specifies the profile name

-s specifies the SQL statement

3. Configuration file

It is divided into two sections of information, which are [database] describing database connection information and [option] running configuration information.

1) [database] server_ip = 127.0.0.1db_user = testuserdb_pwd = testpwddb_name = test2) [option] sys_parm = ON / / whether to display the system parameter sql_plan = ON / / whether to display the execution plan obj_stat = ON / / whether to display related objects (table, Index) Statistics ses_status = ON / / whether to display pre-and post-run status information (real SQL after activation) sql_profile = ON / / whether to display PROFILE tracking information (real SQL after activation) 4, output description 1) header section

Contains the address information and data version information of the running database.

2) original SQL

The user executes the input SQL, which is mainly used for subsequent comparison of SQL rewriting. The statement is displayed using formatting.

3) system-level parameters

The script selection shows some parameters related to SQL performance. This part is written in the code, if you need to expand, you need to modify the script.

4) Optimizer switch

Here are some parameters related to the optimizer that can be artificially interfered with by adjusting them.

5) implement the plan

Is the output of the call to explain extended. If the result is too long, there may be a serial display problem (not resolved for the time being).

6) SQL rewritten by optimizer

Here you can determine whether the optimizer has made some kind of optimization for SQL (such as the processing of subqueries).

7) Statistics

Statistics for all tables and their indexes involved in the SQL statement are displayed here.

8) running status information

At the session level, the state before and after execution (SHOW STATUS) is compared, and the parts that have changed are displayed. It should be noted that because the status data is collected in a SELECT way, it will cause errors in individual indicators (such as Com_select).

9) PROFILE details

The details obtained by calling SHOW PROFILE.

10) PROFILE summary information

According to the resource consumption of PROFILE, the comparison of consumption in different stages (TOP N) is displayed, and the "bottleneck" is intuitively shown.

Download address of source code file: https://pan.baidu.com/s/1slF3zS5?errno=0&errmsg=Auth%20Login%20Sucess&&bduss=&ssnerror=0&traceid=

Author: Han Feng

Content source: Yixin Institute of Technology

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.