Example Analysis of SQL Optimization recommendation in MySQL 04/27 Update SLTechnology News&Howtos

Example Analysis of SQL Optimization recommendation in MySQL

2025-04-27 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

In this issue, the editor will bring you an example analysis of SQL optimization suggestions in MySQL. The article is rich in content and analyzed and described from a professional point of view. I hope you can get something after reading this article.

This morning I saw a demand for optimization from my colleagues. I didn't have much time to optimize, but I thought a lot about the optimization of this SQL. I hope to have some reference.

The SQL provided by business students is as follows:

SELECT b.order_id FROM (SELECT a.order_id, a.order_time AS create_time FROM trade_order a WHERE a.user_id = 12345678). AND a.deleted = 0 UNION SELECT v.order_id, v.create_time FROM virtual_order v WHERE v.user_id = 12345678. ORDER BY order_id DESC) AS b LIMIT 0,10

According to feedback, the execution time of this SQL is 200ms, and it will reach about 500ms under pressure test. From a business point of view, it does not meet the demand at present. We would like to see if we have any suggestions for optimization.

First impression of this SQL execution time of 20000500 milliseconds, to optimize it seems that there are not many cards to play ah, if you want to get an acceptable benchmark, of course, the feedback will be as soon as possible. So from this point of view, we might as well look at what additional work this SQL needs to do according to the millisecond optimization standard.

First of all, look at the logic through SQL, the overall logic is to query two data sources (trade_order and virtual_order) according to the user id, and query 10 items of order number data back from the two data sources. This user may or may not have an order number in the two data sources. As long as there is a match, he or she will return a total of 10 entries. It seems that he chose the combination method of union in order to remove duplicates.

Without looking at the table structure information, I generally have the following suggestions:

The mode of union is more recommended to use union all, and it is unreasonable for two data sources to have data overlap.

Order_time is used in the query statement, but the data return is not used at all. It is recommended to remove it.

The SQL layer carries too much pressure on data processing, such as multiple data sources, de-duplication and filtering, paging, and whether it is possible to simplify.

Of course, when you get here, you are out of touch with the needs of the business, which belongs to the kind of state that you don't like everything, and you always want to find out some problems, and for business students, even if there are ten or eight needs, you have to have one demand with higher returns, and the more likely they are to adopt other requirements, otherwise they will not do anything.

So here, we start to do the next analysis, to optimize SQL without looking at the execution plan is not passed, before execution, I generally feel that the table data is very large, should be generated derived tables, and then in the data to re-filter the level of consumption is relatively large, and the two sub-queries, the result set returned should be very small. The performance of the forecast is:

1) subquery trade_order should be very fast, millimeter level response

2) the subquery virtual_order should also be very fast, but finally there is an order by operation, which may be a little more expensive.

3) the de-duplication filtering cost of union is relatively high, which involves the merging of two result sets. If more results are returned, it may be a bottleneck.

From the point of view of the execution result, it surprised me that the return result of virtual_order actually had more than 400,000 rows, which is equivalent to directly scanning the whole table.

Other parts will also be affected, so the follow-up processing will be affected.

In order to quickly locate the problem, I took the two subqueries apart and executed them separately to view the execution plan, which is the fastest way to analyze the bottleneck.

> > explain SELECT-> v.order_id,-> v.create_time-> FROM-> virtual_order v-> WHERE-> v.user_id = 12345678.

The implementation plan is as follows:

You can see that the full table scan is done directly, which is a basic requirement. Business students will not miss the index, and then check the table structure:

CREATE TABLE `virtual_ order` (`order_ id` varchar (255th) NOT NULL COMMENT 'order ID',. `user_ id` varchar (255) DEFAULT NULL COMMENT 'user ID',. `refund` tinyint (3) DEFAULT NULL COMMENT 'whether to refund (1: none, 2: yes)', `refund` int (3) NOT NULL DEFAULT'0' COMMENT',. PRIMARY KEY (`prepaid_ id`), KEY `order_ status` (`order_ status`), KEY `user_ id` (`user_ id`), KEY `prepaid_ statt` (`prepaid_ statt`) ENGINE=InnoDB DEFAULT CHARSET=utf8

It is found that user_id is indexed, so the question is, since user_id is an index, why is there still a full table scan in the SQL statement?

Think here for 10 seconds and move on.

In fact, the boundary of the problem is very clear at this time. The SQL statement is very simple, and the index also exists. After a full table scan, you can temporarily eliminate the influence of histogram in MySQL. Currently, there is no histogram feature in version 5.7, then there is only one result: the type of the field produces implicit type conversion.

You can refer to an article in this article in this section.

Implicit conversion that needs to be emphasized in MySQL

For example, the initialization statement is as follows:

Create table test (id int primary key,name varchar (20), key idx_name (name)); insert into test values (1recorder 10'), (2meme 20')

Then we use the following two statements to perform a comparative test of the execution plan.

Explain select * from test where name=20; explain select * from test where name='20'

When name is listed as a character type, the list of execution plans is as follows:

You can clearly see that when name is a string type, if the where condition is name=20, then a full index scan is performed, and viewing the warning information will clearly prompt:

Message: Cannot use range access on index 'idx_name' due to type or collation conversion on field' name'

So the problem here is obvious.

After changing the condition of the subquery to characters, the execution efficiency of the entire SQL is immediately much better.

Test using sql_no_cache.

SQL pre-modification performance:

+-+ 2 rows in set (0.27 sec) modified performance: +-+ 2 rows in set (0.00 sec)

Then look at the implementation plan again, and we will be well-behaved, so that we can solve the bottleneck problem, and those specifications, better improvements can be gradually unfolded, and from the proposed point of view, the probability of adoption will be higher.

Of course, on this basis, there are indeed some supplementary suggestions, which can be spread out after locating the bottleneck.

Optimization is not an one-shot sale, on this basis, also found some other problems, you can look at the table structure of this table information, in fact, you can find some minor design problems.

1) the character type of the table field is basically varchar (255). This habit should be avoided as much as possible, which will have a significant impact on storage performance.

2) the int type int (3) is used, which still stores 4 bytes for int, but the limited range is greatly reduced, and smaller numerical types can be considered.

3) the index of the table is loose, so you can create a composite index according to the business model. For example, there are more scenarios for the combination of user_id and status, and you should create a composite index of (user_id,status).

The above is the example analysis of the SQL optimization proposal in MySQL shared by Xiaobian. If you happen to have similar doubts, you might as well refer to the above analysis to understand. If you want to know more about it, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.