What are the pitfalls in development after upgrading to MySQL 5.7?


Updated 2025-07-15. From: SLTechnology News&Howtos. Shulou (Shulou.com), 05/31 report.


Preface

Some time ago, our production MySQL database was upgraded to 5.7. Because of possible incompatibilities, I was quite nervous before the upgrade, even though the test and development environments had already been upgraded six months earlier.

Based on previous research and feedback from friends, there are two main points related to development:

sql_mode

In MySQL 5.6, the default value is "NO_ENGINE_SUBSTITUTION", which can be understood as a non-strict mode. For example, inserting an empty string '' into an auto-increment primary key produces a warning, but the auto-increment value is still generated.

In MySQL 5.7, however, the default is changed to strict mode, and the same insert no longer produces a warning but fails with an error.

Finding the maximum value per group

Some ways of writing a per-group maximum query no longer return the expected results in MySQL 5.7, and the problem is fairly well hidden.

Of the two, the first point is controllable, since the parameter can be adjusted. The second is not: no parameter governs it, and developers have to review their code.

Let's take a look at the details.

Test data

mysql> select * from emp;
+-------+----------+--------+--------+
| empno | ename    | sal    | deptno |
+-------+----------+--------+--------+
|  1001 | emp_1001 | 100.00 |     10 |
|  1002 | emp_1002 | 200.00 |     10 |
|  1003 | emp_1003 | 300.00 |     20 |
|  1004 | emp_1004 | 400.00 |     20 |
|  1005 | emp_1005 | 500.00 |     30 |
|  1006 | emp_1006 | 600.00 |     30 |
+-------+----------+--------+--------+
6 rows in set

Where empno is the employee number, ename is the employee's name, sal is the salary, and deptno is the department number of the employee.

The business requirement is to find the details of the highest-paid employee in each department.

In MySQL 5.6, we can do this with the following SQL:

SELECT deptno,ename,sal FROM (SELECT * FROM emp ORDER BY sal DESC) t GROUP BY deptno

The results are as follows. We can see that it does produce the desired result.

+--------+----------+--------+
| deptno | ename    | sal    |
+--------+----------+--------+
|     10 | emp_1002 | 200.00 |
|     20 | emp_1004 | 400.00 |
|     30 | emp_1006 | 600.00 |
+--------+----------+--------+

Now let's look at the results in MySQL 5.7, which are different.

+--------+----------+--------+
| deptno | ename    | sal    |
+--------+----------+--------+
|     10 | emp_1001 | 100.00 |
|     20 | emp_1003 | 300.00 |
|     30 | emp_1005 | 500.00 |
+--------+----------+--------+

In fact, MySQL 5.7 rewrites this SQL, and the rewritten statement can be viewed with EXPLAIN (EXTENDED) followed by SHOW WARNINGS.

The rewritten SQL shows that the subquery is eliminated (the derived table is merged into the outer query), so the ORDER BY inside it no longer takes effect and the query fails to return the desired result. This has also been confirmed officially: https://bugs.mysql.com/bug.php?id=80131

Many people may object that nobody writes SQL like this, but on the well-known Stack Overflow this approach has 116 upvotes, which shows how widely it is used. It is second only to "Method 2" described later.

https://stackoverflow.com/questions/12102200/get-records-with-max-value-for-each-group-of-grouped-sql-results

Note that this SQL cannot be run directly in 5.7 with the default settings. It fails with the following error:

ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 't.ename' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

This is related to sql_mode. In MySQL 5.7, the default sql_mode is changed to

ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION

Among them, ONLY_FULL_GROUP_BY affects GROUP BY statements: it requires that the select list contain only the grouped columns (that is, the columns after GROUP BY) and aggregate functions (sum, avg, max, etc.), which is also the SQL92 standard.

But at work, I often see developers write SQL like the following.

mysql> select deptno,ename,max(sal) from emp group by deptno;
+--------+----------+----------+
| deptno | ename    | max(sal) |
+--------+----------+----------+
|     10 | emp_1001 |   200.00 |
|     20 | emp_1003 |   400.00 |
|     30 | emp_1005 |   600.00 |
+--------+----------+----------+
3 rows in set (0.01 sec)

I really don't understand what ename is supposed to mean here at the business level. After all, the employee shown is not the highest-paid one in the department.

Finding the maximum value per group: implementations in MySQL

Finding the maximum value per group is a very common requirement, and one that developer colleagues often ask about at work. Let's look at the ways to implement it in MySQL.

Method 1

SELECT e.deptno, ename, sal FROM emp e, (SELECT deptno, max(sal) maxsal FROM emp GROUP BY deptno) t WHERE e.deptno = t.deptno AND e.sal = t.maxsal;
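Method 1 can be tried out on the six-row emp table. The sketch below is an illustration only, not the author's test setup: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, reasonable only because the query is portable SQL), the function name top_paid_per_dept is hypothetical, and an ORDER BY is added so the output order is deterministic.

```python
import sqlite3

# The six rows of the emp test table shown earlier in the article.
EMP_ROWS = [
    (1001, "emp_1001", 100.00, 10),
    (1002, "emp_1002", 200.00, 10),
    (1003, "emp_1003", 300.00, 20),
    (1004, "emp_1004", 400.00, 20),
    (1005, "emp_1005", 500.00, 30),
    (1006, "emp_1006", 600.00, 30),
]

# Method 1: compute max(sal) per department in a derived table, then join
# back to emp to pick up the matching row. ORDER BY is an addition here.
METHOD_1 = """
SELECT e.deptno, ename, sal
FROM emp e,
     (SELECT deptno, max(sal) maxsal FROM emp GROUP BY deptno) t
WHERE e.deptno = t.deptno AND e.sal = t.maxsal
ORDER BY e.deptno
"""


def top_paid_per_dept():
    """Run Method 1 against an in-memory SQLite copy of emp (assumption)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE emp ("
            "empno INTEGER PRIMARY KEY, ename TEXT, sal REAL, deptno INTEGER)"
        )
        conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)", EMP_ROWS)
        return conn.execute(METHOD_1).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_paid_per_dept():
        print(row)
```

Because the join matches on the maximum salary rather than on row order, the result does not depend on how the optimizer treats the derived table. If two employees in a department tie for the top salary, both are returned.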

Method 2

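SELECT a.deptno, a.ename, a.sal FROM emp a LEFT JOIN emp b ON a.deptno = b.deptno AND a.sal < b.sal WHERE b.sal IS NULL;

These two implementations are in fact generic: they work not only in MySQL but also in other mainstream relational databases.

Method 3

MySQL 8.0 introduced analytic (window) functions, which can achieve the same thing.

SELECT deptno, ename, sal FROM (SELECT deptno, ename, sal, LAST_VALUE(sal) OVER (PARTITION BY deptno ORDER BY sal ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) maxsal FROM emp) a WHERE sal = maxsal;

Performance comparison of the three implementations

Because the test data above is so small, all three implementations return instantly, and it is hard to judge them from the execution plan alone.

Below we switch to a larger data set: the dept_emp table from the official sample database employees, https://github.com/datacharmer/test_db

The table's details are as follows, where emp_no is the employee number, dept_no is the department number, and from_date is the hire date.

mysql> show create table dept_emp\G
*************************** 1. row ***************************
       Table: dept_emp
Create Table: CREATE TABLE `dept_emp` (
  `emp_no` int(11) NOT NULL,
  `dept_no` char(4) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  KEY `dept_no` (`dept_no`,`from_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql> select count(*) from dept_emp;
+----------+
| count(*) |
+----------+
|   331603 |
+----------+
1 row in set (0.09 sec)

mysql> select * from dept_emp limit 1;
+--------+---------+------------+------------+
| emp_no | dept_no | from_date  | to_date    |
+--------+---------+------------+------------+
|  10001 | d005    | 1986-06-26 | 9999-01-01 |
+--------+---------+------------+------------+
1 row in set (0.00 sec)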
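Before timing anything on the larger table, it is worth checking that Method 2 and Method 3 return the same rows on the small emp data. The sketch below is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, and it needs a SQLite build of 3.25 or later for window functions), the function name run_methods is hypothetical, and an ORDER BY and an explicit AS are added to each query.

```python
import sqlite3

# The six rows of the small emp test table from earlier in the article.
EMP_ROWS = [
    (1001, "emp_1001", 100.00, 10),
    (1002, "emp_1002", 200.00, 10),
    (1003, "emp_1003", 300.00, 20),
    (1004, "emp_1004", 400.00, 20),
    (1005, "emp_1005", 500.00, 30),
    (1006, "emp_1006", 600.00, 30),
]

QUERIES = {
    # Method 2: anti-join. Keep a row only if no row in the same
    # department has a strictly higher salary.
    "method_2": """
        SELECT a.deptno, a.ename, a.sal
        FROM emp a
        LEFT JOIN emp b ON a.deptno = b.deptno AND a.sal < b.sal
        WHERE b.sal IS NULL
        ORDER BY a.deptno
    """,
    # Method 3: window function. With the frame widened to the whole
    # partition, LAST_VALUE over ORDER BY sal is the department maximum.
    "method_3": """
        SELECT deptno, ename, sal
        FROM (
            SELECT deptno, ename, sal,
                   LAST_VALUE(sal) OVER (
                       PARTITION BY deptno ORDER BY sal
                       ROWS BETWEEN UNBOUNDED PRECEDING
                                AND UNBOUNDED FOLLOWING
                   ) AS maxsal
            FROM emp
        ) a
        WHERE sal = maxsal
        ORDER BY deptno
    """,
}


def run_methods():
    """Run each query on an in-memory SQLite copy of emp (assumption)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE emp ("
            "empno INTEGER PRIMARY KEY, ename TEXT, sal REAL, deptno INTEGER)"
        )
        conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)", EMP_ROWS)
        return {name: conn.execute(sql).fetchall()
                for name, sql in QUERIES.items()}
    finally:
        conn.close()


if __name__ == "__main__":
    for name, rows in run_methods().items():
        print(name, rows)
```

Agreement on six rows says nothing about speed. It only confirms that the two queries are logically equivalent, so any timing difference comes from how they are executed.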

Method 1

Method 2

mysql> explain select a.dept_no, a.emp_no, a.from_date from dept_emp a left join dept_emp b on a.dept_no=b.dept_no and a.from_date < b.from_date where b.from_date is null;
+----+-------------+-------+------------+------+---------------+---------+---------+--------------------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key     | key_len | ref                | rows   | filtered | Extra                    |
+----+-------------+-------+------------+------+---------------+---------+---------+--------------------+--------+----------+--------------------------+
|  1 | SIMPLE      | a     | NULL       | ALL  | NULL          | NULL    | NULL    | NULL               | 331008 |   100.00 | NULL                     |
|  1 | SIMPLE      | b     | NULL       | ref  | dept_no       | dept_no | 16      | slowtech.a.dept_no |  41376 |    19.00 | Using where; Using index |
+----+-------------+-------+------------+------+---------------+---------+---------+--------------------+--------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)

Method 3

In terms of execution time

Method 1 takes the least time. With a composite index (dept_no, from_date) the result returns instantly, and even without an index it takes only 0.75s.

Method 2 takes the longest: there was still no result after 3 hours. The same SQL on the same data took 87 minutes and 49 seconds on Oracle.

Method 3 takes a fairly constant time. With or without an index it stays at about 1.5s, which is longer than Method 1.

We also tested, on MySQL 5.6, the earlier implementation that is no longer compatible with MySQL 5.7. Without any index it is stable at 0.7s (its performance is not bad, so no wonder people use it), while under the same conditions Method 1 is stable at 0.5s (here MySQL 5.6 is even faster than 8.0). Unlike Method 1, however, it cannot be optimized with an index.

In terms of execution plan

Method 1 first puts the result of the GROUP BY into a temporary table, then uses that temporary table as the driving table to join against the dept_emp table. The driving table is small (only 9 records) and the join columns are indexed, so it is no surprise that the result comes back instantly.

Method 2 joins the table to itself, and it breaks two major rules of SQL optimization.

1. The driving table is too large: it has 331603 records.

2. Although the driven table does have an index, the execution plan shows that it uses only dept_no from the composite index (dept_no, from_date), and dept_no is barely selective: there are only 9 departments.
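The cost of breaking both rules at once can be estimated with some quick arithmetic. The sketch below is a rough back-of-the-envelope calculation, not a measurement: it assumes the rows are spread evenly across the 9 departments (the real data is not perfectly even), and the function name estimate_join_work is hypothetical.

```python
TOTAL_ROWS = 331603  # rows in dept_emp, from the count(*) shown earlier
DEPARTMENTS = 9      # distinct dept_no values, as stated in the text


def estimate_join_work(total_rows, groups):
    """Estimate the work of a self-join that filters only on the group column.

    Assumes rows are spread evenly across groups (an assumption).
    Returns (selectivity, rows_per_group, row_pairs_examined).
    """
    # Distinct values divided by total rows: the lower, the less the index helps.
    selectivity = groups / total_rows
    # Rows the index lookup on dept_no returns for each driving-table row.
    rows_per_group = total_rows / groups
    # Every driving row is compared with every row of its own department.
    row_pairs = total_rows * rows_per_group
    return selectivity, rows_per_group, row_pairs


if __name__ == "__main__":
    sel, per_group, pairs = estimate_join_work(TOTAL_ROWS, DEPARTMENTS)
    print(f"selectivity of dept_no: {sel:.6f}")
    print(f"rows per department:    {per_group:,.0f}")
    print(f"row pairs examined:     {pairs:,.0f}")
```

Under this assumption each lookup on dept_no returns tens of thousands of rows, and the join as a whole examines on the order of ten billion row pairs, which is consistent with Method 2 running for hours.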

Method 3 first puts the result of the analytic function into a temporary table and then processes that temporary table. It performs two full table scans, one on the dept_emp table and one on the temporary table.

Therefore, for finding the maximum value per group, Method 1 is recommended: it conforms to the SQL standard and also has the best query performance, especially when a composite index is available.
