What are the 30 suggestions for writing high-quality SQL? Many less experienced developers are not sure where to start, so this article summarizes the common problems and their solutions. I hope it helps you solve them.
Preface
Together with example demos, this article elaborates 30 suggestions for optimizing SQL, most of which were summarized from real development work.
1. Try not to use select * in queries; select the specific fields you need instead.
Counterexample:
Select * from employee
Positive example:
Select id,name from employee
Reason:
Fetch only the fields you need, to save resources and reduce network overhead.
With select *, the query is unlikely to use a covering index, so it has to go back to the table for the extra columns.
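As a rough illustration (assuming a secondary index idx_name on name, which is not part of the table definitions in this article), EXPLAIN shows the difference: the field-specific query can be answered from the index alone, while select * has to read the full row.
alter table employee add index idx_name (name);
explain select id, name from employee where name = 'jay'; -- Extra shows "Using index": covered by idx_name
explain select * from employee where name = 'jay'; -- has to go back to the clustered index for the remaining columns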
2. If you know that there is only one query result or only the maximum / minimum record, it is recommended to use limit 1.
Suppose there is an employee table named employee, and you want to find a person named jay.
CREATE TABLE `employee` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `date` datetime DEFAULT NULL,
  `sex` int(1) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Counterexample:
Select id,name from employee where name='jay'
Positive example
Select id,name from employee where name='jay' limit 1
Reason:
With the addition of limit 1, as long as a corresponding record is found, it will not continue to scan down, and the efficiency will be greatly improved.
Of course, if name has a unique index, it is not necessary to add limit 1. The point of limit here is mainly to stop a full table scan early and thus improve performance; if a statement is already guaranteed not to scan the whole table, there is no significant difference in performance.
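A minimal sketch of that unique-index case (the index name uniq_name is illustrative, not part of the original table definition):
alter table employee add unique index uniq_name (name);
-- with a unique index on name, the scan stops at the first match even without limit 1
select id, name from employee where name = 'jay';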
3. Try to avoid using or to connect conditions in the where clause.
Create a new user table with a normal index userId. The table structure is as follows:
CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `userId` int(11) NOT NULL,
  `age` int(11) NOT NULL,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `userId` (`userId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Suppose you now need to query users whose userId is 1 or whose age is 18; it is easy to end up with the following SQL:
Counterexample:
Select * from user where userid=1 or age = 18
Positive example:
-- use union all
select * from user where userid = 1
union all
select * from user where age = 18;
-- or split it into two separate SQL statements:
select * from user where userid = 1;
select * from user where age = 18;
Reason:
Using or may invalidate the index, resulting in a full table scan.
For the case of or combined with a column that has no index (age here): suppose the optimizer did use the userId index first; when it reaches the age condition it would still have to scan the whole table, so the work becomes full table scan + index scan + a merge of the two results. If it just starts with a full table scan, a single pass finishes the job. MySQL's optimizer weighs efficiency and cost, so it is understandable that the index may be abandoned when an or condition like this appears.
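You can check what the optimizer actually chooses with EXPLAIN; a minimal sketch (the exact plan depends on your data and MySQL version):
explain select * from user where userid = 1 or age = 18; -- type is often ALL: a full table scan, because age has no index
explain select * from user where userid = 1; -- uses the userId index
explain select * from user where age = 18; -- a full table scan, but done once and with no merge step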
4. Optimize limit paging
When we implement paging we usually use limit, but when the offset is particularly large, the query becomes very slow.
Counterexample:
Select id,name,age from employee limit 10000,10
Positive example:
-- solution 1: continue from the largest record (offset) returned by the previous query
select id, name from employee where id > 10000 limit 10;
-- solution 2: order by + index
select id, name from employee order by id limit 10000,10;
-- solution 3: limit the number of pages, if the business allows it
Reason:
When the offset is large, query efficiency drops, because MySQL does not skip over the offset rows and fetch the following data directly: it first reads offset + limit rows, then discards the first offset rows and returns the rest.
If you use optimization scheme 1, carrying over the largest record (offset) of the previous query lets MySQL skip the offset entirely, which improves efficiency a lot.
The second scheme uses order by + index, which can also improve query efficiency.
For scheme 3, it is worth discussing with the business side whether such deep pages are really needed, because most users never page back that far.
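A minimal sketch of scheme 1 in practice (assuming the previous page ended at id = 10000 and id is the auto-increment primary key):
-- first page
select id, name from employee order by id limit 10;
-- next page: pass in the last id of the previous page instead of an offset
select id, name from employee where id > 10000 order by id limit 10;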
5. Optimize your like statement
In daily development, if you use fuzzy keyword queries, it's easy to think of like, but like is likely to invalidate your index.
Counterexample:
Select userId,name from user where userId like '%123'
Positive example:
Select userId,name from user where userId like '123%'
Reason:
With % at the front of the pattern, the index is not used.
With % only at the end of the keyword, the index can still be used.
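You can verify this with EXPLAIN; a minimal sketch, assuming an index idx_name on the name column of the user table above (that index is not in the original DDL; the like-and-index behaviour applies to string columns):
alter table user add index idx_name (name);
explain select name from user where name like '%jay'; -- key is NULL: the index cannot be used with a leading %
explain select name from user where name like 'jay%'; -- key shows idx_name: a prefix match can use the index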
6. Use where conditions to limit the data to be queried to avoid returning redundant rows
Suppose the business scenario goes like this: query whether a user is a member. I've seen the old implementation code look like this.
Counterexample:
List<Long> userIds = sqlMap.queryList("select userId from user where isVip = 1");
boolean isVip = userIds.contains(userId);
Positive example:
Long userId = sqlMap.queryObject("select userId from user where userId = #{userId} and isVip = 1");
boolean isVip = userId != null;
Reason:
Query only the data you actually need; avoid returning unnecessary rows and columns, which wastes resources.
7. Try to avoid using mysql's built-in functions on index columns
Business requirements: query users who have logged in in the last seven days (assuming loginTime is indexed)
Counterexample:
select userId, loginTime from loginuser where DATE_ADD(loginTime, INTERVAL 7 DAY) >= now();
Positive example:
explain select userId, loginTime from loginuser where loginTime >= DATE_ADD(now(), INTERVAL -7 DAY);
Reason:
Using a MySQL built-in function on the index column causes the index to be abandoned.
If no built-in function is applied to the index column, the index can still be used.
8. Try to avoid expression operations on fields in the where clause, which will cause the system to abandon the use of indexes and perform full table scans
Counterexample:
Select * from user where age-1 = 10
Positive example:
Select * from user where age = 11
Reason:
Although age has an index, the arithmetic performed on the column means the index is not used.
9. Inner join, left join, right join: prefer inner join, and if you use left join, the result set of the left table should be as small as possible.
inner join: when two tables are joined, only the rows that match in both tables are kept.
When left join makes a join query between two tables, it returns all the rows of the left table, even if there are no matching records in the right table.
When right join makes a join query between two tables, it returns all the rows of the right table, even if there are no matching records in the left table.
Provided the SQL still meets the requirement, it is recommended to prefer inner join. If you have to use left join, the data set of the left table should be as small as possible, and where possible put the filter conditions on the left table.
Counterexample:
Select * from tab1 T1 left join tab2 T2 on t1.size = t2.size where t1.id > 2
Positive example:
Select * from (select * from tab1 where id > 2) T1 left join tab2 T2 on t1.size = t2.size
Reason:
If the inner join is an equi-join, it may return relatively few rows, so performance is relatively better.
Similarly, when using left join, keeping the left table's result set small and applying conditions on the left as far as possible means fewer rows have to be returned.
10. Try to avoid using the != or <> operator in the where clause, otherwise the engine will abandon the index and do a full table scan.
Counterexample:
Select age,name from user where age != 18
Positive example:
-- you can consider splitting it into two separate SQL statements:
select age, name from user where age < 18;
select age, name from user where age > 18;
Reason:
Using != or <> is likely to make the index unusable.
11. When using a composite index, pay attention to the order of the index columns and generally follow the leftmost-prefix matching principle.
Table structure: (there is a composite index idx_userid_age, with userId first, followed by age)
CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `userId` int(11) NOT NULL,
  `age` int(11) DEFAULT NULL,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_userid_age` (`userId`, `age`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
Counterexample:
Select * from user where age = 10
Positive example:
-- complies with the leftmost matching principle
select * from user where userid = 10 and age = 10;
-- also complies with the leftmost matching principle
select * from user where userid = 10;
Reason:
When we create a composite index such as (k1, k2, k3), it is equivalent to creating the indexes (k1), (k1, k2), and (k1, k2, k3). This is the leftmost-prefix matching principle.
If a query does not follow the leftmost-prefix principle, the composite index will generally not be used, although this also depends on the MySQL optimizer.
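A minimal EXPLAIN sketch of the principle on the table above (the exact plan still depends on data volume and the optimizer):
explain select * from user where age = 10; -- key is NULL: age alone cannot use idx_userid_age
explain select * from user where userId = 10; -- key = idx_userid_age: the leftmost column matches
explain select * from user where userId = 10 and age = 10; -- key = idx_userid_age: both columns are used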
12. To optimize queries, consider creating indexes on the columns involved in where and order by clauses, and avoid full table scans as much as possible.
Counterexample:
Select * from user where address = 'Shenzhen' order by age
Positive example:
-- add a composite index
alter table user add index idx_address_age (address, age);
13. If you insert too much data, consider bulk insertion.
Counterexample:
for (User u : list) {
    INSERT into user(name, age) values(#name#, #age#);
}
Positive example:
// insert in one batch
INSERT into user(name, age) values (#{item.name}, #{item.age})
Reason:
Batch insertion has good performance and saves time.
Take an analogy: if you need to move 10,000 bricks to the roof and you have an elevator that can carry a reasonable number of bricks per trip (say, up to 500), you can choose to move one brick per trip or 500 bricks per trip. Which way do you think takes more time?
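In plain SQL, the same idea is a multi-row values list; a minimal sketch with made-up sample data (assuming an illustrative table t_user(name, age)):
-- one statement, many rows
insert into t_user(name, age) values ('jay', 18), ('tom', 20), ('anna', 25);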
14. Use a covering index when appropriate.
A covering index lets the SQL statement avoid going back to the table: all the data it needs can be fetched from the index alone, which greatly improves query efficiency.
Counterexample:
-- like fuzzy query with select *: the index is not used
select * from user where userid like '3%';
Positive example:
-- id is the primary key and userId has a normal index, so the covering index comes into play
select id, name from user where userid like '3%';
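Strictly speaking, selecting name is only covered if name is part of the index as well; a minimal sketch of such an index (idx_userid_name is an illustrative name, not from the original DDL):
alter table user add index idx_userid_name (userId, name);
explain select id, name from user where userid like '3%'; -- Extra shows "Using index": the query is answered from the index alone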
15. Use the distinct keyword carefully
The distinct keyword is typically used to filter out duplicate records and return only distinct ones. When it is applied to one field or just a few fields, it benefits the query; but when many fields are involved it greatly reduces query efficiency.
Counterexample:
SELECT DISTINCT * from user
Positive example:
Select DISTINCT name from user
Reason:
A statement with distinct takes more CPU time than the same statement without it. When many fields are queried with distinct, the database engine has to compare the rows and filter out the duplicates, and that comparing and filtering consumes system resources and CPU time.
16. Delete redundant and duplicate indexes
Counterexample:
KEY `idx_userId` (`userId`),
KEY `idx_userId_age` (`userId`, `age`)
Positive example:
-- delete the idx_userId index, because a composite index (A, B) is equivalent to creating the indexes (A) and (A, B)
KEY `idx_userId_age` (`userId`, `age`)
Reason:
Duplicate indexes need to be maintained, and the optimizer needs to consider them one by one when optimizing queries, which can affect performance.
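A minimal sketch of removing the redundant single-column index on the user table (index names as in the example above):
alter table user drop index idx_userId;
-- the remaining composite index idx_userId_age still serves queries that filter on userId alone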
17. If the amount of data is large, optimize your modify / delete statements.
Avoid modifying or deleting too much data at the same time, as it will cause excessive cpu utilization and affect others' access to the database.
Counterexample:
-- delete 100,000 or 1,000,000+ rows in a single statement?
delete from user where id < 100000
18. In the where clause, consider using a default value instead of null.
is null and is not null are often thought to invalidate the index, but in fact, in the general case, such a query is simply expensive and the optimizer abandons the index on its own.
If you change the null values to a default value, the index can often still be used, and at the same time the meaning is clearer.
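A minimal sketch of the default-value idea (assuming age allows null and 0 is an acceptable default; the column and value are only illustrative):
-- instead of querying on null ...
select * from user where age is null;
-- ... give the column a default value and query on it (existing nulls would still need an update)
alter table user alter column age set default 0;
select * from user where age = 0;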
19. Do not have more than 5 table joins
The more tables you have, the more time and overhead it takes to compile.
Splitting a large join into several smaller queries usually gives better readability.
If you have to join a lot of tables to get the data, it means bad design.
20. Use exists and in appropriately.
Suppose table A is a company's employee table and table B is the department table, and we want to query all employees of all departments. It is easy to write the following SQL:
Select * from A where deptId in (select deptId from B)
Writing like this is equivalent to:
Check department table B first.
Select deptId from B
Then, for each department deptId, query the employees in A:
Select * from A where A.deptId = B.deptId
It can be abstracted into a loop like this:
List resultSet;
for (deptId in B) { for (employee in A) { if (employee.deptId == deptId) resultSet.add(employee); } }
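For contrast, a minimal sketch of the exists form of the same query, using the A and B tables above:
select * from A where exists (select 1 from B where B.deptId = A.deptId);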