2025-01-17 Update. From: SLTechnology News&Howtos, Database.
Shulou (Shulou.com) 05/31 Report:
This article discusses how to write good SQL for MySQL. It is shared here as a practical reference.
MySQL performance
Maximum amount of data
Discussing performance without specifying data volume and concurrency is meaningless. MySQL itself does not limit the number of records in a single table; the limit comes from the operating system's maximum file size.
The Alibaba Java Development Manual recommends splitting databases and tables only when a single table exceeds 5 million rows or 2 GB. Performance is determined by many factors: beyond business complexity, it depends on hardware configuration, MySQL configuration, table design, and index optimization. The 5-million figure is a reference value, not an iron law.
The author has operated a single table with more than 400 million rows. Paging to the latest 20 records took about 0.6 seconds, with SQL roughly as follows:

select field_1, field_2 from table where id < #{prePageMinId} order by id desc limit 20

prePageMinId is the smallest ID among the previous page's records. Query speed was tolerable at the time, but as the data keeps growing the table will one day buckle under the load. Splitting databases and tables is a long, high-risk undertaking, so optimize within the current structure as far as possible first, for example by upgrading hardware or migrating historical data, and shard only when there is truly no other way. Readers interested in sharding can read up on the basic ideas of database and table splitting.

Maximum concurrency

Concurrency is the number of requests the database can handle at the same moment, determined by max_connections and max_user_connections. max_connections is the maximum number of connections for the MySQL instance, with an upper limit of 16384; max_user_connections is the maximum number of connections per database user.

MySQL allocates a buffer for each connection, so more connections mean more memory consumed. Set the limit too high and the hardware cannot cope; set it too low and the hardware is underused. A common guideline is that the ratio of the two should exceed 10%, calculated as follows:

max_used_connections / max_connections * 100% = 3/100 * 100% ≈ 3%

View the maximum connection settings:

show variables like '%max_connections%';
show variables like '%max_user_connections%';

Modify the maximum number of connections in the configuration file my.cnf:

[mysqld]
max_connections = 100
max_used_connections = 20

Query time within 0.5 seconds

Keep a single query under 0.5 seconds. This is an empirical value derived from the 3-second rule of user experience: if an operation gets no response within 3 seconds, users become annoyed or leave. Response time = client UI rendering + network round trip + application processing + database query, so 0.5 seconds leaves the database roughly 1/6 of the total budget.

Implementation principles

Compared with NoSQL databases, MySQL is a delicate creature. It is like the frail classmate in gym class: the slightest dispute causes a falling-out (hard to scale out), a short run leaves it out of breath (small capacity, low concurrency), and it is often unwell and absent (too many SQL constraints). Nowadays everyone builds distributed systems, and application servers are far easier to scale than databases, so the guiding principle is: the database does less work, the application does more.

Use indexes fully but do not abuse them; indexes also consume disk and CPU.
Do not format data with database functions; leave that to the application.
Do not use foreign key constraints; enforce data integrity in the application.
In write-heavy, read-light scenarios, avoid unique indexes; enforce uniqueness in the application.
Add redundant fields or intermediate tables where appropriate, computing intermediate results in the application to trade space for time.
Never run extremely time-consuming transactions; split them into smaller ones in the application.
Estimate the load and data growth of important tables (such as the orders table) in advance and optimize early.

Data table design

Data types

Choose data types that are simpler or occupy less space.

If the range suffices, prefer tinyint, smallint, or mediumint over int.
If the string length is fixed, use char.
If varchar suffices, do not use text.
For high precision use decimal, or store scaled integers in BIGINT, for example multiplying by 100 to keep two decimal places.
Prefer timestamp over datetime: timestamp occupies less space and is stored in UTC with automatic time zone conversion.

Avoid NULL values

A NULL field still occupies space in MySQL and makes indexes and index statistics more complex. Updating a NULL to a non-NULL value cannot be done in place and can cause index page splits that hurt performance. Replace NULL with meaningful values wherever possible; this also avoids "is not null" predicates in SQL statements.

text type optimization

Because text fields store large amounts of data, the table grows quickly and drags down queries on the other fields. Extract text columns into a child table linked by a business key.

Index optimization

Index classification

Normal index: the most basic kind of index.
Composite index: an index over several fields, speeding up queries with compound conditions.
Unique index: like a normal index, but the indexed values must be unique; NULLs are allowed.
Composite unique index: the combination of column values must be unique.
Primary key index: a special unique index that uniquely identifies a record in the table; NULLs are not allowed; usually created with a primary key constraint.
Full-text index: for searching large volumes of text. Both InnoDB and MyISAM support it from MySQL 5.6 onward, but because of limited precision and scalability most companies choose Elasticsearch instead.

Index usage tips

Paging queries matter: if a query reads more than about 30% of the table, MySQL will not use the index.
Keep a single table to at most 5 indexes and a single index to at most 5 fields.
Strings can use prefix indexes; keep the prefix length to 5-8 characters.
Indexing a field with very low cardinality (a deleted flag, gender) is pointless.
Make sensible use of covering indexes, as shown below:

select login_name, nick_name from member where login_name = ?

A composite index on (login_name, nick_name) serves this query faster than a single-column index on login_name alone.

SQL optimization

Batch processing

As a child, the author watched a fish pond being drained through a small outlet, the surface covered with floating debris. Duckweed and leaves always slipped through, but branches would block everything else and sometimes jam the outlet, needing manual clearing. MySQL is the pond, the maximum concurrency and network bandwidth are the outlet, and user SQL is the floating debris. Queries without pagination parameters, and update or delete statements that touch large amounts of data, are the branches: break them up and process them in batches. For example:

Business requirement: mark all of a user's expired coupons as unusable.

SQL statement:

update coupon set status = 0 where status = 1 and expire_date <= #{currentDate}

If a large number of rows is affected, add a limit and run the statement repeatedly until it updates no more rows:

update coupon set status = 0 where status = 1 and expire_date <= #{currentDate} limit 1000

Operator <> optimization

The <> (!=) operator usually cannot use an index. For example, to find orders whose amount is not 100:

select id from orders where amount != 100

If orders with an amount of exactly 100 are rare and the data is very unevenly distributed, the query can be rewritten to use the index:

(select id from orders where amount > 100) union all (select id from orders where amount < 100 and amount > 0)
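The batch-processing idea above can be sketched in application code. Below is a minimal illustration using Python's sqlite3 standing in for MySQL; the coupon table follows the example in the text, and the assumption that status 1 means usable and 0 means unusable, along with the batch size of 1000, are choices made for this sketch:

```python
import sqlite3

def expire_coupons_in_batches(conn, current_date, batch_size=1000):
    """Update a bounded chunk of rows per statement until none remain,
    so no single UPDATE locks a huge number of rows at once."""
    total = 0
    while True:
        cur = conn.execute(
            # MySQL allows UPDATE ... LIMIT directly; SQLite needs the
            # rowid-subquery form to bound the batch.
            "UPDATE coupon SET status = 0 WHERE rowid IN ("
            " SELECT rowid FROM coupon"
            " WHERE status = 1 AND expire_date <= ? LIMIT ?)",
            (current_date, batch_size),
        )
        conn.commit()          # commit each small batch separately
        if cur.rowcount == 0:  # nothing left to update: done
            return total
        total += cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coupon (expire_date TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO coupon VALUES (?, 1)",
    [("2019-06-01",)] * 2500 + [("2019-08-01",)] * 10,
)
updated = expire_coupons_in_batches(conn, "2019-07-01")
print(updated)  # 2500 (three batches: 1000 + 1000 + 500)
```

Each batch commits on its own, so locks are held briefly and a failure midway loses only the current chunk.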
OR optimization
Under the InnoDB engine, OR cannot use a composite index. For example:
Select id,product_name from orders where mobile_no = '13421800407' or user_id = 100
The OR condition cannot hit the composite index on mobile_no + user_id, but a union can be used instead, as shown below:
(select id,product_name from orders where mobile_no = '13421800407') union (select id,product_name from orders where user_id = 100)
Here the query is most efficient when the id and product_name fields are both covered by indexes.
IN optimization
IN suits cases where the outer table is large and the subquery table small, while EXISTS suits a small outer table with a large subquery table. With the query optimizer's continuous improvements, the two perform about the same in many scenarios.
Try rewriting IN subqueries as join queries, for example:
select id from orders where user_id in (select id from user where level = 'VIP')
Using JOIN is as follows:
Select o.id from orders o left join user u on o.user_id = u.id where u.level = 'VIP'
Avoid operations on columns
Operations on query condition columns usually result in index invalidation, as shown below:
Query the orders of the current day:
Select id from order where date_format(create_time, '%Y-%m-%d') = '2019-07-01'
The date_format function will cause the query to fail to use the index. After rewriting:
Select id from order where create_time between '2019-07-01 00:00:00' and '2019-07-01 23:59:59'
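The difference is easy to observe once an index is in place. Here is a small sketch using Python's sqlite3, where SQLite's strftime stands in for MySQL's date_format and the table and index names are invented for the demo. The function-wrapped predicate forces a scan, while the range on the bare column is answered by an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, create_time TEXT)")
conn.execute("CREATE INDEX idx_create_time ON orders (create_time)")

def plan(sql):
    # EXPLAIN QUERY PLAN reports whether SQLite scans or seeks via an index
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Wrapping the column in a function: the index cannot filter rows
scan = plan("SELECT id FROM orders "
            "WHERE strftime('%Y-%m-%d', create_time) = '2019-07-01'")

# Sargable range on the bare column: an index search (seek) instead
search = plan("SELECT id FROM orders WHERE create_time "
              "BETWEEN '2019-07-01 00:00:00' AND '2019-07-01 23:59:59'")

print(scan)    # a SCAN: every row is visited
print(search)  # a SEARCH using idx_create_time
```

MySQL's EXPLAIN shows the same pattern: type ALL for the function form versus type range on the index for the rewrite.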
Avoid Select all
If you do not need all the columns in the table, avoid SELECT *: it performs a full table scan and cannot make effective use of indexes.
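The cost difference is visible with an index. In this sketch with Python's sqlite3 (the schema is illustrative), naming only indexed columns lets the query be answered entirely from the index (a covering index), while SELECT * must also fetch each full row from the table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, note TEXT)")
conn.execute("CREATE INDEX idx_user ON orders (user_id)")

def plan(sql):
    # Collect the EXPLAIN QUERY PLAN detail text for the statement
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Only indexed columns requested: answered entirely from the index
narrow = plan("SELECT id, user_id FROM orders WHERE user_id = 100")

# SELECT * also needs `note`, so every matching row must be fetched
wide = plan("SELECT * FROM orders WHERE user_id = 100")

print("COVERING" in narrow, "COVERING" in wide)  # True False
```

The same effect exists in MySQL: "Using index" in EXPLAIN's Extra column disappears as soon as a non-indexed column is selected.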
Like optimization
Like is used for fuzzy queries, for example (field has been indexed):
SELECT column FROM table WHERE field like '%keyword%'
This query misses the index and should be rewritten as follows:
SELECT column FROM table WHERE field like 'keyword%'
Removing the leading % lets the query hit the index. But what if the product manager insists on matching both before and after? A full-text (fulltext) index is worth a try, but Elasticsearch is the ultimate weapon.
Join optimization
MySQL implements join with the Nested Loop Join algorithm: the driving table's result set serves as the base data, and each row in it is used as a filter condition to look up rows in the next table, after which the results are merged. If there are multiple joins, the previous result set becomes the loop data for querying the next table.
Add as many filter conditions as possible to both the driving table and the driven table; satisfy conditions in ON rather than WHERE; and use a small result set to drive a large one.
Add an index on the driven table's join column; when an index cannot be created, set a sufficiently large join_buffer_size.
Forbid joins across more than three tables, and try adding redundant fields instead.
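The nested loop described above can be sketched in a few lines of Python. This is a toy illustration, not MySQL's actual implementation, and the orders/user data is made up. The outer loop walks the driving table's filtered result set, and each row probes the driven table, which is why a small result set should drive a large one and why the driven table's join column needs an index; the dict below plays the role of that index:

```python
# Driving table: the small, already-filtered result set (VIP users)
vip_users = [{"id": 100, "level": "VIP"}, {"id": 200, "level": "VIP"}]

# Driven table: large; the index on the join column is modeled as a dict
orders = [
    {"id": 1, "user_id": 100}, {"id": 2, "user_id": 100},
    {"id": 3, "user_id": 300}, {"id": 4, "user_id": 200},
]
orders_by_user = {}  # "index" on orders.user_id
for o in orders:
    orders_by_user.setdefault(o["user_id"], []).append(o)

# Nested loop join: for each driving row, probe the driven table's index
result = []
for u in vip_users:                            # outer loop: driving result set
    for o in orders_by_user.get(u["id"], []):  # indexed lookup, not a scan
        result.append(o["id"])

print(result)  # [1, 2, 4]
```

Without the dict, each outer row would force a full pass over orders, which is exactly the cost a missing index on the driven table's join column incurs.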
Limit optimization
When limit is used for paging queries, the performance gets worse when flipping backward. The principle to solve this problem is to narrow the scanning range, as shown below:
Select * from orders order by id desc limit 100000, 10
It takes 0.4 seconds
Select * from orders order by id desc limit 1000000, 10
It takes 5.2 seconds
First, filter out ID to narrow down the scope of the query, as follows:
Select * from orders where id > (select id from orders order by id desc limit 1000000, 1) order by id desc limit 0, 10
It takes 0.5 seconds.
If the query condition has only the primary key ID, write as follows:
Select id from orders where id between 1000000 and 1000010 order by id desc
It takes 0.3 seconds
What if the above approach is still slow? Then you have to fall back on cursors; interested readers can look into implementing paging queries with cursors in JDBC.
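The rewrites above share one idea: replace a large offset with a range on the primary key. The sketch below, using Python's sqlite3 with an illustrative table and page size, compares plain offset pagination with the keyset approach used earlier in the article (where id < prePageMinId) and shows they return the same page:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(1, 1001)])

PAGE = 10

# Offset pagination: the engine must walk past every skipped row
offset_page = [r[0] for r in conn.execute(
    "SELECT id FROM orders ORDER BY id DESC LIMIT ? OFFSET ?", (PAGE, 500))]

# Keyset pagination: remember the smallest id of the previous page and
# seek directly through the primary key -- no rows are skipped
prev_page = [r[0] for r in conn.execute(
    "SELECT id FROM orders ORDER BY id DESC LIMIT ? OFFSET ?", (PAGE, 490))]
pre_page_min_id = min(prev_page)

keyset_page = [r[0] for r in conn.execute(
    "SELECT id FROM orders WHERE id < ? ORDER BY id DESC LIMIT ?",
    (pre_page_min_id, PAGE))]

print(offset_page == keyset_page)  # True: same page, cheaper access path
```

The keyset query's cost stays constant no matter how deep the page, which is why the article's 400-million-row table could still page in well under a second.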
Other databases
As a back-end developer, make sure you are proficient with MySQL or SQL Server as the core store, and keep an active eye on NoSQL databases: they are mature enough, widely adopted, and can resolve performance bottlenecks in specific scenarios.
Thank you for reading! That concludes this article on how to write good SQL. I hope the content above is helpful; if you found the article worthwhile, feel free to share it so more people can see it.
© 2024 shulou.com SLNews company. All rights reserved.