This series of articles is maintained in my Java Interview Guide repository on GitHub. Please check out the repository for more content:
https://github.com/h3pl/Java-Tutorial
Give it a Star if you like it.
The article was first posted on my personal blog:
www.how2playlife.com
This article is part of the "Relearning MySQL Databases" series from the WeChat official account [Java Technology Jianghu]. Part of this article comes from the Internet; to explain the topic clearly and thoroughly, it also integrates a good deal of technical blog content that I consider worthwhile, quoting some good posts along the way. If there is any infringement, please contact the author.
This series of posts will take you from the basics to advanced topics: from basic SQL usage to the MySQL execution engine, indexes, transactions, and other internals, learning step by step how MySQL-related technology is implemented, so that you can better optimize SQL on that basis, reduce SQL execution time, analyze SQL performance through execution plans, and then move on to MySQL master-slave replication and deployment. The goal is a more complete understanding of the whole MySQL technology stack and a knowledge framework of your own.
If you have any suggestions or questions about this series, you can also contact the author through the official account [Java Technology Jianghu]. You are welcome to participate in the creation and revision of this series of posts.
Unless single-table data is expected to keep growing, do not consider splitting tables at the beginning: splitting brings all kinds of complexity in logic, deployment, and operations. Generally, tables dominated by integer columns are fine below 10 million rows, and string-heavy tables below 5 million. In fact, in many cases a single MySQL table still has plenty of room for optimization and can comfortably support more than 10 million rows:
Fields
Prefer TINYINT, SMALLINT, and MEDIUMINT to INT for integer types when the range allows, and add UNSIGNED for non-negative values
Allocate only the space a VARCHAR really needs
Use enumerations or integers instead of string types
Try to use TIMESTAMP instead of DATETIME
Do not put too many fields in a single table; fewer than 20 is recommended
Avoid NULL-able fields: they are hard to optimize in queries and take up extra index space
Use an integer to store IP addresses, as shown in the sketch below
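A hedged sketch of the IP point: MySQL's built-in INET_ATON()/INET_NTOA() convert between dotted-quad strings and unsigned integers. The access_log table here is hypothetical, purely for illustration.
SQL
-- Store an IPv4 address as INT UNSIGNED (4 bytes) instead of VARCHAR(15)
CREATE TABLE access_log (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ip INT UNSIGNED NOT NULL,
  PRIMARY KEY (id)
);
INSERT INTO access_log (ip) VALUES (INET_ATON('192.168.1.10'));
SELECT INET_NTOA(ip) FROM access_log;  -- converts back to '192.168.1.10'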
Indexes
More indexes are not always better. Create targeted indexes according to your queries, and consider indexing the columns involved in WHERE and ORDER BY clauses. Use EXPLAIN to check whether an index or a full table scan is used.
Try to avoid judging the NULL value of a field in the WHERE clause, otherwise it will cause the engine to give up using the index and do a full table scan
Fields with few distinct values are not suitable for indexing, such as "gender", which has only two or three values
Index character fields by prefix only
A character field is best not used as the primary key
Foreign keys are unnecessary; let the application enforce the constraint
Try not to use UNIQUE; let the application enforce the constraint
When using a multi-column index, keep the column order consistent with the query conditions, and delete unnecessary single-column indexes (see the sketch after this list)
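A hedged sketch of the multi-column index point, borrowing the vote_record table defined later in this article (the index name idx_group_status is made up for the example):
SQL
-- Composite index: column order must match how queries filter
CREATE INDEX idx_group_status ON vote_record (group_id, status);
-- Matches the leftmost prefix (group_id), so the index can be used:
EXPLAIN SELECT id FROM vote_record WHERE group_id = 1 AND status = 1;
-- Skips the leftmost column, so this index generally cannot be used:
EXPLAIN SELECT id FROM vote_record WHERE status = 1;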
Query SQL
You can find slow SQL statements by enabling the slow query log
Do not perform operations on columns: SELECT id FROM t WHERE age + 1 = 10. Any operation on a column causes a table scan, including database functions and computed expressions; when querying, move the computation to the right-hand side of the comparison (see the sketch after this list)
Keep SQL statements as simple as possible: one SQL statement can run on only one CPU; break large statements into small ones to reduce lock time; a single huge SQL statement can block the whole database
No SELECT *
Rewrite OR as IN: OR is O(n) while IN is O(log n); keep the number of IN values below about 200
Avoid functions and triggers; implement the logic in the application
Avoid LIKE '%xxx' queries: a leading wildcard cannot use an index
Use JOIN sparingly
Compare values of the same type, e.g. '123' with '123' and 123 with 123
Try to avoid using the != or <> operator in the WHERE clause, otherwise the engine will abandon the index and do a full table scan
For consecutive values, use BETWEEN instead of IN: SELECT id FROM t WHERE num BETWEEN 1 AND 5
Do not fetch the whole table for list data; use LIMIT to paginate, and keep each page reasonably small
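A hedged sketch of a few of these rewrites, using a hypothetical table t(id, age, num):
SQL
-- Column operation vs. index-friendly rewrite:
SELECT id FROM t WHERE age + 1 = 10;   -- expression on the column defeats the index
SELECT id FROM t WHERE age = 10 - 1;   -- computation moved to the constant side
-- OR rewritten as IN:
SELECT id FROM t WHERE num = 1 OR num = 3 OR num = 5;
SELECT id FROM t WHERE num IN (1, 3, 5);
-- Consecutive values: BETWEEN instead of IN:
SELECT id FROM t WHERE num BETWEEN 1 AND 5;
-- Page with LIMIT instead of fetching the whole table:
SELECT id FROM t ORDER BY id LIMIT 0, 100;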
Engine
At present, MyISAM and InnoDB engines are widely used:
MyISAM
The MyISAM engine was the default engine in MySQL 5.1 and earlier, with the following characteristics:
Row locks are not supported: reads take shared locks on all tables involved, and writes take exclusive locks on the tables being written
Transactions are not supported
Foreign keys are not supported
Crash-safe recovery is not supported
New records can still be inserted into a table while it is being read (concurrent inserts)
Supports full-text indexing of the first 500 characters of BLOB and TEXT columns
Supports delayed key writes, which greatly improve write performance
For tables that will not be modified, compressed tables are supported, greatly reducing disk space usage
InnoDB
InnoDB became the default storage engine as of MySQL 5.5, with the following characteristics:
Supports row locks and uses MVCC to support high concurrency
Supports transactions
Supports foreign keys
Supports crash-safe recovery
Full-text indexing is not supported (InnoDB only gained full-text indexes in MySQL 5.6)
Generally speaking, MyISAM is suitable for SELECT-intensive tables, while InnoDB is suitable for INSERT and UPDATE-intensive tables.
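As a hedged illustration of that rule of thumb: the engine is chosen per table at creation time, and an existing table can be converted. The table definitions below are made up for the example.
SQL
-- Read-mostly table (hypothetical):
CREATE TABLE article_archive (id INT PRIMARY KEY, body TEXT) ENGINE=MyISAM;
-- Write-heavy, transactional table (hypothetical):
CREATE TABLE orders (id INT PRIMARY KEY, amount DECIMAL(10,2)) ENGINE=InnoDB;
-- Convert an existing table to another engine:
ALTER TABLE article_archive ENGINE=InnoDB;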
0. My own practice of SQL optimization on massive data
First comes the process of building the tables and generating the data.
Reference: https://nsimple.top/archives/mysql-create-million-data.html
Sometimes we need to test against big data, and since we usually don't have that much data locally, we need to generate some ourselves. Below we take advantage of the characteristics of memory tables to generate millions of test rows.
First create a temporary memory table; inserting data into it is faster.
SQL
-- Create a temporary memory table
DROP TABLE IF EXISTS `vote_record_memory`;
CREATE TABLE `vote_record_memory` (
    `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    `user_id` VARCHAR(20) NOT NULL DEFAULT '',
    `vote_num` INT(10) UNSIGNED NOT NULL DEFAULT '0',
    `group_id` INT(10) UNSIGNED NOT NULL DEFAULT '0',
    `status` TINYINT(2) UNSIGNED NOT NULL DEFAULT '1',
    `create_time` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
    PRIMARY KEY (`id`),
    KEY `index_user_id` (`user_id`) USING HASH
) ENGINE=MEMORY AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
Then create a normal table, used as the test case to simulate big data:
SQL
DROP TABLE IF EXISTS `vote_record`;
CREATE TABLE `vote_record` (
    `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    `user_id` VARCHAR(20) NOT NULL DEFAULT '' COMMENT 'user id',
    `vote_num` INT(10) UNSIGNED NOT NULL DEFAULT '0' COMMENT 'vote count',
    `group_id` INT(10) UNSIGNED NOT NULL DEFAULT '0' COMMENT 'user group id: 0-inactive user 1-regular user 2-vip user 3-administrator user',
    `status` TINYINT(2) UNSIGNED NOT NULL DEFAULT '1' COMMENT 'status: 1-normal 2-deleted',
    `create_time` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00' COMMENT 'creation time',
    PRIMARY KEY (`id`),
    KEY `index_user_id` (`user_id`) USING HASH COMMENT 'user id hash index'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='voting record table';
For the randomness and authenticity of the data, we need a function that generates a random string of length n.
SQL
-- Create a function that generates a random string of length n
DELIMITER //  -- change the MySQL delimiter to '//'
DROP FUNCTION IF EXISTS `rand_string` //
SET NAMES utf8 //
CREATE FUNCTION `rand_string` (n INT) RETURNS VARCHAR(255) CHARSET 'utf8'
BEGIN
    DECLARE char_str VARCHAR(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    DECLARE return_str VARCHAR(255) DEFAULT '';
    DECLARE i INT DEFAULT 0;
    WHILE i < n DO
        SET return_str = CONCAT(return_str, SUBSTRING(char_str, FLOOR(1 + RAND()*62), 1));
        SET i = i + 1;
    END WHILE;
    RETURN return_str;
END //
For convenience, we also create a stored procedure that inserts the data.
SQL
-- Create the stored procedure that inserts data
DROP PROCEDURE IF EXISTS `add_vote_record_memory` //
CREATE PROCEDURE `add_vote_record_memory` (IN n INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE vote_num INT DEFAULT 0;
    DECLARE group_id INT DEFAULT 0;
    DECLARE status TINYINT DEFAULT 1;
    WHILE i < n DO
        SET vote_num = FLOOR(1 + RAND() * 10000);
        SET group_id = FLOOR(0 + RAND() * 3);
        SET status = FLOOR(1 + RAND() * 2);
        INSERT INTO `vote_record_memory` VALUES (NULL, rand_string(20), vote_num, group_id, status, NOW());
        SET i = i + 1;
    END WHILE;
END //
DELIMITER ;  -- change the MySQL delimiter back to ';'
Execute the stored procedure and wait for the data to be generated (generating 100,000 rows takes about 40 minutes).
SQL
-- Call the stored procedure to generate 1,000,000 rows
CALL add_vote_record_memory(1000000);
Query the records generated in the memory table (for the next test step, only 105,645 rows had been generated at this point):
SQL
SELECT count(*) FROM `vote_record_memory`;
-- count(*)
-- 105646
Insert the data from the memory table into the normal table (100,000 rows were inserted in about 13 s):
SQL
INSERT INTO vote_record SELECT * FROM `vote_record_memory`;
Query the records generated in the normal table:
SQL
SELECT count(*) FROM `vote_record`;
-- count(*)
-- 105646
If inserting into the normal table in one shot is too slow, you can insert in batches, which requires another stored procedure:
SQL
-- The parameter n is the number of rows to insert per call
-- lastid is the largest id already imported
DELIMITER //
CREATE PROCEDURE `copy_data_from_tmp` (IN n INT)
BEGIN
    DECLARE lastid INT DEFAULT 0;
    SELECT MAX(id) INTO lastid FROM `vote_record`;
    INSERT INTO `vote_record` SELECT * FROM `vote_record_memory` WHERE id > lastid LIMIT n;
END //
DELIMITER ;
Call the stored procedure:
SQL
-- Call the stored procedure to insert 600,000 rows
CALL copy_data_from_tmp(600000);
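Since copy_data_from_tmp resumes from the current MAX(id) in vote_record, calling it repeatedly copies the data over in batches. A small usage sketch (row counts here are illustrative):
SQL
CALL copy_data_from_tmp(100000);   -- copy the next 100,000 rows
SELECT COUNT(*) FROM vote_record;  -- verify progress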
Full table query
SELECT * FROM vote_record;
After the tables are built, enable the slow query log; refer to the example below, and then learn to use EXPLAIN. On Windows, the slow log is located on the C: drive by default. You can also record and inspect the slow log through a client tool instead of running the tests from the command line; otherwise a large result set will take a long time to print in the terminal.
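A minimal sketch of enabling the slow query log at runtime; these are standard MySQL system variables, but defaults and file locations vary by installation:
SQL
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;               -- threshold in seconds, as used below
SHOW VARIABLES LIKE 'slow_query_log_file';    -- where the log is written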
1. Full table scan: select * from vote_record
Slow log
# Time: 2018-06-15T03:52:58.804850Z
# User@Host: root[root] @ localhost [::1]  Id: 74
# Query_time: 3.166424  Lock_time: 0.000000  Rows_sent: 900500  Rows_examined: 999999
SET timestamp=1529034398;
select * from vote_record;
The query took more than 3 seconds, and the threshold I set was 1 second, so it was recorded.
EXPLAIN execution plan:
id  select_type  table        partitions  type  possible_keys  key   key_len  ref   rows    filtered  Extra
1   SIMPLE       vote_record  NULL        ALL   NULL           NULL  NULL     NULL  996507  100.00    NULL
The full table scan takes more than 3 seconds, and no index is used.
2. select * from vote_record where vote_num > 1000
Without an index, this is equivalent to a full table scan: about 3.5 seconds.
3. select * from vote_record where vote_num > 1000 (with an index)
Create the index:
CREATE INDEX vote ON vote_record (vote_num);
EXPLAIN to view the execution plan:
id  select_type  table        partitions  type  possible_keys  key   key_len  ref   rows    filtered  Extra
1   SIMPLE       vote_record  NULL        ALL   votenum,vote   NULL  NULL     NULL  996507  50.00     Using where
The index is still not used, because the condition does not match the leftmost prefix. The query still takes about 3.5 seconds.
Finally, modify the sql statement.
EXPLAIN SELECT * FROM vote_record WHERE id > 0 AND vote_num > 1000
id  select_type  table        partitions  type   possible_keys         key      key_len  ref   rows    filtered  Extra
1   SIMPLE       vote_record  NULL        range  PRIMARY,votenum,vote  PRIMARY  4        NULL  498253  50.00     Using where
An index is used now, but only the primary key index. Modify the statement again:
EXPLAIN SELECT * FROM vote_record WHERE id > 0 AND vote_num = 1000
id  select_type  table        partitions  type         possible_keys         key              key_len  ref   rows  filtered  Extra
1   SIMPLE       vote_record  NULL        index_merge  PRIMARY,votenum,vote  votenum,PRIMARY  4,4      NULL  51    100.00    Using intersect(votenum,PRIMARY); Using where
Two indexes are used, votenum and PRIMARY, combined with an index merge intersection. That is the reason.
Let's look at one more statement:
EXPLAIN SELECT * FROM vote_record WHERE id = 1000 AND vote_num > 1000
id  select_type  table        partitions  type   possible_keys    key      key_len  ref    rows  filtered  Extra
1   SIMPLE       vote_record  NULL        const  PRIMARY,votenum  PRIMARY  4        const  1     100.00    NULL
Only the primary key index is used. This is because only the leftmost prefix of an index can be used with a range operator such as > or <.
Second, how MySQL optimizes IN subqueries
The example below uses a Student table and an SC score table in a database named YSB, with this query:
select * from Student s where s.s_id in (select s_id from SC sc where sc.c_id = 0 and sc.score = 100)
EXPLAIN shows that the optimizer rewrites the IN subquery into an EXISTS form:
select `YSB`.`s`.`s_id` from `YSB`.`Student` `s` where <in_optimizer>(`YSB`.`s`.`s_id`, <exists>(select 1 from `YSB`.`SC` `sc` where ((`YSB`.`sc`.`c_id` = 0) and (`YSB`.`sc`.`score` = 100) and (<cache>(`YSB`.`s`.`s_id`) = `YSB`.`sc`.`s_id`))))
Addendum: some readers asked how to view the optimizer-rewritten statement.
The method is as follows: execute the query with EXPLAIN EXTENDED in the command window, and then run SHOW WARNINGS.
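A minimal sketch using this article's Student/SC example (note that EXPLAIN EXTENDED is deprecated in MySQL 5.7, where a plain EXPLAIN followed by SHOW WARNINGS has the same effect):
SQL
EXPLAIN EXTENDED SELECT * FROM Student s
WHERE s.s_id IN (SELECT s_id FROM SC sc WHERE sc.c_id = 0 AND sc.score = 100);
SHOW WARNINGS;  -- prints the statement as rewritten by the optimizer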
The plan shows type=ALL, i.e. a full table scan.
By my original reasoning, the SQL should execute the subquery first:
Select s_id from SC sc where sc.c_id = 0 and sc.score = 100
Time: 0.001s
The results are as follows:
And then execute:
select s.* from Student s where s.s_id in (7, 29, 5000)
Time: 0.001s
This is quite fast. But MySQL does not execute the inner query first; it optimizes the SQL into an EXISTS clause, and DEPENDENT SUBQUERY appears in the plan.
MySQL executes the outer query first and then runs the inner query once for each outer row, so it loops hundreds of thousands of times.
What about using join queries instead?
SELECT s.* from Student s INNER JOIN SC sc on sc.s_id = s.s_id where sc.c_id=0 and sc.score=100
Here, to re-analyze the join query, temporarily drop the indexes sc_c_id_index and sc_score_index.
Execution time: 0.057s
Efficiency has improved. Take a look at the execution plan:
The plan shows a table join here. Let's see whether it is worth creating an index on the s_id column of the SC table:
CREATE index sc_s_id_index on SC (s_id)
Show index from SC
Perform the join query again:
Time: 1.076s. The time actually became longer. Why? Look at the execution plan:
The optimized query statement is:
select `YSB`.`s`.`s_id` AS `s_id`, `YSB`.`s`.`name` AS `name` from `YSB`.`Student` `s` join `YSB`.`SC` `sc` where ((`YSB`.`sc`.`s_id` = `YSB`.`s`.`s_id`) and (`YSB`.`sc`.`score` = 100) and (`YSB`.`sc`.`c_id` = 0))
It seems that the join query is done first, and then the where filtering is performed.
Go back to the previous execution plan:
Here, the WHERE filtering is performed first, and then the tables are joined. The execution plan is not fixed, so let's first look at the standard SQL execution order. Normally the JOIN would be performed before the WHERE filtering, but in our case, joining first would feed 700,000 rows into the join, so filtering with WHERE first is the wise choice. Now, to rule out MySQL's own query optimization, I write an optimized SQL statement myself:
SELECT s.* FROM (SELECT * FROM SC sc WHERE sc.c_id = 0 AND sc.score = 100) t INNER JOIN Student s ON t.s_id = s.s_id
That is, first filter the SC table and then join; the execution time is 0.054s, about the same as before the s_id index was created.
View the execution plan:
Extracting from SC first and then joining tables is much more efficient. The problem now is that a full table scan appears when extracting from SC, so clearly the relevant indexes need to be created:
CREATE index sc_c_id_index on SC (c_id);
CREATE index sc_score_index on SC (score);
Then execute the query:
SELECT s.* FROM (SELECT * FROM SC sc WHERE sc.c_id = 0 AND sc.score = 100) t INNER JOIN Student s ON t.s_id = s.s_id
The execution time is 0.001s, which is 50 times faster.
The execution plan:
We can see that indexes are used to extract rows from SC before the tables are joined.
So let's execute the earlier SQL again:
SELECT s.* from Student s INNER JOIN SC sc on sc.s_id = s.s_id where sc.c_id=0 and sc.score=100
Execution time 0.001s
The execution plan:
Here MySQL optimizes the query statement: it performs the WHERE filtering first and then the join, and indexes are used throughout.
Summary:
1. MySQL nested subqueries are indeed relatively inefficient.
2. They can be optimized into join queries.
3. Build appropriate indexes.
4. Learn to analyze the SQL execution plan; MySQL rewrites and optimizes your SQL, so analyzing the execution plan matters.
For lack of time, this article stops here; more SQL optimization experience will be shared in later posts.
Third, how to use the primary key index to optimize paged queries over massive data
MySQL million-row paging optimization: ordinary paging
Data paging is very common in web pages. Paging usually uses LIMIT start, count, where start is the row offset calculated from the page number.
select * from user limit start, count;
-- e.g. page 1: select * from user limit 0, 20;
-- page 2: select * from user limit 20, 20;
When paging reaches the hundreds of thousands of rows, efficiency becomes quite low: MySQL has to count through all rows from the beginning up to the offset, which greatly affects performance.
select * from user limit 100001, 20;
explain select * from user limit 100001, 20;
Parsing the statement with EXPLAIN shows that no index is used and MySQL examines about 160,000 rows, so we can use the index to implement paging.
Optimize paging
Use primary key indexes to optimize data paging
select * from user where id > (select id from user where id >= 100000 limit 1) limit 20;   -- time: 0.003s
Parsing with EXPLAIN shows that MySQL scanned only 8 rows this time, and the time dropped sharply.
explain select * from user where id > (select id from user where id >= 100000 limit 1) limit 20;
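As a related sketch (not from the original article), the same idea can be written as a "deferred join" that paginates on the primary key index first and only then fetches the full rows, assuming id is user's auto-increment primary key:
SQL
select u.* from user u
inner join (select id from user order by id limit 100000, 20) t on u.id = t.id;
This derived-table form is common partly because MySQL does not allow LIMIT directly inside an IN subquery.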
Summary
When the amount of data is large, try to use indexes to optimize statements. If the column used in the optimization above is not the primary key index, the query can be even slower than plain paging. It is best to analyze the statement with EXPLAIN first to see its execution order and performance.
Reproduced at: https://my.oschina.net/alicoder/blog/3097141