This article is about how to optimize a MySQL anti-join (NOT IN / NOT EXISTS) query. The editor thinks it is quite practical, so it is shared here for reference; follow along and have a look.
Overall, this environment is fairly busy, with around 200 threads:
# mysqladmin processlist | wc -l
235
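For a quick cross-check, the same picture can be had from the server's status counters; a small sketch using standard MySQL commands:

-- Hedged cross-check of connection/thread pressure via status variables
show global status like 'Threads_connected';
show global status like 'Threads_running';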
Out of curiosity I checked the slow log and immediately located this statement (the details below have been desensitized).
# Time: 161013 9:51:45
# User@Host: root [root] @ localhost []
# Thread_id: 24630498 Schema: test Last_errno: 1160 Killed: 0
# Query_time: 61213.561106 Lock_time: 0.000082 Rows_sent: 7551 Rows_examined: 201945890920 Rows_affected: 0 Rows_read: 7551
# Bytes_sent: 0 Tmp_tables: 1 Tmp_disk_tables: 0 Tmp_table_sizes: 0
# InnoDB_trx_id: 2F8E5A82
SET timestamp=1476323505;
select account from t_fund_info
where money >= 300 and account not in
  (select distinct(login_account) from t_user_login_record where login_time >= '2016-06-01')
into outfile '/tmp/data.txt';
From the slow log, the execution time was 61213 seconds, roughly 17 hours, which is quite astonishing: the statement had effectively been running all day. That aroused my curiosity, and this problem needed solving.
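As background, whether such statements land in the slow log at all depends on the slow-log settings; a quick sanity check, as a sketch:

-- Confirm the slow log is on and see the threshold that caught this statement
show variables like 'slow_query_log%';
show variables like 'long_query_time';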
The table t_fund_info has nearly 2 million rows, with a primary key on the id column and a unique index on account.
CREATE TABLE `t_fund_info` (
...
PRIMARY KEY (`id`),
UNIQUE KEY `account` (`account`)
) ENGINE=InnoDB AUTO_INCREMENT=1998416 DEFAULT CHARSET=utf8
Table t_user_login_record has more than 20 million rows, with a primary key on the id column.
CREATE TABLE `t_user_login_record` (
...
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=22676193 DEFAULT CHARSET=utf8
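Since the original DDL elides the column lists, here is a self-contained approximation for anyone reproducing the experiment; the column types are assumptions, not the production definitions:

-- Hypothetical reconstruction of the two tables (types assumed)
CREATE TABLE `t_fund_info` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `account` varchar(50) NOT NULL,          -- assumed type
  `money` decimal(16,2) DEFAULT NULL,      -- assumed type
  PRIMARY KEY (`id`),
  UNIQUE KEY `account` (`account`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `t_user_login_record` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `login_account` varchar(50) DEFAULT NULL,  -- assumed type
  `login_time` datetime DEFAULT NULL,        -- assumed type
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;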
From the statement it is clear this is a large batch query meant to dump its results into a text file, but the filter conditions are loose, so as written it is bound to scan full tables.
First, a quick look at the filter condition on t_fund_info: it alone eliminates the vast majority of rows, leaving just over 13,000, which is better than expected.
> select count(*) from t_fund_info where money >= 300;
+----------+
| count(*) |
+----------+
|    13528 |
+----------+
1 row in set (0.99 sec)
So the bottleneck of this problem appears to be the subquery that follows. Put the following statement into a SQL script, query.sql:
select distinct(login_account) from t_user_login_record where login_time >= '2016-06-01'
It takes about 1 minute to export the data.
# time mysql test < query.sql > query_rt.log
real    0m59.149s
user    0m0.394s
sys     0m0.046s
The filtered data comes to just over half a million rows, still a fairly manageable result:
# less query_rt.log | wc -l
548652
Now let us see how the optimizer actually rewrites the not in condition in this statement.
explain extended select account from t_fund_info
where money >= 300 and account not in
  (select distinct(login_account) from t_user_login_record where login_time >= '2016-06-01');
show warnings;
The results are as follows:
Message: select `test`.`t_fund_info`.`account` AS `account` from `test`.`t_fund_info` where ((`test`.`t_fund_info`.`money` >= 300) and (not(<in_optimizer>(`test`.`t_fund_info`.`account`, <exists>(select distinct 1 from `test`.`t_user_login_record` where ((`test`.`t_user_login_record`.`login_time` >= '2016-06-01') and (((<cache>(`test`.`t_fund_info`.`account`) = `test`.`t_user_login_record`.`login_account`) or isnull(`test`.`t_user_login_record`.`login_account`)))) having <is_not_null_test>(`test`.`t_user_login_record`.`login_account`))))))
You can see that the rewrite is very involved: the not in is turned into a null-aware exists check, and a simple statement becomes this complex after parsing.
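An aside before continuing: one classic way to sidestep this null-aware machinery is to express the anti-join as a correlated not exists, which carries no NULL-checking baggage. A sketch (not the route the article takes next, but worth knowing):

-- Anti-join expressed as NOT EXISTS; semantics differ from NOT IN only when
-- login_account can be NULL
select f.account
from t_fund_info f
where f.money >= 300
  and not exists (select 1
                  from t_user_login_record r
                  where r.login_account = f.account
                    and r.login_time >= '2016-06-01');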
Because MySQL leaves much less room for optimizer tuning than Oracle, I decided to optimize step by step. Since this environment is important, I used mysqldump to export the data from a replica, imported it into a separate test environment, and experimented freely there.
First, for the not in part: suspecting that repeatedly evaluating the subquery was too expensive, I decided to create a table to cache the subquery's result set.
> create table test_tab as select distinct(login_account) login_account
  from t_user_login_record where login_time >= '2016-06-01';
Query OK, 548650 rows affected (1 min 3.78 sec)
Records: 548650  Duplicates: 0  Warnings: 0
Queries against this cached table are cheap; a count returns in well under a second.
> select count(*) from test_tab;
+----------+
| count(*) |
+----------+
|   548650 |
+----------+
1 row in set (0.38 sec)
Then check whether the query is rewritten any better when it uses the cached table.
explain extended select account from t_fund_info
where money >= 300 and account not in (select login_account from test_tab);
show warnings;
It turns out the cached table brings little improvement; the rewrite looks much the same.
| Note | 1003 | select `test`.`t_fund_info`.`account` AS `account` from `test`.`t_fund_info` where ((`test`.`t_fund_info`.`money` >= 300) and (not(<in_optimizer>(`test`.`t_fund_info`.`account`, <exists>(select 1 from `test`.`test_tab` where (((<cache>(`test`.`t_fund_info`.`account`) = `test`.`test_tab`.`login_account`) or isnull(`test`.`test_tab`.`login_account`))) having <is_not_null_test>(`test`.`test_tab`.`login_account`)))))) |
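As a version-dependent aside: MySQL 5.6 and later can materialize such subqueries into a hashed temporary table on their own, and whether that strategy is available can be checked via the optimizer switches (a sketch; the environment in this article appears to be older):

-- Look for materialization=on in the output (MySQL 5.6+)
select @@optimizer_switch;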
Was the anti-join slow because the subquery's data set was too large, making each probe back into the table expensive? I then tried to cut down the rows touched in the subquery.
select account from t_fund_info
where money >= 300 and not exists
  (select login_account from test_tab where login_account = t_fund_info.account limit 1,10);
This approach was still hopelessly slow; after nearly half an hour with no response, I gave up on it decisively.
Was the filtering query on t_fund_info causing the performance problem? Let us cache that side in a table as well.
> create table test_tab1 as select account from t_fund_info
  -> where money >= 300;
Query OK, 13528 rows affected (1.38 sec)
Records: 13528  Duplicates: 0  Warnings: 0
Querying again, the effect is still not ideal.
select account from test_tab1
where not exists (select login_account from test_tab where login_account = test_tab1.account limit 1,10);
This ran for more than 20 minutes with no response, so again I gave up decisively.
At this point the natural next thought is an index. Create an index on the cached table test_tab.
> create index ind_tmp_login_account on test_tab(login_account);
Query OK, 0 rows affected (4.31 sec)
Records: 0  Duplicates: 0  Warnings: 0
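In hindsight, the cache table and its index can be built in a single statement; a sketch, with the column type assumed since the original DDL is elided:

-- CREATE TABLE ... AS SELECT with the index declared up front (type assumed)
create table test_tab (
  login_account varchar(50),
  key ind_tmp_login_account (login_account)
) as select distinct(login_account) login_account
    from t_user_login_record
    where login_time >= '2016-06-01';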
Create an index on test_tab1 as well.
> create index ind_tmp_account on test_tab1(account);
Query OK, 0 rows affected (0.18 sec)
Records: 0  Duplicates: 0  Warnings: 0
Running the query once more, performance suddenly becomes excellent, with a running time of 0.15 seconds. Hard to believe.
explain select account from test_tab1
where not exists (select login_account from test_tab where login_account = test_tab1.account);
11364 rows in set (0.15 sec)
The execution plan (a screenshot in the original article, omitted here) shows the subquery now being resolved through the new index.
Through this splitting, hypothesizing, and eliminating, some real leads emerged, and the nature of the problem began to come into focus.
First, drop the index on test_tab1 and see how it behaves.
> alter table test_tab1 drop index ind_tmp_account;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0
The statement is as follows, and the execution time is still 0.15 seconds:
select account from test_tab1
where not exists (select login_account from test_tab where login_account = test_tab1.account);
...
11364 rows in set (0.15 sec)
Would not in behave very differently? It took 0.18 seconds: a measurable difference, but a small one.
select account from test_tab1
where account not in (select login_account from test_tab);
...
11364 rows in set (0.18 sec)
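One caveat worth checking before treating not in and not exists as interchangeable: if the subquery column contains even one NULL, not in returns an empty result while not exists does not. A quick sanity check, as a sketch:

-- NOT IN and NOT EXISTS only agree when the probed column has no NULLs
select count(*) from test_tab where login_account is null;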
Now gradually restore the original query by removing the cached table test_tab1; the whole query takes 1.12 seconds.
select account from t_fund_info
where money >= 300 and account not in (select login_account from test_tab);
...
11364 rows in set (1.12 sec)
The content parsed using explain extended is as follows:
Message: select `test`.`t_fund_info`.`account` AS `account` from `test`.`t_fund_info` where ((`test`.`t_fund_info`.`money` >= 300) and (not(<in_optimizer>(`test`.`t_fund_info`.`account`, <exists>(<index_lookup>(<cache>(`test`.`t_fund_info`.`account`) in test_tab on ind_tmp_login_account checking NULL having <is_not_null_test>(`test`.`test_tab`.`login_account`)))))))
At this point the problem is essentially pinned down: in this anti-join scenario, the subquery's table needs an index on login_account, the column matched against the outer query's field, so that each probe becomes an index lookup and query efficiency improves dramatically.
Of course, on a system with a large data volume and a busy workload, adding an ad-hoc index may not be a good option. But we are in a test environment, so we can try it.
> create index ind_tmp_account1 on t_user_login_record(login_account);
Query OK, 0 rows affected (4 min 45.48 sec)
Records: 0  Duplicates: 0  Warnings: 0
Creating the index took nearly five minutes. Now, using the very first query statement, what is the performance like?
select account from t_fund_info
where money >= 300 and account not in
  (select distinct(login_account) from t_user_login_record where login_time >= '2016-06-01');
...
11364 rows in set (2.52 sec)
A query that previously ran for some 17 hours now completes in 2.52 seconds: an enormous difference.
Then again, a batch query like this can be run on a replica, and creating such an index on the replica is a perfectly good choice, as is creating a temporary table and indexing it. That is the real point of the temporary table here: not to cache the query result, but to carry an index that makes the data filtering efficient.
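Putting the pieces together, the replica-side recipe amounts to a few statements; a sketch using a hypothetical helper table name tmp_login:

-- 1. Materialize the subquery once
create table tmp_login as
  select distinct(login_account) login_account
  from t_user_login_record
  where login_time >= '2016-06-01';
-- 2. Index the join column so each outer row costs one B-tree probe
create index ind_tmp_login on tmp_login(login_account);
-- 3. Run the batch query against the indexed helper table
select account from t_fund_info
where money >= 300 and account not in (select login_account from tmp_login);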
A fair question remains: the temporary table has only one column, so what is the point of an index on it? Let me explain the flow (the original article illustrated it with a diagram).
First, the query is driven by the filter condition on t_fund_info, which cuts the 2 million rows down to just over 13,000. For each of those rows, the anti-join probes the subquery table on the condition account = login_account; with an index on login_account, each probe is a cheap index lookup rather than a scan, and the subquery's login_time filter is applied after the account match. Finally the not in logic discards the matching rows, so the result set shrinks sharply. The outer side now drives only about 13,000 indexed probes instead of repeatedly scanning a table of hundreds of thousands of rows, which is where the huge performance difference comes from.
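A rough back-of-the-envelope calculation makes the gap concrete (using the counts measured above): without an index, roughly 13,528 outer rows each compared against up to 548,650 subquery rows is on the order of 13,528 × 548,650 ≈ 7.4 billion comparisons; with a B-tree index on login_account, each outer row costs about log2(548,650) ≈ 20 steps, so roughly 13,528 × 20 ≈ 270,000 steps in total. That is a factor of tens of thousands less work, consistent with the observed drop from hours to seconds.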
Thank you for reading! This concludes this article on "how to optimize a MySQL anti-join". I hope the content above has been of some help and that you have learned something from it; if you think the article is good, share it for more people to see!