2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article looks at which is faster in MySQL, count(*) or count(1). It should be a useful reference for anyone interested in the question.
The conclusion first: there is little difference in performance between the two.
1. Practice
I have prepared a table with 1,000,000 rows (100W) of data. The table structure is as follows:
CREATE TABLE `user` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(255) DEFAULT NULL,
  `address` varchar(255) DEFAULT NULL,
  `password` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
As you can see, the table has a primary key index. Let's count the rows in the table in two ways: select count(*) from user; and select count(1) from user;.
The two SQL statements run in about the same time, 0.14s each.
Let's also look at a few other ways of counting: by id, by username, and by address.
id is the primary key, while username and address are ordinary fields. Counting by id shows a slight advantage. Because the test data set here is relatively small, the effect is not obvious; if you increase the amount of test data, the difference becomes more noticeable.
So what is the reason for this difference? Next, let's briefly analyze it.
2. Explain Analysis
Let's first use explain to look at the execution plans of these SQL statements:
The first three counting methods share one execution plan, and the last two share another. Let's go through the explain columns that differ:
type: for the first three queries the value is index, which means a full index scan, that is, traversing the entire index (note that this is the index, not the entire table); for the last two the value is ALL, which means a full table scan with no index used.
key: the index MySQL decided to use to access the table. PRIMARY means the primary key index is used, and NULL means no index is used.
key_len: the length in bytes of the key MySQL uses. Our primary key is an INT declared NOT NULL, so the value is 4.
Extra: Using index means the optimizer only needs to access the index to get the data it needs (no need to go back to the table).
From explain alone we can already roughly see that the first three counting methods are more efficient (because an index is used), while the last two are less efficient (no index is used and a full table scan is needed). This analysis is not enough on its own, so let's also look at it from a theoretical point of view.
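To make the key_len value less mysterious, here is a minimal Python sketch of how it is commonly computed for a single-column index. The function estimate_key_len is a hypothetical helper written for this article, not part of MySQL. It assumes the whole column is indexed, that utf8mb4 needs up to 4 bytes per character, that a VARCHAR adds a 2-byte length prefix, and that a nullable column adds 1 byte.

```python
# Hypothetical helper: estimates EXPLAIN's key_len for a single-column index.
# Assumptions (not taken from the article): utf8mb4 uses up to 4 bytes per
# character, VARCHAR adds a 2-byte length prefix, and a nullable column adds
# 1 byte for the NULL flag.

INT_SIZES = {"tinyint": 1, "smallint": 2, "mediumint": 3, "int": 4, "bigint": 8}


def estimate_key_len(col_type, nullable, length=0, bytes_per_char=4):
    """Return the estimated key length in bytes for one indexed column."""
    if col_type in INT_SIZES:
        size = INT_SIZES[col_type]
    elif col_type == "varchar":
        size = length * bytes_per_char + 2
    else:
        raise ValueError(f"unsupported column type: {col_type}")
    return size + (1 if nullable else 0)


# `id` int NOT NULL, as in the table above
print(estimate_key_len("int", nullable=False))
# `username` varchar(255) DEFAULT NULL with utf8mb4, if it were indexed in full
print(estimate_key_len("varchar", nullable=True, length=255))
```

Under these assumptions the primary key gives 4, matching the explain output described above, and a full index on username would give 255 * 4 + 2 + 1 = 1023.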
3. Principle Analysis
3.1 Primary Key Index and Ordinary Index
Before starting the principle analysis, let's take a look at the B+ tree, which is important for understanding what comes next. As we all know, indexes in InnoDB are stored as B+ trees (what a B+ tree is, and how it differs from a B tree, deserves an article of its own and is not discussed here), and a primary key index is stored differently from an ordinary index.
The following figure shows the primary key index:
In the primary key index, the leaf nodes hold the full data of each row. In an ordinary index, the leaf nodes hold only the primary key value. When we search for data through an ordinary index, we first find the primary key in its leaf node, and then use that primary key to find the row in the primary key index, which amounts to two lookups. This is what is usually called going back to the table.
3.2 Principle Analysis
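The two-step lookup can be sketched with a toy model in Python. This is only an illustration: plain dicts stand in for the B+ trees, the sample rows are made up, and find_by_username is a hypothetical helper, not anything InnoDB exposes.

```python
# Toy model of the two index kinds, using dicts instead of real B+ trees.
# The rows are invented sample data for illustration only.
rows = [
    {"id": 1, "username": "alice", "address": "Shanghai"},
    {"id": 2, "username": "bob", "address": "Beijing"},
    {"id": 3, "username": "carol", "address": "Shenzhen"},
]

# Primary key (clustered) index: the "leaf" holds the whole row.
clustered = {row["id"]: row for row in rows}

# Ordinary (secondary) index on username: the "leaf" holds only primary keys.
secondary_username = {}
for row in rows:
    secondary_username.setdefault(row["username"], []).append(row["id"])


def find_by_username(name):
    """Look up rows by username and report how many index lookups were made."""
    lookups = 1
    primary_keys = secondary_username.get(name, [])  # lookup 1: ordinary index
    found = []
    for pk in primary_keys:
        lookups += 1
        found.append(clustered[pk])  # lookup 2: back to the primary key index
    return found, lookups


found, lookups = find_by_username("bob")
print(found[0]["address"], lookups)
```

For a single matching row the sketch performs two lookups, one in each index, which is the extra cost the paragraph above describes.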
You may have noticed that when we learn MySQL, count is classified as an aggregate function, alongside avg, sum, and so on. Since it is an aggregate function, it has to evaluate the result set row by row, which raises a question: what is returned for each row? Let's look at each query in turn.

For select count(1) from user;, the InnoDB engine picks the smallest index tree to traverse (not necessarily the primary key index). It does not read the row data; for each leaf record it returns 1, and the results are accumulated.

For select count(id) from user;, InnoDB traverses the entire primary key index, reads the id, and returns it. Because id is the primary key and sits in the leaf nodes of the B+ tree, the process involves no random IO (there is no need to go back to the table to fetch data pages), so performance is acceptable.

For select count(username) from user;, InnoDB scans the entire table, reads the username field of each row, and returns it. If username was defined as NOT NULL, the values are counted directly; if it allows NULL, each value is first checked for NULL, and only non-NULL values are counted.

Finally, select count(*) from user;. What is special about this statement is that MySQL optimizes it. When MySQL sees count(*), it knows that you want the total number of records, so it picks the smallest index tree to traverse and counts the records. Because the leaf nodes of a primary key index (clustered index) hold the row data, while the leaf nodes of an ordinary index hold only primary key values, the tree of an ordinary index is smaller. In the case above, however, the table has only a primary key index, so that is the one that ends up being used.

Now suppose we modify the table above and add an index on the username field, and then look at the execution plan of explain select count(*) from user;:
The index used this time is the username index, which is consistent with the analysis above. To summarize the four queries in the order they were described: count(1) has the highest performance; count(id) comes second (because the id has to be read and returned); count(username) comes third (because it needs a full table scan); and count(*) performs about the same as count(1).
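The difference in what each form of count returns can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, on the assumption that the NULL handling of count is standard SQL and behaves the same way in MySQL. It says nothing about InnoDB's index choice or performance, and the four sample rows are made up.

```python
import sqlite3

# SQLite is only a stand-in here to show how COUNT treats NULL values;
# it does not model InnoDB's index selection or performance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT, address TEXT)"
)
# Invented sample rows: some usernames and addresses are NULL.
conn.executemany(
    "INSERT INTO user (username, address) VALUES (?, ?)",
    [("alice", "Shanghai"), (None, "Beijing"), ("carol", None), (None, None)],
)


def count(expr):
    """Run SELECT COUNT(expr) FROM user; expr comes from the fixed list below."""
    return conn.execute(f"SELECT COUNT({expr}) FROM user").fetchone()[0]


counts = {expr: count(expr) for expr in ("*", "1", "id", "username", "address")}
print(counts)
```

With these rows, count(*), count(1), and count(id) all report 4, while count(username) and count(address) each report 2, because rows where the column is NULL are skipped. So the forms are not always interchangeable: on a nullable column, count(column) answers a different question.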
4. What about MyISAM?
As some of you may know, select count(*) from user; is very fast on the MyISAM engine, because MyISAM stores the table's row count directly on disk and simply reads it when needed. MyISAM can do this mainly because it does not support transactions, so keeping the count is easy: add one whenever a row is inserted. But our commonly used InnoDB cannot do this. Why? Because InnoDB supports transactions. To support them, InnoDB uses MVCC (multi-version concurrency control), and it has to guard against problems such as dirty reads, non-repeatable reads, and phantom reads, so different sessions may legitimately see different sets of rows at the same moment. InnoDB therefore has to examine each row to decide whether it is visible to the current session: if it is, the row is counted; otherwise it is not.
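Why a single stored row count cannot serve every session can be sketched with a deliberately simplified Python model. Everything here is an assumption made for illustration: RowVersion, visible, and mvcc_count are hypothetical names, and a snapshot is reduced to a set of transaction ids whose changes the session can see. Real InnoDB uses read views and undo logs, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    created_by: int                   # id of the transaction that inserted the row
    deleted_by: Optional[int] = None  # id of the transaction that deleted it, if any


def visible(row, visible_trx_ids):
    """A row counts if its insert is visible and its delete (if any) is not."""
    inserted = row.created_by in visible_trx_ids
    deleted = row.deleted_by is not None and row.deleted_by in visible_trx_ids
    return inserted and not deleted


def mvcc_count(rows, visible_trx_ids):
    """Count rows one by one, checking visibility for this session's snapshot."""
    return sum(1 for row in rows if visible(row, visible_trx_ids))


# Invented sample data: transaction 3 deleted one row, transaction 4 added two.
rows = [
    RowVersion(created_by=1),
    RowVersion(created_by=1, deleted_by=3),
    RowVersion(created_by=2),
    RowVersion(created_by=4),
    RowVersion(created_by=4),
]

session_a = {1, 2}        # snapshot that cannot see transactions 3 and 4
session_b = {1, 2, 3, 4}  # snapshot that sees all four transactions

print(mvcc_count(rows, session_a), mvcc_count(rows, session_b))
```

In this model the two sessions get different counts from the same stored rows (3 and 4), which is why the count has to be computed per session by checking each row.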