In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
Today, the editor will share with you the relevant knowledge about how to use group by in MySql. The content is detailed and the logic is clear. I believe most people still know too much about this knowledge, so share this article for your reference. I hope you can get something after reading this article. Let's take a look at it.
We often use group by in our daily development. Dear friends, do you know how group by works? What's the difference between group by and having? What is the optimization idea of group by? What are the problems you need to pay attention to when using group by? This article will learn with you to conquer group by~.
A simple example of using group by
How group by works
The difference between group by + where and group by + having
Optimization idea of group by
Pay attention to the use of group by
How to optimize a slow production SQL
[related recommendation: mysql video tutorial]
1. A simple example of using group by
Group by is generally used for grouping statistics, and the logic it expresses is to group according to certain rules. Let's start with a simple example and review it together.
Suppose you use an employee table with the following structure:
CREATE TABLE `room` (`id` bigint (11) NOT NULL AUTO_INCREMENT COMMENT 'key id', `id_ card` varchar (20) NOT NULL COMMENT' ID number', `name` varchar (64) NOT NULL COMMENT 'name', `age`int (4) NOT NULL COMMENT 'age', `city`varchar (64) NOT NULL COMMENT 'city', PRIMARY KEY (`id`) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 COMMENT=' employee table'
The data of the table stock are as follows:
We now have a need to count the number of employees in each city. The corresponding SQL statement can be written as follows:
Select city, count (*) as num from staff group by city
The implementation results are as follows:
The logic of this SQL statement is clear, but what is its underlying execution flow?
2. Group by principle analysis 2.1explain analysis
Let's use explain to check the execution plan first.
Explain select city, count (*) as num from staff group by city
The Using temporary of the Extra field indicates that a temporary table was used when performing the grouping
The Using filesort of the Extra field indicates that sorting is used
How does group by use temporary tables and sorting? Let's take a look at the execution flow of this SQL.
2.2 simple execution flow of group by: explain select city, count (*) as num from staff group by city
Let's take a look at the execution process of this SQL.
Create a temporary memory table with two fields city and num
Scan the records of staff in the whole table and take out the records of city ='X' in turn.
Determine whether there are rows in the temporary table that are city='X', and insert a record if not (XPhone1)
If there is a row of city='X' in the temporary table, increase the num value of the row x by 1
After the traversal is completed, the result set is sorted according to the field city and returned to the client.
The execution diagram of this process is as follows:
What is the sort of temporary table?
That is, put the fields that need to be sorted into sort buffer, and then return. Note here that sorting is divided into full-field sorting and rowid sorting.
If it is a full-field sort, you need to put all the fields returned by the query into sort buffer, and then return them directly according to the sort fields.
In the case of rowid sorting, just put the sorted fields into sort buffer, then return to the table one more time, and then return.
How to determine whether it is a full-field sort or a rowid sort? Controlled by a database parameter, max_length_for_sort_data
For those of you who are interested in sorting, you can read my article.
Take a look at it and understand: order by detailed explanation
3. The difference between where and having
The execution flow of group by + where
The execution flow of group by + having
At the same time, the execution order of where, group by and having
3.1Execution flow of group by + where
Some partners think that the SQL in the previous section is too simple, if the where condition is added, and the where condition column is indexed, what is the implementation process?
OK, let's add a condition to it and add an index of idx_age, as follows:
Select city, count (*) as num from staff where age > 30 group by city;// indexed alter table staff add index idx_age (age)
Let's take a look at the expain analysis:
Explain select city, count (*) as num from staff where age > 30 group by city
From the execution of the plan results from explain, we can see that the query condition hits the index of idx_age, and temporary tables and sorting are used.
Using index condition: indicates index push-down optimization, filtering data according to the index as much as possible, and then returning it to the server layer to filter according to other where conditions. Why is there an index push-down for a single index here? The presence of explain does not necessarily mean that index push-down is used, it just means that it can be used, but not necessarily used. If you have any thoughts or questions, you can join me on Wechat for discussion.
The execution process is as follows:
1. Create a temporary memory table with two fields, city and num.
2. Scan the index tree idx_age to find the primary key ID that is older than 30 years old.
3. Through the primary key ID, go back to the table and find city ='X'
Determine whether there are rows in the temporary table that are city='X', and insert a record if not (XPhone1)
If there is a row of city='X' in the temporary table, increase the num value of the row x by 1
4. Continue to repeat steps 2 and 3 to find all the data that meet the conditions.
5. Finally, the result set is sorted according to the field city and returned to the client.
3.2 execution of group by + having
If you want to check the number of employees in each city and get cities with no less than 3 employees, having can solve your problem very well, SQL wrote:
Select city, count (*) as num from staff group by city having num > = 3
The query results are as follows:
Having is called a packet filter condition, and it operates on the returned result set.
3.3.The execution sequence of where, group by and having at the same time
If a SQL contains both where, group by, and having clauses, what is the order of execution?
For example, this SQL:
Select city, count (*) as num from staff where age > 19 group by city having num > = 3
Execute the where clause to find the data of employees over the age of 19
The group by clause groups employee data by city.
For the city group formed by the group by clause, run the aggregation function to calculate the number of employees in each group
Finally, use the having clause to select the city group with more than or equal to 3 employees.
3.4 Summary of differences between where and having
The having clause is used for filtering after grouping, and the where clause is used for row conditional filtering.
Having usually appears together with group by and aggregate functions, such as (count (), sum (), avg (), max (), min ()).
Aggregate functions cannot be used in where conditional clauses, but having clauses can.
Having can only be used after group by, and where executes before group by
4. Problems to pay attention to when using group by
There are several main points to pay attention to when using group by:
Does group by have to be used with aggregate functions?
The field of group by must appear in select.
Slow SQL problems caused by group by
Does group by have to be used with aggregate functions?
Group by means grouping statistics, which is usually used with aggregate functions such as (count (), sum (), avg (), max (), min ()).
Number of count ()
Sum of sum ()
Avg () average
Maximum value of max ()
Minimum min ()
Is it okay if it is not used with the aggregate function?
I use Mysql 5.7. it's okay. No error is reported, and the first row of data in the packet is returned.
For example, this SQL:
Select city,id_card,age from staff group by city
The query result is
By comparison, what is returned is the first piece of data of each group.
Of course, group by is usually used with aggregate functions, unless there are some special scenarios, such as if you want to remove duplicates, it is also possible to reuse distinct.
4.2 the fields that follow group by must appear in select.
Not necessarily, such as the following SQL:
Select max (age) from staff group by city
The implementation results are as follows:
The grouping field city is not followed by select and will not report an error. Of course, this may have something to do with different databases and different versions. When you use it, you can verify it first. There is a saying that the truth learned from paper is not profound and must be put into practice.
4.3 slow SQL problems caused by group by
When it comes to the most important concern, the improper use of group by can easily lead to slow SQL problems. Because it uses both temporary tables and sorting by default. Disk temporary tables may also be used sometimes.
If during execution, you will find that the memory temporary table size has reached the upper limit (the parameter that controls this upper limit is tmp_table_size), the memory temporary table will be converted into a disk temporary table.
If the amount of data is large, it is likely that the disk temporary table required by this query will take up a lot of disk space.
These are all x factors that lead to slow SQL. Let's discuss the optimization scheme together.
5. Some optimization schemes of group by
From which direction to optimize?
Direction 1: since it will sort by default, let's just not line it up.
Direction 2: since temporary tables are the X factor affecting group by performance, can we not use temporary tables?
Let's think about why we need temporary tables to execute group by statements. The semantic logic of group by is to count the number of different values. If these values are ordered from the beginning, should we just scan down the statistics and not use a temporary table to record and count the results?
Fields after group by are indexed
Order by null does not need to sort
Try to use only memory temporary tables
Use SQL_BIG_RESULT
5.1 fields after group by are indexed
How do you ensure that the values of the fields after group by are ordered in the first place? Indexing, of course.
Let's go back to this SQL.
Select city, count (*) as num from staff where age= 19 group by city
Its execution plan
If we add a federated index idx_age_city (age,city) to it
Alter table staff add index idx_age_city (age,city)
When you look at the execution plan, you find that you don't need sorting or temporary tables.
Adding an appropriate index is the simplest and most effective way to optimize group by.
5.2 order by null does not need sorting
Not all scenarios are suitable for indexing. If we encounter scenarios that are not suitable for indexing, how can we optimize them?
If your requirements do not need to sort the result set, you can use order by null.
Select city, count (*) as num from staff group by city order by null
The execution plan is as follows. There is no filesort left.
5.3 try to use only memory temporary tables
If there is not much data to be counted in group by, we can use only memory temporary tables as far as possible, because it is time-consuming if disk temporary tables are used in the group by process because the data cannot be put down. Therefore, the tmp_table_size parameter can be appropriately increased to avoid using disk temporary tables.
5.4 using SQL_BIG_RESULT optimization
What if the amount of data is too large? You can't increase the tmp_table_size indefinitely, can you? But you can't just watch the data put into the memory temporary table first, and then switch to the disk temporary table as the data insertion reaches the upper limit, right? This is a little unintelligent.
Therefore, if the estimated amount of data is large, we use the SQL_BIG_RESULT prompt to use the disk temporary table directly. The MySQl optimizer found that the disk temporary table is B + tree storage, and the storage efficiency is not as efficient as the array. So it will be saved directly in the array.
The example SQl is as follows:
Select SQL_BIG_RESULT city, count (*) as num from staff group by city
As you can see in the Extra field of the execution plan, the execution no longer uses temporary tables, but only sorts
The execution process is as follows:
Initialize sort_buffer and put in the city field
Scan the table staff, take out the values of city in turn, and store them in sort_buffer
When the scan is complete, sort the city field of sort_buffer
When the sorting is complete, you get an ordered array.
Count the number of occurrences of each value according to the ordered array.
6. How to optimize a slow production SQL
Recently encountered a production slow SQL, related to group by, show you how to optimize ha.
The table structure is as follows:
CREATE TABLE `stat` (`id` bigint (11) NOT NULL AUTO_INCREMENT COMMENT 'key id', `id_ card` varchar (20) NOT NULL COMMENT' ID number', `name`varchar (64) NOT NULL COMMENT 'name', `status`varchar (64) NOT NULL COMMENT'Y-activated I-initialization D-deleted R-auditing', `age`int (4) NOT NULL COMMENT 'age', `city`varchar (64) NOT NULL COMMENT 'city', `city`varchar (64) NOT NULL COMMENT 'enterprise' `legal_cert_ no` varchar (64) NOT NULL COMMENT 'legal person number', PRIMARY KEY (`id`) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 COMMENT=' employee Table'
The SQL of the query looks like this:
Select * from T1 where status = # {status} group by # {legal_cert_no} these are all the contents of the article "how to use group by in MySql". Thank you for reading! I believe you will gain a lot after reading this article. The editor will update different knowledge for you every day. If you want to learn more knowledge, please pay attention to the industry information channel.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.