Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use group by in MySql

2025-04-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

Today, the editor will share with you the relevant knowledge about how to use group by in MySql. The content is detailed and the logic is clear. I believe most people still know too much about this knowledge, so share this article for your reference. I hope you can get something after reading this article. Let's take a look at it.

We often use group by in our daily development. Dear friends, do you know how group by works? What's the difference between group by and having? What is the optimization idea of group by? What are the problems you need to pay attention to when using group by? This article will learn with you to conquer group by~.

A simple example of using group by

How group by works

The difference between group by + where and group by + having

Optimization idea of group by

Pay attention to the use of group by

How to optimize a slow production SQL

[related recommendation: mysql video tutorial]

1. A simple example of using group by

Group by is generally used for grouping statistics, and the logic it expresses is to group according to certain rules. Let's start with a simple example and review it together.

Suppose you use an employee table with the following structure:

CREATE TABLE `room` (`id` bigint (11) NOT NULL AUTO_INCREMENT COMMENT 'key id', `id_ card` varchar (20) NOT NULL COMMENT' ID number', `name` varchar (64) NOT NULL COMMENT 'name', `age`int (4) NOT NULL COMMENT 'age', `city`varchar (64) NOT NULL COMMENT 'city', PRIMARY KEY (`id`) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 COMMENT=' employee table'

The data of the table stock are as follows:

We now have a need to count the number of employees in each city. The corresponding SQL statement can be written as follows:

Select city, count (*) as num from staff group by city

The implementation results are as follows:

The logic of this SQL statement is clear, but what is its underlying execution flow?

2. Group by principle analysis 2.1explain analysis

Let's use explain to check the execution plan first.

Explain select city, count (*) as num from staff group by city

The Using temporary of the Extra field indicates that a temporary table was used when performing the grouping

The Using filesort of the Extra field indicates that sorting is used

How does group by use temporary tables and sorting? Let's take a look at the execution flow of this SQL.

2.2 simple execution flow of group by: explain select city, count (*) as num from staff group by city

Let's take a look at the execution process of this SQL.

Create a temporary memory table with two fields city and num

Scan the records of staff in the whole table and take out the records of city ='X' in turn.

Determine whether there are rows in the temporary table that are city='X', and insert a record if not (XPhone1)

If there is a row of city='X' in the temporary table, increase the num value of the row x by 1

After the traversal is completed, the result set is sorted according to the field city and returned to the client.

The execution diagram of this process is as follows:

What is the sort of temporary table?

That is, put the fields that need to be sorted into sort buffer, and then return. Note here that sorting is divided into full-field sorting and rowid sorting.

If it is a full-field sort, you need to put all the fields returned by the query into sort buffer, and then return them directly according to the sort fields.

In the case of rowid sorting, just put the sorted fields into sort buffer, then return to the table one more time, and then return.

How to determine whether it is a full-field sort or a rowid sort? Controlled by a database parameter, max_length_for_sort_data

For those of you who are interested in sorting, you can read my article.

Take a look at it and understand: order by detailed explanation

3. The difference between where and having

The execution flow of group by + where

The execution flow of group by + having

At the same time, the execution order of where, group by and having

3.1Execution flow of group by + where

Some partners think that the SQL in the previous section is too simple, if the where condition is added, and the where condition column is indexed, what is the implementation process?

OK, let's add a condition to it and add an index of idx_age, as follows:

Select city, count (*) as num from staff where age > 30 group by city;// indexed alter table staff add index idx_age (age)

Let's take a look at the expain analysis:

Explain select city, count (*) as num from staff where age > 30 group by city

From the execution of the plan results from explain, we can see that the query condition hits the index of idx_age, and temporary tables and sorting are used.

Using index condition: indicates index push-down optimization, filtering data according to the index as much as possible, and then returning it to the server layer to filter according to other where conditions. Why is there an index push-down for a single index here? The presence of explain does not necessarily mean that index push-down is used, it just means that it can be used, but not necessarily used. If you have any thoughts or questions, you can join me on Wechat for discussion.

The execution process is as follows:

1. Create a temporary memory table with two fields, city and num.

2. Scan the index tree idx_age to find the primary key ID that is older than 30 years old.

3. Through the primary key ID, go back to the table and find city ='X'

Determine whether there are rows in the temporary table that are city='X', and insert a record if not (XPhone1)

If there is a row of city='X' in the temporary table, increase the num value of the row x by 1

4. Continue to repeat steps 2 and 3 to find all the data that meet the conditions.

5. Finally, the result set is sorted according to the field city and returned to the client.

3.2 execution of group by + having

If you want to check the number of employees in each city and get cities with no less than 3 employees, having can solve your problem very well, SQL wrote:

Select city, count (*) as num from staff group by city having num > = 3

The query results are as follows:

Having is called a packet filter condition, and it operates on the returned result set.

3.3.The execution sequence of where, group by and having at the same time

If a SQL contains both where, group by, and having clauses, what is the order of execution?

For example, this SQL:

Select city, count (*) as num from staff where age > 19 group by city having num > = 3

Execute the where clause to find the data of employees over the age of 19

The group by clause groups employee data by city.

For the city group formed by the group by clause, run the aggregation function to calculate the number of employees in each group

Finally, use the having clause to select the city group with more than or equal to 3 employees.

3.4 Summary of differences between where and having

The having clause is used for filtering after grouping, and the where clause is used for row conditional filtering.

Having usually appears together with group by and aggregate functions, such as (count (), sum (), avg (), max (), min ()).

Aggregate functions cannot be used in where conditional clauses, but having clauses can.

Having can only be used after group by, and where executes before group by

4. Problems to pay attention to when using group by

There are several main points to pay attention to when using group by:

Does group by have to be used with aggregate functions?

The field of group by must appear in select.

Slow SQL problems caused by group by

Does group by have to be used with aggregate functions?

Group by means grouping statistics, which is usually used with aggregate functions such as (count (), sum (), avg (), max (), min ()).

Number of count ()

Sum of sum ()

Avg () average

Maximum value of max ()

Minimum min ()

Is it okay if it is not used with the aggregate function?

I use Mysql 5.7. it's okay. No error is reported, and the first row of data in the packet is returned.

For example, this SQL:

Select city,id_card,age from staff group by city

The query result is

By comparison, what is returned is the first piece of data of each group.

Of course, group by is usually used with aggregate functions, unless there are some special scenarios, such as if you want to remove duplicates, it is also possible to reuse distinct.

4.2 the fields that follow group by must appear in select.

Not necessarily, such as the following SQL:

Select max (age) from staff group by city

The implementation results are as follows:

The grouping field city is not followed by select and will not report an error. Of course, this may have something to do with different databases and different versions. When you use it, you can verify it first. There is a saying that the truth learned from paper is not profound and must be put into practice.

4.3 slow SQL problems caused by group by

When it comes to the most important concern, the improper use of group by can easily lead to slow SQL problems. Because it uses both temporary tables and sorting by default. Disk temporary tables may also be used sometimes.

If during execution, you will find that the memory temporary table size has reached the upper limit (the parameter that controls this upper limit is tmp_table_size), the memory temporary table will be converted into a disk temporary table.

If the amount of data is large, it is likely that the disk temporary table required by this query will take up a lot of disk space.

These are all x factors that lead to slow SQL. Let's discuss the optimization scheme together.

5. Some optimization schemes of group by

From which direction to optimize?

Direction 1: since it will sort by default, let's just not line it up.

Direction 2: since temporary tables are the X factor affecting group by performance, can we not use temporary tables?

Let's think about why we need temporary tables to execute group by statements. The semantic logic of group by is to count the number of different values. If these values are ordered from the beginning, should we just scan down the statistics and not use a temporary table to record and count the results?

Fields after group by are indexed

Order by null does not need to sort

Try to use only memory temporary tables

Use SQL_BIG_RESULT

5.1 fields after group by are indexed

How do you ensure that the values of the fields after group by are ordered in the first place? Indexing, of course.

Let's go back to this SQL.

Select city, count (*) as num from staff where age= 19 group by city

Its execution plan

If we add a federated index idx_age_city (age,city) to it

Alter table staff add index idx_age_city (age,city)

When you look at the execution plan, you find that you don't need sorting or temporary tables.

Adding an appropriate index is the simplest and most effective way to optimize group by.

5.2 order by null does not need sorting

Not all scenarios are suitable for indexing. If we encounter scenarios that are not suitable for indexing, how can we optimize them?

If your requirements do not need to sort the result set, you can use order by null.

Select city, count (*) as num from staff group by city order by null

The execution plan is as follows. There is no filesort left.

5.3 try to use only memory temporary tables

If there is not much data to be counted in group by, we can use only memory temporary tables as far as possible, because it is time-consuming if disk temporary tables are used in the group by process because the data cannot be put down. Therefore, the tmp_table_size parameter can be appropriately increased to avoid using disk temporary tables.

5.4 using SQL_BIG_RESULT optimization

What if the amount of data is too large? You can't increase the tmp_table_size indefinitely, can you? But you can't just watch the data put into the memory temporary table first, and then switch to the disk temporary table as the data insertion reaches the upper limit, right? This is a little unintelligent.

Therefore, if the estimated amount of data is large, we use the SQL_BIG_RESULT prompt to use the disk temporary table directly. The MySQl optimizer found that the disk temporary table is B + tree storage, and the storage efficiency is not as efficient as the array. So it will be saved directly in the array.

The example SQl is as follows:

Select SQL_BIG_RESULT city, count (*) as num from staff group by city

As you can see in the Extra field of the execution plan, the execution no longer uses temporary tables, but only sorts

The execution process is as follows:

Initialize sort_buffer and put in the city field

Scan the table staff, take out the values of city in turn, and store them in sort_buffer

When the scan is complete, sort the city field of sort_buffer

When the sorting is complete, you get an ordered array.

Count the number of occurrences of each value according to the ordered array.

6. How to optimize a slow production SQL

Recently encountered a production slow SQL, related to group by, show you how to optimize ha.

The table structure is as follows:

CREATE TABLE `stat` (`id` bigint (11) NOT NULL AUTO_INCREMENT COMMENT 'key id', `id_ card` varchar (20) NOT NULL COMMENT' ID number', `name`varchar (64) NOT NULL COMMENT 'name', `status`varchar (64) NOT NULL COMMENT'Y-activated I-initialization D-deleted R-auditing', `age`int (4) NOT NULL COMMENT 'age', `city`varchar (64) NOT NULL COMMENT 'city', `city`varchar (64) NOT NULL COMMENT 'enterprise' `legal_cert_ no` varchar (64) NOT NULL COMMENT 'legal person number', PRIMARY KEY (`id`) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 COMMENT=' employee Table'

The SQL of the query looks like this:

Select * from T1 where status = # {status} group by # {legal_cert_no} these are all the contents of the article "how to use group by in MySql". Thank you for reading! I believe you will gain a lot after reading this article. The editor will update different knowledge for you every day. If you want to learn more knowledge, please pay attention to the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report