
Optimization techniques for SQL Server aggregate function algorithms

2025-02-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31

This article covers optimization techniques for SQL Server aggregate functions, a topic that many people do not understand well. The key points are summarized below.

SQL Server aggregate functions are widely used in day-to-day work, so optimizing them is naturally a key concern. How well a program is optimized directly affects its life cycle. An aggregate function performs a calculation on a set of values and returns a single value. With the exception of COUNT(*), aggregate functions ignore NULL values. Aggregate functions are often used with the GROUP BY clause of a SELECT statement.

I. Preface

All the data demos are based on Microsoft's official sample database, Northwind, which can be downloaded online.

II. SQL Server scalar aggregation

2.1. Concept: a scalar aggregate is a query whose SELECT column list contains only aggregate functions (such as MIN(), MAX(), COUNT(), SUM(), or AVG()) and that has no GROUP BY clause. In that case the result set has exactly one row holding the aggregate values, which are calculated from the source rows that match the WHERE clause predicate.
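
As a minimal sketch of the concept, the query below is a scalar aggregate over the Northwind Orders table. The choice of the Freight column and the filter value are assumptions made only for illustration.

```sql
-- Scalar aggregate: the SELECT list holds only aggregate functions and there
-- is no GROUP BY, so the result is exactly one row computed from the rows
-- that pass the WHERE predicate.
SELECT COUNT(*)     AS OrderCount,
       MIN(Freight) AS MinFreight,
       MAX(Freight) AS MaxFreight,
       AVG(Freight) AS AvgFreight
FROM Orders
WHERE ShipCountry = N'USA';  -- filter value chosen only for illustration
```

Even when no row matches the predicate, the query still returns one row: COUNT(*) is 0 and the other aggregates are NULL.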

2.2. Explore scalar aggregation:

Let's start with the "Include Actual Execution Plan" option in SQL Server Management Studio and look at a simple stream aggregation: a COUNT() over all the rows of a table.

Then use SET SHOWPLAN_ALL ON (the documentation for this setting describes the columns contained in the output) to see more detail about how the statement would be executed, together with estimates of its resource requirements. While the setting is on, SQL Server returns the plan information without executing the statement.

With SET SHOWPLAN_ALL ON, let's look at exactly what COUNT() does:

Index Scan: scans the rows of the table.

Stream Aggregate: counts the rows.

Compute Scalar: converts the result of the Stream Aggregate to the appropriate type. (The number of rows depends on how much data is in the table and could exceed the range of int, so the count is computed internally as bigint and is converted to int, the return type of COUNT(), only when it is finally returned.)

Summary: SET SHOWPLAN_ALL ON shows the individual steps SQL Server performs behind the single value that an aggregate function returns.
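
The steps above can be reproduced with a short script such as the following sketch. It assumes the Northwind database is the current database; the exact operators shown depend on the indexes present.

```sql
-- SET SHOWPLAN_ALL must be the only statement in its batch,
-- hence the GO separators.
SET SHOWPLAN_ALL ON;
GO
-- While SHOWPLAN_ALL is ON the statement is not executed;
-- SQL Server returns one row per plan operator with estimated costs.
SELECT COUNT(*) FROM Orders;
GO
SET SHOWPLAN_ALL OFF;
GO
```

The StmtText column of the output lists the operators in tree order, which is where the Compute Scalar, Stream Aggregate, and Index Scan steps can be read off.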

2.3. Scalar aggregation optimization techniques:

Let's look at the differences between two relatively simple sql queries.

The code is as follows:

SELECT COUNT(DISTINCT ShipCity) FROM Orders
SELECT COUNT(DISTINCT OrderID) FROM Orders

As the execution plans show, the two statements look almost identical, so why do their costs differ when one counts cities and the other counts order numbers? DISTINCT is meaningless for OrderID: OrderID is the primary key, so it contains no duplicates. ShipCity does contain duplicates, and to remove them SQL Server sorts the rows first. That sort is fairly resource-intensive.

For a table with a large amount of data, sorting the table or de-duplicating a column with many repeated values is best avoided. Here the ShipCity query can be optimized by creating a nonclustered index on ShipCity.

The code is as follows:

CREATE INDEX Index_ShipCity ON Orders (ShipCity DESC)
GO

As the execution plan shows, once the index exists the COUNT(DISTINCT ShipCity) query is answered with two Stream Aggregate operators and no Sort, which saves overhead.

Summary: as the example above shows, scalar aggregation has clear advantages and disadvantages:

Advantages of SQL Server scalar aggregation: the algorithm is simple and intuitive, and it suits aggregation over values without duplicates.

Disadvantages of SQL Server scalar aggregation: performance is poor when sorting is required, so it is a poor fit for aggregation over values with many duplicates.

Optimization tips: avoid sorting where possible, and keep the grouping (GROUP BY) column within the coverage of an index.
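
The tip about keeping the grouping column within the coverage of an index can be sketched as follows. The index name IX_Orders_ShipCountry_Freight and the choice of Freight as the aggregated column are hypothetical, and whether the optimizer actually uses the index depends on the data and the server version.

```sql
-- Hypothetical index: key on the grouping column, with the aggregated
-- column carried along as an included column.
CREATE NONCLUSTERED INDEX IX_Orders_ShipCountry_Freight
    ON Orders (ShipCountry)
    INCLUDE (Freight);
GO

-- The index can deliver rows already ordered by ShipCountry and it contains
-- Freight, so the query can be aggregated without a separate Sort and
-- without reading the base table.
SELECT ShipCountry, SUM(Freight) AS TotalFreight
FROM Orders
GROUP BY ShipCountry;
GO
```

Comparing the execution plan before and after creating the index is the simplest way to check whether the Sort has disappeared.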

III. SQL Server hash aggregation

3.1. Concept:

A hash function takes an input of arbitrary length (also called the pre-image) and transforms it, through a hash algorithm, into a fixed-length output called the hash value. The transformation is a compressing map: the space of hash values is usually much smaller than the space of inputs, so different inputs may hash to the same output, and the input cannot be uniquely determined from the hash value. Simply put, it is a function that compresses a message of any length into a fixed-length message digest.

Hash aggregation is implemented internally in the same way as a hash join: a hash function is applied to the grouping values to place rows into hash buckets, and the data is then scanned, in parallel where the plan allows, to build the aggregate values.

3.2. Background:

Hash aggregation was introduced to address the shortcomings of stream aggregation and to handle operations on large volumes of data.

3.3. Analysis:

Take a look at two simple queries.

Grouping queries on ShipCountry and on CustomerID look similar, so why are their execution plans different? ShipCountry contains a large number of duplicate values while CustomerID has very few, so SQL Server chooses hash aggregation for ShipCountry and stream aggregation for CustomerID. An index that returns rows already ordered by the grouping column also makes stream aggregation cheaper, because no sort is needed. In other words, SQL Server dynamically chooses the aggregation method that fits the query. When optimizing SQL, therefore, look not only at the statement itself but also at the actual distribution of the data.
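
A sketch of how this comparison might be run is shown below. The exact queries are an assumption (a COUNT(*) per group is used here for illustration), and the HASH GROUP and ORDER GROUP query hints are included only to compare plan costs, not as a recommendation for production code.

```sql
-- Two grouping queries; the optimizer picks the aggregate operator for each.
SELECT ShipCountry, COUNT(*) AS OrderCount FROM Orders GROUP BY ShipCountry;
SELECT CustomerID,  COUNT(*) AS OrderCount FROM Orders GROUP BY CustomerID;

-- Force hash aggregation for the ShipCountry query.
SELECT ShipCountry, COUNT(*) AS OrderCount
FROM Orders
GROUP BY ShipCountry
OPTION (HASH GROUP);

-- Force sort-based (stream) aggregation for the same query.
SELECT ShipCountry, COUNT(*) AS OrderCount
FROM Orders
GROUP BY ShipCountry
OPTION (ORDER GROUP);
```

Running the statements with the actual execution plan enabled shows which operator each one uses and how the estimated costs of the two forced plans compare.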

IV. Runtime monitoring metrics

4.1. Monitoring elements:

Visually viewing the run time
T-SQL statement query time
Memory usage
T-SQL statement query I/O

4.2. Visually viewing the run time:

4.3. T-SQL statement query time:
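
One common way to obtain the query time from T-SQL is SET STATISTICS TIME, sketched here with the COUNT(DISTINCT ShipCity) query from section 2.3. Treat this as an assumed approach rather than the only one.

```sql
-- Report parse/compile time and execution time (CPU and elapsed, in ms)
-- for every statement that follows.
SET STATISTICS TIME ON;
GO
SELECT COUNT(DISTINCT ShipCity) FROM Orders;
GO
SET STATISTICS TIME OFF;
GO
```

The timings appear on the Messages tab in SQL Server Management Studio rather than in the result grid.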

4.4. Memory usage:
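
As a hedged sketch, the memory granted to running queries can be inspected through the sys.dm_exec_query_memory_grants dynamic management view. This is one possible approach and it assumes the caller has the VIEW SERVER STATE permission.

```sql
-- Run from a second session while the query under test is executing.
-- Each row describes one query that has requested a memory grant.
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb,
       max_used_memory_kb
FROM sys.dm_exec_query_memory_grants;
```

Small queries that finish quickly or need no grant may never show up in this view, so it is most useful for sorts and hash aggregations over large tables.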

4.5. T-SQL statement query I/O:

There are many more monitoring metrics; these are only a few of them.
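
The I/O performed by a statement can be reported with SET STATISTICS IO, as in the sketch below. The grouping query is used only as an example workload.

```sql
-- Report scan count and logical/physical reads per table
-- for every statement that follows.
SET STATISTICS IO ON;
GO
SELECT ShipCountry, COUNT(*) AS OrderCount
FROM Orders
GROUP BY ShipCountry;
GO
SET STATISTICS IO OFF;
GO
```

Logical reads are usually the most stable number to compare before and after an index change, since physical reads depend on what is already cached.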

