The influence of sampling percentage on the accuracy of data estimation when SQL Server statistics are updated 07/06 Update SLTechnology News&Howtos

The influence of sampling percentage on the accuracy of data estimation when SQL Server statistics are updated

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article will explain in detail the impact of sampling percentage on the accuracy of data estimates when SQL Server statistics are updated. The editor thinks it is very practical, so I share it with you as a reference. I hope you can get something after reading this article.

Why do you write statistics?

Recently, when I saw someone writing statistics in the garden, the landlord also came to join the party.

It is said that often do database, especially do development or optimization, statistical information caused by performance problems should be said to be common.

Of course, the solution is not immutable, and the practice of "eating everything in one move" is no longer feasible (digression: isn't this the way the whole era is like this?)

Of course, the same sentence, now that it has been written, it can not be too corny, write something different, this paper analyzes a similar actual case to interpret the issues related to the update of statistical information.

For practical problems, not only to solve the problem, it is more important to make an in-depth analysis in theory in order to better control the database.

When to update statistics

(1) the query execution is slow, or the query statement is suddenly slow. This scenario is probably due to the problem of parameter sniffing because the statistics are not updated in a timely manner.

(2) when a large amount of data is INSERT/DELETE/UPDATE to ascending or descending columns, in this case, the statistical histogram may not be updated in time.

(3) it is recommended that statistics be updated after maintenance other than index maintenance, which does not change the data distribution when you rebuild, defragment, or reorganize the index.

(4) the data of the database changes frequently, so it is recommended to update the statistical information at least once a day. Data warehouses can appropriately reduce the frequency of updating statistics.

(5) when there is a warning of missing statistics in the execution plan, you need to create statistics manually.

Basis of statistical information

First of all, let's talk about an old topic, the update threshold of statistics:

1, the table has changed from no data to one piece of data greater than or equal to.

2, for a table with less than 500 rows of data, when the cumulative change of the first field of the statistical information is greater than 500.

3. For a table with more than 500 rows of data, when the cumulative change of the first field of the statistical information is greater than 500 + (20% × the total amount of table data).

Make a query, trigger statistics update, and return rowmodct to 0 (continue to accumulate until the next trigger threshold, and return to 0 again after triggering the update)

On the problem of statistical information being "out of date"

At the beginning of the text, there are many articles on the Internet about statistical information, referring to statistical information, many of which are due to the expiration of statistical information, and then what happens after the update?

Especially in the third interval that triggers the automatic update threshold of statistics, that is to say, the update of statistics can be triggered automatically only after the cumulative change of data exceeds 20%.

This usually has a great impact on large tables. For example, for 1000W tables, statistics updates are triggered after a change of more than 20% or + 500, that is, 200W+500 rows. The automatic trigger threshold of this threshold range is unacceptable in most cases, so the diagnosis of statistics becomes "out of date".

It is understandable to judge whether the statistics are out of date and then update the statistics to promote the execution plan to estimate the number of rows more accurately.

But that's the problem: so how do you update statistics? Whether the immutable approach is feasible or not is the crux of the question.

Of course, someone must have said that I updated it in the default way, and after the update, SQL became more optimized or something.

Update the statistics of an index through update statistics TableName StatisticName

Or update statistics TableName updates the statistics of the whole table

In this case, it is often possible to do this on a small table. Of course, there is no standard value for a large table or a small table. Everything should be explained in combination with the facts.

The following begins the topic of this article:

Abstract and simplify an actual case in the business, create such a table, similar to the order and order schedule (master child table)

Here you can imagine that it is a child table of an order table. The Id field is unique, and there is a ParentID field that is not unique.

ParentID is similar to the Id of the main table. The test data is inserted according to the rule that 50 bars of a main table Id indicate fineness.

CREATE TABLE [dbo]. [TestStaitisticsSample] ([Id] [int] IDENTITY (1) NOT NULL, [ParentId] [int] NULL, [OtherColumn] [varchar] (50) NULL) declare @ I int=0while (@ I)

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.