Critical point in nonclustered index (Tipping Point)

2025-04-18 Update From: SLTechnology News&Howtos

What is the tipping point?

The tipping point discussed here is the point at which the execution plan for a query on a nonclustered index changes from Seek + Lookup to a Table/Clustered Index Scan. Keep in mind that the smallest I/O unit SQL Server uses to access data is a page.

We know that the leaf level of a clustered index is the data pages themselves, whereas the leaf level of a nonclustered index holds pointers to data rows. So when you fetch data through a clustered index you read the index itself, but when you fetch data through a nonclustered index you must read the index and then follow its pointers to the data pages. This second step is called a RID/Key Lookup. A Lookup is a single-page operation: it takes one RID/Key at a time, visits the corresponding data page, and fetches that one row from the page. Several rows on the same data page may satisfy the query, but each lookup fetches only the row identified by the current RID/Key, so the same data page may be visited many times. For example, if the Lookup has to find the rows for RIDs 2, 3, 5, 7 and 9, and all 5 rows are stored on data page N, then page N is accessed no fewer than 5 times.

With Seek + Lookup, returning N rows requires at least N data page accesses. The tipping point is reached when the number of page accesses made by the Lookup exceeds the total number of data pages in the table. Beyond that point a Scan costs less than the Lookup, so the query optimizer generally chooses a Scan instead of Seek + Lookup. For example, table T has 100,000 rows at 100 rows per page, for a total of 1,000 pages. To query 1,000 rows, in theory and in the ideal case, a Scan needs at least 10 I/Os (1,000 rows / 100 rows per page), while a Lookup needs no fewer than 1,000 I/Os.

Note that a covering index already contains the column values the query needs, so no RID/Key Lookup is required and this problem does not occur.

When will the tipping point appear?

The above is theory. In practice, where the tipping point actually falls depends on many factors, but mainly on the total number of pages in the table. The tipping point typically occurs when the pages accessed amount to roughly 25% to 33% of the total number of pages in the table. To make this more intuitive, the number of pages is usually converted into a number of rows for analysis. Because a Lookup is a single-page operation, the number of pages accessed equals the number of rows returned.

A table has 1,000,000 rows at 2 rows per page, for a total of 500,000 pages. Then 25% is 125,000 and 33% is about 166,000, so the tipping point falls somewhere between 125,000 and 166,000 page accesses. Converted to rows, that is 125,000 / (2 × 500,000) = 12.5% and 166,000 / (2 × 500,000) = 16.6%. This means a Lookup is likely to be used when fewer than about 125,000 rows (12.5% of 1,000,000) are returned, and a Scan is likely to be used when more than about 166,000 rows are returned. The rows of this table are so wide that only 2 fit on a page, so the percentage here is not very representative.

A table has 1,000,000 rows at 100 rows per page, for a total of 10,000 pages. Then 25% is 2,500 and 33% is 3,300. Converted to rows, that is 2,500 / 1,000,000 = 0.25% and 3,300 / 1,000,000 = 0.33%. The upper bound of the tipping point is below 0.5%. In other words, a query that returns less than 0.5% of the rows in the table may already cause the table to be scanned.

A table has 1,000,000 rows at 20 rows per page, for a total of 50,000 pages. Then 25% is 12,500 and 33% is about 16,600. Converted to rows, that is 12,500 / 1,000,000 = 1.25% and 16,600 / 1,000,000 = 1.66%.
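The three estimates above can be reproduced with a small T-SQL sketch. It uses one quarter and one third of the page count as the bounds, so the upper figures come out slightly different from the rounded numbers in the text. The 25% to 33% range is only the rule of thumb described here, not an exact optimizer threshold, and the column aliases are made up for illustration.

```sql
-- Sketch: tipping-point range for the three hypothetical tables above.
-- Bounds are 1/4 and 1/3 of the total pages (rule of thumb, not an optimizer constant).
-- Because one Lookup = one page access, "pages accessed" is also "rows returned".
SELECT
    t.total_rows,
    t.rows_per_page,
    t.total_rows / t.rows_per_page      AS total_pages,
    t.total_rows / t.rows_per_page / 4  AS lower_bound_rows,   -- about 25% of pages
    t.total_rows / t.rows_per_page / 3  AS upper_bound_rows,   -- about 33% of pages
    CAST(100.0 * (t.total_rows / t.rows_per_page / 4) / t.total_rows AS decimal(5, 2)) AS lower_bound_pct,
    CAST(100.0 * (t.total_rows / t.rows_per_page / 3) / t.total_rows AS decimal(5, 2)) AS upper_bound_pct
FROM (VALUES (1000000, 2),
             (1000000, 100),
             (1000000, 20)) AS t (total_rows, rows_per_page);
```

The narrower the rows (more rows per page), the smaller the percentage of the table at which the plan is expected to tip.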

It is easy to see that knowing the tipping point helps a great deal with query performance on large tables. Small tables will almost always be scanned, but because the database caches pages, a small table ends up fully cached and scanning it has little impact.

What can we do?

1. It is tempting to think that, since the table has an index that supports a Seek, we can use a hint to force the Seek and the problem is solved. Not necessarily: the root of the problem is that the query optimizer believes a Scan is cheaper than the Lookup, so forcing a Seek may backfire. SQL Server's query optimizer is powerful and intelligent, so do not override it unless you have tested rigorously and proven that FORCESEEK performs better.

2. Create a covering index to eliminate the Lookup operation.
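As a minimal sketch of option 2, assuming the AdventureWorks2012 Sales.SalesOrderDetail table used in the example analysis: the index name and the INCLUDE list below are hypothetical and only illustrate the idea. A real covering index has to include whichever columns your own query actually reads.

```sql
-- Hypothetical covering index: the name and the INCLUDE columns are illustrative only.
-- The key column supports the Seek; the INCLUDE columns are stored at the leaf level,
-- so a query that needs only these columns never has to do a Key Lookup.
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_ProductID_Covering
ON Sales.SalesOrderDetail (ProductID)
INCLUDE (OrderQty, UnitPrice);

-- This query touches only ProductID, OrderQty and UnitPrice,
-- so the index above covers it regardless of how many rows match.
SELECT ProductID, OrderQty, UnitPrice
FROM Sales.SalesOrderDetail
WHERE ProductID = 751;
```

The trade-off is a wider index that costs more storage and more work on writes, so it is worth creating only for queries that matter.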

Example analysis

We use the Sales.SalesOrderDetail table in AdventureWorks2012. It has a nonclustered index, IX_SalesOrderDetail_ProductID, on the ProductID column.

The following query shows that the table has 121317 rows in 1237 data pages, so each page stores about 98 rows. From this we can estimate that the tipping point lies somewhere between 309 and 408 rows (25% and 33% of 1237 pages).

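-- index_id 1 is the clustered index; index_level 0 is its leaf level (the data pages).
-- 'DETAILED' mode is used because record_count is NULL in the default LIMITED mode.
SELECT page_count, record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'Sales.SalesOrderDetail'), 1, NULL, 'DETAILED')
WHERE index_level = 0;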

Next, count the rows for each ProductID in the table so that we can pick ProductID values with different row counts to test:

SELECT ProductID, COUNT(*) AS cnt
FROM Sales.SalesOrderDetail
GROUP BY ProductID
ORDER BY cnt;

From the query above, ProductID=882 has 407 rows in the table, and its plan still uses a Lookup. Its I/O count is:

Table 'SalesOrderDetail'. Scan count 1, logical reads 1258

ProductID=751 has 409 rows in the table, and its plan uses a Scan. Its I/O count is:

Table 'SalesOrderDetail'. Scan count 1, logical reads 1246
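The article does not show the test queries themselves, so the following is only a sketch of how the two measurements above could be reproduced. The `SELECT *` form is an assumption; any column list that IX_SalesOrderDetail_ProductID does not cover should show the same Seek + Lookup versus Scan behaviour.

```sql
-- Print logical reads per table in the Messages tab.
SET STATISTICS IO ON;

-- 407 rows, just under the estimated tipping range: the text reports Seek + Key Lookup.
SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID = 882;

-- 409 rows, just over the estimated tipping range: the text reports a Clustered Index Scan.
SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID = 751;

SET STATISTICS IO OFF;
```

Turn on the actual execution plan as well to see which operators each query uses.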

We can also test other ProductID values that return more rows. When the plan is a Scan, the I/O is 1246 logical reads; with a Lookup it would be higher than 1246. The results are consistent with the theory.
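One way to check the claim that a Lookup would cost more than the Scan is to force the Seek with the FORCESEEK table hint and compare logical reads. This is a sketch for testing only, in line with the earlier warning about hints; the `SELECT *` query shape is again an assumption.

```sql
SET STATISTICS IO ON;

-- Optimizer's own choice for 409 rows (a Scan, according to the text).
SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID = 751;

-- Same query with the Seek forced; compare its logical reads with the Scan above.
SELECT *
FROM Sales.SalesOrderDetail WITH (FORCESEEK)
WHERE ProductID = 751;

SET STATISTICS IO OFF;
```

If the forced plan reports more logical reads than the Scan, the optimizer's choice was the cheaper one for this query.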

Returning even 500 rows exceeds the tipping point, yet 500 rows are only 0.41% of the table. In other words, once the rows returned exceed about 0.41% of the whole table, the optimizer considers the predicate not selective enough and scans instead of using Seek + Lookup.

Summary

1. When you ask "there is an index, so why is the table being scanned?", the tipping point may be one of the reasons.

2. Because of the tipping point, non-covering nonclustered indexes may be used less often than we expect.

Reference

http://www.sqlskills.com/blogs/kimberly/the-tipping-point-query-answers/
