What is the principle of indexing in SqlServer? 07/02 Update SLTechnology News&Howtos

What is the principle of indexing in SqlServer?

2025-07-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article shows you what the principle of indexing in SqlServer is. The content is concise and easy to understand. It will definitely brighten your eyes. I hope you can get something through the detailed introduction of this article.

The concept of index

The use of the index: our query and processing speed of data has become the standard to measure the success or failure of the application system, and the use of index to speed up data processing is usually the most commonly used optimization method.

What is the index: the index in the database is similar to the catalog of a book, using the catalog in a book can quickly find the information you want without having to read the whole book. In a database, database programs can use indexes to reweight the data in a table without having to scan the entire table. The catalogue in the book is a list of words and the page number on which they are located, and the index in the database is a list of values in the table and where they are stored.

The pros and cons of indexing: most of the overhead of query execution is Istroke O, and one of the main goals of using indexes to improve performance is to avoid full table scans, because full table scans need to read every data page of the table from disk. If an index points to the data value, the query only needs to read the disk a few times. Therefore, the reasonable use of index can speed up the query of data. However, the index does not always improve the performance of the system, the indexed table needs more storage space in the database, and the running time of the command used to add and delete data and the processing time required to maintain the index will be longer. Therefore, we should use the index reasonably and update and remove the suboptimal index in time.

Basic structure of data table

When a new table is created, the system allocates a contiguous space in 8K units on the disk. When the value of the field is written from memory to disk, it is saved randomly in this given space. When an 8K is used up, the database pointer automatically allocates an 8K space. Here, each 8K space is called a data page (Page), also known as a page or data page, and a page number from 0 to 7 is assigned, and page 0 of each file records boot information, called a file header (File header); every combination of eight data pages (64K) forms an extension (Extent), called an extension. The combination of all data pages forms a Heap.

SQLS stipulates that rows cannot span data pages, so the maximum amount of data recorded per row can only be 8K. This is why the capacity of the two string types char and varchar should be limited to 8K. Data stored more than 8K should use the text type. In fact, the field values of the text type cannot be directly entered and saved, it just stores a pointer to an extension composed of a number of 8K text data pages in which the real data is placed.

Pages can be divided into spatial pages and data pages.

When the eight data pages of an extension contain both spatial pages and data or index pages, it is called hybrid extension (Mixed Extent), and each table starts with a mixed extension; conversely, it is called uniform extension (Uniform Extent), which specifically stores data and index information.

When the table is created, SQLS allocates at least one data page to it in the hybrid extension. As the amount of data increases, SQLS can immediately allocate seven pages in the hybrid extension, and when the data exceeds 8 pages, it allocates data pages from the consistent extension.

Spatial pages are specifically responsible for the allocation and management of data space, including: PFS pages (Page free space): recording information such as whether a page has been allocated, whether it is located in a mixed extension or a consistent extension, and how much space is available on the page; GAM pages (Global allocation map) and SGAM pages (Secodary global allocation map): used to record idle extensions or mixed extensions containing idle pages. SQLS makes comprehensive use of these three types of page files to create new spaces for data tables if necessary.

Data pages or index pages specifically store data and index information, and SQLS uses four types of data pages to manage tables or indexes: IAM pages, data pages, text / image pages, and index pages.

In WINDOWS, every step we perform on a file is physically located on disk that only the system (system) knows; SQL SERVER follows this way of working, in the process of inserting data, not only the location of each field value in the data page is random, but also the arrangement position of each data page in the "heap" is known only to the system (system).

Why is that? As we all know, OS is able to manage DISK because the file distribution table is loaded first when the system starts: FAT (File Allocation Table), which manages the file system and records all operations on files, so that the system can function normally; similarly, as a management system-level SQL SERVER, there is also such a table similar to FAT, which is the index distribution image page: IAM (Index Allocation Map).

The existence of IAM makes it possible for SQLS to manage data tables physically.

IAM pages are allocated from mixed extensions, recording the location of eight initial pages and the location of the extension. Each IAM page can manage 512000 data pages. If the amount of data is too large, SQLS can also add more IAM pages, which can be located anywhere in the file. The first IAM page is called FirstIAM, which records the location of future IAM pages.

Data pages and text / image pages are reciprocal. The former holds non-text / image type data because neither of them exceeds 8K capacity, while the latter only holds text or image type data that exceeds 8K capacity. The index page, as its name implies, stores the data information related to the index structure. Understanding the problems of the page will help us to accurately understand the way SQLS maintains the index, such as page splitting, fill factor, and so on.

Page split

Half of the data will be retained on the old page, while the other half will be placed on the new page, and the new page may be assigned to any available page. Therefore, frequent page splitting will have serious consequences, which will cause a large number of data fragments in the physical table, resulting in a sharp decline in the efficiency of SQLS O directly. finally, stopping the operation of SQLS and rebuilding the index will be our only choice!

Filling factor

A feature of an index that defines the amount of free space on each page of the index. FILLFACTOR (fill factor) adapts to later expansion of table data and reduces the possibility of page splitting. The fill factor is a percentage value from 0 to 100, and a setting of 100 indicates that the data page is filled. Use this setting only if no changes are made to the data, such as in a read-only table. The lower the value, the more free space on the data page, which reduces the need for page splitting during index growth, but this operation takes up more hard disk space. Improper assignment of the fill factor will reduce the read performance of the database, which is inversely proportional to the fill factor setting value.

Classification of indexes

There are several index types in SQL SERVER.

Classified by storage structure: "clustered index (also known as clustered index, clustered index)", "clustered index (non-clustered index, non-clustered index)"

Distinguished by data uniqueness: "unique index", "non-unique index"

Distinguish the number of key columns: "single-column index" and "multi-column index".

Clustered index

A clustered index is a method of reorganizing actual data on disk to sort by a specified column or column values. For example, the Chinese dictionary we use is a clustered index. For example, if we want to look up "Zhang", we naturally turn to the back 100 pages of the dictionary. Then look it up according to the alphabetical order. Microsoft's balanced binary tree algorithm is used here, that is, first turn the book to about 1/2, if the page number you are looking for is smaller than the page number of the page, turn the book forward to 1/4, otherwise, turn the book back to 3/4, and so on. Continue to divide the page into smaller parts until the correct page number.

Because a clustered index sorts data, there can be no more than one arrangement, so only one clustered index can be built for a table. Scientific statistics need at least 120% of the additional space for the table to store copies of the table and index middle pages, but its performance is almost always faster than other indexes.

Because under the clustered index, the data is physically arranged in order on the data page, and the duplicate values are also arranged together, so when including a range check (bentween,=) or a query using group by or order by, once the rows with the first key value are found, they will be linked together, and there is no need for further search to avoid large-scale scanning, which can greatly improve the speed of the query.

Nonclustered index

The index that sqlserver builds by default is a nonclustered index. Instead of reorganizing the data in the table, he stores the index column value for each row and uses a pointer to the page where the data is located. He is like the Chinese dictionary to find the words according to the "side radical". Even if the data is not sorted, the directory he owns is more like a directory, and it also has room to improve the efficiency of retrieving data without the need for full table scanning.

A table can have multiple nonclustered indexes, each of which provides a different sort order depending on the index column.

Create an index

Grammar

CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX index_name ON {table | view} (column [ASC | DESC] [,... n]) [with[ pad _ INDEX] [[,] FILLFACTOR=fillfactor] [[,] IGNORE_DUP_KEY] [[,] DROP_EXISTING] [[,] STATISTICS_NORECOMPUTE] [[,] SORT_IN_TEMPDB]] [ON filegroup]

The parameters of the CREATE INDEX command to create the index are described as follows:

UNIQUE: used to specify that a unique index is created for a table or view, that is, two rows with the same index value are not allowed.

CLUSTERED: used to specify that the index created is a clustered index.

NONCLUSTERED: used to specify that the index created is nonclustered.

Index_name: used to specify the name of the index created.

Table: used to specify the name of the table on which the index was created.

View: used to specify the name of the view in which the index is created.

ASC | DESC: used to specify the ascending or descending sort direction of a specific index column.

Column: used to specify the column to be indexed.

PAD_INDEX: used to specify the space that remains open on each page (node) in the middle level of the index.

FILLFACTOR = fillfactor: used to specify the percentage of data per index page as a percentage of the index page size when the index is created, with a value of 1 to 100 for fillfactor.

IGNORE_DUP_KEY: used to control how SQL Server reacts when duplicate data is inserted into a column contained in a unique clustered index.

DROP_EXISTING: used to specify that a named pre-existing clustered or nonclustered index should be deleted and recreated.

STATISTICS_NORECOMPUTE: used to specify that expired index statistics are not automatically recalculated.

SORT_IN_TEMPDB: used to specify that the intermediate sort results when the index is created will be stored in the tempdb database.

ON filegroup: used to specify the filegroup in which the index is stored.

Example:

-- Table bigdata creates a nonclustered index named idx_mobiel, the index field is mobielcreate index idx_mobielon bigdata (mobiel)-- Table Bigdata creates a unique clustered index named idx_id, and the index field is id--, which requires that duplicate values are ignored when data is inserted in batches, statistics are not recalculated, and the fill factor is 40create unique clustered index idx_idon bigdata (id) with pad_index,fillfactor=40,ignore_dup_key,statistics_norecompute.

Manage index

Exec sp_helpindex BigData-View the index definition Exec sp_rename 'BigData.idx_mobiel','idx_big_mobiel'-change the index name from' idx_mobiel' to 'idx_big_mobiel'drop index BigData.idx_big_mobiel-delete the idx_big_mobiel index dbcc showcontig (bigdata,idx_mobiel) in the bigdata table-check the fragmentation information dbcc indexdefrag (Test,bigdata) of the index idx_mobiel in the bigdata table Idx_mobiel)-- Defragment update statistics bigdata on index idx_mobiel of bigdata table in test database-- update statistics for all indexes in bigdata table

Design principles of index

Whether there is an index or not and what kind of index to build for a table depends on where sentences and Join expressions.

Generally speaking, the principles of indexing include the following:

The system generally automatically establishes a clustered index for gradual fields. Columns that have a large number of duplicate values and often have range queries and sorting, grouped columns, or frequently accessed columns, consider establishing a clustered index. To build an index in a table that often does insert operations, you should use fillfactor (fill factor) to reduce page splitting and increase concurrency to reduce the occurrence of deadlocks. If the table is read-only, the fill factor can be set to 100. When selecting index keys, use columns of small data types as keys as far as possible so that each index page can hold as many index keys and pointers as possible. In this way, the index pages that a query must traverse can be minimized. In addition, use integers as key values as much as possible, because integers have the fastest access speed.

The above is what is the principle of indexing in SqlServer. Have you learned any knowledge or skills? If you want to learn more skills or enrich your knowledge reserve, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.