2025-04-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
In this issue, the editor walks through how clustered and nonclustered indexes work in SQL Server and when to use each, analyzed from a practical, professional perspective. We hope you find it useful.
Clustered index
An index in which the logical order of key values determines the physical order of corresponding rows in a table.
A clustered index determines the physical order of data in a table. It is similar to a phone book, which arranges entries by last name. Because a clustered index fixes the physical order in which rows are stored, a table can contain only one clustered index. The index can, however, contain multiple columns (a composite index), just as a phone book is organized by last name and then first name.
Clustered indexes are particularly effective for columns that are frequently searched for ranges of values. Once the row containing the first value is found through the clustered index, the rows containing the subsequent index values are guaranteed to be physically adjacent. For example, if an application frequently retrieves records within a certain date range, a clustered index can quickly find the row that contains the start date and then read all adjacent rows until the end date is reached. This improves the performance of such queries. Similarly, if a column is frequently used to sort data retrieved from a table, clustering the table on that column (physically sorting it) avoids the cost of a sort every time the column is queried.
A clustered index is also efficient for finding a specific row when the index values are unique. For example, the fastest way to find a particular employee by the unique employee ID column emp_id is to create a clustered index or PRIMARY KEY constraint on the emp_id column.
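The date-range case described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how SQL Server stores pages: the rows, the column layout (issue_date, doc_id) and the function name range_scan are all made up for the example, and a sorted list stands in for a table that is physically ordered by its clustered key.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical rows (issue_date, doc_id). Sorting them by issue_date mimics
# the physical ordering that a clustered index on the date column enforces.
rows = sorted([
    (date(2004, 3, 9), 104),
    (date(2004, 1, 15), 101),
    (date(2004, 11, 2), 106),
    (date(2004, 2, 1), 102),
    (date(2004, 9, 16), 105),
    (date(2004, 2, 1), 103),
])

def range_scan(rows, start, end):
    """Return the rows with start <= issue_date <= end."""
    # Binary search for the first row dated on or after `start`.
    lo = bisect_left(rows, (start,))
    # Binary search for the position just past the last row dated `end`.
    hi = bisect_right(rows, (end, float("inf")))
    # Every qualifying row is adjacent, so the answer is one contiguous slice.
    return rows[lo:hi]

result = range_scan(rows, date(2004, 1, 1), date(2004, 10, 1))
print([doc_id for _, doc_id in result])
```

The point of the sketch is that only the two ends of the range are searched for; everything in between is read sequentially.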
Nonclustered index
An index in which the logical order of the index differs from the physical storage order of the rows on disk.
An index is organized as a balanced tree (B-tree), and we can understand the two kinds of index as follows: the leaf nodes of a clustered index are the data nodes themselves, whereas the leaf nodes of a nonclustered index are still index nodes, each holding a pointer to the corresponding data block. As shown below:
(nonclustered index)
(clustered index)
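As a rough sketch of the nonclustered case, the Python below keeps the rows in arrival order (a heap) and builds a separate sorted list of (key, row locator) pairs. Everything in it is hypothetical: the employee rows, the name_index list and the lookup function are invented for illustration, and a list position stands in for the pointer to a data block.

```python
from bisect import bisect_left

# Heap: rows stored in arrival order, with no clustered index.
# The list position plays the role of the row locator (made-up data).
heap = [
    {"emp_id": 7, "last_name": "Zhang"},
    {"emp_id": 3, "last_name": "An"},
    {"emp_id": 9, "last_name": "Li"},
    {"emp_id": 1, "last_name": "Chen"},
]

# Nonclustered index on last_name: sorted (key, row locator) pairs.
# The index holds keys and pointers only, not the rows themselves.
name_index = sorted((row["last_name"], pos) for pos, row in enumerate(heap))

def lookup(last_name):
    """Step 1: seek in the index. Step 2: follow the locator into the heap."""
    i = bisect_left(name_index, (last_name,))
    if i < len(name_index) and name_index[i][0] == last_name:
        _, pos = name_index[i]
        return heap[pos]
    return None

print(name_index)
print(lookup("Li"))
```

Note that the index is in name order while the heap is not, which is exactly the "logical order differs from physical order" property. In SQL Server, when the table also has a clustered index, the locator stored in a nonclustered index is the clustering key rather than a physical row address.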
I. Understanding index structure in simple terms
In fact, you can think of an index as a special kind of table of contents. Microsoft SQL Server provides two types of index: the clustered index and the nonclustered index. An example illustrates the difference. The body of a Chinese dictionary is itself a clustered index. To look up the character "An", we naturally open the first few pages of the dictionary, because its pinyin is "an" and a dictionary sorted by pinyin starts with the letter "a" and ends with "z", so "An" sits near the front. If you go through everything beginning with "a" and still cannot find the character, it is not in your dictionary. Similarly, to look up "Zhang", you turn to the back of the dictionary, because its pinyin is "zhang". In other words, the body of the dictionary is itself a table of contents, and you do not need to consult any other table to find what you are looking for. Body content that is itself arranged according to a rule is what we call a "clustered index."
If you know a character, you can quickly look it up in the dictionary this way. However, you may also meet a character you do not recognize and cannot pronounce. In that case the method above does not work. Instead, you find the character through its "radical" and then turn directly to the page number listed next to it. The order of characters in the "radical catalogue" and the "character index table" is not the order of the dictionary body. For example, when you look up "Zhang" by radical, the character index table gives its page number as 672. The entry above "Zhang" in the index table is "Chi", but its page number is 63, and the entry below "Zhang" is the character for "crossbow", on page 390. Obviously, these characters are not really located next to "Zhang" in the body. The consecutive entries "Chi, Zhang, crossbow" are their order in the nonclustered index, which is a mapping of the characters in the dictionary body. We can find the character we need this way, but it takes two steps: first find the entry in the index table, then turn to the page it points to. An arrangement in which the table of contents is purely a table of contents and the body is purely the body is what we call a "nonclustered index."
From the example above, we can understand what "clustered index" and "nonclustered index" mean. By extension, it is easy to see why each table can have only one clustered index: the body can be sorted in only one way.
II. When to use clustered or nonclustered indexes
The following table summarizes when to use a clustered or a nonclustered index (important):
Action description                   Clustered index    Nonclustered index
Column is often grouped or sorted    Should             Should
Returns data within a range          Should             Should not
One or very few distinct values      Should not         Should not
A small number of distinct values    Should             Should not
A large number of distinct values    Should not         Should
Frequently updated column            Should not         Should
Foreign key column                   Should             Should
Primary key column                   Should             Should
Frequently modified index column     Should not         Should
In fact, we can understand the table above through the earlier definitions of clustered and nonclustered indexes. Take returning data within a range. If a table has a date column and you build the clustered index on it, then a query for all data between January 1, 2004 and October 1, 2004 will be very fast, because the body of your "dictionary" is sorted by date and the clustered index only needs to locate the first and last rows of the range. With a nonclustered index, by contrast, you must first look up the page number of each item in the table of contents and then fetch the content from that page.
III. Common misconceptions about index use in practice
Theory exists to be applied. Although we have just listed when to use clustered or nonclustered indexes, in practice these rules are easily overlooked or not weighed against the actual situation. Below, we go through common misconceptions about index use, based on problems encountered in practice, so that you can master how to build indexes.
1. The primary key is the clustered index. I think this idea is extremely wrong and wastes the clustered index, even though SQL Server creates the clustered index on the primary key by default. Usually we create an ID column in each table to distinguish each row, and this ID column is auto-incremented, typically in steps of 1. The Gid column in our office automation example is such a column. If we make this column the primary key, SQL Server makes it the clustered index by default. The advantage is that the data is physically sorted by ID in the database, but I do not think that means much.
The advantages of a clustered index are obvious, and the rule that each table can have only one makes it all the more valuable. From the definition given earlier, the biggest benefit of a clustered index is that it quickly narrows the query range and avoids a full table scan. In practice, because the ID is generated automatically, we do not know the ID of each record, so we rarely query by ID. That makes an ID primary key a waste of the clustered index. Secondly, using a field whose values are all different as the clustered index violates the rule that "a clustered index should not be built on a column with a large number of distinct values". Admittedly, this only hurts when users modify record contents frequently, especially the indexed columns; it has no effect on query speed.
In an office automation system, whether it is the home page listing the documents and meetings awaiting the user's signature, or a user's document search, every data query depends on the "date" field and the user's own "user name" field.
Typically, the home page of an office automation system shows the documents or meetings that each user has not yet signed for. Our WHERE clause can restrict the result to items the current user has not signed, but if the system has been running for a long time and holds a large amount of data, a full table scan every time any user opens the home page makes little sense: the vast majority of users have already read the documents from more than a month ago, so the scan only adds database overhead. Instead, when a user opens the home page, the database can query only the documents the user has not read from the last 3 months, using the "date" field to limit the scan and improve query speed. If your office automation system has been running for 2 years, the home page will in theory display 8 times faster, or even more. I say "in theory" because if the clustered index is still blindly built on the ID primary key, the query will not be that fast, even with a (nonclustered) index on the "date" field. Let's look at the speed of various queries against 10 million rows (250,000 of them from the last 3 months):
(1) Clustered index on the primary key only, with no date restriction:
select gid,fariqi,neibuyonghu,title from Tgongwen
Time: 128470 milliseconds (128 seconds)
(2) Clustered index on the primary key and a nonclustered index on fariqi:
select gid,fariqi,neibuyonghu,title from Tgongwen where fariqi> dateadd(day,-90,getdate())
Time: 53763 milliseconds (54 seconds)
(3) Clustered index on the date column (fariqi):
select gid,fariqi,neibuyonghu,title from Tgongwen where fariqi> dateadd(day,-90,getdate())
Time: 2423 milliseconds (2 seconds)
Although each statement retrieves 250,000 rows, the differences between the cases are huge, especially once the clustered index is built on the date column. In fact, if your database really holds 10 million rows and the clustered index sits on the ID primary key, as in cases 1 and 2 above, the web page simply times out and cannot be displayed at all. This is one of the most important reasons I gave up using the ID column as the clustered index. To measure the times above, add the following before each select statement:
declare @d datetime set @d=getdate()
And add after the select statement:
select [statement execution time (ms)]=datediff(ms,@d,getdate())
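To put the three timings reported above side by side, the short Python sketch below computes how many times slower each case is than the fastest one. The millisecond figures are taken from the text; the dictionary keys are just descriptive labels chosen for this example.

```python
# Timings in milliseconds, as reported for the three test cases above.
timings_ms = {
    "clustered on primary key, no date filter": 128470,
    "nonclustered on fariqi, 90-day filter": 53763,
    "clustered on fariqi, 90-day filter": 2423,
}

# Use the fastest case as the baseline.
baseline = min(timings_ms.values())
ratios = {case: ms / baseline for case, ms in timings_ms.items()}

for case, factor in ratios.items():
    print(f"{case}: {factor:.1f}x the time of the fastest case")
```

By this arithmetic, case 1 takes roughly 53 times as long as case 3, and case 2 roughly 22 times as long.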
2. Building any index significantly improves query speed. In fact, in the example above, statements 2 and 3 are exactly the same, and the indexed field is the same too. The only difference is that the former has a nonclustered index on the fariqi field and the latter a clustered index on it, yet the query speeds differ enormously. So simply indexing any field does not automatically speed up queries. From the table creation statements, we can see that the fariqi field of this 10-million-row table has 5003 distinct values. Building the clustered index on this field is entirely appropriate. In reality, we issue several documents every day, and they share the same issue date, which fits the rule for a clustered index exactly: "the values should neither be mostly identical nor only rarely identical." From this point of view, building an "appropriate" clustered index is very important for improving query speed.
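The figures quoted above (10 million rows, 5003 distinct fariqi values) can be turned into a quick back-of-the-envelope check. The sketch below is plain arithmetic on those two numbers; the variable names are made up, and it assumes the rows are spread evenly across dates, which the text does not state.

```python
total_rows = 10_000_000   # table size stated in the text
distinct_dates = 5003     # distinct fariqi values stated in the text

# Average number of rows sharing one date, assuming an even spread.
rows_per_value = total_rows / distinct_dates
# Fraction of the table that one date value selects on average.
fraction_per_value = 1 / distinct_dates

print(f"average rows per date: {rows_per_value:.0f}")
print(f"share of the table per date: {fraction_per_value:.4%}")
```

Each date matches a couple of thousand rows on average: far from "all values the same" and far from "every value unique", which is the middle ground the rule describes.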
3. Adding every field that needs faster queries to the clustered index improves query speed. As mentioned above, the "date" field and the user's "user name" field are indispensable in data queries. Since both fields are so important, we can combine them into a composite index. Many people believe that adding any field to the clustered index speeds up queries, while others wonder whether querying the fields of a composite clustered index separately slows queries down. With this question in mind, let's look at the following query speeds (the result set is 250,000 rows; the date column fariqi comes first in the composite clustered index and the user name column neibuyonghu second):
(1) select gid,fariqi,neibuyonghu,title from Tgongwen where fariqi>'2004-5-5'
Query speed: 2513 ms
(2) select gid,fariqi,neibuyonghu,title from Tgongwen where fariqi>'2004-5-5' and neibuyonghu='Office'
Query speed: 2516 ms
(3) select gid,fariqi,neibuyonghu,title from Tgongwen where neibuyonghu='Office'
Query speed: 60280 ms
From the experiments above, we can see that using only the leading column of the clustered index as the query condition is almost as fast as using all columns of the composite clustered index, and even slightly faster (when the result sets are the same size). If only a non-leading column of the composite clustered index is used as the query condition, the index does no good at all. Of course, statements 1 and 2 run at the same speed because they return the same number of rows. If all columns of the composite index are used and the result set is small, the index "covers" the query, so performance is optimal. Also remember that whether or not you often use the other columns of the clustered index, its leading column must be the most frequently used column.
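Why the leading column matters can be seen in a small Python sketch. It is a simplified model, not SQL Server's implementation: the rows are invented, a list sorted by (fariqi, neibuyonghu) stands in for the composite clustered index, and the three hypothetical functions return both the matching rows and how many rows had to be examined.

```python
from bisect import bisect_left
from datetime import date, timedelta

# Hypothetical rows (fariqi, neibuyonghu, gid), sorted by (fariqi, neibuyonghu)
# to mimic a composite clustered index with the date as the leading column.
rows = sorted([
    (date(2004, 3, 1), "Office", 1),
    (date(2004, 5, 4), "Sales", 2),
    (date(2004, 5, 6), "Office", 3),
    (date(2004, 5, 6), "Sales", 4),
    (date(2004, 7, 20), "Office", 5),
    (date(2004, 9, 16), "Admin", 6),
])

def after_date(rows, day):
    """Leading column only: seek to the first later date, read to the end."""
    start = bisect_left(rows, (day + timedelta(days=1),))
    return rows[start:], len(rows) - start        # (matches, rows examined)

def after_date_and_user(rows, day, user):
    """Both columns: same seek on the date, then filter by user."""
    candidates, examined = after_date(rows, day)
    return [r for r in candidates if r[1] == user], examined

def by_user(rows, user):
    """Non-leading column only: the sort order is no help, so scan every row."""
    return [r for r in rows if r[1] == user], len(rows)

print(after_date(rows, date(2004, 5, 5))[1])                     # rows examined
print(after_date_and_user(rows, date(2004, 5, 5), "Office")[1])  # rows examined
print(by_user(rows, "Office")[1])                                # rows examined
```

In this model the first two queries examine the same rows, while the user-only query has to touch the whole table, which mirrors the pattern in the three timings above.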
IV. Summary of experience in using indexes not found in other books
1. Using the clustered index is faster than using a primary key that is not the clustered index. Here are example statements (both retrieve 250,000 rows):
select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi='2004-9-16'
Time: 3326 ms
select gid,fariqi,neibuyonghu,reader,title from Tgongwen where gid''2004-1-1''
Time: 6343 ms (1 million extracted)
select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi>'2004-6-6'
Time: 3170 ms (500,000 rows retrieved)
select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi='2004-9-16'
Time: 3326 ms (identical to the result above; when the number of rows retrieved is the same, a greater-than comparison and an equality comparison perform the same)
select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi>''2004-1-1'' and fariqi''2004-1-1'' order by fariqi
Time: 6390 ms
select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi