What is the SQL index and database optimization?

2025-01-15 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article introduces SQL indexes and database optimization in some detail; readers who are interested may find it a helpful reference.

(I) A plain understanding of the index structure

In fact, you can think of an index as a special kind of table of contents. Microsoft SQL SERVER provides two kinds of indexes: the clustered index and the nonclustered index. Let us use an example to illustrate the difference between them:

The body of a Chinese dictionary is itself a clustered index. For example, to look up the character "an", we naturally open the first few pages of the dictionary, because dictionaries ordered by pinyin begin with the letter "a" and end with "z", and the pinyin "an" places the character near the front. If you have searched through all the entries beginning with "a" and still cannot find the character, the dictionary simply does not contain it. Likewise, to look up "Zhang" you turn toward the back of the dictionary, because its pinyin is "zhang". In other words, the body of the dictionary is itself the directory, and you do not need to consult any other directory to find what you are looking for.

A body of text that is itself arranged as a directory according to certain rules is what we call a "clustered index".

If you know how a character is pronounced, you can look it up quickly this way. But you may meet an unfamiliar character whose pronunciation you do not know. Then the previous method fails; instead, you look the character up by its radical and then turn directly to the page number listed after it. The ordering of characters found by combining the "radical catalogue" with the "character list" is not the ordering of the dictionary body. For example, after looking up "Zhang" you may find that its page number in the character list is 672; the character above "Zhang" in the list is "chi", on page 63, and the character below it is "nu" (crossbow), on page 390. Clearly these characters are not actually located above and below "Zhang" in the dictionary body. The consecutive sequence you see in the list is their ordering in the nonclustered index: a mapping from the nonclustered index onto the dictionary body. You can find a character this way, but it takes two steps: first find the entry in the catalogue, then turn to the listed page number.

A catalogue that is purely a catalogue, with a body that is purely the body, is what we call a "nonclustered index".

Through the examples above, we can understand what "clustered indexes" and "nonclustered indexes" are.

Going one step further, it is easy to see why each table can have only one clustered index: the body of a book can be physically sorted in only one way.
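Sticking with the dictionary analogy, the two kinds of index are created like this in T-SQL; the table and index names here are made up purely for illustration:

```sql
-- A hypothetical "dictionary" table.
CREATE TABLE dict_entry (
    pinyin  varchar(50),
    radical varchar(10),
    meaning varchar(400)
)

-- The clustered index physically orders the rows themselves,
-- like the pinyin-ordered body of the dictionary; only one per table.
CREATE CLUSTERED INDEX ix_pinyin ON dict_entry (pinyin)

-- A nonclustered index is a separate structure that points back at the rows,
-- like the radical catalogue listing page numbers; a table may have many.
CREATE NONCLUSTERED INDEX ix_radical ON dict_entry (radical)
```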

(II) When to use a clustered or a nonclustered index

The following table summarizes when to use a clustered or nonclustered index (important).

Action description | Use clustered index | Use nonclustered index
Foreign key columns | Should | Should
Primary key columns | Should | Should
Columns frequently grouped or sorted (order by) | Should | Should
Returning data within a certain range | Should | Should not
Small number of distinct values | Should | Should not
Large number of distinct values | Should not | Should
Frequently updated columns | Should not | Should
Frequently modified index columns | Should not | Should
One or very few distinct values | Should not | Should not

In fact, we can understand the table above through the earlier definitions of clustered and nonclustered indexes. Take "returning data within a certain range" as an example. Suppose a table has a date column on which the clustered index is built. A query for all data between January 1, 2004 and October 1, 2004 will be very fast, because the body of your "dictionary" is sorted by date: the clustered index only needs to find the beginning and the end of the range. With a nonclustered index, by contrast, you must first look up the page number of each row in the catalogue, and then fetch the content page by page.

(III) Misunderstandings in index use, in the light of practice

The purpose of theory is application. Although we have just listed when a clustered or nonclustered index should be used, in practice these rules are easily ignored or not weighed against the actual situation. Below we discuss, based on problems met in practice, the common misunderstandings in index use, to help you master how indexes should be built.

1. Misunderstanding: the primary key should be the clustered index

In the author's view this idea is extremely wrong and wastes the clustered index, even though SQL SERVER does build a clustered index on the primary key by default.

Typically, we create an ID column in each table to distinguish rows: an automatically incrementing identity column with a step of 1, like the column gid in our office automation example. If we make this column the primary key, SQL SERVER defaults it to the clustered index. This has the advantage that your data is physically sorted by ID in the database, but I do not think it is worth much.

The advantages of a clustered index are obvious, and the rule that each table can have only one makes the clustered index all the more precious.

From the definition of the clustered index given earlier, its greatest advantage is that it can quickly narrow the query range according to the query conditions and avoid a full table scan. In practice, because ID numbers are generated automatically, we do not know the ID of each record and rarely query by ID, so a clustered index on the primary key ID wastes the resource. Secondly, a clustered index on a column where every value is different violates the rule that a clustered index should not be built on a large number of distinct values. Of course, this has a negative effect only when record contents, especially the index entries, are modified frequently; it does not hurt query speed.

In an office automation system, whether it is the documents awaiting the user's signature, the meetings, or the document searches shown on the system home page, the data queries are inseparable from two fields: the "date" and the user's "user name".

Typically, the home page of the office automation system shows the documents or meetings each user has not yet signed for. Although the where clause can restrict the results to what the current user has not signed for, if the system has been running for a long time and holds a large amount of data, then a full table scan every time every user opens the home page is pointless: the vast majority of users have long since browsed the files from a month ago, and the scan only increases the load on the database. Instead, when a user opens the home page, the database can query only the unread files of the last 3 months, using the "date" field to limit the scan and speed up the query. If your office automation system has been running for 2 years, your home page should, in theory, display 8 times faster, or even faster.

"In theory" is said here because, if your clustered index is still blindly built on the primary key ID, your query will not be that fast, even if you build an index (a nonclustered one) on the date field. Let us look at the speed of various queries on 10 million rows (250,000 rows within 3 months):

(1) build a clustered index only on the primary key without dividing the time period:

Select gid,fariqi,neibuyonghu,title from tgongwen

Time: 128470 milliseconds (i.e. 128 seconds)

(2) create a clustered index on the primary key and a nonclustered index on fariqi:

Select gid,fariqi,neibuyonghu,title from Tgongwen

Where fariqi > dateadd(day,-90,getdate())

Time: 53763 milliseconds (54 seconds)

(3) build the clustered index on the date column (fariqi):

Select gid,fariqi,neibuyonghu,title from Tgongwen

Where fariqi > dateadd(day,-90,getdate())

Time: 2423 milliseconds (2 seconds)

Although each statement extracts 250,000 rows, the differences are huge, especially when the clustered index is built on the date column. In fact, if your database really holds 10 million rows and the primary key is on the ID column, then, as in the first and second cases above, the web page simply times out and displays nothing. This was the decisive factor in my abandoning the ID column as the clustered index.

The timings above were obtained by adding the following before each select statement:

Declare @d datetime
Set @d=getdate()

and adding the following after it:

Select [statement execution time in milliseconds] = datediff(ms,@d,getdate())
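Put together, a complete timed test of one of the queries above looks like this (same Tgongwen table and columns as in the article):

```sql
declare @d datetime
set @d=getdate()

select gid,fariqi,neibuyonghu,title from Tgongwen
where fariqi > dateadd(day,-90,getdate())

select [statement execution time in milliseconds] = datediff(ms,@d,getdate())
```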

2. Misunderstanding: as long as an index is built, query speed improves significantly

In fact, in the example above, statements 2 and 3 are identical and the indexed field is the same; the only difference is that the former has a nonclustered index on fariqi while the latter has a clustered index on it, yet the query speeds differ enormously. So simply indexing an arbitrary field does not necessarily speed up queries.

From the table-creation statement we can see that the fariqi field of this 10-million-row table contains 5003 distinct values. A clustered index on this field is perfectly appropriate: in reality we issue several documents per day and their issue dates are the same, which fits the requirement for a clustered index that values be "neither mostly identical nor almost all distinct". In this light, building the "right" clustered index is very important for query speed.

3. Misunderstanding: adding every field whose queries need speeding up to the clustered index

As mentioned above, the fields that data queries cannot do without are the "date" and the user's "user name". Since both fields are so important, we can combine them into one composite index (compound index).

Many people think that adding any field into the clustered index speeds queries up, while others are unsure: if the composite clustered index fields are queried separately, will queries slow down? With this question, let us look at the following query speeds (each result set contains 250,000 rows; in the composite clustered index, the date column fariqi is the leading column, followed by the user name neibuyonghu):

(1) select gid,fariqi,neibuyonghu,title from Tgongwen

Where fariqi > '2004-5-5'

Query speed: 2513 milliseconds

(2) select gid,fariqi,neibuyonghu,title from Tgongwen

Where fariqi > '2004-5-5' and neibuyonghu='Office'

Query speed: 2516 milliseconds

(3) select gid,fariqi,neibuyonghu,title from Tgongwen

Where neibuyonghu='Office'

Query speed: 60280 milliseconds

From these experiments we can see that using only the leading column of the clustered index as the condition is almost as fast as using all the columns of the composite clustered index, and even slightly faster than using all the composite index columns (when the result sets are the same size); but if only a non-leading column of the composite clustered index is used as the condition, the index has no effect at all. Of course, statements 1 and 2 show the same speed because the number of rows queried is the same. If all columns of the composite index are used and the result set is small, "index covering" is achieved and performance is optimal. Keep in mind: whether or not the other columns of the clustered index are used often, the leading column must be the most frequently used one.
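As a sketch, the composite clustered index described above could be created as follows; the index name is hypothetical, and dropping any existing clustered index first is assumed, since a table allows only one:

```sql
-- Leading column first: fariqi is the most frequently used condition.
CREATE CLUSTERED INDEX ix_fariqi_neibuyonghu
ON Tgongwen (fariqi, neibuyonghu)

-- Benefits from the index: the leading column appears in the predicate.
SELECT gid,fariqi,neibuyonghu,title FROM Tgongwen
WHERE fariqi > '2004-5-5' AND neibuyonghu = 'Office'

-- Gets no help from the index: only the non-leading column is constrained.
SELECT gid,fariqi,neibuyonghu,title FROM Tgongwen
WHERE neibuyonghu = 'Office'
```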

(IV) Index-usage experience you will not find in other books

1. Querying on the clustered index is faster than on a primary key that is not the clustered index

The following is an example statement: (all extract 250000 pieces of data)

Select gid,fariqi,neibuyonghu,reader,title from Tgongwen

Where fariqi='2004-9-16'

Time: 3326 milliseconds

Select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi > '2004-1-1'

Time: 6343 milliseconds (1 million entries extracted)

Select gid,fariqi,neibuyonghu,reader,title from Tgongwen

Where fariqi > '2004-6-6'

Time: 3170 milliseconds (500000 entries extracted)

Select gid,fariqi,neibuyonghu,reader,title from Tgongwen

Where fariqi='2004-9-16'

Time: 3326 milliseconds (exactly the same result as the previous statement; when the number of rows extracted is the same, the greater-than sign and the equals sign perform the same)

Select gid,fariqi,neibuyonghu,reader,title from Tgongwen

Where fariqi < '2004-1-1' order by fariqi

Time: 6390 milliseconds

Another issue is the order of conditions in the where clause. For example, compare executing:

Select * from table1 where name='zhangsan' and tID > 10000

with executing:

Select * from table1 where tID > 10000 and name='zhangsan'

Some people wonder whether these two statements execute with the same efficiency, since their written order differs: if tID is the clustered-index column, the latter statement would seem to look only at the rows beyond 10000, while the former would first find every name='zhangsan' in the whole table and then filter the results by tID > 10000.

In fact, such worries are unnecessary. SQL SERVER has a "query optimizer" that evaluates the search conditions in the where clause and decides which index can shrink the search space of a table scan; that is, it optimizes automatically.

Although the query optimizer automatically optimizes queries based on the where clause, it is still necessary to understand how it works; otherwise it will sometimes not query as quickly as you intend.

During the query-analysis phase, the query optimizer examines each stage of the query and decides whether it is useful for limiting the amount of data to be scanned. If a stage can be used as a search argument (SARG), it is called optimizable, and the needed data can be fetched quickly through the index.

Definition of a SARG: an operation used to restrict a search, usually a specific match, a match within a range of values, or an AND combination of two or more such conditions. Its form is:

column name  operator  <constant or variable>

or

<constant or variable>  operator  column name

The column name appears on one side of the operator and a constant or variable on the other. For example:

Name='Zhang San'

Price > 5000

5000 < Price

If an expression does not satisfy the SARG form, it cannot limit the scope of the search; that is, SQL SERVER must evaluate every row to determine whether it satisfies all the conditions in the WHERE clause. So an index is useless for expressions that do not satisfy the SARG form.

After introducing SARG, let's summarize the experience of using SARG and the experiences encountered in practice that are different from the conclusions on some materials:

1. Whether a Like condition is a SARG depends on the wildcard used

For example: name like 'Zhang%' is a SARG

But: name like '%Zhang' is not a SARG

The reason is that a wildcard % at the start of the string makes the index unusable.

2. or causes a full table scan

For example: Name='Zhang San' and Price > 5000 conforms to SARG

But: Name='Zhang San' or Price > 5000 does not conform to SARG

Using or causes a full table scan.

3. Non-operators and functions break the SARG form

The most typical statements that do not satisfy the SARG form are those containing non-operators, such as NOT, !=, <>, !<, !>, NOT EXISTS, NOT IN, NOT LIKE and so on, as well as functions. Here are a few examples that do not satisfy the SARG form:

ABS(Price) < 5000

Price * 2 > 5000

For the second expression, SQL SERVER will still treat it as a SARG and translate it into:

WHERE Price > 2500/2

However, this is not recommended, because sometimes SQL SERVER cannot guarantee that the transformation is completely equivalent to the original expression.
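The practical fix is usually to move the computation off the column yourself, so the predicate regains SARG form; a sketch (assuming a numeric Price column):

```sql
-- Not a SARG: the function hides the column from the index.
SELECT * FROM titles WHERE ABS(Price) < 5000

-- Equivalent SARG form, since ABS(x) < k means -k < x < k:
SELECT * FROM titles WHERE Price > -5000 AND Price < 5000

-- Likewise, instead of relying on SQL SERVER to rewrite Price * 2 > 5000:
SELECT * FROM titles WHERE Price > 5000 / 2
```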

4. The function of IN is equivalent to that of OR.

Statement:

Select * from table1 where tid in (2,3)

And

Select * from table1 where tid=2 or tid=3

are equivalent, and both cause a full table scan; even if there is an index on tid, the index will not be used.

5. Use NOT as little as possible

6. exists and in execute with the same efficiency

Many materials claim that exists is more efficient than in, and that not exists should replace not in wherever possible. In fact, my experiments show that, with or without not, the two execute with the same efficiency. Because a subquery is involved, this time we use the pubs sample database that ships with SQL SERVER, and before running we turn on SQL SERVER's STATISTICS IO option.
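The option referred to is SET STATISTICS IO; turned on once for the session, it makes SQL SERVER print the scan counts and read figures quoted below after each statement:

```sql
SET STATISTICS IO ON   -- report scan count, logical, physical and read-ahead reads

-- ... run the statements under test here ...

SET STATISTICS IO OFF  -- switch the report back off
```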

(1) select title,price from titles where title_id in

(select title_id from sales where qty > 30)

The result of the execution of this sentence is:

Table 'sales'. Scan count 18, logical reads 56, physical reads 0, read-ahead reads 0.

Table 'titles'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

(2) select title,price from titles where exists

(select * from sales where sales.title_id=titles.title_id and qty > 30)

The execution result of the second sentence is as follows:

Table 'sales'. Scan count 18, logical reads 56, physical reads 0, read-ahead reads 0.

Table 'titles'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

From this we can see that execution with exists is just as efficient as with in.

7. charindex() is no more efficient than LIKE with a leading wildcard %

As mentioned earlier, a LIKE preceded by the wildcard % causes a full table scan, so its execution is inefficient. Some materials claim that replacing LIKE with the function charindex() greatly improves speed. My experiments show that this, too, is wrong:

Select gid,title,fariqi,reader from tgongwen

Where charindex ('Criminal investigation Detachment', reader) > 0 and fariqi > '2004-5-5'

Time: 7 seconds; in addition: scan count 4, logical reads 7155, physical reads 0, read-ahead reads 0.

Select gid,title,fariqi,reader from tgongwen

Where reader like '%' + 'Criminal investigation Detachment' + '%' and fariqi > '2004-5-5'

Time: 7 seconds; in addition: scan count 4, logical reads 7155, physical reads 0, read-ahead reads 0.

8. union is not always more efficient than or

We mentioned earlier that using or in the where clause causes a full table scan, and, generally speaking, every reference I have seen recommends replacing or with union. It turns out that this advice holds in most, but not all, cases.

Select gid,fariqi,neibuyonghu,reader,title from Tgongwen

Where fariqi='2004-9-16' or gid > 9990000

Time: 68 seconds. Scan count 1, logical reads 404008, physical reads 283, read-ahead reads 392163.

Select gid,fariqi,neibuyonghu,reader,title from Tgongwen

Where fariqi='2004-9-16'

Union

Select gid,fariqi,neibuyonghu,reader,title from Tgongwen where gid > 9990000

Time: 9 seconds. Scan count 8, logical reads 67489, physical reads 216, read-ahead reads 7499.

It seems that union is indeed generally much more efficient than or.

However, experiments show that when the queried column on both sides of the or is the same, union executes much more slowly than or, even though here union scans the index while or scans the whole table.

Select gid,fariqi,neibuyonghu,reader,title from Tgongwen

Where fariqi='2004-9-16' or fariqi='2004-2-5'

Time: 6423 milliseconds. Scan count 2, logical reads 14726, physical reads 1, read-ahead reads 7176.

Select gid,fariqi,neibuyonghu,reader,title from Tgongwen

Where fariqi='2004-9-16'

Union

Select gid,fariqi,neibuyonghu,reader,title from Tgongwen

Where fariqi='2004-2-5'

Time: 11640 milliseconds. Scan count 8, logical reads 14806, physical reads 108, read-ahead reads 1144.

9. Extract only the fields you need, and avoid "select *"

Let's do an experiment:

Select top 10000 gid,fariqi,reader,title from tgongwen order by gid desc

Time: 4673 milliseconds

Select top 10000 gid,fariqi,title from tgongwen order by gid desc

Time: 1376 milliseconds

Select top 10000 gid,fariqi from tgongwen order by gid desc

Time: 80 milliseconds

From this point of view, every time we extract one less field, the speed of data extraction will be improved accordingly. The speed of improvement depends on the size of the field you discard.

10. count(*) is no slower than count(field)

Some materials claim that using * counts every column and is therefore obviously less efficient than naming a single column. This claim is unfounded. Let us see:

Select count (*) from Tgongwen

Time: 1500 milliseconds

Select count (gid) from Tgongwen

Time: 1483 milliseconds

Select count (fariqi) from Tgongwen

Time: 3140 milliseconds

Select count (title) from Tgongwen

Time: 52050 milliseconds

As the figures show, count(*) is as fast as count(primary key), and both are faster than counting any other field; the longer the field, the slower the count. I suspect that with count(*), SQL SERVER automatically picks the smallest field to count. Of course, writing count(primary key) directly is even more explicit.

11. order by on the clustered index column is the most efficient sort

Let us see (gid is the primary key; fariqi is the clustered index column):

Select top 10000 gid,fariqi,reader,title from tgongwen

Time: 196 milliseconds. Scan count 1, logical reads 289, physical reads 1, read-ahead reads 1527.

Select top 10000 gid,fariqi,reader,title from tgongwen order by gid asc

Time: 4720 milliseconds. Scan count 1, logical reads 41956, physical reads 0, read-ahead reads 1287.

Select top 10000 gid,fariqi,reader,title from tgongwen order by gid desc

Time: 4736 milliseconds. Scan count 1, logical reads 55350, physical reads 10, read-ahead reads 775.

Select top 10000 gid,fariqi,reader,title from tgongwen order by fariqi asc

Time: 173 milliseconds. Scan count 1, logical reads 290, physical reads 0, read-ahead reads 0.

Select top 10000 gid,fariqi,reader,title from tgongwen order by fariqi desc

Time: 156 milliseconds. Scan count 1, logical reads 289, physical reads 0, read-ahead reads 0.

From the above, the unsorted query and the "order by clustered index column" queries have the same speed and the same number of logical reads, and both are far faster than the "order by nonclustered index column" queries.

Also, when sorting by a given field, ascending and descending order are basically the same speed.

12. Efficient TOP

In fact, when querying and extracting very large data sets, the biggest factor affecting database response time is not the data lookup but the physical I/O operations. For example:

Select top 10 * from (

Select top 10000 gid,fariqi,title from tgongwen

Where neibuyonghu='Office' order by gid desc) as a

Order by gid asc

In theory, the whole statement should take longer than the inner clause alone, but the opposite is true: the inner clause returns 10000 rows while the whole statement returns only 10, and the biggest factor affecting response time is physical I/O. One of the most effective ways to limit physical I/O here is the TOP keyword. TOP is a keyword that SQL SERVER optimizes at the system level for extracting the first N rows, or first N percent, of data. In the author's practical experience TOP has proved convenient and efficient. The keyword does not exist in ORACLE, another large database, which is a pity, although it can be worked around there in other ways (such as ROWNUM). In the later discussion of "implementing a paged display stored procedure for tens of millions of rows", we will use the keyword TOP.

So far, we have discussed how to quickly query the data you need from a large database. Of course, the methods we introduce are all "soft" methods, and in practice, we have to consider various "hard" factors, such as network performance, server performance, operating system performance, and even network cards, switches, and so on.

III. Implementing a general paging stored procedure for small and massive data sets

To build a web application, paged browsing is essential, and it is a very common problem in database processing. The classic approach is ADO recordset paging, i.e. using ADO's built-in paging (via cursors). But this approach suits only small data volumes, because cursors have their drawbacks: a cursor is held in memory and is expensive; as soon as it is created, the relevant records are locked until it is released. A cursor provides a means of scanning a given result set row by row, performing different operations on each row depending on conditions. For a cursor defined over multiple tables or a large table (a large data set), it is easy for the program to fall into a long wait or even crash.

More importantly, for very large data models, loading the entire data source on every paged retrieval is a waste of resources. The now-popular paging method retrieves only a page-sized block of data at a time, instead of retrieving all the data and then stepping through to the current row.

The earliest way to better implement this method of extracting data based on page size and page number is probably the "Russian stored procedure". This stored procedure uses cursors, but this method is not generally accepted because of the limitations of cursors.

Later, someone on the Internet modified this stored procedure, and the following stored procedure is a paging stored procedure written in combination with our office automation example:

CREATE procedure pagination1
(@pagesize int,  -- page size, e.g. 20 records per page
@pageindex int   -- current page number
)
as
set nocount on   -- do not return row counts
begin
declare @indextable table (id int identity(1,1), nid int)  -- define a table variable
declare @PageLowerBound int  -- lower bound of this page
declare @PageUpperBound int  -- upper bound of this page
set @PageLowerBound=(@pageindex-1)*@pagesize
set @PageUpperBound=@PageLowerBound+@pagesize
set rowcount @PageUpperBound
insert into @indextable(nid) select gid from TGongwen
where fariqi > dateadd(day,-365,getdate()) order by fariqi desc
select O.gid,O.mid,O.title,O.fadanwei,O.fariqi from TGongwen O, @indextable t
where O.gid=t.nid and t.id > @PageLowerBound and t.id <= @PageUpperBound
order by t.id
end
set nocount off
GO

From this we arrive at the following paging scheme:

Select top <page size> *
From table1
Where id >
(select max(id) from
(select top ((<page number>-1)*<page size>) id from table1 order by id) as T
)
Order by id
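Substituting concrete values makes the scheme clearer. For page 3 with a page size of 10, the inner subquery skips the first (3-1)*10 = 20 ids and the outer query takes the next 10:

```sql
SELECT TOP 10 *
FROM table1
WHERE id >
    (SELECT MAX(id) FROM
        (SELECT TOP 20 id FROM table1 ORDER BY id) AS T
    )
ORDER BY id
```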

For the column with non-repeating, easily compared values, we usually pick the primary key. The table below uses the author's 10-million-row office automation table, sorting on gid (gid is the primary key, though not the clustered index) and extracting the gid, fariqi, title fields. Taking pages 1, 10, 100, 500, 1000, 10000, 100000, 250000 and 500000 as examples, we test the execution speed of the three paging schemes (in milliseconds):

Page number   Scheme 1   Scheme 2   Scheme 3
1             60         30         76
10            46         16         63
100           1076       720        130
500           540        12943      83
1000          17110      470        250
10000         24796      4500       140
100000        38326      42283      1553
250000        28140      128720     2330
500000        121686     127846     7168

From the table above, all three stored procedures are trustworthy and fast for paging commands below 100 pages. But with the first scheme, speed drops after about 1000 pages; with the second, after about 10,000 pages; while the third shows no major decline and holds up well even at high page numbers.

Having settled on the third paging scheme, we can write it into a stored procedure. SQL SERVER stored procedures are precompiled SQL, and they execute more efficiently than SQL statements sent from a WEB page. The following stored procedure not only contains the paging scheme but also decides, based on a parameter passed from the page, whether to return the total record count.

-- Get the data for the specified page
CREATE PROCEDURE pagination3
@tblName varchar(255),  -- table name
@strGetFields varchar(1000) = '*',  -- the columns to return
@fldName varchar(255) = '',  -- name of the sort field
@PageSize int = 10,  -- page size (records per page)
@PageIndex int = 1,  -- page number
@doCount bit = 0,  -- whether to return the total record count; a non-zero value returns the count
@OrderType bit = 0,  -- sort type; a non-zero value means descending
@strWhere varchar(1500) = ''  -- query criteria (note: do not include "where")
AS
declare @strSQL varchar(5000)  -- main statement
declare @strTmp varchar(110)   -- temporary variable
declare @strOrder varchar(400) -- sort clause
if @doCount != 0
begin
if @strWhere != ''
set @strSQL = 'select count(*) as Total from [' + @tblName + '] where ' + @strWhere
else
set @strSQL = 'select count(*) as Total from [' + @tblName + ']'
end
-- The code above performs the count if @doCount is non-zero;
-- all of the code below handles the case where @doCount is 0.
else
begin
if @OrderType != 0  -- descending (desc)
begin
set @strTmp = '<(select min'
set @strOrder = ' order by [' + @fldName + '] desc'
end
else  -- ascending (asc)
begin
set @strTmp = '>(select max'
set @strOrder = ' order by [' + @fldName + '] asc'
end
if @PageIndex = 1  -- the first page
begin
if @strWhere != ''
set @strSQL = 'select top ' + str(@PageSize) + ' ' + @strGetFields + ' from [' + @tblName + '] where ' + @strWhere + ' ' + @strOrder
else
set @strSQL = 'select top ' + str(@PageSize) + ' ' + @strGetFields + ' from [' + @tblName + '] ' + @strOrder
-- Executing the code above when on the first page speeds up execution.
end
else
begin
-- The code below assigns @strSQL the SQL that is actually executed:
set @strSQL = 'select top ' + str(@PageSize) + ' ' + @strGetFields + ' from [' + @tblName + '] where [' + @fldName + '] ' + @strTmp + '([' + @fldName + ']) from (select top ' + str((@PageIndex-1)*@PageSize) + ' [' + @fldName + '] from [' + @tblName + '] ' + @strOrder + ') as tblTmp) ' + @strOrder
if @strWhere != ''
set @strSQL = 'select top ' + str(@PageSize) + ' ' + @strGetFields + ' from [' + @tblName + '] where [' + @fldName + '] ' + @strTmp + '([' + @fldName + ']) from (select top ' + str((@PageIndex-1)*@PageSize) + ' [' + @fldName + '] from [' + @tblName + '] where ' + @strWhere + ' ' + @strOrder + ') as tblTmp) and ' + @strWhere + ' ' + @strOrder
end
end
exec (@strSQL)
GO

The above is a general-purpose stored procedure, with its comments written inline.
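As a sketch of how the procedure might be invoked (the table name Tgongwen and sort column gid are carried over from earlier sections as assumptions; the parameter values are illustrative):

```sql
-- Fetch page 5, ten rows per page, sorted ascending on gid, no filter.
exec pagination3
    @tblName      = 'Tgongwen',
    @strGetFields = '*',
    @fldName      = 'gid',
    @PageSize     = 10,
    @PageIndex    = 5,
    @doCount      = 0,
    @OrderType    = 0,
    @strWhere     = ''
```

Passing @doCount = 1 instead would return only the total record count, which the page can use to compute the number of pages.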

select top <page size> *
from table1
where id > (
    select max(id) from (
        select top ((<page number> - 1) * <page size>) id
        from table1 order by id
    ) as T
)
order by id
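For instance, with a page size of 10, fetching page 1,000 of a hypothetical table Tgongwen sorted on gid would expand the pattern to:

```sql
-- Skip the first (1000 - 1) * 10 = 9990 rows, then take the next 10.
select top 10 *
from Tgongwen
where gid > (
    select max(gid) from (
        select top 9990 gid from Tgongwen order by gid
    ) as T
)
order by gid
```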

With large data volumes, especially when querying the last few pages, the query time generally stays under 9 seconds, whereas the other stored procedures time out in practice, so this one is well suited to paging over large databases.

The author hopes the analysis of the above stored procedures offers everyone some insight and improves the efficiency of your work, and also invites colleagues to propose better real-time paging algorithms.

Fourth, the importance of the clustered index and how to choose it

In the title of the previous section, the author promised "a general-purpose paging stored procedure for both small and large data sets". That is because, when applying this stored procedure to the "office automation" system in practice, the author observed the following with small data volumes:

1. The paging speed is generally maintained between 1 second and 3 seconds.

2. When querying the last page, the speed is generally 5 to 8 seconds, whether the table has only 3 pages or 300,000 pages.

Although this paging scheme executes very quickly on large tables, for the first few pages a 3-second response is even slower than the first, unoptimized paging method. In the users' words, it is "not even as fast as the ACCESS database", which is enough to make users abandon the system you developed.

The author's analysis revealed that the crux of the problem is simple yet important: the sort field was not the clustered index!

The title of this article is "Query Optimization and Paging Algorithm Schemes". The author put these two loosely related topics together precisely because both depend on one crucial thing: the clustered index.

As we mentioned in the previous discussion, clustered indexes have two biggest advantages:

1. Narrow the query scope as quickly as possible.

2. Sort the fields as quickly as possible.

The first advantage is used mainly for query optimization; the second mainly for sorting data during paging.

On the other hand, each table can have only one clustered index, which makes it all the more important. Choosing the clustered index is arguably the single most critical factor in achieving both "query optimization" and "efficient paging".

However, it is usually a contradiction to make clustered index columns meet both the needs of query columns and the needs of sorting.

In the previous discussion of indexes, the author made fariqi, the date the user posted the record, the leading column of the clustered index, with day-level precision. As mentioned earlier, the advantage of this approach is that date-range queries become much faster than they would be with the ID primary-key column as the clustered index.

However, during paging, the duplicate values in this clustered index column make it impossible to use max or min as the paging reference, so a more efficient sort cannot be achieved. Yet if the ID primary-key column is made the clustered index instead, the clustered index serves nothing but sorting, which wastes that valuable resource.

To resolve this contradiction, the author later added a date column whose default value is getdate(). When a record is written, this column automatically receives the current time, accurate to the millisecond. Even so, to guard against the unlikely event of duplicates, a UNIQUE constraint is created on this column, and this date column is used as the clustered index column.
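A minimal sketch of that schema change, assuming a hypothetical table Tgongwen and column name ins_date (SQL Server 2000 syntax):

```sql
-- Add a millisecond-precision timestamp column, auto-filled on insert.
-- Note: on a table that already holds rows, all existing rows receive the
-- same default value, so they must be given distinct values before the
-- UNIQUE constraint can be added.
ALTER TABLE Tgongwen
    ADD ins_date datetime NOT NULL
    CONSTRAINT DF_Tgongwen_ins_date DEFAULT getdate()

-- After dropping the table's old clustered index, make this column the
-- unique clustered index used for both range queries and paging.
ALTER TABLE Tgongwen
    ADD CONSTRAINT UQ_Tgongwen_ins_date UNIQUE CLUSTERED (ins_date)
```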

With this time-based clustered index column, users can not only run range queries on the time the data was inserted, but the column's uniqueness also lets max and min serve as the paging algorithm's reference.

After this optimization, the author found that paging speed is generally tens of milliseconds, or even 0 milliseconds, regardless of data volume, while date-range queries are no slower than before.

The clustered index is so important and precious that the author concludes that the clustered index must be built on:

1. On the fields you use most frequently to narrow down the query

2. On the fields that you most frequently need to sort by.

The author hopes this article not only helps everyone's work but also conveys a method of analyzing problems; most importantly, the author hopes it serves as a starting point that sparks interest in study and discussion, so that together we can contribute our best to the cause of strengthening the police through science and technology and to the Golden Shield project.

Finally, the experiments showed that when users query large amounts of data, the biggest factor limiting database speed is not memory size but the CPU. On the author's P4 2.4 GHz machine, Task Manager showed the CPU frequently pegged at 100%, while memory usage barely changed. Even on our HP ML 350 G3 server, the CPU peaked at 90% and generally stayed around 70%.

The test data all come from our HP ML 350 server. Server configuration: dual Intel Xeon hyper-threaded CPUs at 2.4 GHz, 1 GB of memory, Windows Server 2003 Enterprise Edition, and SQL Server 2000 SP3.
