2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains the underlying principles of indexing in a MySQL database, with a detailed look at clustered and nonclustered indexes.
Let's use an example to illustrate the difference between a clustered index and a nonclustered index. The body of a Chinese dictionary is itself a clustered index. If we want to look up the character "an", we naturally open the first few pages, because a dictionary sorted by pinyin runs from "a" to "z" and "an" falls near the front. If you have gone through every entry beginning with "a" and still cannot find the character, the dictionary does not contain it. Similarly, to look up "zhang" you turn to the back of the dictionary. In other words, the body of the dictionary is itself a directory, and you do not need to consult any other directory to find what you are looking for. Body text that is itself arranged as a directory according to some rule is what we call a "clustered index".

If you know a character, you can find it quickly in the body this way. But you may also meet a character you do not recognize and cannot pronounce. Then the previous method fails: you have to look the character up by its radical and turn directly to the page number listed next to it. However, the order of the characters you find through the "radical index" and the "character lookup table" is not the order of the body text. For example, when we look up "zhang", the lookup table gives its page number as 672; the entry above "zhang" in the table is "chi", on page 63, and the entry below it is "nu" (crossbow), on page 390.
Obviously, these characters are not really located directly above and below "zhang" in the body. The consecutive entries "chi, zhang, nu" that you see are their order in the nonclustered index, which is a mapping of the characters in the dictionary body. We can find the character we need this way, but it takes two steps: first find the entry in the directory, then turn to the page number it gives. A directory that is purely a directory, alongside a body that is purely a body, is what we call a "nonclustered index".

From these examples we can understand what clustered and nonclustered indexes are. By extension, it is easy to see why a table can have only one clustered index: the body can only be sorted in one way.

(2) When to use a clustered or a nonclustered index

The table summarizes when to use a clustered or a nonclustered index (important). It rates each of the following cases as "should" or "should not" for a clustered index and for a nonclustered index:

    Columns that are often grouped and sorted
    Returning data within a certain range
    One or very few distinct values
    A small number of distinct values
    A large number of distinct values
    Frequently updated columns
    Foreign key columns
    Primary key columns
    Frequently modified index columns

In fact, we can understand the table through the earlier examples that defined clustered and nonclustered indexes. Take returning data within a certain range: suppose a table has a date column and you build the clustered index on it. A query for all data between January 1, 2004 and October 1, 2004 will then be very fast, because the body of your dictionary is sorted by date and the clustered index only needs to find the first and the last of the rows to be retrieved.
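The range lookup described above can be sketched as a toy model in Python. This is only an illustration of the idea, not how a database engine is implemented: the rows, dates, and the function name range_query are made up for the example, and a sorted Python list stands in for the physically ordered table.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Toy "clustered" table: the rows themselves are kept sorted by date,
# like dictionary entries sorted by pinyin. All data here is made up.
rows = [
    (date(2003, 12, 30), "doc-1"),
    (date(2004, 1, 1), "doc-2"),
    (date(2004, 3, 15), "doc-3"),
    (date(2004, 10, 1), "doc-4"),
    (date(2004, 11, 2), "doc-5"),
]
dates = [d for d, _ in rows]  # sort key of each row, in storage order


def range_query(start, end):
    """Return rows with start <= date <= end using two binary searches."""
    lo = bisect_left(dates, start)   # first row at or after start
    hi = bisect_right(dates, end)    # one past the last row at or before end
    return rows[lo:hi]               # the matches form one contiguous slice


result = range_query(date(2004, 1, 1), date(2004, 10, 1))
print([title for _, title in result])
```

Because the rows are stored in key order, only the two ends of the range have to be located; everything in between is read sequentially.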
With a nonclustered index, by contrast, you must first find the page number of each item in the directory and then fetch the content from that page.

(3) Misconceptions in index use, in light of practice

The purpose of theory is application. Although we have just listed when clustered or nonclustered indexes should be used, in practice those rules are easily ignored, or are not weighed against the actual situation. Below we discuss misconceptions in index use, based on problems met in practice, so that you can master how to set up indexes.

1. The primary key is the clustered index

I think this idea is badly mistaken and wastes the clustered index, even though SQL Server builds a clustered index on the primary key by default. Typically we create an ID column in each table to distinguish the rows, and this column auto-increments with a step of 1. The Gid column in our office automation example is such a column. If we make this column the primary key, SQL Server makes it the clustered index by default. This has the advantage that your data is physically sorted by ID in the database, but I do not think that means much.

The advantages of a clustered index are obvious, and the rule that each table can have only one makes it all the more precious. From the definition given earlier, the greatest advantage of a clustered index is that it quickly narrows the scope of a query to what is needed and avoids a full table scan. In practice, because ID numbers are generated automatically, we do not know the ID of each record, so we rarely query by ID.
This makes using the primary key ID as the clustered index a waste of resources. Second, building the clustered index on a column in which every value is different does not follow the rule that a clustered index should not be built on a large number of distinct values; of course, this only has a negative effect when record contents, especially index entries, are modified, and it does not affect query speed.

In an office automation system, the documents awaiting the user's signature and the meetings shown on the home page, as well as every file query, all depend on two fields: the date and the user name. Typically the home page shows the documents or meetings that each user has not yet signed for. Although the where clause can restrict the query to what the current user has not signed for, if the system has been running for a long time and holds a lot of data, a full table scan every time a user opens the home page is pointless: most users have already read the files from more than a month ago, so the scan only adds load to the database. Instead, when the user opens the home page, the database can query only the files from the last 3 months that the user has not read, using the date field to limit the scan and speed up the query. If your office automation system has been running for 2 years, 3 months is one eighth of the data, so the home page should in theory display 8 times faster, or even more. I say "in theory" because if the clustered index is still blindly built on the primary key ID, the query will not be that fast, even with a (nonclustered) index on the date field.
Let's look at how various queries perform against 10 million rows (250,000 of them from the last 3 months):

(1) Clustered index on the primary key only, no restriction on the time period:

    select gid,fariqi,neibuyonghu,title from Tgongwen

Time: 128470 milliseconds (128 seconds)

(2) Clustered index on the primary key, nonclustered index on fariqi:

    select gid,fariqi,neibuyonghu,title from Tgongwen
    where fariqi > dateadd(day,-90,getdate())

Time: 53763 milliseconds (54 seconds)

(3) Clustered index on the date column (fariqi):

    select gid,fariqi,neibuyonghu,title from Tgongwen
    where fariqi > dateadd(day,-90,getdate())

Time: 2423 milliseconds (2 seconds)

Although each statement extracts 250,000 rows, the differences between the cases are huge, especially once the clustered index is built on the date column. In fact, if your database really holds 10 million rows and the primary key is built on the ID column, as in the first and second cases above, the web page simply times out and cannot be displayed at all. This is the most important reason why I abandoned the ID column as the clustered index.

The timings above were obtained by adding the following before each select statement:

    declare @d datetime
    set @d=getdate()

and the following after it:

    select [statement execution time (milliseconds)]=datediff(ms,@d,getdate())

2. As long as an index is built, query speed improves significantly

In fact, statements 2 and 3 in the example above are exactly the same, and the indexed field is the same too. The only difference is that the former has a nonclustered index on the fariqi field while the latter has a clustered index on it, yet the query speeds are very different. So simply building an index on some field does not necessarily speed up the query.
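One way to picture why the same statement can cost so much more through a nonclustered index is to count how many data pages have to be visited. The sketch below is a rough model under stated assumptions, not a measurement of SQL Server: the page capacity, the table size, and above all the assumption that the matching rows are scattered across a table stored in some other order are all made up for illustration.

```python
import random

ROWS_PER_PAGE = 100   # assumed page capacity, chosen for illustration
N_ROWS = 10_000       # far smaller than the article's 10 million rows
N_MATCH = 250         # rows in the queried date range (2.5% of the table)


def pages_touched(positions):
    """Number of distinct data pages that hold the given row positions."""
    return len({pos // ROWS_PER_PAGE for pos in positions})


# Table stored in date order (clustered index on the date column):
# the most recent rows sit next to each other at the end of the table.
clustered = pages_touched(range(N_ROWS - N_MATCH, N_ROWS))

# Table stored in another order, with a separate index on the date:
# ASSUMPTION: the matching rows are scattered, so each index entry may
# point at a different page. A fixed seed keeps the run repeatable.
rng = random.Random(0)
scattered = pages_touched(rng.sample(range(N_ROWS), N_MATCH))

print("pages read, rows stored in date order:", clustered)
print("pages read, rows scattered:", scattered)
```

In this model the rows stored in date order fit in three pages, while the scattered rows touch far more pages for the same number of results.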
From the table-creation statements, we can see that the fariqi field has 5003 distinct values among the table's 10 million rows. Building a clustered index on this field is entirely appropriate. In reality we issue several documents every day, and documents issued on the same day share the same date, which fully matches the requirement for a clustered index: the values are "neither mostly the same nor only rarely the same". Seen this way, building an "appropriate" clustered index is very important for improving query speed.

3. Adding every field whose queries need speeding up to the clustered index improves query speed

As mentioned above, the fields that data queries cannot do without are the date and the user name. Since both fields are so important, we can combine them in a composite index (compound index). Many people think that adding any field to the clustered index will speed up queries, while others wonder: will queries slow down if the fields of a composite clustered index are queried separately?
With this question in mind, let's look at the following query speeds (each result set is 250,000 rows; the date column fariqi is the leading column of the composite clustered index and the user name neibuyonghu comes second):

(1)

    select gid,fariqi,neibuyonghu,title from Tgongwen where fariqi>'2004-5-5'

Query time: 2513 milliseconds

(2)

    select gid,fariqi,neibuyonghu,title from Tgongwen where fariqi>'2004-5-5' and neibuyonghu='office'

Query time: 2516 milliseconds

(3)

    select gid,fariqi,neibuyonghu,title from Tgongwen where neibuyonghu='office'

Query time: 60280 milliseconds

From this experiment we can see that using only the leading column of the clustered index as the query condition is almost exactly as fast as using all the columns of the composite clustered index, and even slightly faster (when the result sets are the same size). If only a non-leading column of the composite clustered index is used as the query condition, however, the index has no effect at all. Of course, statements 1 and 2 run at the same speed because they return the same number of rows. If all the columns of the composite index are used and the result set is small, "index coverage" is formed and performance is optimal. Also keep in mind that, whether or not you often use the other columns of the clustered index, its leading column must be the most frequently used column.

(4) Lessons about using indexes that are not found in other books

1. Using the clustered index is faster than using a primary key that is not the clustered index
Here are example statements (each extracts 250,000 rows):

    select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi='2004-9-16'

Time: 3326 milliseconds

    select gid,fariqi,neibuyonghu,reader,title from Tgongwen where gid'2004-1-1'

Time: 6343 milliseconds (extracting 1 million rows)

    select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi>'2004-6-6'

Time: 3170 milliseconds (extracting 500,000 rows)

    select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi='2004-9-16'

Time: 3326 milliseconds (exactly the same as the earlier result: when the number of rows retrieved is the same, the greater-than sign and the equals sign perform the same)

    select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi>'2004-1-1' and fariqi'2004-1-1' order by fariqi

Time: 6390 milliseconds

    select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi
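The leading-column behaviour measured in the composite clustered index experiment earlier in the article can be modelled with a list sorted by a two-part key. This is a minimal sketch under assumptions, not the engine's actual algorithm: the sample rows, the zero-padded date strings, and the helper names by_leading_column and by_second_column are hypothetical, and "rows examined" is only a stand-in for query cost.

```python
from bisect import bisect_right

# Rows kept sorted by the composite key (fariqi, neibuyonghu).
# Each row is (fariqi, neibuyonghu, gid); all values are made up.
rows = sorted([
    ("2004-05-04", "office", 1),
    ("2004-05-06", "office", 2),
    ("2004-05-06", "personnel", 3),
    ("2004-05-07", "archive", 4),
    ("2004-05-08", "office", 5),
])
leading_keys = [r[0] for r in rows]  # leading column, in storage order


def by_leading_column(after_date):
    """fariqi > after_date: one binary search, then read to the end."""
    start = bisect_right(leading_keys, after_date)
    matches = rows[start:]
    return matches, len(matches)      # only the matching rows are examined


def by_second_column(user):
    """neibuyonghu = user: the sort order does not help, scan every row."""
    matches = [r for r in rows if r[1] == user]
    return matches, len(rows)         # every row is examined


print(by_leading_column("2004-05-05"))
print(by_second_column("office"))
```

Filtering on the leading column can jump straight to the first match, while filtering on the second column alone has to examine every row, which mirrors the gap between statements (1) and (3) in that experiment.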