Source: Shulou (Shulou.com), updated 2025-01-15.
This article explains the principles for indexing tables in MySQL. The editor finds it very practical and shares it here as a reference.
Interview questions:
What is an index?
What are the advantages of indexes?
What are the disadvantages of indexes?
What factors should be considered when creating an index?
What are the principles for indexing data tables?
What is a covering index?
Does a query through a non-clustered index always require a lookup back to the table?
1. What is an index?
An index is a data structure that helps the database find records quickly.
A table often holds so many records that they span a large number of data pages. Finding a record among those pages takes two steps:
1. Locate the page that contains the record.
2. Find the record within that page.
Without an index, whether we search by the primary key or by another column, we cannot jump directly to the right page, so we have to follow the doubly linked list of pages starting from the first page. Traversing every data page this way is very expensive. An index is the data structure that makes this search efficient.
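The two steps can be made concrete with a toy Python model. This is a sketch under simplifying assumptions, not how InnoDB is implemented: each page is a plain sorted list, and the "index" is simply the smallest key on each page, searched with binary search. The names `pages`, `full_scan`, and `indexed_lookup` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical model: each inner list is one data page holding sorted
# primary-key values; the pages are chained in key order.
pages = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

def full_scan(pages, key):
    """No index: walk the page chain from the first page.
    Returns (found, pages_read)."""
    pages_read = 0
    for page in pages:
        pages_read += 1
        if key in page:
            return True, pages_read
    return False, pages_read

# A toy "index": the smallest key on each page, in the same sorted order.
page_min_keys = [page[0] for page in pages]

def indexed_lookup(pages, page_min_keys, key):
    """With an index: step 1 picks the page, step 2 searches inside it.
    Returns (found, pages_read)."""
    i = bisect_right(page_min_keys, key) - 1  # step 1: locate the page
    if i < 0:
        return False, 0                       # key is smaller than every page
    return key in pages[i], 1                 # step 2: search that one page
```

Looking up key 11 reads four pages with `full_scan` but only one with `indexed_lookup`. A real B+ tree applies the same idea recursively across several levels of pages.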
2. What are the advantages of indexing?
An index lets the server go directly to the relevant part of a table, which greatly reduces the amount of data it has to scan. The most common index type, the B-Tree, stores data in sorted order, so MySQL can use it to perform ORDER BY and GROUP BY without a separate sort or a temporary table. Because the data is ordered and related column values are stored next to each other, the index can turn random I/O into sequential I/O. And because the index stores the actual column values, some queries can be answered entirely from the index. In summary, the advantages of an index are:
1. Indexes greatly reduce the amount of data the server needs to scan.
2. Indexes can help the server avoid sorting and temporary tables.
3. Indexes can turn random I/O into sequential I/O.
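A small Python sketch shows why sorted index entries help on these points. It is a simplified model, not MySQL code: a sorted list of hypothetical `(key1, id)` pairs stands in for the leaf level of a B-Tree index on `key1`.

```python
from bisect import bisect_left

# Hypothetical secondary index on key1: (key1, id) entries kept sorted by
# key1, roughly like the leaf level of a B-Tree index.
index = sorted([("d", 4), ("a", 1), ("c", 3), ("b", 2), ("e", 5)])
keys = [k for k, _ in index]

def range_scan(lo, hi):
    """Entries with lo <= key1 < hi. Binary search finds the start,
    and the result is one contiguous, already-sorted slice, so an
    ORDER BY key1 needs no extra sort."""
    start = bisect_left(keys, lo)
    end = bisect_left(keys, hi)
    return index[start:end]
```

`range_scan("b", "e")` returns the entries for `b`, `c`, and `d` in order. Binary search skips most of the data, the result needs no sort, and it comes from one contiguous run of entries, which is the analogue of sequential I/O.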
3. What are the disadvantages of indexes?
1. Space cost:
Indexes occupy physical storage.
By default, the InnoDB storage engine uses B+ tree indexes. Every index is a separate B+ tree, every node of that tree is a data page, and a data page occupies 16KB by default. A large B+ tree is made up of many data pages, so it takes up a considerable amount of storage.
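To put a rough number on the space cost, here is a back-of-the-envelope estimate in Python. The row count and the 16-byte entry size are hypothetical. The helper ignores page headers, free space, and non-leaf levels, so it only gives a lower bound.

```python
import math

PAGE_SIZE = 16 * 1024  # InnoDB's default page size: 16KB

def estimate_leaf_pages(rows, entry_bytes):
    """Rough lower bound on leaf pages for one index. Ignores page
    headers, free space, and non-leaf levels (simplifying assumptions)."""
    per_page = PAGE_SIZE // entry_bytes
    return math.ceil(rows / per_page)

pages = estimate_leaf_pages(1_000_000, 16)   # hypothetical: 1M rows, 16-byte entries
size_mib = pages * PAGE_SIZE / (1024 * 1024)
```

For these numbers the result is 977 leaf pages, a little over 15 MiB, and that is for a single index. Each additional index is another tree of the same kind.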
2. Time cost:
Every time rows are inserted, deleted, or updated, the indexes must be maintained as well, so these write operations take longer.
Whenever data in the table is inserted, deleted, or updated, every affected B+ tree index has to be modified. The pages at each level of a B+ tree are linked into a doubly linked list in ascending order of the index column values, and the records inside each page, whether a leaf or a non-leaf page, form a singly linked list in the same order. Inserts, deletes, and updates can break this ordering, so the storage engine spends extra time on page splits, page reclamation, and similar operations to keep pages and records in order.
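A toy Python model of a leaf-page split shows where the extra write time goes. The page capacity of 4 and the helper `insert_key` are hypothetical, and real InnoDB pages hold far more records. The model keeps pages sorted and splits an overfull page in half.

```python
from bisect import insort

PAGE_CAPACITY = 4  # hypothetical: real InnoDB pages hold far more records

def insert_key(pages, key):
    """Insert key into a list of sorted, capacity-limited pages,
    splitting an overfull page in half (a toy model of a B+ tree
    leaf split). Returns True if a split happened."""
    # Choose the last page whose first key is <= key; default to page 0.
    target = 0
    for i, page in enumerate(pages):
        if page and page[0] <= key:
            target = i
    insort(pages[target], key)          # keep records within the page ordered
    if len(pages[target]) > PAGE_CAPACITY:
        page = pages[target]
        mid = len(page) // 2
        pages[target:target + 1] = [page[:mid], page[mid:]]  # the split
        return True
    return False
```

Inserting 25 into the full page `[10, 20, 30, 40]` produces `[10, 20]` and `[25, 30, 40]`. The split is the additional work that a plain append to an unindexed heap would not need.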
Before a query runs, MySQL first generates an execution plan. In general, a query uses at most one secondary index on a single table during execution (the index merge optimization is an exception). So while generating the plan, the optimizer estimates the cost of executing the query with each candidate index and picks the one with the lowest cost. If a table has too many indexes, this cost analysis itself can take a long time and hurt query performance.
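The selection rule itself is simple, as a minimal Python sketch shows. MySQL's real cost model is far more involved, and the cost figures and `choose_index` helper here are invented for illustration. Each candidate index adds one more estimate that must be computed before the minimum can be chosen.

```python
def choose_index(estimated_costs):
    """Pick the access path with the lowest estimated cost.
    estimated_costs maps a path name to a cost; every candidate index
    is one more entry the optimizer has to estimate before choosing."""
    if not estimated_costs:
        raise ValueError("no candidate access paths")
    return min(estimated_costs, key=estimated_costs.get)

# Hypothetical cost estimates for one query on single_table.
costs = {"full_table_scan": 1000.0, "idx_key1": 12.5, "idx_key3": 40.0}
```

With these made-up numbers, `choose_index(costs)` picks `idx_key1`.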
4. What are the factors to consider when building an index?
You should pay attention to the following when creating and using indexes:
Create indexes only for columns used for searching, sorting, and grouping
Index a column when its distinct values make up a large proportion of the total number of records.
Keep the data type of the indexed column as small as possible.
For long string columns, you can index only a prefix of the column to reduce the storage space the index occupies.
Use covering indexes where possible to avoid the performance cost of lookups back to the table.
Let an indexed column appear on its own, as a bare column name, in search conditions (not inside an expression or function call).
To minimize page splits in the clustered index, give the primary key the AUTO_INCREMENT attribute.
Locate and delete redundant and duplicate indexes in the table
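The "large proportion of distinct values" guideline can be checked with a simple ratio, often called selectivity. In this Python sketch the helper name and sample columns are hypothetical. A value close to 1.0 means an index on the column narrows a search down effectively.

```python
def selectivity(values):
    """Ratio of distinct values to total rows; closer to 1.0 means an
    index on this column narrows a search down more effectively."""
    if not values:
        return 0.0
    return len(set(values)) / len(values)

gender_column = ["M", "F"] * 50      # hypothetical low-selectivity column
order_ids = list(range(100))         # hypothetical unique column
```

Here `selectivity(gender_column)` is 0.02 and `selectivity(order_ids)` is 1.0. By this guideline, `order_ids` is the better candidate for an index.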
Create a database table:
# The VARCHAR(100) lengths are an assumption; MySQL requires a length for VARCHAR.
CREATE TABLE single_table (
    id INT NOT NULL AUTO_INCREMENT,
    key1 VARCHAR(100),
    key2 INT,
    key3 VARCHAR(100),
    key_part1 VARCHAR(100),
    key_part2 VARCHAR(100),
    key_part3 VARCHAR(100),
    common_field VARCHAR(100),
    PRIMARY KEY (id),                                   # clustered index
    KEY idx_key1 (key1),                                # secondary index
    UNIQUE KEY uk_key2 (key2),                          # secondary index; also a unique index
    KEY idx_key3 (key3),                                # secondary index
    KEY idx_key_part (key_part1, key_part2, key_part3)  # secondary index; also a composite index
) ENGINE=InnoDB CHARSET=utf8;
1. Create indexes only for columns used for searching, sorting, and grouping
Create indexes only on columns that appear in the WHERE clause, in ORDER BY or GROUP BY clauses, or as join columns in a join condition. A column that appears only in the select list does not need an index.
For example, the following query statement:
SELECT common_field, key_part3 FROM single_table WHERE key1 = 'a';
We only need to index key1, which appears in the WHERE clause. There is no need to index common_field or key_part3, which appear only in the select list.
2. Keep the data type of the indexed column as small as possible.
When defining a table, you specify the type of each column explicitly. Take integer types as an example: TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT occupy increasing amounts of storage and can represent increasingly large ranges of integers. If you index an integer column, use the smallest type whose range is sufficient; for example, do not use BIGINT when INT will do. The smaller the data type, the less space the index takes, the more records fit in one data page, the less disk I/O is needed, and the faster reads and writes become.
This advice matters even more for a table's primary key. The clustered index stores the primary key value, and every secondary index record also stores the primary key value of its row, so a smaller primary-key type saves space in every index.
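The effect of type size on page capacity is easy to compute. This Python sketch uses the documented storage sizes of MySQL's integer types. It treats a secondary-index entry as only the indexed value plus the primary-key value and ignores record headers and page overhead, which is a simplifying assumption, so the counts are upper bounds.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size

# Storage sizes in bytes of MySQL's integer types.
TYPE_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

def secondary_entries_per_page(key_type, pk_type):
    """Upper bound on secondary-index entries per page: each entry holds the
    indexed value plus the primary-key value. Record headers and page
    overhead are ignored, so real numbers are lower."""
    entry_bytes = TYPE_BYTES[key_type] + TYPE_BYTES[pk_type]
    return PAGE_SIZE // entry_bytes
```

Indexing an INT column with an INT primary key fits up to 2048 entries per page. Switching only the primary key to BIGINT drops that to 1365, which is why the primary key's type affects every secondary index.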
3. Index a column prefix
A string is made up of characters, and storing a very long string takes a lot of space. When a string column is indexed, the complete string of every row must be stored in the corresponding B+ tree, so the longer the strings, the more space the index takes.
Because strings are compared character by character, entries sorted by the full string are also sorted by their prefixes. MySQL therefore supports storing only the first few characters of the string in the index; that is, the secondary index records keep only a prefix of the column value.
For example, you can modify the idx_key1 index so that only the first 10 characters of the string are retained in the index:
ALTER TABLE single_table DROP INDEX idx_key1;
ALTER TABLE single_table ADD INDEX idx_key1 (key1(10));
Then execute the following query statement:
SELECT * FROM single_table WHERE key1 = 'abcdefghijklmn';
Since the secondary index records of idx_key1 keep only the first 10 characters of the string, the index can locate only the records with the prefix 'abcdefghij'. Each of those records must then be checked against the full condition key1 = 'abcdefghijklmn', which requires the complete column value from the clustered index record. When the column stores long strings, indexing only a prefix can significantly reduce the size of the index.
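The prefix lookup can be sketched in Python. The `table` contents are hypothetical, a dict stands in for the clustered index, and a sorted list of `(prefix, id)` pairs stands in for the prefix index. The index narrows the search to candidates, and the full value is then checked from the table.

```python
from bisect import bisect_left, bisect_right

PREFIX_LEN = 10  # same prefix length as idx_key1 (key1(10))

# Hypothetical clustered index: id -> full key1 value.
table = {1: "abcdefghijklmn", 2: "abcdefghijXYZ", 3: "abcdefghijklmn", 4: "zzz"}

# Prefix index: sorted (prefix, id) entries; the full key1 is NOT stored here.
prefix_index = sorted((key1[:PREFIX_LEN], row_id) for row_id, key1 in table.items())
prefixes = [p for p, _ in prefix_index]

def lookup(value):
    """Find ids with key1 == value via the prefix index.
    Returns (matching_ids, number_of_candidates_checked)."""
    prefix = value[:PREFIX_LEN]
    lo, hi = bisect_left(prefixes, prefix), bisect_right(prefixes, prefix)
    candidates = [row_id for _, row_id in prefix_index[lo:hi]]
    # The index only narrows the search; each candidate's full value
    # must be read from the table to confirm the match.
    return [row_id for row_id in candidates if table[row_id] == value], len(candidates)
```

`lookup("abcdefghijklmn")` checks three candidates that share the prefix `abcdefghij` and keeps rows 1 and 3.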
However, when only a prefix is indexed, the following query cannot use the index to satisfy the sort:
SELECT * FROM single_table ORDER BY key1 LIMIT 10;
Because idx_key1 does not contain the complete key1 values, records whose first 10 characters are identical but whose remaining characters differ cannot be put in order using idx_key1 alone. In other words, an index on a column prefix cannot be used for sorting.
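A short Python sketch shows why. It orders values using only the 10-character prefix that the index stores. The sample values and the `order_by_prefix` helper are hypothetical. Values sharing a prefix tie, so their relative order is not the true order of the full strings.

```python
PREFIX_LEN = 10

def order_by_prefix(values):
    """Order values using only what a key1(10) prefix index knows.
    Values sharing the same 10-character prefix tie, so they keep the
    order they arrived in (Python's sorted() is stable)."""
    return sorted(values, key=lambda v: v[:PREFIX_LEN])

values = ["abcdefghijz", "abcdefghija", "aaa"]
```

`order_by_prefix(values)` puts `"abcdefghijz"` before `"abcdefghija"`, while `sorted(values)` does the opposite. A database therefore still has to sort the full values itself.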
4. Covering indexes
To avoid the performance cost of lookups back to the table entirely, it is best to include only indexed columns in the select list, as in this query statement:
SELECT key1, id FROM single_table WHERE key1 > 'a' AND key1
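The covering-index effect can be observed with `EXPLAIN QUERY PLAN` in SQLite, which ships with Python's standard library. SQLite is a swapped-in engine, not MySQL or InnoDB. It is used here because its `INTEGER PRIMARY KEY` (the rowid) is stored in every secondary index entry, much as InnoDB stores the primary key. The two-column table and the upper bound `'c'` are hypothetical choices for this example.

```python
import sqlite3

# SQLite stand-in for single_table (SQLite is NOT InnoDB; this only
# illustrates the covering-index idea). An INTEGER PRIMARY KEY is the
# rowid, and every entry of idx_key1 carries it alongside key1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE single_table (id INTEGER PRIMARY KEY, key1 TEXT, common_field TEXT)")
conn.execute("CREATE INDEX idx_key1 ON single_table (key1)")

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

# key1 and id are both in idx_key1, so no lookup back to the table is needed.
covering = plan("SELECT key1, id FROM single_table WHERE key1 > 'a' AND key1 < 'c'")
# common_field is not in idx_key1, so each match needs the full row.
non_covering = plan("SELECT * FROM single_table WHERE key1 > 'a' AND key1 < 'c'")
print(covering)
print(non_covering)
```

The first plan mentions a `COVERING INDEX` on `idx_key1`, and the second does not. In MySQL, the corresponding signal is `Using index` in the `Extra` column of `EXPLAIN`.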
© 2024 shulou.com SLNews company. All rights reserved.