Source: SLTechnology News&Howtos (Shulou.com), Database section. Updated 2025-01-19; report dated 05/31.
This article explains how to achieve high-performance indexing in MySQL: what an index is, how the underlying data structures work, and which indexing strategies pay off in practice.
What is an index
An index (also called a key) is a data structure that the storage engine uses to find records quickly.
Indexes are the most effective way to improve MySQL query performance. It is often said that MySQL performance tuning is mostly index optimization, so this is a topic every developer needs to master and apply.
An index is a data structure that is also stored as a file on disk. In the previous article, on the logical architecture of MySQL, we met the InnoDB and MyISAM storage engines. InnoDB stores indexes and data in the same file, while MyISAM stores them in two separate files.
In MySQL, indexes are implemented at the storage engine layer rather than the Server layer, so they work differently in different storage engines. Any analysis of indexes should therefore start from the storage engine; InnoDB is MySQL's default storage engine.
Advantages of the index:
Indexes greatly reduce the amount of data the server has to scan.
Indexes help the server avoid sorting and temporary tables.
Indexes turn random I/O into sequential I/O.
Disadvantages of the index:
An index is a data structure that takes up extra disk space.
When the amount of data in the table is large, the cost of maintaining the index is high.
Index data model
The data structures and algorithms of each storage engine are different, so let's take a look at the index types supported by MySQL itself.
B-Tree index
When an index is mentioned without further qualification, it usually means a B-Tree index. Most MySQL storage engines support it, but they use it in different ways and with different performance characteristics. InnoDB uses a B+Tree, stores values in their original data format, and refers to indexed rows by primary key.
All values in a B-Tree are stored in order, and every leaf is the same distance from the root.
[Figure: abstract diagram of a B-Tree]
A B-Tree speeds up data access because the storage engine does not need a full table scan to find the required data; it starts the search at the root node of the index. The slots of the root node hold pointers to child nodes, and the storage engine follows these pointers down the tree. At each level it picks the right pointer by comparing the values in the node page with the value being searched for. In the end, the engine either finds the value or establishes that the record does not exist.
If a B-Tree index covers multiple columns, the index entries are sorted by those columns in the order given when the index was defined, so the column order matters.
A B-Tree is an N-ary tree, and N depends on the size of the data block.
Take an InnoDB index on an integer field as an example: N is about 1200. With a tree height of 4, the index can hold 1200 to the third power values, which is about 1.7 billion. Because the root block is normally kept in memory, looking up a value in an integer-field index on a table with 1 billion rows takes at most 3 disk accesses. In practice, the second level is also likely to be in memory, so the number of disk accesses is even lower.
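The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the fan-out of 1200 is the figure quoted in the text, and the function name `capacity` is made up for the illustration.

```python
# Rough capacity estimate for an N-ary index tree.
# FANOUT = 1200 is the approximate figure quoted in the text for an
# integer key in InnoDB; real values depend on page and key size.
FANOUT = 1200


def capacity(height: int, fanout: int = FANOUT) -> int:
    """Entries reachable through (height - 1) levels of branching."""
    return fanout ** (height - 1)


for h in range(2, 5):
    print(f"height {h}: about {capacity(h):,} entries")
```

With a height of 4 this gives 1200 to the third power, the "about 1.7 billion" mentioned above.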
Hash indexing
A hash index is built on a hash table, and it is only useful for queries that match every indexed column exactly.
For each row, the storage engine computes a hash code over all the indexed columns. The hash code is a relatively small value, and rows with different key values usually produce different hash codes. The hash index stores the hash codes, and the hash table keeps a pointer to each data row.
Create a table test_hash that uses the MEMORY storage engine and has a HASH index on full_name.
CREATE TABLE `test_hash` (
  `full_name` varchar(255) DEFAULT NULL,
  `short_name` varchar(32) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  KEY `idx` (`full_name`) USING HASH
) ENGINE=MEMORY DEFAULT CHARSET=utf8;
The data in the table is as follows:
mysql> select * from test_hash;
+-------------------+------------+------+
| full_name         | short_name | age  |
+-------------------+------------+------+
| Dwayne Johnson    | Johnson    | NULL |
| Taylor Swift      | Taylor     | NULL |
| Leonardo DiCaprio | Leonardo   | NULL |
| Vin Diesel        | Diesel     | NULL |
| Kobe Bryant       | Kobe       | NULL |
+-------------------+------------+------+
5 rows in set (0.00 sec)
The data structure of the hash index might then look like this:
[Figure: data structure of the hash index]
When we execute the query statement:
mysql> select short_name from test_hash where full_name = 'Dwayne Johnson';
The statement is executed as follows:
1) Compute the hash code of the where condition 'Dwayne Johnson'. Suppose the result is 1234.
2) MySQL looks up 1234 in the index and finds the corresponding row pointer.
3) MySQL follows the pointer to the row and finally checks whether the full_name column of that row is 'Dwayne Johnson'.
What happens when hash codes collide? Anyone who has studied HashMap will already have the answer: chain the colliding entries in a linked list. MySQL does the same when the hash codes of keys collide. With a linked list, a lookup has to follow every pointer in the chain and compare the row it points to, so lookups become slow when there are many collisions.
The example shows that the hash index stores only hash values, so it is compact and very fast for exact-match lookups.
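The lookup steps and the collision handling can be sketched in Python. This is a toy model, not how the MEMORY engine is implemented: the class `HashIndex`, the list-based "row pointers", and the deliberately weak hash function (name length modulo 3, chosen only to force collisions) are all assumptions made for the illustration.

```python
from collections import defaultdict


class HashIndex:
    """Toy hash index: hash code -> chain of row pointers."""

    def __init__(self, rows, column, hash_fn=hash):
        self.rows = rows            # the "table": a list of dict rows
        self.column = column
        self.hash_fn = hash_fn
        self.buckets = defaultdict(list)
        for pointer, row in enumerate(rows):
            # Rows whose keys hash to the same code share one chain.
            self.buckets[hash_fn(row[column])].append(pointer)

    def lookup(self, value):
        """Equality lookup: hash the value, walk the chain, re-check the column."""
        matches = []
        for pointer in self.buckets.get(self.hash_fn(value), []):
            row = self.rows[pointer]
            if row[self.column] == value:   # needed because of collisions
                matches.append(row)
        return matches


rows = [
    {"full_name": "Dwayne Johnson", "short_name": "Johnson"},
    {"full_name": "Taylor Swift", "short_name": "Taylor"},
    {"full_name": "Leonardo DiCaprio", "short_name": "Leonardo"},
    {"full_name": "Vin Diesel", "short_name": "Diesel"},
    {"full_name": "Kobe Bryant", "short_name": "Kobe"},
]

# Weak hash on purpose: three of the five names land in the same bucket.
index = HashIndex(rows, "full_name", hash_fn=lambda name: len(name) % 3)
print(index.lookup("Dwayne Johnson"))
```

Because the buckets are keyed by hash code rather than by name, nothing in this structure can answer a range query or return rows in name order, which is where the restrictions on hash indexes come from.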
But there are some restrictions on hash indexing:
A hash index stores the hash values of the keys, which are not in the order of the indexed column, so it cannot be used for sorting.
A hash index does not support lookups on part of the key, because the hash is always computed over the entire content of the indexed columns. If an index covers two columns (A, B) and a query filters only on A, the index cannot be used.
A hash index supports only equality comparisons such as = and IN; it does not support range queries.
When hash codes collide, the storage engine must follow every row pointer in the linked list and compare row by row until it has found all matching rows. The more collisions there are, the more expensive index maintenance becomes.
In MySQL, currently only the MEMORY engine supports explicit hash indexes.
InnoDB index model
As mentioned earlier, InnoDB's index structure is a B+Tree, and it refers to indexed rows by primary key. In InnoDB, a table is therefore stored as an index ordered by primary key, and every index corresponds to one B+Tree.
B+Tree index
A B+Tree is an extension of the B-Tree described earlier. In a B-Tree, every node contains data items, so each disk block holds fewer index values, the tree becomes taller, and a query needs more disk I/O.
What kind of data structure is a B+Tree?
[Figure: abstract diagram of a B+Tree]
The difference between B+Tree and B-Tree:
Non-leaf nodes of a B+Tree hold no row data, only index values and pointers to nodes on the next level.
Leaf nodes of a B+Tree hold the data.
Leaf nodes of a B+Tree are stored in order, and adjacent leaves hold pointers to each other.
A B+Tree therefore matches the read and write characteristics of disks better and reduces the number of disk accesses per query.
The index type of InnoDB is divided into primary key index and non-primary key index.
Primary key index and non-primary key index
Create a table user that uses the InnoDB storage engine, with id as the primary key and a normal index on name.
CREATE TABLE `user` (
  `id` int(10) NOT NULL,
  `name` varchar(32) DEFAULT NULL,
  `age` int(3) DEFAULT NULL,
  `sex` varchar(1) DEFAULT NULL,
  `comment` varchar(255) DEFAULT NULL,
  `date` date DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx` (`name`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The data in the table is as follows:
mysql> select * from user;
+----+-------+------+------+---------+------------+
| id | name  | age  | sex  | comment | date       |
+----+-------+------+------+---------+------------+
|  1 | Alen  |   20 | 1    | NULL    | 2021-02-16 |
|  2 | Alex  |   21 |      | NULL    | 2021-02-16 |
|  3 | Saria |   16 | 0    | NULL    | 2021-02-16 |
|  4 | Semyt |   18 | 0    | NULL    | 2021-02-16 |
|  5 | Summy |   17 | 1    | NULL    | 2021-02-16 |
|  6 | Tom   |   19 | 0    | NULL    | 2021-02-16 |
+----+-------+------+------+---------+------------+
6 rows in set (0.00 sec)
The primary key index is also known as the clustered index. Its leaf nodes contain the primary key value, the transaction ID, the rollback pointer used for transactions and MVCC, and all remaining columns.
mysql> select * from user where id = 1;
A primary key lookup only has to search the B+Tree of the id index to get the matching row.
InnoDB clusters data by the primary key. If the table defines no primary key, InnoDB picks a unique index whose columns are all NOT NULL instead. If there is no such index either, InnoDB implicitly defines a hidden primary key to cluster on. This is why it is recommended to create a primary key for every table.
Advantages of clustered indexes:
Related data is stored together, which reduces disk I/O.
A clustered index keeps the index and the data in the same B+Tree, so data access is faster.
Disadvantages of clustered indexes:
If all the data fits in memory, the clustered index brings little benefit.
Insert speed depends heavily on insertion order. Try to insert rows in primary key order.
Updating clustered index columns is expensive.
Inserting a row, or updating a primary key so that a row has to move, can cause page splits. When a row must go into a page that is already full, the storage engine splits the page in two to make room, which takes up more disk space.
A non-primary key index is also called a non-clustered index, or a secondary index in InnoDB. The leaf nodes of a non-primary key index contain the primary key value.
mysql> select * from user where name = 'Alen';
When a query uses the non-primary key index, it first searches the name index tree for 'Alen' and finds id 1, then searches the id index tree once for id = 1 to get the matching row.
Searching a normal index tree for the primary key and then searching the primary key index tree is called going back to the table.
A query on a normal index therefore searches one more B+Tree than a query on the primary key index. In practice, use the primary key index whenever you can.
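The difference between the two lookups can be sketched as follows. A dict stands in for the clustered B+Tree and a sorted list for the secondary index; the names `clustered`, `name_index`, `find_by_id` and `find_by_name` are invented for this sketch and the row data is a subset of the example table.

```python
import bisect

# Clustered (primary key) index: id -> full row.
clustered = {
    1: {"id": 1, "name": "Alen", "age": 20},
    2: {"id": 2, "name": "Alex", "age": 21},
    3: {"id": 3, "name": "Saria", "age": 16},
}

# Secondary index on name: sorted (name, id) pairs.
# Its leaves hold only the primary key, not the row.
name_index = sorted((row["name"], pk) for pk, row in clustered.items())


def find_by_id(pk):
    """One tree search: the clustered index leaf already holds the row."""
    row = clustered.get(pk)
    return ([row] if row else []), 1


def find_by_name(name):
    """Search the name index, then go back to the table once per matching id."""
    searches = 1                      # the search of the name index tree
    result = []
    pos = bisect.bisect_left(name_index, (name,))
    while pos < len(name_index) and name_index[pos][0] == name:
        result.append(clustered[name_index[pos][1]])
        searches += 1                 # one more search in the id index tree
        pos += 1
    return result, searches


print(find_by_id(1))
print(find_by_name("Alen"))
```

The second call returns the same row as the first but reports two tree searches instead of one.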
There are other principles when creating indexes, so let's move on to high-performance indexing strategies.
Index strategy
While studying the indexing strategies, you can use the EXPLAIN keyword from the previous article to inspect the execution plan.
Selection of index
Indexes fall into several categories. By the number of indexed fields, they divide into single-column indexes and composite indexes (also called joint or multi-column indexes).
Single-column index: the index contains only one column. A table can have several single-column indexes.
Composite index: the index contains multiple columns.
Indexes can also be divided into normal indexes, unique indexes, and primary key indexes.
Normal index: the basic index type, used to speed up queries, with no restrictions on the data. Null and duplicate values are allowed in the indexed column.
Unique index: values in the indexed column must be unique, but null values are allowed.
Primary key index: a special unique index that does not allow null values.
There are so many categories of indexes, how do we choose when creating an index?
The three-star system for rating an index:
One star: the index places related records next to each other.
Two stars: the order of the data in the index matches the order the query needs.
Three stars: the index contains all the columns the query needs.
Creating and using indexes correctly is the foundation of high-performance queries. There are no absolute rules for choosing indexes; the choice depends mainly on your business needs. The following principles are a useful reference:
The higher the selectivity of the indexed column (the more distinct values it has), the more efficient the query.
Adding frequently searched columns to an index improves search efficiency.
Indexes not only speed up lookups but also take part in sorting and grouping, so fields that are often used for sorting and grouping should be indexed as well.
In a composite index, put the most selective fields first. In other words, pay attention to the order of the indexed fields.
Indexed columns should not take part in any computation in a query condition.
Avoid duplicate indexes, that is, indexes of the same type on the same columns in the same order.
Drop indexes that are never used.
For columns of type blob, text, or very long varchar, you must use a prefix index, with a prefix long enough to keep selectivity high.
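Selectivity, and the effect of prefix length on it, can be estimated with a short calculation. The sketch below uses the six names from the example table; the helper names `selectivity` and `prefix_selectivity` are made up here, and on a real table the same ratio would normally be computed with a count of distinct values in SQL.

```python
names = ["Alen", "Alex", "Saria", "Semyt", "Summy", "Tom"]


def selectivity(values):
    """Distinct values divided by total rows; 1.0 means every value is unique."""
    return len(set(values)) / len(values)


def prefix_selectivity(values, length):
    """Selectivity of an index built on only the first `length` characters."""
    return selectivity([v[:length] for v in values])


for n in range(1, 5):
    print(f"prefix length {n}: {prefix_selectivity(names, n):.2f}")
```

On this data a one-character prefix distinguishes only half of the rows, while four characters are enough to make every entry unique, so a longer prefix buys nothing further.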
There is little difference in query efficiency between a normal index and a unique index, because the engine reads data page by page. With a unique index, the search stops as soon as a match is found, because the index guarantees uniqueness. With a normal index, the search continues after the first match until it reaches the first record that does not satisfy the condition; but since the engine reads whole pages, one extra comparison has little effect on performance. When choosing between a normal index and a unique index, first make sure the business rules are enforced correctly, then consider the effect on performance when data is updated.
Independent column
"Independent column" means that the indexed column must not be part of an expression or an argument to a function.
For example, in the following statement the indexed field name is passed to a function, which makes the index unusable and forces a full table scan.
mysql> select * from user where CONCAT(name, 'n') = 'Alen';
Now add an index on the age field. Performing arithmetic on age in the query also makes the index unusable:
mysql> select * from user where age + 1 = 21;
Get into the habit of simplifying where conditions so that the indexed column always stands alone on one side of the comparison.
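Why an expression on the column defeats the index can be seen in a small model. Here a sorted list stands in for the index on age; `lookup_plain` and `lookup_expression` are hypothetical helpers written for this sketch, not MySQL internals.

```python
import bisect

# Index on age: the values are kept in sorted order.
ages_index = sorted([20, 21, 16, 18, 17, 19])


def lookup_plain(target):
    """age = target: binary search can jump straight to the matching entries."""
    lo = bisect.bisect_left(ages_index, target)
    hi = bisect.bisect_right(ages_index, target)
    return ages_index[lo:hi]


def lookup_expression(target):
    """age + 1 = target: the expression has to be evaluated for every entry."""
    examined = 0
    hits = []
    for age in ages_index:
        examined += 1
        if age + 1 == target:
            hits.append(age)
    return hits, examined


print(lookup_plain(21 - 1))       # rewritten condition: age = 21 - 1
print(lookup_expression(21))      # original condition: age + 1 = 21
```

Both calls find the same row, but the second one examines all six entries. Moving the arithmetic to the constant side (age = 20) keeps the column independent.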
Covering index
Suppose we change the statement that queries by the normal index as follows:
mysql> select name from user where name like 'Al%';
Now the query only has to search the normal index tree, because the column it asks for is already in that tree. There is no need to go back to the table.
When the indexed fields cover all the fields the query needs, the index is called a covering index.
A covering index avoids going back to the table and reduces the number of index tree searches, which improves query performance significantly, so it is a good optimization strategy.
In real development, commonly retrieved fields can be added to an index according to business needs so that queries are covered. Do not maintain too many indexes just to get covering indexes, though; index maintenance has a cost too.
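The saving can be made concrete by counting how often a query has to go back to the table. As before, this is a simplified model with a dict as the clustered index and a sorted list as the name index; the function names are invented for the illustration.

```python
import bisect

table = {
    1: {"id": 1, "name": "Alen", "age": 20},
    2: {"id": 2, "name": "Alex", "age": 21},
    3: {"id": 3, "name": "Saria", "age": 16},
}
name_index = sorted((row["name"], pk) for pk, row in table.items())


def select_name_like(prefix):
    """select name ... where name like 'prefix%': answered from the index alone."""
    table_lookups = 0
    names = []
    pos = bisect.bisect_left(name_index, (prefix,))
    while pos < len(name_index) and name_index[pos][0].startswith(prefix):
        names.append(name_index[pos][0])     # the value is already in the index
        pos += 1
    return names, table_lookups


def select_star_like(prefix):
    """select * ... where name like 'prefix%': every match goes back to the table."""
    table_lookups = 0
    rows = []
    pos = bisect.bisect_left(name_index, (prefix,))
    while pos < len(name_index) and name_index[pos][0].startswith(prefix):
        rows.append(table[name_index[pos][1]])
        table_lookups += 1
        pos += 1
    return rows, table_lookups


print(select_name_like("Al"))
print(select_star_like("Al"))
```

Selecting only name needs no table lookups, while selecting every column needs one per matching row.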
Leftmost prefix
Business scenarios involve many different queries, and we cannot design a separate index for each of them just so that every query uses an index.
This is where another property of the B+Tree index structure helps: the leftmost prefix.
The leftmost prefix can be the leftmost N fields of a composite index or the leftmost M characters of a string index.
Create a composite index on (name, age), in that order.
Then execute this statement:
mysql> select * from user where name = 'Alen';
Although the index is composite, name is its first field, so the query can use the index.
mysql> select * from user where name like 'Al%';
A query on the leftmost N characters of the indexed name field can also use the index. A pattern such as '%Al' cannot.
Now consider the following statement:
mysql> select * from user where age = 16;
Although age is also a field of the composite index, it comes after name, so a query on age alone cannot use the index. When creating a composite index, you must therefore think about the order of its fields.
One principle of index maintenance: if reordering the columns of an existing index saves you from maintaining another index, reorder first rather than adding an index.
MySQL can use the same index both to find rows and to sort them, but it can sort with the index only if the column order of the index is exactly the same as in the ORDER BY clause and all columns are sorted in the same direction (ascending or descending).
ORDER BY is subject to the same restriction as lookups: it must satisfy the leftmost prefix rule, otherwise MySQL cannot sort with the index.
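The leftmost prefix rule follows directly from how a composite index is ordered. In the sketch below a sorted list of (name, age, id) tuples plays the role of the (name, age) index; the helper names and the `examined` counter are assumptions made for the illustration.

```python
import bisect

rows = [
    (1, "Alen", 20), (2, "Alex", 21), (3, "Saria", 16),
    (4, "Semyt", 18), (5, "Summy", 17), (6, "Tom", 19),
]

# Composite index on (name, age): entries sorted by name first, then age.
index = sorted((name, age, pk) for pk, name, age in rows)


def by_name_prefix(prefix):
    """name like 'prefix%': binary search to the first match, then scan forward."""
    examined = 0
    ids = []
    pos = bisect.bisect_left(index, (prefix,))
    while pos < len(index):
        examined += 1
        if not index[pos][0].startswith(prefix):
            break                      # sorted order: no later entry can match
        ids.append(index[pos][2])
        pos += 1
    return ids, examined


def by_age(age):
    """age = N alone: ages are not in order across names, so scan everything."""
    examined = 0
    ids = []
    for _, entry_age, pk in index:
        examined += 1
        if entry_age == age:
            ids.append(pk)
    return ids, examined


print(by_name_prefix("Al"))
print(by_age(16))
```

The name prefix query stops after three entries, while the age query has to examine all six. The same sorted order is what lets an index satisfy ORDER BY name, age without a separate sort.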
Index condition pushdown
What happens to the conditions in a query that do not fit the leftmost prefix?
For example, to find people whose name starts with A, whose age is 20 and whose sex is 1 (male), the statement is:
mysql> select * from user where name like 'A%' and age = 20 and sex = 1;
By the leftmost prefix rule described above, the composite index (name, age) can only use 'A' to locate the first matching entry, whose primary key is 1, and then goes back to the table to check whether the other two conditions hold.
MySQL 5.6 introduced the index condition pushdown optimization: conditions on fields contained in the index are checked first, during the index scan, which reduces the number of times the query goes back to the table.
Before MySQL 5.6, the statement above goes back to the table twice, for primary keys 1 and 2, to check the remaining conditions. With index condition pushdown, entries that do not satisfy age = 20 are filtered out in the index, so the query goes back to the table only once, for primary key 1.
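The effect of pushdown on the number of table lookups can be reproduced in a small model. This is a conceptual sketch only, assuming the composite index (name, age); the function `query` and its `pushdown` flag are invented here, and the sex value for id 2 is left as None because the example table does not show it.

```python
import bisect

table = {
    1: {"name": "Alen", "age": 20, "sex": 1},
    2: {"name": "Alex", "age": 21, "sex": None},   # sex not given in the example table
    3: {"name": "Saria", "age": 16, "sex": 0},
}

# Composite index on (name, age); each entry also carries the primary key.
index = sorted((r["name"], r["age"], pk) for pk, r in table.items())


def query(prefix, age, sex, pushdown):
    """name like 'prefix%' and age = ... and sex = ..., counting table lookups."""
    lookups = 0
    result = []
    pos = bisect.bisect_left(index, (prefix,))
    while pos < len(index) and index[pos][0].startswith(prefix):
        _, entry_age, pk = index[pos]
        pos += 1
        if pushdown and entry_age != age:
            continue                   # filtered inside the index, no table access
        lookups += 1                   # go back to the table for the full row
        row = table[pk]
        if row["age"] == age and row["sex"] == sex:
            result.append(pk)
    return result, lookups


print(query("A", 20, 1, pushdown=False))
print(query("A", 20, 1, pushdown=True))
```

Both runs return primary key 1, with two table lookups without pushdown and one with it, matching the counts described above.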
That is how to achieve high-performance indexing in MySQL.
© 2024 shulou.com SLNews company. All rights reserved.