In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
This article mainly gives you a brief introduction to the types or types of indexes in MySQL. You can look up the relevant professional terms on the Internet or find some related books to supplement them. We will not dabble here, so let's go straight to the topic. I hope this article can bring you some practical help with the types or types of indexes in MySQL.
What is an index?
Index is a kind of data structure, which is used to improve the efficiency of data query. A more common analogy is to compare it to a catalogue of books. Through the catalogue, you can accurately find the page of the content of a chapter.
It doesn't make sense to use an index when the amount of data is small, even if there is no index, it doesn't take much time for the computer to traverse the data. Once the amount of data is large, the index is necessary to ensure that we can provide services normally and ensure the user experience.
Index type
An index is a data structure, and there are multiple implementations to deal with different scenarios. In MySQL, there are mainly Hash indexes and B+Tree.
Hash index
Hash should be familiar to all of you. Hash is a data structure in the form of key-value. The implementation is generally array + linked list structure, through the hash function to calculate the position of key in the array, and then if there is a hash conflict through the linked list to solve (zipper method). Of course, there are other ways to resolve hash conflicts. Hash this kind of data structure is very common, for example, our system uses HashMap to build hot data cache, access efficiency is very good.
The hash structure stores data first by calculating the hash value of key to determine its position in the array, and if there is a conflict, build a linked list at the location of the array. There are obviously several problems with this:
Even the locations calculated by key with the same characteristics may be far apart, and the continuous query is inefficient. That is, range queries are not supported.
The hash index stores calculated hash values and row pointers instead of storing specific row values, so querying the data through the hash index requires two queries (first query the location of the row, and then find the specific data)
The premise of hash index query data is to calculate the hash value, that is, it requires that key is a key that can accurately point to a piece of data, so matching queries such as like are not supported.
So what we can know is that the hash index is suitable for quickly selecting a row of data.
B+Tree structure
From the name point of view, this is obviously a kind of tree structure, which is necessary in the textbooks of data structure in college. Tree structure is a very important data structure, which is used in many places.
We mentioned above that the hash index cannot query the range, and there is also a kind of structure in the tree structure that is convenient for orderly query-binary search tree. The structure of the binary search tree requires that the value of the parent node is larger than that of the left child node and smaller than that of the right child node, as shown below:
Although the tree structure is also used in the MySQL index, it is not a binary tree. Because the data in the database is eventually stored on disk, and if there are too many nodes in the tree, it will take more time to transfer between nodes. In the implementation of MySQL, we choose to put more content on the same node, and the operation of the same node is completed in memory, reducing the number of transfers between nodes in external memory, in order to achieve the purpose of improving efficiency. This is B+Tree, and in the implementation of B+Tree, a three-tier tree structure can basically meet almost all of our needs.
Related recommendation: "mysql database knowledge learning"
B-Tree
First of all, to understand B+Tree, you have to understand that Bray Tree is a balanced tree, where B refers to Balance rather than Binary, or rather, B-Tree is a multi-path balanced search tree.
The multi-path balanced search tree is shown below:
This is a 2-3 tree, which means that each node has two values and the number of branches of each node is 3. From the figure above, we can see that the middle structure is very suitable for querying data. The value of the left subtree of each node is less than the smallest value in the current node, the value of the middle subtree is all in the middle of the two values of the current node, and the value of the right subtree is all greater than the maximum value of the current node.
For example, we need to find the value of 24:
(1) first of all, it is judged from the root node that 24 is between the root node (15.25), so the left and right subtrees are excluded and looked up from the middle.
(2) then find the root node of the intermediate subtree (18pj22). It is found that 24 is greater than the maximum value of the node, and the left subtree and intermediate subtree are excluded.
(3) find the right subtree and determine that the large value of the node is exactly equal to 24, and the query ends.
Based on the above process, you can summarize the B-tree search:
The main results are as follows: (1) starting from the root node, the binary search is carried out for the keyword (ordered) sequence in the node.
(2) it ends if it is hit, otherwise it enters the child node of the scope to which the query keyword belongs
(3) repeat the above process until the corresponding child node is empty or already a leaf node.
It can be seen that its search performance is equivalent to doing a binary search in the keyword set. From here it seems that there is nothing wrong with B-Tree, but it is important to note that each node in B-Tree stores index keywords and the specific rows of data they represent. In MySQL, database load data is loaded on a page-by-page basis, and the size of each page is fixed (the default is 16k). If each node stores all the values, there will be few nodes that can be saved on a page, and a query may load data from memory multiple times, resulting in poor performance.
B+Tree
B+Tree is a variant of B-Tree that makes it more suitable for indexing externally stored files.
The biggest difference between the two is that each node of the B-Tree stores all the data, while the data that B+Tree needs to store is on the leaf node, and a sequential access pointer is added, and each leaf node has an address pointing to the next adjacent leaf node. This structure ensures that more index nodes can be stored in a memory page and is more suitable for range query.
Indexes
Because the storage engine is responsible for implementing the index, the next discussion of indexes is the MySQL-based InnoDB engine.
Clustering index
Clustering means that data rows and adjacent key values are stored together. Some databases allow you to select a specific index as a clustered index, while the primary key index is directly specified as a clustered index in the implementation of InnoDB. If no primary key is defined, InnoDB selects a unique non-empty index instead of the primary key index. If such an index is also not defined, InnoDB implicitly defines a primary key as the clustered index (row_id).
An example of clustered index is shown in the figure:
Non-clustered index
In InnoDB, except for the primary key index, all other indexes are non-clustered indexes, so they are also called non-primary key indexes. The leaf node of a non-primary key index does not store the value of a row, but the primary key value of a specific row. Does not satisfy the definition of clustering.
An example of a non-clustered index is shown in the figure:
The difference between clustered index and non-clustered index in query
As can be seen from the above two index example diagrams, if the query is queried through the primary key index, the data row is directly queried and returned. However, if you are querying through a non-primary key index, you first need to determine the primary key through the index, and then find the data of a specific row from the primary key index through the resulting primary key. The subsequent process of getting data from the primary key index through the primary key is called back to the table.
The process of returning to the table makes the query through the ordinary index one step more than the primary key index query, and in many cases the efficiency is relatively low. So in our query process, if we can only use the primary key to determine the data, it is best to use the primary key to query directly.
Overlay index
It is described above that there is a process of returning a table through a non-primary key query, but it should be noted that not every query has the step of returning a table. For an ordinary index, its leaf node stores the value of the primary key. What if the data I need now is only the value of the primary key? After fetching the value of the primary key through the ordinary index, there is no need to look it up in the primary key index, so there is no process of returning to the table.
In the above example, the non-primary key index already has the value we need, so the index is also called an override index. The overlay index is not a fixed structure, it can make a single index (the index of a field) or a composite index. Anyone who can directly provide the query results without the need for the table return process can be called an overlay index.
Of course, it's not always good to overwrite index pages, for example, I now have an index index (aforme b). It has been said that the advantage of indexing is that the ab field will not return to the table when querying, but it is impossible to use the b field to query the index. The index entries of the established index are sorted according to the order of the fields that appear in the index definition.
Leftmost prefix principle
Assuming that there is an index index (aformab), if the index can be applied through an and b queries, the index can be applied using a query alone, but it cannot be applied if b is used alone. This is the leftmost prefix principle. When matching an index, it matches the leftmost n fields of the index, and the index can be applied if it can be matched.
Because of the existence of the leftmost prefix principle, we may need to consider more things when building the index.
The first thing to be clear is that the index is a data structure, which consumes storage space when building an index, so the index is not built as much as possible, but the number of indexes should be reduced as much as possible according to the requirements.
The existence of the leftmost prefix principle makes a federated index can be used as multiple indexes, of course, if the order of the fields in the index is designed (in fact, the leftmost prefix principle is not only applicable to the federated index, it is also used for the string index, the leftmost n characters in the string index are equivalent to the leftmost n fields in the federated index).
For example, index (aformab), with this index, we do not need to build an index for a separately, so we generally put the more frequently used fields first when designing a federated index.
Then, the fields with higher differentiation are put forward, and the distinction is the repetition rate of the values in the field, and the lower the repetition rate is, the higher the differentiation is. For example, gender is not suitable as an index, and the more differentiated fields can filter out more rows after one filter.
Then you also need to consider the size of the field, because the index also needs to take up space, so you generally choose smaller fields.
Here are the types or types of indexes in MySQL. If you want to know about other related issues, you can continue to pay attention to our industry information. Our section will capture some industry news and professional knowledge to share with you every day.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.