In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
Xiaobian to share with you why MySQL to use the index, I believe most people do not know how, so share this article for everyone's reference, I hope you have a lot of harvest after reading this article, let's go to understand it together!
What's an index?
MySQL's official definition of an index is that an index is a data structure that helps MySQL obtain data efficiently.
In addition to data, database systems maintain data structures that satisfy specific lookup algorithms that reference (point to) data in a way that allows advanced lookup algorithms to be implemented on top of those data structures. This data structure is called index.
Indexing was created to improve query efficiency, just like book catalogues. In fact, to put it bluntly, the index to solve the query problem.
Query is an important function provided by the database. We all want to get the target data as quickly as possible, so we need to optimize the query algorithm of the database and choose the appropriate query model to realize the index.
In addition, indexing tables comes at a cost: it increases the storage space of the database and takes more time to insert and modify data because the index changes accordingly.
Common Query Models
There are many implementation models of index, here we first understand the commonly used query model.
order array
A sequential array is a special array in which the elements are arranged in a certain order.
Sequential arrays have certain advantages in queries, because they are ordered data, and the time complexity of binary search is O(log(N)).
The advantage of sequential arrays is that they are very efficient, but it is troublesome to update the data. Deleting and inserting elements involves a lot of movement of element positions, which is costly.
Therefore, for domains where sequential arrays are better suited for queries, it is appropriate to store some static storage engine with minor changes.
hash index
A hash table is a structure that stores data in key-value terms. We simply enter the value we want to look up (key) and find its corresponding value (value).
Hash index uses a certain hash algorithm. For each row, the storage engine calculates the hash code of the indexed field, stores the hash code in the index, and stores a pointer to each row in the hash table.
In this way, only one hash algorithm is needed to locate the corresponding position immediately during retrieval, and the speed is very fast.
Hash index structure particularity, its retrieval efficiency is very high, should be O(1) time complexity.
Although Hash index is efficient, Hash index itself has many limitations and disadvantages due to its particularity, mainly the following:
Hash index can only satisfy =, IN and query, if it is range query retrieval, then hash index is useless.
Because the original ordered key value, after the hash algorithm, may become discontinuous, there is no way to use the index to complete the range query retrieval;
2, Hash index can not use the index to complete the sorting, because the storage time is after Hash calculation, the calculated Hash value and the original data are not necessarily equal, so it cannot be sorted;
In union index, Hash index cannot be queried by partial index key.
Hash index calculates hash value by combining index keys and then calculating Hash value together, instead of calculating Hash value separately.
So for multiple columns in a union index, Hash is either all used or none. Hash indexes cannot be used when querying through one or more of the previous index keys.
Hash indexing cannot avoid table scanning at any time.
As already known above, Hash index is to store the Hash value of the Hash operation result and the corresponding row pointer information in a Hash table after the index key is subjected to Hash operation. Since different index keys may have the same Hash value, even if the number of records satisfying a certain Hash key value is taken, it is impossible to directly complete the query from the Hash index, and it is still necessary to perform corresponding comparison by accessing the actual data in the table and obtain the corresponding result.
In the case of a large number of duplicate key values, the efficiency of hash index is also extremely low, because there is a so-called hash collision problem.
In summary, hash table structures are useful for equal-value queries only, such as Memcached, redis, and other NoSQL engines.
binary search tree index
Each node of the binary search tree stores only one key, and all nodes in the left subtree (if any) have values less than the value of the root node, and all nodes in the right subtree (if any) have values greater than the value of the root node.
When the number of nodes in the left and right subtrees of all non-leaf nodes of a binary search tree remains the same (equilibrium), then the search performance of the tree approaches binary search; and it has the advantage over binary search in continuous memory space that changing the tree structure (inserting and deleting nodes) does not require moving large blocks of memory data, and usually has a constant overhead.
In some cases, a binary tree is called a balanced binary tree when the height difference between the left and right subtrees of the root node is no more than 1; in contrast, a binary search tree may degenerate into a linear tree.
The following figure shows one possible indexing method. On the left is the data table, which has two columns and seven records in total. The leftmost one is the physical address of the data record (note that logically adjacent records are not necessarily physically adjacent on disk).
To speed up Col2 lookup, we can maintain a binary lookup tree shown on the right, each node containing an index key and a pointer to the physical address of the corresponding data record, so that we can use binary lookup to retrieve the corresponding data in O(log2n) complexity.
B-tree
It can be seen that binary tree achieves a balance between query and modification, and has good efficiency, but the reality is that few database engines use binary tree to implement index. Why?
Most of the database storage is not suitable for binary tree, when the amount of data is large, the tree height will be too high.
You can imagine a balanced binary tree with 1 million nodes. The tree is 20 high. Each leaf node is a block. Each block contains two pieces of data. The blocks are linked by chains.
With a tree height of 20, you have to traverse 20 blocks to get to the target data, which is very time-consuming when the index is stored on disk.
Therefore, in order to reduce disk reads, queries should traverse as few data blocks as possible, so N-ary trees are generally used.
There is a B-tree (Balanced Tree).
What exactly is a B-tree?
Let's start with an example:
As you can easily see from the diagram above, if an inner node x contains n[x] keywords, then x will contain n[x]+1 children. For example, an inner node with two keywords D H has three children, while an inner node with three keywords Q T X has four children.
Properties of B-trees
Popularize some concepts:
Degree of a node: The number of subtrees a node contains is called the degree of the node.
Degree of tree: the degree of the largest node in a tree is called the degree of tree;
leaf node or terminal node: node with zero degree;
Non-terminal node or branch node: a node with a degree other than zero;
First, two variables are defined: d is a positive integer greater than 1, called the degree of the B-tree. h is a positive integer called the height of the B-tree.
A B-tree is a data structure that satisfies the following conditions:
Each non-leaf node consists of n-1 keys and n pointers, where d
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.