In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article mainly introduces the principle analysis of B + Tree in MySQL, which is very detailed and has certain reference value. Friends who are interested must finish reading it!
Preface
Since the most important data structure in the MySQL index is the B + tree, let's talk about the principle of the B + tree first.
B + Tree principle
1. Data structure
B Tree refers to Balance Tree, that is, a balanced tree. The balanced tree is a search tree, and all leaf nodes are on the same layer.
B + Tree is implemented based on B Tree and leaf node sequential access pointers. It has the balance of B Tree and improves the performance of interval query through sequential access pointers.
In B + Tree, the key in a node is arranged non-decreasing from left to right. If the left and right adjacent key of a pointer are keyi and keyi+1, respectively, and not null, then all the key of the pointer to the node is greater than or equal to keyi and less than or equal to keyi+1.
two。 Operation
When carrying out the search operation, we first do a binary search in the root node to find a pointer where the key is located, and then recursively search in the node the pointer points to. Until you find the leaf node, then do a binary search on the leaf node to find the data corresponding to key.
The insert and delete operation will destroy the balance of the balanced tree, so after the insert and delete operation, the tree needs to be split, merged, rotated and other operations to maintain the balance.
3. Comparison with red and black trees
Balanced trees such as red and black trees can also be used to implement indexes, but B + Tree is commonly used as the index structure in file systems and database systems for the following two reasons:
(1) fewer searches
The time complexity of the balanced tree search operation is equal to the tree height h, while the tree height is approximately O (h) = O (logdN), where d is the output of each node.
The output degree of the red-black tree is 2, and the output degree of the B + Tree is generally very large, so the height h of the red-black tree is obviously much larger than that of the B + Tree, and the search times are more.
(2) make use of the pre-reading characteristics of the disk
In order to reduce the disk Imax O, the disk is often not strictly read on demand, but pre-read every time. In the process of pre-reading, the disk is read sequentially, the sequential reading does not need to seek the disk, and only needs a very short rotation time, the speed will be very fast.
The operating system generally divides memory and disk into solid-sized blocks, each of which is called a page, and the memory and disk exchange data on a page-by-page basis. The database system sets the size of one node of the index to the size of the page, so that one node can be fully loaded at a time. And the pre-reading feature can be used, and the adjacent nodes can also be preloaded.
MySQL index
Indexing is implemented at the storage engine layer, not at the server layer, so different storage engines have different index types and implementations.
1. B+Tree index
Is the default index type for most MySQL storage engines.
Because you no longer need to do a full table scan, you only need to search the tree, so the search speed is much faster.
In addition to being used for finding, it can also be used for sorting and grouping.
You can specify multiple columns as index columns, and multiple index columns make up the key.
It is suitable for full key value, key value range and key prefix lookup, where key prefix lookup is only applicable to leftmost prefix lookup. The index cannot be used if the lookup is not in the order of the index columns.
The B+Tree index of InnoDB is divided into primary index and secondary index. The leaf node data domain of the main index records the complete data record, which is called clustered index. Because data rows cannot be stored in two different places, a table can have only one clustered index.
The data domain of the leaf node of the secondary index records the value of the primary key, so when using the secondary index for lookup, you need to find the primary key value first, and then look in the primary index.
two。 Hash indexing
The hash index can be searched in O (1) time, but it loses its order: it can not be used for sorting and grouping; it only supports precise search, but can not be used for partial search and range search. The InnoDB storage engine has a special feature called "adaptive hash index". When an index value is used very frequently, another hash index is created on top of the B+Tree index, so that the B+Tree index has some of the advantages of a hash index, such as fast hash lookup.
3. Full-text index
The MyISAM storage engine supports full-text indexing, which is used to find keywords in text, rather than directly comparing equality.
The lookup criteria use MATCH AGAINST instead of normal WHERE.
Full-text indexing is implemented using an inverted index, which records the mapping of keywords to the document in which they are located.
The InnoDB storage engine also began to support full-text indexing in MySQL version 5.6.4.
4. Spatial data index
The MyISAM storage engine supports spatial data indexing (R-Tree), which can be used for geographic data storage. The spatial data index indexes the data from all dimensions and can effectively use any dimension for combined queries. You must use GIS-related functions to maintain the data.
Index optimization
1. Independent column
When making a query, the index column cannot be part of an expression or an argument to a function, otherwise the index cannot be used. For example, the following query cannot use the index of the actor_id column:
SELECT actor_id FROM sakila.actor WHERE actor_id + 1 = 5
two。 Multi-column index
When you need to use multiple columns as criteria for a query, using multiple column indexes performs better than using multiple single column indexes. For example, in the following statement, it is best to set actor_id and film_id to multi-column indexes.
SELECT film_id, actor_id FROM sakila.film_actor WHERE actor_id = 1 AND film_id = 1
3. Order of index columns
Put the most selective index column first.
The selectivity of the index refers to the ratio of the index value that is not repeated to the total number of records. The maximum value is 1, where each record has a unique index corresponding to it. The higher the selectivity, the higher the query efficiency.
For example, in the results shown below, customer_id is more selective than staff_id, so it is best to put the customer_id column in front of the multi-column index.
SELECT COUNT (DISTINCT staff_id) / COUNT (*) AS staff_id_selectivity,COUNT (DISTINCT customer_id) / COUNT (*) AS customer_id_selectivity,COUNT (*) FROM payment;staff_id_selectivity 0.0001customer_id_selectivity: 0.0373 COUNT (*): 16049
4. Prefix index
For columns of type BLOB, TEXT, and VARCHAR, you must use a prefix index, indexing only the first portion of the characters.
The selection of prefix length needs to be determined according to index selectivity.
5. Overlay index
The index contains the values of all fields that need to be queried.
It has the following advantages:
Indexes are usually much smaller than the size of data rows, and read-only indexes can greatly reduce the amount of data access.
Some storage engines, such as MyISAM, only cache indexes in memory, while data is cached by the operating system. Therefore, accessing only the index can be done without using system calls (which are usually time-consuming).
For the InnoDB engine, if the secondary index can override the query, there is no need to access the primary index.
6. Leftmost prefix principle
As the name implies, it is the leftmost priority, with the leftmost as the starting point, any consecutive index can match.
The nature of the federated index:
When creating a joint index, it is equivalent to creating (a) a single-column index, (a) a single-column index, (a) a joint index, and (a) the joint index. If you want the index to take effect, you can only use the combination of an and ameme b and ameme bmemc.
Advantages of indexing
Greatly reduces the number of rows of data that the server needs to scan.
Help the server avoid sorting and grouping, and avoid creating temporary tables (B+Tree indexes are ordered and can be used for ORDER BY and GROUP BY operations. Temporary tables are mainly created during sorting and grouping, because there is no need for sorting and grouping, so there is no need to create temporary tables).
Change the random B+Tree O to the sequential I hand O (the index is ordered and the adjacent data is stored together).
Conditions for the use of indexes
For very small tables, simple full table scans are more efficient than indexing in most cases
For medium to large tables, the index is very efficient
But for very large tables, the cost of establishing and maintaining indexes will increase. In this case, you need to use a technique that can directly distinguish the set of data that needs to be queried, rather than matching record by record, for example, you can use partitioning techniques.
The above is all the content of this article "Analysis of the principle of B + Tree in MySQL". Thank you for reading! Hope to share the content to help you, more related knowledge, welcome to follow the industry information channel!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.