A detailed introduction to MySQL indexes

2025-03-29 Update From: SLTechnology News&Howtos shulou

Shulou (Shulou.com) 06/01

This article introduces MySQL indexes in detail and should be useful as a reference. If you are interested, read it through to the end!

What is an index?

In a relational database, an index is a separate physical storage structure that sorts the values of one or more columns of a table. It is a list of those column values together with logical pointers to the data pages that physically hold them. An index works like the table of contents of a book: you find the content you need quickly by following the page number listed there.

When a table holds a large number of records, there are two ways to query it. The first is a full table scan: take out every record one by one, compare it with the query conditions, and return the records that match. This consumes a lot of database time and causes a large number of disk I/O operations. The second is to build an index on the table, find the index values that satisfy the query conditions, and then use the ROWID stored in the index (the equivalent of the page number) to locate the corresponding records in the table quickly.
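
The difference between the two approaches can be sketched in a few lines of Python. This is a toy model and not how MySQL is implemented: the table is a plain list, the "index" is a sorted list of (value, position) pairs searched with binary search, and the names `table`, `score_index`, `full_scan` and `index_lookup` are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical table of 1000 rows; each score appears exactly once
# because 37 and 1000 share no common factor.
table = [{"id": i, "score": (i * 37) % 1000} for i in range(1000)]

def full_scan(rows, score):
    """Compare every row with the condition and count the rows examined."""
    examined, hits = 0, []
    for row in rows:
        examined += 1
        if row["score"] == score:
            hits.append(row)
    return hits, examined

# The "index": sorted (value, position) pairs, playing the role of
# index value plus row locator.
score_index = sorted((row["score"], pos) for pos, row in enumerate(table))

def index_lookup(rows, index, score):
    """Binary-search the index, then follow the stored position to the row."""
    i = bisect_left(index, (score,))
    hits = []
    while i < len(index) and index[i][0] == score:
        hits.append(rows[index[i][1]])
        i += 1
    return hits

scan_hits, examined = full_scan(table, 999)
index_hits = index_lookup(table, score_index, 999)
print(examined, scan_hits == index_hits)
```

Both functions return the same row, but the full scan touches all 1000 rows while the binary search needs only about ten comparisons.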

Since MySQL 5.5, the index data structure used by the InnoDB storage engine is mainly the B+Tree. This article follows the evolution of the B+Tree as its main thread.

**Note:** A B+Tree index can be used for =, BETWEEN, IN, and LIKE patterns that do not start with a wildcard (after MySQL 5.5).

These facts may contradict what you have read in other articles or books, which say that all of these are "range queries" and do not use the index!

That used to be right: according to this account, before 5.5 the optimizer would not choose to search through the index. It estimated that fetching rows this way cost more than scanning the whole table, because every match has to go back to the table again, which may involve more I/O, so the index was abandoned.

After the algorithm was optimized, scanning of some range types is supported, thanks to the ordered nature of the B+Tree data structure. A range condition also interrupts the leftmost prefix principle: in a composite index, the conditions that come after the range column cannot use the index. We will explain this in more detail later.

2. Advantages and disadvantages of indexes

1. Advantages of indexes

Indexes greatly reduce the amount of data the server needs to scan.

Indexes help the server avoid sorting and temporary tables.

Indexes turn random I/O into sequential I/O.

2. Disadvantages of indexes

Although indexes greatly improve query speed, they slow down updates to the table such as INSERT, UPDATE and DELETE, because when updating a table MySQL must save not only the data but also the index file.

Index files take up disk space. In general this is not a serious problem, but if you create several composite indexes on a large table and insert a large amount of data, the index files grow rapidly.

If a column contains many duplicate values, indexing it has little practical effect.

For very small tables, a simple full table scan is more efficient in most cases.

Therefore, index only the columns that are queried and sorted most frequently. (MySQL limits the number of indexes per table: for InnoDB the limit is 64 secondary indexes per table, and a single multi-column index can contain at most 16 columns.)

One reason databases exist is to solve the problem of storing data and finding it quickly. So where does the data in a database live? Right, on disk. What is the advantage of a disk? It is cheap! What is the disadvantage? It is slow compared with memory access.

So, do you know the main data structure used by MySQL indexes?

B+ tree! You blurted it out.

What kind of data structure is the B+ tree? Why did MySQL choose the B+ tree for its indexes?

In fact, the B+ tree was chosen only after a long evolution:

Binary sort tree → balanced binary tree → B-Tree (B tree) → B+Tree (B+ tree)

A friend asked me, "What is the difference between a B tree and a B-tree?" To clear this up: MySQL's data structures include only the B-Tree (B tree) and the B+Tree (B+ tree), and the names are simply read differently. "B-Tree" is just the B tree; the hyphen is not a minus sign.

There is also the red-black tree that some readers mention. It is a storage structure used in programming languages, not in MySQL; for example, Java's HashMap uses a linked list plus a red-black tree.

All right, let's look at how all this evolved into today's B+ tree.

3. The evolution of the B+Tree index

1. Binary sort tree

Before getting to the B+ tree, a quick word about the binary sort tree. For any node, every value in its left subtree is smaller than the node itself, and every value in its right subtree is larger than the node itself. If all nodes satisfy this condition, the tree is a binary sort tree. (This is closely related to binary search.)

The image above is a binary sort tree. You can use its properties to walk through the process of finding 9:

9 is smaller than 10, so search the left subtree (node 3).

9 is larger than 3, so search the right subtree of node 3 (node 4).

9 is larger than 4, so search the right subtree of node 4 (node 9).

Node 9 equals 9: found.

That took 4 comparisons in total. Have you thought about how the structure above could be optimized?
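
The walk above can be reproduced with a minimal binary sort tree in Python. The figure is not available here, so the key set `[10, 3, 18, 2, 4, 9]` is an assumption, chosen only so that the search path for 9 matches the one described (10, 3, 4, 9); `Node`, `insert` and `search` are illustrative names.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert a key without any rebalancing (plain binary sort tree)."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Return (found, path), where path lists every key compared."""
    path = []
    node = root
    while node is not None:
        path.append(node.key)
        if key == node.key:
            return True, path
        node = node.left if key < node.key else node.right
    return False, path

root = None
for k in [10, 3, 18, 2, 4, 9]:  # assumed insertion order
    root = insert(root, k)

found, path = search(root, 9)
print(found, path, len(path))
```

The length of `path` is the number of comparisons, which is exactly the depth of the node being searched for. That depth is what the next structure tries to reduce.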

2. AVL tree (self-balancing binary search tree)

The above picture is an AVL tree, with the same number of nodes and the same values as the binary sort tree.

Let's take a look at the process of finding 9:

9 is larger than 4, so search its right subtree.

9 is smaller than 10, so search its left subtree.

Node 9 equals 9: found.

That took 3 comparisons in total, one fewer than the binary sort tree for the same data. Why? Because the AVL tree is shorter than the binary sort tree, and a taller tree means more comparisons. Don't underestimate this optimization: with 2 million rows, the difference in the number of comparisons becomes significant.

Imagine a balanced binary tree with 1 million nodes: its height is about 20, so a query may need to access 20 blocks. In the era of mechanical hard drives, randomly reading one block from disk took about 10 ms of seek time. In other words, for a table with 1 million rows stored as a binary tree, accessing a single row may take 20 × 10 ms = 200 ms. This query is really slow!
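
The estimate above is simple arithmetic, shown here as a short Python sketch. The 10 ms seek time comes from the text; treating every level of the tree as one random disk read is the simplifying assumption.

```python
import math

rows = 1_000_000
seek_ms = 10  # approximate random-read cost on a mechanical disk (from the text)

# A perfectly balanced binary tree with n nodes has ceil(log2(n + 1)) levels.
height = math.ceil(math.log2(rows + 1))

# Assumption: each level visited costs one random disk read.
worst_case_ms = height * seek_ms

print(height, worst_case_ms)
```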

3. B-tree (Balanced Tree): a multi-way balanced search tree

A B-tree is a multi-way self-balancing search tree. It is similar to an ordinary binary tree, but a B-tree allows each node to have more child nodes. The schematic diagram of the B-tree is as follows:

The characteristics of B-tree:

All key values are distributed throughout the whole tree.

Any key appears in exactly one node.

A search may end at a non-leaf node.

A lookup over the full set of keys performs close to binary search.

To improve efficiency, the number of disk I/O operations should be kept as low as possible. In practice, the disk does not read strictly on demand; it reads ahead every time.

After reading the required data, the disk goes on to read more of the following data into memory, based on the locality principle from computer science: when a piece of data is used, the data near it is usually used soon afterwards.

Because sequential disk reads are very efficient (no seek time is needed, only a little rotation time), read-ahead improves I/O efficiency for programs with locality. The read-ahead length is generally an integer multiple of the page size. MySQL (which uses the InnoDB engine by default) manages records in pages, with a default page size of 16 KB (configurable).

B-Tree uses the computer disk read-ahead mechanism:

Each time a new node is created, it requests one page of space, so reading a node takes only one I/O. Because the depth of the tree is very small in practice, lookups are very efficient.
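
Here is a minimal Python sketch of a B-tree lookup under the model just described, where visiting one node stands for reading one page. The tree contents are invented, and `BTreeNode` and `btree_search` are illustrative names; insertion and node splitting are left out.

```python
from bisect import bisect_left

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted keys held in this node (one "page")
        self.children = children or []   # empty list for a leaf

def btree_search(node, key):
    """Return the number of nodes visited if key is found, else None."""
    visited = 0
    while node is not None:
        visited += 1                     # one node visited = one page read
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return visited               # may happen in a non-leaf node
        if not node.children:
            return None                  # reached a leaf without finding the key
        node = node.children[i]
    return None

# Hypothetical two-level tree: the root separates three leaves.
root = BTreeNode(
    [10, 20],
    [BTreeNode([3, 9]), BTreeNode([13, 18]), BTreeNode([21, 30])],
)

print(btree_search(root, 20), btree_search(root, 9), btree_search(root, 15))
```

Looking up 20 stops at the root after a single node visit, which illustrates the property that a search may end at a non-leaf node. Looking up 9 needs two visits, and 15 is not in the tree.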

So how does the final version, the B+ tree, work?

4. B+Tree (a variant of the B-tree, and also a multi-way search tree)

You can also see from the figure that the difference between a B + tree and a B tree is:

All keys are stored in the leaf nodes; non-leaf nodes do not store the actual data, so the right leaf node can be located quickly.

A chain pointer is added to all leaf nodes, which means that all values are stored in order and every leaf page is at the same distance from the root. This makes the structure well suited to range lookups.

**Therefore, a B+Tree index can be used for =, BETWEEN, IN, and LIKE patterns that do not start with a wildcard.**
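
The two properties above (data only in the leaves, leaves chained in order) can be sketched in Python as follows. This is a hand-built two-level tree with invented keys, not InnoDB's page format; `Leaf`, `Internal`, `find_leaf` and `range_scan` are illustrative names.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys      # sorted keys (the row data would live here too)
        self.next = None      # chain pointer to the next leaf

class Internal:
    def __init__(self, separators, children):
        self.separators = separators  # routing keys only, no row data
        self.children = children      # children[i] holds keys below separators[i]

def find_leaf(node, key):
    """Descend from the root to the leaf that could contain key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.separators, key)]
    return node

def range_scan(root, low, high):
    """Return all keys in [low, high] by walking the leaf chain."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result

leaves = [Leaf([1, 3]), Leaf([5, 7]), Leaf([9, 11]), Leaf([13, 15])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([5, 9, 13], leaves)

print(range_scan(root, 4, 10))
```

A BETWEEN-style query descends the tree once to find the starting leaf and then only follows `next` pointers, which is why ordered leaves make range conditions cheap.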

Advantages of B + tree:

The number of comparisons is balanced across lookups, which reduces the number of I/O operations, improves search speed, and makes lookups more stable.

The disk read and write cost of a B+ tree is lower, and its query efficiency is more stable.

Note that every time you create a table, the system automatically creates a clustered index on the primary key (ID) for you. It is a B+ tree as described above and stores all of the row data. Every time you add an index, the database creates an additional index for you, also a B+ tree; the columns you choose for the index are the values stored in each of its nodes. Note that such an index does not store all of the data.
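
To make the lower disk cost concrete, here is a rough capacity estimate in Python. Only the 16 KB page size comes from the text. The 8-byte key, the 6-byte child pointer and the 1 KB row are assumptions chosen for a back-of-the-envelope calculation, and page headers and fill factor are ignored.

```python
PAGE_SIZE = 16 * 1024   # default InnoDB page size (from the text)
KEY_BYTES = 8           # assumption: BIGINT primary key
POINTER_BYTES = 6       # assumption: size of a child-page pointer
ROW_BYTES = 1024        # assumption: roughly 1 KB per row

# Non-leaf pages hold only keys and pointers, so many entries fit in one page.
fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)
rows_per_leaf = PAGE_SIZE // ROW_BYTES

def capacity(height):
    """Rows reachable by a tree with `height` levels, leaf level included."""
    return fanout ** (height - 1) * rows_per_leaf

print(fanout, rows_per_leaf, capacity(2), capacity(3))
```

Under these assumptions a three-level tree reaches more than 20 million rows with three page reads, compared with about 20 reads for the balanced binary tree over 1 million rows discussed earlier.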

4. Why did MySQL choose the B+ tree over the B-tree for its indexes?

The B+ tree is better suited to external storage (generally disk storage). Because internal (non-leaf) nodes do not store data, one node can hold more keys, and each node can index a larger and more precise range. In other words, a single disk I/O on a B+ tree carries more information than on a B-tree, so I/O is more efficient.

MySQL is a relational database, and it often accesses an index column by range. The leaf nodes of a B+ tree are linked in order by chain pointers, which strengthens range access, so the B+ tree is very friendly to range queries on an index column. In a B-tree, by contrast, the key and data of each node are stored together, so a range lookup cannot simply walk a chain of leaves and is much less efficient.

Index concepts every programmer should know

1. Back-to-table query

For example, suppose you created an index name_age_index on (name, age) and use it to query data:

SELECT * FROM table WHERE name = 'Chen' AND age = 26;

Because the additional index contains only name and age, after hitting the index the database must go back to the clustered index to fetch the other columns. This is the back-to-table lookup, and it is the reason behind the rule you memorized: avoid SELECT *.
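
The back-to-table step can be modelled in Python like this. It is a toy model with invented rows: the clustered index is a dict keyed by primary key, and the secondary index is a sorted list of (name, age, id) tuples. The names `clustered_index` and `select_star` and the `city` column are assumptions made for the example.

```python
from bisect import bisect_left

rows = [
    {"id": 1, "name": "Chen", "age": 26, "city": "Beijing"},
    {"id": 2, "name": "Chen", "age": 30, "city": "Shanghai"},
    {"id": 3, "name": "Li", "age": 26, "city": "Shenzhen"},
]

# Clustered index: primary key -> full row.
clustered_index = {row["id"]: row for row in rows}
# Secondary index: only (name, age) plus the primary key, kept sorted.
name_age_index = sorted((row["name"], row["age"], row["id"]) for row in rows)

def select_star(name, age):
    """Find matches in the secondary index, then fetch each full row."""
    i = bisect_left(name_age_index, (name, age))
    result, back_to_table = [], 0
    while i < len(name_age_index) and name_age_index[i][:2] == (name, age):
        primary_key = name_age_index[i][2]
        result.append(clustered_index[primary_key])  # the back-to-table lookup
        back_to_table += 1
        i += 1
    return result, back_to_table

result, lookups = select_star("Chen", 26)
print(result, lookups)
```

The `city` value is not in `name_age_index`, so every matching entry costs one extra lookup in `clustered_index`.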

2. Covering index

This is easier to understand alongside the back-to-table lookup. Take the name_age_index index above and the following query:

SELECT name, age FROM table WHERE name = 'Chen' AND age = 26;

Here the selected columns name and age can both be read from the index name_age_index, so there is no need to go back to the table. The query is covered by the index and the data is returned straight from it, which is efficient. It is the first optimization a DBA reaches for.
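
You can watch a query planner make this distinction. The sketch below uses Python's built-in sqlite3 module, so it shows SQLite's planner as a stand-in and not MySQL's; in MySQL you would inspect EXPLAIN output instead. The table name `people` and the extra `city` column are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, city TEXT)"
)
conn.execute("CREATE INDEX name_age_index ON people (name, age)")

def plan(sql):
    """Return the planner's description of how it would run the query."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in plan_rows)

star_plan = plan("SELECT * FROM people WHERE name = 'Chen' AND age = 26")
covered_plan = plan("SELECT name, age FROM people WHERE name = 'Chen' AND age = 26")

print(star_plan)
print(covered_plan)
```

Both plans use name_age_index, but only the second is reported as a covering index, because every selected column is already present in the index.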

3. Leftmost prefix principle

B+ tree nodes store index entries ordered from left to right, so matching naturally proceeds from left to right as well. When we create a composite index, that is, an index on several columns, both Oracle and MySQL let us choose the column order. For example, to build a composite index on the three columns a, b, c, we can choose the priority we want: (a, b, c), (b, a, c), (c, a, b), and so on. Why does the database let us choose the order of the columns? Isn't it a composite index on three columns either way? This leads to the leftmost prefix principle of database indexes.

In development we often run into the problem that a column is part of a composite index, yet SQL that queries that column does not use the index. For example, the index abc_index: (a, b, c) is a composite index on the three columns a, b, c. None of the following SQL statements can hit abc_index when executed:

SELECT * FROM table WHERE c = '1';

SELECT * FROM table WHERE b = '1' AND c = '2';

The following three cases will use the index:

SELECT * FROM table WHERE a = '1';

SELECT * FROM table WHERE a = '1' AND b = '2';

SELECT * FROM table WHERE a = '1' AND b = '2' AND c = '3';

Can you see a pattern in the two examples above?

Yes, the index abc_index: (a, b, c) is used only by three kinds of queries: (a), (a, b), and (a, b, c). That statement is slightly imprecise: a query on (a, c) can also use the index, but only for column a, not for column c.

There is also a special case. In the following kind of query, only a and b use the index; c does not.

SELECT * FROM table WHERE a = '1' AND b > '2' AND c = '3';

In this kind of statement, once a and b have used the index, the values of c within that range are no longer in order, so c cannot use the index, and the optimizer judges that using the index for c would be no faster than scanning for c directly.

**Leftmost prefix: as the name implies, the leftmost columns take priority. In the example above, creating the multi-column index on (a, b, c) is equivalent to creating the single-column index (a), the composite index (a, b), and the composite index (a, b, c).**

Therefore, when creating a multi-column index, put the column used most frequently in the WHERE clause on the far left, according to business requirements.
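
The rule can be written down as a small Python function. It is a simplified model of the leftmost prefix principle as described in this section, not the actual MySQL optimizer, which has further techniques such as the index pushdown covered next. `usable_prefix` and its arguments are illustrative names.

```python
def usable_prefix(index_columns, predicates):
    """Return the index columns a query can use to seek into the index.

    predicates maps a column name to its operator, for example "=" or ">".
    Matching goes left to right and stops at the first column that has no
    condition, or right after the first range condition.
    """
    used = []
    for column in index_columns:
        operator = predicates.get(column)
        if operator is None:
            break              # gap in the prefix: later columns are unusable
        used.append(column)
        if operator != "=":
            break              # range condition: columns after it are unordered
    return used

abc_index = ["a", "b", "c"]

print(usable_prefix(abc_index, {"c": "="}))
print(usable_prefix(abc_index, {"a": "=", "c": "="}))
print(usable_prefix(abc_index, {"a": "=", "b": ">", "c": "="}))
```

The three calls correspond to the cases discussed above: no usable column, only a, and a plus b.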

4. Index pushdown optimization

Take the index name_age_index again, with the following SQL:

SELECT * FROM table WHERE name LIKE 'Chen%' AND age > 26;

There are two possible ways to execute this statement:

First way: hit the name_age_index composite index, find all entries whose name starts with "Chen", go back to the table for every one of them, and only then keep the rows that satisfy the age condition.

Second way: hit the name_age_index composite index, find all entries whose name starts with "Chen", filter out within the index the entries with age > 26, and then go back to the table only for those rows.

Clearly, the second way goes back to the table for fewer rows and performs fewer I/O operations. This is index pushdown. So not every LIKE misses the index.
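
The two execution strategies can be compared with a toy Python model that counts back-to-table lookups. The rows and the `city` column are invented, and `run_query` is an illustrative name; the real optimizer makes this choice internally.

```python
rows = [
    {"id": 1, "name": "Chen A", "age": 20, "city": "Beijing"},
    {"id": 2, "name": "Chen B", "age": 30, "city": "Shanghai"},
    {"id": 3, "name": "Chen C", "age": 40, "city": "Shenzhen"},
    {"id": 4, "name": "Li D", "age": 50, "city": "Hangzhou"},
]
clustered_index = {row["id"]: row for row in rows}
name_age_index = sorted((row["name"], row["age"], row["id"]) for row in rows)

def run_query(use_pushdown):
    """Model: WHERE name LIKE 'Chen%' AND age > 26, counting table lookups."""
    back_to_table = 0
    result = []
    for name, age, primary_key in name_age_index:
        if not name.startswith("Chen"):
            continue
        if use_pushdown and not age > 26:
            continue                    # rejected inside the index, no lookup
        row = clustered_index[primary_key]
        back_to_table += 1
        if row["age"] > 26:             # without pushdown, age is checked here
            result.append(row)
    return result, back_to_table

plain_result, plain_lookups = run_query(use_pushdown=False)
pushed_result, pushed_lookups = run_query(use_pushdown=True)
print(plain_lookups, pushed_lookups, plain_result == pushed_result)
```

Both strategies return the same rows; the pushdown version simply skips the lookup for the entry whose age already fails the condition.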

Considerations when using indexes

1. Be careful with columns that contain NULL values

It is often said that a column containing NULL values is left out of the index, and that a NULL in any column of a composite index makes that column useless for the index. For MySQL this is not accurate: its B-tree indexes do store NULL values, and IS NULL conditions can use an index. NULL does, however, add overhead for testing whether each value is NULL and costs some storage per nullable column. Therefore, when designing the database, we recommend that you do not leave the default value of a field as NULL.

2. Use short indexes

When indexing string columns, specify a prefix length if possible. For example, if you have a CHAR column and most values are unique within the first 10 or 20 characters, do not index the entire column. Short indexes not only speed up queries but also save disk space and I/O operations.
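
One way to pick a prefix length is to measure how many distinct values each prefix length keeps. The Python sketch below does this for a handful of invented strings; `prefix_selectivity` and `shortest_unique_prefix` are illustrative helpers, and on a real table you would compute the same ratio with a query over the column.

```python
def prefix_selectivity(values, length):
    """Fraction of rows that remain distinguishable using only a prefix."""
    return len({value[:length] for value in values}) / len(values)

def shortest_unique_prefix(values):
    """Smallest prefix length that keeps as many distinct values as the full column."""
    distinct = len(set(values))
    for length in range(1, max(len(value) for value in values) + 1):
        if len({value[:length] for value in values}) == distinct:
            return length

# Hypothetical column contents.
names = ["alexander", "alexandra", "alexis", "benjamin", "benedict", "charlotte"]

print(prefix_selectivity(names, 3))
print(prefix_selectivity(names, 5))
print(shortest_unique_prefix(names))
```

A prefix of 3 characters separates only half of these values, while 8 characters already distinguish all of them, so indexing the full column would add nothing.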

3. Sorting on indexed columns

A query generally uses only one index per table, so if an index is already used in the WHERE clause, the columns in ORDER BY will not use an index. Therefore, do not add a sort operation when the database's default ordering already meets the requirement. Try not to sort on multiple columns, and if you must, it is best to create a composite index on those columns.

4. LIKE operations

Using LIKE is generally not recommended. If you have to use it, how you use it matters: LIKE '%Chen%' will not use an index, while LIKE 'Chen%' can.

5. Do not apply functions or calculations to the column

Doing so causes the index to fail and forces a full table scan, for example:

SELECT * FROM table_name WHERE YEAR(column_name)
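
The effect is easy to observe with a query planner. As a stand-in for MySQL, this sketch uses Python's built-in sqlite3 module, where `strftime('%Y', ...)` plays the role of YEAR(); the `events` table, its columns and the index name `idx_created` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created TEXT, note TEXT)")
conn.execute("CREATE INDEX idx_created ON events (created)")

def plan(sql):
    """Return the planner's description of how it would run the query."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in plan_rows)

# Function applied to the indexed column: the index cannot be used to seek.
wrapped_plan = plan("SELECT * FROM events WHERE strftime('%Y', created) = '2007'")

# Same intent expressed as a range on the bare column: the index is usable.
range_plan = plan(
    "SELECT * FROM events "
    "WHERE created >= '2007-01-01' AND created < '2008-01-01'"
)

print(wrapped_plan)
print(range_plan)
```

Rewriting the condition so that the column stands alone on one side of the comparison is the usual fix.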
