What are the relevant knowledge points of MySQL index 07/04 Update SLTechnology News&Howtos

What are the relevant knowledge points of MySQL index

2025-07-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article introduces the relevant knowledge of "what are the relevant knowledge points of the MySQL index?" in the operation of actual cases, many people will encounter such a dilemma, and then let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

Index introduction

What's the index?

Officially, the index is a data structure that helps MySQL obtain data efficiently. More generally, the database index is like the directory in front of a book, which can speed up the query speed of the database.

Generally speaking, the index itself is too large to be stored in memory, so the index is often stored in a file on disk (either in a separate index file or in a data file with the data).

The indexes we usually refer to, including clustered index, overlay index, composite index, prefix index, unique index, etc., are all organized by B+ tree structure (multi-search tree, not necessarily binary) by default.

Advantages and disadvantages of index

Advantages:

It can improve the efficiency of data retrieval and reduce the IO cost of the database, similar to the catalogue of books.

The data is sorted by index column, which reduces the cost of data sorting and the consumption of CPU.

The indexed columns are sorted automatically, including [single column index] and [combined index], but the sorting of the combined index is more complex.

If you sort according to the order of the index columns, you will be much more efficient corresponding to the order by statement.

Disadvantages:

The index takes up disk space.

Although the index will improve the query efficiency, it will reduce the efficiency of updating the table. For example, every time you add, delete or modify a table, MySQL should not only save the data, but also save or update the corresponding index file.

Index type

Primary key index

The value in the index column must be unique and no null values are allowed.

General index

The basic index type in MySQL, with no restrictions, allows duplicate and null values to be inserted in the columns that define the index.

Unique index

The value in the index column must be unique, but null values are allowed.

Full-text index

A full-text index can only be created on a text type CHAR,VARCHAR,TEXT type field. When the field length is relatively large, if you create a normal index, it is less efficient to perform a like fuzzy query, and you can create a full-text index. Full-text indexing is available in both MyISAM and InnoDB.

Spatial index

MySQL after 5.7supports spatial indexing and the OpenGIS geometric data model. MySQL follows the OpenGIS geometric data model rules in terms of spatial indexing.

Prefix index

When you create an index on a text type such as a CHAR,VARCHAR,TEXT class column, you can specify the length of the index column, but the numeric type cannot be specified.

Others (classified by the number of index columns)

Single column index

Combinatorial index

The use of combined indexes needs to follow the leftmost prefix matching principle (leftmost matching principle). In general, composite indexes are used instead of multiple single-column indexes if conditions permit.

Data structure of index

Hash table

Hash table, HashMap,TreeMap in Java is the Hash table structure, which stores data in the form of key-value pairs. We use Hash tables to store table data Key can store index columns, and Value can store row records or row disk addresses. Hash table is very efficient in equivalent query, and the time complexity is O (1). However, fast range search is not supported, and range lookup can only be done by scanning the whole table.

Obviously, this is not suitable for database indexes that often require lookups and range lookups.

Binary search tree

Binary tree, I think everyone will have a picture in mind.

The characteristics of binary tree: each node has at most 2 bifurcations, and the data order of left subtree and right subtree is small and right.

This feature is to ensure that each search can be halved and reduce the number of IO, but the binary tree is very testing the value of the first root node, because it is easy to occur in this feature we want to happen concurrently "the tree does not fork", which is very uncomfortable and unstable.

Obviously, if this situation is unstable, if we choose to design, we will inevitably avoid this situation.

Balanced binary tree

The balanced binary tree adopts the thinking of dichotomy. In addition to the characteristics of the binary tree, the most important feature of the balanced binary tree is that the level difference between the left and right subtrees of the tree is up to 1. When inserting and deleting data, the binary tree is balanced by left / right rotation operation, so that the left subtree is very high and the right subtree is very short.

The performance of the balanced binary search tree query is close to that of the binary search method, and the time complexity is O (log2n). Querying id=6 requires only two IO.

In terms of this feature, you may think that this is very good and can achieve the ideal situation of binary tree. However, there are still some problems:

Time complexity is related to tree height. The tree needs to be retrieved as many times as it is tall, and each node's read corresponds to a disk IO operation. The height of the tree equals the number of disk IO operations each time the data is queried. Each seek time of the disk is 10ms, and the query performance will be very poor when the amount of data in the table is large. (1 million data volume, log2n equals about 20 disk IO, time 20: 10: 0.2s)

Balanced binary tree does not support fast search of range query, which needs to be traversed many times from the root node, so the query efficiency is not high.

B-tree: transform binary tree

The data of MySQL is stored in disk files. When querying and processing data, you need to load the data in disk into memory first. Disk IO operation is very time-consuming, so the focus of our optimization is to reduce disk IO operations as much as possible. IO occurs once for every node that accesses the binary tree, and if you want to reduce disk IO operations, you need to reduce the height of the tree as much as possible. So how to reduce the height of the tree?

If key is bigint=8 bytes, each node has two pointers, each 4 bytes, and a node occupies 16 bytes of space (8 bytes 4 bytes 2 bytes 16).

Because in MySQL's InnoDB storage engine, the amount of data per page (default is 16K by default) is read by IO at a time, while the effective data amount of IO in binary tree is only 16 bytes, so the space utilization is very low. To maximize the use of IO space at a time, a simple idea is to store multiple elements in each node and as much data as possible in each node. Each node can store 1000 indexes (16k/16=1000), so the binary tree is transformed into a multi-tree, and the tree is changed from tall and thin to short and fat by increasing the cross tree of the tree. To build 1 million pieces of data, the height of the tree only needs 2 layers (1000 million to 1000 million), that is to say, it only takes 2 disk IO to query the data. As the number of disk IO decreases, the efficiency of querying data is improved.

This data structure is called B-tree, which is a multi-forked balanced search tree, such as the main features of the following figure:

Multiple elements are stored in the nodes of the B-tree, and each node has multiple bifurcations.

The elements in the node contain key values and data, and the key values in the node are arranged from large to small. That is, data is stored on all nodes.

Elements in the parent node do not appear in the child node.

All the leaf nodes are on the same layer, the leaf nodes have the same depth, and there is no pointer connection between the leaf nodes.

For example, when querying data in a b-tree:

If we look up data with a value equal to 10. Query path disk block 1-> disk block 2-> disk block 5.

First disk IO: load disk block 1 into memory and walk through it from scratch in memory, 10 disk block 6.

First disk IO: load disk block 1 into memory and walk through it from scratch in memory, 9 disk block 7.

First look for data with a value equal to 9, and cache the data with a value equal to 9 to the result set. This step is the same as the previous equivalence query process, where disk IO occurs three times.

After finding 15, the underlying leaf node is an ordered list, and we start traversing backwards from disk block 6 and key value 9 to filter all the data that meets the filter criteria.

The fourth disk IO: locate disk block 7 according to disk 6 successor pointer to disk, load disk 7 into memory, traverse disk from scratch in memory, 9

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.