What are the knowledge points of MySQL index 07/09 Update SLTechnology News&Howtos

What are the knowledge points of MySQL index

2025-07-09 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article mainly explains "MySQL index knowledge points", interested friends may wish to take a look. The method introduced in this paper is simple, fast and practical. Let's let Xiaobian take you to learn "MySQL index knowledge points"!

Mysql Index What is an index?

An index is a data structure that helps MySQL retrieve data efficiently. More popularly, database indexes are like tables of contents in front of a book, speeding up database queries.

Generally speaking, the index itself is also very large, and it is impossible to store it all in memory, so the index is often stored in a file on disk (which may be stored in a separate index file or may be stored with data in a data file).

The indexes we usually call, including clustered indexes, coverage indexes, composite indexes, prefix indexes, unique indexes, etc., are not specifically described, and the default is to use B+ tree structure organization (multi-way search tree, not necessarily binary) indexes.

Advantages and disadvantages of indexing

Advantages:

It can improve the efficiency of data retrieval and reduce the IO cost of database, similar to the catalogue of books.

Sorting data through index columns reduces the cost of data sorting and CPU consumption.

Indexed columns will be sorted automatically, including Single Column Index and Combined Index, but the sorting of Combined Index is more complicated.

If you sort the index columns in order, you'll be much more efficient with the corresponding order by statement.

Disadvantages:

Indexing takes up disk space

Indexing improves query efficiency, but reduces the efficiency of updating tables. For example, every time a table is added, deleted or modified, MySQL not only saves the data, but also saves or updates the corresponding index file.

Index Type Primary Key Index

Values in index columns must be unique; null values are not allowed.

general index

MySQL basic index type, there are no restrictions, allowing duplicate values and null values in the column defining the index.

unique index

Values in index columns must be unique, but null values are allowed.

full-text index

Full-text indexes can only be created on fields of text type CHAR,VARCHAR,TEXT. When the field length is relatively large, if you create a normal index, it is less efficient to perform like fuzzy queries. In this case, you can create a full-text index. Full-text indexing can be used in both MyISAM and InnoDB.

spatial index

MySQL versions after 5.7 support spatial indexing and OpenGIS geometric data models. MySQL follows OpenGIS geometric data model rules for spatial indexing.

prefix index

When creating indexes on columns of text types such as CHAR,VARCHAR,TEXT, you can specify the length of the index column, but you cannot specify numeric types.

Other (sorted by number of index columns)

single-column index

combinatorial index

The use of composite index, need to follow the leftmost prefix matching principle (leftmost matching principle). In general, composite indexes are used instead of multiple single-column indexes when conditions permit.

Indexed data structure Hash table

Hash table, in Java HashMap, TreeMap is a Hash table structure, in the form of key-value pairs to store data. We use Hash tables to store table data Key can store index columns, Value can store row records or row disk addresses. Hash table is efficient in equivalent query, and its time complexity is O(1); however, it does not support fast range lookup, and range lookup can only be done by scanning the whole table.

Obviously this is not suitable for use as a database index where lookups and range lookups are frequently required.

binary search tree

Binary tree, I think everyone will have a picture in mind.

Binary tree characteristics: each node has at most 2 branches, left subtree and right subtree data order left small right large.

This feature is to ensure that each search can be halved and reduce the number of IO, but the binary tree is very testing the value of the first root node, because it is easy to appear in this feature we want to happen concurrently "tree does not fork", which is very uncomfortable and unstable.

Obviously, this is an unstable situation, and we're going to choose a design that's going to avoid this.

balanced binary tree

Balanced binary tree is a dichotomy thinking, balanced binary search tree in addition to the characteristics of binary trees, the most important feature is that the tree's left and right sub-tree hierarchy difference of at most 1. When inserting and deleting data, the balance of binary tree is maintained by left-hand/right-hand operation, and the left subtree is very high and the right subtree is very short.

The query performance using balanced binary search tree is close to binary search method, and the time complexity is O(log2n). Query id=6 requires only two IOs.

On this feature, you may feel that this is very good, you can achieve the ideal situation of binary tree. However, some problems remain:

Time complexity is related to tree height. The tree height depends on how many times it needs to be retrieved, and each node read corresponds to a disk IO operation. The height of the tree is equal to the number of disk IO operations per query. Disk seek times are 10ms per seek, and query performance is poor when table data is large. (1 million data, log2n equals approximately 20 disk IOs, time 20*10=0.2s)

Balanced binary tree does not support range query fast lookup, range query needs to traverse from the root node many times, query efficiency is not high.

B-tree: a modified binary tree

MySQL data is stored in disk files. When querying and processing data, you need to load the data in disk into memory first. Disk IO operations are very time-consuming, so our optimization focus is to minimize disk IO operations. Each node of the binary tree is accessed once, and if you want to reduce disk IO operations, you need to reduce the height of the tree as much as possible. How to reduce the height of trees?

If key is bigint=8 bytes, each node has two pointers, each pointer is 4 bytes, and a node occupies 16 bytes of space (8+4*2=16).

Because MySQL's InnoDB storage engine reads one page of data at a time (the default page is 16K), while the binary tree has only 16 bytes of valid data at a time, space utilization is extremely low. To maximize the use of IO space at a time, a simple idea is to store multiple elements per node and store as much data as possible per node. Each node can store 1000 indexes (16k/16=1000), thus transforming the binary tree into a multi-fork tree. By increasing the fork tree of the tree, the tree is changed from tall and thin to short and fat. To build 1 million pieces of data, the height of the tree only needs 2 layers (1000*1000=1 million), which means that only 2 disk IOs are needed to query the data. With fewer disk IOs, querying data is more efficient.

This data structure is called B tree, B tree is a multi-fork balanced search tree, as shown in the following main features:

B-tree nodes store multiple elements, and each inner node has multiple branches.

The elements in a node contain key values and data, and the key values in the node are arranged in descending order. That is, data is stored at all nodes.

Elements in parent nodes do not appear in child nodes.

All leaf nodes are at the same level, leaf nodes have the same depth, and there are no pointer connections between leaf nodes.

For example, querying data in a b-tree:

Let's say we query data for values equal to 10. Query path Disk Block 1-> Disk Block 2-> Disk Block 5.

First disk IO: Load disk block 1 into memory, traverse comparison from the beginning in memory, 10 disk block 6.

First disk IO: Load disk block 1 into memory, traverse comparison from the beginning in memory, 9 disk block 7.

First look for data with a value equal to 9 and cache the data with a value equal to 9 into the result set. This step is the same as the previous equivalent query process, and three disk IOs occur.

After finding 15, the underlying leaf node is an ordered list, and we traverse backwards from disk block 6, key 9 to filter all data that meets the filter criteria.

Fourth disk IO: Address to disk block 7 according to disk 6 successor pointer to disk, load disk 7 into memory, traverse comparison from beginning in memory, 9

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.