This article explains why MySQL uses B-/B+Tree indexes. It walks through the reasoning in some detail and should be a useful reference for anyone interested in the topic.
Generally speaking, an index is too large to be held entirely in memory, so it is usually stored on disk as an index file. Looking something up therefore incurs disk I/O, which is several orders of magnitude more expensive than a memory access. The most important criterion for judging a data structure as an index is thus the asymptotic number of disk I/O operations a lookup requires; in other words, the index structure should minimize disk accesses during a search. The following sections first introduce the principles of main memory and disk access, and then use them to analyze the efficiency of B-/+Tree as an index.
Principle of main memory access
At present, the main memory used by computers is essentially random-access memory (RAM). The structure and access mechanics of modern RAM are fairly complex; this article sets those details aside and uses a very simple abstract access model to explain how RAM works.
From an abstract point of view, main memory is a matrix of storage cells, each of which stores a fixed amount of data, and each of which has a unique address. The addressing rules of real main memory are more involved, but here they are simplified to a two-dimensional address: a row address and a column address uniquely locate one cell. As a concrete example, picture a 4 x 4 grid of cells with rows labeled A through D and columns labeled 0 through 3, so that A0 and D3 name two particular cells.
The access process of main memory is as follows:
When the system needs to read from main memory, it places the address on the address bus and transmits it to the memory. Main memory decodes the address, locates the designated cell, and places that cell's data on the data bus for other components to read.
Writing to main memory is similar: the system puts the target address on the address bus and the data on the data bus, and main memory reads both buses and performs the corresponding write.
It follows that main memory access time is linear in the number of accesses: there is no mechanical operation, and the "distance" between two accessed addresses has no effect on the time. For example, fetching A0 and then A1 takes exactly as long as fetching A0 and then D3.
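To make the abstract model above concrete, here is a minimal, purely illustrative sketch in Python. The class and its methods are invented for illustration only; this is not how real DRAM controllers work.

```python
# Illustrative sketch of the simplified 4 x 4 main-memory model described above.
# A (row, column) address uniquely locates one cell; access cost is constant,
# so reading A0 then A1 costs the same as reading A0 then D3.

class SimpleRAM:
    def __init__(self, rows=4, cols=4):
        # the storage matrix: each cell holds one fixed-size value
        self.cells = [[0] * cols for _ in range(rows)]

    def read(self, row, col):
        # "put the address on the address bus", locate the cell, return its data
        return self.cells[row][col]

    def write(self, row, col, value):
        # locate the cell and overwrite its contents
        self.cells[row][col] = value

ram = SimpleRAM()
ram.write(0, 0, 42)   # cell A0 from the example above
ram.write(3, 3, 7)    # cell D3
print(ram.read(0, 0), ram.read(3, 3))
```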
Disk access principle
As mentioned above, indexes are generally stored on disk as files, so index retrieval requires disk I/O. Unlike main memory access, disk I/O involves mechanical movement, which makes it enormously more expensive.
A disk drive consists of a set of coaxial platters of the same size that rotate together around a common spindle. Beside the platters sits the head arm, which carries one head per platter surface. The heads do not rotate; they move along the radius of the platters (in practice along an oblique tangent), and all heads move together, so viewed from above they always overlap (multi-actuator technology relaxes this restriction, but it is not considered here).
Each platter surface is divided into a series of concentric rings centered on the spindle. Each ring is called a track, and the set of tracks with the same radius across all platters forms a cylinder. Each track is further divided into small segments called sectors, and a sector is the smallest storage unit on the disk. For simplicity, assume the drive has a single platter and a single head.
When data needs to be read from the disk, the system passes the data's logical address to the drive, and the drive's control circuitry translates it into a physical address, i.e. determines which track and sector hold the data. To read that sector, the head must be positioned over it: first the head moves to the corresponding track, a process called seeking, whose duration is the seek time; then the platter rotates until the target sector passes under the head, a process called rotational delay, whose duration is the rotational latency.
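As a rough back-of-the-envelope illustration of these two costs (the 7200 RPM spindle speed, 9 ms average seek time, and 100 ns memory latency below are assumed typical values, not figures from the article):

```python
# Back-of-the-envelope estimate of one random disk access.
# The specific numbers (7200 RPM, 9 ms average seek, 100 ns RAM) are assumptions.

rpm = 7200
avg_seek_ms = 9.0                                 # assumed average seek time
full_rotation_ms = 60_000 / rpm                   # ~8.33 ms per revolution
avg_rotational_latency_ms = full_rotation_ms / 2  # on average, half a revolution

access_ms = avg_seek_ms + avg_rotational_latency_ms
print(f"one random disk access ~ {access_ms:.1f} ms")                   # ~13.2 ms
print(f"vs main memory ~ 100 ns -> ~{access_ms * 1e6 / 100:,.0f}x slower")
```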
Locality principle and disk pre-reading
Because of the characteristics of the storage medium and the added cost of mechanical movement, disk access is orders of magnitude slower than main memory access, so to improve efficiency, disk I/O must be reduced as much as possible. To this end, the disk is usually not read strictly on demand: every read is a pre-read, so even if only one byte is needed, the disk starts from that position and reads a certain length of data sequentially into memory. The theoretical basis for this is the well-known principle of locality in computer science:
When a piece of data is used, the data near it is usually used soon afterwards.
The data needed while a program runs therefore tends to be clustered together.
Because sequential reads from disk are very efficient (no seek time and only a little rotational latency), pre-reading improves I/O efficiency for programs that exhibit locality.
The pre-read length is generally an integral multiple of a page. A page is the logical block in which the computer manages memory: hardware and the operating system divide main memory and disk storage into contiguous blocks of equal size, each called a page (in many operating systems a page is typically 4 KB), and main memory and disk exchange data in whole pages. When the data a program needs is not in main memory, a page fault is raised; the system sends a read request to the disk, which finds the start of the data and reads one or more consecutive pages into memory, after which the fault handler returns and the program continues running.
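Below is a minimal sketch of page-granularity reads, assuming the common 4 KB page size mentioned above. The helper functions and the file layout are hypothetical; the point is only that even a one-byte request is served in whole, page-aligned units.

```python
# Minimal sketch of reading data a page at a time, as described above.
# Assumes a hypothetical data file and a 4 KB page size (a common default).

PAGE_SIZE = 4096

def read_page(f, page_no):
    # always read from a page-aligned offset, and always a whole page,
    # mirroring how the OS moves data between disk and main memory
    f.seek(page_no * PAGE_SIZE)
    return f.read(PAGE_SIZE)

def read_bytes(f, offset, length):
    # even a 1-byte request is served by loading the page(s) containing it
    first, last = offset // PAGE_SIZE, (offset + length - 1) // PAGE_SIZE
    data = b"".join(read_page(f, p) for p in range(first, last + 1))
    start = offset - first * PAGE_SIZE
    return data[start:start + length]
```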
Performance Analysis of B-/+Tree Index
At this point, you can finally analyze the performance of B-/+Tree indexes.
As mentioned above, the number of disk I/Os is generally used to evaluate an index structure. Start with B-Tree: by its definition, a search visits at most h nodes, where h is the height of the tree. Database designers exploit the disk pre-read principle by setting the size of one node equal to one page, so that each node can be loaded in full with a single read. To achieve this, practical B-Tree implementations use the following techniques:
Each time a new node is created, a full page of space is allocated directly, which guarantees that one node is stored physically within one page. Because storage allocation is page-aligned, reading one node then costs only one I/O.
A single search in a B-Tree therefore needs at most h - 1 disk I/Os (the root node is resident in memory), and the asymptotic complexity is O(h) = O(log_d N), where N is the number of indexed keys.
In practical applications, the out-degree d is a very large number, usually more than 100, so h is very small (typically no more than 3). (Here h is the height of the tree and d is its out-degree, i.e. the maximum number of children a node may have.)
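To see why h stays so small, a quick calculation helps; the out-degree values below are assumptions chosen only for illustration:

```python
# How many keys fit in a B-Tree of height h with out-degree d? Roughly d**h.
# The d values below are illustrative assumptions.

import math

def max_keys(d, h):
    return d ** h

for d in (100, 500, 1200):
    print(f"d={d:5d}: h=2 -> {max_keys(d, 2):>13,} keys, "
          f"h=3 -> {max_keys(d, 3):>16,} keys")

# Conversely, the height needed for N keys is about ceil(log_d(N)):
N = 10_000_000
print(f"{N:,} keys with d=1200 need height ~ {math.ceil(math.log(N, 1200))}")
```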
To sum up, it is very efficient to use B-Tree as an index structure.
A red-black tree, by contrast, is obviously much deeper. Because nodes that are logically close (parent and child) may be physically far apart, locality cannot be exploited; although the red-black tree's I/O asymptotic complexity is also O(h), its h is much larger, so it is clearly far less efficient than a B-Tree as an on-disk index.
As mentioned above, B+Tree is even better suited to external-storage indexes, and the reason lies in the out-degree d of its internal nodes. From the analysis above, the larger d is, the better the index performs, and the upper limit of d depends on the sizes of the key and the data within a node:
d_max = floor(pagesize / (keysize + datasize))
where floor denotes rounding down. Because B+Tree internal nodes drop the data field, they can have a larger out-degree and therefore better performance.
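A minimal sketch of this formula in use, assuming InnoDB's default 16 KB page size and illustrative key/row/pointer byte sizes (none of these specific numbers come from the article):

```python
# d_max = floor(pagesize / (keysize + datasize)), per the formula above.
# The byte sizes below are illustrative assumptions, not values from the article.

def d_max(page_size, key_size, data_size):
    return page_size // (key_size + data_size)

PAGE = 16 * 1024   # InnoDB's default page size is 16 KB

# B-Tree node: each entry carries the full row (say ~400 bytes) next to an 8-byte key
print("B-Tree node      :", d_max(PAGE, key_size=8, data_size=400))   # 40

# B+Tree internal node: each entry holds only an 8-byte key and an 8-byte child pointer
print("B+Tree inner node:", d_max(PAGE, key_size=8, data_size=8))     # 1024
```

With the same page, a node that stores only keys and pointers holds far more entries than one that also carries row data, which is exactly why the B+Tree's fan-out, and hence its height, wins.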
In MySQL, indexes belong to the storage engine level, and different storage engines implement them differently. This article discusses the index implementations of the MyISAM and InnoDB storage engines.
MyISAM non-clustered index
The MyISAM engine uses B+Tree as the index structure, and the data field of the leaf node stores the address of the data record.
Suppose the table has three columns and Col1 is the primary key. The primary index of a MyISAM table is then an ordinary B+Tree whose index file holds only the addresses of the data records. In MyISAM there is no structural difference between the primary index and a secondary index (secondary key), except that the primary index requires unique keys while a secondary index allows duplicates.
A secondary index is likewise a B+Tree whose data field holds the address of the data record. The retrieval algorithm in MyISAM is therefore: search the index with the B+Tree search algorithm; if the specified key exists, take the value of its data field; then use that value as an address to read the corresponding data record.
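A simplified sketch of that retrieval flow follows. The plain dictionaries merely stand in for the B+Tree index and the data file; they are not MyISAM internals, and the keys, addresses, and rows are made up.

```python
# Simplified sketch of a MyISAM-style (non-clustered) lookup, as described above.
# Plain dicts stand in for the B+Tree index and the data file.

index = {15: 0x07, 34: 0x2A, 50: 0x4D}          # key -> record address (illustrative)
data_file = {0x07: ("Alice", 15), 0x2A: ("Bob", 34), 0x4D: ("Carol", 50)}

def myisam_lookup(key):
    addr = index.get(key)          # step 1: B+Tree search returns an address
    if addr is None:
        return None
    return data_file[addr]         # step 2: read the record stored at that address

print(myisam_lookup(34))           # ('Bob', 34)
```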
MyISAM's indexing method is called "non-clustered", a name chosen to distinguish it from InnoDB's clustered index.
InnoDB index implementation
Although InnoDB also uses B+Tree as the index structure, the implementation is quite different from that of MyISAM.
The first major difference is that InnoDB's data file itself is an index file. As you know from the above, the MyISAM index file and the data file are separate, and the index file only holds the address of the data record. In InnoDB, the table data file itself is an index structure organized by B+Tree, and the leaf node data field of this tree holds the complete data record. The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary index.
Primary index (Primary Key)
In the InnoDB primary index (which is also the data file), the leaf nodes contain complete data records. This kind of index is called a clustered index. Because InnoDB data files are organized by primary key, InnoDB requires every table to have a primary key (a MyISAM table may have none). If no primary key is specified explicitly, MySQL automatically selects a column that can uniquely identify each record as the primary key; if no such column exists, MySQL generates a hidden field to serve as the primary key of the InnoDB table. This field is 6 bytes long and of a long integer type.
Secondary index (Secondary Key)
The second difference from MyISAM is that InnoDB secondary indexes store the value of the corresponding record's primary key in their data field, rather than an address. In other words, all InnoDB secondary indexes reference the primary key as their data.
In the example, the secondary key is a character column, with keys compared by their ASCII codes. The clustered index makes lookups by primary key very efficient, but a lookup through a secondary index requires searching two indexes: first the secondary index is searched to obtain the primary key, and then the primary key is used to find the record in the primary index.
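A simplified sketch of that two-step lookup, again using plain dictionaries as stand-ins for the two B+Trees rather than actual InnoDB internals (the column names and values are made up):

```python
# Simplified sketch of an InnoDB secondary-index lookup, as described above.
# Plain dicts stand in for the two B+Trees; not actual InnoDB internals.

primary_index = {15: ("Alice", 15, "NY"), 34: ("Bob", 34, "LA")}   # PK -> full row
secondary_index = {"Alice": 15, "Bob": 34}                          # name -> PK

def innodb_secondary_lookup(name):
    pk = secondary_index.get(name)   # step 1: secondary index yields the primary key
    if pk is None:
        return None
    return primary_index[pk]         # step 2: primary (clustered) index yields the row

print(innodb_secondary_lookup("Bob"))   # ('Bob', 34, 'LA')
```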
Understanding the index implementation of different storage engines is very helpful for the correct use and optimization of indexes. For example, after knowing the index implementation of InnoDB, it is easy to understand why it is not recommended to use overly long fields as primary keys, because all secondary indexes reference the primary index, and an overlong primary index will make the secondary index too large.
Likewise, using a non-monotonic field as the primary key is a bad idea in InnoDB, because the data file itself is a B+Tree: inserting new records with non-monotonic keys forces frequent page splits and adjustments of the data file to maintain the B+Tree properties, which is very inefficient. An auto-increment field is a good choice of primary key.
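A toy calculation can illustrate the effect. The flat sorted list below is a deliberate simplification of a clustered index (real InnoDB splits fixed-size pages rather than shifting one large array); only the contrast between the two insertion orders matters.

```python
# Toy illustration of why monotonic primary keys are cheaper to insert:
# the clustered index keeps rows in key order, so an out-of-order key forces
# existing entries to be moved. The flat list is a deliberate simplification.

import bisect, random

def total_moves(keys):
    rows, moved = [], 0
    for k in keys:
        pos = bisect.bisect_left(rows, k)   # where the key must go to keep order
        moved += len(rows) - pos            # entries that must shift to make room
        rows.insert(pos, k)
    return moved

n = 2000
print("sequential keys:", total_moves(list(range(n))))               # 0 moves
print("random keys    :", total_moves(random.sample(range(n), n)))   # ~ n*n/4 moves
```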
That concludes this look at why MySQL uses B-/B+Tree indexes. Thank you for reading, and I hope it was helpful.