Index module of database 07/06 Update SLTechnology News&Howtos

Index module of database

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

In addition to being one of the most important modules in the database, the index module is also the most frequently asked in the interview. Common questions about the index module are as follows:

Why use an index what kind of information can be the data structure of an index the difference between a dense index and a sparse index

Why use an index:

The smallest storage unit in a database is usually a block or page, and each block contains multiple rows of data. When we query some data that does not use the index, we usually need to do a full table scan, that is, we need to load all the blocks, and then traverse the blocks one by one until we find the data we need to find. It is conceivable that the efficiency of this query method is relatively slow when the amount of data is relatively large, so we often need to avoid full table scanning. However, database designers have long taken this into account, so they have introduced a more efficient query mechanism, even using indexes. The inspiration of the index comes from the dictionary, we all know that the dictionary will record some key information, such as the partial radical pinyin, through which we can quickly find the page where the word is located. The same is true of the index, the database can quickly locate the location of the target data through the key information recorded by the index, and the occurrence of full table scanning can be avoided. So the purpose of using indexes is to make queries more efficient.

What kind of information can be indexed:

Primary key id, unique fields, and fields that are frequently used as query conditions. If multiple fields are frequently used as query conditions at the same time, a combined index of these fields can be established.

Data structure of the index:

Usually B + trees, Hash, and BitMap supported by a few databases

Binary search tree

Next, let's briefly talk about the data structure of the index. we all know that the most commonly used data structure of the index is the B + tree. before introducing what the B + tree is, we must first understand the binary search tree and the B tree. And briefly explain why there is no binary tree or B-tree as the index data structure.

Now we know that the purpose of indexing fields is to help us quickly locate the location of the target data, if we are asked to design the index ourselves, for quick search, you may immediately think of tree data structures such as binary search trees. So this section first introduces the binary search tree and step by step to understand why the B+ tree is used as the indexed data structure in many tree structures.

The binary search tree is a commonly used tree data structure. Each node of the binary search tree has at most two left and right child nodes, which become the left child tree and the right child tree respectively. Usually, the element of the left child tree is smaller than its parent node, while the right child tree is larger than its parent node. The top node is usually called the root node, and the binary search tree search algorithm is binary search. The following picture shows a balanced binary tree. The so-called balanced binary tree means that the height difference between the left and right nodes at the end is no more than 1:

Because the binary search tree can only have two nodes at most at the same level, and the disk IO is not optimized, because each IO read can only read two nodes, so it can not achieve the ideal query speed and can not be used as the data structure of the index.

B tree

Since the binary tree can only read two nodes at a time without optimizing the disk IO, and there are only two search paths left and right, the depth of the tree will increase with the increasing amount of data. At this time, you need to find a multi-path tree structure in which each level can have multiple nodes. B-tree is also called multi-path balanced search tree. Its general structure is as follows:

There are m nodes in the same layer, which is usually called m-order. A m-order B-tree (balanced tree of order m) is a balanced m-path search tree. It is either an empty tree or a tree that satisfies the following properties:

The root node has at least two child nodes. Each node contains at most m child nodes (m > = 2). Except for the root node and the leaf node, every other node has at least ceil (m). All the leaf nodes are in the same layer, assuming that each non-terminal node contains n keyword information, where: Ki (iNode 1.. n) is the keyword. And the keywords are sorted in ascending order K (iMur1) < the number of Ki keywords n must satisfy: [ceil (m / 2)-1]

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.