What are the characteristics of the index of the InnoDB table and what is the organization of the index? 07/12 Update SLTechnology News&Howtos

What are the characteristics of the index of the InnoDB table and what is the organization of the index?

2025-07-12 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

1. Characteristics of InnoDB clustered index.

We know that the clustered index of the InnoDB engine organizes the table, and there must be a clustered index.

Row data (row data) is stored in the leaf node of the clustered index (except for the columns where overflow occurs, see, hereinafter referred to as "pretext"), and the relative order in which it is stored depends on the order of the clustered index. The relative order rather than the physical order is said here because the physical order and the relative order of the row data in the leaf node data page may not be the same, which will be discussed later.

The order in which InnoDB clustered indexes are selected is as follows:

If there is an explicitly defined primary key (PRIMARY KEY), the primary key is selected as the clustered index

Otherwise, selecting the first all columns is not allowed to be the unique index of NULL

If there is none of the first two, InnoDB chooses the built-in DB_ROW_ID as the clustered index, named GEN_CLUST_INDEX

Special reminder: DB_ROW_ID occupies 6 bytes, increases itself each time, and is allocated globally within the entire instance. That is, if the current instance has multiple tables that use the built-in DB_ROW_ID as the clustered index, their built-in DB_ROW_ ID values are not contiguous but hopping when these tables insert new data. Like this:

ROW_ID:1, 3, 7, 10 of T1 table

ROW_ID:2, 4, 5, 6, 8, 9 of T2 table

2. InnoDB index structure

The default index data structure of InnoDB is B + tree (R tree is used for spatial index), and the index data is stored in leaf node.

The basic storage unit of InnoDB is the data page (page), and a page defaults to 16KB. As we said in the previous article, by default, each page will reserve 1x16 free space for subsequent data "lengthening" updates, so it produces the least fragmentation in the most ideal sequential insertion state, which can almost fill 15x16 page space. If it is a random write, the page space utilization is about 1 tick 2-15 tick 16.

When row_format = DYNAMIC | COMPRESSED, the maximum index length is 3072 bytes, and when row_format = REDUNDANT | COMPACT, the maximum index length is 767 bytes. When page size is not the default 16KB, the maximum index length limit changes accordingly.

We then verify the basic structural characteristics of the InnoDB index respectively.

First, create the following test table:

[root@yejr.me] [innodb] > CREATE TABLE `t1` (

`id`int (10) unsigned NOT NULL AUTO_INCREMENT

`c1`int (10) unsigned NOT NULL DEFAULT'0'

`c2` varchar (100) NOT NULL

`c3` varchar (100) NOT NULL

PRIMARY KEY (`id`)

KEY `c1` (`c1`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

Write 10 pieces of test data in the following way:

Set @ uuid1=uuid (); set @ uuid2=uuid ()

Insert into T1 select 0, round (rand () * 1024)

@ uuid1, concat (@ uuid1, @ uuid2)

Look at the overall structure of the T1 table:

# check with the innodb_ruby tool

[root@yejr.me] # innodb_space-s ibdata1-T innodb/t1 space-indexes

Id name root fseg fseg_id used allocated fill_factor

238 PRIMARY 3 internal 1 1 1 100.00%

238 PRIMARY 3 leaf 2 00 0.005%

239 C1 4 internal 3 1 1 100.00%

239 c1 4 leaf 4 0 0 0.0

# check with the innblock tool

[root@yejr.me] # innblock innodb/t1.ibd scan 16

...

= INDEX_ID:238

Level0 total block is (1)

Block_no: 3. Level: 0 | * |

= INDEX_ID:239

Level0 total block is (1)

Block_no: 4. Level: 0 | * |

You can see

Index ID index type root node page no index layer height

238 Primary key index (clustered index) 31

239 Secondary Index 41

3. Verification of the characteristics of InnoDB index

3.1 feature 1: the clustered index leaf node stores the whole row of data

Scan the third page first and intercept the contents of the first physical record:

[root@yejr.me] # innodb_space-s ibdata1-T innodb/t1-p 3 page-dump

...

Records:

{: format= >: compact

: offset= > 127,

: header= >

{: next= > 263

: type= >: conventional

: heap_number= > 2

: nroomowned = > 0

: min_rec= > false

: deleted= > false

: nulls= > []

: lengths= > {"c2" = > 36, "c3" = > 72}

: externs= > []

: length= > 7}

: next= > 263

: type= >: clustered

# the first physical record, id=1

: key= > [{: name= > "id",: type= > "INT UNSIGNED",: value= > 1}]

: row= >

[{: name= > "C1",: type= > "INT UNSIGNED",: value= > 777}

{: name= > "c2"

: type= > "VARCHAR"

: value= > "a1c1a7c7-bda5-11e9-8476-0050568bba82"}

{: name= > "c3"

: type= > "VARCHAR"

: value= >

"a1c1a7c7-bda5-11e9-8476-0050568bba82a1c1aec5-bda5-11e9-8476-0050568bba82"}]

: sys= >

[{: name= > "DB_TRX_ID",: type= > "TRX_ID",: value= > 10950}

{: name= > "DB_ROLL_PTR"

: type= > "ROLL_PTR"

: value= >

{: is_insert= > true

: rseg_id= > 119

: undo_log= > {: page= > 469,: offset= > 272}]

: length= > 129

: transaction_id= > 10950

: roll_pointer= >

{: is_insert= > true,: rseg_id= > 119,: undo_log= > {: page= > 469,: offset= > 272}

Obviously, it does store the contents of the whole piece of data.

The key value (key) of the clustered index tree is the primary key index value (iNode 10), and the clustered index node value (value) is the other nonclustered index columns (c1journal c2 and c3) and the implicit columns (DB_TRX_ID, DB_ROLL_PTR).

Optimization recommendation 1: try not to store large object data, so that each leaf node can store more data, reduce the fragmentation rate, and improve buffer pool utilization. In addition, overflow can be avoided as much as possible.

3.2 feature 2: clustered index non-leaf nodes store pointers to child nodes

Continue to write new data to the above test table until the clustered index tree is split from one tier to two.

According to the calculation method in the old InnoDB table when the height of the aggregation index layer changes, we estimate that a leaf node can store up to 111records, so when inserting section 112 records, it will split from one layer height to two layers height. After the actual measurement, this is indeed the case.

[root@yejr.me] [innodb] > select count (*) from T1

+-+

| | count (*) |

+-+

| | 112 |

+-+

[root@yejr.me] # innblock innodb/t1.ibd scan 16

...

= INDEX_ID:238

Level1 total block is (1)

Block_no: 3. Level: 1 | * |

Level0 total block is (2)

Block_no: 5 Magi level: 0 | * | block_no: 6 Magi level: 0 | * |

...

At this point, you can see that the root node is still pageno=3, while the leaf node becomes [5,6] two page. Therefore, there should be only two physical records on the root node, storing pointers to the two page of pageno= [5,6].

Let's analyze No. 3 page and take a look at its specific structure:

[root@yejr.me] # innodb_space-s ibdata1-T innodb/t1-p 3 page-dump

...

Records:

{: format= >: compact

: offset= > 125

: header= >

{: next= > 138

: type= >: node_pointer

: heap_number= > 2

: nroomowned = > 0

: min_rec= > true, # the first record is min_key

: deleted= > false

: nulls= > []

: lengths= > {}

: externs= > []

: length= > 5}

: next= > 138

: type= >: clustered

# the first record stores only the key value

: key= > [{: name= > "id",: type= > "INT UNSIGNED",: value= > 1}]

: row= > []

: sys= > []

: child_page_number= > 5, # value is the leaf node pageno=5 that points to

: length= > 8} # the whole record consumes 8 bytes. Besides the key value of 4 bytes, the pointer also needs 4 bytes.

{: format= >: compact

: offset= > 138

: header= >

{: next= > 112

: type= >: node_pointer

: heap_number= > 3

: nroomowned = > 0

: min_rec= > false

: deleted= > false

: nulls= > []

: lengths= > {}

: externs= > []

: length= > 5}

: next= > 112

: type= >: clustered

# the second record stores only the key value

: key= > [{: name= > "id",: type= > "INT UNSIGNED",: value= > 56}]

: row= > []

: sys= > []

: child_page_number= > 6, # value is the leaf node pageno=6 that points to

: length= > 8}

Optimization recommendation 2: the smaller the index column data length, the better, so that the index tree storage efficiency is higher, the more data can be stored in non-leaf nodes, slow down the high splitting speed of the index tree layer, and the average search efficiency is higher.

3.3 feature 3: secondary indexes also store primary key index column values

In the secondary index, the column value of the primary key index (or clustered index) is always stored at the same time, and its function is that when scanning the secondary index, the corresponding clustered index value can be obtained directly from the leaf node. and according to this value back to the table query to obtain row data (if you need to return to the table query). This feature is also known as Index Extensions (a new optimizer feature after version 5.6, see Use of Index Extensions for details).

In addition, in the non-leaf node of the secondary index, the key value of the index record is the column value defined by the index, and the corresponding value value is the clustered index column value (PKV). If the secondary index definition already contains a partial clustered index column, the value value of the index record is the remaining clustered index column value that is not included.

Create the following test table:

CREATE TABLE `t3` (

`a`int (10) unsigned NOT NULL AUTO_INCREMENT

`b` int (10) unsigned NOT NULL DEFAULT'0'

`c`varchar (20) NOT NULL DEFAULT''

`d`varchar (20) NOT NULL DEFAULT''

`e`varchar (20) NOT NULL DEFAULT''

PRIMARY KEY (`a`, `b`)

KEY `k1` (`c`, `b`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

Randomly insert some test data:

# call shell script to write 500pieces of data

[root@yejr.me] # cat insert.sh

#! / bin/bash

. ~ / .bash_profile

Cd / data/perconad

ITunes 1

Max=500

While [$I-le $max]

Mysql-Smysql.sock-e "insert ignore into T3 select

Rand () * 1024, rand () * 1024, left (md5 (uuid ()), 20)

Left (uuid (), 20), left (uuid (), 20); "innodb

I = `expr $I + 1`

Done

# actually write 498 pieces of data (of which 2 primary keys failed to conflict)

[root@yejr.me] [innodb] > select count (*) from T3

+-+

| | count (*) |

+-+

| | 498 |

+-+

Parse the data structure:

# Primary key

[root@test1 perconad] # innodb_space-s ibdata1-T innodb/t2 space-indexes

Id name root fseg fseg_id used allocated fill_factor

245 PRIMARY 3 internal 1 1 1 100.005%

245 PRIMARY 3 leaf 2 5 5 100.00%

246k1 4 internal 3 1 1 100.00%

246 k1 4 leaf 4 2 2 1

[root@yejr.me] # innodb_space-s ibdata1-T innodb/t2-p 4 page-dump

...

Records:

{: format= >: compact

: offset= > 126

: header= >

{: next= > 164,

: type= >: node_pointer

: heap_number= > 2

: nroomowned = > 0

: min_rec= > true

: deleted= > false

: nulls= > []

: lengths= > {"c" = > 20}

: externs= > []

: length= > 6}

Next= > 164,

: type= >: secondary

: key= >

[{: name= > "c",: type= > "VARCHAR (80)",: value= > "00a5d42dd56632893b5f"}

{: name= > "b",: type= > "INT UNSIGNED",: value= > 323}]

: row= >

[{: name= > "a",: type= > "INT UNSIGNED",: value= > 310}

{: name= > "b",: type= > "INT UNSIGNED",: value= > 9}]

# the value here is parsed to column b, which is actually a pointer to the leaf node, that is, child_page_number=9

# the true value of column b is 323

: sys= > []

: child_page_number= > 335544345

# the parsing here is not accurate, it is actually the record header of the next record, with a total of 6 bytes

: length= > 36}

{: format= >: compact

Offset= > 164,

: header= >

{: next= > 112

: type= >: node_pointer

: heap_number= > 3

: nroomowned = > 0

: min_rec= > false

: deleted= > false

: nulls= > []

: lengths= > {"c" = > 20}

: externs= > []

: length= > 6}

: next= > 112

: type= >: secondary

: key= >

[{: name= > "c",: type= > "VARCHAR (80)",: value= > "7458824a39892aa77e1a"}

{: name= > "b",: type= > "INT UNSIGNED",: value= > 887}]

: row= >

[{: name= > "a",: type= > "INT UNSIGNED",: value= > 623}

{: name= > "b",: type= > "INT UNSIGNED",: value= > 10}]

# same as above, it is actually child_page_number=10, not the value of column b

: sys= > []

: child_page_number= > 0

: length= > 36} # data length 16 bytes

By the way, TRX_ID and ROLL_PTR are not stored on secondary indexes (they are only stored on clustered indexes).

The content of the non-leaf node parsed by the innodb_ruby tool above is not accurate, so we open the data file in binary mode and verify it again:

# you can also use the hexdump tool here

[root@yejr.me] # vim-b path/t3.ibd

...

%! xxd

# find the part of the data where the secondary index is located

0010050: 0002 0272 0000 00e1 0000 0002 01b2 0100... r.

0010060: 0200 1b69 6e66 696d 756d 0003 000b 0000... infimum.

0010070: 7375 7072 656d 756d 1410 0011 0026 3030 supremum.&00

0010080: 6135 6434 3264 6435 3636 3332 3839 3362 a5d42dd56632893b

0010090: 3566 0000 0143 0000 0136 0000 0009 1400 5f...C...6.

00100a0: 0019 ffcc 3734 3538 3832 3461 3339 3839.... 7458824a3989

00100b0: 3261 6137 3765 3161 0000 0377 0000 026f 2aa77e1a...w...o

00100c0: 0000 000a 0000 0000 0000.

# refer to the physical structure of page for parsing, and get the following results

/ * first record * /

1410 0011 0026, record header, 5 bytes

3030 6135 6434 3264 6435 3636 3332 3839 3362 356 356 5d42dd56632893b5f5fengjue 20B

0000 0143, baked 323,4B

0000 0136, axiom 310, 4B

0000 0009, child_pageno=9, 4B

/ * 2 * /

1400 0019 ffcc, record header

3734 3538 3832 3461 3339 3839 3261 6137 3765 3161, cantilever 7458824a39892aa77e1a'

0000 0377, baked 887

0000 026f, axi623

0000 000a, child_pageno=10, how much is the flow of people in Zhengzhou http://www.hnmt120.com/

Now, on the other hand, the above page-dump parsed with the innodb_ruby tool should look like this (I only selected one record, please compare it with the previous one):

{: format= >: compact

Offset= > 164,

: header= >

{: next= > 112

: type= >: node_pointer

: heap_number= > 3

: nroomowned = > 0

: min_rec= > false

: deleted= > false

: nulls= > []

: lengths= > {"c" = > 20}

: externs= > []

: length= > 6}

: next= > 112

: type= >: secondary

: key= >

[{: name= > "c",: type= > "VARCHAR (80)",: value= > "7458824a39892aa77e1a"}

{: name= > "b",: type= > "INT UNSIGNED",: value= > 887}]

: row= > [{: name= > "a",: type= > "INT UNSIGNED",: value= > 623}]

: sys= > []

: child_page_number= > 10

: length= > 36}

As you can see, indeed, as mentioned earlier, the value value of the non-leaf node of the secondary index stores the clustered index column value.

Optimization recommendation 3: the smaller the length of the secondary index column definition, the better. When defining the secondary index, it is not necessary to explicitly add the clustered index column (after version 5.6).

Feature 4: when no clustered index column is available, the built-in ROW_ID is used as the clustered index

Create several tables like the following to select the built-in ROW_ID as the clustered index:

[root@yejr.me] [innodb] > CREATE TABLE `tn1` (

`c1` int (10) unsigned NOT NULL DEFAULT 0

`c2` int (10) unsigned NOT NULL DEFAULT 0

) ENGINE=InnoDB

Loop to write data to several tables:

Insert into tt1 select 1,1

Insert into tt2 select 1,1

Insert into tt3 select 1,1

Insert into tt1 select 2,2

Insert into tt2 select 2,2

Insert into tt3 select 2,2

Check the data in the tn1-tn3 table (here, because the result parsed by the innodb_ruby tool is not accurate, I use hexdump to analyze it instead):

Tn1

000c060: 0200 1a69 6e66 696d 756d 0003 000b 0000... infimum.

000c070: 7375 7072 656d 756d 0000 1000 2000 0000 supremum.... ...

000c080: 0003 1200 0000 003d f6aa 0000 01d9 0110.

000c090: 0000 0001 0000 0001 0000 18ff d300 0000.

000c0a0: 0003 1500 0000 003d f9ad 0000 01da 0110.

000c0b0: 0000 0002 0000 0002 0000 0000 0000.

Tn2

000c060: 0200 1a69 6e66 696d 756d 0003 000b 0000... infimum.

000c070: 7375 7072 656d 756d 0000 1000 2000 0000 supremum.... ...

000c080: 0003 1300 0000 003d f7ab 0000 0122 0110.

000c090: 0000 0001 0000 0001 0000 18ff d300 0000.

000c0a0: 0003 1600 0000 003d feb0 0000 01db 0110.

000c0b0: 0000 0002 0000 0002 0000 0000 0000.

Tn3

000c060: 0200 1a69 6e66 696d 756d 0003 000b 0000... infimum.

000c070: 7375 7072 656d 756d 0000 1000 2000 0000 supremum.... ...

000c080: 0003 1400 0000 003d f8ac 0000 0123 0110.

000c090: 0000 0001 0000 0001 0000 18ff d300 0000.

000c0a0: 0003 1700 0000 003e 03b3 0000 012a 0110. >.

000c0b0: 0000 0002 0000 0002 0000 0000 0000.

The values that represent DB_ROW_ID are:

Tn1

0003 12 = > (1Pol 1)

0003 15 = > (2jue 2)

Tn2

0003 13 = > (1Pol 1)

0003 16 = > (2jue 2)

Tn3

0003 14 = > (1pc1)

0003 17 = > (2jue 2)

It is clear that the built-in DB_ROW_ID is indeed shared and self-incremented at the entire instance level, rather than having a single DB_ROW_ID sequence for each table.

We can imagine that if multiple tables in an instance use this DB_ROW_ID, it is bound to cause competition / wait for concurrent requests. In addition, it may also cause serious replication delay problems due to data scanning mechanism problems during relay log playback from the library in the master-slave replication environment. For more information, please refer to the lookup of slave database data and parameter slave_rows_search_algorithms.

Optimization recommendation 4: display and define the available clustered index / primary key index on your own, and don't let InnoDB choose the built-in DB_ROW_ID as the clustered index to avoid potential performance losses.

The space is already a little big, so let's stop here for this analysis and continue later.

4. Summary of several points

Finally, let's summarize a few suggestions for the InnoDB engine table.

Each table should have an explicit primary key, preferably self-increasing, and have no business use

Whether it is a primary key index or a secondary index, choose columns with smaller data types as much as possible

When defining a secondary index, it is not necessary to explicitly add a primary key index column (after MySQL 5.6)

The shorter the row data, the better, and the better if each column is of fixed length (not a variable length type like VARCHAR)

The above test environment is based on Percona Server 5.7.22:

The version of # MySQL is Percona Server 5.7.22-22. I downloaded the source code and compiled it myself.

[root@yejr.me#] mysql-Smysql.sock innodb

...

Server version: 5.7.22-22-log Source distribution

...

[root@yejr.me] >\ s

...

Server version: 5.7.22-22-log Source distribution

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.