Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

MySQL: how to quickly view Innodb data files

2025-04-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

Welcome to my "in-depth understanding of MySQL Master and Slave principle 32", as follows:

If the picture cannot be displayed, please see the following link:

Https://www.jianshu.com/p/d636215d767f

Note: the output format of this article is all hexadecimal format.

My friends often ask me questions about how to view Innodb data files, such as:

If I am a UTF8 character set, how many bytes does it take to insert the character'a'? What are the differences in the storage of row data between primary keys and ordinary index leaf nodes? How to prove the existence of rowid? How is the null value in the data stored? What is the difference between char and varchar in storage?

……

If you want to get the answer in addition to learning the source code, perhaps a more intuitive way is to look at the Innodb ibd data file, as the saying goes, "seeing is believing", but we know that the data file is in binary form, Innodb parses the format through the established access method to get the correct results. If we are going to access these ibd files, the usual way is to do binary access through commands such as hexdump-Cv, which I did at first, but it looks particularly uncomfortable for the eyes. So I wrote two tools:

Innblock: a tool for parsing data blocks that can get the offset of each row and sort it in logical and physical order. For more information, please refer to https://www.jianshu.com/p/5c1a99614fb8.

Download address: https://github.com/gaopengcarl/innblock except the code I have compiled and can be used directly

Bcview: a gadget that divides a data file into blocks according to a given size (such as 16K), and then accesses the number of bytes specified after the offset of each block. Usually we don't know how long the record is, so we can set a large number of bytes to view.

Download address: https://github.com/gaopengcarl/bcview except the code I have compiled and can be used directly

With these two tools, it may be more convenient to access ibd data files. I will use these two tools to view the data files to solve the problems we mentioned at the beginning.

I. brief introduction of row structure

This article does not intend to explain the detailed Innodb file structure. There are many such articles and books, such as:

Https://blog.jcole.us/innodb/

The whole series explains the structure of Innodb files. We only need to know the ordinary data files. After removing the block-level overhead, the first row of records starts at offset 96, and the first two pseudo columns infimum and supremum appear first. Their positions are fixed in the block of 94-120 bytes, of which 94-107is infimum-related information, while 107to120 is supremum-related information, with heap no of 0 and 1, respectively, which are the beginning and end of logical records. All the actual records are linked to this linked list.

The general format of ordinary records is as follows:

I'll call the yellow part "line header" for a moment. In the figure, the position of each row of innblock data offset marked in pink, we find that the innblock tool points to the location where the actual field is opened after the line header.

Here is a typical partial output of an innblock tool:

-Total used rows:3 used rows list (logic): (1) INFIMUM record offset:99 heapno:0 n_owned 1 deluxe minflag:0 rectype:2 N minflag:0 rectype:2 (2) normal record offset:128 heapno:2 n_owned 0 Delphi Vera N minflag:0 rectype:0 (3) SUPREMUM record offset:112 heapno:1 n_owned 2 Delphi Velcro N minflag:0 rectype:3-Total used rows:3 used rows list (phy): (1) INFIMUM record offset:99 heapno:0 n_owned 1 Delflag:N minflag:0 rectype:2 (2) SUPREMUM record offset:112 heapno:1 n_owned 2 deluxe minflag:0 rectype:3 N minflag:0 rectype:0 (3) normal record offset:128 heapno:2 n_owned 0 Deloitte minflag:0 rectype:3 N minflag:0 rectype:0

We can find a row of normal records other than infimum and supremum records, and mark the starting point of the record field (offset:128), that is, the pink part of the figure, but it is important to note that for the clustered index (the table itself), if there is no primary key, the first three columns are:

Rowid 6 bytes trx id 6 bytes roll ptr 7 bytes

If there is a primary key:

Primary key and definition about trx id 6 bytes roll ptr 7 bytes

The source code for rowid\ trx id\ roll ptr is defined as follows:

# define DATA_ROW_ID 0 / * row id: a 48-bit integer * / # define DATA_ROW_ID_LEN 6 / * stored length for row id * / # define DATA_TRX_ID 1 / * transaction id: 6 bytes * / # define DATA_TRX_ID_LEN 6#define DATA_ROLL_PTR 2 / * rollback data pointer: 7 bytes * / # define DATA_ROLL_PTR_LEN 7

For the specific meaning of roll ptr, please refer to the function trx_undo_decode_roll_ptr as follows:

/ * Decodes a roll pointer. * / / from high to low is / / whether the 1st bit is insert / / whether the 2nd to 8th bit is segmentid//, the 9th to 40th bit is page no / / 41st bit to the 56th bit is OFFSETUNIV_INLINEvoidtrx_undo_decode_roll_ptr (/ * = * / roll_ptr_t roll_ptr, / *!

< in: roll pointer */ ibool* is_insert, /*!< out: TRUE if insert undo log */ ulint* rseg_id, /*!< out: rollback segment id */ ulint* page_no, /*!< out: page number */ ulint* offset) /*!< out: offset of the undo entry within page */{... ut_ad(roll_ptr < (1ULL >

= 16; / / move 16 bits to the right * page_no = (ulint) roll_ptr & 0xFFFFFFFFFFF / get 32 bits as page no roll_ptr > > = 32 is_insert / move 32 bits to the right * rseg_id = (ulint) roll_ptr & 0x7F bump / get 7 bits to segment id roll_ptr > > = 7 to the right * is_insert = (ibool) roll_ptr; / * TRUE==1 * / / the last bit} 2, set up the test table

To solve the problems at the beginning of the article, let's set up a test table as follows:

Drop table baguait1;create table baguait1 (id int primary key,c1 varchar (20), c2 varchar (20), c3 char (20)) ENGINE=InnoDB DEFAULT CHARSET=utf8;alter table baguait1 add key (C1); insert into baguait1 values (1); insert into baguait1 values (1); insert into baguait1 values (2); insert into baguait1 values (3); insert into baguait1 values (4); select * from baguait1) +-- + | id | C1 | c2 | c3 | +-- + | 1 | NULL | gaopeng | gaopeng | | 2 | gaopeng | NULL | gaopeng | | 3 | gaopeng | NULL | NULL | | 4 | a | NULL | NULL | +-+ 4 rows in set (0.01sec)

We found that everything is actually not covered here except for the rowid problem, and then we use innblock to scan. As follows:

1. Scan the data file to find the primary key and the normal index data block

[root@gp1 test] #. / innblock baguait1.ibd scan 16 ·Datafile Total Size:114688===INDEX_ID:323level0 total block is (1) block_no: 3 INDEX_ID:324level0 total block is level: 0 | * | = = INDEX_ID:324level0 total block is (1) block_no: 4 grade level: 0 | * |

Here 323 is actually a clustered index, 324 is a normal index, and their data blocks correspond to 3 and 4.

2. Scan clustered index records

[root@gp1 test] #. / innblock baguait1.ibd 3 16 linked list section: = Block list info = =-Total used rows:6 used rows list (logic): (1) INFIMUM record offset:99 heapno:0 n_owned 1 minflag:0 rectype:0 N minflag:0 rectype:2 (2) normal record offset:128 heapno:2 n_owned 0 delta minflag:0 rectype:0 N minflag:0 rectype:0 (3) normal record offset:180 heapno:3 n_owned 0 Delflag:N minflag:0 rectype:0 (4) normal record offset:231 heapno:4 n_owned 0 deluxe minflag:0 rectype:0 N minflag:0 rectype:0 (5) normal record offset:262 heapno:5 n_owned 0 Magistral deluxe minflag:0 rectype:0 N minflag:0 rectype:0 (6) SUPREMUM record offset:112 heapno:1 n_owned 5 deluxe Velcro N minflag:0 rectype:3-Total used rows:6 used rows list (phy): (1) INFIMUM record offset:99 heapno:0 n_owned 1 Delflag:N minflag:0 rectype:2 (2) SUPREMUM record offset:112 heapno:1 n_owned 5 deluxe minflag:0 rectype:3 N minflag:0 rectype:3 (3) normal record offset:128 heapno:2 n_owned 0Personnal deluxe normal record offset:128 heapno:2 n_owned N minflag:0 rectype:0 (4) normal record offset:180 heapno:3 n_owned 0Different deluxe minflag:0 rectype:0 (5) normal record offset:231 heapno:4 n_owned 0minddeliciouslace N minflag:0 rectype:0 (6) normal record offset:262 heapno:5 n_owned 0 Delflag:N minflag:0 rectype:0

3. Scan general index records

`

[root @ gp1 test] #. / innblock baguait1.ibd 4 16

Linked list section:

= Block list info =

-Total used rows:6 used rows list (logic):

(1) INFIMUM record offset:99 heapno:0 n_owned 1 delineated N minflag:0 rectype:2

(2) normal record offset:126 heapno:2 n_owned 0mm delta minflag:0 rectype:0 N

(3) normal record offset:173 heapno:5 n_owned 0mm delta minflag:0 rectype:0 N

(4) normal record offset:137 heapno:3 n_owned 0mm delta minflag:0 rectype:0 N

(5) normal record offset:155 heapno:4 n_owned 0mm delta minflag:0 rectype:0 N

(6) SUPREMUM record offset:112 heapno:1 n_owned 5 deluxe minflag:0 rectype:3 N

-Total used rows:6 used rows list (phy):

(1) INFIMUM record offset:99 heapno:0 n_owned 1 delineated N minflag:0 rectype:2

(2) SUPREMUM record offset:112 heapno:1 n_owned 5 delineated N minflag:0 rectype:3

(3) normal record offset:126 heapno:2 n_owned 0mm delta minflag:0 rectype:0 N

(4) normal record offset:137 heapno:3 n_owned 0mm delta minflag:0 rectype:0 N

(5) normal record offset:155 heapno:4 n_owned 0mm delta minflag:0 rectype:0 N

(6) normal record offset:173 heapno:5 n_owned 0mm delta minflag:0 rectype:0 N

We find that both the clustered index and the ordinary index contain four ordinary records, and the offset of the record is collected. we need to note that in the logical linked list order of the general index, we can see that the fourth record (offset:173) has been placed at the third place, in fact, it is inserted last, because the order of the'a 'character must precede the string' gaopeng''. And the first line records C1 as NULL, which is still the first in the logical linked list order. All right, let's solve the problems one by one. # 3. If I am a UTF8 character set, how many bytes does it take to insert the character'a'? The data we insert into the last record in our statement is'a record', that is:

| | 4 | a | NULL | NULL |

Let's use bcview to see what the data of the clustered index (offset 262) is. We look at 20 bytes directly after block 3 (offset 262), as follows:

[root @ gp1 test] #. / bcview baguait1.ibd 16 262 30 | grep 00000003

Current block:00000003-Offset:00262-cnt bytes:30-data is:8000000400000005d970e000000043011061000000000000000000000000

Let's parse the same:-80000004: the primary key 4 400000005d970:trx id 8 is the symbol bit-400000005d970:trx id 6 bytes-e0000000430110:undo ptr 7 bytes-61: the character 'axiom' ascii encodes we find that it is all followed by zero, in fact, the character'a' only occupies one byte even under the UTF8 character. # 4. What are the differences in the storage of row data between primary keys and ordinary index leaf nodes? Let me sum it up first:-the primary key will contain all the fields, the normal index will only contain the field contents defined by it, the primary key will contain trx id and roll ptr, the ordinary index will not contain-even if the primary key is not defined, it will contain a clustered index arranged according to rowid, obviously it will not exist without defining a normal index-the normal index leaf node contains the primary key or rowid, let's verify it. Let's look at the second line of data, that is:

| | 2 | gaopeng | NULL | gaopeng | |

This record exists in (offset:180) on the primary key and (offset:137) in the general index. We parse the following: > primary key (block 3 offset 180):

[root @ gp1 test] #. / bcview baguait1.ibd 16 180 50 | grep 00000003

Current block:00000003-Offset:00180-cnt bytes:50-data is:8000000200000005d96adc00000042011067616f70656e6767616f70656e6720202020202020202020202020070600002000

Parse:-80000002: primary key-00000005d96a:trx id 6 bytes-dc000000420110:undo ptr 7 bytes-67616f70656e67: the second field's' gaopeng' 's ASCII code-67616f70656e6720202020202020202020202020: the fourth field's' gaopeng' 's ASCII code, and because it's a char (20) type, there's a 0X20 complement, which actually solves the fifth problem, and we can actually see that this complement takes up more space. Here we find that this record does not have a third field because it is NULL, which is included in the null bitmap, as we will explain later. > General index (block 4 offset 137):

[root @ gp1 test] #. / bcview baguait1.ibd 16 137 20 | grep 00000004

Current block:00000004-Offset:00137-cnt bytes:20-data is:67616f70656e67800000020700000020ffd56761

The parsing is as follows:-67616f70656e67virtual gaopeng's ASCII code-80000002: the content after the primary key value 2 is the header of the next line. If you are not sure about this, you can take a look at the last line. The position of the last line is (offset:173) as follows:

[root @ gp1 test] #. / bcview baguait1.ibd 16 173 20 | grep 00000004

Current block:00000004-Offset:00173-cnt bytes:20-data is:6180000004000000000000000000000000000000

Parsed as:-61 ASCII code for roll ptr:-80000004: the primary key value 4 is followed by 0, we can see here that there is no trx id and roll ptr, in addition to the key value, the ordinary index also contains the primary key. # 5. What is the difference between char and varchar in storage? I have already said this above, but let's take the second row of data as an example:

| | 2 | gaopeng | NULL | gaopeng | |

The first 'gaopeng' is varchar (20) and the second' gaopeng' is char (20). Here is how they are stored:-67616f70656e67: the second field's' gaopeng' 's ASCII code-67616f70656e6720202020202020202020202020: the fourth field's' gaopeng' 's ASCII code, and because it's a char (20) type, 0X20 complements 20 bytes, so we can actually see this complement operation, taking up more space. Don't talk too much about how the null value in the data is stored? Do you remember the NULL bitmap of the wardrobe? This bitmap actually reserves 1 bit of space for each field that can be used for NULL to mark whether the value of the field is NULL, of course at least 1 byte (8 bits).

+-+

| | id | C1 | c2 | c3 |

+-+

| | 1 | NULL | gaopeng | gaopeng | |

| | 2 | gaopeng | NULL | gaopeng | |

| | 3 | gaopeng | NULL | NULL | |

| | 4 | a | NULL | NULL |

+-+

C1\ c2\ c3 can all be empty, so we access the NULL bitmap information on the four record clustered indexes (block 3) respectively. The calculation is as follows:-Line 1: record (offset:128) then 128-5 (5-byte fixed)-1 (1-byte NULL bitmap) = 122-line 2: record (offset:180) then 180-5 (5-byte fixed)-1 (1-byte NULL bitmap) = 174-3: record (offset:231) then 231-5 (5-byte fixed)-1 (1-byte NULL bitmap) = 225- Line 4: record (offset:262) so 262-5 (5-byte fixed)-1 (1-byte NULL bitmap) = 256. with the offset, we can use bcview to access the 1-byte NULL bitmap information as follows: > line 1 [root@gp1 test] #. / bcview baguait1.ibd 16 122 1 | grep 00000003current block:00000003--Offset:00122--cnt bytes:01--data is:01 converted to binary: 0000 0001 > 2 Line [root@gp1 test] #. / bcview baguait1.ibd 16 1741 | convert grep 00000003current block:00000003--Offset:00174--cnt bytes:01--data is:02 to binary: 0000 0010 > line 3 [root@gp1 test] #. / bcview baguait1.ibd 16225 1 | convert grep 00000003current block:00000003--Offset:00225--cnt bytes:01--data is:06 to binary: 00000110 > line 4 [root@gp1 test] #. / bcview baguait1.ibd 16 | 2561 | convert grep 00000003current block:00000003--Offset:00256--cnt bytes:01--data is:06 to binary: 0000 0110. This is the representation below. " 1 is NULL, and we find that it is exactly the same as the NULL record in our record. | | c3 | c2 | C1 | |-| Line 1 | 0 | 1 | | Line 2 | 0 | 1 | 0 | Line 3 | 1 | 1 | 0 | Line 4 | 1 | 1 | 0 | when DDL modifies the NULL attribute of the field, it cannot be done quickly by modifying the data dictionary. I think modifying and changing the actual content of the ibd file is a large part of the reason. The following is a record of my modification of the NULL property, referring to the official documentation.

Set the NULL and NOT NULL properties

All of them are in inplace mode, because the NULL bitmaps need to be modified, so they all need to be reorganized, which is expensive.

ALTER TABLE tbl_name MODIFY COLUMN column_name data_type NULL, ALGORITHM=INPLACE, LOCK=NONE

ALTER TABLE tbl_name MODIFY COLUMN column_name data_type NOT NULL, ALGORITHM=INPLACE, LOCK=NONE

# 7. How to prove the existence of rowid? This is actually easy to prove, let's first create a record that does not contain a primary key and insert a record as follows:

Drop table baguait1

Create table baguait1 (id int, C1 varchar (20), c2 varchar (20), c3 char (20)) ENGINE=InnoDB DEFAULT CHARSET=utf8

Insert into baguait1 values (1, null, camera, gaopeng.com, gaopeng')

Mysql > select * from baguait1

+-+-

| | id | C1 | c2 | c3 |

+-+-

| | 1 | NULL | gaopeng | gaopeng | |

+-+-

1 row in set (0.00 sec)

Using the innblock scan, it is found that it contains only 1 block as follows:

[root @ gp1 test] #. / innblock baguait1.ibd scan 16

...

Datafile Total Size:98304

= INDEX_ID:325

Level0 total block is (1)

Block_no: 3. Level: 0 | * |

Then scan the block as follows:

[root @ gp1 test] #. / innblock baguait1.ibd 3 16

...

= Block list info =

-Total used rows:3 used rows list (logic):

(1) INFIMUM record offset:99 heapno:0 n_owned 1 delineated N minflag:0 rectype:2

(2) normal record offset:128 heapno:2 n_owned 0mm delta minflag:0 rectype:0 N

(3) SUPREMUM record offset:112 heapno:1 n_owned 2 deluxe N minflag:0 rectype:3

-Total used rows:3 used rows list (phy):

(1) INFIMUM record offset:99 heapno:0 n_owned 1 delineated N minflag:0 rectype:2

(2) SUPREMUM record offset:112 heapno:1 n_owned 2 deluxe N minflag:0 rectype:3

(3) normal record offset:128 heapno:2 n_owned 0mm delta minflag:0 rectype:0 N

Then use bcview to offset:128 this record as follows:

[root @ gp1 test] #. / bcview baguait1.ibd 16 128 60 | grep 00000003

Current block:00000003-Offset:00128-cnt bytes:60-data is:000001ac310000000005d97fea0000002c01108000000167616f70656e6767616f70656e672020202020202020202020202000000000000000000000

`

Let's analyze it:

000001ac3100:rowid 6 bytes, here is the relevant field value defined by the primary key 00000005d97f:trx id 6 bytes ea0000002c0110:roll ptr 7 bytes 80000001: the first field value 167616f70656e67: the second field value 'gaopeng' 's ASCII encoding 67616f70656e6720202020202020202020202020: the fourth field value' gaopeng' 's ASCII encoding, and char has a 0X20 complement of 20 bytes. Finally:

Of course, here are only some examples to illustrate the use of the tool, you can easily extract the information you are interested in from the ibd file according to your needs.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report