Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

A detailed introduction to the line format of InnoDB in Mysql

2025-02-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/02 Report--

This article mainly explains the "detailed introduction of the line format of InnoDB in Mysql". The content of the explanation in the article is simple and clear, and it is easy to learn and understand. Please follow the editor's train of thought to study and learn "the detailed introduction of line format of InnoDB in Mysql".

1. Pages in InnoDb

We know that in order to process data, we must first put the data in memory, so how does Mysql read and write records? Mysql divides data into several pages and takes pages as the basic unit of interaction between disk and memory. The size of pages in InnoDB is generally 16 KB. That is, in general, the contents of 16KB are read from disk to memory at least at one time, and the contents of 16KB in memory are flushed to disk at least at a time.

2. What are the line formats of InnoDB

The so-called row format is the binary format in which a record of a table is stored on disk. To date, InnoDB has four row formats: Compact, Redundant, Dynamic, and Compressed. These row formats are described below.

(1) Compact

Schematic diagram:

A record store is divided into two parts: the additional information of the record and the real data of the record:

1. Additional information recorded

The additional information of the record includes a variable length field length list, a null value list, and record header information:

(1) variable length field length list

The so-called variable length field refers to fields such as VARCHAR (M), VARBINARY (M), various TEXT types and various BLOB types. The variable length field list is mainly the byte length occupied by the real data of these fields stored, and the order of the list is in reverse order according to the table field. Each field in the length list uses 1-2 bytes to measure its byte length, whether 1 or 2 bytes is calculated by recording the maximum byte length occupied by the field and the real data length. The specific rule is: first look at the maximum byte length of the field, less than 255 is directly represented by 1 byte, then if it is greater than 255? For example, varchar (100) under utf8 encoding format has a maximum byte length of 3 "100" 300, which exceeds the maximum number of energy-efficient representations of a word. At this point, you should look at the number of bytes occupied by real data characters. If the number of data bytes occupied by real data characters is less than 127, use 1 byte, which is greater than 2 bytes. Why is it divided by 127? because a byte has 8 bits, the first bit is used to identify whether the next byte needs to be read together as the byte length of the field (that is, whether the field uses 1 byte for length or 2 bytes). 0 is required, and 1 means no. If the record data bytes are too long, resulting in a row overflow (more on this later), the variable length field length list indicates that the length of the field is still 2 bytes, which only indicates the data length of the record on this page. Because 2 bytes can represent a byte length of 2 to the 15th power, which is much longer than the length of a page read and written by InnoDb (16KB), even if the record has only one field, the data on this page stores all the data in this field, and those 2 bytes represent the length of this page.

Note: for CHAR (M) type columns, when the column uses a fixed-length character set, the number of bytes occupied by the column will not be added to the variable-length field length list, and if the variable-length character set is used, the number of bytes occupied by the column will also be added to the variable-length field length list. In addition, the CHAR type field under the fixed-length character set, if it involves updating or deleting, will not produce hard disk fragmentation, so it is more efficient than varchar.

(2) list of NULL values

The list of null values will have a list of null values only if the metadata of the table where the record is located stipulates that a field can have null values. The list of null values maintains whether the field is null based on the bit vector, that is, using 0 and 1 of binary bits to indicate whether the field is null, and the list is also in reverse order. In addition, InnoDB also stipulates that the list of null values must be represented by bits in integer bytes, and if the number of binary bits used is not integer bytes, then add 0 in the high order of the byte.

(3) record header information

The record header information consists of fixed 5 bytes, including:

In the picture above, some concepts may not be clear. If you write another article later, you will learn again.

two。 Recorded real data

In addition to the user-defined column data, InnoDB also adds some columns (also known as hidden columns) to each record by default, as follows:

Where row_id is not necessarily required, InnoDB automatically adds this column only if there is no primary key in the table.

(2) Redundant

Schematic diagram:

This line format name is also known as Redundant, indicating that it is out of date and is generally not used now, but here is an introduction to the difference between Compact and Compact.

1. Field length offset list

There are two differences from the variable length field length list:

The absence of the word variable length means that the Redundant row format stores the length information of all columns (including hidden columns) in the record in reverse order to the field length offset list.

The addition of two offset words means that the length of the column value is not calculated as intuitively as the Compact row format, which uses the difference between two adjacent values to calculate the length of each column value. For example, the decimal representation of the length of each field in reverse order is 6, 12, 9, then the offset is 6, 18 (18-6-12), 27 (27-18-9).

As you can see from the above, the field length offset list is essentially where the space occupied by the values in each column ends at the real data of the record, which is relatively simple and intuitive.

Note: the rule is similar to Compact for using 1 byte or 2 bytes, but it determines the true data length of all fields of the record. If the real data is less than 127bytes, the offset of each column occupies 1 byte, which is greater than 127. it is divided by two bytes. Of course, the real data may exceed the maximum number of bytes that can be represented by two bytes. At this time, it is still two bytes for the same reason as Compact. 2 bytes is enough to represent the maximum offset of the page (because one page is 16K, that is, 16384 bytes), and the rest is overflow column data, which is stored by other pages, and this page only holds the pointing addresses of other pages.

two。 Record header information

Compared to the record header information of Compact:

N_field and 1byte_offs_flag are added to the Redundant line format.

Unlike Compact, Redundant puts the length of one or two bytes in the header information, the specific rule is similar to Compact, but determines the true data length of all fields of the record. If the real data is less than 127bytes, the offset of each column occupies 1 byte, which is greater than 127. here you may wonder why it is 127and not 255( the maximum number represented by 1 byte), because it is different from Compact format. Redundant does not centrally store the null value information, but uses the first byte of the field length offset list to identify whether the field is not null) and is divided by two bytes. Of course, the real data may exceed the maximum number of bytes that can be represented by two bytes, which is still two bytes, because the same Compact,2 bytes is enough to represent the maximum offset of the page (because one page is 16K, that is, 16384 bytes). The rest is overflow column data, which is stored by other pages, and this page only stores the pointing addresses of other pages.

The Redundant line format does not have the attribute record_type.

(3) Dynamic

The differences with Compact are:

Instead of storing the first 768 bytes of the real data of the field at the real data of the record, all the bytes are stored in other pages, and only the addresses of other pages are stored at the real data of the record.

(4) Compressed

Unlike Dynamic, it is:

The Compressed row format uses a compression algorithm to compress the page to save space.

Third, how to view the row format

In the version of Mysql5.7.26 that I use, the line format defaults to Dynamic. The row format commands for how to view a table are:

Show table STATUS like 'table name' III. Syntax for specifying row format

Statements that create or modify tables:

CREATE TABLE table name (column information) ROW_FORMAT= row format name

ALTER TABLE table name ROW_FORMAT= row format name

Fourth, the processing of row overflow data (1) what is row overflow

DDL defines Times error field overflow

MySQL has a limit on the maximum storage space occupied by a record. Except for columns of type BLOB or TEXT, the total byte length of all columns (excluding hidden columns and record header information) cannot exceed 65535 bytes. If the ERROR is exceeded, for example, create a table with only one field encoded in ascii (1 byte is 1 character):

CREATE TABLE varchar_size_demo (- > c VARCHAR (65535)->) CHARSET=ascii ROW_FORMAT=Compact;ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs

Note: 65535 here contains not only the data of the column itself, but also some other data (storage overhead), such as null value identification (which can be required for null fields) and the length of bytes occupied by real data. In this way, it can be calculated that if a table with only one field can be null, then the available bytes of the real data in the field is 65535-2 (length of bytes occupied by real data)-1 (null value identification) = 65532. The above statement can be changed to:

CREATE TABLE varchar_size_demo (c VARCHAR (65532)) CHARSET=ascii ROW_FORMAT=Redundant > OK > time: 0.124s

This is just enough to pretend.

Run-time row overflow (no error reported)

In Compact and Reduntant row formats, for columns that take up a lot of storage space, only part of the data of the column is stored at the real data of the record, the rest of the data is stored separately in several other pages, and then the address to these pages is stored with 20 bytes at the real data of the record (of course, these 20 bytes also include the number of bytes occupied by the data scattered in other pages) Thus, you can find the page where the rest of the data is located. As shown in the figure:

So how do you calculate the critical point of the data length that produces the row overflow? This is related to several restrictions:

MySQL stipulates that at least two rows of records are stored in a page.

There is only 16K on a page.

That is, as long as the two pieces of data are guaranteed, the total data size does not exceed 16K minus the size of other records in the page that are not used to store records (fixed 132 bytes), there will be no row overflow, but a piece of data not only stores real data, but also other. Taking Compact as an example, assuming that there is only one field and can be Null, the additional information required for each record is 27 bytes, including:

The length of 2 bytes used to store real data

1 byte is used to store whether the column is null (no more than 255fields can be null, so use 1 byte directly)

5-byte header information

6-byte row_ id column

6-byte transaction_ id column

7-byte roll_ poi column

Suppose the number of bytes of data stored in a column is n, as long as:

132 + 2 × (27 + n) < 16384

The record does not cause a row overflow. Of course, if there are multiple fields in the table, 27 in the above formula may increase (because the byte footprint of "length of real data" and "whether the column is null" may increase).

Thank you for your reading, the above is the content of "detailed introduction of the line format of InnoDB in Mysql". After the study of this article, I believe you have a deeper understanding of the detailed introduction of the line format of InnoDB in Mysql, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report