Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

On the Optimization of text and blob types of InnoDB Storage engine

2025-03-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

When we optimize the database, we see that some tables use text or blob fields in their design. If the storage space of a single table reaches nearly 100 gigabytes or tens of gigabytes, it is very difficult to change and optimize this situation.

I. brief introduction

In order to understand the impact of large fields on performance, it is necessary to know how the innodb storage engine handles:

1. Some knowledge points

1.1 before InnoDB 1.0.x, the InnoDB storage engine provided Compact and Redundant (Redundant format is reserved for compatibility with previous versions) to store row record data. Compact and redundant are collectively referred to as Antelope (antelope).

For large fields such as blob,text,varchar (5120), innodb will only store the first 768 bytes in the data page, while the rest of the data will be stored in the overflow segment (applicable in the event of an overflow). The purpose of the maximum 768 bytes is to facilitate the creation of a prefix index / prefix index, and the rest of the content will be stored in an additional page, even if it is only one more byte. Therefore, the shorter the length of all columns, the better

Large fields can waste a lot of space in InnoDB. For example, if the stored field value is only one more byte than the row requirement, the entire page will also be used to store the remaining bytes, wasting most of the page space. If there is a value that is only slightly larger than the size of 32 pages, you actually need to use 96 pages

Extended storage disables adaptive hashing because you need to fully compare the entire length of the column to find out if the data is correct. (hashing helps InnoDB find the "guessed location" very quickly, but you must check whether the "guessed location" is correct.) Because adaptive hashing is a complete memory structure and points directly to the "most" frequently accessed pages in Buffer Pool, but Adaptive Hash cannot be used for extended storage space

Innodb_plugin in MySQL 5.1introduces a new file format: Barracuda (Barracuda), which has two new row formats: compressed and dynamic. The two formats completely overflow the blob field. Only 20 bytes are stored in the data page, and the rest are stored in the overflow section. Therefore, it is strongly not recommended to use BLOB, TEXT, VARCHAR column types longer than 255mm.

The page size of 1.3 innodb defaults to 16kb for index organization table, and the leaf node at the bottom of the tree is a two-way linked list, so there should be at least two rows of records in each page, which determines that innodb cannot store more than 8k rows of data, but in fact it should be smaller, because there are still some InnoDB internal data structures to store. After version 5.6, the new option innodb_page_size can be modified. In versions prior to 5.6, only source code recompilation can be modified, but it is not recommended to modify this configuration

1.4.When new data is written in data page of InnoDB, the space of 1x16 will be reserved, and the space can be used for subsequent new record writing, which reduces the overhead of frequently adding data page and is limited by InnoDB storage mode. If the data is written sequentially, ideally, the filling rate of data page is 15can16, but generally there is no way to guarantee complete sequential writing. Therefore, the filling rate of data page is generally from 1 to 15 to 16. Therefore, it is best for each InnoDB table to have a self-incrementing column as the primary key to make the writing of new records as sequential as possible; when the filling rate of data page is less than 1 / 2, InnoDB will shrink to free free space.

1.5 COMPACT line format saves about 20% of storage space compared to REDUNDANT, and COMPRESSED saves about 50% of storage space compared to COMPACT, but results in a 90% reduction in TPS. Therefore, COMPRESSED line format is strongly not recommended.

1.6 if the blob data type is used, will it necessarily be stored in the overflow segment? Usually we think that the storage of large objects such as blob will store data outside the data page, but the key point is to see whether two rows of data can be stored in a page. Blob can be stored completely in the data page (single row length does not exceed 8096 bytes), while varchar type may also be stored in overflow pages (single row length is more than 8096 bytes, the first 768 bytes are stored in the data page).

1.7 when mysql manipulates data in page, no matter whether it is updating, inserting or deleting a row of data, it needs to read the page of that row of data into memory, and then operate, so there is a hit rate problem. If a page can store enough rows relatively, then the hit rate will be relatively high, and the performance will be improved.

1.8The page of BLOB, TEXT or long VARCHAR columns stored in off-page is exclusive and cannot be shared. Therefore, it is strongly not recommended to use multiple long columns in a table.

1.9 MySQL 5.6 still defaults to Compact line format, and is currently the most widely used ROW FORMAT. Users can view the row format used by the current table by using the command SHOW TABLE STATUS LIKE'table_name', where the row_format column represents the type of row record structure currently in use

Mysql > desc db_page +-+-+ | Field | Type | | Null | Key | Default | Extra | + -+ | id | int (11) | NO | PRI | | auto_increment | | title | varchar | NO | | name | varchar | | YES | | content | text | YES | +-| -+ mysql > show variables like "innodb_file_format" +-+-+ | Variable_name | Value | +-+-+ | innodb_file_format | Barracuda | +- -- +-+ mysql > show table status like "db_page"\ row * 1. Row * * Name: db_page Engine: InnoDB Version: 10 Row_format: Compact Rows: 2 Avg_row_length: 8192 Data_length: 16384Max_data_length: 0 Index_length: 0 Data_free: 0 Auto_increment: 3 Create_time: 2017-03-07 13:30:19 Update_time: Check_time: Collation: utf8_general_ci Checksum: Create_options: Comment: Block_format: Original

In msyql 5.7.9 and later, the default line format is determined by the innodb_default_row_format variable, which defaults to DYNAMIC, and you can also specify ROW_FORMAT=DYNAMIC when create table.

Note that if you want to modify the row mode of an existing table to compressed or dynamic, you must first set the file format to Barracuda:set global innodb_file_format=Barracuda;, and then use ALTER TABLE tablename ROW_FORMAT=COMPRESSED; to modify it to take effect, otherwise the modification is invalid and there is no prompt

Second, the impact on large field types such as TEXT/BLOB 2.1 compact

The variable growth field type includes blob,text,varchar, in which the overflow page will also be saved when the length of the varchar column value is greater than a certain number of N. under the latin1 character set, the N value can be calculated as follows: the block size of innodb defaults to 16kb, because the innodb storage engine table organizes the table as an index, and the leaf node at the bottom of the tree is a two-way linked list, so there should be at least two rows of records in each page, which determines that innodb cannot store more than 8k rows of data. Subtract the number of bytes occupied by other column values, which is approximately equal to N. For InnoDB, memory is extremely precious. If you put all the 768-byte blob on the data page, although you can save some IO, the number of rows that can be cached will be reduced, that is, fewer index values can be cached, and the indexing efficiency will be reduced.

2.2 dynamic

Dynamic row format, whether the column is stored in the off-page page depends mainly on the row size. It will put the longest column in the row to the off-page until the data page can store the next two rows. TEXT/BLOB column

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report