This article gives you a brief introduction to MySQL data row overflow. Related technical terms are not covered here; you can look them up online or in related books. We will go straight to the topic, and I hope this article on MySQL data row overflow brings you some practical help.
I. Starting from a common error
Let's start with a common SQL error message:
I am sure everyone has run into this kind of error many times, especially in business groups (BGs) whose core work is content production: when storing content, "large" data is an unavoidable topic. The "large" here goes far beyond total storage space; it also includes large single (table) fields, long data-retention periods, volume inflated by a lack of obvious data redundancy to trim, access peaks that shift markedly with hot topics, and storage pressure amplified by complex processing logic. Returning to the error, let's first look at the structure of this table:
Seeing this, I am sure everyone will have their own way of handling it. We will not compare the pros and cons of the various approaches here, but only describe the two most frequently used ones:
1. Follow the suggestion in the error message and change the two large VARCHAR(22288) columns to TEXT or BLOB.
2. Reduce the declared VARCHAR length according to business characteristics, or split the column into several smaller VARCHAR and CHAR columns according to some rule.
Both approaches have their own pros and cons. Changing the field to TEXT or BLOB increases the storage capacity, but the column can then only use prefix or full-text indexes. If the business stores data in JSON format, the JSON data type supported since MySQL 5.7 is a good choice, as individual sub-fields can be queried and returned. Shrinking or splitting the column, on the other hand, depends on the business scenario and logical requirements: the business logic that uses the column has to be modified, and the amount of engineering work has to be evaluated.
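As a rough illustration of the first approach (a sketch only; the table and column names below are hypothetical, not taken from the case above):

-- Option 1: move the oversized VARCHAR columns to TEXT/BLOB
ALTER TABLE t_article
  MODIFY body_text TEXT,
  MODIFY body_blob BLOB;

-- If the content is JSON and the server is 5.7 or later, a JSON column
-- also lets individual sub-fields be queried and returned
ALTER TABLE t_article
  MODIFY body_json JSON;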
II. In-depth exploration
Let's dig into some of the confusing concepts around the "65,535" size limit.
"65535" is not the maximum limit of N in a single varchar(N), but the byte sum of the non-large field types of the entire table.
---------------------------------------------------------------------------------------------
Every table (regardless of storage engine) has a maximum row size of 65,535 bytes. Storage engines may place additional constraints on this limit, reducing the effective maximum row size.
---------------------------------------------------------------------------------------------
2. The character set affects the maximum number of bytes a column can occupy. For example, UTF8 characters can require 3 bytes each, so a VARCHAR(255) CHARACTER SET UTF8 column can take 255 × 3 = 765 bytes, and such a table can therefore contain at most 65,535 / 765 = 85 such columns. GBK is double-byte, and so on.
3. For variable-length columns, the row-size calculation also counts the bytes used to store the value's actual length. For example, a VARCHAR(255) CHARACTER SET UTF8 column needs two extra bytes to store the length information, so the column requires up to 767 bytes of storage; a single column of this kind can actually be declared to hold at most 65,533 bytes, leaving two bytes for the length information.
4. BLOB, TEXT and JSON columns are different from VARCHAR and CHAR fields: their contents and length information are stored separately from the row, so they contribute only a small pointer/length overhead to the row-size limit, and each such column can itself hold up to 65,535 bytes of real data.
5. Declaring columns as NULL reduces the maximum number of columns allowed.
For MyISAM tables, a NULL column needs extra space to record whether its value is NULL; each NULL column takes one extra bit, rounded up to the nearest byte. The maximum row length is calculated as follows:
row length = 1 + (sum of column lengths) + (number of NULL columns + delete_flag + 7)/8 + (number of variable-length columns)
For static tables, delete_flag = 1; a static table uses a bit in the row record to indicate whether the row has been deleted.
For dynamic tables, delete_flag = 0, because the flag is stored in the dynamic row header.
6. For InnoDB tables, NULL and NOT NULL columns require the same amount of storage.
7. InnoDB allows at most 1,000 columns in a single table.
8. An indexed VARCHAR column (for example a VARCHAR primary key) supports at most 767 bytes, which is about 383 double-byte characters or 767 / 3 = 255 three-byte characters; GBK is a double-byte character set and UTF8 a three-byte one.
9. Different storage engines have different restrictions on indexes:
InnoDB: each indexed column cannot exceed 767 bytes, and the total length of all columns making up an index cannot exceed 3,072 bytes.
MyISAM: each indexed column cannot exceed 1,000 bytes, and the total length of all columns making up an index cannot exceed 1,000 bytes.
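To make the 65,535-byte limit concrete, here is a minimal sketch (the table names are made up): the first statement fails at CREATE time with a "Row size too large" error, while the second succeeds because TEXT contents are stored off the row.

-- Fails: 40,000 + 30,000 latin1 bytes (plus length bytes) exceed 65,535
CREATE TABLE row_size_demo_bad (
  a VARCHAR(40000),
  b VARCHAR(30000)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- Works: TEXT contents are stored separately and count only a few bytes
-- toward the row-size limit
CREATE TABLE row_size_demo_ok (
  a TEXT,
  b TEXT
) ENGINE=InnoDB DEFAULT CHARSET=latin1;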
III. The real fault
The following is the business failure we ran into today. The online business threw a large number of errors, and the program could not write data:
Following the error message and normal reasoning, our first reaction was that the business had one of the following problems:
The fields defined in the table structure exceed the limit.
The length of the data inserted into a field exceeds the maximum defined for that field.
Next, I checked the database table structure of the business, as follows:
The first cause was quickly ruled out. The business error did not appear when the table was created; if the sum of the non-large fields had exceeded 65,535, the error would have occurred at table-creation time, whereas the business only reported errors when writing. Moreover, the table structure shows that a large number of the fields are of type MEDIUMBLOB, and the sum of the non-large fields is far below 65,535.
Then, going by the specific SQL provided by the business, the non-large fields appversion, datatar, elt_stamp and id do not exceed their limits either. A MEDIUMBLOB field can store up to 16 MB, and the business data is nowhere near that size. Following the error message's suggestion, the non-large fields appversion, datatar, elt_stamp and id were changed to BLOB, but that did not solve it either (and, given the analysis above, it could not have been the root cause).
After calming down, I noticed a detail that had been overlooked: the failure rate was not 100%, which means some requests were still succeeding. Comparing the successful and failed SQL, the difference in data volume again came down to the MEDIUMBLOB fields. The first thought was the max_allowed_packet parameter: had it been set too small, so that any single request exceeding it was rejected? But the configured value turned out to be 1 GB, and the SQL data was nowhere near that size, so this cause was ruled out as well.
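The current value is easy to check (a simple sketch):

SHOW VARIABLES LIKE 'max_allowed_packet';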
That basically eliminates the common causes, so the next limit to look at is another parameter: innodb_page_size. Its default value is 16K, and each page must hold at least two rows, so a single row can hold at most about 8K of data.
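The page size can be confirmed in the same way, along with the row format of the suspect table (the table name here is hypothetical):

SHOW VARIABLES LIKE 'innodb_page_size';
SHOW TABLE STATUS LIKE 't_business_content'\G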
The table's Row_format turned out to be Compact, so we can infer the cause of the problem: under InnoDB's default (Antelope) storage format, the first 768 bytes of each BLOB field are stored in the data page itself, so once there are enough BLOB columns the single-row size exceeds 8K and the write fails. Comparing the successful and failed SQL written by the business confirmed this inference. So how do we solve the problem?
1. Split the table on the business side and store the large fields separately.
2. Solve the problem by changing how Row_format stores the data.
Since the business table does not hold many rows and the business logic is not well suited to splitting, we need to solve the problem through Row_format.
The Barracuda file format provides two new row record formats, Compressed and Dynamic. Both use a full row-overflow approach for BLOB data: the data page stores only a 20-byte pointer, and the actual data is kept in BLOB (overflow) pages. A further feature of the Compressed row record format is that the stored data is compressed with the zlib algorithm.
The related changes are relatively simple:
1. Modify MySQL global variables:
SET GLOBAL innodb_file_format='Barracuda';
2. Change the attributes of the original table in place (tablename is a placeholder):
ALTER TABLE tablename ROW_FORMAT=COMPRESSED;
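One caveat worth adding here (a general property of the Barracuda row formats rather than something from the original failure report): COMPRESSED and DYNAMIC rows live in per-table tablespaces, so innodb_file_per_table should be enabled before running the ALTER, otherwise the row-format change will not take effect:

-- Required for the Barracuda row formats to take effect
SET GLOBAL innodb_file_per_table = ON;
SHOW VARIABLES LIKE 'innodb_file_per_table';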
IV. Further study
From this case, we can extract two points worth studying in depth:
1. About innodb_page_size
Starting with MySQL 5.6, innodb_page_size can be set to 8K or 4K; the default is 16K. The parameter has to be placed in my.cnf before the instance is first initialized; if tables have already been created and the value is then changed, MySQL will report an error.
So what if you want to change this value before version 5.6? The only option is to modify the source code and rebuild MySQL.
UNIV_PAGE_SIZE is the data page size; the default is 16K, and the value must be a power of 2, so it can be set to 4K, 8K, 16K, 32K or 64K. After changing UNIV_PAGE_SIZE you also need to change UNIV_PAGE_SIZE_SHIFT, which is the power of 2 that gives UNIV_PAGE_SIZE. The data page settings are therefore as follows:
Next, let's look at the impact of different innodb_page_size values on MySQL performance. The test table contains 100 million records, and the file size is 30 GB.
① Mixed read/write scenario (50% read, 50% write)
16K: less CPU pressure, averaging around 20%
8K: CPU pressure of 30%-40%, but SELECT throughput is higher than with 16K
② Read-only scenario (100% read)
There is no significant difference between 16K and 8K.
Managing pages in the InnoDB Buffer Pool also has a cost: for the same total size, the more pages there are, the longer the management lists become. So when rows are long (bulky inserts), larger pages help speed: a page holds more rows, each IO writes more data, and fewer IOPS are needed to write the same amount. When the row length exceeds 8K on a 16K page, some string columns are forced into TEXT-like handling and the string body is moved to extension pages, which costs an extra IO to read that column; a larger page supports a longer in-page row, and a 64K page can hold rows of roughly 32K without using extension pages. But for random reads and writes of short rows, such a large page is a poor fit: IO efficiency drops, because each large IO returns only a small useful fraction.
2. About Row_format
The InnoDB storage engine stores records in rows. Before InnoDB 1.0.x, InnoDB provided the Compact and Redundant formats for storing row record data. MySQL 5.1 introduced a new file format, Barracuda, with two new row formats, Compressed and Dynamic; Compact and Redundant are collectively called Antelope. You can view the row format used by the current table with the command SHOW TABLE STATUS LIKE 'table_name'; the row_format column shows the row record structure currently in use.
In MySQL 5.6 the default is Compact; from MySQL 5.7.9 onward the default row format is determined by the innodb_default_row_format variable, whose default value is DYNAMIC. You can also specify ROW_FORMAT=DYNAMIC when creating a table. If you want to change the row format of an existing table to Compressed or Dynamic, you must first set the file format to Barracuda (SET GLOBAL innodb_file_format=Barracuda;) and then run ALTER TABLE tablename ROW_FORMAT=COMPRESSED; for the change to take effect; otherwise the change is silently ignored with no warning.
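Because the ALTER succeeds silently even when it has no effect, it is worth verifying the result afterwards; a small sketch (the table name is hypothetical):

ALTER TABLE t_business_content ROW_FORMAT=DYNAMIC;
-- Confirm that the row format actually changed
SELECT table_name, row_format
  FROM information_schema.tables
 WHERE table_schema = DATABASE() AND table_name = 't_business_content';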
① Compact
If a BLOB column's value exceeds 768 bytes, the first 768 bytes remain in the data page and the rest is placed off-page, as shown below:
The variable-length large field types mentioned above include BLOB, TEXT and VARCHAR; when a VARCHAR column value is longer than a certain number of bytes N, overflow pages are used as well. Under the latin1 character set, N can be estimated as follows: InnoDB's default page size is 16 KB, and since an InnoDB table is an index-organized table whose leaf nodes form a doubly linked list, each page must hold at least two rows. This means a single row cannot exceed about 8K; subtracting the bytes occupied by the other column values gives an approximate value for N.
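As a rough worked example of that estimate (the column sizes are made up and the numbers are approximate): with a 16 KB page, a Compact-format row has to fit in roughly half a page, i.e. about 8K, so a table like the following keeps its VARCHAR fully in-page only while the actual values stay below roughly 8K minus the space used by the other columns; longer values spill to overflow pages.

CREATE TABLE compact_overflow_demo (
  id   INT PRIMARY KEY,
  note VARCHAR(8000)  -- close to the in-page limit for a latin1 row
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPACT;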
② Compressed or Dynamic
These formats use full row overflow for BLOB-like columns, i.e. the clustered index record (data page) keeps only a 20-byte pointer to the overflow segment where the data is actually stored:
With the Dynamic row format, whether a column is stored off-page depends mainly on the row size: the longest columns in the row are moved off-page until the data page can hold the next two rows. TEXT/BLOB columns of 40 bytes or less are always stored in-line.