What is the reason the Text type is not recommended in MySQL? Many inexperienced people are at a loss about it, so this article summarizes the causes of the problem and the ways to solve it. I hope it helps you.
As we all know, MySQL is widely used in Internet OLTP (online transaction processing) business systems, and in the development guidelines of large companies we often see the rule "the text large field type is not recommended".
The following explains, starting from the storage structure of the text type and the problems it causes, why the text type is not recommended, and ends with suggested ways to replace Text.
Background
Writing to a log table causes DML to be slow
Problem description
A business system runs on the high-availability edition of RDS for MySQL 5.7, configured with long_query_time=1s and a slow-query alert. When the alert fired, my first reaction was that slow queries had cropped up again.
Monitoring showed that CPU, QPS, TPS and other indicators were not particularly high. The whole site had recently been running marketing activities and the number of users had grown slightly; some interfaces that were not slow before became very slow and affected normal business, so troubleshooting was needed.
Problem analysis
The slow query alerts showed that some insert and update statements were relatively slow, and monitoring during the alert window showed very high IOPS, around 70MB/s. Because the CloudDBA feature of RDS was not available and there was no audit log, troubleshooting was harder, so we had to analyze the binlog.
max_binlog_size=512MB is configured. Let's look at binlog generation while IOPS is high.
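A quick way to gauge how fast binlogs are rotating is to list them and compare their sizes against the rotation threshold (a minimal sketch; the exact file names will differ per instance):

-- Many binlog files created within minutes of each other means heavy write volume.
SHOW BINARY LOGS;
-- Confirm the rotation threshold in effect.
SHOW VARIABLES LIKE 'max_binlog_size';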
We need to analyze why the binlog is written so fast. The most likely cause is the text column written by insert into request_log. The structure of the request_log table is as follows (demo):
CREATE TABLE request_log (
  id bigint(20) NOT NULL AUTO_INCREMENT,
  log text,
  created_at datetime NOT NULL,
  status tinyint(4) NOT NULL,
  method varchar(10) DEFAULT NULL,
  url varchar(50) DEFAULT NULL,
  update_at datetime DEFAULT NULL,
  running_time tinyint(4) DEFAULT '0',
  user_id bigint(20) DEFAULT NULL,
  type varchar(50) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=4229611 DEFAULT CHARSET=utf8;
Analyze binlog:
$ mysqlbinlog --no-defaults -v -v --base64-output=DECODE-ROWS mysql-bin.000539 | egrep "insert into request_log"
The screen filled with barely readable row data, and the command ran for a long time without finishing.
It was basically confirmed that writes to the log field of request_log caused frequent flushes of the binlog_cache and excessive binlog switching, driving IOPS too high and affecting other normal DML operations.
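To corroborate the binlog_cache pressure, the cache counters can be checked as well (a sketch using standard status variables; a rising Binlog_cache_disk_use relative to Binlog_cache_use means transactions are spilling the cache to temporary disk files):

-- Compare in-memory cache use vs. spills to disk.
SHOW GLOBAL STATUS LIKE 'Binlog_cache%';
-- The per-transaction cache size in effect.
SHOW VARIABLES LIKE 'binlog_cache_size';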
Problem solving
After communicating with the developers, the plan is to fix this in the next release: instead of writing request information to the table, write it to a local log file and ship it into ES through Filebeat for querying. If the goal is only to view logs, a log tool such as Graylog can also be used; there is no need to write them into the database.
Later in the article I will also cover several Text-related pitfalls I have hit with MySQL, but before that, let me first introduce the MySQL Text type.
Text in MySQL
Text types
Text is a large object type that can store large amounts of data. There are four variants: TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT. They differ in the range of values they can store, as shown below.
Data Type    Storage Required
TINYTEXT     L + 1 bytes, where L < 2^8
TEXT         L + 2 bytes, where L < 2^16
MEDIUMTEXT   L + 3 bytes, where L < 2^24
LONGTEXT     L + 4 bytes, where L < 2^32

Here L is the actual length in bytes of the value stored in the text column. From this we can compute the maximum storage length of the TEXT type: 2^16 - 1 = 65535 bytes.

InnoDB data pages

An InnoDB data page consists of the following 7 parts:

Content                        Size       Description
File Header                    38 bytes   data file header
Page Header                    56 bytes   data page header
Infimum and Supremum Records   -          pseudo records
User Records                   -          user data
Free Space                     -          free space; internally a linked list, and deleted records are added to the free_lru list
Page Dictionary                -          page data dictionary; stores the relative positions of records (also called slots), internally a sparse directory
File Trailer                   8 bytes    file trailer; used to detect whether the page was completely written to disk

Note: the File Trailer has only one part, FIL_PAGE_END_LSN, occupying 8 bytes; the first 4 bytes hold the page checksum and the last 4 bytes match FIL_PAGE_LSN in the File Header. Whether a page has been corrupted is detected through the File Trailer, and since this check carries some overhead, it can be turned on or off with the innodb_checksums parameter.

Since MySQL 5.6 the default table storage engine is InnoDB, which is row-oriented. Each page (default page size = 16KB) also limits the row records it stores: at most 16K/2 - 200 = 7992 rows.

InnoDB row formats

InnoDB supports four row formats:

Row Format   Compact Storage   Enhanced Variable-Length Columns   Large Prefix Index   Compression   Supported Tablespaces
REDUNDANT    No                No                                 No                   No            system, file-per-table, general
COMPACT      Yes               No                                 No                   No            system, file-per-table, general
DYNAMIC      Yes               Yes                                Yes                  No            system, file-per-table, general
COMPRESSED   Yes               Yes                                Yes                  Yes           file-per-table, general

DYNAMIC is a variant of COMPACT with a largely similar structure, and DYNAMIC is now the default; COMPRESSED mainly compresses table and index data and generally suits low-usage archive and backup needs. Below we mainly introduce the REDUNDANT and COMPACT row formats.

Redundant row format

This format exists for compatibility with older versions of MySQL.

Row record layout:

Field length offset list | record header (6 bytes) | column 1 data | column 2 data | ... | Text column pointer data

It has the following characteristics:

- The first 768 bytes of a variable-length column are stored in the index record, and the remainder is stored in overflow pages; a fixed-length column longer than 768 bytes is treated as variable-length and stored off-page.
- Each record in an index page contains a 6-byte header, used to link records together and for row locking.
- Clustered index records contain all user-defined columns, plus a 6-byte transaction ID (DB_TRX_ID) and a 7-byte rollback segment pointer (roll pointer) column.
- If the table was created without an explicit primary key, each clustered index row also includes a 6-byte row ID field.
- Each secondary index record contains all defined primary key columns.
- A record contains a pointer to each of its columns; if the total length of the record's columns is less than 128 bytes, each pointer takes 1 byte, otherwise 2 bytes. This pointer array is called the record directory, and the area the pointers point to is the record's data part.
- Fixed-length character fields such as CHAR(10) are stored in a fixed-length format, padded with trailing spaces.
- Fixed-length fields of 768 bytes or longer are encoded as variable-length fields and stored off-page.
- An SQL NULL value stores one or two bytes in the record directory. A NULL in a variable-length field occupies 0 bytes in the data area; a fixed-length field still stores its fixed length in the data part. Reserving fixed-length space for NULL values allows a column to be updated from NULL to a non-NULL value without causing index splits.
- For the varchar type, a NULL under the Redundant row format likewise occupies no storage space, whereas a NULL in a CHAR column does.

Variable-length types are stored as length + data; depending on the type, the length takes 1 to 4 bytes (L+1 to L+4). A TEXT value needs L bytes to store the value plus 2 bytes to store its length. Meanwhile InnoDB caps the row length at 65535 bytes, so for the Text type only a 9-to-12-byte pointer is kept in the row, and the data lives separately in overflow pages.

Compact row format

This row format reduces storage space compared with Redundant, at the cost of higher CPU overhead for some operations. If the workload is limited by cache hit rate and disk speed, the Compact row format is likely faster; if the workload is CPU-bound, it may be slower. The Compact row format is supported by all file formats.

Row record layout:

Variable-length field length list | NULL flags | record header | column 1 data | column 2 data | ... | Text column pointer data

The Compact header begins with a list of the lengths of the non-NULL variable-length fields, placed in reverse column order: if a column's length is less than 255 bytes, it is encoded in 1 byte; if greater, in 2 bytes. The length prefix can never exceed 2 bytes, because the varchar type in MySQL is limited to 65535 bytes. The second part, after the variable-length list, is the NULL flags, which indicate whether the row contains NULL values (a NULL is marked with 1); this part occupies 1 byte.

So when creating tables, prefer NOT NULL DEFAULT ''; if a table's columns store a large number of NULL values, they both waste space and affect the stability of index columns.

It has the following characteristics:

- Each index record contains a 5-byte header, which may be preceded by a variable-length part. The header links related records together and is also used for row locking.
- The variable-length part of the record header contains a bit vector for NULL values. If the number of nullable columns in the index is N, the bit vector occupies ceil(N/8) bytes; for example, with 9 to 16 nullable columns it uses two bytes. NULL columns take no space beyond their bit in this vector. The variable part also holds the lengths of the variable-length fields, each taking one or two bytes depending on the field's maximum length. If all columns are nullable and declared with fixed lengths, the record header has no variable-length part.
- For each non-NULL variable-length field, the record header contains the field length in one or two bytes. Two bytes are used only when the field is stored in an external overflow area, or when its maximum length exceeds 255 bytes and its actual length exceeds 127 bytes. For an externally stored field, the two-byte length covers the internally stored part plus the 20-byte pointer to the external part; the internal part is 768 bytes, so the length value is 768 + 20, and the 20-byte pointer records the true length of the field.
- NULL takes no space in this part: beyond its bit in the NULL flags, a NULL value uses no actual storage.
- The record header is followed by the data of the non-NULL fields.
- Clustered index records contain all user-defined columns, plus a 6-byte transaction ID column and a 7-byte roll pointer.
- If no primary key was defined, the clustered index also includes a 6-byte row ID column.
- Each secondary index record contains all the primary key columns defined for the clustered index key that are not already in the secondary index. If any of those primary key columns is variable-length, the record header of each secondary index has a variable-length part to record their lengths, even if the secondary index is defined on fixed-length columns.
- Fixed-length character fields such as CHAR(10) are stored in a fixed-length format, padded with trailing spaces.
- For variable-length character sets such as utf8mb3 and utf8mb4, InnoDB attempts to store CHAR(N) in N bytes. If the value of a CHAR(N) column exceeds N bytes, trailing spaces are trimmed to the minimum. The maximum length of a CHAR(N) value is the maximum character encoding length x N; for example, utf8mb4's longest encoding is 4 bytes, so the column's maximum byte length is 4 x N.
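As a quick check of which row format a given table actually uses (a minimal sketch, reusing the request_log table from the example above):

-- DYNAMIC is the usual default on modern MySQL versions.
SELECT table_name, row_format
FROM information_schema.tables
WHERE table_name = 'request_log';

-- A row format can also be chosen explicitly when (re)building a table.
ALTER TABLE request_log ROW_FORMAT=DYNAMIC;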
Problems caused by the Text type

Inserting into a text field causes an error

Create a test table:

[root@barret][test]> CREATE TABLE user (
    -> id bigint NOT NULL PRIMARY KEY AUTO_INCREMENT,
    -> name varchar(20) NOT NULL DEFAULT '' COMMENT 'name',
    -> age tinyint NOT NULL DEFAULT 0 COMMENT 'age',
    -> gender char(1) NOT NULL DEFAULT 'M' COMMENT 'gender',
    -> info text NOT NULL COMMENT 'user information',
    -> create_time datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
    -> update_time datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'modification time'
    -> );
Query OK, 0 rows affected (0.04 sec)
Insert test data. (The repeat() arguments below are illustrative reconstructions: the first value just exceeds the 65535-byte TEXT limit, the second exceeds max_allowed_packet.)
[root@barret][test]> insert into user(name,age,gender,info) values('moon', 34, 'M', repeat('a', 65536));
ERROR 1406 (22001): Data too long for column 'info' at row 1
[root@barret][test]> insert into user(name,age,gender,info) values('sky', 35, 'M', repeat('b', 1024*1024*5));
ERROR 1301 (HY000): Result of repeat() was larger than max_allowed_packet (4194304) - truncated
Error analysis
[root@barret][test]> select @@max_allowed_packet;
+----------------------+
| @@max_allowed_packet |
+----------------------+
|              4194304 |
+----------------------+
1 row in set (0.00 sec)
max_allowed_packet controls the maximum size of the communication buffer; when a packet exceeds this value, an error is reported. As we all know, MySQL consists of the Server layer and the storage engine layer, which follow the 2PC protocol. The Server layer mainly handles user requests: connection request -> SQL parsing -> semantic check -> generate execution plan -> execute plan -> fetch data; the storage engine layer stores the data and provides read/write interfaces.
With max_allowed_packet=4M: for the first insert, the packet the Server layer sends to InnoDB when executing the SQL is within the 4M limit, but when InnoDB writes the data it finds that the value exceeds the Text limit, so it reports an error. For the second insert, the packet itself exceeds the 4M limit, so the error is raised at the Server layer before the data ever reaches the engine.
See also the description of this parameter in the AWS RDS parameter group.
Increasing this parameter can alleviate the errors, but it cannot completely solve the problem.
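A minimal mitigation sketch (the 64MB value is illustrative; max_allowed_packet is a global variable, so existing connections must reconnect to pick it up):

-- Raise the packet ceiling; this postpones, but does not eliminate, the problem.
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;
-- Verify from a new connection.
SELECT @@max_allowed_packet;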
RDS instance is locked
Background description
The company runs marketing activities every month. One service, apush (activity push), is deployed separately on the high-availability edition of RDS for MySQL 5.7, configured with 4C8G and a 150G disk; the database has only four tables. When I left work at 22:00, the RDS instance had used 50GB of space. At 9:30 the next morning, on the subway, I received a DingTalk alert saying the push service RDS instance had been locked read-only, and the developers reported that the application log was full of MySQL errors.
Problem analysis
Logging in to the database through DMS to see which table is largest, I found a table push_log occupying about 100GB. Its structure includes two text fields:
request text DEFAULT '' COMMENT 'request information',
response text DEFAULT '' COMMENT 'response information'

mysql> show table status like 'push_log';
It shows an Avg_row_length of about 150KB and Rows = 780,000, so the table size is roughly 780000 * 150KB / 1024 / 1024 = 111.5G.
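The same estimate can be pulled directly from information_schema (a sketch; the schema name 'test' is assumed):

-- Estimate table size without parsing show table status by eye.
SELECT table_name,
       avg_row_length,
       table_rows,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS size_gb
FROM information_schema.tables
WHERE table_schema = 'test'
  AND table_name = 'push_log';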
Updates through the primary key also become slow. First, grow the test table (again, the repeat() lengths are illustrative):
insert into user(name,age,gender,info) values('thooo', 35, 'M', repeat('c', 65535));
insert into user(name,age,gender,info) values('thooo11', 35, 'M', repeat('d', 65535));
insert into user(name,age,gender,info) select name,age,gender,info from user;
Query OK, 6144 rows affected (5.62 sec)
Records: 6144  Duplicates: 0  Warnings: 0
[root@barret][test]> select count(*) from user;
+----------+
| count(*) |
+----------+
|    24576 |
+----------+
1 row in set (0.05 sec)
Run an update and profile it:
mysql> set profiling = 1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> update user set info = repeat('f', 65535) where id = 11;
Query OK, 1 row affected (0.28 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> show profiles;
+----------+------------+---------------------------------------------------------+
| Query_ID | Duration   | Query                                                   |
+----------+------------+---------------------------------------------------------+
|        1 | 0.27874125 | update user set info = repeat('f', 65535) where id = 11 |
+----------+------------+---------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> show profile cpu, block io for query 1;
+----------------------+----------+----------+------------+--------------+---------------+
| Status               | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out |
+----------------------+----------+----------+------------+--------------+---------------+
| starting             | 0.000124 | 0.000088 | 0.000035   |            0 |             0 |
| checking permissions | 0.000021 | 0.000014 | 0.000006   |            0 |             0 |
| Opening tables       | 0.000038 | 0.000026 | 0.000011   |            0 |             0 |
| init                 | 0.000067 | 0.000049 | 0.000020   |            0 |             0 |
| System lock          | 0.000076 | 0.000054 | 0.000021   |            0 |             0 |
| updating             | 0.244906 | 0.000000 | 0.015382   |            0 |         16392 |
| end                  | 0.000036 | 0.000000 | 0.000034   |            0 |             0 |
| query end            | 0.033040 | 0.000000 | 0.000393   |            0 |             0 |
| closing tables       | 0.000046 | 0.000000 | 0.000043   |            0 |             0 |
| freeing items        | 0.000298 | 0.000000 | 0.000053   |            0 |             0 |
| cleaning up          | 0.000092 | 0.000000 | 0.000092   |            0 |             0 |
+----------------------+----------+----------+------------+--------------+---------------+
11 rows in set, 1 warning (0.00 sec)
You can see that the main time is spent in the updating step, where Block_ops_out is 16392; under concurrent load, updates by id on this table become very slow.
group_concat can also cause query errors
In business development there is often a requirement like this: aggregate the names of designated drugstores by province, which is usually implemented as follows:
select group_concat(dru_name) from t_drugstore group by province;
The built-in group_concat returns an aggregated string whose maximum length is determined by the parameter group_concat_max_len ("Maximum allowed result length in bytes for the GROUP_CONCAT()"). The default is 1024, which is generally too short, so developers ask to raise it, for example to 1024000.
Note that when the result set returned by group_concat exceeds the max_allowed_packet limit, the query reports an error.
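A minimal sketch of raising the limit for the current session (1024000 is the value from the example above; results are still capped by max_allowed_packet):

-- Allow longer GROUP_CONCAT() results for this session only.
SET SESSION group_concat_max_len = 1024000;
SELECT @@group_concat_max_len;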
MySQL built-in log tables
MySQL has the log tables mysql.general_log and mysql.slow_log; in addition, if the audit feature is enabled and log_output=TABLE is set, there is a mysql.audit_log table, which is much the same as mysql.general_log.
Take a look at their table structures:
CREATE TABLE `general_log` (
  `event_time` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6),
  `user_host` mediumtext NOT NULL,
  `thread_id` bigint(21) unsigned NOT NULL,
  `server_id` int(10) unsigned NOT NULL,
  `command_type` varchar(64) NOT NULL,
  `argument` mediumblob NOT NULL
) ENGINE=CSV DEFAULT CHARSET=utf8 COMMENT='General log';

CREATE TABLE `slow_log` (
  `start_time` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6),
  `user_host` mediumtext NOT NULL,
  `query_time` time(6) NOT NULL,
  `lock_time` time(6) NOT NULL,
  `rows_sent` int(11) NOT NULL,
  `rows_examined` int(11) NOT NULL,
  `db` varchar(512) NOT NULL,
  `last_insert_id` int(11) NOT NULL,
  `insert_id` int(11) NOT NULL,
  `server_id` int(10) unsigned NOT NULL,
  `sql_text` mediumblob NOT NULL,
  `thread_id` bigint(21) unsigned NOT NULL
) ENGINE=CSV DEFAULT CHARSET=utf8 COMMENT='Slow log';
mysql.general_log records all SQL handled by MySQL Server, including both backend and user statements. Inserts into it are relatively frequent, and its `argument` mediumblob NOT NULL column has an impact on MySQL Server performance. Generally, general_log can be enabled in the dev environment to trace and troubleshoot problems; in production, disable general_log and enable audit_log, which filters on top of general_log, for example keeping only the SQL initiated by the business account. This is very useful, because it is often necessary to analyze which SQL has higher QPS and TPS during a given period.
mysql.slow_log records all SQL whose execution time exceeds long_query_time. If you follow MySQL development standards there will not be many slow queries, but when log_queries_not_using_indexes=ON is enabled, many full-table-scan SQL statements are recorded and the slow_log table becomes very large. On RDS, generally only one day's data is retained; with frequent inserts into slow_log, running truncate table slow_log to clean it up causes MDL (metadata lock) contention and affects MySQL's stability.
It is recommended to set log_output=FILE and enable slow_log and audit_log, so that they are written to files; these files are then processed by a Go API that writes the data into the distributed database ClickHouse for statistical analysis.
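A minimal sketch of that configuration (assumes the privilege to set global variables; on RDS this is usually done through the parameter group instead):

-- Send the slow log to a file rather than the mysql.slow_log table.
SET GLOBAL log_output = 'FILE';
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;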
Suggestions for replacing Text
Use ES storage
In MySQL, log tables commonly store text-type request or response data, used to manually troubleshoot failed API calls. Consider writing this data to a local log file instead, shipping it into ES via Filebeat, indexing by day, and cleaning up according to the data retention policy.
Use object storage
Some business tables use the TEXT or BLOB types to store image information, such as product images, which are rarely updated. Consider using object storage instead, such as Aliyun's OSS or AWS S3, which handles such requirements conveniently and efficiently.
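A minimal sketch of the resulting schema (the table and column names are hypothetical): the database keeps only a short URL pointing into the object store, instead of the image bytes themselves.

-- Hypothetical example: store a pointer to OSS/S3, not the blob.
CREATE TABLE product_image (
  id bigint NOT NULL AUTO_INCREMENT PRIMARY KEY,
  product_id bigint NOT NULL,
  image_url varchar(255) NOT NULL COMMENT 'OSS/S3 object URL',
  created_at datetime NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;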
Summary
Because MySQL is a single-process, multi-threaded model, a single SQL statement cannot be executed by multiple CPU cores, which makes MySQL better suited to OLTP business systems (characteristics: large numbers of user requests, logical reads, index scans, small result sets, simple SQL). It is also necessary to establish modeling and development standards for MySQL that avoid the Text type: it consumes large amounts of network and IO bandwidth, and DML operations on the table become very slow.
In addition, it is recommended to migrate complex statistical-analysis SQL to a real-time OLAP data warehouse, such as the widely used ClickHouse, Aliyun's ADB, or AWS's Redshift, separating OLTP and OLAP business SQL to ensure the stability of the business system.
After reading the above, have you grasped why the Text type is not recommended in MySQL? If you want to learn more skills or learn more about related topics, welcome to follow the industry information channel. Thank you for reading!