Our production environment replicates in ROW mode. For a table with hundreds of millions of rows, using pt-online-schema-change to copy the data from the old table into the temporary table generates a huge volume of binlog, which causes master-slave delay.
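For context, the replication format can be confirmed on the master, and the lag watched on a slave, with standard MySQL statements:
-- on the master: confirm row-based binlog
SHOW VARIABLES LIKE 'binlog_format';
-- on a slave: Seconds_Behind_Master shows the replication delay
SHOW SLAVE STATUS\G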
Before version 2.1 of the Percona Toolkit, pt-online-schema-change did not write binlog, so to build the index on both master and slaves you had to run it once on the master and once on each slave.
It provided a --bin-log parameter, with binlog turned off by default:
--bin-log
Allow binary logging (SET SQL_LOG_BIN=1). By default binary logging is turned off because in most cases the --tmp-table does not need to be replicated.
From version 2.2 of the pt toolkit onwards, binlog is written by default. The advantage is that instead of performing the table change on every node, you only need to run it once on the master, and the slaves' tables are then modified through binlog.
In pt toolkit 3.0, the --set-vars='sql_log_bin=0' parameter replaces the effect of the old --bin-log=0.
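What --set-vars passes through is an ordinary MySQL session variable; a minimal sketch of its effect (the table and index names here are illustrative only):
SET SESSION sql_log_bin = 0;        -- this session stops writing binlog
ALTER TABLE t ADD INDEX idx_a (a);  -- executes locally, is not replicated
SET SESSION sql_log_bin = 1;        -- restore binlogging for this session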
We needed to add an index to a table with 150 million rows, and estimated those 150 million rows would generate about 20 GB of binlog. To avoid generating binlog, we planned to run the change once on each node, starting with the master:
pt-online-schema-change --host=<host> --port=<port> --user=<user> D=live_oss,t=t_room_impeach --alter="ADD INDEX idx_psr (A,B,C)" --set-vars='sql_log_bin=0' --execute
As soon as this statement was executed, replication on all four slaves under the master broke, and show slave status reported an error:
Last_SQL_Errno: 1146
Last_SQL_Error: Error executing row event: 'Table 'live_oss._t_room_impeach_new' doesn't exist'
The error says the _t_room_impeach_new table doesn't exist. But why would this temporary table be expected on the slave at all?
The principle of the pt-osc tool is to first create a temporary table named _originaltable_new which already carries the index you want to add, and then keep copying data from the old table into it. Rows newly inserted, updated or deleted in the old table are applied to the temporary table by triggers, so the old table and the temporary table end up identical. At that point the temporary table is renamed to the original table's name, the actual old table is dropped, and the online change is complete.
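Schematically, the work the tool performs on the master boils down to the sketch below. The column names id, A, B and C are assumed purely for illustration; the real tool copies in bounded chunks, quotes every column explicitly, and its update trigger also handles primary-key changes.
CREATE TABLE live_oss._t_room_impeach_new LIKE live_oss.t_room_impeach;
ALTER TABLE live_oss._t_room_impeach_new ADD INDEX idx_psr (A, B, C);
-- triggers mirror ongoing DML from the old table into the new one
CREATE TRIGGER pt_osc_live_oss_t_room_impeach_ins AFTER INSERT ON live_oss.t_room_impeach
FOR EACH ROW REPLACE INTO live_oss._t_room_impeach_new (id, A, B, C) VALUES (NEW.id, NEW.A, NEW.B, NEW.C);
CREATE TRIGGER pt_osc_live_oss_t_room_impeach_upd AFTER UPDATE ON live_oss.t_room_impeach
FOR EACH ROW REPLACE INTO live_oss._t_room_impeach_new (id, A, B, C) VALUES (NEW.id, NEW.A, NEW.B, NEW.C);
CREATE TRIGGER pt_osc_live_oss_t_room_impeach_del AFTER DELETE ON live_oss.t_room_impeach
FOR EACH ROW DELETE IGNORE FROM live_oss._t_room_impeach_new WHERE id = OLD.id;
-- copy the existing rows (the real tool does this in many small chunks)
INSERT LOW_PRIORITY IGNORE INTO live_oss._t_room_impeach_new (id, A, B, C)
SELECT id, A, B, C FROM live_oss.t_room_impeach;
-- atomic swap, then drop the old table
RENAME TABLE live_oss.t_room_impeach TO live_oss._t_room_impeach_old,
             live_oss._t_room_impeach_new TO live_oss.t_room_impeach;
DROP TABLE live_oss._t_room_impeach_old;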
When the command ran on the master, it printed that it was creating the temporary table and the triggers:
Creating new table...
Created new table live_oss._t_room_impeach_new OK.
Altering new table...
Altered `live_oss`.`_t_room_impeach_new` OK.
2017-08-02T16:38:48 Creating triggers...
2017-08-02T16:38:48 Created triggers OK.
2017-08-02T16:38:48 Copying approximately 141559863 rows...
Because of --set-vars='sql_log_bin=0', the DDL statement that creates the table is not replicated through binlog, so the temporary table never exists on the slaves. That by itself is not a problem: the slaves should not need the temporary table, since running the change once on the master ought to be enough.
Here is the first pit of pt-osc: the data produced by the triggers on the master still generates binlog and is replicated to the slaves, and when a slave tries to apply those row events it finds the table does not exist, so replication breaks.
The fix at that point was to create the same temporary table _xxxx_new on the slaves, so that the trigger-generated rows could be applied smoothly. Once it was created, data could be seen arriving in the slaves' temporary table, which confirmed once again that sql_log_bin=0 was not suppressing this traffic.
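On each slave, something like the following restored replication; sql_log_bin=0 keeps the fix itself out of any downstream binlog, and the index definition must match the one being added:
SET SESSION sql_log_bin = 0;
CREATE TABLE live_oss._t_room_impeach_new LIKE live_oss.t_room_impeach;
ALTER TABLE live_oss._t_room_impeach_new ADD INDEX idx_psr (A, B, C);
SET SESSION sql_log_bin = 1;
START SLAVE;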
EXPLAIN SELECT COUNT(*) FROM _t_room_impeach_new;
+----+-------------+---------------------+-------+---------------+------+---------+------+----------+-------------+
| id | select_type | table               | type  | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+---------------------+-------+---------------+------+---------+------+----------+-------------+
All the slaves had 176 rows, while the master's temporary table already held close to a hundred million, because besides the trigger traffic it also receives the data copied from the old table:
EXPLAIN SELECT COUNT(*) FROM _t_room_impeach_new;
+----+-------------+---------------------+-------+---------------+------+---------+------+----------+-------------+
| id | select_type | table               | type  | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+---------------------+-------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | _t_room_impeach_new | index | NULL          | uid  | 4       | NULL | 10527757 | Using index |
+----+-------------+---------------------+-------+---------------+------+---------+------+----------+-------------+
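The plain counts show the same asymmetry (values as observed at the time):
-- on each slave:
SELECT COUNT(*) FROM live_oss._t_room_impeach_new;   -- 176
-- on the master:
SELECT COUNT(*) FROM live_oss._t_room_impeach_new;   -- close to a hundred million and climbing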
At that point a worry came up:
master temporary table _t_room_impeach_new data = trigger-generated data + data copied from the old table
slave temporary table _t_room_impeach_new data = trigger-generated data only
If every node performed the final step and renamed the temporary table _t_room_impeach_new to the formal table t_room_impeach, wouldn't master and slaves end up inconsistent, with the slaves missing most of the data?
Reasoning it through, though, this cannot happen: --set-vars='sql_log_bin=0' suppresses the RENAME DDL statement just as it suppressed CREATE TABLE, so the table swap never succeeds on the slaves.
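This can be checked on the master by decoding the binlog: the trigger traffic shows up as row events against the temporary table, while no CREATE TABLE or RENAME TABLE statement for it ever appears (<binlog-file> stands for the current binlog file):
mysqlbinlog --base64-output=decode-rows -v <binlog-file> | grep -i -E 'rename|_t_room_impeach_new'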
To avoid taking any risk, I decided to run it again, this time adding two parameters:
--no-drop-old-table: even after the command finishes, do not drop the old table, so I can confirm that the old and new tables are consistent and then drop it manually;
--no-drop-triggers: keep the triggers as well.
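So the second attempt looked roughly like this (same placeholders as before):
pt-online-schema-change --host=<host> --port=<port> --user=<user> \
  D=live_oss,t=t_room_impeach \
  --alter="ADD INDEX idx_psr (A, B, C)" \
  --set-vars='sql_log_bin=0' \
  --no-drop-old-table --no-drop-triggers \
  --execute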
Before running the command again, the leftover temporary table and triggers had to be dropped manually, as the tool itself prompts:
Not dropping triggers because the tool was interrupted. To drop the triggers, execute:
DROP TRIGGER IF EXISTS `live_oss`.`pt_osc_live_oss_t_room_impeach_del`;
DROP TRIGGER IF EXISTS `live_oss`.`pt_osc_live_oss_t_room_impeach_upd`;
DROP TRIGGER IF EXISTS `live_oss`.`pt_osc_live_oss_t_room_impeach_ins`;
Not dropping the new table `live_oss`.`_t_room_impeach_new` because the tool was interrupted. To drop the new table, execute:
DROP TABLE IF EXISTS `live_oss`.`_t_room_impeach_new`;
`live_oss`.`t_room_impeach` was not altered.
Also, this time the temporary table had been created on the slaves first, yet midway through the run all four slaves reported errors again and replication broke:
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table live_oss._t_room_impeach_new; Can't find record in '_t_room_impeach_new', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.056637, end_log_pos 41767716
This time the failure is an update. In ROW mode an update event carries the new values for SET and the old values for the WHERE lookup; if the old row cannot be found in the slave's temporary table, the new values cannot be applied. Again because of --set-vars='sql_log_bin=0', the slave's temporary table has far less data than the master's, so sooner or later some update event fails to find its row and replication is interrupted.
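Decoded with mysqlbinlog --verbose, a ROW-format update event looks roughly like this (column values are hypothetical). The WHERE block is the before-image the slave must locate before it can apply the SET block; when the row is missing, the lookup fails with HA_ERR_KEY_NOT_FOUND:
### UPDATE `live_oss`.`_t_room_impeach_new`
### WHERE
###   @1=12345
###   @2='old value'
### SET
###   @1=12345
###   @2='new value'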
There is another pit: with the --no-drop-old-table and --no-drop-triggers parameters, the end result was that the command stayed stuck at 99%:
Copying `live_oss`.`t_room_impeach`: 99% 01:01 remain
Copying `live_oss`.`t_room_impeach`: 99% 00:47 remain
Copying `live_oss`.`t_room_impeach`: 99% 00:35 remain
Copying `live_oss`.`t_room_impeach`: 99% 00:21 remain
Copying `live_oss`.`t_room_impeach`: 99% 00:09 remain
It did not behave as the documentation describes, which is to rename first and then simply skip dropping the old table:
When the tool finishes copying data into the new table, it uses an atomic RENAME TABLE operation to simultaneously rename the original and new tables. After this is complete, the tool drops the original table.
There was one more trap: for a few minutes, two business applications reported errors of the following form, saying a temporary table did not exist. This temporary table is supposed to be transparent to the business, and the application should never be aware of it, yet it still surfaced this error:
Base table or view not found: 1146 Table 'live_oss.__t_room_impeach_new' doesn't exist
thread run: Table 'live_oss.__t_room_impeach_new' doesn't exist
Conclusion: --set-vars='sql_log_bin=0' can be used to reduce binlog generation on a standalone MySQL instance, or when you only need to modify the table structure on a slave, but do not use this parameter on a replicating master. With binlog left on, the worst harm to master-slave replication is delay, and we can use the --max-lag parameter to bound that delay, for example to 30 or 15 seconds. The cost is that the table change takes much longer, but that is far better than breaking master-slave replication.
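A safer shape for the same change, throttling on slave lag instead of silencing the binlog (a sketch; --max-lag and --check-interval are standard pt-osc options):
pt-online-schema-change --host=<host> --port=<port> --user=<user> \
  D=live_oss,t=t_room_impeach \
  --alter="ADD INDEX idx_psr (A, B, C)" \
  --max-lag=15 --check-interval=5 \
  --execute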
The binlog generated by the triggers is not mentioned in the pt 2.1, 2.2 or 3.0 documentation, but it is mentioned in the pt 2.0 documentation:
Replication will break if you alter a table on a master that does not exist on a slave
While the change is in progress, the table exists on the master but not on the slaves, which interrupts master-slave replication. This is exactly the problem we hit above.
Although the tool sets SQL_BIN_LOG=0 by default (unless --bin-log is specified), triggers which track changes to the table being altered still write statements to the binary log
That is, despite SQL_BIN_LOG=0, the data changes generated by the triggers are still written to the binlog in the form of statements, and presumably the --set-vars='sql_log_bin=0' parameter of pt 3.0 behaves the same way. WTF!