A reconstruction scheme for a MySQL table with hundreds of millions of rows
Author: sylar. Copyright notice: reprinting is allowed, but the source must be credited with a link; otherwise legal liability will be pursued.
[summary of the situation]
The author describes how a table with hundreds of millions of rows in a MySQL production environment was split into many smaller tables and migrated: the splitting rule, the implementation, and the points that need attention.
[background description]
The production environment has a favourite table of roughly 580 million rows, summarized below:

Table name: favourite
Rows: 587312519
Database version: 5.7.12
Table structure:

CREATE TABLE `favourite` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) NOT NULL,
  `target_type` int(11) NOT NULL,
  `target_id` int(11) NOT NULL,
  `created_at` datetime NOT NULL,
  `status` smallint(6) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniq_user_target` (`user_id`, `target_type`, `target_id`),
  KEY `idx_target_id` (`target_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
Because of the sheer size of the table, the following SQL statements could no longer be made fast enough by adjusting indexes or rewriting the SQL:
SQL and execution time:

SELECT count(1) AS count_1 FROM `favourite` WHERE `favourite`.target_id = 636 AND `favourite`.target_type = 1
-- 4.7 s

SELECT `favourite`.target_id AS favourite_target_id FROM `favourite` WHERE `favourite`.user_id = 338072 AND `favourite`.target_type = 0 AND `favourite`.status = 0 ORDER BY `favourite`.id DESC
-- 2.25 s

DELETE FROM favourite WHERE user_id = 17327373 AND target_id = 917 AND target_type = 1
-- 0.9 s
To make the business respond faster, we decided to split the favourite table. After discussing with the business side, user_id turned out to be the most frequently used condition, so the table is split by user_id into 1024 sub-tables according to user_id % 1024. The mapping is as follows (a SQL sketch of the routing rule is given after the note below):
user_id % 1024 = 0    => favourite_0000
user_id % 1024 = 1    => favourite_0001
user_id % 1024 = 2    => favourite_0002
……
user_id % 1024 = 1023 => favourite_1023
Note:
1) How to split must be decided by the actual business situation; there is no one-size-fits-all rule.
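As an illustration of the mod-1024 mapping above (the application-side routing code is not shown in the article, so this is only a sketch), the target sub-table for a given user_id can be computed directly in SQL:

-- Illustrative only: compute which sub-table a user_id routes to
-- under the user_id % 1024 rule; 338072 is taken from the slow query above.
SELECT CONCAT('favourite_', LPAD(MOD(338072, 1024), 4, '0')) AS target_table;
-- returns 'favourite_0152' because 338072 % 1024 = 152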
[approach]
1. Set up canal. Canal is Alibaba's open-source tool for capturing MySQL binlog events; from this first step until the final cutover it keeps pulling binlog events.
2. Export favourite into files corresponding to the 1024 sub-tables from an instance that carries no business traffic (a slave library is used here).
3. Import the exported backup files into the production environment.
4. Replay the changes captured by canal into the 1024 sub-tables (this continues until the cutover is finished).
5. When the data in the sub-tables is close to that of the original large table (see the count sketch after this list), pick a time when the business is not busy and switch the read path of the favourite business.
6. Switch the write path of the production favourite business.
7. When canal has no new records left to replay, the service switch is complete.
8. End.
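For step 5, the article does not show how "close to the original table" was judged; a minimal sketch, assuming a plain row-count comparison per shard, could look like this:

-- Compare the rows of the old table that belong to shard 0
-- with the row count of the new sub-table favourite_0000.
SELECT COUNT(*) FROM favourite WHERE MOD(user_id, 1024) = 0;
SELECT COUNT(*) FROM favourite_0000;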
Note:
1) When using canal to obtain binlog events, make sure the source instance is configured with
binlog_format = ROW
binlog_row_image = FULL (FULL is the default, but some instances may have been changed to MINIMAL)
With binlog_row_image = MINIMAL, when the primary executes something like DELETE FROM a WHERE target_id = ..., the binlog records the affected rows only by their primary key, i.e. effectively DELETE FROM a WHERE id = ... (id being the primary key). Production deletes on favourite are issued by `user_id`, `target_type` and `target_id`, so under MINIMAL they would be logged only as DELETE FROM favourite WHERE id = .... During the switch canal has to replay these changes, but all it would have is the old table's id, and the ids in the new sub-tables are not the same as in the old favourite table (the old ids have already passed 800 million, and carrying them over to the new sub-tables would be wasteful), so such deletes could not be applied to the new tables. Hence FULL is required. A quick way to check the settings is shown after note 2.
2) Because reads are switched before writes, there is a window during the cutover in which some data looks wrong. For example, right after the read path has been switched, a user inserts a row into favourite; the write path has not been switched yet, so the row lands in the old favourite table, and the user sees the record missing, which hurts the experience. The inconsistency only disappears once canal has synchronized everything into the new sub-tables. If your business cannot accept this, adjust the plan to your own situation; the services do not have to be switched in exactly this way.
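To verify the binlog settings mentioned in note 1 before starting canal, the variables can be inspected and, if necessary, adjusted on the source instance (standard MySQL 5.7 variables; shown here only as a quick reference):

-- Check the current binlog settings.
SHOW GLOBAL VARIABLES LIKE 'binlog_format';
SHOW GLOBAL VARIABLES LIKE 'binlog_row_image';
-- If binlog_row_image has been set to MINIMAL, switch it back to FULL
-- (takes effect for new sessions; also persist it in my.cnf).
SET GLOBAL binlog_row_image = 'FULL';
SET GLOBAL binlog_format = 'ROW';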
[solution description]
Following the approach above, two problems need to be solved:
1) How to configure canal. Canal is an open-source product from Alibaba for capturing binlog. In our case the canal consumer was written by the Java engineers, and the author does not work in Java, so the canal configuration is omitted here.
2) How to export favourite into 1024 sub-tables and load them into the target library.
The rest of this article focuses on problem 2, which can be done in either of two ways:
Comparison of the two options:
Means of realization: Option 1 uses mysqldump; Option 2 uses mycat.
Split time: Option 1 takes about 4.5 hours; Option 2 takes about 2 hours.
Preparation time: about 3 hours (a functional index has to be added first).
Step2. Export favourite from the slave library into four backup files with mysqldump
mysqldump ... --where="mod(user_id,1024) < 256" > /data/favourite_256.sql
mysqldump ... --where="mod(user_id,1024) >= 256 and mod(user_id,1024) < 512" > /data/favourite_512.sql
mysqldump ... --where="mod(user_id,1024) >= 512 and mod(user_id,1024) < 768" > /data/favourite_768.sql
mysqldump ... --where="mod(user_id,1024) >= 768" > /data/favourite_1024.sql
Note:
1) Run the backup against an instance that carries no business traffic (the slave library).
2) Adjust the parameters above to your actual situation, and be sure to add the options -c --skip-add-locks (-c writes complete INSERT statements including column names, --skip-add-locks omits the LOCK TABLES statements); otherwise importing into mycat will raise an exception.
Step3. Import the backup files into mycat
Importing the four backup files exported in step 2 into mycat in parallel took less than 90 minutes.
Note:
1) Run this operation in the background, for example inside a screen session. mycat uses long-lived connections, and the statements may keep executing even if the client session is cut off halfway; to avoid surprises of that kind, keep the import in the background.
Step4. Initialize id in the libraries behind mycat
This step took about 30 minutes.
1) Initialize the id column on the 1024 sub-tables and remove the virtual column used for routing.
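The article's script for this step is not reproduced here; as an illustration only, per-table DDL along the following lines could renumber id and drop the routing column, assuming the virtual column was named user_mod (a hypothetical name) and that "initialize id" means issuing fresh AUTO_INCREMENT values:

-- Hypothetical sketch for a single sub-table (favourite_0000);
-- the real script would loop over all 1024 tables.
ALTER TABLE favourite_0000 DROP COLUMN user_mod;   -- drop the virtual routing column
ALTER TABLE favourite_0000 DROP COLUMN id;          -- discard the old, sparse ids
ALTER TABLE favourite_0000
  ADD COLUMN id int(11) NOT NULL AUTO_INCREMENT FIRST,
  ADD PRIMARY KEY (id);                             -- renumber id from 1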
Step5. Import the processed files into the target library
This step took about 30 minutes.
Import the processed backup files into the new production database in parallel, with a parallelism of no more than 3: each INSERT produced by the dump carries roughly 30,000 rows, and at higher parallelism the machine's I/O cannot keep up. The script is as follows:
function instance()
{
    log
    echo "---- MySQL instance on port ${port}: start importing tables concurrently, start time `date "+%Y-%m-%d %H:%M:%S"`"
    # call the execution function
    dumpAllTable
}

# write the per-file import commands to a log, largest backup first
function log()
{
    BACKUP_DIR="/data/backup"
    # remove the old command log if it exists
    if [ -e ${BACKUP_DIR}/cyt.log ]; then
        rm -rf ${BACKUP_DIR}/cyt.log
    fi
    for filename in `du -sk ${BACKUP_DIR}/*.sql | sort -nr | awk '{print $2}'`
    do
        a="kk_favourite"
        echo "mysql -u${DB_USER} -p${DB_PASSWORD} --socket=${socket} --host=${host} -A ${a} --tee=/data/pat.log -e \"source ${filename}\"" >> ${BACKUP_DIR}/cyt.log
    done
}

# read the command log and run the imports with limited concurrency
function dumpAllTable()
{
    local schemaFile="${BACKUP_DIR}/cyt.log"
    # the largest backups are imported first (with several concurrent processes,
    # the total time is bounded by the largest table)
    allTable=`cat ${schemaFile} | wc -l`
    declare -a array_cmds
    i_array=0
    while read file; do
        array_cmds[${i_array}]="${file}"
        i_array=`expr ${i_array} + 1`
    done < ${schemaFile}

    # use a FIFO bound to file descriptor 6 as a token bucket that limits
    # the number of concurrent import processes to ${threadsNum}
    fifo_file="/tmp/$$.fifo"
    mkfifo ${fifo_file}
    exec 6<>${fifo_file}
    rm -f ${fifo_file}
    for ((i = 0; i < ${threadsNum}; i++)); do
        echo >&6
    done

    for cmd in "${array_cmds[@]}"; do
        read -u6
        {
            eval ${cmd}
            echo >&6
        } > /dev/null 2>&1 &
    done

    # wait for all background subprocesses to finish
    wait
    # close file descriptor 6
    exec 6>&-
}

# main function
function main()
{
    # connection information of the target instance (fill in before running)
    host=""
    port=
    DATE=`date +%F`
    socket=""
    DB_USER=''
    # password of the database user
    DB_PASSWORD=''
    # record the start time
    BEGIN=`date "+%Y-%m-%d %H:%M:%S"`
    BEGIN_T=`date -d "${BEGIN}" +%s`
    echo '---- start concurrent import by table: start time is '${BEGIN}
    # number of concurrent import processes
    threadsNum=8
    # call the instance function
    instance
    echo '---- all backup files have been imported! End time: '`date "+%Y-%m-%d %H:%M:%S"`
}
main
[summary]
Because scheme 2 takes at least 2 hours less than scheme 1 and needs no further processing after the data has been imported into the target library, scheme 2 was used for the production migration.