A reconstruction scheme for a MySQL table with hundreds of millions of rows
Author: sylar. Copyright notice: reprinting is allowed, but the source must be credited with a link; otherwise legal liability will be pursued.
[summary of the situation]
The author describes how a table with hundreds of millions of rows in a MySQL production environment was split into many smaller tables and migrated: the splitting rule, the implementation, and the points that need attention.
[background description]
The production environment has a favourite table of roughly 580 million rows, summarized below:

Table name: favourite
Rows: 587312519
Database version: 5.7.12
Table structure:

CREATE TABLE `favourite` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) NOT NULL,
  `target_type` int(11) NOT NULL,
  `target_id` int(11) NOT NULL,
  `created_at` datetime NOT NULL,
  `status` smallint(6) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniq_user_target` (`user_id`, `target_type`, `target_id`),
  KEY `idx_target_id` (`target_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
Because of the sheer size of the table, the following SQL statements could no longer be made fast enough by adjusting indexes or rewriting the SQL:
SQL and execution time:

SELECT count(1) AS count_1 FROM `favourite` WHERE `favourite`.target_id = 636 AND `favourite`.target_type = 1
-- 4.7 s

SELECT `favourite`.target_id AS favourite_target_id FROM `favourite` WHERE `favourite`.user_id = 338072 AND `favourite`.target_type = 0 AND `favourite`.status = 0 ORDER BY `favourite`.id DESC
-- 2.25 s

DELETE FROM favourite WHERE user_id = 17327373 AND target_id = 917 AND target_type = 1
-- 0.9 s
To make the business respond faster, we decided to split the favourite table. After discussing with the business side, user_id turned out to be the most frequently used condition, so the table is split by user_id into 1024 sub-tables according to user_id % 1024. The mapping is as follows (a SQL sketch of the routing rule is given after the note below):
user_id % 1024 = 0    => favourite_0000
user_id % 1024 = 1    => favourite_0001
user_id % 1024 = 2    => favourite_0002
……
user_id % 1024 = 1023 => favourite_1023
Note:
1) How to split must be decided by the actual business situation; there is no one-size-fits-all rule.
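As an illustration of the mod-1024 mapping above (the application-side routing code is not shown in the article, so this is only a sketch), the target sub-table for a given user_id can be computed directly in SQL:

-- Illustrative only: compute which sub-table a user_id routes to
-- under the user_id % 1024 rule; 338072 is taken from the slow query above.
SELECT CONCAT('favourite_', LPAD(MOD(338072, 1024), 4, '0')) AS target_table;
-- returns 'favourite_0152' because 338072 % 1024 = 152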
[approach]
1. Set up canal. Canal is Alibaba's open-source tool for capturing MySQL binlog events; from this first step until the final cutover it keeps pulling binlog events.
2. Export favourite into files corresponding to the 1024 sub-tables from an instance that carries no business traffic (a slave library is used here).
3. Import the exported backup files into the production environment.
4. Replay the changes captured by canal into the 1024 sub-tables (this continues until the cutover is finished).
5. When the data in the sub-tables is close to that of the original large table (see the count sketch after this list), pick a time when the business is not busy and switch the read path of the favourite business.
6. Switch the write path of the production favourite business.
7. When canal has no new records left to replay, the service switch is complete.
8. End.
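For step 5, the article does not show how "close to the original table" was judged; a minimal sketch, assuming a plain row-count comparison per shard, could look like this:

-- Compare the rows of the old table that belong to shard 0
-- with the row count of the new sub-table favourite_0000.
SELECT COUNT(*) FROM favourite WHERE MOD(user_id, 1024) = 0;
SELECT COUNT(*) FROM favourite_0000;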
Note:
1) When using canal to obtain binlog events, make sure the source instance is configured with
binlog_format = ROW
binlog_row_image = FULL (FULL is the default, but some instances may have been changed to MINIMAL)
With binlog_row_image = MINIMAL, when the primary executes something like DELETE FROM a WHERE target_id = ..., the binlog records the affected rows only by their primary key, i.e. effectively DELETE FROM a WHERE id = ... (id being the primary key). Production deletes on favourite are issued by `user_id`, `target_type` and `target_id`, so under MINIMAL they would be logged only as DELETE FROM favourite WHERE id = .... During the switch canal has to replay these changes, but all it would have is the old table's id, and the ids in the new sub-tables are not the same as in the old favourite table (the old ids have already passed 800 million, and carrying them over to the new sub-tables would be wasteful), so such deletes could not be applied to the new tables. Hence FULL is required. A quick way to check the settings is shown after note 2.
2) Because reads are switched before writes, there is a window during the cutover in which some data looks wrong. For example, right after the read path has been switched, a user inserts a row into favourite; the write path has not been switched yet, so the row lands in the old favourite table, and the user sees the record missing, which hurts the experience. The inconsistency only disappears once canal has synchronized everything into the new sub-tables. If your business cannot accept this, adjust the plan to your own situation; the services do not have to be switched in exactly this way.
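To verify the binlog settings mentioned in note 1 before starting canal, the variables can be inspected and, if necessary, adjusted on the source instance (standard MySQL 5.7 variables; shown here only as a quick reference):

-- Check the current binlog settings.
SHOW GLOBAL VARIABLES LIKE 'binlog_format';
SHOW GLOBAL VARIABLES LIKE 'binlog_row_image';
-- If binlog_row_image has been set to MINIMAL, switch it back to FULL
-- (takes effect for new sessions; also persist it in my.cnf).
SET GLOBAL binlog_row_image = 'FULL';
SET GLOBAL binlog_format = 'ROW';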
[solution description]
Following the approach above, two problems need to be solved:
1) How to configure canal. Canal is an open-source product from Alibaba for capturing binlog. In our case the canal consumer was written by the Java engineers, and the author does not work in Java, so the canal configuration is omitted here.
2) How to export favourite into 1024 sub-tables and load them into the target library.
The rest of this article focuses on problem 2, which can be done in either of two ways:
Comparison of the two options:
Means of realization: Option 1 uses mysqldump; Option 2 uses mycat.
Split time: Option 1 takes about 4.5 hours; Option 2 takes about 2 hours.
Preparation time: about 3 hours (a functional index has to be added first).
Step2. Export favourite from the slave library into four backup files with mysqldump
mysqldump ... --where="mod(user_id,1024) < 256" > /data/favourite_256.sql
mysqldump ... --where="mod(user_id,1024) >= 256 and mod(user_id,1024) < 512" > /data/favourite_512.sql
mysqldump ... --where="mod(user_id,1024) >= 512 and mod(user_id,1024) < 768" > /data/favourite_768.sql
mysqldump ... --where="mod(user_id,1024) >= 768" > /data/favourite_1024.sql
Note:
1) Run the backup against an instance that carries no business traffic (the slave library).
2) Adjust the parameters above to your actual situation, and be sure to add the options -c --skip-add-locks (-c writes complete INSERT statements including column names, --skip-add-locks omits the LOCK TABLES statements); otherwise importing into mycat will raise an exception.
Step3. Import the backup files into mycat
Importing the four backup files exported in step 2 into mycat in parallel took less than 90 minutes.
Note:
1) Run this operation in the background, for example inside a screen session. mycat uses long-lived connections, and the statements may keep executing even if the client session is cut off halfway; to avoid surprises of that kind, keep the import in the background.
Step4. Initialize id in the libraries behind mycat
This step took about 30 minutes.
1) Initialize the id column on the 1024 sub-tables and remove the virtual column used for routing.
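The article's script for this step is not reproduced here; as an illustration only, per-table DDL along the following lines could renumber id and drop the routing column, assuming the virtual column was named user_mod (a hypothetical name) and that "initialize id" means issuing fresh AUTO_INCREMENT values:

-- Hypothetical sketch for a single sub-table (favourite_0000);
-- the real script would loop over all 1024 tables.
ALTER TABLE favourite_0000 DROP COLUMN user_mod;   -- drop the virtual routing column
ALTER TABLE favourite_0000 DROP COLUMN id;          -- discard the old, sparse ids
ALTER TABLE favourite_0000
  ADD COLUMN id int(11) NOT NULL AUTO_INCREMENT FIRST,
  ADD PRIMARY KEY (id);                             -- renumber id from 1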
Step5. Import the processed files into the target library
This step took about 30 minutes.
Import the processed backup files into the new production database in parallel, with a parallelism of no more than 3: each INSERT produced by the dump carries roughly 30,000 rows, and at higher parallelism the machine's I/O cannot keep up. The script is as follows:
function instance()
{
    log
    echo "---- MySQL instance on port ${port}: start importing tables concurrently, start time `date "+%Y-%m-%d %H:%M:%S"`"
    # call the execution function
    dumpAllTable
}

# write the per-file import commands to a log, largest backup first
function log()
{
    BACKUP_DIR="/data/backup"
    # remove the old command log if it exists
    if [ -e ${BACKUP_DIR}/cyt.log ]; then
        rm -rf ${BACKUP_DIR}/cyt.log
    fi
    for filename in `du -sk ${BACKUP_DIR}/*.sql | sort -nr | awk '{print $2}'`
    do
        a="kk_favourite"
        echo "mysql -u${DB_USER} -p${DB_PASSWORD} --socket=${socket} --host=${host} -A ${a} --tee=/data/pat.log -e \"source ${filename}\"" >> ${BACKUP_DIR}/cyt.log
    done
}

# read the command log and run the imports with limited concurrency
function dumpAllTable()
{
    local schemaFile="${BACKUP_DIR}/cyt.log"
    # the largest backups are imported first (with several concurrent processes,
    # the total time is bounded by the largest table)
    allTable=`cat ${schemaFile} | wc -l`
    declare -a array_cmds
    i_array=0
    while read file; do
        array_cmds[${i_array}]="${file}"
        i_array=`expr ${i_array} + 1`
    done < ${schemaFile}

    # use a FIFO bound to file descriptor 6 as a token bucket that limits
    # the number of concurrent import processes to ${threadsNum}
    fifo_file="/tmp/$$.fifo"
    mkfifo ${fifo_file}
    exec 6<>${fifo_file}
    rm -f ${fifo_file}
    for ((i = 0; i < ${threadsNum}; i++)); do
        echo >&6
    done

    for cmd in "${array_cmds[@]}"; do
        read -u6
        {
            eval ${cmd}
            echo >&6
        } > /dev/null 2>&1 &
    done

    # wait for all background subprocesses to finish
    wait
    # close file descriptor 6
    exec 6>&-
}

# main function
function main()
{
    # connection information of the target instance (fill in before running)
    host=""
    port=
    DATE=`date +%F`
    socket=""
    DB_USER=''
    # password of the database user
    DB_PASSWORD=''
    # record the start time
    BEGIN=`date "+%Y-%m-%d %H:%M:%S"`
    BEGIN_T=`date -d "${BEGIN}" +%s`
    echo '---- start concurrent import by table: start time is '${BEGIN}
    # number of concurrent import processes
    threadsNum=8
    # call the instance function
    instance
    echo '---- all backup files have been imported! End time: '`date "+%Y-%m-%d %H:%M:%S"`
}
main
[summary]
Because scheme 2 takes at least 2 hours less than scheme 1 and needs no further processing after the data has been imported into the target library, scheme 2 was used for the production migration.