In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/02 Report--
This article mainly shows you "MySQL how to use Partition function to achieve horizontal partition", the content is easy to understand, clear, hope to help you solve your doubts, the following let the editor lead you to study and learn "MySQL how to use Partition function to achieve horizontal partition" this article.
1 Review
We explained in detail how to partition the database, including vertical split (Scale Up vertical expansion) and horizontal split (Scale Out scale out), and briefly sorted out several strategies for horizontal partitioning. Now let's review.
2 5 strategies of horizontal zoning
2.1 Hash (hash)
This strategy calculates the Hash Key of one or more columns of the table, and finally partitions the data regions corresponding to different values of the Hash code. For example, we can establish a strategy for partitioning the year of the date of the table so that each year is aggregated in an interval.
PARTITION BY HASH (YEAR (createtime)) PARTITIONS 10
2.2 Range (range)
This strategy is to divide the data into different ranges. For example, we can divide a ten-million-level table into four partitions through id, with about 500W of data in each partition, and the data exceeding 750W will be placed in the fourth partition.
PARTITION BY RANGE (id) (PARTITIONP0 VALUES LESS THAN (2500001), PARTITIONP1 VALUES LESS THAN (5000001), PARTITIONp2 VALUES LESS THAN (7500001), PARTITIONp3 VALUES LESS THAN MAXVALUE)
2.3Key (key value)
An extension of the Hash strategy, where Hash Key is generated by the MySQL system.
2.4, List (predefined list)
This strategy allows the system to split the row data corresponding to the values of the defined list. For example, we partition according to the post code, and the codes of different job types correspond to different zones to achieve the purpose of division.
PARTITION BY LIST (gwcode) (PARTITIONP0 VALUES IN (46 mine77, 89), PARTITIONP1 VALUES IN (106125177), PARTITIONP2 VALUES IN (205219289), PARTITIONP3 VALUES IN (302317458509610)
The above SQL script uses the list matching LIST function to partition the employee post number, which is divided into four divisions. The administrative post number is 46, 777, 89, which corresponds to partition P0, and the technical post 106125177 category is in partition P1, and so on.
2.5, Composite (composite mode)
Composite mode is actually a combination of the above modes, such as you do Hash hash partition on the basis of Range.
3 Test Range strategy 3.1 establish summary table and sub-table
We create a common user table users and a partition table users_part. Users born in the 1980s are partitioned by year, as follows:
3.1.1 General statement
Mysql > CREATE TABLE users ("id" int (10) unsigned NOT NULL, "name" varchar (100) DEFAULT NULL, "birth" datetime) ENGINE=InnoDB DEFAULT CHARSET=utf8;Query OK, 0 rows affected
3.1.2 Sub-table statement
The last line is to note that all those born after 89 years are assigned to the 10th partition. What we simulate here are all users born in the 1980s, and the actual business is split according to specific conditions.
Mysql > create table users_part ("id" int (10) unsigned NOT NULL, "name" varchar (100) DEFAULT NULL, "birth" datetime) ENGINE=InnoDB DEFAULT CHARSET=utf8 PARTITION BY RANGE (year (birth)) (PARTITION p0 VALUES LESS THAN (1981), PARTITION p1 VALUES LESS THAN (1982), PARTITION p2 VALUES LESS THAN (1983), PARTITION p3 VALUES LESS THAN (1984), PARTITION p4 VALUES LESS THAN (1985), PARTITION p5 VALUES LESS THAN (1986), PARTITION p6 VALUES LESS THAN (1987), PARTITION p7 VALUES LESS THAN (1988) PARTITION p8 VALUES LESS THAN (1989), 17 PARTITION p9 VALUES LESS THAN MAXVALUE) Query OK, 0 rows affected
3.2 initialize table data
We can initialize the data in batches using functions or stored procedures, where 1000W pieces of data are inserted.
DROP PROCEDURE IF EXISTS init_users_part;delimiter $/ * sets the statement Terminator to $* / CREATE PROCEDURE init_users_part () begin DECLARE srt int default 0; while srt
< 10000000 /* 设定写入1000W的数据 */ do insert into `users_part` values (srt, concat('username_',idx1),adddate('1980-01-01',rand() * 3650)); /*在10年的时间内随机取值*/ set srt = srt + 1; end while; end $delimiter ;call init_users_part(); 3.3 同步数据至完整表中 mysql>Insert into users select * from users_part; / / copy 1000w data to the full unpartitioned table users Query OK, 10000000 rows affected (51.59 sec) Records: 10000000 Duplicates: 0 Warnings: 0
3.4 Test the efficiency of executing SQL
Mysql > select count (*) from users_part where `room` > '1986-01-01' and `room`
< '1986-12-31';+----------+| count(*) |+----------+| 976324 |+----------+1 row in set (0.335 sec)mysql>Select count (*) from users where `room` > '1986-01-01' and `room`
< '1986-12-31';+----------+| count(*) |+----------+| 976324 |+----------+1 row in set (5.187 sec) 结果比较清晰,分区表的执行效率确实比较高,执行时间是未分区表 1/10 都不到。 3.5 使用Explain执行计划分析 mysql>Explain select count (*) from users_part where `room` > '1986-01-01' and `room`
< '1986-12-31';+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-------------+| 1 | SIMPLE | users_part | p7 | ALL | NULL | NULL | NULL | NULL | 987769| 100.00 | Using where |+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-------------+1 row in set, 1 warning (0.00 sec)mysql>Explain select count (*) from users where `room` > '1986-01-01' and `room`
< '1986-12-31';+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+| 1 | SIMPLE | users | NULL | ALL | NULL | NULL | NULL | NULL |10000000 | 100.00 | Using where |+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+1 row in set, 1 warning (0.00 sec) 这边关注两个关键参数:一个 是partitions,users_part中是p7,说明数据检索在第七分区中,users表是null的,说明是全区域扫描,无分区。 另外一个参数是rows,是预测扫描的行数,users表明显是全表扫描。 3.6 建索引提效 因为我们使用birth字段进行分区和条件查询,所以这边尝试在birth字段上简历索引进行效率优化。 mysql>Create index idx_user on users (birth); Query OK, 0 rows affected (1 min 7.04 sec) Records: 10000000 Duplicates: 0 Warnings: 0mysql > create index idx_user_part on users_part (birth); Query OK, 0 rows affected (1 min 1.05 sec) Records: 10000000 Duplicates: 0 Warnings: 0
List of database file sizes after index creation:
2008-05-24 09:23 8608 no_part_tab.frm2008-05-24 09:24 255999996 no_part_tab.MYD2008-05-24 09:24 81611776 no_part_tab.MYI2008-05-24 09:25 0 part_tab#P#p0.MYD2008-05-24 09:26 1024 part_tab#P#p0.MYI2008-05-24 09:26 2555 0656 part_tab # P#p1.MYD2008-05-24 09:26 8148992 part_tab#P#p1.MYI2008-05-24 09:26 25620192 part_tab#P#p10.MYD2008-05-24 09:26 8170496 part_tab#P#p10.MYI2008-05-24 09:25 0 part_tab#P#p11.MYD2008-05-24 09:26 1024 part_tab#P#p11.MYI2008-05-24 09:26 25656512 part_tab#P#p2.MYD2008-05-24 09:26 8181760 part_tab#P#p2.MYI2008-05-24 09:26 25586880 part_tab#P#p3.MYD2008-05-24 09:26 8160256 part_tab#P#p3.MYI2008-05-24 09:26 25585696 part_tab#P#p4.MYD2008-05-24 09:26 8159232 part_tab#P#p4.MYI2008- 05-24 09:26 25585216 part_tab#P#p5.MYD2008-05-24 09:26 8159232 part_tab#P#p5.MYI2008-05-24 09:26 25655740 part_tab#P#p6.MYD2008-05-24 09:26 8181760 part_tab#P#p6.MYI2008-05-24 09:26 25586528 part_tab#P#p7.MYD2008-05-24 09:26 8160256 part_tab#P#p7 .MYI 2008-05-24 09:26 25586752 part_tab#P#p8.MYD2008-05-24 09:26 8160256 part_tab#P#p8.MYI2008-05-24 09:26 25585824 part_tab#P#p9.MYD2008-05-24 09:26 8159232 part_tab#P#p9.MYI2008-05-24 09:25 8608 part_tab.frm2008-05-24 09:25 68 part_tab.par
Test SQL performance again
Mysql > select count (*) from users_part where `room` > '1986-01-01' and `room`
< '1986-12-31';+----------+| count(*) |+----------+| 976324 |+----------+1 row in set (0.171 sec)mysql>Select count (*) from users where `room` > '1986-01-01' and `room`
< '1986-12-31';+----------+| count(*) |+----------+| 976324 |+----------+1 row in set (0.583 sec) 这边可以看到,在关键的字段添加索引并重启(net stop mysql,net start mysql)之后,分区的表性能有略微提升。而未分区的全表性能提升最明显,几乎接近分区的效率。 3.7 跨区执行效率分析 通过上面的分析可以看出,在单个区内执行,比不分区效率又很明显的差距,这是因为分区之后扫描非范围缩小了。 那如果我们上面条件增加出生年份的范围,让他产生跨区域的情况,效果会怎么样呢,我们测试一下。 mysql>Select count (*) from users_part where `room` > '1986-01-01' and `room`
< '1987-12-31';+----------+| count(*) |+----------+| 976324 |+----------+1 row in set (1.914 sec)mysql>Select count (*) from users where `room` > '1986-01-01' and `room` <' 1987-12-31 row in set row in set (3.871 sec) | count (*) | +-+ | 976324 |
It can be seen that the performance will be worse after crossing the zone. It should be understood here that the more cross-zone, the worse the performance, so when doing partition design, you should be aware of avoiding that kind of frequent cross-zone situation and carefully judge the partition boundary conditions.
3.8 Summary
1. Partitioned and unpartitioned files occupy roughly the same file space (data and index files)
2. When the key fields in the query statement are not indexed, the partitioning time is much better than the unpartitioned time.
3. If the fields in the query statement are indexed, the difference between partitioned and unpartitioned is reduced, but it is still better than unpartitioned, and this advantage will become more obvious as the amount of data increases.
4. For a large amount of data, it is recommended to use the partitioning function, regardless of whether it has established an index or not.
5. According to the MySQL manual, adding myisam_max_sort_file_size increases partition performance (maximum temporary file size allowed for mysql re-indexing)
6. When designing the partition, carefully judge the boundary conditions of the partition to avoid excessive and frequent cross-zone operations, otherwise the performance will not be ideal.
4 detailed explanation of partition strategy 4.1 HASH (hash)
HASH partitions are mainly used to ensure that data is evenly distributed among a predetermined number of partitions, while in RANGE and LIST partitions, you must clearly specify in which partition a given column value or set of column values should be stored
In the HASH partition, MySQL does this work automatically
All you have to do is specify a column value or expression based on the column value to be hashed, and specify the number of partitions that the partitioned table will be divided into. Examples are as follows:
/ * Hash*/drop table if EXISTS `t_ userinfo` CREATE TABLE `tuserinfo` (`id` int (10) unsigned NOT NULL, `personcode` varchar (20) DEFAULT NULL, `personname` varchar (100) DEFAULT NULL, `depcode` varchar (100) DEFAULT NULL, `depname` varchar (500) DEFAULT NULL, `gwcode` int (11) DEFAULT NULL, `gwname` varchar (200) DEFAULT NULL, `gravalue` varchar (20) DEFAULT NULL, `createtime` DateTime NOT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf8PARTITION BY HASH (YEAR (createtime)) PARTITIONS 4 (PARTITION P0 DATA DIRECTORY ='/ data0/data' INDEX DIRECTORY ='/ data0/idx') PARTITION P1 DATA DIRECTORY ='/ data1/data' INDEX DIRECTORY ='/ data1/idx', PARTITION P2 DATA DIRECTORY ='/ data2/data' INDEX DIRECTORY ='/ data2/idx', PARTITION P3 DATA DIRECTORY ='/ data3/data' INDEX DIRECTORY ='/ data3/idx')
In the above example, we use the HASH function to HASH the createtime date, and partition the data according to that date, which is divided into 10 partitions.
Add a "PARTITION BY HASH (expr)" clause to the table statement, where "expr" is an expression that returns an integer, either the name of a column with a field type of MySQL integer or an expression that returns a non-negative number.
In addition, you may need to add a "PARTITIONS num" clause later, where num is a non-negative integer that represents the number of partitions the table will be divided into.
Each partition has its own separate directory for data and index files, and the physical disk partition in which these directories are located may also be completely independent, which can improve disk IO throughput.
4.2 RANGE (range)
Based on column values belonging to a given contiguous interval, multiple rows are assigned to the same partition, which are contiguous and cannot overlap each other, and are defined using the VALUES LESS THAN operator. Examples are as follows:
/ * Range*/drop table if EXISTS `t_ userinfo` CREATE TABLE `tuserinfo` (`id` int (10) unsigned NOT NULL, `personcode` varchar (20) DEFAULT NULL, `personname` varchar (100) DEFAULT NULL, `depcode` varchar (100) DEFAULT NULL, `depname` varchar (500) DEFAULT NULL, `gwcode` int (11) DEFAULT NULL, `gwname` varchar (200) DEFAULT NULL, `gravalue` varchar (20) DEFAULT NULL, `createtime` DateTime NOT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf8PARTITION BY RANGE (gwcode) (PARTITION P0 VALUES LESS THAN (101) DIRECTORY ='/ data0/data' INDEX DIRECTORY ='/ data0/idx') PARTITION P1 VALUES LESS THAN DIRECTORY ='/ data1/data' INDEX DIRECTORY ='/ data1/idx',PARTITION P2 VALUES LESS THAN (301) DIRECTORY ='/ data2/data' INDEX DIRECTORY ='/ data2/idx',PARTITION P3 VALUES LESS THAN MAXVALUE DIRECTORY ='/ data3/data' INDEX DIRECTORY ='/ data3/idx')
In the above example, the range RANGE function is used to partition the job number, which is divided into 4 partitions
The number of the post with the number of 1x 100 is in partition P0, the number of 101 by 200 is in partition P1, and so on. If the category number is greater than 300, you can use MAXVALUE to store the data greater than 300 in partition P3.
Each partition has its own separate directory for data and index files, and the physical disk partition in which these directories are located may also be completely independent, which can improve disk IO throughput.
4.3 LIST (predefined list)
Similar to partitioning by RANGE, the difference is that LIST partitions select partitions based on column values matching a value in a collection of discrete values. LIST partitioning is implemented by using "PARTITION BY LIST (expr)", where "expr" is a column value or an expression based on a column value and returns an integer value
Each partition is then defined as "VALUES IN (value_list)", where "value_list" is a comma-separated list of integers. Examples are as follows:
/ * List*/drop table if EXISTS `t_ userinfo` CREATE TABLE `tuserinfo` (`id` int (10) unsigned NOT NULL, `personcode` varchar (20) DEFAULT NULL, `personname` varchar (100) DEFAULT NULL, `depcode` varchar (100) DEFAULT NULL, `depname` varchar (500) DEFAULT NULL, `gwcode` int (11) DEFAULT NULL, `gwname` varchar (200) DEFAULT NULL, `gravalue` varchar (20) DEFAULT NULL, `createtime` DateTime NOT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf8PARTITION BY LIST (`gwcode`) (PARTITION P0 VALUES IN (4681 7789) DATA DIRECTORY ='/ data0/data' INDEX DIRECTORY ='/ data0/idx') PARTITION P1 VALUES IN (106125177) DATA DIRECTORY ='/ data1/data' INDEX DIRECTORY ='/ data1/idx',PARTITION P2 VALUES IN (205219289) DATA DIRECTORY ='/ data2/data' INDEX DIRECTORY ='/ data2/idx',PARTITION P3 VALUES IN (302317458509610) DATA DIRECTORY ='/ data3/data' INDEX DIRECTORY ='/ data3/idx')
In the above example, the list matching LIST function is used to partition the employee position number, which is divided into four partitions. The corresponding number is 46, 777, 89 in partition P0, 106125177 category in partition P1, and so on.
Unlike RANGE, the data of the LIST partition must match the job number in the list before it can be partitioned, so this approach is only suitable for comparing interval values and a small number of cases.
Each partition has its own separate directory for data and index files, and the physical disk partition in which these directories are located may also be completely independent, which can improve disk IO throughput.
4.4 KEY (key value)
Similar to partitioning by HASH, the difference is that KEY partitions only support calculating one or more columns, and the MySQL server provides its own hash function. One or more columns must contain integer values. Examples are as follows:
/ * key*/drop table if EXISTS `t_ userinfo` CREATE TABLE `tuserinfo` (`id` int (10) unsigned NOT NULL, `personcode` varchar (20) DEFAULT NULL, `personname` varchar (100) DEFAULT NULL, `depcode` varchar (100) DEFAULT NULL, `depname` varchar (500) DEFAULT NULL, `gwcode` int (11) DEFAULT NULL, `gwname` varchar (200) DEFAULT NULL, `gravalue` varchar (20) DEFAULT NULL, `createtime` DateTime NOT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf8PARTITION BY KEY (gwcode) PARTITIONS 4 (PARTITION P0 DATA DIRECTORY ='/ data0/data' INDEX DIRECTORY ='/ data0/idx') PARTITION P1 DATA DIRECTORY ='/ data1/data' INDEX DIRECTORY ='/ data1/idx', PARTITION P2 DATA DIRECTORY ='/ data2/data' INDEX DIRECTORY ='/ data2/idx', PARTITION P3 DATA DIRECTORY ='/ data3/data' INDEX DIRECTORY ='/ data3/idx')
Note: at present, this partitioning algorithm is rarely used, and the hash function provided by the server is uncertain, and it will be more complex for later data statistics and collation, so we prefer to use the Hash that we define the expression, as long as we know its existence and how to use it.
4.5 nested partitions (subzones)
A nested partition (subpartition) is a re-partition for each partition in a partition table of type RANGE/LIST. The second split can be of a type such as HASH/KEY.
Drop table if EXISTS `t _ userinfo` CREATE TABLE `tuserinfo` (`id` int (10) unsigned NOT NULL, `personcode` varchar (20) DEFAULT NULL, `personname` varchar (100) DEFAULT NULL, `depcode` varchar (100) DEFAULT NULL, `depname` varchar (500) DEFAULT NULL, `gwcode` int (11) DEFAULT NULL, `gwname` varchar (200) DEFAULT NULL, `gravalue` varchar (20) DEFAULT NULL, `createtime` DateTime NOT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf8PARTITION BY RANGE (id) SUBPARTITION BY HASH (id% 4) SUBPARTITIONS 2 (PARTITION p0 VALUES LESS THAN (5000000) DATA DIRECTORY ='/ data0/data' INDEX DIRECTORY ='/ data0/idx') PARTITION p1 VALUES LESS THAN MAXVALUE DATA DIRECTORY ='/ data1/data' INDEX DIRECTORY ='/ data1/idx')
As above, the RANGE partition is subpartitioned again, and the subpartition is of the HASH type.
5 Partition management
5.1 deleting Partition
/ * Delete partition P1roomram 2 ALERT TABLE users_part DROP PARTITION P1
5.2 rebuild Partition
5.2.1 RANGE Partition Reconstruction
/ * merge the original P0pl partition into the new P0 partition, and reset the condition to less than 5000000. * / ALTER TABLE users_part REORGANIZE PARTITION P0 P0 P1 INTO (PARTITION P0 VALUES LESS THAN (5000000))
Used for merging cases caused by excessive waste of space.
5.2.2 LIST partition reconstruction
/ * merge the original P0 P1 partition and put it into the new P0 partition, which is a bit similar to the previous one. * / ALTER TABLE users_part REORGANIZE PARTITION p0 INTO p1 INTO (PARTITION p0 VALUES IN)
5.2.3 HASH/KEY partition reconstruction [code]
/ * the number of partitions rebuilt by REORGANIZE has been changed to 2, where the number can only be reduced but not increased. If you want to add more, you can use the ADD PARTITION method. * / ALTER TABLE users_part REORGANIZE PARTITION COALESCE PARTITION 2
5.3 New Partition
5.3.1 add RANGE partition
/ * add a new RANGE partition * / ALTER TABLE category ADD PARTITION (PARTITION p4 VALUES IN (16, data8/data' INDEX DIRECTORY, 17, and 18) DATA DIRECTORY ='/ data8/data' INDEX DIRECTORY ='/ data8/idx')
5.3.2 add HASH/KEY Partition
/ * expand the total number of partitions to n. N Please replace * / ALTER TABLE users_part ADD PARTITION PARTITIONS n with a numeric value
5.3.3 Partition existing tables
Alter tableuser_part partition by RANGE (month (birth)) (PARTITION p0 VALUES LESS THAN (1), PARTITION p1 VALUES LESS THAN (2), PARTITION p2 VALUES LESS THAN (3), PARTITION p3 VALUES LESS THAN (4), PARTITION p4 VALUES LESS THAN (5), PARTITION p5 VALUES LESS THAN (6), PARTITION p6 VALUES LESS THAN (7), PARTITION p7 VALUES LESS THAN (8), PARTITION p8 VALUES LESS THAN (9), PARTITION p9 VALUES LESS THAN (10), PARTITION p10 VALUES LESS THAN (11), PARTITION p11 VALUES LESS THAN (12) PARTITION P12 VALUES LESS THAN (13))
6 remove the partition primary key restriction
The default partition restriction partition field must be part of the primary key (PRIMARY KEY), and this restriction needs to be removed.
If there is a primary key in the table, the following prompt will be reported: A PRIMARY KEY must include all columns in the table's partitioning function (prefixed columns are not considered).
One solution is to use the primary key as a partitioning condition:
ALTER TABLE users_part PARTITION BY HASH (id) PARTITIONS 4
Another way is to add the partition condition field to the primary key and become the federated primary key. As follows, id and gwcode form the joint primary key:
Alter table users_part drop PRIMARY KEY; alter table users_part add PRIMARY KEY (id, gwcode); these are all the contents of the article "how MySQL uses Partition to achieve horizontal partitioning". Thank you for reading! I believe we all have a certain understanding, hope to share the content to help you, if you want to learn more knowledge, welcome to follow the industry information channel!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.