Example Analysis of mysql Sub-Table Partition 07/19 Update SLTechnology News&Howtos

Example Analysis of mysql Sub-Table Partition

2025-07-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article introduces the example analysis of mysql sub-table partition, the content is very detailed, interested friends can refer to, hope to be helpful to you.

In the face of today's big data storage, imagine that when the total record of a table in mysql exceeds 1000W, will there be a significant decline in performance?

The answer is yes, the total record of a table is more than 1000W, and retrieval at the operating system level is also very inefficient.

Solution:

At present, there are two methods to optimize massive data:

1. The way of separating small tables from large tables (mainly including sub-table and partition technology)

(1) Sub-meter technology

Vertical segmentation

Advantage: reduce locking on tables in the case of high concurrency.

Deficiency: for a single table, as the number of database records increases, the read and write pressure will further increase.

Horizontal division

If the IO pressure of a single table is large, horizontal segmentation can be considered. The principle is that through the hash algorithm, a table is divided into N pages, and the position of each page is recorded through a new table (total table). If a portal site, its database table has reached 10 million records, then if you query through select, it must be inefficient (without indexing). In order to reduce the IO pressure of reading and writing a single table, split the table into 10 pages by split horizontally, and generate a total table at the same time, recording the information of each page, then if I query a record of id=100, it no longer needs a full table scan, but find the record on which corresponding page through the summary table, and then go to the corresponding page to do retrieval, thus reducing the IO pressure.

Horizontal split table technology is to split a table into multiple tables, the more common way is to split the records in the table according to a certain HASH algorithm, at the same time, this partitioning method must also modify the SQL in the front-end application before it can be used, and for a SQL statement, two tables may be modified, so you have to modify two SQL statements to complete your logical transaction It will make logical judgments more and more complex, which will increase the maintenance cost of the program, so we should avoid this situation.

2. Optimization of SQL statement (index)

SQL statement optimization: it can be adjusted by adding indexes, etc., but at the same time, the increase of the amount of data will lead to an increase in the cost of index maintenance.

Zoning advantages:

1. Reduce IO

2. Improve reading and writing

3. Facilitate data management

The difference between partitions and subtables:

The partition is split horizontally at the logical level, and it is still a table for the application.

Partitioning is to divide the data of a table into N blocks, which can be on the same disk or on different disks.

1. In the way of realization

(1) the sub-table of mysql is a real sub-table. After a table is divided into many tables, each small table is a complete table, corresponding to three files, a .MYD data file, a .MYI index file, and a .frm table structure file.

[root@BlackGhost test] # ls | grep user

Alluser.MRG

Alluser.frm

User1.MYD

User1.MYI

User1.frm

User2.MYD

User2.MYI

User2.frm

To explain briefly, the above sub-table is the use of merge storage engine (a kind of sub-table), alluser is the total table, below there are two sub-tables, user1,user2. Both of them are independent tables, and when we take the data, we can get it through the general table. There are no .MYD and .MYI files in the master table here, that is to say, the master table is not a table, there is no data, and the data is put in the sub-table. Let's take a look at what the MRG is.

[root@BlackGhost test] # cat alluser.MRG | more

User1

User2

# INSERT_METHOD=LAST

As we can see from the above, there are some sub-table relationships stored in alluser.MRG, as well as the way to insert data. You can think of a summary table as a shell, or a connection pool.

(2) Partition is different. After a large table is partitioned, it is still one table and will not become two tables, but it has more blocks to store data.

[root@BlackGhost test] # ls | grep aa

Aa#P#p1.MYD

Aa#P#p1.MYI

Aa#P#p2.MYD

Aa#P#p2.MYI

Aa#P#p3.MYD

Aa#P#p3.MYI

Aa.frm

Aa.par

As we can see from the above, the aa table is divided into three sections. We all know that a table corresponds to three files. MYD, .MYI, .frm. The partition divides the data file and the index file according to certain rules, and there is an extra .par file. After opening the .par file, you can see that he recorded the partition information of this table, similar to the .MRG in the sub-table. After partitioning, it is still one table, not multiple tables.

two。 In data processing

(1) after the sub-table, the data is stored in the sub-table, the master table is just a shell, and the access to the data occurs in a sub-table. Look at the following example:

On the surface, select * from user1 user2 where id='12' operates on the table alluser, but in fact it is not. It is the operation of the sub-table in alluser.

(2) Partition, there is no concept of sub-table, partition only divides the file storing data into many small pieces, the partitioned table is still a table. Data processing is still done by yourself.

Select * from alluser where id='12'

3. To improve the performance

The main results are as follows: (1) after dividing the table, the concurrency ability of single table is improved, and so is the performance of disk Iripple O. Because it takes less time for a query, if there is high concurrency, the total table can divide the concurrent pressure into different small tables according to different queries. A very large .MYD file is now apportioned to the .MYD of each small table, so the pressure on the disk IO is reduced.

(2) mysql put forward the concept of partition. I think I just want to break through the bottleneck of disk I / G O and improve the read and write ability of disk to increase mysql performance.

At this point, the emphasis of partition and sub-table is different, the focus of sub-table is how to improve the concurrency ability of mysql when accessing data, while partition, how to break through the read and write ability of disk, so as to improve the performance of mysql.

4. In terms of the difficulty of realization

(1) there are many ways to divide a table, and using merge to divide a table is the simplest way. This way the root partition is about as easy as it is, and it can be transparent to the program code. If you use other sub-table methods, it will be more troublesome than partitioning.

(2) the implementation of partitioning is relatively simple, establishing a partition table is no different from building a normal table, and it is transparent to the open code end.

Partition Typ

Hash 、 range 、 list 、 key

RANGE partition: assigns multiple rows to a partition based on the column values of a given contiguous interval.

LIST partitions: similar to partitioning by RANGE, except that LIST partitions are selected based on column values matching a value in a set of discrete values.

HASH partition: a partition that is selected based on the return value of a user-defined expression that uses the column values of the rows to be inserted into the table. This function can contain any valid expression in MySQL that produces a non-negative integer value.

Hash is used when the data is relatively random. It is randomly and evenly distributed according to the contents of the table after the hash operation. Assuming that the column is gender, it is not suitable to use hash partition, because the content is either male or female, and there is no randomness.

KEY partitions: similar to partitioning by HASH, except that KEY partitions only support computing one or more columns, and the MySQL server provides its own hash function. One or more columns must contain integer values. -rarely used

How do I check if the database supports partitioning technology?

Create a partition:

Mysql > create table T1 (id int) partition by hash (id) partitions 3

Query OK, 0 rows affected (0.03 sec)

[experiment]

Create a partitioned table and a non-partitioned table for performance testing

Create a partition table

Mysql > create table part_tab (C1 int default NULL, c2 varchar (30) default null, c3 date default null) engine=myisam

-> partition by range (year (c3)) (

-> partition p0 values less than (1995)

-> partition p1 values less than (1996)

-> partition p2 values less than (1997)

-> partition p3 values less than (1998)

-> partition p4 values less than (1999)

-> partition p5 values less than (2000)

-> partition p6 values less than (2001)

-> partition p7 values less than (2002)

-> partition p8 values less than (2003)

-> partition p9 values less than (2004)

-> partition p10 values less than (2010)

-> partition p11 values less than MAXVALUE)

Query OK, 0 rows affected (0.14 sec)

Create a non-partitioned table

Mysql > create table no_part_tab (C1 int default NULL, c2 varchar (30) default null, c3 date default null) engine=myisam

Query OK, 0 rows affected (0.11 sec)

Mysql >\ d / / # since stored procedures are used below, you need to change the Terminator to "/ /". The so-called stored procedure is actually a collection of sql statements.

Mysql > create procedure load_part_tab ()

-> begin

-> declare v int default 0

-> while v

< 8000000 ->

-> insert into part_tab

-> values (v values testing partitions',adddate ('1995-01-01), (rand (v) * 36520) mod 3652))

-> set v = VIP1

-> end while

-> end

-> / /

Query OK, 0 rows affected (0.04 sec)

Mysql >\ d; / / after executing this stored procedure, you need to modify the Terminator back

The above stored procedure is actually designed to create a large amount of data (8 million items)

Mysql > call load_part_tab (); / / call the stored procedure load_part_tab

Query OK, 1 row affected (9 min 18.95 sec)

Quickly insert the data from part_tab into no_part_tab.

Mysql > insert no_part_tab select * from part_tab

Query OK, 8000000 rows affected (8.97 sec)

Records: 8000000 Duplicates: 0 Warnings: 0

Test 1:

Make sure the data in the two tables are consistent before the experiment! Ensure the comparability of the experiment

Mysql > select count (*) from part_tab where c3 > date '1995-01-01' and c3

< date '1995-12-31'; +----------+ | count(*) | +----------+ | 795181 | +----------+ 1 row in set (0.49 sec) mysql>

Select count (*) from no_part_tab where c3 > date '1995-01-01' and c3

< date '1995-12-31'; +----------+ | count(*) | +----------+ | 795181 | +----------+ 1 row in set (3.94 sec) mysql>

Desc select count (*) from part_tab where c3 > date '1995-01-01' and c3

< date '1995-12-31'\G; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: part_tab type: ALL //全表扫描 possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 798458 Extra: Using where 1 row in set (0.09 sec) ERROR: No query specified mysql>

Desc select count (*) from no_part_tab where c3 > date '1995-01-01' and c3

< date '1995-12-31'\G; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: no_part_tab type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 8000000 Extra: Using where 1 row in set (0.00 sec) ERROR: No query specified 结论：可以看到，做了分区之后，只需要扫描79万条语句，而不做分区的，则需要进行全表扫描，故可以看出，做了分区技术后，可以提高读写效率。测试2：创建索引，查看语句执行情况 mysql>

Create index idx_c3 on no_part_tab (c3)

Query OK, 8000000 rows affected (32.68sec)

Records: 8000000 Duplicates: 0 Warnings: 0

Result analysis:

Mysql > desc select count (*) from no_part_tab where c3 > date '1995-01-01' and c3

< date '1995-12-31'\G; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: NO_part_tab type: range possible_keys: idx_c3 key: idx_c3 key_len: 4 ref: NULL rows: 785678 Extra: Using where; Using index 1 row in set (0.16 sec) ERROR: No query specified 结论：为未分区的表创建了索引之后，再次执行相同的语句，可以看到该SQL语句是根据range索引进行检索，而不是全表扫描了。明显效率也提高了。测试3：测试做索引与未作索引的读写效率。 mysql>

Create index idx_c3 on part_tab (c3)

Query OK, 8000000 rows affected (31.85 sec)

Records: 8000000 Duplicates: 0 Warnings: 0

Mysql > desc select count (*) from part_tab where c3 > date '1995-01-01' and c3

< date '1995-12-31'\G; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: part_tab type: index possible_keys: idx_c3 key: idx_c3 key_len: 4 ref: NULL rows: 798458 Extra: Using where; Using index 1 row in set (0.14 sec) ERROR: No query specified 测试未创建索引字段 mysql>

Select count (*) from no_part_tab where c3 > date '1995-01-01' and c3

< date '1995-12-31' and c2='hello'; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (4.90 sec) 结论：可以看到如果没通过索引进行检索所耗费的时间将长于通过索引进行检索。测试4：删除 mysql>

Delete from part_tab where c3 > date '1995-01-01' and c3

< date '1995-12-31'; Query OK, 795181 rows affected (14.02 sec) mysql>

Delete from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31'

Query OK, 795181 rows affected (15.21 sec)

As you can see, in terms of deletion, those with partitions are still faster than those without partitions. Thus it reflects the characteristic that it is convenient for data management.

This is the end of the example analysis of mysql sub-table partition. I hope the above content can be helpful to you and learn more knowledge. If you think the article is good, you can share it for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.