What is the difference between a sub-table and a partition in MySQL

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains in detail the difference between sub-tables (splitting one table into several) and partitions in MySQL.

First, what are MySQL sub-tables and partitions

What is a sub-table? Put simply, it means splitting one table into N smaller tables. The three methods of splitting tables in MySQL are described later in this section.

What is a partition? Partitioning divides the data of one table into N blocks. These blocks can be on the same disk or on different disks.

First, why we need to split tables

When a table reaches millions of rows, every query takes longer, and a join query may well hang. The purpose of splitting tables is to reduce the load on the database and shorten query time.

Based on personal experience, MySQL executes an SQL statement roughly as follows:

1. receive the SQL; 2. put the SQL in the execution queue; 3. execute the SQL; 4. return the result. Where is most of the time spent in this process? First, waiting in the queue; second, executing the SQL. These two are really the same thing: while one statement waits, another must be executing. So what we need to shorten is the execution time of the SQL.

MySQL has two locking mechanisms, table locking and row locking. Why do they exist? To ensure data integrity. For example, if two SQL statements need to modify the same data in the same table, what should happen? Can both modify it at the same time? Clearly not. MySQL handles this in one of two ways: table locking (the MyISAM storage engine) or row locking (the InnoDB storage engine). Table locking means that no one else can operate on the table until I have finished with it. Row locking is similar: other statements must wait until I have finished with a row before they can operate on that row. If there is too much data, each statement takes longer to execute and the others wait longer, which is why we split tables.

Second, ways to split tables

1. Build a MySQL cluster, for example with MySQL Cluster, MySQL Proxy, MySQL replication, DRBD, and so on.

Some people will ask: what does a MySQL cluster have to do with splitting tables? It is not table splitting in the strict sense, but it achieves the same effect. What is the point of a cluster? To lighten the load on each database; put bluntly, to reduce the number of statements in the SQL queue. For example, if 10 SQL requests are all placed in the queue of one database server, they wait a long time. If the 10 requests are spread over the queues of 5 database servers, each server has only 2 in its queue, so the waiting time is much shorter. That is why I count clustering as a form of table splitting. I have written about some MySQL cluster setups:

Installation, configuration, and read-write splitting with MySQL Proxy on Linux

Installation and configuration of MySQL replication (master and slave), and data synchronization
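The queue arithmetic in the cluster example (10 requests, 5 servers, 2 per queue) can be sketched in a few lines of Python. This is only an illustration: the server names `db1` to `db5`, the request labels, and the round-robin policy are assumptions, not something a particular proxy is claimed to do.

```python
from collections import defaultdict
from itertools import cycle


def distribute(requests, servers):
    """Assign each request to a server queue in round-robin order."""
    queues = defaultdict(list)
    for request, server in zip(requests, cycle(servers)):
        queues[server].append(request)
    return dict(queues)


requests = [f"sql_{i}" for i in range(10)]   # ten pending statements (hypothetical)
servers = [f"db{i}" for i in range(1, 6)]    # five database servers (hypothetical)

queues = distribute(requests, servers)
for server, queue in queues.items():
    print(server, len(queue), queue)
```

Each of the five queues ends up holding two statements instead of one queue holding ten.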

Advantages: good scalability, and none of the complex operations (in the PHP code) that come with multiple sub-tables.

Disadvantages: the amount of data in a single table stays the same, so a single operation still takes just as long, and the hardware cost is high.

2. Estimate in advance which tables will be large and frequently accessed, and split them into several tables

This estimate is usually not far off. The table of posts in a forum will certainly grow very large over time, to hundreds of thousands or millions of rows. The message table of a chat room, where dozens of people chat all evening, will also become very large over time. There are many cases like this. For tables that can be predicted to grow large, we split them into N tables in advance; what N should be depends on the actual situation. Take the chat message table as an example:

I create a hundred such tables in advance: message_00, message_01, message_02, ..., message_98, message_99. Then the user's ID determines which table that user's chat messages go into. You can use a hash, or take a remainder; there are many ways, and everyone can pick their own. Here the table name is obtained with a hash:

To explain: the hash method maps each user to one table, so the messages of user18991 are recorded in the message_10 table and the messages of user34523 are recorded in the message_13 table. When reading, you just read from the corresponding table.
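A minimal Python sketch of this kind of mapping is shown here. It assumes CRC32 of the user ID taken modulo the number of tables; the function name `message_table` and the modulo step are assumptions for illustration, and since the author's exact derivation is not shown, the table numbers it produces need not match the ones quoted above.

```python
import zlib


def message_table(user_id: str, table_count: int = 100) -> str:
    """Map a user ID to one of the pre-created message_NN tables."""
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    checksum = zlib.crc32(user_id.encode("utf-8"))
    return f"message_{checksum % table_count:02d}"


for uid in ("user18991", "user34523"):
    print(uid, "->", message_table(uid))
```

The same user ID always yields the same table name, which is what lets reads and writes for one user go to a single small table.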

Advantages: avoids millions of rows in a single table and shortens the execution time of each SQL statement.

Disadvantages: once a rule is fixed, changing it is troublesome. The hash algorithm used in the example above is crc32. If I later switch to md5, the messages of the same user will be stored in a different table, and the data will be scattered. Scalability is poor.
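The cost of changing the rule can be made concrete with a small Python sketch. It assumes both algorithms are reduced to a table number with a modulo over 100 tables, and it uses made-up user IDs; the helper names are hypothetical.

```python
import hashlib
import zlib


def table_by_crc32(user_id: str, table_count: int = 100) -> int:
    """Table number derived from the CRC32 checksum of the user ID."""
    return zlib.crc32(user_id.encode("utf-8")) % table_count


def table_by_md5(user_id: str, table_count: int = 100) -> int:
    """Table number derived from the MD5 digest of the user ID."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % table_count


user_ids = [f"user{i}" for i in range(1000)]  # made-up IDs for illustration
moved = [u for u in user_ids if table_by_crc32(u) != table_by_md5(u)]
print(f"{len(moved)} of {len(user_ids)} users would land in a different table")
```

Every user counted in `moved` would need their existing rows migrated before the new rule could be used.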

3. Use the MERGE storage engine to split tables

I think this method suits those who did not plan ahead and have already run into slow queries. At that point, splitting an existing large table is painful, and the most painful part is changing the code: the SQL statements in the program are already written, and now one table becomes dozens or even hundreds of tables, so do all the SQL statements have to be rewritten? Let me illustrate with an example.

Run show engines; at the mysql> prompt and you will see MRG_MYISAM in the list; this is the MERGE engine.

mysql> CREATE TABLE IF NOT EXISTS `user1` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `name` varchar(50) DEFAULT NULL,
    ->   `sex` int(1) NOT NULL DEFAULT '0',
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
Query OK, 0 rows affected (0.05 sec)

mysql> CREATE TABLE IF NOT EXISTS `user2` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `name` varchar(50) DEFAULT NULL,
    ->   `sex` int(1) NOT NULL DEFAULT '0',
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO `user1` (`name`, `sex`) VALUES ('Zhang Ying', 0);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `user2` (`name`, `sex`) VALUES ('tank', 1);
Query OK, 1 row affected (0.00 sec)

mysql> CREATE TABLE IF NOT EXISTS `alluser` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `name` varchar(50) DEFAULT NULL,
    ->   `sex` int(1) NOT NULL DEFAULT '0',
    ->   INDEX (id)
    -> ) TYPE=MERGE UNION=(user1,user2) INSERT_METHOD=LAST AUTO_INCREMENT=1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> select id,name,sex from alluser;
+----+------------+-----+
| id | name       | sex |
+----+------------+-----+
|  1 | Zhang Ying |   0 |
|  1 | tank       |   1 |
+----+------------+-----+
2 rows in set (0.00 sec)

mysql> INSERT INTO `alluser` (`name`, `sex`) VALUES ('tank2', 0);
Query OK, 1 row affected (0.00 sec)

mysql> select id,name,sex from user2;
+----+-------+-----+
| id | name  | sex |
+----+-------+-----+
|  1 | tank  |   1 |
|  2 | tank2 |   0 |
+----+-------+-----+
2 rows in set (0.00 sec)

I don't know whether you noticed anything in the operations above. Suppose I have a user table, user, with 500,000 rows, and I want to split it into two tables, user1 and user2, each with 250,000 rows.

INSERT INTO user1 (id, name, sex) SELECT user.id, user.name, user.sex FROM user WHERE user.id <= 250000;
INSERT INTO user2 (id, name, sex) SELECT user.id, user.name, user.sex FROM user WHERE user.id > 250000;
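The split by ID range can be tried out at small scale with the sketch below. It uses Python's built-in sqlite3 module with an in-memory database rather than MySQL, ten rows instead of 500,000, and a boundary of 5 instead of 250,000; all of these are stand-ins chosen so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
cur = conn.cursor()

columns = "(id INTEGER PRIMARY KEY, name TEXT, sex INTEGER NOT NULL DEFAULT 0)"
for table in ("user", "user1", "user2"):
    cur.execute(f'CREATE TABLE "{table}" {columns}')

# Ten sample rows stand in for the 500,000 rows of the original table.
cur.executemany(
    'INSERT INTO "user" (id, name, sex) VALUES (?, ?, ?)',
    [(i, f"name{i}", i % 2) for i in range(1, 11)],
)

boundary = 5  # plays the role of 250000
cur.execute(
    'INSERT INTO user1 (id, name, sex) '
    'SELECT id, name, sex FROM "user" WHERE id <= ?', (boundary,))
cur.execute(
    'INSERT INTO user2 (id, name, sex) '
    'SELECT id, name, sex FROM "user" WHERE id > ?', (boundary,))
conn.commit()

counts = {
    table: cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    for table in ("user", "user1", "user2")
}
print(counts)
```

The two range conditions are complementary, so every row of the original table ends up in exactly one of the two new tables.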

In this way, I successfully split the user table into two tables. Now there is a problem: what about the SQL statements in the code? There used to be one table and now there are two, so a lot of code has to change, which means a lot of work for the programmers. Is there a good way around this? Yes: back up the original user table and drop it, then rename the alluser table created in the operations above to user. However, not every MySQL operation works on a MERGE table:

A. If you use ALTER TABLE to change the MERGE table to another table type, the mapping to the underlying tables is lost. Instead, the rows from the underlying MyISAM tables are copied into the altered table, which is then given the new type.

B. I saw some people online saying that REPLACE does not work. I tried it and it seemed to work, which left me a bit puzzled. Note that the test here uses the REPLACE() string function inside an UPDATE, not the REPLACE statement.

mysql> UPDATE alluser SET sex=REPLACE(sex, 0, 1) WHERE id=2;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from alluser;
+----+------------+-----+
| id | name       | sex |
+----+------------+-----+
|  1 | Zhang Ying |   0 |
|  1 | tank       |   1 |
|  2 | tank2      |   1 |
+----+------------+-----+
3 rows in set (0.00 sec)

C. A MERGE table cannot enforce unique constraints across the whole table. When you execute an INSERT, the data goes into the first or last MyISAM table (depending on the value of the INSERT_METHOD option). MySQL ensures that unique key values stay unique within that MyISAM table, but not across all the tables in the collection.

D. When you create a MERGE table, there is no check that the underlying tables exist and have identical structures. When the MERGE table is used, MySQL checks that the record length of every mapped table is equal, but this is not very reliable. If you create a MERGE table from dissimilar MyISAM tables, you are very likely to run into strange problems.
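Point C can be illustrated with a toy model in Python. The two classes below are hypothetical stand-ins, not MySQL internals: each underlying table keeps its own ID counter, and the merge object only forwards inserts and concatenates reads. The model is only meant to reproduce the session shown earlier, where two rows share id 1.

```python
class MyISAMTable:
    """Toy stand-in for one underlying table with its own AUTO_INCREMENT."""

    def __init__(self, name):
        self.name = name
        self.rows = []
        self.next_id = 1

    def insert(self, name, sex):
        row = {"id": self.next_id, "name": name, "sex": sex}
        self.next_id += 1
        self.rows.append(row)
        return row


class MergeTable:
    """Toy stand-in for a MERGE table: it stores no rows of its own."""

    def __init__(self, tables, insert_method="LAST"):
        self.tables = tables
        self.insert_method = insert_method

    def insert(self, name, sex):
        # INSERT_METHOD decides which underlying table receives the row.
        target = self.tables[-1] if self.insert_method == "LAST" else self.tables[0]
        return target.insert(name, sex)

    def select_all(self):
        return [row for table in self.tables for row in table.rows]


user1, user2 = MyISAMTable("user1"), MyISAMTable("user2")
user1.insert("Zhang Ying", 0)
user2.insert("tank", 1)

alluser = MergeTable([user1, user2], insert_method="LAST")
alluser.insert("tank2", 0)

ids = [row["id"] for row in alluser.select_all()]
print(ids)  # [1, 1, 2]: unique inside each table, duplicated across them
```

Uniqueness holds inside user1 and inside user2, yet the combined view contains id 1 twice, which is the limitation described in point C.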

Advantages: good scalability, and the program code changes very little.

Disadvantages: this method is somewhat less efficient than the second one.

Third, summary

Of the three methods above, I have actually used two, the first and the second. I have not used the third in practice, so I described it in more detail. Everything has its limits, and going past them makes things worse. Don't blindly build database server clusters, because hardware costs money. Don't blindly split tables either: if you split into 1,000 tables, remember that MySQL ultimately stores data as files on disk. One table corresponds to three files, so 1,000 sub-tables means 3,000 files, and retrieval becomes very slow. My advice is:

Split tables using a combination of method 1 and method 2.

Split tables using a combination of method 1 and method 3.

These two suggestions suit different situations, so choose according to your own case. I think many people will choose the combination of method 1 and method 3.

Second, what is the difference between MySQL sub-tables and partitions

1. Implementation

A) With MySQL sub-tables, the tables really are separate. After one table is split into many, each small table is a complete table with its own three files: a .MYD data file, a .MYI index file, and a .frm table structure file.

[root@BlackGhost test]# ls | grep user
alluser.MRG
alluser.frm
user1.MYD
user1.MYI
user1.frm
user2.MYD
user2.MYI
user2.frm

To explain briefly: the sub-tables above use the MERGE storage engine (one way of splitting tables). alluser is the summary table, with two sub-tables under it, user1 and user2. Both are independent tables, and we can read their data through the summary table. The summary table has no .MYD or .MYI file; in other words, it is not a real table and holds no data, and all the data lives in the sub-tables. Let's look at what the .MRG file contains.

[root@BlackGhost test]# cat alluser.MRG | more
user1
user2
#INSERT_METHOD=LAST

As we can see from the above, alluser.MRG stores the list of sub-tables and the insert method. You can think of the summary table as a shell, or as a connection pool.

B) Partitioning is different. After a large table is partitioned it is still one table; it does not become two tables, but it has more blocks in which to store data.

[root@BlackGhost test]# ls | grep aa
aa#P#p1.MYD
aa#P#p1.MYI
aa#P#p3.MYD
aa#P#p3.MYI
aa.frm
aa.par

As we can see from the above, the aa table has two partitions, p1 and p3. There were originally three partitions, but I dropped one. As noted earlier, a table corresponds to three files: .MYD, .MYI, and .frm. Partitioning splits the data file and the index file according to certain rules and adds an extra .par file. If you open the .par file, you can see that it records the table's partition information, somewhat like the .MRG file in the sub-table approach. After partitioning, it is still one table, not several.

2. Data processing

A) With sub-tables, the data is stored in the sub-tables, the summary table is just a shell, and data access actually happens in the individual sub-tables. Look at the following example:

On the surface, select * from alluser where id='12' operates on the table alluser, but in fact it does not. It operates on the sub-tables inside alluser.

B) With partitioning there is no concept of sub-tables. Partitioning only divides the files that store the data into many smaller pieces, and the partitioned table is still one table. Data processing is still done by that table itself.
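The idea in B) can be sketched with a toy Python model: one logical table whose rows live in several internal blocks. The article does not say which partitioning rule the aa table uses, so the range rule here (in the spirit of VALUES LESS THAN in MySQL range partitioning), the class name, and the bounds are all assumptions for illustration.

```python
import bisect


class PartitionedTable:
    """Toy model: one logical table whose rows are stored in range partitions."""

    def __init__(self, upper_bounds):
        # upper_bounds[i] is the exclusive upper limit of partition i.
        self.upper_bounds = list(upper_bounds)
        self.partitions = [[] for _ in self.upper_bounds]

    def _partition_for(self, row_id):
        index = bisect.bisect_right(self.upper_bounds, row_id)
        if index == len(self.upper_bounds):
            raise ValueError(f"no partition accepts id {row_id}")
        return index

    def insert(self, row_id, name):
        self.partitions[self._partition_for(row_id)].append((row_id, name))

    def select(self, row_id):
        # Only the single partition that can hold this id is scanned.
        block = self.partitions[self._partition_for(row_id)]
        return [row for row in block if row[0] == row_id]


aa = PartitionedTable([100, 200, 300])  # three hypothetical partitions
aa.insert(12, "first")
aa.insert(150, "second")

print(aa.select(12))
print([len(block) for block in aa.partitions])
```

Callers only ever talk to the one object `aa`; the routing to a block happens inside the table, which is the contrast with sub-tables drawn above.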

3. Performance improvement

A) After splitting, the concurrency that a single table can handle improves, and so does disk I/O performance. Why does concurrency improve? Because each query takes less time, and under high concurrency the summary table can spread the load across different small tables depending on the query. Why does disk I/O improve? Because what used to be one very large .MYD file is now spread across the .MYD files of the small tables.

B) MySQL introduced partitioning, I think, mainly to break through the disk I/O bottleneck and improve disk read and write throughput, thereby increasing MySQL performance.

On this point the two differ in focus: sub-tables focus on improving MySQL's concurrency when accessing data, while partitioning focuses on breaking through disk read and write limits in order to improve MySQL performance.

4. Difficulty of implementation

A) There are many ways to split a table, and using MERGE is the easiest. It is about as easy as partitioning and can be transparent to the program code. Other ways of splitting tables are more troublesome than partitioning.

B) Partitioning is relatively simple to implement. Creating a partitioned table is no different from creating a normal table, and it is transparent to the application code.

Third, what is the relationship between MySQL sub-tables and partitions

1. Both can improve MySQL performance, and both perform well under high concurrency.

2. Sub-tables and partitions do not conflict and can work together. For tables with heavy traffic and a lot of data, we can combine sub-tables with partitioning (sub-tables built with MERGE cannot be combined with partitioning, so use another splitting method in that case). For tables with light traffic but a lot of data, partitioning alone is enough.

That is all on the difference between sub-tables and partitions in MySQL. I hope the content above is of some help.
