How to optimize and decompose MySQL big data 07/06 Update SLTechnology News&Howtos

How to optimize and decompose MySQL big data

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/03 Report--

This article mainly introduces MySQL big data how to optimize and decompose, the things involved, learned from the theoretical knowledge, there are many books and documents for your reference, from the perspective of practical significance, accumulated years of practical experience can be shared with you.

Too much data or too many visits in the company will lead to database performance degradation, and excessive loss of disk iCompo and other CVM performance will seriously lead to downtime. According to this situation, we have come up with a solution, so let's move on:

Last time we talked about sub-tables and partitions: first, let's review the difference between sub-tables and partitions:

Sub-table:

Decompose a large table into several small tables, each with a separate file. MYD/.MYI/.frm three files

Zoning:

The data will be stored in more blocks, the table is still a large table, there will be a summary later.

Talking about the partition last time, this time it is mainly the content of the partition; the partition mainly includes five partition types:

1): range partition:

Assign column values of continuous intervals to partitions, and these intervals cannot overlap each other

So let's give an example to verify the difference between range partitions and non-partitions: {performance comparison}

First create an unpartitioned library and table

MySQL > create database test

Mysql > create table test.tab1 (C1 int,c2 varchar (30), c3 date)

Next, create a partitioned table and split it by year

Mysql > use test

Mysql > CREATE TABLE tab2 (C1 int, c2 varchar (30), c3 date)

PARTITION BY RANGE (year (c3)) (PARTITION p0 VALUES LESS THAN (1995)

PARTITION p1 VALUES LESS THAN (1996), PARTITION p2 VALUES LESS THAN (1997)

PARTITION p3 VALUES LESS THAN (1998), PARTITION p4 VALUES LESS THAN (1999)

PARTITION p5 VALUES LESS THAN (2000), PARTITION p6 VALUES LESS THAN (2001)

PARTITION p7 VALUES LESS THAN (2002), PARTITION p8 VALUES LESS THAN (2003)

PARTITION p9 VALUES LESS THAN (2004), PARTITION p10 VALUES LESS THAN (2010)

PARTITION p11 VALUES LESS THAN MAXVALUE)

The last line indicates that it is possible to insert data that is larger than 2005 in the year in which it is inserted. If not, there will be an error.

The two tables tab1 and tab2 you just created have no data but an empty table; then insert data for them

A little more, or you won't see the effect; 1000000 pieces of data

To create a stored procedure, you need to use a delimiter

Mysql > delimiter & & / / specify stored procedure Terminator

Mysql > CREATE PROCEDURE load_part_tab ()

Begin

Declare x int default 0

While x insert into test.tab2 select * from test.tab1

Note: this sql statement is nested, and the result of the subsequent execution is given to the previous execution.

Check to see if it is successful.

Let's wait and see; test sql performance:

Query for tab1 table:

Mysql > select count (*) from test.tab1 where c3 > '1995-01-01' and c3

< '1995-12-31'; tab2表的查询： mysql>

Select count (*) from test.tab2 where c3 > '1995-01-01' and c3

< '1995-12-31'; 通过explain语句来分析下它们的执行情况：结果表明分区的效果要比为分区的效果好太多了。既然这样我们就现在这个前题之下为tab1 表和tab2表创建索引，看看它们的效果如何：刷新下表的缓存，进行查询： tab1表查询如下 tab2表的查询如下；由此结果可见索引创建之后还是分区后的速度快，但是结果相差不大，但是如果数据结构复杂，数据量庞大的情况之下，结果会越发的显著不知道大家看明白了没有，如果没有我们在举个例子，主要因为range分区在工作中使用较多，所以再次在举个例子： mysql>

Create database test2

Mysql > CREATE TABLE employees (= > create an employees table

Id INT NOT NULL, = > ID number is × × × cannot be empty

Fname VARCHAR (30), = > Last name

Lname VARCHAR (30), = > name

Hired DATE NOT NULL DEFAULT '1970-01-01-01, = > date of employment, cannot be empty

Separated DATE NOT NULL DEFAULT '9999-12-31 hours, = > date of departure, cannot be empty

Job_code INT NOT NULL, = > Job number of employee job, cannot be empty

Store_id INT NOT NULL = > Store ID number, cannot be empty

)

Partition BY RANGE (store_id) (= > Partition range is subject to store ID

Partition p0 VALUES LESS THAN (6), = > p0 including id numbers less than 6

Partition p1 VALUES LESS THAN (11), = > p1 including less than 11ID number

Partition p2 VALUES LESS THAN (16), = > p2 including less than 16ID number

Partition p3 VALUES LESS THAN (21) = > p3 including ID numbers less than 21

);

The above zoning shows store 1-5; the working employees are saved in the p0 area, 6-11 store employees are saved in the p1 area, and the other analogy

But it is important to note that if an employee is transferred from other regions, such as 81grad, "llllum", "xxffy,"1998-06-25," 2001-26-25, "25", can you join us? Of course you can and it's still in the p2 area, which needs to check his store id number 13, so you can. So let's verify OK.

Inserted successfully

If there is another store, can the employee with ID number 25 join?

Insert failed, because there are no rules to include stores with store_id greater than 20, the server will not know where to save the row, which will result in an error. To avoid this error, you can create a maxvalue partition where all records that are not within the specified range are stored in the same partition as maxvalue.

We can modify the secondary partition through the sql statement:

Mysql > alter table employees add partition (partition p4 values less than maxvalue)

You can see that the modified partition can be joined by employees from the store with an ID of 25.

After introducing the range partition, let's make a summary of range:

Assign multiple rows to a partition based on column values that belong to a given contiguous interval. These intervals should be continuous and should not overlap each other.

And set maxvalue in the last area to prevent content larger than the area from being inserted.

2) list partition

More similar to range partition, the difference is that the value of range is continuous, and list is a collection of scattered values in a row, perhaps we do not quite understand, we will give you an example.

LIST partitions are implemented by using "PARTITION BY LIST (expr)", where "expr" is an expression based on a column value and returns an integer value, and then defines each partition as "VALUES IN (value_list)", where "value_list" is a comma-separated list of integers.

We also use the store as an example to demonstrate, which is in contrast to range:

First create the library

Mysql > create database tty

This makes it easy to add or delete employee records in the established area in the table. If the pNorth store closes down, you need to delete all the employee records now. You can delete them using "ALTER TABLE employees DROP PARTITION pWest;", which is much more effective than the DELETE (delete) query "DELETE query DELETE FROM employees WHERE store_id IN".

Next, I would like to introduce you to the other three ways of zoning: as an understanding of

3) HASH partition

This mode allows DBA to compute the Hash Key of one or more columns of the table and finally partition the data regions corresponding to different values of the Hash code.

The purpose of hash partition is to distribute the data evenly among the predefined partitions to ensure that the amount of data in each partition is roughly the same. Partition; MYSQL does this automatically by specifying a column value or expression and specifying the number of partitions that the partitioned table will be divided into.

Next, an example is given:

Mysql > create table t_hash (an int (11), b datetime) partition by hash (year (b)) partitions 4

Next, let's insert the point data and verify it:

So take a look at which partition MySQL automatically inserts this data into?

We can also view the tree structure of the t_hash table in the ll library through the system database information_schema:

Four partitions should have been created earlier, so four will be displayed here.

There is only data in the p2 table, and none of the other three have data.

4) key partition:

Key partition is similar to hash partition, except that hash partition is partitioned by user-defined functions, key partition is partitioned using functions provided by mysql database, and internal hash functions are used for other storage engine mysql.

The sql statement created is:

Mysql > create table t_key (an int (11), b datetime) partition by key (b) partitions 4

Insert a sql statement for it:

Let's take a look.

In the above four partitions of RANGE, LIST, HASH and KEY, the condition of the partition must be × × ×. If it is not × × ×, it needs to be converted to × × × by function.

5) columns partition

Mysql-5.5 began to support COLUMNS partitions, which can be regarded as the evolution of RANGE and LIST partitions. COLUMNS partitions can be partitioned using non-× × data directly. COLUMNS partitions support the following data types:

All × ×, such as INT SMALLINT TINYINT BIGINT. FLOAT and DECIMAL do not support it.

Date types, such as DATE and DATETIME. The remaining date types are not supported.

String types, such as CHAR, VARCHAR, BINARY, and VARBINARY. BLOB and TEXT types are not supported.

COLUMNS can use multiple columns for partitioning.

Finally, I will make a complete summary for you, and introduce the differences between subtables and partitions from different aspects.

What's the difference between mysql sub-table and partition?

1. In the way of implementation

A) the sub-table of mysql is a real sub-table. After a table is divided into many tables, each small table is a complete table, corresponding to three files, a .MYD data file, a .MYI index file, and a .frm table structure file.

B) Partition is different. After a large table is partitioned, it is still one table and will not become two tables, but it has more blocks to store data.

2. Data processing

A) after the sub-table, the data is stored in the sub-table, the total table is just a shell, and the access to the data occurs in a sub-table.

B) Partition, there is no concept of sub-table, partition only divides the file storing data into many small pieces, the table after partition is still a table, the data processing is still done by itself.

3. Improve the performance

A) after dividing the table, the concurrency ability of the single table is improved, and so is the performance of disk Icano. Why is the concurrency improved? because it takes less time to search once, and if there is high concurrency, the total table can divide the concurrency pressure into different small tables according to different queries.

B) mysql put forward the concept of partition, which mainly wants to break through the disk I / O bottleneck and improve the read and write ability of the disk, so as to increase mysql performance.

At this point, the testing focus of partition and sub-table is different, the focus of sub-table is how to improve the concurrency ability of mysql when accessing data, while partition, how to break through the read and write ability of disk, so as to achieve the purpose of improving mysql performance.

4. The degree of difficulty of realization

A) there are many ways to divide a table, and using merge to divide a table is the easiest way. This approach is similar to the difficulty of partitioning and can be transparent to the program code. If you use other sub-table methods, it will be more troublesome than partitioning.

B) partitioning implementation is relatively simple, creating partitioned tables, building normal tables is no different, and is transparent to the open code side.

What is the relationship between mysql subtables and partitions?

1. Both can improve the performance of mysql and have a good performance in the state of high concurrency.

2. Sub-tables and partitions do not contradict each other, and can cooperate with each other. For those tables with large access volume and more table data, we can adopt the combination of sub-tables and partitions. For tables with small access volume but a lot of table data, we can adopt partitioning and so on.

3. The sub-table technology is troublesome, it needs to create the child table manually, and the app server needs to calculate the child table name when reading and writing. It is better to use merge, but you also need to create union relationships between child tables and configure child tables.

4. Compared with the sub-table, the table partition is easy to operate, and there is no need to create child tables.

Read the above MySQL big data how to optimize and decompose the introduction, hope to bring some help to everyone in the practical application. Due to the limited space in this article, there will inevitably be deficiencies and areas that need to be supplemented. You can continue to pay attention to the industry information section and will update your industry news and knowledge regularly. If you need more professional answers, you can contact us on the official website for 24-hour pre-sales and after-sales to help you answer questions at any time.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.