2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
Today I would like to talk about a MySQL solution for a complex data requirement, a topic many people may not be familiar with. The main points are summarized below, and I hope you get something out of this article.
A few days ago I dealt with a requirement in an Oracle environment. I worked through the Oracle-related solutions, and while handling the problem I kept asking myself what the fallback would be if the current approach failed.
Even though Oracle is a mature commercial database, the requirement was still hard to implement and needed some extra techniques, such as working around bugs and implementing parts of the requirement indirectly.
Seen from another angle, a table with more than 200 million rows is nothing unusual for MySQL either. So what would we do if MySQL ran into the same situation?
Sort out the business needs
Assume the business requirements stay the same, as follows:
The business team reported that one table in the database holds a very large amount of data. For an upcoming campaign they need only the recent data, so the older data can be cleaned up. How much of the data is old? Almost 99%. How large is the table? Almost 200 million rows. The requirement sounds simple, but the business team clearly wants the service to keep running throughout, which limits how it can be implemented.
This seemingly simple requirement comes with some supplementary information. The database is MySQL 5.6 and the table holds 200 million rows. Query performance is very poor. More than 99% of the rows are stale and need to be cleaned up. Developers query by time range. The table only receives inserts, never updates or deletes.
To sum up, there are four things to do:
Optimize the query, which currently filters by time range. After evaluation, the table needs an index.
Clean up the data: the table holds 200 million rows, and most of them must be removed.
Keep the business running: statistical analysis runs every 10 minutes, and data is written into the system in real time.
Convert the table to a partitioned table, put the old data into one partition and the new data into another, and drop the old partition after the change.
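As a rough illustration of the fourth item, a partitioned layout might look like the sketch below. The table name serverlog_par, the columns id, log_time and msg, and the cutoff date are all assumptions made for the example; the article does not give the real schema.

```sql
-- Hypothetical schema: the table name, columns and cutoff date are assumptions.
-- In MySQL the partitioning column must be part of every unique key,
-- which is why log_time is included in the primary key.
CREATE TABLE serverlog_par (
  id       BIGINT       NOT NULL,
  log_time DATETIME     NOT NULL,
  msg      VARCHAR(255),
  PRIMARY KEY (id, log_time)
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS (log_time) (
  PARTITION p_old VALUES LESS THAN ('2016-05-01 00:00:00'),  -- old (cold) rows
  PARTITION p_new VALUES LESS THAN (MAXVALUE)                -- recent and future rows
);

-- Dropping the old partition removes its rows without a row-by-row DELETE.
ALTER TABLE serverlog_par DROP PARTITION p_old;
```

Dropping a partition is a DDL operation, so it avoids the undo and redo volume that deleting nearly 200 million rows one by one would generate.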
Sort out the priority of requirements
Given these requirements, adding an index to the table is the first key problem to solve.
The online DDL feature in MySQL is quite good, and version 5.6 has fairly complete support for index operations.
So the native online DDL of MySQL works well here, and even on 5.5 this is not a blocker, because tools such as pt-osc are available.
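A minimal sketch of adding the index with online DDL on 5.6 follows. The table name serverlog, the column log_time and the index name are hypothetical, since the article does not name them.

```sql
-- Hypothetical names: serverlog, log_time and idx_log_time are assumptions.
-- ALGORITHM=INPLACE and LOCK=NONE ask MySQL 5.6 to build the index without
-- copying the table and without blocking concurrent reads and writes.
-- If the server cannot honor the request, the statement fails with an error
-- instead of silently falling back to a blocking table copy.
ALTER TABLE serverlog
  ADD INDEX idx_log_time (log_time),
  ALGORITHM = INPLACE,
  LOCK = NONE;

-- Check that a time-range query can use the new index.
EXPLAIN
SELECT COUNT(*)
FROM serverlog
WHERE log_time >= '2016-05-01 00:00:00'
  AND log_time <  '2016-06-01 00:00:00';
```

Stating the algorithm and lock level explicitly is a defensive choice: on a 200-million-row table it is better for the statement to be refused than to start an unexpected blocking rebuild.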
Great truths are simple, and the underlying ideas are the same.
One solution is as follows: the data flow is the same as in the earlier Oracle scheme, but the implementation principle and details differ.
The first step is to create a shadow table, serverlog_read, into which data changes on the source table are synchronized.
Materialized views are not supported in MySQL, so incremental refresh and similar approaches are limited, but there are always more solutions than problems. There are other ways to emulate materialized views in MySQL, such as Flexviews, or triggers. With triggers, insert, delete and update each need their own trigger, which is why the pt tool creates 3 triggers by default. The principle is very similar.
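The trigger-based approach could be sketched as follows. The source table name serverlog and its columns are assumptions. Because the table in this scenario only receives inserts, the sketch defines just the insert trigger, whereas a general-purpose tool such as pt-osc also covers update and delete.

```sql
-- Hypothetical source table: the name and columns are assumptions.
CREATE TABLE serverlog (
  id       BIGINT       NOT NULL,
  log_time DATETIME     NOT NULL,
  msg      VARCHAR(255),
  PRIMARY KEY (id, log_time)
) ENGINE=InnoDB;

-- The shadow table has the same structure as the source.
CREATE TABLE serverlog_read LIKE serverlog;

-- Copy every new row into the shadow table as it arrives.
-- REPLACE keeps the statement idempotent if a row is already present.
CREATE TRIGGER trg_serverlog_ai
AFTER INSERT ON serverlog
FOR EACH ROW
  REPLACE INTO serverlog_read (id, log_time, msg)
  VALUES (NEW.id, NEW.log_time, NEW.msg);
```

The trigger body is a single statement, so no DELIMITER change is needed when running it from the mysql client.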
With this stand-in for a materialized view, caching of the incremental data is basically guaranteed. We also need two auxiliary tables. One is serverlog_par_old, a partitioned table that keeps only one partition and stores the data refreshed from the materialized view. The other is serverlog_hot, which stores incremental data and data entered into the system in real time.
At this point there are three kinds of data to handle. The first is old data, which can also be called cold data. The second is incremental data: for example, if roughly the last month of data must be retained, the data within that window is incremental data. The third is real-time data, which is entered into the system in near real time. The scheme above therefore archives the cold data, carves out the incremental data sensibly, and affects the real-time data as little as possible.
How do we switch between the 200 million rows and the 10 million rows? MySQL 5.6 also supports exchange partition, so this operation is not a problem. After all, there are several ways to work with partitions. Because of its storage characteristics, MySQL implements this requirement rather cleanly.
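A hedged sketch of the exchange follows. The source table name serverlog and the column definitions are assumptions; the only requirement illustrated is that both tables have identical structure and that the non-partitioned table has no foreign keys.

```sql
-- Hypothetical non-partitioned table holding all rows (name and columns assumed).
CREATE TABLE serverlog (
  id       BIGINT       NOT NULL,
  log_time DATETIME     NOT NULL,
  msg      VARCHAR(255),
  PRIMARY KEY (id, log_time)
) ENGINE=InnoDB;

-- Partitioned table with the same columns and keys and a single partition.
CREATE TABLE serverlog_par_old (
  id       BIGINT       NOT NULL,
  log_time DATETIME     NOT NULL,
  msg      VARCHAR(255),
  PRIMARY KEY (id, log_time)
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS (log_time) (
  PARTITION p_old VALUES LESS THAN (MAXVALUE)
);

-- Swap the contents: the rows of serverlog end up in partition p_old,
-- and serverlog receives whatever p_old held before (here, nothing).
ALTER TABLE serverlog_par_old EXCHANGE PARTITION p_old WITH TABLE serverlog;
```

The server checks that every row being moved into the partition fits its range, so a catch-all MAXVALUE partition, as assumed here, always passes that check.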
Finally, the incremental and real-time data are backfilled: the rows in serverlog_hot are used to fill in the online table.
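The backfill might be written roughly as below. The target table name serverlog, the column list, and the one-month window are assumptions taken from the example in the text, not the author's actual statement.

```sql
-- Hypothetical names and retention window; adjust to the real schema.
-- INSERT IGNORE skips rows whose primary key already exists in the target,
-- so the statement can be rerun safely while real-time data keeps arriving.
INSERT IGNORE INTO serverlog (id, log_time, msg)
SELECT id, log_time, msg
FROM serverlog_hot
WHERE log_time >= NOW() - INTERVAL 1 MONTH;
```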
Two additions outside the plan
These two additional points are also two highlights of MySQL in this implementation.
The first highlight is that MySQL has a distinct advantage when copying a table's structure. Note that create table xxxx as select xx is not allowed in MySQL 5.6 when GTID-based replication is enabled, but there are better ways anyway.
We can rewrite it in the following way:
1. create table test1 like test; this copies the DDL information completely.
Alternatively, use show create table, although the result differs slightly, or use mysqldump --no-data to export the statements.
2. Insert the data, for example insert into test1 select * from test;
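Put together, the two steps look like this. The definition of test is a made-up example so that the sketch is self-contained.

```sql
-- Made-up source table so the example stands on its own.
CREATE TABLE test (
  id   INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(50),
  PRIMARY KEY (id),
  KEY idx_name (name)
) ENGINE=InnoDB;

-- Step 1: copy the structure, including column attributes and indexes.
-- Foreign key definitions are not carried over by LIKE.
CREATE TABLE test1 LIKE test;

-- Step 2: copy the rows as a separate statement.
INSERT INTO test1 SELECT * FROM test;

-- Inspect the copied definition.
SHOW CREATE TABLE test1;
```

Splitting the DDL from the data copy keeps the indexes that create table ... as select would drop.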
The second highlight is the backup and archiving of data, which can be simple or complex. For example, suppose we strictly limit how long data stays valid, so that old data that is no longer needed is not kept in the current database. To still meet basic backup requirements, we can rename the table into another database. Moving a table to another user is fairly complicated in Oracle, but in MySQL it is very lightweight. Put colloquially, it just moves the table's data into another directory.
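Assuming the rename referred to here is MySQL's RENAME TABLE across databases, the archiving step could be sketched as follows. The database names appdb and archive_db are hypothetical.

```sql
-- Hypothetical database names: appdb is the live schema, archive_db the archive.
CREATE DATABASE IF NOT EXISTS archive_db;

-- Move the table to the archive database. No rows are copied, but both
-- databases must be on the same file system, and the statement fails
-- if the table has triggers.
RENAME TABLE appdb.serverlog_par_old TO archive_db.serverlog_par_old;
```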
Going into a requirement like this I was full of confidence, yet in practice there were still many difficulties. If we think more and summarize whenever we hit a problem, we build up our own body of knowledge and avoid many detours.