Shulou (Shulou.com), 05/31 report. Updated 2025-01-19. Source: SLTechnology News&Howtos.
This article looks at the question "Do MySQL data tables need to be synchronized across clouds?" The method it introduces is simple, fast, and practical, and interested readers may find it worth a look.
I. Background
An important goal of a disaster recovery system is to ensure the continuity of system data and services. When the system fails, the disaster recovery system can quickly restore service and keep the data valid. To guard against natural disasters, man-made disasters, and force majeure, a corresponding IT system is built in the same city or in a different location, and the core of that work is data synchronization.
For the application-layer disaster recovery scenario, this article discusses which data tables need to be synchronized across clouds and which do not. Through a specific case, it helps readers sort out synchronized tables and filter tables so as to meet the disaster recovery needs of application-layer business.
II. Related terms
The scenario discussed in this article is application-layer disaster recovery based on Alibaba Cloud (Aliyun), which involves the following key terms:
RDS MySQL: MySQL is one of the most popular open source databases in the world. As an important part of the open source software stack LAMP (Linux + Apache + MySQL + Perl/PHP/Python), it is widely used in many application scenarios. Aliyun RDS MySQL provides stable database performance through in-depth kernel optimization and exclusive instances, and offers flexible deployment architectures and product forms to meet database requirements in different scenarios.
DTS: Data Transmission Service (DTS) supports data transfer between relational databases (MySQL and others), NoSQL, big data (OLAP), and other data sources. It integrates data migration, data subscription, and real-time data synchronization. It aims to solve long-distance, millisecond-level asynchronous data transmission in public cloud and hybrid cloud scenarios, and can be used to build a secure, scalable, highly available (disaster recovery) data architecture.
ASR: ASR-DR (Apsara Stack Resilience Disaster Recovery) is a cloud product that provides disaster recovery capabilities and supports disaster recovery management of RDS MySQL. It is a switching tool with a graphical interface, developed to perform disaster recovery switching quickly and reduce RTO (recovery time objective) as much as possible when a disaster occurs.
Synchronized tables: in this article, the databases and data tables in RDS MySQL that must be backed up from one cloud to another, that is, synchronized across clouds.
Filter tables: in this article, the databases and data tables in RDS MySQL that cannot be, or do not need to be, backed up from one cloud to another.
Application configuration table: in this article, an application-layer data table in RDS MySQL that records application configuration such as IP addresses, domain names, and the on/off state of scheduled tasks.
Sequence: a globally unique serial number ID, widely used in distributed systems, for example as a transaction serial number or user ID. It matters for lookup, data storage, and retrieval speed. This ID is often the primary key of a database table, so it must be globally unique, support high concurrency, and tolerate single points of failure. To improve performance, the application layer usually takes a batch of serial numbers (for example 10,000) from the database at a time and keeps them in application memory to avoid frequent database access. When the serial numbers in memory are used up, the application fetches a new batch from the database.
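The batch allocation described under Sequence can be sketched as follows. This is a minimal illustration, not the implementation used in the project: `SequenceTable` and `SequenceClient` are hypothetical names, the database row is simulated in memory, and a thread lock stands in for the database row lock.

```python
import threading


class SequenceTable:
    """Stand-in for the database row that stores the next unused serial number."""

    def __init__(self, next_value: int = 1) -> None:
        self._next_value = next_value
        self._lock = threading.Lock()  # plays the role of the database row lock

    def reserve(self, batch_size: int) -> range:
        # Atomically hand out [next_value, next_value + batch_size) and advance.
        with self._lock:
            start = self._next_value
            self._next_value += batch_size
            return range(start, start + batch_size)


class SequenceClient:
    """Application-side cache that serves IDs from memory until the batch runs out.

    The cache itself is not thread-safe; this sketch uses one caller per client.
    """

    def __init__(self, table: SequenceTable, batch_size: int = 10_000) -> None:
        self._table = table
        self._batch_size = batch_size
        self._batch = iter(())  # empty until the first request

    def next_id(self) -> int:
        try:
            return next(self._batch)
        except StopIteration:
            # Batch exhausted: one round trip to the database for a new batch.
            self._batch = iter(self._table.reserve(self._batch_size))
            return next(self._batch)


table = SequenceTable(next_value=1)
client_a = SequenceClient(table, batch_size=3)
client_b = SequenceClient(table, batch_size=3)
ids_a = [client_a.next_id() for _ in range(4)]  # uses one batch, then a second
ids_b = [client_b.next_id() for _ in range(2)]  # gets a batch after client_a's
```

Because every client reserves its batch from the same table, the ranges never overlap. That guarantee holds only while all clients see the same table contents, which is the property the case later in this article depends on.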
III. Key technical problems of filter tables in disaster recovery applications
Why is it necessary to sort out the filter tables that are not synchronized across clouds?
Non-disaster-recovery applications
Resource limits of the standby center: in actual projects, resource constraints at the standby center may mean that an application system cannot be deployed there, so the databases and data tables of such non-disaster-recovery applications do not need to be synchronized.
Temporary backup databases and tables created during operation and maintenance do not need to be synchronized: in daily operation and maintenance, a DBA usually makes temporary backups of databases or tables before changing the database. Because the Aliyun RDS MySQL cluster is already backed up in the background, users do not need to synchronize these across clouds as well. This reduces the bandwidth used by the synchronization link and the management workload of disaster recovery switching.
Applications whose disaster recovery is not yet supported: building disaster recovery capability into cloud products is a continuous process. Some cloud products do not have disaster recovery capabilities at the project delivery phase, but users' applications depend on them. These applications therefore cannot take part in disaster recovery drills for now, and their databases and data tables need not be synchronized for the time being. Once every cloud product the application depends on supports disaster recovery, the data can be synchronized.
Differentiated configuration tables
How applications are configured: to manage code and configuration separately, an application system usually stores and manages its configuration parameters on their own. Common forms are configuration files, an RDS MySQL database, and a dedicated configuration center, and a dedicated configuration center often also uses RDS MySQL as its backend storage. Hard-coding configuration parameters such as IP addresses and domain names in the code is the practice to avoid.
Environment parameters: to use cloud products such as RDS MySQL, OSS, and SLB, application software connects through an IP address, domain name, account and password, and AK/SK.
Application parameters: some functions may run in only one center, and these function switches are controlled by field values in a data table. For example, some scheduled tasks regularly make batch calls to external organizations. If the scheduled tasks of both centers run at the same time, the external organization may execute the batch twice, and whether that is safe depends on whether the external organization supports repeated execution of the same batch task. The configuration tables for these scheduled tasks need to be configured separately in the two centers.
Configuration for same-city disaster recovery: the environment parameters described above are the same by default. The two centers of one cloud in the same city are relatively close (less than 100 km). The application is deployed in two availability zones of the same cloud, and the connection information of the cloud products is the same, so the deployed application accesses the same environment parameters in both centers. In this scenario, few environment parameters need to be sorted out.
Configuration for remote disaster recovery: the environment parameters described above differ. The two centers belong to two clouds and are far apart (more than 100 km). The application is deployed in availability zones of two different clouds, and the connection information of the cloud products is different, so the deployed application accesses different environment parameters in each center. In this scenario, each application needs to sort out its differing environment parameters separately. Data tables that hold differing environment parameters must not be synchronized across clouds; otherwise deployment of the application system will fail.
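To make the effect of a differentiated configuration table concrete, the sketch below models one center-local configuration row per center and shows what changes if that table is synchronized anyway. The names `CenterConfig`, `should_run`, `settlement_batch`, and the `*.example` endpoints are all illustrative assumptions, not taken from the project.

```python
from dataclasses import dataclass, field


@dataclass
class CenterConfig:
    """Rows of a center-local configuration table (a filter table)."""
    center: str
    rds_endpoint: str                                   # environment parameter
    task_switches: dict = field(default_factory=dict)   # application parameter


def should_run(task_name: str, config: CenterConfig) -> bool:
    # A task runs only where its switch is explicitly on; unknown tasks stay off.
    return config.task_switches.get(task_name, False)


def centers_running(task_name: str, configs: list) -> list:
    return [c.center for c in configs if should_run(task_name, c)]


main = CenterConfig("main", "rds.cloud-a.example", {"settlement_batch": True})
standby = CenterConfig("standby", "rds.cloud-b.example", {"settlement_batch": False})

# Each center keeps its own rows: the batch task runs in exactly one center.
separate = centers_running("settlement_batch", [main, standby])

# If the configuration table were synchronized across clouds, the standby
# center's rows would be overwritten with the main center's values.
overwritten = CenterConfig("standby", main.rds_endpoint, dict(main.task_switches))
synced = centers_running("settlement_batch", [main, overwritten])
```

In the synchronized case the standby center both points at the wrong endpoint and runs the scheduled task a second time, which are the two failure modes the list above warns about.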
Business tables that need to be double-written
Double-write scenarios: a) Business traffic is processed in both centers at the same time, which is called application-layer active-active, and data tables are written in both centers at the same time. b) The application records microservice call logs while it runs. Ideally, the application writes to the database only when it is processing business traffic. In actual projects there are special cases: the application in the standby center writes some data regularly even without traffic, such as microservice call logs, scheduled task logs, and the update of the globally unique serial number (Sequence) when the application starts. In a double-write scenario, the RDS MySQL instances of both the main center and the standby center must be readable and writable.
Same-city active-active scenario: in the active-active architecture of one cloud in the same city, the main center and the standby center present unified cloud product connection information to the application layer, and the applications in both centers have permission to write to RDS MySQL.
Remote master/slave scenario: in the master/slave architecture across two clouds, the main center's RDS MySQL grants read and write permission to the application layer, while the standby center's RDS MySQL grants read-only permission. This permission policy does not meet the double-write requirement described above. Therefore, double-written tables need to be sorted out per application and configured as filter tables.
How should the data tables that are not synchronized across clouds be sorted out?
In projects, we find that application developers focus on implementing business logic, and their understanding of cloud product best practices and disaster recovery capabilities may differ from what we expect. Sorting out the filter tables is mainly done by application developers, and several questions come up repeatedly in the process.
During design and development, what should developers do to reduce the number of unsynchronized filter tables needed for disaster recovery?
During deployment and operation and maintenance, from what angle should operators check the completeness and correctness of the filter tables?
If the filter tables are sorted out incorrectly, what is the impact on the application-layer disaster recovery drill?
Projects are often constrained by deadlines and by the need to keep the production system running stably, so even when application developers and cloud platform vendors are clear about design and development best practices, it is difficult to complete the rework in limited time. Therefore, during deployment and operation and maintenance, sorting out the filter tables and preparing an emergency plan are key work items for the disaster recovery drill.
Let us analyze what impact a wrong filter table may have on application-layer disaster recovery.
Impact on non-disaster-recovery applications:
Almost none. As analyzed earlier, non-disaster-recovery applications do not need cross-cloud data backup, or the standby center copies of these applications are not used for production.
Impact on disaster recovery applications:
After the application is deployed in the standby center, it fails to start, and the wrong environment parameters can be identified at this point. The response is to stop synchronizing the corresponding data tables, modify the read and write permissions, and continue the deployment.
During functional testing, the standby center focuses on scenarios where backend scheduled tasks and non-business requests write to RDS MySQL, and the filter table list is corrected during the test phase.
During a disaster recovery switching drill while the production system is in operation, in the remote disaster recovery architecture an incorrect filter table list may lead to primary key conflicts on database writes, causing write transactions to fail. The situation can be recovered through the emergency plan: urgently stop or add synchronization, or modify the field values in the data table and restart the application. Fix the filter table list before the next drill. A case later in this article illustrates this scenario.
IV. Designing non-synchronized data tables in application disaster recovery
The previous section explained why some tables must stay unsynchronized in application disaster recovery. This section explores how to sort out and set up filter tables. The analysis below describes the ideal situation, and actual projects will differ somewhat.
Cloud platform perspective
Understand cloud platform capabilities: mainstream cloud platform vendors currently all have RDS MySQL products, but each vendor's RDS MySQL has different disaster recovery capabilities across multiple availability zones in the same city and across multiple regions. Users need to know how each vendor synchronizes data in the same city and across regions: is it done automatically in the background, with a tool (such as Aliyun's DTS), or with manually written scripts?
How to configure filter tables: when a synchronization link is created for an RDS MySQL instance, Aliyun DTS supports configuring which databases and data tables are not synchronized.
Automatic filter table configuration: a disaster recovery drill involves switching between main and standby, so the direction of data synchronization is reversed; the two directions are called forward synchronization and reverse synchronization. When the direction is reversed, the disaster recovery switching platform needs to configure the filter tables automatically. Aliyun ASR-DR saves the filter table list when the synchronization link is first created, and automatically configures the filter tables for the new link each time the synchronization direction is switched.
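The idea of saving the filter list once and reapplying it on every direction switch can be illustrated with a small model. This is only a sketch of the logic: it does not call or imitate the DTS or ASR-DR interfaces, and `SyncLink`, `build_link`, `reverse_link`, and the table names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncLink:
    source: str
    target: str
    tables: frozenset  # tables actually replicated on this link


def build_link(source: str, target: str, all_tables, filter_tables) -> SyncLink:
    # Synchronized tables are everything that is not on the filter list.
    return SyncLink(source, target, frozenset(all_tables) - frozenset(filter_tables))


def reverse_link(link: SyncLink, all_tables, saved_filter_tables) -> SyncLink:
    # On a switch, swap the direction and reapply the saved filter list so the
    # new link replicates exactly the same set of tables as the old one.
    return build_link(link.target, link.source, all_tables, saved_filter_tables)


all_tables = {"trade", "account", "env_config", "task_switch"}
saved_filter_tables = {"env_config", "task_switch"}

forward = build_link("cloud_a", "cloud_b", all_tables, saved_filter_tables)
reverse = reverse_link(forward, all_tables, saved_filter_tables)
```

If the filter list were re-entered by hand on each switch instead of being reused, the forward and reverse links could drift apart, and a filter table could be overwritten in the reverse direction.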
For details, see the public documentation of the Aliyun Data Transmission Service (DTS) product.
Application layer perspective
Next, we analyze how to deliver cloud-based disaster-recovery-capable application software effectively, looking at the stages that application developers care most about.
1. Design phase:
Design for cloud-based disaster recovery. Assume that the application will be deployed on two or more clouds in the future, possibly on cloud platforms from different vendors. In earlier disaster recovery architectures based on IOE (IBM, Oracle, EMC), data-layer synchronization was done by specialized storage hardware; that approach does not apply in multi-cloud scenarios, and expensive Oracle licenses are also hard for many enterprises to accept.
Consider reserving an identification parameter for each cloud and each center to indicate which cloud a configuration applies to. The configuration center manages centrally which cloud's parameters take effect in the current running environment, so application code does not need to care which cloud it is running on.
Identify which functions may run on only one of the clouds, and provide switches for them. Use the configuration center to make the switches dynamically configurable and effective immediately. Focus on scheduled tasks.
It is recommended to expose these function switches in a graphical console ("white screen" interface), so that operators can act quickly in the limited, urgent time of a disaster recovery switch instead of calling around to ask which database, which table, and which field controls the switch for a given scheduled task.
Record the filter table list and keep it up to date.
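The per-cloud identification parameter from the design phase could look like the sketch below: configuration is keyed by a cloud identifier, and application code asks only for the parameter name. The `CLOUD_ID` environment variable, the `resolve` helper, and the `*.example` hosts are assumptions made for illustration; a real configuration center would supply its own lookup.

```python
import os

# Hypothetical configuration store: one section per cloud identifier.
CONFIG = {
    "cloud_a": {"rds_host": "rds.cloud-a.example", "oss_endpoint": "oss.cloud-a.example"},
    "cloud_b": {"rds_host": "rds.cloud-b.example", "oss_endpoint": "oss.cloud-b.example"},
}


def resolve(key: str, cloud_id=None, store=CONFIG) -> str:
    """Return the value of `key` for the cloud this process is running on."""
    # The running environment, not the application code, decides the cloud.
    cloud_id = cloud_id or os.environ.get("CLOUD_ID", "cloud_a")
    try:
        return store[cloud_id][key]
    except KeyError as exc:
        raise KeyError(f"no value for {key!r} on cloud {cloud_id!r}") from exc


host_on_a = resolve("rds_host", cloud_id="cloud_a")
host_on_b = resolve("rds_host", cloud_id="cloud_b")
```

With this layout the same configuration data can exist on both clouds without conflict, because each center selects its own section rather than relying on the rows being different per center.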
2. Development phase:
Prefer the configuration center for storing parameters. In actual projects, configuration is stored in many ways, including a configuration center, configuration files, RDS MySQL, and even addresses and account passwords hard-coded in the code. Aliyun EDAS provides a configuration center feature that supports dynamic configuration, static configuration, and dynamic push after configuration changes, so changes take effect without restarting the application.
The address of the configuration center itself can be recorded in the application's configuration file, and the configuration file can be packaged and released together with the application, because the configuration center service rarely changes after deployment.
If a configuration center is temporarily unavailable and RDS MySQL must be used to manage configuration, it is recommended to put the configuration that records cloud-specific environment parameters in its own data table, and to put function switches in another separate data table, neither coupled with business tables. This makes the filter tables easier to manage. Focus on the domain names, IP addresses, accounts and passwords, and AK/SK of cloud products.
3. Deployment phase:
Operators and developers jointly confirm why each filter table was selected and what business reason lies behind it. Focus on whether too many tables have been filtered.
Log in to each database and check whether the disaster recovery switching platform ASR-DR has set the filter tables as expected. With hundreds of filter tables, omissions and errors occur easily.
Create the conditions to verify business functions on the standby center in advance. Focus on whether the filter table scenarios behave as expected and whether scheduled tasks run in only one center.
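The deployment-phase checks lend themselves to a simple automated comparison between the reviewed filter list (with a recorded reason per table) and what is actually configured. The sketch below assumes both lists have already been exported by some means; `audit_filter_tables` and the table names are hypothetical.

```python
def audit_filter_tables(expected: dict, configured: set) -> dict:
    """Compare the reviewed filter list with the configured one.

    expected   -- table name -> business reason recorded during review
    configured -- table names currently set as filter tables on the platform
    """
    expected_names = set(expected)
    return {
        # Reviewed as filter tables but still being synchronized.
        "missing": sorted(expected_names - configured),
        # Filtered on the platform without ever being reviewed.
        "unexpected": sorted(configured - expected_names),
        # Reviewed entries that lack a stated business reason.
        "no_reason": sorted(t for t, why in expected.items() if not why.strip()),
    }


expected = {
    "env_config": "cloud-specific endpoints and AK/SK",
    "task_switch": "scheduled tasks may run in one center only",
    "call_log": "",
}
configured = {"env_config", "call_log", "xx_table"}

report = audit_filter_tables(expected, configured)
```

An entry under "unexpected" is the kind of finding that matters most: a table filtered without a reviewed reason is how a table that should be synchronized ends up excluded.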
4. Operation and maintenance phase:
Apply configuration changes to filter tables on both clouds at the same time. When the structure of a filter table is changed on the main center, for example by adding fields or adjusting field types, the standby center is not aware of it, and the same change must be made manually on the standby center. Otherwise, after disaster recovery switches to the standby center, the application will report errors because the table has not been updated.
Restoring a filter table to a synchronized table. If the early filter table list was wrong, with too many tables filtered, and it is later verified that a table needs synchronization, re-synchronize the full data of the table and change its synchronization flag on the disaster recovery management platform ASR-DR.
Changing a synchronized table to a filter table. If a table was synchronized early on but no longer needs synchronization because of business adjustments, change its synchronization flag on the disaster recovery management platform ASR-DR.
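Because structure changes to filter tables are not replicated, a periodic comparison of the table definitions on the two centers can catch a forgotten manual change before a switch. The sketch below compares column definitions that are assumed to have been read from each center beforehand (for example from `information_schema.columns`); the `schema_drift` helper and the sample columns are illustrative.

```python
def schema_drift(main_schema: dict, standby_schema: dict) -> dict:
    """Return per-table column differences between the two centers.

    Each schema maps table name -> {column name: column type}.
    A value of None means the column (or table) is absent on that side.
    """
    drift = {}
    for table in sorted(set(main_schema) | set(standby_schema)):
        main_cols = main_schema.get(table, {})
        standby_cols = standby_schema.get(table, {})
        diffs = {
            col: (main_cols.get(col), standby_cols.get(col))
            for col in sorted(set(main_cols) | set(standby_cols))
            if main_cols.get(col) != standby_cols.get(col)
        }
        if diffs:
            drift[table] = diffs
    return drift


main_schema = {
    "task_switch": {"task": "varchar(64)", "enabled": "tinyint", "owner": "varchar(32)"},
    "env_config": {"name": "varchar(64)", "value": "varchar(255)"},
}
standby_schema = {
    "task_switch": {"task": "varchar(64)", "enabled": "tinyint"},
    "env_config": {"name": "varchar(64)", "value": "varchar(128)"},
}

drift = schema_drift(main_schema, standby_schema)
```

Only the structure is compared here. The row contents of filter tables are expected to differ between centers, so comparing data would produce noise rather than findings.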
[Figure: configuration logic of synchronized tables and filter tables under the remote disaster recovery master/slave architecture]
V. Case study
The following analysis of a remote disaster recovery project describes a business exception caused by an error in the filter table list and how it was handled, to give readers a concrete sense of whether a data table needs to be synchronized across clouds.
(1) Problem description
After Aliyun's disaster recovery platform ASR-DR performed a disaster recovery switch for an application (RDS MySQL read and write permission switched from Cloud A to Cloud B), business requests handled by the standby center (Cloud B) failed, and the database reported "primary key conflict".
(2) Problem analysis
We walk through how the problem was located, in the order it was handled.
1. Analyze the database error "primary key conflict":
Confirm that the conflicting field value is the transaction serial number ID. Checking the business data table confirms that a transaction record with this ID already exists.
2. Analyze the business request path:
Analyze whether abnormal traffic scheduling at the access layer caused double writes. In the master/slave architecture of remote disaster recovery, the global load balancer (GSLB) at the access layer ensures that only the main center receives business request traffic and the standby center receives none. Double writes from both centers can therefore be ruled out as the cause of the primary key conflict.
Analyze whether a cache in the main center's application layer delayed writes until after the master/slave switch. In the master/slave architecture, ASR-DR first sets the main center's RDS MySQL database to read-only and only then opens read and write permission on RDS MySQL to the standby center's applications. Even if the main center's application layer has delayed cached writes, the main center application has no write permission after the switch, so no double write can occur. This is ruled out.
Analyze whether the serial number had been used before the disaster recovery switch. Logging in to the main center's database shows that the currently available range of the serial number field is [90000000000, 18446744073709551615], meaning that serial numbers below 90000000000 have been used. The serial number 80000000000 that triggered the primary key conflict already has a corresponding transaction record in the business table. This confirms that the transaction number had been used in the main center.
Analyze how the standby center application obtained its serial numbers. The application log shows that the standby center application fetched a batch of serial numbers when it first started and did not fetch from the database again afterwards. Inspecting the application's memory shows that the standby center is currently using the serial number range [80000000000, 80000009999], which is clearly stale.
Conclusion: the standby center application used a stale transaction serial number ID, which caused the "primary key conflict" when writing to the database.
3. Analyze how the problem was introduced:
Analyze how the application obtains serial numbers: when the application first starts, it takes 10,000 available serial numbers from the database and updates both the database value and the application's in-memory value.
Analyze the data synchronization mechanism between the main and standby centers: the data table xx_table, which manages the globally unique serial number, is supposed to be kept synchronized between the two centers in real time by the data synchronization tool DTS, and the database is locked while the serial number is updated to prevent inconsistency. In theory, the main and standby centers cannot obtain the same serial number.
Check whether the content of xx_table is consistent between the main and standby centers: the available serial number range is [90000000000, 18446744073709551615] on the main center but [80000010000, 18446744073709551615] on the standby center. The two differ, which shows that the table is not being synchronized.
Check the data synchronization tool DTS: it is working normally, with no errors or faults.
Check the filter table list: xx_table, which manages the globally unique serial number, should be synchronized across clouds, but it was configured as a filter table, so its data was not synchronized.
Check how the filter table list was compiled: in the preparation phase before the disaster recovery drill, after operators deployed the application in the standby center, business staff's functional verification of transactions failed. The transaction function failed to initialize because the application could not write to the database after obtaining a serial number; the error indicated missing write permission. Under the master/slave architecture, the main center application has read and write access to RDS MySQL by default, and the standby center has read-only access. Because the standby center application needs write permission at startup, the business staff added xx_table to the list of unsynchronized filter tables, and as a result the table was no longer synchronized from the main center to the standby center.
Problem conclusion: xx_table, which manages the globally unique serial number, was mistakenly added to the list of filter tables that are not synchronized across clouds.
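The chain of events in this analysis can be reproduced in miniature. The sketch below uses two in-memory SQLite databases as stand-ins for the two RDS MySQL instances, with a `trade` business table that is synchronized and a `seq` table that is filtered; the table names, the `reserve` helper, and the manual row copy that imitates synchronization are all assumptions made for the demonstration.

```python
import sqlite3

BATCH = 10_000


def make_center(seq_next: int) -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE trade (id INTEGER PRIMARY KEY, note TEXT)")
    db.execute("CREATE TABLE seq (next_id INTEGER NOT NULL)")
    db.execute("INSERT INTO seq VALUES (?)", (seq_next,))
    return db


def reserve(db: sqlite3.Connection) -> int:
    # Take one batch: return its first ID and advance the stored value.
    (start,) = db.execute("SELECT next_id FROM seq").fetchone()
    db.execute("UPDATE seq SET next_id = ?", (start + BATCH,))
    return start


main = make_center(80_000_000_000)
standby = make_center(80_000_000_000)

# The standby application starts and caches a batch from its own, unsynchronized seq table.
standby_cached_id = reserve(standby)

# The main center serves traffic and uses the same ID for a real transaction.
main_id = reserve(main)
main.execute("INSERT INTO trade VALUES (?, 'paid on main')", (main_id,))

# The business table IS synchronized: copy its rows to the standby center.
for row in main.execute("SELECT id, note FROM trade").fetchall():
    standby.execute("INSERT INTO trade VALUES (?, ?)", row)

# After the switch, the standby application writes with its cached, stale ID.
try:
    standby.execute("INSERT INTO trade VALUES (?, 'paid on standby')", (standby_cached_id,))
    conflict = False
except sqlite3.IntegrityError:
    conflict = True
```

The business table arrives on the standby center with the main center's rows, while the sequence table never does, so the standby center's cached ID collides with a row it did not write itself.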
Emergency measures
Manually correct the valid serial number range in xx_table on the standby center to the correct [90000000000, 18446744073709551615].
Restart the standby center's application software so that the application fetches serial numbers again.
Improvement measures
Synchronize the data: xx_table, which manages the globally unique serial number, needs to be synchronized. Remove xx_table from the filter table list so that the valid serial number range is consistent between the main and standby centers.
Modify the application: when the standby center has read-only permission on RDS MySQL, updating the serial number fails, but application initialization must still succeed. After the disaster recovery switch, the standby center obtains read and write permission on RDS MySQL, and a business request triggers fetching the latest serial numbers on demand.
Test results:
After data synchronization between the main center and the standby center completes, disconnect the synchronization link and manually set the standby center database to read-only.
Redeploy the modified application and verify that in read-only mode the application starts successfully and business requests fail (as expected).
Manually set the standby center database to read-write and verify that business requests succeed. Check that the application has obtained a valid serial number.
Reconfigure the data synchronization link between the main and standby centers.
Disaster recovery drill: conduct another drill to verify the full business scenario.
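The application modification and the read-only test steps above can be sketched as follows. This is one possible shape of the change, not the project's actual code: `FakeSequenceDb` simulates the standby database and its read-only flag, and `LazySequence` and `ReadOnlyError` are hypothetical names.

```python
class ReadOnlyError(Exception):
    """Raised when a write is attempted on a read-only database."""


class FakeSequenceDb:
    """Simulated standby database holding the synchronized sequence value."""

    def __init__(self, next_id: int, read_only: bool) -> None:
        self.next_id = next_id
        self.read_only = read_only

    def reserve(self, batch_size: int) -> int:
        if self.read_only:
            raise ReadOnlyError("sequence table is read-only")
        start = self.next_id
        self.next_id += batch_size
        return start


class LazySequence:
    """Fetches a batch on demand instead of requiring one at startup."""

    def __init__(self, db: FakeSequenceDb, batch_size: int = 10_000) -> None:
        self._db = db
        self._batch_size = batch_size
        self._next = None
        self._end = None
        try:
            self._refill()
        except ReadOnlyError:
            pass  # read-only standby: start anyway, fetch when first needed

    def _refill(self) -> None:
        start = self._db.reserve(self._batch_size)
        self._next, self._end = start, start + self._batch_size

    def next_id(self) -> int:
        if self._next is None or self._next >= self._end:
            self._refill()  # still raises ReadOnlyError before the switch
        value = self._next
        self._next += 1
        return value


db = FakeSequenceDb(next_id=90_000_000_000, read_only=True)
seq = LazySequence(db)            # startup succeeds in read-only mode
try:
    seq.next_id()
    failed_while_read_only = False
except ReadOnlyError:
    failed_while_read_only = True

db.read_only = False              # disaster recovery switch grants write access
first_id_after_switch = seq.next_id()
```

Since nothing is cached while the database is read-only, the first ID issued after the switch comes from the synchronized, current value rather than from a batch reserved at startup.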
[Figures: before improvement; after improvement]