Practical information: detailed explanation of five common data replication techniques

2025-04-20 Update. From: SLTechnology News&Howtos

According to IDC statistics, the global data replication and storage market exceeded 50 billion US dollars in 2018, and the market for data backup and recovery software, an important application of data replication technology, is also considerable. This article looks at five common data replication techniques.

Replication is the technique of copying a set of data from one data source to one or more targets. It falls into two main modes, synchronous replication and asynchronous replication:

1. Synchronous replication: each write must complete on both the source and the destination before the next operation proceeds. Little data is lost, but the performance of the production system suffers unless the target system is physically close to it.

2. Asynchronous replication: the source processes the next operation without waiting for the data to reach the target system. The replicated data lags behind the source data, but the impact on the performance of the production system is small.
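The difference between the two modes can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Source` and `Replica` classes are hypothetical, everything is in memory, and the network hop is reduced to a method call.

```python
from collections import deque


class Replica:
    """Hypothetical destination that stores replicated writes."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Source:
    """Hypothetical source that replicates synchronously or asynchronously."""

    def __init__(self, replica, synchronous):
        self.data = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending = deque()  # writes not yet shipped (async mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Synchronous: the write returns only after the replica has it.
            self.replica.apply(key, value)
        else:
            # Asynchronous: queue the write and return immediately.
            self.pending.append((key, value))

    def flush(self):
        """Ship queued writes to the replica in their original order."""
        while self.pending:
            self.replica.apply(*self.pending.popleft())


sync_replica, async_replica = Replica(), Replica()
sync_src = Source(sync_replica, synchronous=True)
async_src = Source(async_replica, synchronous=False)

sync_src.write("order:1", "paid")
async_src.write("order:1", "paid")

sync_lag = len(sync_src.pending)    # writes the sync replica is missing
async_lag = len(async_src.pending)  # writes the async replica is missing
```

In this sketch the asynchronous replica stays one write behind until `flush()` runs. If the source failed before that point, the queued write would be lost, which is the time difference described above.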

When designing a disaster recovery plan, the choice of data replication technology determines the final disaster recovery outcome, that is, the achievable RTO (recovery time objective) and RPO (recovery point objective). Classified by the system layer at which replication operates, the techniques fall into the following five categories:

1. Host-based data replication technology

Host-based data replication works by mirroring or replicating disk volumes. The work is done in the volume manager layer of the host, which places few restrictions on hardware, especially storage devices. The hosts at the production center and the backup center establish a data transmission channel over an IP network, so data transfer is reliable and relatively efficient. Remote replication is handled by data management software on the host. When data at the main data center is destroyed, the application can be brought up at the backup center, or the data can be restored from the backup center at any time.

Host-based data replication does not require the same storage devices on both sides, so it is more flexible. The drawbacks are that replication consumes some of the host's CPU resources, which has a certain impact on host performance, and that it places higher demands on the software (many products cannot provide point-in-time snapshots).

Info2soft's byte-level data capture and replication technology works at the operating system level. It first performs an initial mirror of the data. The core replication engine then monitors all write operations on the file system, including operations such as Rename and SetAttr, captures them accurately, and transmits them asynchronously to the disaster recovery side through its data serialization transfer technology (Data-Order Transfer, or "DOT"). This completes the whole capture and replication process.

△ Schematic diagram of byte-level replication technology

First, the core engine of byte-level replication performs no complex mathematical operations while it runs. It only captures data in a bypass fashion, so its consumption of computing resources on the production machine is negligible.

Second, all data is captured from memory, with no reads issued against the production host's storage, so the replication process does not consume the host's storage I/O resources.

Finally, the replication granularity is as fine as a single byte, so bandwidth requirements are very low. This makes the approach well suited to remote, long-distance replication and to future hybrid IT environments and cloud architectures.
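The idea of capturing only the changed bytes and replaying operations in their original order can be sketched as follows. This is not the vendor's engine or the DOT protocol, whose details the article does not give. The `Op` record, its sequence numbers and the `OrderedApplier` class are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    seq: int     # capture order on the production side (assumed field)
    kind: str    # "write", "rename" or "setattr"
    path: str
    arg: object  # (offset, bytes) for write, new path for rename, dict for setattr


class OrderedApplier:
    """Applies captured operations strictly in sequence order."""

    def __init__(self):
        self.files = {}    # path -> bytearray
        self.attrs = {}    # path -> dict of attributes
        self.next_seq = 1
        self.held = {}     # operations that arrived ahead of their turn

    def receive(self, op):
        self.held[op.seq] = op
        while self.next_seq in self.held:
            self._apply(self.held.pop(self.next_seq))
            self.next_seq += 1

    def _apply(self, op):
        if op.kind == "write":
            offset, payload = op.arg
            buf = self.files.setdefault(op.path, bytearray())
            if len(buf) < offset:
                buf.extend(b"\0" * (offset - len(buf)))
            # Only the changed byte range is overwritten.
            buf[offset:offset + len(payload)] = payload
        elif op.kind == "rename":
            self.files[op.arg] = self.files.pop(op.path)
            if op.path in self.attrs:
                self.attrs[op.arg] = self.attrs.pop(op.path)
        elif op.kind == "setattr":
            self.attrs.setdefault(op.path, {}).update(op.arg)


ops = [
    Op(1, "write", "a.txt", (0, b"hello world")),
    Op(2, "write", "a.txt", (6, b"there")),  # ships 5 bytes, not the whole file
    Op(3, "rename", "a.txt", "b.txt"),
    Op(4, "setattr", "b.txt", {"mode": 0o644}),
]

applier = OrderedApplier()
for op in (ops[1], ops[0], ops[3], ops[2]):  # simulate out-of-order arrival
    applier.receive(op)
```

Even though the operations arrive out of order here, the replica ends up with `b.txt` containing the final content, because each operation is held until every earlier one has been applied.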

2. Data replication technology based on the application and middleware tier

With application-level replication, the application itself writes to the databases of the primary and standby centers, synchronously or asynchronously, to keep their data consistent. The disaster recovery center and the production center can both operate normally at the same time, which provides disaster recovery and also allows some functions to be shared. However, the technology is complex, is tightly coupled to the business logic of the application software, and is difficult to implement and maintain. Application-level replication also increases system risk and the risk of data loss.

Because this approach is independent of the underlying operating system, database and storage, the application can double-write or multi-write as required to replicate data between the primary and multiple copies. The logic can be encapsulated at the middleware or application platform level, where it is transparent to the applications above it, or it can be implemented directly in the application.

Its main advantage is that it can be customized on demand and can replicate at both the application and database levels. Its main disadvantage is that the market offers no mature middleware products suited to large-scale adoption by traditional IT enterprises. If replication is implemented entirely in an application platform or in the application itself, code complexity and application maintenance costs both increase.
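A minimal double-write sketch shows where the extra complexity and the data loss risk come from. The `Store` and `DualWriter` classes are hypothetical stand-ins, with an in-memory dictionary taking the place of each site's database.

```python
class Store:
    """Hypothetical in-memory stand-in for a database at one site."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.available = True

    def put(self, key, value):
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        self.rows[key] = value


class DualWriter:
    """Application-level double write with a retry log for the standby."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby
        self.retry_log = []  # writes the standby still needs

    def put(self, key, value):
        self.primary.put(key, value)  # a primary failure propagates to the caller
        try:
            self.standby.put(key, value)
        except ConnectionError:
            # The sites now differ until resync() succeeds.
            self.retry_log.append((key, value))

    def resync(self):
        remaining = []
        for key, value in self.retry_log:
            try:
                self.standby.put(key, value)
            except ConnectionError:
                remaining.append((key, value))
        self.retry_log = remaining


primary, standby = Store("primary"), Store("standby")
writer = DualWriter(primary, standby)

writer.put("u1", "alice")
standby.available = False           # simulate a link failure
writer.put("u2", "bob")
diverged = primary.rows != standby.rows

standby.available = True
writer.resync()
```

Every failure path (standby down, retry ordering, a crash before the retry log is persisted) has to be handled in application code, which is the maintenance burden the text describes.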

3. Data replication technology based on database

The replication technology based on database software includes physical replication and logical replication.

Logical replication uses the database's redo logs and archive logs: the logs from the primary site are shipped to the replica site, where the data is replicated by re-executing SQL. Logical replication is asynchronous only, so it guarantees eventual consistency between the primary and the replica rather than real-time consistency.

Physical replication does not rely on SQL Apply. Instead, redo logs or archive logs are written persistently at the replica site, synchronously or asynchronously, and the data at the replica site can be served read-only.

Open-platform database replication is a structured data replication technology based on database logs. It obtains inserts, deletes and updates by parsing the source database's online or archived logs, then applies these changes to the target database to keep the source and target in sync. This enables active-active or even multi-active databases across sites, along with business continuity and disaster recovery.
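The parse-then-apply loop can be sketched as below. Real redo and archive logs are binary, vendor-specific formats. The JSON records, the `lsn` field and the `apply_log` function here are assumptions chosen only to show the shape of the technique.

```python
import json

# Hypothetical change log: one JSON record per committed change, in commit order.
LOG = """\
{"lsn": 1, "op": "insert", "table": "accounts", "key": 1, "row": {"owner": "ann", "balance": 100}}
{"lsn": 2, "op": "insert", "table": "accounts", "key": 2, "row": {"owner": "bob", "balance": 50}}
{"lsn": 3, "op": "update", "table": "accounts", "key": 1, "row": {"balance": 70}}
{"lsn": 4, "op": "delete", "table": "accounts", "key": 2}
"""


def apply_log(log_text, target, applied_lsn=0):
    """Apply change records newer than applied_lsn; return the new high-water mark."""
    for line in log_text.splitlines():
        rec = json.loads(line)
        if rec["lsn"] <= applied_lsn:
            continue  # already applied, so replaying the log is safe
        table = target.setdefault(rec["table"], {})
        if rec["op"] == "insert":
            table[rec["key"]] = dict(rec["row"])
        elif rec["op"] == "update":
            table[rec["key"]].update(rec["row"])
        elif rec["op"] == "delete":
            table.pop(rec["key"], None)
        applied_lsn = rec["lsn"]
    return applied_lsn


target_db = {}
checkpoint = apply_log(LOG, target_db)
```

Tracking the last applied sequence number lets the target resume after an interruption without applying a change twice. Only the changes travel between sites, not full copies of the tables.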

△ Database log analysis technology

Database-based data replication is a foundational technology for high availability and for record-level and table-level disaster recovery. Info2soft's database disaster recovery technology combines the advantages of host replication and database log analysis and makes system applications more flexible: it supports multi-active database applications and greatly reduces the incremental data that database applications must transmit. It has broad prospects in fine-grained data disaster recovery and wide-area cloud disaster recovery.

Real-time data synchronization at the database semantic level runs while the database is in normal use. It automatically completes initialization and full replication of the data from the source to the standby, then monitors and synchronizes incremental data in real time. The state transitions and conditions of the normal process are as follows:

Info2soft i2Active is a real-time Oracle data replication tool based on redo log analysis. It is simple and flexible, high-performing, non-intrusive and low-impact, with sub-second latency, low cost, and easy deployment and use. It helps users with Oracle disaster recovery backup, data migration, business data distribution, large-scale data warehouse construction and other data integration tasks in complex application environments.

△ Semantic-level database replication with i2Active

4. Data replication technology based on storage system gateway

A storage gateway sits between servers and storage and is a dedicated storage service technology for SAN networks. It is built on storage virtualization.

Storage virtualization can be defined directly as a transparent abstraction layer over storage resources: it sits between servers and storage and is the logical representation of physical storage. Its main purpose is to abstract physical storage media into logical storage space and to consolidate scattered, complicated heterogeneous storage management into unified, simple, centralized management. Simplifying the many storage concerns people face (including read and write modes, connection methods, and storage specifications or structures) is what storage virtualization does.
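As a rough illustration of that abstraction layer, the sketch below presents two differently sized devices as one contiguous range of logical blocks. `PhysicalDevice` and `VirtualVolume` are hypothetical names, and simple concatenation is just one possible mapping policy.

```python
class PhysicalDevice:
    """Hypothetical block device from some vendor."""

    def __init__(self, name, num_blocks):
        self.name = name
        self.blocks = [b""] * num_blocks


class VirtualVolume:
    """Presents several devices as one contiguous logical block range."""

    def __init__(self, devices):
        self.devices = devices
        self.size = sum(len(d.blocks) for d in devices)

    def _locate(self, lba):
        """Map a logical block address to (device, physical block index)."""
        if not 0 <= lba < self.size:
            raise IndexError(f"logical block {lba} out of range")
        for dev in self.devices:
            if lba < len(dev.blocks):
                return dev, lba
            lba -= len(dev.blocks)

    def write(self, lba, data):
        dev, offset = self._locate(lba)
        dev.blocks[offset] = data

    def read(self, lba):
        dev, offset = self._locate(lba)
        return dev.blocks[offset]


array_a = PhysicalDevice("vendor-a", 4)
array_b = PhysicalDevice("vendor-b", 8)
volume = VirtualVolume([array_a, array_b])

volume.write(5, b"payload")  # logical block 5 lands on vendor-b, block 1
```

The server only ever sees logical block numbers. Because every I/O passes through this mapping layer, it is also a natural place to add services such as mirroring or remote replication.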

By applying data storage services to the incoming I/O stream, a storage gateway provides flexible, diverse and heterogeneous storage services that are difficult to achieve at the server or storage level. A storage gateway can provide remote data replication, heterogeneous storage consolidation, highly available mirroring of storage devices, snapshot services and data migration services, and some storage gateways can even provide precise continuous data protection and continuous data recovery for back-end storage.

Because the storage gateway offloads replication work from servers and arrays, it can operate across many server platforms and storage arrays, which makes it an ideal disaster recovery technology for highly heterogeneous environments. Its advantages in bandwidth optimization and fine-grained data recovery have also made it a fairly mainstream disaster recovery technology.

The main open question for this technology is how far its performance guarantees have matured. In recent years, as SANs have become more widespread, heterogeneous storage devices and explosive data growth have led to management complexity, low resource utilization and weak data services for storage devices in SAN networks, and these problems have driven the development and adoption of storage gateways.

5. Data replication based on storage media

Using the storage system's built-in firmware or operating system, data is copied to the remote end synchronously or asynchronously over a transmission medium such as an IP network or Fibre Channel, protecting production data against disaster.

A disaster recovery scheme built on storage-media replication places high demands on network connectivity and hardware. Storage-based replication can be one-to-one, one-to-many or many-to-one: data in one store is replicated to multiple remote stores, or data in multiple stores is replicated to the same remote store. Replication can also be bidirectional.

Storage replication is built on direct mirroring between disk arrays: the storage system's built-in firmware or operating system copies data to the remote end synchronously or asynchronously over interfaces such as an IP network or Fibre Channel. In general, this mode works only between storage systems of the same brand with the same type of controller, and a low-latency, high-bandwidth link is also a prerequisite.

In storage array-based replication, replication software runs on one or more storage controllers and is ideal for environments with a large number of servers for the following reasons:

It is independent of the operating system; it supports Windows and Unix-based operating systems as well as mainframes (high-end arrays); license fees are generally based on storage capacity rather than the number of connected servers; and no administrative work is required on the connected servers.

Because the replication work is handed over to the storage controller, servers avoid excessive performance overhead, provided the local cache used for asynchronous transfer is large enough. This makes storage array-based replication well suited to mission-critical and high-end transactional applications.

Summary

In practice, no one technology is necessarily better than another; advantage is always relative. Enterprises need to choose the technical route that best suits their own business scenarios. After all, the most suitable option is the best one.

Source: Info2soft
