
Pacific Insurance "Home" big data project: DSG application (real-time synchronization of more than 30 Oracle and other systems to Kafka)

2025-03-14 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

Pacific Insurance Group

"Home Project" big data platform DSG Application (oracle&kafka)

Project background

According to the IT construction plan of Pacific Insurance Group, the "One Pacific Insurance, Common Home" project (the "Home Project") was to be completed by the end of 2017. Its aim is to provide customers with more convenient and comprehensive services through a single Home platform.

Pacific Insurance's business is broad, spanning property insurance, life insurance, car insurance, and other lines, and a single type of insurance is often served jointly by multiple systems. Bringing all of these services onto one platform makes the aggregation, centralization, and transformation of data the core challenge, and the chief difficulty, of the whole project.

Project requirements

According to the ultimate goal of the Taibao Home Project, the first phase of construction requires the data of more than 30 core systems under the Pacific Insurance Group, such as life insurance, property insurance, and car insurance, to be centralized on the big data platform by real-time synchronous replication. This involves a series of processes such as data conversion, standardization, cleaning, and de-duplication. The specific requirements are as follows:

1. The core data must be synchronized from more than 30 systems to the Kafka component of the big data platform.

2. Replication must be real-time (second-level latency) and the data must be accurate.

3. Replicated data must be tagged with the time, operation type, and other labels so that back-end applications can identify each change.

4. The DG (Data Guard standby) library of the production environment is used as the data aggregation source, to reduce the impact on the production database.

5. The format of the data entering Kafka must be flexibly configurable, to better fit the back-end applications.

6. Data operation statistics and data comparison functions are required, to make it easy to check the accuracy of the data.
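The tagging described in requirement 3 can be illustrated with a hypothetical change record as it might land in Kafka. The field names (`op_type`, `op_ts`, and so on) are assumptions chosen for clarity, not DSG's actual output format, which is configured per project:

```python
import json
from datetime import datetime, timezone

def make_change_record(table, op, row):
    """Wrap a captured row change with the tags required above.

    `op_type` and `op_ts` are hypothetical field names; the real
    field layout is configured through DSG's output templates.
    """
    return {
        "table": table,
        "op_type": op,                        # INSERT / UPDATE / DELETE
        "op_ts": datetime.now(timezone.utc).isoformat(),
        "data": row,
    }

record = make_change_record("POLICY", "UPDATE",
                            {"policy_id": 1001, "status": "ACTIVE"})
message = json.dumps(record)  # serialized payload for the Kafka topic
print(message)
```

Because every message carries the operation type and a timestamp, a back-end consumer can distinguish an insert from an update without consulting the source database.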

Project difficulty

Given the project requirements and the actual production environment, data aggregation for the Home Project faces the following main difficulties:

Many business systems are involved. According to the preliminary plan, more than 30 core production systems need to be connected to the platform, running on Oracle, MySQL, DB2, and others. The base platforms and data formats of these systems vary greatly, and the data volume is large: the total data capacity of the platform exceeds 30 TB. Moreover, the source business systems run strictly 7x24, which makes initialization very difficult.

Network bandwidth is limited. The production data sits in the Shanghai data center while the big data platform is in the Chengdu data center, and the network bandwidth between them is shared by all business systems, so replication cannot occupy too much of it.

Business volume is heavy. The databases generate more than 800 GB of archive logs per day, and the core tables participating in replication see hundreds of thousands of transactions per second.

The allowed delay is short. Because the Home platform must provide customers with real-time business consultation and management services, replication delay cannot exceed 10 s; beyond that, the user experience degrades sharply, defeating the original intention of the Home Project.

High data accuracy is required. The Home platform carries all queries and part of the business processing; inaccurate data would inevitably cause business-logic confusion and leave the platform unable to serve users.

Solution

This scheme uses the DSG SuperSync product to replicate data from Oracle to Kafka; the architecture is shown in the figure above. In Taibao's system architecture, the production center is in Shanghai and the disaster recovery center is in Chengdu. Every core system has a first-level DG (Data Guard) library in the local production center and a second-level DG library in the Chengdu disaster recovery center, and the project's big data center is also located in Chengdu. Based on this architecture, the full (initial) synchronization, which moves a large amount of data, is run against the secondary DG library in Chengdu, saving Shanghai-to-Chengdu bandwidth and improving synchronization efficiency, while incremental synchronization is run against the local first-level DG library in Shanghai to meet the real-time requirement.
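The split between full and incremental synchronization described above might look like the following illustrative configuration. The keys, hostnames, and topic names are assumptions made for clarity, not actual DSG SuperSync syntax:

```yaml
# Full (initial) synchronization: read from the secondary DG library
# in Chengdu, local to the big data platform -- no WAN traffic needed.
full_sync:
  source: chengdu-dg2.example.internal:1521/ORCL   # hypothetical host
  target: hdfs://bigdata-cluster/staging

# Incremental synchronization: capture changes from the primary DG
# library in Shanghai to keep latency within the 10 s requirement.
incremental_sync:
  source: shanghai-dg1.example.internal:1521/ORCL  # hypothetical host
  target: kafka://bigdata-cluster:9092/ods_changes
```

The design point is simply that the bandwidth-heavy initial load never crosses the Shanghai-Chengdu link, while the latency-sensitive change stream originates as close to production as possible.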

Scheme advantage

This scheme has the following advantages:

At the architecture level, DSG's support for heterogeneous platforms makes it possible to synchronize the full data set to the cluster's HDFS and the incremental data to Kafka, which neatly works around the limited network bandwidth between the two data centers. To reduce pressure on the production database, the production database's DG library can serve as the replication source.

Through the cjson template, the format of the data entering Kafka is highly customizable: the output content can be tailored, and inserts, updates, and deletes on the collected data can all be delivered to Kafka for verification.

For data delivered to Kafka, operations are recorded along three dimensions: detail records, timed statistics, and cumulative statistics. These records can be stored in a specified location such as a database, HDFS, or a file system, so that downstream applications can trace back data operations and verify the data.

DSG SuperSync supports fast synchronization between Oracle databases on different platforms, covering both initial (full) synchronization and incremental replication. It synchronizes data in a purely logical way, so it can span different platforms, and during synchronization it uses DSG's proprietary XF1 file format, data-stream compression, and fast data extraction and loading. With multiple synchronization channels configured, the existing data can be synchronized to the target database quickly, after which the incremental data generated during the initial load is applied to bring the target level with the source. DSG SuperSync currently supports replication between Oracle versions (Oracle 8i-10g) on mainstream platforms (HP/IBM/SUN/Compaq/PC).
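The three-dimensional operation accounting (detail, timed, and cumulative statistics) could be approximated as follows. This is a minimal in-memory sketch of the idea, not DSG's implementation; class and method names are invented for illustration:

```python
from collections import Counter, defaultdict

class OpStats:
    """Tally replicated operations so downstream consumers can
    reconcile what was delivered against the source.

    Tracks three dimensions: per-record detail, per-minute timed
    counts, and cumulative totals per table."""

    def __init__(self):
        self.detail = []                        # dimension 1: per-record detail
        self.timed = defaultdict(Counter)       # dimension 2: counts per minute
        self.cumulative = defaultdict(Counter)  # dimension 3: totals per table

    def record(self, table, op_type, ts):
        self.detail.append((ts, table, op_type))
        self.timed[ts[:16]][op_type] += 1       # bucket by minute, e.g. "2017-06-01T00:00"
        self.cumulative[table][op_type] += 1

    def totals(self, table):
        return dict(self.cumulative[table])

stats = OpStats()
stats.record("POLICY", "INSERT", "2017-06-01T00:00:01Z")
stats.record("POLICY", "UPDATE", "2017-06-01T00:00:02Z")
stats.record("POLICY", "INSERT", "2017-06-01T00:00:03Z")
print(stats.totals("POLICY"))  # {'INSERT': 2, 'UPDATE': 1}
```

In a real deployment these tallies would be persisted to a database, HDFS, or files, as the text describes, so that row counts on the Kafka side can be compared against the source.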

The data replication efficiency of DSG SuperSync is among the highest in this field. On the Kafka delivery side, delivery can be accelerated with multiple threads and concurrent channels, reaching an on-site throughput of 20,000 messages per second.
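The multi-threaded delivery pattern can be sketched as below. A stand-in `send` function replaces a real Kafka producer here, so the concurrency structure is what the example shows, not any particular client API:

```python
import queue
import threading

def deliver_concurrently(messages, send, workers=4):
    """Fan messages out across `workers` delivery threads.

    `send` stands in for a Kafka producer's send call; a real
    deployment would also batch messages and handle delivery
    acknowledgements and retries."""
    q = queue.Queue()
    for m in messages:
        q.put(m)

    def worker():
        while True:
            try:
                msg = q.get_nowait()
            except queue.Empty:
                return          # queue drained: thread exits
            send(msg)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Demo with a thread-safe stand-in for the producer.
sent = []
lock = threading.Lock()
def fake_send(msg):
    with lock:
        sent.append(msg)

deliver_concurrently([f"msg-{i}" for i in range(100)], fake_send, workers=8)
print(len(sent))  # 100
```

Multiple independent delivery channels let throughput scale with the number of workers until the broker or the network becomes the bottleneck.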

Introduction to DSG

DSG is a leading professional vendor dedicated to data storage management, providing big data management software and solutions covering data security, disaster recovery, data extraction and sharing, data archiving and retrieval, and integrated management platforms. Its products include backup, disaster recovery, synchronous data replication/extraction/sharing, data archiving, and data auditing, and are widely used in China. The company currently has nearly 300 employees, 3 R&D centers, and more than 20 offices and branches with service outlets covering the country, serving hundreds of high-end customers in the telecom, financial, and government sectors in the Chinese market.

SuperSync data synchronous replication software is in use at more than 800 domestic customers. Beyond its original strength in Oracle real-time synchronous replication and disaster recovery, it also supports real-time synchronous replication between domestic and foreign databases such as MySQL, SQL Server, DB2, PostgreSQL, HANA, Qcubic, Redis, Teradata, Inspur K-DB, Dameng, and GBase, as well as targets such as Hadoop, HBase, Phoenix, Storm, Flume, Spark, Kafka, TIBCO, and Alibaba Cloud. Output can be customized to the format requirements of Kafka and other targets (adding fields, data conversion, classification, etc.), making it suitable for big data sharing, read-write separation, and real-time disaster recovery.
