
Data synchronization Porter for Micro Services

2025-04-28 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Porter is data synchronization middleware, used mainly for table-level data synchronization between homogeneous or heterogeneous databases.

Background

The microservices architecture profoundly changes the relationship between applications and databases. Unlike the traditional arrangement in which multiple services share one database, each service in a microservices architecture has its own database. A database per service is essential to reaping the benefits of microservices, because microservices emphasize loose coupling. We want databases, like services, to be independent enough to be deployed, scaled, and refactored together with their services. At the same time, this has to be reconciled with the data center's need for data aggregation, the DBAs' need to back up multiple databases, and the report center's need for business reports. The "Porter" project was created to resolve these conflicts.

In a microservice transformation, one unavoidable hurdle is vertical database splitting: breaking up the former "one database, multiple services" arrangement into "one database per service" along sub-service boundaries.

One database for multiple services, or one database per service?

Whether or not an application uses a microservice architecture, its modules need to communicate, collaborate, and share data frequently to deliver the overall value of the system. The difference is that a monolithic application does this through local method calls, while microservices do it through remote API calls.

The cheapest way to share data is the shared database pattern, which is also the most common approach in monolithic applications, where there is generally only one database. The two patterns compared here are "one database, multiple services" and "one database per service".

The "one database, multiple services" pattern is often considered an anti-pattern for microservices architecture. Its problems are:

Stability: the database is a single point of failure. When it goes down, the whole batch of services stops, which defeats service independence.

Coupling: with all the data in one place, developers or DBAs will, for convenience, write many programs or tools that depend heavily on each other's data.

Scalability: it is impossible to optimize or scale precisely for one service. Services fall roughly into two types, read-heavy and write-heavy, and database optimization should be done per service rather than once for a single shared database.

Therefore, it is generally recommended to give each microservice its own database, that is, the "one database per service" model. This pattern suits microservice architecture better, because it lets each service be developed, deployed, and scaled independently. When one service needs to be upgraded or its data architecture changed, other services are not affected. When a service needs to be scaled, it can be scaled on its own with precision.

Then the problems appeared. During the transformation, we ran into the following issues in our project:

SQL joins are used in both the report center and the front-end detail pages. After splitting the database into one database per service, these SQL joins can no longer be used.

The data center performs data aggregation, and splitting the data makes that work much harder.

After the move to microservices, the application modules' requirements for their databases diverged, and the database types became diverse and independent.

And so on.

Porter Introduction

Porter is a centralized data processing channel: all data is aggregated on and distributed from this platform. Porter itself, however, is a decentralized, plug-in-friendly, distributed data synchronization middleware. The default registry plug-in is implemented on ZooKeeper, but you can also implement a custom registry module against the registry interface. Plug-in modules such as the cluster plug-in, source-side consumer plug-in, source-side message converter plug-in, target-side writer plug-in, alarm plug-in, and custom data definition plug-in are kept independent of Porter's main processing flow. The cluster plug-in and the alarm plug-in have global scope on a Porter task node; all other plug-in modules are combined per synchronization task. This design is what gives Porter its flexibility, extensibility, and ease of use.

Functions

Porter started in 2017. It provides data synchronization but is not limited to it, and it is widely used internally at Suixingpay, where it was developed. It mainly provides the following functions:

Native support for eventually consistent synchronization from Oracle or MySQL to JDBC relational databases.

Plug-in friendly: supports secondary development of custom source-side consumer plug-ins, target-side load plug-ins, alarm plug-ins, and other plug-ins.

Support for custom table and field mappings between the source side and the target side.

Support for configuring synchronization tasks on a node through configuration files.

Support for pushing synchronization tasks from the management console, along with node and task management. The console provides monitoring of task runtime metrics, node operation logs, and task exception alarms.

Support for node resource throttling and allocation.

Distributed architecture based on the ZooKeeper cluster plug-in, with support for custom cluster plug-ins.
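The table and field mapping feature in the list above can be pictured with a small sketch. This is not Porter's mapping API or configuration format; the TableMapping class, its fields, and the table and column names are all hypothetical and only illustrate the idea of renaming columns on the way from source to target.

```python
from dataclasses import dataclass, field


@dataclass
class TableMapping:
    """Hypothetical mapping from one source table to one target table."""
    source_table: str
    target_table: str
    # source column name -> target column name; unmapped columns keep their name
    columns: dict = field(default_factory=dict)

    def apply(self, row: dict) -> dict:
        """Rename the columns of a source row according to the mapping."""
        return {self.columns.get(name, name): value for name, value in row.items()}


# Example names are made up for illustration.
mapping = TableMapping(
    source_table="orders",
    target_table="dw_orders",
    columns={"id": "order_id", "amt": "amount"},
)
mapped = mapping.apply({"id": 7, "amt": 19.9, "status": "PAID"})
print(mapping.target_table, mapped)
```

Columns without an explicit entry pass through unchanged, so a mapping only has to list the fields that actually differ between the two sides.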

Architecture design

Porter nodes form a distributed cluster through the registry and scale out or in dynamically according to resource demand. Porter agrees on a set of task, node, and statistics interfaces with the registry, and Porter nodes manage task assignment by watching for changes in the registry's interface data. The configuration management console conforms to and implements the registry's interface specification, which lets it manage Porter nodes remotely. The registry also provides a distributed locking mechanism for allocating task resources.

Besides this mechanism, a Porter node can also define tasks through local configuration files.

How it works:

1. Incremental log data from MySQL databases is obtained through Canal, an open source product.

2. Manager/worker architecture: the web manager manages worker nodes and schedules tasks, and worker nodes report their progress back.

3. Distributed architecture based on the ZooKeeper cluster plug-in, with support for custom cluster plug-ins.

4. Kafka serves as the messaging component: each table corresponds to one topic, and data nodes consume work by topic.
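The "one topic per table" rule in the last point can be sketched without a Kafka client. The naming scheme in topic_for and the shape of the event dictionaries are assumptions made for illustration; the article does not state how Porter names its topics.

```python
from collections import defaultdict


def topic_for(schema: str, table: str) -> str:
    """Hypothetical naming rule: one topic per table."""
    return f"{schema}.{table}"


def group_by_topic(events):
    """Group change events by topic, keeping each table's events in arrival order."""
    topics = defaultdict(list)
    for event in events:
        topics[topic_for(event["schema"], event["table"])].append(event)
    return dict(topics)


# Made-up change events for two tables.
events = [
    {"schema": "shop", "table": "orders", "op": "INSERT", "id": 1},
    {"schema": "shop", "table": "users", "op": "UPDATE", "id": 9},
    {"schema": "shop", "table": "orders", "op": "DELETE", "id": 1},
]
grouped = group_by_topic(events)
for topic, batch in grouped.items():
    print(topic, [event["op"] for event in batch])
```

Keeping one table's changes in a single stream means that a consumer working through one topic sees that table's changes in the order they were produced.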

Processing flow

To keep data consistent, source data extraction and target data insertion each run sequentially in a single thread, while the middle stages use multiple threads to speed up processing. Concretely, SelectJob and LoadJob each run in a single thread, ExtractJob and TransformJob run in parallel threads, and the LoadJob stage then sorts the packets and writes them to the target side in order.
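A minimal sketch of this staged pipeline follows. It reuses the stage names from the text only as labels; the functions, the sequence-number scheme, and the uppercase "transformation" are assumptions for illustration and are not taken from Porter's code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def select(source):
    """Single-threaded read (SelectJob): tag every batch with a sequence number."""
    for seq, batch in enumerate(source):
        yield seq, batch


def extract_and_transform(item):
    """Runs in worker threads; stands in for ExtractJob and TransformJob."""
    seq, batch = item
    return seq, [value.upper() for value in batch]


def load(results, target):
    """Single-threaded write (LoadJob): restore source order before writing."""
    for _, batch in sorted(results, key=lambda result: result[0]):
        target.extend(batch)


def run_pipeline(source, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(extract_and_transform, item) for item in select(source)]
        # as_completed yields in completion order, which may differ from source order
        results = [future.result() for future in as_completed(futures)]
    target = []
    load(results, target)
    return target


print(run_pipeline([["a", "b"], ["c"], ["d", "e"]]))
```

The sequence number assigned at read time is what lets the parallel middle stages finish in any order without affecting the order in which rows reach the target.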

As mentioned in the introduction, the alarm plug-in and the registry plug-in are shared by all tasks on a node, while each task selects its matching processing plug-ins according to the types of its source and target and the source data format. In other words, alarm and registry plug-ins belong to the Porter node configuration, whereas data consumer plug-ins, target plug-ins, and custom data processing plug-ins belong to the task configuration.

Plug-in design

Porter combines the SPI specification with design patterns such as singleton, factory, and listener to achieve flexibility and loose coupling, so that it can be extended through secondary development for different scenarios. Plug-in design covers the following four areas:

Registry plug-ins

Source-side consumer plug-ins

Target-side load plug-ins

Custom data processing plug-ins
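The idea of choosing a plug-in per task by source type can be illustrated with a registry plus factory. This is a Python analogy, not Porter's SPI mechanism or API; the decorator, the class names, and the "source" key in the task configuration are all hypothetical.

```python
# Hypothetical registry: source type name -> consumer plug-in class.
CONSUMER_PLUGINS = {}


def consumer_plugin(source_type):
    """Register a consumer class under a source type name."""
    def register(cls):
        CONSUMER_PLUGINS[source_type] = cls
        return cls
    return register


@consumer_plugin("canal")
class CanalConsumer:
    def describe(self):
        return "reads MySQL incremental logs via Canal"


@consumer_plugin("kafka")
class KafkaConsumer:
    def describe(self):
        return "reads change events from Kafka topics"


def create_consumer(task_config):
    """Factory: pick the plug-in named in the task configuration."""
    source_type = task_config["source"]
    try:
        return CONSUMER_PLUGINS[source_type]()
    except KeyError:
        raise ValueError(f"no consumer plug-in registered for {source_type!r}") from None


consumer = create_consumer({"source": "canal"})
print(consumer.describe())
```

Because the core only looks plug-ins up by name, a new source type can be supported by registering one more class without touching the factory.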

Clustering mechanism

Porter's cluster mode relies on a cluster plug-in, and the default cluster plug-in is implemented on ZooKeeper. Porter task nodes are not tightly bound to management nodes: tasks can be deployed either through task configuration files or by pushing them from a management node. The management node can also manage nodes and collect and display monitoring metrics, which makes it a management platform that simplifies operation and maintenance. You can likewise implement your own management platform on top of the ZooKeeper data structure protocol.

Figure: system structure in cluster mode.

ZooKeeper cluster mode plug-in

ZooKeeper data structure protocol:

Porter's clustering mechanism mainly has the following functions:

Balance tasks across nodes, and automatically move a task to another task node when its current node fails

Communication between task nodes and the management node

Storage and retrieval of task processing progress

Upload of statistics (the latest development version supports a custom statistics upload client, with native support for Kafka)

Distributed locks for node and task preemption
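Task preemption and failover from the list above can be sketched with an in-memory stand-in for the registry. This is not the ZooKeeper API and not Porter's implementation; the InMemoryRegistry class and its methods are hypothetical. Releasing a failed node's locks is modeled loosely on how ZooKeeper ephemeral nodes disappear when their session ends, and whether Porter relies on that exact feature is an assumption.

```python
import threading


class InMemoryRegistry:
    """Hypothetical stand-in for the registry; NOT the ZooKeeper API."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owners = {}  # task id -> node id currently holding the task lock

    def try_acquire(self, task_id, node_id):
        """Atomically claim a task; only one node can win."""
        with self._guard:
            if task_id in self._owners:
                return False
            self._owners[task_id] = node_id
            return True

    def node_failed(self, node_id):
        """Release every task lock held by a failed node and return the freed tasks."""
        with self._guard:
            released = [task for task, node in self._owners.items() if node == node_id]
            for task_id in released:
                del self._owners[task_id]
            return released

    def owner(self, task_id):
        with self._guard:
            return self._owners.get(task_id)


registry = InMemoryRegistry()
first = registry.try_acquire("task-1", "node-a")    # node-a wins the task
second = registry.try_acquire("task-1", "node-b")   # node-b loses, the lock is taken
freed = registry.node_failed("node-a")              # node-a goes down
for task_id in freed:
    registry.try_acquire(task_id, "node-b")         # the task drifts to node-b
print(registry.owner("task-1"))
```

The same lock that prevents two nodes from running one task is what makes failover safe: a task becomes claimable again only after its previous owner's lock is gone.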

File-system-based standalone mode plug-in

The latest development version lets Porter task nodes run in standalone mode, independent of the admin console and ZooKeeper, with tasks configured through configuration files. Standalone mode is a special cluster mode that supports only a subset of the cluster functions, but it simplifies task deployment and is flexible. The supported functions are:

Storage and retrieval of task processing progress

Upload statistical indicator data
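Storing and retrieving task progress on the file system could look roughly like the sketch below. The FileProgressStore class, the one-JSON-file-per-task layout, and the position fields are assumptions for illustration; the article does not describe the format Porter actually writes under its home directory.

```python
import json
import os
import tempfile


class FileProgressStore:
    """Hypothetical progress store: one JSON file per task under a home directory."""

    def __init__(self, home):
        self.home = home
        os.makedirs(home, exist_ok=True)

    def _path(self, task_id):
        return os.path.join(self.home, f"{task_id}.json")

    def save(self, task_id, position):
        """Write to a temp file, then rename, so readers never see a half-written file."""
        fd, tmp = tempfile.mkstemp(dir=self.home)
        with os.fdopen(fd, "w") as handle:
            json.dump(position, handle)
        os.replace(tmp, self._path(task_id))

    def load(self, task_id):
        """Return the saved position, or None if the task has never run."""
        try:
            with open(self._path(task_id)) as handle:
                return json.load(handle)
        except FileNotFoundError:
            return None


# A throwaway directory stands in for the configured home path.
home = tempfile.mkdtemp()
store = FileProgressStore(home)
store.save("task-1", {"binlog_file": "mysql-bin.000003", "offset": 4096})
restored = store.load("task-1")
print(restored)
```

On restart, a task would read its last saved position and resume from there instead of re-synchronizing from the beginning.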

Configuring the run mode of a Porter task node

#zookeeper Cluster Configuration

porter.cluster.strategy=ZOOKEEPER

porter.cluster.client.url=127.0.0.1:2181

porter.cluster.client.sessionTimeout=100000

#Standalone Mode Configuration

porter.cluster.strategy=STANDALONE

porter.cluster.client.home=/path/.porter
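To show how these keys distinguish the two modes, here is a small sketch that reads such settings and reports the selected mode. The property keys and values come from the configuration above; the parse_properties and describe_mode helpers are hypothetical and are not part of Porter.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def describe_mode(props):
    """Report which run mode the settings select (hypothetical helper)."""
    strategy = props.get("porter.cluster.strategy", "").upper()
    if strategy == "ZOOKEEPER":
        return f"cluster mode via ZooKeeper at {props['porter.cluster.client.url']}"
    if strategy == "STANDALONE":
        return f"standalone mode, state under {props['porter.cluster.client.home']}"
    raise ValueError(f"unknown strategy: {strategy!r}")


cluster = parse_properties("""
#zookeeper Cluster Configuration
porter.cluster.strategy=ZOOKEEPER
porter.cluster.client.url=127.0.0.1:2181
porter.cluster.client.sessionTimeout=100000
""")
standalone = parse_properties("""
#Standalone Mode Configuration
porter.cluster.strategy=STANDALONE
porter.cluster.client.home=/path/.porter
""")
print(describe_mode(cluster))
print(describe_mode(standalone))
```

A node uses one of the two blocks, not both: the strategy key selects the mode, and the remaining keys only make sense for that mode.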
