Safe Data Synchronization and Query for Production Environments Based on MySQL and Otter

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Preface

In day-to-day operation and maintenance of the cloud platform, troubleshooting and data checks come up constantly. To give all operations staff (including some developers and operations analysts) real-time query access to production data, we used MySQL and the open source tool Otter to build a data query and management system. It can query the current data of every resource pool on the platform and stays in quasi-real-time sync with the production network (latency on the order of seconds).

The main component of the query module is MySQL, which is also the core database of the managed business systems and is used heavily. Part of the core data in this MySQL instance is synchronized in real time to remote data centers, where it serves as source data for other resource pools. The Otter manager node responsible for real-time synchronization is deployed on the same physical machine as MySQL and acts as the central node of the query module across all resource pools of the cloud platform.

First, an introduction to the open source tool Otter (quoted from GitHub)

Otter is a distributed database synchronization system from Alibaba. It parses database incremental logs and synchronizes the changes in quasi-real time to MySQL databases in the local or a remote data center. Its components work as follows:

DB: the source database and the target database to synchronize to

Canal: used to obtain the database incremental logs

Manager: configures synchronization rules, data sources, synchronization targets, and so on

ZooKeeper: coordinates the work among nodes

Node: processes the synchronization tasks assigned to it.

I. Characteristics of Otter

1. Developed in pure Java; resource consumption is relatively high

2. Obtains database incremental log data through Canal, another open source product from Alibaba.

Here is the schematic of Canal:


Canal builds on the principle of MySQL master/slave replication:

The MySQL master writes data changes to its binary log (the records are called binary log events and can be viewed with show binlog events)

The MySQL slave copies the master's binary log events to its relay log

The MySQL slave replays the events in the relay log so that the changes are reflected in its own data.
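The three replication steps above can be pictured with a toy in-memory model. This is only an illustration under simplifying assumptions: the class names `Master` and `Replica` and the tuple-shaped events are hypothetical and bear no relation to MySQL's real binlog format or network protocol.

```python
class Master:
    """Toy master: applies writes and appends each change to a binary log."""

    def __init__(self):
        self.data = {}
        self.binlog = []  # ordered list of (op, key, value) events

    def write(self, key, value):
        # Step 1: every data change is also recorded as a binlog event.
        self.data[key] = value
        self.binlog.append(("set", key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self.binlog.append(("del", key, None))


class Replica:
    """Toy slave: copies binlog events into a relay log, then replays them."""

    def __init__(self):
        self.data = {}
        self.relay_log = []
        self.copied = 0   # number of master events already copied
        self.applied = 0  # number of relay log events already replayed

    def fetch(self, master):
        # Step 2: copy new binlog events from the master into the relay log.
        new_events = master.binlog[self.copied:]
        self.relay_log.extend(new_events)
        self.copied += len(new_events)

    def replay(self):
        # Step 3: replay relay log events so local data reflects the changes.
        for op, key, value in self.relay_log[self.applied:]:
            if op == "set":
                self.data[key] = value
            else:
                self.data.pop(key, None)
        self.applied = len(self.relay_log)


master = Master()
replica = Replica()
master.write("user:1", "alice")
master.write("user:2", "bob")
master.delete("user:1")
replica.fetch(master)
replica.replay()
print(replica.data == master.data)
```

Keeping the copy step and the replay step separate mirrors the relay log's role: the slave can fall behind on replay without losing events it has already received.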

How Canal works:

Canal emulates the MySQL slave interaction protocol, presents itself as a MySQL slave, and sends a dump request to the MySQL master

The MySQL master receives the dump request and starts pushing the binary log to the slave (that is, to Canal)

Canal parses the binary log, which arrives as a raw byte stream, into event objects.
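To make "parsing a byte stream into an object" concrete, here is a minimal sketch that decodes the fixed 19-byte header at the start of each event in MySQL's v4 binlog format. It is not Canal's code (Canal is written in Java and handles far more than this); `EventHeader` and `parse_header` are names made up for this illustration, and only a handful of event type codes are listed.

```python
import struct
from typing import NamedTuple

# v4 event header, little-endian: timestamp(4) type(1) server_id(4)
# event_length(4) next_position(4) flags(2) = 19 bytes.
HEADER = struct.Struct("<IBIIIH")

# A small, deliberately incomplete subset of event type codes.
EVENT_NAMES = {
    2: "QUERY_EVENT",
    4: "ROTATE_EVENT",
    16: "XID_EVENT",
    19: "TABLE_MAP_EVENT",
}


class EventHeader(NamedTuple):
    timestamp: int
    type_code: int
    server_id: int
    event_length: int
    next_position: int
    flags: int

    @property
    def type_name(self):
        return EVENT_NAMES.get(self.type_code, "UNKNOWN(%d)" % self.type_code)


def parse_header(raw):
    """Turn the first 19 bytes of an event into a structured header."""
    if len(raw) < HEADER.size:
        raise ValueError("need at least %d bytes, got %d" % (HEADER.size, len(raw)))
    return EventHeader(*HEADER.unpack_from(raw))


# Build a synthetic event for the demo; the values are arbitrary.
sample = HEADER.pack(1700000000, 2, 1, 75, 194, 0) + b"payload"
header = parse_header(sample)
print(header.type_name, header.event_length, header.next_position)
```

A real parser would go on to decode the event body according to `type_code`; the header alone is enough to show how raw bytes become typed fields.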

3. A typical management architecture: manager (web management) + node (worker node)

1) The manager is responsible for configuration and monitoring; at runtime it pushes the synchronization configuration to the nodes

2) The nodes are responsible for processing the tasks and report their synchronization status back to the manager

4. Built on ZooKeeper, which handles distributed state scheduling and lets multiple nodes work together.

5. Uses aria2 multithreaded transfer, which makes it less dependent on network bandwidth.
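The manager + node split described in item 3 can be sketched in a few lines. This is a toy model, not Otter's API: the `Manager` and `Node` classes, their method names, and the configuration keys are all hypothetical.

```python
class Node:
    """Worker: receives configuration and reports its state back."""

    def __init__(self, name):
        self.name = name
        self.config = None

    def receive_config(self, config):
        # Keep a private copy of whatever the manager pushed.
        self.config = dict(config)

    def report(self, manager):
        state = "running" if self.config else "idle"
        manager.on_status(self.name, state)


class Manager:
    """Control plane: holds configuration, pushes it, and tracks node status."""

    def __init__(self):
        self.nodes = []
        self.status = {}

    def register(self, node):
        self.nodes.append(node)
        self.status[node.name] = "unknown"

    def push_config(self, config):
        for node in self.nodes:
            node.receive_config(config)

    def on_status(self, node_name, state):
        self.status[node_name] = state


manager = Manager()
nodes = [Node("node-1"), Node("node-2")]
for node in nodes:
    manager.register(node)

manager.push_config({"channel": "demo", "source": "db_a", "target": "db_b"})
for node in nodes:
    node.report(manager)
print(manager.status)
```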

II. What problems Otter can solve

1. Heterogeneous library synchronization

MySQL -> MySQL/Oracle (currently the open source version supports only MySQL as the incremental source; the target can be MySQL or Oracle. This depends on what Canal supports.)

2. Synchronization within a single data center (RTT < 1 ms between databases)

Database version upgrade

Data table migration

Asynchronous secondary index.

3. Synchronization between remote data centers (one of Otter's biggest highlights: it addresses internationalization by synchronizing data from domestic sites to overseas sites for local users, and it supports multi-data-center disaster recovery in domestic scenarios)

Computer room disaster recovery

4. Two-way synchronization (the hardest scenario in data synchronization, and one Otter handles well. Otter provides two mechanisms, a loopback avoidance algorithm and a data consistency algorithm, to guarantee eventual consistency in active-active data center mode.)

1) Loopback avoidance algorithm (a general solution that supports most relational databases)

2) Data consistency algorithm (guarantees eventual consistency of data in active-active data center mode; a highlight)

5. File synchronization

Site mirroring (copying associated files along with the data, for example copying product pictures at the same time as the product records)
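The loop problem behind item 4 can be illustrated with a generic origin-tag scheme: each change records the site where it was first written, and a site never forwards a change it applied on behalf of its peer. This is a sketch of the general idea only, not Otter's actual loopback avoidance algorithm (see the Otter project on GitHub for that); the `Site` class, its fields, and the `sync` helper are hypothetical.

```python
class Site:
    """One side of a two-way sync pair."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.log = []    # every applied change, local or replicated
        self.cursor = 0  # position in the log already examined for shipping

    def write(self, key, value):
        # A local write is tagged with this site as its origin.
        self.apply({"origin": self.name, "key": key, "value": value})

    def apply(self, change):
        self.data[change["key"]] = change["value"]
        self.log.append(change)


def sync(src, dst):
    """Ship new changes from src to dst, skipping changes that came from elsewhere."""
    pending = src.log[src.cursor:]
    src.cursor = len(src.log)
    # Loop avoidance: only changes that originated at src are forwarded.
    shipped = [c for c in pending if c["origin"] == src.name]
    for change in shipped:
        dst.apply(change)
    return len(shipped)


a, b = Site("A"), Site("B")
a.write("k", 1)
first = sync(a, b)    # A's change travels to B
bounce = sync(b, a)   # B sees the change in its log but does not send it back
b.write("j", 2)
second = sync(b, a)
print(first, bounce, second, a.data == b.data)
```

Without the origin check, the change applied on B would appear in B's log as new work and be shipped back to A, and the two sites would trade the same row indefinitely.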

Schematic of replication within a single data center:

Description:

Data is processed on the fly and written to disk as little as possible, which speeds up synchronization. (With the node loadBalancer algorithm enabled, if the S and ETL stages land on different Nodes, the data has to cross the network between them.)

Nodes support failover / loadBalancer.

Schematic of replication between remote data centers:

Description:

Data crosses the network. The S/E/T/L stages are spread across two or more Nodes, which cooperate through ZooKeeper (typically Select and Extract run on a Node in one data center, and Transform/Load run on a Node in the other)

Nodes support failover / loadBalancer. The Nodes in each data center can form a cluster of one or more machines.
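The S/E/T/L stages can be pictured as four small functions chained together. The sketch below is a toy under stated assumptions: the function bodies, the event shape, and the column mapping are hypothetical and only loosely mirror what each stage does (Select reads change events, Extract filters and assembles them, Transform maps them to the target schema, Load writes them to the target).

```python
def select(change_log):
    """S: pull raw change events from the source's incremental log."""
    return list(change_log)


def extract(events, wanted_tables):
    """E: keep only the events that belong to tables being synchronized."""
    return [e for e in events if e["table"] in wanted_tables]


def transform(events, column_map):
    """T: rename columns so rows match the target schema."""
    out = []
    for e in events:
        row = {column_map.get(k, k): v for k, v in e["row"].items()}
        out.append({"table": e["table"], "op": e["op"], "row": row})
    return out


def load(events, target):
    """L: apply events to the target store, keyed by (table, id)."""
    for e in events:
        key = (e["table"], e["row"]["id"])
        if e["op"] == "delete":
            target.pop(key, None)
        else:
            target[key] = e["row"]
    return target


change_log = [
    {"table": "orders", "op": "insert", "row": {"id": 1, "amt": 10}},
    {"table": "audit", "op": "insert", "row": {"id": 9, "amt": 0}},
    {"table": "orders", "op": "update", "row": {"id": 1, "amt": 12}},
]

# In a remote deployment, the hand-off between these two lines is where
# data would travel from a Node in one data center to a Node in the other.
extracted = extract(select(change_log), wanted_tables={"orders"})
target = load(transform(extracted, column_map={"amt": "amount"}), {})
print(target)
```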

Otter's scheduling model, data storage algorithm, consistency, high availability, and scalability are described in detail in the Otter project on GitHub. This article does not repeat them; the rest focuses on installing and using Otter.

III. Installation and deployment

The mobile cloud business requires data aggregation: multiple master databases must be synchronized and consolidated into a single slave database to support statistical analysis. Otter meets this requirement and is more flexible and extensible than multi-source replication.

With the basics of Otter covered, let's build an Otter environment. A deployment needs a manager, nodes, and databases, and has many dependencies, so we start with Otter's management server, the Manager.

1. Environment preparation

1) Alibaba software

Otter (manager, node) software: https://github.com/alibaba/otter/releases

Manager database initialization script: https://raw.githubusercontent.com/alibaba/otter/master/manager/deployer/src/main/resources/sql/otter-manager-schema.sql

2) Cluster

Zookeeper: http://download.csdn.net/download/jxplus/9451794

3) Java

JDK: the test environment uses yum to install version 1.6 or later

4) Database

MySQL 5.7: http://dev.mysql.com/downloads/mysql/

5) Operating system

CentOS 7.1.1503 (Core): https://www.centos.org/download/

Version information

2. Software installation

1) Install the operating system

2) Install Java JDK 1.6

After installing the operating system, use yum to install JDK 1.6 or later

yum -y install java-1.6.0-openjdk.x86_64

3) install MySQL database

4) install the cluster software ZooKeeper

Download the installation package and unpack it; no compilation or installation step is needed. Then configure it:

① Modify the tickTime, clientPort, and dataDir parameters

vim /zookeeper-3.4.8/conf/zoo.cfg

tickTime: in milliseconds; the basic time unit ZooKeeper uses. For example, 1 * tickTime is the heartbeat interval between the client and the ZooKeeper server, and 2 * tickTime is the minimum client session timeout.

The default value of tickTime is 2000 milliseconds. A lower tickTime detects timeouts faster, but it also causes more network traffic (heartbeat messages) and higher CPU usage (session tracking).

clientPort: the TCP port on which the ZooKeeper server process listens. By default, the server listens on port 2181.

dataDir: has no default and must be set. It specifies the directory where snapshot files are stored.
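As a small illustration, the three parameters can be rendered into zoo.cfg text with a few lines of Python. The helper name `render_zoo_cfg` and the `/var/lib/zookeeper` data directory are placeholders chosen for this sketch; substitute the values for your own installation.

```python
def render_zoo_cfg(data_dir, tick_time=2000, client_port=2181):
    """Return minimal standalone zoo.cfg content as a string."""
    if not data_dir:
        # dataDir has no default, so refuse to produce an incomplete file.
        raise ValueError("dataDir must be set")
    lines = [
        "tickTime=%d" % tick_time,
        "clientPort=%d" % client_port,
        "dataDir=%s" % data_dir,
    ]
    return "\n".join(lines) + "\n"


def min_session_timeout_ms(tick_time=2000):
    """Two ticks, following the rule of thumb described above."""
    return 2 * tick_time


print(render_zoo_cfg("/var/lib/zookeeper"), end="")
print("minimum session timeout (ms):", min_session_timeout_ms())
```

The output of `render_zoo_cfg` can be written to conf/zoo.cfg under the ZooKeeper directory before starting the server.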

② Run the following commands to start the server

cd /zookeeper-3.4.8/bin/

./zkServer.sh start
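Once the server is started, one way to confirm that it is answering is ZooKeeper's four-letter command `ruok`, to which a healthy server replies `imok`. The function below is a minimal sketch; the name `zk_is_ok` and its defaults are assumptions of this example, and note that newer ZooKeeper releases require four-letter commands to be whitelisted in the server configuration before they respond.

```python
import socket


def zk_is_ok(host="127.0.0.1", port=2181, timeout=3.0):
    """Send 'ruok' to a ZooKeeper server and report whether it answers 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ruok")
            reply = b""
            # Read until 4 bytes arrive or the server closes the connection.
            while len(reply) < 4:
                chunk = conn.recv(4 - len(reply))
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Connection refused, timeout, or any other socket-level failure.
        return False
    return reply == b"imok"


if __name__ == "__main__":
    print("ZooKeeper healthy:", zk_is_ok())
```

A `False` result only means the check did not get `imok` back; it does not distinguish a stopped server from a blocked port or a disabled command.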
