How to use Binlog
This article introduces how to use the binlog. In practice many people run into trouble with the situations below, so let's walk through how to deal with them. I hope you read carefully and learn something!
Are you still struggling with questions like the following?
When you use Redis or other middleware as a cache, you often find the cache and the database inconsistent, and can only paper over it with scheduled tasks or cache expiration.
When you use ES as a search engine, you double-write to it and worry that ES and the database are not covered by a single transaction.
When you need to migrate data, you again rely on double-writing. Within the same database that may be tolerable, but across different databases there is no transactional guarantee, and data consistency again becomes a problem.
Many of you have probably hit these problems in your own business systems; they tend to add a lot of workload or cause data-inconsistency incidents. So how can we solve them more easily?
What is the essence of the problem? We need our data, whether in Redis or ES, to stay consistent with MySQL; each copy is essentially a replica of the data. Speaking of data replication, anyone familiar with MySQL will say: isn't MySQL's master-slave setup also data replication? If we imitate MySQL's master-slave replication to synchronize our data, everything becomes much easier.
MySQL master-slave
Since we can imitate MySQL master-slave replication to meet our needs, we first have to understand how it works, as shown in the following figure:
Step 1: MySQL, as the master, serially writes the record of each transaction's updates to a binlog file on local disk before the transaction completes.
Step 2: The slave server starts an I/O thread that continuously reads from the master's binlog. If it has caught up with the master, it sleeps and waits for the master to produce new events. Everything it reads is written to the relay log.
Step 3: The SQL thread reads the relay log and sequentially executes the SQL events in it, keeping the slave's data consistent with the master's.
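To make the division of labor concrete, here is a minimal, self-contained Java sketch of the I/O-thread/SQL-thread split, using an in-memory queue as a stand-in for the relay log. This is a conceptual analogy, not MySQL's actual implementation; the event strings and the apply step are placeholders.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ReplicationSketch {
    // The relay log, modeled as an in-memory queue (hypothetical stand-in).
    private static final BlockingQueue<String> relayLog = new LinkedBlockingQueue<>();

    public static void main(String[] args) {
        // I/O thread: pulls binlog events from the master and appends them to the relay log.
        Thread ioThread = new Thread(() -> {
            for (int i = 1; i <= 3; i++) {
                String event = "binlog-event-" + i; // placeholder for a real binlog event
                relayLog.offer(event);              // Step 2: write what we read to the relay log
            }
        });

        // SQL thread: reads the relay log in order and replays each event locally.
        Thread sqlThread = new Thread(() -> {
            try {
                for (int i = 1; i <= 3; i++) {
                    String event = relayLog.take();          // Step 3: read the relay log sequentially
                    System.out.println("applying " + event); // placeholder for executing the SQL
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        ioThread.start();
        sqlThread.start();
    }
}
```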
The most important piece of master-slave replication is the binlog: the slave copies the master's data according to the information in the binlog.
If we can get hold of the binlog in our business code and use it to copy data into Redis or ES, we no longer have to worry about data consistency at all.
binlog
The binlog (binary log) is MySQL's binary log; it records every operation MySQL performs that changes the database. It is generated by the server layer and has nothing to do with the storage engine: whichever storage engine you use, the binlog is available.
binlog format
The binlog has three formats: Statement, Row, and Mixed. You can check the current database's binlog format with show variables like 'binlog_format'; the figure below shows a binlog in Row format:
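If you prefer to check this from code rather than a MySQL shell, a small JDBC sketch like the following works; the connection URL and credentials are assumptions for illustration, and the MySQL Connector/J driver must be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BinlogFormatCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; replace with your own.
        String url = "jdbc:mysql://localhost:3306/?useSSL=false";
        try (Connection conn = DriverManager.getConnection(url, "root", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW VARIABLES LIKE 'binlog_format'")) {
            while (rs.next()) {
                // Prints e.g. "binlog_format = ROW"
                System.out.println(rs.getString("Variable_name") + " = " + rs.getString("Value"));
            }
        }
    }
}
```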
Statement
Statement format records the SQL of every data-modifying statement into the binlog.
Advantages: It takes the least space, since unmodified fields are not recorded. Compared with the other modes it greatly reduces log volume and improves I/O performance.
Disadvantages: It is inconvenient for heterogeneous systems. For something like Redis cache replication, it is hard to simulate a MySQL slave from statements alone; you would have to query the data again. A real slave can also go wrong: with functions such as UUID(), replaying the statement does not guarantee both sides stay consistent. Let's see what a Statement log contains. Run show master status; to check which binlog the master is currently using, as shown below:
Then use show binlog events in 'mysql-bin.000003' to see what is in this log:
We can see that every operation is wrapped in a complete transaction; a transaction that is not committed never appears in the binlog (you can verify this yourself). The original SQL of our updates is recorded verbatim.
Row
Row mode, unlike Statement, records the full data of every modified row:
Advantages: Heterogeneous systems can synchronize data conveniently, and functions like UUID() pose no problem; the log can be replayed correctly in every case.
Disadvantages: The data volume is relatively large. An update statement, for example, records every field both before and after the change, which produces a large log and has some impact on I/O.
Let's also look at its contents:
Running show binlog events in 'mysql-bin.000004', we find that we cannot see the concrete row data inside the transaction. For that we need the mysqlbinlog tool, which lives in MySQL's bin directory and can be invoked directly. Entering the command /usr/local/mysql/bin/mysqlbinlog --base64-output=decode-rows -v mysql-bin.000004, we can see:
Shown here is an update statement that records not only the new values but also the original ones.
Note that the binlog_row_image variable determines whether Row format records the original values. The default is FULL, meaning they are recorded, as in the case above. It can also be set to MINIMAL, meaning only the updated values are recorded.
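If you want to script this inspection, the same mysqlbinlog invocation can be driven from Java; the binary path and log file name below are assumptions matching the example above.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class MysqlbinlogRunner {
    public static void main(String[] args) throws Exception {
        // Paths are assumptions; adjust to your installation and binlog file.
        ProcessBuilder pb = new ProcessBuilder(
                "/usr/local/mysql/bin/mysqlbinlog",
                "--base64-output=decode-rows", "-v",
                "mysql-bin.000004");
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // decoded row events, including before/after values
            }
        }
        p.waitFor();
    }
}
```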
Mixed
In Mixed mode, MySQL records in Statement format by default, but switches to Row format whenever it determines a statement could cause data inconsistency (for example, one using UUID()).
We currently use Row mode by default. Row mode makes it convenient to feed heterogeneous systems, and in practice its impact on I/O is not particularly noticeable at the business level.
Canal
Now that we know what the binlog is, we need to know how to consume it. Common binlog synchronization tools include databus, canal, maxwell, Aliyun DTS, and so on. We won't compare their pros and cons here; we'll focus on canal.
canal (GitHub: https://github.com/alibaba/canal), literally "waterway/pipeline/ditch", mainly provides incremental data subscription and consumption based on parsing MySQL's incremental logs.
In the early days, because Alibaba ran dual data centers in Hangzhou and the United States, it needed cross-datacenter synchronization, initially implemented mainly with business-side triggers capturing incremental changes. From 2010 onward, teams gradually switched to parsing database logs to obtain incremental changes, which spawned a large number of incremental subscription and consumption businesses. This later evolved into the DTS product on Alibaba Cloud.
canal's basic principle is likewise to imitate a MySQL slave: it continuously pulls the binlog from the master, and the binlog can then be delivered to different destinations, such as the common message queues Kafka, RocketMQ, and so on. Alibaba Cloud's paid DTS can even synchronize directly into Redis, ES, or other storage media.
For basic usage, see the QuickStart at github.com/alibaba/canal/wiki/QuickStart; we won't repeat it here beyond the client sketch below. What follows is a closer look at canal's overall architecture and how it is implemented.
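As a taste of what the QuickStart walks through, here is a trimmed-down Java client sketch in the spirit of the wiki's ClientExample; the host, port, instance name ("example"), and subscription filter are assumptions to adapt to your own deployment.

```java
import java.net.InetSocketAddress;
import java.util.List;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;

public class SimpleCanalClient {
    public static void main(String[] args) throws Exception {
        // Hypothetical server address and instance name; match your canal deployment.
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        try {
            connector.connect();
            connector.subscribe(".*\\..*"); // subscribe to all schemas and tables
            connector.rollback();           // start from the last acknowledged position
            while (true) {
                Message message = connector.getWithoutAck(100); // fetch up to 100 entries
                long batchId = message.getId();
                if (batchId == -1 || message.getEntries().isEmpty()) {
                    Thread.sleep(1000);     // nothing new yet
                } else {
                    printEntries(message.getEntries());
                    connector.ack(batchId); // confirm the batch was processed
                }
            }
        } finally {
            connector.disconnect();
        }
    }

    private static void printEntries(List<CanalEntry.Entry> entries) throws Exception {
        for (CanalEntry.Entry entry : entries) {
            if (entry.getEntryType() != CanalEntry.EntryType.ROWDATA) {
                continue; // skip transaction-begin/end markers
            }
            CanalEntry.RowChange rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
            for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                for (CanalEntry.Column column : rowData.getAfterColumnsList()) {
                    System.out.println(column.getName() + " = " + column.getValue());
                }
            }
        }
    }
}
```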
Canal overall architecture
CanalServer: one JVM can be understood as one CanalServer; in cluster mode there are multiple CanalServers.
CanalInstance: one job can be understood as one Instance. For example, syncing library A's binlog to message queue A and library B's binlog to message queue B are two different Instances. Which Instance runs on which CanalServer depends on which server grabs the temporary node in ZooKeeper first; if the distribution is even enough, cluster mode relieves a lot of pressure.
CanalParser: pulls the mysql-binlog and parses it.
EventSink: processes the parsed data (filtering, merging, and so on).
CanalEventStore: a bit like the relay log on a slave, used to stage the log; canal currently only supports keeping it in memory, not on disk.
CanalParser, EventSink, and CanalEventStore are all very important components of canal, and they relate as follows:
CanalParser produces data for EventSink to process; the processed data is stored in CanalEventStore, from which the MQ sender continuously pulls the latest data and delivers it to the message queue.
CanalParser
Let's look at how canal masquerades as a slave to pull data in CanalParser. AbstractEventParser.java performs the following steps:
Step 1: Build a database connection and generate a slaveId to identify itself as a slave.
Step 2: Fetch the database's meta information, such as the binlog format and row image.
Step 3: Get the serverId of the MySQL service whose binlog we listen to, using the show variables like 'server_id' command.
Step 4: Obtain the position to start consuming from. If a position was stored last time, resume from it; otherwise take the latest position from the show master status command.
Step 5: Perform the dump: simulate a slave by sending the register-slave request and the dump-binlog request, then pull binlog data continuously in an endless loop, as sketched below:
Step 6: Convert the fetched binary data into a LogEntry according to the MySQL binlog protocol, for subsequent processing.
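The loop in Steps 5 and 6 can be pictured with a small sketch. The interface and method names below are illustrative stand-ins, not canal's actual internal API:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical connection abstraction standing in for canal's internals.
interface BinlogConnection {
    void registerAsSlave(long slaveId);                      // Step 5: register-slave request
    void sendDumpRequest(String binlogFile, long position);  // Step 5: dump-binlog request
    byte[] fetchPacket() throws InterruptedException;        // blocking read of one packet
}

final class ParserLoopSketch {
    private final AtomicBoolean running = new AtomicBoolean(true);

    void run(BinlogConnection conn, String startFile, long startPosition) throws InterruptedException {
        conn.registerAsSlave(1234L);                    // Step 1: a made-up slaveId
        conn.sendDumpRequest(startFile, startPosition); // the position from Step 4 feeds in here
        while (running.get()) {                         // Step 5: pull binlog in an endless loop
            byte[] packet = conn.fetchPacket();
            handle(packet); // Step 6: decode per the MySQL binlog protocol into a LogEntry
        }
    }

    private void handle(byte[] packet) {
        // decoding and hand-off to the EventSink omitted in this sketch
    }

    void stop() { running.set(false); }
}
```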
EventSink
EventSink processes the LogEntry obtained above:
Filtering:
Filter out empty transactions
Filter out heartbeats
Custom filtering
Recording: Prometheus is used here to report statistics.
Merging: many businesses now shard their databases and tables. The data comes from different Parsers but must ultimately be aggregated into the same EventStore. What needs attention in this scenario is time-based merge control: after aggregation, try to keep each shard's data advancing together incrementally, so that no shard runs far ahead of or behind the others. A sketch of the filtering stage follows below.
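To illustrate the filtering stage, here is a minimal sketch composing the three filters with java.util.function.Predicate; the LogEntry record and its fields are hypothetical simplifications of canal's real entry type.

```java
import java.util.List;
import java.util.function.Predicate;

public class EventSinkSketch {
    // Hypothetical, simplified stand-in for canal's parsed log entry.
    record LogEntry(String type, int rowCount) {}

    public static void main(String[] args) {
        Predicate<LogEntry> notEmptyTransaction =
                e -> !(e.type().equals("TRANSACTION") && e.rowCount() == 0); // drop empty transactions
        Predicate<LogEntry> notHeartbeat = e -> !e.type().equals("HEARTBEAT"); // drop heartbeats
        Predicate<LogEntry> customFilter = e -> true; // plug business-specific rules in here

        Predicate<LogEntry> sinkFilter = notEmptyTransaction.and(notHeartbeat).and(customFilter);

        List<LogEntry> parsed = List.of(
                new LogEntry("TRANSACTION", 0), // filtered out
                new LogEntry("HEARTBEAT", 0),   // filtered out
                new LogEntry("ROWDATA", 2));    // kept and passed on toward the EventStore

        parsed.stream().filter(sinkFilter).forEach(System.out::println);
    }
}
```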
EventStore
Let's start by looking at the interfaces EventStore provides:
As you can see, EventStore is really just a simple store. canal currently provides MemoryEventStoreWithBuffer, which transfers data in memory and is implemented with a RingBuffer (a lock-free, high-performance queue). For more on RingBuffer, see section 3.1 of my earlier article "You should know Disruptor", where it is explained in detail.
CanalMQ then continuously fetches data from the EventStore and sends it on.
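To make the ring-buffer idea concrete, here is a minimal single-producer/single-consumer ring buffer sketch in Java. It illustrates the put/get sequence idea behind MemoryEventStoreWithBuffer but is not canal's actual implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

// A simplified lock-free ring buffer: one writer thread, one reader thread.
final class SpscRingBuffer<T> {
    private final Object[] slots;
    private final int mask;                                   // capacity must be a power of two
    private final AtomicLong putSequence = new AtomicLong(0); // next slot to write
    private final AtomicLong getSequence = new AtomicLong(0); // next slot to read

    SpscRingBuffer(int capacityPowerOfTwo) {
        slots = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    boolean offer(T value) {
        long put = putSequence.get();
        if (put - getSequence.get() == slots.length) return false; // buffer full
        slots[(int) (put & mask)] = value;
        putSequence.set(put + 1); // publish the write (volatile store)
        return true;
    }

    @SuppressWarnings("unchecked")
    T poll() {
        long get = getSequence.get();
        if (get == putSequence.get()) return null; // buffer empty
        T value = (T) slots[(int) (get & mask)];
        getSequence.set(get + 1); // free the slot for the writer
        return value;
    }

    public static void main(String[] args) {
        SpscRingBuffer<String> store = new SpscRingBuffer<>(8);
        store.offer("event-1");           // the parser side puts events in
        System.out.println(store.poll()); // the MQ side takes them out: prints "event-1"
    }
}
```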
"How to use Binlog" content is introduced here, thank you for reading. If you want to know more about industry-related knowledge, you can pay attention to the website. Xiaobian will output more high-quality practical articles for everyone!