2025-03-29 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces how to use MySQL's binlog: how master-slave replication depends on it, the three binlog formats, and how the Canal tool consumes it to synchronize data to other systems.
MySQL master-slave replication
Since we can imitate MySQL's master-slave replication to meet our requirements, we first need to understand how MySQL replication works, as shown in the following figure:
Step 1: Before each transaction's update completes, the master serially writes the operation records to the binlog file and stores it on local disk.
Step 2: The slave server starts an I/O thread that continuously reads from the master's binlog. If it has caught up with the master, it sleeps and waits for the master to generate new events. All data read is written to the relay log.
Step 3: A SQL thread reads the relay log and sequentially executes the SQL events in it, keeping the slave's data consistent with the master's.
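The three steps above can be sketched as a small simulation. This is a toy model for illustration only (a real slave speaks the MySQL replication protocol over the network; the event format and function names here are invented):

```python
# Toy simulation of MySQL master-slave replication:
# the master appends events to its binlog, the slave's I/O thread
# copies them into a relay log, and the SQL thread replays them.

master_data = {}
binlog = []          # master's binary log (simplified to dicts)
relay_log = []       # slave's relay log
slave_data = {}

def master_update(key, value):
    """On the master, every committed change is written to the binlog."""
    master_data[key] = value
    binlog.append({"op": "set", "key": key, "value": value})

def io_thread():
    """Slave I/O thread: pull new binlog events into the relay log."""
    while len(relay_log) < len(binlog):
        relay_log.append(binlog[len(relay_log)])

def sql_thread():
    """Slave SQL thread: replay relay-log events in order."""
    for event in relay_log:
        if event["op"] == "set":
            slave_data[event["key"]] = event["value"]

master_update("user:1", "alice")
master_update("user:2", "bob")
io_thread()
sql_thread()
print(slave_data == master_data)  # True: the slave has caught up
```

Because events are applied strictly in binlog order, the slave converges to the master's state; this is exactly the property that lets a tool imitate a slave to feed other systems.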
In master-slave replication, the most important piece is the binlog: the slave copies the master's data according to the information in the binlog.
If we can obtain the binlog in our business code and replay its data into Redis or ES, then we no longer need to worry about data consistency at all.
Binlog
Binlog (Binary Log), as its name implies, is MySQL's binary log; it records every operation that changes the database. Binlog is generated by the server layer and has nothing to do with the storage engine: whichever storage engine you use, the binlog is available.
Binlog format
There are three binlog formats: Statement, Row, and Mixed. You can view the current database's binlog format with show variables like 'binlog_format'. The following figure shows a binlog in Row format:
Statement
Statement, the statement format, records each SQL statement that modifies data into the binlog.
Advantages: it takes the least space, since unmodified fields are not recorded. Compared with the other modes it greatly reduces log volume and improves I/O performance. Disadvantages: it is inconvenient for heterogeneous systems. For example, when replicating to a Redis cache, it is hard to simulate MySQL's slave behavior, so the data has to be queried again. Slaves can also run into problems: with functions such as UUID(), replay on the slave cannot guarantee that both sides stay consistent. Let's look at what a Statement log contains. Enter the command show master status; to view the binlog the master is currently writing, as shown below:
Then use the command show binlog events in 'mysql-bin.000003'; to see what is in the log. We can see that all our operations are wrapped in a complete transaction; a transaction that is not committed does not appear in the binlog. You can experiment with this yourself: the original SQL of our database updates is fully recorded.
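The UUID() problem mentioned above can be seen in a small sketch. Statement replication re-executes the SQL text on the slave, so a non-deterministic function produces a different value; Row replication copies the concrete values instead (the `execute` helper is invented for illustration):

```python
# Why Statement format can diverge: replaying the *statement*
# re-runs non-deterministic functions like UUID(), while Row
# format replays the concrete row *values*.
import uuid

def execute(statement, table):
    """Run a toy 'INSERT ... VALUES (UUID())' statement."""
    if statement == "insert uuid":
        table.append(str(uuid.uuid4()))

master = []
execute("insert uuid", master)

# Statement-based replication: the slave re-runs the SQL text,
# generating a fresh UUID that differs from the master's.
slave_stmt = []
execute("insert uuid", slave_stmt)

# Row-based replication: the slave copies the row values written
# on the master, so the result is guaranteed identical.
slave_row = list(master)

print(slave_row == master)   # True
print(slave_stmt == master)  # False (a fresh UUID was generated)
```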
Row
Unlike Statement, Row records the data of every row that has been modified:
Advantages: heterogeneous systems can synchronize data more conveniently, there is no problem with functions such as UUID(), and the data can be replicated in every situation. Disadvantages: the data volume is large. For an update statement, for example, it records every field before the update and every field after the update, so the log volume is relatively large and has some impact on I/O.
Similarly, let's take a look at the contents:
With the show binlog events in 'mysql-bin.000004' command, we find that we cannot see the concrete data inside the transaction. Here we need the mysqlbinlog tool, which lives in MySQL's bin directory and can be called directly. Enter the command /usr/local/mysql/bin/mysqlbinlog --base64-output=decode-rows -v mysql-bin.000004, and we can see:
What is shown here is an update statement that contains not only the original values but also the modified values.
Note that binlog_row_image determines whether Row format records the original values. The default, FULL, records them (as in the case above); the other value, MINIMAL, records only the updated values.
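The difference between FULL and MINIMAL can be sketched as follows. The event layout here is invented for illustration (real row events are binary), but it mirrors the idea: FULL keeps complete before/after row images, MINIMAL keeps only the primary key plus the changed columns:

```python
# Sketch of binlog_row_image: FULL records the full before/after
# row images; MINIMAL records only enough to identify the row
# (the primary key) plus the columns that actually changed.

def row_update_event(before, after, mode="FULL", pk="id"):
    if mode == "FULL":
        return {"before": before, "after": after}
    # MINIMAL: identify the row by primary key, log only changes
    changed = {k: v for k, v in after.items() if before.get(k) != v}
    return {"before": {pk: before[pk]}, "after": changed}

before = {"id": 1, "name": "alice", "age": 20, "city": "hz"}
after  = {"id": 1, "name": "alice", "age": 21, "city": "hz"}

full    = row_update_event(before, after, "FULL")
minimal = row_update_event(before, after, "MINIMAL")

print(full["before"]["name"])  # 'alice' -- the whole image is kept
print(minimal["after"])        # {'age': 21} -- only the change
```

This is why FULL is friendlier to heterogeneous consumers (they see the complete row), at the cost of larger logs.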
Mixed
In Mixed mode, MySQL records in Statement format by default, but once it determines that data inconsistency is possible (for example, when the UUID() function is used), it switches to Row format for that statement.
At present we use Row mode by default, since it makes synchronizing heterogeneous data easier. In practice, Row mode's impact on I/O is not particularly noticeable in the business.
Canal
Now that we know what binlog is, we need to know how to consume it. Common binlog synchronization tools include databus, canal, maxwell, Aliyun DTS, and so on. We will not compare their pros and cons here, but instead focus on canal.
Canal (GitHub: https://github.com/alibaba/canal; the word translates as channel/pipe/ditch) is mainly used for incremental log parsing based on the MySQL database, providing incremental data subscription and consumption.
In the early days, Alibaba needed cross-datacenter synchronization because of its dual-site deployment in Hangzhou and the United States, achieved mainly through business triggers that captured incremental changes. Since 2010, the business gradually switched to parsing database logs to obtain incremental changes, giving rise to a large amount of database incremental subscription and consumption business. This later evolved into the DTS project on Aliyun.
The general principle of canal is to imitate a MySQL slave: it continuously pulls binlog from the master and then delivers the binlog to different destinations, such as the common message queues Kafka and RocketMQ. Aliyun's paid DTS can also synchronize directly to Redis, ES, or other storage media.
For basic usage of canal, see the QuickStart guide: https://github.com/alibaba/canal/wiki/QuickStart; it is not covered here. Next, we mainly introduce canal's overall architecture and implementation principles.
The overall architecture of Canal
CanalServer: one JVM can be understood as one CanalServer; in cluster mode, there are multiple CanalServers.
CanalInstance: one job can be understood as one Instance. For example, syncing library A's binlog to message queue A and library B's binlog to message queue B are two different Instances. Which Instance runs on which CanalServer depends on which server preempts the ephemeral node on ZK first; if the distribution is even enough, this relieves a lot of pressure in cluster mode.
CanalParser: used to pull mysql-binlog and parse it.
EventSink: process the parsed data (filtering, merging, etc.).
CanalEventStore: similar to the relay log on a slave, it is used as relay storage for logs; canal currently supports only in-memory storage, not disk storage.
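The Instance scheduling described above (first server to create the ephemeral node on ZK wins) can be modeled with a toy sketch. Here a plain dict stands in for ZooKeeper, and the path layout is invented for illustration:

```python
# Toy model of Instance scheduling: each CanalServer tries to
# create the same ephemeral node on ZK; whoever creates it first
# runs the Instance. 'zk' is a dict standing in for ZooKeeper.

zk = {}  # path -> owner, simulating ephemeral nodes

def try_acquire(zk, instance, server):
    """Return True if this server preempted the instance's node."""
    path = f"/canal/{instance}/running"
    if path in zk:
        return False          # someone else already holds it
    zk[path] = server         # 'create ephemeral node'
    return True

servers = ["server-1", "server-2"]
owners = {}
for instance in ["instance-A", "instance-B"]:
    for server in servers:
        if try_acquire(zk, instance, server):
            owners[instance] = server
            break

print(owners)
```

In this deterministic toy run, server-1 grabs both instances; in a real cluster, timing differences between servers spread the instances out, which is what yields the load relief mentioned above.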
CanalParser, EventSink, and CanalEventStore are all very important components in Canal, and the relationship between them is as follows:
CanalParser produces data for EventSink to process; the processed data is stored in CanalEventStore, and the MQ producer continuously pulls the latest data from CanalEventStore and delivers it to the message queue.
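The Parser → Sink → Store flow can be sketched in a few lines. The class and method names below are simplified stand-ins for canal's Java components, not its real API:

```python
# Minimal sketch of canal's Parser -> EventSink -> EventStore
# pipeline. Events are simplified to dicts.
from collections import deque

class EventSink:
    """Filter parsed events before they reach the store."""
    def sink(self, event, store):
        if event.get("type") == "HEARTBEAT":
            return            # heartbeats are filtered out
        store.put(event)

class EventStore:
    """In-memory store, in the spirit of MemoryEventStoreWithBuffer."""
    def __init__(self):
        self.buffer = deque()
    def put(self, event):
        self.buffer.append(event)
    def get(self):
        return self.buffer.popleft() if self.buffer else None

store = EventStore()
sink = EventSink()
# The parser would produce these from raw binlog bytes:
for ev in [{"type": "ROWDATA", "row": 1},
           {"type": "HEARTBEAT"},
           {"type": "ROWDATA", "row": 2}]:
    sink.sink(ev, store)

# The MQ producer drains the store in order:
drained = []
while (ev := store.get()) is not None:
    drained.append(ev["row"])
print(drained)  # [1, 2] -- heartbeat was filtered by the sink
```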
CanalParser
Let's look at how Canal pretends to be a slave to pull data in CanalParser. In the class AbstractEventParser.java, the steps are as follows:
Step 1: Build a database connection and generate a slaveId to identify itself as a slave.
Step 2: Get the database's meta-information, such as binlogFormat and binlogRowImage.
Step 3: Get the serverId of the binlog service to listen to, via the show variables like 'server_id' command.
Step 4: Get the position to consume from this time. If a position was stored from the last run, resume from it; otherwise, consume from the latest position obtained with the show master status command.
Step 5: Perform the dump operation: simulate a slave sending the register-slave request and the dump-binlog request, then pull binlog data in an endless loop.
Step 6: Convert the obtained binary data into LogEntry objects according to the MySQL binlog protocol, for convenient subsequent processing.
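Step 6's binary-to-event conversion starts from the binlog event header. In binlog format v4, every event begins with a 19-byte little-endian header: timestamp (4), event type (1), server_id (4), event_length (4), next_position (4), flags (2). The sketch below packs and unpacks such a header; the type code and field values are illustrative:

```python
# Parse a MySQL binlog v4 event header (19 bytes, little-endian):
# timestamp(4) | type_code(1) | server_id(4) | event_length(4)
# | next_position(4) | flags(2). Canal does the equivalent when
# turning raw dump bytes into LogEntry objects.
import struct

HEADER_FMT = "<IBIIIH"  # '<' = little-endian, no padding: 19 bytes

def parse_header(data):
    ts, type_code, server_id, length, next_pos, flags = \
        struct.unpack(HEADER_FMT, data[:19])
    return {"timestamp": ts, "type": type_code,
            "server_id": server_id, "length": length,
            "next_pos": next_pos, "flags": flags}

# Fake an event header (values are made up for the example).
raw = struct.pack(HEADER_FMT, 1700000000, 30, 1, 45, 166, 0)
header = parse_header(raw)
print(header["server_id"], header["length"])  # 1 45
```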
EventSink
EventSink will process the logEntry obtained above:
Filtering:
- filter empty transactions
- filter heartbeats
- custom filters
Recording: Prometheus is used here for statistical reporting of the data.
Merging: many businesses use sharded databases and tables, whose data comes from different Parsers but must ultimately be aggregated into the same EventStore. What needs attention in this scenario is time-based merge control: the data of each shard is committed as incrementally as possible, to avoid one shard's data running far ahead of or far behind the others.
EventStore
Let's first look at the interface provided by EventStore: we can see that EventStore is actually simple storage. Canal provides MemoryEventStoreWithBuffer, which transfers data in memory; its underlying principle is a RingBuffer (a lock-free, high-performance queue). For details about RingBuffer, you can refer to my previous article "Disruptor, which you should know", where RingBuffer is explained in detail in section 3.1.
Then CanalMQ continuously obtains data from the EventStore and sends it out.
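The ring-buffer idea behind MemoryEventStoreWithBuffer can be shown in miniature: a fixed-size array with monotonically increasing put/get cursors, where the slot index is cursor modulo size. This is a single-threaded sketch of the structure, not Disruptor's or canal's actual implementation:

```python
# Minimal ring buffer: fixed-size slots plus two ever-growing
# cursors. The producer blocks (here: returns False) when it
# would lap the consumer; the consumer returns None when empty.

class RingBuffer:
    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.put_cursor = 0   # next write position
        self.get_cursor = 0   # next read position

    def put(self, event):
        if self.put_cursor - self.get_cursor >= self.size:
            return False      # buffer full: producer must wait
        self.slots[self.put_cursor % self.size] = event
        self.put_cursor += 1
        return True

    def get(self):
        if self.get_cursor == self.put_cursor:
            return None       # buffer empty
        event = self.slots[self.get_cursor % self.size]
        self.get_cursor += 1
        return event

rb = RingBuffer(size=4)
for i in range(4):
    rb.put(i)
print(rb.put(99))  # False: full until the consumer catches up
print(rb.get())    # 0
print(rb.put(99))  # True: one slot was freed
```

When the size is a power of two, the modulo can be replaced with a bitmask, which is one of the tricks that makes Disruptor-style buffers fast.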
At this point, the study of "how to use the binlog of MySQL" is over. I hope it has resolved your doubts. Combining theory with practice is the best way to learn, so go and try it! If you want to keep learning more related knowledge, please continue to follow the site.
© 2024 shulou.com SLNews company. All rights reserved.