Write confirmation refers to the point at which, after a user writes data to the database, the database tells the user the write has succeeded. Depending on the database's characteristics and configuration, the acknowledgment can be returned at different stages of the write, which implies different trade-offs in performance, data safety, and data consistency.
The different write confirmation levels and configurations are a form of control the database hands to its users: based on the characteristics of the business, data-management needs, performance requirements, data consistency, and service availability, users can choose the configuration that realizes their own requirements.
First, a few important concepts. They are common knowledge in the database world, but go by different names in different systems.
These concepts cover the two main considerations of write confirmation: one is that write operations must not be lost in the local database; the other is the consistency of redundant data copies in a distributed environment.
Durability of local writes means that the database persists a user's write operation so that it cannot be lost by an accident. This mainly involves logs, such as the redo log in MySQL and the journal in MongoDB.
Consistency of redundant data concerns how, in a multi-replica environment such as a master-slave or replica set architecture, the slave nodes reach consistency with the master after data is written to the master. Master-slave synchronization relies on another kind of log, such as MySQL's binlog and MongoDB's oplog.
A further concern is preventing the case where the master crashes before data has been synchronized to the slaves, so that when a slave becomes the new master the unsynchronized data is lost. This, too, is an important part of write confirmation: data must not only be synchronized, it must be synchronized safely and quickly.
Redo/journal
MySQL's redo log and MongoDB's journal are write-ahead logs (WAL, Write-Ahead Logging) at the database storage engine level. They record physical modifications to data and are a technique for improving the durability of a data system.
Redo log
The redo log is one of the core log files of InnoDB, MySQL's default storage engine. It is part of the transaction log and is mainly used for roll-forward recovery of data.
When we modify a row in a MySQL database, InnoDB reads the data from disk into the buffer pool in memory and changes it there. At that moment the data in memory differs from the data on disk; pages with such differences are called dirty pages. InnoDB does not flush every dirty page back to disk as soon as it is produced, because that would generate a large amount of I/O and seriously hurt InnoDB's performance. But since the dirty pages differ from what is on disk, a database failure during this window would lose the changes.
The redo log solves this problem. Thanks to it, flushing dirty pages to disk can be deferred, which preserves the database's performance while improving data safety. Writing the redo log adds some overhead, but because the redo log uses sequential I/O, it is much faster than the random I/O of flushing data pages, so the extra cost is acceptable.
In other words, the database first writes the physical modifications of data pages into the redo log file, which is fast to flush, to guard against data loss. If a failure occurs, then when the database restarts and recovers, the committed changes whose data pages had not yet been flushed to disk can be recovered from the redo log.
Journal
The journal is a concept at the MongoDB storage engine level. The storage engines MongoDB supports, such as MMAPv1, WiredTiger, and MongoRocks, all support configuring a journal. Based on the journal, MongoDB can recover information that had not yet been written to disk when a crash occurred.
All MongoDB reads and writes ultimately go through the storage engine layer's interfaces; the journal is an auxiliary mechanism the storage engine uses when persisting data.
Before MongoDB 4.0, users could choose whether to enable the journal; starting with version 4.0, replica set members must have journaling enabled.
Take WiredTiger as an example. If the journal is not enabled, data written to WiredTiger is not persisted immediately; instead, a full checkpoint is taken periodically (the storage.syncPeriodSecs configuration item, 60 seconds by default) to persist all data. If the server goes down in between, data can only be recovered up to the most recent checkpoint, so up to about one minute of data may be lost.
This is why the common advice is "always enable the journal". With the journal enabled, an operation log entry is recorded for each write (the written data can be reconstructed from the journal). Even after a crash, WiredTiger first restores the data to the most recent checkpoint at startup and then replays the subsequent journal entries to recover the rest.
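As a quick check, a running mongod can be asked for the journal-related options it was started with. A minimal pymongo sketch, assuming a mongod on localhost and the default port (getCmdLineOpts only reports options that were explicitly set):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
# getCmdLineOpts returns the startup options; journal.enabled and
# syncPeriodSecs, if configured, appear under the "storage" key.
opts = client.admin.command("getCmdLineOpts")
print(opts["parsed"].get("storage", {}))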
Binlog/oplog
MySQL's binlog and MongoDB's oplog are logical logs of write operations at the database level. They are mainly used to implement master-slave replication of data, as well as incremental backup and recovery.
Binlog
The binlog is a binary log at the MySQL database level: no matter which storage engine is used underneath, changes to the database generate binlog events. Operations are recorded in the binlog as logical statements, and the log can be enabled by setting log-bin=mysql-bin.
The binlog records the execution time, the operation type, and the details of each write operation, either as SQL statements (statement format) or as the actual changes to each row (row format).
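As an illustration, the binlog switch and format can be inspected from any client session. A minimal sketch using Python with the mysql-connector-python driver; the connection values are placeholders:

import mysql.connector

cnx = mysql.connector.connect(host="127.0.0.1", user="root", password="secret")
cur = cnx.cursor()
# log_bin is ON only if log-bin was configured at server startup.
cur.execute("SHOW VARIABLES LIKE 'log_bin'")
print(cur.fetchone())
# binlog_format is STATEMENT, ROW, or MIXED.
cur.execute("SHOW VARIABLES LIKE 'binlog_format'")
print(cur.fetchone())
cur.close()
cnx.close()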
Data replication between a MySQL master and its slaves is implemented through three important processes:
1. The master (Master) records database changes to its binlog (Binary Log).
2. The slave (Slave) copies the master's binlog into its own relay log (Relay Log).
3. The slave reads the events in the relay log and replays (Replay) them against its own data.
Carried out continuously, this replication keeps the data consistent across the database nodes.
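On a slave, the two replication threads (the I/O thread that copies the binlog into the relay log, and the SQL thread that replays it) can be observed directly. A small sketch with mysql-connector-python, assuming a slave reachable locally; credentials and port are placeholders:

import mysql.connector

cnx = mysql.connector.connect(host="127.0.0.1", port=3307, user="root", password="secret")
cur = cnx.cursor(dictionary=True)
cur.execute("SHOW SLAVE STATUS")
row = cur.fetchone()          # None if this server is not a slave
if row:
    print(row["Slave_IO_Running"], row["Slave_SQL_Running"], row["Seconds_Behind_Master"])
cur.close()
cnx.close()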
Oplog
The oplog is a database-level concept in MongoDB. Under the replica set architecture, data synchronization between primary and secondary nodes is achieved through the oplog: all writes on the Primary are recorded in the oplog, and the secondaries pull the oplog from the primary and apply it to their own data. The oplog is a collection in MongoDB's local database, and it is a capped collection: fixed in size and recycled in a circular fashion.
The oplog is an ordinary capped collection in MongoDB; to the storage engine, the oplog is just ordinary data.
Only nodes started as members of a replica set automatically create the oplog.rs collection in the local database.
The oplog records the time, type, and details of each write operation, and retains the changes to nearly every affected row of actual data (from version 4.0 on, the multiple documents involved in one transaction are written as a single oplog entry).
Data replication between a MongoDB primary and its secondaries is implemented through four important processes:
1. The primary (Primary) records database changes to the oplog (a capped collection).
2. A secondary (Secondary) pulls the oplog entries from the primary into its own replay queue (Queue).
3. The secondary reads the oplog entries in the queue and applies them (applyOps) to its own data.
4. The secondary then writes the oplog entries from the queue into its own oplog.rs collection.
Carried out continuously, this replication keeps the data consistent across the database nodes.
In addition, MongoDB supports chained replication: the oplog does not have to be pulled from the Primary; it can also be pulled from another Secondary.
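Because the oplog is just a capped collection in the local database, its entries can be read directly. A minimal pymongo sketch, assuming a replica set member on localhost (in a capped collection, $natural order is insertion order, so the last document is the most recent operation):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
oplog = client.local["oplog.rs"]
for op in oplog.find().sort("$natural", -1).limit(1):
    # ts: operation timestamp; op: operation type (i/u/d/c/n); ns: namespace
    print(op["ts"], op["op"], op.get("ns"))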
Redo log and binlog
The redo log is generated in the InnoDB storage engine layer, while the binlog is generated in the upper layer of the MySQL server. The binlog is not specific to InnoDB: changes made through any storage engine in MySQL generate binlog events.
The contents of the two logs take different forms. MySQL's binlog is a logical log, recording the corresponding SQL statements or row changes; InnoDB's redo log is a physical log.
The two logs also differ in when their records are written to disk. The binlog is written once, after the transaction commits. InnoDB's redo log is written continuously during the transaction, and its entries are not written in the order in which transactions commit.
The binlog is recorded only at commit, and for each transaction it contains exactly one log record for that transaction. For InnoDB's redo log, because it records physical operations, each transaction corresponds to multiple entries; redo log writes from concurrent transactions are interleaved rather than deferred until commit, and the order of entries in the file is not the order in which the transactions began.
The binlog is not recycled: when a file fills up, or after a restart, a new binlog file is created. The redo log is written circularly and is recycled.
The binlog can be used for data recovery and master-slave replication; the redo log is used for data recovery after an abnormal shutdown or media failure.
Journal and oplog
Journal entries are generated at the storage engine layer (WiredTiger, MMAPv1, and so on), while the oplog is a concept at MongoDB's master-slave replication layer and has nothing to do with the storage engine.
The contents of the two logs take different forms: MongoDB's oplog is a logical log, recording the content of each write operation, while the journal stores physical modifications.
The two logs also differ in when their records are written to disk.
When a document is written in a MongoDB replica set, three pieces of data have to be modified:
1. the document data itself;
2. all indexes covering the document;
3. an oplog entry for the write, used for synchronization.
The storage engine applies all of these modifications and writes the three of them together as one journal operation record.
The journal is not recycled: when a file fills up, or after a restart, a new journal file is created. The oplog, being a capped collection, is recycled.
The oplog can be used for data recovery and for the replica set architecture; the journal is used for data recovery after an outage or media failure.
Write confirmation
The concept of write confirmation actually comes from write concern in MongoDB, which describes the level of acknowledgment MongoDB provides for a write operation. The corresponding concept in MySQL can be understood as: when a user commits a write, what actions the database must take before informing the user that the commit has succeeded.
MongoDB
In MongoDB, the write concern feature lets users configure the write policy flexibly. Different policies correspond to different levels at which a write is considered done, that is, the point at which success is returned to the user so that the next write request can proceed.
Write concern
Write concern supports 3 configuration items:
{w: <value>, j: <boolean>, wtimeout: <number>}
Where:
w: how far the write operation must have propagated before it is acknowledged to the user.
{w: 0}: no acknowledgment is sent to the client at all; suitable for scenarios that demand high performance and do not care about correctness.
{w: 1}: the default writeConcern; an acknowledgment is sent to the client once the data is written on the Primary.
{w: "majority"}: an acknowledgment is sent once the data is written on a majority of replica set members; suitable for scenarios requiring high data safety, at the cost of write performance.
j: whether the write operation must be journal-persisted before it is acknowledged to the user.
{j: true}: the Primary must have persisted the write to the journal before acknowledging.
{j: false}: the write only needs to be in the journal cache before acknowledging; the journal is persisted to disk later (every 100 ms by default).
wtimeout: a write timeout, effective when w is greater than 1. With w greater than 1, a write must succeed on several nodes to be considered successful; if a node fails during the write, the w requirement may never be satisfied and the acknowledgment could never be returned to the user. To prevent this, the user can set wtimeout to specify a timeout: if the write is still incomplete when the time expires, it is treated as failed.
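These options map directly onto driver APIs. A minimal pymongo sketch, assuming a replica set named rs0 reachable on localhost (the collection name is hypothetical):

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client["test"]

# {w: 1, j: false}: acknowledge once the Primary has the write in memory.
fast = db.get_collection("orders", write_concern=WriteConcern(w=1, j=False))
fast.insert_one({"sku": "a1", "qty": 2})

# {w: "majority", j: true, wtimeout: 5000}: acknowledge only after a
# majority of members have the write and it has been journaled.
safe = db.get_collection(
    "orders",
    write_concern=WriteConcern(w="majority", j=True, wtimeout=5000),
)
safe.insert_one({"sku": "b2", "qty": 1})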
Write confirmation under replica set
The following uses a replica set architecture and walks through a write operation to understand write confirmation in MongoDB.
A write with {w: 2}, for example, requires at least two nodes to complete the write before success can be returned to the user. What counts as a successful write on each node is judged by the j parameter: with {j: true}, the node must have flushed the write's journal entry to disk; with {j: false}, the write only needs to reach the journal cache to count as successful.
How does the MongoDB Primary know whether a Secondary has synchronized successfully? The process is as follows:
1. The client sends the request to the Primary, specifying writeConcern {w: "majority"}. The Primary receives the request, applies the write locally, records it in the oplog, and then waits for a majority of nodes to synchronize this oplog entry (or batch of entries).
2. Each Secondary pulls the newly written oplog entries from the Primary, replays them, and records them in its own oplog. So that Secondaries can fetch new entries promptly, the find command on the Primary supports an awaitData option: when a find matches no documents, it does not return immediately but waits up to maxTimeMS (2 s by default) for new matching data. A newly written oplog entry can therefore be picked up by the Secondaries almost at once.
3. A separate thread on each Secondary sends a replSetUpdatePosition command to the Primary whenever its latest oplog timestamp advances, reporting its progress.
4. When the Primary finds that the oplog timestamps of enough nodes satisfy the condition, it sends the acknowledgment to the client. This is how the Primary knows the data has been synchronized.
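The progress the Primary compares is visible in the replica set status: each member reports the optime of the last oplog entry it has applied. A small pymongo sketch, assuming a replica set named rs0 (the exact shape of the optime field varies slightly across server versions):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
status = client.admin.command("replSetGetStatus")
for m in status["members"]:
    # The Primary compares these timestamps to decide when w is satisfied.
    print(m["name"], m["stateStr"], m["optime"]["ts"])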
MySQL
For MySQL, write confirmation, that is, write success, is embodied in the commit of a transaction: a successful commit is a successful write. The analysis below therefore looks at the settings and pitfalls around a successful commit, following the redo log, the binlog, the flushing of both logs, and master-slave replication through the execution of a MySQL transaction.
MySQL replication architecture
The widely used MySQL versions at present are 5.5, 5.6, 5.7, and 8.0. Version 8.0 introduces Group Replication to achieve multi-node data consistency; group replication relies on a distributed consensus protocol (a variant of Paxos) to achieve eventual consistency of distributed data.
There are several common replication mechanisms in MySQL:
1. Synchronous replication. When the master commits a transaction, all slave nodes must have received, replayed, and committed the transaction before the master's thread can continue. The drawback is that completing a transaction on the master takes much longer and performance degrades.
2. Asynchronous replication. The master writes the transaction's binlog events to the binlog file, merely notifies the dump thread to send the new binlog, and then proceeds with its commit; at that point there is no guarantee that the binlog has reached any slave node.
3. Semi-synchronous replication. This sits between fully synchronous and fully asynchronous replication. The master only needs to wait for at least one slave node to receive the binlog and flush it to its relay log file; it does not wait for feedback from all slaves, and the feedback only means "received", not fully replayed and committed, which saves a great deal of time.
4. Group replication. Several nodes form a replication group, and a transaction commit must be resolved and approved by a majority of the group's nodes (N / 2 + 1) before it can go through. For example, with a replication group of three nodes, the Consensus layer is the consistency protocol layer: committing a transaction triggers inter-group communication, and only once two nodes certify the transaction can it finally be committed and acknowledged.
Apart from group replication, semi-synchronous replication offers a comparatively good balance of performance and safety, especially in version 5.7, which reworked the semi-synchronous logic of earlier versions; the discussion below therefore focuses on 5.7.
The principle of semi-synchronous replication in MySQL 5.5/5.6: the thread committing the transaction blocks until at least one slave has received the transaction. Since the transaction is sent to the slave only after it has been committed to the storage engine, the number of lost transactions can be kept to at most one per thread. But because the transaction is sent after commit, if the slave fails to receive it and the master then crashes, master and slave become inconsistent: the master has the data and the slave does not. This mode is called AFTER_COMMIT.
MySQL 5.7 changed the point at which the master sends the transaction: it is sent to the slave before the engine commit (AFTER_SYNC). If the slave fails to receive it and the master crashes, master and slave do not diverge, because the master has not yet committed, so neither side has the data.
However, if the slave does receive the transaction and the master crashes after flushing the binlog but before the storage engine commit, the transaction has clearly not succeeded on the master; yet because the binlog was already synced, the slave has received and applied it, so the slave ends up with extra data. This is considered a problem, but extra data is generally not serious; where it matters, group replication from version 8.0 may be required.
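Semi-synchronous replication ships as a plugin in 5.7 and has to be loaded and switched on explicitly. A sketch of the master-side setup via mysql-connector-python; connection values are placeholders, and INSTALL PLUGIN fails if the plugin is already loaded:

import mysql.connector

cnx = mysql.connector.connect(host="127.0.0.1", user="root", password="secret")
cur = cnx.cursor()
# The slave side loads semisync_slave.so analogously.
cur.execute("INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'")
cur.execute("SET GLOBAL rpl_semi_sync_master_enabled = 1")
# AFTER_SYNC is the 5.7 default; AFTER_COMMIT reproduces 5.5/5.6 behavior.
cur.execute("SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC'")
cur.close()
cnx.close()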
MySQL write confirmation behavior
Taking a MySQL 5.7 master-slave architecture with semi-synchronous replication as an example, let us look at how the various MySQL parameters affect write acknowledgment, that is, commit.
In the commit flow, the steps that reflect semi-synchronous replication (AFTER_SYNC) are:
Step 6: write the binlog (the sync_binlog parameter applies here).
Step 7: ship the binlog to the slave's relay log (the sync_relay_log parameter applies here).
Step 8: the slave returns an ACK to the master.
Steps 9 and 10: commit the transaction and mark it as committed in the redo log (the innodb_flush_log_at_trx_commit parameter applies here).
Step 11: return success to the user.
The steps that reflect the ordering consistency of redo log and binlog are:
Step 4: write the redo log in the prepare state and flush it to disk.
Step 6: write the binlog (the sync_binlog parameter applies here).
Steps 9 and 10: commit the transaction and set the redo log to the commit state.
Below, these flush-related configuration items are placed into the whole flow to see the different behaviors and risks of a write operation under each setting.
Note: the above assumes the internal two-phase commit is enabled (innodb_support_xa = true). In that case, committed transactions can be recovered by examining the binlog, so innodb_flush_log_at_trx_commit may appear dispensable. If internal two-phase commit is not enabled, things are more complicated: only with innodb_flush_log_at_trx_commit = 1, sync_binlog = 1, and sync_relay_log = 1 can the safety of committed transactions be guaranteed; any other combination risks losing data or leaving master and slave inconsistent.
However, with innodb_flush_log_at_trx_commit = 1, sync_binlog = 1, and sync_relay_log = 1, MySQL's performance is at its lowest. Performance can be improved, for example with innodb_flush_log_at_trx_commit = 2 and sync_binlog = N (N = 500 or 1000); in that case both redo log and binlog sit only in the system cache, and a battery-backed write cache can be used to guard against an abnormal power loss.
In addition, the rpl_semi_sync_master_wait_for_slave_count parameter controls how many nodes must be synchronized, similar to the w parameter of write concern in MongoDB. If it could be set to 0 (in practice it cannot; the minimum is 1), replication would become purely asynchronous; if it is set to the maximum (the number of all slave nodes), replication becomes fully synchronous. This knob can likewise be adjusted to trade commit latency against data safety.
Other parameters also affect the synchronization mode, such as rpl_semi_sync_master_wait_no_slave and the replication wait timeout rpl_semi_sync_master_timeout. When rpl_semi_sync_master_wait_no_slave is OFF, as soon as the master finds that Rpl_semi_sync_master_clients is smaller than rpl_semi_sync_master_wait_for_slave_count, it immediately switches to asynchronous mode; when it is ON, the master switches to asynchronous mode only if its wait for ACKs during the transaction commit phase exceeds rpl_semi_sync_master_timeout.
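Pulling the pieces together, the durability-related parameters can be set at runtime. A sketch via mysql-connector-python; connection values are placeholders, and the semi-sync variables exist only once the plugin above is loaded:

import mysql.connector

cnx = mysql.connector.connect(host="127.0.0.1", user="root", password="secret")
cur = cnx.cursor()

# "Double 1": flush the redo log and sync the binlog on every commit.
cur.execute("SET GLOBAL innodb_flush_log_at_trx_commit = 1")
cur.execute("SET GLOBAL sync_binlog = 1")

# Relaxed alternative for throughput (logs stay in the OS cache longer):
# cur.execute("SET GLOBAL innodb_flush_log_at_trx_commit = 2")
# cur.execute("SET GLOBAL sync_binlog = 1000")

# Wait for the ACK of one slave; fall back to async after 10 s.
cur.execute("SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 1")
cur.execute("SET GLOBAL rpl_semi_sync_master_timeout = 10000")

cur.close()
cnx.close()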
Contrast
Configuration comparison: the knobs of the two databases line up roughly as follows.
How many nodes must acknowledge: MongoDB {w: N} vs. MySQL rpl_semi_sync_master_wait_for_slave_count.
Whether logs are flushed before acknowledging: MongoDB {j: true/false} vs. MySQL innodb_flush_log_at_trx_commit and sync_binlog.
How long to wait for synchronization: MongoDB wtimeout vs. MySQL rpl_semi_sync_master_timeout.
Other
Although MongoDB and MySQL can be configured in similar ways in many respects, some differences remain, for example:
MongoDB's journal contains the oplog information (the oplog entry is written as part of the journal record), whereas the binlog and the redo log are two relatively independent things.
Almost all MySQL write operations are committed through transactions, whereas MongoDB only recently began to support multi-document transactions; single-document atomicity is implemented by internal transaction logic and is not exposed directly to users.
In MySQL, write confirmation is marked by a successful commit; in MongoDB, an ordinary write is confirmed by the returned write acknowledgment, and transactional writes are likewise marked by commit.
Summary
The concept of write confirmation introduced in this article touches a number of processes and modules in MongoDB and MySQL: the storage engine log files (redo log/journal), the replication logs (binlog/oplog), flushing mechanisms and their timing, the master-slave synchronization architecture, and so on. Their shared purpose is to provide atomicity and durability for write operations and consistency for the data, and they affect both performance and safety; they need to be tuned according to the data, the business, the load, the safety requirements, and other practical factors.
Since so much ground is covered, not every case has been tested and verified, and there may be omissions or errors; corrections are very welcome. I hope this article can serve as a summary and a reference for readers interested in both MySQL and MongoDB.
References
High Performance MySQL (https://item.jd.com/11220393.html)
MongoDB official manual (https://docs.mongodb.com/manual/)
MongoDB replication explained (https://mongoing.com/archives/5200)
MySQL binlog-based replication (https://blog.csdn.net/u012548016/article/details/86584293)
MongoDB journal or oplog, which one is written first? (https://mongoing.com/archives/3988)
New feature of MySQL 5.7: introduction to the official high-availability solution MGR (https://www.cnblogs.com/luoahong/articles/8043035.html)
Analysis of the principle of MongoDB writeConcern (https://mongoing.com/archives/2916)
Redo log and binlog of the MySQL logging system (https://www.jianshu.com/p/4bcfffb27ed5)
MySQL 5.7 semi-synchronous replication enhancements (https://www.cnblogs.com/mao3714/p/8777470.html)
Ordering consistency of redo and binlog in MySQL (https://www.cnblogs.com/mao3714/p/8734838.html)
Detailed analysis of MySQL transaction logs (redo log and undo log) (https://www.cnblogs.com/f-ck-need-u/archive/2018/05/08/9010872.html)
MySQL's "double 1 settings": key parameters for data safety (https://www.cnblogs.com/kevingrace/p/10441086.html)
Experiments on the rpl_semi_sync_master_wait_no_slave parameter (https://www.cnblogs.com/konggg/p/12205505.html)
Process analysis and summary of AFTER_SYNC/AFTER_COMMIT in MySQL 5.7 semi-synchronous replication (http://blog.itpub.net/15498/viewspace-2143986/)