
What is the parallel replication mode based on WRITESET in MySQL?


This article mainly talks about the WRITESET-based parallel replication mode in MySQL: what a Writeset is, how it is generated, and how it is used to lower last commit. The approach is simple and practical, so let's work through it step by step.

1. Strange last commit

Let's first look at a screenshot and take a closer look at the last commit:

We can see that the last commit values appear to be out of order, which is impossible with COMMIT_ORDER-based parallel replication. It is in fact the result of what was mentioned earlier: WRITESET-based parallel replication lowering last commit as far as possible. This gives the slave (MTS) better parallel replay; the criteria for deciding which transactions can be applied in parallel will be explained in detail in section 19.

2. What is a Writeset?

A Writeset is simply a collection based on the set container of the C++ STL. The class Rpl_transaction_write_set_ctx contains the following definition:

```cpp
std::set<uint64> write_set_unique;
```

Each element in the set is a hash value, computed with the algorithm specified by the transaction_write_set_extraction parameter from the primary key and unique keys of the modified row. Each row of data is encoded in two formats:

The field value is in binary format

The field value is in string format

The specific format of each row of data is:

Primary key / unique key name | delimiter | library name | delimiter | library name length | table name | delimiter | table name length | key field 1 | delimiter | length | key field 2 | delimiter | length | … other fields

After the InnoDB layer modifies a row of data, the data formatted as above is hashed and the hash value is written into the Writeset. You can refer to the function add_pke; part of its flow is also given as pseudo-code later.

Note, however, that the hash values of all rows modified by a transaction are written into one Writeset, so the more rows are modified, the more memory is needed to store these hash values. Although each hash is only 8 bytes, a transaction that modifies a very large number of rows consumes correspondingly more memory.
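As a rough back-of-the-envelope illustration (my own estimate, not taken from the MySQL source; it assumes a table with only a primary key, so each modified row contributes two hashes, one per format):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical transaction on a table with only a primary key, so each
  // modified row contributes two hashes (binary format + string format).
  constexpr int64_t rows_modified = 1000000;
  constexpr int64_t hashes = rows_modified * 2;
  constexpr int64_t bytes = hashes * 8;  // 8-byte hash values
  std::printf("~%lld MB of raw hash values, before std::set overhead\n",
              (long long)(bytes / (1024 * 1024)));  // ~15 MB
  return 0;
}
```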

To observe this data format more intuitively, we can capture it with a debugger. Let's take a look.

3. Generation of the Writeset

We use the following table:

```
mysql> use test
Database changed
mysql> show create table jj10\G
*************************** 1. row ***************************
       Table: jj10
Create Table: CREATE TABLE `jj10` (
  `id1` int(11) DEFAULT NULL,
  `id2` int(11) DEFAULT NULL,
  `id3` int(11) NOT NULL,
  PRIMARY KEY (`id3`),
  UNIQUE KEY `id1` (`id1`),
  KEY `id2` (`id2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
```

We write a row of data:

```sql
insert into jj10 values (36,36,36);
```

A total of four elements will be generated from this row of data:

Note: the "?" shown here is the separator.

1. Primary key, binary format:

```
(gdb) p pke
$1 = "PRIMARY?test?4jj10?4\200\000\000$?4"
```

Note: \200\000\000$ is three octal escape bytes plus the ASCII character $, which in hexadecimal is 0x80 00 00 24 (the internal representation of the integer 36).

Decompose into:

| Primary key name | Delimiter | Library name | Delimiter | Library name length | Table name | Delimiter | Table name length | Primary key field 1 | Delimiter | Length |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PRIMARY | ? | test | ? | 4 | jj10 | ? | 4 | 0x80 00 00 24 | ? | 4 |

2. Primary key, string format:

```
(gdb) p pke
$2 = "PRIMARY?test?4jj10?436?2"
```

Decompose into:

| Primary key name | Delimiter | Library name | Delimiter | Library name length | Table name | Delimiter | Table name length | Primary key field 1 | Delimiter | Length |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PRIMARY | ? | test | ? | 4 | jj10 | ? | 4 | 36 | ? | 2 |

3. Unique key, binary format:

```
(gdb) p pke
$3 = "id1?test?4jj10?4\200\000\000$?4"
```

Parsed the same way as above.

4. Unique key, string format:

```
(gdb) p pke
$4 = "id1?test?4jj10?436?2"
```

Parsed the same way as above.

Eventually these strings are run through the hash algorithm and the hash values are written into the Writeset.
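Here is a minimal sketch of that final step (my own illustration, not the MySQL implementation: std::hash stands in for the XXHASH64/MURMUR32 algorithm selected by transaction_write_set_extraction, and the "?" characters stand in for the real separator byte in the four pke strings captured above):

```cpp
#include <cstdint>
#include <functional>
#include <set>
#include <string>

int main() {
  using namespace std::string_literals;  // the ""s suffix keeps the embedded \000 bytes
  std::set<uint64_t> write_set_unique;   // per-transaction collection of 8-byte hashes
  const std::string pkes[] = {
      "PRIMARY?test?4jj10?4\200\000\000$?4"s,  // primary key, binary format
      "PRIMARY?test?4jj10?436?2"s,             // primary key, string format
      "id1?test?4jj10?4\200\000\000$?4"s,      // unique key, binary format
      "id1?test?4jj10?436?2"s                  // unique key, string format
  };
  for (const std::string &pke : pkes)
    write_set_unique.insert(std::hash<std::string>{}(pke));  // 4 elements for this row
  return 0;
}
```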

4. The general flow of the function add_pke

The following is a piece of pseudo code that describes this generation process:

```
if the table has indexes:
    write the database name and table name into a temporary variable
    loop over every index of the table:
        if the index is not unique:
            skip it and continue with the next index
        loop over the two ways of generating the data (binary format and string format):
            write the index name into pke
            write the temporary variable (database/table information) into pke
            loop over every field of the index:
                write the field's information into pke
                when all fields have been scanned:
                    generate the hash value of pke and write it into the write set
if no primary key or unique key was found:
    record a flag; this flag is later used to decide whether the
    WRITESET-based parallel replication mode can be used
```
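A minimal C++ sketch of the flow above (my own simplification rather than the actual add_pke: field values arrive as ready-made strings, "?" stands in for the real separator, and std::hash stands in for the configured hash algorithm):

```cpp
#include <cstdint>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Simplified model of one index and its already-extracted field values.
struct IndexInfo {
  std::string name;                      // e.g. "PRIMARY" or "id1"
  bool unique;                           // only unique indexes go into the writeset
  std::vector<std::string> binary_vals;  // field values, binary format
  std::vector<std::string> string_vals;  // field values, string format
};

// Sketch of the add_pke flow: returns false when no usable key was found,
// mirroring the "missing keys" flag described above.
bool add_pke_sketch(const std::string &db, const std::string &table,
                    const std::vector<IndexInfo> &indexes,
                    std::set<uint64_t> &write_set_unique) {
  const std::string sep = "?";  // stand-in for the real separator byte
  const std::string db_table = db + sep + std::to_string(db.size()) +
                               table + sep + std::to_string(table.size());
  size_t hashes_added = 0;
  for (const IndexInfo &idx : indexes) {
    if (!idx.unique) continue;  // skip non-unique indexes
    for (const auto *vals : {&idx.binary_vals, &idx.string_vals}) {
      std::string pke = idx.name + sep + db_table;       // key name + db/table info
      for (const std::string &v : *vals)
        pke += v + sep + std::to_string(v.size());       // each field value + its length
      write_set_unique.insert(std::hash<std::string>{}(pke));
      ++hashes_added;
    }
  }
  return hashes_added != 0;  // false => "has missing keys"
}
```

Calling add_pke_sketch with db "test", table "jj10" and the field values of the (36,36,36) row would produce the four pke strings shown in the previous section (hashed with the stand-in algorithm, of course).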

5. How the Writeset handles last commit

In the previous section we discussed how ORDER_COMMIT-based parallel replication generates last commit and seq number. WRITESET-based parallel replication only performs further processing of last commit on top of ORDER_COMMIT and does not affect the original ORDER_COMMIT logic, so falling back to the ORDER_COMMIT logic is very easy. You can refer to the function MYSQL_BIN_LOG::write_gtid.

Further processing will be done according to the value of binlog_transaction_dependency_tracking, as follows:

ORDER_COMMIT: call the m_commit_order.get_dependency function. This is the way we discussed earlier.

WRITESET: call m_commit_order.get_dependency, and then call m_writeset.get_dependency. As we will see, m_writeset.get_dependency further processes the last commit produced by the first call.

WRITESET_SESSION: call m_commit_order.get_dependency, then m_writeset.get_dependency, and then m_writeset_session.get_dependency. m_writeset_session.get_dependency processes last commit once more.

The code described in this paragraph corresponds to:

```cpp
case DEPENDENCY_TRACKING_COMMIT_ORDER:
  m_commit_order.get_dependency(thd, sequence_number, commit_parent);
  break;
case DEPENDENCY_TRACKING_WRITESET:
  m_commit_order.get_dependency(thd, sequence_number, commit_parent);
  m_writeset.get_dependency(thd, sequence_number, commit_parent);
  break;
case DEPENDENCY_TRACKING_WRITESET_SESSION:
  m_commit_order.get_dependency(thd, sequence_number, commit_parent);
  m_writeset.get_dependency(thd, sequence_number, commit_parent);
  m_writeset_session.get_dependency(thd, sequence_number, commit_parent);
  break;
```

6. The Writeset history MAP

We have discussed what a Writeset is. We also said that in order to lower the value of last commit, the transaction's Writeset has to be compared against a historical MAP of Writesets to check for conflicts before anything can be decided, so such a historical MAP must be kept in memory. It is defined in the source code as follows:

```cpp
/*
  Track the last transaction sequence number that changed each row
  in the database, using row hashes from the writeset as the index.
*/
typedef std::map<uint64, int64> Writeset_history;  // implemented with a map
Writeset_history m_writeset_history;
```

We can see that this is the map container in C++ STL, which contains two elements:

Hash value of Writeset

The seq number of the most recent transaction that modified the row in question

It is sorted by the hash value of Writeset.

In addition, a value called m_writeset_history_start is maintained in memory; it records the seq number of the earliest transaction covered by the Writeset history MAP. If the history MAP is full, it is cleared and the current transaction's seq number is written into m_writeset_history_start as the new earliest seq number. As you will see below, the transaction's last commit always starts from this value and is then raised by comparison; if no conflict at all is found in the Writeset history MAP, last commit is simply left at m_writeset_history_start. Here is the code that cleans up the Writeset history MAP:

```cpp
if (exceeds_capacity || !can_use_writesets)    // the Writeset history MAP is full
{
  m_writeset_history_start = sequence_number;  // if the maximum is exceeded, clear the writeset
                                               // history and restart from the current seq number,
                                               // i.e. it becomes the smallest covered seq number
  m_writeset_history.clear();                  // clear the history MAP
}
```

7. How WRITESET-based parallel replication processes last commit

Here is a walkthrough of the whole process, under the following assumptions:

Through ORDER_COMMIT-based parallel replication, the transaction has already been assigned (last commit=125, seq number=130).

The transaction has modified four rows, which I will refer to as ROW1, ROW7, ROW6 and ROW10.

The table has only a primary key and no unique keys, and my diagram keeps only the binary-format hash value of each row, not the string-format hash value.

The initial state is shown in figure 16-1.

Step 1: last commit is set to the value of m_writeset_history_start.

Step 2: ROW1.HASHVAL is looked up in the Writeset history MAP and a conflicting entry is found; the seq number of that entry is updated to 130. At the same time, last commit is raised to 120, the seq number the entry previously held.

Step 3: ROW7.HASHVAL is looked up in the Writeset history MAP and a conflicting entry is found; its seq number in the MAP is updated to 130. Since the value it held, 114, is less than 120, last commit is not changed and remains 120.

Step 4: ROW6.HASHVAL is looked up in the Writeset history MAP and a conflicting entry is found; its seq number in the MAP is updated to 130. Since the value it held, 105, is less than 120, last commit is not changed and remains 120.

Step 5: ROW10.HASHVAL is looked up in the Writeset history MAP and no conflicting entry is found, so this row has to be inserted into the MAP (first checking whether doing so would exceed the MAP's capacity; if it would, the insert is skipped and the MAP is cleaned up afterwards). In other words, ROW10.HASHVAL with seq number=130 is inserted into the Writeset history MAP.

The whole process is now complete. Last commit has been lowered from 125 to 120, which is exactly the goal. We can see that the Writeset history MAP is effectively a snapshot of the rows modified over a recent period; as long as the rows modified by this transaction do not conflict with anything in that period, the transaction can clearly be replayed in parallel on the slave. The lowered last commit is shown in figure 16-2.
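The walkthrough can be condensed into a small self-contained sketch (my own illustration, not MySQL code: the hash keys 1/7/6/10 and the starting value of the history MAP are hypothetical; only the seq numbers 105, 114, 120, 125 and 130 come from the example above):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>

int main() {
  // Hypothetical prior state of the Writeset history MAP: hash -> seq number
  // of the last transaction that touched that row.
  std::map<uint64_t, int64_t> history = {{1 /*ROW1*/, 120},
                                         {7 /*ROW7*/, 114},
                                         {6 /*ROW6*/, 105}};
  int64_t history_start = 100;    // hypothetical m_writeset_history_start
  int64_t sequence_number = 130;  // from ORDER_COMMIT
  int64_t commit_parent = 125;    // last commit from ORDER_COMMIT

  std::vector<uint64_t> writeset = {1, 7, 6, 10};  // ROW1, ROW7, ROW6, ROW10

  int64_t last_parent = history_start;             // step 1
  for (uint64_t hash : writeset) {
    auto it = history.find(hash);
    if (it != history.end()) {                     // conflict found
      if (it->second > last_parent && it->second < sequence_number)
        last_parent = it->second;                  // raise the last commit candidate
      it->second = sequence_number;                // remember we touched this row
    } else {
      history[hash] = sequence_number;             // ROW10: insert a new entry
    }
  }
  commit_parent = std::min(last_parent, commit_parent);
  std::printf("last commit lowered to %lld\n", (long long)commit_parent);  // 120
  return 0;
}
```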

The whole logic lives in the function Writeset_trx_dependency_tracker::get_dependency. Here is the key code with annotations; it is a little long:

```cpp
if (can_use_writesets)  // if the writeset mode can be used
{
  /*
    Check if adding this transaction exceeds the capacity of the writeset
    history. If that happens, m_writeset_history will be cleared only after
    using its information for the current transaction.
  */
  exceeds_capacity =
      m_writeset_history.size() + writeset->size() > m_opt_max_history_size;
  // if it is greater than the parameter binlog_transaction_dependency_history_size,
  // set the cleanup flag

  /*
    Compute the greatest sequence_number among all conflicts and add the
    transaction's row hashes to the history.
  */
  int64 last_parent = m_writeset_history_start;  // temporary variable, initialized
                                                 // to the smallest seq number
  for (std::set<uint64>::iterator it = writeset->begin(); it != writeset->end();
       ++it)  // loop over every element of the Writeset
  {
    Writeset_history::iterator hst = m_writeset_history.find(*it);
    // look it up in the writeset history; the map key is the hash value and
    // the value is the sequence number
    if (hst != m_writeset_history.end())  // a conflicting row exists
    {
      if (hst->second > last_parent && hst->second < sequence_number)
        last_parent = hst->second;        // raise last_parent, bounded by sequence_number
      hst->second = sequence_number;      // update this row's record
    }
    else
    {
      if (!exceeds_capacity)
        m_writeset_history.insert(std::pair<uint64, int64>(*it, sequence_number));
        // no conflict: insert a new entry, unless the capacity flag is set
    }
  }
  ...
  if (!write_set_ctx->get_has_missing_keys())  // only lower last commit when keys are present
  {
    /*
      The WRITESET commit_parent then becomes the minimum of the largest parent
      found using the hashes of the rows touched by the transaction and the
      commit parent calculated with COMMIT_ORDER.
    */
    commit_parent = std::min(last_parent, commit_parent);  // last commit is lowered here
  }
}
if (exceeds_capacity || !can_use_writesets)
{
  m_writeset_history_start = sequence_number;  // if the maximum is exceeded, restart the
                                               // writeset history from the current seq number
  m_writeset_history.clear();                  // clear the history MAP
}
```

8. WRITESET_SESSION

As mentioned earlier, this mode continues processing on top of WRITESET; in effect it means that transactions from the same session are not allowed to be replayed in parallel on the slave. The code is simple, as follows:

```cpp
int64 session_parent = thd->rpl_thd_ctx.dependency_tracker_ctx()
                           .get_last_session_sequence_number();
// take the seq number of this session's previous transaction
if (session_parent != 0 && session_parent < sequence_number)
  // this session has already committed a transaction and the current seq number is
  // larger than the previous one; its transactions must not run in parallel, so
  // last commit is raised back up
  commit_parent = std::max(commit_parent, session_parent);
thd->rpl_thd_ctx.dependency_tracker_ctx()
    .set_last_session_sequence_number(sequence_number);
// record the current seq number as this session's latest sequence number
```

After this step we can see that, in this situation, last commit effectively reverts to the ORDER_COMMIT behaviour.
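A tiny worked example of this max() step (the numbers are hypothetical and only illustrate the effect):

```cpp
#include <algorithm>
#include <cstdint>

int main() {
  // Hypothetical: WRITESET lowered last commit to 120, but this session's
  // previous transaction had seq number 129 and the current one is 130.
  int64_t commit_parent = 120;
  int64_t session_parent = 129;
  int64_t sequence_number = 130;
  if (session_parent != 0 && session_parent < sequence_number)
    commit_parent = std::max<int64_t>(commit_parent, session_parent);
  // commit_parent is now 129: the two transactions from the same session
  // cannot be replayed in parallel on the slave.
  return 0;
}
```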

9. The binlog_transaction_dependency_history_size parameter

The default value of this parameter is 25000. It is the maximum number of elements in the Writeset history MAP discussed above. As the Writeset generation analysis showed, modifying a single row may generate several hash values, so this value cannot simply be equated with a number of modified rows. It can roughly be understood as:

binlog_transaction_dependency_history_size / 2 = number of rows modified × (1 + number of unique keys)
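As a rough worked example (my own illustration, assuming the jj10 table above with a primary key plus one unique key, and the default value of 25000):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Each modified row contributes (1 + unique_keys) keys, each hashed in two
  // formats (binary and string), so the history MAP holds roughly
  // rows * (1 + unique_keys) * 2 entries.
  constexpr int64_t history_size = 25000;  // default binlog_transaction_dependency_history_size
  constexpr int64_t unique_keys = 1;       // jj10 has a primary key plus one unique key
  constexpr int64_t rows_covered = history_size / ((1 + unique_keys) * 2);
  std::printf("rows covered by the history MAP: ~%lld\n", (long long)rows_covered);  // ~6250
  return 0;
}
```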

From the previous analysis we can see that the larger this value, the more elements the Writeset history MAP can hold, the more accurate (i.e. smaller) the generated last commit, and the higher the degree of parallelism on the slave. Note, however, that the larger the setting, the more memory it requires.

10. When there is no primary key

Whether a primary key or unique key exists is determined in the function add_pke; a unique key is sufficient, in which case the row's hash values based on the unique key are stored in the Writeset. The check in add_pke is:

```cpp
if (!((table->key_info[key_number].flags & (HA_NOSAME)) == HA_NOSAME))
  continue;  // skip non-unique keys
```

If there is no primary key or unique key, the following statement will be triggered:

```cpp
if (writeset_hashes_added == 0)
  ws_ctx->set_has_missing_keys();
```

Later, when the last commit is generated, this flag is checked as follows:

```cpp
if (!write_set_ctx->get_has_missing_keys())  // only lower last commit when keys are present
{
  /*
    The WRITESET commit_parent then becomes the minimum of the largest parent
    found using the hashes of the rows touched by the transaction and the
    commit parent calculated with COMMIT_ORDER.
  */
  commit_parent = std::min(last_parent, commit_parent);  // last commit is lowered here
}
```

So when there is no primary key a unique key is used instead, and if neither exists the WRITESET setting does not take effect for that transaction, which falls back to the old ORDER_COMMIT behaviour.

11. Why transactions from the same session can have the same last commit

With the groundwork above, this phenomenon is easy to explain. The main reason is the Writeset history MAP: as long as the rows modified by these transactions do not conflict, i.e. their primary key/unique key values differ, transactions from the same session can share the same last commit under WRITESET-based parallel replication. This does not happen if binlog_transaction_dependency_tracking is set to WRITESET_SESSION.
