Talking about the Scheme of Database Cluster 07/03 Update SLTechnology News&Howtos

Talking about the Scheme of Database Cluster

2025-07-03 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

single-point database

Databases are often performance bottlenecks in systems, so various caching mechanisms are often introduced into system design to avoid frequent access to databases. In addition, due to the importance of the database, high availability requirements are also unavoidable, because once the database hangs, basically the entire system cannot be used.

These common problems are the limitations of single-point databases. In order to solve these problems and achieve high performance and high availability, we need to adopt database clustering solutions in system architecture design.

performance test

Since there are performance issues with single-point databases, is there actual data? Let's run a performance test on a single-point database to see what its concurrency limit is. I am using a 2-core 2G cloud service with MySQL version 8.0.18.

mysql comes with a performance testing tool: mysqlslap, we can use this tool to test, the specific test parameters are as follows:

[root@localhost ~]# mysqlslap -hlocalhost -uroot -pyour_password -P3306 --concurrency=500 --iterations=1 --auto-generate-sql --auto-generate-sql-load-type=mixed --auto-generate-sql-add-autoincrement --engine=innodb --number-of-queries=500

Description of main parameters:

Parameter description--concurrency Number of concurrency, i.e. number of clients simulated--iterations How many times to execute the test--auto-generate-sql Use SQL scripts generated by the system itself to test--auto-generate-sql-load-type Whether to test read or write or a mixture of both (Value: read, write, update, mixed)--auto-generate-sql-add-autoincrement Add columns to the auto-generated table--engine Storage engine to be tested--number-of-queries Number of accesses per client, divided by the number of concurrency is the number of accesses per client, In this example: 500 / 500 = 1 See official documentation for more details

Here I have tested different magnitudes, and when there are 500, 1000, and 5000 concurrent connections, the database can still handle it normally without too much problem:

# 500 concurrent connections Benchmark Running for engine innodb Average number of seconds to run all queries: 0.391 seconds Minimum number of seconds to run all queries: 0.391 seconds Maximum number of seconds to run all queries: 0.391 seconds Number of clients running queries: 500 Average number of queries per client: 1#1000 concurrent connections Benchmark Running for engine innodb Average number of seconds to run all queries: 0.802 seconds Minimum number of seconds to run all queries: 0.802 seconds Maximum number of seconds to run all queries: 0.802 seconds Number of clients running queries: 1000 Average number of queries per client: 1#5000 concurrent connections Benchmark Running for engine innodb Average number of seconds to run all queries: 3.884 seconds Minimum number of seconds to run all queries: 3.884 seconds Maximum number of seconds to run all queries: 3.884 seconds Number of clients running queries: 5000 Average number of queries per client: 1

But after testing 1w concurrent connections, the database starts reporting an unconnectible error:

From this test case, we can see that the performance bottleneck of an ordinary single-node database is about 1w concurrent connections. Of course, the test results here are related to the hardware differences of the machine, just to provide a reference.

PXC Cluster Solution

The previous section described the problems with single-point databases and conducted a simple performance test. To address these issues, we need to transform single-point databases into clusters.

There are many database clustering schemes, and none of them is good or bad, only the right one is good. This section introduces one of the mainstream solutions: PXC cluster solution, its architecture diagram is roughly as follows:

The biggest characteristic of PXC cluster is the strong consistency of data reading and writing. If data is written to any node in the graph, other nodes will be able to read the data. There will be no successful writes to node A and no reads to node B. This feature makes PXC clusters suitable for storing high-value critical data because data such as orders and money basically have strong consistency requirements.

The above figure is only the most basic architecture of PXC cluster, so there is room for optimization. We all know that the performance limit of mysql's single-table data processing is about 20 million. When the data reaches this magnitude, mysql's processing performance will be very low. In the above figure, each PXC node synchronizes data, so when the data magnitude of each node reaches 20 million, the performance of the whole cluster will decrease.

At this point, you need to add one more cluster, and the data between the two clusters is not synchronized. In order for different clusters to store different data, Mycat, a database middleware, must be introduced to segment the data so that data can be read and written on different clusters and the storage pressure can be distributed. In this scenario, a cluster is called a data shard. As shown in the figure:

How PXC Cluster Works

We know that consistency and persistence of database data are guaranteed by transactions, and strong consistency of PXC clusters is also ensured by transactions, but this transaction is distributed.

After the client writes data, it also needs to submit a transaction, in which synchronous replication of data takes place between nodes. The transaction is applied to all nodes in the cluster, ensuring that all nodes either write successfully or fail. Here's a timing diagram to illustrate the rough flow:

Things to note when building a PXC cluster:

The more nodes in a PXC cluster, the better. More nodes does not improve performance. Conversely, more nodes means more time is required to synchronize data, which in turn reduces performance. Generally speaking, a PXC cluster should have no more than 15 nodes. If more nodes are needed, a new cluster should be built and then shards should be done with middleware. The performance of a PXC cluster depends not on which node is best configured, but on which node is worst configured, similar to computer hardware. Therefore, we should try our best to ensure that the node configuration in the PXC cluster tends to be consistent, so as to avoid that the overall performance of the cluster is slowed down by the poor configuration of a node. About PXC

All that said, we haven't even introduced what PXC is. PXC is the abbreviation of Percona XtraDB Cluster. PXC is a multi-master cluster based on mysql's Galera cluster technology, which connects different mysql instances. Each mysql node in a PXC cluster is readable and writable, i.e., the master node in the master-slave concept, and there are no read-only nodes.

PXC can cluster any variant of mysql, such as MariaDB and Percona Server. Because Percona Server's performance is closest to MySQL Enterprise Edition, it has a significant performance improvement over the standard version of MySQL and is basically compatible with MySQL. Therefore, when building a PXC cluster, it is usually recommended to build it based on Percona Server.

Replication cluster scenario

Strong data consistency in PXC clusters comes at the expense of performance because clients need to wait for all nodes to write data. The opposite clustering scenario is Replication Cluster, which is described in this section. This scheme does not sacrifice performance, but does not have strong data consistency, which is exactly what you cannot have both fish and bear's paw.

The so-called read-write non-strong consistency means that the data was successfully written to node A and the transaction was committed. However, when reading on Node B, the written data may not be read.

Because the transaction committed here is only local to the node, it can only guarantee that the data is successfully written to the node, but not to the nodes in the entire cluster. When this node synchronizes data with other nodes, it may fail to synchronize data successfully due to various reasons, resulting in that the data cannot be read on other nodes.

Therefore, this clustering scheme is not suitable for storing high-value data, but it is suitable for non-high-value data with high requirements for read and write performance. For example, user behavior logs, operation logs, product descriptions and other such non-important data.

Similarly, the above figure is only the most basic architecture of Replication cluster, and Mycat is also needed to fragment the data when the data volume reaches a certain scale. As shown in the figure:

In fact, in large-scale system architectures, it is often not a single specific clustering scheme, but a combination of multiple schemes. For example, a PXC cluster and a Replication cluster can be used together, with the PXC cluster storing high-value data and the Replication cluster storing low-value data. Then use database middleware such as Mycat to complete data fragmentation and management between clusters, as shown in the figure:

Summary PXC cluster adopts synchronous replication, transactions are either submitted at the same time or not submitted in all cluster nodes, which can ensure strong consistency of data in the cluster. However, the performance is relatively low. The client needs to wait for synchronous replication of all nodes to complete. Replication clusters adopt asynchronous replication, which cannot guarantee strong data consistency. The performance is relatively high. The client only needs to wait for the local transaction of the target node to be successfully submitted, and does not need to wait for all nodes to successfully replicate the data. Both PXC and Replication clusters only realize data synchronization, and there is no data segmentation function. When the data volume is large, database middleware such as Mycat needs to be introduced to do data segmentation and management. PXC and Replication cluster solutions are not exclusive, and each has its own advantages and disadvantages. They can be used together to achieve the effect of complementing each other. High value data stored to PXC cluster, low value data stored to Replication cluster

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.