Comparison of MySQL Cluster Middleware
The previous article, Building a PXC Cluster under CentOS 8, demonstrated how to build a three-node PXC cluster from scratch. However, building a PXC cluster alone is not enough: in real enterprise applications there may be multiple PXC clusters, each acting as a data shard. Under a complete architecture, therefore, we also need to introduce database middleware in front of the clusters to provide data sharding and load balancing.
There are many database middleware products on the market. They fall mainly into two types: load balancing and data sharding (data sharding middleware usually also provides load balancing):
Load balancing: Haproxy, MySQL-Proxy
Data sharding: MyCat, Atlas, OneProxy, ProxySQL
The role of load balancing middleware:
A load balancer forwards requests evenly to each node in the cluster, reducing the load on any single node, so that the resources of every node are fully used and the cluster delivers its full performance.
The role of data sharding middleware:
It distributes SQL statements to different shards according to a routing algorithm, so that different shards store different data. Sharding spreads the data evenly across the shards and keeps the data volume of any single shard below the storage limit of its database. A shard here refers to a cluster or a single database node.
The following is a comparison of common middleware:
MyCat
- Open source / free: open source and free
- Load capacity: rebuilt from Alibaba's Cobar middleware and proven under heavy traffic
- Development language: Java
- Features: comprehensive, with a rich set of sharding algorithms plus read/write splitting, global primary keys, distributed transactions, and more
- Documentation: rich; besides the official "Mycat Authoritative Guide" there are many community-contributed documents
- Popularity: used in telecom and e-commerce applications; the most popular MySQL middleware

Atlas
- Open source / free: open source and free
- Load capacity: based on MySQL Proxy, used mainly in Qihoo 360 products, handling billions of requests per day
- Development language: C
- Features: limited; implements read/write splitting with a small number of sharding algorithms; no global primary keys or distributed transactions
- Documentation: sparse; only the open-source project documentation, with no technical community or publications
- Popularity: low; apart from Qihoo 360 it is used only in some small and medium-sized projects, and few reference cases are available

OneProxy
- Open source / free: split into a free edition and an enterprise edition
- Load capacity: C-based kernel with good performance
- Development language: C
- Features: limited; implements read/write splitting with a small number of sharding algorithms; no global primary keys or distributed transactions
- Documentation: sparse; the official website provides no usage documentation, and there is no technical community or publication
- Popularity: low; used only in the internal systems of some small and medium-sized enterprises

ProxySQL
- Open source / free: open source and free
- Load capacity: excellent performance, recommended by Percona
- Development language: C++
- Features: relatively rich; supports read/write splitting, data sharding, failover, query caching, and more
- Documentation: rich; there is official documentation and a technical community
- Popularity: lower than Mycat, but many companies have tried it

Configuring Mycat data sharding
From the introduction and comparison in the previous section, we can see that MyCat and ProxySQL are the more desirable database middleware. Since MyCat is more feature-complete and more widely used than ProxySQL, MyCat is chosen here as the middleware for the PXC clusters. For an introduction to Mycat and its installation, refer to my other article on getting started with Mycat; it is not repeated here.
This section mainly describes how to configure Mycat's data sharding feature, so that Mycat can act as the front-end data sharding middleware and forward SQL requests to the correct back-end PXC cluster shard. For this purpose I built two PXC clusters, each serving as one shard, plus two Mycat nodes and two Haproxy nodes, which will later be used to set up dual-node hot standby. As shown in the figure:
The information of each node is as follows:
Role | Host | IP
Haproxy-Master | Haproxy-Master | 192.168.190.140
Haproxy-Backup | Haproxy-Backup | 192.168.190.141
Mycat: Node1 | mycat-01 | 192.168.190.136
Mycat: Node2 | mycat-02 | 192.168.190.135
PXC shard-1: Node1 | PXC-Node1 | 192.168.190.132
PXC shard-1: Node2 | PXC-Node2 | 192.168.190.133
PXC shard-1: Node3 | PXC-Node3 | 192.168.190.134
PXC shard-2: Node1 | PXC-Node1 | 192.168.190.137
PXC shard-2: Node2 | PXC-Node2 | 192.168.190.138
PXC shard-2: Node3 | PXC-Node3 | 192.168.190.139
Create a test library in each shard, and create a t_user table in the library for testing. The specific table SQL is as follows:
CREATE TABLE `t_user` (
  `id` int(11) NOT NULL,
  `username` varchar(20) NOT NULL,
  `password` char(36) NOT NULL,
  `tel` char(11) NOT NULL,
  `locked` char(10) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_username` (`username`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
With the above preparation done, we can start configuring Mycat. If you are not familiar with Mycat's configuration files, refer to another article, Mycat Core Configuration Explained; it is not repeated here.
1. Edit the server.xml file and configure the access users of Mycat:
The Mycat access user is given the password Abc_123456 and is granted access to the test logical schema.
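Since the original XML listing is not shown here, the following is only a minimal sketch of what this part of server.xml might look like; the user name admin is an assumption, chosen to match the mysql -uadmin logins used in the tests later in this article:

<!-- sketch only: the user name "admin" is an assumption -->
<user name="admin">
    <property name="password">Abc_123456</property>
    <property name="schemas">test</property>
</user>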
2. Edit the schema.xml file and configure Mycat's logical schema, logical tables, and the connection information of the cluster nodes:
Each dataHost uses select user() as its heartbeat statement, and the connection information of the two PXC cluster shards is configured here.
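Since the original listing is not shown here, the following is only a rough, illustrative sketch of what such a schema.xml fragment could look like for this topology. The dataNode names dn1 and dn2 and the back-end credentials (admin / Abc_123456) are assumptions; the dataHost names pxc-cluster1 and pxc-cluster2 and the IP addresses follow the node table earlier in this article:

<!-- illustrative fragment only; names and credentials are assumptions -->
<schema name="test" checkSQLschema="false" sqlMaxLimit="100">
    <!-- t_user is sharded across the two clusters with the mod-long rule -->
    <table name="t_user" dataNode="dn1,dn2" rule="mod-long" />
</schema>

<dataNode name="dn1" dataHost="pxc-cluster1" database="test" />
<dataNode name="dn2" dataHost="pxc-cluster2" database="test" />

<dataHost name="pxc-cluster1" maxCon="1000" minCon="10" balance="2"
          writeType="0" dbType="mysql" dbDriver="native" switchType="1">
    <heartbeat>select user()</heartbeat>
    <writeHost host="W1" url="192.168.190.132:3306" user="admin" password="Abc_123456">
        <readHost host="W1-R1" url="192.168.190.133:3306" user="admin" password="Abc_123456" />
        <readHost host="W1-R2" url="192.168.190.134:3306" user="admin" password="Abc_123456" />
    </writeHost>
</dataHost>

<dataHost name="pxc-cluster2" maxCon="1000" minCon="10" balance="2"
          writeType="0" dbType="mysql" dbDriver="native" switchType="1">
    <heartbeat>select user()</heartbeat>
    <writeHost host="W1" url="192.168.190.137:3306" user="admin" password="Abc_123456">
        <readHost host="W1-R1" url="192.168.190.138:3306" user="admin" password="Abc_123456" />
        <readHost host="W1-R2" url="192.168.190.139:3306" user="admin" password="Abc_123456" />
    </writeHost>
</dataHost>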
3. Edit the rule.xml file and modify the modulo base of the mod-long sharding algorithm. Since there are only two clusters serving as shards here, the base needs to be changed to 2:
The mod-long rule shards on the id column, and its modulo base (the count property) is set to 2.
Tips: this sharding algorithm takes the value of the id column modulo the configured base to obtain the index of the target data shard.
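For reference, a sketch of the relevant rule.xml fragment (the standard Mycat mod-long rule with only the count changed) might look like this:

<!-- sketch of the relevant rule.xml fragment -->
<tableRule name="mod-long">
    <rule>
        <columns>id</columns>
        <algorithm>mod-long</algorithm>
    </rule>
</tableRule>

<function name="mod-long" class="io.mycat.route.function.PartitionByMod">
    <!-- the modulo base equals the number of shards, here 2 -->
    <property name="count">2</property>
</function>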
After completing the configuration of the above three files, start Mycat:
[root@mycat-01 ~]# mycat start
Starting Mycat-server...
[root@mycat-01 ~]# more /usr/local/mycat/logs/wrapper.log | grep successfully
# the following log output indicates that the startup succeeded
INFO | jvm 1 | 2020-01-19 15:09:02 | MyCAT Server startup successfully. see logs in logs/mycat.log

Test
After startup, log in to Mycat and execute an insert statement to test whether the SQL is forwarded to the correct cluster shard. The specific steps are as follows:
[root@mycat-01 ~]# mysql -uadmin -P8066 -h127.0.0.1 -p
mysql> use test;
mysql> insert into t_user(id, username, password, tel, locked)
    -> values(1, 'Jack', hex(AES_ENCRYPT('123456', 'Test')), '13333333333', 'N');
The above insert statement inserts a record with id 1. We are using the modulo sharding algorithm on the id column, configured with a modulo base of 2, so the shard index is id modulo the base: 1 % 2 = 1. The result 1 is the shard index, so Mycat should normally forward this insert statement to the cluster with shard index 1.
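As an illustration of the routing arithmetic only (this is not Mycat's source code), a mod-long style mapping from id to shard index boils down to the following Java sketch:

// illustrative only: how a mod-long style rule maps an id to a shard index
public class ModLongExample {
    // count is the modulo base configured in rule.xml (2 in this article)
    static int shardIndex(long id, int count) {
        return (int) (id % count);
    }

    public static void main(String[] args) {
        System.out.println(shardIndex(1, 2)); // prints 1 -> shard index 1 (pxc-cluster2)
        System.out.println(shardIndex(2, 2)); // prints 0 -> shard index 0 (pxc-cluster1)
    }
}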
According to the configuration in the schema.xml file, the cluster corresponding to the shard with index 1 is pxc-cluster2, that is, the second PXC cluster shard. Next, we can verify that Mycat forwarded the SQL correctly as expected by comparing the data in the two clusters.
As you can see from the following figure, Mycat correctly forwarded the insert statement to the second shard, while the first shard has no data:
Next we test whether Mycat forwards the SQL to the first shard when the id is 2. The specific SQL is as follows:
insert into t_user(id, username, password, tel, locked)
values(2, 'Jon', hex(AES_ENCRYPT('123456', 'Test')), '18888888888', 'N');
The test results are shown in the figure:
After the above tests are completed, the data in all shards can be queried on Mycat at this time:
Four commonly used data sharding algorithms
The example in the previous section used modulo sharding on the primary key. The characteristics of this and the other commonly used sharding algorithms are as follows:
Modulo sharding on the primary key
- Suitable scenarios: the initial data volume is very large but the data grows slowly, for example map products, administrative data, enterprise data, etc.
- Drawbacks: it is difficult to add new shards, because too much data has to be migrated; if you do need to expand, it is recommended that the number of shards after expansion be 2n times the original, for example from two shards to four

Range sharding on the primary key
- Suitable scenarios: rapidly growing data
- Characteristics: easy to add new shards; requires a clear primary key column

Date sharding
- Suitable scenarios: rapidly growing data
- Characteristics: easy to add new shards; requires a clear date column

Sharding by enumerated values
- Suitable scenarios: storing data by category; fits most businesses
- Characteristics: data is sharded according to the mapping between a field's (numeric) value and the mapFile configuration
- Drawbacks: the data stored in the shards is not distributed evenly enough
The enumerated value sharding algorithm was also introduced in the article Mycat Core Configuration Explained. Compared with the other algorithms, it uses an extra mapping file (mapFile), so here is a brief demonstration of how to use it.
Requirement: the user table has a column that stores the user's area code, and this column is to be used as the sharding column, so that user data with different area codes is stored in different shards.
1. First, in the rule.xml file of Mycat, add the following configuration:
The new rule shards on the area_id column using a function named area-int, whose mapping file is area-hash-int.txt.
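The XML itself is not reproduced above; a sketch of what the rule.xml addition might look like, using Mycat's standard PartitionByFileMap function, is:

<!-- sketch only; the rule name matches the one referenced in schema.xml below -->
<tableRule name="sharding-by-areafile">
    <rule>
        <columns>area_id</columns>
        <algorithm>area-int</algorithm>
    </rule>
</tableRule>

<function name="area-int" class="io.mycat.route.function.PartitionByFileMap">
    <!-- mapping file under the conf directory -->
    <property name="mapFile">area-hash-int.txt</property>
    <!-- type 0 means the sharding column is treated as an integer -->
    <property name="type">0</property>
</function>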
2. Create an area-hash-int.txt file under the conf directory and define the mapping between area codes and shard indexes:
[root@mycat-01 /usr/local/mycat]# vim conf/area-hash-int.txt
020=0
0755=0
0757=0
0763=1
0768=1
0751=1
3. Configure schema.xml, add a logical table, and set its sharding rule to sharding-by-areafile:
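The table definition is likewise not reproduced above; as a sketch (reusing the dataNode names assumed earlier), the new logical table entry could look like this:

<!-- sketch only; dn1 and dn2 are the dataNode names assumed earlier -->
<table name="t_customer" dataNode="dn1,dn2" rule="sharding-by-areafile" />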
4. Log in to Mycat and execute the hot-reload statement, which lets Mycat apply the new configuration without restarting:
[root@mycat-01 ~]# mysql -uadmin -P9066 -h127.0.0.1 -p
mysql> reload @@config_all;

Test
After completing the above configuration, let's create a table and test it. Create the t_customer table in all of the clusters; the table SQL is as follows:
create table t_customer(
    id int primary key,
    username varchar(20) not null,
    area_id int not null
);
Insert a record with an area_id of 020 into Mycat:
[root@mycat-01 ~]# mysql -uadmin -P8066 -h127.0.0.1 -p
mysql> use test;
mysql> insert into t_customer(id, username, area_id)
    -> values(1, 'Jack', 020);
According to the configuration in the mapping file, the data with an area_id of 020 will be stored in the first shard, as shown below:
Tips: since area_id is of int type here, the leading zero is stripped when the value is stored.
Then insert a record with an area_id of 0763:
insert into t_customer(id, username, area_id) values(2, 'Tom', 0763);
According to the configuration in the mapping file, the data with an area_id of 0763 is stored in the second shard, as shown below:
After completing the above tests, you should be able to query the data in all shards in Mycat at this time:
Parent-child tables
When associated data is stored in different shards, table joins become a problem, because Mycat does not allow join queries across shards. To solve the problem of joining across sharded tables, Mycat offers the parent-child table solution.
With parent-child tables, the parent table can use any sharding algorithm, but the associated child table does not use a sharding algorithm of its own: the child table's rows are always stored in the same shard as the corresponding rows of the parent table. Whatever sharding algorithm the parent table uses, the child table's data always follows the parent table.
For example, if the user table is associated with the order table, we can make the user table the parent table and the order table the child table. When user A is stored in shard 1, the orders generated by user A are also stored in shard 1, so querying user A's orders never crosses shards. As shown in the following figure:
Practice
Now that we understand the concept of parent-child tables, let's look at how to configure parent-child tables in Mycat. First, configure the parent-child table relationship in the schema.xml file:
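Since the original listing is not shown here, the following is only a sketch of such a parent-child definition for the t_customer and t_orders tables used below (attribute values chosen to match those tables):

<!-- sketch only: t_orders is declared as a child table of t_customer -->
<table name="t_customer" dataNode="dn1,dn2" rule="sharding-by-areafile">
    <!-- joinKey is the column in the child table, parentKey the column in the parent table -->
    <childTable name="t_orders" joinKey="customer_id" parentKey="id" />
</table>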
childTable tag description:
- joinKey attribute: the column in the child table used to associate with the parent table
- parentKey attribute: the associated column in the parent table
- a childTable tag can itself contain further childTable tags
After completing the above configuration, have Mycat reload the configuration file:
reload @@config_all;

Test
Then create the t_orders table in all the shards. The specific table SQL is as follows:
create table t_orders(
    id int primary key,
    customer_id int not null,
    create_time datetime default current_timestamp
);
Now there are two users in the shards: the user with id 1 is stored in the first shard, and the user with id 2 in the second shard. At this point, insert an order record through Mycat:
insert into t_orders(id, customer_id) values(1, 1);
Since the order record is associated with a user with an id of 1, it will be stored in the first shard according to the parent-child table. As shown below:
Similarly, if the order record is associated with a user with id 2, it will be stored in the second shard:
insert into t_orders(id, customer_id) values(2, 2);
The test results are as follows:
Since the data of the parent and child tables is stored in the same shard, join queries on Mycat work without any problem:
Setting up a highly available Mycat cluster with dual-node hot standby

Preface
In the examples in the previous sections, all reads and writes to the back-end database clusters go through Mycat. As the middleware responsible for receiving client requests and forwarding them to the back-end database clusters, Mycat itself inevitably needs to be highly available; otherwise, if Mycat becomes a single point of failure, the entire database cluster becomes unusable, which has a huge impact on the whole system.
So this section demonstrates how to build a highly available Mycat cluster. Besides two or more Mycat nodes, this also requires introducing the Haproxy and Keepalived components.
Haproxy, as the load balancing component, sits at the front, receives requests from clients and distributes them across the Mycat nodes, ensuring the high availability of Mycat. Keepalived implements dual-node hot standby, because Haproxy also needs to be highly available: when one Haproxy goes down, the standby Haproxy can take over immediately. In other words, only one Haproxy is running at any time, while the other waits as a standby; when the running Haproxy goes down unexpectedly, Keepalived immediately switches the standby Haproxy into the running state.
Keepalived lets hosts compete for the same virtual IP (VIP) to achieve high availability. The hosts take the roles of Master and Backup; there is only one Master, while there can be several Backups. Initially the Master holds the VIP and serves requests. When the Master goes down and the Backups can no longer detect it, a Backup automatically takes over the VIP, and requests sent to the VIP are then received by that Backup. In this way a Backup can take over from the Master seamlessly, achieving high availability.
With the introduction of these components, our cluster architecture will eventually evolve like this:
Install Haproxy
Haproxy is a well-established load balancing component, and CentOS's yum repository already provides a package for it, so it is easy to install. The installation command is as follows:
[root@Haproxy-Master ~]# yum install -y haproxy
Configure Haproxy:
[root@Haproxy-Master ~]# vim /etc/haproxy/haproxy.cfg
# add the following configuration items at the end of the file

# monitoring interface configuration
listen admin_stats
    # bound ip and listening port
    bind 0.0.0.0:4001
    # access protocol
    mode http
    # URI relative address
    stats uri /dbs
    # statistics report format
    stats realm Global\ statistics
    # account and password for logging in to the monitoring interface
    stats auth admin:abc123456

# database load balancing configuration
listen proxy-mysql
    # bound ip and listening port
    bind 0.0.0.0:3306
    # access protocol
    mode tcp
    # load balancing algorithm
    # roundrobin: round robin
    # static-rr: by weight
    # leastconn: least connections
    # source: by source ip of the request
    balance roundrobin
    # log format
    option tcplog
    # hosts to be load balanced
    server mycat_01 192.168.190.136:8066 check port 8066 weight 1 maxconn 2000
    server mycat_02 192.168.190.135:8066 check port 8066 weight 1 maxconn 2000
    # use keepalive to detect dead connections
    option tcpka
Since port 3306 is configured for TCP forwarding and 4001 is used as the access port for the Haproxy monitoring interface, these two ports need to be opened on the firewall:
[root@Haproxy-Master ~]# firewall-cmd --zone=public --add-port=3306/tcp --permanent
[root@Haproxy-Master ~]# firewall-cmd --zone=public --add-port=4001/tcp --permanent
[root@Haproxy-Master ~]# firewall-cmd --reload
After completing the above steps, start the Haproxy service:
[root@Haproxy-Master ~] # systemctl start haproxy
Then use a browser to access Haproxy's monitoring interface. On the first visit you will be asked for a user name and password, which are configured in the configuration file:
After logging in successfully, you will see the following page:
Haproxy's monitoring interface provides comprehensive monitoring information. On this page we can see the connection information and status of each host. When a host cannot be connected, its Status column shows DOWN and the row background turns red; in the normal state the value is UP and the background is green.
The other Haproxy node is also installed and configured using the above steps, so I won't repeat it here.
Test Haproxy
After the Haproxy service is set up, let's use remote tools to test whether we can connect to Mycat properly through Haproxy. As follows:
After the connection is successful, execute some SQL statements on Haproxy to see if you can insert and query the data properly:
We set up Haproxy in order to make Mycat highly available, so finally let's test whether Mycat is indeed highly available. First stop one Mycat node:
[root@mycat-01 ~]# mycat stop
Stopping Mycat-server...
Stopped Mycat-server.
[root@mycat-01 ~]#
At this point, the monitoring interface of Haproxy shows that the mycat_01 node has gone offline:
Now that there is one Mycat node left in the cluster, let's execute some SQL statements on Haproxy to see if the data can be inserted and queried properly:
As can be seen from the test results, insert and query statements still execute normally. In other words, even with one Mycat node shut down, the entire database cluster can still be used normally, which shows that the Mycat cluster is now highly available.
Using Keepalived to achieve high availability of Haproxy
Having made the Mycat cluster highly available, we still have to make Haproxy highly available, because the architecture has now changed from clients facing Mycat to clients facing Haproxy.
However, only one Haproxy should be serving at any given time, otherwise clients would not know which Haproxy to connect to. This is exactly why a VIP is used: it allows multiple nodes to take over the same IP from one another, while the client only ever connects to this one VIP. Therefore, it is now Keepalived's turn to provide the high availability of Haproxy. Before installing Keepalived, the VRRP protocol must be allowed through the firewall:
[root@Haproxy-Master ~]# firewall-cmd --direct --permanent --add-rule ipv4 filter INPUT 0 --protocol vrrp -j ACCEPT
[root@Haproxy-Master ~]# firewall-cmd --reload
You can then install Keepalived using the yum command, and note that Keepalived is installed on the Haproxy node:
[root@Haproxy-Master ~]# yum install -y keepalived
After the installation is complete, edit the configuration file for keepalived:
[root@Haproxy-Master ~]# mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak  # do not use the bundled configuration file
[root@Haproxy-Master ~]# vim /etc/keepalived/keepalived.conf
vrrp_instance VI_1 {
    state MASTER
    interface ens32
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 123456
    }
    virtual_ipaddress {
        192.168.190.100
    }
}
Configuration instructions:
- state MASTER: defines the node's role as master. A master node obtains the VIP without competing for it. Multiple masters are allowed in a cluster, in which case the masters compete for the VIP; nodes in other roles can obtain the VIP only after the master goes offline.
- interface ens32: defines the name of the network interface used for external communication; interface names can be checked with the ip addr command.
- virtual_router_id 51: defines the id of the virtual router, a value from 0 to 255; the value must be unique on each node, i.e. two nodes must not be configured with the same id.
- priority 100: defines the weight; the higher the weight, the higher the priority in obtaining the VIP.
- advert_int 1: defines a detection interval of 1 second.
- authentication: defines the authentication information used for heartbeat checks.
- auth_type PASS: defines the authentication type as password.
- auth_pass 123456: defines the specific password.
- virtual_ipaddress: defines the virtual IP (VIP); it needs to be an IP in the same network segment, and every node must be configured with the same value.
After completing the above configuration, start the keepalived service:
[root@Haproxy-Master ~] # systemctl start keepalived
When the keepalived service has started successfully, you can use the ip addr command to see the virtual IP bound to the network card:
The other node is also installed and configured using the above steps, so I won't repeat it here.
Test Keepalived
We have completed the installation and configuration of Keepalived above, and finally we will test whether the Keepalived service is properly available and whether Haproxy is already highly available.
First, test from other nodes whether the virtual IP can be pinged; if it cannot, check the configuration. As shown in the picture, I can ping it normally here:
Common reasons why the virtual IP cannot be pinged:
- The firewall is configured incorrectly and the VRRP protocol is not properly enabled
- The configured virtual IP is not in the same network segment as the IPs of the other nodes
- The Keepalived configuration is incorrect, or Keepalived did not start successfully at all
After confirming that you can ping the virtual IP of Keepalived from outside, use Navicat to test whether you can connect to the Mycat through the virtual IP:
After the connection is successful, execute some statements to test whether data can be inserted and queried properly:
At this point there is basically no problem. Finally, test the high availability of Haproxy by shutting down the keepalived service on one of the Haproxy nodes:
[root@Haproxy-Master ~] # systemctl stop keepalived
Then execute some statements again to test whether data can still be inserted and queried properly. The following results show that the Haproxy nodes are now highly available: