Live Broadcast Review: Practice of the TBase Multi-Center, Multi-Active, High-Availability Solution

The Tencent Cloud Database online technology salon on domestic databases is in full swing. Chen Aisheng's session concluded on April 28. If you did not have a chance to attend, don't worry: below is the live video along with a text review.

Follow the official "Tencent Cloud Database" account and reply "0428 Chen Aisheng" to download the live video replay and the presentation slides.

Practice of the TBase Multi-Center, Multi-Active, High-Availability Solution _ Tencent Video

Hello, everyone. I'm Chen Aisheng, currently in charge of Tencent Cloud TBase product delivery, operations, and maintenance.

Today's topic is the practice of TBase multi-center, multi-active, and high-availability solutions, with a focus on the multi-active deployment and high-availability switching schemes.

The talk is divided into four parts:

The first part introduces the service components of TBase: which components make up TBase, what to pay attention to when deploying them, and what the deployment topology looks like.

The second part introduces TBase multi-active deployment. This chapter covers deployment across two sites and two centers, VPC isolation, active/standby read-write separation, and the various schemes involved.

The third part covers node-level failover: for example, when the primary node of a standalone instance fails, how the standby takes over. Because TBase has many service components, and each component has its own active and standby nodes, this chapter describes in detail what happens when each component fails, the scope of the impact, how the switchover works, and the effect on the overall service afterwards.

The fourth part covers center-level failover. This chapter mainly describes the application scenario, why center-level switchover exists, and what it costs. Center-level switchover is the hardest to achieve: the overall cost and complexity of the switchover are very high.

Part Ⅰ: Overview of TBase Service Components

Let's first take a look at TBase's service architecture. Readers who follow TBase regularly should be familiar with this diagram. Today I will walk through each service component in it in detail.

In the diagram you can see that TBase consists of three components: the GTM component on the far left, the CN component in the upper right corner, and the DN component below. Next we will describe these three components in detail, along with what to pay attention to when deploying them.

First, the GTM component on the far left. What is it for? It is essentially the global transaction ID dispenser. On a standalone instance, transaction management lives in memory. TBase is a distributed database, so it also needs transaction ID management, and that is handled by the component called GTM. Every request, whether a read or a write, must first obtain a transaction ID from it, which is what achieves read-write consistency.

The GTM component supports one master and multiple slaves when deployed. The figure shows only one master and one slave, but a deployment can have multiple slaves. The master-slave replication level can be configured as strongly synchronous or asynchronous. Only the GTM master serves requests; a slave never does. The slave's job is to synchronize two pieces of information from the master node: the transaction ID, and the global sequences that live on the GTM. When a request accesses a sequence, it must also go through the GTM, so GTM load mainly comes from two sources:

requests for global transaction IDs, and requests for sequences.
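To make the dispenser role concrete, here is a toy, purely illustrative sketch of a GTM-like allocator. The class and method names are hypothetical; this is not TBase source code, just the shape of the idea.

```python
# Illustrative sketch only: a toy global transaction ID dispenser in the
# spirit of GTM. Names and structure are hypothetical, not TBase internals.
import itertools
import threading

class ToyGTM:
    """Hands out globally unique, monotonically increasing transaction IDs."""
    def __init__(self, start: int = 1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()
        self._sequences = {}  # sequence name -> last value, mimicking global sequences

    def next_xid(self) -> int:
        # Every read or write must fetch an ID here first, which is why
        # GTM load is mostly CPU-bound rather than IO-bound.
        with self._lock:
            return next(self._counter)

    def next_seq(self, name: str) -> int:
        with self._lock:
            value = self._sequences.get(name, 0) + 1
            self._sequences[name] = value
            return value

gtm = ToyGTM()
print(gtm.next_xid(), gtm.next_xid(), gtm.next_seq("order_id"))
```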

GTM is demanding on CPU. When an instance has dozens or hundreds of nodes all requesting transaction IDs from it, its main consumption is CPU, so a GTM node must be deployed on a machine with many cores. How many? That depends on your request volume: for a highly concurrent business the core count needs to be higher, even running into the dozens. Although only the master serves requests, a single GTM has proven able to cover essentially all business traffic, reaching tens of millions of requests per second.

GTM also writes logs. Although it does not demand high disk IO performance, a certain amount of IO must still be guaranteed. When deploying GTM alongside other services, make sure it gets its share of IO; a dedicated disk is best, even a fairly ordinary SAS disk. It does not need much, but it must be guaranteed, otherwise requests may fail. Memory can be moderate, because the service is very light and does not need much memory for computation. Typically, a machine with a few dozen CPU cores might be paired with 64 GB of memory and a 200 GB SSD or SAS disk, which meets the requirements of a GTM deployment. The simplest way to achieve high availability is one master and one standby.

Next, the CN in the upper right corner, the coordinator node; in other distributed databases it may be called a PROXY node. Multiple CN masters can serve at the same time: CN is the entry point for business connections, i.e., the IP and PORT the application connects to. For example, we may have three IPs and three PORTs here; any of them can accept application connections, and they are all peers. What is stored on a CN? Metadata: the databases, tables, users, views, and so on that we define. This metadata exists on every CN. Beyond that, there is one very important piece of information: every CN also holds the global routing information. Why store routing information? Because every application SQL request relies on it to calculate which storage node the SQL should be sent to. Generally at least two CNs are deployed; you can also deploy just one, but then you need a CN slave, i.e., CN requires master/slave. It is recommended that every CN have an active/standby pair, because CN acts as the coordinator of distributed transactions. If a CN fails, distributed transactions run into problems; with a standby it is simple: promote the slave to master and the transactions below can keep executing.
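Because every CN is a peer entry point, an application can simply list several of them. As an illustration only, and assuming the CNs speak the standard PostgreSQL wire protocol, libpq's multi-host syntax (usable through psycopg2) tries each host in turn; all host names, ports, and credentials below are placeholders.

```python
# Sketch: connect to whichever peer CN is reachable and writable.
import psycopg2

conn = psycopg2.connect(
    host="cn1.example.internal,cn2.example.internal,cn3.example.internal",
    port="5432,5432,5432",
    dbname="appdb",
    user="app",
    password="secret",
    # Only settle on a connection that accepts writes (libpq >= 10 feature).
    target_session_attrs="read-write",
)
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())
conn.close()
```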

CN has hardware requirements of its own: generally it needs a CPU with many cores. If your workload must pull data up to the CN node for computation, it also needs more memory, although most computation can be pushed down to the DNs for execution. Disk requirements are as I said earlier: if data has to be pulled back to the CN's disk for computation, configure a high-performance SSD. If you use materialized views, their data actually lives on the CN, so the CN needs more storage space. Two or more CNs are recommended, each with a master and slave, and strong synchronization can be configured between them, so that on failure you can switch over and keep transactions running.

At the bottom are the Datanodes, the DN nodes, which store the user data. After users connect to a CN, all their requests are pushed down to the DNs below. How? Based on the CN routing information described above, each piece of data is pushed to a particular DN and stored there. Each Datanode is a shard: with 100 million rows and 4 shards, reasonable sharding puts about 25 million rows on each shard. A Datanode needs at least one master and one slave when deployed, even in a single center, because if any shard fails with no slave, a quarter of your data is gone; the data is incomplete, which is equivalent to data loss. (A broken CN without a standby matters less, since you can simply connect to another healthy CN.) How many shards do you need? That depends on the business. DN deployment also needs basic planning against per-node capacity specifications, for example requiring that each node not exceed a certain number of terabytes, because oversized nodes drive operations costs very high.
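To illustrate the routing idea, here is a hedged sketch of hash-based shard routing. The hash function, shard names, and layout are illustrative stand-ins, not TBase's actual routing metadata.

```python
# Sketch: hash the distribution key and map it to one of the DN shards,
# which is the kind of calculation CN performs using its routing table.
import zlib

DN_SHARDS = ["dn1", "dn2", "dn3", "dn4"]  # 4 shards, as in the 100M-row example

def route(distribution_key: str) -> str:
    h = zlib.crc32(distribution_key.encode("utf-8"))
    return DN_SHARDS[h % len(DN_SHARDS)]

# With a reasonably uniform key, ~25M of 100M rows land on each shard.
print(route("user_42"), route("user_43"))
```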

DN generally has the strictest hardware requirements: many CPU cores, large memory, and good IO, because it does the actual computation and storage. In short, configure it like a well-provisioned standalone database server.

I have summarized the introduction to the service components in the following figure for easy reference.

Part Ⅱ: Introduction to TBase Multi-Active Deployment

Let's move on to the second chapter, an introduction to TBase multi-active deployment. We can start with the deployment topology. The first diagram shows the traditional active-standby deployment model.

As the diagram shows, whether across regions or within the same city, there is always only one primary center: only the primary center at the top provides read-write capability, while the other nodes provide only read-only capability. If strong synchronization between master and standby is required, the latency between two centers in the same city is controlled below 3 milliseconds, which is acceptable for most services. Remote master/slave nodes can also be configured for strong synchronization, but in practice large latency has a severe impact on the business and causes TPS to drop sharply. For example, from the north to Guangzhou, one-way latency is often between 30 and 50 milliseconds depending on network quality, so a request round trip is about 100 milliseconds. Such latency is basically unacceptable: a data update that used to take a few milliseconds on the server now takes 100 milliseconds round trip, and no hardware upgrade can compensate for that.
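A quick back-of-envelope check of that claim, under the one-way latency cited above:

```python
# With synchronous cross-region replication, each commit waits for at least
# one network round trip, so per-session throughput is capped by latency alone.
one_way_ms = 50                      # north-to-Guangzhou one-way latency (upper bound cited)
round_trip_ms = 2 * one_way_ms       # the ~100 ms round trip from the text
max_tps_per_session = 1000 / round_trip_ms
print(f"~{max_tps_per_session:.0f} synchronous commits/s per session")  # ~10
```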

In addition, applications may need data pushed to message middleware, most commonly Kafka. In the active-standby deployment structure, only the master node can perform logical decoding, a technical limitation inherited from PG: only the primary can decode the data and feed it to Kafka. Whether in the same city or across cities, data at the remote site cannot be synchronized to message middleware locally, so exchanging messages across regions becomes very expensive, and every message exchange can only happen in the primary center. This is the traditional active-standby deployment architecture.
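As an illustration of the decoding-to-Kafka path, here is a hedged sketch using vanilla PostgreSQL logical decoding with the wal2json output plugin and the kafka-python package. The slot name, topic, and hosts are placeholders, and TBase's actual pipeline may well differ; this only shows the mechanism the paragraph describes.

```python
# Sketch: poll a logical decoding slot on the primary and forward changes
# to Kafka. Logical decoding only runs on the primary, which is the PG
# limitation discussed above.
import time

import psycopg2
from kafka import KafkaProducer

pg = psycopg2.connect(host="primary.example.internal", dbname="appdb", user="repl")
pg.autocommit = True
producer = KafkaProducer(bootstrap_servers="kafka.example.internal:9092")

with pg.cursor() as cur:
    # Create the slot once; this call fails if the slot already exists.
    cur.execute("SELECT pg_create_logical_replication_slot('demo_slot', 'wal2json')")
    while True:
        # Consume pending changes from the slot and forward them.
        cur.execute(
            "SELECT data FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL)"
        )
        for (change,) in cur.fetchall():
            producer.send("db-changes", change.encode("utf-8"))
        time.sleep(1)
```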

Next, I would like to share the dual-active deployment architecture. What does it look like?

You can see a complete set of instances in the south and a complete set in the north. The two sets are completely independent; you can think of them as two unrelated instances, but we want to bind them together so that they behave as two identical databases storing the same data. North-south two-way synchronization, i.e., logical replication technology, is used to synchronize data in both directions between the southern and northern machines. Both north and south can serve reads and writes, and data can be synchronized to message middleware, so localized access is achieved no matter where my application runs or which other applications it exchanges data with.

For an application that was originally single-center, after the business database is split into north and south, some shared data remains that is hard to split. That shared data is left unsplit, with one copy in the south and one in the north, and dual-active data synchronization is used to support dual-center multi-active operation.
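To illustrate the two-way logical replication idea, here is a minimal sketch using vanilla PostgreSQL 16 publications and subscriptions, with origin = none to keep replicated rows from echoing back. This is only an analogy: TBase implements its own bidirectional synchronization, and all DSNs and object names here are placeholders.

```python
# Sketch: cross-subscribe two independent clusters on one shared table.
import psycopg2

def run(dsn: str, statements: list[str]) -> None:
    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # CREATE SUBSCRIPTION cannot run inside a transaction
    with conn.cursor() as cur:
        for stmt in statements:
            cur.execute(stmt)
    conn.close()

SOUTH = "host=south.example.internal dbname=appdb"
NORTH = "host=north.example.internal dbname=appdb"
DDL = "CREATE TABLE IF NOT EXISTS shared_data (id bigint PRIMARY KEY, v text)"

# Publications first on both sides, then the cross subscriptions.
run(SOUTH, [DDL, "CREATE PUBLICATION pub_south FOR TABLE shared_data"])
run(NORTH, [DDL, "CREATE PUBLICATION pub_north FOR TABLE shared_data"])
# origin = none (PG 16+) stops rows that arrived via replication from being
# re-published, so the two subscriptions do not feed each other forever.
run(SOUTH, ["CREATE SUBSCRIPTION sub_from_north "
            f"CONNECTION '{NORTH}' PUBLICATION pub_north WITH (origin = none)"])
run(NORTH, ["CREATE SUBSCRIPTION sub_from_south "
            f"CONNECTION '{SOUTH}' PUBLICATION pub_south WITH (origin = none)"])
```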

In both the traditional active/standby and the dual-active architectures described so far, applications connect to a physical IP, such as the CN node mentioned earlier. But on failure, the physical IP of the connection changes, which is intrusive for the application: it has to keep modifying its database connection IPs. So we improved the architecture further. We replanned a new deployment architecture based on TBase plus another Tencent Cloud component, VPCGW.

As shown in the figure above, we add a VPC at the front, which is equivalent to adding a virtual network. All microservices, whether reading or writing, go through a VIP, and the VIP routes to the CN nodes, which achieves connection transparency. In addition, VPCGW supports load balancing across multiple CN access nodes.

Here I would like to summarize the differences between dual-active and active/standby.

- Number of instances: dual-active runs two completely independent instances, while active/standby has only one.
- Write attribute: dual-active is bidirectional; active/standby is unidirectional.
- Application access: with dual-active, routing becomes very simple, and every application exchanges data within a single center.
- Heterogeneous data synchronization: with dual-active, data exchange can be done on both sides; with active/standby, only the primary side can exchange data.
- Operations system deployment: with dual-active, the operations system is localized; with active/standby, the system spans north and south, operations are more complex, and a switchover requires switching both the operations system and the instance, so the switching cost is very high.

So what are the advantages of dual-active compared with active/standby?

- Synchronization direction: dual-active synchronization is bidirectional; active/standby is unidirectional.
- Synchronization granularity: active/standby must synchronize at the instance level, while dual-active can synchronize at the table or database level.
- Local writability: both sides of a dual-active pair can be written; with active/standby, only one center can be written.
- Synchronization of local data to heterogeneous databases: dual-active supports it at both centers; active/standby does not.
- North-south switching cost: a dual-active database does not need to be switched at all, while active/standby must switch. If active/standby synchronization is configured as asynchronous, how do you reconcile the data after switching? That is genuinely difficult. Dual-active is also asynchronous, and right after a switch some data from the other end may not have arrived yet, but once the fault is repaired the differential data synchronizes automatically with no manual intervention, as long as conflicts are avoided in the business logic (see the sketch below). This gives the dual-active solution a very lightweight switching cost.
- Hardware utilization: with active/standby only one center is writable, so hardware utilization is low; with dual-active both centers are writable, so utilization is higher.
- Business experience: dual-active users update locally with low latency and get a consistent experience across regions; with active/standby, north-south access latency differs and the experience is worse.
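One common way to "avoid the conflict in logic", as mentioned in the list above, is to give each center a disjoint ID space. A hedged sketch, with illustrative names and DSNs:

```python
# Sketch: interleaved sequences so bidirectionally replicated inserts never
# collide. South starts at 1, north at 2; stepping by 2 keeps the ranges
# disjoint forever (odd IDs in the south, even IDs in the north).
import psycopg2

def create_order_seq(dsn: str, start: int) -> None:
    conn = psycopg2.connect(dsn)
    with conn, conn.cursor() as cur:
        cur.execute(
            "CREATE SEQUENCE IF NOT EXISTS order_id_seq "
            f"START {start} INCREMENT 2"
        )
    conn.close()

create_order_seq("host=south.example.internal dbname=appdb", start=1)
create_order_seq("host=north.example.internal dbname=appdb", start=2)
```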

So what are the challenges of dual-active?

- Performance: logical replication does not perform as well as physical replication, which is a challenge and requires multi-channel replication.
- Data conflicts: logical replication brings conflict problems, for example an order number generated in the south colliding with one generated in the north; these must be resolved at the business level.
- DDL synchronization: logical replication does not replicate DDL, so a dedicated tool is needed to synchronize DDL, which adds to the operational burden.

Part Ⅲ: Node-Level Failover

This chapter focuses on the switchover schemes for node-level failures.

1. GTM standby failure

If the GTM active/standby pair is configured for strong synchronization, the system selects a new standby GTM.

If no standby GTM is available, GTM data synchronization and business access are affected, so the synchronization mode must be downgraded from strong synchronization to asynchronous.

2. GTM master node failure

Obtaining transaction IDs is affected and the system becomes unavailable. In this case, the slave with the most up-to-date log must be selected and switched to become the GTM master.

After switching, the routing must be updated: the GTM routing information on every DN and CN needs to be modified.

GTM replication may need to be degraded: if there is no GTM slave left after the switch, service can only resume after the downgrade to asynchronous succeeds.

3. DN standby node failure

A DN standby failure breaks the read-only service. How do we switch when this fault occurs?

It is actually very simple: because we added a VIP in front, read-only access can continue as long as the VIP points to the CN master.

If the failed node has more than one standby, add another standby to the read-only plane, and read-only access can continue there.

If the master/standby pair was originally strongly synchronous and no standby remains available, master/standby synchronization must be downgraded to asynchronous, otherwise writes cannot proceed.

4. DN master node failure

The system performs an active/standby switchover, selecting the most up-to-date standby from the synchronous standbys as the new master (a sketch of this selection follows after these steps). The switchover can also be forced manually.

Modify the access routing information on all nodes to point to the new DN master.

If no standby node remains available after the DN master/standby switch, the original strong synchronization must be downgraded to asynchronous.

If there is a read-only plane, it is also repointed at the primary plane.
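Here is a hedged sketch of the "select the latest standby" step, using vanilla PostgreSQL functions (pg_last_wal_receive_lsn to compare received WAL positions, pg_promote to promote) as an analogy for what TBase's own switching tool automates. The DSNs are placeholders, and the sketch assumes each standby is streaming.

```python
# Sketch: pick the most caught-up standby by received LSN and promote it.
import psycopg2

STANDBYS = ["host=dn1-s1 dbname=postgres", "host=dn1-s2 dbname=postgres"]

def received_lsn(dsn: str) -> str:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("SELECT pg_last_wal_receive_lsn()")
        return cur.fetchone()[0]

def lsn_key(lsn: str) -> int:
    # LSNs look like '0/3000060'; compare them as one 64-bit integer.
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

best = max(STANDBYS, key=lambda dsn: lsn_key(received_lsn(dsn)))
with psycopg2.connect(best) as conn, conn.cursor() as cur:
    cur.execute("SELECT pg_promote()")  # PG 12+: promote this standby to primary
print("promoted:", best)
```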

5. CN standby node failure

If the CN is configured for strong synchronization and another CN standby exists, select a new standby CN and promote it to the strongly synchronous role.

If no other CN standby exists, the replication level is downgraded to asynchronous mode.

If no node in the read-only plane remains usable, the read-only VIP is switched to point at the CN master.

6. CN master node failure

A failed CN does not affect the service provided by the other CN masters. The VIP removes the failed CN master from the load-balancing table (a sketch of this health-check behavior follows after these steps).

The failed CN undergoes an active/standby switchover: a new master is selected from the available standbys.

On every node, modify the routing information table entries that pointed at the failed CN.

If no standby CN remains available after the switch, the node's synchronization level must be adjusted to asynchronous replication.

If no IP remains available in the CN slave list behind the standby-plane VIP, the system repoints the read-only-plane VIP at the CN master.
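A minimal sketch of the health-check behavior just described, in which unreachable CN masters drop out of the balancing list. In the real architecture VPCGW does this; the DSNs below are placeholders.

```python
# Sketch: probe each CN master and keep only the reachable ones in rotation.
import psycopg2

def healthy(dsn: str) -> bool:
    try:
        with psycopg2.connect(dsn, connect_timeout=2) as conn, conn.cursor() as cur:
            cur.execute("SELECT 1")
            return cur.fetchone() == (1,)
    except psycopg2.OperationalError:
        return False  # unreachable or refusing connections

cn_pool = ["host=cn1 dbname=appdb", "host=cn2 dbname=appdb", "host=cn3 dbname=appdb"]
cn_pool = [dsn for dsn in cn_pool if healthy(dsn)]  # failed CNs are removed
print("CNs in rotation:", cn_pool)
```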

Part Ⅳ: Center-Level Failover

Having covered failover for the three kinds of nodes, let's look at the center level.

Here is an example to make it concrete: suppose the external fiber is completely cut, or the entire instance becomes unusable. As the picture shows, the north-south network may be unreachable and the north cannot be used, only the south. What do we do? We only need to modify the access route of the microservices at the top layer, which may just be a DNS change. Because the data is nearly in sync, what can be read is still consistent, and the south can also serve reads and writes. For the data that has not yet synchronized, the differences synchronize automatically once the network or system recovers.

If instead you synchronize data with master/standby, the physical distance across regions rules out synchronous replication, and switching on failure is basically impractical because the cost is too high: first the management and control platform must be switched, then the database, and repairing the data afterwards is very difficult.

From the standpoint of center-level failover, in a multi-active deployment the database does not need to do anything during a cross-region switch; for the microservices, it is just a DNS switch. After the fault is repaired and data synchronization catches up, simply switch the DNS and the traffic back. Business availability and experience are good, and the switching cost of dual-active is low, which is what makes a true north-south switchover feasible.

Part Ⅴ: Q&A

Q: How should the number of active and standby nodes be chosen?

A: In a dual-active deployment, the DN nodes in the southern center must be one master with one or two standbys, and likewise in the northern center, because both sides can be written. When deployed, they are really two completely independent instances, and each needs high-availability protection locally.

Q: Is it necessary for CN to have a master and standby?

A: Multiple CNs exist to spread the load, and the master/standby pair also solves the continuity of distributed transactions. CN participates in some distributed transactions: for example, when we create tables or databases, CN is the coordinating node. If one transaction touches two pieces of data, one inserted into DN1 and the other updated on DN2, it is a distributed transaction, and CN has to manage it. Without an active/standby pair, if a loaded CN master becomes completely unavailable, that distributed transaction cannot continue. If your business involves distributed transactions, it is recommended that CN nodes have active and standby nodes. (A sketch of this two-phase coordination follows below.)
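To make the coordination concrete, here is a hedged sketch of two-phase commit across two DNs using PostgreSQL's PREPARE TRANSACTION / COMMIT PREPARED. The tables, DSNs, and transaction identifier are hypothetical, and in TBase the CN drives this internally rather than the application; this only illustrates the protocol the answer refers to.

```python
# Sketch: a coordinator committing one global transaction on two nodes.
import psycopg2

dn1 = psycopg2.connect("host=dn1 dbname=appdb")
dn2 = psycopg2.connect("host=dn2 dbname=appdb")
for conn in (dn1, dn2):
    conn.autocommit = True  # transaction boundaries are issued explicitly below

gid = "txn_000123"  # one global identifier shared by both branches
try:
    with dn1.cursor() as c1, dn2.cursor() as c2:
        c1.execute("BEGIN")
        c2.execute("BEGIN")
        c1.execute("INSERT INTO orders VALUES (1, 'new')")           # write on DN1
        c2.execute("UPDATE stock SET qty = qty - 1 WHERE id = 1")    # write on DN2
        # Phase 1: each node durably promises it can commit.
        c1.execute("PREPARE TRANSACTION %s", (gid,))
        c2.execute("PREPARE TRANSACTION %s", (gid,))
        # Phase 2: commit everywhere only after every branch is prepared.
        c1.execute("COMMIT PREPARED %s", (gid,))
        c2.execute("COMMIT PREPARED %s", (gid,))
except Exception:
    for conn in (dn1, dn2):
        try:
            with conn.cursor() as c:
                c.execute("ROLLBACK")  # end any still-open local transaction
                c.execute("ROLLBACK PREPARED %s", (gid,))
        except psycopg2.Error:
            pass  # this branch was never prepared
    raise
```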

Q: Does multi-active support strong synchronization, and what is the latency?

A: Multi-active has no strongly synchronous mode, because it is based on logical replication; only asynchronous synchronization is available. As for the latency, from Beijing to Guangzhou for example, the visible delay for data to travel across is about 200 milliseconds. But since it is asynchronous, it does not affect your business: front-end users still feel fast, and the slowness lives in the back-end data synchronization. Basically, if your network is reliable, performance is unaffected; moving a piece of data from south to north takes about 200 milliseconds, and that is determined by the physical network.

Q: As things stand, what is the biggest problem this kind of architecture solves?

A: Today many businesses live in a single center: hundreds of businesses may have more than a thousand instances in one center, all deployed in Guangzhou or all stacked in Beijing, regardless of where their users are. The cost of remote access is then relatively high and the latency relatively large. If you simply split things apart, with Beijing's data in Beijing and Guangdong's in Guangdong, you have split the business databases; but systems that used to interact within one center at very low latency now see larger delays when exchanging data across systems. Dual-active is exactly what solves this user-experience latency problem.

The VPCGW component is a separate Tencent Cloud product and is not included in TBase; it is available via Tencent Cloud's TCE platform.

Q: Is the operations and management system implemented with open-source tools?

A: The master/standby switching tools are self-developed, and our operations system is self-developed as well; we do not use third-party components, because at present external third-party tools still struggle to satisfy the whole complex operations management and control system. So our operations management and control system is self-developed.

Q: What is the difference between standalone and distributed?

A: A distributed deployment can scale out across multiple shards online, handle more concurrent requests, and store a larger volume of data to keep up with business growth. A standalone instance can only add more standbys; but when data volume and access pressure are small, a standalone runs more efficiently than a distributed deployment and is simpler to operate and maintain.

That concludes today's sharing and Q&A. Thank you for listening.

TBase is one of the three major products of the Tencent TEG Database Working Group. It is an enterprise-level distributed HTAP database management system developed on top of open-source PostgreSQL. Through a single database cluster, it provides customers with both strongly consistent distributed database services and high-performance data warehouse services, forming an integrated enterprise solution. If you run into related problems in the database field, feel free to leave a message.
