

Live Review | The database and table sharding problem that has puzzled us for years is finally solved

2025-02-24 Update | From: SLTechnology News & Howtos > Database


Shulou (Shulou.com) 06/01 report:

Tencent Cloud Database's domestic-database online technology salon is in full swing. Zheng Han's session concluded on March 17. If you did not have time to attend, don't worry: below is the live video and a text review.

Follow the official account "Tencent Cloud Database" and reply "0317 Zheng Han" to download the live-session video and the presentation slides.

Smooth Scale-Out Under 100-Million-Level Traffic: TDSQL Horizontal Expansion in Practice (Tencent Video)

Without further ado, let's get into today's sharing. Today's topic is "Smooth Scale-Out Under 100-Million-Level Traffic: TDSQL Horizontal Expansion in Practice".

Today's sharing covers four parts:

The first part introduces the background of horizontal expansion: why it is needed, how it compares with vertical expansion, and the problems we typically encounter when expanding horizontally.

The second part briefly shows how TDSQL expands horizontally, to give you an intuitive impression.

The third part explains in detail the design principles behind TDSQL's horizontal expansion, mapping back to the first part to see how TDSQL solves the general problems of horizontal scaling.

The fourth part presents cases from practice.

I. Background and Challenges of Horizontal Database Expansion

First, let's look at the background of horizontal expansion. The reason for capacity expansion is usually intuitive: business traffic or the required data volume grows, and the existing capacity or performance can no longer meet the business's needs. Either TPS and QPS are insufficient, or latency exceeds what the business can tolerate, or capacity itself runs out; the latter mainly refers to disk space or network bandwidth. When we run into these problems, we have to expand. There are two common ways to do so: vertical expansion (scale-up) and horizontal expansion (scale-out). Their characteristics differ, and their respective advantages and disadvantages are quite clear.

1.1 Vertical Expansion vs. Horizontal Expansion

First, vertical expansion. Vertical expansion mainly means upgrading the configuration of the machine or the instance. As you know, when you purchase a database instance on the cloud, resources are allocated on demand: a user whose business volume is still small may need only two CPU cores and a few gigabytes of memory. As the business grows, the instance needs to grow too, perhaps to 20 cores and 40 GB of memory.

At that point, on the cloud, we can dynamically adjust resources to meet the business's needs, that is, dynamically add CPU on the same machine. The limit of this approach is reached when the whole machine's CPU and memory have already been given to the instance and it is still not enough; then a better machine must be prepared. In MySQL this is done via a master/slave switch: pick a slave on the new machine, synchronize the data to it, and once synchronization completes, switch master and slave so the better machine takes over.

As you can see, the master/slave switch makes the whole process nearly transparent to the business: access goes through a front-end proxy or a VIP rather than a fixed IP, so the business is basically unaffected. The biggest disadvantage, however, is the dependence on single-machine resources. You can keep providing better machines up to a point, but as the business grows faster, you will eventually find that the best machine you can provide is still not enough, which means you can no longer expand. So the fundamental limitation of vertical expansion is that it is bounded by the resources of a single machine.

1.2 Horizontal Expansion

The other approach is horizontal expansion. Its biggest advantage is that it removes the limitation of vertical expansion: in theory, horizontal expansion can continue indefinitely, dynamically adapting to the business by adding machines.

Horizontal expansion solves vertical expansion's problem, but because it is more complex, it introduces problems of its own. Let's analyze the problems we may encounter; later we will see how TDSQL addresses them:

First, with vertical expansion the data remains a single whole: in a one-master, multi-slave architecture every node stores all of the data. Horizontal expansion, by contrast, splits the data, and the first problem is how to split it. If the split is done badly, hot data may concentrate on one node, and even though the data has been divided into many parts, that one node becomes the performance bottleneck.

Second, the expansion process involves data relocation and routing changes. Can the whole process be imperceptible to the business? Or how intrusive is it?

Third, the expansion process has many steps. If one of them fails, how do we roll back? And during the switch, how do we guarantee strong data consistency?

Fourth, after expansion the data is spread across nodes, so how do we guarantee post-expansion performance? Ideally, performance should improve linearly as machines are added. In practice, different architectures and approaches affect performance greatly, and sometimes you find that despite adding many machines, performance is far from scaling linearly.

Finally, once the data is split into multiple parts, how do we preserve the database's transactional properties? On a single node, MySQL guarantees local atomicity: everything in one transaction either succeeds or fails together. In a distributed architecture, that atomicity only holds within a single node. Globally, with data spread across nodes, how do we ensure that a write either succeeds on all nodes or is rolled back on all of them? This is where distributed transactions come in.

So the advantage of horizontal expansion is obvious: it removes the single-machine limit of vertical expansion. But it is more complex and introduces more problems. With these questions in mind, let's look at how TDSQL expands horizontally and how it solves them.

II. TDSQL Horizontal Expansion in Practice

2.1 TDSQL Architecture

First, a look at the TDSQL architecture. Put simply, TDSQL consists of several parts:

The first part is the SQL engine layer. It is the access point and shields the business from the storage details of the TDSQL back end; the business only ever talks to the SQL engine layer.

Next is the data storage layer, composed of multiple SETs. In a distributed database the data lives on many nodes, and each SET is one data unit. A SET can be one master with two replicas, or one master with more replicas, deployed according to the business's needs; scenarios demanding high data safety may use one master with three or four replicas. This is the data store.

Finally there is the Scheduler module, responsible for monitoring and controlling the whole cluster. During expansion or a master/standby switch, the Scheduler acts as the brain of the system. The business only sees the SQL engine layer; it does not need to know about the Scheduler, how the data is partitioned across nodes, or how many nodes there are. All of that is invisible to the business.

2.2 The TDSQL Horizontal Expansion Process

Let's walk through the expansion process. At the beginning all the data sits on one SET, that is, one node. Expansion spreads that data out; in this example there are 256 shards, and the cluster can grow to 256 machines. There are a few key points:

At the start, although the data is on one node, one machine, it has in fact already been split; in this example, into 256 shards.

Horizontal expansion simply migrates some of these shards to other SETs, that is, to other machines, so that adding machines adds system capacity and performance.

To sum up: the data is sharded from the very beginning, and expansion is just moving shards to new nodes. Throughout expansion the number of nodes grows, from 1 to 2 to 3, eventually up to 256, but the number of shards never changes. With 256 shards on one node, expanding to two nodes may leave 128 shards on each; at the limit of 256 nodes, each machine owns exactly one shard. So the whole expansion is, in essence, shard relocation. We'll get into the details later.
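The shard-to-node bookkeeping described above can be sketched as follows. This is an illustrative model, not TDSQL's actual code: the shard count is fixed at initialization (256 in the example), and expansion only changes which contiguous shard range each SET owns.

```python
TOTAL_SHARDS = 256  # fixed at cluster initialization; never changes afterward

def assign_ranges(num_sets: int, total_shards: int = TOTAL_SHARDS):
    """Split the fixed shard pool into near-equal contiguous (lo, hi) ranges,
    one range per SET, mirroring the 0-31 / 32-63 style routing in the talk."""
    base, extra = divmod(total_shards, num_sets)
    ranges, start = [], 0
    for i in range(num_sets):
        size = base + (1 if i < extra else 0)  # early SETs absorb any remainder
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

With one SET everything is `(0, 255)`; expanding to two SETs yields `(0, 127)` and `(128, 255)`; at 256 SETs each node owns exactly one shard, matching the limit described above.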

On both private and public clouds, TDSQL provides a unified front-end page for the whole expansion process, which makes it very convenient to use.

Consider this example. The cluster currently has two SETs, that is, two nodes, each responsible for part of the routing: the first node covers shard range 0-31, the other covers 32-63. To expand, the front-end page offers an "Add SET" button. Clicking it pops up a dialog that pre-fills the previous configuration by default; users can customize it, including the resources the new SET needs and the allocation of memory, disk, and so on.

In addition, because expansion requires a routing switch, we can choose the timing: either switch automatically at a selected time, or let the business judge the actual situation and trigger the routing switch manually. All of this can be configured to fit the business.

After the new SET is created, the brain module I mentioned allocates the various resources and runs the whole initialization and data-synchronization logic. At the end you will see that the cluster that used to have two nodes now has three. Before expansion the first node was responsible for shards 0-31; now it covers only 0-15, and the new node has taken over the other part of the routing. Throughout the process, you can go from two nodes to three with one click on the web page, and you can keep adding SETs and expanding with one click as the business requires.

III. The Design Principles Behind TDSQL Horizontal Expansion

So far we have introduced the core architecture of TDSQL and what horizontal expansion looks like from the front end, to establish an intuitive impression.

In this third chapter I will detail the design principles behind TDSQL's horizontal expansion, looking back at the problems raised in the first chapter and how we solve them. These problems must be solved by any system that scales horizontally.

3.1 Design Principle: Choosing Partition Keys to Balance Compatibility and Performance

First, the initial problem of horizontal expansion: how to split the data. Data splitting is the first step, and it affects the entire subsequent usage of the system. In TDSQL, the split logic is expressed in the table-creation syntax: the business specifies a shardkey "equal to some field", choosing one field of the table as the partition key when designing the table structure. TDSQL then splits the data by that partition key and, on access, aggregates the data by it. We want the business to participate in table design by nominating a shardkey, because this balances compatibility and performance well.

We could instead let users create tables without specifying a shardkey and pick a key at the bottom layer to split by, but that would hurt later usage, for example by failing to exploit the performance of the distributed database. We believe that a small amount of participation from the business in table design buys a very large performance advantage, balancing compatibility and performance. Moreover, when the business chooses the partition key, it can look across several related tables and pick a common column as the shardkey. This guarantees that when the data is split, related data lands on the same node, avoiding cross-node data interaction in many distributed operations.

Sharded (partitioned) tables, which spread data across nodes, are the most commonly used. We actually provide two other table types as well, three in total, and the main motivation is again performance. For example, a global table is replicated in full to every node; on its own it has no distributed or horizontally-split character, but we typically use it for configuration tables with little data and few changes, where the redundancy guarantees that later accesses avoid cross-node data interaction. As for the shardkey itself, we hash it, for example by user, so that the data spreads evenly across nodes and no hot spots arise.
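As a rough sketch of the hash-based routing idea (the hash function and layout here are illustrative assumptions, not TDSQL internals): a row's shardkey value is hashed onto the fixed shard space, and the routing table then maps that shard to the SET that owns it.

```python
import zlib

TOTAL_SHARDS = 256

def shard_of(shardkey_value) -> int:
    """Stable hash of the partition-key value onto the fixed shard space."""
    return zlib.crc32(str(shardkey_value).encode()) % TOTAL_SHARDS

def set_of(shard: int, ranges) -> int:
    """Look up which SET owns a shard; ranges is a list of (lo, hi) per SET."""
    for i, (lo, hi) in enumerate(ranges):
        if lo <= shard <= hi:
            return i
    raise ValueError(f"shard {shard} missing from routing table")
```

Because the hash is stable, rows with the same shardkey always land on the same shard, and related tables sharing a shardkey column co-locate their related rows on one SET.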

3.2 Design Principle: High Availability and High Reliability During Expansion

As mentioned, the expansion process is fairly complex. Can it guarantee high availability and high reliability throughout? How much does the business perceive? How does TDSQL do it?

Data synchronization

The first stage is data synchronization. Suppose we have two SETs and discover that one SET's disk usage has become dangerous, say above 80%. To expand it, we first provision a new instance, copy a snapshot image over, create the instance, and establish a replication relationship. Establishing this synchronization is invisible to the business, and from then on replication runs in real time.

Data check

In the second stage, the new instance keeps catching up while we continuously verify the data. This stage may last a while, until the replication delay becomes vanishingly small. For example, with a threshold of 5 seconds, once we observe the lag has dropped below 5 seconds we enter the third stage: the routing update.

Routing updates

In the routing update stage, we first freeze write requests. Any write arriving now is rejected and the business is asked to retry in two seconds, so there is a seconds-level impact on the business, but the window is very short. With writes frozen, the third instance quickly catches up on the remaining data and verification shows no discrepancies. We then modify the route and perform the related atomic operations that block the affected partitions at the storage layer. This guarantees that even if some SQL access layer is late in picking up the new route, it cannot write data to the wrong place: the bottom layer has already blocked those partitions. That is how data consistency is preserved. Once the route is updated, the third SET can accept user requests, and at this point both the first SET and the third SET hold the full data, because replication between them is still established.

Delete redundant data

The last step is deleting the redundant data. Deletion is deliberately delayed and throttled, so it proceeds slowly and does not cause large IO fluctuations that would affect production traffic. During deletion the partitions remain blocked, and the SQL engine layer also rewrites SQL, so that even a full table scan issued while redundant data still exists at the bottom cannot return extra rows.

Looking at the whole process: data synchronization, verification, and redundant-data deletion take comparatively long, because building replication over a large data set means copying the image and catching up on binlogs. But these stages have no impact on the business at all; the business simply does not notice that a new replication relationship has been added. If something goes wrong while establishing replication or creating the new standby, we can swap in another standby machine or retry, still with no business impact. Only the routing update stage unavoidably affects write requests for a few seconds, and we keep that window very short: writes are frozen only once the replication lag is within the 5-second threshold, and only then is the routing update initiated. At the same time, partition blocking at the storage layer guarantees that even if different modules pick up the update at different times, no data can be corrupted. This is, in principle, how TDSQL keeps expansion highly available and highly reliable, with minimal impact on the business.
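The stage ordering and the narrow write-freeze window can be modeled as a tiny state machine. This is a didactic sketch with assumed names, not the Scheduler's real interface; its point is that writes are rejected only during the routing switch, and any earlier failure can be abandoned without business impact.

```python
STAGES = ["create_instance", "sync", "verify", "route_switch", "delete_redundant"]

class Expansion:
    """Walks the five expansion stages in order, freezing writes only
    for the route_switch stage."""
    def __init__(self):
        self.stage = None
        self.writes_frozen = False

    def advance(self):
        nxt = STAGES[0] if self.stage is None else STAGES[STAGES.index(self.stage) + 1]
        if self.stage == "route_switch":
            self.writes_frozen = False      # leaving the switch unfreezes writes
        if nxt == "route_switch":
            self.writes_frozen = True       # writes rejected only in this stage
        self.stage = nxt
        return nxt

    @property
    def safe_to_abort(self):
        # before the routing switch, failure only discards the new replica
        return self.stage in (None, "create_instance", "sync", "verify")
```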

3.3 Design Principle: Distributed Transactions

That covers the expansion process itself and how TDSQL handles it. Next, let's look at how to solve the problems horizontal expansion leaves behind once it is complete. The first is distributed transactions.

Atomicity, Decentralization, Linear Performance Growth

After expansion the data spans nodes. Where the system previously had a single node, it must now guarantee atomicity across nodes, which we do with distributed transactions based on two-phase commit. The processing logic completely shields the business from the underlying complexity: using the distributed database feels the same as using standalone MySQL. If a business SQL statement touches only one node, an ordinary transaction suffices; if a statement or transaction touches multiple nodes, we use two-phase commit, with the atomicity of the whole distributed transaction ultimately guaranteed by logging. The implementation is fully decentralized: any SQL engine can act as the transaction coordinator (TM), so performance grows linearly. We have also built extensive exception-verification machinery, including robust exception handling and a global error-retry mechanism, and the implementation has passed standard TPC-C validation.
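A minimal sketch of the two-phase commit idea, using toy in-memory participants with assumed names (a real implementation persists redo logs and handles coordinator failure): all participants must vote yes in the prepare phase before anyone commits; otherwise everyone rolls back.

```python
class Participant:
    """Toy data node: stages rows in prepare, applies them on commit."""
    def __init__(self):
        self.data, self.staged = {}, None

    def prepare(self, rows) -> bool:
        self.staged = dict(rows)   # in reality: write redo log, acquire locks
        return True                # vote yes

    def commit(self):
        self.data.update(self.staged)
        self.staged = None

    def rollback(self):
        self.staged = None

def two_phase_commit(work):
    """work: list of (participant, rows). Returns True only if all committed."""
    prepared = []
    for node, rows in work:
        if not node.prepare(rows):         # phase 1: any 'no' vote aborts all
            for p in prepared:
                p.rollback()
            return False
        prepared.append(node)
    for node in prepared:                  # phase 2: everyone commits
        node.commit()
    return True
```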

3.4 Design Principle: Achieving Linear Performance Growth After Expansion

Splitting data across nodes raises two main problems: transaction atomicity, which distributed transactions solve, and performance.

With vertical expansion, performance gains generally come from a better CPU or similar upgrades. With horizontal expansion, the data is split over many nodes, and truly exploiting them, computing in parallel and realizing the advantages of a horizontally distributed database, requires a great deal of work and optimization. TDSQL has made several such optimizations.

First, keep related data on the same node. When designing the table structure, we want the business to participate and designate related columns as the shardkey, so that related data in the back end lands on one node and joins over that data need not cross nodes.

Second, we improve performance through parallel computing and streaming aggregation: the SQL engine splits a query across the back-end nodes, each node computes its part in parallel, and the engine performs a second-level aggregation before returning the result to the user. To reduce the amount of data pulled from the back end into the SQL engine, we push conditions down into the DB wherever possible. We also use data redundancy to minimize cross-node data interaction.

Let's look at a simple example of how, after horizontal expansion, the business remains essentially unaware and uses TDSQL just like standalone MySQL. Suppose the business inserts seven rows, without caring whether the table lives on one node or many. The system parses the SQL and rewrites it automatically: computing the partition key for each of the seven rows, it finds that four belong on the first node and three on the second, then rewrites the statement into per-node inserts. From the user's point of view it is one insert, but because it crosses nodes, two-phase commit is used across the resulting statements, guaranteeing that if anything fails, both sides roll back together.
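The rewrite step can be pictured like this (illustrative only; the hash and routing-table shapes are assumptions, not TDSQL's actual mechanism): the rows of one INSERT are grouped by owning SET before being re-issued as per-node statements.

```python
import zlib

TOTAL_SHARDS = 256

def route_rows(rows, key_index, ranges):
    """Group the rows of one INSERT by the SET that owns each row's shard.

    rows: list of tuples; key_index: position of the partition-key column;
    ranges: per-SET (lo, hi) shard ranges from the routing table.
    """
    def owning_set(shard):
        return next(i for i, (lo, hi) in enumerate(ranges) if lo <= shard <= hi)

    per_set = {}
    for row in rows:
        shard = zlib.crc32(str(row[key_index]).encode()) % TOTAL_SHARDS
        per_set.setdefault(owning_set(shard), []).append(row)
    return per_set
```

If the resulting groups span more than one SET, the engine wraps the per-node inserts in a two-phase commit so that a failure on either side rolls back both.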

After the data is inserted, suppose the user runs a query, say with an aggregate function. The user does not know the data is split; to them it is one complete table. The SQL is again rewritten: the system sends it to both nodes, converting the aggregate appropriately, for example adding the pieces needed to reassemble an average. Each node first aggregates its own data, and the SQL engine then aggregates the partial results once more. The benefit of this extra step is that the engine can aggregate as results stream in, so it need not cache large amounts of data or consume too much memory at once.
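As a concrete illustration of why the rewrite is needed (function names are assumptions for the sketch): an AVG cannot be computed by averaging per-node averages; each node must return a partial (sum, count) that the engine folds together in streaming fashion.

```python
def partial_avg(values):
    """What each data node returns for AVG(col): a (sum, count) pair."""
    return (sum(values), len(values))

def merge_avg(partials):
    """Streaming merge at the SQL engine: keep only a running (sum, count),
    never the full row set, then finish with one division."""
    total, count = 0, 0
    for s, c in partials:
        total += s
        count += c
    return total / count if count else None
```

Note that naively averaging the two node averages (2.0 and 4.5 for values [1, 2, 3] and [4, 5]) would give 3.25 instead of the correct 3.0.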

For more complex SQL, such as multi-table queries or deeper subqueries, you can follow our later session on SQL engine architecture and query practice if you are interested.

In this third chapter we covered the principles of TDSQL horizontal expansion in detail: how the data is split, how the expansion itself works, how problems during expansion are solved, and how TDSQL addresses the problems horizontal expansion introduces.

IV. Practical Cases of Horizontal Expansion

In the fourth chapter, we briefly introduce some practices and cases.

4.1 Practice: How to Choose Partition Keys

As noted, we hope the business considers the choice of partition key while designing the table structure. How should it be chosen? Here is a brief guide by application type.

For a user-facing Internet application, a user field such as the user ID works well as the partition key. This ensures that with a large user base, the data splits across the back-end nodes by user ID.

For games, the logical subject of the business is the player, so a player field is suitable; for e-commerce, a buyer or seller field; for the Internet of Things, something like the device ID. In general, choosing a partition key is about splitting the data well enough to avoid skew: the chosen field should distribute data evenly across nodes. On the access side, it also matters that most SQL requests carry a condition on the partition key, because only then do we get the full advantage of distribution. With a partition-key condition, the SQL can be routed straight to one node; without one, it must be broadcast to all back-end nodes.

You can see the consequence: after expanding all the way out, from one node to 256 nodes, a poorly written SQL statement may need to aggregate data from all 256 nodes, and its performance will suffer.

To sum up, we want the business involved in table creation and table-structure design as much as possible. Aggregate functions and the various transaction operations are already essentially transparent to the business, and this small amount of participation buys a very large performance improvement.

4.2 Practice: When to Expand?

When should we expand? TDSQL collects a great deal of monitoring data: each module monitors the running state of the whole system, and the machines report various logs. Based on this information we can decide when to expand.

Simply put: take disks, for example. If data-disk utilization is too high, it is time to expand. The same goes for SQL request volume, or CPU utilization approaching 100%; currently we treat 80% utilization as the point to expand. There is also another situation: request volume is still low and resources are comfortable, but the business tells you in advance that an upcoming event will multiply the number of requests. In that case we can complete the expansion ahead of schedule.
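The triggers just listed can be condensed into a simple check. The 80% thresholds come from the talk; the function shape and the traffic-factor parameter are illustrative assumptions.

```python
DISK_THRESHOLD = 0.80   # expand when data-disk utilization reaches 80%
CPU_THRESHOLD = 0.80    # likewise for CPU utilization

def should_expand(disk_usage: float, cpu_usage: float,
                  expected_traffic_factor: float = 1.0) -> bool:
    """Expand if a resource is already hot, or if an announced business event
    (expected_traffic_factor > 1, e.g. 2.0 for doubled requests) would make it hot."""
    if disk_usage >= DISK_THRESHOLD or cpu_usage >= CPU_THRESHOLD:
        return True
    # pre-emptive expansion: projected CPU after the event breaches the limit
    return cpu_usage * expected_traffic_factor >= CPU_THRESHOLD
```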

Let's look at a few cluster cases on the cloud. This cluster has four SETs, each responsible for part of the shardkey range; the routing space here is 0-127, which means the cluster can eventually expand to 128 nodes, a 128-fold expansion. That "128" is decided by the business's initial estimate. If the shard pool is made too big, the cluster could in principle expand to thousands of nodes, but the data would be overly scattered, and given how powerful each cloud or physical machine is today, you rarely need thousands.

Here is another cluster with a few more nodes, eight, each responsible for part of the routing. Its routing space is only 64, so it can eventually expand to 64 nodes. These are real examples from the cloud.

Today, my sharing is mainly about these contents. If you have any questions, please feel free to comment.

V. Q&A

Q: Before expansion, are the tables in a SET all partitioned tables?

A: Yes, before expansion it is equivalent to that. Simply put, even with only one node, we tell the cluster there are 256 shards; that value is set at initialization and never changes afterward. Suppose the cluster was initialized with 256 because the data volume was expected to be very large. All the data sits on one node, but when a user creates a table with our syntax, it is actually split into 256 parts at the bottom, even before any expansion. Create another table, and it is likewise split into 256. The user may see two tables, but each has 256 partitions underneath, and expansion is just moving some of those partitions elsewhere.

Q: When the backup files of each node are restored, how is consistency between them ensured?

A: The nodes are independent of each other. Each node is responsible for a portion of the routing range and stores only that portion of the data, so their backups are unrelated to one another. Within a node we may run one master with two replicas, with a strong-synchronization mechanism guaranteeing strong data consistency during replication. You can refer to an earlier session, which describes in more detail how TDSQL's one-master, multi-replica architecture guarantees strong data consistency within a node.

Q: Can a single point of failure be avoided in two-phase commit coordination?

A: In two-phase commit, a transaction is coordinated by the SQL engine handling that connection; transactions arriving on other connections are coordinated by other SQL engines. Every SQL engine is stateless and scales horizontally, so there is no coordination bottleneck: we can add engines as needed, performance grows linearly, and there is no centralization. The transaction logs are distributed as well, recorded on the back-end data nodes of TDSQL, each running one master with multiple replicas under strong internal consistency, so there is no single point of failure.

TDSQL is one of the three major product lines of Tencent TEG's database working group. It is a financial-grade distributed database developed by Tencent, widely used in finance, government, IoT, smart retail, and other industries, with a wealth of distributed-database best practices.
