How to Design a Cloud-Native Database

This article walks through how cloud-native databases are designed. The material is detailed but approachable, and should serve as a useful reference. Let's take a look.

Common schools of distributed databases

I classify the development of distributed databases by generation, and so far I count four. The first generation achieved Data Sharding and horizontal scaling through simple manual sharding of databases and tables, or through middleware. The second generation is the NoSQL databases represented by Cassandra, HBase, and MongoDB, generally adopted by Internet companies and offering good horizontal scalability.

I personally regard the third generation as the new wave of cloud databases represented by Google Spanner and AWS Aurora. Their hallmark is fusing SQL's expressiveness with NoSQL's scalability: they expose a SQL interface to the business layer while scaling horizontally underneath.

The fourth generation, taking the current design of TiDB as an example, enters the era of mixed workloads: a single system handles high-concurrency transactions while also taking on some of the work of a data warehouse or analytical database. Such a system is called HTAP, an integrated database product.

As for what the future looks like, I will share some outlook later on. Viewed along the whole timeline, from the 1970s to the present, the database is an old industry, and I won't expand on every stage of its development here.

Database middleware

The first-generation middleware systems basically come in two mainstream modes. The first is manual sharding of databases and tables in the business layer: the application itself decides where data goes, for example putting data from Beijing into one database and data from Shanghai into another database, or into different tables. This is the simplest form of manual sharding at the business layer, and I believe anyone who has operated a database is very familiar with it.

The second mode is to declare the Sharding rules through database middleware, for example using the user's city, the user's ID, or a timestamp as the sharding rule; the middleware then routes data automatically, and the business layer does not have to.
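To make the middleware mode concrete, here is a minimal sketch of rule-based routing. The shard DSNs and the hash rule are hypothetical; real middleware exposes this as configuration rather than application code.

```python
# A minimal sketch of middleware-style sharding: a router maps a shard key
# to one physical database. The DSNs and the hash rule are hypothetical.
import hashlib

SHARDS = [
    "mysql://db0.internal/app",  # e.g. data from Beijing
    "mysql://db1.internal/app",  # e.g. data from Shanghai
    "mysql://db2.internal/app",
    "mysql://db3.internal/app",
]

def shard_for(user_id: str) -> str:
    """Map a user ID to a shard with a stable hash."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The middleware (or the business layer itself, in the manual mode) sends
# each statement only to the shard that owns the row.
print(shard_for("user-42"))
```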

The advantage of this approach is simplicity. If the business is simple, most reads and writes degenerate to a single shard, and once the application layer is fully adapted, latency stays relatively low. Overall, if the workload is spread evenly across shards, the business's TPS also scales linearly.

But the disadvantages are just as obvious. Complex operations become troublesome, especially cross-shard queries or writes that must keep data strongly consistent across shards. Another clear drawback is the operational burden of large clusters, above all schema changes: imagine 100 shards, where adding or dropping a single column means performing the operation on all 100 machines.
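The schema-change pain is easy to picture in code. This is a sketch, not any real tool: it assumes 100 hypothetical shard hosts and uses pymysql, one common MySQL driver, to apply the same DDL everywhere.

```python
# Why schema changes hurt with sharded tables: the same ALTER must run on
# every physical database, and partial failure is the operator's problem.
import pymysql

SHARD_HOSTS = [f"db{i}.internal" for i in range(100)]  # hypothetical hosts
DDL = "ALTER TABLE orders ADD COLUMN note VARCHAR(255)"

failed = []
for host in SHARD_HOSTS:
    try:
        conn = pymysql.connect(host=host, user="admin", password="...", database="app")
        with conn.cursor() as cur:
            cur.execute(DDL)       # DDL commits implicitly in MySQL
        conn.close()
    except Exception as exc:       # any failure leaves the schemas divergent
        failed.append((host, exc))

print(f"{len(failed)} shards still need the change")
```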

NoSQL: Not Only SQL

Around 2010, many Internet companies recognized this pain point. Examining their businesses carefully, they found the workloads were simple and did not need SQL's most advanced features, so a school called NoSQL emerged. NoSQL's defining trade is to give up advanced SQL capabilities, since every gain has its price, in exchange for horizontal scalability that is transparent and strong from the business's point of view. Conversely, if your business was originally built on SQL, moving to systems such as MongoDB, Cassandra, or HBase can carry a substantial migration cost.

The most famous of these is MongoDB. Although distributed, it still resembles the sharded database-and-table scheme in that you must choose a shard key. The advantages are familiar: no fixed table structure, write whatever you want, very friendly to document data. The disadvantages are just as clear: once a Sharding Key is chosen, data is partitioned by that fixed rule, so cross-shard aggregations become troublesome, and cross-shard ACID transactions are not well supported.
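For illustration, here is roughly how the shard-key choice looks with pymongo, against a hypothetical mongos router and collection; the commands used are standard MongoDB admin commands.

```python
# Once the shard key is fixed, queries that filter on it hit one shard;
# anything else fans out to every shard.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos.internal:27017")  # hypothetical router
client.admin.command("enableSharding", "app")
client.admin.command("shardCollection", "app.events", key={"user_id": "hashed"})

events = client["app"]["events"]
events.find_one({"user_id": 42})             # routed to a single shard
events.count_documents({"city": "Beijing"})  # scatter-gather across shards
```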

HBase is the well-known distributed NoSQL database of the Hadoop ecosystem, built on top of HDFS. Cassandra is a distributed KV database whose signature feature is offering multiple consistency models for KV operations. Both carry the usual NoSQL shortcomings, including operational complexity and the cost of reworking existing business logic against a KV interface.
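Cassandra's per-request consistency models can be shown in a few lines with the DataStax Python driver; the contact point and table here are hypothetical.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["cassandra.internal"]).connect("app")

# The same read can trade consistency against latency, per statement.
fast_read = SimpleStatement(
    "SELECT * FROM users WHERE id = 42",
    consistency_level=ConsistencyLevel.ONE,     # lowest latency
)
safe_read = SimpleStatement(
    "SELECT * FROM users WHERE id = 42",
    consistency_level=ConsistencyLevel.QUORUM,  # stronger, paired with QUORUM writes
)
session.execute(fast_read)
session.execute(safe_read)
```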

The third generation: distributed NewSQL databases

As mentioned earlier, both Sharding, whether hand-rolled or via middleware, and NoSQL are invasive to the business. If your business depends heavily on SQL, neither solution is comfortable to adopt. So some technically advanced companies began to wonder: could the strengths of traditional databases, such as SQL expressiveness and transactional consistency, be combined with the best property of the NoSQL era, scalability, into a new system that scales out yet is as convenient as a single-node database? Under this line of thinking two schools were born, one represented by Spanner and the other by Aurora, each the answer of a top Internet company facing this problem.

Shared Nothing school

The Shared Nothing school is represented by Google Spanner. Its first advantage is nearly unlimited horizontal scaling with no single bottleneck in the system: at 1 TB, 10 TB, or 100 TB, the business layer basically never worries about scalability. The second is that it is designed to provide strong SQL support, with no sharding rules or sharding strategy to specify; the system expands automatically. The third is strongly consistent transactions, just like a single-node database, suitable for financial-grade business.

Representative products are Spanner and TiDB. Such systems also have shortcomings: in essence, a purely distributed database cannot behave exactly like a single machine in every way. Take latency: on a single machine a transaction may complete locally, but in a distributed database the rows the transaction touches may sit on different machines, involving several rounds of network communication, so the response time certainly cannot match a purely local operation. There are thus some compatibility and behavioral differences from a single-node database. Even so, for many businesses a distributed database still holds many advantages over sharded tables, above all being far less invasive and much easier to use.
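The usage model is the point: the application writes plain transactional SQL, and the database shards underneath. A sketch against TiDB's MySQL-compatible port, with a hypothetical host and schema:

```python
import pymysql

conn = pymysql.connect(host="tidb.internal", port=4000, user="app", database="bank")
conn.begin()
try:
    with conn.cursor() as cur:
        # These two rows may live on different storage nodes; the database,
        # not the application, makes the transfer atomic.
        cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
        cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
    conn.commit()
except Exception:
    conn.rollback()
    raise
```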

Shared Everything school

The second school is Shared Everything, represented by AWS Aurora and Aliyun's PolarDB. Many such databases call themselves Cloud-Native Database, but I think the "Cloud-Native" label mostly reflects that these offerings come from public cloud providers; there is no unified standard for whether the technology itself is cloud native. From a purely technical point of view, the core point is that compute and storage are completely separated: compute nodes and storage nodes run on different machines, and storage feels roughly like running MySQL on a cloud disk. Personally, I don't consider an architecture like Aurora or PolarDB to be purely distributed.

MySQL's original master-slave replication ships the Binlog. Aurora, as the representative Shared Everything database on the cloud, was designed to replicate only the redo log instead: rather than driving the full IO path to produce a Binlog, ship it to another machine, and replay it there, Aurora streams physical redo log records, so its IO path shrinks dramatically. That was a great innovation.

Replicating a smaller log unit means sending only the physical log, not the Binlog, let alone raw statements. Shipping physical logs directly means a shorter IO path and smaller network packets, so the throughput of the whole database system is far better than a traditional MySQL deployment.

Aurora is essentially 100% MySQL-compatible, so existing businesses can move over without modification. In Internet scenarios where strict read consistency is not required, reads can scale horizontally across replicas, although for both Aurora and PolarDB there is an upper limit on read performance.
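In practice that read scaling is consumed by splitting traffic between endpoints, as in this sketch; the cluster endpoints are hypothetical, and the pattern applies to Aurora-style systems generally.

```python
import pymysql

writer = pymysql.connect(host="cluster-writer.internal", user="app", database="shop")
reader = pymysql.connect(host="cluster-reader.internal", user="app", database="shop")

with writer.cursor() as cur:
    cur.execute("INSERT INTO orders (user_id, total) VALUES (42, 99)")
writer.commit()

with reader.cursor() as cur:   # may briefly miss the row above (replica lag)
    cur.execute("SELECT COUNT(*) FROM orders WHERE user_id = 42")
    print(cur.fetchone())
```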

Here Aurora's shortcoming shows: in essence it is still a single-node database, because all the data is stored as one unit, and Aurora's compute layer is really a MySQL instance that knows nothing about how the data is distributed below it. Under a heavy write load, or with very large queries over a lot of data, you still end up splitting databases and tables. So Aurora is best understood as a better single-node database on the cloud.

The fourth generation system: distributed HTAP database

The fourth generation is the new HTAP form of database: Hybrid Transactional and Analytical Processing. The name says it: one system runs both transactions and real-time analytics. An HTAP database scales out like NoSQL and supports SQL queries and transactions like NewSQL; more importantly, under mixed workloads the OLAP side does not disturb the OLTP side, and keeping both in one system saves the trouble of shuttling data back and forth. As far as I can see, in industry only TiDB 4.0 plus TiFlash currently meets all of these requirements.

Distributed HTAP database: TiDB (with TiFlash)

Why can TiDB isolate OLAP from OLTP so completely that they do not affect each other? Because TiDB separates compute from storage, and the underlying storage keeps multiple replicas, some of which can be converted into columnar replicas. OLAP requests are routed directly to those designated replicas, the TiFlash replicas, which serve high-performance analytics, so the same data supports real-time transactions and real-time analysis at once. This is a major innovation and breakthrough at the level of TiDB's architecture.
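In TiDB's SQL this split is visible directly. The statements below are TiDB 4.0+ syntax as I understand it, while the host and table are hypothetical.

```python
import pymysql

conn = pymysql.connect(host="tidb.internal", port=4000, user="app", database="shop")
with conn.cursor() as cur:
    # Ask the cluster to maintain one columnar (TiFlash) copy of the table.
    cur.execute("ALTER TABLE orders SET TIFLASH REPLICA 1")
    # Pin this session's reads to the columnar replica, so the scan below
    # cannot contend with OLTP traffic on the row-store replicas.
    cur.execute("SET SESSION tidb_isolation_read_engines = 'tiflash'")
    cur.execute("SELECT user_id, SUM(total) FROM orders GROUP BY user_id")
```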

The figure below shows TiDB's test results against MemSQL on a workload constructed from user scenarios. The horizontal axis is concurrency, the vertical axis is OLTP performance, and blue, yellow, and green mark different levels of OLAP concurrency. The experiment runs OLTP and OLAP on the same system while steadily raising the concurrent pressure of both, to see whether the two kinds of workload interfere. On the TiDB side, as OLTP and OLAP pressure rises together, the performance of both workloads stays essentially flat. Running the same experiment on MemSQL, performance drops sharply: as OLAP concurrency grows, OLTP performance visibly degrades.

Next is an example of TiDB in a user's real business scenario: while OLAP queries run, the OLTP side still writes smoothly, and latency stays low throughout.

Where does the future lie? Snowflake

Snowflake is a data warehouse built 100% on the cloud, with its underlying storage on S3; nearly every public cloud offers an S3-like object storage service. Snowflake, too, is a pure compute/storage-separation architecture. Its compute unit is called a Virtual Warehouse, which you can think of as a set of EC2 instances with local disks serving as cache and log storage; the compute nodes run on public-cloud virtual machines, while Snowflake's primary data lives on S3.

Snowflake's data format on S3 has these characteristics: each S3 object is a file of roughly 10 MB, append-only, carrying its own metadata and laid out in columnar form.

One of Snowflake's most important highlights is that different computing resources can be assigned to the same piece of data. One query may need only two machines while another needs far more, and it doesn't matter, because the data all sits on S3. Put simply, two machines can mount the same disk to handle different workloads. This is a key example of decoupling compute from storage.
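With Snowflake's official Python connector, that decoupling is one statement away; the account, warehouse names, and table below are hypothetical.

```python
import snowflake.connector

conn = snowflake.connector.connect(account="acme-prod", user="analyst", password="...")
cur = conn.cursor()

cur.execute("CREATE WAREHOUSE IF NOT EXISTS tiny_wh WAREHOUSE_SIZE = 'XSMALL'")
cur.execute("CREATE WAREHOUSE IF NOT EXISTS big_wh WAREHOUSE_SIZE = 'XLARGE'")

cur.execute("USE WAREHOUSE tiny_wh")   # minimal compute
cur.execute("SELECT COUNT(*) FROM sales.public.orders")

cur.execute("USE WAREHOUSE big_wh")    # much more compute, same S3-resident data
cur.execute("SELECT region, SUM(total) FROM sales.public.orders GROUP BY region")
```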

Google BigQuery

The second system is BigQuery, the big data analytics service on Google Cloud, whose architecture is somewhat similar to Snowflake's. BigQuery's data is stored on Colossus, Google's internal distributed file system, while Jupiter, Google's internal high-performance network, connects the storage to the compute nodes.

BigQuery's processing performance is excellent: the bisection bandwidth inside the data center can reach 1 PB per second. Two thousand dedicated compute units cost about US$40,000 a month, but BigQuery is a pay-per-use model: a query that uses only two slots is billed for those two slots. Storage is also cheap, roughly US$20 per TB per month.

RockSet

The third system is RockSet. RocksDB, as we all know, is a well-known single-node KV database whose storage engine is an LSM-Tree; the core idea of the LSM-Tree is hierarchical layout, with colder data sitting in the lower levels. RockSet places those lower levels on S3, while the upper levels actually use local disk or even local memory as the engine. The structure is naturally tiered, and the application cannot tell whether data is on a cloud disk or a local disk: with a good local cache, you are unaware of the cloud storage underneath.
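A toy version of that tiering, with boto3 and a dict standing in for the local SSD cache; the bucket and key layout are hypothetical, not RockSet's actual format.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "db-cold-tier"
cache: dict[str, bytes] = {}  # stand-in for the local disk / memory tiers

def get_block(key: str) -> bytes:
    """The caller never learns whether the block was local or on S3."""
    if key not in cache:                      # cache miss: pay S3 latency once
        obj = s3.get_object(Bucket=BUCKET, Key=key)
        cache[key] = obj["Body"].read()
    return cache[key]

block = get_block("sst/level6/000042.sst")
```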

Looking back at these three systems, I see several shared traits: first, all are natively distributed; second, all are built on standard cloud services, especially S3 and EBS; third, all are pay-as-you-go, making full architectural use of the cloud's elasticity. Of the three, I consider storage the most important: the storage system determines how a database on the cloud should be designed.

Why is S3 the key?

Within storage, I think S3 may be the more critical piece. We have also studied EBS, and the first phase of TiDB's cloud work already integrates with EBS block storage, but over the longer term I find the S3 direction more interesting.

First, S3 is extremely cost-effective, much cheaper than EBS. Second, it offers very high durability (eleven 9s, by AWS's claim). Third, its throughput scales linearly. Fourth, it is naturally cross-cloud: every cloud provides an object storage service with an S3-compatible API. S3's problem is that random-write latency is very high, even though throughput is excellent, so the design task is to exploit that throughput while sidestepping the latency. In S3 benchmarks you can see throughput keep climbing as the instance type improves.
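One way to lean on that throughput is many concurrent ranged GETs over one object, as in this sketch; the bucket, key, and chunk size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "db-cold-tier", "warehouse/part-0001.parquet"
CHUNK = 8 * 1024 * 1024  # 8 MiB ranges

size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]

def read_range(offset: int) -> bytes:
    end = min(offset + CHUNK, size) - 1
    resp = s3.get_object(Bucket=BUCKET, Key=KEY, Range=f"bytes={offset}-{end}")
    return resp["Body"].read()

with ThreadPoolExecutor(max_workers=16) as pool:
    parts = list(pool.map(read_range, range(0, size, CHUNK)))
data = b"".join(parts)  # latency of one GET, aggregate throughput of sixteen
```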

How to solve the problem of Latency?

To attack S3's Latency problem, there are a few ideas: use SSDs or local disk as a cache, the way RockSet does, or push writes through a low-latency log service such as Kinesis to cut overall write latency.
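The Kinesis idea in sketch form: acknowledge a write once it is durable in the low-latency log, and let a background job compact the log into S3 objects later. The stream name and record format are hypothetical.

```python
import json

import boto3

kinesis = boto3.client("kinesis")

def write(key: str, value: dict) -> None:
    # Durable once Kinesis acknowledges; no S3 write sits on the critical
    # path. A separate consumer batches these records into S3 objects.
    kinesis.put_record(
        StreamName="db-redo-log",
        PartitionKey=key,
        Data=json.dumps({"key": key, "value": value}).encode(),
    )

write("user:42", {"balance": 100})
```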

In fact, data warehouses demand throughput more than latency: a query may run five seconds before returning a result, and nobody needs the answer within five milliseconds. Point Lookup scenarios are different: a Shared Nothing database may need just one RPC from the client, whereas a compute/storage-separated architecture has to cross the network twice no matter what. That is a core problem.

You might say it doesn't matter: compute and storage are separated, and miracles can be worked by adding compute nodes. But I don't think the new design needs to be that extreme. Aurora separates compute from storage yet remains a single-node database; Spanner is a purely distributed database, and a pure Shared Nothing architecture fails to exploit some of the advantages the cloud infrastructure provides.

For example, a future database could be designed like this: keep a little state in the compute layer, since every EC2 instance has a local disk, and on mainstream EC2 that disk is an SSD. Hot data is Shared Nothing at this layer, with high availability and random reads and writes handled here; cold data falls through to S3, where the lower levels of the storage hierarchy live exclusively. The possible problem is that once a request penetrates the local cache, Latency will jitter.

The benefits of this design: first, compute has an affinity for real-time business data, with plenty of data on local disk, so many performance-optimization techniques from traditional databases still apply; second, data migration becomes very simple, because the storage below is shared, all on S3: migrating data from machine A to machine B requires no actual movement, machine B simply starts reading it.

The drawbacks of this architecture: first, Latency rises once the cache is penetrated; second, the compute node now carries state, so if it dies, Failover must replay logs, which may add implementation complexity.
