This article looks at the current state of cloud databases and the cutting-edge technology behind them, from a professional point of view. I hope you get something out of it.
Everything will eventually run in the cloud. More and more businesses are moving from self-maintained infrastructure to public (or private) clouds, which greatly reduces operation and maintenance costs at the IaaS layer. At the database level, operations such as elastic scaling and high availability used to require a strong DBA background; now most cloud providers offer these capabilities, to varying degrees, as managed services.
Today's sharing focuses on the architectures behind cloud providers' database offerings, as well as some recent industry developments that I have observed and that I think are meaningful for cloud databases.
Amazon RDS
When it comes to cloud databases on public clouds, the earliest is probably Amazon RDS, released in 2009. Architecturally, Amazon RDS is essentially a middle layer built on top of ordinary database instances (other cloud RDS offerings, such as Aliyun RDS and UCloud RDS, are basically the same; they compete on feature breadth and implementation details). This middle tier routes client SQL requests to the actual database storage nodes. Because business requests are proxied through the middle tier, a lot of operational work can be done on the underlying database instances, such as backups or migration to physical machines with larger disks or more spare IO. These jobs are hidden behind the middle tier, so the business layer is largely unaware of them. And since this routing layer essentially just forwards requests, the backend can be any of several database engines.
So RDS generally supports the popular databases — MySQL / SQL Server / MariaDB / PostgreSQL — with essentially no loss of compatibility, and when the proxy layer is well designed, the performance overhead is small. In addition, having a middle layer that isolates the underlying resource pool enables a lot of work on resource utilization and scheduling. For example, less active RDS instances can be scheduled onto shared physical machines; online expansion only requires building a replica on a machine with a larger disk and redirecting requests at the proxy layer; regular data backups can be placed on S3. All of this can be transparent to users.
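To make the "transparent migration behind a proxy" idea concrete, here is a minimal sketch (not Amazon's implementation; the instance IDs, addresses, and routing table are hypothetical). The proxy resolves a logical RDS instance to whichever physical backend currently serves it, so the backend can be swapped without clients noticing.

```python
# Minimal sketch of an RDS-style proxy routing layer (illustrative only,
# not Amazon's implementation). The routing table and backend addresses
# below are hypothetical.

ROUTING_TABLE = {
    # logical instance id -> (host, port) of the backend currently serving it
    "rds-instance-001": ("10.0.1.17", 3306),
    "rds-instance-002": ("10.0.1.42", 3306),
}

def route(instance_id: str) -> tuple[str, int]:
    """Resolve a logical instance to its current physical backend."""
    return ROUTING_TABLE[instance_id]

def migrate(instance_id: str, new_backend: tuple[str, int]) -> None:
    """Repoint a logical instance to a new backend (e.g. a bigger machine).

    Clients keep connecting to the proxy; only the proxy's view of the
    backend changes, which is why the migration is transparent to them.
    """
    ROUTING_TABLE[instance_id] = new_backend

if __name__ == "__main__":
    print(route("rds-instance-001"))                      # ('10.0.1.17', 3306)
    migrate("rds-instance-001", ("10.0.2.8", 3306))
    print(route("rds-instance-001"))                      # ('10.0.2.8', 3306)
```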
However, the disadvantage of this architecture is also obvious: each instance is still essentially a single-machine master-slave setup, so once the workload exceeds the IO and CPU capacity of the largest available physical machine, there is nothing more it can do. With the growth of data volume and concurrency in many businesses, especially with the rise of the mobile Internet, horizontal scalability has become a very important requirement. Of course, for the majority of databases whose data volume is modest and which do not face high concurrency on a single instance, RDS is still a very good fit.
Amazon DynamoDB
For the horizontal scaling problem just mentioned, some users are in so much pain that they will even accept abandoning the relational model and SQL. For example, some Internet applications have relatively simple business models but huge concurrency and data volumes. To handle this, Amazon built DynamoDB and released it as a cloud service at the beginning of 2012. In fact, the Dynamo paper had been published at SOSP as early as 2007; that historic paper directly ignited the NoSQL movement and showed people what else a database could be. As for DynamoDB's data model and some of the technical details, I covered them in another article, "The Status Quo of Open Source Databases", so I won't repeat them here.
Dynamo is characterized by horizontal scalability and high availability through multiple replicas (three replicas). In addition, the API is designed to support both eventually consistent and strongly consistent reads, which improves read throughput. Note, however, that although DynamoDB offers strongly consistent reads, this strong consistency is not the C of ACID as we use it in databases; and because there is no global notion of time (only vector clocks), conflict resolution is left to the client, and Dynamo does not support transactions. Still, for certain business scenarios, scalability and availability are what matter most, not only in capacity but also in cluster throughput.
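As a rough illustration of why conflict handling falls to the client in a Dynamo-style design, here is a minimal vector-clock sketch (illustrative only; DynamoDB itself hides these details behind its API). Two versions of a value are concurrent when neither clock dominates the other, and it is then the client's job to merge them.

```python
# Minimal vector-clock sketch (illustrative; not DynamoDB's actual API).
# A version dominates another if it has seen every update the other has seen;
# if neither dominates, the two writes are concurrent and the client must
# reconcile them.

def dominates(a: dict, b: dict) -> bool:
    """True if clock `a` has seen everything clock `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def is_conflict(a: dict, b: dict) -> bool:
    """Two versions conflict when neither clock dominates the other."""
    return not dominates(a, b) and not dominates(b, a)

# Two replicas accepted writes independently: a conflict the client must merge.
v1 = {"replica_a": 2, "replica_b": 1}
v2 = {"replica_a": 1, "replica_b": 2}
print(is_conflict(v1, v2))   # True  -> client-side reconciliation needed

# A causally newer version: no conflict, v3 simply supersedes v1.
v3 = {"replica_a": 3, "replica_b": 1}
print(is_conflict(v1, v3))   # False
```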
Aliyun DRDS
Meanwhile, the data volumes of RDS users keep growing too. Cloud providers cannot simply watch these users walk away, or go back to maintaining database clusters themselves, as soon as their data gets large — not everyone can restructure their code on top of NoSQL, and sharding databases and tables by hand is genuinely painful for business developers. And where there is pain, there is usually a business opportunity.
As for extension schemes built on RDS, I will introduce two typical ones. The first is Aliyun's DRDS (though it now seems to have been removed from Aliyun's product list?). The idea of DRDS is very simple: it goes one small step further than RDS by adding user-configured routing policies to the RDS middle layer just mentioned. For example, users can designate certain columns of a table as the sharding key, so that rows are routed to specific instances according to certain rules; vertical partitioning policies can be configured as well.
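As an illustration of what such a user-configured routing policy amounts to, here is a minimal sketch (the shard list, table configuration, and hashing rule are hypothetical; DRDS's real routing rules and configuration syntax differ). The middle layer hashes the sharding key of each row and picks a backend shard.

```python
# Minimal sketch of sharding-key routing in a DRDS-style middle layer
# (illustrative only; not DRDS's actual configuration or routing rules).
import hashlib

SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]

# Per-table routing policy configured by the user: which column is the
# sharding key for each logical table.
SHARDING_KEYS = {"orders": "user_id", "users": "id"}

def route_row(table: str, row: dict) -> str:
    """Pick the shard that should store this row, based on its sharding key."""
    key_column = SHARDING_KEYS[table]
    key_value = str(row[key_column]).encode()
    digest = int(hashlib.md5(key_value).hexdigest(), 16)
    return SHARDS[digest % len(SHARDS)]

print(route_row("orders", {"order_id": 9001, "user_id": 12345}))
print(route_row("orders", {"order_id": 9002, "user_id": 12345}))  # same shard
```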
The predecessor of DRDS is Taobao's TDDL, except that TDDL originally lived in the JDBC layer, whereas it has since been pushed down into the proxy layer (a bit like stuffing TDDL into Cobar). The advantage is that the sharding work is encapsulated away from the application layer, but it is still a middleware solution at heart, even though it can offer a reasonable degree of SQL compatibility for simple workloads.
For complex queries, multi-dimensional queries, and cross-shard transactions, support is limited, since the routing layer in the middle has only a limited understanding of SQL. Changing the sharding key, DDL, and backups are also very troublesome. Judging from the implementation and complexity of Vitess, the middleware open-sourced by YouTube, such a layer is not really simpler to build than a database itself, yet its compatibility will never match rewriting the database.
Amazon Aurora
Later, in 2015, Amazon took a different path and released Aurora. There is not much public information about Aurora. It provides about 5x the read throughput of a single-machine MySQL 5.6, but it scales out to at most 15 read replicas, and the more replicas there are, the greater the impact on write throughput, because only one primary instance handles writes. A single instance supports up to 64 TB of storage, with high availability and elastic scaling.
Aurora's compatibility is worth highlighting. Anyone who operates databases knows that compatibility is a very hard problem: even a tiny behavioral difference in the implementation can make a user's migration cost enormous, which is why middleware and manual sharding solutions feel so hostile to users. Most of us are chasing a smooth migration experience.
Aurora takes a different approach. Since there is little public information, my guess is that Aurora implements an InnoDB-based distributed shared storage layer beneath a MySQL front end (https://www.percona.com/blog/2015/11/16/amazon-aurora-looking-deeper/), which scales reads well because the workload can be distributed across the MySQL instances at the front, somewhat like Oracle RAC's shared-everything architecture.
Compared with middleware solutions, the advantage of this architecture is obvious: compatibility is much better, because it still reuses MySQL's SQL parser and optimizer, and it does not matter if the business layer issues complex queries, since what clients connect to is still MySQL. But for the same reason, with more nodes and larger data volumes, queries cannot exploit the computing power of the whole cluster (for many complex queries the bottleneck is the CPU), and MySQL's SQL optimizer has always been its weak point. The design of a SQL engine for large-scale data queries is very different from that of a single machine; distributed query engines such as SparkSQL / Presto / Impala are necessarily designed quite differently from a single-machine SQL optimizer and look more like distributed computing frameworks.
So I see Aurora as a scheme that optimizes read performance for simple queries when the data volume is not too large (there is a capacity limit), with much better compatibility than middleware. The drawback is that for large data volumes its support for complex queries is still relatively weak, and Aurora does not do much to optimize writes (there is a single point of writing). If writes become the bottleneck, you still need to split horizontally or vertically at the business layer.
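To show what the single-writer, many-readers shape means for the application, here is a minimal read/write-splitting sketch (the endpoint names and routing helper are hypothetical placeholders, not Aurora's API): writes go to the one primary endpoint, reads are spread over replicas.

```python
# Minimal sketch of how an application typically uses an Aurora-style cluster:
# writes hit the single primary endpoint, reads are load-balanced over replicas.
# The endpoints below are hypothetical placeholders, not real Aurora endpoints.
import random

WRITER_ENDPOINT = "mycluster-primary.example.internal"
READER_ENDPOINTS = [
    "mycluster-replica-1.example.internal",
    "mycluster-replica-2.example.internal",
]

def endpoint_for(sql: str) -> str:
    """Route statements: anything that modifies data must hit the writer."""
    is_read = sql.lstrip().lower().startswith(("select", "show"))
    return random.choice(READER_ENDPOINTS) if is_read else WRITER_ENDPOINT

print(endpoint_for("SELECT * FROM orders WHERE id = 1"))   # one of the readers
print(endpoint_for("UPDATE orders SET status = 'paid'"))   # always the writer
```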
Google Cloud BigTable
Google, the forefather of big data, really missed wave after wave when it came to the cloud. It missed virtualization and let VMware and Docker take the lead (Google started on containers as early as a decade ago; the cgroups patches that containers depend on were submitted by Google), missed cloud services and let Amazon take the lead (Google App Engine was really a pity), and missed big-data storage and let open-source Hadoop become the de facto standard. So much so that I suspect the decision to make the Google Cloud Bigtable service compatible with the Hadoop HBase API must have felt like swallowing blood for the engineers who had to implement those HBase APIs on top of BigTable :)
However, after being provoked by Amazon / Docker / Hadoop, Google finally realized the power of the community and of the cloud, and began to expose all kinds of powerful internal infrastructure through Google Cloud. BigTable officially appeared on Google Cloud Platform in 2015. I assume most distributed storage engineers understand BigTable's architecture; after all, the BigTable paper is, like the Amazon Dynamo paper, a must-read classic, so I won't repeat it here.
The BigTable cloud service's API is compatible with HBase, so it exposes the same {row key: sparse two-dimensional table} structure. Since it is still a single-master structure at the tablet-server level, reads and writes to a given tablet go through a single serving tablet server by default, which makes BigTable a strongly consistent system. Strong consistency here means that for a write to a single key, once the server returns success, the next read will see the latest value.
Since BigTable still does not support ACID transactions, this strong consistency applies only to single-key operations. BigTable has essentially no limit on horizontal scalability — the documentation calls it "incredible scalability" — but it does not provide cross-datacenter (cross-zone) high availability or cross-zone access; that is, a BigTable cluster is deployed within a single zone. This reflects BigTable's positioning inside Google: a high-performance, low-latency distributed storage service. If you need cross-zone high availability, you have to do the replication yourself at the business layer, synchronizing between the two zones and maintaining a mirrored BigTable cluster.
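For a feel of what "do the replication yourself at the business layer" looks like, here is a minimal mirroring sketch (illustrative only; the client class is a stand-in, not the real Cloud Bigtable API): the application writes synchronously to the primary zone and replays a change log into a mirror cluster in another zone.

```python
# Minimal sketch of business-layer mirroring across two zones, as applications
# had to do before Megastore / Spanner existed (illustrative only; the client
# class below is a hypothetical stand-in, not the real Cloud Bigtable API).

class BigtableClientStub:
    """Hypothetical stand-in for a per-zone Bigtable client."""
    def __init__(self, zone: str):
        self.zone = zone
        self.data = {}
    def put(self, key: str, value: str) -> None:
        self.data[key] = value

primary = BigtableClientStub("us-east1-b")   # serving cluster
mirror = BigtableClientStub("us-west1-a")    # standby cluster in another zone
replication_log = []                         # business-layer change log

def write(key: str, value: str) -> None:
    """Write synchronously to the primary zone and record the change."""
    primary.put(key, value)
    replication_log.append((key, value))

def replay_to_mirror() -> None:
    """Apply accumulated changes to the mirror cluster (eventually consistent)."""
    while replication_log:
        key, value = replication_log.pop(0)
        mirror.put(key, value)

write("user:42", "alice")
replay_to_mirror()
print(mirror.data)   # {'user:42': 'alice'}
```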
In fact, many Google services did exactly this before MegaStore and Spanner came out. BigTable itself simply cannot offer cross-datacenter high availability, strong consistency, and low latency all at once, and that is not BigTable's positioning anyway. One more thing worth grumbling about: the BigTable team published a blog post (https://cloudplatform.googleblog.com/2015/05/introducing-Google-Cloud-Bigtable.html) that makes HBase's latency look terrible, quoting a p99 response latency of 6 ms for BigTable versus 280 ms for HBase. In reality, the gap in average response latency is nowhere near that large. Since BigTable is written in C++, its advantage is that latency is quite stable; but as far as I know the HBase community is also doing a lot of work to minimize the impact of GC, and after optimizations such as off-heap memory, HBase's latency behaves much better.
Google Cloud Datastore
In 2011, Google published the Megastore paper, the first to describe a distributed storage system that supports cross-datacenter high availability, horizontal scaling, and ACID transaction semantics. Megastore is built on BigTable; data is replicated between datacenters via Paxos and partitioned into entity groups. Each entity group is itself replicated across datacenters with Paxos, while ACID transactions across entity groups require two-phase commit. MVCC is implemented with timestamps.
However, precisely because timestamp allocation has to go through Paxos, and 2PC between different entity groups communicates asynchronously through a queue, Megastore's actual 2PC latency is relatively high. The paper mentions that most write requests have an average response latency of about 100-400 ms. According to friends inside Google, Megastore is painfully slow in practice, and second-level latency is common.
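To make the cross-entity-group path concrete, here is a minimal two-phase-commit sketch (illustrative only; Megastore's real protocol also replicates each step through Paxos and routes messages through an asynchronous queue, which is exactly where the extra latency comes from).

```python
# Minimal sketch of two-phase commit across entity groups (illustrative only,
# not Megastore's implementation).

class EntityGroup:
    """Hypothetical participant: one entity group that can prepare/commit."""
    def __init__(self, name: str):
        self.name = name
        self.staged = {}
        self.data = {}
    def prepare(self, updates: dict) -> bool:
        # In Megastore this step would itself be replicated via Paxos.
        self.staged = dict(updates)
        return True                      # vote yes
    def commit(self) -> None:
        self.data.update(self.staged)
        self.staged = {}
    def abort(self) -> None:
        self.staged = {}

def two_phase_commit(participants: dict) -> bool:
    """participants: {EntityGroup: updates}. Returns True if committed."""
    # Phase 1: every entity group must vote yes.
    if not all(g.prepare(updates) for g, updates in participants.items()):
        for g in participants:
            g.abort()
        return False
    # Phase 2: apply everywhere. Each round trip here crosses Paxos groups,
    # which is why cross-entity-group transactions are slow.
    for g in participants:
        g.commit()
    return True

users = EntityGroup("user:42")
orders = EntityGroup("order:9001")
ok = two_phase_commit({users: {"balance": 80}, orders: {"status": "paid"}})
print(ok, users.data, orders.data)
```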
Since Megastore was probably one of the few distributed databases inside Google that supported ACID transactions and SQL at the time, a large number of applications still run on it, mainly because having SQL and transactions makes programs much easier to write. Why am I saying so much about Megastore? Because the backend of Google Cloud Datastore is Megastore...
In fact, Cloud Datastore was launched inside Google App Engine as early as 2011, as App Engine's High Replication Datastore; it has since been renamed Cloud Datastore. At the time I failed to realize that the famous Megastore was behind it — quite an oversight. Although the feature set looks powerful — high availability, ACID, even SQL support (only GQL, Google's simplified dialect) — it follows from Megastore's design that latency is high. In addition, the interface Cloud Datastore provides is an ORM-like SDK, which is still somewhat intrusive to the business code.
Although Megastore is slow, it is easy to use; the Spanner paper mentions that there were roughly 300+ applications running on Megastore as of 2012. After more and more teams built their own ACID transaction wheels on top of BigTable, Google really could not stand it any longer and started building the big wheel: Spanner. The project was ambitious. Like Megastore, it offers ACID transactions + horizontal scaling + SQL support, but unlike Megastore, Spanner did not choose to build a transaction layer on top of BigTable. Instead, it builds Paxos-replicated tablets directly on top of Colossus, Google's second-generation distributed file system.
In addition, unlike Megastore, where each coordinator determines transaction timestamps through Paxos, Spanner introduces hardware: the TrueTime API, backed by GPS receivers and atomic clocks. Transactions initiated in different datacenters no longer need to coordinate timestamps across datacenters; timestamps are allocated directly through the local datacenter's TrueTime API, so latency drops dramatically.
Within Google, Spanner is positioned as an almost general-purpose distributed storage, complementary to BigTable: if you need cross-datacenter high availability, strong consistency, and transactions, use Spanner, at the cost of a little extra latency (though nowhere near as much as Megastore); if you want raw performance (low latency), use BigTable.
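As a rough illustration of the TrueTime idea (a sketch under an assumed clock-uncertainty bound, not the real TrueTime API; the EPSILON constant and function names are mine), TT.now() returns an interval [earliest, latest] guaranteed to contain real time, and a transaction "commit-waits" until its chosen timestamp is definitely in the past before exposing it.

```python
# Minimal sketch of TrueTime-style commit wait (illustrative only; real TrueTime
# is backed by GPS and atomic clocks, and the uncertainty bound below is an
# assumed constant, not a measured one).
import time

EPSILON = 0.007   # assumed clock uncertainty in seconds (~7 ms)

def tt_now() -> tuple[float, float]:
    """Return an interval [earliest, latest] guaranteed to contain real time."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def tt_after(timestamp: float) -> bool:
    """True once real time is definitely past `timestamp`."""
    earliest, _ = tt_now()
    return earliest > timestamp

def commit(transaction_id: str) -> float:
    """Pick a commit timestamp, then wait out the uncertainty (commit wait)."""
    _, commit_ts = tt_now()              # choose the latest bound as the timestamp
    while not tt_after(commit_ts):       # wait until commit_ts is safely in the past
        time.sleep(0.001)
    print(f"{transaction_id} committed at {commit_ts:.6f}")
    return commit_ts

commit("txn-1")
```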
Google Spanner
Spanner is not currently available as a service on Google Cloud Platform, but judging from the trend it is certain to appear, at least as the next generation of Cloud Datastore. On the other hand, Google still has no way to open-source Spanner because, like BigTable, its lower layers depend on Colossus and a pile of Google-internal components — and, even harder than with BigTable, TrueTime is a hardware stack.
Therefore, after the Spanner paper was released at the end of 2012, the community produced open-source implementations, such as the relatively mature TiDB and CockroachDB, which I will come back to when discussing the community's cloud database implementations. Spanner's interface is slightly richer than BigTable's: it supports what it calls a semi-relational table structure, with DDL just like a relational database; although you still have to specify a primary key for each table, it is still much better than a bare KV store.
Google F1
Around the start of the Spanner project, Google launched another project, F1, a distributed SQL engine used together with Spanner. With such a strongly consistent, high-performance Spanner underneath, the upper layer can try to bridge OLTP with part of OLAP. F1 is really a database, but it does not store data itself — everything lives in Spanner. It is a distributed query engine that relies on the transaction interface Spanner provides and translates users' SQL requests into distributed execution plans.
Google F1 opens up a possibility that other databases had not realized: the fusion of OLTP and OLAP. F1 was designed to serve Google's advertising system, which demands strong consistency under heavy load — a typical OLTP scenario. At the same time there are many complex queries for evaluating advertising effectiveness, and the more real-time those queries are, the better — which is essentially real-time OLAP.
The traditional practice is for the OLTP database to periodically sync a copy of the data into a data warehouse, where it is computed offline; slightly better setups use a stream-computing framework for real-time computation. The data warehouse approach has poor real-time performance, and dumping the data over is a hassle. The stream-computing approach, meanwhile, is inflexible: much of the query logic has to be written ahead of time, so you cannot do much ad-hoc analysis; and because the two sides are heterogeneous stores, ETL is also very troublesome work.
F1 relies on Spanner's ACID transactions and MVCC to deliver 100% of the OLTP capability, and as a distributed SQL engine it can use the cluster's computing resources to run distributed OLAP queries. The benefit is that there is no need to stand up a separate data warehouse for analytics; real-time analysis happens directly in the same database. Moreover, thanks to Spanner's MVCC and the lock-free snapshot reads enabled by multiple replicas, such OLAP queries do not disturb normal OLTP operation.
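To illustrate why snapshot reads do not block writers, here is a minimal MVCC sketch (illustrative only; Spanner's actual implementation uses TrueTime timestamps and Paxos-replicated storage). Writers append new versions; an analytical reader pins a snapshot timestamp and reads the newest version at or below it.

```python
# Minimal sketch of MVCC snapshot reads (illustrative only, not Spanner's code).
# Writers append new versions; a reader pinned to a snapshot timestamp sees a
# consistent view without taking locks, so OLAP reads never block OLTP writes.

class MVCCStore:
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), kept sorted

    def write(self, key: str, value: str, commit_ts: int) -> None:
        self.versions.setdefault(key, []).append((commit_ts, value))
        self.versions[key].sort()

    def snapshot_read(self, key: str, snapshot_ts: int):
        """Return the newest value committed at or before snapshot_ts."""
        visible = [(ts, v) for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1][1] if visible else None

store = MVCCStore()
store.write("ad:clicks", "100", commit_ts=10)
store.write("ad:clicks", "250", commit_ts=20)

# An OLAP query pinned to snapshot_ts=15 sees the old value, even though an
# OLTP write at ts=20 has already landed.
print(store.snapshot_read("ad:clicks", snapshot_ts=15))   # 100
print(store.snapshot_read("ad:clicks", snapshot_ts=25))   # 250
```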
For OLTP the bottleneck is often IO, while for OLAP it is often CPU, i.e. computation. Combining the two can improve the resource utilization of the whole cluster, which is why I am optimistic about the combination of Google F1 and Spanner: future databases may well absorb the data warehouse and offer a more complete, more real-time experience.
(Strictly speaking, the GFS mentioned below is not accurate; inside Google it has long since been replaced by Colossus.)
Open source cloud-native database
In 2016, a new term suddenly became popular in Silicon Valley: GIFEE, Google Infrastructure For Everyone Else. People realized that with the flourishing of a new generation of open-source infrastructure software, high-quality open-source counterparts of Google's internal infrastructure already existed: Docker for containers; Kubernetes, the second generation of Borg that Google itself is actively open-sourcing, for scheduling; and, for the classic BigTable and GFS, the community has Hadoop, which more or less makes do (many big companies that find Hadoop unsatisfactory have basically built their own similar wheels). Not to mention that Google has lately become almost addicted to open source: beyond Kubernetes, from the wildly popular TensorFlow to the less well-known but, in my view, significant Apache Beam (the foundation of Google Cloud Dataflow), projects that can be open-sourced independently are actively embracing the community.
As a result, the gap between the community and Google is narrowing. For now, though, everything else is relatively easy; Spanner and F1 are not. Even setting aside the TrueTime hardware, implementing a stable Multi-Paxos is itself hard, a distributed SQL optimizer has a high technical bar, and the testing is no less complex than the implementation (see the several articles on PingCAP's philosophy of distributed testing).
At present, from a global point of view, I think only two teams in the open-source world have the technical ability and the vision to build an open-source implementation of Spanner: PingCAP with TiDB and Cockroach Labs with CockroachDB. TiDB has already reached RC1 and has many users in production, making it slightly more mature than CockroachDB, and its architecture is closer to the orthodox F1-on-top-of-Spanner layering. CockroachDB's maturity lags slightly behind, and it chose the PostgreSQL wire protocol, whereas TiDB chose compatibility with the MySQL protocol.
From TiKV, TiDB's subproject, we can see the embryonic form of a new generation of distributed KV: RocksDB + Multi-Raft, with no dependency on a third-party distributed file system (DFS) for horizontal scalability, is becoming the standard architecture for the new generation of distributed KV storage. I am also glad to see an open-source project of this caliber initiated and maintained by a domestic team, with design and implementation that would be first-rate even in Silicon Valley; judging from the GitHub activity, the tooling, and the way the community is run, it is hard to tell it is a domestic team at all.
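To give a feel for the Multi-Raft, region-based layout (a sketch of the concept only, not TiKV's actual code; the region boundaries and store names are hypothetical), the key space is split into contiguous regions, each region is a Raft group replicated on several stores, and a request is routed to the leader of the region covering its key.

```python
# Minimal sketch of range-based region routing in a TiKV-style design
# (illustrative only; not TiKV's actual code or data structures).
import bisect

# Region start keys and, per region, its Raft peers with the leader listed first.
REGION_STARTS = [b"", b"g", b"p"]
REGIONS = [
    {"id": 1, "peers": ["store-1", "store-2", "store-3"]},   # keys [..., "g")
    {"id": 2, "peers": ["store-2", "store-3", "store-1"]},   # keys ["g", "p")
    {"id": 3, "peers": ["store-3", "store-1", "store-2"]},   # keys ["p", ...)
]

def locate(key: bytes) -> dict:
    """Find the region whose key range covers `key`."""
    idx = bisect.bisect_right(REGION_STARTS, key) - 1
    return REGIONS[idx]

def route(key: bytes) -> str:
    """Send the request to the region's Raft leader (the first peer here)."""
    return locate(key)["peers"][0]

print(route(b"apple"))    # store-1 (region 1)
print(route(b"kiwi"))     # store-2 (region 2)
print(route(b"zebra"))    # store-3 (region 3)
```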
Kubernetes + Operator
I just mentioned the term "cloud-native". It does not yet have a precise definition, but my understanding is that application developers are isolated from the physical infrastructure: the business layer no longer needs to care about storage capacity or performance, everything scales transparently and horizontally, and clusters are highly automated, even self-healing. At that scale, manual intervention in a large distributed storage system becomes impractical: in a cluster of thousands of nodes, node failures of all kinds happen almost every day, along with transient network jitter or even whole datacenters going down, so migrating and recovering data by hand is effectively impossible.
Many people are very bullish on Docker and think it has changed how software is deployed and operated, but I think the more meaningful piece is Kubernetes. The scheduler is the core of a cloud-native architecture; the container is just a carrier and is not the important part. Kubernetes is effectively a distributed operating system whose physical layer is the whole datacenter — a DCOS — which is why we bet heavily on Kubernetes. I do not think large-scale distributed databases can stay separate from DCOS in the future.
However, orchestrating stateful services on Kubernetes is a headache. A typical distributed storage system has these characteristics: every node stores data; the cluster must scale out and in according to user needs; program updates should be rolling upgrades that keep the service running; when data becomes unbalanced the system should rebalance; and to ensure high availability each piece of data has multiple replicas, so when a node fails the replica count must be restored automatically. All of this is very challenging to orchestrate on Kubernetes.
Kubernetes introduced PetSet in version 1.3, since renamed StatefulSet. The core idea is to give each Pod an identity and to establish and maintain the binding between a Pod and its storage, so that when the Pod is rescheduled the corresponding Persistent Volume can be bound to it again. But it does not completely solve our problem: StatefulSet still relies on Persistent Volumes, and at the moment Kubernetes' Persistent Volumes only offer implementations based on shared storage, distributed file systems, or NFS, with no Local Storage support; and PetSet itself is still in Alpha. We are still waiting and watching.
However, others are trying things beyond the official Kubernetes community. We were pleased to see that, not long ago, CoreOS proposed a new way to extend Kubernetes: a new member called the Operator. An Operator is essentially an extension of the Controller idea. I won't go into the implementation details for lack of space; simply put, it is a scheme for letting Kubernetes schedule stateful storage services, and CoreOS officially provides an Operator implementation that handles backup and rolling upgrades for etcd clusters.
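The heart of the Operator pattern is a reconcile loop: watch a declared desired state and keep nudging the actual cluster toward it. Here is a minimal sketch of that loop (illustrative only; CoreOS's etcd operator is written in Go and talks to the real Kubernetes API, and the spec fields below are hypothetical).

```python
# Minimal sketch of an Operator-style control loop (illustrative only; not the
# CoreOS etcd operator). The loop compares desired vs. observed state and acts
# on the difference: scaling members, replacing failed ones, rolling upgrades.
import time

desired = {"replicas": 3, "version": "3.1.0"}                      # declared spec
actual = {"members": ["etcd-0", "etcd-1"], "version": "3.0.15"}    # observed state

def reconcile(desired: dict, actual: dict) -> None:
    """One iteration: diff desired vs. observed state and converge."""
    # Scale up/down to the declared number of members.
    while len(actual["members"]) < desired["replicas"]:
        name = f"etcd-{len(actual['members'])}"
        actual["members"].append(name)
        print(f"created member {name}")
    while len(actual["members"]) > desired["replicas"]:
        print(f"removed member {actual['members'].pop()}")
    # Rolling upgrade: move the cluster to the declared version.
    if actual["version"] != desired["version"]:
        print(f"upgrading {actual['version']} -> {desired['version']}")
        actual["version"] = desired["version"]

def control_loop(iterations: int = 3) -> None:
    for _ in range(iterations):
        reconcile(desired, actual)
        time.sleep(0.1)    # a real operator would use a watch, not a poll

control_loop()
print(actual)
```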