How do you build a distributed database? This article gives a detailed analysis and answer to that question, in the hope of helping more readers facing the same problem find a simple, workable approach.
Relational databases have held an absolutely dominant position in the database field over the past few decades; their stability, security and ease of use made them the cornerstone of modern systems. With the rapid development of the Internet, however, databases built on single-node systems can no longer satisfy ever-higher concurrency and ever-growing storage requirements, so distributed databases are being adopted more and more widely.
The database field has long been dominated by Western technology companies and communities. Today, more and more domestic solutions take distribution as their fulcrum and are gradually making their mark in this field. Apache ShardingSphere is one such distributed database solution, and it is also the only database middleware project in the Apache Software Foundation.
1 Background
The design goal of a distributed database solution is to be fully compatible with the SQL and transactions of traditional relational databases while being naturally friendly to distribution. Its core capabilities mainly focus on the following points:
Distributed storage: data storage is not limited by the disk capacity of a single machine, and storage capacity can be increased by adding data servers
Compute-storage separation: compute nodes are stateless and computing power can be increased through horizontal scaling; storage nodes and compute nodes can be optimized layer by layer
Distributed transactions: a high-performance distributed transaction processing engine that fully supports the original ACID semantics of local transactions
Elastic scaling: data storage nodes can be scaled out and in dynamically at any time without affecting existing applications
Multiple data replicas: data is automatically replicated into multiple copies across data centers in a strongly consistent, high-performance manner to keep it safe
HTAP: a single product handles both OLTP transactional operations and OLAP analytical operations.
Implementations of distributed databases can be divided into an aggressive camp and a stable camp. The aggressive approach means NewSQL products built on entirely new architectures; these pursue higher performance at the price of reduced stability and a lack of accumulated operational experience. The stable approach means middleware that adds incremental capabilities on top of existing databases; these products sacrifice some performance to preserve the stability of the underlying database and the reuse of operational experience.
2 Architecture
Apache ShardingSphere is an ecosystem of open-source distributed database middleware solutions, consisting of three independent products: Sharding-JDBC, Sharding-Proxy and Sharding-Sidecar (planned). All of them provide standardized data sharding, distributed transactions, distributed governance and other functions, and they can be applied to a variety of scenarios, such as homogeneous Java environments, heterogeneous languages, and cloud native deployments. With its continued exploration of the query optimizer and the distributed transaction engine, Apache ShardingSphere has gradually broken through the boundary between implementation approaches and is evolving into a platform-level solution that integrates both the aggressive and the stable camps.
Sharding-JDBC
Positioned as a lightweight Java framework, it provides additional services at Java's JDBC layer. The client connects directly to the database and the service is delivered as a jar package, with no extra deployment or dependencies. It can be understood as an enhanced JDBC driver, fully compatible with JDBC and all major ORM frameworks.
Sharding-Proxy
Positioned as a transparent database proxy, it provides a server version that encapsulates the database binary protocol in order to support heterogeneous languages. MySQL and PostgreSQL versions are currently available; any client compatible with the MySQL or PostgreSQL protocol (such as MySQL Command Client, MySQL Workbench, Navicat, etc.) can manipulate the data, which makes it friendlier to DBAs.
Sharding-Sidecar (in Planning)
Positioned as a cloud-native database agent for Kubernetes, it proxies all database access in the form of a sidecar. Through this decentralized, zero-intrusion scheme it provides a mesh layer that interacts with the database, namely Database Mesh.
A hybrid architecture with separated compute and storage
Sharding-JDBC adopts a decentralized architecture and suits high-performance, lightweight OLTP applications developed in Java; Sharding-Proxy provides a static entry point and heterogeneous-language support, and suits OLAP applications as well as scenarios where the sharded databases are managed and operated directly.
Each architecture scheme has its own advantages and disadvantages. The following table compares the advantages and disadvantages of various architecture models in different scenarios:
Apache ShardingSphere is an ecosystem composed of multiple access endpoints. By mixing Sharding-JDBC and Sharding-Proxy and configuring the same sharding strategy in one configuration center, application systems suited to all kinds of scenarios can be built flexibly, giving architects the freedom to adjust the system architecture to the current business.
Apache ShardingSphere adopts a shared-nothing architecture, and its JDBC and Proxy access endpoints are stateless. As the compute node, Apache ShardingSphere is responsible for the final computation and aggregation of the fetched data. Because it does not store the data itself, it can push computation down to the data nodes and make full use of the databases' own computing power. Apache ShardingSphere can increase computing power by adding deployed nodes and increase storage capacity by adding database nodes.
3 Core Functions
Data sharding, distributed transactions, elastic scaling and distributed governance are Apache ShardingSphere's four core functions at present.
Data sharding
Divide and conquer is how Apache ShardingSphere deals with huge volumes of data. Through data sharding, Apache ShardingSphere gives the database distributed storage capability.
It automatically routes SQL to the corresponding data nodes according to the user's preset sharding algorithm, so that multiple databases can be operated as one. Users can work with the multiple databases managed by Apache ShardingSphere just as they would a single database. It currently supports MySQL, PostgreSQL, Oracle, SQL Server, and any database that implements the SQL92 standard and the JDBC protocol. The kernel flow of data sharding is shown in the following figure:
The main process is as follows:
Obtain the SQL and parameters entered by the user by parsing the database protocol packet or through the JDBC driver
Parse the SQL into an AST (abstract syntax tree) using the lexer and parser, and extract the information needed for sharding
Match the sharding key against the user's preset algorithm and calculate the routing path
Rewrite the SQL into SQL that is executable in the distributed environment
Send the SQL to each data node in parallel; the execution engine is responsible for balancing connection-pool and memory resources
Perform streaming or fully in-memory result-set merging based on the AST
Encapsulate the result as a database protocol packet or JDBC result set and return it to the client.
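As a rough illustration of the routing and rewriting steps, the sketch below shards a hypothetical t_order table with a modulo algorithm on user_id. All names here (t_order, the ds_ prefix, the modulo rule) are assumptions for the example; the real engine parses SQL into an AST and supports many pluggable sharding algorithms.

```python
import re

def route_and_rewrite(sql, user_id, db_count=2, table_count=2):
    # Route by modulo on the sharding key, then rewrite the logical
    # table name into its physical shard. Illustrative sketch only.
    db_index = user_id % db_count          # database-level routing
    table_index = user_id % table_count    # table-level routing
    rewritten = re.sub(r"\bt_order\b", f"t_order_{table_index}", sql)
    return f"ds_{db_index}", rewritten
```

In practice the rewrite is driven by the AST rather than a regular expression, but the routing idea is the same.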
Distributed transaction
Transactions are a core function of database systems. Distributed uncertainty and transactional complexity mean there can be no universal solution in the field of distributed transactions.
Facing this situation in which "a hundred flowers bloom", Apache ShardingSphere offers a highly open solution: standard interfaces uniformly integrate whichever third-party distributed transaction framework the developer chooses, meeting the needs of different scenarios. In addition, Apache ShardingSphere provides a new distributed transaction solution of its own, JDTX, to make up for the shortcomings of existing ones.
Standardized integration interface
Apache ShardingSphere provides a unified adapter interface for local transactions, two-phase transactions and flexible transactions, and already connects to a large number of mature third-party solutions. Through the standard interface, developers can easily plug further solutions into the Apache ShardingSphere platform.
However, integrating many third-party solutions still cannot cover every distributed-transaction requirement. Each class of solution has scenarios it suits and scenarios it does not; the solutions are mutually exclusive and their advantages cannot be combined. The most common options, 2PC (two-phase commit) and flexible transactions, have the following respective advantages and disadvantages:
Two-phase commit: two-phase distributed transactions based on the XA protocol are minimally intrusive to the business. Their biggest advantage is transparency to the user: developers can use XA-based distributed transactions just like local transactions. The XA protocol strictly guarantees the ACID properties of a transaction, but this is a double-edged sword. Execution must lock all required resources for the duration of the transaction, which suits short transactions with bounded execution times. For long transactions, this monopoly on resources causes a significant drop in concurrent performance for business systems that depend on hot data. Therefore, in performance-first, high-concurrency scenarios, XA-based two-phase distributed transactions are not the best choice.
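The locking behaviour described above is easiest to see in a toy coordinator. This is a minimal sketch of the two-phase protocol, not ShardingSphere's or any real XA implementation; Participant and its vote flag are invented for the example.

```python
class Participant:
    # One resource manager; can_commit simulates whether its local
    # prepare (acquiring locks and voting) succeeds.
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "idle"

    def prepare(self):                 # phase 1: acquire locks, vote
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):                  # phase 2: make changes durable
        self.state = "committed"

    def rollback(self):                # phase 2: release locks, undo
        self.state = "rolled_back"

def two_phase_commit(participants):
    # Commit only if every participant votes yes while holding its
    # locks; otherwise roll everyone back. Resources stay locked for
    # the whole transaction, which is why 2PC hurts hot-data concurrency.
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```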
Flexible transactions: if transactions that implement the ACID elements are called rigid transactions, then transactions based on the BASE elements are called flexible transactions. BASE is an abbreviation of Basically Available, Soft state, and Eventually consistent. ACID transactions place high demands on consistency and isolation, and all resources must be held for the duration of execution. The idea of flexible transactions is to move mutual exclusion from the resource level up to the business level through business logic. By relaxing the demand for strong consistency and isolation, they require only that data be consistent when the whole transaction finally ends; during execution, the data returned by any read operation may still change. This weakly consistent design is traded for an improvement in system throughput.
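A common way to realize the BASE idea is a saga of compensating actions: each step has an undo, and a failure rolls back completed steps at the business level rather than by holding locks. The following is a minimal sketch; the step/compensation structure is an assumption for illustration, not JDTX's or ShardingSphere's API.

```python
def run_saga(steps):
    # Each step is (action, compensation). Actions run forward; if one
    # fails, previously completed steps are compensated in reverse
    # order. Intermediate states are visible to readers (BASE, not ACID).
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()
        return False
    return True
```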
ACID-based two-phase transactions and BASE-based final consistency transactions are not silver bullets, and the differences between them can be compared in detail in the following table.
A two-phase transaction without concurrency guarantees cannot be called a perfect distributed transaction solution, while a flexible transaction that does not support the original ACID semantics can hardly be called a database transaction at all; it is better suited to transaction processing at the service layer.
Looking at the current situation, it is difficult to find a distributed transaction solution that is universally applicable without tradeoffs.
A new generation of distributed transaction middleware JDTX
JDTX is distributed transaction middleware developed by JD.com; it has not yet been open-sourced. Its design goal is fully distributed transaction middleware with strong consistency (supporting the original ACID transaction semantics), high performance (no lower than local transactions) and 1PC (completely abandoning two-phase commit and two-phase locking). At present it can be used with relational databases. It adopts a completely open SPI design, which leaves room for connecting NoSQL stores and, in the future, maintaining multiple heterogeneous data stores within the same transaction.
Through a fully self-developed transaction processing engine, JDTX converts the transactional data of SQL operations into KV (key-value) form, and on top of it implements an MVCC (multi-version) transaction visibility engine and a WAL (write-ahead log) storage engine similar in concept to a database's own design. The composition of JDTX can be understood from the following architecture diagram:
Its defining design choice is to separate data inside a transaction (called active data) from data no longer inside a transaction (called persisted data). Active data is first appended to the WAL and then kept in the MVCC memory engine in KV form. Persisted data is produced by asynchronously flushing the REDO logs in the WAL to the final storage medium (such as a relational database) in a controllable manner. The transaction memory query engine retrieves the relevant active data in KV form for a SQL query, merges it with the persisted data, and derives the data version visible to the current transaction according to its isolation level.
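The active-data path can be pictured with a toy engine: every write appends to a WAL first, then lands in a multi-version KV map, and a reader sees only versions committed at or before its snapshot. This is an illustrative sketch of the general WAL + MVCC idea, not JDTX's actual implementation.

```python
class MiniMVCC:
    # Toy WAL + multi-version KV store: readers see the newest version
    # committed at or before their snapshot timestamp.
    def __init__(self):
        self.wal = []            # append-only redo log: (ts, key, value)
        self.versions = {}       # key -> list of (commit_ts, value)
        self.ts = 0              # logical clock

    def begin(self):
        return self.ts           # snapshot timestamp for a new reader

    def write(self, key, value):
        self.ts += 1
        self.wal.append((self.ts, key, value))   # WAL first, always
        self.versions.setdefault(key, []).append((self.ts, value))
        return self.ts

    def read(self, key, snapshot):
        visible = [v for t, v in self.versions.get(key, []) if t <= snapshot]
        return visible[-1] if visible else None
```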
JDTX reinterprets the transaction model of the database with a new architecture, and the main highlights are:
1. Internalize distributed transactions into local transactions
JDTX's MVCC engine is a centralized cache. It can internalize two-phase commit into one-phase commit by maintaining the atomicity and consistency of data within a single node; that is, the scope of a distributed transaction is reduced to that of a local transaction. JDTX guarantees transactional atomicity and consistency by ensuring that all access to transactional data goes through the merge of the MVCC engine's active data with the persisted data on the storage side.
2. Lossless transaction isolation mechanism
Transaction isolation is implemented via MVCC. Of the four standard isolation levels, Read Committed and Repeatable Read are currently fully supported, which meets most needs.
3. High performance
By flushing active data to the storage medium asynchronously, the ceiling on write performance rises considerably: the bottleneck shifts from expensive database writes to the cheaper writes into the WAL and the MVCC engine.
Like a database's WAL, JDTX's WAL appends entries sequentially, so JDTX's WAL cost can be roughly equated with the database system's own WAL cost. The MVCC engine uses a KV data structure, which is cheaper to write than a database that must maintain B-tree indexes. As a result, the ceiling on JDTX's data update performance can even exceed that of a database running with transactions disabled.
4. High availability
Both the WAL and the MVCC engine can stay highly available and horizontally scalable through active/standby replication plus sharding. When the MVCC engine becomes completely unavailable, the data in the WAL can be replayed into the database through a recovery mode to guarantee data integrity.
5. Transactions across heterogeneous databases
The design of separating active transaction data from persisted data means there is no restriction on the storage side of persisted data. Because active transaction data reaches the backend storage media through the asynchronous flushing actuator, it does not matter whether the backends are homogeneous databases. Using JDTX, distributed transactions spanning multiple storage backends (such as MySQL, PostgreSQL, and even MongoDB or Redis) can be kept within the same transaction semantics.
Through the unified distributed-transaction adapter interface provided by Apache ShardingSphere, JDTX can be integrated into the Apache ShardingSphere ecosystem as easily as any other third-party distributed transaction solution. Data sharding and distributed transactions then combine seamlessly, forming the basis of a distributed database infrastructure: Apache ShardingSphere at the front handles SQL parsing, database protocols and data sharding; JDTX in the middle layer handles active transaction data through KV and MVCC; and the database at the bottom serves only as the final data store. The following figure is an architecture diagram of ShardingSphere + JDTX.
It is fair to say that JDTX lets Apache ShardingSphere break out of its position as stable database middleware and, while remaining stable, move step by step toward aggressive NewSQL.
Elastic scaling
Unlike stateless service applications, data nodes hold important user data that must not be lost. When the capacity of the data nodes can no longer bear fast-growing business, expanding them is inevitable. The expansion strategy varies with the sharding strategy preset by the user.
Elastic scaling allows databases managed by Apache ShardingSphere to scale out and in without suspending external services. Elastic scaling is divided into two components, elastic migration and range expansion, both of which are currently incubating.
Elastic migration
Data migration is the standard scaling scheme for user-customized sharding strategies. During migration, two sets of data nodes are prepared. While the original data nodes continue to serve traffic, data is written to the new data nodes both as existing stock and as increments. The whole migration requires no suspension of external services, and old and new data nodes are switched over smoothly. Apache ShardingSphere will also provide a workflow interface to make the migration process fully autonomous and controllable. The architecture of elastic migration is as follows:
The specific process is as follows:
Modify the data sharding configuration through the configuration center to trigger the migration process.
Record the current position before migration begins, then start the historical-data migration job and migrate all existing data in batches.
Start the binlog subscription job and migrate the incremental data produced after that position.
Compare the data according to the configured sampling rate.
Set the original data source to read-only to ensure that real-time data is fully migrated.
Switch the application connection to the new data source.
The old data source is offline.
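The stock-plus-increment flow above can be sketched with simple stand-ins: a dict for a data node and a list of (op, key, value) tuples for the binlog. These structures are assumptions for the example; the real component subscribes to the actual binlog and samples rows for the comparison step.

```python
def migrate(snapshot, incremental_ops, target):
    # 1) Stock phase: bulk-copy all historical data recorded at the
    #    position taken before migration started.
    target.update(snapshot)
    # 2) Incremental phase: replay binlog-style changes made since then.
    for op, key, value in incremental_ops:
        if op == "put":
            target[key] = value
        elif op == "delete":
            target.pop(key, None)
    # 3) A comparison phase would sample rows here before switching
    #    application traffic to the new data source.
    return target
```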
Depending on the volume of data, migration can take anywhere from a few minutes to several weeks. Rollback or re-migration is possible at any point during the process. The whole migration is fully autonomous and controllable, which reduces risk, and automation tools completely shield manual operations, avoiding the huge workload of tedious manual steps.
Range expansion
If elastic migration is "hard" scaling, then range expansion is "soft" scaling. Apache ShardingSphere's range expansion involves no kernel modification and no data migration; it only optimizes the range sharding strategy to achieve automatic expansion (and reduction). With range expansion, users no longer need to be aware of concepts such as sharding strategies and sharding keys, which brings Apache ShardingSphere closer to an integrated distributed database.
Users of range expansion only need to provide Apache ShardingSphere with a pool of database resources. When a table's capacity reaches the threshold, the capacity inspector takes the next data node from the pool in order, creates a new table on it, and then updates the range metadata of the sharding strategy. When no new data nodes remain in the pool, Apache ShardingSphere adds new tables, in the same order, to databases where tables already exist. When large amounts of table data are deleted, the data on earlier data nodes is no longer compact, and a garbage collector periodically compacts table ranges to reclaim fragmented space. The structure of range expansion is as follows:
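The threshold-driven allocation described above can be sketched as follows, assuming monotonically increasing keys, an invented RangeExpansion class, and a plain list as the resource pool; the real capacity inspector also creates the physical tables and updates shared metadata.

```python
class RangeExpansion:
    # Tables are taken from a resource pool one at a time; when the
    # current tail table reaches `threshold` rows, the next table is
    # allocated and the range metadata is extended.
    def __init__(self, pool, threshold):
        self.pool = list(pool)    # available tables/data nodes
        self.threshold = threshold
        self.ranges = []          # range metadata: (start_key, table)
        self.count = 0            # rows in the current tail table

    def insert(self, key):
        if not self.ranges or self.count >= self.threshold:
            table = self.pool.pop(0)        # capacity inspector allocates
            self.ranges.append((key, table))
            self.count = 0
        self.count += 1
        return self.ranges[-1][1]
```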
Apache ShardingSphere thus provides flexible elastic-scaling strategies for different application scenarios. The two incubating projects related to elastic scaling are striving to offer trial versions as soon as possible.
Distributed governance
The governance module is designed to better manage and use distributed databases.
Database governance
In line with the design philosophy of all distributed systems, divide and conquer is also the guideline for distributed databases. With database governance in place, management cost does not grow with the number of database instances.
Dynamic configuration
Apache ShardingSphere manages configuration through a configuration center; after a configuration change, it can be propagated to every access-side instance within a very short time. The configuration center follows the open SPI model, so the capabilities of the chosen configuration center itself, such as multi-version configuration changes, can be fully exploited.
High availability
Apache ShardingSphere uses a registry to manage the running state of access endpoints and database instances. The registry adopts the same open SPI approach as the configuration center; since some registry implementations can also cover configuration-center capabilities, users can combine the registry and the configuration center.
Apache ShardingSphere provides the ability to disable a database instance and to apply a circuit breaker at the access endpoint, which handle scenarios where a database instance becomes unavailable or the access endpoint is hit by heavy traffic.
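The access-side circuit breaker can be illustrated with a classic consecutive-failure breaker. This sketch is the generic pattern, not ShardingSphere's actual implementation; the threshold and class name are invented for the example.

```python
class CircuitBreaker:
    # After `max_failures` consecutive failures, the circuit opens and
    # further calls are rejected immediately instead of continuing to
    # hit an unavailable data source.
    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0
        self.is_open = False

    def call(self, fn):
        if self.is_open:
            raise RuntimeError("circuit open: data source disabled")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.is_open = True
            raise
        self.failures = 0        # any success resets the failure count
        return result
```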
Apache ShardingSphere is currently incubating a high-availability SPI that lets users reuse the high-availability solutions the database itself provides. Integration with MySQL's MGR high-availability solution is in progress: Apache ShardingSphere will automatically detect MGR election changes and quickly propagate them to all application instances.
Observability
The sheer number of database and access-endpoint instances makes it impossible for DBAs and operators to quickly perceive the current state of the system. By implementing the OpenTracing protocol, Apache ShardingSphere can send monitoring data to any third-party APM system that implements that protocol; in addition, it provides an automatic probe for Apache SkyWalking, so users of that observability product can directly observe Apache ShardingSphere's performance, call-chain relationships and overall system topology.
Data governance
Thanks to Apache ShardingSphere's flexible handling of SQL and high compatibility with database protocols, data-related governance capabilities are easily added to the product ecosystem.
Desensitization
Apache ShardingSphere allows users to automatically encrypt specified data columns before they are stored in the database, without modifying application code, and to decrypt them when the application reads the data back, keeping the data secure. If the database's contents are inadvertently leaked, the sensitive information remains fully encrypted and so does not create a larger security risk.
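The transparent write-path/read-path encryption can be sketched as below. Base64 stands in for a real cipher purely so the example is self-contained; an actual deployment would use a proper algorithm such as AES, and the column configuration shown is hypothetical.

```python
import base64

def encrypt(plain):
    # Stand-in "cipher" for illustration only; reversible by anyone.
    return base64.b64encode(plain.encode()).decode()

def decrypt(cipher):
    return base64.b64decode(cipher.encode()).decode()

def store_row(row, encrypted_columns):
    # Applied transparently on the write path, before SQL reaches the DB.
    return {k: encrypt(v) if k in encrypted_columns else v
            for k, v in row.items()}

def load_row(row, encrypted_columns):
    # Applied on the read path, so the application only sees plaintext.
    return {k: decrypt(v) if k in encrypted_columns else v
            for k, v in row.items()}
```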
Shadow databases and tables
During full-link stress testing, Apache ShardingSphere can automatically route data marked by the user to shadow databases (tables). The shadow database (table) feature turns online stress testing into a routine operation: users no longer need to worry about cleaning up stress-test data. This function is also incubating at high speed.
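Shadow routing boils down to one decision per marked row: send it to the shadow table and strip the marker so production rows never mix with test rows. A minimal sketch with invented names (t_order, is_stress_test):

```python
def route_insert(row, logical_table="t_order", shadow_column="is_stress_test"):
    # Rows carrying the stress-test flag go to the shadow table; the
    # flag itself is stripped before the row is stored.
    target = f"shadow_{logical_table}" if row.get(shadow_column) else logical_table
    stored = {k: v for k, v in row.items() if k != shadow_column}
    return target, stored
```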
4 Roadmap
As you can see, Apache ShardingSphere is developing rapidly, and more and more functions only loosely related to "database and table sharding" have been added to it. Yet these functions are not out of place; on the contrary, they help Apache ShardingSphere become a more diversified distributed database solution. Apache ShardingSphere's future roadmap will focus on the following points.
Pluggable platform
More and more scattered functions need further organization, and Apache ShardingSphere's existing architecture is not sufficient to absorb such a wide range of product features. A flexible, pluggable-function platform is the direction in which Apache ShardingSphere's architecture will evolve.
The pluggable platform fully decouples Apache ShardingSphere along both the technical and the functional dimension. The panorama of Apache ShardingSphere is as follows:
According to the technical architecture, Apache ShardingSphere is divided horizontally into four layers: the access layer, the SQL parsing layer, the kernel processing layer and the storage access layer; functions are integrated into this four-layer architecture in pluggable form.
Apache ShardingSphere's support for database types will be completely open. Besides relational databases, NoSQL will also be fully supported; database dialects will not affect one another and will be completely decoupled. Functionally, Apache ShardingSphere uses a stackable architecture model so that functions can be combined flexibly. Each functional module only needs to focus on its own core capability, while the Apache ShardingSphere architecture is responsible for stacking and combining functions. Even with no functions enabled, Apache ShardingSphere can be launched as a blank access endpoint, providing scaffolding, SQL parsing and other infrastructure for developers' custom development. A database integrated into the Apache ShardingSphere ecosystem directly gains all the basic capabilities the platform provides, and a function developed on the platform is directly supported by every database type already connected to it. Database types and function types thus combine multiplicatively, and this Lego-like assembly of infrastructure gives Apache ShardingSphere room for imagination and improvement.
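The stackable, pluggable idea can be sketched as a registry that composes independently registered features into a pipeline. The class and the "sql_rewrite" kind are invented for illustration; they are not ShardingSphere's SPI.

```python
class PluginRegistry:
    # Features register under a kind (e.g. "sql_rewrite"); the kernel
    # composes whichever are configured, in order, so capabilities are
    # superimposed without the modules knowing about each other.
    def __init__(self):
        self._plugins = {}

    def register(self, kind, name, fn):
        self._plugins.setdefault(kind, {})[name] = fn

    def pipeline(self, kind, names):
        fns = [self._plugins[kind][n] for n in names]
        def run(value):
            for fn in fns:       # stacked combination of features
                value = fn(value)
            return value
        return run
```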
Query engine
Currently, Apache ShardingSphere only distributes SQL to the appropriate databases through correct routing and rewriting: it makes full use of each database's own query capabilities, but it cannot effectively support complex correlated queries and subqueries. The SQL-on-KV query engine based on relational algebra is maturing along with the development of JDTX. By feeding the experience accumulated there back into the SQL query engine, Apache ShardingSphere will be able to better support complex queries such as subqueries and cross-database correlated queries.
Multiple data replicas
Apache ShardingSphere does not yet provide the multi-replica capability that a distributed database requires. In the future, it will provide Raft-based multi-replica write capability.
Database Mesh
The Sharding-Sidecar access endpoint mentioned above is Apache ShardingSphere's third, future access-endpoint form; it aims to cooperate better with Kubernetes to create a cloud-native database.
Database Mesh focuses on how to organically connect distributed data-access applications with their databases. Its concern is interaction: effectively organizing the otherwise chaotic interplay between applications and databases. With Database Mesh, applications and databases ultimately form a huge grid system; they need only take their places in the grid, and both are governed by the mesh layer.
Data Federation
As more database types are supported, Apache ShardingSphere will focus on a unified query engine over heterogeneous database types. In addition, it will cooperate with JDTX to bring more diverse data storage media into the same transaction.
5 Open Source and Community
Apache ShardingSphere was first open-sourced on GitHub on January 17, 2016, under the original name Sharding-JDBC. On November 10, 2018, the project was renamed ShardingSphere and officially entered the Apache Software Foundation incubator.
Over its four years of open source, Apache ShardingSphere's architecture model has continuously evolved while the functional scope of the overall product has expanded rapidly. It has gradually grown from the database-and-table sharding Java development framework it was at the beginning into a distributed database solution.
As the Apache ShardingSphere ecosystem has expanded, the days when a small number of developers controlled the project are long gone. Apache ShardingSphere now has nearly 100 contributors and nearly 20 core committers, who together build a community that follows the Apache Way. Apache ShardingSphere is a standard Apache Software Foundation open-source project, controlled neither by a commercial company nor by a few core developers.
At present, more than 100 companies have explicitly stated that they are using Apache ShardingSphere, and readers can find the company's user wall on the official website.
As the community matures, Apache ShardingSphere is growing ever faster. We sincerely invite interested developers to join the Apache ShardingSphere community and help improve its ever-expanding ecosystem.