Detailed explanation of database middleware


Introduction: this article explains database middleware in detail, covering the database splitting process and its challenges, the design schemes of mainstream database middleware, the core points of read-write separation, and the core points of sub-database and sub-table (sharding).

1. The database splitting process and its challenges

Internet companies generally split their databases in the following order: vertical splitting, read-write separation, then splitting into multiple databases and tables (horizontal splitting). Each step solves certain business problems, but also brings its own challenges.

1.1 Vertical split

For an Internet project that has just launched, the number of early active users and the concurrency are relatively small, so enterprises generally store all the data in a single database. For example, in an e-commerce system, the tables of both the user module and the product module initially live in one library.

Among them, the user and user_account tables belong to the user module, while the product_category and product tables belong to the product module.

At first, the company's technical team may be small and all the data sits in one database. As the business grows, the technical team expands and is divided into groups responsible for different business modules: for example, Group A is responsible for the user module and Group B for the product module. At this point the database undergoes its first split: the vertical split.

The vertical split here means taking a database that contains many tables and dividing it into several smaller databases according to the function of the tables, each library containing a subset of the tables. The figure below shows the original db library split into a db_user library and a db_product library.

Generally speaking, vertical splitting divides the tables of one library by business. There is another kind of vertical split, in which a large table with many fields is split into several smaller tables, each containing some of the fields; this is rarely encountered in actual development.

Another typical application scenario for vertical splitting is SOA transformation. Once the system is service-oriented, the underlying storage needs to be isolated along with the business. Vertical splitting makes the response time of a single user request longer: in a monolithic application, all business logic completes within one node, while after vertical splitting, RPC calls are usually required. Although the response time of a single request increases, the throughput of the whole system increases greatly.

1.2 Read-write separation

As the business keeps growing, the number of users and the concurrency increase, and relying on a single database instance to absorb all the access pressure becomes untenable. Take the product library as an example: it may contain tens of thousands of products, with only dozens added each day, yet it may be read hundreds of millions or even billions of times a day. The read pressure is far more than a single mysql instance can handle, so at this point most mysql DBAs configure read-write separation: one Master node (master library) with multiple Slave nodes (slave libraries), where each slave node holds what can be understood as a full replica of the master's data.

The master node receives the user's write requests and records them in its local binary log (binlog). Each slave establishes a connection to the master through an I/O thread and sends a binlog dump command; the master pushes binlog events to the slave, which saves them to its local relay log. Finally, another thread on the slave, the SQL thread, applies the relay log to synchronize the data into the slave library.

There are many details inside mysql master-slave replication. For example, the binlog format can be statement, row or mixed; binlog synchronization can be asynchronous, semi-synchronous or synchronous; and replication can be based either on binlog file + position or on GTID. Typically these are maintained by the DBA and the business developers are not aware of them.

With the DBA having configured mysql as a master-slave replication cluster, what the developer needs to do is: when updating data, write to the master library, which synchronizes the data to the slave libraries; when querying data, select one of the slave nodes to read from.

1.2.1 advantages of read-write separation

In this way, by configuring multiple slave nodes, we can effectively avoid the pressure caused by excessive traffic to a single library.

1.2.2 challenges of read-write separation

For DBA, there is a lot of cluster operation and maintenance work.

For example: cluster setup, master-slave switching, and expanding or shrinking the set of slave libraries. Suppose the master is configured with multiple slave nodes and one of them dies; subsequent read requests must then be forwarded to the slaves that are still working. Likewise, if a new slave node is added, the application should become aware of it so that read requests can be forwarded there too.

For developers, there are several challenges:

Basic read-write separation function: judge the type of each sql. Read requests such as select go to a slave library; write requests such as insert, update and delete go to the master library.

Master-slave synchronization delay: because data is synchronized from the master node to the slave nodes over the network, there is inevitably some delay, so it is possible to insert data on the master and not yet be able to read it from a slave. For business scenarios requiring strong consistency, where data must be readable immediately after insertion, we need a way to send read requests to the master library, whose data is always up to date.

Transaction problem: if a transaction contains both reads (such as select) and writes (such as insert), sending the reads to a slave and the writes to the master means the operations no longer form a local transaction; they fall into the category of distributed transactions, which are complex and inefficient. The mainstream approach for read-write separation is therefore to send all sql within a transaction to the master library; since only one library is involved, a local transaction suffices.

Perceiving cluster changes: the application must react when the cluster topology changes. After a master-slave switch, write traffic must move to the new master library; when slave libraries are added, read traffic must be able to reach them; and if a slave library lags badly or has a high failure rate, it should be isolated so that read traffic goes to the healthy slaves as far as possible.

1.3 Sub-database and sub-table (sharding)

The Master/Slave mode after vertical splitting can withstand very high concurrent access, but can you rest easy forever? No: once a business table holds a large amount of data, any CRUD operation becomes extremely expensive for the database from both a maintenance and a performance point of view. Even with indexes, performance still degrades as data volume grows, so at this point the mysql DBA should consider horizontal partitioning (sharding) the database, spreading the massive data maintained in one table across N sub-tables for storage and maintenance.

Horizontal splitting falls into three categories: splitting tables only, splitting databases only, and splitting both databases and tables. The following figure shows these three situations:

Splitting tables only:

Split the user table in the db library into two sub-tables, user_0 and user_1, which remain in the same library. Applicable scenario: only one or a few tables in the library hold too much data; only those tables need to be split, while the others stay unchanged.

Splitting databases only:

The db library is split into the db_0 and db_1 libraries, a new user table is created in each, and the data of the original db.user table is divided between db_0.user and db_1.user.

Splitting both databases and tables:

The db library is split into two libraries, db_0 and db_1. db_0 contains the two sub-tables user_0 and user_1, and db_1 contains user_2 and user_3. The following figure shows how the data is split in this case: assuming the user table in the db library holds 40 million rows, we split the db library into the two sub-libraries db_0 and db_1, and the user table into the four sub-tables user_0, user_1, user_2 and user_3, each storing 10 million rows.

1.3.1 benefits of sub-library and sub-table

If read-write separation achieves horizontal scaling of the database's read capacity, then sub-database and sub-table achieves horizontal scaling of its write capacity.

Horizontal expansion of storage capacity

In the case of read-write separation, the master and slaves in a cluster hold essentially the same data, so with massive data volumes, disk space constraints may make it impossible to store everything. In the case of sub-database and sub-table, we can build multiple mysql master-slave replication clusters, each storing only one shard of the data, achieving horizontal scaling of storage capacity.

Horizontal expansion of writing ability

In the case of read-write separation, each cluster has only one master and all write pressure concentrates on that node, which becomes the bottleneck of the whole system when write concurrency is high.

In the case of sub-database and sub-table, each shard belongs to a cluster with its own master node that can accept writes, achieving horizontal scaling of write capacity. In addition, each node holds less data, which reduces index overhead and the time spent in lock operations during writes, bringing further clear benefits.

1.3.2 challenges of sub-database and sub-table

The challenges of sub-database and sub-table fall mainly into four areas: basic CRUD functionality, distributed ids, distributed transactions, and dynamic expansion. They are described one by one below.

Challenge 1: basic CRUD functionality

As far as developers are concerned, even though the data is split across libraries and tables, they still want to operate the database as if it were a single library and table. Suppose we want to insert four user records in a batch, where the target table and library of each record is determined by its id field: record 1 goes to table user1, record 2 to user2, record 3 to user3, and record 4 to user0. The sql is as follows:

Insert into user (id,name) values (1, "tianshouzhi"), (2, "huhuamin"), (3, "wanghanao"), (4, "luyang")

Such a sql obviously cannot work as-is: we have split the library and the table, and this syntax can only operate on a single mysql library and table. It must therefore be rewritten into the 4 sql statements below, each executed against its own library.

Insert into user0 (id,name) values (4, "luyang")

Insert into user1 (id,name) values (1, "tianshouzhi")

Insert into user2 (id,name) values (2, "huhuamin")

Insert into user3 (id,name) values (3, "wanghanao")

The specific process can be described in the following figure:

The explanation is as follows:

Sql parsing: first parse the sql and extract the id values of the four records to be inserted: 1, 2, 3 and 4 respectively.

Sql routing: routing includes library routing and table routing. Library routing determines which library a record should go to, and table routing determines which table.

Sql rewriting: a record can only be inserted into one library, and the bulk-insert syntax above would insert all four records into every library, which is obviously wrong. The sql must therefore be rewritten so that each library receives only the records that belong to it.

Sql execution: one sql has been rewritten into several. For efficiency they should be executed concurrently against the different libraries rather than one by one in order.

Result set merging: each sql produces its own result, and the results from the sub-libraries and sub-tables must be merged into one complete result.
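
To make the routing and rewriting steps above concrete, here is a minimal sketch in Java, assuming the layout described earlier (2 sub-libraries, db_0 holding user_0/user_1 and db_1 holding user_2/user_3) and a simple id % 4 routing rule; the class and record names are illustrative, not any particular middleware's API:

import java.util.*;

public class BatchInsertRewriter {

    // Hypothetical parsed form of one row: (id, name) pairs extracted by the sql parser
    record UserRow(long id, String name) {}

    public static Map<String, List<String>> rewrite(List<UserRow> rows) {
        // target sub-library -> rewritten single-row inserts
        Map<String, List<String>> routed = new LinkedHashMap<>();
        for (UserRow row : rows) {
            int tableIndex = (int) (row.id() % 4);   // table route: user_0..user_3
            int dbIndex = tableIndex / 2;            // library route: db_0 holds user_0/1, db_1 holds user_2/3
            String sql = String.format(
                "insert into user_%d (id,name) values (%d,'%s')",
                tableIndex, row.id(), row.name());
            routed.computeIfAbsent("db_" + dbIndex, k -> new ArrayList<>()).add(sql);
        }
        return routed;
    }

    public static void main(String[] args) {
        List<UserRow> rows = List.of(
            new UserRow(1, "tianshouzhi"), new UserRow(2, "huhuamin"),
            new UserRow(3, "wanghanao"), new UserRow(4, "luyang"));
        rewrite(rows).forEach((db, sqls) -> sqls.forEach(s -> System.out.println(db + " -> " + s)));
    }
}

Running it prints one rewritten single-row insert per record, grouped by target library, matching the four sql statements above.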

Challenge 2: distributed id

After splitting into multiple databases and tables, we can no longer use mysql's auto-increment primary key, because the auto-increment ids generated by different libraries may collide when records are inserted. A global id generator is therefore needed. There are many schemes for distributed ids; one of them is twitter's snowflake algorithm.
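
For reference, here is a minimal sketch of the snowflake idea in Java, using the usual layout of a 41-bit timestamp, 10-bit worker id and 12-bit sequence (clock-rollback handling omitted):

public class Snowflake {
    private static final long EPOCH = 1672531200000L; // custom epoch, e.g. 2023-01-01
    private final long workerId;        // 10 bits: at most 1024 workers
    private long lastTimestamp = -1L;
    private long sequence = 0L;         // 12 bits: up to 4096 ids per millisecond per worker

    public Snowflake(long workerId) {
        if (workerId < 0 || workerId > 1023) throw new IllegalArgumentException("workerId out of range");
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;
            if (sequence == 0) {        // sequence exhausted within this millisecond: spin to the next one
                while ((ts = System.currentTimeMillis()) <= lastTimestamp) { /* busy wait */ }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = ts;
        // 41-bit timestamp | 10-bit worker | 12-bit sequence => trend-increasing 63-bit id
        return ((ts - EPOCH) << 22) | (workerId << 12) | sequence;
    }
}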

Challenge 3: distributed transactions

Distributed transactions are a hurdle that sub-database and sub-table cannot bypass, because a single operation may update data in multiple shards at the same time. In the batch insert above, for instance, records go to four different libraries: how do we ensure they all succeed or all fail together? mysql supports XA transactions, but they are inefficient. Flexible transactions are currently the mainstream scheme, including best-effort notification, reliable-message eventual consistency, and TCC two-phase commit. Whether XA or flexible, however, they are all very complex to implement.

Challenge 4: dynamic expansion

Dynamic expansion means increasing the number of sub-databases and sub-tables. For example, the user table above was split into four tables across two libraries; now we want to change the number of sub-libraries to 4 and the number of sub-tables to 8. Expansion is generally accompanied by data migration: with four tables, the record with id 7 routes by 7 % 4 = 3, so it lives on table user3. With eight tables, 7 % 8 = 7, so the lookup goes to user7, where the record does not exist; without migrating the data, the record can no longer be found. A solution that requires no data migration during dynamic expansion is introduced later in this article.

1.4 Summary

We have seen the benefits of read-write separation and of sub-database and sub-table, but also the great challenges they bring, which are difficult for business developers to handle themselves. This is why some companies build dedicated database middleware to shield business developers from the tedious underlying details: with such middleware, whether for read-write separation or for sub-database and sub-table, developers can work as if against a single database and table.

Next, we introduce the design and implementation of mainstream database middleware.

2 The design schemes of mainstream database middleware

The main role of database middleware is to shield application developers from the challenges of read-write separation and sub-database sub-table, hiding the underlying implementation details so that developers can operate the data as if it were a single database and table. Before introducing the mainstream design schemes, let's first review the architecture of an application in the single-library case.

In the single-database, single-table case, the application establishes connections to the database directly through a connection pool and performs its reads and writes. With read-write separation or sub-database and sub-table, the application must operate multiple database instances, and this is where database middleware comes in.

2.1 Design proposal

There are two typical designs for database middleware: proxy and smart-client. The following figure illustrates the two architectures:

As you can see, both proxy and smart-client operate multiple database instances underneath. Whether for sub-database and sub-table or for read-write separation, the complexity is shielded from business developers at the middleware layer.

2.1.1 proxy mode

An independently deployed proxy service manages multiple database instances behind it. The application establishes connections to the proxy through an ordinary data source (c3p0, druid, dbcp, etc.) and sends all sql statements to it; the proxy operates the underlying databases, gathers the results and returns them to the application. Under this scheme, the logic of sub-library, sub-table and read-write separation is completely transparent to developers.

Advantages:

(1) Multi-language support. Whether the application is written in php, java or another language, it is supported. Take mysql as an example: if the proxy implements mysql's wire protocol, it can be treated as a mysql server. The official mysql team provides clients for different languages, such as mysql-connector-java for java and mysql-connector-python for python, so developers in any language can use the corresponding official driver to communicate with the proxy.

(2) Transparent to business developers. Since the proxy can be regarded as a mysql server, business code can in theory switch to it with very few modifications.

Disadvantages:

(1) Complex to implement. The proxy must implement the wire protocol of the database it fronts, which is difficult. The proxy-style middleware we see in practice can usually proxy only one kind of database, such as mysql; there is almost no middleware that can proxy several databases (sqlserver, PostgreSQL, Oracle) at the same time.

(2) The proxy itself must be highly available. The application used to access the database directly and now accesses the proxy instead, so if the proxy goes down while the database is still healthy, the database becomes unreachable anyway, which would be embarrassing.

(3) Tenant isolation. Multiple applications may access the databases behind the same proxy and will inevitably compete for the proxy's own memory, network, cpu and other resources, so the proxy needs isolation capabilities.

2.1.2 smart-client mode

The business code must be modified to introduce an sdk that supports read-write separation or sub-database and sub-table; this is the smart-client. A smart-client is usually a layer wrapped around a connection pool or driver, and it establishes connections to each of the libraries. The sql generated by the application is handed to the smart-client, which processes it internally: choosing the slave or master library in the read-write-separation case, or parsing and rewriting the sql, routing it to the right sub-libraries and merging the results in the sub-database and sub-table case, before returning them to the application.

Advantages:

(1) Simple to implement. A proxy must implement the database's server-side protocol, whereas a smart-client does not even need to implement the client-side communication protocol, because most database vendors already provide drivers for different languages: mysql, for example, provides mysql-connector-java for java and mysql-connector-python for python, and the client protocol is handled at the driver level. Smart-client middleware therefore usually only needs to wrap a layer on top.

(2) Naturally decentralized. As an sdk, the smart-client is introduced directly by the application, deployed with each application node and connected directly to the databases, with no agent layer in between. Compared with a proxy, it competes for essentially nothing but network resources, and its own high availability is not an issue: as long as the application nodes are alive, the database can be reached. (The high availability of the database itself still has to be guaranteed, just as with a proxy.)

Disadvantages:

(1) Usually supports only one language. For example, tddl, zebra and sharding-jdbc are all developed in the Java language, so users of other languages cannot use them; supporting other languages means developing multi-language clients.

(2) Versions are difficult to upgrade. The data-source sdk is introduced as a jar dependency, so when many applications depend on a certain version and that version has a bug, every application must upgrade. A proxy, being a separately deployed service, is comparatively easy to upgrade: upgrade the proxy servers and every connected application benefits at once.

2.2 Industry products

Whether proxy or smart-client, the two achieve similar effects. Existing implementations of the two schemes and their respective trade-offs are listed below:

Proxy implementation

At present, the existing implementation schemes are:

Alibaba's open source cobar

DRDS on Alibaba Cloud

Mycat developed by mycat team on the basis of cobar

Mysql-proxy officially provided by mysql

Atlas developed by Qihoo 360 based on mysql-proxy (only supports sub-tables, not sub-libraries)

Dangdang's open source sharding-sphere

At present, apart from mycat and sharding-sphere, the other open source projects are basically unmaintained, and sharding-sphere recently entered the Apache Software Foundation incubator.

Smart-client implementation

The current implementation schemes are:

Alibaba's open source tddl, which has not been maintained for a long time.

Dianping's open source zebra. The open source version of the code had not been updated for a long time, but after Meituan-Dianping's listing a large number of new internal features were open-sourced, with plans for long-term maintenance.

Dangdang's open source sharding-jdbc, currently one of the better options, with relatively complete documentation. It entered the Apache incubator as part of sharding-sphere.

Ant Financial's zdal

And so on.

3 The core points of read-write separation

3.1 basic routing function

The basic routing function solves how read and write requests are routed in the read-write separation case. The process can usually be described by the following figure:

3.1.1 sql type judgment

The main purpose is to judge whether a sql is a read or a write: read sql goes to a slave library, write sql goes to the master library.

Write statements: insert, update, delete, create, alter, truncate...

Query statements: select, show, desc, explain...
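
A minimal sketch of this judgment in Java; a production middleware would decide from the parsed syntax tree rather than a string prefix, so treat this as illustrative only:

import java.util.Set;

public class SqlTypeJudge {
    // statement keywords that may be routed to a slave library
    private static final Set<String> READ_PREFIXES = Set.of("select", "show", "desc", "explain");

    // true => may go to a slave; false => must go to the master.
    // A real middleware decides from the AST, not a prefix, and must
    // also honor the force-master hints described in section 3.1.2.
    public static boolean isRead(String sql) {
        String firstWord = sql.trim().toLowerCase().split("\\s+", 2)[0];
        return READ_PREFIXES.contains(firstWord);
    }
}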

3.1.2 Forcing the master library

Sometimes, in scenarios requiring strong consistency, data must be readable immediately after it is written. Because of master-slave synchronization delay, data written to the master may not yet be readable from a slave, and in such cases we need the ability to force reads to go to the master library. There are two common implementations: hint and API.

A hint is a special marker that developers place in the sql itself. The database middleware recognizes the marker and knows the sql must go to the master library, for example:

/*master*/ select * from table_xx

Here /*master*/ is the hint, indicating that the master library must be used. Different database middleware define different master-forcing hints; zebra's, for example, is /*zebra:w+*/. Whatever its exact text, a hint is nothing more than a tag. Hints are written inside /*...*/ because that is standard sql comment syntax: even if the middleware does not recognize the hint, it causes no sql syntax error.

Api: the master-library flag is set in code instead. A hint applies only to a single sql; if we want several sql in a row to go to the master without adding a hint to each, we can use an api, which typically relies on the language's thread-local context, for example:

ForceMasterHelper.forceMaster();
// ... execute multiple sql
ForceMasterHelper.clear();

All sql executed within the scope of the api calls will go to the master library. The exact api and its usage are determined by the specific database middleware.
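
As an illustration, such an api is typically a thin wrapper around a thread-local flag. A minimal sketch, reusing the hypothetical ForceMasterHelper name from above:

public class ForceMasterHelper {
    private static final ThreadLocal<Boolean> FORCE_MASTER = ThreadLocal.withInitial(() -> Boolean.FALSE);

    public static void forceMaster() { FORCE_MASTER.set(Boolean.TRUE); }

    // checked by the routing layer before the normal read/write judgment
    public static boolean shouldGoMaster() { return FORCE_MASTER.get(); }

    // must be called in a finally block, otherwise the flag leaks to the
    // next request handled by the same pooled thread
    public static void clear() { FORCE_MASTER.remove(); }
}

Calling clear() in a finally block matters, because application threads are usually pooled and reused across requests.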

In particular, some special sql, such as select last_insert_id or select @@identity, must always go to the master library: they query the id of the most recently inserted record, which is only known on the master.

3.2 Slave routing policy

Usually a cluster has only one master but multiple slaves. When a request is judged to be a read, how do we decide which slave gets it?

Some simple selection strategies include:

Random selection (random)

Select by weight (weight)

Round-robin (rotating through the slaves in turn)

And so on.
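
A minimal sketch of the round-robin and weighted strategies in Java (names are illustrative; weights are assumed to be positive):

import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

public class SlaveSelector {
    record Slave(String address, int weight) {}

    private final List<Slave> slaves;
    private final AtomicLong counter = new AtomicLong();

    public SlaveSelector(List<Slave> slaves) { this.slaves = slaves; }

    // round-robin: each slave gets requests in turn
    public Slave roundRobin() {
        return slaves.get((int) (counter.getAndIncrement() % slaves.size()));
    }

    // weighted random: a slave with weight 2 gets roughly twice the traffic of weight 1
    public Slave weightedRandom() {
        int total = slaves.stream().mapToInt(Slave::weight).sum();
        int r = ThreadLocalRandom.current().nextInt(total);
        for (Slave s : slaves) {
            r -= s.weight();
            if (r < 0) return s;
        }
        return slaves.get(slaves.size() - 1); // not reached with positive weights
    }
}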

In particular, for database clusters deployed across IDCs (data centers), a nearest-first routing strategy is usually required, as shown in the following figure:

In the figure, the master is deployed in IDC2, one slave is deployed in each of IDC1 and IDC2, and the application app is deployed in IDC1. Clearly, when the app receives a query request, it should prefer slave1 in its own data center rather than query slave2 across data centers. This is the concept of nearest routing.

Of course, multiple slaves may be deployed within one data center and a choice still has to be made among them, so nearest routing is usually combined with the basic strategies above. Nearest routing also usually has levels, such as same machine room, same center, same region, and cross-region.

3.3 HA and scalability

Besides the read-write separation functions above, the database middleware, fronting the underlying database cluster, also needs to support high availability and dynamic scaling:

From the HA perspective: if the master library goes down, one of the slave libraries should be promoted to be the new master. The open source MHA can do this; however, MHA only handles master-slave switching when the master dies and is usually powerless when merely a slave dies, so MHA is often modified to support broader HA requirements.

From the scalability perspective: if the read qps is too high, slave libraries need to be added to share the read traffic.

In fact, from the middleware's point of view (whether proxy or smart-client), both HA and scaling amount to the same thing: the configuration information has changed.

Therefore, all configuration changes are usually written to a configuration center, and the database middleware listens for configuration changes. For a master-slave switch, only the latest master-slave information needs to be set in the configuration center; when slave libraries are added, their ip, port and other information are written there. The middleware watches this configuration and applies the latest values in real time whenever it changes.

A simplified high-availability architecture for database middleware therefore usually looks like this:

A monitoring service watches the cluster; when a change occurs it pushes the change to the configuration center, and the database middleware (proxy or smart-client) receives the change and applies the latest configuration. The whole process is essentially invisible to the business code.
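
A minimal sketch of the middleware side of this flow, assuming a hypothetical config-center client that invokes a callback on every push; ClusterConfig and the pool-rebuilding step are placeholders, not any real config center's API:

import java.util.List;

public class TopologyRefresher {
    public record ClusterConfig(String master, List<String> slaves) {}

    private volatile ClusterConfig current;

    // registered as a listener with the config center; invoked on every push
    public void onChange(ClusterConfig latest) {
        ClusterConfig old = this.current;
        this.current = latest;       // atomic reference swap: new requests see the new topology
        rebuildPools(old, latest);   // in-flight requests finish on the old pools
    }

    private void rebuildPools(ClusterConfig old, ClusterConfig latest) {
        // create connection pools for newly added nodes,
        // then drain and close the pools of removed nodes
    }

    public ClusterConfig current() { return current; }
}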

There are many choices of configuration center: Baidu's disconf, Ali's diamond, Dianping's open source lion, Ctrip's open source apollo, or general-purpose stores such as etcd and consul. If there is no historical baggage, Ctrip's open source apollo is usually recommended.

Note in particular that there is inevitably some delay between the monitoring service detecting a cluster change, pushing it to the configuration center, and the middleware applying it. For some scenarios, such as a master-slave switch, it is impossible to be completely transparent to the business. For multiple slave libraries, however, a single slave crash can be made invisible: if one slave fails, the middleware can automatically retry against another healthy slave.

In addition, the HA scheme above depends strongly on the configuration center. If many logical databases live on one database cluster, a change generates a large amount of configuration to push; likewise, if the cluster spans multiple data centers and one data center goes down as a whole (a cut fiber, or a machine-room failover drill), a large amount of configuration must be pushed at once. If the configuration center has push delays under such load, the business will notice very clearly.

Therefore, some lightweight HA protection is usually added on the client side as well: judge the severity of an exception from the sqlstate and vendor code returned by the database to decide whether the instance can still serve normally; if not, isolate it automatically and start an asynchronous thread to probe whether the instance has recovered.
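
A minimal sketch of such client-side isolation, treating connection-class SQLSTATE values (the "08" class) as instance failures and probing recovery asynchronously; openRawConnection is a hypothetical placeholder for a direct connection to the isolated slave:

import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;

public class SlaveHealth {
    private final AtomicBoolean isolated = new AtomicBoolean(false);
    private final ScheduledExecutorService probe = Executors.newSingleThreadScheduledExecutor();

    // called when a query against this slave fails; SQLSTATE class "08"
    // (connection exception) means the instance itself is unhealthy,
    // not merely a bad statement
    public void onError(SQLException e) {
        String state = e.getSQLState();
        if (state != null && state.startsWith("08") && isolated.compareAndSet(false, true)) {
            probe.scheduleWithFixedDelay(this::probeOnce, 5, 5, TimeUnit.SECONDS);
        }
    }

    private void probeOnce() {
        try (Connection c = openRawConnection()) {   // hypothetical: bypasses routing
            c.createStatement().execute("select 1");
            isolated.set(false);                      // healthy again: rejoin the routing list
            probe.shutdown();
        } catch (SQLException ignored) { /* stay isolated, probe again later */ }
    }

    public boolean isIsolated() { return isolated.get(); }

    private Connection openRawConnection() throws SQLException {
        throw new SQLException("placeholder: obtain a direct connection to this slave");
    }
}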

Finally, much database middleware also provides throttling and degradation functions: it computes a unique identity for each sql (some call it a sql fingerprint), and when a bad sql puts heavy pressure on the database, requests matching that fingerprint can be intercepted in real time and rejected with an exception before they reach the backend database.

4 The core points of sub-database and sub-table

From the business developer's point of view, whether the underlying data is split into libraries and tables should not matter: they still want to write sql as if against a single database instance, so the database middleware must shield all the underlying complexity.

The following figure shows how a batch insert sql into a sharded user table is performed inside the database middleware:

Database middleware mainly shields the following processes from applications:

Sql parsing: first parse the sql, get the abstract syntax tree, and get some key sql information from the syntax tree.

Sql routing: routing includes library routing and table routing. Library routing determines which sub-library a record should be operated on, and table routing determines which sub-table.

Sql rewriting: rewrite the sql into a correctly executable form. For a bulk insert of 4 records destined for different sub-tables, the sql must be rewritten into 4 statements, each inserting a single record into its own sub-table.

Sql execution: one sql may be rewritten into several. For efficiency they should be executed concurrently rather than one by one.

Result set merging: each sql produces its own result, and the results from the sub-libraries and sub-tables must be merged into one complete result.

4.1 SQL parsing

What the user executes is just a sql with its parameters; the database middleware must parse it with a sql parser. Sql parsing is comparable to xml parsing: xml parsing yields a document object, while sql parsing yields an abstract syntax tree (AST). From the syntax tree we can easily obtain key information about the sql: the type of statement being executed, which fields are queried, the table names, the where conditions, the sql parameters, and so on.

Generally speaking, sql parsing goes through two stages, lexical analysis and syntax analysis, finally producing the syntax tree.

The internal workings of the sql parser are shielded from business developers, who are unaware of it. Some database middleware uses third-party open source sql parsers, while others develop their own. For example, mycat and zebra both use the druid parser, and sharding-jdbc also used the druid parser at first before developing its own. Popular sql parsers include:

FoundationDB SQL Parser

Jsqlparser

Druid SQL Parser

Among them, the FoundationDB parser and jsqlparser are implemented on top of javacc.

The mycat team once ran a performance test in which the druid parser's parsing performance was typically 10 to 20 times that of the javacc-based sql parsers. I have run similar tests, with essentially the same conclusion.

How do different sql parsers compare? Two points matter most:

Parsing performance: druid is the best.

Druid uses predictive parsing: it completes lexical and syntax analysis in a single pass from the first character to the last, at the end of which the syntax tree is already built.

Database dialect: druid supports the most.

SQL-92, SQL-99 and so on are standards, but mysql, oracle, pg, sqlserver and odps each have their own dialects, and a sql parser needs special handling for each. Druid's sql parser is currently the parser with the most complete support for the various dialects.

Note: we are only discussing Java-based SQL parsers here, among which druid is the better choice. Most people know druid as a connection pool built with monitoring in mind, but its other major feature is its SQL parser, used by many open source database middleware projects such as zebra and sharding-jdbc (the latter later developed its own parser). Although SQL parsing is one of druid's highlights, it has also received many issues on github for SQL parsing bugs.
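
As a small illustration of what an AST gives you, here is how the druid parser can be used to pull out table names and where conditions; the class and method names below are from druid 1.x and may differ between versions:

import com.alibaba.druid.sql.SQLUtils;
import com.alibaba.druid.sql.ast.SQLStatement;
import com.alibaba.druid.sql.dialect.mysql.visitor.MySqlSchemaStatVisitor;
import com.alibaba.druid.util.JdbcConstants;

import java.util.List;

public class DruidParseDemo {
    public static void main(String[] args) {
        String sql = "select id, name from user where id = 1";
        // parse into an AST; the dialect constant selects the mysql parser
        List<SQLStatement> stmts = SQLUtils.parseStatements(sql, JdbcConstants.MYSQL);
        MySqlSchemaStatVisitor visitor = new MySqlSchemaStatVisitor();
        stmts.get(0).accept(visitor);
        System.out.println(visitor.getTables());      // table names touched by the statement
        System.out.println(visitor.getConditions());  // where conditions, e.g. user.id = 1
    }
}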

4.2 SQL routing

Routing rules are the foundation of sub-database and sub-table: they specify how data is routed to the different sub-libraries and sub-tables. Middleware usually lets users define arbitrary routing rules. A routing rule is essentially an expression that the middleware evaluates, often through a built-in script engine, to determine which sub-library and sub-table to operate on. Common routing rules include hashing on a field, routing by date, and so on.
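
A minimal sketch of the rule abstraction in Java, with one hash rule and one by-date rule; the interface is illustrative, not any specific middleware's API:

import java.time.LocalDate;

// the routing rule abstraction: given a sharding key, produce the target sub-table
public interface ShardRule<K> {
    String targetTable(K shardKey);
}

// hash routing on a numeric id, matching the 4-sub-table user example
class HashRule implements ShardRule<Long> {
    public String targetTable(Long id) { return "user_" + (id % 4); }
}

// date routing: one sub-table per month, e.g. order_202501
class MonthRule implements ShardRule<LocalDate> {
    public String targetTable(LocalDate d) {
        return String.format("order_%d%02d", d.getYear(), d.getMonthValue());
    }
}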

The following figure shows how the user table can be routed by id after being split into two databases with two tables each:

Routing is divided into:

Library rules: used to determine which sublibrary to go to

Table rules: used to determine which subtable to go to

In the example above, id is used to compute the target library and table, so the id field is called the routing field, or partition field (sharding key).

Note that whatever the statement type, INSERT, UPDATE, DELETE or SELECT, the sql should include this routing field. Otherwise: for an INSERT, there is no way to know which sub-library and sub-table to insert into; for UPDATE, DELETE and SELECT it is even worse, because not knowing which sub-table to operate on means operating on all of them. A SELECT would then aggregate the contents of every sub-table and can easily run out of memory, while an UPDATE or DELETE would touch every sub-table's records and can easily modify or delete data by mistake. For this reason, some database middleware imposes restrictions on sql, for example requiring UPDATE and DELETE to carry the partition field or explicit filter conditions.

4.3 SQL rewriting

As mentioned earlier, a batch insert statement whose records land in different sub-tables must be rewritten. Take the following SQL:

Insert into user (id,name) values (1, "tianshouzhi"), (2, "huhuamin"), (3, "wanghanao"), (4, "luyang")

It is rewritten as follows:

Insert into user_1 (id,name) values (1, "tianshouzhi")
Insert into user_2 (id,name) values (2, "huhuamin")
Insert into user_3 (id,name) values (3, "wanghanao")
Insert into user_0 (id,name) values (4, "luyang")

This is a simple case: rewriting for INSERT, UPDATE and DELETE is usually straightforward. SELECT rewriting is more complicated: for complex SELECT statements, the rewrite phase also applies optimizations such as converting subqueries to JOINs and pushing filter conditions down. Because SQL rewriting is complex, much database middleware does not support complex SQL (there is usually a documented list of supported SQL) and only covers simple OLTP scenarios.

Some database middleware, however, is not content with only supporting OLTP and pushes further toward OLAP. For example, Ali's TDDL, Ant's Zdal and Dianping's zebra introduce apache calcite to try to support complex query SQL (such as nested subqueries and JOINs), using filter-condition pushdown, streaming reads, and a combination of RBO (rule-based optimization) and CBO (cost-based optimization) to support some simple OLAP scenarios.

4.4 SQL execution

After the SQL rewriting phase, multiple SQL statements have been generated and need to be executed on different shards. Usually each SQL is wrapped into a task and submitted to a thread pool for concurrent execution to improve efficiency.

If any one of these SQL statements fails, the whole operation fails and an exception is returned to the business code.

4.5 result set merging

Result set merging is one of the major difficulties in database middleware. It requires case-by-case analysis, weighing implementation complexity against execution efficiency, and some complex SQL may simply not be supported. For example:

Query conditions: most middleware supports = and IN as query conditions usable on partition fields. NOT IN, BETWEEN...AND, LIKE, NOT LIKE and the like can only serve as ordinary filter conditions, because they cannot determine which sub-library or sub-table a record is in; the only option is a full scan of all sub-tables.

Aggregate functions: most middleware supports MAX, MIN, COUNT and SUM, but AVG may be only partially supported. Function nesting and GROUP BY aggregation may also be unsupported by some database middleware.

Subqueries: these divide into the FROM part and the WHERE part. Support for subqueries is mostly very limited: the syntax may be accepted but the partition field inside the subquery not recognized, or the subquery's table name must match the outer query's, or only one level of nesting is supported.

JOIN: supporting JOIN is usually very involved. Without filter-condition pushdown and streaming reads, JOIN can hardly be supported at the middleware level, because loading all the sub-tables of both sides into memory to join would exhaust memory long before finishing. There are, however, some tricks: one is the binding table, the other is small-table broadcast (see below).

Paging and sorting: middleware usually supports ORDER BY and LIMIT, but paging is inefficient under sub-database and sub-table. Take order by id limit 100, 10, meaning 10 records starting at offset 100 in id order. Most database middleware actually queries the first 110 (100 + 10) records from every sub-table, re-sorts them in memory, and then takes the 10 requested records. With 10 sub-tables, 1100 records are fetched to return just 10. For paging it is therefore generally recommended to query with "where id > ? limit 10", with the application remembering the largest id from each page; each sub-table then needs to fetch only 10 records starting from that id rather than offset + rows records.
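
A minimal sketch of the merge step for order by id limit 100, 10 in Java, assuming each shard has already returned its first offset + rows ids in sorted order:

import java.util.*;
import java.util.stream.*;

public class LimitMerge {
    // each shard already executed: select id from user_x order by id limit 0, 110
    public static List<Long> merge(List<List<Long>> perShardSortedIds, int offset, int rows) {
        return perShardSortedIds.stream()
                .flatMap(List::stream)
                .sorted()          // re-sort the union of the shard results
                .skip(offset)      // then apply the original offset...
                .limit(rows)       // ...and the row count
                .collect(Collectors.toList());
    }
}

This is exactly why deep offsets are expensive: offset + rows rows must be pulled from every shard before 10 of them survive the merge.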

Special notes on JOIN:

Binding Table:

This applies when two tables are related and share the same routing rule. For example, take the user table and the user_account table: because user_account is strongly related to user, we can give both tables exactly the same routing rule. Then for any particular user, the user and user_account sub-table records are guaranteed to sit in the same sub-library with the same table suffix, so a join within that sub-library already yields the user's complete user and account information; no cross-library join is needed, and nothing has to be pulled into memory to join.

Small table broadcast:

Small-table broadcast applies when one table is small, such as a department table department, while another is large, such as user: user must be split into sub-databases and sub-tables, but department need not be. To make JOINs work, a complete copy of the department table is kept synchronized in real time in every sub-library; the middleware then only needs to merge the per-sub-library JOIN results.

The following figure illustrates small-table broadcast: updates to the department table always go to the copy in sub-library db0, and a synchronization component propagates the changes to the other sub-libraries.

Note: the synchronization component in the figure disguises itself as a slave of the source library, parses the source library's binlog, and applies the changes to the target libraries. Open source components such as canal and puma can do this; of course, their application scenarios are much wider than just this one. The author has written a series of canal source-code analysis articles, most of which are complete.

4.6 Secondary index

Usually there is only one partition field when splitting databases and tables. For example, the user table is partitioned by the user_id field, and a user's information can then only be looked up with user_id as the partition field; querying by any other field requires scanning all the sub-tables, which is very inefficient. Yet there are real needs to query by other fields, for example by the mobile phone number phone_id.

In that case, we can take the full data, originally inserted and sharded by user_id, and through a synchronization component insert a copy into another sub-library/sub-table cluster sharded by phone_id. That cluster becomes a secondary index, a second dimension kept in sync. From then on, operations by user_id go to the original cluster, and operations by phone_id go to the secondary-index cluster.

Note that update operations can only be performed on the original cluster; the secondary-index cluster serves queries only. The original cluster's incremental changes are synchronized to the secondary-index cluster in real time by the synchronization component.

Note: this is a very common interview question, one that some of Ali's interviewers like to ask. Some candidates think of this scheme but dismiss it as a waste of resources; in fact, relative to meeting the business need, these resources are a small price.

4.7 distributed id generator

Under sub-database and sub-table, the database's auto-increment primary key can no longer be used, so a distributed id generator is required. It should satisfy the following conditions: uniqueness, a rising trend (which reduces index overhead when writing to the database), high performance, and high availability.

At present, the mainstream distributed id generation schemes depend on third-party components, for example:

Based on zk

Based on mysql

Based on cache

Twitter's snowflake algorithm is a completely decentralized distributed id algorithm, but the worker id is limited to at most 1024, i.e. the number of instances cannot exceed 1024. Minor adjustments are possible, but there is always a limit on the count.

In addition, Meituan has open-sourced a leaf component on github for generating distributed ids, which interested readers can study.

Here we propose a decentralized distributed id generation scheme that supports dynamic expansion. Besides guaranteeing uniqueness and a rising trend, having no third-party dependency, and supporting dynamic expansion of storage, it has the following extra advantages:

Queries by time range, or by time range + ip, can use the primary key index directly.

The maximum sequence id generated within a given second reflects the qps of that ip, and so on.

The id consists of: a 12-digit date + a 10-digit IP + a 6-digit sequence id + a 4-digit database extension field.

Where:

12-digit date: in the format yyMMddHHmmss, which means this id generation strategy can be used until the year 2099. Placing the time at the front guarantees the rising trend.

10-digit ip: the ip address converted to a decimal number, zero-padded to 10 digits. The ip address guarantees global uniqueness, and even if an ip address is recycled, the uniqueness of the id is not at risk, because the date part keeps changing.

6-digit sequence id: up to one million ids can be generated per second (000000 to 999999); values shorter than 6 digits are padded with leading zeros, e.g. 000123.

4-digit database extension field: to allow dynamic expansion without data migration, 2 digits identify the DB and 2 digits the TB, allowing expansion to at most 100 libraries of 100 tables each, i.e. 10,000 tables. If each table stores 10 million rows, 100 billion rows can be stored in total.
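
A minimal sketch of this generator in Java, under some simplifying assumptions: an IPv4 address is used for the 10-digit ip part, and the 6-digit sequence simply wraps at one million rather than resetting exactly at each new second:

import java.net.InetAddress;
import java.nio.ByteBuffer;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

public class ExpandableIdGenerator {
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyMMddHHmmss");
    private final AtomicLong sequence = new AtomicLong();
    private final int dbCount;   // current number of sub-libraries, e.g. 2
    private final int tbCount;   // sub-tables per library, e.g. 10

    public ExpandableIdGenerator(int dbCount, int tbCount) {
        this.dbCount = dbCount;
        this.tbCount = tbCount;
    }

    public String nextId() throws Exception {
        String time = LocalDateTime.now().format(TS);                  // 12-digit date
        long ip = ByteBuffer.wrap(InetAddress.getLocalHost().getAddress()).getInt() & 0xFFFFFFFFL;
        String ipPart = String.format("%010d", ip);                    // 10-digit decimal ip (IPv4 assumed)
        String seq = String.format("%06d", sequence.getAndIncrement() % 1_000_000); // 6-digit sequence
        int db = ThreadLocalRandom.current().nextInt(dbCount);         // 2-digit db extension digit
        int tb = ThreadLocalRandom.current().nextInt(tbCount);         // 2-digit tb extension digit
        return time + ipPart + seq + String.format("%02d%02d", db, tb);
    }
}

Routing then just reads the last four digits of the id. Expanding from 2 to 10 sub-libraries amounts to constructing the generator with dbCount = 10: ids produced from then on can carry the new db digits, while existing ids still route to their old locations.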

How the extension field enables dynamic expansion:

First, be clear that the routing policy always uses the last four digits of the id to decide which sub-table of which sub-library a record goes to. For example, an id ending in 0001 means the record must be in sub-table 01 of sub-library 00.

The trick, then, is in the id generation strategy.

Suppose the initial state is two sub-libraries, db_00 and db_01, each with 10 sub-tables, tb_00 to tb_09. The business then ensures that in generated ids, the two db digits always fall between 00 and 01 and the two tb digits between 00 and 09. With such ids, the routing policy always finds the correct sub-library and sub-table.

Now suppose we expand to 10 sub-libraries, each still with 10 sub-tables. The DBA first creates the new sub-libraries db_02 to db_09, each with 10 sub-tables tb_00 to tb_09. On this basis, the business changes the id generation strategy so that the two db digits fall between 00 and 09, with the tb rule unchanged (only the number of sub-libraries changed, not the number of sub-tables per library). Because routing is determined by the last four digits, ids carrying the new extension values naturally land in the new sub-libraries and sub-tables. Dynamic expansion is thus achieved with no data migration.

Of course, the new sub-libraries and sub-tables start out empty, so the data is uneven for a while. The generation strategy can skew the probability of the db and tb digits so that new sub-libraries and sub-tables are hit somewhat more often (not too much), then return to fully random once the data evens out.

The core idea of this scheme is to pre-allocate the maximum amount of resources that might ever be used. In general, 100 sub-libraries of 100 sub-tables each satisfies the data storage needs of most applications. If the 100 sub-libraries sit on separate mysql instances, each with, say, a 4T disk, then 400T of data can be stored, which meets most businesses' needs.

Of course, the scheme is not perfect: beyond that scale it may no longer be feasible. But a technical solution that holds for 5 to 10 years without an architecture change is usually a good one. If you are looking for a perfect solution, automatically scaling database products like TiDB may fit better, though such products cannot yet replace traditional relational databases; in 5 or 10 years they will be more mature, and it will not be too late to move then.

4.8 distributed transactions

Under sub-database and sub-table, operations span multiple sub-libraries and thus involve distributed transactions. For example, for a bulk insert SQL whose records go to different sub-libraries, consistency cannot be guaranteed. In general, therefore, database middleware guarantees only single-sub-library transactions: when the business side opens a transaction, it must ensure that all operations in that transaction ultimately execute within one sub-library.

In fact, under a microservice architecture, the transaction problem is even more complex.

Suppose Service A performs some operation that touches its own database and at the same time calls Service B and Service C; Service B's underlying database is split into sub-databases and sub-tables, and Service C also operates its own database.

Guaranteeing transactional consistency in this scenario is very troublesome. Common consensus algorithms such as the Paxos and Raft protocols cannot solve it, because they provide consistency at the resource level, whereas under a microservice architecture transactional consistency has been lifted to the business level.

If only sub-database and sub-table are considered, some may think of XA, but its performance is very poor, and it places requirements on the database version (for example, mysql 5.7 is required) and recommends the serializable transaction isolation level, which is hard to tolerate.

Because the distributed-transaction problem goes well beyond sub-database and sub-table, there is usually a dedicated team for it, not necessarily the database middleware team. For example, sharding-jdbc uses the saga component of ServiceComb, Huawei's open source microservice solution, to achieve eventual consistency for distributed transactions. Ali has a similar component, called TXC internally and GTS on Aliyun, recently open-sourced on GitHub as fescar (Fast & Easy Commit And Rollback). Ant Financial has a similar component called DTX, which supports FMT mode and TCC mode; the FMT pattern is similar to TXC.

Generally speaking, TCC meets business needs better, although integrating it is more complex. fescar has been quite popular recently; it is written in java, see: https://github.com/alibaba/fescar.
