Crossing the Gap in Database Development: Technical Trends in Distributed Databases

Updated 2025-07-01. Source: SLTechnology News&Howtos

The financial industry's need for architectural transformation

With the continued growth of mobile and Internet services, the business model and technology stack of China's financial industry have gradually diverged from those of the West. Mobile adoption in Europe and the United States is far lower than in China, and the population base differs by an order of magnitude. As a result, financial institutions in China and abroad face very different business types, data volumes and concurrency levels, and therefore very different requirements for their IT infrastructure.

In the past year or two, some leading Chinese banks have begun exploring microservices and distributed technology, and some new Internet finance businesses have adopted microservice architecture, distributed technology and DevOps practices for application development and operations. Some banks planning their next-generation core architecture are also introducing distributed architecture where appropriate, to cope with future business pressure and growing data volumes.

Compared with the new generation of distributed architecture, the traditional siloed ("chimney") architecture of middleware plus database has many problems in business applications that involve massive data, high concurrency and fast response times:

From the perspective of business units and systems, complex business lines produce a large number of scattered systems within the enterprise that are completely isolated from one another and cannot share data.

Systems lack flexible horizontal scalability. Performance bottlenecks are obvious, hardware limits are easily reached, and the business need for elastic expansion cannot be met.

Systems cannot respond quickly to large bursts of instantaneous requests, such as the explosive traffic spikes caused by Singles' Day promotions and flash sales.

Procurement and operations costs are high. Minicomputer hardware and software are purchased separately, which drives up the total cost of ownership.

Banks lack independent control and depend heavily on foreign vendors. When a serious problem occurs, the local support team often cannot resolve it quickly, which increases the risk to production operations.

The evolution of bank IT architecture

Over the past two decades, the IT architecture of Chinese banks has gone through several stages. The first generation of core banking systems in China was built on mainframes with a typical centralized architecture. With the introduction of SOA, some banks gradually began moving off the mainframe, migrating core business systems from mainframes or AS/400 systems down to UNIX minicomputers. Later, as virtualization technology matured, some banks and financial institutions introduced virtualization into their infrastructure and deployed development environments and some production applications on virtual machines.

Today, many banks have built big data platforms on distributed, PC-server-based architecture. Some applications built on microservice architecture encapsulate the business logic and combine it with back-end distributed storage and database technology to achieve an end-to-end distributed architecture.

Just as the technology departments of many banks went through a difficult transformation of their core systems from centralized architecture to SOA, the move from today's minicomputer systems to distributed architecture also poses major challenges, including the choice of technology stack, application development, and building a DevOps practice.

When application development moves from a traditional architecture to a distributed one, the first thing to address is the application framework. Today's microservice frameworks are mature, and a representative architecture often has four layers: protocol processing, service assembly, atomic services, and underlying persistence. Business logic is broken out of the traditional monolithic middleware into a large number of microservice modules. Each module consists of a set of identical containers, so a service's throughput can be increased simply by adding containers.

However, splitting into microservices means that each service has its own independent execution logic and storage, which poses a major challenge for the database layer. If each microservice still stores its data in a traditional single-node database, storage and processing capacity cannot scale along with the number of microservice containers. The database then becomes the biggest performance and scalability bottleneck in the microservice architecture.

If, on the other hand, each microservice stores its data in a separate database, the data architecture of the entire enterprise becomes fragmented. The number of databases grows from hundreds to tens of thousands, and both the management burden on the operations team and database procurement costs grow geometrically.

The goal of a distributed database is therefore not merely to act as a drop-in substitute for a traditional Oracle or DB2 instance by spreading data that cannot fit in one database across multiple physical machines. In practice, most banks have fairly mature data life cycle management strategies and do not accumulate large amounts of historical data in production, so data volume is generally not the most important reason for adopting a distributed database.

Distributed database architecture

The core value of a distributed database is to provide a flexible, scalable pool of data service resources for distributed applications, which can also be called a DBPaaS platform. Its main job is to provide an elastic, responsive and easily maintained database service platform for tens of thousands of microservices that come from different developers and differ in business type, SLA and security level, and data type. It must also support a range of data isolation and governance mechanisms, including high-availability configuration, disaster recovery policy definition, multi-tenancy, logical and physical isolation of business data, isolation of transactional and analytical workloads, and isolation of hot and cold data.

At some Internet companies with microservice architecture, a database operations team of just over 20 people supports hundreds of thousands of database instances. The key is a unified, enterprise-wide DBPaaS platform, which uses the fault self-healing and elastic scaling of the distributed database to greatly simplify database management for operations staff.

There are many distributed database products in the industry today, and they fall mainly into three architectures.

Application-level vertical splitting

Application-level vertical splitting is the most traditional approach to distribution. One way to implement it is to split the application into multiple independent sub-services, each of which owns part of the overall data. The other is to hold connections to multiple databases within one service and select the data source inside the application according to business rules. For example, the application may split data by user account ID: users with IDs from 1 to 1,000,000 are stored in database A, users with IDs from 1,000,001 to 2,000,000 in database B, and so on.

With rules preset in the application, each data access first looks up the target database instance in the rule base and then obtains a connection to it directly. This mechanism has three drawbacks. First, cross-database transactions are extremely difficult to implement. Second, the distribution logic is highly intrusive to business code, and a great deal of custom development is needed to complete even basic business logic. Third, each expansion requires a complete end-to-end review of the application logic, which can bring considerable risk and rework.
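
The account ID rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the database names, the third range and the `route` helper are hypothetical, and a real application would return a connection rather than a name.

```python
from bisect import bisect_left

# Hypothetical rule base: (inclusive upper bound of account ID, database name).
RANGE_RULES = [
    (1_000_000, "database_A"),
    (2_000_000, "database_B"),
    (3_000_000, "database_C"),
]
UPPER_BOUNDS = [upper for upper, _ in RANGE_RULES]


def route(account_id: int) -> str:
    """Return the name of the database instance that holds this account."""
    if account_id < 1:
        raise ValueError("account IDs start at 1")
    # Find the first range whose upper bound is >= account_id.
    index = bisect_left(UPPER_BOUNDS, account_id)
    if index == len(RANGE_RULES):
        raise LookupError(f"no database configured for account {account_id}")
    return RANGE_RULES[index][1]


if __name__ == "__main__":
    for account_id in (1, 1_000_000, 1_000_001, 2_500_000):
        print(account_id, "->", route(account_id))
```

Because the rule table lives inside the application, adding a fourth database means changing and redeploying application code, which is the intrusiveness the text describes.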

Middleware-based sharding (sub-database and sub-table)

As demand for distributed storage capacity grew, another approach gradually emerged in the industry: middleware-based sharding, also known as sub-database and sub-table. The idea is to place a SQL parsing service between the application and the databases. It parses a conventional SQL statement, translates it into the corresponding sub-queries for each underlying database, and sends those sub-queries directly to the underlying traditional databases for execution.

The advantage of this mechanism is that data can continue to be stored in traditional relational databases, while the upper layer encapsulates the application interface to some extent. From the perspective of the industry as a whole, however, middleware-based sharding can be regarded as a transitional stage between the traditional single-node database and the distributed database. Before new distributed databases based on PC servers became widespread, it let applications in urgent need of data splitting relieve the pressure of surging business and data volumes. Once native distributed databases mature and are proven, it will be hard for this approach to keep its advantage. In addition, the technology cannot be fully transparent to the application: SQL statements generally have to specify certain parameters or use special syntax.
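
The scatter-gather idea behind such middleware can be sketched as follows. This is a simplified illustration under stated assumptions, not how any particular product works: two in-memory SQLite databases stand in for the underlying traditional databases, the modulo sharding rule is hypothetical, and the SQL parsing step is skipped, with each function hard-coding the sub-queries a parser would otherwise derive.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the underlying traditional databases.
SHARDS = [sqlite3.connect(":memory:") for _ in range(2)]
for conn in SHARDS:
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")


def shard_for(account_id: int) -> sqlite3.Connection:
    """Hypothetical sharding rule: account ID modulo the number of shards."""
    return SHARDS[account_id % len(SHARDS)]


def insert_account(account_id: int, balance: int) -> None:
    """Single-shard write: the sharding key selects exactly one backend."""
    conn = shard_for(account_id)
    conn.execute(
        "INSERT INTO account (id, balance) VALUES (?, ?)", (account_id, balance)
    )
    conn.commit()


def total_balance() -> int:
    """Scatter-gather read: run a sub-query on every shard, then merge the partial sums."""
    partials = [
        conn.execute("SELECT COALESCE(SUM(balance), 0) FROM account").fetchone()[0]
        for conn in SHARDS
    ]
    return sum(partials)


if __name__ == "__main__":
    for account_id, balance in [(1, 100), (2, 250), (3, 50)]:
        insert_account(account_id, balance)
    print("total balance across shards:", total_balance())
```

A sum is easy to merge. Joins, ordering and multi-shard transactions are much harder to merge correctly, which is one reason this approach struggles to stay fully transparent to the application.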

Native distributed database

Unlike middleware-based sharding, a native distributed database is rebuilt from the underlying storage engine up, directly on PC servers, and is optimized for distributed storage and execution in many areas, such as data storage structure, data safety mechanisms and distributed transaction control.

A native distributed database is developed entirely from scratch. It abandons the minicomputer model and is designed around the hardware architecture of PC servers, so high availability, disaster recovery and distribution are built into every aspect of the data storage system. For example, some distributed database products can provide distributed storage and execution that are completely transparent to applications while remaining 100% compatible with MySQL. Developers do not need to care whether a table holds hundreds of millions or billions of records. As long as the capacity and maximum physical resource consumption policy are configured when the table is created, data is balanced automatically across multiple physical devices in the cluster. The application issues read and write requests just as it would against an ordinary table.
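
One common way to balance data automatically is to hash records into a fixed number of logical partitions and move whole partitions between nodes. The sketch below illustrates that idea only. The article does not say which scheme any product uses, and the partition count, node names and helper functions are all hypothetical.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 12  # hypothetical fixed number of logical partitions per table


def partition_of(key: str) -> int:
    """Map a record key to a logical partition with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def assign(nodes: list) -> dict:
    """Initial placement: spread the logical partitions evenly over the nodes."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def add_node(assignment: dict, new_node: str) -> dict:
    """Return a new placement in which new_node takes partitions from the busiest nodes."""
    if new_node in assignment.values():
        raise ValueError(f"{new_node} is already part of the cluster")
    result = dict(assignment)
    node_count = len(set(result.values())) + 1
    target = NUM_PARTITIONS // node_count  # fair share for the new node
    for _ in range(target):
        load = Counter(result.values())
        busiest = max(sorted(load), key=lambda node: load[node])
        victim = min(p for p, node in result.items() if node == busiest)
        result[victim] = new_node
    return result


def locate(key: str, assignment: dict) -> str:
    """Find the node that stores a key. The database does this, not the application."""
    return assignment[partition_of(key)]


if __name__ == "__main__":
    before = assign(["node1", "node2", "node3"])
    after = add_node(before, "node4")
    moved = [p for p in before if before[p] != after[p]]
    print("partitions moved to node4:", moved)
    print("load per node:", dict(Counter(after.values())))
```

Only the partitions handed to the new node move. Every other record stays where it is, so the cluster can grow without reshuffling the whole table.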

Technology trends in native distributed databases

To support the future microservice-based IT framework, the adoption of a distributed transactional database should be evaluated along two dimensions: compatibility with traditional technology and foresight toward new technology.

ACID support and SQL completeness are the two key indicators of whether a new distributed database is compatible with traditional database technology.

Support for ACID

From a safety standpoint, whether the technology is new or traditional, keeping data correct and never losing it is the basic requirement for any database. In the distributed database industry, some products designed for Internet workloads prioritize partition tolerance and availability and cannot guarantee consistency, so they are difficult to adopt widely in financial business. A new distributed database that banks would consider must therefore first guarantee data safety and consistency. Distributed transactions, distributed locks and support for the four isolation levels are the key technical points for this indicator.
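
To make the idea of a distributed transaction concrete, here is a minimal in-memory sketch of two-phase commit, one widely used protocol for atomic commits across nodes. The article does not name a specific protocol, so treating it as two-phase commit is an assumption. The `Participant` class and `transfer` function are hypothetical, and the sketch omits locking, durable logging, timeouts and crash recovery, all of which a real database needs.

```python
from typing import Optional


class Participant:
    """One database node taking part in a distributed transaction (in-memory stand-in)."""

    def __init__(self, name: str, balances: dict) -> None:
        self.name = name
        self.balances = dict(balances)
        self.pending: Optional[dict] = None  # staged changes awaiting a decision

    def prepare(self, changes: dict) -> bool:
        """Phase 1: vote yes only if every change keeps the balance non-negative."""
        for account, delta in changes.items():
            if self.balances.get(account, 0) + delta < 0:
                return False
        self.pending = dict(changes)
        return True

    def commit(self) -> None:
        """Phase 2 (commit): apply the staged changes."""
        for account, delta in (self.pending or {}).items():
            self.balances[account] = self.balances.get(account, 0) + delta
        self.pending = None

    def rollback(self) -> None:
        """Phase 2 (abort): discard the staged changes."""
        self.pending = None


def transfer(plan: list) -> bool:
    """Coordinator: commit on every participant only if all of them vote yes."""
    votes = [participant.prepare(changes) for participant, changes in plan]
    if all(votes):
        for participant, _ in plan:
            participant.commit()
        return True
    for participant, _ in plan:
        participant.rollback()
    return False


if __name__ == "__main__":
    node_a = Participant("A", {"alice": 100})
    node_b = Participant("B", {"bob": 20})
    ok = transfer([(node_a, {"alice": -70}), (node_b, {"bob": 70})])
    print("first transfer committed:", ok, node_a.balances, node_b.balances)
    ok = transfer([(node_a, {"alice": -70}), (node_b, {"bob": 70})])
    print("second transfer committed:", ok, node_a.balances, node_b.balances)
```

The second transfer is rejected by node A during the prepare phase, so node B discards its staged credit and neither balance changes. That all-or-nothing outcome is the atomicity that financial workloads depend on.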

SQL completeness support

SQL completeness refers to how developer-friendly a new distributed database is compared with a traditional relational database. The more mature a distributed database is, the more compatible its SQL syntax is with traditional relational databases, and the more transparent its data partitioning is to the application. Today most distributed databases claim to support MySQL syntax, and mainstream new applications also take MySQL as their default database. The strength of support for MySQL syntax and protocol is therefore the key to evaluating the SQL completeness of a distributed database.

Foresight toward new technology refers to whether the distributed database fits future development methods and IT architecture.

Distribution and elastic scalability

As a data service resource pool, a distributed database must scale elastically to keep up with the growing variety and number of microservices above it. At the same time, whether a microservice's data is stored on one physical device or on many must be completely transparent to the application code.

Multi-model engine

To serve upper-layer microservices that come from different developers and involve different business scenarios and data types, a distributed database must support multiple SQL protocols and computing engines. On the storage side, an application may use both structured and semi-structured data at the same time. The new generation of distributed databases therefore needs multi-model support from the access interface down to the storage structure.

HTAP (Hybrid Transactional/Analytical Processing)

HTAP is the ability to process a mix of transactional and analytical workloads. In a traditional bank IT architecture, online transaction systems and statistical analysis systems usually use different technologies and physical equipment, and online transaction data is migrated to the analysis system by regularly scheduled ETL jobs. In a data service resource pool, the same data may be shared and accessed by different types of microservices. When online transactions and audit services run against the same data at the same time, their requests must execute in completely isolated physical environments so that transactional and analytical workloads do not interfere with each other.
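
One way to picture this physical isolation is a router that keeps separate replica pools for the two workload types. The sketch below is an assumption-laden illustration, not a description of any product: the host names, the `Replica` record and the `HtapRouter` class are hypothetical, and keeping the replicas in sync is out of scope.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import cycle


class Workload(Enum):
    TRANSACTIONAL = "transactional"
    ANALYTICAL = "analytical"


@dataclass(frozen=True)
class Replica:
    host: str        # hypothetical host name
    role: Workload   # the only kind of request this replica serves


class HtapRouter:
    """Send each request to a replica reserved for its workload type."""

    def __init__(self, replicas: list) -> None:
        self._pools = {}
        for workload in Workload:
            members = [r for r in replicas if r.role is workload]
            if not members:
                raise ValueError(f"no replica reserved for {workload.value} requests")
            self._pools[workload] = cycle(members)

    def pick(self, workload: Workload) -> Replica:
        """Round-robin inside one pool, never across pools."""
        return next(self._pools[workload])


if __name__ == "__main__":
    router = HtapRouter([
        Replica("oltp-1", Workload.TRANSACTIONAL),
        Replica("olap-1", Workload.ANALYTICAL),
        Replica("olap-2", Workload.ANALYTICAL),
    ])
    print("payment goes to", router.pick(Workload.TRANSACTIONAL).host)
    print("audit report goes to", router.pick(Workload.ANALYTICAL).host)
```

Because a long-running audit query can only land on an analytical replica, it cannot compete for CPU or I/O with the replicas that serve online transactions.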

In short, trends in distributed database technology should be judged along the two dimensions of traditional technology compatibility and new technology foresight. ACID data safety and SQL completeness are the important indicators of the former, while elastic scalability, multi-model engines and HTAP are the important measures of the latter.

Application scenarios for distributed databases in finance

In today's financial industry, distributed databases are used in five areas: data warehouses, big data platforms, content management platforms, data middle platforms, and online transactions. Online use of distributed databases currently centers on three types of business scenario.

Online transaction system

The online transaction system is a critical production environment for banks. Some Chinese banks at the forefront of distributed technology have gradually moved their core business systems from IBM and Oracle mainframe and minicomputer architectures to distributed environments, so that clusters can scale elastically to absorb sudden business growth. Typical systems that use distributed databases include online loan cores, channel integration, and credit card points.

Data middle platform

Many enterprises now advocate an IT architecture that emphasizes a strong middle platform over the front end. As the key platform for enterprise data integration, the data middle platform aggregates data and links the flexible business requirements of the front end with the relatively fixed data models of the back end. For example, a bank can start with the goal of slimming down its production systems, beginning with querying and printing historical transaction statements, and gradually expand to near-real-time data services such as user profiles and asset views.

Content management platform

Traditional content management platforms were built mainly for after-the-fact supervision and audit, and front-end business did not use unstructured data directly. With the spread of self-service devices and mobile applications, more and more processes involve unstructured data directly. In many banks the content management platform is therefore moving from the back end to the front end, with a large number of customer-facing applications connected to it directly. Many account opening, credit and even self-service device processes depend heavily on the platform's real-time interaction capability, turning the content management system from an internal back-office audit tool into an external online service.

As we can see, distributed databases have long been widely used in banks for offline analysis scenarios. For online business, however, MPP data warehouses and big data platforms cannot meet the requirements for reliability, concurrency and response time.

Summary

Some banks that have studied distributed technology in depth have now begun pilot deployments of distributed databases. The core value of a distributed database is not only to spread data from a traditional database across multiple physical devices. It is also to provide an elastically scalable, multi-model data service platform (DBPaaS) for the future microservice development model, serving data that comes from different developers and differs in SLA level, high-availability and disaster recovery requirements, and business type.

Technology staff often ask: can distributed databases replace Oracle in the future? The answer is fairly intuitive. Distributed application frameworks and clusters of PC servers are the future direction of IT, and as microservices replace siloed software architecture, the database must move from a traditional "point" to a platform "plane". Every application has its own iteration cycle. Many applications already use open-source databases such as MySQL as their default, and scenarios that require Oracle will become fewer and fewer.

Distributed databases will therefore replace traditional single-node databases such as Oracle in the future. Bank technology departments should start planning for distributed database technology as early as possible, in order to adapt to the coming shift in bank IT architecture from siloed systems to microservices.
