Introduction: This article is based on Mr. Ma Jin's live talk on May 10, 2018 at the Ninth China Database Technology Conference (DTCC).
Ma Jin, NetEase DDB Project Manager
A member of the big data platform group at NetEase's Hangzhou Research Institute (Hangyan), he has worked on the distributed database DDB, the cache service NKV, NetEase Data Canal (NDC), and other projects since joining. He currently heads the DDB project and leads the development of NetEase's database middleware. He focuses on distributed system architecture and database technology, and is keen on building efficient, high-performance distributed backend applications.
Summary:
The distributed database DDB is the earliest distributed system developed at NetEase. Over the past decade it has provided stable, transparent sharding (database and table partitioning) services for NetEase's major Internet products. Four years ago we launched DDB on our private cloud, which greatly simplified everyday use and elastic scaling for both developers and operators. A year and a half ago we split DDB's online data migration module out into NDC, a platform service for data migration, synchronization, and subscription across heterogeneous databases. Today, as the network environments of NetEase's internal and external applications grow more complex and application scenarios multiply, DDB faces more requirements and challenges around ease of use, platformization, multi-IDC deployment, and multi-tenancy. This talk walks through the thinking and architectural changes as DDB evolves into a structured data center.
Text:
Brief introduction to the Development of DDB
DDB: a distributed database in one step
DDB is a sharded (database- and table-partitioned) database that manages thousands of data nodes and supports PB-scale structured data storage. The versions used within the company today are essentially all in query server mode, which separates the query service from the storage service, scales horizontally, and can sustain millions of QPS. DDB also supports online data migration, handles daily data growth in the GB-TB range, and offers standardized access protocols.
Online migration is essentially a core requirement for our distributed database. In our current sharding architecture it is still defined as a process that requires operator participation. Some in the industry say this gives our approach the flavor of a previous generation of databases: it is not standardized enough, and scaling the database up or down requires operations staff. I personally see this as a matter of design philosophy. Both issues can be solved well; in the end what matters is how low the cost of operation and use can be driven.
The chart on the left shows our database cost curve. As the data volume and access volume of a single-node database grow, cost rises in a very non-linear way: once a workload exceeds the capacity of the commodity machines we run online, the cost of scaling further up becomes very high. With a sharded distributed database we can scale storage and query capacity linearly, and as long as it meets our requirements we naturally get the ideal linear cost curve.
The development process of DDB
When the first version of DDB was launched in 2006, its functionality was relatively simple: basic SQL compatibility and partial management functions. In short, from V1 in 2006 to V3 in 2010 it was a driver-mode system, internally called DBI mode. In 2012, with the launch of our private cloud, we developed the query server, which supports multiple languages and provides rich SQL statistics inside the query server; at the same time we offered DDB as a PaaS service on the cloud management platform, with one-click deployment and monitoring. After 2017 we began developing DDB 5.0, gradually simplifying the architecture, modularizing related services, and further extending SQL compatibility.
DDB functional features: data channel
On the data channel side, first, we have a good balancing scheme: a two-level mapping function plus a hash function that supports application-defined configuration.
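As a rough illustration of two-level mapping, the sketch below hashes a partition key to a fixed bucket space and then maps buckets to data nodes. The class, the bucket count, and the node names are assumptions for illustration, not DDB's actual implementation.

```java
import java.util.List;

// A minimal two-level routing sketch: key -> bucket -> data node.
// Names and numbers here are illustrative assumptions, not DDB internals.
public class TwoLevelRouter {
    private static final int BUCKET_COUNT = 4096;   // fixed bucket space (assumption)
    private final List<String> dataNodes;           // e.g. ["db0", "db1", "db2", "db3"]
    private final int[] bucketToNode;               // second-level mapping, adjustable online

    public TwoLevelRouter(List<String> dataNodes) {
        this.dataNodes = dataNodes;
        this.bucketToNode = new int[BUCKET_COUNT];
        for (int b = 0; b < BUCKET_COUNT; b++) {
            bucketToNode[b] = b % dataNodes.size(); // initial even spread
        }
    }

    // Level 1: hash the partition key into a stable bucket.
    public int bucketOf(String partitionKey) {
        return Math.floorMod(partitionKey.hashCode(), BUCKET_COUNT);
    }

    // Level 2: look up the node that currently owns the bucket.
    public String nodeOf(String partitionKey) {
        return dataNodes.get(bucketToNode[bucketOf(partitionKey)]);
    }

    // Online migration only rewrites the bucket -> node table, not the key hash.
    public void moveBucket(int bucket, int targetNodeIndex) {
        bucketToNode[bucket] = targetNodeIndex;
    }
}
```

The point of the two levels is that rebalancing only touches the bucket-to-node table, so keys never need to be re-hashed when data moves.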
Second, we are relatively standardized: SQL-92 compatibility is around 90%, rising to 95% after 5.0. We also support globally unique auto-increment IDs, EXPLAIN plans, standard data import and export, and a query server that speaks the MySQL wire protocol.
Third, we support distributed transactions. Many experts in the industry complain that two-phase commit is unreliable, but I think ACID is essentially a matter of probability. Two-phase commit delivers ACID with a lower probability than a single-node transaction, but that does not mean it brings no improvement in reliability. We also make a point of discussing with applications which transactions in their business scenarios touch sensitive data. Our two-phase commit implementation is transparent to users and includes internal optimizations; not every transaction goes through two-phase commit. Finally, two-phase commit still has a performance bottleneck, and we emphasize this point specifically with applications.
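For readers unfamiliar with the protocol, the sketch below shows the basic shape of a two-phase commit coordinator over plain JDBC connections to several shards. It is only an illustration of the general technique, not DDB's internal implementation; it omits XA PREPARE, durable coordinator logging, recovery, and the optimizations mentioned above.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Bare-bones two-phase commit across several shard connections.
// A generic illustration of the protocol shape, not DDB's internal protocol.
public class TwoPhaseCommitSketch {

    public static void commitAcrossShards(List<Connection> shards,
                                          List<String> statements) throws SQLException {
        // Execute each shard's statement inside an open (uncommitted) transaction.
        for (int i = 0; i < shards.size(); i++) {
            Connection conn = shards.get(i);
            conn.setAutoCommit(false);
            try (Statement stmt = conn.createStatement()) {
                stmt.executeUpdate(statements.get(i));
            }
        }

        // Phase 1 (prepare): ask every participant whether it can commit.
        // Here a liveness check stands in for a real XA PREPARE.
        boolean allPrepared = true;
        for (Connection conn : shards) {
            if (!conn.isValid(2)) {
                allPrepared = false;
                break;
            }
        }

        // Phase 2: commit everywhere if all prepared, otherwise roll back everywhere.
        for (Connection conn : shards) {
            if (allPrepared) {
                conn.commit();
            } else {
                conn.rollback();
            }
        }
    }
}
```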
Fourth, we provide a hint feature, which enables read-write splitting inside DDB and allows custom SQL routing.
Fifth, we provide rich statistics at the query server level, including SQL pattern statistics, SQL frequency statistics, slow SQL statistics, and multi-dimensional QPS statistics.
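SQL pattern statistics typically group queries by a normalized form of the statement. The sketch below shows one common way to compute such a fingerprint by stripping literals with regular expressions; it is an assumption about the general technique, not DDB's actual statistics code.

```java
import java.util.regex.Pattern;

// Normalizes a SQL statement into a "pattern" so that queries differing only
// in literal values are counted together. Illustrative only.
public class SqlPatternFingerprint {
    private static final Pattern STRING_LITERAL = Pattern.compile("'[^']*'");
    private static final Pattern NUMBER_LITERAL = Pattern.compile("\\b\\d+\\b");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String fingerprint(String sql) {
        String s = sql.trim().toLowerCase();
        s = STRING_LITERAL.matcher(s).replaceAll("?");
        s = NUMBER_LITERAL.matcher(s).replaceAll("?");
        return WHITESPACE.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Both statements collapse to the same pattern:
        // "select * from user where id = ?"
        System.out.println(fingerprint("SELECT * FROM user WHERE id = 42"));
        System.out.println(fingerprint("SELECT * FROM user WHERE id = 7"));
    }
}
```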
DDB functional features: management channel
On the management side, note that we are compatible with MySQL management syntax: creating tables and other operations follow MySQL syntax closely, and user management is similar. Our online data migration also supports more scenario-based migrations: if an application did not choose a partition field when a table was created, the partition field can later be changed through the management tool and the underlying data will be redistributed. There are also extended functions such as scheduled tasks and alerts for hanging (in-doubt) transactions. For high availability, the query servers are hot standbys for each other, and the underlying databases use the classic master-slave architecture with an automatic failover mechanism.
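Because the query server speaks the MySQL wire protocol and accepts MySQL-style management syntax, ordinary JDBC DDL works against it. The endpoint, credentials, and table below are placeholders, and DDB-specific choices such as the partition field are made through the management tool rather than shown here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Issuing ordinary MySQL DDL through the query server over JDBC.
// Host, database, and table names are placeholders.
public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://query-server-host:3306/demo"; // placeholder endpoint
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(
                "CREATE TABLE user_order (" +
                " order_id BIGINT PRIMARY KEY," +
                " user_id BIGINT NOT NULL," +
                " amount DECIMAL(10,2)," +
                " created_at DATETIME" +
                ")");
        }
    }
}
```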
Core advantages of DDB
DDB's core advantages are, first, standardized access protocols and high SQL compatibility; second, distributed transactions that guarantee strong data consistency, together with mature graphical management tools and online scale-out and scale-in; and third, high availability of every component.
DDB V1-V3 Architecture: DBI Model
Changes in the DDB architecture over the past decade:
This is the classic V1-V3 architecture, in which sharding is implemented in the driver layer. Applications access DDB through the JDBC driver jar we provide, and SQL parsing and execution planning inside DDB happen in that driver layer. There is also a Master component responsible for metadata management and synchronization across the cluster.
But this architecture brings several problems. First, everything the DBI does, from SQL parsing to execution planning to final result-set merging, consumes the application's CPU. Second, DDB versions are hard to manage under this architecture. The third problem is the most serious, especially for high-volume applications: the DBI scales horizontally with the application layer, so when capacity runs short and more application servers are added, the number of DBIs grows too, and so does the number of connections to the underlying database nodes, which can overwhelm the databases.
This architecture is also quite classic, and many of you will notice it resembles Alibaba's TDDL. One important difference is that our cluster management is relatively independent: each cluster has its own Master. This has to do with our company's background. Taobao, for example, originally built middleware for Taobao's core business, so it only had to serve one business. NetEase is different: from the beginning there have been many different kinds of products, such as NetEase Kaola and NetEase Music; one is a typical e-commerce workload, the other a typical read-heavy, write-light workload. That calls for separate clusters.
DDB V4 architecture: QS mode
DDB V4 introduces the query server. This architecture deploys the QueryServer component separately, wrapping the DBI inside it and exposing the standard MySQL protocol to applications. Between the applications and the Query Servers we deploy an LVS/HAProxy + Keepalived combination for load balancing.
As shown in the figure, a request from the application goes through LVS and a QueryServer before reaching the database. This adds two more hops than the original architecture, and the path is a bit longer, but it solves the earlier DBI problems very well.
First, the number of Query Servers is no longer tied to the number of application instances. Second, Query Servers are deployed by the platform's operations staff, so we can track their versions and locate problems inside them.
In addition, we still reuse the original cluster management, including the management tools.
DDB V1-V4: summary
-DBI mode
Easy to deploy and saves machines
Friendly to Java applications
High cost of tracking and locating problems
Consumes more of the application's CPU resources
Long version upgrade cycles make new features hard to roll out
-QS mode
Multi-language support; integrates with any connection pool or DAO
Versions are easy to manage, and the cost of tracking and locating problems drops sharply
Slow-query and TopSQL statistics facilitate SQL optimization
Connection convergence and connection-pool circuit breaking improve availability
DDB V5: LBDriver
DDB V4 solves the problems of the earlier DBI mode, but some potential issues remain.
First, there is the link-length problem mentioned above. Second, load balancing with LVS at the connection level has drawbacks. When there are few active connections during off-peak hours, there is no guarantee they are spread evenly across the Query Servers; if the query servers are deployed on cloud instances, this imbalance can easily drive CPU utilization up sharply. In addition, with LVS load balancing, when the application cancels a statement or a query times out, it opens a temporary connection to execute a KILL QUERY instruction; that temporary connection may land on a different QueryServer. If that node has no matching connection id, the cancel or timeout simply does not take effect; if it does have a matching id, we end up killing the wrong connection.
We therefore provide a new balancing scheme on the application side: a LoadBalance driver (LBDriver) component sits between the connection pool and the driver. It wraps the JDBC driver, hands logical connections to the connection pool, and maps them onto physical connections to each QueryServer. While connections from the pool are in use, it balances load per SQL request, which solves the problem that LVS cannot balance by request. Second, when a query times out, it can obtain the real underlying physical connection and open the temporary connection against the same QueryServer to execute the KILL instruction.
Finally, LBDriver imposes no migration cost on the application: it is itself a JDBC driver, so switching to LBDriver only requires changing the driver class name and URL in the configuration.
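As a rough sketch of the idea (not DDB's actual driver; the class name, URL scheme, and round-robin policy are all hypothetical), a wrapper can hold one physical JDBC connection per QueryServer and pick one per statement:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// A toy "logical connection" that balances statements across several
// QueryServer endpoints. Class name, URLs, and policy are hypothetical.
public class LogicalConnectionSketch implements AutoCloseable {
    private final List<Connection> physical = new ArrayList<>();
    private final AtomicLong counter = new AtomicLong();

    public LogicalConnectionSketch(List<String> queryServerUrls,
                                   String user, String password) throws SQLException {
        for (String url : queryServerUrls) {
            // Each QueryServer speaks the MySQL protocol, so a stock MySQL
            // JDBC URL such as "jdbc:mysql://qs1:3306/db" would work here.
            physical.add(DriverManager.getConnection(url, user, password));
        }
    }

    // Round-robin per request: this is what connection-level LVS cannot do.
    private Connection pick() {
        int idx = (int) (counter.getAndIncrement() % physical.size());
        return physical.get(idx);
    }

    public ResultSet executeQuery(String sql) throws SQLException {
        Statement stmt = pick().createStatement();
        return stmt.executeQuery(sql);
    }

    // Because we know which physical connection ran the statement, a timeout
    // handler could issue its KILL QUERY against that same QueryServer.
    @Override
    public void close() throws SQLException {
        for (Connection c : physical) {
            c.close();
        }
    }
}
```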
DDB V5: removing the Master
This is the DDB V5 architecture. Besides load balancing through LBDriver, we also removed the Master component. The first reason is cost; the second is that the QueryServer is a server we fully control, so we can fold management functions into it. The Query Servers are hot standbys for each other, and after absorbing part of the Master's functions we elect one QueryServer as the leader to act as the management and control node, which also solves the high availability of the management functions.
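One common way to elect a single control node among peer servers is a lease row in a shared database updated with a conditional write; the sketch below illustrates that general approach under an assumed table and column layout, and is not how DDB actually elects its QueryServer leader.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Lease-based leader election via an assumed table:
//   CREATE TABLE leader_lease (id INT PRIMARY KEY, holder VARCHAR(64), expires_at BIGINT);
// One row with id = 1 is the lease. Purely illustrative.
public class LeaderLeaseSketch {
    private static final long LEASE_MILLIS = 10_000;

    // Returns true if this node now holds (or has renewed) the lease.
    public static boolean tryAcquire(Connection conn, String nodeId) throws SQLException {
        long now = System.currentTimeMillis();
        String sql = "UPDATE leader_lease SET holder = ?, expires_at = ? " +
                     "WHERE id = 1 AND (holder = ? OR expires_at < ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, nodeId);
            ps.setLong(2, now + LEASE_MILLIS);
            ps.setString(3, nodeId);
            ps.setLong(4, now);
            return ps.executeUpdate() == 1; // at most one winner per expiry window
        }
    }
}
```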
DDB V5: a tenant-oriented WEB management tool
DDB V5 also provides tenant-oriented web management tools. What we learned most clearly from running the private cloud PaaS service is that tenant-oriented management greatly reduces the burden on operations staff. The management tool also provides web-based SQL statistics, auditing, reporting, and dashboard services.
DDB V5: NDC service split
DDB V5 also splits out a service: we extracted DDB's former online data migration component into an independent service called NDC, short for NetEase Data Canal. It can perform DDB online data migration, and it can also pull MySQL binlogs independently to do more complex ETL, making our middleware product stack richer.
DDB V5: a solution for IDC
Finally, DDB V5 provides an IDC-oriented solution. The biggest improvement in DDB V5 is architectural simplicity. How do we judge whether a distributed system is a good one? Take Hadoop: our strongest impression of it is complexity, yet it offers a Standalone mode in which a single machine and a few simple commands can bring the system up. That is what we mean by "heavy yet light", and DDB aims for the same.
Our simplest deployment is a QueryServer that integrates sharding and the related management functions, so a distributed database can be stood up with just one QueryServer and the data nodes, with the full set of DDB features. The management tools and components described later are optional; DDB works without them.
DDB's web management tool issues management operations to the QueryServer control node through DDL, which can be seen as a richer upper layer. Notably, this management layer can be deployed across IaaS: a single cluster can span physical machines and cloud hosts. We provide pluggable interfaces that adapt to different IaaS layers, so the same interface can be used to operate DDB on different physical or cloud platforms.
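A pluggable IaaS layer usually boils down to a small adapter interface with one implementation per platform. The sketch below shows what such an interface might look like; the method and type names are invented for illustration rather than taken from DDB.

```java
import java.util.List;

// A hypothetical adapter interface the management layer could code against,
// with one implementation per IaaS (physical machines, private cloud, ...).
public interface IaasAdapter {
    // Provision a host (or claim a physical machine) for a DDB component.
    String allocateHost(String flavor);

    // Start a component such as "queryserver" or "mysqld" on a host.
    void startProcess(String hostId, String component, List<String> args);

    // Report basic health so the control node can drive failover.
    boolean isHostAlive(String hostId);

    // Release the host back to the platform.
    void releaseHost(String hostId);
}
```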
DDB V5: summary
-reduce various costs
Streamlined architecture: removing the Master reduces learning and deployment costs
LBDriver replaces LVS, improving usability and reducing cost
QS is compatible with MySQL DDL, a further step toward standardization
Tenant-oriented platform solution reduces deployment and operations costs
Platform-based NDC supports online data migration and synchronization
-from software packages to solutions
IDC-oriented solution, achieving cross-region synchronization through NDC
Cross-IaaS support, hybrid deployment of physical and cloud environments
Structured Data Center: DDB and NDC Planning and Outlook
This is our classic multi-IDC database and cache architecture. Within a single data center there are typically an application layer, a cache layer, and a database layer, and in the traditional architecture the application layer updates the cache after writing to the database. If we carry that model into a multi-IDC scenario, the cache in the other data center becomes hard to maintain, and data inconsistencies can result.
A better approach is to pull the binlog from each data center and let NDC's data subscription feature evict or update the cache. In this scenario NDC can be thought of as a cross-system trigger for DDB. This is a fairly mature scheme in the industry and is widely used at many Internet companies.
In this architecture DDB and NDC form an organic whole: if we regard NDC as DDB's trigger, the two can be used as one system.
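The sketch below illustrates the shape of such a subscriber: it consumes row-change events and evicts the corresponding cache keys. The event type and cache interface are stand-ins invented for illustration, not NDC's real API.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;

// A toy binlog subscriber that evicts cache entries on row changes.
// RowChangeEvent and CacheClient are stand-ins, not NDC's actual API.
public class CacheInvalidator implements Runnable {

    public record RowChangeEvent(String table, Map<String, Object> primaryKey) {}

    public interface CacheClient {
        void delete(String key);
    }

    private final BlockingQueue<RowChangeEvent> events;
    private final CacheClient cache;

    public CacheInvalidator(BlockingQueue<RowChangeEvent> events, CacheClient cache) {
        this.events = events;
        this.cache = cache;
    }

    @Override
    public void run() {
        try {
            while (true) {
                RowChangeEvent event = events.take();
                // Evict rather than update: the next read repopulates the cache
                // from the local database, keeping the remote IDC's cache consistent.
                cache.delete(event.table() + ":" + event.primaryKey());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```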
Architecture with NDC as the data bus
This is a more complex architecture. Besides synchronizing data between data centers, NDC can also perform online migration between heterogeneous databases within a data center, and data subscriptions within a data center can decouple downstream applications.
There is still a lot to mine in this complex architecture. For example, to synchronize data between different systems, each system first needs a cooperating management interface, user permissions across systems need to be connected, and operations staff have to assign NDC tasks. Many management barriers arise along the way, so let us briefly summarize the requirements:
-potential demand for ODMP
There are barriers to management between data systems.
Lack of unified resource division and authentication methods
Permissions do not carry across different systems, so they must be synchronized manually
There is no unified monitoring and deployment platform
The traditional ticket (work order) process easily leads to wasted resources
-ODMP (online data management platform)
Pluggable, extensible, easy to integrate, IDC-oriented data system management platform
Integrated tenant management, permission synchronization, resource pool management, alerting and monitoring, and automatic deployment
NDC-based data synchronization between heterogeneous systems and across IDCs (core implementation)
This is what we ultimately want to build. We have in fact implemented these functions internally, but they are independent of and different from one another. We hope to establish a unified standard and unified maintenance, which would greatly reduce operations costs.
Two questions
Along the way we have distilled two questions. First: why is it that, as application teams grow larger, the threshold for DevOps gets higher?
As an application develops, the team grows and the scale expands, the business scenarios become more complex, and there are more and more technology choices to make. Developers stay in this cycle for a long time, from research to testing, from development and deployment to final operations. It often takes a long time for a mature system to enter a more formal operations process.
Second: as middleware builders in this process, what can we do to reduce costs?
First, automate as much as possible and cut down the ticket process. Second, remove the platform operations role where we can: the developers themselves practice DevOps and face application development directly, so that intervention by operations staff can be avoided. Third, pay attention to connecting the various platforms and systems, which is also the essential problem ODMP addresses.
ODMP: hybrid console
The hybrid console is essentially a command-line tool for DDB, where you can choose which QueryServer node to connect to. Looking forward, we could also choose which system to access before choosing the node, perform all these operations in a single interface, and hide which specific system is being accessed. Many products in the industry are moving in this direction, and hybrid access may become the mainstream trend.
Summary
-A decade of DDB architectural change
V1-V3: driver mode, meeting the needs of Java applications
V4: proxy mode, with multi-language support and stronger availability and operations capabilities
V5: platform mode, removing LVS and the Master, with tenant- and IDC-oriented solutions
-ODMP outlook
Born from the platformization of DDB and NDC
Core value: break down the barriers between heterogeneous online data systems; unify deployment, monitoring, tenant authentication, and access; make DevOps feasible for large teams
Core implementation: an NDC-based data bus between heterogeneous systems and across IDCs