
Li Feifei, head of Alibaba Cloud's database division: the next generation of enterprise database systems

Updated 2025-04-09. Source: SLTechnology News&Howtos (Shulou.com)

Abstract: Many innovative database technologies are now being tried out in "no man's land". Whether they can take root and last in the enterprise is a major challenge for many companies. At the 2019 Data Technology Carnival, Li Feifei, vice president and senior researcher at Alibaba Group and head of the Alibaba Cloud Intelligence database division, shared his view of the next generation of enterprise database systems.

The following is compiled from the video and slides of Mr. Li Feifei's talk.


Li Feifei speaking at the 2019 Data Technology Carnival

Databases: a key link for cloud applications

Moving to the cloud has become a trend, and the database is considered one of the most important parts of that move. Cloud providers initially offered IaaS; with the rise of all kinds of intelligent applications, the database has become a key link connecting IaaS to those applications.

Alibaba Cloud Database: the leading cloud database in China

Alibaba Cloud is currently the number one cloud database service provider in China and in the Asia-Pacific region, and the third worldwide. Its core products cover both OLTP and OLAP: POLARDB is the OLTP database, and AnalyticDB is the next-generation data warehouse for real-time analysis. Alibaba also has the DAMO Academy Database Lab, which is mainly responsible for cutting-edge exploration in the database field, such as full-link encrypted databases and intelligent databases. Alibaba Cloud offers a rich range of cloud database products, from public cloud to proprietary cloud to the database all-in-one machine launched this year, and can support database products and systems in all of these forms.

So what should the technology and products of a future-oriented enterprise database look like? Let's take a look.

Database system evolution

Database systems have evolved from the earliest relational OLTP databases to semi-structured and unstructured databases, then to analytical OLAP databases, and now to today's multi-model databases.

Multi-model database systems

Database technology faces many challenges today. One of them is the multi-model database system, for which the industry already has many similar products. Multi-model comes in two forms, southbound and northbound. Southbound multi-model means there are several kinds of storage underneath, and a unified query language is used to query data from the different sources; this is essentially the data lake concept. Northbound multi-model means the data is stored in a single model, such as KV, but the system exposes several query interfaces, such as graph and document.
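
The northbound form can be illustrated with a minimal sketch. It assumes a toy in-memory key-value store and a document-style interface layered on top of it; the class names `KVStore` and `DocumentAPI` are hypothetical and do not correspond to any Alibaba Cloud product API.

```python
import json


class KVStore:
    """The single underlying storage model: string keys mapped to string values."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def scan(self, prefix):
        # Return all (key, value) pairs whose key starts with the prefix, in key order.
        return [(k, v) for k, v in sorted(self._data.items()) if k.startswith(prefix)]


class DocumentAPI:
    """A document-style query interface built on the same KV store (hypothetical)."""

    def __init__(self, store, collection):
        self._store = store
        self._prefix = collection + "/"

    def insert(self, doc_id, doc):
        # Documents are serialized to JSON and stored as ordinary KV values.
        self._store.put(self._prefix + doc_id, json.dumps(doc))

    def find(self, **conditions):
        # Field-equality query, evaluated by scanning the collection's key range.
        matches = []
        for _key, raw in self._store.scan(self._prefix):
            doc = json.loads(raw)
            if all(doc.get(field) == value for field, value in conditions.items()):
                matches.append(doc)
        return matches


store = KVStore()
users = DocumentAPI(store, "users")
users.insert("1", {"name": "Ann", "city": "Hangzhou"})
users.insert("2", {"name": "Bo", "city": "Beijing"})

hangzhou_users = users.find(city="Hangzhou")  # document interface
raw_value = store.get("users/1")              # plain KV interface, same data
```

Both interfaces read the same stored bytes; only the query surface differs, which is the point of the northbound form.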

Database intelligence + automated management and control platform

As database technology has developed, a database can be thought of as a car: it has many parameters, it travels on different roads, and there are many other cars around it. How it coordinates with them, and how the strengths of its engine are brought out, closely resembles the problem of a self-driving car, hence the name "Self-Driving Database Platform". On top of this, Alibaba hopes to achieve self-awareness, self-decision, self-recovery, and self-optimization, which is also the basic definition of the next generation of intelligent databases.

New hardware: integrated hardware and software design

The next generation of enterprise databases must be designed with software and hardware integrated rather than treated separately. Only by combining the two can the advantages of the database system be brought into full play. For example, NVM will greatly affect and change the protection mechanisms of traditional databases, and it will also significantly change how memory is used and managed. RDMA is another example: it makes access to data on remote nodes very fast, which is the basis of storage-compute separation.

Cloud native architecture: elasticity x high availability x enterprise practice x open ecosystem

Databases in the industry today can be roughly divided into three categories. The first, on the far left of the figure, is the single-node database, where the DB box represents CPU plus memory and can be regarded as the compute node. In a single-node database, compute and storage are tightly coupled. Traditional single-node MySQL and PG, and commercial databases such as Oracle and SQL Server, are all based on this architecture. Its advantages are simple development and easy deployment; its disadvantages are poor scalability and weak high availability. On the far right is the distributed architecture, in which data is sharded and stored on different nodes, with many single-node databases underneath. Its strength is horizontal scalability: when data volume and concurrency grow, you only need to add nodes. Its drawback is that, unless the upper-layer business logic is changed, the database itself must handle distributed transactions and distributed queries.

The logic of cloud native architecture is that resources in the cloud are effectively "inexhaustible": as long as customers need them and are willing to pay, capacity can in theory be expanded without limit. The main requirement in the cloud is therefore elasticity: resources are available whenever they are needed and can be released when they are not. It is like tap water at home: turn on the faucet when you need water and turn it off when you don't. A traditional on-premises database is more like a reservoir: the allocated resources, such as servers, are fixed, so you estimate the water level in advance and store enough water. In the cloud, the aim is to give users elastic usage through the separation of storage and compute: the distributed storage nodes are connected over the network so that accessing a remote node is as fast as accessing a local one, and users do not notice the difference.

Next-generation enterprise database: cloud native + distributed

The next generation of enterprise database architecture should combine cloud native and distributed architectures. Each shard underneath is a cloud native database with storage and compute separated, so each shard is highly elastic and can handle high concurrency. For the same demand, the number of shards required is therefore greatly reduced. Alibaba Cloud's POLARDB distributed database combines cloud native capabilities with distributed capabilities in this way.
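
The effect on shard count is simple arithmetic, sketched below. All three numbers are hypothetical and chosen only for illustration; none of them comes from the talk.

```python
import math


def shards_needed(peak_qps, qps_per_shard):
    """Smallest number of shards whose combined capacity covers the peak."""
    return math.ceil(peak_qps / qps_per_shard)


# Hypothetical figures, for illustration only.
peak_qps = 12_000_000               # assumed peak demand
single_node_shard_qps = 100_000     # assumed capacity of a single-node shard
cloud_native_shard_qps = 1_000_000  # assumed capacity of a cloud native shard

traditional_shards = shards_needed(peak_qps, single_node_shard_qps)    # 120
cloud_native_shards = shards_needed(peak_qps, cloud_native_shard_qps)  # 12
```

Fewer shards also means fewer places a single query can land, which matters for the cross-shard transactions and queries that a distributed architecture has to handle.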

Alibaba Cloud database technology and products form a complete ecosystem

Alibaba Cloud Database not only provides services on the cloud but also supports all internal activities of the Alibaba Group economy. On Double 11 in 2018, in the first second after midnight, the peak load on Alibaba's database systems surged about 122-fold almost instantly, which requires the databases to be highly scalable, elastic, and highly available.

An open ecosystem and support for the open source community

Alibaba has also done a lot of work in the open source community. Its contributions to the MySQL community have been highly recognized and have won several community awards. It has also done a lot of work on PostgreSQL, such as optimizing traditional statistical queries so that OLTP and OLAP can be served together as a mixed workload.

Cloud native database: POLARDB

Next, we focus on the ideas and breakthroughs behind the core database technology Alibaba has developed in-house, that is, how the cloud native and distributed architectures described above are combined. First is Alibaba's self-developed POLARDB database. The upper layer of its architecture consists of many compute nodes; below them is distributed shared storage, and the two are connected over an RDMA network. The POLARDB version currently online in the public cloud supports up to 16 compute nodes, and POLARDB 1.0 supports one writer and multiple readers. There are many customer cases, not only in China but also in Southeast Asia and elsewhere, and customers have found through extensive testing and trials that the performance and stability of POLARDB are excellent, which indicates that the database is reliable. With databases, and OLTP systems in particular, customers are staking their core business on the system, so this reliability makes the POLARDB architecture very competitive. Currently, a single POLARDB instance can reach 100 TB of storage, and a single node can reach 1 million QPS.

POLARDB architecture details

Specifically, high availability is implemented in the shared storage layer. Data is split into blocks called Data Chunks, each chunk is kept in three replicas, and the distributed shared storage uses the Parallel Raft protocol for high availability, so users do not have to worry about data loss. In addition, multi-writer, multi-reader operation and cross-zone high availability are planned for POLARDB 2.0. In front of POLARDB sits a Smart Proxy, which is responsible for load balancing, read-write splitting, and so on.
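
Why three replicas tolerate the loss of one can be shown with a minimal sketch of majority acknowledgement. This is not the Parallel Raft protocol, which also covers leader election, log ordering, and recovery; it only illustrates the majority rule that Raft-style protocols rely on. The `Replica` class and `replicated_write` function are hypothetical.

```python
class Replica:
    """One copy of the chunk store; `healthy` simulates node availability."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.chunks = {}

    def write(self, chunk_id, data):
        if not self.healthy:
            return False  # an unavailable replica cannot acknowledge
        self.chunks[chunk_id] = data
        return True


def replicated_write(replicas, chunk_id, data):
    """Report success only if a majority of replicas acknowledged the write.

    A real protocol would also roll back or repair the minority copies;
    that part is omitted here.
    """
    acks = sum(1 for replica in replicas if replica.write(chunk_id, data))
    majority = len(replicas) // 2 + 1
    return acks >= majority


replicas = [Replica("a"), Replica("b"), Replica("c", healthy=False)]
ok_with_one_failure = replicated_write(replicas, "chunk-1", b"page")   # 2 of 3 acks

replicas[1].healthy = False
ok_with_two_failures = replicated_write(replicas, "chunk-2", b"page")  # 1 of 3 acks
```

With three copies, the majority is two, so a write still commits when one replica is down but not when two are.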

POLARDB parallel queries: 27x performance improvement

In addition to architectural optimizations, POLARDB also makes many optimizations in the database kernel. One of them is parallel query. Traditional databases such as Oracle and MySQL typically execute a query on a single thread by default. Today almost all CPUs are multi-core, and making full use of that hardware requires a database engine with good parallel query capability. The Alibaba Cloud database team has therefore done a lot of work here, rewriting the SQL parser, optimizer, and execution engine to support multi-threaded parallel execution. This significantly speeds up typical queries such as Group By, with an average performance improvement of 27 times.
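
The general shape of a parallel Group By can be sketched as partial aggregation followed by a merge. This is an illustrative sketch only, not POLARDB's execution engine; the function names are hypothetical, and in CPython the threads show the structure of the plan rather than a real CPU speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_group_count(rows):
    """Aggregate one slice of the table: count rows per category."""
    counts = Counter()
    for category, _amount in rows:
        counts[category] += 1
    return counts


def parallel_group_count(rows, workers=4):
    """Equivalent of SELECT category, COUNT(*) ... GROUP BY category."""
    # Split the input into roughly equal slices, one per worker.
    chunk = max(1, (len(rows) + workers - 1) // workers)
    slices = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]

    # Each worker aggregates its own slice independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_group_count, slices))

    # Merge step: combine the partial counts into the final result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


rows = [("book", 10), ("toy", 5), ("book", 7), ("food", 3), ("toy", 1), ("book", 2)]
result = parallel_group_count(rows, workers=3)
```

Counting works here because it can be merged by addition; aggregates such as averages need each worker to return a sum and a count so that the merge step can combine them correctly.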

Take the cloud database home: POLARDB Box, a high-performance all-in-one machine

At the end of September this year, the POLARDB Box high-performance all-in-one machine was officially released. By October there were 10 real customer deployments under way, two of which had been formally signed. POLARDB Box has the following characteristics. It is highly compatible with Oracle: it cannot be called 100% compatible, but Alibaba Cloud has optimized more than 1,000 Oracle compatibility items. One box can support more than 1,000 virtual CPUs, 9 TB of memory, and 120 TB of flash storage. It also supports parallel query optimization and includes a powerful spatiotemporal data query engine.

In addition, the ecosystem formed by Alibaba Cloud's database migration tool ADAM, DTS, and AnalyticDB makes it easy for customers to bring the cloud database home. What distinguishes the POLARDB Box all-in-one machine from a traditional one? The core difference is that its management and control platform is the same one used by Alibaba Cloud's public cloud, so management of the box is connected with the public cloud. Users who have not yet decided whether to move to the cloud can use the all-in-one machine to get cloud database performance in their own IDC. Moving to the cloud later then becomes a seamless process, and a hybrid cloud mode is also possible, with part of the data on the cloud and part off it. Because management on and off the cloud is connected, users get a seamless management experience.

POLARDB-X: distributed version supports horizontal scaling

The POLARDB-X distributed version combines distributed and cloud native capabilities. In Alibaba Group's Double 11 scenario, for example, storage-compute separation and elasticity alone are not enough; data must also be sharded across databases and tables, otherwise the system cannot sustain the instantaneous 122x peak traffic. POLARDB-X adds cloud native capabilities on top of database and table sharding, which reduces the probability of cross-database queries.
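
Sharding across databases and tables comes down to a routing function from a shard key to a physical database and table. The sketch below assumes hash-based routing with a hypothetical naming scheme (`orders_db_N`, `orders_N`) and hypothetical counts; it is not POLARDB-X's actual routing rule.

```python
import zlib


def route(shard_key, db_count=4, tables_per_db=8):
    """Map a shard key to a (database, table) pair using a stable hash."""
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    slot = zlib.crc32(str(shard_key).encode("utf-8")) % (db_count * tables_per_db)
    db_index, table_index = divmod(slot, tables_per_db)
    return f"orders_db_{db_index}", f"orders_{table_index}"


def is_cross_database(shard_keys, db_count=4, tables_per_db=8):
    """True if a query touching these keys would span more than one database."""
    databases = {route(key, db_count, tables_per_db)[0] for key in shard_keys}
    return len(databases) > 1


db_name, table_name = route("user-1001")
same_key_query = is_cross_database(["user-1001", "user-1001"])
single_db_query = is_cross_database(["user-1001", "user-2002"], db_count=1)
```

With `db_count=1` no query can be cross-database, which is the limiting case of the point made above: the fewer databases the data has to be spread over, the less often a query spans several of them.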

Intelligent OLAP: AnalyticDB, a real-time interactive data warehouse

Besides POLARDB, Alibaba Cloud also offers AnalyticDB, a real-time interactive intelligent data warehouse for OLAP, which supports massive data processing and analysis, vector analysis, and more.

Data analysis involves a large amount of unstructured data: today, 80% of the data generated every day is unstructured, such as photos and videos.

So how can unstructured data be analyzed seamlessly and interactively within one system? The Alibaba Cloud database team built a vector processing engine: unstructured data is first converted into vectors and then processed in a high-dimensional vector space. The engine is implemented in AnalyticDB, so that structured and unstructured data can be combined in vector space.
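
Combining the two kinds of data can be sketched as a structured filter followed by a vector similarity ranking. The example assumes the embeddings already exist (the three-dimensional vectors are made up) and uses a brute-force scan; it is not AnalyticDB's engine or query syntax, and a production system would normally use an approximate index instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Each row mixes a structured column (category) with a vector derived from
# unstructured data such as a product photo. The vectors are invented.
items = [
    {"id": 1, "category": "shoes", "embedding": [1.0, 0.0, 0.0]},
    {"id": 2, "category": "shoes", "embedding": [0.6, 0.8, 0.0]},
    {"id": 3, "category": "bags", "embedding": [1.0, 0.0, 0.0]},
]


def search(query_vector, category, top_k=2):
    """Filter on the structured column, then rank by vector similarity."""
    candidates = [item for item in items if item["category"] == category]
    ranked = sorted(
        candidates,
        key=lambda item: cosine_similarity(query_vector, item["embedding"]),
        reverse=True,
    )
    return [item["id"] for item in ranked[:top_k]]


top_ids = search([1.0, 0.0, 0.0], "shoes")
```

Item 3 has the same vector as the query but is excluded by the structured filter, so the result contains only items 1 and 2, ordered by similarity.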

AI for DB (DAS): intelligent database management and kernel

Beyond architectural and technological breakthroughs, and beyond cloud native and distributed, the other two core keywords for the next generation of enterprise databases are intelligence and security. Intelligence here means intelligent management and control together with an intelligent kernel. This is already in production inside Alibaba: each of the hundreds of thousands of database instances across Alibaba's network has an agent that collects real-time running status for real-time monitoring and early warning. One part of this work is automatically resizing the database buffer, which has a significant impact on both performance and cost. By resizing buffers automatically, Alibaba Cloud Database cuts memory costs by more than 15% per day with no major change in performance across the network.

Data security on the cloud

In addition, data security on the cloud is very important. Traditional database security measures only protect and encrypt data in transit; when the data actually enters the database kernel for processing it still has to be decrypted, which creates a risk of data leakage.

To solve this problem, the Alibaba Cloud database team implemented full-link encryption for the database. With a full-link encrypted database, data does not need to be decrypted even after it enters the database kernel, and the encryption key is supplied by the customer and never has to be given to the cloud vendor. Throughout the process, the cloud vendor cannot see users' data, even as the root administrator.

Data transfer cloud service: DTS

The first step in moving data to the cloud is finding a good highway to the cloud. The Alibaba Cloud database team built the data transfer service DTS, which can migrate data from different source databases to target databases. The core technical challenge is migrating user data between different sources and targets across complex networks and deployment environments. DTS is one of the more influential products built by Alibaba Cloud Database.

Embracing an open ecosystem and growing together with customers and developers

The following figure shows the layout of the entire Alibaba Cloud database portfolio. Alibaba Cloud hopes to embrace an open ecosystem and grow together with customers and developers. It hopes to build a database ecosystem with Chinese characteristics on the basis of its two core self-developed database products, POLARDB and AnalyticDB, together with the data transfer tool DTS described above. Not every database system has to be open source, but each must embrace an ecosystem of open standards, to avoid moving from one closed system to another.

