Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

The first choice to build a real-time data warehouse, cloud native data warehouse AnalyticDB for MySQL technology decryption

2025-04-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

Aliyun Analytical Database has launched a basic version, which greatly reduces the threshold for users to build a data warehouse. Highly compatible with MySQL, extremely low cost and high performance, small and medium-sized enterprises can easily build a set of real-time data warehouse to realize the online value of enterprise data.

AnalyticDB for MySQL's product line includes basic version (stand-alone version) and cluster version, the basic version provides services for a single node, and the minimalist architecture greatly reduces the cost of the basic version. The storage computing separation architecture, row-column mixing technology, lightweight index construction and distributed hybrid computing engine ensure the powerful analysis performance of the basic version. A set of real-time data warehouse can be built at an annual cost of less than 10,000, without the need to set up a special big data team to save millions of costs for enterprises.

1. Basic technical architecture

The following is the basic version of the framework, the overall composition of Coordinator and Worker, their respective responsibilities are described below.

1.1 Coordinator: front-end control node, responsibilities include

(1) MySQL protocol layer access and SQL parsing

(2) Authentication and authentication, provide a more perfect and detailed permission system model, whitelist and cluster-level RAM control, and audit and compliance record all SQL operations.

(3) Cluster management: member management, metadata, data consistency, routing synchronization, backup and recovery (data and log management)

(4) Asynchronous task management in background

(5) Affairs management

(6) Optimizer, execute plan generation

(7) Computational scheduling, responsible for task scheduling

1.2 Worker: storage and compute nodes, including

(1) calculation module

Distributed MPP+DAG hybrid computing engine and optimizer achieve higher complex computing power and mixed load management capability. Taking advantage of the flexible scheduling of resources on Ali cloud computing platform, the flexible scheduling of computing resources is realized. Computing Worker nodes can be pulled up separately to expand in minutes or even seconds according to business requirements to achieve the most efficient use of resources.

(2) Storage module

The enclosure is more lightweight and has the real-time write and read ability to carry larger throughput data. The write performance is about 50% higher than that of the previous version, and is visible in millisecond, which meets the real-time analysis needs of customers.

Storage nodes provide full and incremental backup and recovery capabilities, and regular snapshots and logs of cloud disks are synchronized to the OSS in real time, providing higher security for user data and helping users maximize recovery in case of database problems.

(3) Worker Group

Worker nodes with storage modules are divided into node groups (Worker Group). The cluster version provides three copies of storage and works as a whole through the Raft distributed consistency protocol, allowing some of these Worker nodes to continue to provide services in the event of failure. In the basic version, only a single copy provides services.

two。 Basic optimizer

The optimizer is responsible for processing the syntax tree generated by Parser and providing the best cost plan to the computing engine through the optimization algorithm. The planning cost directly affects the query performance, so the optimizer is one of the core modules in the database. The basic version uses an optimizer as powerful as the cluster version, and includes a variety of rule-based, cost-based, and pattern-based composite optimization techniques.

Complex analytical queries often contain multi-table join, and the join order of tables directly affects the query performance. The AnalyticDB optimizer uses a join order optimization algorithm based on cost estimation and real-time sampling information, which can sense the data distribution of the underlying storage. The optimizer uses the AnalyticDB full index feature to improve the accuracy of filter factor (filter factor) estimation. For complex join, the optimizer dynamically adjusts join order based on data distribution information and evaluates the cost of data reshuffling to select the optimal execution plan from the dimension of global cost.

The AnalyticDB optimizer adds cost estimation and iterative optimization to the classical rule-based optimizer (Rule-Based Optimizer), and integrates the Cascades CBO (Cost-Based Optimizer) optimization framework. The CBO search framework will call the Property Enforcement module to generate the distributed execution plan, and then call the cost estimation module to evaluate the cost for each candidate plan and select the best distributed execution plan. In order to further improve the optimization effect and efficiency of join order, the AnalyticDB optimizer also adopts optimization technology based on historical information (History-Based Optimizer), dynamic optimization technology based on common SQL patterns (Pattern-Based Optimizer), and data-driven intelligent technology, such as Auto Analyze module to automatically collect statistical information to provide accurate data support for the optimizer to search for the best plan.

In addition, the AnalyticDB optimizer also carries out a series of optimization processes to improve the performance of complex queries, such as combinatorial filtering conditions, aggregation operators, related subqueries and so on. For example, the push-down optimization technology pushes the planned filter conditions and aggregation operators to the bottom module of the whole link as far as possible, which not only improves the efficiency of the bottom operator, but also reduces the amount of data to be processed by the upstream operator, and improves the overall query performance. For the related subquery statement, the optimizer rewrites the related subquery into a semantically equivalent non-associative plan through relational algebra transformation, so that the computing engine can process pipelined efficiently.

3. Basic computing engine

AnalyticDB computing engine adopts massively parallel processing MPP+DAG architecture and memory-based pipeline execution mode, which has the characteristics of high concurrency and low latency. In order to speed up the evaluation of complex expressions and optimize the execution performance, the computing engine generates JVM bytecode at run time through Runtime Codegen, and dynamically loads instances of generated objects, which reduces virtual function calls in the process of execution and improves the efficiency of CPU-Intensive tasks. The computation engine also uses the vectorization execution model to process the expression evaluation and uses the CPU SIMD instruction set to speed up the evaluation.

4. Basic storage engine

The AnalyticDB storage engine adopts the design of row-column hybrid storage. As shown in the figure. For each k rows of data (Row Group) of a table, each column of data is continuously stored in a separate Data Block, and the column of each row group Block is continuously stored on disk. The data of the column Block in the row group can be sorted and stored by the specified column, and the number of disk random IO can be significantly reduced when querying by that column. The unique advantage of this design is that it has the advantages of both row storage (suitable for OLTP point query) and column storage (suitable for OLAP multidimensional analysis), and well meets the needs of different types of workload:

The point query of OLTP type requires select to produce a whole row of detail data. Under the design of column-column mixed storage, the completely random reading of column storage is transformed into sequential reading multidimensional analysis of OLAP type: it not only solves the read magnification problem of downlink storage of statistical analysis of massive data, but also converts the sequential reading of column storage into sequential skip reading when performing single column IO. When performing multi-column IO, random reads are converted to sequential read-write large throughput: random writes during column storage are converted to sequential writes.

The AnalyticDB storage engine uses an intelligent full index to build an inverted index from value to row number on each column of data. When querying, the AND and OR of multiple conditional expressions of SQL are converted into Boolean Query indexes at the same time, and the row numbers of the result sets that meet the where conditions are obtained by searching, which supports fast multiplexing, and can find the result sets that meet the conditions at the millisecond level.

5. Advantages of basic Edition

The basic version greatly reduces the threshold for users to build a data warehouse, and has a high performance-to-price ratio compared with big data (Hadoop,Spark and EMR) and OLTP.

(1) lowering the threshold for use

The minimum price of the basic version is 1.75RMB / hour and 860CNY / month. Compared with the cluster version, the starting price is reduced by about 1/3. The disk space is only 0.6 yuan / GB, and the maximum disk space is 4T, which can be expanded on demand at any time, greatly reducing the threshold for complex analysis and construction of real-time data warehouses for small and medium-sized enterprises.

(2) High performance

Under the same configuration, its data query performance is about 10 times that of MySQL, which helps users to solve the pain point of slow MySQL complex analysis.

(3) rich in specifications

The basic version supports four specifications: T8, T16, T32 and T52, which can be selected and adjusted according to the different requirements of the business.

(4) Ecological transparency

The upstream and downstream ecology is fully compatible with the cluster version and is transparent to users.

6. Suitable for customers

It is especially suitable for the following people:

(1) small and medium-sized enterprises that are too complex and want to quickly achieve digital transformation, such as Hadoop/Spark

(2) small and medium-sized enterprises with slow query of report database and interactive BI analysis

(3) users who need to quickly build a test environment for warehouse selection

(4) Learning people, who can quickly understand the users of AnalyticDB for MySQL.

Learn more

Watch LVB: https://developer.aliyun.com/live/2528

Product details: https://promotion.aliyun.com/ntms/act/adbformysqljichuban.html

Store SQL distributed computing relational database MySQL OLAP OLTP scheduling database index

Copyright notice: all content in this article belongs to Aliyun developer Community, and no media, website or individual may reprint, link, repost or otherwise reproduce / publish without the authorization of Aliyun developer Community Agreement. To apply for authorization, please email developerteam@list.alibaba-inc.com. Media and websites authorized by Aliyun developer Community Agreement must indicate "source of manuscript: Aliyun developer Community, name of original author" when reprinting. Violators will be held responsible in accordance with the law. If you find any suspected plagiarism content in this community, you are welcome to send an email to: developer2020@service.aliyun.com to report and provide relevant evidence. Once verified, the community will immediately delete the suspected infringing content.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report