In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
Complex SQL query can not run and DRDS read-only solution, in view of this problem, this article introduces the corresponding analysis and solution in detail, hoping to help more partners who want to solve this problem to find a more simple and feasible method.
Background
In the actual business production environment, after the business application system stores the data using OLTP database, there will be complex SQL query demands in scenarios such as background operation system for statistical report analysis and so on.
In order to meet the requirement of fast response to this kind of complex SQL query, based on the third generation distributed SQL engine, the DRDS team further introduced the self-developed MPP multi-computer parallel computing engine (Fireworks) and the corresponding optimization strategy, which greatly enhanced the complex query processing ability of DRDS.
The results of distributed multi-table Join, aggregation, sorting and subquery operations under tens of millions of data can be returned in seconds, which can greatly improve the response speed. It uses the same data (RDS read-only) for processing, eliminating the need for data synchronization to other data sources, reducing the overall link complexity of the business architecture, and saving business operation and maintenance and budget costs.
Cdn.com/bad94739b4a209f57ac0b01de9d57aeabf467d49.png ">
Main characteristics
Self-developed MPP multi-computer parallel computing engine Fireworks
The DRDS read-only instance is equipped with a SQL execution engine (Fireworks) with complete multi-machine parallel processing capabilities. It is significantly different from the SQL execution engine on the primary instance of DRDS.
The execution engine of the DRDS master instance adopts the stand-alone architecture, adopts the strategy of pushing the computing down to the underlying physical sub-libraries as far as possible, and relies on the computing power of the physical sub-libraries to realize the distributed computing of logical SQL.
The Fireworks engine on DRDS read-only instances is a cluster composed of multiple computing nodes, which converts a SQL query into a distributed computing task.
The limitation of physical library computing capacity greatly improves the computing speed of SQL for complex logic, and has a significant acceleration effect on Join, Aggregate and Sort computing.
Fireworks will break up the computing tasks such as Join, Aggregate and Sort through Shuffle and distribute them to multiple computing nodes in the computing cluster, and achieve the purpose of computing acceleration through parallel computing of multiple computing nodes.
An optimizer customized for multi-machine parallel execution mode
The original DRDS master instance optimizer mainly focuses on OLTP scenarios, and the core idea is to push all calculations down to the hanging physical library as far as possible. Its purpose is to make full use of the computing resources of the physical library, and at the same time avoid a large amount of data flow, so as to get a faster response speed.
When faced with the review query scenario under the order of big data, the overall performance will be limited by hanging the physical library, at the same time, it will also exert greater pressure on the physical library and affect the stability. Generally speaking, its OLAP capability has many limitations.
After the introduction of MPP multi-machine parallel computing engine Fireworks, DRDS itself has been greatly improved in computing power, and the overall optimization strategy of the optimizer has also been adjusted:
As far as possible, complex computing (such as Join, Aggregation, Sort) is brought up to perform engine computing, and computing acceleration and scalability are realized through Fireworks computing cluster.
Continue to push lightweight calculations (such as Project, Filter) down to the physical library to reduce the cost of data pull.
By optimizing the finest granularity of the execution plan, the DRDS distributed SQL optimizer can produce a multi-machine parallel execution engine-friendly execution plan and achieve better execution efficiency.
At the same time, a refined operator push-down strategy is provided, which pushes down the operators with less pressure on RDS to the physical library to achieve higher computational performance-price ratio, while protecting RDS from the influence of costly operators, so as to ensure the stability of online traffic.
Direct analysis based on online data
Emerging Internet services represented by new retail services continue to emerge, which not only have real-time OLTP requirements, but also accompanied by some complex quasi-real-time OLAP requirements to support real-time decision-making and other requirements.
At present, most of the solutions in data analysis scenarios need to export the production data of OLTP database to other data sources for offline analysis. This traditional solution is difficult to meet the needs of quasi-real-time, and there is a risk of data loss when the data is exported to offline systems.
DRDS read-only instances do not need to carry out tedious and tedious data synchronization tasks, and directly process complex data based on RDS read-only instances or RDS master instances to reduce the overall link complexity of the business architecture and save business operation and maintenance and budget costs.
DRDS read-only instance not only avoids data synchronization, but also ensures the timeliness of data processing, and can achieve the real-time performance of READ COMMITED (based on RDS master instance).
SQL compatibility with clear boundaries
The DRDS read-only instance is fully compatible with the SQL query syntax of the DRDS master instance, and is highly consistent with the SQL compatibility and SQL support boundaries of DRDS version 5.3.
Compared with similar products, it has the characteristics of high compatibility and clear supporting boundaries. Can provide almost the same experience as the primary instance of DRDS.
Complex SQL that cannot be executed or executes slowly on the DRDS primary instance can be migrated directly to a read-only instance for execution, eliminating the extra overhead of SQL rewriting.
Product experience is flexible and autonomous
DRDS read-only instances automatically synchronize the account permission information of DRDS master instances. Native VPC supports, internal and external networks can be enabled at the same time, and the data processing capacity is linearly improved according to business conditions.
Technical Architecture Overview
The overall architecture of DRDS read-only instance is basically the same as that of DRDS master instance, except for changes in the query layer, with the addition of MPP execution engine and corresponding optimizer, as shown below:
The DRDS protocol layer is responsible for handling the network interaction and the parsing of the MySQL protocol. After receiving the query request, the SQL will be transferred to the query layer for processing. The query layer is responsible for parsing the SQL and generating an optimized execution plan by the executor, which is then passed to the execution engine for query and calculation.
If you need to use the Fireworks engine to calculate, the query layer further converts the execution plan into a distributed execution plan and submits it to Fireworks Cluster as a distributed task after getting the execution plan. The remote Fireworks cluster completes the work of data query and subsequent calculation to the storage layer.
To put it simply, the DRDS read-only instance can be regarded as adding an execution link with the ability of multi-machine parallel processing on the basis of the original DRDS.
Applicable scenario
Generally speaking, DRDS read-only instances are suitable for handling complex queries of the order of magnitude of big data with low concurrency and high latency. Such as data analysis and report scenarios, the typical feature of this kind of scenario is that it contains large
The amount of correlation, aggregation and sorting operations and participate in the calculation of a large scale of data.
At present, DRDS read-only examples have landed a number of businesses within Ali Group, of which the most representative are the new retail scenes such as box horses and commercial brains. Correlation analysis is carried out around the dimensions of people, goods, market and warehouse, and several or even a dozen logical tables scattered in different logical libraries are associated and then aggregated and sorted to meet the business needs of inventory reconciliation, decision support and so on.
With the emergence of DRDS read-only instances, business developers no longer need to configure and maintain a large number of data synchronization links, and do not have to worry about poor timeliness or inaccuracy of results caused by data non-synchronization, which reduces the workload of business developers to a certain extent.
For users who are already using DRDS, DRDS read-only instances can solve the following two types of known problems:
In the process of using DRDS, you may find that some complex SQL involving Join, aggregation, and sorting need to perform secondary calculations in the DRDS execution engine because they cannot be pushed down completely, which cannot be performed because of the memory limitations of the stand-alone execution engine.
The complex computing part of SQL can be pushed down, but the large scale of data involved increases the pressure on the physical library and affects the OLTP business, or the response time is too slow to meet the requirements.
Summary
For a long time, due to the limitation of stand-alone architecture execution engine, DRDS has been unable to provide good support for complex queries based on the scale of big data, nor can it achieve linear expansion of its own local computing power by expanding physical resources.
The introduction of DRDS read-only examples completely makes up for the shortcomings of DRDS in OLAP scenarios, enabling DRDS to provide powerful OLTP capabilities as well as scalable OLAP capabilities, providing an one-stop solution for users with both OLTP requirements and medium-scale data analysis needs, bringing convenience to users.
In the next six months, DRDS read-only instances will release the associated query function across logic libraries, and continuously enhance the core competence of read-only instances through more technical means, which will have better performance in terms of parallelism, response time, data volume, interactive query and so on, which can meet the stringent requirements of enterprise applications on databases.
Currently, DRDS read-only instances and DRDS master instances share a 20% discount for limited time. Activity details https://promotion.aliyun.com/ntms/act/drdsreadonlydisc.html
Welcome to continue to follow DRDS (Aliyun distributed relational database service), product details
Https://www.aliyun.com/product/drds
This is the answer to the problem of complex SQL query and DRDS read-only solution. I hope the above content can be of some help to you. If you still have a lot of doubts to be solved, you can follow the industry information channel for more related knowledge.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.