2025-04-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article summarizes how iterative queries are optimized in MapReduce with OptIQ, covering view materialization, automatic incrementalization, and the experimental results.
Optimization of iterative queries in MapReduce
Summary:
OptIQ is proposed: a fully automated query optimization method for iterative queries in a distributed environment.
It relies on two techniques: view materialization and incremental view evaluation.
Purpose: reduce computation that is repeated across iterations.
1. INTRODUCTION
Several recent systems: Spark, HaLoop, REX.
In these systems query optimization is neither automated nor built into the framework: programmers have to point out which data should be reused and manually specify how that data is stored.
OptIQ: an overall framework that identifies repeated computation in iterative queries by applying view materialization and incremental view evaluation from the traditional database field, together with program analysis and transformation from the compiler field.
Process:
1. Divide the iterative query into variant and invariant views; the invariant views are reused in the following iterations.
2. Evaluate the variant view incrementally by skipping the tuples that have already converged.
2. Define SQL statements for iterative queries
An iterative query involves three kinds of tables:
Local table: holds the data of the current iteration on local disk; defined with a let statement.
Global table: holds the data of the previous iteration in a distributed file system; defined with a set statement.
Update table: all of its tuples are compared between iterations to determine whether the query has converged.
R and S are input tables; schema(R) denotes the attributes of table R, and T(list) indicates that table T has the attributes in list. Selection and join conditions are propositional formulas.
Projection: projects a specified set of attributes from the input table.
Selection: selects the tuples of the input table that satisfy the condition.
Join: extracts the tuples of the cross product of two input tables that satisfy the join condition.
Group-by: regroups tuples and computes aggregate functions over each group.
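The four operators above can be sketched over in-memory tables. This is only an illustration, assuming tables are lists of dicts; the function names and the sample tables R and S are hypothetical and are not OptIQ's own notation.

```python
from collections import defaultdict

def project(table, attrs):
    """Projection: keep only the named attributes of every tuple."""
    return [{a: row[a] for a in attrs} for row in table]

def select(table, predicate):
    """Selection: keep the tuples for which the predicate holds."""
    return [row for row in table if predicate(row)]

def join(left, right, predicate):
    """Join: tuples of the cross product that satisfy the predicate."""
    return [{**l, **r} for l in left for r in right if predicate(l, r)]

def group_by(table, key, attr, aggregate):
    """Group-by: regroup tuples on `key` and aggregate `attr` per group."""
    groups = defaultdict(list)
    for row in table:
        groups[row[key]].append(row[attr])
    return [{key: k, attr: aggregate(values)} for k, values in groups.items()]

# Hypothetical sample tables.
R = [{"src": "a", "dest": "b"}, {"src": "a", "dest": "c"}, {"src": "b", "dest": "c"}]
S = [{"node": "a", "score": 1.0}, {"node": "b", "score": 0.5}]

joined = join(R, S, lambda l, r: l["src"] == r["node"])
per_dest = group_by(joined, "dest", "score", sum)
```

Here `per_dest` holds, for every destination, the sum of the scores of the nodes that point to it, which is the join-then-aggregate shape that iterative queries such as PageRank repeat in every iteration.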
PageRank:
Three tables are used in the query statement.
src is the current node, dest is the destination node, score corresponds to the PageRank value, and count is the out-degree of the node.
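As a rough illustration of the computation those attributes describe, here is a minimal PageRank loop in plain Python. It is a sketch under assumptions, not the query statement from the original: the damping factor 0.85, the initial score of 1.0, the convergence threshold `eps`, and the function name are all chosen for the example, and every node is assumed to have at least one outgoing edge.

```python
def pagerank(graph, damping=0.85, max_iter=100, eps=1e-6):
    """graph is a list of (src, dest) edges; returns (scores, iterations used)."""
    nodes = sorted({node for edge in graph for node in edge})

    # count: out-degree of every source node.
    count = {}
    for src, _dest in graph:
        count[src] = count.get(src, 0) + 1

    score = {node: 1.0 for node in nodes}
    for iteration in range(1, max_iter + 1):
        # Every src sends score / count to each of its dests.
        incoming = {node: 0.0 for node in nodes}
        for src, dest in graph:
            incoming[dest] += score[src] / count[src]
        new_score = {n: (1 - damping) + damping * incoming[n] for n in nodes}

        # Convergence test: no score moved by more than eps.
        converged = all(abs(new_score[n] - score[n]) < eps for n in nodes)
        score = new_score
        if converged:
            return score, iteration
    return score, max_iter

cycle_scores, cycle_iterations = pagerank([("a", "b"), ("b", "c"), ("c", "a")])
```

Note that the edge list and the out-degrees never change inside the loop; only the scores do. That split is what the optimizations in the next section exploit.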
K-means:
Two tables are used in the query statement: Point holds the data points, and Centro holds the cluster centers.
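A minimal K-means loop over these two kinds of data might look as follows. This is a sketch assuming points are tuples of numbers and distance is squared Euclidean; the function names and the stopping rule (stop when the centers no longer move) are assumptions made for the example.

```python
def assign(points, centroids):
    """Label every point with the index of its nearest center."""
    def nearest(point):
        return min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
        )
    return [nearest(p) for p in points]

def recompute(points, labels, centroids):
    """Average the members of each cluster; an empty cluster keeps its old center."""
    updated = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if members:
            updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            updated.append(old)
    return updated

def kmeans(points, centroids, max_iter=100):
    for _ in range(max_iter):
        updated = recompute(points, assign(points, centroids), centroids)
        if updated == centroids:  # converged: no center moved
            break
        centroids = updated
    return centroids, assign(points, centroids)

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers, labels = kmeans(points, [(0, 0), (10, 0)])
```

Unlike PageRank, almost everything that is joined here (the centers) changes in every iteration, so there is little invariant data to reuse.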
3. Query optimization:
Two techniques are used: view materialization and incremental view evaluation.
View materialization reuses the results of subqueries over unmodified attributes.
Incremental view evaluation reuses the results for unmodified tuples.
For view materialization: table decomposition
Decompose the table into variant and invariant views, and reuse the invariant views.
For automatic incrementalization: delta tables
Reduce the number of tuples to evaluate according to the convergence condition.
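The idea of shrinking the work with a convergence condition can be sketched as below. This is an assumption-laden illustration: tables are modelled as dicts from key to value, the condition is an absolute threshold `eps`, and `delta_table` is a hypothetical helper name.

```python
def delta_table(previous, current, eps=1e-3):
    """Keep only the tuples whose value still changed by more than eps.

    Tuples that are not returned are treated as converged and can be
    skipped in the next iteration.
    """
    delta = {}
    for key, value in current.items():
        change = value - previous.get(key, 0.0)
        if abs(change) > eps:
            delta[key] = change
    return delta

previous = {"a": 1.0, "b": 2.0, "c": 3.0}
current = {"a": 1.0, "b": 2.5, "c": 3.0004}
delta = delta_table(previous, current)
```

In this example only `b` remains in the delta table, so the next iteration would process one tuple instead of three.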
OptIQ overview diagram
How to materialize a view
1. Decompose the update table into variant and invariant views, then rewrite the iterative query so that the update table is expressed through the variant view. The two views share common attributes (src in the PageRank example below), so they can be joined back together.
2. Materialize the invariant views during query processing; they are what makes it possible to rewrite and simplify the iterative part of the query.
Example: PageRank
Decompose Graph(src, dest, score) into VT(src, score) and IT(src, dest).
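The decomposition of Graph can be sketched in Python to show that nothing is lost: joining the two views on src gives the original table back. This assumes, as in PageRank, that score depends only on src; the function and variable names are hypothetical.

```python
def decompose(graph):
    """Split Graph(src, dest, score) into a changing view and an invariant view."""
    variant = sorted({(src, score) for src, _dest, score in graph})    # (src, score)
    invariant = sorted({(src, dest) for src, dest, _score in graph})  # (src, dest)
    return variant, invariant

def recompose(variant, invariant):
    """Join the two views on src to rebuild Graph(src, dest, score)."""
    score_of = dict(variant)
    return sorted((src, dest, score_of[src]) for src, dest in invariant)

graph = [("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 0.5)]
variant, invariant = decompose(graph)
```

Only the (src, score) view has to be rewritten in each iteration; the (src, dest) view can be stored once and reused.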
Subquery lifting (a further optimization built on the above): for example, the decomposed tables are used to build another materialized table, IT_Count.
IT_Count = select IT.src, IT.dest, Count.count
from IT, Count
where IT.src = Count.src
The VT table and the Score table can be substituted for each other.
Loop-invariant code motion: computation whose result does not change between iterations is moved out of the loop.
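The effect of materializing IT_Count and moving it out of the loop can be sketched as follows. This is an illustration rather than OptIQ's generated plan: tables are lists of tuples, the damping factor 0.85 is an assumption, and `run_naive` and `run_hoisted` are hypothetical names used to contrast the two plans.

```python
def join_it_count(it, count):
    """IT_Count: attach the out-degree of src to every (src, dest) edge."""
    count_of = dict(count)
    return [(src, dest, count_of[src]) for src, dest in it]

def step(it_count, vt, damping=0.85):
    """One iteration: recompute the (src, score) view from IT_Count."""
    score_of = dict(vt)
    incoming = {node: 0.0 for node in score_of}
    for src, dest, cnt in it_count:
        incoming[dest] += score_of[src] / cnt
    return sorted((n, (1 - damping) + damping * t) for n, t in incoming.items())

def run_naive(it, count, vt, iterations):
    for _ in range(iterations):
        it_count = join_it_count(it, count)  # invariant join redone every time
        vt = step(it_count, vt)
    return vt

def run_hoisted(it, count, vt, iterations):
    it_count = join_it_count(it, count)      # materialized once, before the loop
    for _ in range(iterations):
        vt = step(it_count, vt)
    return vt

it = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
count = [("a", 2), ("b", 1), ("c", 1)]
vt = [("a", 1.0), ("b", 1.0), ("c", 1.0)]
```

Both functions return the same scores; the hoisted version simply evaluates the join once instead of once per iteration.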
The final optimized statement of the materialized view
Automatic incrementalization
1. Update operations
Update operations are performed more frequently than insert and delete operations.
2. Detecting delta tables
3. Deriving incremental queries
Start from a more general form of the statement.
T is the update table, Q(T) is the query statement, and φ(ΔT) is the convergence condition.
set T = Q(T ⊕ ΔT)
Assume: Q(T ⊕ ΔT) = Q(T) ⊗ Q(ΔT).
Dscore is the delta (the increment) of the Score table.
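The assumption Q(T ⊕ ΔT) = Q(T) ⊗ Q(ΔT) can be checked on a small example. The modelling choices here are assumptions made for illustration: Q is a per-key sum, ⊕ appends the delta tuples to the table (an update is represented by a tuple carrying the difference), and ⊗ adds the per-key results. The original may define these operators differently.

```python
def q(table):
    """Q: per-key sum over a table of (key, value) tuples."""
    totals = {}
    for key, value in table:
        totals[key] = totals.get(key, 0) + value
    return totals

def combine(left, right):
    """The combining operator for this Q: add the per-key results."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = merged.get(key, 0) + value
    return merged

T = [("a", 3), ("b", 5), ("a", 2)]
delta_T = [("a", 1), ("b", -2)]  # a gains 1, b drops from 5 to 3

full = q(T + delta_T)                    # re-evaluate over the whole table
incremental = combine(q(T), q(delta_T))  # reuse Q(T), evaluate only the delta
```

When the equality holds, each iteration only has to evaluate Q over the delta table and fold the result into what was computed before.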
Performance can be improved considerably by evaluating aggregate functions incrementally.
sum function
The count function obeys the same distributive law as sum; the average function can be decomposed into count and sum.
max and min functions
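A small sketch of maintaining these aggregates incrementally is given below. It assumes the delta only adds values; when a value is lowered or removed, max and min may have to be recomputed from the full table, which this sketch does not handle. The class name is hypothetical.

```python
class RunningAggregates:
    """Keeps sum, count, max and min up to date as delta values arrive."""

    def __init__(self):
        self.total = 0
        self.count = 0
        self.maximum = None
        self.minimum = None

    def add(self, delta_values):
        """Fold a batch of new values into the stored aggregates."""
        for value in delta_values:
            self.total += value
            self.count += 1
            self.maximum = value if self.maximum is None else max(self.maximum, value)
            self.minimum = value if self.minimum is None else min(self.minimum, value)

    @property
    def average(self):
        """Average is derived from sum and count, not stored separately."""
        return self.total / self.count if self.count else None

agg = RunningAggregates()
agg.add([4, 8])
agg.add([6])
```

After the two batches the object holds the same sum, count, average, max and min as a full recomputation over [4, 8, 6] would give, without touching the first batch again.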
The statement after adding incrementalization:
Experiment
OptIQ was run on Hadoop and Spark.
PageRank
Response time and the number of iterations were reduced.
K-means
View materialization did not improve efficiency, and disk reads and writes increased under the optimization.
Materialized view: a materialized view precomputes and stores the result of an expensive operation such as a table join or an aggregation, so that a query can read the stored result quickly instead of repeating the operation.
It trades space for time.
Open question: how to keep the I/O overhead under control, that is, whether the time saved by spending extra space outweighs the I/O cost of reading the stored view from disk.