What is the optimization of iterative query in MapReduce

2025-04-04 Update From: SLTechnology News&Howtos, Shulou (Shulou.com), 05/31

This article summarizes how iterative queries in MapReduce can be optimized, following the OptIQ approach.

Optimization of iterative query in MapReduce

Summary:

The paper proposes OptIQ, a fully automated query optimization method for iterative queries in a distributed environment.

The method combines view materialization with incremental view evaluation.

Purpose: to reduce computation that is repeated across iterations.

1. INTRODUCTION

Several recent systems, such as Spark, HaLoop and REX, support iterative queries, but their optimization is neither automated nor built into the framework: programmers must point out which data should be reused and manually specify how that data is stored.

OptIQ: an overall framework is proposed to identify repeated computation in iterative queries. It applies view materialization and incremental view evaluation from the traditional database field, together with program analysis and transformation from the compiler field.

Process: 1. Decompose the tables of the iterative query into variant and invariant views; the invariant views are reused in the following iterations.

2. Evaluate the variant view incrementally, skipping the tuples that have already converged.

2. Define SQL statements for iterative queries

An iterative query involves three parts.

Local table: holds the data of the current iteration and is stored on local disk. It is defined with a let statement.

Global table: holds the data from the previous iteration and is stored in a distributed file system. It is defined with a set statement.

Update table: the table that is rewritten in each iteration. All of its tuples are compared to determine whether they have converged.

Notation: R and S are input tables, schema(R) denotes the attributes of table R, and T(list) denotes a table T whose attributes are list. The selection and join operators take a propositional formula as their condition.

Projection: projects a specified set of attributes of the input table.

Selection: selects the tuples of the input table that satisfy the condition.

Join: extracts the tuples of the cross product of the two input tables that satisfy the condition.

Group-by: groups tuples and computes aggregate functions over each group.
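The four operators above can be illustrated with a minimal Python sketch over lists of dictionaries. The function names, the sample tables and the attribute name node are assumptions made for illustration; they are not taken from OptIQ.

```python
from collections import defaultdict

def project(table, attrs):
    """Projection: keep only the given attributes of every tuple."""
    return [{a: row[a] for a in attrs} for row in table]

def select(table, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in table if predicate(row)]

def join(r, s, predicate):
    """Join: tuples of the cross product of r and s that satisfy the predicate."""
    return [{**x, **y} for x in r for y in s if predicate(x, y)]

def group_by(table, key, attr, agg):
    """Group-by: group tuples on `key` and aggregate the values of `attr`."""
    groups = defaultdict(list)
    for row in table:
        groups[row[key]].append(row[attr])
    return {k: agg(v) for k, v in groups.items()}

# Hypothetical sample data: a small edge list and a per-node out-degree table.
graph = [
    {"src": "a", "dest": "b"},
    {"src": "a", "dest": "c"},
    {"src": "b", "dest": "c"},
]
degree = [{"node": "a", "count": 2}, {"node": "b", "count": 1}]

sources = project(graph, ["src"])
from_a = select(graph, lambda row: row["src"] == "a")
joined = join(graph, degree, lambda x, y: x["src"] == y["node"])
out_degree = group_by(graph, "src", "dest", len)
```

Here group_by with len recomputes the out-degree that the degree table stores, which is the kind of aggregate the PageRank example below relies on.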

PageRank:

Three tables are used, and the query statement is defined over them.

src: the current node. dest: the destination node. score: the PageRank (PR) value. count: the out-degree of the node.

K-means:

Two tables are used.

Point holds the data points, and Centro holds the cluster centers.

The defined statement:

3. Query optimization:

Two techniques are used: view materialization and incremental view evaluation.

View materialization reuses the results of subqueries over attributes that are not modified.

Incremental view evaluation reuses the results of tuples that are not modified.

For view materialization: table decomposition

Decompose the table into variant and invariant views, and reuse the invariant views.

For automatic incrementalization: delta tables

Reduce the number of tuples to evaluate according to the convergence condition.

OptIQ overview diagram

How to materialize a view

1. Decompose the update table into a variant view and an invariant view, then rewrite the iterative query so that the update table is represented by the variant view. The two views share a common key, so they can be joined back together when needed.

2. Materialize the subqueries that depend only on invariant views, and rewrite the query to use the materialized results, which simplifies the work done in each iteration.

For example, in PageRank:

Decompose Graph(src, dest, score) into VT(src, score) and IT(src, dest).

Subquery lifting (a further optimization on top of the decomposition): for example, the decomposed tables are used to build another materialized table, IT_Count.

IT_Count = select IT.src, IT.dest, Count.count

from IT, Count

where IT.src = Count.src
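The decomposition and the IT_Count table can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: the sample rows are invented, the variant view is named VT, and SQLite stands in for the distributed storage that OptIQ actually targets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical sample data: the score of a source node is repeated on each edge.
cur.execute("CREATE TABLE Graph (src TEXT, dest TEXT, score REAL)")
cur.executemany(
    "INSERT INTO Graph VALUES (?, ?, ?)",
    [("a", "b", 0.5), ("a", "c", 0.5), ("b", "c", 0.25), ("c", "a", 0.25)],
)
cur.execute("CREATE TABLE Count (src TEXT PRIMARY KEY, count INTEGER)")
cur.executemany("INSERT INTO Count VALUES (?, ?)", [("a", 2), ("b", 1), ("c", 1)])

# Table decomposition: VT holds the attribute that changes, IT the part that does not.
cur.execute("CREATE TABLE VT AS SELECT DISTINCT src, score FROM Graph")
cur.execute("CREATE TABLE IT AS SELECT src, dest FROM Graph")

# Materialize the join that involves only invariant tables.
cur.execute(
    """
    CREATE TABLE IT_Count AS
    SELECT IT.src, IT.dest, Count.count
    FROM IT, Count
    WHERE IT.src = Count.src
    """
)

it_count = cur.execute(
    "SELECT src, dest, count FROM IT_Count ORDER BY src, dest"
).fetchall()

# Joining VT and IT on the shared key src recovers the original Graph table.
original = cur.execute(
    "SELECT src, dest, score FROM Graph ORDER BY src, dest"
).fetchall()
rejoined = cur.execute(
    "SELECT VT.src, IT.dest, VT.score FROM VT, IT "
    "WHERE VT.src = IT.src ORDER BY VT.src, IT.dest"
).fetchall()
```

Because IT_Count depends on no changing attribute, it needs to be built only once, before the first iteration.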

The VT table and the Score table are interchangeable.

Loop-invariant code motion: computation whose result does not change between iterations is moved out of the loop.
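A minimal sketch of loop-invariant code motion applied to PageRank follows. The update formula with a damping factor of 0.85 is the standard textbook one and is an assumption here, as are the function name and the sample edges; the article does not give the exact statement OptIQ generates.

```python
def pagerank(edges, iterations=20, damping=0.85):
    """PageRank over an edge list, with the invariant join hoisted out of the loop."""
    nodes = sorted({n for edge in edges for n in edge})

    # Loop-invariant part: out-degrees and the IT_Count-style triples
    # (src, dest, count) never change, so they are computed once.
    out_degree = {}
    for src, _ in edges:
        out_degree[src] = out_degree.get(src, 0) + 1
    it_count = [(src, dest, out_degree[src]) for src, dest in edges]

    # Variant part: only the scores are rewritten in each iteration.
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for src, dest, count in it_count:
            incoming[dest] += score[src] / count
        score = {
            n: (1 - damping) / len(nodes) + damping * incoming[n] for n in nodes
        }
    return score

# Hypothetical graph in which every node has at least one outgoing edge.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
ranks = pagerank(edges)
```

Without the hoisting, the join between the edge list and the out-degree table would be repeated in every iteration even though its result is always the same.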

The final optimized statement of the materialized view

Automatic incrementalization

1. Focus on update operations

Update operations are performed more frequently than insert and delete operations.

2. Detect delta tables

3. Derive incremental queries
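The second step can be sketched as follows: compare the update table before and after an iteration and keep only the tuples that have not yet converged. The threshold-based convergence test, the function name and the sample scores are assumptions for illustration; the actual convergence condition depends on the query.

```python
def detect_delta(previous, current, epsilon=1e-3):
    """Return the delta table: key -> change, for tuples that moved by more than epsilon.

    A key missing from `previous` is treated as having had the value 0.0
    (an assumption of this sketch).
    """
    delta = {}
    for key, value in current.items():
        change = value - previous.get(key, 0.0)
        if abs(change) > epsilon:
            delta[key] = change
    return delta

# Hypothetical scores before and after one iteration.
previous = {"a": 0.40, "b": 0.35, "c": 0.25}
current = {"a": 0.40, "b": 0.30, "c": 0.30}

delta = detect_delta(previous, current)
converged = set(current) - set(delta)
```

Only the keys in delta need to be fed into the next iteration; the query has converged once the delta table is empty.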

To begin with, a more formal statement:

T is the update table, Q(T) is the query statement, and φ(ΔT) is the convergence condition.

set T = Q(T ⊕ ΔT)

Assume: Q(T ⊕ ΔT) = Q(T) ⊗ Q(ΔT).

Dscore is the delta table (the increment) of the Score table.
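The assumption Q(T ⊕ ΔT) = Q(T) ⊗ Q(ΔT) can be checked on a small PageRank-style query. In this sketch both ⊕ and ⊗ are taken to be per-key addition, Q sums score/count into each destination, and the IT_COUNT rows and score values are invented; exact fractions are used so that the equality can be tested without rounding error.

```python
from fractions import Fraction

# Hypothetical materialized invariant table: (src, dest, out-degree of src).
IT_COUNT = [("a", "b", 2), ("a", "c", 2), ("b", "c", 1), ("c", "a", 1)]

def q(score):
    """Q: for each edge whose source has a score, add score/count to the destination."""
    result = {}
    for src, dest, count in IT_COUNT:
        if src not in score:
            continue  # sources absent from the input contribute nothing
        result[dest] = result.get(dest, Fraction(0)) + score[src] / count
    return result

def combine(left, right):
    """Per-key addition, used here for both the ⊕ and the ⊗ operator."""
    return {k: left.get(k, 0) + right.get(k, 0) for k in set(left) | set(right)}

score = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
dscore = {"b": Fraction(1, 10)}  # only the score of b changed

full = q(combine(score, dscore))            # Q(T ⊕ ΔT): recompute everything
incremental = combine(q(score), q(dscore))  # Q(T) ⊗ Q(ΔT): reuse the old result
```

The incremental path touches only the edges that leave b, while the full path visits every edge, yet both give the same result.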

Performance can be improved substantially by computing aggregate functions incrementally.

Sum function

The count function obeys the same distributive law as the sum function, and the average function can be decomposed into count and sum.

Max and min functions
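A minimal sketch of maintaining these aggregates incrementally is shown below, assuming the delta consists of newly added values. The Aggregates class and its field names are hypothetical and not part of OptIQ.

```python
from dataclasses import dataclass

@dataclass
class Aggregates:
    """Running sum, count, max and min of a collection of values."""
    total: float
    count: int
    maximum: float
    minimum: float

    @classmethod
    def of(cls, values):
        """Compute all aggregates from scratch."""
        return cls(sum(values), len(values), max(values), min(values))

    def merge(self, other):
        """Combine two partial results without rescanning the underlying values."""
        return Aggregates(
            self.total + other.total,
            self.count + other.count,
            max(self.maximum, other.maximum),
            min(self.minimum, other.minimum),
        )

    @property
    def average(self):
        """Average is derived from sum and count rather than stored."""
        return self.total / self.count

base = [3, 7, 5]
delta = [9, 1]

merged = Aggregates.of(base).merge(Aggregates.of(delta))
recomputed = Aggregates.of(base + delta)
```

This covers appended values only. An in-place update that lowers the current maximum (or raises the current minimum) cannot be handled by merge alone and would require rescanning the group.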

The statement after adding incrementalization:

Experiment

OptIQ was evaluated on Hadoop and Spark.

PageRank

Response time and the number of iterations were both reduced.

K-means

View materialization does not improve efficiency here, and disk reads and writes increase under the optimization.

Materialized view: a materialized view precomputes and stores the results of expensive operations such as table joins or aggregations, so that a query can skip those operations and return results quickly.

It trades space for time.

The open question is I/O overhead: whether the time saved by spending the extra space outweighs the I/O cost of reading the materialized data from disk.
