How to parse the fragments of Map input in MapReduce

2025-02-28 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

Many people new to MapReduce are unsure how the input of the Map phase is divided into fragments (splits). This article summarizes what a split is and which interfaces are involved, in the hope that it helps you answer the question.

Review:

In a telephone interview, the interviewer brought up the splitting process in the Map phase and asked: can you briefly describe the algorithm behind Map splitting? I was a little nervous, and because the interviewer used the phrase "Map algorithm", my train of thought was led astray for a while: only shuffle, hash partitioning, and boolean filters came to mind.

In fact, an input fragment, called a "split" in MapReduce, is the chunk of input processed by a single Map task, and each Map task processes exactly one split. Each split is divided into a number of records, and each record is a key-value pair. The Map task processes these records one at a time. Seen purely as "one record at a time" processing, this is not very different from Storm. A split is also what is usually called a segment in data processing: for example, an input split can correspond to several rows of the same table, and a record to a single row.

The input split is represented by a Java interface:

public interface InputSplit extends Writable {
    long getLength() throws IOException;         // size of the split in bytes
    String[] getLocations() throws IOException;  // hosts that hold the split's data
}

Usually you don't have to deal with splits yourself. They are created by an InputFormat, which is responsible for generating the input splits and dividing them into records.
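As a rough illustration of what "generating input splits" means, here is a minimal sketch in plain Java with no Hadoop dependency. It cuts a file of a given length into consecutive byte ranges. The names ByteRangeSplit and SplitPlanner are hypothetical, and the fixed split size is an assumption; a real InputFormat applies additional rules that this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an input split: a byte range of one file.
final class ByteRangeSplit {
    final long start;   // offset of the first byte in the split
    final long length;  // number of bytes in the split

    ByteRangeSplit(long start, long length) {
        this.start = start;
        this.length = length;
    }
}

final class SplitPlanner {
    // Cut a file of fileLength bytes into consecutive ranges of at most splitSize bytes.
    static List<ByteRangeSplit> planSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<ByteRangeSplit> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            // The last range may be shorter than splitSize.
            splits.add(new ByteRangeSplit(start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }
}
```

Each range produced this way would be handed to one Map task. Note that a split only describes where the data is; it does not contain the data itself.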

Another key point is RecordReader, which is what we call a record iterator. The Map task uses it to iterate over its split and produce key-value pairs.
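To make the record iterator idea concrete, the sketch below reads line records out of one byte-range split and returns them as (byte offset, line) pairs. It is plain Java, not the Hadoop RecordReader API; the class name LineRecordSketch and the boundary convention are assumptions of this sketch: a split that does not start at offset 0 discards its first line, and every split keeps reading while a line starts at or before the split's end, so each line is read exactly once across neighbouring splits.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class LineRecordSketch {
    // Returns the (offset, line) records that belong to the split [start, start + length).
    static List<Map.Entry<Long, String>> readRecords(byte[] data, int start, int length) {
        int end = start + length;
        int pos = start;
        if (start != 0) {
            // Discard the first line: the previous split is responsible for it.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline itself
        }
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        // Read every line that starts at or before the end of the split.
        while (pos <= end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            String line = new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8);
            records.add(new SimpleEntry<Long, String>((long) pos, line));
            pos = lineEnd + 1;
        }
        return records;
    }
}
```

A Map task would then call its map function once for each returned pair, with the offset as the key and the line as the value.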

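InputFormat is the class we actually work with.

The Java code is as follows:

public interface InputFormat<K, V> {
    // numSplits is only a hint for the desired number of splits
    InputSplit[] getSplits(JobConf job, int numSplits) throws IOException;
    // returns the reader that turns one split into key-value records
    RecordReader<K, V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException;
}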

The first method accepts a numSplits argument, but it is only a hint for the number of splits and in many cases has no effect.

The second method, getRecordReader, returns the RecordReader for a given split.
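One way to see why numSplits often has no effect is to look at how a file-based input format might turn it into a split size. The sketch below follows the sizing rule as commonly described for the old-API FileInputFormat, namely max(minSize, min(goalSize, blockSize)) with goalSize = totalSize / numSplits. Treat the exact rule as an assumption; the class and method names here are hypothetical.

```java
final class SplitSizeSketch {
    // Assumed sizing rule: never below minSize, never above the block size
    // unless minSize forces it.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // numSplits only influences the result through goalSize.
    static long splitSizeFor(long totalSize, int numSplits, long minSize, long blockSize) {
        long goalSize = totalSize / Math.max(numSplits, 1);
        return computeSplitSize(goalSize, minSize, blockSize);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // 1 GiB of input with 128 MiB blocks.
        System.out.println(splitSizeFor(1024 * mib, 2, 1, 128 * mib) / mib);
        System.out.println(splitSizeFor(1024 * mib, 16, 1, 128 * mib) / mib);
    }
}
```

Under these assumptions, asking for 2 splits on 1 GiB of input with 128 MiB blocks still gives a 128 MiB split size, so the job ends up with 8 splits rather than 2. Asking for 16 splits gives a 64 MiB split size, so the hint is honoured only when it requests splits smaller than a block.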

Thoughts on the interview: in many cases, technology-driven companies lack a product mindset, not to mention an understanding of how the market works.

A component that is on its way to being phased out became the focus of the interview.
