

What are the common tools in Hadoop development?

2025-01-17 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/02

This article gives a short, easy-to-follow overview of the tools commonly used in Hadoop development.

Hadoop concept

Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. It lets users develop distributed programs without knowing the underlying details of distribution, making full use of a cluster's power for high-speed computation and storage. In a nutshell, Hadoop is a software platform that makes it easier to develop and run applications that process large-scale data.

Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data, which suits applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream (streaming access).

Let's start by introducing InputFormat and OutputFormat, which are commonly used in Hadoop development.

InputFormat and OutputFormat

The MapReduce framework in Hadoop relies on an InputFormat to supply input data and on an OutputFormat to write output data; every MapReduce program depends on them.

Hadoop provides a range of InputFormat and OutputFormat implementations to simplify development; this article introduces several commonly used ones.

TextInputFormat reads plain text files. Each file is divided into lines terminated by LF or CR; the key is the position of the line in the file (its byte offset, of type LongWritable), and the value is the content of the line (of type Text).

KeyValueTextInputFormat also reads text files. If a line is split into two parts by a separator (tab by default), the first part is the key and the rest is the value; if the line contains no separator, the entire line is used as the key and the value is empty.

SequenceFileInputFormat reads SequenceFiles. A SequenceFile is a binary file format defined by Hadoop for storing key/value data. SequenceFileInputFormat has two subclasses: SequenceFileAsBinaryInputFormat, which reads key and value as BytesWritable, and SequenceFileAsTextInputFormat, which reads key and value as Text.

SequenceFileInputFilter reads only those records of a SequenceFile that pass a filter. The filter is specified with setFilterClass, and three filters are provided: RegexFilter keeps records whose key matches a specified regular expression; PercentFilter, given a frequency parameter f, keeps the records whose record number satisfies record# % f == 0; MD5Filter, given a frequency parameter f, keeps the records for which MD5(key) % f == 0.

NLineInputFormat, added in 0.18.x, splits a file by lines, for example giving each line of the file its own map task. As with TextInputFormat, the key is the byte offset of the line (LongWritable) and the value is the content of the line (Text).

CompositeInputFormat performs joins over multiple data sources (a map-side join; the sources must be sorted and partitioned in the same way).

TextOutputFormat writes records to a plain text file, one per line, in the format key + "\t" + value, that is, the key and the value separated by a tab.
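The record semantics described above can be tried without a cluster using the plain-Java sketch below. It is a simulation, not the Hadoop API: no Hadoop classes are used, and the names TextRecordSketch, textRecords, keyValue, percentAccept, md5Accept and textOutputLine are hypothetical. It shows TextInputFormat-style (byte offset, line) records, the KeyValueTextInputFormat split on the first tab, the PercentFilter and MD5Filter selection rules, and the TextOutputFormat key/tab/value line. Interpreting MD5(key) as the digest read as an unsigned integer is an assumption of this sketch; Hadoop's MD5Filter may convert the digest to a number differently. Java 9 or later is assumed for Map.entry.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of Hadoop record semantics; NOT the Hadoop API.
// All class and method names here are hypothetical.
final class TextRecordSketch {

    // TextInputFormat-style records: key = byte offset where the line starts,
    // value = the line without its terminator (LF, CR, or CR LF).
    static List<Map.Entry<Long, String>> textRecords(byte[] data) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                records.add(Map.entry((long) start,
                        new String(data, start, i - start, StandardCharsets.UTF_8)));
                // A CR immediately followed by LF is treated as one terminator.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                start = i + 1;
            }
            i++;
        }
        // A final line without a terminator still produces a record.
        if (start < data.length) {
            records.add(Map.entry((long) start,
                    new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return records;
    }

    // KeyValueTextInputFormat-style split: text before the first separator is the key,
    // the rest is the value; with no separator, the whole line is the key and the value is empty.
    static Map.Entry<String, String> keyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return Map.entry(line, "");
        }
        return Map.entry(line.substring(0, pos), line.substring(pos + 1));
    }

    // PercentFilter-style rule: keep record number n (counting from 0) when n % f == 0.
    static boolean percentAccept(long recordNumber, int frequency) {
        return recordNumber % frequency == 0;
    }

    // MD5Filter-style rule: keep a record when MD5(key) % f == 0.
    // Reading the digest as an unsigned big integer is an assumption of this sketch.
    static boolean md5Accept(String key, int frequency) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest).mod(BigInteger.valueOf(frequency)).signum() == 0;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // TextOutputFormat-style line: key, a tab, then value.
    static String textOutputLine(Object key, Object value) {
        return key + "\t" + value;
    }

    public static void main(String[] args) {
        byte[] data = "a\tapple\r\nb\tbanana\ncherry".getBytes(StandardCharsets.UTF_8);
        List<Map.Entry<Long, String>> records = textRecords(data);
        for (int n = 0; n < records.size(); n++) {
            Map.Entry<Long, String> rec = records.get(n);
            Map.Entry<String, String> kv = keyValue(rec.getValue(), '\t');
            System.out.println("offset " + rec.getKey() + ": "
                    + textOutputLine(kv.getKey(), kv.getValue())
                    + " (PercentFilter with f=2 keeps it: " + percentAccept(n, 2) + ")");
        }
    }
}
```

For the sample input, the three records start at byte offsets 0, 9 and 18: the CR LF pair after the first line counts as a single two-byte terminator, and the last line, which contains no tab, becomes a key with an empty value.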

NullOutputFormat is Hadoop's /dev/null: it sends all output into a black hole.

SequenceFileOutputFormat writes output as a SequenceFile.

MultipleSequenceFileOutputFormat and MultipleTextOutputFormat write records to different output files according to the key.

DBInputFormat and DBOutputFormat read from and write to a database; they are expected to be added in version 0.19.
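Key-based routing in the spirit of MultipleTextOutputFormat can be pictured with the in-memory sketch below. It is a simulation, not the Hadoop API: output "files" are entries in a map, and the naming rule in fileNameFor (one subdirectory per key, holding a leaf file such as part-00000) is a hypothetical choice. In the old org.apache.hadoop.mapred.lib API, this naming decision is typically made by overriding generateFileNameForKeyValue in a MultipleTextOutputFormat subclass. Each line is formatted the way TextOutputFormat does by default, with a tab between key and value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of key-based output routing; NOT the Hadoop API.
// The class name and the file-naming rule are hypothetical.
final class KeyRoutingSketch {

    // Hypothetical naming rule: one output "file" per key, e.g. "error/part-00000".
    static String fileNameFor(String key, String leafName) {
        return key + "/" + leafName;
    }

    // Route each (key, value) record to the file chosen for its key, formatting
    // each line like TextOutputFormat's default: key, tab, value.
    static Map<String, List<String>> route(List<Map.Entry<String, String>> records, String leafName) {
        Map<String, List<String>> files = new TreeMap<>(); // sorted only for stable printing
        for (Map.Entry<String, String> r : records) {
            files.computeIfAbsent(fileNameFor(r.getKey(), leafName), f -> new ArrayList<>())
                 .add(r.getKey() + "\t" + r.getValue());
        }
        return files;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> records = List.of(
                Map.entry("error", "disk full"),
                Map.entry("info", "job started"),
                Map.entry("error", "timeout"));
        route(records, "part-00000")
                .forEach((file, lines) -> System.out.println(file + " -> " + lines));
    }
}
```

With the sample records in main, both error records land in error/part-00000 and the info record in info/part-00000. Records with the same key keep their input order within a file.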

© 2024 shulou.com SLNews company. All rights reserved.
