
How FileInputFormat.getSplits Works

2025-02-24 update. From: SLTechnology News&Howtos (shulou)


Shulou (Shulou.com), 05/31

This article walks through the source of FileInputFormat.getSplits, the method Hadoop MapReduce uses to turn a job's input files into a list of InputSplits, each of which is processed by one map task. Explanatory comments have been added to the code.

```java
// From org.apache.hadoop.mapreduce.lib.input.FileInputFormat (Hadoop 2.x), with added comments.
/**
 * Generate the list of files and make them into FileSplits.
 * @param job the job context
 * @throws IOException
 */
public List<InputSplit> getSplits(JobContext job) throws IOException {
  Stopwatch sw = new Stopwatch().start();
  // Minimum size of an InputSplit in bytes: the larger of the format's own
  // minimum (1) and mapreduce.input.fileinputformat.split.minsize
  long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job));
  // Maximum size of an InputSplit in bytes
  // (mapreduce.input.fileinputformat.split.maxsize, default Long.MAX_VALUE)
  long maxSize = getMaxSplitSize(job);

  // generate splits
  List<InputSplit> splits = new ArrayList<InputSplit>();
  List<FileStatus> files = listStatus(job);

  /*
   * Every non-empty input file produces at least one InputSplit, and each
   * InputSplit is processed by one map task. A million small files therefore
   * mean a million loop iterations, at least a million InputSplits and at
   * least a million map tasks.
   *
   * Assuming the split size equals a 64 MB block (the Hadoop 1 default;
   * Hadoop 2 defaults to 128 MB) and SPLIT_SLOP = 1.1:
   *  - one 20 MB file: 1 InputSplit, 1 map task
   *  - one 80 MB file: 2 InputSplits (64 MB + 16 MB), 2 map tasks
   *  - two 20 MB files: 2 InputSplits, 2 map tasks
   *  - one 20 MB file and one 70 MB file: 2 InputSplits, 2 map tasks,
   *    because 70 / 64 is about 1.09, which does not exceed SPLIT_SLOP,
   *    so the 70 MB file stays in a single split
   */
  for (FileStatus file : files) {
    Path path = file.getPath();
    long length = file.getLen();
    if (length != 0) {
      BlockLocation[] blkLocations;
      if (file instanceof LocatedFileStatus) {
        blkLocations = ((LocatedFileStatus) file).getBlockLocations();
      } else {
        FileSystem fs = path.getFileSystem(job.getConfiguration());
        blkLocations = fs.getFileBlockLocations(file, 0, length);
      }
      if (isSplitable(job, path)) {
        // Block size of this particular file (on HDFS, the block size it was written with)
        long blockSize = file.getBlockSize();
        // Split size = max(minSize, min(maxSize, blockSize))
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);

        long bytesRemaining = length;
        // Cut full-size splits while more than 1.1 splits' worth of data remains
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
          int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining);
          splits.add(makeSplit(path, length - bytesRemaining, splitSize,
                               blkLocations[blkIndex].getHosts(),
                               blkLocations[blkIndex].getCachedHosts()));
          bytesRemaining -= splitSize;
        }

        // The tail (at most 1.1 * splitSize) becomes the last split
        if (bytesRemaining != 0) {
          int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining);
          splits.add(makeSplit(path, length - bytesRemaining, bytesRemaining,
                               blkLocations[blkIndex].getHosts(),
                               blkLocations[blkIndex].getCachedHosts()));
        }
      } else { // not splittable: the whole file becomes one split
        splits.add(makeSplit(path, 0, length, blkLocations[0].getHosts(),
                             blkLocations[0].getCachedHosts()));
      }
    } else {
      // Create empty hosts array for zero length files
      splits.add(makeSplit(path, 0, length, new String[0]));
    }
  }
  // Save the number of input files for metrics/loadgen
  job.getConfiguration().setLong(NUM_INPUT_FILES, files.size());
  sw.stop();
  if (LOG.isDebugEnabled()) {
    LOG.debug("Total # of splits generated by getSplits: " + splits.size()
        + ", TimeTaken: " + sw.elapsedMillis());
  }
  return splits;
}
```
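The splitSize used by the loop comes from computeSplitSize, which in Hadoop's FileInputFormat clamps the file's block size into the range [minSize, maxSize], i.e. max(minSize, min(maxSize, blockSize)). The standalone sketch below copies that one-line rule into a plain Java class so the effect of the two configuration knobs is visible without a cluster. The class name SplitSizeDemo and the MB constant are illustrative assumptions, not Hadoop API.

```java
// Standalone sketch of the split-size rule; SplitSizeDemo is a hypothetical class,
// not part of Hadoop.
class SplitSizeDemo {
    static final long MB = 1024L * 1024L;

    // Same formula as FileInputFormat.computeSplitSize:
    // clamp the block size into the range [minSize, maxSize].
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long block = 64 * MB;
        // Defaults (min = 1, max = Long.MAX_VALUE): split size equals the block size.
        System.out.println(computeSplitSize(block, 1, Long.MAX_VALUE) / MB);        // 64
        // Raising the minimum above the block size produces larger splits.
        System.out.println(computeSplitSize(block, 128 * MB, Long.MAX_VALUE) / MB); // 128
        // Lowering the maximum below the block size produces smaller splits.
        System.out.println(computeSplitSize(block, 1, 32 * MB) / MB);               // 32
    }
}
```

In other words, raising mapreduce.input.fileinputformat.split.minsize is the way to get fewer, larger splits than blocks, and lowering mapreduce.input.fileinputformat.split.maxsize is the way to get more, smaller ones.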

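The file-size examples in the comment can be checked by replaying the splitting loop on its own. This sketch assumes a non-empty, splittable file, SPLIT_SLOP = 1.1 (the value of the constant in Hadoop's FileInputFormat) and a 64 MB split size; it records only split lengths and ignores block locations and hosts. SplitLoopDemo and splitLengths are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Replays the splitting loop of getSplits for one non-empty, splittable file.
// Hypothetical helper: only split lengths are modeled, not hosts.
class SplitLoopDemo {
    static final double SPLIT_SLOP = 1.1; // same value as in FileInputFormat
    static final long MB = 1024L * 1024L;

    static List<Long> splitLengths(long length, long splitSize) {
        List<Long> splits = new ArrayList<>();
        long bytesRemaining = length;
        // Cut full-size splits while more than 1.1 splits' worth of data remains.
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            bytesRemaining -= splitSize;
        }
        // The tail (at most 1.1 * splitSize) becomes the last split.
        if (bytesRemaining != 0) {
            splits.add(bytesRemaining);
        }
        return splits;
    }

    public static void main(String[] args) {
        long splitSize = 64 * MB;
        long[] fileSizesMb = {20, 70, 80, 200};
        for (long mb : fileSizesMb) {
            List<Long> inMb = new ArrayList<>();
            for (long s : splitLengths(mb * MB, splitSize)) {
                inMb.add(s / MB);
            }
            System.out.println(mb + " MB -> " + inMb);
        }
        // Prints:
        // 20 MB -> [20]
        // 70 MB -> [70]
        // 80 MB -> [64, 16]
        // 200 MB -> [64, 64, 64, 8]
    }
}
```

The slop only absorbs tails of up to 10% of splitSize: a 70 MB file stays in a single split instead of producing a 6 MB tail, so a 20 MB file plus a 70 MB file yields two splits, while a 200 MB file still ends with an 8 MB split because its last 72 MB exceeds 1.1 × 64 MB (70.4 MB).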

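Each split's preferred hosts come from the block that contains the split's starting offset, located with getBlockIndex. In Hadoop's FileInputFormat this is a linear scan over the BlockLocation array that returns the first block whose byte range [offset, offset + length) contains the split start, and throws IllegalArgumentException if no block does. The sketch below reproduces that scan with a hypothetical Block class standing in for org.apache.hadoop.fs.BlockLocation; only offset, length and hosts are modeled, and the host names are made up.

```java
// Sketch of the getBlockIndex lookup. Block is a hypothetical stand-in for
// org.apache.hadoop.fs.BlockLocation.
class BlockIndexDemo {
    static final class Block {
        final long offset;
        final long length;
        final String[] hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Linear scan: return the index of the block whose byte range
    // [offset, offset + length) contains the given file offset.
    static int getBlockIndex(Block[] blocks, long offset) {
        for (int i = 0; i < blocks.length; i++) {
            Block b = blocks[i];
            if (b.offset <= offset && offset < b.offset + b.length) {
                return i;
            }
        }
        Block last = blocks[blocks.length - 1];
        long lastByte = last.offset + last.length - 1;
        throw new IllegalArgumentException(
            "Offset " + offset + " is outside of file (0.." + lastByte + ")");
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // An 80 MB file stored as one 64 MB block and one 16 MB block.
        Block[] blocks = {
            new Block(0, 64 * mb, "node1", "node2"),
            new Block(64 * mb, 16 * mb, "node2", "node3")
        };
        // The second split of this file starts at byte 64 MB, i.e. in block 1.
        int idx = getBlockIndex(blocks, 64 * mb);
        System.out.println(idx + " " + String.join(",", blocks[idx].hosts)); // 1 node2,node3
    }
}
```

Because the host list is taken from the block holding the split's first byte, a split that straddles two blocks is scheduled near its first block only; with the default settings, where splitSize equals the block size, most splits line up with exactly one block.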


© 2024 shulou.com SLNews company. All rights reserved.
