2025-09-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article walks through how the getSplits method of FileInputFormat reads the input files and turns them into InputSplits. The annotated source of the method follows.
/**
 * Generate the list of files and make them into FileSplits.
 * @param job the job context
 * @throws IOException
 */
public List<InputSplit> getSplits(JobContext job) throws IOException {
  Stopwatch sw = new Stopwatch().start();
  // Minimum number of bytes an InputSplit may contain
  long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job));
  // Maximum number of bytes an InputSplit may contain
  long maxSize = getMaxSplitSize(job);

  // generate splits
  List<InputSplit> splits = new ArrayList<InputSplit>();
  List<FileStatus> files = listStatus(job);
  /*
   * The loop runs once per input file, so a million small files mean a
   * million iterations, at least a million InputSplits, and therefore at
   * least a million map tasks.
   * Assuming the split size equals a 64 MB block size:
   * - one 20 MB file yields one InputSplit and one map task
   * - one 80 MB file yields two InputSplits and two map tasks
   * - two 20 MB files yield two InputSplits and two map tasks in total
   * - a 20 MB file plus a 70 MB file also yield two InputSplits and two map
   *   tasks: 70/64 is about 1.09, which does not exceed SPLIT_SLOP (1.1), so
   *   the 70 MB file is kept as a single split
   */
  for (FileStatus file : files) {
    Path path = file.getPath();
    long length = file.getLen();
    if (length != 0) {
      BlockLocation[] blkLocations;
      if (file instanceof LocatedFileStatus) {
        blkLocations = ((LocatedFileStatus) file).getBlockLocations();
      } else {
        FileSystem fs = path.getFileSystem(job.getConfiguration());
        blkLocations = fs.getFileBlockLocations(file, 0, length);
      }
      if (isSplitable(job, path)) {
        // Block size recorded for this file in the file system (HDFS)
        long blockSize = file.getBlockSize();
        // Size of one InputSplit, derived from blockSize, minSize and maxSize
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);

        long bytesRemaining = length;
        // Cut full-size splits while the remainder exceeds SPLIT_SLOP * splitSize
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
          int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining);
          splits.add(makeSplit(path, length - bytesRemaining, splitSize,
              blkLocations[blkIndex].getHosts(),
              blkLocations[blkIndex].getCachedHosts()));
          bytesRemaining -= splitSize;
        }

        // Whatever is left becomes the final split
        if (bytesRemaining != 0) {
          int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining);
          splits.add(makeSplit(path, length - bytesRemaining, bytesRemaining,
              blkLocations[blkIndex].getHosts(),
              blkLocations[blkIndex].getCachedHosts()));
        }
      } else { // not splitable: the whole file becomes one split
        splits.add(makeSplit(path, 0, length, blkLocations[0].getHosts(),
            blkLocations[0].getCachedHosts()));
      }
    } else {
      // Create empty hosts array for zero length files
      splits.add(makeSplit(path, 0, length, new String[0]));
    }
  }
  // Save the number of input files for metrics/loadgen
  job.getConfiguration().setLong(NUM_INPUT_FILES, files.size());
  sw.stop();
  if (LOG.isDebugEnabled()) {
    LOG.debug("Total # of splits generated by getSplits: " + splits.size()
        + ", TimeTaken: " + sw.elapsedMillis());
  }
  return splits;
}
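The split arithmetic in the loop can be tried out without a cluster. The following is a minimal standalone sketch, not Hadoop code: the class SplitSketch and the method splitLengths are hypothetical names introduced here for illustration. It assumes SPLIT_SLOP is 1.1 and that computeSplitSize clamps the block size between minSize and maxSize, which matches the Hadoop source as far as I know, and it uses a 64 MB block size with minSize 1 and maxSize Long.MAX_VALUE as example inputs.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone sketch of the split arithmetic; hypothetical helper, not part of Hadoop. */
final class SplitSketch {
    static final long MB = 1024L * 1024L;
    // Assumed to match FileInputFormat.SPLIT_SLOP: the last split may be up to 10% oversize.
    static final double SPLIT_SLOP = 1.1;

    /** Clamp blockSize between minSize and maxSize (assumed behavior of computeSplitSize). */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns the length in bytes of each split produced for one splittable file. */
    static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> lengths = new ArrayList<>();
        if (fileLength == 0) {
            lengths.add(0L); // a zero-length file still gets one empty split
            return lengths;
        }
        long bytesRemaining = fileLength;
        // Cut full-size splits while the remainder is more than 1.1 x splitSize
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            bytesRemaining -= splitSize;
        }
        // The remainder, if any, becomes the final split
        if (bytesRemaining != 0) {
            lengths.add(bytesRemaining);
        }
        return lengths;
    }

    public static void main(String[] args) {
        long splitSize = computeSplitSize(64 * MB, 1L, Long.MAX_VALUE);
        for (long mb : new long[] {20, 70, 80}) {
            int count = splitLengths(mb * MB, splitSize).size();
            System.out.println(mb + " MB file -> " + count + " split(s)");
        }
    }
}
```

Running main reports one split for the 20 MB file, one for the 70 MB file, and two for the 80 MB file. The 70 MB case is the one to watch: 70/64 is about 1.09, which is not greater than 1.1, so the file is not cut and the whole 70 MB ends up in a single split.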
That covers how getSplits in FileInputFormat turns the input files into InputSplits. Thank you for reading.