
How to query a file system with Hadoop's FileSystem API

2025-02-24 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31

This article explains how to query a file system through Hadoop's FileSystem API: reading file and directory metadata, listing directory contents, matching files with glob patterns, and filtering paths with PathFilter.

File metadata: FileStatus

An important feature of any file system is the ability to navigate its directory structure and retrieve information about the files and directories it stores. The FileStatus class encapsulates file system metadata for files and directories, including file length, block size, replication, modification time, ownership, and permission information.

Code:

    // Excerpt: field declarations and the main constructor of FileStatus
    public class FileStatus implements Writable, Comparable {

        private Path path;
        private long length;
        private boolean isdir;
        private short block_replication;
        private long blocksize;
        private long modification_time;
        private long access_time;
        private FsPermission permission;
        private String owner;
        private String group;

        public FileStatus() {
            this(0, false, 0, 0, 0, 0, null, null, null, null);
        }

        public FileStatus(long length, boolean isdir, int block_replication,
                          long blocksize, long modification_time, long access_time,
                          FsPermission permission, String owner, String group,
                          Path path) {
            this.length = length;
            this.isdir = isdir;
            this.block_replication = (short) block_replication;
            this.blocksize = blocksize;
            this.modification_time = modification_time;
            this.access_time = access_time;
            // Fall back to defaults when permission, owner or group are not supplied.
            this.permission = (permission == null) ? FsPermission.getDefault() : permission;
            this.owner = (owner == null) ? "" : owner;
            this.group = (group == null) ? "" : group;
            this.path = path;
        }

        // ... (accessors and the Writable/Comparable methods are omitted)
    }

The getFileStatus() method on FileSystem returns the FileStatus object for a single file or directory. Example 3-5 shows its usage.

Example 3-5: display file status information

    import static org.hamcrest.Matchers.is;
    import static org.hamcrest.Matchers.lessThanOrEqualTo;
    import static org.junit.Assert.assertThat;

    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;

    public class ShowFileStatusTest {

        private MiniDFSCluster cluster; // use an in-process HDFS cluster for testing
        private FileSystem fs;

        @Before
        public void setUp() throws IOException {
            Configuration conf = new Configuration();
            if (System.getProperty("test.build.data") == null) {
                System.setProperty("test.build.data", "/tmp");
            }
            cluster = new MiniDFSCluster(conf, 1, true, null);
            fs = cluster.getFileSystem();
            // Create a 7-byte file for the tests to inspect.
            OutputStream out = fs.create(new Path("/dir/file"));
            out.write("content".getBytes("UTF-8"));
            out.close();
        }

        @After
        public void tearDown() throws IOException {
            if (fs != null) {
                fs.close();
            }
            if (cluster != null) {
                cluster.shutdown();
            }
        }

        @Test(expected = FileNotFoundException.class)
        public void throwsFileNotFoundForNonExistentFile() throws IOException {
            fs.getFileStatus(new Path("no-such-file"));
        }

        @Test
        public void fileStatusForFile() throws IOException {
            Path file = new Path("/dir/file");
            FileStatus stat = fs.getFileStatus(file);
            assertThat(stat.getPath().toUri().getPath(), is("/dir/file"));
            assertThat(stat.isDir(), is(false));
            assertThat(stat.getLen(), is(7L));
            assertThat(stat.getModificationTime(),
                    is(lessThanOrEqualTo(System.currentTimeMillis())));
            assertThat(stat.getReplication(), is((short) 1));
            assertThat(stat.getBlockSize(), is(64 * 1024 * 1024L)); // 64 MB block size
            assertThat(stat.getOwner(), is("tom"));                  // the user running the test
            assertThat(stat.getGroup(), is("supergroup"));
            assertThat(stat.getPermission().toString(), is("rw-r--r--"));
        }

        @Test
        public void fileStatusForDirectory() throws IOException {
            Path dir = new Path("/dir");
            FileStatus stat = fs.getFileStatus(dir);
            assertThat(stat.getPath().toUri().getPath(), is("/dir"));
            assertThat(stat.isDir(), is(true));
            assertThat(stat.getLen(), is(0L));
            assertThat(stat.getModificationTime(),
                    is(lessThanOrEqualTo(System.currentTimeMillis())));
            assertThat(stat.getReplication(), is((short) 0));
            assertThat(stat.getBlockSize(), is(0L));
            assertThat(stat.getOwner(), is("tom"));
            assertThat(stat.getGroup(), is("supergroup"));
            assertThat(stat.getPermission().toString(), is("rwxr-xr-x"));
        }
    }

If the file or directory does not exist, getFileStatus() throws a FileNotFoundException. If you are only interested in whether a file or directory exists, the exists() method on FileSystem is more convenient:

    public boolean exists(Path f) throws IOException
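The relationship between the two calls can be sketched without a cluster. The stdlib-only Java model below is not Hadoop code: it assumes a hypothetical in-memory class, MiniStatusStore, in which a lookup that throws FileNotFoundException for a missing path stands in for getFileStatus(), and exists() is the same lookup with that exception turned into false.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a file system; NOT a Hadoop API.
class MiniStatusStore {
    // Maps a path to its length in bytes (a tiny subset of FileStatus).
    private final Map<String, Long> lengths = new HashMap<>();

    void put(String path, long length) {
        lengths.put(path, length);
    }

    // Models the getFileStatus() contract: a missing path is an exception.
    long getLength(String path) throws FileNotFoundException {
        Long len = lengths.get(path);
        if (len == null) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
        return len;
    }

    // Models exists(): the same lookup, with the exception mapped to false.
    boolean exists(String path) {
        try {
            getLength(path);
            return true;
        } catch (FileNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        MiniStatusStore fs = new MiniStatusStore();
        fs.put("/dir/file", 7L);
        System.out.println(fs.exists("/dir/file"));    // true
        System.out.println(fs.exists("no-such-file")); // false
    }
}
```

The design point is that exists() saves the caller a try/catch when only presence matters; when the metadata is needed anyway, calling getFileStatus() directly avoids a second lookup.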

Listing files

Finding information about a single file or directory is useful, but you often need to list the contents of a directory as well. That is what FileSystem's listStatus() methods do:

    public FileStatus[] listStatus(Path f) throws IOException
    public FileStatus[] listStatus(Path f, PathFilter filter) throws IOException
    public FileStatus[] listStatus(Path[] files) throws IOException
    public FileStatus[] listStatus(Path[] files, PathFilter filter) throws IOException

When the argument is a file, the simplest variant returns an array of FileStatus objects of length 1. When the argument is a directory, it returns zero or more FileStatus objects representing the files and directories contained in that directory.

The overloaded variants accept a PathFilter that restricts which files and directories are matched; an example appears in the PathFilter section below. Passing an array of paths is a shortcut for calling the single-path method on each path in turn and accumulating the resulting FileStatus arrays into a single array. This is useful for building a list of input files to process from different parts of the file system tree. Example 3-6 is a simple demonstration of this idea. Note the use of stat2Paths() in FileUtil, which converts an array of FileStatus objects into an array of Path objects.
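To make these semantics concrete, here is a small stdlib-only model, not Hadoop code. The class ListStatusModel and its String-based paths are hypothetical, and java.util.function.Predicate stands in for PathFilter. Listing a file yields that file, listing a directory yields its immediate children, and listing an array of paths concatenates the per-path results. As a simplification, a missing path produces an empty list here rather than an error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical in-memory model of listStatus() semantics; NOT Hadoop code.
class ListStatusModel {
    // Absolute paths of regular files; directories are implied by prefixes.
    private final TreeSet<String> files = new TreeSet<>();

    void addFile(String path) {
        files.add(path);
    }

    // Single path: a file yields itself; a directory yields its immediate
    // children (files or subdirectories) that the filter accepts.
    List<String> listStatus(String path, Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        if (files.contains(path)) {
            if (filter.test(path)) {
                out.add(path);
            }
            return out;
        }
        String prefix = path.endsWith("/") ? path : path + "/";
        TreeSet<String> children = new TreeSet<>();
        // Paths sharing a prefix are contiguous in sorted order.
        for (String f : files.tailSet(prefix)) {
            if (!f.startsWith(prefix)) {
                break;
            }
            String rest = f.substring(prefix.length());
            int slash = rest.indexOf('/');
            // Keep only the first path component below the directory.
            children.add(prefix + (slash < 0 ? rest : rest.substring(0, slash)));
        }
        for (String c : children) {
            if (filter.test(c)) {
                out.add(c);
            }
        }
        return out;
    }

    // Array of paths: list each one in turn and concatenate the results.
    List<String> listStatus(String[] paths, Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String p : paths) {
            out.addAll(listStatus(p, filter));
        }
        return out;
    }

    public static void main(String[] args) {
        ListStatusModel m = new ListStatusModel();
        m.addFile("/2007/12/30");
        m.addFile("/2007/12/31");
        m.addFile("/2008/01/01");
        System.out.println(m.listStatus("/2007/12/30", p -> true));                   // [/2007/12/30]
        System.out.println(m.listStatus("/", p -> true));                             // [/2007, /2008]
        System.out.println(m.listStatus(new String[] {"/2007", "/2008"}, p -> true)); // [/2007/12, /2008/01]
    }
}
```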

Example 3-6: showing the file statuses for a collection of paths in a Hadoop file system

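    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class ListStatus {
        public static void main(String[] args) throws Exception {
            String uri = args[0];
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(uri), conf);

            // One Path per command-line argument.
            Path[] paths = new Path[args.length];
            for (int i = 0; i < paths.length; i++) {
                paths[i] = new Path(args[i]);
            }

            // List every path and merge the results into a single array.
            FileStatus[] status = fs.listStatus(paths);
            // Convert the FileStatus objects into Path objects for printing.
            Path[] listedPaths = FileUtil.stat2Paths(status);
            for (Path p : listedPaths) {
                System.out.println(p);
            }
        }
    }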

File patterns

It is a common requirement to process sets of files in a single operation. For example, a MapReduce job for log processing might analyze a month's worth of files contained in a number of directories. Rather than enumerating each file and directory to specify the input, it is convenient to use wildcard characters to match multiple files with a single expression, an operation known as globbing. Hadoop provides two FileSystem methods for processing globs:

    public FileStatus[] globStatus(Path pathPattern) throws IOException
    public FileStatus[] globStatus(Path pathPattern, PathFilter filter) throws IOException

globStatus() returns an array of FileStatus objects whose paths match the supplied pattern, sorted by path. An optional PathFilter can be specified to restrict the matches further.

Hadoop supports the same set of glob characters as the Unix bash shell (see Table 3-2).

Table 3-2: Glob characters and their meanings

    Glob     Name                      Matches
    -------  ------------------------  ---------------------------------------------------------
    *        asterisk                  Zero or more characters
    ?        question mark             A single character
    [ab]     character class           A single character in the set {a, b}
    [^ab]    negated character class   A single character that is not in the set {a, b}
    [a-b]    character range           A single character in the closed range [a, b], where a
                                       is lexicographically less than or equal to b
    [^a-b]   negated character range   A single character that is not in the closed range
                                       [a, b], where a is lexicographically less than or
                                       equal to b
    {a,b}    alternation               Either expression a or expression b
    \c       escaped character         Character c when it is a metacharacter
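One way to see what these characters mean is to translate a glob into an equivalent regular expression. The sketch below is a hypothetical helper (GlobSketch, not Hadoop's own glob implementation). It assumes, consistent with the expansions in this section where /* matches only top-level entries, that *, ? and negated classes never match the '/' separator; class bodies are passed through as regex syntax, which covers the four class forms in the table.

```java
import java.util.regex.Pattern;

// Hypothetical helper, NOT Hadoop's glob implementation: translates a glob
// built from the Table 3-2 characters into a java.util.regex pattern.
class GlobSketch {
    static String toRegex(String glob) {
        StringBuilder re = new StringBuilder();
        int braceDepth = 0;      // nesting level of {a,b} alternations
        boolean inClass = false; // true while copying the body of [...]
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (inClass) {
                // Class bodies such as ab or a-b already use regex syntax.
                re.append(c);
                if (c == ']') {
                    inClass = false;
                }
                continue;
            }
            switch (c) {
                case '*': // zero or more characters within one path component
                    re.append("[^/]*");
                    break;
                case '?': // exactly one character within one path component
                    re.append("[^/]");
                    break;
                case '[':
                    re.append('[');
                    inClass = true;
                    if (i + 1 < glob.length() && glob.charAt(i + 1) == '^') {
                        re.append("^/"); // negated class: also never match '/'
                        i++;
                    }
                    break;
                case '{':
                    re.append("(?:");
                    braceDepth++;
                    break;
                case ',':
                    re.append(braceDepth > 0 ? "|" : ",");
                    break;
                case '}':
                    if (braceDepth > 0) {
                        re.append(')');
                        braceDepth--;
                    } else {
                        re.append("\\}");
                    }
                    break;
                case '\\': // \c matches the character c literally
                    if (i + 1 < glob.length()) {
                        i++;
                        re.append(Pattern.quote(String.valueOf(glob.charAt(i))));
                    }
                    break;
                default:
                    re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return re.toString();
    }

    static boolean matches(String glob, String path) {
        return Pattern.matches(toRegex(glob), path);
    }

    public static void main(String[] args) {
        System.out.println(matches("/200[^01234569]", "/2007"));        // true
        System.out.println(matches("/200[^01234569]", "/2009"));        // false
        System.out.println(matches("/*/{12/31,01/01}", "/2008/01/01")); // true
        System.out.println(matches("/a\\*b", "/a*b"));                  // true
    }
}
```

Quoting every literal character with Pattern.quote keeps the translation simple and safe: no path character can accidentally act as a regex metacharacter.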

Here are some file globs and their expansions, for log files stored in a directory tree organized by date that contains, for example, /2007/12/30, /2007/12/31 and /2008/01/01:

    Glob                 Expansion
    -------------------  ---------------------------
    /*                   /2007 /2008
    /*/*                 /2007/12 /2008/01
    /*/12/*              /2007/12/30 /2007/12/31
    /200?                /2007 /2008
    /200[78]             /2007 /2008
    /200[7-8]            /2007 /2008
    /200[^01234569]      /2007 /2008
    /*/*/{31,01}         /2007/12/31 /2008/01/01
    /*/*/3{0,1}          /2007/12/30 /2007/12/31
    /*/{12/31,01/01}     /2007/12/31 /2008/01/01
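The expansion table can be checked with the JDK's own glob matcher, java.nio.file.PathMatcher, used here as a swapped-in stand-in for globStatus(), not as Hadoop's implementation. The sketch assumes a Unix-like default file system and only matches an in-memory list of sample path strings, so nothing is read from disk. One syntax difference: the JDK writes a negated class as [!...], so Hadoop's /200[^01234569] becomes /200[!01234569].

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Checks the expansion table with the JDK glob matcher (a stand-in for
// Hadoop's globStatus). Assumes a Unix-like default file system; the sample
// tree is an in-memory list of strings, so nothing is read from disk.
class ExpansionDemo {
    static final String[] TREE = {
        "/2007", "/2007/12", "/2007/12/30", "/2007/12/31",
        "/2008", "/2008/01", "/2008/01/01"
    };

    // Returns the sample paths that the glob matches, sorted by path.
    static List<String> expand(String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> out = new ArrayList<>();
        for (String p : TREE) {
            if (matcher.matches(Paths.get(p))) {
                out.add(p);
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        String[] globs = {
            "/*", "/*/*", "/*/12/*", "/200?", "/200[78]", "/200[7-8]",
            "/200[!01234569]", // JDK spelling of Hadoop's /200[^01234569]
            "/*/*/{31,01}", "/*/*/3{0,1}", "/*/{12/31,01/01}"
        };
        for (String g : globs) {
            System.out.println(g + " -> " + expand(g));
        }
    }
}
```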

PathFilter object

Glob patterns are not always powerful enough to describe the set of files you want to access. For example, it is not generally possible to exclude a particular file using a glob pattern. The listStatus() and globStatus() methods of FileSystem accept an optional PathFilter, which allows programmatic control over matching:

    package org.apache.hadoop.fs;

    public interface PathFilter {
        boolean accept(Path path);
    }

PathFilter is the equivalent of java.io.FileFilter for Path objects rather than File objects.

Example 3-7 shows a PathFilter for excluding paths that match a regular expression.

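    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;

    public class RegexExcludePathFilter implements PathFilter {

        private final String regex;

        public RegexExcludePathFilter(String regex) {
            this.regex = regex;
        }

        // Accept a path only if its full string form does NOT match the regex.
        @Override
        public boolean accept(Path path) {
            return !path.toString().matches(regex);
        }
    }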

The filter passes only files that do not match the regular expression. The glob first picks out an initial set of candidate files, and the filter then refines the result. For example:

    fs.globStatus(new Path("/2007/*/*"), new RegexExcludePathFilter("^.*/2007/12/31$"))

In a date-organized tree containing /2007/12/30 and /2007/12/31, this expands to /2007/12/30 only.
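The glob-then-filter pipeline can be modelled with the standard library alone. In this hypothetical sketch (GlobThenFilter is not a Hadoop class), the JDK's PathMatcher stands in for glob expansion and a Predicate built from String.matches stands in for RegexExcludePathFilter; a Unix-like default file system is assumed, and the candidate paths are an in-memory list.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Stdlib-only model of globStatus(pattern, filter); NOT Hadoop code.
class GlobThenFilter {
    // Stand-in for RegexExcludePathFilter: rejects paths matching the regex.
    static Predicate<String> excluding(String regex) {
        return p -> !p.matches(regex);
    }

    static List<String> globStatus(List<String> tree, String glob, Predicate<String> filter) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> out = new ArrayList<>();
        for (String s : tree) {
            // The glob pre-selects candidates; the filter then refines them.
            if (matcher.matches(Paths.get(s)) && filter.test(s)) {
                out.add(s);
            }
        }
        Collections.sort(out); // results are returned sorted by path
        return out;
    }

    public static void main(String[] args) {
        List<String> tree = Arrays.asList("/2007/12/30", "/2007/12/31", "/2008/01/01");
        System.out.println(globStatus(tree, "/2007/*/*", excluding("^.*/2007/12/31$")));
        // prints [/2007/12/30]
    }
}
```

Keeping the glob coarse and the filter precise mirrors the division of labour described above: the glob does cheap structural matching, and arbitrary logic lives in the filter.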
