I. Brief Introduction
To use the HDFS API, you need to import the hadoop-client dependency. If you are using the CDH version of Hadoop, you also need to specify the address of its repository:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.heibaiying</groupId>
    <artifactId>hdfs-java-api</artifactId>
    <version>1.0</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <hadoop.version>2.6.0-cdh5.15.2</hadoop.version>
    </properties>

    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

II. Use of the API

2.1 FileSystem
FileSystem is the main entry point for all HDFS operations. Since it is needed by every subsequent unit test, it is obtained in a setup method annotated with @Before.
private static final String HDFS_PATH = "hdfs://192.168.0.106:8020";
private static final String HDFS_USER = "root";
private static FileSystem fileSystem;

@Before
public void prepare() {
    try {
        Configuration configuration = new Configuration();
        // Here a single-node Hadoop is used, so the replication factor is set to 1; the default value is 3
        configuration.set("dfs.replication", "1");
        fileSystem = FileSystem.get(new URI(HDFS_PATH), configuration, HDFS_USER);
    } catch (IOException e) {
        e.printStackTrace();
    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (URISyntaxException e) {
        e.printStackTrace();
    }
}

@After
public void destroy() {
    fileSystem = null;
}
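If the cluster address is already configured in a core-site.xml on the classpath, the FileSystem can also be obtained without passing the URI and user explicitly. A minimal sketch, assuming such a configuration file exists (this variant is not used in the tests below):

Configuration configuration = new Configuration();
// fs.defaultFS is read from core-site.xml on the classpath
FileSystem fs = FileSystem.get(configuration);
System.out.println(fs.getUri());
fs.close();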
2.2 Create a directory

Directories can be created recursively:
@Test
public void mkDir() throws Exception {
    fileSystem.mkdirs(new Path("/hdfs-api/test0/"));
}

2.3 Create a directory with specified permissions
The three parameters of FsPermission(FsAction u, FsAction g, FsAction o) correspond to the permissions of the owner, of other users in the same group, and of all other users; the permission values are defined in the FsAction enumeration class.
@Test
public void mkDirWithPermission() throws Exception {
    fileSystem.mkdirs(new Path("/hdfs-api/test1/"),
            new FsPermission(FsAction.READ_WRITE, FsAction.READ, FsAction.READ));
}

2.4 Create a file and write content

@Test
public void create() throws Exception {
    // If the file exists, it is overwritten by default; this can be controlled by the second parameter.
    // The third parameter controls the size of the buffer used.
    FSDataOutputStream out = fileSystem.create(new Path("/hdfs-api/test/a.txt"), true, 4096);
    out.write("hello hadoop!".getBytes());
    out.write("hello spark!".getBytes());
    out.write("hello flink!".getBytes());
    // Force the contents of the buffer to be flushed out
    out.flush();
    out.close();
}

2.5 Check whether a file exists

@Test
public void exist() throws Exception {
    boolean exists = fileSystem.exists(new Path("/hdfs-api/test/a.txt"));
    System.out.println(exists);
}
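Content can also be appended to an existing file rather than overwriting it. A minimal sketch, assuming the file /hdfs-api/test/a.txt created above already exists and that append is supported by the cluster (it is enabled by default in Hadoop 2.x and later):

@Test
public void append() throws Exception {
    // Opens the existing file and appends to its end instead of overwriting it
    FSDataOutputStream out = fileSystem.append(new Path("/hdfs-api/test/a.txt"));
    out.write("hello append!".getBytes());
    out.close();
}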
2.6 View the contents of a file

To view the contents of a small text file, convert it directly to a string and print it:
@Test
public void readToString() throws Exception {
    FSDataInputStream inputStream = fileSystem.open(new Path("/hdfs-api/test/a.txt"));
    String context = inputStreamToString(inputStream, "utf-8");
    System.out.println(context);
}
inputStreamToString is a custom method with the following implementation:
/**
 * Convert an input stream to a string with the specified encoding
 *
 * @param inputStream input stream
 * @param encode      specified encoding type
 */
private static String inputStreamToString(InputStream inputStream, String encode) {
    try {
        if (encode == null || "".equals(encode)) {
            encode = "utf-8";
        }
        BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, encode));
        StringBuilder builder = new StringBuilder();
        String str = "";
        while ((str = reader.readLine()) != null) {
            builder.append(str).append("\n");
        }
        return builder.toString();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}

2.7 Rename a file

@Test
public void rename() throws Exception {
    Path oldPath = new Path("/hdfs-api/test/a.txt");
    Path newPath = new Path("/hdfs-api/test/b.txt");
    boolean result = fileSystem.rename(oldPath, newPath);
    System.out.println(result);
}

2.8 Delete a directory or file

@Test
public void delete() throws Exception {
    /*
     * The second parameter indicates whether to delete recursively:
     * + if path is a directory and recursive is true, the directory and all files in it are deleted;
     * + if path is a directory but recursive is false, an exception is thrown.
     */
    boolean result = fileSystem.delete(new Path("/hdfs-api/test/b.txt"), true);
    System.out.println(result);
}

2.9 Upload files to HDFS

@Test
public void copyFromLocalFile() throws Exception {
    // If a directory is specified, the directory and all files in it are copied to the target directory
    Path src = new Path("D:\\BigData-Notes\\notes\\installation");
    Path dst = new Path("/hdfs-api/test/");
    fileSystem.copyFromLocalFile(src, dst);
}

2.10 Upload a large file and show the upload progress

@Test
public void copyFromLocalBigFile() throws Exception {
    File file = new File("D:\\kafka.tgz");
    final float fileSize = file.length();
    InputStream in = new BufferedInputStream(new FileInputStream(file));
    FSDataOutputStream out = fileSystem.create(new Path("/hdfs-api/test/kafka5.tgz"),
            new Progressable() {
                long fileCount = 0;

                public void progress() {
                    fileCount++;
                    // The progress method is called every time approximately 64 KB of data has been uploaded
                    System.out.println("upload progress: " + (fileCount * 64 * 1024 / fileSize) * 100 + " %");
                }
            });
    IOUtils.copyBytes(in, out, 4096);
}

2.11 Download a file from HDFS

@Test
public void copyToLocalFile() throws Exception {
    Path src = new Path("/hdfs-api/test/kafka.tgz");
    Path dst = new Path("D:\\app\\");
    /*
     * The first parameter controls whether the source file is deleted after the download completes; the default is false, i.e. it is not deleted.
     * The last parameter indicates whether RawLocalFileSystem is used as the local file system;
     * it defaults to false and usually does not need to be set,
     * but if a NullPointerException is thrown during execution, your file system may be incompatible with the program (common on Windows);
     * in that case set it to true.
     */
    fileSystem.copyToLocalFile(false, src, dst, true);
}

2.12 View the information of all files in a specified directory

@Test
public void listFiles() throws Exception {
    FileStatus[] statuses = fileSystem.listStatus(new Path("/hdfs-api"));
    for (FileStatus fileStatus : statuses) {
        // The toString method of FileStatus has been overridden, so all the information can be seen directly by printing it
        System.out.println(fileStatus.toString());
    }
}
FileStatus contains the basic information about a file, such as the file path, whether it is a directory, modification time, access time, owner, group, file permissions, whether it is a symbolic link, and so on.
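Besides printing the whole toString output, the individual fields can also be read through the corresponding getters. A minimal sketch, using the /hdfs-api/test directory created above:

@Test
public void fileStatusFields() throws Exception {
    FileStatus status = fileSystem.getFileStatus(new Path("/hdfs-api/test"));
    System.out.println(status.getPath());              // file path
    System.out.println(status.isDirectory());          // whether it is a directory
    System.out.println(status.getModificationTime());  // modification time
    System.out.println(status.getOwner());             // owner
    System.out.println(status.getGroup());             // group
    System.out.println(status.getPermission());        // file permissions
}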
An example of the toString output is as follows:

FileStatus{path=hdfs://192.168.0.106:8020/hdfs-api/test; isDirectory=true; modification_time=1556680796191; access_time=0; owner=root; group=supergroup; permission=rwxr-xr-x; isSymlink=false}

2.13 Recursively view the information of all files in a specified directory

@Test
public void listFilesRecursive() throws Exception {
    RemoteIterator<LocatedFileStatus> files = fileSystem.listFiles(new Path("/hbase"), true);
    while (files.hasNext()) {
        System.out.println(files.next());
    }
}
The output is similar to the above, except that it additionally contains the file length, the replication factor, and the block size:
LocatedFileStatus{path=hdfs://192.168.0.106:8020/hbase/hbase.version; isDirectory=false; length=7; replication=1; blocksize=134217728; modification_time=1554129052916; access_time=1554902661455; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false}

2.14 View the block information of a file

@Test
public void getFileBlockLocations() throws Exception {
    FileStatus fileStatus = fileSystem.getFileStatus(new Path("/hdfs-api/test/kafka.tgz"));
    BlockLocation[] blocks = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
    for (BlockLocation block : blocks) {
        System.out.println(block);
    }
}
The block output contains three values: the starting offset of the block within the file (offset), the length of the block (length), and the hostnames of the nodes storing the block (hosts).
0,57028557,hadoop001
The file uploaded here is only about 57 MB (less than the 128 MB block size), and the replication factor was set to 1 in the program, so there is only one block entry.
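The same values can also be read programmatically from each BlockLocation through its getters. A minimal sketch based on the getFileBlockLocations example above:

@Test
public void printBlockLocations() throws Exception {
    FileStatus fileStatus = fileSystem.getFileStatus(new Path("/hdfs-api/test/kafka.tgz"));
    BlockLocation[] blocks = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
    for (BlockLocation block : blocks) {
        System.out.println("offset: " + block.getOffset());  // starting offset of the block within the file
        System.out.println("length: " + block.getLength());  // length of the block
        for (String host : block.getHosts()) {               // hostnames of the nodes storing the block
            System.out.println("host:   " + host);
        }
    }
}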
All of the above test cases can be downloaded from: HDFS Java API
For more articles in the big data series, see the GitHub open source project: Getting Started Guide to Big Data.