1. Shell operation of HDFS

hadoop version                      # view the hadoop version
hadoop fs -appendToFile src dest    # append a local (Linux) file to a file in the hdfs directory
hadoop fs -cat file                 # view the contents of a file in hdfs
hadoop fs -tail file                # view the last 1 KB of a file in hdfs
hadoop fs -checksum file            # get the checksum to verify whether the file is correct
hadoop fs -copyFromLocal src dest   # copy a file from the local file system to HDFS
hadoop fs -copyToLocal dest src     # copy a file from hdfs to the local file system
hadoop fs -count path               # count the contents of a directory
hadoop fs -find path                # look for matching files under an hdfs directory
hadoop fs -getmerge src dest        # merge the files under src and download them as one local file
hadoop fs -ls path                  # list a directory
hadoop fs -put file dest            # upload a local file to hdfs
hadoop fs -setrep num file          # set the replication factor of a file already uploaded to hdfs
hadoop fs -setrep -R num dir        # set the replication factor of all files under a directory
hadoop fs -text file                # view a compressed file in hdfs as text
hadoop fs -truncate length file     # truncate a file in hdfs to the given length
hdfs getconf -confKey key           # view the value of the given configuration key

2. API operation of HDFS
1) Build the hdfs development environment in eclipse: download a recent version of eclipse and put the integrated hadoop-eclipse-plugin-2.7.4.jar into eclipse's installation directory under \eclipse-hadoop\plugins.
Plug-in download address: http://down.51cto.com/data/2457877
Then open eclipse:
If the two components above are visible, the eclipse environment is configured successfully. Next, configure the eclipse plug-in to connect to the cluster:
After the configuration succeeds, the hdfs file directory will appear under DFS Locations.
Using hadoop on the Windows platform: configure the hadoop environment variables under Windows. Place the extracted hadoop package in some location, then add HADOOP_HOME pointing to the hadoop-2.7.4 directory and append %HADOOP_HOME%\bin to PATH.
hadoop.dll and winutils.exe download address: http://down.51cto.com/data/2457878. Put hadoop.dll into C:\Windows\System32, put winutils.exe into hadoop's bin directory, and then integrate eclipse with hadoop.
2) There are three ways to add the hadoop dependency packages:
(1) Create a folder in the project, copy all the dependent jars into this directory, then right-click and Add to Build Path to import the jars into the project.
(2) Create a user library locally and import the corresponding jar packages into the project. Note that all the jar packages in the two lib directories here must be added.
(3) Use maven to import the jar packages (recommended).
3) hdfs programming:
Introduction to the Configuration class:
The Configuration class loads the corresponding configuration file in hadoop
Configuration conf = new Configuration(true);
If you do not put the corresponding configuration files (core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, ...) in the source folder, the Configuration class automatically loads the default configuration files packed inside the jars, e.g. hadoop-hdfs-2.7.6.jar/hdfs-default.xml. A configuration file placed in the source folder is loaded only if its name follows the *-site or *-default pattern.
Use addResource to load a configuration file explicitly:
Configuration conf = new Configuration();
conf.addResource("");  // the argument specifies the name of the configuration file to load
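As a small illustrative sketch (not from the original article; the extra resource name and key lookup are assumptions), loading an additional configuration file and reading a value back could look like this:

import org.apache.hadoop.conf.Configuration;

public class ConfDemo {
    public static void main(String[] args) {
        // loads the default/site configuration files found on the classpath
        Configuration conf = new Configuration();
        // assumed extra configuration file name, placed in the source folder / on the classpath
        conf.addResource("my-cluster-site.xml");
        // read a value back; the second argument is the default returned if the key is not set
        String fsName = conf.get("fs.defaultFS", "file:///");
        System.out.println("fs.defaultFS = " + fsName);
    }
}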
Introduction to the FileSystem class:
Get the FileSystem object:
Configuration conf = new Configuration();
// without fs.defaultFS, the fs obtained below is the local file system
// (under Windows, the Windows file system); set it so the object loads your own cluster:
conf.set("fs.defaultFS", "hdfs://hadoop01:9000");
FileSystem fs = FileSystem.get(conf);  // the file system object
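For illustration (a hedged sketch reusing the hdfs://hadoop01:9000 address assumed above), a quick way to verify which file system was actually obtained is to print its URI and list the root directory:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // without this line, FileSystem.get(conf) returns the local file system
        conf.set("fs.defaultFS", "hdfs://hadoop01:9000");
        FileSystem fs = FileSystem.get(conf);
        System.out.println("connected to: " + fs.getUri());
        // list the root directory of the file system we connected to
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}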
Resolve permission issues when programming:
Because the hdfs file system is user based, and the default user operating hdfs from eclipse under Windows is the Windows user, which has no permission on the hdfs file system, a permission error may be reported when doing read and write operations:
Solution:
(1) Use Run Configuration at run time and add the VM argument -DHADOOP_USER_NAME=hadoop (to specify the running user).
(2) Specify the user in the code: FileSystem.get(new URI("hdfs://hadoop01:9000"), conf, "hadoop")
(3) In the code, specify the user to be used when the JVM runs: System.setProperty("HADOOP_USER_NAME", "hadoop"); here you also need to use Run Configuration to run.
(4) add a hadoop user to Windows, which is not recommended
If there is still a problem with the approaches above, configure an environment variable on Windows: HADOOP_USER_NAME=hadoop. A short sketch of the in-code options follows below.
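A brief sketch combining the in-code options (2) and (3); the user name "hadoop" and the cluster URI come from the article, while the class name and the test path are illustrative assumptions:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UserDemo {
    public static void main(String[] args) throws Exception {
        // option (3): set the user property before any FileSystem object is created
        System.setProperty("HADOOP_USER_NAME", "hadoop");

        Configuration conf = new Configuration();
        // option (2): pass the user explicitly when obtaining the FileSystem
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop01:9000"), conf, "hadoop");
        fs.mkdirs(new Path("/user/hadoop/perm_test"));   // assumed test path
        fs.close();
    }
}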
4) Practical programming of hdfs:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.io.IOUtils;
import org.junit.Before;
import org.junit.Test;

public class HDFSApp {
    // file system
    FileSystem fileSystem = null;
    // configuration class
    Configuration configuration = null;

    @Before
    public void setup() {
        configuration = new Configuration();
        configuration.set("fs.defaultFS", "hdfs://zzy:9000");
        configuration.addResource("core-site.xml");
        configuration.addResource("hdfs-site.xml");
        try {
            fileSystem = FileSystem.get(configuration);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // create a directory
    @Test
    public void mkdir() {
        try {
            System.setProperty("HADOOP_USER_NAME", "hadoop");
            boolean isMkdirs = fileSystem.mkdirs(new Path("/user/hadoop/test"));
            if (isMkdirs) {
                System.out.println("created successfully!");
            } else {
                System.out.println("creation failed!");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // delete a path (recursively)
    @Test
    public void deletedir() {
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        try {
            fileSystem.delete(new Path("/daily task.txt"), true);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // copy a local file to hdfs
    @Test
    public void CopyFromeLocal() {
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        Path src = new Path("C:\\Users\\aura-bd\\Desktop\\daily task.txt");
        Path dest = new Path("/");
        try {
            fileSystem.copyFromLocalFile(src, dest);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // copy an hdfs file to the local file system
    @Test
    public void CopyToLocal() {
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        Path src = new Path("C:\\Users\\aura-bd\\Desktop\\");
        Path dest = new Path("/user/hive/warehouse/test.db/pokes/data.txt");
        try {
            fileSystem.copyToLocalFile(dest, src);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // display the file information under a directory
    @Test
    public void FSListFile() {
        try {
            RemoteIterator<LocatedFileStatus> filelist =
                    fileSystem.listFiles(new Path("/user/hive/warehouse/test.db"), true);
            while (filelist.hasNext()) {
                LocatedFileStatus fileStatus = filelist.next();
                System.out.println(fileStatus.getPath());
                System.out.println(fileStatus.getGroup());
                System.out.println(fileStatus.getPath().getName());
                System.out.println(fileStatus.getReplication());
                BlockLocation[] blockLocations = fileStatus.getBlockLocations();
                for (BlockLocation block : blockLocations) {
                    System.out.println(Arrays.toString(block.getHosts()));
                    System.out.println(Arrays.toString(block.getNames()));
                    System.out.println(block.getOffset());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // display folder and file information
    @Test
    public void ListFiles() {
        try {
            FileStatus[] fileStatuses = fileSystem.listStatus(new Path("/"));
            for (FileStatus file : fileStatuses) {
                if (file.isDirectory()) {
                    System.out.println("directory:" + file.getPath().getName());
                } else {
                    System.out.println("file:" + file.getPath().getName());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // download a file
    @Test
    public void DownLoadFileToLocal() {
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        try {
            FSDataInputStream open = fileSystem.open(
                    new Path("/user/hive/warehouse/test.db/pokes/data.txt"));
            OutputStream out = new FileOutputStream(new File("D:\\data.txt"));
            IOUtils.copyBytes(open, out, 1024);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // upload a file
    @Test
    public void upLoadFileToLocal() {
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        try {
            FSDataOutputStream fsDataOutputStream = fileSystem.create(new Path("/a.txt"));
            InputStream in = new FileInputStream("D:\\data.txt");
            IOUtils.copyBytes(in, fsDataOutputStream, 4096);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

3. Classic case of HDFS:
In the hadoop MR programming framework, computation is moved to the data. So how do we get all of the block information of a file and read the data of a specified block?
Code implementation:
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RamdomRead {
    private static Configuration conf;
    private static FileSystem fs;

    static {
        conf = new Configuration(true);
        conf.set("fs.defaultFS", "hdfs://zzy:9000");
        conf.addResource("core-site.xml");
        conf.addResource("hdfs-site.xml");
        try {
            fs = FileSystem.get(conf);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws IOException {
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        Path file = new Path("/data/2018-8-8/access.log");
        FSDataInputStream open = fs.open(file);
        try {
            // get the file information
            FileStatus[] fileStatuses = fs.listStatus(file);
            // get the information of all the blocks of this file
            BlockLocation[] fileBlockLocations =
                    fs.getFileBlockLocations(fileStatuses[0], 0L, fileStatuses[0].getLen());
            // the first block
            BlockLocation fileBlockLocation = fileBlockLocations[0];
            // length of the first block
            long length = fileBlockLocation.getLength();
            // initial offset of the first block
            long offset = fileBlockLocation.getOffset();

            // read the data of the first block into a local file
            byte[] flush = new byte[4096];
            FileOutputStream os = new FileOutputStream(new File("d:/block0"));
            while (open.read(offset, flush, 0, 4096) != -1) {
                os.write(flush);
                offset += 4096;
                if (offset > length) {
                    break;   // stop after the first block; break (not return) so the streams below are closed
                }
            }
            os.flush();
            os.close();
            fs.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
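The code above always copies the first block of the file. As a hedged extension (not part of the original article), the sketch below shows one way to copy an arbitrary block by bounding the positioned reads with that block's own offset and length; the class name, block index, and output path are illustrative assumptions, while the cluster address, user, and input path are reused from above:

import java.io.File;
import java.io.FileOutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadBlockSketch {
    public static void main(String[] args) throws Exception {
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        Configuration conf = new Configuration(true);
        conf.set("fs.defaultFS", "hdfs://zzy:9000");         // cluster address, as above
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/2018-8-8/access.log");   // input path, as above
        int blockIndex = 1;                                   // which block to copy (assumption)

        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0L, status.getLen());
        BlockLocation block = blocks[blockIndex];
        long offset = block.getOffset();          // where this block starts in the file
        long end = offset + block.getLength();    // where it ends

        byte[] buffer = new byte[4096];
        try (FSDataInputStream in = fs.open(file);
             FileOutputStream out = new FileOutputStream(new File("d:/block" + blockIndex))) {
            while (offset < end) {
                // positioned read: does not move the stream's own position
                int toRead = (int) Math.min(buffer.length, end - offset);
                int read = in.read(offset, buffer, 0, toRead);
                if (read == -1) {
                    break;
                }
                out.write(buffer, 0, read);
                offset += read;
            }
        }
        fs.close();
    }
}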