A brief overview of Spring Hadoop
The official Spring Hadoop website is:
https://projects.spring.io/spring-hadoop/
Spring Hadoop simplifies developing with Apache Hadoop by providing a unified configuration model and easy-to-use APIs for HDFS, MapReduce, Pig, and Hive. It also integrates with other Spring ecosystem projects such as Spring Integration and Spring Batch.
Features:
- Create Hadoop applications that are configured with dependency injection and run as standard Java applications, rather than through Hadoop command-line utilities.
- Integrate with Spring Boot to simply create Spring applications that connect to HDFS to read and write data.
- Create and configure applications that use Java MapReduce, Streaming, Hive, Pig, or HBase.
- Extensions to Spring Batch that support creating Hadoop-based workflows for any type of Hadoop job or HDFS operation.
- Script HDFS operations using any JVM-based scripting language.
- Easily create custom Spring Boot applications that can be deployed to run on YARN.
- DAO support for HBase, which can be operated through templates or callbacks.
- Support for Hadoop security authentication.
The Spring Hadoop 2.5 reference documentation and API docs are available at:
https://docs.spring.io/spring-hadoop/docs/2.5.0.RELEASE/reference/html/
https://docs.spring.io/spring-hadoop/docs/2.5.0.RELEASE/api/
Building a Spring Hadoop development environment and accessing the HDFS file system
Create a Maven project and add the following dependency configuration to its pom.xml:
<repositories>
    <repository>
        <id>cloudera</id>
        <name>Cloudera</name>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        <releases><enabled>true</enabled></releases>
        <snapshots><enabled>false</enabled></snapshots>
    </repository>
</repositories>

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.kumkee</groupId>
        <artifactId>UserAgentParser</artifactId>
        <version>0.0.1</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.10</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-hadoop</artifactId>
        <version>2.5.0.RELEASE</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
        </plugin>
    </plugins>
</build>
Create a resources directory in the project with a Spring configuration file in it (the file name can be customized; the test code below loads it as beans.xml), and add the following Hadoop property to the configuration:

fs.defaultFS=${spring.hadoop.fsUri}
Then create a properties file, application.properties (the file name can be customized), and keep easily changed settings there. For example, here the URL of the server is configured in the properties file as follows:

spring.hadoop.fsUri=hdfs://192.168.77.128:8020
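The article only shows the fs.defaultFS line, not the whole configuration file. As a minimal sketch, assuming the standard Spring Hadoop hdp XML namespace and a property placeholder pointing at application.properties, a beans.xml that ties the two files together typically looks like this (the hdp:file-system element is what exposes the fileSystem bean the test class fetches later):

<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sketch of beans.xml; only the fs.defaultFS line comes from the article. -->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/context
           http://www.springframework.org/schema/context/spring-context.xsd
           http://www.springframework.org/schema/hadoop
           http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <!-- Pull spring.hadoop.fsUri from application.properties -->
    <context:property-placeholder location="classpath:application.properties"/>

    <!-- Hadoop configuration; the element body holds standard Hadoop properties -->
    <hdp:configuration id="hadoopConfiguration">
        fs.defaultFS=${spring.hadoop.fsUri}
    </hdp:configuration>

    <!-- Exposes an org.apache.hadoop.fs.FileSystem bean named "fileSystem" -->
    <hdp:file-system id="fileSystem" configuration-ref="hadoopConfiguration"/>
</beans>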
After completing the above steps, our Spring Hadoop development environment is ready; Maven makes the setup convenient.
Next, let's create a test class to verify that we can operate on the HDFS file system:
package org.zero01.spring;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

import java.io.IOException;

/**
 * @program: hadoop-train
 * @description: use Spring Hadoop to access the HDFS file system
 * @author: 01
 * @create: 2018-04-04 17:39
 **/
public class SpringHadoopApp {

    private ApplicationContext ctx;
    private FileSystem fileSystem;

    @Before
    public void setUp() {
        ctx = new ClassPathXmlApplicationContext("beans.xml");
        fileSystem = (FileSystem) ctx.getBean("fileSystem");
    }

    @After
    public void tearDown() throws IOException {
        ctx = null;
        fileSystem.close();
    }

    /**
     * Create a directory on HDFS
     * @throws Exception
     */
    @Test
    public void testMkdirs() throws Exception {
        fileSystem.mkdirs(new Path("/SpringHDFS/"));
    }
}
After the above code runs successfully, check on the server whether a SpringHDFS directory now exists under the root directory:
[root@hadoop000 ~]# hdfs dfs -ls /
Found 7 items
-rw-r--r--   3 root supergroup    2769741 2018-04-02 21:13 /10000_access.log
drwxr-xr-x   - root supergroup          0 2018-04-04 17:50 /SpringHDFS
drwxr-xr-x   - root supergroup          0 2018-04-02 21:22 /browserout
drwxr-xr-x   - root supergroup          0 2018-04-02 20:29 /data
drwxr-xr-x   - root supergroup          0 2018-04-02 20:31 /logs
drwx------   - root supergroup          0 2018-04-02 20:39 /tmp
drwxr-xr-x   - root supergroup          0 2018-04-02 20:39 /user
[root@hadoop000 ~]# hdfs dfs -ls /SpringHDFS
[root@hadoop000 ~]#
You can see that the SpringHDFS directory was created successfully, which means our project configuration works.
Since creating a directory works, let's write another test method to read the contents of a file on HDFS:
// Additional imports needed in SpringHadoopApp for this test:
// import org.apache.hadoop.fs.FSDataInputStream;
// import org.apache.hadoop.io.IOUtils;

/**
 * Read file contents on HDFS
 * @throws Exception
 */
@Test
public void testText() throws Exception {
    FSDataInputStream in = fileSystem.open(new Path("/browserout/part-r-00000"));
    IOUtils.copyBytes(in, System.out, 1024);
    in.close();
}
The code executes successfully, and the console output is as follows:
Chrome   2775
Firefox  327
MSIE     78
Safari   115
Unknown  6705
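Writing works the same way through the FileSystem bean. The article doesn't show a write test, so here is a minimal companion sketch using the standard Hadoop FileSystem.create API; the path /SpringHDFS/hello.txt and its content are illustrative only (also requires import org.apache.hadoop.fs.FSDataOutputStream):

/**
 * Hypothetical companion test, not from the original article:
 * write a small file through the same fileSystem bean.
 */
@Test
public void testCreate() throws Exception {
    Path path = new Path("/SpringHDFS/hello.txt");   // illustrative path
    try (FSDataOutputStream out = fileSystem.create(path, true)) {  // overwrite if present
        out.writeUTF("hello spring hadoop");
    }
}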
Reading and writing both work, so we can happily use Spring Hadoop in our projects to simplify development.
Accessing the HDFS file system with Spring Boot
The above covered accessing HDFS with Spring Hadoop; next is a brief introduction to accessing HDFS with Spring Boot, which is even easier.
First, add the Spring Boot dependency to the pom.xml file:
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop-boot</artifactId>
    <version>2.5.0.RELEASE</version>
</dependency>

Then create the application class:

package org.zero01.spring;

import org.apache.hadoop.fs.FileStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.hadoop.fs.FsShell;

/**
 * @program: hadoop-train
 * @description: use Spring Boot to access HDFS
 * @author: 01
 * @create: 2018-04-04 18:45
 **/
@SpringBootApplication
public class SpringBootHDFSApp implements CommandLineRunner {

    // object used to execute HDFS shell commands
    @Autowired
    FsShell fsShell;

    public void run(String... strings) throws Exception {
        // view all files in the root directory
        for (FileStatus fileStatus : fsShell.ls("/")) {
            System.out.println("> " + fileStatus.getPath());
        }
    }

    public static void main(String[] args) {
        SpringApplication.run(SpringBootHDFSApp.class, args);
    }
}
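One assumption worth making explicit: the Boot application still needs to know the HDFS address. A minimal application.properties sketch, reusing the same URI configured earlier (Spring Hadoop's Boot support reads the spring.hadoop.fsUri property):

# src/main/resources/application.properties (sketch; same URI as earlier)
spring.hadoop.fsUri=hdfs://192.168.77.128:8020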
The console output is as follows:
> hdfs://192.168.77.128:8020/
> hdfs://192.168.77.128:8020/10000_access.log
> hdfs://192.168.77.128:8020/SpringHDFS
> hdfs://192.168.77.128:8020/browserout
> hdfs://192.168.77.128:8020/data
> hdfs://192.168.77.128:8020/logs
> hdfs://192.168.77.128:8020/tmp
> hdfs://192.168.77.128:8020/user
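FsShell offers more than ls; its methods mirror the hdfs dfs command set. As a hedged sketch, assuming a mkdir method that behaves like hdfs dfs -mkdir (the directory name here is illustrative, not from the article), run could be extended like this:

// Hypothetical extension of run(), not from the original article.
public void run(String... strings) throws Exception {
    // Assumed to mirror `hdfs dfs -mkdir /SpringBootHDFS`
    fsShell.mkdir("/SpringBootHDFS");
    // Then list the root directory, as before
    for (FileStatus fileStatus : fsShell.ls("/")) {
        System.out.println("> " + fileStatus.getPath());
    }
}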