The original code of the setup function (excerpted from "hadoop practice"):
/**
 * Called once at the start of the task.
 */
protected void setup(Context context) throws IOException, InterruptedException {}
As the comment says, the setup function is called once when a Task starts.
A MapReduce job is organized into MapTasks and ReduceTasks. Each Task uses the Map class or the Reduce class as the body of its processing logic, takes an input split as its input, and is destroyed once its own split has been processed.
From this you can see that setup is called exactly once after the Task starts and before any data is processed, whereas the overridden map and reduce functions are called once for every key in the input split.
The setup function can therefore be treated as Task-level initialization: work that would otherwise be repeated on every call to map or reduce can be moved into setup, such as looking up the "name" in the Exercise_2 assignment given by the teacher (see the sketch below).
It is important to note, however, that setup is only global with respect to its own Task; it is not a global operation over the entire job.
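A minimal sketch of this pattern, assuming the new MapReduce API; the class name NameFilterMapper and the "name" configuration key are illustrative, not taken from the original exercise:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NameFilterMapper extends Mapper<LongWritable, Text, Text, Text> {

    private String name; // loaded once per Task, not once per record

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Runs once when the Task starts, before any call to map().
        name = context.getConfiguration().get("name", "");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // map() runs once per input record and reuses the value loaded in setup().
        if (value.toString().contains(name)) {
            context.write(new Text(name), value);
        }
    }
}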
You can first use the HDFS API to copy a local file into /user/hadoop/test in HDFS:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

// Upload a local file to HDFS
public static void upload(String src, String dst) throws FileNotFoundException, IOException {
    InputStream in = new BufferedInputStream(new FileInputStream(src));
    // Get the configuration object
    Configuration conf = new Configuration();
    // File system
    FileSystem fs = FileSystem.get(URI.create(dst), conf);
    // Output stream; the Progressable callback fires periodically as data is written
    OutputStream out = fs.create(new Path(dst), new Progressable() {
        public void progress() {
            System.out.println("Uploaded another buffer-sized chunk of the file!");
        }
    });
    // Connect the two streams into a channel that copies data from the input to the output stream
    IOUtils.copyBytes(in, out, 4096, true);
}
To upload, just call this function. For example:

upload("/home/jack/test/test.txt", "/user/hadoop/test/test");

The first argument is the file in the local directory; the second is the file in HDFS. Note that both must be "path + file name"; the file name cannot be omitted.
To make a parameter visible to every Task, set it on the Configuration before the job is submitted:

Configuration conf = new Configuration();
conf.setStrings("job_parms", "aaabbc"); // this is the key line
Job job = new Job(conf, "load analysis");
job.setJarByClass(LoadAnalysis.class);
job.setMapperClass(LoadMapper.class);
job.setReducerClass(LoadIntoHbaseReduce.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
The Task side then reads the parameter back in setup:

@Override
protected void setup(Context context)
        throws IOException, InterruptedException {
    try {
        // Obtain the parameter from the global configuration
        Configuration conf = context.getConfiguration();
        String parmStr = conf.get("job_parms"); // this retrieves the value set in the driver
        ... // elided in the original; the omitted code evidently can throw SQLException
    } catch (SQLException e) {
        e.printStackTrace();
    }
}
Global files: Hadoop provides a distributed cache for shipping global files to every node, ensuring that all nodes can access them; the class is DistributedCache.
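A minimal sketch of that mechanism using the classic DistributedCache API (the file path and class name here are illustrative; newer Hadoop releases deprecate DistributedCache in favor of Job.addCacheFile):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Driver side: register an HDFS file with the cache before submitting the job.
// (Illustrative path; the file must already exist in HDFS.)
// DistributedCache.addCacheFile(new URI("/user/hadoop/test/global.txt"), conf);

public class CacheMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Every node receives a local copy of each cached file; read it once per Task.
        Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        if (cached != null && cached.length > 0) {
            BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    // ... use the global file's contents, e.g. load a lookup table
                }
            } finally {
                reader.close();
            }
        }
    }
}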