IBM big data processing platform BigInsights (2)

2025-07-06 Update From: SLTechnology News&Howtos (Shulou.com)

Following the previous article, "Exploring the IBM big data processing platform BigInsights (1)", this article covers some basic Hadoop commands and runs a simple WordCount program with MapReduce.

1. Create a test directory on the HDFS file system

hadoop fs -mkdir /user/biadmin/test

2. Copy a file to the test directory

hadoop fs -put /var/adm/ibmvmcoc-postinstall/BIlicense_en.txt /user/biadmin/test

3. Check whether the file was added to the test directory

biadmin@bivm:/etc/ibmvmcoc-postinstall> hadoop fs -ls /user/biadmin/test

Found 1 items

-rw-r--r-- 1 biadmin biadmin 62949 2016-01-01 22:34 /user/biadmin/test/BIlicense_en.txt
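The three steps above can also be scripted. The following is a minimal sketch, assuming the hadoop command is on the PATH; the helper names build_hdfs_steps and run_steps are hypothetical and not part of BigInsights. It only builds and prints the commands; call run_steps(steps) on a machine with Hadoop installed to execute them.

```python
import shutil
import subprocess


def build_hdfs_steps(local_file, hdfs_dir):
    """Return the argument lists for steps 1-3: mkdir, put, ls."""
    return [
        ["hadoop", "fs", "-mkdir", hdfs_dir],
        ["hadoop", "fs", "-put", local_file, hdfs_dir],
        ["hadoop", "fs", "-ls", hdfs_dir],
    ]


def run_steps(steps):
    """Run each command in order and stop at the first failure."""
    if shutil.which("hadoop") is None:
        raise RuntimeError("hadoop command not found on PATH")
    for argv in steps:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, check=True)


# Paths taken from the article's own example.
steps = build_hdfs_steps(
    "/var/adm/ibmvmcoc-postinstall/BIlicense_en.txt",
    "/user/biadmin/test",
)
for argv in steps:
    print(" ".join(argv))
```

Passing each command as a list avoids shell quoting problems if a path ever contains spaces.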

4. Run a simple MapReduce program

WordCount is a small Hadoop MapReduce program written in Java that counts the number of occurrences of each word in a text. For more information about WordCount, see http://wiki.apache.org/hadoop/WordCount
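To make the counting logic concrete, here is a rough single-process sketch in Python of what the map and reduce phases do. It is not the Java code shipped with Hadoop; the function names are made up for illustration, and it assumes tokens are split on whitespace only, so punctuation stays attached to the word.

```python
from collections import Counter


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    return [(token, 1) for token in line.split()]


def reduce_phase(pairs):
    """Sum the counts emitted for each distinct word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map phase over every line, then reduce all pairs."""
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line))
    return reduce_phase(pairs)


counts = word_count(["negligence, or negligence.", "gross negligence or"])
for word in sorted(counts):
    print(word, counts[word])
```

Because splitting happens on whitespace alone, "negligence", "negligence," and "negligence." are counted as three different words in this sketch.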

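The program is packaged in hadoop-example.jar, and its results are written to the WordCount_output directory. Because that path is relative, HDFS resolves it against the user's home directory (/user/biadmin here), so it sits next to the test directory rather than inside it. The job creates this directory itself; it should not exist beforehand, because MapReduce normally refuses to write into an existing output directory.

biadmin@bivm:/etc/ibmvmcoc-postinstall> hadoop jar /opt/ibm/biginsights/IHC/hadoop-example.jar wordcount /user/biadmin/test WordCount_output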

16-01-01 22:36:08 INFO input.FileInputFormat: Total input paths to process: 1

16-01-01 22:36:18 INFO mapred.JobClient: Running job: job_201601012120_0001

16-01-01 22:36:19 INFO mapred.JobClient: map 0% reduce 0%

16-01-01 22:37:58 INFO mapred.JobClient: map 100% reduce 0%

16-01-01 22:39:07 INFO mapred.JobClient: map 100% reduce 100%

16-01-01 22:39:14 INFO mapred.JobClient: Job complete: job_201601012120_0001

16-01-01 22:39:15 INFO mapred.JobClient: Counters: 29

16-01-01 22:39:15 INFO mapred.JobClient: File System Counters

16-01-01 22:39:15 INFO mapred.JobClient: FILE: BYTES_READ=33219

16-01-01 22:39:15 INFO mapred.JobClient: FILE: BYTES_WRITTEN=419738

16-01-01 22:39:15 INFO mapred.JobClient: HDFS: BYTES_READ=63073

16-01-01 22:39:15 INFO mapred.JobClient: HDFS: BYTES_WRITTEN=24073

16-01-01 22:39:15 INFO mapred.JobClient: org.apache.hadoop.mapreduce.JobCounter

16-01-01 22:39:15 INFO mapred.JobClient: TOTAL_LAUNCHED_MAPS=1

16-01-01 22:39:15 INFO mapred.JobClient: TOTAL_LAUNCHED_REDUCES=1

16-01-01 22:39:15 INFO mapred.JobClient: DATA_LOCAL_MAPS=1

16-01-01 22:39:15 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=95300

16-01-01 22:39:15 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=50249

16-01-01 22:39:15 INFO mapred.JobClient: FALLOW_SLOTS_MILLIS_MAPS=0

16-01-01 22:39:15 INFO mapred.JobClient: FALLOW_SLOTS_MILLIS_REDUCES=0

16-01-01 22:39:15 INFO mapred.JobClient: org.apache.hadoop.mapreduce.TaskCounter

16-01-01 22:39:15 INFO mapred.JobClient: MAP_INPUT_RECORDS=755

16-01-01 22:39:15 INFO mapred.JobClient: MAP_OUTPUT_RECORDS=9865

16-01-01 22:39:15 INFO mapred.JobClient: MAP_OUTPUT_BYTES=102036

16-01-01 22:39:15 INFO mapred.JobClient: MAP_OUTPUT_MATERIALIZED_BYTES=33219

16-01-01 22:39:15 INFO mapred.JobClient: SPLIT_RAW_BYTES=124

16-01-01 22:39:15 INFO mapred.JobClient: COMBINE_INPUT_RECORDS=9865

16-01-01 22:39:15 INFO mapred.JobClient: COMBINE_OUTPUT_RECORDS=2322

16-01-01 22:39:15 INFO mapred.JobClient: REDUCE_INPUT_GROUPS=2322

16-01-01 22:39:15 INFO mapred.JobClient: REDUCE_SHUFFLE_BYTES=33219

16-01-01 22:39:15 INFO mapred.JobClient: REDUCE_INPUT_RECORDS=2322

16-01-01 22:39:15 INFO mapred.JobClient: REDUCE_OUTPUT_RECORDS=2322

16-01-01 22:39:15 INFO mapred.JobClient: SPILLED_RECORDS=4644

16-01-01 22:39:15 INFO mapred.JobClient: CPU_MILLISECONDS=22130

16-01-01 22:39:15 INFO mapred.JobClient: PHYSICAL_MEMORY_BYTES=538050560

16-01-01 22:39:15 INFO mapred.JobClient: VIRTUAL_MEMORY_BYTES=3549384704

16-01-01 22:39:15 INFO mapred.JobClient: COMMITTED_HEAP_BYTES=2097152000

16-01-01 22:39:15 INFO mapred.JobClient: File Input Format Counters

16-01-01 22:39:15 INFO mapred.JobClient: Bytes Read=62949

16-01-01 22:39:15 INFO mapred.JobClient: org.apache.hadoop.mapreduce.lib.output.FileOutputFormat$Counter

16-01-01 22:39:15 INFO mapred.JobClient: BYTES_WRITTEN=24073
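The counters in this log are easier to compare once they are pulled into a dictionary. The sketch below is one possible way to do that; parse_counters is a hypothetical helper, and it assumes every counter line has the form NAME=VALUE after the mapred.JobClient: prefix, as in the log above.

```python
def parse_counters(log_lines):
    """Collect NAME=VALUE counters from JobClient log lines."""
    counters = {}
    for line in log_lines:
        _, found, rest = line.partition("mapred.JobClient:")
        name, equals, value = rest.strip().partition("=")
        # Skip progress lines and group headers, which carry no "=".
        if found and equals and value.isdigit():
            counters[name] = int(value)
    return counters


# A few lines copied from the job output above.
sample_log = [
    "16-01-01 22:39:15 INFO mapred.JobClient: Counters: 29",
    "16-01-01 22:39:15 INFO mapred.JobClient: FILE: BYTES_READ=33219",
    "16-01-01 22:39:15 INFO mapred.JobClient: HDFS: BYTES_READ=63073",
    "16-01-01 22:39:15 INFO mapred.JobClient: MAP_OUTPUT_RECORDS=9865",
    "16-01-01 22:39:15 INFO mapred.JobClient: COMBINE_INPUT_RECORDS=9865",
    "16-01-01 22:39:15 INFO mapred.JobClient: COMBINE_OUTPUT_RECORDS=2322",
    "16-01-01 22:39:15 INFO mapred.JobClient: REDUCE_OUTPUT_RECORDS=2322",
]

counters = parse_counters(sample_log)
total_tokens = counters["MAP_OUTPUT_RECORDS"]
distinct_words = counters["REDUCE_OUTPUT_RECORDS"]
print("tokens emitted by the map phase:", total_tokens)
print("distinct words written by the reducer:", distinct_words)
```

Read this way, the log says the single map task emitted 9865 (word, 1) pairs, the combiner collapsed them to 2322 records, and the reducer wrote 2322 distinct words.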

The WordCount_output directory has been created automatically:

biadmin@bivm:/etc/ibmvmcoc-postinstall> hadoop fs -ls WordCount_output

Found 3 items

-rw-r--r-- 1 biadmin biadmin 0 2016-01-01 22:39 WordCount_output/_SUCCESS

drwx--x--x - biadmin biadmin 0 2016-01-01 22:36 WordCount_output/_logs

-rw-r--r-- 1 biadmin biadmin 24073 2016-01-01 22:39 WordCount_output/part-r-00000
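When a script needs to locate the result files, a listing like this can be parsed. The following is a small sketch under the assumption that each line has eight whitespace-separated fields (permissions, replication or "-", owner, group, size, date, time, path); HdfsEntry and the parse functions are illustrative names, not a Hadoop API.

```python
from typing import NamedTuple


class HdfsEntry(NamedTuple):
    is_dir: bool
    owner: str
    group: str
    size: int
    modified: str
    path: str


def parse_ls_line(line):
    """Split one listing line into its eight fields."""
    perms, _replication, owner, group, size, date, time, path = line.split(None, 7)
    return HdfsEntry(perms.startswith("d"), owner, group, int(size), f"{date} {time}", path)


def parse_ls(output):
    """Parse a whole listing, skipping blanks and the 'Found N items' header."""
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("Found "):
            continue
        entries.append(parse_ls_line(line))
    return entries


listing = """Found 3 items
-rw-r--r-- 1 biadmin biadmin 0 2016-01-01 22:39 WordCount_output/_SUCCESS
drwx--x--x - biadmin biadmin 0 2016-01-01 22:36 WordCount_output/_logs
-rw-r--r-- 1 biadmin biadmin 24073 2016-01-01 22:39 WordCount_output/part-r-00000
"""

entries = parse_ls(listing)
part_files = [e.path for e in entries if not e.is_dir and "/part-" in e.path]
print(part_files)
```

The part-r-00000 file holds the reducer output; _SUCCESS is an empty marker file and _logs is a directory.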

biadmin@bivm:~> hadoop fs -cat WordCount_output/*00

Names, 1

National 1

Nature 1

Necessary 4

Negligence 5

Negligence, 4

Negligence. 1

Negligence; 2

Neither 3

Net 1

The steps above run MapReduce from the command line. IBM BigInsights also provides a Web-based interface: open the Applications tab and switch to Manage to see some predefined applications. Under Test there is a WordCount application. Click it and select "Deploy".

When you switch to Run, you can see that WordCount is now available.

Select WordCount, enter the input directory containing the files to count and the output directory, and click Run to start the job.

Similarly, you can work with the HDFS file system through the Web interface, including creating, deleting, and modifying directories or files.

Open JobTracker (http://192.168.133.135:50030/jobtracker.jsp) in a browser to see the most recently run MapReduce jobs. Click a job ID to see more details.

JobTracker is the master service. Once Hadoop is started, the JobTracker accepts each job, schedules the job's tasks to run on TaskTrackers, monitors them, and reruns any task that fails.
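The schedule, monitor, and rerun cycle can be illustrated with a toy simulation. This is only a sketch of the idea, not how the real JobTracker is implemented: the run_job function, the round-robin assignment, and the limit of four attempts are all assumptions made for the example.

```python
from collections import deque


def run_job(tasks, trackers, execute, max_attempts=4):
    """Assign tasks to trackers round-robin and requeue failed attempts."""
    pending = deque((task, 1) for task in tasks)
    history = []
    turn = 0
    while pending:
        task, attempt = pending.popleft()
        tracker = trackers[turn % len(trackers)]
        turn += 1
        succeeded = execute(task, tracker, attempt)
        history.append((task, tracker, attempt, succeeded))
        if not succeeded:
            if attempt >= max_attempts:
                raise RuntimeError(f"task {task} failed {attempt} times")
            # Put the failed task back in the queue for another attempt.
            pending.append((task, attempt + 1))
    return history


def flaky_execute(task, tracker, attempt):
    """Pretend that map-1 fails on its first attempt only."""
    return not (task == "map-1" and attempt == 1)


history = run_job(["map-0", "map-1", "map-2"], ["tracker-a", "tracker-b"], flaky_execute)
for record in history:
    print(record)
```

In this run map-1 fails once, goes back into the queue, and succeeds on its second attempt after the other tasks have finished.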
