How to build hadoop platform

2025-01-30 Update From: SLTechnology News&Howtos


This article shows how to build a Hadoop platform step by step. The content is meant to be easy to follow; we hope it helps resolve your doubts.

1. Virtual machine and system installation

1. Download VMware and install it.

2. Install the CentOS system in VMware.

2. Configure the Java environment in the virtual machine

1. Install the JDK (jdk-6u31-linux-i586.bin).

2. Configure environment variables.

(1) vi /etc/profile (edit the file)

(2) add the Java environment variables

(3) source /etc/profile (reload the environment variables)

Note: perform this step as the root user.
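The profile additions were not reproduced above. A minimal sketch of what is typically appended to /etc/profile for this JDK; the install path is an assumption, so adjust it to wherever the JDK was actually installed:

```shell
# Assumed install location for jdk-6u31; adjust to the actual path
export JAVA_HOME=/usr/java/jdk1.6.0_31
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
```

After `source /etc/profile`, `java -version` should report the installed JDK.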

3. Modify hosts

vim /etc/hosts and modify it to: 127.0.0.1 qiangjin

Note: perform this step as the root user.

4. Modify hostname

vim /etc/sysconfig/network

Modify it to: NETWORKING=yes HOSTNAME=qiangjin

To change the hostname temporarily, use: hostname qiangjin

To view the current hostname, use: hostname

Note: perform this step as the root user.

5. Configure ssh

1. Execute the following in the current user's home directory:

(1) ssh-keygen

(2) cat .ssh/id_rsa.pub >> .ssh/authorized_keys

(3) chmod 700 .ssh

(4) chmod 600 .ssh/authorized_keys

(5) ssh qiangjin now succeeds without a password

6. Unpack the tarballs

1. Decompress hadoop-0.20.2-cdh4u3.tar.gz

2. Decompress hbase-0.90.4-cdh4u3.tar.gz

3. Decompress hive-0.7.1-cdh4u3.tar.gz

4. Decompress zookeeper-3.3.4-cdh4u3.tar.gz

5. Decompress sqoop-1.3.0-cdh4u3.tar.gz

6. Decompress mahout-0.5-cdh4u3.tar.gz; (for data mining algorithms)

Note: tar -xvf xxxx.tar.gz

7. Modify the Hadoop configuration files

(1) enter cdh4/hadoop-0.20.2-cdh4u3/conf

(2) Modify core-site.xml

Note: the hostname configured above is used in fs.default.name.
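The file contents were not reproduced in the original. A minimal sketch of core-site.xml for this setup; the hostname comes from the steps above, but the port is an assumption:

```xml
<configuration>
  <!-- HDFS namenode address; 9000 is a conventional port, adjust as needed -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://qiangjin:9000</value>
  </property>
</configuration>
```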

(3) modify hdfs-site.xml

Note: on a single machine, dfs.replication is generally set to 1.
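A corresponding sketch of hdfs-site.xml, using the single-machine replication factor stated in the note; any data-directory settings the author used are unknown and omitted:

```xml
<configuration>
  <!-- Single-node setup: keep only one copy of each block -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```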

(4) Modify mapred-site.xml

Note: the hostname configured above is used in mapred.job.tracker.
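A sketch of mapred-site.xml consistent with the note; the hostname is from the steps above, the port 9001 is an assumption:

```xml
<configuration>
  <!-- JobTracker address; 9001 is a conventional port, adjust as needed -->
  <property>
    <name>mapred.job.tracker</name>
    <value>qiangjin:9001</value>
  </property>
</configuration>
```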

(5) Modify masters

(6) Modify slaves

(7) Modify hadoop-env.sh

Environment variables need to be added
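The contents of the masters and slaves files from steps (5) and (6) were not shown. For a single-node setup, both typically contain just the local hostname (in Hadoop 0.20, masters actually lists the secondary-namenode host); this is a sketch, not the author's exact files:

```shell
# Single-node cluster: every role runs on qiangjin
echo "qiangjin" > masters   # host for the secondary namenode
echo "qiangjin" > slaves    # one worker hostname per line
```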

8. Modify the HBase configuration

(1) enter cdh4/hbase-0.90.4-cdh4u3/conf

(2) modify hbase-site.xml

(3) Modify regionservers

(4) Modify hbase-env.sh

Environment variables need to be added
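The hbase-site.xml contents were not shown. A sketch of a typical single-node CDH configuration consistent with the rest of this guide; the HDFS port and property values are assumptions:

```xml
<configuration>
  <!-- HBase data lives in HDFS; must match fs.default.name -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://qiangjin:9000/hbase</value>
  </property>
  <!-- Single-node ZooKeeper quorum -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>qiangjin</value>
  </property>
</configuration>
```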

9. Modify the Hive configuration

(1) enter cdh4/hive-0.7.1-cdh4u3/conf

(2) Add hive-site.xml and configure it.

Note: pay attention to hbase.zookeeper.quorum, mapred.job.tracker, hive.exec.scratchdir, javax.jdo.option.ConnectionURL, javax.jdo.option.ConnectionUserName and javax.jdo.option.ConnectionPassword. Environment variables also need to be added.
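A sketch of hive-site.xml built from the property names in the note above. The MySQL metastore URL is consistent with the mysql-connector JAR added in step 13, but the database name, credentials, ports and scratch directory are all assumptions:

```xml
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>qiangjin</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>qiangjin:9001</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive-${user.name}</value>
  </property>
  <!-- MySQL-backed metastore; values below are placeholders -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://qiangjin:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>
```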

10. Modify the Sqoop configuration

Environment variables need to be added

11. Modify the ZooKeeper configuration

(1) enter cdh4/zookeeper-3.3.4-cdh4u3

(2) create a new directory zookeeper-data

(3) enter zookeeper-data and create a new file myid containing 0

(4) enter cdh4/zookeeper-3.3.4-cdh4u3/conf

(5) Modify zoo.cfg

Note: configure dataDir and server.0

Environment variables need to be added
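The zoo.cfg contents were not shown. A sketch consistent with the note (dataDir pointing at the zookeeper-data directory created above, and a server.0 entry matching the 0 written to myid); the timing values, ports, and the absolute path prefix are assumptions:

```
# zoo.cfg - values are illustrative assumptions
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
# Must be the zookeeper-data directory containing the myid file
dataDir=/home/liuhx/cdh4/zookeeper-3.3.4-cdh4u3/zookeeper-data
# server.N matches the number in myid (0 here)
server.0=qiangjin:2888:3888
```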

12. Modify the Mahout configuration; environment variables need to be added.
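The sections above repeatedly say "environment variables need to be added" without listing them. A consolidated sketch of the exports typically appended to /etc/profile for these tools, using the package names unpacked in step 6; the cdh4 base directory is an assumption:

```shell
# Assumed base directory; adjust to where the tarballs were unpacked
CDH4_BASE=$HOME/cdh4
export HADOOP_HOME=$CDH4_BASE/hadoop-0.20.2-cdh4u3
export HBASE_HOME=$CDH4_BASE/hbase-0.90.4-cdh4u3
export HIVE_HOME=$CDH4_BASE/hive-0.7.1-cdh4u3
export ZOOKEEPER_HOME=$CDH4_BASE/zookeeper-3.3.4-cdh4u3
export SQOOP_HOME=$CDH4_BASE/sqoop-1.3.0-cdh4u3
export MAHOUT_HOME=$CDH4_BASE/mahout-0.5-cdh4u3
# Put every tool's bin directory on the PATH
export PATH=$HADOOP_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SQOOP_HOME/bin:$MAHOUT_HOME/bin:$PATH
```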

13. Database JAR packages

(1) put mysql-connector-java-5.1.6.jar into cdh4/hive-0.7.1-cdh4u3/lib

(2) put ojdbc14.jar into cdh4/sqoop-1.3.0-cdh4u3/lib

14. Format Hadoop for the first time; start and stop

1. Format Hadoop: hadoop namenode -format

2. Start Hadoop: start-all.sh

3. Stop Hadoop: stop-all.sh

Note: use jps or ps to check whether Hadoop has started. Problems at startup are printed to the screen. You can visit http://qiangjin:50070 to view the status of Hadoop.

15. Start HBase

(1) Start HBase; the command is: start-hbase.sh

(2) Stop HBase; the command is: stop-hbase.sh

(3) Enter the HBase shell; the command is: hbase shell

(4) View the tables in HBase (must be executed inside the HBase shell): list

Note: Hadoop must be started first. You can visit http://qiangjin:60010 to view the status of HBase.

16. Start ZooKeeper

(1) Start ZooKeeper; the command is: zkServer.sh start

(2) Stop ZooKeeper; the command is: zkServer.sh stop

Note: on a single machine, starting HBase will also start ZooKeeper.

17. Start Hive

(1) Start Hive; the command is: hive

(2) View the tables (must be executed in the Hive command window): show tables

18. Run the wordcount example

(1) Create file01 and file02, and set their contents

(2) Create an input directory in HDFS: hadoop fs -mkdir input

(3) Copy file01 and file02 into HDFS: hadoop fs -copyFromLocal file0* input

(4) Execute wordcount: hadoop jar hadoop-examples-0.20.2-cdh4u3.jar wordcount input output

(5) View the result: hadoop fs -cat output/part-r-00000

19. Import Oracle data into Hive

(1) enter cdh4/sqoop-1.3.0-cdh4u3/bin

(2) create a new directory importdata

(3) enter the directory importdata

(4) Create a new shell script oracle-test.sh

(5) Execute ./oracle-test.sh

(6) enter hive to check whether the import is successful.

Note: parameters used for the Hive import: ./sqoop import --append --connect $CONNECTURL --username $ORACLENAME --password $ORACLEPASSWORD -m 1 --table $oracleTableName --columns $columns --hive-import
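A sketch of what oracle-test.sh might look like, assembled from the parameter list in the note above. The connection URL, credentials, table and column names are placeholders, and the command is echoed rather than executed so the script can be inspected safely before a real run:

```shell
#!/bin/sh
# Placeholder values - replace with the real Oracle connection details
CONNECTURL="jdbc:oracle:thin:@//dbhost:1521/orcl"
ORACLENAME="scott"
ORACLEPASSWORD="tiger"
oracleTableName="EMP"
columns="ID,NAME"

# Assemble the Sqoop invocation from the note above (echoed, not run)
CMD="./sqoop import --append --connect $CONNECTURL --username $ORACLENAME --password $ORACLEPASSWORD -m 1 --table $oracleTableName --columns $columns --hive-import"
echo "$CMD"
```

Remove the `echo` indirection and run the command directly once the placeholders are filled in.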

20. Import Oracle data into HBase

(1) enter cdh4/sqoop-1.3.0-cdh4u3/bin

(2) create a new directory importdata

(3) enter the directory importdata

(4) Create a new shell script oracle-hbase.sh

(5) Execute ./oracle-hbase.sh

(6) enter hbase shell to check whether the import is successful.

Note: parameters used for the HBase import: ./sqoop import --append --connect $CONNECTURL --username $ORACLENAME --password $ORACLEPASSWORD -m 1 --table $oracleTableName --columns $columns --hbase-create-table --hbase-table $hbaseTableName --hbase-row-key ID --column-family cf1

21. Configure the HBase-to-Hive mapping

(1) enter cdh4/hive-0.7.1-cdh4u3/bin

(2) create a new directory mapdata

(3) enter mapdata

(4) Create a new file hbasemaphivetest.q

(5) Execute: hive -f hbasemaphivetest.q

Note: the columns should correspond and the types should match.
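The contents of hbasemaphivetest.q were not shown. A sketch of what such a mapping script typically contains, using the standard Hive HBase storage handler; the table name, the ID/NAME columns and the cf1 column family echo the import in step 20, but the exact names are assumptions:

```sql
-- Map an existing HBase table into Hive as an external table
-- (dropping it in Hive does not delete the HBase data)
CREATE EXTERNAL TABLE hbase_emp (id STRING, name STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:NAME")
TBLPROPERTIES ("hbase.table.name" = "EMP");
```

As the note says, the Hive columns must correspond one-to-one with the mapped HBase columns, and the types must match.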

22. Run Mahout

1. Run the example

(1) Import the data "synthetic_control.data" used by the example; run on the console: hadoop fs -put synthetic_control.data /user/liuhx/testdata/

(2) Run the example program on the console. It runs for a long time and needs to iterate 10 times: hadoop jar mahout-examples-0.5-cdh4u3-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

2. View the running result; enter the command: mahout vectordump --seqFile /user/liuhx/output/data/part-m-00000

3. To display the result graphically, enter: hadoop jar mahout-examples-0.5-cdh4u3-job.jar org.apache.mahout.clustering.display.DisplayKMeans

23. Eclipse configuration

1. Install Eclipse

2. Import cdh4/hadoop-0.20.2-cdh4u3/src/contrib/eclipse-plugin project

3. Modify plugin.xml, mainly to change the jar package configuration in runtime

4. Run: Run As → Eclipse Application

5. In the running Eclipse SDK, configure the map/reduce locations with the Hadoop running environment.

The above is the entire content of "how to build a hadoop platform". Thank you for reading!
