
Build and Deploy Hadoop HDFS


HDFS: the Hadoop Distributed File System

Distributed file system

A distributed file system effectively solves the problems of data storage and management.

- Extends a file system fixed in one location to any number of locations / file systems

- Many nodes together form a file-system network

- Each node can sit in a different location, with communication and data transfer between nodes taking place over the network

- When using a distributed file system, people do not need to care which node data is stored on or retrieved from; they manage and store data just as they would in a local file system (see the sketch below)
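That last point is easiest to see side by side; a minimal sketch, assuming the HDFS cluster built later in this article is already running:

# ls /root    // list a directory on the local file system
# ./bin/hadoop fs -ls /    // the same operation against the distributed file system

The two commands differ only in the fs prefix; the storage behind the second one is a network of nodes rather than a local disk.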

The roles and concepts of HDFS

HDFS is the foundation of data storage and management in the Hadoop ecosystem. It is a highly fault-tolerant system designed to run on low-cost, general-purpose hardware.

Roles and concepts

- Client

- NameNode

- SecondaryNameNode

- DataNode

NameNode

- The master node: manages the HDFS namespace and block-mapping information, applies the replication policy, and handles all client requests.

Secondary NameNode

- Periodically merges the fsimage and edits files and pushes the result to the NameNode

- Can help recover the NameNode in an emergency

Note, however, that the Secondary NameNode is not a hot standby for the NameNode.

DataNode

- Storage nodes that hold the actual data

- Report the stored block information to the NameNode

Client

- Splits files into blocks

- Accesses HDFS

- Interacts with the NameNode to obtain file location information

- Interacts with DataNodes to read and write data

Block

- The default block size is 128 MB in Hadoop 2.x (it was 64 MB in Hadoop 1.x)

- Each block can have multiple replicas (see the fsck sketch below)
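Once the cluster below is running, the way a file is split into blocks and replicated can be inspected directly with fsck; a hedged sketch (the path /input/README.txt refers to a file uploaded later in this article):

# ./bin/hdfs fsck /input/README.txt -files -blocks -locations    // lists every block of the file and the DataNodes holding its replicas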

Build and deploy HDFS distributed file system

Prepare the lab environment:

# vim /etc/hosts
...
192.168.4.1 master
192.168.4.2 node1
192.168.4.3 node2
192.168.4.4 node3

# sed -ri "/Host \*/aStrictHostKeyChecking no" /etc/ssh/ssh_config

# ssh-keygen
# for i in {1..4}
> do
>   ssh-copy-id 192.168.4.${i}
> done
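Before continuing, it is worth a quick check that key-based login really works; a minimal sketch against one of the nodes:

# ssh 192.168.4.2 hostname    // should print node1 without asking for a password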

# for i in {1..4}    // synchronize /etc/hosts to all machines
> do
>   rsync -a /etc/hosts 192.168.4.${i}:/etc/hosts
> done

# rm -rf /etc/yum.repos.d/*
# vim /etc/yum.repos.d/yum.repo    // configure a network yum repository
[yum]
name=yum
baseurl=http://192.168.4.254/rhel7
gpgcheck=0

# for i in {2..4}
> do
>   ssh 192.168.4.${i} "rm -rf /etc/yum.repos.d/*"
>   rsync -a /etc/yum.repos.d/yum.repo 192.168.4.${i}:/etc/yum.repos.d/
> done

# for i in {1..4}
> do
>   ssh 192.168.4.${i} 'sed -ri "s/^(SELINUX=).*/\1disabled/" /etc/selinux/config; yum -y remove firewalld'
> done
// restart all machines
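After the reboot, a quick hedged sanity check that SELinux really is disabled on every machine:

# for i in {1..4}
> do
>   ssh 192.168.4.${i} getenforce
> done
// every host should print Disabled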

Build a fully distributed cluster

System planning:

IP address     Host    Role                         Software
192.168.4.1    master  NameNode, SecondaryNameNode  HDFS
192.168.4.2    node1   DataNode                     HDFS
192.168.4.3    node2   DataNode                     HDFS
192.168.4.4    node3   DataNode                     HDFS

Install the Java environment and the jps debugging tool (both provided by java-1.8.0-openjdk-devel) on all systems

# for i in {1..4}
> do
>   ssh 192.168.4.${i} "yum -y install java-1.8.0-openjdk-devel.x86_64"
> done

# which java
/usr/bin/java
# readlink -f /usr/bin/java
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre/bin/java
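The JAVA_HOME value needed in the next step is simply this path with the trailing /bin/java removed; a one-line sketch to derive it:

# dirname $(dirname $(readlink -f /usr/bin/java))
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre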

Install Hadoop

# tar -xf hadoop-2.7.3.tar.gz
# mv hadoop-2.7.3 /usr/local/hadoop

Modify configuration

# cd /usr/local/hadoop/
# sed -ri "s;(export JAVA_HOME=).*;\1/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre;" etc/hadoop/hadoop-env.sh
# sed -ri "s;(export HADOOP_CONF_DIR=).*;\1/usr/local/hadoop/etc/hadoop;" etc/hadoop/hadoop-env.sh
# sed -n "25p;33p" etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

// configuration parameter reference: http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-common/core-default.xml
# vim etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>        <!-- the default file system -->
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>      <!-- root directory under which hadoop stores all its data -->
        <value>/var/hadoop</value>
    </property>
</configuration>

// create the root directory on all machines
# for i in {1..4}
> do
>   ssh 192.168.4.${i} "mkdir /var/hadoop"
> done

// configuration parameter reference: http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
# vim etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.http-address</name>            <!-- NameNode address -->
        <value>master:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>  <!-- SecondaryNameNode address -->
        <value>master:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>                      <!-- number of replicas stored for each block -->
        <value>2</value>
    </property>
</configuration>

# vim etc/hadoop/slaves    // lists the hosts on which DataNodes run
node1
node2
node3

After the configuration is complete, copy the hadoop folder to all machines

# for i in {2..4}
> do
>   rsync -azSH --delete /usr/local/hadoop 192.168.4.${i}:/usr/local/ -e "ssh"
> done

// format the file system on the NameNode
# ./bin/hdfs namenode -format
Seeing "successfully formatted." in the output means the format succeeded.
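As an additional hedged check: with hadoop.tmp.dir set to /var/hadoop as above, the NameNode metadata is written under its dfs/name subdirectory by default, so the freshly formatted image should be visible there:

# ls /var/hadoop/dfs/name/current/    // should contain fsimage and VERSION files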

// start the cluster; it should come up without errors
# ./sbin/start-dfs.sh

After startup, verify the daemons by running jps on the NameNode and each DataNode:

# for i in master node{1..3}
> do
>   echo ${i}
>   ssh ${i} "jps"
> done

master
4562 SecondaryNameNode
4827 NameNode
5149 Jps
node1
3959 DataNode
4105 Jps
node2
3957 Jps
3803 DataNode
node3
3956 Jps
3803 DataNode

# ./bin/hdfs dfsadmin -report    // view the DataNodes that have registered successfully

Configured Capacity: 160982630400 (149.93 GB)

Present Capacity: 150644051968 (140.30 GB)

DFS Remaining: 150644039680 (140.30 GB)

DFS Used: 12288 (12 KB)

DFS Used%: 0.00%

Under replicated blocks: 0

Blocks with corrupt replicas: 0

Missing blocks: 0

Missing blocks (with replication factor 1): 0

Live datanodes (3):

Name: 192.168.4.2:50010 (node1)

Hostname: node1

Decommission Status: Normal

Configured Capacity: 53660876800 (49.98 GB)

DFS Used: 4096 (4 KB)

Non DFS Used: 3446755328 (3.21 GB)

DFS Remaining: 50214117376 (46.77 GB)

DFS Used%: 0.00%

DFS Remaining%: 93.58%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Mon Jan 29 21:17:39 EST 2018

Name: 192.168.4.4:50010 (node3)

Hostname: node3

Decommission Status: Normal

Configured Capacity: 53660876800 (49.98 GB)

DFS Used: 4096 (4 KB)

Non DFS Used: 3445944320 (3.21 GB)

DFS Remaining: 50214928384 (46.77 GB)

DFS Used%: 0.00%

DFS Remaining%: 93.58%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Mon Jan 29 21:17:39 EST 2018

Name: 192.168.4.3:50010 (node2)

Hostname: node2

Decommission Status: Normal

Configured Capacity: 53660876800 (49.98 GB)

DFS Used: 4096 (4 KB)

Non DFS Used: 3445878784 (3.21 GB)

DFS Remaining: 50214993920 (46.77 GB)

DFS Used%: 0.00%

DFS Remaining%: 93.58%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Mon Jan 29 21:17:39 EST 2018


Basic use of HDFS

HDFS's basic file commands are almost identical to ordinary shell commands:

# ./bin/hadoop fs -ls hdfs://master:9000/
# ./bin/hadoop fs -mkdir /test
# ./bin/hadoop fs -ls /
Found 1 items
drwxr-xr-x   - root supergroup          0 2018-01-29 21:35 /test
# ./bin/hadoop fs -rmdir /test
# ./bin/hadoop fs -mkdir /input
# ./bin/hadoop fs -put *.txt /input    // upload files

# ./bin/hadoop fs -ls /input
Found 3 items
-rw-r--r--   2 root supergroup      84854 2018-01-29 21:37 /input/LICENSE.txt
-rw-r--r--   2 root supergroup      14978 2018-01-29 21:37 /input/NOTICE.txt
-rw-r--r--   2 root supergroup       1366 2018-01-29 21:37 /input/README.txt

# ./bin/hadoop fs -get /input/README.txt /root/    // download a file
# ls /root/README.txt
/root/README.txt
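Other everyday shell idioms carry over in the same way; two more hedged examples:

# ./bin/hadoop fs -cat /input/README.txt    // print a file's contents, like cat
# ./bin/hadoop fs -du -h /input    // human-readable file sizes, like du -h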

Adding an HDFS node

-1. Configure the hadoop environment on the new node like all the others: hostname, passwordless ssh login, disabled selinux and iptables, and the Java environment

[root@newnode ~]# yum -y install java-1.8.0-openjdk-devel.x86_64
[root@master ~]# cat /etc/hosts
192.168.4.1 master
192.168.4.2 node1
192.168.4.3 node2
192.168.4.4 node3
192.168.4.5 newnode

-2. Modify the slaves file on the NameNode to add the new node

[root@master ~]# cd /usr/local/hadoop/etc/hadoop/
[root@master hadoop]# echo newnode >> slaves

-3. Copy the NameNode's configuration to the configuration directory of every other node

# cat /root/rsyncfile.sh
#!/bin/bash
for i in node{2..4}
do
    rsync -azSH --delete /usr/local/hadoop/etc/hadoop ${i}:/usr/local/hadoop/etc/ -e 'ssh' &
done
wait
[root@master hadoop]# bash /root/rsyncfile.sh

-4. Copy the hadoop installation from the master to the new node

[root@newnode ~]# rsync -azSH --delete master:/usr/local/hadoop /usr/local

-5. Start the DataNode on the new node

[root@newnode ~]# cd /usr/local/hadoop/
[root@newnode hadoop]# ./sbin/hadoop-daemon.sh start datanode
[root@newnode hadoop]# jps
4007 Jps
3705 DataNode

-6. View the cluster status

[root@master hadoop]# cd /usr/local/hadoop/
[root@master hadoop]# ./bin/hdfs dfsadmin -report

Safe mode is ON

Configured Capacity: 268304384000 (249.88 GB)

Present Capacity: 249863049216 (232.70 GB)

DFS Remaining: 249862311936 (232.70 GB)

DFS Used: 737280 (720 KB)

DFS Used%: 0.00%

Under replicated blocks: 0

Blocks with corrupt replicas: 0

Missing blocks: 0

Missing blocks (with replication factor 1): 0

Live datanodes (5):

...

Name: 192.168.4.5:50010 (newnode)

Hostname: newnode

Decommission Status: Normal

Configured Capacity: 53660876800 (49.98 GB)

DFS Used: 4096 (4 KB)

Non DFS Used: 3662835712 (3.41 GB)

DFS Remaining: 49998036992 (46.56 GB)

DFS Used%: 0.00%

DFS Remaining%: 93.17%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Sun Jan 28 20:30:23 EST 2018

...

-7. Set the synchronization bandwidth and rebalance the data

[root@master hadoop]# ./bin/hdfs dfsadmin -setBalancerBandwidth 67108864    // 64 MB/s per DataNode
[root@master hadoop]# ./sbin/start-balancer.sh -threshold 5    // rebalance until each node is within 5% of the cluster's average utilization

Removing a node

- Configure hdfs-site.xml on the NameNode

- dfs.replication: the number of replicas

- add a dfs.hosts.exclude entry

[root@master hadoop]# vim etc/hadoop/hdfs-site.xml
...
    <property>
        <name>dfs.hosts.exclude</name>
        <value>/usr/local/hadoop/etc/hadoop/exclude</value>
    </property>
...

- Create the exclude configuration file and write into it the nodes to be removed

[root@master hadoop]# vim etc/hadoop/slaves
node1
node2
node3
[root@master hadoop]# vim etc/hadoop/exclude
newnode

# cat /root/rsyncfile.sh
#!/bin/bash
for i in node{1..5}
do
    rsync -azSH --delete /usr/local/hadoop/etc/hadoop ${i}:/usr/local/hadoop/etc/ -e 'ssh' &
done
wait
[root@master hadoop]# bash /root/rsyncfile.sh

[root@master hadoop]# ./bin/hdfs dfsadmin -refreshNodes
[root@master hadoop]# ./bin/hdfs dfsadmin -report

...

Name: 192.168.4.5:50010 (newnode)

Hostname: newnode

Decommission Status: Decommission in progress    // data is being migrated off the node

Configured Capacity: 53660876800 (49.98 GB)

DFS Used: 12288 (12 KB)

Non DFS Used: 3662950400 (3.41 GB)

DFS Remaining: 49997914112 (46.56 GB)

DFS Used%: 0.00%

DFS Remaining%: 93.17%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Sun Jan 28 20:52:01 EST 2018

...

[root@master hadoop]# ./bin/hdfs dfsadmin -report

...

Name: 192.168.4.5:50010 (newnode)

Hostname: newnode

Decommission Status: Decommissioned    // final state

Configured Capacity: 53660876800 (49.98 GB)

DFS Used: 12288 (12 KB)

Non DFS Used: 3662950400 (3.41 GB)

DFS Remaining: 49997914112 (46.56 GB)

DFS Used%: 0.00%

DFS Remaining%: 93.17%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Sun Jan 28 20:52:43 EST 2018

...

// the DataNode may be stopped only after its state has changed to Decommissioned

[root@newnode hadoop]# ./sbin/hadoop-daemon.sh stop datanode
[root@newnode hadoop]# jps
4045 Jps
