HDFS: Hadoop Distributed File System
Distributed file system
Distributed file systems effectively solve the problem of large-scale data storage and management.
- extend a file system fixed at a single location to any number of locations / file systems
- many nodes together form a single file system network
- nodes can be located in different places and communicate and transfer data with each other over the network
- when using a distributed file system, users do not need to care about which node data is stored on or read from; they manage and store data just as they would with a local file system
The role and concept of HDFS
HDFS is the foundation of data storage and management in the Hadoop ecosystem: a highly fault-tolerant system designed to run on low-cost commodity hardware.
Roles and concepts
- Client
- NameNode
- SecondaryNameNode
- DataNode
NameNode
- master node: manages the HDFS namespace and block mapping information, configures the replication policy, and handles client requests
Secondary NameNode
- periodically merges the fsimage and edits log and pushes the result back to the NameNode
- in an emergency it can help recover the NameNode
Note, however, that the Secondary NameNode is not a hot standby for the NameNode.
DataNode
- data storage node that stores the actual data blocks
- reports the stored block information to the NameNode
Client
- splits files into blocks
- accesses HDFS
- interacts with the NameNode to obtain file location information
- interacts with DataNodes to read and write data
Block
- the default block size is 128MB in Hadoop 2.x (it was 64MB in Hadoop 1.x)
- each block can have multiple replicas, as shown in the example below
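As a concrete way to see blocks and replicas, once the cluster built below is running you can query the effective settings and inspect where a file's block replicas actually live. This is an optional illustration, not part of the original procedure; the path /input/README.txt refers to the file uploaded later in this article:
# cd /usr/local/hadoop/
# ./bin/hdfs getconf -confKey dfs.blocksize       // effective block size in bytes (134217728 = 128MB by default)
# ./bin/hdfs getconf -confKey dfs.replication     // replica count (2 in the cluster configured below)
# ./bin/hdfs fsck /input/README.txt -files -blocks -locations   // lists each block and the DataNodes holding its replicas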
Build and deploy HDFS distributed file system
Prepare the lab environment:
# vim /etc/hosts
.. ..
192.168.4.1     master
192.168.4.2     node1
192.168.4.3     node2
192.168.4.4     node3
# sed -ri "/Host */aStrictHostKeyChecking no" /etc/ssh/ssh_config        // skip the ssh host key confirmation prompt
# ssh-keygen
# for i in {1..4}
> do
>     ssh-copy-id 192.168.4.${i}
> done
# for i in {1..4}        // synchronize the local hosts file to all nodes
> do
>     rsync -a /etc/hosts 192.168.4.${i}:/etc/hosts
> done
# rm -rf /etc/yum.repos.d/*
# vim /etc/yum.repos.d/yum.repo        // configure the network yum repository
[yum]
name=yum
baseurl=http://192.168.4.254/rhel7
gpgcheck=0
# for i in {2..4}
> do
>     ssh 192.168.4.${i} "rm -rf /etc/yum.repos.d/*"
>     rsync -a /etc/yum.repos.d/yum.repo 192.168.4.${i}:/etc/yum.repos.d/
> done
# for i in {1..4}
> do
>     ssh 192.168.4.${i} 'sed -ri "s/^(SELINUX=).*/\1disabled/" /etc/selinux/config; yum -y remove firewalld'
> done
// restart all machines
Build a fully distributed cluster
System planning:
Host            Hostname    Role                           Software
192.168.4.1     master      NameNode, SecondaryNameNode    HDFS
192.168.4.2     node1       DataNode                       HDFS
192.168.4.3     node2       DataNode                       HDFS
192.168.4.4     node3       DataNode                       HDFS
Install the java environment and the debugging tool jps on all nodes
# for i in {1..4}
> do
>     ssh 192.168.4.${i} "yum -y install java-1.8.0-openjdk-devel.x86_64"
> done
# which java
/usr/bin/java
# readlink -f /usr/bin/java
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre/bin/java
Install hadoop
# tar -xf hadoop-2.7.3.tar.gz
# mv hadoop-2.7.3 /usr/local/hadoop
Modify configuration
# cd /usr/local/hadoop/
# sed -ri "s;(export JAVA_HOME=).*;\1/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre;" etc/hadoop/hadoop-env.sh
# sed -ri "s;(export HADOOP_CONF_DIR=).*;\1/usr/local/hadoop/etc/hadoop;" etc/hadoop/hadoop-env.sh
# sed -n "25p;33p" etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
// the configuration parameters are described at http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-common/core-default.xml
# vim etc/hadoop/core-site.xml
.. ..
<property>
    <!-- default file system -->
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
</property>
<property>
    <!-- hadoop root directory where all data is stored -->
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop</value>
</property>
// create the root directory on all machines
# for i in {1..4}
> do
>     ssh 192.168.4.${i} "mkdir /var/hadoop"
> done
// the configuration parameters are described at http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
# vim etc/hadoop/hdfs-site.xml
<property>
    <!-- NameNode web address -->
    <name>dfs.namenode.http-address</name>
    <value>master:50070</value>
</property>
<property>
    <!-- SecondaryNameNode web address -->
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
</property>
<property>
    <!-- how many copies of each block to store -->
    <name>dfs.replication</name>
    <value>2</value>
</property>
# vim etc/hadoop/slaves        // configure which hosts run a DataNode
node1
node2
node3
After the configuration is complete, copy the hadoop folder to all machines
# for i in {2..4}
> do
>     rsync -azSH --delete /usr/local/hadoop 192.168.4.${i}:/usr/local/ -e "ssh"
> done
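Before formatting, a quick optional sanity check (not part of the original procedure) confirms that the configuration parses and names the expected hosts:
# cd /usr/local/hadoop/
# ./bin/hdfs getconf -namenodes              // should print: master
# ./bin/hdfs getconf -secondaryNameNodes     // should print: master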
// format HDFS on the NameNode
# ./bin/hdfs namenode -format
Seeing "successfully formatted." in the output means the format succeeded.
// start the cluster; it should come up without errors
# ./sbin/start-dfs.sh
After startup, check the daemons running on the NameNode and the DataNodes with jps:
# for i in master node{1..3}
> do
>     echo ${i}
>     ssh ${i} "jps"
> done
master
4562 SecondaryNameNode
4827 NameNode
5149 Jps
node1
3959 DataNode
4105 Jps
node2
3957 Jps
3803 DataNode
node3
3956 Jps
3803 DataNode
# ./bin/hdfs dfsadmin -report        // view the DataNodes that have successfully registered
Configured Capacity: 160982630400 (149.93 GB)
Present Capacity: 150644051968 (140.30 GB)
DFS Remaining: 150644039680 (140.30 GB)
DFS Used: 12288 (12 KB)
DFS Used%: 0.005%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Live datanodes (3):
Name: 192.168.4.2:50010 (node1)
Hostname: node1
Decommission Status: Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3446755328 (3.21 GB)
DFS Remaining: 50214117376 (46.77 GB)
DFS Used%: 0.005%
DFS Remaining%: 93.58%
Configured Cache Capacity: 0 (0B)
Cache Used: 0 (0B)
Cache Remaining: 0 (0B)
Cache Used%: 100.00%
Cache Remaining%: 0.005%
Xceivers: 1
Last contact: Mon Jan 29 21:17:39 EST 2018
Name: 192.168.4.4:50010 (node3)
Hostname: node3
Decommission Status: Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3445944320 (3.21 GB)
DFS Remaining: 50214928384 (46.77 GB)
DFS Used%: 0.005%
DFS Remaining%: 93.58%
Configured Cache Capacity: 0 (0B)
Cache Used: 0 (0B)
Cache Remaining: 0 (0B)
Cache Used%: 100.00%
Cache Remaining%: 0.005%
Xceivers: 1
Last contact: Mon Jan 29 21:17:39 EST 2018
Name: 192.168.4.3:50010 (node2)
Hostname: node2
Decommission Status: Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3445878784 (3.21 GB)
DFS Remaining: 50214993920 (46.77 GB)
DFS Used%: 0.005%
DFS Remaining%: 93.58%
Configured Cache Capacity: 0 (0B)
Cache Used: 0 (0B)
Cache Remaining: 0 (0B)
Cache Used%: 100.00%
Cache Remaining%: 0.005%
Xceivers: 1
Last contact: Mon Jan 29 21:17:39 EST 2018
(Screenshots of the NameNode, SecondaryNameNode and DataNode web UIs omitted.)
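With the addresses configured in hdfs-site.xml, the NameNode web UI should be reachable at http://master:50070 and the SecondaryNameNode UI at http://master:50090; in Hadoop 2.7 the DataNodes serve their UI on port 50075 by default. A quick optional reachability check (not part of the original text):
# curl -s -o /dev/null -w "%{http_code}\n" http://master:50070/        // expect 200 if the NameNode UI is up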
Basic use of HDFS
The basic HDFS file system commands are very similar to ordinary shell commands:
# ./bin/hadoop fs -ls hdfs://master:9000/
# ./bin/hadoop fs -mkdir /test
# ./bin/hadoop fs -ls /
Found 1 items
drwxr-xr-x   - root supergroup          0 2018-01-29 21:35 /test
# ./bin/hadoop fs -rmdir /test
# ./bin/hadoop fs -mkdir /input
# ./bin/hadoop fs -put *.txt /input        // upload files
# ./bin/hadoop fs -ls /input
Found 3 items
-rw-r--r--   2 root supergroup      84854 2018-01-29 21:37 /input/LICENSE.txt
-rw-r--r--   2 root supergroup      14978 2018-01-29 21:37 /input/NOTICE.txt
-rw-r--r--   2 root supergroup       1366 2018-01-29 21:37 /input/README.txt
# ./bin/hadoop fs -get /input/README.txt /root/        // download a file
# ls /root/README.txt
/root/README.txt
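Two more everyday commands, shown here as an optional example against the files uploaded above (output omitted):
# ./bin/hadoop fs -cat /input/README.txt        // print the contents of a file stored in HDFS
# ./bin/hadoop fs -du -h /input                 // show per-file space usage under /input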
Adding HDFS nodes
-1. Prepare the full hadoop environment on the new node: hostname, passwordless ssh login, selinux and iptables disabled, and the java environment installed
[root@newnode ~]# yum -y install java-1.8.0-openjdk-devel.x86_64
[root@master ~]# cat /etc/hosts
192.168.4.1 master
192.168.4.2 node1
192.168.4.3 node2
192.168.4.4 node3
192.168.4.5 newnode
-2. Add the new node to the slaves file on the NameNode
[root@master ~]# cd /usr/local/hadoop/etc/hadoop/
[root@master hadoop]# echo newnode >> slaves
-3. Copy the NameNode's configuration files to the configuration directory of every node
# cat /root/rsyncfile.sh
#!/bin/bash
for i in node{2..4}
do
    rsync -azSH --delete /usr/local/hadoop/etc/hadoop ${i}:/usr/local/hadoop/etc/ -e 'ssh' &
done
wait
[root@master hadoop]# bash /root/rsyncfile.sh
-4. Copy the hadoop installation to the new node
[root@newnode ~]# rsync -azSH --delete master:/usr/local/hadoop /usr/local/
-5. Start the DataNode daemon on the new node
[root@newnode ~]# cd /usr/local/hadoop/
[root@newnode hadoop]# ./sbin/hadoop-daemon.sh start datanode
[root@newnode hadoop]# jps
4007 Jps
3705 DataNode
-6. View cluster status
[root@master hadoop]# cd /usr/local/hadoop/
[root@master hadoop]# ./bin/hdfs dfsadmin -report
Safe mode is ON
Configured Capacity: 268304384000 (249.88 GB)
Present Capacity: 249863049216 (232.70 GB)
DFS Remaining: 249862311936 (232.70 GB)
DFS Used: 737280 (720 KB)
DFS Used%: 0.005%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Live datanodes (5):
...
Name: 192.168.4.5:50010 (newnode)
Hostname: newnode
Decommission Status: Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3662835712 (3.41 GB)
DFS Remaining: 49998036992 (46.56 GB)
DFS Used%: 0.005%
DFS Remaining%: 93.17%
Configured Cache Capacity: 0 (0B)
Cache Used: 0 (0B)
Cache Remaining: 0 (0B)
Cache Used%: 100.00%
Cache Remaining%: 0.005%
Xceivers: 1
Last contact: Sun Jan 28 20:30:23 EST 2018
...
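The report above still shows "Safe mode is ON"; the NameNode normally leaves safe mode on its own once enough block replicas have been reported. To check or force it manually (an optional step, not in the original procedure):
[root@master hadoop]# ./bin/hdfs dfsadmin -safemode get
[root@master hadoop]# ./bin/hdfs dfsadmin -safemode leave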
-7. Set the balancer bandwidth and rebalance the data
[root@master hadoop]# ./bin/hdfs dfsadmin -setBalancerBandwidth 67108864
[root@master hadoop]# ./sbin/start-balancer.sh -threshold 5
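For reference: 67108864 bytes/s = 64 MiB/s is the per-DataNode bandwidth cap applied while balancing, and -threshold 5 tells the balancer to keep moving blocks until each DataNode's utilization is within 5 percentage points of the cluster average.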
Removing nodes
- configure hdfs-site.xml on the NameNode
- dfs.replication: the number of replicas to keep for each block
- add the dfs.hosts.exclude configuration
[root@master hadoop]# vim etc/hadoop/hdfs-site.xml
...
<property>
    <name>dfs.hosts.exclude</name>
    <value>/usr/local/hadoop/etc/hadoop/exclude</value>
</property>
...
- create the exclude configuration file and list in it the hosts to be removed; remove newnode from slaves as well
[root@master hadoop]# vim etc/hadoop/slaves
node1
node2
node3
[root@master hadoop]# vim etc/hadoop/exclude
newnode
# cat /root/rsyncfile.sh
#!/bin/bash
for i in node{1..5}
do
    rsync -azSH --delete /usr/local/hadoop/etc/hadoop ${i}:/usr/local/hadoop/etc/ -e 'ssh' &
done
wait
[root@master hadoop]# bash /root/rsyncfile.sh
[root@master hadoop]# ./bin/hdfs dfsadmin -refreshNodes
[root@master hadoop]# ./bin/hdfs dfsadmin -report
...
Name: 192.168.4.5:50010 (newnode)
Hostname: newnode
Decommission Status: Decommission in progress        // data is being migrated off the node
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 3662950400 (3.41 GB)
DFS Remaining: 49997914112 (46.56 GB)
DFS Used%: 0.005%
DFS Remaining%: 93.17%
Configured Cache Capacity: 0 (0B)
Cache Used: 0 (0B)
Cache Remaining: 0 (0B)
Cache Used%: 100.00%
Cache Remaining%: 0.005%
Xceivers: 1
Last contact: Sun Jan 28 20:52:01 EST 2018
...
[root@master hadoop]# ./bin/hdfs dfsadmin -report
...
Name: 192.168.4.5:50010 (newnode)
Hostname: newnode
Decommission Status: Decommissioned        // final state
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 3662950400 (3.41 GB)
DFS Remaining: 49997914112 (46.56 GB)
DFS Used%: 0.005%
DFS Remaining%: 93.17%
Configured Cache Capacity: 0 (0B)
Cache Used: 0 (0B)
Cache Remaining: 0 (0B)
Cache Used%: 100.00%
Cache Remaining%: 0.005%
Xceivers: 1
Last contact: Sun Jan 28 20:52:43 EST 2018
...
// the DataNode daemon may be stopped only after the node's state has changed to Decommissioned
[root@newnode hadoop]# ./sbin/hadoop-daemon.sh stop datanode
[root@newnode hadoop]# jps
4045 Jps