
HDFS distributed file system

The HDFS file system is deployed with the Hadoop toolset. Its main advantage is improved read throughput for clients. Reading 1 TB from a single disk at 100 MB/s takes roughly 10,000 seconds (close to three hours); if the same data is spread across 100 disks and read in parallel, each disk only has to deliver about 10 GB, so the read completes in roughly 100 seconds.

An HDFS cluster consists of one NameNode running on the master and several DataNodes running on the slaves.

The NameNode is responsible for managing the file system namespace and client access to the file system.

The DataNodes are responsible for managing the data stored on them.

Files are stored on DataNodes as blocks. Suppose the block size is 20 MB and the replication factor is 3: each block is kept as three copies, and the copies of a given block are placed on different DataNodes, so the data survives a disk failure on any single DataNode. Large files are automatically split into blocks for storage.
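To see this block and replica layout for yourself once the cluster is running, the stock fsck and setrep tools can be used; a minimal sketch, assuming a file has already been uploaded to the hypothetical path /input/file.txt:

$ hadoop fsck /input/file.txt -files -blocks -locations    (list the file's blocks and which DataNodes hold each replica)

$ hadoop fs -setrep 3 /input/file.txt    (set the replication factor of an existing file to 3)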

Steps for building an HDFS file system:

Prerequisite environment for Master and Slave servers:

• Basic operations such as turning off the firewall

# iptables -F

# setenforce 0

# ifconfig

• Configure hosts resolution

# vim /etc/hosts

Modify the content:

192.168.0.133 master

192.168.0.134 slave1

192.168.0.135 slave2

• Modify the hostname

# vim /etc/sysconfig/network

Modify the content:

NETWORKING=yes

HOSTNAME=master

# hostname master
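A quick check that the hostname and name resolution took effect (illustrative; run on each machine after editing the files above):

# uname -n    (should print the hostname just set, e.g. master)

# ping -c 1 slave1    (should resolve to 192.168.0.134 via /etc/hosts)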

On the Master server:

• Create the hadoop user and set its password

# useradd hadoop

# passwd hadoop

• Deploy the Java environment

# tar xzvf jdk-7u65-linux-x64.gz

# mv jdk1.7.0_65/ /usr/local/java

• Install the Hadoop software

# tar xzvf hadoop-2.4.0.tar.gz

# mv hadoop-2.4.0 /usr/local/hadoop

# chown -R hadoop:hadoop /usr/local/hadoop

• Set environment variables

# vim /etc/profile

Add content:

JAVA_HOME=/usr/local/java

HADOOP_HOME=/usr/local/hadoop

PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export JAVA_HOME HADOOP_HOME PATH

# source /etc/profile
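To confirm the variables are picked up, the stock version commands can be run (illustrative):

# java -version    (should report JDK 1.7.0_65)

# hadoop version    (should report Hadoop 2.4.0)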

• Modify the Hadoop configuration files

# vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh    (Hadoop environment file)

Add content:

export JAVA_HOME=/usr/local/java

# vim /usr/local/hadoop/etc/hadoop/core-site.xml    (core configuration file)

Add content:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
  </property>
</configuration>
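The directory named in hadoop.tmp.dir is not always created automatically, so it is safest to create it up front and hand it to the hadoop user (an assumption matching the paths above):

# mkdir -p /usr/local/hadoop/tmp

# chown hadoop:hadoop /usr/local/hadoop/tmp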

# cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

# vim /usr/local/hadoop/etc/hadoop/mapred-site.xml    (Hadoop process configuration file)

Add content:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/usr/local/hadoop/var</value>
  </property>
</configuration>

# vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml    (NameNode and DataNode configuration file)

Add content:

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
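Once the files are in place, the values Hadoop actually sees can be double-checked with the stock getconf tool (illustrative):

$ hdfs getconf -confKey dfs.replication    (should print 3)

$ hdfs getconf -confKey fs.defaultFS    (should print hdfs://master:9000)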

Note:

The NameNode manages the file system namespace and client access to the file system; the DataNodes manage the stored data. Also note that with only two DataNodes (slave1 and slave2) in this example, a replication factor of 3 cannot be fully satisfied, and HDFS will report blocks as under-replicated until a third DataNode joins.

# vim /usr/local/hadoop/etc/hadoop/masters

Add content:

master

# vim /usr/local/hadoop/etc/hadoop/slaves

Add content:

slave1

slave2

• Deploy SSH and configure passwordless login so Hadoop can start the cluster

# su hadoop

$ ssh-keygen

$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1

$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave2

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
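Passwordless login can be verified before going further; each of these should print the remote hostname without prompting for a password (illustrative):

$ ssh hadoop@slave1 hostname

$ ssh hadoop@slave2 hostname

$ ssh hadoop@master hostname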

• Synchronize the Java and Hadoop installations and configuration to the slave servers over SSH

# scp -r /usr/local/hadoop slave1:/usr/local/

# scp -r /usr/local/java slave1:/usr/local/

# scp -r /etc/profile slave1:/etc/

# scp -r /usr/local/hadoop slave2:/usr/local/

# scp -r /usr/local/java slave2:/usr/local/

# scp -r /etc/profile slave2:/etc/

On the Slave servers:

# source /etc/profile

# chown -R hadoop:hadoop /usr/local/hadoop

Operating the HDFS cluster after deployment:

Operations on the Master server:

• Format the HDFS file system

# su hadoop

$ hdfs namenode -format

If you see the following log message, the format succeeded:

16/10/13 10:50:22 INFO common.Storage: Storage directory /usr/local/hadoop/name has been successfully formatted.

• Check the newly generated directory

$ ll /usr/local/hadoop/name/

Expected output:

drwxr-xr-x. 2 root root 4096 Oct 13 10:50 current

• Start the Hadoop cluster

$ /usr/local/hadoop/sbin/start-all.sh
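Note that start-all.sh is marked deprecated in Hadoop 2.x; the equivalent and preferred pair of scripts, shipped in the same sbin directory, is:

$ /usr/local/hadoop/sbin/start-dfs.sh    (starts the NameNode, SecondaryNameNode and DataNodes)

$ /usr/local/hadoop/sbin/start-yarn.sh    (starts the ResourceManager and NodeManagers)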

• Verify that the HDFS node processes are running

The Master shows:

[hadoop@master Desktop]$ jps

6114 NameNode

6438 ResourceManager

6579 Jps

6304 SecondaryNameNode

The slaves show:

[root@localhost Desktop]# jps

5387 Jps

5303 NodeManager

5191 DataNode

• Verify web access

Visit http://192.168.0.133:50070 to view the file system status (50070 is the NameNode's HTTP port, so plain http is used).
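Because dfs.webhdfs.enabled was set to true above, the same information is reachable over the WebHDFS REST API; a minimal smoke test with curl (illustrative):

$ curl "http://master:50070/webhdfs/v1/?op=LISTSTATUS"    (lists the root directory of the file system as JSON)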

Adding nodes to the HDFS cluster:

• Basic operations such as turning off the firewall

• Configure host resolution

• Modify the hostname

• Deploy the Java environment

• Set environment variables

• Install the Hadoop software

• Synchronize configuration files from the Master server to the node server through SSH

• Start the services on the new node and balance the data already stored in the cluster (see the balancer sketch below)

$ hadoop-daemon.sh start datanode

$ yarn-daemon.sh start nodemanager    (Hadoop 2.x with YARN; the TaskTracker of Hadoop 1.x no longer exists)

$ jps

$ hadoop dfsadmin -report    (view cluster information)
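For the balancing step, Hadoop ships a balancer script in sbin that redistributes blocks until disk usage across DataNodes is within a threshold; a sketch, assuming the default 10% threshold is acceptable:

$ /usr/local/hadoop/sbin/start-balancer.sh -threshold 10    (move blocks until every DataNode is within 10% of average utilization)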

Removing nodes from the HDFS cluster:

$ vim /usr/local/hadoop/etc/hadoop/core-site.xml

Add content:

<property>
  <name>dfs.hosts.exclude</name>
  <value>/usr/local/hadoop/etc/hadoop/exclude</value>
</property>

$ vim /usr/local/hadoop/etc/hadoop/exclude

Add content:

slave4    (hostname of the node to be removed)

$ hdfs dfsadmin -refreshNodes    (refresh the configuration)

$ jps

$ hadoop dfsadmin -report    (view cluster information)
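Decommissioning is not instantaneous: HDFS first re-replicates the departing node's blocks elsewhere. The report output can be watched until the node's state changes from "Decommission in progress" to "Decommissioned" (illustrative):

$ hdfs dfsadmin -report | grep -i decommission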

The use of basic Hadoop commands:

Command                                                        Action
hadoop fs -help                                                show help
hadoop fs -usage                                               show usage for a command
hadoop fs -ls                                                  list a directory
hadoop fs -mkdir                                               create a directory
hadoop fs -put                                                 upload a file
hadoop dfsadmin -report                                        view node status information
hadoop dfsadmin -safemode enter                                enter safe mode
hadoop dfsadmin -safemode leave                                leave safe mode
hadoop fs -copyFromLocal <local source> <HDFS destination>     copy a local file to HDFS
hadoop fs -copyToLocal <HDFS file> <local destination>         copy a file from HDFS to local
hadoop fs -chgrp <group> <HDFS file or directory>              change the group
hadoop fs -chmod 755 <HDFS file or directory>                  change permissions
hadoop fs -chown <owner>[:<group>] <HDFS file or directory>    change the owner
hadoop fs -du <HDFS file or directory>                         show file sizes under a directory
hadoop fs -getmerge -nl <HDFS directory> <local file>          merge files into one local file
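A short end-to-end session tying these commands together (illustrative; the file and directory names are arbitrary):

$ hadoop fs -mkdir /input    (create a directory in HDFS)

$ hadoop fs -put test.txt /input    (upload a local file)

$ hadoop fs -ls /input    (confirm it arrived)

$ hadoop fs -copyToLocal /input/test.txt /tmp/    (fetch it back)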
