Hadoop common commands: cluster node service start and stop, hdfs file system management, yarn resource cluster management, and the hadoop startup process
Cluster node service start and stop
As shown in the figure, the scripts live under the Hadoop installation path and fall into two main categories: commands that manage hdfs and commands that manage yarn resources.
Services can be started in two ways: start the service on each node individually, or use the cluster scripts to start everything at once.
Start and stop nn and dn services separately
hadoop-daemon.sh start | stop namenode | datanode | journalnode | zkfc    # start or stop a single service on the current node
hadoop-daemons.sh start | stop namenode | datanode | zkfc    # starts or stops that service on every node in the cluster (all dn/nn, etc.)
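For example, to restart only the datanode on the current machine (a minimal sketch; the scripts live under $HADOOP_HOME/sbin if they are not already on PATH):
hadoop-daemon.sh stop datanode     # stops just the datanode on this host
hadoop-daemon.sh start datanode    # starts it again; check the log under $HADOOP_HOME/logs if it does not come up
jps                                # DataNode should reappear in the process list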
Start the yarn service
yarn-daemon.sh start | stop resourcemanager | nodemanager | proxyserver    # start or stop a single yarn service on the current node
yarn-daemons.sh start | stop resourcemanager | nodemanager | proxyserver    # starts or stops all rm/nm and related services in the yarn cluster
Start the MapReduce task history service
mr-jobhistory-daemon.sh start | stop historyserver
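A quick sketch of starting the history service on one node and confirming it is running:
mr-jobhistory-daemon.sh start historyserver    # start the MapReduce job history service on this node
jps                                            # a JobHistoryServer process should now be listed
mr-jobhistory-daemon.sh stop historyserver     # stop it again when it is no longer needed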
[root@hadoop1 ~]# hadoop-daemons.sh start journalnode
hadoop1: starting journalnode, logging to /hadoop/soft/hadoop-2.7.7/logs/hadoop-root-journalnode-hadoop1.out
hadoop2: starting journalnode, logging to /hadoop/soft/hadoop-2.7.7/logs/hadoop-root-journalnode-hadoop2.out
hadoop3: starting journalnode, logging to /hadoop/soft/hadoop-2.7.7/logs/hadoop-root-journalnode-hadoop3.out
[root@hadoop1 ~]# jps
1628 JournalNode
1663 Jps
[root@hadoop1 ~]# ssh hadoop2 jps
1513 Jps
1452 JournalNode
[root@hadoop1 ~]# ssh hadoop3 jps
1473 Jps
1412 JournalNode
Scripts that manage all services at once require ssh mutual trust and a configured slaves file:
start-dfs.sh | stop-dfs.sh    # start or stop all hdfs services
start-yarn.sh | stop-yarn.sh    # start or stop all yarn services
start-all.sh | stop-all.sh    # start or stop all hdfs and yarn services; these two scripts are deprecated, so the two scripts above are recommended instead
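For instance, from the node that holds the cluster configuration (ssh trust and the slaves file already in place), the whole cluster can be cycled like this; this is only a sketch of the usual order:
start-dfs.sh     # brings up the namenodes, datanodes and, in an HA setup, the journalnodes and zkfc processes
start-yarn.sh    # brings up the resourcemanager locally and the nodemanagers on the slaves
stop-yarn.sh     # when shutting down, stop yarn first...
stop-dfs.sh      # ...then stop hdfs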
HDFS file system management
hadoop command usage:
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
fs    # run a generic filesystem user client
version    # print the version information
jar <jar>    # run a jar file; note: use "yarn jar" to launch yarn applications
distcp <srcurl> <desturl>    # recursively copy files or directories. DistCp (distributed copy) is a tool for copying within and between large clusters; it uses Map/Reduce for file distribution, error handling and recovery, and report generation
archive -archiveName NAME -p <parent path> <src>* <dest>    # create a hadoop archive
classpath    # list the class path needed for the Hadoop jar and its required libraries
$ hadoop distcp hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo
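The archive subcommand mentioned above works in a similar spirit; the paths below are only illustrative:
$ hadoop archive -archiveName foo.har -p /user/hadoop dir1 dir2 /user/outputdir    # pack /user/hadoop/dir1 and /user/hadoop/dir2 into foo.har under /user/outputdir
$ hadoop fs -ls har:///user/outputdir/foo.har    # the archive contents can then be browsed through the har:// scheme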
Commands for user file system management; the operations are basically similar to Linux:
[root@hadoop2 ~]# hadoop fs    # (or: hdfs dfs)
Usage: hadoop fs [generic options]
[-cat [-ignoreCrc] <src> ...]    # view file contents
[-checksum <src> ...]    # view file checksum
[-chgrp [-R] GROUP PATH...]    # change file group
[-chmod [-R] <MODE> PATH...]    # change file permissions
[-chown [-R] [OWNER][:[GROUP]] PATH...]    # change file owner and/or group
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]    # copy local files to the hdfs file system, similar to the put command
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]    # copy hdfs files to the local file system, similar to the get command
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]    # copy; multiple sources are allowed, in which case the destination must be a directory
[-createSnapshot <snapshotDir> [<snapshotName>]]    # create a snapshot
[-deleteSnapshot <snapshotDir> <snapshotName>]    # delete a snapshot
[-df [-h] [<path> ...]]    # show file system free space
[-du [-s] [-h] <path> ...]    # show the size of files in a directory and the directory footprint, like the Linux command
[-find <path> ... <expression> ...]    # find files
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-help [cmd ...]]    # view help
[-ls [-d] [-h] [-R] [<path> ...]]    # -R lists recursively; "hadoop fs -ls -R" replaces the old "hadoop fs -lsr"
[-mkdir [-p] <path> ...]    # create a directory
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>] | [--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]    # test command: -d is it a directory, -e does the file exist, -z is it a zero-length file; usage is the same for each flag
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]    # create an empty 0-byte file
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]    # view the usage of a command
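A few everyday combinations of these options, with placeholder paths that should be adjusted to your own directories:
hadoop fs -mkdir -p /user/test/input              # create a directory tree
hadoop fs -put ./localfile.txt /user/test/input   # upload a local file
hadoop fs -ls -R /user/test                       # list it recursively
hadoop fs -cat /user/test/input/localfile.txt     # print the file contents
hadoop fs -du -s -h /user/test                    # total size of the directory, human readable
hadoop fs -rm -r -skipTrash /user/test            # remove it, bypassing the trash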
HDFS management commands
[root@hadoop2 ~]# hdfs haadmin    # DFS HA admin client, for viewing the status of an HA hdfs cluster
Usage: haadmin
[-transitionToActive [--forceactive] <serviceId>]
[-transitionToStandby <serviceId>]
[-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]    # the three commands above perform manual failover
[-getServiceState <serviceId>]    # show whether an nn node is active or standby
[-checkHealth <serviceId>]    # check whether an nn node is healthy
[-help <command>]    # view command help
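For example, with a two-namenode HA pair (the service ids nn1 and nn2 are assumptions here; use the ids defined by dfs.ha.namenodes in hdfs-site.xml):
hdfs haadmin -getServiceState nn1    # prints "active" or "standby" for the nn1 namenode
hdfs haadmin -getServiceState nn2
hdfs haadmin -checkHealth nn1        # exits non-zero if nn1 is unhealthy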
Usage: hdfs dfsadmin    # note: only the hdfs superuser/administrator can run these commands
[-report [-live] [-dead] [-decommissioning]]    # report basic file system information and statistics
[-safemode <enter | leave | get | wait>]    # safe mode maintenance command. Safe mode is a Namenode state in which the namespace is read-only (no changes are accepted) and blocks are neither replicated nor deleted
[-saveNamespace]    # save the current namespace into the storage directories and start a new edit log; requires safe mode
[-rollEdits]
[-restoreFailedStorage true | false | check]
[-refreshNodes]    # re-read the hosts and exclude files and update the set of Datanodes that are allowed to connect to the NN or that must be decommissioned
[-setQuota <quota> <dirname> ...]    # limit the maximum number of names (subdirectories and files) a directory may contain
[-clrQuota <dirname> ...]
[-setSpaceQuota <quota> [-storageType <storagetype>] <dirname> ...]    # set the maximum space a directory may use, e.g. hdfs dfsadmin -setSpaceQuota 1t /user/dirname
[-clrSpaceQuota [-storageType <storagetype>] <dirname> ...]
[-refreshServiceAcl]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-refreshCallQueue]    # refresh the call queue
[-refresh <host:ipc_port> <key> [arg1..argn]]
[-reconfig <datanode|...> <host:ipc_port> <start|status>]    # re-apply configuration changes to a datanode without restarting it
[-refreshNamenodes datanode_host:ipc_port]
[-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]
[-setBalancerBandwidth <bandwidth in bytes per second>]
[-fetchImage <local directory>]
[-allowSnapshot <snapshotDir>]    # a directory must allow snapshots before any can be taken
[-disallowSnapshot <snapshotDir>]
[-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
[-getDatanodeInfo <datanode_host:ipc_port>]
[-metasave filename]
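Some routine dfsadmin calls, sketched with a placeholder directory:
hdfs dfsadmin -report                             # capacity summary and live/dead datanodes
hdfs dfsadmin -safemode get                       # query safe mode; "enter" and "leave" switch it on and off
hdfs dfsadmin -setSpaceQuota 1t /user/dirname     # cap the space the directory may consume at 1 TB
hdfs dfsadmin -setQuota 100000 /user/dirname      # cap the number of files and subdirectories under it
hdfs dfsadmin -clrSpaceQuota /user/dirname        # remove the space quota again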
Blocks can be redistributed across datanodes with the balancer; press Ctrl-C to stop the balancing process.
hadoop balancer [-threshold <threshold>]    # threshold is a percentage of disk capacity
HDFS is considered balanced once every datanode's disk utilization is within this deviation from the cluster average. The lower the value, the better balanced the cluster ends up, but the longer balancing takes.
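For example (the value 5 is only an illustrative threshold):
hdfs balancer -threshold 5    # move blocks until every datanode's utilization is within 5% of the cluster average; Ctrl-C stops the run and already-moved blocks stay where they are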
YARN resource cluster management
[root@hadoop2 ~]# yarn rmadmin    # resourcemanager admin client
Usage: yarn rmadmin
-refreshQueues    # reload the queues' acls, states and scheduler-specific properties
-refreshNodes    # refresh the host information at the ResourceManager
-refreshSuperUserGroupsConfiguration
-refreshUserToGroupsMappings
-refreshAdminAcls
-refreshServiceAcl
-addToClusterNodeLabels [label1,label2,label3] (labels split by ",")
-removeFromClusterNodeLabels [label1,label2,label3] (labels split by ",")
-replaceLabelsOnNode [node1[:port]=label1,label2 node2[:port]=label1,label2]
-directlyAccessNodeLabelStore
-transitionToActive [--forceactive] <serviceId>    # the commands from here down handle rm node failover
-transitionToStandby <serviceId>
-failover [--forcefence] [--forceactive] <serviceId> <serviceId>
-getServiceState <serviceId>    # check the current rm state
-checkHealth <serviceId>
-help [cmd]
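For example, with an RM HA pair (rm1 is an assumed id taken from yarn.resourcemanager.ha.rm-ids):
yarn rmadmin -getServiceState rm1    # show whether that resourcemanager is active or standby
yarn rmadmin -refreshNodes           # re-read the include/exclude node lists after editing them
yarn rmadmin -refreshQueues          # apply scheduler queue changes without restarting the RM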
HA hadoop startup process: first startup after installation
Build the cluster environment:
Step 1: start the zookeeper service, since HA hadoop relies on zookeeper.
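A sketch of this step, assuming a standard zookeeper installation whose bin directory is on PATH; run it on every zookeeper node (hadoop1/2/3 here):
zkServer.sh start     # start the zookeeper server on this node
zkServer.sh status    # once all nodes are up, one reports "leader" and the rest "follower"
jps                   # the QuorumPeerMain process should be listed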
Step 2: start the journalnode log service on the hadoop nodes (see the hadoop-daemons.sh journalnode example above).
Format the namenode on hadoop1; this is only needed the first time it starts. A successful-format message in the output indicates it worked.
hdfs namenode -format
Start the namenode on node 1. Because the NN nodes run in active/standby mode, the nn on hadoop2 cannot start yet for lack of metadata, so the metadata must first be synchronized manually from the primary.
hadoop1: hadoop-daemon.sh start namenode
hadoop2: hdfs namenode -bootstrapStandby    # sync the metadata from the active nn, then start the namenode on hadoop2
Format zkfc on hadoop1; no errors means success (this is only needed before the first boot, and no formatting is required for later startups). If you skip this step, the namenodes cannot register with zookeeper and both namenodes will remain in standby state.
hdfs zkfc -formatZK
hadoop-daemon.sh start zkfc    # start the failover controller service; otherwise the current namenode state stays standby
Then you can start the hdfs and yarn services on each node.
Check whether the service processes of each node are running properly.
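For instance, on this three-node layout roughly the following processes would be expected per node (names as printed by jps; the exact distribution depends on your configuration, so treat this only as a reference):
ssh hadoop1 jps    # NameNode, DFSZKFailoverController, JournalNode, DataNode, ResourceManager, NodeManager, QuorumPeerMain
ssh hadoop2 jps    # NameNode (standby), DFSZKFailoverController, JournalNode, DataNode, NodeManager, QuorumPeerMain
ssh hadoop3 jps    # JournalNode, DataNode, NodeManager, QuorumPeerMain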
Web GUI access
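With the default ports of Hadoop 2.7, the main pages are reachable as follows (the hostnames follow the cluster above; ports may differ if overridden in the configuration):
http://hadoop1:50070    # NameNode web UI (the standby namenode serves the same page on hadoop2)
http://hadoop1:8088     # ResourceManager web UI, on whichever node runs the resourcemanager
http://hadoop1:19888    # MapReduce JobHistory web UI, if the history service is running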