1 Basic overview
The Hadoop commands live under ${HADOOP_HOME}/bin, ${HADOOP_HOME}/sbin, and ${HADOOP_HOME}/libexec. They include shell scripts for Linux and batch files for Windows; this article only analyzes the Linux shell scripts.
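For orientation, a typical Hadoop 2.x layout of these directories looks roughly like the listing below (paths abbreviated, contents vary slightly between releases):
${HADOOP_HOME}/bin/hadoop                # general client command (hadoop.cmd is the Windows counterpart)
${HADOOP_HOME}/bin/hdfs                  # HDFS client and daemon launcher
${HADOOP_HOME}/bin/yarn                  # YARN client and daemon launcher
${HADOOP_HOME}/sbin/start-all.sh         # deprecated wrapper around the two scripts below
${HADOOP_HOME}/sbin/start-dfs.sh         # starts NameNode, secondary NameNode, DataNodes
${HADOOP_HOME}/sbin/start-yarn.sh        # starts ResourceManager and NodeManagers
${HADOOP_HOME}/sbin/hadoop-daemon.sh     # starts or stops a single daemon on one host
${HADOOP_HOME}/sbin/hadoop-daemons.sh    # runs hadoop-daemon.sh on every host via slaves.sh
${HADOOP_HOME}/libexec/hadoop-config.sh  # shared environment preprocessing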
2 Detailed explanation of the scripts
2.1 start-all.sh
This is the classic entry point under ${HADOOP_HOME}/sbin for starting the Hadoop daemons on every node, although in the 2.x versions of Hadoop it has been officially deprecated. Let's take a closer look at how the script works:
1. The comment at the top of the script reads: # Start all hadoop daemons. Run this on master node. In other words, it starts all Hadoop daemons (on every node) and is meant to be run on the master node (the NameNode).
2. In the 2.x versions the script prints echo "This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh", meaning it is obsolete and has been replaced by start-dfs.sh and start-yarn.sh.
3. bin=`dirname "${BASH_SOURCE-$0}"` obtains the directory in which start-all.sh resides (possibly as a relative path).
4. bin=`cd "$bin"; pwd` changes into that directory and assigns its absolute path to bin.
5. DEFAULT_LIBEXEC_DIR="$bin"/../libexec computes the default ${HADOOP_HOME}/libexec path for later use.
6. HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR} gives HADOOP_LIBEXEC_DIR a fallback value: if the variable is unset or empty, the default path computed above is used, in preparation for running the script under that directory in the next step.
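The ${VAR:-default} substitution used here is a plain shell idiom, not anything Hadoop-specific. A minimal standalone demonstration (the path is made up for the example):
#!/bin/bash
# If HADOOP_LIBEXEC_DIR is unset or empty, fall back to the default; otherwise keep the caller's value.
DEFAULT_LIBEXEC_DIR="/opt/hadoop/libexec"                        # hypothetical path
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
echo "$HADOOP_LIBEXEC_DIR"                                       # prints /opt/hadoop/libexec unless the variable was already exported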
7. . $HADOOP_LIBEXEC_DIR/hadoop-config.sh sources ${HADOOP_HOME}/libexec/hadoop-config.sh, which performs the environment preprocessing needed before the HDFS daemons and YARN are started.
8. Start the HDFS daemons
if [ -f "${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh ]; then
  "${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh --config $HADOOP_CONF_DIR
fi
The intent is: if ${HADOOP_HDFS_HOME}/sbin/start-dfs.sh exists as a regular file, run start-dfs.sh with the --config option and its value. start-dfs.sh is analyzed in detail later.
Note the comment above this block: # start hdfs daemons if hdfs is present, i.e. start the HDFS daemons.
9. Start the YARN services
if [ -f "${HADOOP_YARN_HOME}"/sbin/start-yarn.sh ]; then
  "${HADOOP_YARN_HOME}"/sbin/start-yarn.sh --config $HADOOP_CONF_DIR
fi
The intent is: if ${HADOOP_YARN_HOME}/sbin/start-yarn.sh exists as a regular file, run start-yarn.sh with the --config option and its value. start-yarn.sh is analyzed in detail later.
Note the comment above this block: # start yarn daemons if yarn is present, i.e. start the YARN daemons.
Note: ${HADOOP_HDFS_HOME} and ${HADOOP_YARN_HOME} have already been preset in hadoop-config.sh, which is analyzed in detail next.
2.2 hadoop-config.sh
This script lives under ${HADOOP_HOME}/libexec and must run before any daemons or other services can be started: its main job is to preprocess environment variables before Hadoop starts. Let's take a closer look at how it works:
1. The comment at the top of the file reads: Resolve links ($0 may be a softlink) and convert a relative path to an absolute path. In other words, it resolves symlinks and turns relative paths into absolute ones.
2. Locate and source hadoop-layout.sh
this="${BASH_SOURCE-$0}"
common_bin=$(cd -P -- "$(dirname -- "$this")" && pwd -P)
script="$(basename -- "$this")"
this="$common_bin/$script"
[ -f "$common_bin/hadoop-layout.sh" ] && . "$common_bin/hadoop-layout.sh"
The intent is to source hadoop-layout.sh if it exists and silently skip it otherwise, while also setting up a few variables that later code relies on.
3. Prepare the jar directories for HDFS, YARN, and MapReduce
HADOOP_COMMON_DIR=${HADOOP_COMMON_DIR:-"share/hadoop/common"}
HADOOP_COMMON_LIB_JARS_DIR=${HADOOP_COMMON_LIB_JARS_DIR:-"share/hadoop/common/lib"}
HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_COMMON_LIB_NATIVE_DIR:-"lib/native"}
HDFS_DIR=${HDFS_DIR:-"share/hadoop/hdfs"}
HDFS_LIB_JARS_DIR=${HDFS_LIB_JARS_DIR:-"share/hadoop/hdfs/lib"}
YARN_DIR=${YARN_DIR:-"share/hadoop/yarn"}
YARN_LIB_JARS_DIR=${YARN_LIB_JARS_DIR:-"share/hadoop/yarn/lib"}
MAPRED_DIR=${MAPRED_DIR:-"share/hadoop/mapreduce"}
MAPRED_LIB_JARS_DIR=${MAPRED_LIB_JARS_DIR:-"share/hadoop/mapreduce/lib"}
4. Set the environment variable pointing at the ${HADOOP_HOME} root directory
HADOOP_DEFAULT_PREFIX=$(cd -P -- "$common_bin"/.. && pwd -P)
HADOOP_PREFIX=${HADOOP_PREFIX:-$HADOOP_DEFAULT_PREFIX}
export HADOOP_PREFIX
5. Process the --config argument
if [ $# -gt 1 ]
then
  if [ "--config" = "$1" ]
  then
    shift
    confdir=$1
    if [ ! -d "$confdir" ]; then
      echo "Error: Cannot find configuration directory: $confdir"
      exit 1
    fi
    shift
    HADOOP_CONF_DIR=$confdir
  fi
fi
What this block means: $# is the number of arguments passed in. If there is more than one argument and the first is --config, the script checks whether the value that follows is a directory; if it is not, it exits, and if it is, it assigns that directory to HADOOP_CONF_DIR, pointing HADOOP_CONF_DIR at Hadoop's configuration directory.
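To illustrate (the directory is hypothetical), an invocation like the following is what this block consumes; the two leading arguments are shifted away and HADOOP_CONF_DIR ends up pointing at the given directory:
# hypothetical invocation
start-dfs.sh --config /opt/hadoop/etc/hadoop
# effect inside the scripts: HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop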
6. Set the Hadoop log level
if [ $# -gt 1 ]
then
  if [ "--loglevel" = "$1" ]
  then
    shift
    HADOOP_LOGLEVEL=$1
    shift
  fi
fi
HADOOP_LOGLEVEL="${HADOOP_LOGLEVEL:-INFO}"
In other words, if no --loglevel argument is passed in, the log level defaults to INFO.
7. Set HADOOP_CONF_DIR, the configuration directory Hadoop starts from (the value passed after --config)
if [ -e "${HADOOP_PREFIX}/conf/hadoop-env.sh" ]; then
  DEFAULT_CONF_DIR="conf"
else
  DEFAULT_CONF_DIR="etc/hadoop"
fi
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_PREFIX/$DEFAULT_CONF_DIR}"
The intent is: if a --config value was passed in, use it; otherwise fall back to the default directory, which is normally ${HADOOP_HOME}/etc/hadoop.
8. Set the environment variables describing the hosts the daemons are deployed on
if [ $# -gt 1 ]
then
  if [ "--hosts" = "$1" ]
  then
    shift
    export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$1"
    shift
  elif [ "--hostnames" = "$1" ]
  then
    shift
    export HADOOP_SLAVE_NAMES=$1
    shift
  fi
fi
This block exports either HADOOP_SLAVES (the path of a host-list file under ${HADOOP_CONF_DIR}, by default the slaves file) or HADOOP_SLAVE_NAMES (an explicit list of host names); slaves.sh reads these later to decide where to start daemons.
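For reference, the host-list file named by HADOOP_SLAVES (by default ${HADOOP_CONF_DIR}/slaves) is just a plain list of worker hostnames, one per line; the hostnames below are invented for the example:
# ${HADOOP_CONF_DIR}/slaves  (hypothetical contents)
worker01
worker02
worker03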
9. Set the remaining environment variables Hadoop needs at runtime
The pattern is the same as above: if a variable is already set in the operating-system environment, use it; otherwise fall back to a default. These variables include JAVA_HOME, CLASSPATH, the JVM startup options, JAVA_LIBRARY_PATH, MALLOC_ARENA_MAX, and so on; they are not repeated here, so consult the Java environment-variable documentation if anything is unclear.
The script also sets HADOOP_HOME, HADOOP_OPTS, HADOOP_COMMON_HOME, TOOL_PATH, HADOOP_HDFS_HOME, LD_LIBRARY_PATH, HADOOP_YARN_HOME, and HADOOP_MAPRED_HOME, again by referring back to the variables established earlier; they are not repeated here either.
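A few representative lines, shown only as a sketch of the pattern (the exact set and values differ between releases), all reuse the same ${VAR:-default} fallback:
# sketch only -- the real hadoop-config.sh sets many more variables
export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-$HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-$HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-$HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-$HADOOP_PREFIX}
export HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"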
2.3 start-dfs.sh
This script lives under ${HADOOP_HOME}/sbin and starts the HDFS daemons of the cluster: the primary NameNode(s), the secondary NameNode(s), and the DataNodes. Let's take a closer look at how it works:
1. The comment at the top of the file reads: Start hadoop dfs daemons. It starts the DFS daemons.
2. The usage string: usage="Usage: start-dfs.sh [-upgrade|-rollback] [other options such as -clusterId]". It hints at the main options available when running this script.
3. Run the preprocessing scripts, which were analyzed above and are not discussed again here.
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh
Because start-dfs.sh does this preprocessing itself, the start-all.sh wrapper is redundant, which is part of why it is no longer recommended.
4. Inspect the arguments passed in and set the corresponding variables.
if [[ $# -ge 1 ]]; then
  startOpt="$1"
  shift
  case "$startOpt" in
    -upgrade)
      nameStartOpt="$startOpt"
    ;;
    -rollback)
      dataStartOpt="$startOpt"
    ;;
    *)
      echo $usage
      exit 1
    ;;
  esac
fi
This means: if at least one argument was given and it is -upgrade, set nameStartOpt for later use; if it is -rollback, set dataStartOpt for later use; any other value prints the usage string and exits.
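In practice this means the script accepts invocations like the following (listed purely to illustrate the option parsing):
start-dfs.sh                # normal startup, no extra option
start-dfs.sh -upgrade       # nameStartOpt="-upgrade", forwarded to the NameNode
start-dfs.sh -rollback      # dataStartOpt="-rollback", forwarded to the DataNodes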
5. Append any remaining options for the NameNode
nameStartOpt="$nameStartOpt $@"
6. Start the primary NameNode(s)
NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)
echo "Starting namenodes on [$NAMENODES]"
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
  --config "$HADOOP_CONF_DIR" \
  --hostnames "$NAMENODES" \
  --script "$bin/hdfs" start namenode $nameStartOpt
The --config, --hostnames, and --script values are handed to hadoop-daemons.sh, which starts the primary NameNode; the actual start happens through the hdfs command.
$($HADOOP_PREFIX/bin/hdfs getconf -namenodes) extracts the hostnames of the configured NameNodes.
--script "$bin/hdfs" start namenode $nameStartOpt is what actually starts the NameNode.
hadoop-daemons.sh itself is analyzed in detail later.
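As a concrete illustration (hostname invented), the getconf call merely resolves the NameNode host list from the configuration files; hadoop-daemons.sh then runs "hdfs ... start namenode" on each of those hosts:
$ $HADOOP_PREFIX/bin/hdfs getconf -namenodes
master01                    # hypothetical output: the configured NameNode host(s)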
7. Start the DataNodes
if [ -n "$HADOOP_SECURE_DN_USER" ]; then
  echo \
    "Attempting to start secure cluster, skipping datanodes." \
    "Run start-secure-dns.sh as root to complete startup."
else
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
    --config "$HADOOP_CONF_DIR" \
    --script "$bin/hdfs" start datanode $dataStartOpt
fi
This passes the --config and --script values to hadoop-daemons.sh, which starts the DataNodes; again the actual start happens through the hdfs command.
The if [ -n "$HADOOP_SECURE_DN_USER" ] branch only matters for secure (Kerberized) DataNodes and can be ignored here.
--script "$bin/hdfs" start datanode $dataStartOpt is what actually starts the DataNodes.
8. Start the secondary NameNode(s)
SECONDARY_NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes 2>/dev/null)
if [ -n "$SECONDARY_NAMENODES" ]; then
  echo "Starting secondary namenodes [$SECONDARY_NAMENODES]"
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
    --config "$HADOOP_CONF_DIR" \
    --hostnames "$SECONDARY_NAMENODES" \
    --script "$bin/hdfs" start secondarynamenode
fi
The --config, --hostnames, and --script values are handed to hadoop-daemons.sh, which starts the secondary NameNode(s); the actual start happens through the hdfs command.
SECONDARY_NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes 2>/dev/null) extracts the hostnames of the secondary NameNodes.
--script "$bin/hdfs" start secondarynamenode is what actually starts the secondary NameNode.
9. The rest of the script
The remaining details will be filled in later as time permits.
2.4 hadoop-daemons.sh
This script lives under ${HADOOP_HOME}/sbin; it runs a given daemon command on every node of the cluster (NameNode, secondary NameNode, or DataNode, depending on what it is passed). Let's take a closer look at how it works:
1. Most of the script is boilerplate that can be ignored here.
2. exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
Breaking this line apart:
a. It execs the slaves.sh script, which iterates over the host list (from the slaves file) and runs the rest of the command line on each host.
b. On each host, that command line amounts to hadoop-daemon.sh invoked with --config and all of the remaining arguments.
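slaves.sh itself is short. Its core is roughly the loop below (a simplified sketch, not the verbatim script): resolve the host list from HADOOP_SLAVE_NAMES or the slaves file, then run the forwarded command line on every host over ssh, prefixing each line of output with the host name.
# simplified sketch of the core of slaves.sh
SLAVE_FILE=${HADOOP_SLAVES:-${HADOOP_CONF_DIR}/slaves}
SLAVE_NAMES=${HADOOP_SLAVE_NAMES:-$(sed 's/#.*$//;/^$/d' "$SLAVE_FILE")}
for slave in $SLAVE_NAMES ; do
  ssh $HADOOP_SSH_OPTS "$slave" "$@" 2>&1 | sed "s/^/$slave: /" &
done
wait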
2.5 hadoop-daemon.sh
This script lives under ${HADOOP_HOME}/sbin; it starts (or stops) a single daemon on the local node, whether that is a NameNode, a secondary NameNode, or a DataNode. Let's take a closer look at how it works:
1. It first sources the hadoop-config.sh script, which has already been covered.
2. Set the default launcher to the hadoop command:
hadoopScript="$HADOOP_PREFIX"/bin/hadoop
3. Extract the action (start or stop) and the daemon to operate on (namenode, datanode, etc.), overriding the launcher if --script is given
hadoopScript="$HADOOP_PREFIX"/bin/hadoop
if [ "--script" = "$1" ]
then
  shift
  hadoopScript=$1
  shift
fi
startStop=$1
shift
command=$1
shift
4. Define the log-rotation helper
hadoop_rotate_log ()
{
  log=$1
  num=5
  if [ -n "$2" ]; then
    num=$2
  fi
  if [ -f "$log" ]; then # rotate logs
    while [ $num -gt 1 ]; do
      prev=`expr $num - 1`
      [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
      num=$prev
    done
    mv "$log" "$log.$num"
  fi
}
5. Source the environment-variable script
if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi
6. Redefine certain variables depending on which daemon is being started
# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$command" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
  export HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
  export HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
  export HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER
  starting_secure_dn="true"
fi
# Determine if we're starting a privileged NFS, if so, redefine the appropriate variables
if [ "$command" == "nfs3" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_PRIVILEGED_NFS_USER" ]; then
  export HADOOP_PID_DIR=$HADOOP_PRIVILEGED_NFS_PID_DIR
  export HADOOP_LOG_DIR=$HADOOP_PRIVILEGED_NFS_LOG_DIR
  export HADOOP_IDENT_STRING=$HADOOP_PRIVILEGED_NFS_USER
  starting_privileged_nfs="true"
fi
7. The code in between only emits log output and sets further temporary and environment variables; refer to the script itself, it is not essential here.
8. Start or stop the daemon
case $startStop in

  (start)

    [ -w "$HADOOP_PID_DIR" ] || mkdir -p "$HADOOP_PID_DIR"

    if [ -f $pid ]; then
      if kill -0 `cat $pid` > /dev/null 2>&1; then
        echo $command running as process `cat $pid`.  Stop it first.
        exit 1
      fi
    fi

    if [ "$HADOOP_MASTER" != "" ]; then
      echo rsync from $HADOOP_MASTER
      rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_PREFIX"
    fi

    hadoop_rotate_log $log
    echo starting $command, logging to $log
    cd "$HADOOP_PREFIX"
    case $command in
      namenode|secondarynamenode|datanode|journalnode|dfs|dfsadmin|fsck|balancer|zkfc)
        if [ -z "$HADOOP_HDFS_HOME" ]; then
          hdfsScript="$HADOOP_PREFIX"/bin/hdfs
        else
          hdfsScript="$HADOOP_HDFS_HOME"/bin/hdfs
        fi
        nohup nice -n $HADOOP_NICENESS $hdfsScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
      ;;
      (*)
        nohup nice -n $HADOOP_NICENESS $hadoopScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
      ;;
    esac
    echo $! > $pid
    sleep 1
    head "$log"
    # capture the ulimit output
    if [ "true" = "$starting_secure_dn" ]; then
      echo "ulimit -a for secure datanode user $HADOOP_SECURE_DN_USER" >> $log
      # capture the ulimit info for the appropriate user
      su --shell=/bin/bash $HADOOP_SECURE_DN_USER -c 'ulimit -a' >> $log 2>&1
    elif [ "true" = "$starting_privileged_nfs" ]; then
      echo "ulimit -a for privileged nfs user $HADOOP_PRIVILEGED_NFS_USER" >> $log
      su --shell=/bin/bash $HADOOP_PRIVILEGED_NFS_USER -c 'ulimit -a' >> $log 2>&1
    else
      echo "ulimit -a for user $USER" >> $log
      ulimit -a >> $log 2>&1
    fi
    sleep 3
    if ! ps -p $! > /dev/null ; then
      exit 1
    fi
    ;;

  (stop)

    if [ -f $pid ]; then
      TARGET_PID=`cat $pid`
      if kill -0 $TARGET_PID > /dev/null 2>&1; then
        echo stopping $command
        kill $TARGET_PID
        sleep $HADOOP_STOP_TIMEOUT
        if kill -0 $TARGET_PID > /dev/null 2>&1; then
          echo "$command did not stop gracefully after $HADOOP_STOP_TIMEOUT seconds: killing with kill -9"
          kill -9 $TARGET_PID
        fi
      else
        echo no $command to stop
      fi
      rm -f $pid
    else
      echo no $command to stop
    fi
    ;;

  (*)
    echo $usage
    exit 1
    ;;

esac
This is a two-way branch, one path for starting a daemon and one for stopping it:
a. Start branch: first check the PID file for this daemon on this host; if the recorded process is still alive, refuse to start. Otherwise rotate the logs, print the line we see on the console (echo starting $command, logging to $log) to signal that the Hadoop daemon on this node is starting, choose the hdfs or hadoop launcher according to $command, and pass all remaining arguments to it to actually start the daemon, recording its PID in the PID file.
b. Stop branch: if the PID file exists and the recorded process is alive, kill it, wait $HADOOP_STOP_TIMEOUT seconds, and fall back to kill -9 if it has not exited; then remove the PID file. If the PID file is missing or the process is already gone, print the message that there is no such daemon to stop, and the daemon is out of service.
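To make the PID handling concrete: $pid is a file such as ${HADOOP_PID_DIR}/hadoop-<user>-<command>.pid, and kill -0 merely tests whether the recorded process still exists. A minimal illustration of the check (the file name is hypothetical):
pid=/tmp/hadoop-hadoop-namenode.pid        # hypothetical $HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid
if [ -f "$pid" ] && kill -0 "$(cat "$pid")" > /dev/null 2>&1; then
  echo "namenode already running as process $(cat "$pid")"
else
  echo "no running namenode recorded in $pid"
fi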
2.6 hdfs
The script first checks whether any arguments were passed in and whether they are valid; if not, it prints usage information telling the user to supply the correct arguments.
It then maps the chosen command to a Java main class and launches the corresponding Java process through the JDK's java command (under JAVA_HOME).
In short, this script is the final destination of the Hadoop startup chain. Strictly speaking it is not quite the end, because its ultimate job is to invoke the java command from the JDK to run Hadoop's required entry points (the main methods) and thereby bring up the whole cluster; once you know Hadoop's entry classes, the shell layer holds no more surprises.
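Its overall shape can be sketched roughly as follows (heavily simplified; the class names are those shipped with Hadoop 2.x, but the surrounding logic is only an approximation of the real bin/hdfs):
# heavily simplified sketch of bin/hdfs
COMMAND=$1
shift
case "$COMMAND" in
  namenode)          CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode' ;;
  secondarynamenode) CLASS='org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode' ;;
  datanode)          CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode' ;;
  *)                 echo "Unknown command: $COMMAND"; exit 1 ;;
esac
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"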
2.7 start-yarn.sh
The execution flow of this script is similar to that of start-dfs.sh. Please refer to the shell script yourself; it is not repeated here.
2.8 yarn-daemons.sh
The execution process of this script is similar to that of hadoop-daemons.sh. Please refer to the shell script yourself. I won't repeat it here.
2.9 yarn-daemon.sh
The execution process of this script is similar to that of hadoop-daemon.sh. Please refer to the shell script yourself. I won't repeat it here.
2.10 The stop scripts
As can be seen from hadoop-daemon.sh above, stopping a daemon means reading the relevant environment variables and PID file, killing the corresponding process, and thereby releasing the ports it held. If you are interested, study the stop scripts on your own; they are not analyzed in detail here.
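A couple of typical stop invocations, shown only for orientation:
${HADOOP_HOME}/sbin/stop-dfs.sh                       # stop all HDFS daemons
${HADOOP_HOME}/sbin/stop-yarn.sh                      # stop all YARN daemons
${HADOOP_HOME}/sbin/hadoop-daemon.sh stop namenode    # stop a single daemon on the local host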