Filtering algorithm:
Focus on the weight formula:
W = TF * log(N / DF)
TF: the number of times the current keyword appears in this record
N: the total number of records
DF: the number of records in which the current keyword appears
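As a quick worked example with made-up numbers: if there are N = 1,000,000 records, the keyword appears in DF = 1,000 of them, and it occurs TF = 3 times in the current record, then W = 3 * log(1,000,000 / 1,000) = 3 * log(1,000) = 9 with a base-10 logarithm (about 20.7 with the natural logarithm); a keyword that appears in almost every record gets a weight close to zero.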
NameNode HA and NameNode Federation in HDFS
(1) Solving the single point of failure:
Use HDFS HA: the problem is solved with an active/standby (master/slave) NameNode pair; if the active NameNode fails, the cluster switches to the standby.
(2) Addressing the memory limit:
Use HDFS Federation: scale horizontally with multiple NameNodes that are independent of each other but share all DataNodes.
The details are described below:
NameNode HA: every modification a NameNode makes to its metadata goes through the JournalNodes and is backed up on the QJM cluster, so the metadata on QJM is the same as the metadata on the NameNode (the NameNode's metadata is mirrored on QJM). After the active NameNode dies, the standby NameNode finds the metadata on the QJM cluster and continues to work. If NameNode Federation is used, the shared data of each NameNode is likewise kept on the JournalNode cluster; each NameNode effectively keeps its image on the JournalNode cluster, and the NameNode's metadata reads and writes are recorded and looked up on the JN cluster.
When a client first requests HDFS, it goes to ZooKeeper to find out which NameNode is dead and which is alive, and uses that to decide which NameNode to access. Every NameNode has a corresponding FailoverController (ZKFC) that competes for a lock; after a NameNode dies, the competition for the lock decides which NameNode becomes active. A voting mechanism is used here, which is why the ZooKeeper ensemble has an odd number of nodes.
NameNode Federation: a cluster has several independent NameNodes, which is equivalent to several independent clusters that share the DataNodes. When clients access these NameNodes, they first choose which NameNode to use and then access it.
Adding HA to Federation means adding HA to each NameNode, with each HA pair independent of the others.
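As a rough sketch of how the active/standby roles can be checked from the command line (the nn1/nn2 names are the ones used in the hdfs-site.xml example later in these notes), the haadmin sub-command reports which NameNode currently holds the active role:
# ask each NameNode for its current HA state (active or standby)
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# a manual failover can also be requested (when automatic failover is not in charge)
# hdfs haadmin -failover nn1 nn2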
YARN:
YARN is a resource management system: it manages the resources on top of the HDFS data and knows where the data is, so computing frameworks apply to YARN for the resources they need; as a result, resources are not wasted, jobs can run concurrently, and YARN is compatible with other third-party parallel computing frameworks.
On the resource-management side:
ResourceManager: responsible for resource management and scheduling for the entire cluster.
ApplicationMaster: responsible for application-level matters such as task scheduling, task monitoring, and fault tolerance. When a job runs, every node has a NodeManager, and the job's ApplicationMaster runs inside one of them.
NodeManager should preferably run on the DataNode machines, because that keeps the computation close to the data.
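As a small illustration (my addition, not part of the original notes), once YARN is running the ResourceManager's view of the registered NodeManagers and of running applications can be listed like this:
# list the NodeManagers registered with the ResourceManager
yarn node -list
# list the applications currently known to the ResourceManager
yarn application -list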
Configuring and starting a Hadoop cluster with NameNode HA
Configure hdfs-site.xml (with notes):
This file holds all HDFS-specific configuration, such as which node performs which role.
dfs.name.dir = /root/data/namenode
dfs.data.dir = /root/data/datanode
dfs.tmp.dir = /root/data/tmp
dfs.replication = 1
// nameservices is the name of the cluster and the unique mark ZooKeeper uses to identify it; mycluster is the name and can be changed to something else
dfs.nameservices = mycluster
// the NameNodes under this nameservice and their names
dfs.ha.namenodes.mycluster = nn1,nn2
// the RPC address of each NameNode above, used to transfer data; this is the port clients use to upload and download
dfs.namenode.rpc-address.mycluster.nn1 = hadoop11:4001
dfs.namenode.rpc-address.mycluster.nn2 = hadoop22:4001
dfs.namenode.servicerpc-address.mycluster.nn1 = hadoop11:4011
dfs.namenode.servicerpc-address.mycluster.nn2 = hadoop22:4011
// the HTTP addresses, so that HDFS can be checked over the network, e.g. from a browser
dfs.namenode.http-address.mycluster.nn1 = hadoop11:50070
dfs.namenode.http-address.mycluster.nn2 = hadoop22:50070
// the hosts running JournalNodes (an odd number), i.e. which machines in the cluster have a JournalNode; this is the address the NameNode requests when it reads and writes metadata. The JournalNodes record the edits in real time: when the outside world accesses the NameNode, the NameNode responds to the request and at the same time asks the JournalNodes to record the read/write as a backup
dfs.namenode.shared.edits.dir = qjournal://hadoop11:8485;hadoop22:8485;hadoop33:8485/mycluster
// the working directory where the JournalNode keeps its files on the machine
dfs.journalnode.edits.dir = /root/data/journaldata/
// the class called by external connections to find the active NameNode
dfs.client.failover.proxy.provider.mycluster = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
// automatically switch the NameNode
dfs.ha.automatic-failover.enabled = true
// fencing method: one machine logs in to the other over ssh
dfs.ha.fencing.methods = sshfence
// location of the private key file
dfs.ha.fencing.ssh.private-key-files = /root/.ssh/id_dsa
Configure core-site.xml
// this is the unified entrance to HDFS; mycluster is the unified service ID of the cluster we configured, and external access goes to the cluster through it
fs.defaultFS = hdfs://mycluster
// HDFS is managed by ZooKeeper; this is the address and port list of the ZooKeeper cluster. Note that the number of nodes must be odd and not less than three
ha.zookeeper.quorum = hadoop11:2181,hadoop22:2181,hadoop33:2181
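As a quick sanity check (an addition to the original notes), hdfs getconf can confirm that the HA settings above are actually picked up from the configuration files:
# the unified entrance configured in core-site.xml
hdfs getconf -confKey fs.defaultFS
# the RPC addresses of the NameNodes under the nameservice
hdfs getconf -nnRpcAddresses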
If a NameNode that is not HA is converted to HA, execute hdfs namenode -initializeSharedEdits on the host of the NameNode being converted; this copies that NameNode's local edits metadata onto the JournalNodes.
The .bashrc file under /root is a configuration file for environment variables and applies only to the root user.
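A minimal sketch of what such entries might look like in /root/.bashrc; the install path /usr/local/hadoop is only an assumption for illustration:
# hypothetical install location
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# reload the file in the current shell
source /root/.bashrc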
Configuration of zookeeper:
First, configure the dataDir path where ZooKeeper stores its files, so that ZooKeeper's information is not lost after it is shut down.
server.1=hadoop11:2888:3888
server.2=hadoop22:2888:3888
server.3=hadoop33:2888:3888
The "1" in server.1 is the number of that zookeeper node in the cluster.
The zookeeper configuration file also contains dataDir=/root/data/zookeeper, and that directory holds a myid file.
[root@hadoop11 data]# cd zookeeper/
[root@hadoop11 zookeeper]# ls
myid  version-2
This myid file indicates the number of the current zookeeper in the cluster.
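Assuming the dataDir above, creating the myid file on each machine might look like this (the number must match the server.N entry for that host in zoo.cfg):
# on hadoop11, which is server.1
echo 1 > /root/data/zookeeper/myid
# on hadoop22 (server.2): echo 2 > /root/data/zookeeper/myid
# on hadoop33 (server.3): echo 3 > /root/data/zookeeper/myid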
Brief description of the configuration process:
Now start zookeeper (zk) on every machine; once it is up, leave it alone.
Then run hadoop-daemon.sh start journalnode to start the JournalNodes. On one of the NameNode machines, run hdfs namenode -format to create the NameNode's metadata and start that NameNode; then, on the other NameNode, which acts as the backup, run hdfs namenode -bootstrapStandby so that the metadata files of the two NameNodes are the same.
If you are setting up a fresh HDFS cluster, you should first run the format command (hdfs namenode -format) on one of the NameNodes.
If you have already formatted the NameNode, or are converting a non-HA-enabled cluster to be HA-enabled, you should now copy over the contents of your NameNode metadata directories to the other, unformatted NameNode by running the command "hdfs namenode -bootstrapStandby" on the unformatted NameNode. Running this command will also ensure that the JournalNodes (as configured by dfs.namenode.shared.edits.dir) contain sufficient edits transactions to be able to start both NameNodes.
If you are converting a non-HA NameNode to be HA, you should run the command "hdfs namenode -initializeSharedEdits", which will initialize the JournalNodes with the edits data from the local NameNode edits directories.
There is a ZKFC on each NameNode; it is the failover mechanism that interacts with ZooKeeper.
The following command is executed on one NameNode to associate ZKFC with ZooKeeper
so that ZKFC can start normally.
Initializing HA state in ZooKeeper
After the configuration keys have been added, the next step is to initialize required state in ZooKeeper. You can do so by running the following command from one of the NameNode hosts.
$ hdfs zkfc -formatZK
This will create a znode in ZooKeeper inside of which the automatic failover system stores its data.
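Putting the steps above together, a first-time HA startup might look like the following sketch; the host names and the order follow the configuration in these notes, so treat it as an outline rather than an exact script:
# 1. on hadoop11, hadoop22 and hadoop33: start zookeeper
zkServer.sh start
# 2. on the journalnode machines: start the journalnodes
hadoop-daemon.sh start journalnode
# 3. on the first namenode (hadoop11): format and start it
hdfs namenode -format
hadoop-daemon.sh start namenode
# 4. on the second namenode (hadoop22): copy the metadata and start it
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
# 5. on one namenode: create the znode used by automatic failover
hdfs zkfc -formatZK
# 6. start the rest of HDFS (datanodes and the zkfc daemons)
start-dfs.sh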
Some features of hdfs:
The hadoop-daemon.sh script in the sbin directory (hadoop-daemon.sh start [daemon]) can be used to start a single daemon on the local machine.
A daemon on a node can likewise be stopped by killing its process with kill -9.
start-dfs.sh starts HDFS for the whole cluster.
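For example, a sketch using the datanode daemon and the standard jps tool (the pid shown is hypothetical):
# start a single daemon on this machine
hadoop-daemon.sh start datanode
# find its process id
jps
# kill it directly if needed
kill -9 12345
# or stop it cleanly
hadoop-daemon.sh stop datanode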
After adding Hadoop's bin and sbin directories to the PATH environment variable, you can use hdfs to do a lot of things, as follows:
[root@hadoop11 ~]# hdfs
Usage: hdfs [--config confdir] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  oiv                  apply the offline fsimage viewer to an fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                       Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
Configuration of YARN
In mapred-site.xml
// this indicates which framework mapreduce uses
mapreduce.framework.name = yarn
In yarn-site.xml
// the following is configured the same on every node, because it indicates which machine in the cluster is used as the resourcemanager
// this is the resourcemanager address, used by external clients to connect to the resource manager
yarn.resourcemanager.address = hadoop1:9080
// the address through which the application master communicates with the resource manager's scheduler
yarn.resourcemanager.scheduler.address = hadoop1:9081
// the port through which the node managers communicate with the resource manager; if this is also configured on hadoop2, the nodemanager on hadoop2 can find hadoop1's resourcemanager
yarn.resourcemanager.resource-tracker.address = hadoop1:9082
// the list of auxiliary services run by the node manager
yarn.nodemanager.aux-services = mapreduce_shuffle
Each machine can start its own nodemanager with yarn-daemon.sh start nodemanager; a nodemanager started this way finds its resourcemanager from the configuration in the yarn-site.xml file. In the cluster, however, the nodemanager runs on the datanode machines to manage them, so if the slaves file specifies which machines carry datanodes, running start-yarn.sh on the host starts the resourcemanager on that host and launches a nodemanager on every node listed in the slaves file.
Every node has YARN on it, and the nodes form a cluster in an orderly manner according to their own yarn configuration, centered on the resourcemanager.
Start YARN on the host given as the resourcemanager address so that the resourcemanager is started there.
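A sketch of the two ways of starting YARN described above (the host name follows the yarn-site.xml example):
# on the resourcemanager host (hadoop1): start the resourcemanager and
# a nodemanager on every machine listed in the slaves file
start-yarn.sh
# or, on an individual machine, start only its own nodemanager
yarn-daemon.sh start nodemanager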
To run mapreduce on hadoop:
Package the mapreduce program into a jar and put it on the hadoop cluster.
Use the command: hadoop jar [jar file, e.g. web.jar] [fully qualified class name of the main function] [input file path] [output file path]
For example: hadoop jar web.jar org.shizhen.wordcount /test /output
Then you can check the result under the output path.
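For instance, the output directory can be inspected with the hdfs dfs commands; the part-r-00000 file name assumes the default single reducer:
# list the job output
hdfs dfs -ls /output
# print the word counts
hdfs dfs -cat /output/part-r-00000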
The hadoop and yarn commands of the Hadoop cluster itself correspond to many sub-commands;
use these sub-commands to operate on individual processes and nodes.
[root@hadoop11 ~]# hadoop
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
Using these yarn sub-commands, you can operate on the mapreduce-related daemons and monitor the progress of programs, for example with application.
[root@hadoop11 ~]# yarn
Usage: yarn [--config confdir] COMMAND
       where COMMAND is one of:
  resourcemanager      run the ResourceManager
  nodemanager          run a nodemanager on each slave
  historyserver        run the application history server
  rmadmin              admin tools
  version              print the version
  jar <jar>            run a jar file
  application          prints application(s) report/kill application
  applicationattempt   prints applicationattempt(s) report
  container            prints container(s) report
  node                 prints node report(s)
  logs                 dump container logs
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
After the configuration is complete:
Start ZooKeeper first with zkServer.sh start, then start Hadoop with start-all.sh.
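A short sketch of that daily start-up, with a quick verification using jps; the daemon names listed are what one would typically expect, and the exact set depends on which roles run on each machine:
# on each zookeeper machine
zkServer.sh start
# on the main host
start-all.sh
# verify the daemons on each machine
jps
# typical processes: NameNode, DataNode, JournalNode, DFSZKFailoverController,
# ResourceManager, NodeManager, QuorumPeerMain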