Hadoop Learning-HA configuration (without zookeeper)-day05

2025-02-24 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

1. HA configuration

(1) HA: High Availability

The ability to continue providing service, achieved by clustering multiple hosts.

(2) Failover: disaster recovery.

(3) namenode and 2nn

The secondary namenode addresses some reliability problems, but it does not provide failover.

(4) SPOF: single point of failure

With HA, nn1 + nn2 remove the namenode as a single point of failure.

(5) Shared storage options: NFS (Network File System, e.g. an EMC shared storage device) or QJM (Quorum Journal Manager)

(6) HA architecture

Two namenode hosts, one active and the other standby; the active nn serves all client operations.

The standby maintains enough state to provide failover at any time.

The JournalNode (JN) is a separate process that synchronizes edit information between the active nn and the standby nn.

Namespace modifications made by the active nn are written to the JNs; the standby nn reads edits from the JNs, constantly watching for changes to the log.

Once the log changes, the standby applies the edits to its own namespace.

Datanodes send block reports and heartbeats to both namenodes simultaneously.

There can be only one active nn at a time. If both become active namenodes, the condition is called split-brain.

2. HA deployment configuration

There are eight machines in total:

hadoop01  192.168.0.11  namenode  journalnode  resourcemanager
hadoop02  192.168.0.12  datanode  nodemanager
hadoop03  192.168.0.13  datanode  nodemanager
hadoop04  192.168.0.14  datanode  nodemanager
hadoop05  192.168.0.15  datanode  nodemanager
hadoop06  192.168.0.16  datanode  nodemanager
hadoop07  192.168.0.17  secondarynamenode  journalnode
hadoop08  192.168.0.18  namenode  journalnode
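For reference, the host plan above would typically be mirrored in /etc/hosts on every node so that the hostnames resolve; this is a sketch using only the names and addresses listed above:

```
192.168.0.11 hadoop01
192.168.0.12 hadoop02
192.168.0.13 hadoop03
192.168.0.14 hadoop04
192.168.0.15 hadoop05
192.168.0.16 hadoop06
192.168.0.17 hadoop07
192.168.0.18 hadoop08
```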

(1) Configure the name service in hdfs-site.xml: dfs.nameservices

The logical name of the name service:

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>

(2) Configure each namenode in the nameservice

dfs.ha.namenodes.[nameservice ID]:

<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>

Note: hadoop 2.7.2 supports at most 2 namenodes per nameservice.

(3) Configure the RPC address of each namenode

<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>s100:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>s800:8020</value>
</property>

(4) Configure the web UI address of each namenode

<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>machine1.example.com:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>machine2.example.com:50070</value>
</property>

(5) Configure the namenode's shared edit log directory, which is a logical URI rather than a local path

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://s100:8485;s700:8485;s800:8485/mycluster</value>
</property>

(6) Configure the client failover proxy provider class

Used by the client to determine which namenode is currently active:

<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

(7) Optional: configure a list of HA fencing methods

With QJM, fencing prevents split-brain, i.e. two active namenodes at once.

You can configure sshfence or a shell script.

(8) Configure the hdfs file system in core-site.xml

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>

(9) Configure the JN's local data storage (edit log) directory

<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/ubuntu/hadoop/journal</value>
</property>

3. Configuration process

(1) First distribute the configured hdfs-site.xml and core-site.xml files to all nodes

(2) Start the journalnode process on each JN node

$ hadoop-daemon.sh start journalnode

(3) After starting the JNs, synchronize the on-disk metadata of the two namenodes

If you are building a new hdfs cluster, first format on one namenode.

If you have already formatted the file system, or are enabling HA on a non-HA cluster, copy the existing nn directory (~/hadoop/dfs/name) to the same path on the other nn, then run on the unformatted nn: hdfs namenode -bootstrapStandby. This ensures the JNs have enough edits to start both namenodes.

If you are converting a non-HA namenode to HA, run: hdfs namenode -initializeSharedEdits. This initializes the JNs with the edit-log data from the local namenode's directory. Then start both namenodes and check the status of each through the web UI:

http://hadoop01:50070/

http://hadoop08:50070/

(4) hdfs namenode -initializeSharedEdits will report an error if the namenode is still running on s100; stop the namenode to release the lock:

s100 $ hadoop-daemon.sh stop namenode

Then execute it again:

s100 $ hdfs namenode -initializeSharedEdits

(5) Then start the two namenodes

s100 $ hadoop-daemon.sh start namenode

s800 $ hadoop-daemon.sh start namenode

(6) Start all datanodes

s100 $ hadoop-daemons.sh start datanode

(7) Switch the active node

s100 $ hdfs haadmin -transitionToActive nn1

At this point s800, now in standby state, will reject read requests.
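After switching, the HA state of each namenode can be confirmed from the command line. This is a sketch that assumes the cluster above is running, hdfs is on the PATH, and nn1/nn2 are the IDs configured in dfs.ha.namenodes.mycluster:

```shell
# Print the HA state of each configured namenode.
# After the transition above, nn1 should report "active"
# and nn2 should report "standby".
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```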

4. Rack awareness

(1) To ensure that data remains available in the event of a switch failure or other network problem within the cluster, rack awareness can be configured in two ways

A: one way is to configure a mapping script

The topology.script.file.name parameter in core-site.xml specifies the location of the script file.

See rack.sh and topology.data for the specific script. When changing the contents of the topology.data file, add the hostname as well to ensure correctness, because the JobTracker maps nodes by hostname.
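As a hypothetical sketch of what such a rack.sh could look like (not the article's actual script): it assumes topology.data lines of the form "<hostname-or-ip> <rack-path>", and falls back to /default-rack for unmapped nodes, mirroring Hadoop's DEFAULT_RACK behaviour.

```shell
#!/bin/sh
# Sketch of a topology script for topology.script.file.name.
# Assumed data file format, one mapping per line:
#   <hostname-or-ip> <rack-path>
resolve_rack() {
    rack=$(awk -v n="$1" '$1 == n { print $2; exit }' "${TOPOLOGY_DATA:-topology.data}")
    # Unmapped nodes fall back to /default-rack, matching the
    # default used when no mapping is found.
    echo "${rack:-/default-rack}"
}

# Hadoop invokes the script with one or more node names as
# arguments and reads one rack path per line from stdout.
for node in "$@"; do
    resolve_rack "$node"
done
```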

B: the other way is to map network locations by implementing the resolve() method of the DNSToSwitchMapping interface. The implementation class is specified by the topology.node.switch.mapping.impl parameter in core-site.xml; the default is ScriptBasedMapping, which maps network locations by reading the pre-written script file. If no script is configured, the default value "/default-rack" is used as the network location of all nodes. A custom MyResolveNetworkTopology class is placed in the org.apache.hadoop.net directory of the core package.

The configuration in core-site.xml is as follows:

<property>
  <name>topology.node.switch.mapping.impl</name>
  <value>org.apache.hadoop.net.MyResolveNetworkTopology</value>
  <description>
    The default implementation of the DNSToSwitchMapping. It invokes a
    script specified in topology.script.file.name to resolve node names.
    If the value for topology.script.file.name is not set, the default
    value of DEFAULT_RACK is returned for all node names.
  </description>
</property>

(2) Write a script, or implement the interface org.apache.hadoop.net.DNSToSwitchMapping

The returned data format is like: '/myrack/myhost'

(3) Package the custom rack-awareness class into a jar and distribute it to ${HADOOP_HOME}/share/hadoop/common/lib/ on all nodes

(4) Configure the class, specifying the custom class name

topology.node.switch.mapping.impl

Refer to the configuration above for the specific format

(5) Delete the logs of all nodes and start the namenode

$ bin/hadoop dfsadmin -printTopology to view rack-aware location information

5. distcp: copying data between clusters

(1) Uses Hadoop MapReduce to achieve parallel, recursive copying of folders. High performance, and supports cross-cluster replication

(2) Usage

$ hadoop distcp hdfs://s100:8020/user/hadoop/data hdfs://X.X.X.X:8020/user/hadoop/new

6. Archiving

(1) java jar // jar archive

(2) hadoop har // hadoop archive

$ hadoop archive -archiveName new.har -p /user/hadoop/new /user/hadoop

(3) Empty the trash

$ hadoop fs -rm -R -f /user/hadoop/.Trash

(4) View the contents of the archive file

$ hadoop fs -lsr har:///user/hadoop/new.har

7. Data integrity

(1) Checksum

(2) CRC-32

Cyclic redundancy check, 32-bit

(3) Specify the number of bytes per checksum

io.bytes.per.checksum

(4) $ hdfs dfs -get -crc /user/hadoop/demo.txt ./downloads

A file called .demo.txt.crc appears in the target folder.

(5) $ cat .demo.txt.crc
