
Use ZooKeeper to build a Hadoop cluster (QJM)


1: configure password-free login

Running ssh-keygen generates a .ssh folder in the user's home directory containing the id_rsa private key and the id_rsa.pub public key.

Copy the public key into the .ssh directory on the corresponding host, then append it to the authorized_keys file:

cat id_rsa.pub >> authorized_keys   # all public keys are appended to this file

Copy the authorized_keys file to the .ssh directory of each host, then log in to each one to verify.

You can also use ssh-copy-id myuser@mynode
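As a rough sketch of the whole exchange, assuming the three hosts used later in this article (master, slave-one, slave-two) and a hadoop user (adjust user and host names to your environment):

# on master: generate the key pair (accept the default prompts)
ssh-keygen -t rsa

# let master ssh to itself without a password
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# push the public key to the other nodes
ssh-copy-id hadoop@slave-one
ssh-copy-id hadoop@slave-two

# verify that no password is requested
ssh hadoop@slave-one hostname
ssh hadoop@slave-two hostname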

2: install ZooKeeper, configure conf/zoo.cfg, and add each host and its myid

Start ZooKeeper: bin/zkServer.sh start
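A minimal zoo.cfg sketch for the three hosts above, assuming a dataDir of /var/zookeeper (any writable directory works); the myid file on each host must contain the number from its own server.N line:

# conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
server.1=master:2888:3888
server.2=slave-one:2888:3888
server.3=slave-two:2888:3888

# on master:
echo 1 > /var/zookeeper/myid
# on slave-one: echo 2 > /var/zookeeper/myid
# on slave-two: echo 3 > /var/zookeeper/myid

After starting ZooKeeper on all three nodes, bin/zkServer.sh status should report one leader and two followers.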

3: if Java was installed from a downloaded compressed package, you need to configure the environment variables yourself

vim /etc/profile

Add:

export JAVA_HOME=/home/hadoop/jdk1.7.0_80
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

source /etc/profile

4: modify the Hadoop configuration files:

The configuration files are all in the etc/hadoop/ directory under the Hadoop installation directory.

core-site.xml file:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
    <!-- must match the nameservice name defined in hdfs-site.xml -->
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/var/hadoop/jn</value>
    <!-- the jn/mycluster folder needs to be created -->
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/tmp</value>
  </property>
</configuration>

hdfs-site.xml file:

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>master:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>slave-one:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>slave-one:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://master:8485;slave-one:8485;slave-two:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>master:2181,slave-one:2181,slave-two:2181</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/var/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/var/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <!-- dfs.replication appears twice in the original; the value read last takes effect -->
  </property>
</configuration>

Configure the mapred-site.xml file:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
</configuration>

Configure the yarn-site.xml file

<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>slave-one</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>slave-one:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>master:2181,slave-one:2181,slave-two:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.recovery.dir</name>
    <value>/var/hadoop/yarn-recovery</value>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>45454</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>

Create the folders:

mkdir -p /var/hadoop/jn/mycluster
mkdir -p /var/hadoop/tmp
mkdir -p /var/hadoop/dfs/name
mkdir -p /var/hadoop/dfs/data
mkdir -p /var/hadoop/yarn-recovery

Configure the JAVA_HOME path in hadoop-env.sh:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0

Configure the slaves file: add all node hostnames

master

slave-one

slave-two

5: start the JournalNode daemon on every machine, in preparation for formatting:

sbin/hadoop-daemon.sh start journalnode
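If you would rather do this from one terminal, a small loop over the hosts works, assuming the passwordless SSH from step 1 and an assumed installation path of /home/hadoop/hadoop on every node:

for h in master slave-one slave-two; do
  ssh "$h" "/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode"
done
# jps on each node should now show a JournalNode process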

6: format the NameNode on one host:

bin/hdfs namenode -format mycluster

7: after the NameNode has been formatted on that host, start it so that the NameNode metadata can be synchronized to the other host

sbin/hadoop-daemon.sh start namenode

8: synchronize the NameNode metadata on the other host: since only two NameNodes are needed, only the hosts listed in the configuration are synchronized

bin/hdfs namenode -bootstrapStandby

If the synchronization succeeds, you will see the cluster ID and other related information. If not, check whether the address the first NameNode is listening on is wrong.

Listening on 127.0.0.1 will cause the connection to fail; modify the /etc/hosts file so each hostname resolves to the host's real IP.
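A sketch of the kind of /etc/hosts mapping meant here; the IP addresses are placeholders for your own network:

# /etc/hosts on every node -- do not map these hostnames to 127.0.0.1
192.168.1.10   master
192.168.1.11   slave-one
192.168.1.12   slave-two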

9: configure ZooKeeper failover: format the NameNode failover state in ZooKeeper; make sure the NameNode processes have been started

Run on a host:

bin/hdfs zkfc -formatZK
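To confirm the format worked, you can check from the ZooKeeper CLI that the HA znode was created (zkfc creates /hadoop-ha/<nameservice> by default):

bin/zkCli.sh -server master:2181
ls /hadoop-ha          # should list mycluster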

10: shut down all of HDFS with sbin/stop-dfs.sh, then restart it all with sbin/start-dfs.sh
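After the restart, running jps on each node is a quick sanity check; under the configuration above you would roughly expect something like the following (the exact set depends on which roles each host carries):

jps
# master:    NameNode, DFSZKFailoverController, JournalNode, DataNode, QuorumPeerMain
# slave-one: NameNode, DFSZKFailoverController, JournalNode, DataNode, QuorumPeerMain
# slave-two: JournalNode, DataNode, QuorumPeerMain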

11: configure YARN: the ResourceManager needs to be started on each ResourceManager host

sbin/yarn-daemon.sh start resourcemanager

12: check the startup status of YARN:

bin/yarn rmadmin -getServiceState rm1   # or rm2; rm1 and rm2 are the names defined in yarn-site.xml

In the current version, a nameservice can run at most two NameNodes; for example, mycluster can only have the two NameNodes nn1 and nn2, and the names nn1 and nn2 can be chosen freely.
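The NameNode side has an analogous check, using the nn1/nn2 names defined in hdfs-site.xml:

bin/hdfs haadmin -getServiceState nn1   # reports active or standby
bin/hdfs haadmin -getServiceState nn2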
