This article shows how to perform an offline installation of a Hadoop environment. The content is easy to follow and clearly laid out; I hope it helps clear up any doubts as we work through the steps below.
1. Software download
Before an offline installation, you need to set up an HTTP server on the intranet from which the software will be installed. The software must be downloaded locally in advance; how to download it is not covered here.
1.1. Linux installation package
CentOS download address: http://isoredirect.centos.org/centos/6/isos/x86_64/
If you already have the DVD installation media (CentOS-6.5-x86_64-bin-DVD1 and DVD2), there is no need to download it.
1.2. Hadoop installation package
CDH: http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/
CM: http://archive.cloudera.com/cm4/redhat/6/x86_64/cm/
Impala: http://archive.cloudera.com/impala/redhat/6/x86_64/impala/
Official website address: http://www.cloudera.com/
Note: when downloading the RPM packages, also download the packages under the noarch directory.
2. Yum source configuration
2.1. Set up the HTTP server
A default CentOS installation already includes Apache's HTTP service (httpd); simply start it.
# service httpd start
Open http://localhost in a browser to verify that the service is running.
2.2. Linux source
In addition to the components installed with the system, Hadoop may require components that are not yet installed, which is why the Linux installation packages are needed. Before creating the yum software source, delete or back up the repo files that ship with the system.
# cd /etc/yum.repos.d/
# rm -rf *.repo
2.2.1. Virtual machine image source
Load the ISO image into the virtual optical drive, then create a soft link in the HTTP server's directory pointing to the mounted drive. The CentOS bin installation package consists of two ISO images; it is recommended to set up two virtual optical drives and load both images.
# cd /var/www/html
# ln -s /media/CentOS_6.5_Final centos
# ln -s /media/CentOS_6.5_Final_ centos2
Create a system software source
# cd / etc/yum.repos.d/
Create a centos.repo file in this directory with the following contents:
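The original does not reproduce the file contents, so here is a minimal sketch, assuming the HTTP server address is 10.1.195.60 (the address used later in this document) and the centos/centos2 link names created above; adjust the baseurl to match your own server:
[centos]
name=CentOS 6.5 DVD1
baseurl=http://10.1.195.60/centos/
enabled=1
gpgcheck=0
[centos2]
name=CentOS 6.5 DVD2
baseurl=http://10.1.195.60/centos2/
enabled=1
gpgcheck=0
This assumes each linked directory contains its own repodata; if a disc does not, run createrepo on that directory first.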
2.2.2. Hard disk source
Alternatively, copy the contents of the system installation discs directly to the hard disk, then create a soft link under the HTTP server's document root pointing to the folder that holds the installation files.
# cd /var/www/html
# ln -s /tmp/CentOS_6.5_Final centos
# ln -s /tmp/CentOS_6.5_Final_ centos2
Create a system software source
# cd / etc/yum.repos.d/
Create a centos.repo file in this directory, with the same contents as shown in the previous section.
2.3. Hadoop source
The Hadoop source has three parts: CDH, CM, and Impala. We use CM to install Hadoop on all nodes in bulk and to manage the cluster afterwards.
Cloudera Manager (CM) was the industry's first tool for managing Hadoop clusters through a graphical interface. Using CM reduces deployment time from days to hours and provides a cluster-wide, real-time view of running nodes and services, which can be used to change configurations across the cluster. It also includes reporting and diagnostic tools for observing cluster performance and utilization.
Upload the downloaded CDH, CM, and Impala packages to the HTTP server (tentatively under the /tmp folder), generate the repodata metadata that yum uses to verify the RPM packages, and create soft links under the HTTP document root. If the ftp service is not installed, install it first.
If you are installing Hadoop with CM, you only need to create the CM source and install CM; during the batch installation, when you select the system's local source, CM creates the corresponding data sources.
1. Install the ftp service (optional)
# yum install vsftpd
2. Create the repo metadata. If the createrepo component is not installed on the system, install it first.
# yum install createrepo
# cd /tmp
# createrepo CDH
# createrepo CM
# createrepo Impala
3. Create a soft link
# cd /var/www/html
# ln -s /tmp/CDH cdh5
# ln -s /tmp/CM manager
# ln -s /tmp/Impala impala
4. Create a system source
# cd /etc/yum.repos.d
# vi myrepo.repo
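The file contents are not reproduced in the original, so here is a minimal sketch, assuming the soft link names created above (manager, cdh5, impala) and the HTTP server address 10.1.195.60 used later in this document:
[cloudera-manager]
name=Cloudera Manager
baseurl=http://10.1.195.60/manager/
enabled=1
gpgcheck=0
[cloudera-cdh]
name=Cloudera CDH
baseurl=http://10.1.195.60/cdh5/
enabled=1
gpgcheck=0
[cloudera-impala]
name=Cloudera Impala
baseurl=http://10.1.195.60/impala/
enabled=1
gpgcheck=0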
3. Linux environment configuration
3.1. Create a user
Create the same username (ai) and password (asiainfo) on all node servers. This allows a single username and password to be entered for the bulk installation with CM.
# useradd ai
# passwd ai
3.2. Network configuration
Configure a fixed IP for each machine and set the interface to connect automatically at boot.
Try not to modify the IP after installing CM. CM binds to the IP during installation, and changing it afterwards will prevent CM from identifying the host correctly.
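As a hedged sketch, a static address on CentOS 6 is configured in the interface file; the address, netmask, and gateway below are placeholders and should be replaced with your own network settings:
# vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.0.7.156
NETMASK=255.255.255.0
GATEWAY=10.0.7.1
ONBOOT=yes gives the automatic connection at power-on mentioned above; restart the network service (# service network restart) for the change to take effect.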
3.3. Modify the hostname
As the root user, open the network file and change the hostname to the name you want:
# vi /etc/sysconfig/network
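For example, to name the machine node1 (a placeholder hostname), the file would contain:
NETWORKING=yes
HOSTNAME=node1
The new name takes effect after a reboot; running hostname node1 applies it to the current session.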
3.4. Disable SELinux
# vi /etc/selinux/config
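Set the SELINUX line to disabled:
SELINUX=disabled
A reboot is required for this to take full effect; setenforce 0 disables enforcement for the current session.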
3.5. Node interconnection configuration
# vi /etc/hosts
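A sketch of the hosts entries, assuming three nodes at the 10.0.7.156-158 addresses used later in this document and placeholder hostnames; every node of the cluster should be listed in this file on every machine:
10.0.7.156 node1
10.0.7.157 node2
10.0.7.158 node3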
3.6. User sudo settings
This setting allows the same username and password to be used for the bulk installation with CM.
# vi /etc/sudoers
Add (or change) the corresponding entry so that it reads:
ai ALL=(ALL) NOPASSWD:ALL
3.7. Turn off the firewall
Turn off the firewall and also disable it at boot, so it remains off after a restart.
# service iptables stop
# service iptables status
# vi /etc/rc.local
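As a minimal sketch, append the stop command to rc.local so the firewall is turned off at boot; chkconfig can also be used to disable the service outright:
# echo "service iptables stop" >> /etc/rc.local
# chkconfig iptables off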
3.8. Configure passwordless SSH (optional)
# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# ssh-copy-id 10.0.7.238 (10.0.7.238 is the host you want passwordless access to; also set up passwordless access to the local machine itself)
3.9. Configure time synchronization (to be rewritten)
3.9.1. Modify the time zone
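The time-zone step is left unfinished in the original; as a hedged sketch on CentOS 6, the zone can be changed by replacing /etc/localtime (Asia/Shanghai is only an example, pick the zone file that matches your location):
# cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime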
3.9.2. Synchronize with pdsh
Upload the pdsh tarball and extract it:
# tar -xvf pdsh-2.26.tar
Enter the pdsh directory and run the following commands:
# ./configure --with-ssh --without-rsh
# make
# make install
Run the following command on the master node (10.0.7.156) to set the time on the machines 10.0.7.156, 10.0.7.157, and 10.0.7.158 to 13:51:19:
# pdsh -w ssh:10.0.7.[156,157,158] date -s 13:51:19
This method requires passwordless SSH to be configured.
3.9.3. Synchronize with an NTP server
The master node requires additional setup.
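The details are not reproduced in the original, so here is a minimal sketch, assuming 10.0.7.156 (the master node used above) serves time to the rest of the cluster.
On the master node, allow the cluster subnet in /etc/ntp.conf, then start ntpd:
# vi /etc/ntp.conf
restrict 10.0.7.0 mask 255.255.255.0 nomodify notrap
# service ntpd start
# chkconfig ntpd on
On the other nodes, synchronize against the master (this can be added to cron for periodic synchronization):
# ntpdate 10.0.7.156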
4. Hadoop installation
4.1. Install CM
Install CM with the help of its bin installer and then verify it. An offline installation of the bin package usually cannot find the data source; in practice, you only need to install, in order, the RPM packages that the bin installer would install. If you are concerned, you can run the bin package at the end to verify the installation.
Download address: http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin
RPM package installation order:
# rpm -i jdk-6u31-linux-amd64.rpm
# rpm -i cloudera-manager-daemons-4.8.2-1.cm482.p0.101.el6.x86_64.rpm
# rpm -i cloudera-manager-server-4.8.2-1.cm482.p0.101.el6.x86_64.rpm
# rpm -i cloudera-manager-agent-4.8.2-1.cm482.p0.101.el6.x86_64.rpm
# rpm -i cloudera-manager-server-db-4.8.2-1.cm482.p0.101.el6.x86_64.rpm
# rpm -i enterprise-debuginfo-4.8.2-1.cm482.p0.101.el6.x86_64.rpm
# ./cloudera-manager-installer.bin --skip_repo_package=1
Enter the URL http://10.1.195.60:7180/cmf/login in the browser
Username / password: admin/admin
4.2. Install CDH
Select Free version
Enter all the IPs or hostnames in the cluster, one per line, pressing Enter to start each new line.
This step involves a number of choices; follow the steps below.
Make sure that the paths you enter match the paths configured on the HTTP server and are accessible from a browser, for example:
http://10.1.195.60/manager/
http://10.1.195.60/impala/
http://10.1.195.60/cdh5/
Use the ai user, with all hosts accepting the same password, asiainfo.
Make sure the credentials are consistent across hosts; scripts used later depend on them.
Select the services to install and check the role assignments. Because the number of machines differs from province to province, the allocation scheme differs as well; for the details of each province, refer to its separate settings document.
For the DataNode data directory, assign a path under /opt and make sure all hard drives are mounted before installation; CM automatically selects the largest partition.
That covers offline installation in a Hadoop environment. Thank you for reading, and I hope the content shared here has been helpful.