The most practical Construction method of cdh4 Cloud Storage

2025-02-28 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

I. What is CDH

CDH is Cloudera's 100% open source Hadoop distribution, built specifically to meet enterprise demands.

In other words, it is an open source distributed storage system.

II. What software and functions are included in CDH4

First of all, HBase, Hadoop, and ZooKeeper are essential.

Secondly, Hive, Oozie, and Map/Reduce can also be integrated into it.

HBase is a distributed, column-oriented open source database. The technology comes from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Chang et al.

Hadoop is a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed programs without knowing the underlying details of the distribution, and make full use of the power of a cluster for high-speed computing and storage.

ZooKeeper began as an official sub-project of Hadoop (it is now a top-level Apache project). It is a reliable coordination system for large-scale distributed systems, providing functions such as configuration maintenance, naming service, distributed synchronization, and group services.

Hive is a data warehouse tool based on Hadoop. It can map structured data files to database tables, provide SQL-style query functions, and convert SQL statements into MapReduce jobs to run.

Oozie is a framework that allows us to combine multiple Map/Reduce jobs into a single logical unit of work.

MapReduce is a programming model for parallel operations on large datasets (larger than 1 TB). The concepts "Map" and "Reduce", and their main ideas, are borrowed from functional programming languages, along with features borrowed from vector programming languages. It makes it much easier for programmers with no experience in distributed parallel programming to run their programs on distributed systems.
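To make the model more concrete, here is a minimal single-process sketch in Python of the map, shuffle, and reduce phases, using word counting as the task. It only illustrates the programming model and is not Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # On a cluster each line could be mapped on a different node;
    # here everything runs in one ordinary loop.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop computes data"]))
```

In a real cluster the framework handles the shuffle step and distributes the map and reduce calls across nodes, which is why the programmer only has to write the two small functions.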

III. Installation of CDH4

Generally speaking, the usual way to install CDH4 is to go to the official website, http://www.cloudera.com/blog/2012/02/introducing-cdh5/, download the required rpm packages, install them step by step through yum according to the official documentation, and finally configure everything.

What I want to introduce here is installing CDH4 through cloudera-manager.

Cloudera-manager is a product of Cloudera, not of the Apache Foundation. At present there are two editions: the free edition and the commercial edition. The free edition supports at most 50 nodes, and the commercial edition has no limit.

In general, 50 nodes are enough, so here we use the free edition of cloudera-manager.

Official download address: https://ccp.cloudera.com/display/SUPPORT/Downloads

1. Installation environment

Node1: 192.168.1.124, CentOS 6.2

Node2: 192.168.1.163, CentOS 6.2

iptables: disabled

SELinux: disabled

2. Install cloudera-manager

Node1:

After downloading from the official site, you get an executable file, cloudera-manager-installer.bin.

The X Window System package group needs to be installed beforehand, for the simple reason that the installer has a graphical interface.

During installation, the installer automatically uses yum to install the packages it needs, which amounts to a little over 100 MB of automatic downloads. Because the source is overseas, and given the company's bandwidth limits, China's various network policies, and so on, the download often stalls and the installation may not finish within a day.

My approach is to interrupt the graphical installer, that is, to kill it directly. By this point, the yum repository it needs has already been imported into the system.

Follow the link given in the yum repository, http://archive.cloudera.com/cm4/redhat/6/x86_64/cm/4.0.4/, and download the required rpm packages manually.
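If the archive serves an ordinary HTML directory listing, the rpm links can be collected with a short script rather than clicked one by one. The following is a rough sketch under that assumption; the class and function names are hypothetical, and the sample listing and the package name example-pkg.rpm are invented for illustration, so the real file names will differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RpmLinkParser(HTMLParser):
    """Collect href values that end in .rpm from an HTML directory listing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.endswith(".rpm"):
                self.links.append(value)


def rpm_urls(base_url, index_html):
    """Return absolute URLs for every .rpm link found in index_html."""
    parser = RpmLinkParser()
    parser.feed(index_html)
    return [urljoin(base_url, link) for link in parser.links]


if __name__ == "__main__":
    # Hypothetical listing; the real page content is not reproduced here.
    base = "http://archive.cloudera.com/cm4/redhat/6/x86_64/cm/4.0.4/"
    sample = '<a href="../">Parent</a><a href="example-pkg.rpm">example-pkg.rpm</a>'
    for url in rpm_urls(base, sample):
        print(url)
```

The resulting URLs can then be fetched with whatever download tool copes best with a slow connection.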

After the download is complete, install the packages locally with yum:

yum localinstall --nogpgcheck *.rpm
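When the same local install has to be repeated on both nodes, the command can be assembled by a small helper. This is only a sketch: the helper names are hypothetical, and the -y flag (which answers yum's confirmation prompt) is an addition that is not part of the original command.

```python
import glob
import os
import subprocess


def localinstall_command(directory):
    """Build the yum command line for every .rpm file found in directory."""
    rpms = sorted(glob.glob(os.path.join(directory, "*.rpm")))
    if not rpms:
        raise FileNotFoundError("no .rpm files in " + directory)
    # -y is an assumption added here; drop it to confirm by hand.
    return ["yum", "-y", "localinstall", "--nogpgcheck"] + rpms


def run_localinstall(directory):
    """Run the install; must be executed as root on the target host."""
    subprocess.run(localinstall_command(directory), check=True)
```

Calling run_localinstall(".") from the download directory performs the same step as the command above, and check=True makes the script stop if yum reports an error.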

After the yum installation is complete, rerun cloudera-manager-installer.bin to finish the installation. (If the installation fails and prompts you about the installation, go to the /usr/share/cmf directory and delete the uninstall-cloudera-manager.sh file.)

Note 1: The packages must be installed on both hosts, but only one host runs the graphical installer and acts as the console; nothing more needs to be done on the other. Here I use node1 as the console.

Note 2: The JDK should also be installed on both hosts beforehand; otherwise it will be downloaded and installed automatically. Installing the JDK from the rpm package is recommended.

3. Install CDH4

① After the installation of cloudera-manager is completed, it starts automatically, and ports such as 7182 and 7180 can be seen with netstat -tnlp.
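The same check can be scripted, which is handy when waiting for the service to come up. Below is a minimal sketch that tries a TCP connection to each port; the function names are hypothetical, and it assumes the script runs on the console host itself, hence 127.0.0.1.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports=(7180, 7182)):
    """Map each port to whether something accepts connections on it."""
    return {port: port_open(host, port) for port in ports}


if __name__ == "__main__":
    # Run on the console host; replace 127.0.0.1 to check from elsewhere.
    for port, is_open in check_ports("127.0.0.1").items():
        print(port, "listening" if is_open else "not reachable")
```

Unlike netstat, this only shows that something accepts connections on the port, not which process owns it.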

Open http://192.168.1.124:7180 in a browser to reach the web management console of cloudera-manager. The default administrator user is admin, with the password admin.

After logging in, you are asked whether to use the free edition or the commercial edition; we choose the free one.

② Next comes the setup in the cloudera-manager console web interface, which is very simple.

First search for the hosts: fill in the IP addresses of the two hosts, run the search, and then choose to install.

Choose the CDH4 version and so on, after which an installation page with a progress bar appears, just as when installing cloudera-manager. Once the yum repo file has been written, interrupt the installation: go back to the system, kill the yum process, and close the page.

Check /etc/yum.repos.d/cloudera-cdh5.repo to see which software needs to be downloaded, then go to http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/4/ and download the required rpm packages.
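A yum .repo file uses an INI-style layout, so the download location can be read out programmatically. The sketch below uses Python's configparser for that; the function name is hypothetical, and the sample section name and keys are assumptions for illustration, since the actual contents of the file written by the installer may differ.

```python
import configparser


def repo_baseurls(repo_text):
    """Return {section: baseurl} for each repository in a yum .repo file."""
    # interpolation=None keeps any '%' characters in URLs literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    return {
        section: parser[section]["baseurl"]
        for section in parser.sections()
        if parser.has_option(section, "baseurl")
    }


if __name__ == "__main__":
    # Hypothetical file contents; section name and keys may differ on a real host.
    sample = """
[cloudera-cdh]
name = Cloudera CDH
baseurl = http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/4/
gpgcheck = 1
"""
    print(repo_baseurls(sample))
```

On a real host, the text would come from reading the repo file mentioned above instead of the inline sample.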

Then, as before, run yum localinstall --nogpgcheck *.rpm

Finally, reopen the http://192.168.1.124:7180 page and run the host installation again.

Note 1: The cloudera-manager console does not re-download or reinstall software packages that are already installed.

Note 2: If your network speed is good, you can simply wait for the installation to finish without interrupting it. If it fails, however, do not click retry: that uninstalls what has already been installed, so you start all over again. The sources are overseas, and everyone knows what the download speed is like.

③ After the above has been installed, a host inspection runs. With many hosts it is relatively slow, and this varies from case to case. Once the inspection finishes, you can choose the services. Here I choose HBase, Hadoop, and ZooKeeper, and then start the services.

Real-time monitoring of service status

Real-time monitoring of host status

Log in to a host and open the hbase shell to test
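A quick smoke test in the hbase shell could create a table, write one cell, scan it, and drop the table again. The sketch below builds such a command script and, if the hbase command is on the PATH, pipes it into hbase shell. The table name smoke_test, the column family cf, and the helper names are hypothetical, and feeding the commands through standard input is an assumption about how the shell is driven, not something taken from the article.

```python
import shutil
import subprocess


def smoke_test_script(table="smoke_test", family="cf"):
    """Return HBase shell commands that create, write, read, and drop a table."""
    return "\n".join([
        f"create '{table}', '{family}'",
        f"put '{table}', 'row1', '{family}:a', 'value1'",
        f"scan '{table}'",
        f"disable '{table}'",
        f"drop '{table}'",
        "exit",
    ]) + "\n"


def run_smoke_test():
    """Pipe the commands into `hbase shell`; return None if hbase is absent."""
    if shutil.which("hbase") is None:
        return None
    result = subprocess.run(
        ["hbase", "shell"],
        input=smoke_test_script(),
        capture_output=True,
        text=True,
    )
    return result.stdout
```

The same commands can equally be typed by hand at the hbase shell prompt; the scan output should show the single row that was written.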

At this point, the CDH4 framework is ready to use.

Note: Services that were not selected are not started by default. Don't worry about this; if you need Hive or the others, you can start them manually.
