Hadoop introduction
Hadoop is a software platform for developing and running applications that process large-scale data. It is an open source software framework from Apache that implements distributed computing over massive data on a cluster built from a large number of computers. The core designs of the Hadoop framework are MapReduce and HDFS: MapReduce provides distributed computation over the data, while HDFS provides distributed storage for massive data.
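As a minimal illustration of this division of labor (a sketch, assuming a running cluster and the word-count example jar that ships with the Hadoop distribution; its exact path varies by version and install layout):
# store a local file in HDFS (distributed storage)
hdfs dfs -mkdir -p /tmp/wordcount/input
hdfs dfs -put ./words.txt /tmp/wordcount/input
# run the stock word-count job over it (distributed computation with MapReduce)
hadoop jar /path/to/hadoop-mapreduce-examples.jar wordcount /tmp/wordcount/input /tmp/wordcount/output
# read the aggregated result back from HDFS
hdfs dfs -cat /tmp/wordcount/output/part-r-00000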
In the Hadoop family, there are more than 20 components and tools for computing, analysis, storage, monitoring, management and so on. These family members greatly enrich the functions of Hadoop.
Common family members of Hadoop
Here are a few common components:
Apache Hadoop: a distributed computing open source framework from the Apache open source organization. It provides a distributed file system subproject (HDFS) and a software architecture that supports MapReduce distributed computing.
Apache Hive: a data warehouse tool based on Hadoop that maps structured data files to database tables and runs simple MapReduce statistics through SQL-like statements, with no need to develop dedicated MapReduce applications, so it is very suitable for statistical analysis in a data warehouse.
Apache Pig: a large-scale data analysis tool based on Hadoop that provides a SQL-like language called Pig Latin; its compiler converts SQL-like data analysis requests into a series of optimized MapReduce operations.
Apache HBase: a highly reliable, high-performance, column-oriented, scalable distributed storage system. Large structured storage clusters can be built on inexpensive PC servers using HBase.
Apache Sqoop: a tool for transferring data between Hadoop and relational databases. Data from a relational database (MySQL, Oracle, Postgres, etc.) can be imported into Hadoop's HDFS, and HDFS data can also be exported into relational databases.
Apache Zookeeper: a distributed, open source coordination service designed for distributed applications. It mainly solves data management problems that distributed applications often encounter, simplifies coordination and management of distributed applications, and provides high-performance distributed services.
Apache Mahout: a distributed framework for machine learning and data mining based on Hadoop. Mahout implements some data mining algorithms with MapReduce, solving the problem of parallel mining.
Apache Cassandra: an open source distributed NoSQL database system. Originally developed by Facebook to store data in a simple format, it combines the data model of Google BigTable with the fully distributed architecture of Amazon Dynamo.
Apache Avro: a data serialization system designed to support data-intensive applications that exchange massive amounts of data. Avro is a new data serialization format and transmission tool that is gradually replacing Hadoop's original IPC mechanism.
Apache Ambari: a web-based tool that supports the provisioning, management, and monitoring of Hadoop clusters.
Apache Chukwa: an open source data collection system for monitoring large distributed systems. It collects various types of data into files suitable for Hadoop processing and saves them in HDFS for Hadoop to run MapReduce operations on.
Apache Hama: a BSP (Bulk Synchronous Parallel) computing framework on top of HDFS. Hama can be used for large-scale, big data computation, including graph, matrix, and network algorithms.
Apache Flume: a distributed, reliable, highly available system for aggregating massive logs, usable for log data collection, processing, and transmission.
Apache Giraph: a scalable, distributed, iterative graph processing system built on the Hadoop platform, inspired by BSP (Bulk Synchronous Parallel) and Google's Pregel.
Apache Oozie: a workflow engine server for managing and coordinating tasks running on the Hadoop platform (HDFS, Pig, and MapReduce).
Apache Crunch: a Java library based on Google's FlumeJava library for creating MapReduce programs. Similar to Hive and Pig, Crunch provides a library of patterns for common tasks such as joining data, performing aggregations, and sorting records.
Apache Whirr: a set of class libraries (including Hadoop) for running on cloud services, providing a high degree of complementarity. Whirr supports Amazon EC2 and Rackspace services.
Apache Bigtop: a tool for packaging, distributing, and testing Hadoop and its surrounding ecosystem.
Apache HCatalog: table and storage management for Hadoop data, providing central metadata and schema management across Hadoop and RDBMS, with relational views exposed through Pig and Hive.
Cloudera Hue: a web-based monitoring and management system that provides web-based operation and management of HDFS, MapReduce/YARN, HBase, Hive, and Pig.
Ambari introduction
Like Hadoop and other open source software, Ambari is a project of the Apache Software Foundation, and a top-level project at that. The latest release is 2.6.0. Ambari is a tool used to create, manage, and monitor Hadoop clusters, but Hadoop here is meant broadly, referring to the whole Hadoop ecosystem (such as Hive, HBase, Sqoop, Zookeeper, and so on), not just Hadoop itself. In a word, Ambari is a tool that makes Hadoop and related big data software easier to use.
Ambari provides convenient and fast management functions for Hadoop, mainly including:
Simplified cluster provisioning through a step-by-step installation wizard.
Pre-configured key operation and maintenance metrics, so you can check directly whether Hadoop Core (HDFS and MapReduce) and related projects (such as HBase, Hive, and HCatalog) are healthy.
Visualization and analysis of job and task execution, to better view dependencies and performance.
A complete RESTful API that exposes the monitoring information and integrates with existing operation and maintenance tools.
A very intuitive user interface that lets users easily and effectively view information and control the cluster.
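For example, once the Ambari server set up below is running, the same information shown in the web interface can be read through the REST API; a minimal sketch, assuming the default admin/admin account and the default port 8080 on node-1:
# list the clusters managed by this Ambari server
curl -u admin:admin http://192.168.10.11:8080/api/v1/clusters
# list the hosts Ambari has registered
curl -u admin:admin http://192.168.10.11:8080/api/v1/hosts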
Deploy a Hadoop cluster using Ambari
When using Ambari to install and deploy Hadoop, you need to set up a local mirror of the download repositories.
Host configuration:
Node-1: 192.168.10.11, configuration: 2C8G-30G, yum mirror source, database, Java environment
Node-2: 192.168.10.12, configuration: 2C8G-30G, Java environment
Node-3: 192.168.10.13, configuration: 2C8G-30G, Java environment
Packages: Hadoop software image package and Ambari image package, version 2.6
Configure a local yum source
1. Configure the yum source for all components of hadoop on node-1. Install httpd:
[root@node-1 ~]# yum install httpd -y
2. Download the image files from the official site. They total about 7 GB and can be downloaded with a P2P tool. They include two repo files and four compressed packages:
wget http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.6.1.0/ambari.repo
wget http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.6.4.0/hdp.repo
wget http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.6.1.0/ambari-2.6.1.0-centos7.tar.gz
wget http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.22/repos/centos7/HDP-UTILS-1.1.0.22-centos7.tar.gz
wget http://public-repo-1.hortonworks.com/HDP-GPL/centos7/2.x/updates/2.6.4.0/HDP-GPL-2.6.4.0-centos7-rpm.tar.gz
wget http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.6.4.0/HDP-2.6.4.0-centos7-rpm.tar.gz
3. Extract the corresponding tar package to the file directory of httpd:
[root@node-1 html]# tar xf ambari-2.6.1.0-centos7.tar.gz
[root@node-1 html]# tar xf HDP-2.6.4.0-centos7-rpm.tar.gz
[root@node-1 html]# tar xf HDP-GPL-2.6.4.0-centos7-rpm.tar.gz
[root@node-1 html]# mkdir HDP-UTILS
[root@node-1 html]# tar xf HDP-UTILS-1.1.0.22-centos7.tar.gz -C HDP-UTILS/
4. Configure the basic sources: create the repo files for the Hadoop components and point their baseurl entries at the local mirror:
# ambari source
vim /etc/yum.repos.d/ambari.repo

[ambari-2.6.1.0]
name=ambari Version - ambari-2.6.1.0
baseurl=http://192.168.10.11/ambari/centos7/2.6.1.0-143
gpgcheck=1
gpgkey=http://192.168.10.11/ambari/centos7/2.6.1.0-143/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1

# HDP source:
vim /etc/yum.repos.d/hdp.repo

#VERSION_NUMBER=2.6.4.0-91
[HDP-2.6.4.0]
name=HDP Version - HDP-2.6.4.0
baseurl=http://192.168.10.11/HDP/centos7/2.6.4.0-91
gpgcheck=1
gpgkey=http://192.168.10.11/HDP/centos7/2.6.4.0-91/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1

[HDP-UTILS-1.1.0.22]
name=HDP-UTILS Version - HDP-UTILS-1.1.0.22
baseurl=http://192.168.10.11/HDP-UTILS/
gpgcheck=1
gpgkey=http://192.168.10.11/HDP-UTILS/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1

[HDP-GPL-2.6.4.0]
name=HDP-GPL Version - HDP-GPL-2.6.4.0
baseurl=http://192.168.10.11/HDP-GPL/centos7/2.6.4.0-91
gpgcheck=1
gpgkey=http://192.168.10.11/HDP-GPL/centos7/2.6.4.0-91/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
Start httpd.
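A minimal sketch of starting the service and checking that the mirrored repositories are reachable (the path assumes the tarballs were extracted under the default DocumentRoot /var/www/html, as above):
systemctl start httpd
systemctl enable httpd
# the extracted repository directories should now be browsable over HTTP
curl -I http://192.168.10.11/ambari/centos7/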
5. Copy the repo configuration of the local source to other nodes and create a cache:
[root@node-1 ~]# scp /etc/yum.repos.d/ambari.repo 192.168.10.12:/etc/yum.repos.d/
[root@node-1 ~]# scp /etc/yum.repos.d/ambari.repo 192.168.10.13:/etc/yum.repos.d/
[root@node-1 ~]# scp /etc/yum.repos.d/hdp.repo 192.168.10.12:/etc/yum.repos.d/
[root@node-1 ~]# scp /etc/yum.repos.d/hdp.repo 192.168.10.13:/etc/yum.repos.d/
Create a cache at each node:
# yum clean all
# yum makecache fast
Initialize the environment
1. Install java-1.8.0-openjdk on each node:
yum install java-1.8.0-openjdk -y
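Since the Ambari setup wizard later asks for JAVA_HOME, it helps to note the OpenJDK installation path now; one way to find it (the exact path varies with the installed JDK build):
readlink -f /usr/bin/java
# e.g. /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/bin/java
# JAVA_HOME is the directory above bin/, i.e. the .../jre path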
2. Resolve the host name:
echo "192.168.10.11 node-1" >> /etc/hosts
echo "192.168.10.12 node-2" >> /etc/hosts
echo "192.168.10.13 node-3" >> /etc/hosts
3. Create a host trust relationship, mainly from the master node to the slave nodes:
[root@node-1 ~]# ssh-keygen -t rsa
[root@node-1 ~]# cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
[root@node-1 ~]# scp /root/.ssh/id_rsa.pub 192.168.10.12:/root/.ssh/authorized_keys
[root@node-1 ~]# scp /root/.ssh/id_rsa.pub 192.168.10.13:/root/.ssh/authorized_keys
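To confirm the trust relationship works, a password-free login from node-1 to each slave node can be tested, for example:
[root@node-1 ~]# ssh 192.168.10.12 hostname    # should run without a password prompt
[root@node-1 ~]# ssh 192.168.10.13 hostname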
4. Install and configure the database:
yum install mariadb-server -y
systemctl start mariadb
mysql_secure_installation

# create databases:
MariaDB [(none)]> create database ambari default character set utf8;
Query OK, 1 row affected (0.00 sec)
MariaDB [(none)]> grant all on ambari.* to ambari@localhost identified by 'bigdata';
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> grant all on ambari.* to ambari@'%' identified by 'bigdata';
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> create database hive default character set utf8;
Query OK, 1 row affected (0.00 sec)
MariaDB [(none)]> grant all on hive.* to hive@localhost identified by 'hive';
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> grant all on hive.* to hive@'%' identified by 'hive';
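As an optional check that the grants took effect, the new accounts can be used to log in before continuing (using the passwords chosen above):
mysql -uambari -pbigdata -e 'show databases;'
mysql -uhive -phive -e 'show databases;'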
Install the Ambari service
1. Install ambari-server on node-1 and launch the configuration wizard:
[root@node-1 ~]# yum install ambari-server -y
[root@node-1 ~]# ambari-server setup
Tip: if you configure a custom run-as user during setup and the following error occurs:
ERROR: Unexpected error 'getpwuid(): uid not found: 1001'
check the ownership and permissions of the ambari.repo file and change them back to root with the default mode 644.
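A sketch of the fix (assuming the file sits in the standard /etc/yum.repos.d/ directory):
chown root:root /etc/yum.repos.d/ambari.repo
chmod 644 /etc/yum.repos.d/ambari.repo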
2. Follow the configuration wizard to set the run-as user and JAVA_HOME:
[root@node-1 ~]# ambari-server setup
Using python  /usr/bin/python
Setup ambari-server
Checking SELinux...
SELinux status is 'disabled'
Customize user account for ambari-server daemon [y/n] (n)? y
Enter user account for ambari-server daemon (root): ambari
Adjusting ambari-server permissions and ownership...
Checking firewall status...
Checking JDK...
[1] Oracle JDK 1.8 + Java Cryptography Extension (JCE) Policy Files 8
[2] Oracle JDK 1.7 + Java Cryptography Extension (JCE) Policy Files 7
[3] Custom JDK
==============================================================================
Enter choice (1): 3
WARNING: JDK must be installed on all hosts and JAVA_HOME must be valid on all hosts.
WARNING: JCE Policy files are required for configuring Kerberos security. If you plan to use Kerberos, please make sure JCE Unlimited Strength Jurisdiction Policy Files are valid on all hosts.
Path to JAVA_HOME: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre    # fill in JAVA_HOME
Validating JDK on Ambari Server...done.
Checking GPL software agreement...
GPL License for LZO: https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Enable Ambari Server to download and install GPL Licensed LZO packages [y/n] (n)? n
Completing setup...
Configuring database...
Enter advanced database configuration [y/n] (n)? y
Configuring database...
==============================================================================
Choose one of the following options:
[1] - PostgreSQL (Embedded)
[2] - Oracle
[3] - MySQL / MariaDB
[4] - PostgreSQL
[5] - Microsoft SQL Server (Tech Preview)
[6] - SQL Anywhere
[7] - BDB
==============================================================================
Enter choice (1): 3
Hostname (localhost):
Port (3306):
Database name (ambari):
Username (ambari):
Enter Database Password (bigdata):
Configuring ambari database...
WARNING: Before starting Ambari Server, you must copy the MySQL JDBC driver JAR file to /usr/share/java and set property "server.jdbc.driver.path=[path/to/custom_jdbc_driver]" in ambari.properties.
Press <enter> to continue.
3. At the step above, upload the MySQL JDBC driver as prompted, and modify the configuration file to specify the location of the driver file:
[root@node-1 ~]# cd /usr/share/java
[root@node-1 java]# ll
total 3388
-rw-r--r-- 1 root root 3467861 Jan 22 16:16 mysql-connector-java-5.1.45.tar.gz
[root@node-1 java]# tar xf mysql-connector-java-5.1.45.tar.gz
[root@node-1 java]# mv mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar ./
Modify the configuration file:
vim /etc/ambari-server/conf/ambari.properties
server.jdbc.driver.path=/usr/share/java/mysql-connector-java-5.1.45-bin.jar
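Alternatively, the MySQL JDBC driver can be registered through the setup command itself; a sketch, assuming the jar path used above:
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java-5.1.45-bin.jar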
When you continue after the configuration is complete, the following prompt appears:
Press <enter> to continue.
Configuring remote database connection properties...
WARNING: Before starting Ambari Server, you must run the following DDL against the database to create the schema: /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql
Proceed with configuring remote database connection properties [y/n] (y)?
4. When the prompt above appears, import the schema into the ambari database:
[root@node-1 ~]# mysql -uroot -p ambari < /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql
5. Start the service:
[root@node-1 ~]# ambari-server start
6. After the service starts successfully, it listens on port 8080. Log in from a browser with the account and password admin/admin; if you can log in normally, the installation is complete.
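To confirm the server came up and is listening on port 8080, for example:
[root@node-1 ~]# ambari-server status
[root@node-1 ~]# ss -tnlp | grep 8080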
Create a cluster
All of the cluster's management operations can now be completed from the Ambari web interface; use it to create an example cluster.
In the wizard, select the local repository source and remove the versions that are not needed.
Add the host nodes and upload the id_rsa private key file generated earlier.
After the hosts are installed and registered successfully, the wizard reports success.
Select the services that need to be installed.
Adjust the configuration of each service as needed.
In the subsequent configuration steps you will be prompted for account passwords, database connections, and other information; fill them in as prompted.