CDH 5 installation tutorial, Kafka installation, LZO installation

2025-03-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Catalogue

Host list

Basic environment

Basic configuration of cluster host

Configure the NTP service

Configure the MySQL server

Install Cloudera Manager Server and AgentServer

Configure the Server side

Configure the Agent side

Install CDH

Configure and assign CDH5 parcel packages

Install Hadoop cluster and related components

Browse related layouts on the CDH Web side

Install Kafka components

Configure and assign Kafka parcel packages

Install the Kafka service in the cluster

Configure HDFS LZO compression

Configure and assign LZO parcel packages

HDFS related LZO configuration

YARN related LZO configuration

Host list

| hostname | IP | Memory | CPU | role and service |
|---|---|---|---|---|
| test1.lan | 192.168.22.11 | 9G | 4 cores | cm-agent, Namenode, YARN |
| test2.lan | 192.168.22.12 | 9G | 4 cores | cm-agent, SecondaryNameNode, HBase-Master |
| test3.lan | 192.168.22.13 | 9G | 4 cores | cm-agent, Datanode, zk-server, kafka-broker, Regionserver |
| test4.lan | 192.168.22.14 | 9G | 4 cores | cm-agent, Datanode, zk-server, kafka-broker, Regionserver |
| test5.lan | 192.168.22.15 | 9G | 4 cores | cm-agent, Datanode, zk-server, kafka-broker, Regionserver |
| test6.lan | 192.168.22.16 | 9G | 4 cores | cm-server, MySQL-Server |

Basic environment

CentOS 6 x86_64

jdk-8u101-linux-x64.rpm

MySQL-5.6.x

ntpd => On

CDH-5.12.0-1.cdh5.12.0.p0.29-el6.parcel (offline parcel)

cloudera-manager-el6-cm5.12.0_x86_64.tar.gz

KAFKA-2.2.0-1.2.2.0.p0.68-el6.parcel

Basic configuration of cluster host

Make sure the / partition has at least 100 GB of space.

Disable SELinux

Disable iptables

Disable Transparent Hugepage Compaction

Set vm.swappiness to 1

Enable the NTP service for time synchronization (relying on ntpdate alone is not recommended)
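The checklist above can be applied with a handful of commands. The following is a minimal sketch, assuming CentOS 6 and root privileges. The helper names `set_sysctl_conf` and `disable_selinux_conf` are hypothetical, and the file to edit is passed as an argument so the helpers can be tried on a copy first.

```shell
# Sketch only: helpers for the host checklist (CentOS 6 assumed).

# Persist "key = value" in a sysctl-style file, replacing an existing entry.
set_sysctl_conf() {
    conf=$1 key=$2 value=$3
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$conf" 2>/dev/null; then
        sed -i "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$conf"
    else
        echo "${key} = ${value}" >> "$conf"
    fi
}

# Set SELINUX=disabled in an SELinux config file.
disable_selinux_conf() {
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Typical use on a real host (as root):
#   set_sysctl_conf /etc/sysctl.conf vm.swappiness 1 && sysctl -p
#   disable_selinux_conf /etc/selinux/config && setenforce 0
#   service iptables stop && chkconfig iptables off
#   echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
#   (other kernels use /sys/kernel/mm/transparent_hugepage/defrag)
```

The `echo never` setting does not survive a reboot, so it is usually repeated from a boot script such as /etc/rc.local.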

Configure the NTP service

The following configuration should be done once for each host in the cluster

```
vim /etc/sysconfig/ntpdate
SYNC_HWCLOCK=yes    # also write the synchronized time to the hardware clock

ntpdate time.windows.com    # synchronize manually once, so that ntpd does not refuse to sync because of a large initial offset

vim /etc/ntp.conf
server time.windows.com prefer    # add the time synchronization server

service ntpd start && chkconfig ntpd on    # start the time synchronization service and enable it at boot
```
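To confirm that a host is actually synchronized, the peer list printed by `ntpq -pn` can be inspected: ntpq marks the peer currently selected for synchronization with a leading `*`. A minimal sketch follows; the helper name `has_sync_peer` is hypothetical.

```shell
# Succeeds if ntpq peer output (read from stdin) contains a selected
# system peer, which ntpq marks with a leading "*".
has_sync_peer() {
    grep -q '^\*'
}

# On a cluster host:
#   ntpq -pn | has_sync_peer && echo "ntpd is synchronized"
```

ntpd can take several minutes after startup to select a peer, so a failed check right after `service ntpd start` is not necessarily an error.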

Configure the MySQL server for cm-server

The MySQL service can be installed on the cm-server server or shared with other services

Install MySQL-Server:

> rpm -qa | grep -i -E "mysql-libs|mariadb-libs"
> yum remove -y mysql-libs mariadb-libs && yum install -y -q crontabs postfix
> tar xf MySQL-5.6.35-1.el6.x86_64.rpm-bundle.tar
> rpm -ivh MySQL-client-5.6.35-1.el6.x86_64.rpm MySQL-shared-* MySQL-server-5.6.35-1.el6.x86_64.rpm MySQL-devel-5.6.35-1.el6.x86_64.rpm

Download mysql-connector-java (to be installed on the cm-server host):

```
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.44.tar.gz
tar xf mysql-connector-java-5.1.44.tar.gz
```

Set the server character set in /etc/my.cnf, then change the initial password, which is stored in the ~/.mysql-secret file:

> vim /etc/my.cnf
> [mysqld]
> character-set-server = utf8
>
> mysql -p'default_secret'
> sql_cli > SET PASSWORD = PASSWORD("new_secret");
> sql_cli > exit

Install Cloudera Manager Server and AgentServer

Cloudera Manager Server is installed on test6.lan; the Agent must be installed separately on every host in the cluster. Download address: http://archive-primary.cloudera.com/cm5/cm/5/cloudera-manager-el6-cm5.12.0_x86_64.tar.gz

Configure the Server side

After downloading cloudera-manager, upload it to test6.lan and extract it into the /opt directory (it must be this directory), because the CDH5 parcels are looked up in /opt/cloudera/parcel-repo by default.

> tar xf cloudera-manager-el6-cm5.12.0_x86_64.tar.gz -C /opt/

Add the cloudera-scm user on all nodes in the cluster:

```
useradd --system --home=/opt/cm-5.12.0/run/cloudera-scm-server/ --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
```

On the cm-server node, install mysql-connector-java for Cloudera Manager 5:

> cp /path/to/mysql-connector-java-5.1.44-bin.jar /opt/cm-5.12.0/share/cmf/lib/

Then create the initial database (-psecret is the password of the corresponding database account):

```
/opt/cm-5.12.0/share/cmf/schema/scm_prepare_database.sh mysql cm -hlocalhost -uroot -psecret --scm-host localhost scm scm scm
```

When you see "Successfully connected to database." and "All done, your SCM database is configured correctly!", the database and its table structure have been created successfully.

Start the Cloudera Manager 5 Server:

> /opt/cm-5.12.0/etc/init.d/cloudera-scm-server start

Note: the first time the Server runs, it takes about 5-10 minutes to initialize its data (the server process uses about 1.5 GB of memory). After initialization, the Java process listens on ports 7180 and 7182.
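Because the first start takes several minutes, a small polling loop can report when the web port answers. This is a minimal sketch, assuming curl is installed; `wait_for_cm` is a hypothetical helper name, and the retry count and delay in the usage line are arbitrary choices.

```shell
# Poll a URL until it answers or the attempts run out.
# Arguments: URL, number of attempts, seconds to sleep between attempts.
wait_for_cm() {
    url=$1 tries=$2 delay=$3
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Any HTTP response (including a redirect to the login page) counts.
        if curl -s -o /dev/null --max-time 5 "$url" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example: try for up to 10 minutes (60 attempts, 10 s apart).
#   wait_for_cm http://test6.lan:7180/ 60 10 && echo "Cloudera Manager is up"
```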

Configure the Agent side

On the Server host, set server_host in the Agent configuration file to the Server's address:

> vi /opt/cm-5.12.0/etc/cloudera-scm-agent/config.ini
> server_host=test6.lan
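The same change can be made without an editor, which helps when it has to be scripted. A minimal sketch, assuming config.ini already contains a line beginning with `server_host=`; `set_server_host` is a hypothetical helper name.

```shell
# Rewrite the server_host line of an Agent config.ini and confirm the result.
# Returns non-zero if the file has no server_host= line to rewrite.
set_server_host() {
    conf=$1 host=$2
    sed -i "s/^server_host=.*/server_host=${host}/" "$conf"
    grep -q "^server_host=${host}\$" "$conf"
}

# Example:
#   set_server_host /opt/cm-5.12.0/etc/cloudera-scm-agent/config.ini test6.lan
```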

Copy the Agent program from the Server host to the /opt/ directory of every node in the cluster:

> for i in {1..5}; do echo "----- Start scp to test${i}.lan -----"; scp -r -q /opt/cm-5.12.0/ test${i}.lan:/opt/; echo "##### Done #####"; done

Wait for the copy to finish, then start the Agent on every Agent node:

> /opt/cm-5.12.0/etc/init.d/cloudera-scm-agent start

The Agent is a Python process that actively registers itself with the server_host node named in its configuration file. It also receives instructions sent by the Server and reports heartbeat information for monitoring.

Install CDH

Configure and assign CDH5 parcel packages

Go back to the test6.lan shell and set up the CDH5 parcel (Cloudera uses precompiled bundles to support offline Hadoop installation). The download address of the corresponding CDH parcel is: http://archive-primary.cloudera.com/cdh5/parcels/5.12.0/

```
cd /opt/cloudera/parcel-repo
curl -O http://fileserver.lan/CDH5/CDH-5.12.0-1.cdh5.12.0.p0.29-el6.parcel
curl -O http://fileserver.lan/CDH5/CDH-5.12.0-1.cdh5.12.0.p0.29-el6.parcel.sha1
mv CDH-5.12.0-1.cdh5.12.0.p0.29-el6.parcel.sha1 \
   CDH-5.12.0-1.cdh5.12.0.p0.29-el6.parcel.sha
```

The sha1 file of the parcel must be renamed to CDH-5.12.0-1.cdh5.12.0.p0.29-el6.parcel.sha; otherwise cm-server will not recognize the parcel.
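Before restarting the server it can be worth checking that the downloaded parcel matches its hash file, since a truncated download is easy to miss. A minimal sketch, assuming the .sha file holds only the hex SHA-1 digest; `verify_parcel` is a hypothetical helper name.

```shell
# Compare a parcel's SHA-1 digest with the contents of its .sha file.
# Succeeds only when the two digests are identical.
verify_parcel() {
    parcel=$1
    expected=$(tr -d '[:space:]' < "${parcel}.sha")
    actual=$(sha1sum "$parcel" | awk '{print $1}')
    [ "$expected" = "$actual" ]
}

# Example:
#   cd /opt/cloudera/parcel-repo
#   verify_parcel CDH-5.12.0-1.cdh5.12.0.p0.29-el6.parcel || echo "hash mismatch"
```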

Restart the cloudera-scm-server server

> /opt/cm-5.12.0/etc/init.d/cloudera-scm-server restart

Open http://test6.lan:7180/ and start installing CDH. The default username and password are both admin.

Agree to the relevant terms

Select the relevant service version

Service packages and information related to this version

Add the cluster hosts. "Currently Managed Hosts (5)" indicates that the Agents have registered with the Server normally. If the only option shown here is the new-hosts option, Agent registration has failed; check that the network and the services are working. You can also connect to remote nodes by specifying a hostname or IP.

Select host

Select the parcel package of the relevant supporting components for the cluster installation.

Start parcel package deployment for nodes in the cluster

The warning at the end of the deployment in this figure indicates that the cloudera-scm user has not been created. The user had indeed been forgotten on that node; create it and then run the check again.

Overview of deployment Information

Install Hadoop cluster and related components

CDH offers several predefined combinations of services, and you can also pick the components yourself.

Select a few components here, including HBase, HDFS, YARN and Zookeeper (Kafka is provided by a separate parcel package and will be installed separately later)

Configuration parameters of related components

Deployment in progress.

The installation path of the relevant service components in the server file system is as follows

Installation completed

Browse related layouts on the CDH Web side

Change the NameNode's default heap size (1-4 GB is recommended). The service must be restarted after the change. Wait a moment after the restart; the alarm disappears once the service's child processes have started. (When you change the NameNode heap size, change the SecondaryNameNode heap size as well.)

Install Kafka components

Configure and assign Kafka parcel packages

On the web page, Hosts -> Parcel lists the parcels that the cluster currently has configured and distributed. At this point only CDH5 is configured; Kafka ships in a separate parcel, so it has to be loaded separately and distributed to every node in the cluster.

From the download address of the parcel package for Cloudera's official Kafka component, download the parcel file and its sha1 file as before, then rename *.sha1 to *.sha.

Put both files in the /opt/cloudera/parcel-repo/ directory of the cm-server node. There is no need to restart the server daemon; the parcel can be refreshed, distributed and activated online from the web page.

Install the Kafka service in the cluster

Here you need to confirm the following default settings and modify them where necessary:

Replication factor: the default is 1, changed to 3 (depending on business volume).

Number of partitions: the default is 50, kept as is for now.

Topic deletion: enabled by default, left unchanged.

The service port is 9092.

Configure HDFS LZO compression

Configure and assign LZO parcel packages

LZO support is also packaged as a separate parcel; select the package for your platform. The download address is: http://archive-primary.cloudera.com/gplextras/parcels/latest/. The sha file is not provided directly there, so open the manifest.json file, find the hash value of the corresponding parcel, and save it manually to a local file.
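Looking up the hash by hand is error-prone, so it can be extracted with jq instead. The sketch below rests on two assumptions: that jq is installed, and that manifest.json has a top-level `parcels` array whose entries carry `parcelName` and `hash` fields. `parcel_hash` is a hypothetical helper name.

```shell
# Print the hash recorded in a manifest.json for one parcel file name.
# Assumed layout: {"parcels": [{"parcelName": "...", "hash": "..."}, ...]}
parcel_hash() {
    manifest=$1 name=$2
    jq -r --arg name "$name" \
        '.parcels[] | select(.parcelName == $name) | .hash' "$manifest"
}

# Example (PARCEL is the file name of the parcel you downloaded):
#   parcel_hash manifest.json "$PARCEL" > "/opt/cloudera/parcel-repo/$PARCEL.sha"
```

If the function prints nothing, the parcel name did not match any entry, and the resulting .sha file should not be used.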

Download the parcel and store it, together with its sha file, in the /opt/cloudera/parcel-repo/ directory of cm-server. As with the Kafka parcel, the refresh, registration, distribution and activation steps can all be completed from the web page.

After activating LZO, several dependent services will prompt you to restart and load the new configuration. Don't restart yet, there are several configurations that need to be manually modified separately.

HDFS related LZO configuration

Add a new line to io.compression.codecs, enter com.hadoop.compression.lzo.LzopCodec, and save the configuration.

YARN related LZO configuration

Add a new line to the attribute value of mapreduce.application.classpath and fill in /opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*

Append /opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native to the attribute value of mapreduce.admin.user.env

Then save and restart the dependent services.

Final overview of the related services
