2025-01-21 Update From: SLTechnology News&Howtos
Introduction to CDH:
Reference link:
https://blog.csdn.net/u013061459/article/details/73368929
https://www.cnblogs.com/raphael5200/p/5293960.html
To build data-driven business scenarios, we need a powerful management tool that can manage business data uniformly and securely, and Cloudera was created as exactly such a data-center management tool. Cloudera provides its own market-leading, 100% open-source commercial Apache Hadoop distribution (CDH, Cloudera's Distribution Including Apache Hadoop) together with related components, including a variety of secure and efficient enterprise data-management tools such as Hive, HBase, Oozie, and ZooKeeper. Hadoop is an open-source big-data project under the Apache Software Foundation; many commercial companies build their own distributions on top of Apache Hadoop, and Cloudera is one of them. The two Hadoop lines it has maintained most recently are CDH4 and CDH5. Hadoop stores, computes, and analyzes data in a distributed fashion: multiple analysis and computation tasks can act on the same data blocks at the same time and run across a cluster, which makes it possible to process very large data sets. Hadoop is one of the forefathers of big-data processing frameworks, and Cloudera's Hadoop distribution, CDH, is currently the most widely used commercial version of Hadoop. In a broad sense, CDH is a self-packaged distribution released by Cloudera that contains not only Cloudera's commercial Hadoop but also a variety of commonly used open-source data processing and storage frameworks, such as Spark, Hive, and HBase.
As a powerful commercial data-center management tool, Cloudera provides several data-computing frameworks that run quickly and stably. It includes Apache Spark; it uses Apache Impala as a high-performance SQL query engine on top of HDFS and HBase; it bundles the Hive data-warehouse tool to help users analyze data; and users can install the HBase distributed column-oriented NoSQL database through Cloudera's management console. Cloudera also includes a native Hadoop search engine and Cloudera Navigator Optimizer, which visually coordinates and optimizes computing tasks on Hadoop to improve running efficiency. At the same time, the components provided in Cloudera let users easily manage, configure, and monitor Hadoop and all related components from a visual web UI, with a degree of fault tolerance and disaster recovery. As a widely used commercial data-center management tool, Cloudera is uncompromising about data security.
What problems can CDH solve?
For a cluster of 1,000 servers, how long at minimum would it take to build a Hadoop cluster by hand, including Hive, HBase, Flume, Kafka, Spark, and so on?
What if you were given only one day to finish the work above?
To upgrade the Hadoop version on such a cluster, which upgrade plan would you choose, and how long would it take at minimum?
Is the new Hadoop version compatible with the existing Hive, HBase, Flume, Kafka, Spark, and other components?
CDH installation environment:
https://www.cloudera.com/documentation/enterprise/release-notes/topics/rn_consolidated_pcm.html#concept_ap1_q2g_4cb
CDH installation
CDH installation has two parts: installing Cloudera Manager (CM) and installing CDH itself. CM consists of a server side and an agent side. Usually CM is installed first, and CDH is then installed and deployed through the web UI administrative console.
There are three official ways to install CDH:
1. Online installation: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_non_production.html
2. Rpm/yum installation: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_install_path_b.html
3. Tar package installation: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_install_path_c.html
CDH offline installation process: installed here as a tar package
Environment:
System: CentOS7.2
JDK version: 1.8
CDH version: 5.14.0
Roles:
hadoop01: master node; CM-Server and MySQL need to be installed
hadoop02: client, installing CM-Agent
hadoop03: client, installing CM-Agent
Installation steps:
1. Download the CM installation package and CDH installation package:
http://archive.cloudera.com/cm5/cm/5/
Download:
cloudera-manager-centos7-cm5.14.0_x86_64.tar.gz
https://archive.cloudera.com/cdh5/parcels/5.14.2/
Download:
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1
manifest.json
2. Install JDK1.8 on all nodes and set the JAVA_HOME environment variable, omitting the process.
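Since the JDK steps are omitted above, here is a minimal sketch of setting JAVA_HOME system-wide. The path /usr/java/jdk1.8.0_151 is an assumption taken from the scm_prepare_database.sh output later in this article; adjust it to wherever your JDK is actually unpacked.

```shell
# write the environment variables to a profile snippet (JDK path is an assumption)
cat > /etc/profile.d/java.sh <<'EOF'
export JAVA_HOME=/usr/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH
EOF
# load it into the current shell and confirm the JDK answers
source /etc/profile.d/java.sh
java -version
```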
3. Time synchronization of all nodes
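Step 3 lists no commands; a minimal sketch using ntpdate follows. The server ntp.aliyun.com is an assumption here — substitute your site's NTP source.

```shell
# one-off sync against an NTP server (server name is an example)
yum -y install ntpdate
ntpdate ntp.aliyun.com
# keep clocks aligned after reboots: re-sync once a day from root's crontab
echo '0 1 * * * /usr/sbin/ntpdate ntp.aliyun.com' >> /var/spool/cron/root
```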
4. Modify the hostname of all nodes, such as hadoop01
vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop01
5. Modify /etc/hosts on all nodes, and turn off the firewall and SELinux
192.168.131.165 hadoop01
192.168.131.166 hadoop02
192.168.131.168 hadoop03
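Step 5 says to turn off the firewall and SELinux without showing commands; a minimal sketch for CentOS 7, run as root on every node:

```shell
# stop the firewall now and keep it off after reboot
systemctl stop firewalld
systemctl disable firewalld
# switch SELinux to permissive immediately ...
setenforce 0
# ... and disable it permanently (takes effect at the next reboot)
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
```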
6. Create the hadoop user, generate key pairs, and set up passwordless SSH login between all nodes.
Production environments sometimes do not allow the root account to log in, so it is best to create a dedicated account for the nodes to connect to each other directly; this user must have sudo permission.
adduser hadoop
passwd hadoop
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop01
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop02
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop03
After executing the above commands, each node's hadoop user has a .ssh directory containing an authorized_keys file.
# copy the key pair to the slave node
scp id_rsa id_rsa.pub hadoop02:~/.ssh
scp id_rsa id_rsa.pub hadoop03:~/.ssh
Make sure that the following files are found in the .ssh directory under the hadoop home directory of each node
# ll /home/hadoop/.ssh/
total 16
-rw------- 1 hadoop hadoop  397 May 18 10:57 authorized_keys
-rw------- 1 hadoop hadoop 1675 May 18 10:53 id_rsa
-rw-r--r-- 1 hadoop hadoop  397 May 18 10:53 id_rsa.pub
-rw-r--r-- 1 hadoop hadoop 1206 May 18 10:59 known_hosts
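With the keys distributed, passwordless login can be spot-checked from any node. This convenience loop is not from the original steps; BatchMode makes ssh fail instead of prompting if key authentication is broken.

```shell
# each iteration should print the remote hostname without asking for a password
for h in hadoop01 hadoop02 hadoop03; do
  ssh -o BatchMode=yes "$h" hostname
done
```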
# grant the hadoop user sudo permission
Edit /etc/sudoers and add the following line:
hadoop ALL=(root) NOPASSWD:ALL
# under the hadoop user, execute sudo su - root to switch to the root user; it is working correctly if you are not prompted for a password
7. On all nodes, extract the cloudera-manager tar package to the /opt directory
tar zxf cloudera-manager-centos7-cm5.14.0_x86_64.tar.gz -C /opt
8. Create the cloudera-scm user on all nodes
sudo useradd --system --home=/opt/cm-5.14.0/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
## the next command appears in many online guides; it seems to matter only for single-user mode, which was not enabled in the CM-Server console here
echo USER=\"cloudera-scm\" >> /etc/default/cloudera-scm-agent
9. All nodes modify the cloudera-scm-agent configuration to set the server_host to the hostname of the primary node
cd /opt/cloudera-manager/cm-5.14.0/etc/cloudera-scm-agent
vim config.ini
Set server_host=hadoop01 (note that all nodes have to be changed)
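Editing config.ini by hand on every node is easy to get wrong; an equivalent sed one-liner (same path as above) can be run on each node instead:

```shell
# point the agent at the CM server host
sed -i 's/^server_host=.*/server_host=hadoop01/' /opt/cloudera-manager/cm-5.14.0/etc/cloudera-scm-agent/config.ini
```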
On all nodes, set /proc/sys/vm/swappiness to 0 (the default is 60):
echo 0 > /proc/sys/vm/swappiness
The above is only a temporary change that is lost after a restart; the following makes it permanent.
Edit the /etc/sysctl.conf file, set vm.swappiness=0, and restart the server.
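The permanent change just described can be scripted; sysctl -p reloads the file, so the setting also takes effect without waiting for the restart:

```shell
# append the setting (edit the existing line instead if vm.swappiness is already in the file)
echo 'vm.swappiness=0' >> /etc/sysctl.conf
# apply immediately and verify
sysctl -p
cat /proc/sys/vm/swappiness
```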
Disable transparent huge pages (THP) on all hosts; refer to
https://www.linuxidc.com/Linux/2016-11/137515.htm
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
The above is only a temporary modification. To make it permanent, edit /etc/rc.d/rc.local and add the following:
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
  echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
Save the exit, and then give the rc.local file execution permission:
[root@localhost ~]# chmod +x /etc/rc.d/rc.local
Finally, restart the system and check later that THP should be disabled.
10. Master node configuration
A. install the dependency package
yum -y install bind-utils chkconfig cyrus-sasl-gssapi cyrus-sasl-plain fuse fuse-libs gcc httpd
yum -y install libxslt mod_ssl openssl openssl-devel perl portmap psmisc sqlite swig zlib
B. Install mysql and start the service
C. Set the login password for the mysql account
mysqladmin -u root password '123456'
## create the databases needed by the CDH components
mysql> create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)
mysql> create database amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)
mysql> create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)
mysql> create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)
mysql> create database reports DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)
# authorize the root user to access all of the above databases from any host:
grant all privileges on *.* to 'root'@'%' identified by '123456' with grant option;
flush privileges;
D. Download the mysql driver package
cd /opt/cloudera-manager/cm-5.14.0/share/cmf/lib
wget http://maven.aliyun.com/nexus/service/local/repositories/hongkong-nexus/content/Mysql/mysql-connector-java/5.1.38/mysql-connector-java-5.1.38.jar
E. Create the database for CM
Format:
scm_prepare_database.sh mysql cm -h<host> -u<user> -p<password> --scm-host <scm-host> scm scm scm
Corresponding to: database type, database name, database host, username and password; --scm-host is the node where the Cloudera Manager Server resides.
cd /opt/cm-5.14.0/share/cmf/schema
./scm_prepare_database.sh mysql cm -hlocalhost -uroot -p123456 --scm-host localhost scm scm scm
JAVA_HOME=/usr/java/jdk1.8.0_151
Verifying that we can write to /opt/cloudera-manager/cm-5.10.0/etc/cloudera-scm-server
Creating SCM configuration file in /opt/cloudera-manager/cm-5.10.0/etc/cloudera-scm-server
Executing: /usr/java/jdk1.8.0_151/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/opt/cloudera-manager/cm-5.10.0/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /opt/cloudera-manager/cm-5.10.0/etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
[Main] DbCommandExecutor INFO Successfully connected to database.
All done, your SCM database is configured correctly!
The prompt above indicates success. In testing, replacing localhost with hadoop01 reported a permission error; using localhost works fine.
F. Create the /opt/cloudera/parcel-repo directory on the primary node
mkdir -p /opt/cloudera/parcel-repo
Copy the following three files to this directory
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel
manifest.json
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha
Note: rename CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1 to CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha
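After the rename it is worth confirming that the .sha file actually matches the parcel, because Cloudera Manager re-downloads the parcel on a checksum mismatch. A small sketch, using a glob so it works for any parcel file name:

```shell
# the .sha file holds the parcel's SHA-1 digest; compare it against a fresh hash
cd /opt/cloudera/parcel-repo
parcel=$(ls CDH-*.parcel)
expected=$(cat "$parcel.sha")
actual=$(sha1sum "$parcel" | awk '{print $1}')
[ "$expected" = "$actual" ] && echo "parcel checksum OK"
```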
Modify /opt/cloudera/parcel-repo file permissions
chown cloudera-scm:cloudera-scm /opt/cloudera
G. Create a log directory
sudo mkdir -p /var/log/cloudera-scm-headlamp
sudo chown cloudera-scm:cloudera-scm /var/log/cloudera-scm-headlamp
sudo mkdir -p /var/log/cloudera-scm-firehose
sudo chown cloudera-scm:cloudera-scm /var/log/cloudera-scm-firehose
sudo mkdir -p /var/log/cloudera-scm-alertpublisher
sudo chown cloudera-scm:cloudera-scm /var/log/cloudera-scm-alertpublisher
sudo mkdir -p /var/log/cloudera-scm-eventserver
sudo chown cloudera-scm:cloudera-scm /var/log/cloudera-scm-eventserver
sudo mkdir -p /var/lib/cloudera-scm-headlamp
sudo chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-headlamp
sudo mkdir -p /var/lib/cloudera-scm-firehose
sudo chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-firehose
sudo mkdir -p /var/lib/cloudera-scm-alertpublisher
sudo chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-alertpublisher
sudo mkdir -p /var/lib/cloudera-scm-eventserver
sudo chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-eventserver
sudo mkdir -p /var/lib/cloudera-scm-server
sudo chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-server
Start the CM-Server service and the CM-Agent service on the master node, and start the CM-Agent service on all slave nodes
/opt/cloudera-manager/cm-5.14.0/etc/init.d/cloudera-scm-server start
/opt/cloudera-manager/cm-5.14.0/etc/init.d/cloudera-scm-agent start
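The server can take a few minutes to initialize its database schema before the web UI answers; a hypothetical wait loop (host and port as configured above):

```shell
# poll the CM web UI until it starts answering on port 7180
until curl -s -o /dev/null http://hadoop01:7180; do
  echo "waiting for cloudera-scm-server ..."
  sleep 5
done
echo "CM web UI is up"
```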
11. CM console configuration
Log in to http://hadoop01:7180
The default account and password are both admin.
Configure the administrative account of the cluster: the hadoop user created in step 6 above, which requires sudo permission.
Only the HDFS and ZooKeeper services are installed in my cluster; if you need other services, you can add them to the cluster.