Summary of cdh official documents (including optimization items) 001

2025-01-16 Update

Friday, 2019-3-22

1. After installation, deploy a Gateway role on every host that can run one.

2. Static service pools (static resource allocation) are not enabled by default.

3. Cloudera uses the following version control convention: major.minor.maintenance. If the cluster is running Cloudera Manager 5.14.0, the major version is 5, the minor version is 14, and the maintenance version is 0.

The Cloudera Manager minor version must always be equal to or greater than the CDH minor version. Older versions of Cloudera Manager may not support features in newer versions of CDH.

For example, Cloudera Manager 5.12.0 can manage CDH 5.12.2 because the minor versions are the same. Cloudera Manager 5.12.0 cannot manage CDH 5.14.0 because Cloudera Manager minor version 12 is lower than CDH minor version 14.

Important: using Cloudera Manager 6.0.x to manage CDH 5.15.x or CDH 5.16 clusters is not a supported configuration.
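The minor-version rule above can be sketched as a small check. This is a minimal sketch covering same-major releases only (cross-major support, such as the Cloudera Manager 6.0.x note above, is more nuanced and not modeled here); the function names are ours, not a Cloudera API.

```python
# Sketch of the major.minor.maintenance compatibility rule: within one
# major line, the Cloudera Manager minor version must be >= the CDH minor.

def parse_version(version):
    """Split 'major.minor.maintenance' into a tuple of ints."""
    major, minor, maintenance = (int(part) for part in version.split("."))
    return major, minor, maintenance

def cm_can_manage(cm_version, cdh_version):
    """Within one major line, CM minor must be >= CDH minor."""
    cm = parse_version(cm_version)
    cdh = parse_version(cdh_version)
    if cm[0] != cdh[0]:
        raise ValueError("this rule only covers same-major releases")
    return cm[1] >= cdh[1]

print(cm_can_manage("5.12.0", "5.12.2"))  # True: minor versions are equal
print(cm_can_manage("5.12.0", "5.14.0"))  # False: CM minor 12 < CDH minor 14
```

The maintenance version is parsed but deliberately ignored: per the convention above, only the minor versions decide manageability.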

Operating systems supported by CDH and Cloudera Manager

4. Cloudera strongly discourages using RHEL 5 for new installations.

5. Cloudera does not support CDH cluster deployment in Docker containers.

6. Kudu file system requirements-Kudu is supported on ext4 and XFS.

7. The Linux file system records metadata each time a file is accessed. This means that even reads result in writes to disk. To speed up file reads, Cloudera recommends that you disable this behavior, called atime, using a mount option in /etc/fstab:

Specific steps // optimization item:

[root@NewCDH-0--141 ~]# vim /etc/fstab

#
# /etc/fstab
# Created by anaconda on Tue Oct 10 15:41:01 2017
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root /     xfs  defaults 0 0
UUID=ea80e99b-8d97-406f-a527-4247483ad7b6 /boot xfs defaults 0 0
/dev/mapper/centos-home /home xfs  defaults 0 0
/dev/mapper/centos-swap swap  swap defaults 0 0

Changed to:

/dev/mapper/centos-root /     xfs defaults,noatime 0 0
/dev/mapper/centos-home /home xfs defaults,noatime 0 0

Apply the change without restarting:

mount -o remount /

noatime does not update inode access records on the file system, so performance can be improved (see the atime parameter). Reference: https://blog.csdn.net/jc_benben/article/details/78224212

7. File system mount options

The file system mount option has a synchronization option that allows you to write synchronously.

However, using the sync option results in poor performance for services that write data to disk, such as HDFS, YARN, Kafka and Kudu. In CDH, most writes are already replicated, so synchronous writes to disk are unnecessary, expensive, and not worth the extra safety they provide.

NFS and NAS must not be used as DataNode data directories, even when using the Hierarchical Storage feature.

8. Cloudera Manager and CDH come with an embedded PostgreSQL database for non-production environments. The production environment does not support embedded PostgreSQL databases. For a production environment, the cluster must be configured to use an external database.

9. In most cases, but not all, Cloudera supports the versions of MariaDB, MySQL and PostgreSQL that are native to each supported Linux distribution.

10. For MySQL 5.6 and 5.7, you must install the MySQL-shared-compat or MySQL-shared packages. This is required to install the Cloudera Manager Agent package.

11. MySQL GTID-based replication is not supported.

// Supplement: GTID is the global transaction ID, which guarantees that a unique ID is generated in the replication cluster for each transaction committed on the master.

GTID-based replication:

1. The slave tells the master the GTIDs of the transactions it has already executed.

2. The master tells the slave which GTID transactions have not yet been executed.

3. A given transaction is executed only once on each slave.

12. CDH does not support MySQL HA // but we can use a MySQL replica (slave)

13. Important: when the process is restarted, the configuration of each service will be redeployed using the information saved in the Cloudera Manager database. If this information is not available, the cluster cannot start or function properly. You must schedule and maintain regular backups of the Cloudera Manager database to restore the cluster if the database is lost. For more information, see backing up databases.

14 、

CDH 5.10~5.16 supports MySQL 5.1, 5.5, 5.6, 5.7

CDH 5.1~5.9 supports MySQL 5.1, 5.5, 5.6

CDH 5.0 supports MySQL 5.1, 5.5

We use MySQL 5.6 in production.

Cloudera Manager/CDH 5.9-5.16: MariaDB 5.5, 10.0

Cloudera Manager/CDH 5.5-5.8: MariaDB 5.5

15. Java heap optimization

If the heap does not need to exceed 32 GB, set the heap size to 31 GB or less to stay under the 32 GB threshold (above which the JVM can no longer use compressed object pointers).

If 32 GB or more is required, set the heap size to 48 GB or higher to account for the larger pointers. In general, for heaps larger than 32 GB, multiply the desired heap size by 1.5.
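As a sketch of the rule above (the helper name and the choice to work in whole GB are ours, not Cloudera's):

```python
# Heap-sizing rule: heaps that fit under 32 GB are capped at 31 GB to keep
# compressed object pointers; heaps that must exceed 32 GB are multiplied
# by 1.5 to pay for the larger 64-bit pointers, with a 48 GB floor.

def recommended_heap_gb(desired_gb):
    if desired_gb <= 32:
        return min(desired_gb, 31)    # stay below the 32 GB threshold
    return max(48, desired_gb * 1.5)  # oversize large heaps by 1.5x

print(recommended_heap_gb(24))  # 24   (fits under the threshold)
print(recommended_heap_gb(32))  # 31   (capped to keep compressed oops)
print(recommended_heap_gb(40))  # 60.0 (40 * 1.5, above the 48 GB floor)
```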

16. Only 64-bit JDKs are supported. All versions of Cloudera Manager 5 and CDH 5 support Oracle JDK 7. Cloudera Enterprise 5.16.1 and later also support OpenJDK 8. Oracle JDK 9 is not supported in any Cloudera Manager or CDH version.

17. We will now upgrade from JDK 7 to JDK 8.

Tested and recommended Oracle JDK 7 versions (see the Oracle JDK release notes):

1.7u80 Recommended / latest tested version

1.7u75 Recommended

1.7u67 Recommended

1.7u55 Minimum required

In the dev118 environment we run: java version "1.7.0_67"

18 、

For CDH 5.13.0 and later, JDK 7u76 or higher is required for Sentry, due to JDK-8055949.

CDK Powered by Apache Kafka 3.0 and later requires JDK 8 and does not support JDK 7.

CDS Powered by Apache Spark 2.2 and later, which can be installed on CDH 5, requires JDK 8.

OpenJDK 7 is not supported.

19 、

In production we use:

java version "1.8.0_102"

Due to JDK issues affecting CDH functionality, Oracle JDK 8u40, 8u45 and 8u60 are not supported:

Oracle JDK 8u60 is not compatible with the AWS SDK and causes problems with DistCp.

Tested and recommended Oracle JDK 8 versions (see the Oracle JDK release notes):

1.8u181 Recommended / latest tested version

1.8u162 Recommended / latest tested version

1.8u144 Recommended

1.8u131 Recommended

1.8u121 Recommended

1.8u111 Recommended

1.8u102 Recommended

1.8u91 Recommended

1.8u74 Recommended

1.8u31 Minimum required

Tested and recommended OpenJDK 1.8 versions (see the OpenJDK release notes):

1.8u181 Minimum required / latest tested version

20 、

Java Cryptography Extension (JCE) Unlimited Strength: if you are using CentOS/Red Hat Enterprise Linux 5.6 or later, or Ubuntu, you must install the JCE Unlimited Strength Jurisdiction Policy Files on all cluster hosts, including gateway hosts. This ensures that the JDK uses the same default encryption type (aes256-cts) as the rest of the Red Hat/CentOS operating system, Kerberos, and the CDH cluster.

Enabling unlimited-strength encryption on JDK 1.8.0_151 and later

Starting with JDK 1.8.0_151, unlimited-strength encryption can be enabled using the java.security file, as documented in the JDK 1.8.0_151 release notes. You do not need to install the JCE policy files.

Since JDK 1.8.0_161, unlimited-strength encryption has been enabled by default. No further action is needed.

So to enable Kerberos in production, use JDK 1.8u181 (Recommended / latest tested) or 1.8u162 (Recommended / latest tested).

21 、

Recommendations for disks:

Disk space

Cloudera Manager Server

5 GB on the partition hosting /var.

500 MB on the partition hosting /usr.

CDH 5 (including Impala and Search): 1.5 GB per parcel (packed), 2 GB per parcel (unpacked)

Impala: 200 MB per parcel

Cloudera Search: 400 MB per parcel

Cloudera Management Service: the Host Monitor and Service Monitor databases are stored on the partition hosting /var. Ensure that there is at least 20 GB of free space on this partition.

By default, parcels are unpacked into /opt/cloudera/parcels.

22. Memory recommendation

RAM: 4 GB is recommended in most cases, and is required when using an Oracle database. For non-Oracle deployments with fewer than 100 hosts, 2 GB may be sufficient.

However, to run Cloudera Manager Server on a machine with 2 GB of RAM, you must reduce its maximum heap size (by modifying -Xmx in /etc/default/cloudera-scm-server). Otherwise the kernel may kill the server for consuming too much RAM.

[root@NewCDH-0--141 ~]# vim /etc/default/cloudera-scm-server

export CMF_JAVA_OPTS="-Xmx2G -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"

23 、

Python: Cloudera Manager requires Python 2.4 or later (but is not compatible with Python 3.0 or later). Hue in CDH 5 and the installation of CDH 5 packages require Python 2.6 or 2.7. All supported operating systems include Python 2.4 or later. Cloudera Manager is compatible with Python 2.4 through the latest 2.x releases; it does not support Python 3.0 and later.

24 、

If Cloudera Manager Server and Agent run on the same host, install Cloudera Manager Server first, then add the python-psycopg2 repository or package. After adding the repository or package, install the Cloudera Manager Agent.

25 、

Network protocol support

IPv4 is required for CDH. IPv6 is not supported and IPv6 must be disabled.

Note: please contact your operating system vendor for help with disabling IPv6.

26 、

The / etc / hosts file must:

Contains consistent information about hostnames and IP addresses on all hosts

Does not contain uppercase hostnames

Does not contain duplicate IP addresses

127.0.0.1 localhost.localdomain localhost

192.168.1.1 cluster-01.example.com cluster-01

192.168.1.2 cluster-02.example.com cluster-02

192.168.1.3 cluster-03.example.com cluster-03

27. Hardware requirements of cdh

/usr: minimum 5 GB

Cloudera Manager database: 5 GB. If the Cloudera Manager database shares a host with Service Monitor and Host Monitor, more storage space is required to meet the needs of those components.

28. Host-based Cloudera Manager server requirements

Cluster size (hosts) | Database host | Heap size | Logical processors | Cloudera Manager Server local storage

Very small (≤10) | Shared | 2 GB | 4 | 5 GB minimum

Small (≤20) | Shared | 4 GB | 6 | 20 GB minimum

Medium (≤200) | Dedicated | 8 GB | 6 | 200 GB minimum

Large (≤500) | Dedicated | 10 GB | 8 | 500 GB minimum

Extra large (>500) | Dedicated | 16 GB | 16 | 1 TB minimum
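The sizing table above can be read as a simple lookup. Tier boundaries and values are copied from the table; the function and constant names are our own sketch, not a Cloudera API.

```python
# Cloudera Manager Server sizing tiers from the table above:
# (max cluster hosts, database host, heap, logical processors, min storage)
CM_SERVER_SIZING = [
    (10,           "shared",    "2 GB",  4,  "5 GB"),
    (20,           "shared",    "4 GB",  6,  "20 GB"),
    (200,          "dedicated", "8 GB",  6,  "200 GB"),
    (500,          "dedicated", "10 GB", 8,  "500 GB"),
    (float("inf"), "dedicated", "16 GB", 16, "1 TB"),
]

def cm_server_requirements(num_hosts):
    """Return the first tier whose host limit covers the cluster size."""
    for max_hosts, db_host, heap, cpus, storage in CM_SERVER_SIZING:
        if num_hosts <= max_hosts:
            return {"db_host": db_host, "heap": heap,
                    "logical_processors": cpus, "min_storage": storage}

print(cm_server_requirements(150))
# {'db_host': 'dedicated', 'heap': '8 GB', 'logical_processors': 6, 'min_storage': '200 GB'}
```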

Note: on smaller clusters, Cloudera Manager Server and databases can share hosts. On larger clusters, they must run on separate dedicated hosts.

29. Service Monitor requirements

Use the recommendations in this table for clusters where the only services with worker roles are HDFS, YARN, or Impala.

Monitored entities | Cluster hosts | Required Java heap | Recommended non-Java heap

0-2,000 | 0-100 | 1 GB | 6 GB

2,000-4,000 | 100-200 | 1.5 GB | 6 GB

4,000-8,000 | 200-400 | 1.5 GB | 12 GB

8,000-16,000 | 400-800 | 2.5 GB | 12 GB

16,000-20,000 | 800-1,000 | 3.5 GB | 12 GB

Clusters with HBase, Solr, Kafka, or Kudu

Use these recommendations when services such as HBase, Solr, Kafka, or Kudu are deployed in the cluster. These services typically have a larger number of monitored entities.

Monitored entities | Cluster hosts | Required Java heap | Recommended non-Java heap

0-30,000 | 0-100 | 2 GB | 12 GB

30,000-60,000 | 100-200 | 3 GB | 12 GB

60,000-120,000 | 200-400 | 3.5 GB | 12 GB

120,000-240,000 | 400-800 | 8 GB | 20 GB

30 、

Reports Manager

Reports Manager periodically obtains fsimage from NameNode. It reads the fsimage and creates a Lucene index for it. To improve indexing performance, Cloudera recommends that you configure hosts as powerful as possible and dedicate SSD disks to Reports Manager.

Component | Java heap | CPU | Disk

Reports Manager | 3-4 times the size of the fsimage | Minimum: 8 cores. Recommended: 16 cores (32 cores with hyperthreading enabled) | 1 dedicated disk, at least 20 times the size of the fsimage. Cloudera strongly recommends SSD disks.
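The Reports Manager guidance above (heap 3-4 times the fsimage, disk at least 20 times the fsimage) can be sketched as a small sizing helper; the function and field names are illustrative assumptions, not Cloudera names.

```python
# Rough Reports Manager sizing from the fsimage size, per the rules above.

def reports_manager_sizing(fsimage_gb):
    return {
        "heap_gb": (3 * fsimage_gb, 4 * fsimage_gb),  # low/high heap estimate
        "disk_gb_min": 20 * fsimage_gb,               # dedicated disk, SSD preferred
        "cores_min": 8,
        "cores_recommended": 16,
    }

sizing = reports_manager_sizing(fsimage_gb=2)
print(sizing["heap_gb"])      # (6, 8)
print(sizing["disk_gb_min"])  # 40
```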

31 、

Cloudera recommends 60 GB to 256 GB of RAM per node.

Disks

Root volume: 100 GB

Application block device or mount point (master host only): 1 TB

Docker Image Block device: 1 TB

It is strongly recommended that SSD be used for application data storage.

33. Hardware resources required by the Flume component // the hardware resources for each component are best listed in a separate table

Java heap: minimum 1 GB, maximum 4 GB; the Java heap size should be greater than the maximum channel capacity.

CPU: use the following formula to calculate the number of cores: (number of sources + number of sinks) / 2

Disks: multiple disks are recommended for file channels, in JBOD or RAID 10 (preferred, due to improved reliability).
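The core-count formula above can be sketched as follows; the function name and the choice to round up to a whole core are ours.

```python
import math

# Cores for a Flume agent per the formula above:
# (number of sources + number of sinks) / 2, rounded up so that an odd
# total still gets a whole core.
def flume_agent_cores(num_sources, num_sinks):
    return math.ceil((num_sources + num_sinks) / 2)

print(flume_agent_cores(3, 2))  # 3
print(flume_agent_cores(4, 4))  # 4
```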

Hardware resources required for hdfs components:

Component | Memory | CPU | Disk

JournalNode | 1 GB (default)

34 、

Cloudera Manager 5.15.2, 5.14.4, 5.13.3, 5.12.2, 5.11.2, 5.10.2, 5.9.3, 5.8.5, 5.7.6, 5.6.1, 5.5.6, 5.4.10, 5.3.10, 5.2.7, 5.1.6, and 5.0.7 are the previous stable releases of Cloudera Manager 5.15, 5.14, 5.13, 5.12, 5.11, 5.10, 5.9, 5.8, 5.7, 5.6, 5.5, 5.4, 5.3, 5.2, 5.1, and 5.0 respectively.

That is to say, 5.15.2, 5.14.4, 5.13.3, 5.12.2, 5.11.2, 5.10.2, 5.9.3, 5.8.5, 5.7.6, 5.6.1, 5.5.6, 5.4.10, 5.3.10, 5.2.7, 5.1.6, and 5.0.7 are the stable versions of the 5.15, 5.14, 5.13, 5.12, 5.11, 5.10, 5.9, 5.8, 5.7, 5.6, 5.5, 5.4, 5.3, 5.2, 5.1, and 5.0 lines.

So when we choose to install, we should choose the stable version

35. The download address of Cloudera Manager 5.16.1 is:

Yum RHEL/CentOS/Oracle 7

https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.16.1/

https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera-manager.repo

https://archive.cloudera.com/cm5/cm/5/cloudera-manager-centos7-cm5.16.1_x86_64.tar.gz

36 、

CDH 5 is the current stable version based on Apache Hadoop 2.3.0 or later.

37. Impala can be installed separately from CDH.

Impala 2.2.0 and later are only available for CDH 5, and all packages are 64-bit.

Yum RHEL 6/CentOS 6 (64-bit) // only CentOS 6 and 5 systems are supported for a standalone installation

https://archive.cloudera.com/impala/redhat/6/x86_64/impala/2/

https://archive.cloudera.com/impala/redhat/6/x86_64/impala/cloudera-impala.repo

38. New features added in 5.13.x

1. Hive on Spark now supports dynamic partition pruning for map joins. Dynamic partition pruning (DPP) is a database optimization that can significantly reduce the amount of data a query scans, so workloads run faster. It is disabled by default, but can be enabled by setting the hive.spark.dynamic.partition.pruning.map.join.only property to true. When enabled this way, DPP triggers only for queries where the join on the partitioned column is a map join.

2. Apache Pig now supports writing partitioned Hive tables in Parquet format using HCatalog.

3. Sentry supports Hive Metastore high availability.

39 、

JDK must be 64-bit. Do not use 32-bit JDK.

Install one of the JDK versions supported by CDH and Cloudera Manager.

Install the same version of Oracle JDK on each host.

Install the JDK in /usr/java/jdk-version.

40 、

Or add a new sudo configuration by running visudo and adding the following line for the cloudera-scm group:

%cloudera-scm ALL=(ALL) NOPASSWD:ALL

sudo must be configured so that /usr/sbin is in the PATH when running sudo. One way to do this is to add the following configuration to sudoers:

Use the visudo command to edit the /etc/sudoers file.

Add this line to the configuration file:

Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin

vim /etc/sudoers, enter edit mode, find the line "root ALL=(ALL) ALL", and add "xxx ALL=(ALL) NOPASSWD:ALL" below it.

41 、

For installing the MySQL version recommended by CDH, be sure to see https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cm_ig_mysql.html.

The MySQL configuration file recommended by Cloudera:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
symbolic-links = 0

key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M

#log_bin should be on a disk with enough free space.
#Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your
#system and chown the specified folder to the mysql user.
log_bin=/var/lib/mysql/mysql_binary_log

#In later versions of MySQL, if you enable the binary log and do not set
#a server_id, MySQL will not start. The server_id must be unique within
#the replicating group.
server_id=1

binlog_format = mixed
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M

# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

sql_mode=STRICT_ALL_TABLES

42. MySQL 5.6 requires a 5.1-series JDBC driver, version 5.1.26 or later.

Cloudera recommends that you consolidate all roles that require databases on a limited number of hosts and install drivers on those hosts. It is recommended that all such roles be located on the same host, but it is not required.

Ensure that the JDBC driver is installed on each host that runs the role of accessing the database.

Note: Cloudera recommends using only version 5.1 of the JDBC driver.

Our production JDBC driver is mysql-connector-java-5.1.35-bin.jar.

MySQL is installed at version 5.6.

// the officially recommended version for download and actual use is mysql-connector-java-5.1.46.tar.gz

Download the MySQL JDBC driver from http://www.mysql.com/downloads/connector/j/5.1.html (in .tar.gz format).

Extract the JDBC driver JAR file from the downloaded file. For example:

tar zxvf mysql-connector-java-5.1.46.tar.gz

Copy the JDBC driver, renamed, to /usr/share/java/. If the target directory does not yet exist, create it. For example:

sudo mkdir -p /usr/share/java/

cd mysql-connector-java-5.1.46

sudo cp mysql-connector-java-5.1.46-bin.jar /usr/share/java/mysql-connector-java.jar

43. Sqoop 2 has a built-in Derby database, but Cloudera recommends that you use the PostgreSQL database

// by default, the Derby database runs in embedded mode and cannot be monitored.

Although it is possible to use it, Cloudera currently has no live-backup strategy for the embedded Derby database.

44. The MySQL database must be backed up

Back up the MySQL database

To back up a MySQL database, run the mysqldump command on the MySQL host as follows:

$ mysqldump -hhostname -uusername -ppassword database > /tmp/database-backup.sql

For example, to back up the Activity Monitor database amon created in Creating the Cloudera Software Databases, on the local host as root with password amon_password:

$ mysqldump -pamon_password amon > /tmp/amon-backup.sql

To back up the same Activity Monitor database on the remote host myhost.example.com as root with password amon_password:

$ mysqldump -hmyhost.example.com -uroot -pamon_password amon > /tmp/amon-backup.sql

45. Where Cloudera Manager stores metrics data, and how storage limits affect data retention

The Service Monitor stores time-series data and health data, Impala query metadata, and YARN application metadata. By default, data is stored in /var/lib/cloudera-service-monitor/ on the Service Monitor host. You can change this by modifying the Service Monitor storage directory configuration (firehose.storage.base.directory).

Time-series metrics and health data: Time-Series Storage (firehose_time_series_storage_bytes; default 10 GB, minimum 10 GB)

Impala query metadata: Impala Storage (firehose_impala_storage_bytes; default 1 GB)

YARN application metadata: YARN Storage (firehose_yarn_storage_bytes; default 1 GB)

CDH's explanation:

An approximate amount of disk space dedicated to storing Impala query data. Once storage reaches its maximum, older data is deleted to make room for newer queries. Disk usage is approximate because data is not deleted until the limit is reached.

An approximate amount of disk space dedicated to storing time-series and health data. Once storage reaches its maximum, older data is deleted to make room for newer data. Disk usage is approximate because data is not deleted until the limit is reached.

46 、

cloudera-scm-server (Service Monitor) sizing on the CDH monitoring side

Clusters with HBase, Solr, Kafka, or Kudu

Use these recommendations when services such as HBase, Solr, Kafka, or Kudu are deployed in the cluster. These services typically have a larger number of monitored entities.

Monitored entities | Cluster hosts | Required Java heap | Recommended non-Java heap

0-30,000 | 0-100 | 2 GB | 12 GB

Clusters with HDFS, YARN, or Impala

Use the recommendations in this table for clusters where the only services with worker roles are HDFS, YARN, or Impala.

0-2,000 | 0-100 | 1 GB | 6 GB

Step 2: custom rules used by the Static Service Pools wizard

HDFS

For the NameNode and Secondary NameNode JVM heaps, the minimum is 50 MB and the ideal is max(4 GB, sum_over_all(DataNode mountpoints' available space) / 0.000008).

// i.e., a 4 GB JVM heap is the recommended baseline for the NameNode and Secondary NameNode

MapReduce

For the JobTracker JVM heap, the minimum is 50 MB and the ideal is max(1 GB, round((1 GB × 2.3717181092 × ln(number of TaskTrackers in the MapReduce service)) − 2.6019933306)). If the number of TaskTrackers …
