
CDH version upgrade (5.14 -> 6.2)


Our Cloudera Manager and CDH version is 5.14, and the company now needs to upgrade to CDH 6.2.

You need to upgrade Cloudera Manager first and then upgrade CDH.

1.Cloudera Manager upgrade

(refer to https://www.cloudera.com/documentation/enterprise/upgrade/topics/ug_cm_upgrade.html)

Before upgrading, make sure that the Linux version in use is one supported by Cloudera Manager 6.2.

1.1 backup

1.1.1 backup Cloudera Manager Agent

# View database information

$ sudo cat /etc/cloudera-scm-server/db.properties

Get information similar to the following:

...
com.cloudera.cmf.db.type=...
com.cloudera.cmf.db.host=database_hostname:database_port
com.cloudera.cmf.db.name=scm
com.cloudera.cmf.db.user=scm
com.cloudera.cmf.db.password=SOME_PASSWORD

Perform the following backup operations on each machine on which the Cloudera Manager Agent is installed:

Create a top level backup directory.

$ export CM_BACKUP_DIR="`date +%F`-CM5.14"
$ echo $CM_BACKUP_DIR
$ mkdir -p $CM_BACKUP_DIR

Back up the Agent directory and the runtime state.

$ sudo -E tar -cf $CM_BACKUP_DIR/cloudera-scm-agent.tar --exclude=*.sock /etc/cloudera-scm-agent /etc/default/cloudera-scm-agent /var/run/cloudera-scm-agent /var/lib/cloudera-scm-agent

Back up the existing repository directory.

$ sudo -E tar -cf $CM_BACKUP_DIR/repository.tar /etc/yum.repos.d
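These backup steps have to run on every agent host. If there are many hosts, a small loop over ssh can save time; the following is only a sketch, assuming passwordless ssh and sudo on each host and a hosts.txt file listing the agent hostnames (both are assumptions, not part of the original procedure):

$ export CM_BACKUP_DIR="`date +%F`-CM5.14"
$ while read host; do
    # create the backup directory, then archive the agent config/state and the repo files on each host
    ssh "$host" "mkdir -p $CM_BACKUP_DIR && \
      sudo -E tar -cf $CM_BACKUP_DIR/cloudera-scm-agent.tar --exclude=*.sock \
        /etc/cloudera-scm-agent /etc/default/cloudera-scm-agent \
        /var/run/cloudera-scm-agent /var/lib/cloudera-scm-agent && \
      sudo -E tar -cf $CM_BACKUP_DIR/repository.tar /etc/yum.repos.d"
  done < hosts.txt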

1.1.2 backup Cloudera Manager Service

Execute on the machine where Service Monitor is installed:

$ sudo cp -rp /var/lib/cloudera-service-monitor /var/lib/cloudera-service-monitor-`date +%F`-CM5.14

Execute on the machine where Host Monitor is installed:

$ sudo cp -rp /var/lib/cloudera-host-monitor /var/lib/cloudera-host-monitor-`date +%F`-CM5.14

Execute on the machine where Event Server is installed:

$ sudo cp -rp /var/lib/cloudera-scm-eventserver /var/lib/cloudera-scm-eventserver-`date +%F`-CM5.14

1.1.3 backup Cloudera Manager Databases

$ mysqldump --databases database_name --host=database_hostname --port=database_port -u user_name -p > $HOME/database_name-backup-`date +%F`-CM5.14.sql

1.1.4 backup Cloudera Manager Server

Create a top-level backup directory.

$ export CM_BACKUP_DIR="`date +%F`-CM5.14"
$ echo $CM_BACKUP_DIR
$ mkdir -p $CM_BACKUP_DIR

Back up the Cloudera Manager Server directories:

$ sudo -E tar -cf $CM_BACKUP_DIR/cloudera-scm-server.tar /etc/cloudera-scm-server /etc/default/cloudera-scm-server

Back up the existing repository directory.

$ sudo -E tar -cf $CM_BACKUP_DIR/repository.tar /etc/yum.repos.d

1.2 upgrade Cloudera Manager Server

1.2.1 set up the software (replace the yum repo)

Log in to the Cloudera Manager Server node and delete the original yum repo file.

$ sudo rm /etc/yum.repos.d/cloudera*manager.repo*

Create a new yum repo file:

$ sudo vim /etc/yum.repos.d/cloudera-manager.repo

[cloudera-manager]
# Packages for Cloudera Manager
name=Cloudera Manager
baseurl=https://archive.cloudera.com/cm6/6.2.0/redhat6/yum/
gpgkey=https://archive.cloudera.com/cm6/6.2.0/redhat6/yum/RPM-GPG-KEY-cloudera
gpgcheck=1

1.2.2 install or configure Java 8

Configure JAVA_HOME in the Cloudera Manager Server configuration file /etc/default/cloudera-scm-server by adding:

export JAVA_HOME="/usr/java/jdk1.8.0_162"
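To confirm the JDK path is valid before restarting the server, a quick check (the path is the one used above; adjust it if your JDK lives elsewhere):

$ /usr/java/jdk1.8.0_162/bin/java -version
$ sudo grep JAVA_HOME /etc/default/cloudera-scm-server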

1.2.3 upgrade Cloudera Manager Server

1. Log in to the Cloudera Manager Server host.

2. Stop the Cloudera Management Service. (Important: not stopping the Cloudera Management Service at this point may cause the management roles to crash, or Cloudera Manager Server may fail to restart.)

Steps:

a. Log in to the Cloudera Manager Admin Console.
b. Select Clusters > Cloudera Management Service.
c. Select Actions > Stop.
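If you prefer the command line, the Cloudera Manager REST API can also stop the Management Service; this is only a sketch, assuming an admin account and API version v19 (adjust host, credentials, and API version for your deployment):

# stop the Cloudera Management Service via the CM API
$ curl -u admin:admin -X POST \
    "http://cloudera_manager_server_hostname:7180/api/v19/cm/service/commands/stop"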

3. Stop Cloudera Manager Server.

$ sudo service cloudera-scm-server stop

4. Stop Cloudera Manager Agent.

$ sudo service cloudera-scm-agent stop

5. Upgrade Cloudera packages.

$ sudo yum clean all
$ sudo yum upgrade cloudera-manager-server cloudera-manager-daemons cloudera-manager-agent -y

6. Verify that the packages are installed.

$ rpm -qa 'cloudera-manager-*'

7. Start Cloudera Manager Agent.

$ sudo service cloudera-scm-agent start

8. Start Cloudera Manager Server.

$ sudo service cloudera-scm-server start

If you have any problems during startup, you can refer to the log file:

$ tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
$ tail -f /var/log/cloudera-scm-agent/cloudera-scm-agent.log
$ tail -f /var/log/messages

9. Normally, you can check the upgrade status by opening the upgrade page:

http://cloudera_manager_server_hostname:7180/cmf/upgrade

1.2.4 upgrade Cloudera Manager Agent

Option 1: upgrade using the CDH interface

Click on the Cloudera Manager Agent package

1. Select the agent repository

We can just use the public repository.

Select Public Cloudera Repository

2. Install JDK

If it is already installed, you do not need to select this.

3. Install agent

Just configure a root or sudo account; it needs access to all agent nodes.

Option 2: upgrade using the command line

Clear old repo files

$ sudo rm /etc/yum.repos.d/cloudera*manager.repo*

Create a new repo file:

$ sudo vim /etc/yum.repos.d/cloudera-manager.repo

Contents of repo file:

[cloudera-manager]
# Packages for Cloudera Manager
name=Cloudera Manager
baseurl=https://archive.cloudera.com/cm6/6.2.0/redhat6/yum/
gpgkey=https://archive.cloudera.com/cm6/6.2.0/redhat6/yum/RPM-GPG-KEY-cloudera
gpgcheck=1

Stop the Cloudera Manager agent service

$ sudo service cloudera-scm-agent stop

Upgrade Cloudera Manager agent

$ sudo yum clean all
$ sudo yum repolist
$ sudo yum upgrade cloudera-manager-daemons cloudera-manager-agent -y

After all machines are done, run the following on each agent node:

$ sudo service cloudera-scm-agent start

Then open the upgrade page to check the status:

http://192.168.0.254:7180/cmf/upgrade

It should show that the agents on all machines have been upgraded and that all of them have a heartbeat.

Click Host Inspector to check the status of the node

When it finishes, click to show the inspector results, review any problematic items, and fix them.

One of the more important issues reported is that Hue on CDH 6 requires Python 2.7; keep this in mind, but it can be set aside for now.

Then, start Cloudera Management Service

At this point the Cloudera Manager upgrade is complete; next comes the CDH upgrade.

If the upgrade fails and needs to be restored, you can refer to the official steps:

https://www.cloudera.com/documentation/enterprise/upgrade/topics/ug_cm_downgrade.html

2.CDH upgrade

Before upgrading, make sure that the Linux version in use is one supported by CDH 6.2 and that the Java version is 1.8.

2.1. Preparatory work

Log in to the CDH management page and start the HDFS service. Then run the following commands to check the cluster; if there are problems, fix them before proceeding.

Check hdfs:

$ sudo -u hdfs hdfs fsck / -includeSnapshots
$ sudo -u hdfs hdfs dfsadmin -report
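The fsck output can be very long; one way to pull out just the health summary lines (a convenience sketch, not part of the original checks):

$ sudo -u hdfs hdfs fsck / -includeSnapshots | grep -E 'Status|Corrupt blocks|Missing replicas|Under-replicated'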

Check the consistency of the HBase tables:

$ sudo -u hdfs hbase hbck

If kudu is used, check kudu:

$ sudo -u kudu kudu cluster ksck

The following services are no longer available in CDH 6.0.0 and need to be stopped and deleted before upgrading:

Accumulo

Sqoop 2

MapReduce 1

Spark 1.6

Record Service

2.2 backup cdh

The following CDH components do not require backup:

MapReduce

YARN

Spark

Pig

Impala

Complete the following backup steps before upgrading CDH

1.Back Up Databases

We use MySQL, so take MySQL as an example.

1) If you have not already done so, stop the services. If Cloudera Manager indicates that there are dependent services, stop those as well.

2) Back up the databases of the various services (Sqoop, Oozie, Hue, Hive Metastore, Sentry). Replace the database name, hostname, port, user name, and backup directory path, and then run the following command:

$ mysqldump --databases database_name --host=database_hostname --port=database_port -u database_username -p > backup_directory_path/database_name-backup-`date +%F`-CDH5.14.sql
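If several of these service databases live on the same MySQL host, a small loop avoids retyping the command; this is a sketch with hypothetical database names (hue, oozie, metastore, sentry) — replace them, the connection details, and the backup path with your own:

$ for db in hue oozie metastore sentry; do
    # -p prompts for the password on every iteration
    mysqldump --databases "$db" --host=database_hostname --port=database_port \
      -u database_username -p > backup_directory_path/"$db"-backup-`date +%F`-CDH5.14.sql
  done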

2.Back Up ZooKeeper

On each ZooKeeper node, back up the ZooKeeper data directory configured in CDH, for example:

$ sudo cp -rp /var/lib/zookeeper/ /var/lib/zookeeper-backup-`date +%F`-CM-CDH5.14

3.Back Up HDFS

(adjust the data paths in the commands below to match the actual CDH configuration)

a. Back up the journal data; execute on each JournalNode:

$ sudo cp -rp /data/dfs/jn /data/dfs/jn-CM-CDH5.14

b. Back up the runtime directory of each NameNode:

$ mkdir -p /etc/hadoop/conf.rollback.namenode
$ cd /var/run/cloudera-scm-agent/process/ && cd `ls -t1 | grep -e "-NAMENODE\$" | head -1`
$ cp -rp * /etc/hadoop/conf.rollback.namenode/
$ rm -rf /etc/hadoop/conf.rollback.namenode/log4j.properties
$ cp -rp /etc/hadoop/conf.cloudera.hdfs/log4j.properties /etc/hadoop/conf.rollback.namenode/

These commands create temporary rollback directories. If you need to roll back to CDH 5.x later, the rollback process requires you to modify the files in this directory.

c. Back up the runtime directory of each DataNode:

$ mkdir -p /etc/hadoop/conf.rollback.datanode/
$ cd /var/run/cloudera-scm-agent/process/ && cd `ls -t1 | grep -e "-DATANODE\$" | head -1`
$ cp -rp * /etc/hadoop/conf.rollback.datanode/
$ rm -rf /etc/hadoop/conf.rollback.datanode/log4j.properties
$ cp -rp /etc/hadoop/conf.cloudera.hdfs/log4j.properties /etc/hadoop/conf.rollback.datanode/

4.Back Up Key Trustee Server and Clients

The service is not used

5.Back Up HSM KMS

The service is not used

6.Back Up Navigator Encrypt

The service is not used

7.Back Up HBase

Because the rollback process also rolls back HDFS, the data in HBase is rolled back as well. In addition, HBase metadata stored in ZooKeeper is restored as part of the ZooKeeper rollback process.

8.Back Up Search

The service is not used

9.Back Up Sqoop 2

The service is not used

10.Back Up Hue

Back up the app registry file on all hosts running the Hue Server role

$ mkdir -p /opt/cloudera/parcels_backup
$ cp -rp /opt/cloudera/parcels/CDH/lib/hue/app.reg /opt/cloudera/parcels_backup/app.reg-CM-CDH5.14

2.3 Service changes

Hue:

For CentOS 6 systems:

Python 2.7 needs to be installed on the Hue nodes.

Enable the Software Collections Library:

$ sudo yum install centos-release-scl

Install the Software Collections utilities:

$ sudo yum install scl-utils

Install Python 2.7:

$ sudo yum install python27

Verify that Python 2.7 is installed:

$ source /opt/rh/python27/enable
$ python --version

HBase:

1. HBase 2.0 does not support PREFIX_TREE block encoding. You need to remove it before upgrading, otherwise HBase 2.0 cannot start.

If CDH 6 is already installed, you can verify that no tables or snapshots use PREFIX_TREE block encoding by running the following tools:

$ hbase pre-upgrade validate-dbe
$ hbase pre-upgrade validate-hfile

2. Upgrade coprocessor classes

External coprocessors are not automatically upgraded. There are two ways to handle coprocessor upgrades:

Before continuing with the upgrade, manually upgrade the coprocessor jar.

Temporarily remove the coprocessor settings and continue the upgrade. After upgrading the coprocessors manually, you can set them again.

Attempting to upgrade without upgrading the coprocessor jar may result in unpredictable behavior such as HBase role startup failure, HBase role crash, and even data corruption.

If CDH 6 is already installed, you can run the hbase pre-upgrade validate-cp tool to check that your coprocessors are compatible with the upgrade.

2.4 considerations for upgrading a cluster

When upgrading a cluster managed by Cloudera Manager 5.13 or earlier to CDH 6.0 or later, backing up data with Cloudera Manager Backup and Disaster Recovery (BDR) will not work.

The minor version of Cloudera Manager used to perform the upgrade must be equal to or greater than the minor version of CDH; upgrade Cloudera Manager first (see section 1).

Note:

When upgrading CDH using a rolling restart (minor upgrades only):

Automatic failover does not affect the rolling restart operation.

After the upgrade is complete, do not delete the old parcels if MapReduce or Spark jobs are still running. These jobs still use the old parcels and must be restarted to pick up the newly upgraded parcel.

Make sure that Oozie jobs are idempotent.

Do not use Oozie Shell Actions to run Hadoop-related commands.

Rolling upgrade of Spark Streaming jobs is not supported. Restart the streaming jobs after the upgrade is complete to start using the newly deployed version.

The runtime library must be packaged as part of the Spark application.

You must use the distributed cache to propagate the job configuration files from the client gateway machines.

Do not build "uber" or "fat" JAR files that contain third-party dependencies or CDH classes, as these may conflict with classes that YARN, Oozie, and other services automatically add to the CLASSPATH.

Build Spark applications without bundling CDH JARs.

2.4.1 backup cloudera manager

We backed up once before the Cloudera Manager upgrade; we need to back up again after the upgrade.

1. View database information

$ cat /etc/cloudera-scm-server/db.properties

For example:

com.cloudera.cmf.db.type=...
com.cloudera.cmf.db.host=database_hostname:database_port
com.cloudera.cmf.db.name=scm
com.cloudera.cmf.db.user=scm
com.cloudera.cmf.db.password=SOME_PASSWORD

2. Backup Cloudera Manager Agent

Execute on each agent node:

Create a backup directory:

$ export CM_BACKUP_DIR="`date +%F`-CM5.14"
$ echo $CM_BACKUP_DIR
$ mkdir -p $CM_BACKUP_DIR

Back up the agent directory and runtime status

$ sudo -E tar -cf $CM_BACKUP_DIR/cloudera-scm-agent.tar --exclude=*.sock /etc/cloudera-scm-agent /etc/default/cloudera-scm-agent /var/run/cloudera-scm-agent /var/lib/cloudera-scm-agent

Back up the current repo directory:

$ sudo -E tar -cf $CM_BACKUP_DIR/repository.tar /etc/yum.repos.d

Backup Cloudera Management Service

Execute on the Service Monitor node

$ sudo cp -rp /var/lib/cloudera-service-monitor /var/lib/cloudera-service-monitor-`date +%F`-CM5.14

Execute on the Host Monitor node

$ sudo cp -rp /var/lib/cloudera-host-monitor /var/lib/cloudera-host-monitor-`date +%F`-CM5.14

Execute on the Event Server node

$ sudo cp -rp /var/lib/cloudera-scm-eventserver /var/lib/cloudera-scm-eventserver-`date +%F`-CM5.14

3. Stop Cloudera Manager Server & Cloudera Management Service

Stop Cloudera Management Service in the CDH management interface and select:

Clusters -> Cloudera Management Service.

Actions > Stop.

Stop Cloudera Manager Server:

$ sudo service cloudera-scm-server stop

4. Backup Cloudera Manager database

$ mysqldump --databases database_name --host=database_hostname --port=database_port -u user_name -p > $HOME/database_name-backup-`date +%F`-CM5.14.sql

The database information is what was obtained from the db.properties file in step 1.
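A small sketch that reads those values straight from db.properties and feeds them to mysqldump, assuming a MySQL backend and the key names shown in step 1:

# pull the connection settings out of db.properties (the host value may be host:port)
$ DB_HOST=$(sudo awk -F= '/^com.cloudera.cmf.db.host=/{print $2}' /etc/cloudera-scm-server/db.properties)
$ DB_NAME=$(sudo awk -F= '/^com.cloudera.cmf.db.name=/{print $2}' /etc/cloudera-scm-server/db.properties)
$ DB_USER=$(sudo awk -F= '/^com.cloudera.cmf.db.user=/{print $2}' /etc/cloudera-scm-server/db.properties)
$ mysqldump --databases "$DB_NAME" --host="${DB_HOST%%:*}" --port="${DB_HOST##*:}" \
    -u "$DB_USER" -p > $HOME/"$DB_NAME"-backup-`date +%F`-CM5.14.sql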

5. Backup Cloudera Manager Server

Execute on the Cloudera Manager Server node:

1. Create a backup directory:

$ export CM_BACKUP_DIR="`date +%F`-CM5.14"
$ echo $CM_BACKUP_DIR
$ mkdir -p $CM_BACKUP_DIR

2. Back up the Cloudera Manager Server directories:

$ sudo -E tar -cf $CM_BACKUP_DIR/cloudera-scm-server.tar /etc/cloudera-scm-server /etc/default/cloudera-scm-server

3. Back up the current repo directory:

$ sudo -E tar -cf $CM_BACKUP_DIR/repository.tar /etc/yum.repos.d

2.4.2 enter maintenance mode

To avoid unnecessary alerts during the upgrade, put the cluster into maintenance mode before starting. Entering maintenance mode stops email alerts and SNMP traps from being sent, but does not stop checks and configuration validation. After completing the upgrade, be sure to exit maintenance mode to re-enable Cloudera Manager alerts.
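Maintenance mode can be entered from the cluster's Actions menu in the UI; if you prefer the REST API, here is a sketch assuming an admin account, API v19, and a cluster named "Cluster 1" (all assumptions — adjust for your deployment):

# enter maintenance mode for the cluster; call .../commands/exitMaintenanceMode after the upgrade
$ curl -u admin:admin -X POST \
    "http://cloudera_manager_server_hostname:7180/api/v19/clusters/Cluster%201/commands/enterMaintenanceMode"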

2.4.3 complete the pre-upgrade migration steps

YARN

Decommission and recommission the YARN NodeManagers but do not start the NodeManagers.

A decommission is required so that the NodeManagers stop accepting new containers, kill any running containers, and then shut down.

Procedure:

1. Ensure that no new applications, such as MapReduce or Spark applications, are submitted to the cluster until the upgrade is complete.

2. Open the CDH management interface and go to the YARN service to be upgraded.

3. On the Instances tab, select all NodeManager roles. This can be done by filtering by role type under Role Types.

4. Click Actions for Selected -> Decommission.

If the cluster runs CDH 5.9 or later and is managed by Cloudera Manager 5.9 or later, and you have configured graceful decommissioning, a decommission timeout countdown is started.

Graceful decommissioning provides a timeout before the decommission process begins. The timeout creates a window for workloads that are already running to drain from the system and complete. Search for the Node Manager Graceful Decommission Timeout field on the Configuration tab of the YARN service and set this property to a value greater than 0 to enable the timeout.

5. Wait until decommissioning is complete. When it finishes, the NodeManager state is Stopped and the commission state is Decommissioned.

6. Select all NodeManagers and click Actions for Selected -> Recommission.

(If you skip step 6, an error will be reported later in the upgrade and the cause is hard to track down; for example, during the YARN upgrade you may see:

Caused by: org.apache.hadoop.ipc.RemoteException (java.io.IOException): Requested replication factor of 0 is less than the required minimum of 1 for /user/yarn/mapreduce/mr-framework/3.0.0-cdh6.2.0-mr-framework.tar.gz)

Important: do not start the selected NodeManagers.

Hive

Query syntax, DDL syntax, and Hive API have all changed. Before upgrading, you may need to edit the HiveQL code in the application workload.

Sentry

If the cluster uses Sentry policy file authorization, you must migrate the policy files to the database-backed Sentry service before upgrading to CDH 6.

Spark

If the cluster uses Spark or Spark Standalone, you must perform several steps to ensure that the correct version is installed.

Delete spark standalone

After the upgrade, if Spark 2 was installed, spark2-submit is replaced by spark-submit, and job submission commands need to be updated accordingly before submitting jobs.

2.4.4 run the Hue document cleanup

If the cluster uses Hue, perform the following steps (not required for maintenance releases). These steps clean up the database tables used by Hue and can help improve performance after the upgrade.

1. Back up the Hue database.

2. Connect to the Hue database.

3. Check the sizes of the desktop_document, desktop_document2, oozie_job, beeswax_session, beeswax_savedquery and beeswax_query tables as a reference point. If any of them exceeds 100,000 rows, run the cleanup.
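A quick way to check those row counts, assuming the Hue database is MySQL and is named hue (the database name and credentials are assumptions — adjust to your setup):

$ mysql -u hue -p -e "
    SELECT 'desktop_document' AS tbl, COUNT(*) AS cnt FROM desktop_document
    UNION ALL SELECT 'desktop_document2', COUNT(*) FROM desktop_document2
    UNION ALL SELECT 'oozie_job', COUNT(*) FROM oozie_job
    UNION ALL SELECT 'beeswax_session', COUNT(*) FROM beeswax_session
    UNION ALL SELECT 'beeswax_savedquery', COUNT(*) FROM beeswax_savedquery;" hue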

2.4.5 download and distribute parcels

1. Open the CDH management interface and click Hosts -> Parcels -> Configuration.

2. Update the CDH parcel repository with the following remote parcel repository URL:

https://archive.cloudera.com/cdh6/6.2.0/parcels/

a. In the Remote Parcel Repository URLs section, click the "+" icon, add the URL above, and click Save Changes.

b. Locate the row in the table that contains the new CDH parcel, and then click the download button.

c. After the parcel has been downloaded, click the Distribute button.

d. When the parcel has been distributed to all hosts, click the Upgrade button.

2.4.6 run the upgrade CDH Wizard

1. After entering the upgrade wizard, it runs some cluster checks; the results may show problems that will affect the rest of the upgrade, so resolve these first. There is also a prompt to back up the databases. If you have already done so, select "Yes, I have performed these steps" and then click Continue.

2. Select Full Cluster Restart (full cluster downtime) and click Continue. (This step restarts all services.)

Some problems were encountered during the upgrade:

Oozie exceptions reported during the upgrade:

1.E0103: Could not load service classes, Cannot create PoolableConnectionFactory (Table 'oozie.validate_conn' doesn't exist)

Solution:

2. java.lang.ClassNotFoundException: org.cloudera.log4j.redactor.RedactorAppender (the class could not be found).

Referring to this article, create a soft link for the missing logredactor-2.0.7.jar from /opt/cloudera/parcels/CDH/lib/oozie/lib into the /opt/cloudera/parcels/CDH/lib/oozie/libtools directory.
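A sketch of that soft link, using the jar name and parcel path mentioned above:

$ sudo ln -s /opt/cloudera/parcels/CDH/lib/oozie/lib/logredactor-2.0.7.jar \
    /opt/cloudera/parcels/CDH/lib/oozie/libtools/logredactor-2.0.7.jar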

3.ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.

The cause is that log4j.xml is not configured, so the exception details cannot be displayed; you can also put a log4j.xml template in the /opt/cloudera/parcels/CDH/lib/oozie/libtools directory.

2.4.7 Migration after upgrade

1. Spark

After upgrading to CDH 6, multiple Spark services may be configured, each with its own set of configurations, including the event log location. Determine which service to keep, and then manually merge the two services.

The command for submitting Spark 2 jobs in CDH 5 (spark2-submit) is removed in CDH 6 and replaced by spark-submit. In CDH 5 clusters with both the built-in Spark 1.6 service and a Spark 2 service, spark-submit is used with the Spark 1.6 service and spark2-submit with the Spark 2 service. After upgrading to CDH 6, spark-submit uses CDH's built-in Spark 2 service, and spark2-submit no longer works. Be sure to update any workflows that submit Spark jobs to use these commands.

Manually merge the Spark service by performing the following steps:

1. Copy all relevant configurations from the service you want to delete to the service you want to retain. To view and edit the configuration:

a. In Cloudera Manager Admin Console, go to the Spark service that you want to delete.

b. Click the configuration tab.

c. Record the configuration.

d. Go to the Spark service you want to keep and copy the configuration.

e. Click Save changes.

To keep a historical event log:

Determine the location of the event log for the service you want to delete:

In Cloudera Manager Admin Console, go to the Spark service that you want to delete.

Click the configuration tab.

Search: spark.eventLog.dir

Pay attention to the path.

Log in to the cluster host and run the following command:

$ hadoop fs -mv <event_log_dir_of_deleted_service>/* <event_log_dir_of_retained_service>/
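For example, if the removed service used the common Spark 2 default location and the retained service uses the Spark default (these paths are an assumption — confirm both against spark.eventLog.dir first):

$ sudo -u spark hadoop fs -mv /user/spark/spark2ApplicationHistory/* /user/spark/applicationHistory/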

Using Cloudera Manager, stop and delete the Spark service you chose to delete

Restart the remaining Spark service: click the drop-down arrow next to the Spark service, and then select Restart.

2. Impala

Impala is mainly used for real-time queries rather than production jobs, so it is less critical here. Refer to the official documentation:

https://www.cloudera.com/documentation/enterprise/upgrade/topics/impala_upgrading.html

That's it.
