1. Overview
This document describes how to implement data interoperability between Kerberos and non-Kerberos CDH clusters when BDR (Backup and Disaster Recovery) is not available. It covers the following topics:
1. Test cluster environment description
2. Verification of the CDH BDR function
3. Requirements and restrictions on data replication between clusters
4. The way data is replicated between clusters
5. Considerations for replicating data using DistCp
6. Copying data between clusters with DistCp
This document focuses on copying data between a Kerberos CDH cluster and a non-Kerberos CDH cluster when BDR is not available, based on the following assumptions:
1. The Kerberos cluster and the non-Kerberos cluster have been set up and are running normally.
2. Both clusters have the HttpFS service installed (a quick way to check this is sketched after this list).
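As a quick sanity check, which is an addition to the original procedure, you can verify that HttpFS is listening on its default port 14000 by issuing a WebHDFS LISTSTATUS call against each cluster. The host names below are those of the test environment used later in this article, and the user name is only an example:
# Non-Kerberos cluster: an unauthenticated call with a user.name parameter is usually sufficient
curl "http://ip-172-31-6-148:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"
# Kerberos cluster: obtain a ticket first, then use SPNEGO authentication
kinit user_r
curl --negotiate -u : "http://ip-172-31-9-186:14000/webhdfs/v1/?op=LISTSTATUS"
If HttpFS is healthy, both calls should return a JSON FileStatuses listing of the HDFS root directory.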
2. Test cluster environment description
The following describes the environment used for this test; it is not a hard requirement of this operation manual:
Source cluster (non-Kerberos): CM at http://52.221.181.252:7180/, CDH 5.12.0, operated as the root user, OS CentOS 6.5
Target cluster (Kerberos): CM at http://172.31.8.141:7180/, CDH 5.11.1, operated as the ec2-user user, OS CentOS 6.5
3. Verification of the CDH BDR function
3.1 Configure BDR in both clusters
Configure the peer cluster in both clusters. The configuration steps are as follows.
1. On the CM home page, click "Backup" -> "Peers" to enter the configuration page
2. Click "Add Peer" and fill in the cluster information
Peer Name: the name that identifies the peer cluster
Peer URL: the CM access address of the peer cluster
Peer Admin Username: an administrator account of the peer cluster
Peer Admin Password: the password of that administrator account
3. Click Add, then check whether the status shows "Connected"
3.2 Test BDR in the non-Kerberos cluster
1. On the CM home page, click "Backup" -> "Replication Schedules"
2. Create an HDFS replication schedule
3. Click Save; the following error is reported
3.3 Test BDR in the Kerberos cluster
1. Create an HDFS replication schedule
2. Click Save; the following error is reported
The above tests show that the CDH BDR function does not support data replication between secure (Kerberos) and non-secure clusters.
4. Data replication between Hadoop clusters with DistCp
Terminology:
Source cluster (Source): the cluster from which the data is to be migrated or replicated
Target cluster (Destination): the cluster in which the migrated data is stored
4.1 requirements and limitations of data migration between clusters
The cluster running the DistCp command must have the MapReduce service started (MRv1 or YARN (MRv2)), because DistCp runs as a MapReduce job. All MapReduce nodes in the cluster running the DistCp command must have network connectivity to all nodes of the source cluster. To copy data between a Kerberos cluster and a non-Kerberos cluster, the DistCp command must be executed on the Kerberos cluster. A rough way to verify these prerequisites is sketched below.
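As a pre-flight check added here for convenience (assuming YARN is the MapReduce framework and that HttpFS runs on port 14000, as in the test environment above), the following commands confirm that YARN is running on the cluster that will execute DistCp and that it can reach the source cluster:
# List active NodeManagers to confirm the YARN (MRv2) service is up
yarn node -list
# Confirm that the source cluster's HttpFS endpoint is reachable from this cluster
hadoop fs -ls webhdfs://ip-172-31-6-148:14000/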
4.2 data replication between clusters
If the source cluster is a non-Kerberos environment and the target cluster is a Kerberos environment, run the command on the target cluster, using the WebHDFS protocol for the source cluster and the HDFS or WebHDFS protocol for the target cluster. If the source cluster is a Kerberos environment and the target cluster is a non-Kerberos environment, run the command on the source cluster, using the HDFS or WebHDFS protocol for the source cluster and the WebHDFS protocol for the target cluster. The general command forms are sketched below.
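Put together, the command shapes implied by these rules look roughly as follows; the placeholders in angle brackets, the HttpFS port 14000 and the NameNode port 8020 are assumptions based on the test environment in this article:
# Non-Kerberos source -> Kerberos target: run on the Kerberos (target) cluster
hadoop distcp webhdfs://<non-kerberos-httpfs-host>:14000/<source-path> hdfs://<kerberos-namenode-host>:8020/<target-path>
# Kerberos source -> non-Kerberos target: run on the Kerberos (source) cluster
hadoop distcp hdfs://<kerberos-namenode-host>:8020/<source-path> webhdfs://<non-kerberos-httpfs-host>:14000/<target-path>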
5. DistCp usage
Note: the following operations are performed on the Kerberos cluster
5.1 non-Kerberos to Kerberos cluster data replication
The non-Kerberos cluster is the source cluster and the Kerberos cluster is the target cluster.
The following steps copy data from the source cluster's /data directory to the target cluster's /sourcedata directory.
1. View the source cluster's /data directory
[root@ip-172-31-6-148]# hadoop fs -ls /data
Found 2 items
drwxr-xr-x   - root supergroup          0 2017-08-31 14:19 /data/cdh-shell
-rw-r--r--   3 root supergroup    5279500 2017-08-31 14:11 /data/kudu.tar.gz
[root@ip-172-31-6-148]# hadoop fs -ls /data/cdh-shell
Found 9 items
-rw-r--r--   3 root supergroup         60 2017-08-31 14:19 /data/cdh-shell/README.md
-rw-r--r--   3 root supergroup            2017-08-31 14:19 /data/cdh-shell/a.sh
-rw-r--r--   3 root supergroup       5470 2017-08-31 14:19 /data/cdh-shell/autouninstall.sh
-rw-r--r--   3 root supergroup            2017-08-31 14:19 /data/cdh-shell/b.sh
-rw-r--r--   3 root supergroup            2017-08-31 14:19 /data/cdh-shell/components.list
-rw-r--r--   3 root supergroup       2438 2017-08-31 14:19 /data/cdh-shell/delete.list
-rw-r--r--   3 root supergroup         52 2017-08-31 14:19 /data/cdh-shell/node.list
-rw-r--r--   3 root supergroup       1809 2017-08-31 14:19 /data/cdh-shell/ntp.conf
-rw-r--r--   3 root supergroup            2017-08-31 14:19 /data/cdh-shell/user.list
[root@ip-172-31-6-148]#
2. View the HDFS root directory of the target cluster
[ec2-user@ip-172-31-8-141]$ hadoop fs -ls /
Found 9 items
drwxrwxrwx   - root   supergroup          0 2017-08-27 10:27 /HiBench
drwxrwxrwx   - user_r supergroup          0 2017-08-21 11:23 /benchmarks
drwxr-xr-x   - hive   supergroup          0 2017-08-30 03:58 /data
drwxrwxrwx   - user_r supergroup          0 2017-08-23 03:23 /fayson
drwx------   - hbase  hbase               0 2017-08-31 09:56 /hbase
drwxrwxrwx   - solr   solr                0 2017-08-17 14:09 /solr
drwxrwxrwt   - hdfs   supergroup          0 2017-08-31 06:18 /tmp
drwxrwxrwx   - hive   supergroup          0 2017-08-24 12:28 /udfjar
drwxrwxrwx   - hdfs   supergroup          0 2017-08-30 03:48 /user
[ec2-user@ip-172-31-8-141]$
3. Initialize Kerberos users on the Kerberos cluster command line
[ec2-user@ip-172-31-8-141]$ kinit user_r
Password for user_r@CLOUDERA.COM:
[ec2-user@ip-172-31-8-141]$ klist
Ticket cache: FILE:/tmp/krb5cc_500
Default principal: user_r@CLOUDERA.COM
Valid starting     Expires            Service principal
08/31/17 10:03:41                     krbtgt/CLOUDERA.COM@CLOUDERA.COM
        renew until 09/07/17 10:03:41
[ec2-user@ip-172-31-8-141]$
4. Copy the data by doing the following (target HDFS mode)
The source cluster uses WebHDFS and the target cluster uses the HDFS protocol to copy data
[ec2-user@ip-172-31-8-141]$ hadoop distcp webhdfs://ip-172-31-6-148:14000/data/ hdfs://172.31.8.141:8020/sourcedata
...
17/08/31 10:23:58 INFO tools.DistCp: DistCp job-id: job_1504187767743_0002
17/08/31 10:23:58 INFO mapreduce.Job: Running job: job_1504187767743_0002
17/08/31 10:24:10 INFO mapreduce.Job: Job job_1504187767743_0002 running in uber mode : false
17/08/31 10:24:10 INFO mapreduce.Job:  map 0% reduce 0%
17/08/31 10:24:27 INFO mapreduce.Job:  map 33% reduce 0%
17/08/31 10:24:28 INFO mapreduce.Job:  map 100% reduce 0%
17/08/31 10:24:28 INFO mapreduce.Job: Job job_1504187767743_0002 completed successfully
...
[ec2-user@ip-172-31-8-141]$
Yarn Job running Interface
After the task completes, check whether the data is consistent with the source cluster data
[ec2-user@ip-172-31-8-141]$ hadoop fs -ls /sourcedata
Found 2 items
drwxr-xr-x   - user_r supergroup          0 2017-08-31 10:24 /sourcedata/cdh-shell
-rw-r--r--   3 user_r supergroup    5279500 2017-08-31 10:24 /sourcedata/kudu.tar.gz
[ec2-user@ip-172-31-8-141]$ hadoop fs -ls /sourcedata/cdh-shell
Found 9 items
-rw-r--r--   3 user_r supergroup         60 2017-08-31 10:24 /sourcedata/cdh-shell/README.md
-rw-r--r--   3 user_r supergroup            2017-08-31 10:24 /sourcedata/cdh-shell/a.sh
-rw-r--r--   3 user_r supergroup       5470 2017-08-31 10:24 /sourcedata/cdh-shell/autouninstall.sh
-rw-r--r--   3 user_r supergroup            2017-08-31 10:24 /sourcedata/cdh-shell/b.sh
-rw-r--r--   3 user_r supergroup            2017-08-31 10:24 /sourcedata/cdh-shell/components.list
-rw-r--r--   3 user_r supergroup       2438 2017-08-31 10:24 /sourcedata/cdh-shell/delete.list
-rw-r--r--   3 user_r supergroup         52 2017-08-31 10:24 /sourcedata/cdh-shell/node.list
-rw-r--r--   3 user_r supergroup       1809 2017-08-31 10:24 /sourcedata/cdh-shell/ntp.conf
-rw-r--r--   3 user_r supergroup        125 2017-08-31 10:24 /sourcedata/cdh-shell/user.list
[ec2-user@ip-172-31-8-141]$
The source cluster data is consistent with the target cluster data.
5. Execute data copy command (target WebHDFS mode)
Both the source cluster and the target cluster use the WebHDFS protocol to copy the data. First, delete the /sourcedata directory from the target cluster's HDFS.
[ec2-user@ip-172-31-8-141]$ hadoop distcp webhdfs://ip-172-31-6-148:14000/data/ webhdfs://ip-172-31-9-186:14000/sourcedata
...
17/08/31 10:37:11 INFO mapreduce.Job: The url to track the job: http://ip-172-31-9-186.ap-southeast-1.compute.internal:8088/proxy/application_1504187767743_0003/
17/08/31 10:37:11 INFO tools.DistCp: DistCp job-id: job_1504187767743_0003
17/08/31 10:37:11 INFO mapreduce.Job: Running job: job_1504187767743_0003
17/08/31 10:37:22 INFO mapreduce.Job: Job job_1504187767743_0003 running in uber mode : false
17/08/31 10:37:22 INFO mapreduce.Job:  map 0% reduce 0%
17/08/31 10:37:31 INFO mapreduce.Job:  map 33% reduce 0%
17/08/31 10:37:33 INFO mapreduce.Job:  map 100% reduce 0%
17/08/31 10:37:33 INFO mapreduce.Job: Job job_1504187767743_0003 completed successfully
...
[ec2-user@ip-172-31-8-141]$
Yarn Task Interface
After the task completes, check whether the data is consistent with the source cluster data
[ec2-user@ip-172-31-8-141]$ hadoop fs -ls /sourcedata
Found 2 items
drwxr-xr-x   - user_r supergroup          0 2017-08-31 10:37 /sourcedata/cdh-shell
-rw-r--r--   3 user_r supergroup    5279500 2017-08-31 10:37 /sourcedata/kudu.tar.gz
[ec2-user@ip-172-31-8-141]$ hadoop fs -ls /sourcedata/cdh-shell
Found 9 items
-rw-r--r--   3 user_r supergroup         60 2017-08-31 10:37 /sourcedata/cdh-shell/README.md
-rw-r--r--   3 user_r supergroup            2017-08-31 10:37 /sourcedata/cdh-shell/a.sh
-rw-r--r--   3 user_r supergroup       5470 2017-08-31 10:37 /sourcedata/cdh-shell/autouninstall.sh
-rw-r--r--   3 user_r supergroup            2017-08-31 10:37 /sourcedata/cdh-shell/b.sh
-rw-r--r--   3 user_r supergroup            2017-08-31 10:37 /sourcedata/cdh-shell/components.list
-rw-r--r--   3 user_r supergroup       2438 2017-08-31 10:37 /sourcedata/cdh-shell/delete.list
-rw-r--r--   3 user_r supergroup         52 2017-08-31 10:37 /sourcedata/cdh-shell/node.list
-rw-r--r--   3 user_r supergroup       1809 2017-08-31 10:37 /sourcedata/cdh-shell/ntp.conf
-rw-r--r--   3 user_r supergroup        125 2017-08-31 10:37 /sourcedata/cdh-shell/user.list
[ec2-user@ip-172-31-8-141]$
5.2 Kerberos to non-Kerberos cluster data replication
The Kerberos cluster is the source cluster and the non-Kerberos cluster is the target cluster.
The following steps copy data from the source cluster's /sourcedata directory to the target cluster's /data directory.
1. View the source cluster's /sourcedata directory
[ec2-user@ip-172-31-8-141]$ hadoop fs -ls /sourcedata
Found 2 items
drwxr-xr-x   - user_r supergroup          0 2017-08-31 10:37 /sourcedata/cdh-shell
-rw-r--r--   3 user_r supergroup    5279500 2017-08-31 10:37 /sourcedata/kudu.tar.gz
[ec2-user@ip-172-31-8-141]$ hadoop fs -ls /sourcedata/cdh-shell
Found 9 items
-rw-r--r--   3 user_r supergroup         60 2017-08-31 10:37 /sourcedata/cdh-shell/README.md
-rw-r--r--   3 user_r supergroup            2017-08-31 10:37 /sourcedata/cdh-shell/a.sh
-rw-r--r--   3 user_r supergroup       5470 2017-08-31 10:37 /sourcedata/cdh-shell/autouninstall.sh
-rw-r--r--   3 user_r supergroup            2017-08-31 10:37 /sourcedata/cdh-shell/b.sh
-rw-r--r--   3 user_r supergroup            2017-08-31 10:37 /sourcedata/cdh-shell/components.list
-rw-r--r--   3 user_r supergroup       2438 2017-08-31 10:37 /sourcedata/cdh-shell/delete.list
-rw-r--r--   3 user_r supergroup         52 2017-08-31 10:37 /sourcedata/cdh-shell/node.list
-rw-r--r--   3 user_r supergroup       1809 2017-08-31 10:37 /sourcedata/cdh-shell/ntp.conf
-rw-r--r--   3 user_r supergroup        125 2017-08-31 10:37 /sourcedata/cdh-shell/user.list
[ec2-user@ip-172-31-8-141]$
2. View the HDFS directory of the target cluster
[root@ip-172-31-6-148]# hadoop fs -ls /
Found 2 items
drwxrwxrwt   - hdfs supergroup          0 2017-08-30 15:36 /tmp
drwxrwxrwx   - hdfs supergroup          0 2017-08-31 09:08 /user
[root@ip-172-31-6-148]#
3. Kerberos user initialization on the source cluster command line
[root@ip-172-31-6-148]# klist
Ticket cache:
Default principal: user_r@CLOUDERA.COM
Valid starting     Expires            Service principal
08/31/17 09:22:26                     krbtgt/CLOUDERA.COM@CLOUDERA.COM
        renew until 09/07/17 09:22:24
[root@ip-172-31-6-148]#
4. Copy the data by doing the following (source HDFS mode)
The source cluster uses the HDFS protocol and the target cluster uses the WebHDFS protocol to copy the data
[ec2-user@ip-172-31-8-141]$ hadoop distcp hdfs://ip-172-31-8-141:8020/sourcedata/ webhdfs://ip-172-31-6-148:14000/data
...
17/08/31 10:50:26 INFO tools.DistCp: DistCp job-id: job_1504187767743_0004
17/08/31 10:50:26 INFO mapreduce.Job: Running job: job_1504187767743_0004
17/08/31 10:50:36 INFO mapreduce.Job: Job job_1504187767743_0004 running in uber mode : false
17/08/31 10:50:36 INFO mapreduce.Job:  map 0% reduce 0%
17/08/31 10:50:45 INFO mapreduce.Job:  map 33% reduce 0%
17/08/31 10:50:46 INFO mapreduce.Job:  map 100% reduce 0%
17/08/31 10:50:47 INFO mapreduce.Job: Job job_1504187767743_0004 completed successfully
...
[ec2-user@ip-172-31-8-141]$
Yarn Job View
The task runs successfully. Check whether the data replication is complete.
[root@ip-172-31-6-148]# hadoop fs -ls /data
Found 2 items
drwxr-xr-x   - user_r supergroup          0 2017-08-31 14:50 /data/cdh-shell
-rw-r--r--   3 user_r supergroup    5279500 2017-08-31 14:50 /data/kudu.tar.gz
[root@ip-172-31-6-148]# hadoop fs -ls /data/cdh-shell
Found 9 items
-rw-r--r--   3 user_r supergroup         60 2017-08-31 14:50 /data/cdh-shell/README.md
-rw-r--r--   3 user_r supergroup            2017-08-31 14:50 /data/cdh-shell/a.sh
-rw-r--r--   3 user_r supergroup       5470 2017-08-31 14:50 /data/cdh-shell/autouninstall.sh
-rw-r--r--   3 user_r supergroup        145 2017-08-31 14:50 /data/cdh-shell/b.sh
-rw-r--r--   3 user_r supergroup        498 2017-08-31 14:50 /data/cdh-shell/components.list
-rw-r--r--   3 user_r supergroup       2438 2017-08-31 14:50 /data/cdh-shell/delete.list
-rw-r--r--   3 user_r supergroup         52 2017-08-31 14:50 /data/cdh-shell/node.list
-rw-r--r--   3 user_r supergroup       1809 2017-08-31 14:50 /data/cdh-shell/ntp.conf
-rw-r--r--   3 user_r supergroup        125 2017-08-31 14:50 /data/cdh-shell/user.list
[root@ip-172-31-6-148]#
The data of the target cluster is consistent with that of the source cluster.
5. Copy the data by doing the following (source WebHDFS mode)
[ec2-user@ip-172-31-8-141]$ hadoop distcp webhdfs://ip-172-31-9-186:14000/sourcedata/ webhdfs://ip-172-31-6-148:14000/data
...
17/08/31 10:58:09 INFO tools.DistCp: DistCp job-id: job_1504187767743_0005
17/08/31 10:58:09 INFO mapreduce.Job: Running job: job_1504187767743_0005
17/08/31 10:58:20 INFO mapreduce.Job: Job job_1504187767743_0005 running in uber mode : false
17/08/31 10:58:20 INFO mapreduce.Job:  map 0% reduce 0%
17/08/31 10:58:36 INFO mapreduce.Job:  map 67% reduce 0%
17/08/31 10:58:37 INFO mapreduce.Job:  map 100% reduce 0%
17/08/31 10:58:37 INFO mapreduce.Job: Job job_1504187767743_0005 completed successfully
...
[ec2-user@ip-172-31-8-141]$
Yarn Task Interface
The task runs successfully. Check whether the data is complete.
[root@ip-172-31-6-148]# hadoop fs -ls /data
Found 2 items
drwxr-xr-x   - user_r supergroup          0 2017-08-31 14:58 /data/cdh-shell
-rw-r--r--   3 user_r supergroup    5279500 2017-08-31 14:58 /data/kudu.tar.gz
[root@ip-172-31-6-148]# hadoop fs -ls /data/cdh-shell
Found 9 items
-rw-r--r--   3 user_r supergroup         60 2017-08-31 14:58 /data/cdh-shell/README.md
-rw-r--r--   3 user_r supergroup            2017-08-31 14:58 /data/cdh-shell/a.sh
-rw-r--r--   3 user_r supergroup       5470 2017-08-31 14:58 /data/cdh-shell/autouninstall.sh
-rw-r--r--   3 user_r supergroup            2017-08-31 14:58 /data/cdh-shell/b.sh
-rw-r--r--   3 user_r supergroup            2017-08-31 14:58 /data/cdh-shell/components.list
-rw-r--r--   3 user_r supergroup       2438 2017-08-31 14:58 /data/cdh-shell/delete.list
-rw-r--r--   3 user_r supergroup         52 2017-08-31 14:58 /data/cdh-shell/node.list
-rw-r--r--   3 user_r supergroup       1809 2017-08-31 14:58 /data/cdh-shell/ntp.conf
-rw-r--r--   3 user_r supergroup            2017-08-31 14:58 /data/cdh-shell/user.list
[root@ip-172-31-6-148]#
6. Summary
When DistCp runs, the target directory is created automatically if it does not exist on the target cluster.
Note, however, that the result of the copy differs depending on whether the target directory already exists.
If the /sourcedata directory already exists when the DistCp command is run, the source cluster's /data directory itself is copied into the target cluster's /sourcedata directory. The /sourcedata directory then looks as follows:
[ec2-user@ip-172-31-8-141]$ hadoop fs -ls /sourcedata
Found 1 items
drwxr-xr-x   - user_r supergroup          0 2017-08-31 11:19 /sourcedata/data
[ec2-user@ip-172-31-8-141]$ hadoop fs -ls /sourcedata/data
Found 2 items
drwxr-xr-x   - user_r supergroup          0 2017-08-31 11:19 /sourcedata/data/cdh-shell
-rw-r--r--   3 user_r supergroup    5279500 2017-08-31 11:19 /sourcedata/data/kudu.tar.gz
[ec2-user@ip-172-31-8-141]$
If the /sourcedata directory does not exist when the DistCp command is run, only the files and subdirectories under the source cluster's /data directory are copied into the target cluster's /sourcedata directory (the /data directory level itself is not reproduced). The /sourcedata directory then looks as follows:
[ec2-user@ip-172-31-8-141]$ hadoop dfs -ls /sourcedata
Found 2 items
drwxr-xr-x   - user_r supergroup          0 2017-08-31 11:16 /sourcedata/cdh-shell
-rw-r--r--   3 user_r supergroup    5279500 2017-08-31 11:16 /sourcedata/kudu.tar.gz
[ec2-user@ip-172-31-8-141]$
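If you want the contents of the source directory to land directly under the target directory regardless of whether the target directory already exists, DistCp's -update option is the usual way to get that behavior: with -update, the contents of the source directory are copied to the target rather than the source directory itself. This was not verified in the tests above, so treat the following as a hedged sketch using the hosts of this test environment:
# With -update, /sourcedata receives the contents of /data whether or not /sourcedata already exists
hadoop distcp -update webhdfs://ip-172-31-6-148:14000/data hdfs://172.31.8.141:8020/sourcedata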