Overview and usage of Cloudera data encryption

Updated 2025-01-18. From: SLTechnology News&Howtos

This article gives an overview of Cloudera data encryption and how it is used: protecting data at rest, protecting data in transit, the encryption features available to each Hadoop component, and the technical layers at which encryption is applied.

01 - Overview of Cloudera data encryption

Encryption is the process of encoding various components (such as text, files, databases, passwords, applications, or network packets) with digital keys, so that only the appropriate entities (users, system processes, and so on) can decode (decrypt) the item and then view, modify, or add to the data. Cloudera provides encryption mechanisms to protect data that is persisted on disk or other storage media (data-at-rest encryption, or simply data encryption) as well as data that moves over the network (data-in-transit encryption).

In government, health, finance, education, and many other environments, data encryption is mandatory. For example, the Health Insurance Portability and Accountability Act (HIPAA) governs patient privacy, the Federal Information Security Management Act (FISMA) governs the security of US federal information systems, and the Payment Card Industry Data Security Standard (PCI DSS) governs information security for credit card processors. These are just a few examples.

Nevertheless, the large volumes of data held in a Cloudera cluster, which is deployed with many different components, must still support whatever level of privacy, confidentiality, and data integrity the use case requires. The encryption mechanisms supported by Cloudera and discussed in this overview are designed to achieve this goal.

02 - Protecting data at rest

Protecting data at rest usually means encrypting data stored on disk and allowing authorized users and processes, and only them, to decrypt the data when the application or task at hand requires it. With data-at-rest encryption, encryption keys must be distributed and managed, keys should be rotated or changed regularly (to reduce the risk of a key being compromised), and many other factors complicate the process.

However, encrypting data may not be enough. For example, administrators and others with sufficient privileges may have access to personally identifiable information (PII) in log files, audit data, or SQL queries. Depending on the specific use case, for example in a hospital or financial environment, PII may need to be redacted from all such files, so that users who have privileges on logs and queries that might contain sensitive data still cannot view that data when they should not.

Cloudera provides complementary methods to encrypt data at rest and provides mechanisms to mask PII in log files, audit data, and SQL queries.

Available encryption options

Cloudera provides a variety of mechanisms to secure sensitive data. CDH provides transparent HDFS encryption, which ensures that all sensitive data is encrypted before it is stored on disk. Combining HDFS encryption with the enterprise-class encryption key management of Navigator Key Trustee provides regulatory compliance for most enterprises. For Cloudera Enterprise, HDFS encryption can be supplemented with Navigator Encrypt to protect metadata in addition to the data itself. Because encryption runs in parallel across the data nodes, Cloudera clusters that use these solutions operate as usual with very little impact on performance, and encryption capacity scales as the cluster grows.

In addition, this transparent encryption is optimized for Intel chipsets to achieve high performance. Intel processors include the AES-NI instruction set, which provides hardware acceleration that lets encryption workloads run very fast. Cloudera takes advantage of the latest Intel technology advances to achieve faster performance.
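As a rough illustration of the AES-NI point above, the sketch below checks whether a Linux host advertises the `aes` CPU flag. It assumes an x86 Linux node where `/proc/cpuinfo` contains `flags` lines; the function name `has_aes_flag` is hypothetical and this is not a Cloudera tool.

```python
from pathlib import Path


def has_aes_flag(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Compare whole tokens so that flags such as 'vaes' do not count.
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("AES-NI advertised:", has_aes_flag(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this check only applies to Linux")
```

Running the check on each data node before enabling encryption is one way to spot hosts that would fall back to slower software AES.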

Key Trustee KMS is used together with Key Trustee Server and Key HSM to provide HSM-based protection for stored key material. Key Trustee KMS generates encryption zone key material locally on the KMS and then encrypts that key material with a key generated by the HSM. In contrast, the Navigator HSM KMS service relies on the HSM to generate and store all encryption zone keys. With Navigator HSM KMS, encryption zone key material originates on the HSM and never leaves it. This provides the highest level of key isolation, but adds the overhead of network calls to the HSM for key generation, encryption, and decryption operations. For most production scenarios, Key Trustee KMS remains the recommended key management solution for HDFS encryption.

The figure (not reproduced here) shows a sample deployment that uses:

Cloudera transparent HDFS encryption to encrypt data stored on HDFS

Navigator Encrypt to encrypt all other data (including metadata, logs, and spill data) associated with Cloudera Manager, Cloudera Navigator, Hive, and HBase

Navigator Key Trustee for robust, fault-tolerant key management

In addition to applying encryption to the data layer of the Cloudera cluster, encryption can also be applied at the network layer to encrypt the communication between the cluster nodes.

Encryption does not prevent administrators who have full access to the cluster from viewing sensitive data. To obscure sensitive data, including PII, you can configure the cluster for data redaction.

Data redaction for Cloudera clusters

Redaction is a process that obscures data. It helps organizations comply with industry regulations and standards such as PCI (Payment Card Industry) and HIPAA by making personally identifiable information (PII) unreadable except to those whose jobs require such access. For example, HIPAA legislation requires that patient PII is available only to the appropriate medical professionals (and the patient), and that patient PII exposed outside that context cannot be used to identify an individual or to associate an identity with health data. Data redaction helps ensure this privacy by converting PII to a meaningless pattern, for example by converting a US Social Security number to the string XXX-XX-XXXX.

Data redaction works separately from Cloudera encryption, which does not prevent administrators who have full access to the cluster from viewing sensitive user data. Redaction ensures that cluster administrators, data analysts, and others cannot see PII or other sensitive data that is outside their job domain, while still allowing users with the appropriate permissions to access the data they have privileges on.
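The pattern substitution described above can be sketched in a few lines. This is only an illustration of the idea, not Cloudera's redaction implementation; the names `SSN_PATTERN` and `redact_ssn` are hypothetical, and the regular expression is an assumption about what an SSN-shaped value looks like.

```python
import re

# Matches SSN-shaped values such as 123-45-6789. The word boundaries keep
# the pattern from matching inside longer digit runs.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssn(text: str) -> str:
    """Replace every SSN-shaped value in text with a fixed placeholder."""
    return SSN_PATTERN.sub("XXX-XX-XXXX", text)


if __name__ == "__main__":
    query = "SELECT * FROM patients WHERE ssn = '123-45-6789'"
    print(redact_ssn(query))
```

Applied to a log line or a SQL query string, the sensitive value is replaced while the rest of the text stays readable for troubleshooting.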

03 - Protecting data in transit

For data in transit, implementing data protection and encryption is relatively easy. Wire encryption is built into the Hadoop stack (for example, SSL) and usually does not require an external system. Data-in-transit encryption is built on session-level, one-time keys that are established through a session handshake and used for the immediate and subsequent transmissions. Because the keys are temporary, data in transit avoids many of the key management problems associated with data at rest. It does, however, rely on correct authentication: a compromised certificate is an authentication problem, but it can also break wire encryption. As the name implies, data in transit covers the secure transfer and intermediate storage of data. This applies to all process-to-process communication, within a node or between nodes. There are three main communication channels:

HDFS transparent encryption: data encrypted with HDFS transparent encryption is protected end to end. Any data written to or read from HDFS can only be encrypted or decrypted by the client. HDFS has no access to the unencrypted data or to the encryption keys. This supports both at-rest encryption and in-transit encryption.

Data transfer: the first channel is data transfer, which includes reading data blocks from and writing data blocks to HDFS. Hadoop uses a SASL-enabled wrapper around its native direct TCP/IP-based transport, called DataTransferProtocol, to protect the I/O streams within a DIGEST-MD5 envelope. This procedure also uses secure HadoopRPC (see remote procedure calls below) for key exchange. However, the HttpFS REST interface does not provide secure communication between the client and HDFS, only secure authentication using SPNEGO.

To transfer data between DataNodes during the shuffle phase of a MapReduce job (that is, when moving intermediate results between the Map and Reduce parts of the job), Hadoop uses Transport Layer Security (TLS) to protect the communication channel through HTTP Secure (HTTPS).

Remote procedure calls: the second channel is calls to remote procedures (RPC) on the various systems and frameworks in the Hadoop cluster. As with data transfer, Hadoop has its own native RPC protocol, called HadoopRPC, which is used for Hadoop API client communication, communication between Hadoop services, and monitoring, heartbeats, and other non-data, non-user activity. HadoopRPC supports SASL for secure transport and defaults to Kerberos and DIGEST-MD5, depending on the traffic type and security settings.

User interface: the third channel covers the various web-based user interfaces in the Hadoop cluster. For secure transport, the solution is simple: these interfaces use HTTPS.
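As a hedged sketch of how the three channels map onto configuration, the snippet below renders `<configuration>` fragments for a few standard Apache Hadoop properties. The property names are real Hadoop settings, but the chosen values, the `CHANNEL_SETTINGS` grouping, and the `render_site_xml` helper are assumptions made for illustration; on a Cloudera cluster these options are normally set through Cloudera Manager rather than by editing files directly.

```python
import xml.etree.ElementTree as ET

# Standard Apache Hadoop property names; the values are illustrative choices.
CHANNEL_SETTINGS = {
    "core-site.xml": {
        # RPC channel: "privacy" adds encryption on top of authentication
        # and integrity checks.
        "hadoop.rpc.protection": "privacy",
    },
    "hdfs-site.xml": {
        # Data transfer channel: encrypt block data sent over DataTransferProtocol.
        "dfs.encrypt.data.transfer": "true",
        # User interface channel: serve HDFS web endpoints over HTTPS only.
        "dfs.http.policy": "HTTPS_ONLY",
    },
    "mapred-site.xml": {
        # Data transfer channel: encrypted shuffle between map and reduce tasks.
        "mapreduce.shuffle.ssl.enabled": "true",
    },
}


def render_site_xml(properties: dict) -> str:
    """Render a name/value mapping as a Hadoop-style <configuration> element."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for filename, properties in CHANNEL_SETTINGS.items():
        print(filename)
        print(render_site_xml(properties))
```

Grouping the settings by file makes it easier to see which channel each property protects before applying the equivalent options in the management console.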

Overview of TLS / SSL certificates

You can sign a certificate in three different ways:

Certificate signed by a public CA: Recommended. Using a certificate signed by a trusted public CA simplifies deployment because the default Java client already trusts most public CAs. Obtain certificates from a trusted, well-known (public) CA, such as Symantec or Comodo.

Certificate signed by an internal CA: If your organization has its own certificate authority, obtain the certificate from that internal CA. Using an internal CA can reduce costs, although cluster configuration may require establishing a trust chain for certificates signed by the internal CA, depending on your IT infrastructure.

Self-signed certificate: Not recommended for production deployments. Using self-signed certificates requires configuring every client to trust the specific certificate (in addition to generating and distributing the certificates). However, self-signed certificates are suitable for non-production (test or proof-of-concept) deployments.

TLS / SSL encryption of CDH components

Cloudera recommends that you use Kerberos authentication to protect the cluster before enabling encryption such as SSL on the cluster. If you enable SSL for a cluster that has not been configured for Kerberos authentication, a warning is displayed.

Hadoop services differ in how they use SSL, as follows:

The HDFS, MapReduce, and YARN daemons act as both SSL servers and SSL clients.

The HBase daemons act only as SSL servers.

The Oozie daemon acts only as an SSL server.

Hue acts as an SSL client to all of the above.

At startup, a daemon that acts as an SSL server loads its keystore. When a client connects to the SSL server daemon, the server sends the certificate it loaded at startup to the client, which then uses its truststore to verify the server's certificate.
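The client side of this handshake can be sketched with Python's standard `ssl` module. Hadoop daemons use Java keystores and truststores, so this is only an analogy: the optional `ca_file` argument is a hypothetical path to a PEM bundle that plays the role of the truststore, and `build_client_context` is a name chosen for this example.

```python
import ssl
from typing import Optional


def build_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a TLS client context that verifies the server certificate.

    ca_file is a hypothetical path to a PEM bundle acting as the truststore
    (for example, an internal CA's root certificate). When it is None, the
    platform's default trusted CAs are used.
    """
    # create_default_context enables certificate verification and
    # hostname checking for client-side use.
    return ssl.create_default_context(cafile=ca_file)


if __name__ == "__main__":
    context = build_client_context()
    print("verifies certificates:", context.verify_mode == ssl.CERT_REQUIRED)
    print("checks hostname:", context.check_hostname)
```

A client would then call `context.wrap_socket(sock, server_hostname=host)` on a connected socket; the handshake fails if the server's certificate does not chain to a CA in the truststore, which is why self-signed certificates need per-client trust configuration.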

For information about setting up SSL / TLS for CDH services, see the applicable component guide.

04 - Data protection in Hadoop projects

The following table lists the various encryption features that CDH components and Cloudera Manager can take advantage of.

| Project | Encryption for data in transit | Encryption for data at rest (HDFS Encryption + Navigator Encrypt + Navigator Key Trustee) |
|---|---|---|
| HDFS | SASL (RPC), SASL (DataTransferProtocol) | Yes |
| MapReduce | SASL (RPC), HTTPS (encrypted shuffle) | Yes |
| YARN | SASL (RPC) | Yes |
| Accumulo | Partial: only for RPCs and Web UI (not directly configurable in Cloudera Manager) | Yes |
| Flume | TLS (Avro RPC) | Yes |
| HBase | SASL: for web interfaces, inter-component replication, the HBase shell, and the REST, Thrift 1, and Thrift 2 interfaces | Yes |
| HiveServer2 | SASL (Thrift), SASL (JDBC), TLS (JDBC, ODBC) | Yes |
| Hue | TLS | Yes |
| Impala | TLS or SASL between impalad and clients, but not between daemons | |
| Oozie | TLS | Yes |
| Pig | N/A | Yes |
| Search | TLS | Yes |
| Sentry | SASL (RPC) | Yes |
| Spark | None | Yes |
| Sqoop | Partial: depends on the RDBMS database driver in use | Yes |
| Sqoop2 | Partial: you can encrypt the JDBC connection, depending on the RDBMS database driver | Yes |
| ZooKeeper | SASL (RPC) | No |
| Cloudera Manager | TLS: does not include monitoring | Yes |
| Cloudera Navigator | TLS: also see Cloudera Manager | Yes |
| Backup and Disaster Recovery | TLS: also see Cloudera Manager | Yes |

05 - Overview of encryption mechanisms

Data-at-rest and data-in-transit encryption work at different technical layers of the cluster:

Application: HDFS transparent encryption is applied by the HDFS client software and lets you encrypt specific folders contained in HDFS. To store the required encryption keys securely, Cloudera recommends using Cloudera Navigator Key Trustee Server together with HDFS encryption. You can also encrypt data that CDH components, including Impala, MapReduce, YARN, and HBase, temporarily store on the local file system outside HDFS.

Operating system: At the Linux OS file system layer, encryption can be applied to an entire volume. For example, Cloudera Navigator Encrypt can encrypt data inside and outside HDFS, such as temporary and spill files, configuration files, and the databases that store metadata associated with the CDH cluster. Cloudera Navigator Encrypt runs as a Linux kernel module and is part of the operating system. Navigator Encrypt requires a Cloudera Navigator license and must be configured to use Navigator Key Trustee Server.

Network: Network communication between client and server processes (HTTP, RPC, or TCP/IP services) can be encrypted using industry-standard TLS/SSL.
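For the application layer, encrypting a specific HDFS folder comes down to creating a key and an encryption zone. The sketch below only assembles the standard `hadoop key` and `hdfs crypto` command lines; the helper name `encryption_zone_commands`, the key name, and the path are hypothetical, and actually running the commands assumes a cluster with a configured KMS and a user with the required HDFS privileges.

```python
from typing import List


def encryption_zone_commands(key_name: str, path: str) -> List[List[str]]:
    """Return the CLI invocations that set up an HDFS encryption zone."""
    return [
        # Create the encryption zone key in the configured key provider (KMS).
        ["hadoop", "key", "create", key_name],
        # The zone must be created on an existing, empty directory.
        ["hdfs", "dfs", "-mkdir", "-p", path],
        # Turn the directory into an encryption zone backed by the key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", path],
        # List zones to confirm that the new zone exists.
        ["hdfs", "crypto", "-listZones"],
    ]


if __name__ == "__main__":
    # "demo_key" and "/data/secure" are placeholder values for illustration.
    for command in encryption_zone_commands("demo_key", "/data/secure"):
        print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` on a gateway host; files written under the zone are then encrypted and decrypted transparently by the HDFS client.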

Source: https://docs.cloudera.com/cloudera-manager/7.0.3/security-overview/topics/cm-security-encryption-overview.html
