

How to protect the Hadoop environment

2025-02-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article gives you a detailed explanation of how to protect the Hadoop environment. The editor thinks it is very practical, so it is shared here as a reference; I hope you get something out of it after reading.

Knox and Ranger simplify security management

The Hadoop ecosystem has resources to support security. Knox and Ranger are two important Apache open source projects. Knox provides a framework for managing security and supports security implementation on Hadoop clusters.

The Ranger project focuses on developing tools and technologies to help users deploy and standardize security across Hadoop clusters. It provides a centralized framework for managing resource-level policies, such as files, folders, databases, and even specific rows and columns in the database. Ranger helps administrators implement access policies by group, data type, and so on. Ranger has different authorization functions for different Hadoop components, such as YARN, HBase, Hive, and so on.
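
As a rough illustration of how those centralized policies can be inspected programmatically, the sketch below queries a Ranger Admin server's public REST API for the policies of one service. The host name, credentials, and service name "hadoopdev" are illustrative assumptions, not values from this article.

# Minimal sketch: list resource policies from a Ranger Admin server via its
# public REST API. Host, credentials, and service name are hypothetical.
import requests

RANGER_ADMIN = "http://ranger.example.com:6080"   # hypothetical Ranger Admin URL
AUTH = ("admin", "admin-password")                # placeholder credentials

# Fetch the policies defined for a (hypothetical) Hive service named "hadoopdev".
resp = requests.get(
    f"{RANGER_ADMIN}/service/public/v2/api/policy",
    params={"serviceName": "hadoopdev"},
    auth=AUTH,
)
resp.raise_for_status()

for policy in resp.json():
    print(policy["name"], policy.get("resources", {}))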

Knox is a REST API gateway developed within the Apache community to support monitoring, authorization management, auditing, and policy enforcement for Hadoop clusters. It provides a single access point for all REST interactions with the cluster. With Knox, system administrators can manage authentication through LDAP and Active Directory, federate identity management based on HTTP headers, and audit access across the cluster. Knox also supports enhanced security because it can be integrated with enterprise identity management solutions and is compatible with Kerberos.
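
To make the "single access point" idea concrete, here is a minimal sketch of a client listing an HDFS directory through a Knox gateway instead of talking to the cluster directly. The gateway URL, topology name "default", credentials, and certificate path are assumptions for illustration only.

# Minimal sketch: list an HDFS directory through a Knox gateway's single REST entry point.
import requests

KNOX_GATEWAY = "https://knox.example.com:8443/gateway/default"  # hypothetical topology "default"
AUTH = ("alice", "alice-password")   # Knox checks these credentials against LDAP/AD

resp = requests.get(
    f"{KNOX_GATEWAY}/webhdfs/v1/tmp",
    params={"op": "LISTSTATUS"},
    auth=AUTH,
    verify="/etc/security/knox-gateway.pem",  # gateway TLS certificate (illustrative path)
)
resp.raise_for_status()
print(resp.json()["FileStatuses"]["FileStatus"])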

Hadoop encryption

The original version of Hadoop did not include encryption. Later versions added end-to-end encryption to protect data at rest in a Hadoop cluster and data in motion across the network. In current versions, all data stored in HDFS or accessed through HDFS can be encrypted. Hadoop supports encryption at the disk, file system, database, and application levels.

Within core Hadoop, HDFS has directories called encryption zones. When data is written to Hadoop, it is automatically encrypted (using an algorithm of the user's choice) and placed in an encryption zone. Encryption is per file, not per zone: each file in the zone is encrypted with its own unique data encryption key (DEK). The DEK is stored only in wrapped form as an encrypted data encryption key (EDEK); the client has the EDEK unwrapped back into a DEK and then uses the DEK to read and write data in HDFS. Encryption zones and DEK encryption occur between the file system and database levels of the architecture.
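
A short sketch of how an encryption zone is typically set up with the standard Hadoop command-line tools, driven here from Python. The key name and directory path are illustrative, not taken from this article.

# Minimal sketch: create a KMS key and an HDFS encryption zone.
import subprocess

# 1. Create a zone key in the KMS (used to wrap each file's DEK into an EDEK).
subprocess.run(["hadoop", "key", "create", "patient_zone_key"], check=True)

# 2. Create an empty directory and mark it as an encryption zone tied to that key.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", "/secure/patients"], check=True)
subprocess.run(
    ["hdfs", "crypto", "-createZone", "-keyName", "patient_zone_key",
     "-path", "/secure/patients"],
    check=True,
)

# Files written under /secure/patients are now encrypted transparently,
# each with its own DEK wrapped by patient_zone_key.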

Encryption keys need to be managed, and that is the job of the Hadoop Key Management Server (KMS). KMS generates encryption keys, manages access to stored keys, and handles encryption and decryption on behalf of HDFS clients. KMS is a Java web application with client and server components that communicate with each other over HTTP using a REST API. Security in KMS includes HTTPS transport and support for HTTP SPNEGO Kerberos authentication.
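
As a rough sketch of that REST API, the snippet below reads the metadata of one key. The host, port, and the simple "user.name" authentication are assumptions for illustration; a production KMS would normally require SPNEGO/Kerberos over HTTPS.

# Minimal sketch: query key metadata from the Hadoop KMS REST API.
import requests

KMS_URL = "http://kms.example.com:9600/kms/v1"   # hypothetical KMS endpoint

resp = requests.get(
    f"{KMS_URL}/key/patient_zone_key/_metadata",
    params={"user.name": "hdfs"},                # only meaningful with simple auth
)
resp.raise_for_status()
print(resp.json())   # key name, cipher, length, and version information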

Hadoop distribution vendors and other solution providers have developed additional security features, some specific to their products and some applicable to the entire Hadoop environment. Many enhancements focus on protecting Hadoop data as it moves. For example, wire encryption can be applied to Hadoop data transmitted over HTTP, RPC, the Data Transfer Protocol (DTP), and JDBC. It provides SSL/TLS protection for JDBC clients and the MapReduce shuffle, SASL protection for Hadoop RPC, and so on.
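
For orientation, these are the kinds of properties (normally set in core-site.xml, hdfs-site.xml, and mapred-site.xml) that enable wire encryption. They are shown as a Python dict purely for readability; the values are illustrative examples, not a recommended template.

# Illustrative sketch of wire-encryption settings (not a configuration file).
wire_encryption = {
    "hadoop.rpc.protection": "privacy",            # SASL: encrypt Hadoop RPC
    "dfs.encrypt.data.transfer": "true",           # encrypt the HDFS data transfer protocol
    "dfs.data.transfer.protection": "privacy",     # SASL protection for DataNode transfers
    "mapreduce.shuffle.ssl.enabled": "true",       # TLS for the MapReduce shuffle
    "dfs.http.policy": "HTTPS_ONLY",               # HTTPS for HDFS web endpoints
}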

Hadoop authentication

Authentication in the Hadoop environment has developed rapidly and extensively. The original version of Hadoop did not contain any provisions for authenticating users, because it was a limited project designed for use in a trusted environment. Subsequent releases added limited permission management for files in HDFS, but Hadoop still did not provide enterprise-class authentication security. Fast forward to today: the user authentication and identity management solutions that enterprises use for their core IT infrastructure can be extended to Hadoop environments.

Today, Hadoop can be configured in secure or non-secure mode. The main difference is that secure mode requires authentication for every user and service. Kerberos is the basis of authentication in Hadoop's secure mode, and data can be encrypted as part of the authentication process.
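
A minimal sketch of what switching into secure mode looks like: the two core-site.xml settings that enable Kerberos, plus the ticket a client obtains before running any Hadoop command. The principal and keytab path are hypothetical.

# Illustrative sketch: secure-mode settings and a client-side Kerberos login.
import subprocess

secure_mode = {
    "hadoop.security.authentication": "kerberos",  # default is "simple" (non-secure)
    "hadoop.security.authorization": "true",       # enable service-level authorization checks
}

# Obtain a Kerberos ticket from a keytab before talking to the cluster.
subprocess.run(
    ["kinit", "-kt", "/etc/security/keytabs/alice.keytab", "alice@EXAMPLE.COM"],
    check=True,
)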

Many organizations use their Active Directory or LDAP solutions to perform authentication in a Hadoop environment. This approach was previously incompatible with Hadoop, and its availability is a good illustration of how much Hadoop has matured. The Knox gateway from the Apache community is used to extend Active Directory or LDAP to the Hadoop cluster, and it can also be used to extend federated identity management solutions to the environment.

Hadoop access and permissions

Authenticating a user or service request does not automatically grant it unrestricted access to all data in the Hadoop cluster. Access permissions can be set for parts of HDFS, or even for specific files and data types. As mentioned earlier, Ranger facilitates the establishment and enforcement of permissions, and other resources are also available. The HDFS Permissions Guide is the component that lets administrators set permissions on HDFS directories and files; permissions can be set at the group and individual level and control who can read files, update them, delete them, and so on. Service-level authorization is a separate feature that verifies that a client trying to connect to a particular Hadoop service is authorized to use that service; like HDFS file permissions, it supports individual and group permissions. Sentry is a module used with Hive, HDFS data, Impala, and other components to provide more granular permission management for data and metadata in Hadoop clusters.
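
As a small sketch of group- and individual-level permissions in practice, the snippet below applies the standard HDFS shell commands from Python. The paths, users, and groups are illustrative.

# Minimal sketch: set ownership, mode, and an extra ACL entry on an HDFS path.
import subprocess

def hdfs(*args):
    subprocess.run(["hdfs", "dfs", *args], check=True)

hdfs("-chown", "etl_svc:analytics", "/data/claims")           # owner and group
hdfs("-chmod", "750", "/data/claims")                          # owner rwx, group r-x, others none
hdfs("-setfacl", "-m", "group:auditors:r-x", "/data/claims")   # read-only access for a second group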

Ranger, the HDFS Permissions Guide, service-level authorization, and Sentry are all part of the Apache Hadoop ecosystem. In addition, third-party and commercial distributions of Hadoop build in other permission protections as well as related security and management features.

Enterprises also typically use commercial solutions to perform data masking in Hadoop. Data masking is the practice of hiding original data records (for example, through encryption) so that unauthorized users cannot access them. Masking is common in big data environments because many applications need only some information from a dataset rather than complete records. For example, an imaging clinic may need to know how many patients at an institution have had a mammogram within six months, without seeing the patients' results or complete medical histories. The patient outcome data would instead be available to clinicians, who need different permissions.
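
A minimal sketch of the masking idea: expose only the fields an application needs and replace direct identifiers with a keyed hash. The field names and secret key are illustrative; real deployments usually rely on the masking features of their Hadoop distribution or a dedicated tool rather than ad hoc code like this.

# Illustrative sketch of data masking for a patient record.
import hashlib, hmac

MASKING_KEY = b"replace-with-a-managed-secret"

def mask_record(record):
    return {
        "patient_token": hmac.new(MASKING_KEY, record["patient_id"].encode(),
                                  hashlib.sha256).hexdigest(),
        "had_mammogram_6m": record["had_mammogram_6m"],
        # diagnosis, outcomes, and history are deliberately omitted
    }

print(mask_record({"patient_id": "P-1001", "had_mammogram_6m": True,
                   "diagnosis": "..."}))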

Past and present Hadoop security

Module | Original version of Hadoop | Now included / available
------ | -------------------------- | -------------------------
Encryption | Not included | DEK encryption is automatically applied to HDFS and to data in motion; commercial distributions add their own data protection features; KMS manages encryption keys; Kerberos is commonly used; other Hadoop-compatible encryption methods are available.
Authentication | None | Kerberos is the foundation of the Hadoop security model; Active Directory and LDAP extend to Hadoop; identity management solutions extend to Hadoop.
Access and permissions | HDFS file permissions | Permissions can be set by individual, group, and role, or for specific data types and files; data masking can be applied to restrict access to data.

Using Hadoop to solve the security problems of big data

Big data is, by definition, big, but a one-size-fits-all security approach is not appropriate. The features in Hadoop let organizations tailor security to the user, compliance, and corporate requirements of each individual data asset in the Hadoop environment. Features such as roles, user and group permissions, data masking, and multiple encryption and authentication options make it feasible to provide different levels of security within a single large environment. The growing integration between Hadoop and Active Directory, LDAP, and identity management solutions lets organizations extend their enterprise security solutions, so the Hadoop infrastructure does not have to become an island.

Security is one of the fastest-changing aspects of Hadoop; its features are constantly being enhanced and superseded. For the latest security updates, check with the relevant Apache project or your Hadoop distribution vendor.

This is the end of the article on "how to protect the Hadoop environment". I hope the above content is of some help to you and that you have learned something new. If you think the article is good, please share it so that more people can see it.



