Duties of a Hadoop administrator (translation)

2025-02-22 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Recently I read an English document about Hadoop, which is actually part of a book. I found it very good: it lays out the responsibilities of a Hadoop administrator. Readers who work with Hadoop can use it to check whether they already have the knowledge and skills it describes.

Translation:

Duties of a Hadoop administrator

With growing interest in deriving insight from big data, organizations are actively planning and building their big data teams. To start working on their data, they need a good, solid infrastructure.

Once they have the infrastructure, they need controls and system policies in place to maintain, manage, and troubleshoot their clusters.

There is a growing demand for Hadoop administrators, whose work (setting up and maintaining Hadoop clusters) is what makes data analysis possible.

Hadoop administrators need to be very good at system operations, networking, operating systems, and storage. They need strong knowledge of computer hardware and how it operates in a complex network.

Apache Hadoop mainly runs on Linux, so good Linux skills such as monitoring, troubleshooting, configuration, and security management are a must.

Setting up nodes for a cluster involves a lot of repetitive work, so Hadoop administrators should bring up these servers quickly and efficiently with configuration management tools such as Puppet, Chef, and CFEngine.

In addition to these tools, administrators should also have good capacity planning skills to design and plan clusters.
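Capacity planning often starts with simple arithmetic. The sketch below is only an illustration, not a method from the book: the function name, the 25% headroom, and the 24 TB of usable disk per node are assumptions made up for this example. The replication factor of 3 is the HDFS default.

```python
import math

def estimate_datanodes(raw_data_tb, replication=3, overhead=0.25,
                       usable_tb_per_node=24.0):
    """Rough estimate of the number of DataNodes needed to hold raw_data_tb.

    overhead and usable_tb_per_node are illustrative assumptions; replace
    them with figures measured on your own hardware and workload.
    """
    if usable_tb_per_node <= 0:
        raise ValueError("usable_tb_per_node must be positive")
    stored_tb = raw_data_tb * replication      # HDFS stores `replication` copies
    required_tb = stored_tb * (1 + overhead)   # headroom for temporary data
    return math.ceil(required_tb / usable_tb_per_node)

# 100 TB * 3 copies * 1.25 headroom = 375 TB; 375 / 24 = 15.6, rounded up
print(estimate_datanodes(100))  # prints 16
```

A real plan would also account for data growth, compression, and compute requirements, which this sketch ignores.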

Several nodes in a cluster need their data duplicated. For example, the fsimage file of the namenode daemon can be configured to be written to two different disks on the same node or to a disk on a different node.

So Hadoop administrators need to understand NFS mount points and how to set them up within a cluster. Administrators may also be asked to set up RAID for the disks on specific nodes.

Because all Hadoop services and daemons are built on Java, a basic knowledge of the JVM (Java Virtual Machine) and the ability to understand Java exceptions will be very useful.

This knowledge can help administrators identify problems quickly.
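As a small illustration of spotting Java exceptions in daemon logs, the sketch below counts log levels and exception class names. It assumes lines start with a "date time,millis LEVEL" prefix, which is a common log4j layout but depends on how logging is configured on your cluster. The function name and the sample lines are made up for this example and are not real Hadoop output.

```python
import re
from collections import Counter

# Assumed line prefix: "2024-05-01 10:15:02,481 LEVEL ..."
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (?P<level>[A-Z]+) ")
# Fully qualified Java class names ending in Exception or Error
EXC_RE = re.compile(r"\b(?:[a-z_]\w*\.)+[A-Z]\w*(?:Exception|Error)\b")

def summarize_log(lines):
    """Return (level counts, exception counts) for an iterable of log lines."""
    levels, exceptions = Counter(), Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            levels[match.group("level")] += 1
        for name in EXC_RE.findall(line):
            exceptions[name] += 1
    return levels, exceptions

# Made-up sample lines in the assumed layout
sample = [
    "2024-05-01 10:15:02,481 INFO example.Service: started",
    "2024-05-01 10:15:03,007 ERROR example.Service: java.io.IOException: No space left on device",
    "\tat example.Service.run(Service.java:1)",
    "2024-05-01 10:15:04,120 WARN example.Service: Caused by: java.lang.OutOfMemoryError: Java heap space",
]
levels, exceptions = summarize_log(sample)
print(dict(levels))
print(dict(exceptions))
```

Counting the most frequent exception classes is usually a quick first step before reading the full stack traces.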

Hadoop administrators should have benchmarking skills to test cluster performance in high-traffic scenarios.

Clusters run continuously and regularly process large amounts of data, which makes them prone to failure. To monitor the health of the cluster, administrators need to deploy monitoring tools such as Nagios and Ganglia.

Administrators should also configure alerts and monitors for critical nodes so that issues are foreseen before they occur.
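A monitoring check can be very small. The sketch below reports disk usage using the status codes of the Nagios plugin convention (0 for OK, 1 for WARNING, 2 for CRITICAL). The function names and the 80% and 90% thresholds are assumptions chosen for illustration.

```python
import shutil

# Status codes follow the Nagios plugin convention
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}

def classify_usage(used_pct, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to a status code (thresholds are assumptions)."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK

def check_path(path, warn=80.0, crit=90.0):
    """Print a one-line status for the filesystem holding `path`; return the code."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total if usage.total else 0.0
    status = classify_usage(used_pct, warn, crit)
    print(f"DISK {LABELS[status]} - {path} is {used_pct:.1f}% full")
    return status

check_path(".")
```

To run this as an actual plugin, the script would pass the returned code to `sys.exit` so the monitoring system can read it as the process exit status.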

Good knowledge of a scripting language such as Python, Ruby, or Shell will greatly help Hadoop administrators.

Administrators are often asked to set up scheduled file staging from an external source into HDFS. Scripting skills help them fulfil these requests by writing scripts and automating them.
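A staging job of this kind could be sketched as follows. It builds one `hdfs dfs -put -f` command per matching local file and prints the commands instead of running them unless `dry_run` is turned off. The function names, the `*.csv` pattern, and the directory paths are hypothetical, and the `hdfs` client is assumed to be on the PATH.

```python
import subprocess
from pathlib import Path

def build_put_commands(local_dir, hdfs_dir, pattern="*.csv"):
    """Return one `hdfs dfs -put -f` command (argv list) per matching file."""
    files = sorted(Path(local_dir).glob(pattern))
    return [["hdfs", "dfs", "-put", "-f", str(f), hdfs_dir] for f in files]

def stage(local_dir, hdfs_dir, pattern="*.csv", dry_run=True):
    """Upload matching files; with dry_run=True only print the commands."""
    commands = build_put_commands(local_dir, hdfs_dir, pattern)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Raises CalledProcessError if the upload fails
            subprocess.run(cmd, check=True)
    return len(commands)

# Hypothetical paths; prints the commands without touching HDFS
stage("/data/incoming", "/landing/raw")
```

Scheduling could then be left to a tool such as cron, with `dry_run=False` once the printed commands look right.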

Most importantly, Hadoop administrators should have a good understanding of Apache Hadoop's architecture and its internal operations.

The following are some of the key Hadoop-related operations that Hadoop administrators should know:

Plan the cluster and decide on the number of nodes based on the estimated amount of data the cluster will serve.

Install and upgrade Apache Hadoop on the cluster.

Configure and tune Hadoop using the various configuration files available within Hadoop.

Understand all Hadoop daemons, their roles and responsibilities in the cluster.

Know how to read and interpret Hadoop logs.

Add nodes to and remove nodes from the cluster.

Rebalance nodes in the cluster.

Enforce security using an authentication and authorization system such as Kerberos.

Almost all organizations follow certain policies to back up their data, and it is the responsibility of Hadoop administrators to perform data backup.

So Hadoop administrators should be well versed in server backup and recovery operations.

Original text:

Responsibilities of a Hadoop administrator

With the increase in the interest to derive insight on their big data, organizations are now planning and building their big data teams aggressively.

To start working on their data, they need to have a good solid infrastructure.

Once they have this setup, they need several controls and system policies in place to maintain, manage, and troubleshoot their cluster.

There is an ever-increasing demand for Hadoop Administrators in the market, as their function (setting up and maintaining Hadoop clusters) is what makes analysis really possible.

The Hadoop administrator needs to be very good at system operations, networking, operating systems, and storage.

They need to have a strong knowledge of computer hardware and their operations, in a complex network.

Apache Hadoop, mainly, runs on Linux. So having good Linux skills such as monitoring, troubleshooting, configuration, and security is a must.

Setting up nodes for clusters involves a lot of repetitive tasks, and the Hadoop administrator should use quicker and efficient ways to bring up these servers using configuration management tools such as Puppet, Chef, and CFEngine.

Apart from these tools, the administrator should also have good capacity planning skills to design and plan clusters.

There are several nodes in a cluster that would need duplication of data. For example, the fsimage file of the namenode daemon can be configured to write to two different disks on the same node or on a disk on a different node.

An understanding of NFS mount points and how to set it up within a cluster is required.

The administrator may also be asked to set up RAID for disks on specific nodes.

As all Hadoop services/daemons are built on Java, a basic knowledge of the JVM along with the ability to understand Java exceptions would be very useful.

This helps administrators identify issues quickly.

The Hadoop administrator should possess the skills to benchmark the cluster to test performance under high traffic scenarios.

Clusters are prone to failures as they are up all the time and are processing large amounts of data regularly.

To monitor the health of the cluster, the administrator should deploy monitoring tools such as Nagios and Ganglia and should configure alerts and monitors for critical nodes of the cluster to foresee issues before they occur.

Knowledge of a good scripting language such as Python, Ruby, or Shell would greatly help the function of an administrator.

Often, administrators are asked to set up some kind of a scheduled file staging from an external source to HDFS.

The scripting skills help them execute these requests by building scripts and automating them.

Above all, the Hadoop administrator should have a very good understanding of the Apache Hadoop architecture and its inner workings.

The following are some of the key Hadoop-related operations that the Hadoop administrator should know:

Planning the cluster, deciding on the number of nodes based on the estimated amount of data the cluster is going to serve.

Installing and upgrading Apache Hadoop on a cluster.

Configuring and tuning Hadoop using the various configuration files available within Hadoop.

An understanding of all the Hadoop daemons along with their roles and responsibilities in the cluster.

The administrator should know how to read and interpret Hadoop logs.

Adding and removing nodes in the cluster.

Rebalancing nodes in the cluster.

Employ security using an authentication and authorization system such as Kerberos.

Almost all organizations follow the policy of backing up their data, and it is the responsibility of the administrator to perform this activity.

So, an administrator should be well versed with backups and recovery operations of servers.
