This article mainly explains which projects make up Hadoop. The content is simple, easy to understand, and clearly organized; I hope it resolves your doubts and that the editor can lead you through "which projects make up Hadoop" below.
Hadoop consists of several projects; the overall structure is as follows.
Hadoop Common: The module at the base of the Hadoop architecture. It provides common utilities for the Hadoop subprojects, such as configuration handling and logging.
HDFS: A distributed file system that provides high-throughput access to application data. To external clients, HDFS looks like a traditional hierarchical file system: files can be created, deleted, moved, renamed, and so on. Internally, however, the HDFS architecture is built around a specific set of nodes, a consequence of its design. These are the NameNode (only one per cluster), which provides metadata services for HDFS, and the DataNodes, which provide storage blocks. Because there is only one NameNode, it is a single point of failure, a known weakness of HDFS.
Files stored in HDFS are divided into blocks, which are then replicated across multiple machines (DataNodes); this is very different from traditional RAID architectures. The block size (typically 64 MB) and the replication factor are set by the client when the file is created. The NameNode coordinates all file operations, and all communication within HDFS is based on the standard TCP/IP protocol.
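To make the client-side view concrete, here is a minimal sketch using the standard org.apache.hadoop.fs API: it writes a file and reads back the block size and replication factor that were fixed at creation time. The NameNode address and file path are placeholder assumptions, not values from this article.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address; point this at your own cluster.
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/tmp/hdfs-demo.txt");
            // The NameNode records the file's metadata; the bytes themselves
            // land in blocks on DataNodes.
            try (FSDataOutputStream out = fs.create(file)) {
                out.writeBytes("hello hdfs\n");
            }
            // Block size and replication were determined at creation time.
            System.out.println("block size  = " + fs.getFileStatus(file).getBlockSize());
            System.out.println("replication = " + fs.getFileStatus(file).getReplication());
        }
    }
}
```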
MapReduce: A software framework for distributed processing of massive data sets on compute clusters.
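The classic illustration of the model is word count: the map phase emits a (word, 1) pair for every word, and the reduce phase sums the counts per word. Below is a minimal sketch using the org.apache.hadoop.mapreduce API; the input and output paths come from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: split each input line into words and emit (word, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts collected for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```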
Avro: An RPC and data serialization project led by Doug Cutting, similar in spirit to Google's Protocol Buffers and Facebook's Thrift. Avro was intended to serve as Hadoop's RPC layer, giving Hadoop's RPC module faster communication and a more compact data format.
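As a sketch of the serialization side (not of Hadoop's internal RPC), here is a small Java example using Avro's GenericRecord API to write and read a record in Avro's compact binary container format. The schema, field names, and file name are illustrative assumptions.

```java
import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroDemo {
    // Illustrative schema: a record with a string and an int field.
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"int\"}]}";

    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");
        user.put("age", 30);

        // Write the record in Avro's binary container format.
        File file = new File("users.avro");
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, file);
            writer.append(user);
        }

        // Read it back; the schema travels with the file.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord r : reader) {
                System.out.println(r.get("name") + " / " + r.get("age"));
            }
        }
    }
}
```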
Hive: Similar to CloudBase, a piece of software that provides data-warehouse SQL capabilities on top of the Hadoop distributed computing platform. It simplifies aggregation and ad hoc querying of the massive data sets stored in Hadoop. Hive provides HiveQL, an SQL-based query language that is very convenient to use.
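A minimal sketch of running a HiveQL aggregation from Java over JDBC, assuming a running HiveServer2 instance and the hive-jdbc driver on the classpath; the host, port, table name, and query are all placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 address and database.
        String url = "jdbc:hive2://hive-host:10000/default";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // An SQL-like aggregation that Hive compiles into MapReduce jobs.
             ResultSet rs = stmt.executeQuery(
                 "SELECT category, COUNT(*) FROM page_views GROUP BY category")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```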
HBase: An open-source, scalable, column-oriented distributed database built on the Hadoop Distributed File System, supporting structured data storage for very large tables.
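To show the row-key / column-family / qualifier data model in practice, here is a minimal sketch with the HBase Java client that writes and reads one cell. The ZooKeeper quorum address is a placeholder, and the "users" table with an "info" column family is assumed to already exist.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Placeholder ZooKeeper quorum used by the HBase client.
        conf.set("hbase.zookeeper.quorum", "zk-host");

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {
            // Column-oriented write: row key -> column family -> qualifier -> value.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
            table.put(put);

            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(name));
        }
    }
}
```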
Pig: A high-level data-flow language and execution framework for parallel computation. Its SQL-like language is a high-level query language built on top of MapReduce: Pig compiles its operations into the Map and Reduce phases of the MapReduce model, and users can define their own functions.
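A sketch of driving Pig from Java with the PigServer API, running the same word count as the MapReduce example but expressed as a data flow; the input and output paths are placeholders, and local mode is assumed for simplicity.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigWordCount {
    public static void main(String[] args) throws Exception {
        // Local mode for illustration; use ExecType.MAPREDUCE against a cluster.
        PigServer pig = new PigServer(ExecType.LOCAL);
        // Each statement is one step in the data flow; Pig compiles the whole
        // flow into Map and Reduce stages when the result is stored.
        pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
        pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
        pig.registerQuery("grouped = GROUP words BY word;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");
        pig.store("counts", "wordcount-out");
    }
}
```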
ZooKeeper: An open-source implementation of Google's Chubby. It is a reliable coordination service for large-scale distributed systems, providing functions such as configuration maintenance, naming, distributed synchronization, and group services. ZooKeeper's goal is to encapsulate complex and error-prone coordination services behind easy-to-use interfaces in an efficient, stable system.
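As a sketch of the configuration-maintenance use case, here is a minimal Java client that stores and reads a setting under a znode. The ensemble address, znode path, and value are illustrative assumptions.

```java
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkConfigDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // Placeholder ensemble address; connection is asynchronous, so wait
        // for the SyncConnected event before issuing requests.
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 5000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // Configuration maintenance: keep a shared setting in a znode.
        String path = "/demo-config";
        if (zk.exists(path, false) == null) {
            zk.create(path, "on".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        byte[] value = zk.getData(path, false, null);
        System.out.println(path + " = " + new String(value));
        zk.close();
    }
}
```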
Chukwa: A data collection system, contributed by Yahoo, for managing large distributed systems.
Cassandra: A scalable multi-master database with no single point of failure.
Mahout: A scalable machine learning and data mining library.
That's all for "which projects make up Hadoop". Thanks for reading! I believe everyone now has a certain understanding of the topic, and I hope sharing this content has been helpful. If you want to learn more, you are welcome to follow the industry information channel!