This article offers a detailed analysis of the Hadoop distributed file system. It is shared here as a practical reference, and I hope you gain something from reading it.
Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is designed as a distributed file system that runs on commodity hardware. It has much in common with existing distributed file systems, but it also differs from them in important ways. HDFS is highly fault-tolerant and is designed to be deployed on inexpensive machines. It provides high-throughput data access and is well suited to applications with large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project and is now part of the Apache Hadoop Core project. The project address is http://hadoop.apache.org/core/.
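As a quick orientation, here is a minimal sketch of how an application talks to HDFS through Hadoop's Java FileSystem API. The Namenode address "hdfs://namenode:8020", the path, and the class name are placeholders for illustration, not values from this article.

```java
// Minimal sketch: connect to an HDFS cluster and query the namespace.
// The Namenode address and path below are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the Namenode; in practice this usually
        // comes from core-site.xml on the classpath.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem fs = FileSystem.get(conf);
        // Ask the Namenode whether a path exists in the namespace.
        System.out.println(fs.exists(new Path("/user")));
        fs.close();
    }
}
```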
Assumptions and design goals
Hardware failure
Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of servers, each storing part of the file system's data. Because the number of components is huge and any component can fail, some part of HDFS is effectively always non-functional. Detection of faults and quick, automatic recovery from them are therefore core architectural goals of HDFS.
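Block replication, described later in this article, is the mechanism behind that automatic recovery. As a hedged illustration, an application can inspect or raise a file's replication factor through the FileSystem API; the path and target value below are placeholders.

```java
// Sketch: read and adjust a file's replication factor.
// The path "/data/events.log" and the value 5 are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/events.log");   // hypothetical file

        FileStatus status = fs.getFileStatus(file);
        System.out.println("current replication: " + status.getReplication());

        // Ask the Namenode to keep more copies of this file's blocks.
        fs.setReplication(file, (short) 5);
        fs.close();
    }
}
```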
Streaming data access
Applications that run on HDFS need streaming access to their data sets; they are not the general-purpose applications that run on ordinary file systems. HDFS is designed more for batch processing than for interactive use by users. The emphasis is on high throughput of data access rather than low latency. Many of the hard requirements imposed by the POSIX standard are not needed by HDFS applications, so POSIX semantics have been traded away in a few key areas to increase data throughput.
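The streaming pattern described above is essentially "open the file and scan it from start to finish". A minimal sketch, assuming a placeholder input path:

```java
// Sketch of streaming access: read a file sequentially, end to end.
// The input path is an illustrative assumption.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class StreamingRead {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Sequential, high-throughput scan of the whole file; no random updates.
        try (FSDataInputStream in = fs.open(new Path("/data/events.log"))) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        fs.close();
    }
}
```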
Large-scale datasets
Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size, so HDFS is tuned to support large files. It should provide high aggregate data transfer bandwidth and scale to hundreds of nodes in a single cluster, and a single HDFS instance should support tens of millions of files.
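Large files are stored as a sequence of large blocks spread across Datanodes. As a hedged sketch, a client can inspect a file's total length and block size through FileStatus; the path is a placeholder.

```java
// Sketch: inspect a large file's length and block size.
// The path "/data/clickstream.tsv" is an illustrative assumption.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/data/clickstream.tsv"));

        // A multi-gigabyte file is stored as a series of large blocks,
        // each held by several Datanodes.
        System.out.println("file length : " + status.getLen() + " bytes");
        System.out.println("block size  : " + status.getBlockSize() + " bytes");
        fs.close();
    }
}
```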
Simple consistency model
HDFS applications require a write-once-read-many file access model. After a file is created, written, and closed, it doesn't need to be changed. This assumption simplifies data consistency issues and enables high-throughput data access. Map/Reduce applications or web crawler applications are perfect for this model. There are plans to extend this model in the future to support additional write operations on files.
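The write-once-read-many pattern looks like this in client code: create the file, write it, close it, and from then on only read it. A minimal sketch with a placeholder path and contents:

```java
// Sketch of write-once-read-many: write and close a file, then only read it.
// The path and contents are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class WriteOnceReadMany {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/report.txt");

        // Write phase: create, write, close -- the file is not changed afterwards.
        try (FSDataOutputStream out = fs.create(p)) {
            out.write("crawl results...\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read phase: the closed file can now be read any number of times.
        try (FSDataInputStream in = fs.open(p)) {
            byte[] buf = new byte[1024];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```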
"Mobile computing is cheaper than mobile data"
A computation requested by an application is much more efficient if it is executed near the data it operates on, especially when the data set is huge, because this minimizes network congestion and increases the overall throughput of the system. It is often better to move the computation closer to where the data is located than to move the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to the data, as the sketch below illustrates.
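One such interface lets a client ask which hosts hold each block of a file, so a scheduler such as Map/Reduce can place tasks on those hosts. A hedged sketch, with a placeholder path:

```java
// Sketch: ask the Namenode which Datanodes hold each block of a file,
// information a scheduler can use to run computation near the data.
// The path is an illustrative assumption.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocality {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/data/clickstream.tsv"));

        // One entry per block: offset, length, and the hosts holding replicas.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset " + b.getOffset()
                    + " len " + b.getLength()
                    + " hosts " + String.join(",", b.getHosts()));
        }
        fs.close();
    }
}
```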
Portability between heterogeneous software and hardware platforms
HDFS is designed to be easily portable from one platform to another. This makes it easier to adopt HDFS as the platform of choice for large-scale data applications.
Namenode and Datanode
The Hadoop distributed file system HDFS uses a master/slave architecture. An HDFS cluster consists of a single Namenode and a number of Datanodes. The Namenode is a central server that manages the file system namespace and regulates client access to files. There is usually one Datanode per node in the cluster, and it manages the storage attached to the node it runs on. HDFS exposes a file system namespace in which users can store their data as files. Internally, a file is split into one or more data blocks, and these blocks are stored on a set of Datanodes. The Namenode executes namespace operations such as opening, closing, and renaming files and directories, and it determines the mapping of data blocks to specific Datanodes. The Datanodes serve read and write requests from the file system's clients and perform block creation, deletion, and replication under the unified scheduling of the Namenode.
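The namespace operations mentioned above are metadata calls handled by the Namenode, while the blocks themselves stay on the Datanodes. A minimal sketch, with placeholder paths:

```java
// Sketch of namespace operations served by the Namenode: create, rename,
// and delete directories. Paths are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceOps {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Each call is a metadata operation on the Namenode; the data
        // blocks themselves remain on the Datanodes.
        fs.mkdirs(new Path("/user/alice/input"));
        fs.rename(new Path("/user/alice/input"), new Path("/user/alice/staging"));
        fs.delete(new Path("/user/alice/staging"), true);   // recursive delete
        fs.close();
    }
}
```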
The Namenode and Datanode are designed to run on commodity machines, which typically run a GNU/Linux operating system. HDFS is built in Java, so a Namenode or Datanode can be deployed on any machine that supports Java. Thanks to the highly portable Java language, HDFS can be deployed on a wide range of machines. A typical deployment has one machine running a Namenode instance, while each of the other machines in the cluster runs one Datanode instance. This architecture does not preclude running multiple Datanodes on a single machine, although that is rare.
Having a single Namenode in a cluster greatly simplifies the architecture of the system. The Namenode is the arbiter and manager of all HDFS metadata, and the design is such that user data never flows through the Namenode.
About "Hadoop distributed file system sample analysis" This article is shared here, I hope the above content can be of some help to everyone, so that you can learn more knowledge, if you think the article is good, please share it for more people to see.