What is the use of HDFS in Hadoop?

2025-04-05 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/02 Report

This article introduces the use of HDFS in Hadoop and should serve as a useful reference for readers who need it. I hope you learn a lot from it; let's get started.

HDFS provides storage for massive amounts of data with high-throughput access. It is highly fault tolerant, is designed to be deployed on low-cost hardware, and is well suited to applications with large data sets.

Hadoop is a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed programs without knowing the underlying details of the distribution, making full use of the cluster's power for high-speed computation and storage.

Hadoop implements a distributed file system, the Hadoop Distributed File System, abbreviated HDFS.

HDFS is highly fault tolerant and designed to be deployed on low-cost hardware. It provides high-throughput access to application data, making it suitable for applications with very large data sets. HDFS relaxes some POSIX requirements in order to allow streaming access to file system data.

The core of Hadoop's framework design is HDFS plus MapReduce: HDFS provides storage for massive amounts of data, while MapReduce provides computation over that data.

HDFS

To external clients, HDFS looks like a traditional hierarchical file system: you can create, delete, move, and rename files, and so on. Its architecture, however, is built on a specific set of nodes (see figure 1). These nodes are the NameNode (only one), which provides metadata services for HDFS, and the DataNodes, which provide storage blocks for HDFS. Because there is only one NameNode, HDFS 1.x has a single point of failure. Hadoop 2.x allows two NameNodes, which solves the single-node failure problem.
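The division of responsibilities above can be illustrated with a toy sketch. This is not Hadoop code; the class and attribute names (`NameNode.metadata`, `DataNode.blocks`, the block and node IDs) are all hypothetical. It only shows the key idea: the NameNode holds metadata mapping files to blocks and replica locations, while DataNodes hold the actual block bytes, and a client first asks the NameNode for locations and then reads the blocks directly from DataNodes.

```python
# Toy sketch of the HDFS split of responsibilities (illustrative names only).

class NameNode:
    """Holds file-to-block metadata; stores no file data itself."""
    def __init__(self):
        # filename -> list of (block_id, [names of DataNodes holding a replica])
        self.metadata = {}

    def add_file(self, name, block_ids, datanodes_per_block):
        self.metadata[name] = list(zip(block_ids, datanodes_per_block))

    def locate(self, name):
        # A client asks the NameNode where a file's blocks live, then reads
        # the block bytes directly from the DataNodes over TCP/IP.
        return self.metadata[name]

class DataNode:
    """Holds actual block contents."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

nn = NameNode()
dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.blocks["blk_0"] = b"part one"
dn2.blocks["blk_0"] = b"part one"  # replica of the same block on a second node
nn.add_file("/logs/app.log", ["blk_0"], [["dn1", "dn2"]])

print(nn.locate("/logs/app.log"))  # [('blk_0', ['dn1', 'dn2'])]
```

The point of the design is that the NameNode stays small and fast (metadata only), while bulk data traffic flows between clients and DataNodes without passing through the NameNode.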

Files stored in HDFS are divided into blocks, and each block is then replicated across multiple machines (DataNodes). This is very different from a traditional RAID architecture. The block size (64 MB by default in 1.x, 128 MB by default in 2.x) and the replication factor are determined by the client when the file is created. The NameNode coordinates all file operations. All communication within HDFS is based on the standard TCP/IP protocol.

Thank you for reading this article carefully. I hope this introduction to the use of HDFS in Hadoop has been helpful. Please continue to support us and follow the industry information channel, where detailed solutions await if you run into problems.


