
What is the Hadoop core HDFS used for?

2025-04-03 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

In this article I will share what HDFS, the core of Hadoop, is used for. Most people are probably not yet familiar with it, so this article is shared for your reference. I hope you get a lot out of reading it; let's look at it together!

One of Hadoop's two core components:

Massive data storage (HDFS)

What is HDFS?

The Hadoop Distributed File System: a file system in which files are shared across multiple hosts over a network, allowing multiple users on multiple machines to share files and storage space.

Features:

1. Transparency. Accessing files over the network looks, to programs and users, just like accessing a local disk.

2. Fault tolerance. Even if some nodes in the system go offline, the system as a whole can keep working without losing data.

Applicable scenarios:

Suited to write-once, read-many access; concurrent writes are not supported, and it is not a good fit for large numbers of small files.

HDFS Architecture

master-slave structure

Master node (there is only one): the NameNode

Slave nodes (there are many): the DataNodes

The NameNode is responsible for:

Receiving user operation requests

Maintaining the directory structure of the file system

Managing the mapping between files and blocks, and between blocks and DataNodes

The DataNode is responsible for:

Storing file data

Files are stored on disk in blocks

For data safety, each file is kept in multiple replicas
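As a toy illustration of the two mappings described above (files to blocks, blocks to DataNodes), here is a minimal in-memory sketch. The class, method, and node names are invented for illustration and are not Hadoop's actual internals; real replica placement is rack-aware, while this sketch just round-robins.

```java
// Sketch of the metadata a NameNode keeps in memory: which blocks make up a
// file, and which DataNodes hold a replica of each block. Illustrative only.
import java.util.*;

public class NamespaceSketch {
    // file path -> ordered list of block IDs
    static Map<String, List<String>> fileToBlocks = new HashMap<>();
    // block ID -> set of DataNodes holding a replica
    static Map<String, Set<String>> blockToNodes = new HashMap<>();

    static void addFile(String path, List<String> blocks, List<String> nodes, int replication) {
        fileToBlocks.put(path, blocks);
        for (String b : blocks) {
            // place `replication` copies on distinct DataNodes (simple round-robin)
            Set<String> replicas = new LinkedHashSet<>();
            for (int i = 0; replicas.size() < replication && i < nodes.size(); i++)
                replicas.add(nodes.get((blocks.indexOf(b) + i) % nodes.size()));
            blockToNodes.put(b, replicas);
        }
    }

    public static void main(String[] args) {
        addFile("/logs/app.log", Arrays.asList("blk_1", "blk_2"),
                Arrays.asList("dn1", "dn2", "dn3"), 3);
        System.out.println(fileToBlocks.get("/logs/app.log")); // [blk_1, blk_2]
        System.out.println(blockToNodes.get("blk_1").size());  // 3
    }
}
```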

NameNode

The NameNode is the management node of the entire file system. It maintains the file system's directory tree, the metadata of files and directories, and the list of data blocks belonging to each file, and it receives users' operation requests.

Its files include (all three are stored in the Linux file system):

fsimage: the metadata image file, a snapshot of the NameNode's in-memory metadata at a point in time.

edits: the operation log file.

fstime: records the time of the last checkpoint.

Working characteristics:

1. The NameNode always keeps the metadata in memory, so that it can serve read requests.

2. When a write request arrives, the NameNode first writes an edit log to disk, that is, it appends the operation to the edits file; only once that succeeds does it modify the in-memory metadata and return to the client.

3. Hadoop maintains an fsimage file, which is a mirror of the NameNode's in-memory metadata. fsimage does not always match the metadata in the NameNode's memory; instead, it is updated at intervals by merging in the edits files. This merge of fsimage and edits, which refreshes the NameNode's metadata, is the job of the Secondary NameNode.
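The write path in points 2 and 3 can be sketched as a toy model: log the operation first, update memory second, and let a later checkpoint fold the log into the image. All names here are invented for illustration; this is not Hadoop code.

```java
// Toy model of the NameNode write path: a mutation is appended to the edits
// log before memory is touched; a checkpoint merges edits into fsimage.
import java.util.*;

public class EditLogSketch {
    static List<String> fsimage = new ArrayList<>(); // durable namespace snapshot
    static List<String> edits = new ArrayList<>();   // operations since last checkpoint
    static Set<String> memory = new TreeSet<>();     // live in-memory namespace

    static void create(String path) {
        edits.add("CREATE " + path); // 1) write the edit log first
        memory.add(path);            // 2) then update memory and ack the client
    }

    static void checkpoint() {
        // replay the logged edits on top of the old fsimage to build the new one
        for (String op : edits) fsimage.add(op.substring("CREATE ".length()));
        edits.clear();
    }

    public static void main(String[] args) {
        create("/a"); create("/b");
        checkpoint();
        System.out.println(fsimage);      // [/a, /b]
        System.out.println(edits.size()); // 0
    }
}
```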

DataNode

It provides the actual storage for file data.

The most basic storage unit is the block (file block); the default size is 64 MB (the Hadoop 1.x default; Hadoop 2.x raised it to 128 MB).
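To make the block size concrete, here is a small arithmetic sketch of how many blocks a file occupies under the 64 MB default. The helper names are invented; note that the last block only occupies the file's remaining bytes, not a full 64 MB.

```java
// How a file is split into 64 MB blocks: every block but the last is
// full-size, and the last block only occupies the remainder.
public class BlockMath {
    static final long BLOCK = 64L * 1024 * 1024; // 64 MB in bytes

    static long blockCount(long fileBytes) {
        return (fileBytes + BLOCK - 1) / BLOCK;  // ceiling division
    }

    static long lastBlockBytes(long fileBytes) {
        long rem = fileBytes % BLOCK;
        return rem == 0 ? BLOCK : rem;
    }

    public static void main(String[] args) {
        long file = 200L * 1024 * 1024;                           // a 200 MB file
        System.out.println(blockCount(file));                     // 4
        System.out.println(lastBlockBytes(file) / (1024 * 1024)); // 8
    }
}
```

So a 200 MB file occupies four blocks: three full 64 MB blocks plus a final 8 MB block.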

Secondary NameNode

A partial answer to HA (High Availability); it does not provide hot standby. By default it is installed on the same node as the NameNode, but this is... not safe!

(In a production environment, installing it on a separate machine is recommended.)

Implementation process:

It downloads the metadata files (fsimage and edits) from the NameNode, merges the two to generate a new fsimage, saves it locally, and pushes it to the NameNode to replace the old fsimage.

Workflow:

1. The Secondary NameNode notifies the NameNode to switch to a new edits file

2. The Secondary NameNode fetches fsimage and edits from the NameNode (via HTTP)

3. The Secondary NameNode loads fsimage into memory and merges in the edits

4. The Secondary NameNode sends the new fsimage back to the NameNode

5. The NameNode replaces the old fsimage with the new one
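The five steps above can be sketched as a toy exchange between two objects. Real Hadoop moves these files over HTTP; here they are simply passed by value, and every name is invented for illustration.

```java
// Toy model of the Secondary NameNode checkpoint protocol.
import java.util.*;

public class CheckpointSketch {
    static class NameNode {
        List<String> fsimage = new ArrayList<>();
        List<String> edits = new ArrayList<>(List.of("CREATE /a", "CREATE /b"));
        List<String> editsNew = new ArrayList<>();

        void rollEdits() { /* step 1: new operations go to edits.new from now on */ }
        void replaceImage(List<String> merged) {  // step 5: swap in the new fsimage
            fsimage = merged;
            edits = editsNew;
        }
    }

    static class SecondaryNameNode {
        List<String> checkpoint(NameNode nn) {
            nn.rollEdits();                                 // step 1: ask NN to switch edits
            List<String> img = new ArrayList<>(nn.fsimage); // step 2: fetch fsimage + edits
            for (String op : nn.edits)                      // step 3: replay edits over image
                img.add(op.substring("CREATE ".length()));
            return img;                                     // step 4: ship new fsimage back
        }
    }

    public static void main(String[] args) {
        NameNode nn = new NameNode();
        nn.replaceImage(new SecondaryNameNode().checkpoint(nn)); // step 5
        System.out.println(nn.fsimage);      // [/a, /b]
        System.out.println(nn.edits.size()); // 0
    }
}
```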

Hadoop's entire architecture is built on RPC.

RPC (Remote Procedure Call) works in client/server mode. It is a protocol for requesting a service from a program on a remote computer over the network, without having to understand the underlying network technology.

Specific call flow:

First, the client's calling process sends a call message carrying the procedure parameters to the server process, then waits for the reply message. On the server side, the process stays asleep until a call message arrives. When one arrives, the server extracts the procedure parameters, computes the result, sends a reply message, and then waits for the next call message. Finally, the client receives the reply message, obtains the procedure's result, and the calling process resumes execution.

The object exposed by the server must implement an interface that extends VersionedProtocol, and the methods a client can call must be declared in that interface.
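A minimal local stand-in for this pattern: the server exposes an interface (in Hadoop it must extend VersionedProtocol), and the client talks to a proxy that forwards each call. Here the "network" is just an in-process `java.lang.reflect.Proxy`, and the interface and class names are invented for illustration.

```java
// Interface-based call forwarding, mimicking the shape of Hadoop's RPC:
// only methods declared on the interface are callable through the proxy.
import java.lang.reflect.*;

public class RpcSketch {
    interface GreeterProtocol {
        String greet(String name);
    }

    static class GreeterImpl implements GreeterProtocol {
        public String greet(String name) { return "hello " + name; }
    }

    @SuppressWarnings("unchecked")
    static <T> T getProxy(Class<T> iface, T server) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[] { iface },
                (proxy, method, args) -> {
                    // in real RPC the call message (method + parameters) would be
                    // serialized, sent over the network, executed on the server,
                    // and the reply message returned to the waiting client
                    return method.invoke(server, args);
                });
    }

    public static void main(String[] args) {
        GreeterProtocol client = getProxy(GreeterProtocol.class, new GreeterImpl());
        System.out.println(client.greet("hdfs")); // hello hdfs
    }
}
```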

That is all of "What is the Hadoop core HDFS used for?". Thank you for reading! I hope the content shared here has been helpful; if you would like to learn more, you are welcome to follow the industry information channel!
