HDFS Experiment (1): Principles

2025-10-25 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

The original text is here:

https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html

Hadoop has two components: MapReduce and HDFS.

Goals of HDFS

Hardware failure

Hardware failure is the norm, not the exception. An HDFS instance may consist of hundreds or thousands of servers, each storing part of the file system's data. Because there are so many components, and each has a non-trivial probability of failure, some component of HDFS is always non-functional. Therefore, detecting faults and recovering from them quickly and automatically is a core architectural goal of HDFS.
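
To see why failure is treated as the norm, a back-of-the-envelope calculation may help. The sketch below assumes that every server fails independently with the same daily probability; the 0.1% figure and the function name are illustrative assumptions, not values from the HDFS documentation.

```python
def prob_any_failure(num_servers, p_fail):
    """Probability that at least one of num_servers fails,
    assuming independent failures with probability p_fail each."""
    return 1.0 - (1.0 - p_fail) ** num_servers


# Hypothetical number: a 0.1% chance that a given server fails on a given day.
for n in (10, 100, 1000, 5000):
    p = prob_any_failure(n, 0.001)
    print(f"{n:>5} servers -> P(at least one failure) = {p:.3f}")
```

Under these assumed numbers, a cluster of a thousand servers is more likely than not to see at least one failure on any given day, which is why recovery has to be automatic.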

Streaming data access

Applications that run on HDFS need streaming access to their data sets. They are not general-purpose applications of the kind that typically run on general-purpose file systems. HDFS is designed for batch processing rather than interactive use. The emphasis is on high throughput of data access rather than low latency. POSIX imposes many hard requirements that applications targeted at HDFS do not need, so POSIX semantics have been traded away in a few key areas to increase data throughput.

Large data sets

Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size, so HDFS is tuned to support large files. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster, and it should support tens of millions of files in a single instance.

Consistency model

HDFS applications need a write-once-read-many access model for files. A file that has been created, written, and closed does not need to be changed. This assumption simplifies data consistency issues and enables high-throughput data access. A MapReduce application or a Web crawler application fits this model perfectly. There is a plan to support appending writes to files in the future.
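
The write-once-read-many idea can be illustrated with a small in-memory model. This is only a sketch: the class WriteOnceFile is hypothetical and is not part of the HDFS API, and it makes the simplifying assumption that a file becomes readable only after it is closed.

```python
class WriteOnceFile:
    """Toy model of a write-once-read-many file (not the HDFS API)."""

    def __init__(self):
        self._chunks = []
        self._closed = False

    def write(self, data):
        # Writing is only allowed until the file is closed.
        if self._closed:
            raise PermissionError("file is closed; contents are immutable")
        self._chunks.append(data)

    def close(self):
        self._closed = True

    def read(self):
        # Simplification: readers only see the file after it is closed.
        if not self._closed:
            raise RuntimeError("file is still being written")
        return b"".join(self._chunks)


f = WriteOnceFile()
f.write(b"crawled page 1\n")
f.write(b"crawled page 2\n")
f.close()
print(f.read())  # any number of readers may do this
```

Because closed contents never change, readers need no coordination with writers, which is the simplification the paragraph above describes.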

Moving computation is cheaper than moving data

A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the data set is huge. Running the computation near the data reduces network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation to where the data is located than to move the data to where the application is running. HDFS provides interfaces that let applications move themselves closer to where the data is located.
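
A scheduler that prefers data locality could be sketched as follows. This is a hypothetical helper, not an HDFS or MapReduce interface: the function name, the node names, and the "first free node" fallback are all assumptions made for illustration.

```python
def pick_node(block_locations, free_nodes):
    """Pick a node to run a task that reads one block.

    block_locations: set of node names holding a replica of the block.
    free_nodes: list of node names with spare capacity, in preference order.
    Returns (node, is_local).
    """
    if not free_nodes:
        raise ValueError("no free nodes")
    for node in free_nodes:
        if node in block_locations:
            return node, True    # the computation moves to the data
    return free_nodes[0], False  # fallback: the data must cross the network


print(pick_node({"dn2", "dn3"}, ["dn1", "dn3"]))
print(pick_node({"dn2"}, ["dn1", "dn4"]))
```

The first call finds a free node that already stores the block; the second has to fall back to a remote read.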

Portability across heterogeneous hardware and software platforms

HDFS is designed to be easily portable from one platform to another. This facilitates widespread adoption of HDFS as the platform of choice for a large set of applications.

NameNode and DataNodes

HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates client access to files. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage the storage attached to the nodes they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system's clients. The DataNodes also create, delete, and replicate blocks on instruction from the NameNode.
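
The division of labor described above can be modeled in a few lines. This is a toy sketch under stated assumptions: ToyNameNode, its method names, and the block and node identifiers are hypothetical and do not correspond to Hadoop classes.

```python
class ToyNameNode:
    """Holds metadata only: the namespace and the block-to-DataNode mapping."""

    def __init__(self):
        self.namespace = {}        # path -> ordered list of block ids
        self.block_locations = {}  # block id -> set of DataNode names

    def add_file(self, path, placements):
        """placements: list of (block_id, DataNode names), in file order."""
        if path in self.namespace:
            raise FileExistsError(path)
        self.namespace[path] = [block_id for block_id, _ in placements]
        for block_id, nodes in placements:
            self.block_locations[block_id] = set(nodes)

    def rename(self, src, dst):
        # A namespace operation: only metadata changes, no block is moved.
        if dst in self.namespace:
            raise FileExistsError(dst)
        self.namespace[dst] = self.namespace.pop(src)

    def locate(self, path):
        """Tell a client which DataNodes to contact for each block of a file."""
        return [(b, sorted(self.block_locations[b])) for b in self.namespace[path]]


nn = ToyNameNode()
nn.add_file("/foodir/myfile.txt",
            [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])])
nn.rename("/foodir/myfile.txt", "/foodir/renamed.txt")
print(nn.locate("/foodir/renamed.txt"))
```

Note that the model never touches file contents: a client would ask the NameNode where the blocks are and then read them from the DataNodes directly.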

schematic

Replication principle

HDFS is designed to reliably store very large files across the machines of a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified when the file is created and changed later. Files in HDFS are write-once and have strictly one writer at any time.
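
The block layout of a file follows directly from its size and the block size. The sketch below works through one case; the 64 MB block size, the replication factor of 3, and the 200 MB file are values assumed for illustration, since both settings are configurable per file.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks that a file of file_size bytes occupies."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full, rest = divmod(file_size, block_size)
    sizes = [block_size] * full
    if rest:
        sizes.append(rest)  # only the last block may be smaller
    return sizes


MB = 1024 * 1024
block_size = 64 * MB   # assumed for illustration
replication = 3        # assumed for illustration

sizes = split_into_blocks(200 * MB, block_size)
print(len(sizes), "blocks; last block is", sizes[-1] // MB, "MB")
print("raw storage used:", sum(sizes) * replication // MB, "MB")
```

With these values the 200 MB file occupies three full 64 MB blocks plus one 8 MB block, and the cluster stores three copies of each.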

The NameNode makes all decisions about the replication of blocks. It periodically receives a Heartbeat and a Blockreport from each DataNode in the cluster. Receipt of a Heartbeat indicates that the DataNode is functioning properly. A Blockreport lists all blocks on a DataNode.
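
How heartbeats and blockreports could feed replication decisions is sketched below. This is a simplified model, not the NameNode's actual logic: the class name, the timeout value, and the rule "a node is dead once its last heartbeat is older than the timeout" are assumptions made for the example.

```python
class ReplicationMonitor:
    """Toy NameNode-side bookkeeping driven by heartbeats and blockreports."""

    def __init__(self, timeout_s, target_replicas):
        self.timeout_s = timeout_s
        self.target_replicas = target_replicas
        self.last_heartbeat = {}   # DataNode name -> time of last heartbeat
        self.reported_blocks = {}  # DataNode name -> set of block ids

    def heartbeat(self, node, now):
        self.last_heartbeat[node] = now

    def blockreport(self, node, block_ids):
        # A blockreport replaces what we believed the node was storing.
        self.reported_blocks[node] = set(block_ids)

    def live_nodes(self, now):
        return {n for n, t in self.last_heartbeat.items()
                if now - t <= self.timeout_s}

    def under_replicated(self, now):
        """Map block id -> number of missing replicas, counting live nodes only."""
        live = self.live_nodes(now)
        counts = {}
        for node, blocks in self.reported_blocks.items():
            for b in blocks:
                counts.setdefault(b, 0)
                if node in live:
                    counts[b] += 1
        return {b: self.target_replicas - c
                for b, c in counts.items() if c < self.target_replicas}


mon = ReplicationMonitor(timeout_s=30, target_replicas=2)  # hypothetical settings
for dn in ("dn1", "dn2"):
    mon.heartbeat(dn, now=0)
mon.blockreport("dn1", ["blk_1", "blk_2"])
mon.blockreport("dn2", ["blk_1"])
mon.heartbeat("dn1", now=50)          # dn2 has gone silent
print(mon.under_replicated(now=60))   # blocks that need new replicas
```

Once dn2 stops sending heartbeats, its replicas no longer count, so the monitor reports which blocks have fallen below the target and by how much.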

Please accept the translation with a smile...

FS Shell File Operations

Action                                                 Command
Create a directory named /foodir                       bin/hadoop dfs -mkdir /foodir
Remove a directory named /foodir                       bin/hadoop dfs -rmr /foodir
View the contents of a file named /foodir/myfile.txt   bin/hadoop dfs -cat /foodir/myfile.txt

FS shell is targeted for applications that need a scripting language to interact with the stored data.
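
As one way to drive the FS shell from a scripting language, the sketch below builds the command lines listed above as argument lists. The helper dfs_command is hypothetical, and the commands are printed rather than executed because running them requires a Hadoop installation; the bin/hadoop path is the one used in this article and may differ on your system.

```python
import shlex

HADOOP = "bin/hadoop"  # path as written in this article; adjust for your install


def dfs_command(action, *paths):
    """Build the argv list for an FS shell command such as -mkdir, -rmr, or -cat."""
    if not action.startswith("-"):
        raise ValueError("action must be a flag such as '-mkdir'")
    return [HADOOP, "dfs", action, *paths]


commands = [
    dfs_command("-mkdir", "/foodir"),
    dfs_command("-cat", "/foodir/myfile.txt"),
    dfs_command("-rmr", "/foodir"),
]
for argv in commands:
    # On a real cluster, subprocess.run(argv, check=True) would execute this.
    print(shlex.join(argv))
```

Passing an argument list instead of a single string avoids shell quoting problems when paths contain spaces.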

DFSAdmin

The DFSAdmin command set is used for administering an HDFS cluster. These are commands that are used only by an HDFS administrator. Here are some sample action/command pairs:

Action                                     Command
Put the cluster in Safemode                bin/hadoop dfsadmin -safemode enter
Generate a list of DataNodes               bin/hadoop dfsadmin -report
Recommission or decommission DataNode(s)   bin/hadoop dfsadmin -refreshNodes

The API documentation is linked below. HDFS can be used from Java or from C.

http://hadoop.apache.org/docs/current/api/
