Original text: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
Hadoop has two core components: MapReduce and HDFS (the Hadoop Distributed File System).
Goals of HDFS
Hardware failure
Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system's data. The fact that there are a huge number of components, and that each component has a non-trivial probability of failure, means that some component of HDFS is always non-functional. Therefore, rapid fault detection and automatic recovery is a core architectural goal of HDFS.
Streaming data access
Applications that run on HDFS need streaming access to their data sets. They are not general-purpose applications that typically run on general-purpose file systems. HDFS is designed more for batch processing than for interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. POSIX imposes many requirements that are not needed for applications targeted at HDFS, so POSIX semantics have been traded away in a few key areas to increase data throughput.
Large data sets
Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size, and HDFS is tuned to support large files. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster. It should support tens of millions of files in a single instance.
Simple consistency model
HDFS applications need a write-once-read-many access model for files. A file that has been created, written, and closed does not need to be changed. This assumption simplifies data consistency issues and enables high-throughput data access. A MapReduce application or a web crawler application fits this model perfectly. There is a plan to support appending writes to files in the future.
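As a rough sketch of this write-once-read-many pattern (the path and data below are made up for illustration), an application using the Hadoop FileSystem Java API creates, writes, and closes a file exactly once, and may then reopen it any number of times for reading:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteOnceReadMany {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // reads core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);          // handle to the configured (HDFS) file system
        Path file = new Path("/foodir/myfile.txt");    // hypothetical path used only for this example

        // Write phase: the file is created, written, and closed exactly once.
        FSDataOutputStream out = fs.create(file);
        out.writeUTF("crawled page contents");
        out.close();

        // Read phase: the closed file can now be read many times without being changed.
        FSDataInputStream in = fs.open(file);
        System.out.println(in.readUTF());
        in.close();

        fs.close();
    }
}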
Moving computation is cheaper than moving data
A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge: it reduces network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located than to move the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.
Portability across heterogeneous hardware and software platforms
HDFS is designed to be easily portable from one platform to another. This facilitates the widespread adoption of HDFS as the platform of choice for a large set of applications.
NameNode and DataNodes
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage the storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories; it also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system's clients, and they perform block creation, deletion, and replication upon instruction from the NameNode.
(Figure: HDFS architecture)
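To make the block-to-DataNode mapping concrete, the following sketch (assuming a running cluster and an existing file at a hypothetical path) asks the NameNode, through the FileSystem Java API, where the blocks of a file are stored:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/foodir/myfile.txt");    // hypothetical path used only for this example

        // The NameNode maintains the mapping of blocks to DataNodes; this call returns it for one file.
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                    + ", length " + block.getLength()
                    + ", stored on " + String.join(", ", block.getHosts()));
        }
        fs.close();
    }
}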
Data replication
HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file: an application can specify the number of replicas of a file, and the replication factor can be set at file creation time and changed later. Files in HDFS are write-once and have strictly one writer at any time.
The NameNode makes all decisions regarding the replication of blocks. It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly; a Blockreport lists all the blocks on a DataNode.
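As a sketch of how these per-file settings can be used from the Java API (the path, replication factor, and block size below are illustrative values, not recommendations), an application can fix them at creation time and change the replication factor later:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/foodir/important.dat");   // hypothetical path used only for this example

        // Create the file with a replication factor of 3 and a block size of 64 MB.
        FSDataOutputStream out = fs.create(file, true, 4096, (short) 3, 64L * 1024 * 1024);
        out.write(new byte[]{1, 2, 3});
        out.close();

        // The replication factor of an existing file can be changed later.
        fs.setReplication(file, (short) 5);
        fs.close();
    }
}

The same change can also be made from the command line with the FS shell's -setrep option.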
FS Shell File Operations
Action: Create a directory named /foodir
Command: bin/hadoop dfs -mkdir /foodir

Action: Remove a directory named /foodir
Command: bin/hadoop dfs -rmr /foodir

Action: View the contents of a file named /foodir/myfile.txt
Command: bin/hadoop dfs -cat /foodir/myfile.txt
FS shell is targeted for applications that need a scripting language to interact with the stored data.
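For applications that would rather drive these same operations from code than from a terminal, one possible approach (a sketch only; the paths are made up) is to invoke FS shell commands programmatically through Hadoop's FsShell tool class:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.util.ToolRunner;

public class FsShellFromCode {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Each call runs one FS shell command; a non-zero return code signals failure.
        int rc = ToolRunner.run(conf, new FsShell(), new String[]{"-mkdir", "/foodir"});
        if (rc != 0) {
            System.err.println("mkdir failed with exit code " + rc);
        }
        ToolRunner.run(conf, new FsShell(), new String[]{"-ls", "/foodir"});
    }
}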
DFSAdmin
The DFSAdmin command set is used for administering an HDFS cluster. These are commands that are used only by an HDFS administrator. Here are some sample action/command pairs:
Action: Put the cluster in Safemode
Command: bin/hadoop dfsadmin -safemode enter

Action: Generate a list of DataNodes
Command: bin/hadoop dfsadmin -report

Action: Recommission or decommission DataNode(s)
Command: bin/hadoop dfsadmin -refreshNodes
HDFS provides a Java API for applications to use; a C language wrapper for this Java API is also available. The API documentation is here:
http://hadoop.apache.org/docs/current/api/