What is HDFS in Hadoop?
This article gives an overview of HDFS, the Hadoop Distributed File System: its design goals, core concepts, command-line usage, and Java interface.
HDFS design
Very large files, up to petabyte scale
Streaming data access
The most efficient access pattern is write once, read many times: a dataset is written (or copied) once and then analyzed many times.
Commodity hardware (HDFS does not require expensive, highly reliable hardware)
On clusters of commodity hardware the chance of node failure is relatively high, and HDFS is designed to keep working through such failures. Some applications, however, are a poor fit for HDFS:
Low-latency data access
Applications that require low-latency access to data, in the tens-of-milliseconds range, will not work well with HDFS, which is optimized for high throughput.
A large number of small files
The namenode holds the file system metadata in memory, so the total number of files the file system can store is limited by the amount of memory on the namenode.
As a rule of thumb, each file, directory, and block takes about 150 bytes of namenode memory. For example, one million files, each occupying one block, amount to roughly two million objects and therefore need on the order of 300 MB of memory.
Multiple writers and arbitrary file modifications
A file in HDFS has a single writer, and writes are always made at the end of the file, in append-only fashion. There is no support for multiple writers or for modifications at arbitrary offsets within a file.
HDFS concepts
Blocks
The default block size is 128 MB.
Unlike a file system for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block's worth of underlying storage.
To list the blocks that make up each file in the file system:
hdfs fsck / -files -blocks
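The same information is reachable from the Java API. The following is a minimal sketch (the class name ShowBlocks is illustrative, not from the original text) that prints a file's block size and the locations of its blocks:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlocks { // illustrative name
    public static void main(String[] args) throws Exception {
        String uri = args[0]; // e.g. hdfs:///user/hadoop/a.sh
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FileStatus status = fs.getFileStatus(new Path(uri));
        System.out.println("block size: " + status.getBlockSize());
        // One BlockLocation per block, listing the hosts that store its replicas
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println(loc);
        }
    }
}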
Namenode and datanode
The namenode manages the file system namespace. It maintains the file system tree and the metadata for all the files and directories in the tree. This information is stored persistently on the local disk as two files: the namespace image and the edit log. The namenode also knows which datanodes hold the blocks of each file, but it does not store block locations persistently, because this information is reconstructed from datanode reports when the system starts.
Datanodes store and retrieve blocks when told to (by clients or the namenode), and they periodically report the lists of blocks they are storing back to the namenode.
Namenode fault tolerance
1. Back up the files that make up the persistent state of the file system metadata. The usual configuration is to write to the local disk as well as to a remote NFS mount.
2. Run a secondary namenode, typically on a separate physical machine. If the primary namenode fails, the secondary can be brought into service, but because its state lags the primary's, some data loss is almost certain.
Block caching
By default, a cached block is held in the memory of a single datanode.
Caching frequently read blocks improves read performance.
Cache pools are administrative groupings used to manage cache directives, permissions, and resource usage.
Federated HDFS
Federation allows a cluster to scale by adding namenodes.
Each namenode manages a portion of the file system namespace (for example, one namenode might manage /user and another /share).
Federated namenodes do not communicate with one another, and the failure of one does not affect the availability of the namespaces managed by the others.
High availability of HDFS
Architectural support for HA requires several changes:
The namenodes must use highly available shared storage to share the edit log.
Datanodes must send block reports to both namenodes, because the block mappings are held in a namenode's memory, not on disk.
Clients must be configured to handle namenode failover, using a mechanism that is transparent to users.
The secondary namenode's role is subsumed by the standby namenode, which takes periodic checkpoints of the active namenode's namespace.
For the shared storage there is a choice between an NFS filer and the quorum journal manager (QJM); the QJM is the recommended choice.
Failover and fencing
To ensure that a previously active namenode cannot do any damage, the system fences it. Fencing mechanisms include:
Revoking the namenode's access to the shared storage directory.
Disabling its network port via a remote management command.
As a last resort, forcibly cutting power to its host, and so on.
Note that simply backing up the namenode's metadata does not provide high availability: a newly started namenode must load the namespace image into memory, replay the edit log, and receive enough block reports from the datanodes before it can serve requests, which can take a long time on a large cluster.
Command line interface
The namenode runs on port 8020 by default.
Copy a file from the local file system to the HDFS /user/hadoop/ directory:
hadoop fs -copyFromLocal max_temperature.sh a.sh
Copy the file back to the local file system:
hadoop fs -copyToLocal a.sh my.sh
Verify that the two local copies are identical:
md5sum max_temperature.sh my.sh
Create a new directory, then list the home directory:
hadoop fs -mkdir books
hadoop fs -ls .
Permission checking is disabled by default; in production, dfs.permissions.enabled should be set to true.
The execute permission is meaningless for a file, because you cannot execute a file on HDFS.
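For illustration, permissions can also be set from the Java API. A minimal sketch with an illustrative class name and path, setting rw-r----- on a file (no execute bit, which HDFS ignores for files anyway):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

public class SetPerms { // illustrative name
    public static void main(String[] args) throws Exception {
        String uri = args[0]; // e.g. hdfs:///user/hadoop/a.sh
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        // rw- for the owner, r-- for the group, --- for others
        FsPermission perm = new FsPermission(FsAction.READ_WRITE, FsAction.READ, FsAction.NONE);
        fs.setPermission(new Path(uri), perm);
    }
}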
Hadoop file systems
Hadoop has an abstract notion of file systems, of which HDFS is just one implementation.
Implementations include: Local, HDFS, WebHDFS, Secure WebHDFS, HAR, View, FTP, S3, Azure, and Swift.
List the files in the root directory of the local file system:
hadoop fs -ls file:///
Although MapReduce programs can access any file system, when processing large datasets it is best to choose a distributed file system that offers the data-locality optimization, such as HDFS.
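To make the abstraction concrete, here is a minimal sketch (illustrative class name) showing that the URI scheme alone selects the implementation behind the same FileSystem API:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListAnyFs { // illustrative name
    public static void main(String[] args) throws Exception {
        String uri = args[0]; // e.g. file:/// or hdfs:///user/hadoop
        Configuration conf = new Configuration();
        // The scheme (file://, hdfs://, webhdfs://, ...) picks the implementation
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        for (FileStatus status : fs.listStatus(new Path(uri))) {
            System.out.println(status.getPath());
        }
    }
}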
Interfaces
HTTP
The HTTP interface is slower than the native Java client, so avoid it for very large data transfers where possible.
There are two ways to access HDFS over HTTP:
Direct access: the web servers embedded in the namenode and datanodes act as WebHDFS endpoints (set dfs.webhdfs.enabled to true).
Proxy access: the HttpFS proxy exposes the same HTTP interface as WebHDFS and is started with the httpfs.sh script.
C
The libhdfs and libwebhdfs C libraries mirror the Java interface; the underlying code is Java.
NFS
It is possible to mount HDFS on a local client's file system using Hadoop's NFSv3 gateway.
You can then use Unix utilities to interact with the file system.
Because HDFS allows appends only, you can add data to the end of a file, but you cannot modify a file in place.
FUSE
Filesystem in Userspace (FUSE) allows file systems implemented in user space to be integrated as Unix file systems.
Hadoop's Fuse-DFS module allows HDFS to be mounted as a standard local file system.
The NFS gateway is the more robust solution, so it should be preferred over Fuse-DFS.
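As a hedged sketch of the HTTP route, a client can go through WebHDFS using the ordinary FileSystem API with a webhdfs:// URI; the class name, host, and port below are illustrative (WebHDFS listens on the namenode's HTTP port, 50070 in Hadoop 2.x):

import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class WebHdfsCat { // illustrative name
    public static void main(String[] args) throws Exception {
        // Illustrative URI: read a file over HTTP via WebHDFS
        String uri = "webhdfs://namenode:50070/user/hadoop/a.sh";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}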
Java interface
Reading data from a Hadoop URL
One way to read data is with java.net.URL, after registering Hadoop's FsUrlStreamHandlerFactory so the JVM understands hdfs URLs; IOUtils then streams the bytes to standard output:
hadoop jar hadoopdemo-1.0-SNAPSHOT.jar URLCat hdfs:///user/hadoop/output/part-r-00000
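A minimal sketch of what the URLCat program run above might look like (the JVM-wide stream handler factory can be set at most once per JVM, hence the static initializer):

import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat {
    static {
        // May be called at most once per JVM
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}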
Reading data using the FileSystem API
FileSystem.get returns the file system for a URI's scheme, and open() returns an input stream for a file:

import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
hadoop jar hadoopdemo-1.0-SNAPSHOT.jar FileSystemCat hdfs:///user/hadoop/output/part-r-00000
hadoop jar hadoopdemo-1.0-SNAPSHOT.jar FileSystemDoubleCat hdfs:///user/hadoop/output/part-r-00000
Creating a new file
create() returns an FSDataOutputStream to write to (see the write-data example below).
Directories
public boolean mkdirs(Path f) throws IOException — creates the directory and all necessary parent directories, returning true on success.
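A small sketch of mkdirs in use (illustrative class name and path); like mkdir -p, it creates any missing parents:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MakeDirs { // illustrative name
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs:///"), conf);
        // Creates /user/hadoop/books and any missing parents (illustrative path)
        boolean ok = fs.mkdirs(new Path("/user/hadoop/books"));
        System.out.println("created: " + ok);
    }
}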
Querying the file system
The FileStatus class encapsulates file system metadata for files and directories: file length, block size, replication, modification time, ownership, and permissions.
public boolean exists(Path f) throws IOException — checks whether a file or directory exists.
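A short sketch (illustrative class name) combining exists() with a FileStatus query:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StatFile { // illustrative name
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(uri);
        if (!fs.exists(path)) {
            System.err.println("no such path: " + path);
            return;
        }
        // Print a few of the metadata fields FileStatus carries
        FileStatus stat = fs.getFileStatus(path);
        System.out.println("length: " + stat.getLen());
        System.out.println("isDirectory: " + stat.isDirectory());
        System.out.println("replication: " + stat.getReplication());
        System.out.println("modified: " + stat.getModificationTime());
        System.out.println("permission: " + stat.getPermission());
    }
}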
Writing data
public static FSDataOutputStream create(FileSystem fs, Path file, FsPermission permission) throws IOException
public FSDataOutputStream append(Path f) throws IOException
The following program (class name is illustrative) copies a local file to HDFS, printing a dot each time Hadoop calls progress():

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class FileCopyWithProgress {
    public static void main(String[] args) throws Exception {
        String localSrc = args[0];
        String dst = args[1];
        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out = fs.create(new Path(dst), new Progressable() {
            public void progress() {
                System.out.println(".");
            }
        });
        // Pass true so both streams are closed when the copy finishes
        IOUtils.copyBytes(in, out, 4096, true);
    }
}
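For the append() signature above, a minimal hedged sketch (illustrative class name; append must be supported by the underlying file system, as it is in HDFS):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendDemo { // illustrative name
    public static void main(String[] args) throws Exception {
        String uri = args[0]; // an existing HDFS file
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        // Writes are only ever made at the end of the file
        FSDataOutputStream out = fs.append(new Path(uri));
        out.write("appended line\n".getBytes("UTF-8"));
        out.close();
    }
}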
FSDataInputStream's seek() is a relatively expensive operation and should be used sparingly. Structure applications around streaming access patterns rather than performing large numbers of seek() calls.
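A sketch of what the FileSystemDoubleCat program run earlier might look like: it reads the file once, then calls seek(0) to go back to the start and reads it again:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemDoubleCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
            in.seek(0); // go back to the start of the file
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}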