What are the advantages and disadvantages of HDFS in Hadoop

2025-01-16 Update From: SLTechnology News&Howtos

Shulou (Shulou.com), 05/31

This article covers the advantages and disadvantages of HDFS in Hadoop. Many people are not very familiar with them, so the main points are summarized here for reference.

Advantages of HDFS:

1. Handles very large files

"Very large" here usually means files ranging from hundreds of megabytes to hundreds of terabytes in size. In practice, HDFS is already used to store and manage PB-scale data.

2. Streaming access to data

The design of HDFS is based on a "write once, read many times" access pattern. Once a dataset is generated by a data source, it is replicated and distributed to different storage nodes, and it then serves a variety of data analysis requests. In most cases an analysis task involves most of the data in the dataset, so HDFS is optimized for reading the entire dataset efficiently rather than for fetching a single record quickly.
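The trade-off between scanning a whole dataset and fetching records one at a time can be illustrated with a rough cost model. The sketch below is only an illustration: the seek time and transfer rate are assumed round numbers, not measured HDFS figures, and the function names are hypothetical.

```python
# Rough cost model: one sequential scan vs. one random read per record.
# Both constants are illustrative assumptions, not measured HDFS numbers.
SEEK_TIME_S = 0.010          # assumed average seek time per random access
TRANSFER_RATE_MB_S = 100.0   # assumed sustained transfer rate


def sequential_scan_seconds(dataset_mb: float) -> float:
    """One seek, then stream the whole dataset."""
    return SEEK_TIME_S + dataset_mb / TRANSFER_RATE_MB_S


def random_read_seconds(num_records: int, record_mb: float) -> float:
    """One seek per record plus the transfer time of each record."""
    return num_records * (SEEK_TIME_S + record_mb / TRANSFER_RATE_MB_S)


if __name__ == "__main__":
    # One million records of about 1 KB each, roughly 1000 MB in total.
    scan = sequential_scan_seconds(1000.0)
    per_record = random_read_seconds(1_000_000, 0.001)
    print(f"sequential scan:  {scan:.2f} s")
    print(f"record by record: {per_record:.2f} s")
```

Under these assumptions the per-record seeks dominate the total time, which is why a design tuned for whole-dataset throughput pays for it with single-record latency.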

3. Runs on clusters of inexpensive commodity hardware

Hadoop is designed with modest hardware requirements: it only needs to run on clusters of low-cost commodity hardware rather than on expensive high-availability machines. Inexpensive commodity machines also mean that the probability of a node failing somewhere in a large cluster is very high. HDFS is designed to keep running when such failures occur, without noticeable interruption to the user.
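How replication lets a cluster survive node failures can be sketched with a toy model. This is not how HDFS actually places replicas (real placement is rack-aware); the round-robin placement, the node names, and the function names below are assumptions made for illustration. The replication factor of 3 matches the HDFS default.

```python
REPLICATION = 3  # HDFS's default replication factor


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified on purpose: real HDFS placement also considers racks.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: {nodes[(block + i) % len(nodes)] for i in range(replication)}
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have a replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4", "n5"]  # hypothetical node names
    placement = place_replicas(10, nodes)
    # With three replicas per block, any two node failures lose no data.
    print(len(readable_blocks(placement, ["n1", "n2"])), "of 10 blocks readable")
```

In this model, data only becomes unreadable once every node holding a block's replicas has failed at the same time.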

Disadvantages of HDFS:

1. Not suitable for low-latency data access

HDFS is not suitable for applications that must respond to user requests within a short time. It is designed for analysis tasks over large datasets and is optimized for high data throughput, which may come at the cost of high latency.

Improvement strategy:

For applications that need low latency, HBase is a better choice. Upper-layer data management projects such as HBase try to make up for this deficiency as far as possible. HBase has improved considerably in performance, and its slogan is "goes real time". Using caching or multiple-master designs can also reduce the request pressure coming from clients and so reduce latency.

2. Unable to store a large number of small files efficiently

Because the NameNode keeps the file system's metadata in memory, the number of files the file system can hold is limited by the NameNode's memory size. Another problem is that the number of map tasks is determined by the number of splits, so processing a large number of small files with MapReduce generates too many map tasks, and the task management overhead increases job time. When Hadoop deals with many small files (files smaller than the HDFS block size), FileInputFormat does not combine them, so each small file is treated as its own split and assigned its own map task, which is inefficient.

For example, a 1 GB file is divided into 16 splits of 64 MB each and processed by 16 map tasks, while 10,000 files of 100 KB each are processed by 10,000 map tasks.
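The arithmetic in this example can be checked with a few lines of code. The sketch below is a simplified model, assuming one map task per split and a 64 MB split size as in the text; it ignores details of the real FileInputFormat split calculation, and the function names are hypothetical.

```python
import math

KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def num_splits(file_size: int, split_size: int = 64 * MB) -> int:
    """Splits for one file in this simplified model.

    Large files are cut every `split_size` bytes; any file, however
    small, still yields at least one split of its own.
    """
    return max(1, math.ceil(file_size / split_size))


def total_map_tasks(file_sizes, split_size: int = 64 * MB) -> int:
    """One map task per split, summed over all input files."""
    return sum(num_splits(size, split_size) for size in file_sizes)


if __name__ == "__main__":
    print(total_map_tasks([1 * GB]))            # one large file
    print(total_map_tasks([100 * KB] * 10000))  # many small files
```

The many small files hold slightly less data in total than the single 1 GB file, yet in this model they require 625 times as many map tasks.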

Improvement strategy:

There are several ways to handle small files in HDFS. SequenceFile, MapFile, Har, and similar formats can be used to archive small files. The principle is to pack many small files into larger files and manage them that way; HBase is based on the same idea.
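The effect of packing on NameNode memory can be estimated roughly. The sketch below assumes the commonly cited rule of thumb of about 150 bytes of NameNode memory per file or block object; that figure, the greedy packing strategy, and the function names are assumptions for illustration. It also ignores directories and any index overhead that a real archive format would add.

```python
import math

KB = 1024
MB = 1024 * KB
BYTES_PER_OBJECT = 150  # rule-of-thumb assumption, not an exact figure


def namenode_objects(file_sizes, block_size=64 * MB):
    """One file object plus one object per block, for every file."""
    return sum(1 + max(1, math.ceil(size / block_size)) for size in file_sizes)


def pack_into_containers(file_sizes, container_size=64 * MB):
    """Greedily pack files, in order, into containers of bounded size.

    Returns the size of each resulting container file.
    """
    containers, current = [], 0
    for size in file_sizes:
        if current and current + size > container_size:
            containers.append(current)
            current = 0
        current += size
    if current:
        containers.append(current)
    return containers


if __name__ == "__main__":
    small_files = [100 * KB] * 10000
    packed = pack_into_containers(small_files)
    for label, sizes in (("unpacked", small_files), ("packed", packed)):
        objects = namenode_objects(sizes)
        print(f"{label}: {objects} objects, ~{objects * BYTES_PER_OBJECT} bytes")
```

In this model, the 10,000 small files collapse into 16 container files, so the NameNode tracks 32 objects instead of 20,000.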

3. Does not support multiple writers or arbitrary file modification

A file in HDFS can have only one writer at a time, and writes can only be made at the end of the file; that is, only append operations are possible. Currently, HDFS does not support multiple users writing to the same file or modifying a file at arbitrary positions.
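The single-writer, append-only rule can be modeled with a small in-memory class. This is a toy model only, not the HDFS client API: the class name, method names, and client identifiers are all hypothetical.

```python
class AppendOnlyFile:
    """Toy model of a file with one writer at a time and append-only writes."""

    def __init__(self):
        self._data = bytearray()
        self._writer = None  # the client currently allowed to write

    def open_for_append(self, client: str) -> None:
        """Grant write access only if no other client is writing."""
        if self._writer is not None:
            raise PermissionError(f"file is already being written by {self._writer}")
        self._writer = client

    def append(self, client: str, payload: bytes) -> None:
        """Add bytes at the end of the file; there is no write-at-offset."""
        if client != self._writer:
            raise PermissionError(f"{client} does not hold the write access")
        self._data.extend(payload)

    def close(self, client: str) -> None:
        """Release write access so that another client can append."""
        if client == self._writer:
            self._writer = None

    def read(self) -> bytes:
        """Reads are not restricted to the writer."""
        return bytes(self._data)


if __name__ == "__main__":
    f = AppendOnlyFile()
    f.open_for_append("client-a")
    f.append("client-a", b"first;")
    try:
        f.open_for_append("client-b")  # rejected while client-a is writing
    except PermissionError as err:
        print("rejected:", err)
    f.close("client-a")
    f.open_for_append("client-b")
    f.append("client-b", b"second;")
    print(f.read())
```

The class deliberately offers no method for writing at an offset, mirroring the restriction that existing content cannot be modified in place.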
