1. Preface: research background
As the block-level deduplication technology currently available for the open source Linux system, the Linux device mapper deduplication target (dm-dedup) is likely to attract wide attention in the future. Especially as all-flash arrays take off and more and more hybrid storage schemes are replaced by all-flash storage, how to improve the utilization of flash has become a research focus for the major vendors.
It follows that deduplication is particularly important for flash, and is even one of the basic elements that allow flash storage to achieve low cost; the other element is compression.
2. The basic concept of deduplication
Deduplication is an old topic; data deduplication has been studied by many people over the years, so I will not give a full introduction here.
Instead, let us borrow from Wikipedia for a quick look at deduplication technology:
For details, see https://en.wikipedia.org/wiki/Data_deduplication: "In computing, data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data. Related and somewhat synonymous terms are intelligent (data) compression and single-instance (data) storage. The technique is used to improve storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent. In the deduplication process, unique chunks of data, or byte patterns, are identified and stored during analysis. As the analysis continues, other chunks are compared to the stored copies, and whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency depends on the chunk size), the amount of data that must be stored or transferred can be greatly reduced. This type of deduplication is different from that performed by standard file-compression tools such as LZ77 and LZ78: whereas those tools identify short repeated substrings inside individual files, the intent of storage-based deduplication is to inspect large volumes of data and identify large identical sections (such as entire files or large portions of files) so that only one copy is stored. That copy may additionally be compressed by single-file compression techniques. For example, a typical email system might contain 100 instances of the same 1 MB (megabyte) file attachment. Each time the email platform is backed up, all 100 instances of the attachment are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; subsequent instances are referenced back to the saved copy, for a deduplication ratio of roughly 100 to 1."
From the Wikipedia introduction we can see that deduplication pays off enormously in specific application scenarios (such as the email system above).
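To make the mechanism concrete, here is a minimal userspace sketch of fixed-size block deduplication; it is not dm-dedup's actual code, and the 4 KB block size and SHA-256 hash are illustrative assumptions. Each block is hashed, unique blocks are stored once, and duplicates become small references to the stored copy, reproducing the email example's roughly 100:1 ratio.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # illustrative; real systems choose this carefully

def deduplicate(data: bytes):
    """Split data into fixed-size blocks, store each unique block once,
    and represent the logical stream as a list of references (hashes)."""
    store = {}    # hash -> block contents (the single stored instance)
    refs = []     # the logical stream, expressed as references
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:      # first time we see this pattern
            store[digest] = block    # store the one unique instance
        refs.append(digest)          # duplicates cost only a reference
    return store, refs

# The Wikipedia email example: 100 backups of the same 1 MB attachment.
attachment = os.urandom(1024 * 1024)   # 1 MB of non-repeating data
backup = attachment * 100              # 100 MB logically
store, refs = deduplicate(backup)
stored = sum(len(b) for b in store.values())
print(f"logical: {len(backup)} bytes, stored: {stored} bytes, "
      f"ratio: {len(backup) / stored:.0f}:1")   # -> 100:1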
3. Open source deduplication technology
Several open source deduplication implementations currently exist:
dm-dedup, OpenZFS, Btrfs, OpenDedup, and so on.
Except for dm-dedup, all of the others deduplicate at the file system level.
As far as I know, dm-dedup is therefore the only open source project that implements deduplication at the block level.
4. The difference between file-level and block-level deduplication
In essence there is no difference between file-level and block-level deduplication: both aim to detect duplicate data and replace instances with references in order to save space. From the perspective of the storage stack, however, they are very different. When we build a storage system, the lower a feature sits in the I/O stack, the broader its scope and the better its compatibility, but the weaker its application awareness. In a typical Linux system, file systems are built on top of block devices, so if deduplication is implemented at the block level, it can be combined with stable file systems that lack a deduplication feature of their own, such as ext4 and XFS. I consider this a very important advantage of block-level deduplication.

Another point concerns systems that export block storage directly (OpenStack Cinder, VMware ESXi, and some cluster file systems). Whether the product is a server SAN or a standard SAN, it is better to implement deduplication at the block level: exporting block-level subvolumes through OpenZFS or Btrfs costs a great deal of performance. So implementing deduplication directly at the block level is very worthwhile.
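To illustrate why block-level deduplication can sit below ext4 or XFS transparently, here is a simplified model of the metadata a dm-dedup-style block target might maintain; the class and field names are my own illustrative assumptions, not dm-dedup's actual on-disk structures. It keeps a logical-to-hash map, a hash-to-physical index, and per-block reference counts.

```python
import hashlib

class DedupBlockDevice:
    """Simplified model of a deduplicating block target sitting below a
    file system. Structures are illustrative, not dm-dedup's format."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.lbn_to_hash = {}   # logical block number -> content hash
        self.hash_to_pbn = {}   # content hash -> physical block number
        self.refcount = {}      # physical block number -> reference count
        self.data = []          # physical block store (append-only here)

    def write(self, lbn: int, block: bytes) -> None:
        digest = hashlib.sha256(block).hexdigest()
        self._release(lbn)                    # drop ref to old contents
        if digest in self.hash_to_pbn:        # duplicate: just add a ref
            pbn = self.hash_to_pbn[digest]
            self.refcount[pbn] += 1
        else:                                 # new data: allocate a block
            pbn = len(self.data)
            self.data.append(block)
            self.hash_to_pbn[digest] = pbn
            self.refcount[pbn] = 1
        self.lbn_to_hash[lbn] = digest

    def read(self, lbn: int) -> bytes:
        digest = self.lbn_to_hash[lbn]
        return self.data[self.hash_to_pbn[digest]]

    def _release(self, lbn: int) -> None:
        digest = self.lbn_to_hash.pop(lbn, None)
        if digest is not None:
            pbn = self.hash_to_pbn[digest]
            self.refcount[pbn] -= 1           # real targets reclaim at 0

# Two logical blocks with identical contents share one physical block.
dev = DedupBlockDevice()
dev.write(0, b"a" * 4096)
dev.write(1, b"a" * 4096)
assert dev.read(0) == dev.read(1)
print("physical blocks used:", len(dev.data))   # -> 1
```

Because the file system above only ever issues reads and writes against logical block numbers, it never needs to know that deduplication is happening underneath, which is exactly why a block-level implementation composes with file systems that lack the feature.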
[This article is posted only by the 51CTO blogger "Underlying Storage Technology", https://blog.51cto.com/12580077, and published on the official account "Storage Valley". If you need to reprint it, please contact me. Thank you.]