How to find and delete duplicate files in Linux

2025-01-14 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains how to find and delete duplicate files in Linux. The techniques covered are practical, and I hope you learn something useful from them.

Find and delete duplicate files in Linux

For the purposes of this guide, I will discuss the following three tools:

Rdfind

Fdupes

FSlint

These three tools are free and open source and run on most Unix-like systems.

1. Rdfind

Rdfind, short for redundant data find, is a free and open source tool that finds duplicate files by scanning directories and subdirectories. It compares file contents rather than file names. Rdfind uses a ranking algorithm to distinguish original files from duplicates: if two or more identical files are found, it picks the highest-ranked one as the original and marks the rest as duplicates. Once duplicates are found, it reports them to you, and you can decide whether to delete them or replace them with hard links or symbolic (soft) links.

Install Rdfind

Rdfind is available in the AUR, so on Arch-based systems you can install it with an AUR helper such as Yay:

$ yay -S rdfind

On Debian, Ubuntu, Linux Mint:

$ sudo apt-get install rdfind

On Fedora:

$ sudo dnf install rdfind

On RHEL, CentOS:

$ sudo yum install epel-release
$ sudo yum install rdfind

Usage

Once the installation is complete, run the rdfind command with the path of the directory you want to scan for duplicate files:

$ rdfind ~/Downloads

Rdfind scans the ~/Downloads directory and stores the results in a file called results.txt in the current working directory. The results.txt file lists the possible duplicate files.

$ cat results.txt
# Automatically generated
# duptype id depth size device inode priority name
DUPTYPE_FIRST_OCCURRENCE 1469 8 9 2050 15864884 1 /home/sk/Downloads/tor-browser_en-US/Browser/TorBrowser/Tor/PluggableTransports/fte/tests/dfas/test5.regex
DUPTYPE_WITHIN_SAME_TREE -1469 8 9 2050 15864886 1 /home/sk/Downloads/tor-browser_en-US/Browser/TorBrowser/Tor/PluggableTransports/fte/tests/dfas/test6.regex
[...]
DUPTYPE_FIRST_OCCURRENCE 13 0 403635 2050 15740257 1 /home/sk/Downloads/Hyperledger (1).pdf
DUPTYPE_WITHIN_SAME_TREE -13 0 403635 2050 15741071 1 /home/sk/Downloads/Hyperledger.pdf
# end of file

By checking the results.txt file, you can easily find those duplicate files. You can delete them manually if you like.
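If you want to act on the report from a script, the paths of the redundant copies can be pulled out of results.txt with standard tools. The sketch below is a hypothetical helper, not part of rdfind; it assumes rdfind's default single-space-separated output format shown above, where the first seven fields are metadata and everything from the eighth field on is the file path.

```shell
# Hypothetical helper: print only the paths of the redundant copies
# (lines tagged DUPTYPE_WITHIN_SAME_TREE) from an rdfind results file.
extract_duplicates() {
    # Fields 1-7 are metadata; field 8 onward is the path, which may
    # itself contain spaces, so cut -f8- keeps it intact.
    grep '^DUPTYPE_WITHIN_SAME_TREE' "$1" | cut -d' ' -f8-
}
# Usage: extract_duplicates results.txt
```

Review the list before piping it into anything destructive.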

In addition, you can use the -dryrun option to find all duplicate files without modifying anything, printing a summary to the terminal:

$ rdfind -dryrun true ~/Downloads

Once duplicate files are found, you can replace them with hard links or symbolic links.

To replace all duplicate files with hard links, run:

$ rdfind -makehardlinks true ~/Downloads

To replace all duplicate files with symbolic (soft) links, run:

$ rdfind -makesymlinks true ~/Downloads
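If you are unsure which mode to pick, the difference is easy to see with plain ln (this is an illustration of link semantics, not an rdfind feature): hard links share a single inode with the original, so the data survives as long as any name for it remains, while a symlink is a separate small file that breaks if the original is removed.

```shell
# Illustrate hard links vs. symlinks in a throwaway directory.
tmp=$(mktemp -d)
echo "same content" > "$tmp/original"
ln "$tmp/original" "$tmp/hardlink"   # same inode as the original
ln -s original "$tmp/symlink"        # just a pointer to the name
ls -li "$tmp"                        # first column: the hard link's inode matches
rm -r "$tmp"
```

Hard links only work within one filesystem; symlinks can cross filesystems but dangle if the target goes away.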

If the directory contains empty files that you want to ignore, use the -ignoreempty option as follows:

$ rdfind -ignoreempty true ~/Downloads

If you no longer want the old files, you can delete the duplicates outright instead of replacing them with hard or soft links.

To delete duplicate files, run:

$ rdfind -deleteduplicates true ~/Downloads

If you don't want to ignore empty files, and want them deleted along with all the duplicates, run:

$ rdfind -deleteduplicates true -ignoreempty false ~/Downloads

For more details, see the help section:

$ rdfind --help

Man pages:

$ man rdfind

2. Fdupes

Fdupes is another command-line tool that identifies and removes duplicate files in specified directories and subdirectories. It is a free and open source tool written in C. Fdupes identifies duplicate files by comparing file sizes, then partial MD5 signatures, then full MD5 signatures, and finally performing a byte-by-byte comparison for verification.
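The same principle can be sketched with plain coreutils, as a rough illustration of what fdupes automates: checksum every file, sort so identical hashes become adjacent, and keep only the repeated groups. Note that fdupes is smarter in practice, since it filters by size first and confirms byte-by-byte; the function name here is my own.

```shell
# Rough coreutils sketch of duplicate detection by content hash.
find_dupes() {
    find "$1" -type f -exec md5sum {} + |   # "hash  path" per file
        sort |                              # identical hashes become adjacent
        uniq -w32 --all-repeated=separate   # keep groups sharing the 32-char hash
}
# Usage: find_dupes ~/Downloads
```

The `-w32` option (compare only the first 32 characters, i.e. the MD5 hash) and `--all-repeated=separate` are GNU uniq extensions.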

Similar to Rdfind, Fdupes offers a handful of options that let you:

Recursively search for duplicate files in directories and subdirectories

Exclude empty and hidden files from consideration

Show the size of duplicate files

Delete duplicates immediately

Exclude files with different owners/groups or permission bits from matching

And more

Install Fdupes

Fdupes exists in the default repository of most Linux distributions.

On Arch Linux and its variants such as Antergos and Manjaro Linux, install it using Pacman as follows.

$ sudo pacman -S fdupes

On Debian, Ubuntu, Linux Mint:

$ sudo apt-get install fdupes

On Fedora:

$ sudo dnf install fdupes

On RHEL, CentOS:

$ sudo yum install epel-release
$ sudo yum install fdupes

Usage

Fdupes is very simple to use. Just run the following command to find duplicate files in a directory, for example ~/Downloads:

$ fdupes ~/Downloads

Sample output from my system:

/home/sk/Downloads/Hyperledger.pdf
/home/sk/Downloads/Hyperledger (1).pdf

You can see that there is a duplicate file in the /home/sk/Downloads/ directory. This shows only duplicates in the parent directory itself. To display duplicates in subdirectories as well, use the -r option:

$ fdupes -r ~/Downloads

Now you will see the duplicate files in the / home/sk/Downloads/ directory and subdirectories.

Fdupes can also be used to quickly find duplicate files from multiple directories.

$ fdupes ~/Downloads ~/Documents/ostechnix

You can even search multiple directories and recursively search one of them, as follows:

$ fdupes ~/Downloads -r ~/Documents/ostechnix

The above command searches for duplicate files in the ~/Downloads directory, and in the ~/Documents/ostechnix directory and its subdirectories.

Sometimes, you may want to know the size of duplicate files in a directory. You can use the -S option as follows:

$ fdupes -S ~/Downloads
403635 bytes each:
/home/sk/Downloads/Hyperledger.pdf
/home/sk/Downloads/Hyperledger (1).pdf

Similarly, to display the size of duplicate files in the parent directory and subdirectories, use the -Sr option.

We can use the -n and -A options to exclude zero-length files and hidden files, respectively.

$ fdupes -n ~/Downloads
$ fdupes -A ~/Downloads

When searching the specified directory, the first command excludes zero-length files and the second excludes hidden files.

To summarize duplicate file information, use the -m option:

$ fdupes -m ~/Downloads
1 duplicate files (in 1 sets), occupying 403.6 kilobytes

To delete all duplicate files, use the -d option:

$ fdupes -d ~/Downloads

Sample output:

[1] /home/sk/Downloads/Hyperledger Fabric Installation.pdf
[2] /home/sk/Downloads/Hyperledger Fabric Installation (1).pdf

Set 1 of 1, preserve files [1 - 2, all]:

This command prompts you to choose which files to keep or delete. Enter a number to keep the corresponding file and delete the rest. Be careful with this option: a careless choice can delete the original file.
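Before letting any tool delete for you, it doesn't hurt to confirm that a candidate pair really is byte-identical. cmp -s exits 0 only when the two files match exactly; the paths below are just the example files from the output above, so substitute your own.

```shell
# Byte-by-byte check of a suspected duplicate pair before deleting.
# cmp -s is silent and reports the result only via its exit status.
if cmp -s "$HOME/Downloads/Hyperledger.pdf" "$HOME/Downloads/Hyperledger (1).pdf"; then
    echo "identical -- safe to drop one copy"
else
    echo "files differ (or are missing) -- do not delete"
fi
```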

If you want to keep the first file in each set of duplicates and delete the others without being prompted, use the -dN option (not recommended):

$ fdupes -dN ~/Downloads

To delete duplicate files as they are encountered, use the -I flag:

$ fdupes -I ~/Downloads

For more details on Fdupes, see the help section and the man page.

$ fdupes --help
$ man fdupes

3. FSlint

FSlint is another tool for finding duplicate files, and I sometimes use it to remove unwanted duplicates and free up disk space on Linux systems. Unlike the other two tools, FSlint has both a GUI and a CLI, which makes it friendlier for beginners. It finds not only duplicate files but also bad symbolic links, problematic file names, temporary files, bad user IDs, empty directories, non-stripped binaries, and more.

Install FSlint

FSlint exists in AUR, so you can install it using any AUR helper.

$ yay -S fslint

On Debian, Ubuntu, Linux Mint:

$ sudo apt-get install fslint

On Fedora:

$ sudo dnf install fslint

On RHEL, CentOS:

$ sudo yum install epel-release
$ sudo yum install fslint

Once the installation is complete, launch it from the menu or application launcher.

The FSlint GUI is friendly and clear at a glance. In the "Search path" field, add the directory path you want to scan, then click the "Find" button in the lower left corner to search for duplicate files. Check the recurse option to search directories and subdirectories recursively. FSlint quickly scans the given directory and lists the duplicate files it finds.

Select the duplicate files you want to clean up from the list, and use "Save", "Delete", "Merge", or "Symlink" to act on them.

In the "Advanced search parameters" column, you can specify the excluded path when searching for duplicate files.

FSlint command line options

FSlint provides the following CLI toolset to find duplicate files in your file system.

findup - find duplicate files

findnl - find name lint (problematic file names)

findu8 - find file names with invalid UTF-8 encoding

findbl - find bad links (problematic symbolic links)

findsn - find same-name files (potentially conflicting file names)

finded - find empty directories

findid - find files owned by dead user IDs

findns - find non-stripped executables

findrs - find redundant whitespace in file names

findtf - find temporary files

findul - find possibly unused libraries

zipdir - reclaim wasted space in ext2 directory entries

All of these tools are located under /usr/share/fslint/fslint/.

For example, to look for duplicate files in a given directory, run:

$ /usr/share/fslint/fslint/findup ~/Downloads/

Similarly, the command to find empty directories is:

$ /usr/share/fslint/fslint/finded ~/Downloads/
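FSlint is unmaintained and has been dropped from many current distribution repositories, so it is worth knowing that the empty-directory check, at least, is easy to reproduce with plain find. The wrapper function below is my own naming, not part of FSlint:

```shell
# Plain-find equivalent of FSlint's finded: list empty directories.
find_empty_dirs() {
    find "$1" -type d -empty
}
# Usage: find_empty_dirs ~/Downloads/
```

Adding `-delete` after `-empty` would remove them, so test without it first.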

To get more details about each tool, for example findup, run:

$ /usr/share/fslint/fslint/findup --help

For more details on FSlint, see the help section and the man page.

$ /usr/share/fslint/fslint/fslint --help

The above is how to find and delete duplicate files in Linux. These are tools you may well see or use in daily work, and I hope you learned something from this article.
