
How to find duplicate files on Linux and quickly free up disk space


This article explains how to find duplicate files on a Linux system and quickly free up disk space. The editor thinks it is very practical and shares it here; I hope you get something out of it after reading.

Whether you use a Windows or a Linux computer, duplicate files accumulate over time. They not only take up disk space but can also slow the system down, so it is worth getting rid of them.

Here are six ways to find duplicate files on your system so you can quickly free up disk space.

1. Use the diff command to compare files

In everyday use, the easiest way to compare two files is probably the diff command. Its output marks the differences between the two files with the < and > symbols, and we can use this behavior to find identical files.

When there are differences between the two files, the diff command outputs the differences:

$ diff index.html backup.html
2438a2439,2441
>
> That's all there is to report.
>

If your diff command has no output, it means the two files are the same:

$ diff home.html index.html
$

However, the drawback of diff is that it can only compare two files at a time, which becomes very inefficient when we want to compare many files.
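A small workaround, sketched here rather than taken from the article: diff's -q option only reports whether two files differ, so a short shell loop can check one file against all the others. With the three example files above, only backup.html would be flagged:

$ for f in *.html; do
>   [ "$f" != index.html ] && diff -q index.html "$f"
> done
Files index.html and backup.html differ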

2. Use the cksum command

The checksum command cksum runs a file's contents through an algorithm and reports a long number along with the file's byte count (for example, 2819078353 228029). The result is not absolutely unique, but the chance of two different files producing the same checksum is about as remote as the Chinese men's football team winning the World Cup.

$ cksum *.html
2819078353 228029 backup.html
4073570409 227985 home.html
4073570409 227985 index.html

In the output above, the checksums of the second and third files are the same, so we can assume those two files are identical.
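As a side note (a sketch of an alternative, not shown in the article), md5sum or sha256sum can be used the same way; piping the output through sort lines identical hashes up next to each other, which makes duplicates easy to spot:

$ md5sum *.html | sort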

3. Use the find command

Although the find command has no built-in option for finding duplicate files, it can be used to search for files by name or type and run the cksum command on each one. It works like this:

$ find . -name "*.html" -exec cksum {} \;
4073570409 227985 ./home.html
2819078353 228029 ./backup.html
4073570409 227985 ./index.html
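Going one step further, here is a rough sketch (it assumes the file names contain no spaces): sort the output numerically and let awk print only files whose checksum has already been seen once:

$ find . -name "*.html" -exec cksum {} \; | sort -n | awk 'seen[$1]++ { print $3 }'

With the three files above, this would print only ./index.html, the later occurrence of the duplicated checksum.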

4. Use the fslint command

The fslint command is designed specifically for finding duplicate files (and other kinds of "lint"). One thing to note is that we must give it a starting directory, and if it has to scan a large number of files, it may take quite a while to finish.

$ fslint .
-----------------------------------file name lint
-------------------------------Invalid utf8 names
-----------------------------------file case lint
----------------------------------DUPlicate files
--------------------redundant characters in links
------------------------------------suspect links
--------------------------------Empty Directories
./.gnupg
----------------------------------Temporary Files
----------------------duplicate/conflicting Names
------------------------------------------Bad ids
-------------------------Non Stripped executables

Note: fslint must be installed on the system, and its directory added to the search path:

$ export PATH=$PATH:/usr/share/fslint/fslint
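With the path set, fslint's duplicate-file checker, a script named findup that lives in the same directory, can also be run on its own. This is just a sketch, and the directory is only a placeholder:

$ findup /home/alvin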

5. Use the rdfind command

The rdfind command also looks for duplicate (identical) files. Its name stands for "redundant data find". The command can determine, based on file dates, which files are the originals, which is helpful when we choose to delete duplicates, because it removes the newer files.

$ rdfind ~
Now scanning "/home/alvin", found 12 files.
Now have 12 files in total.
Removed 1 files due to nonunique device and inode.
Total size is 699498 bytes or 683 KiB
Removed 9 files due to unique sizes from list. 2 files left.
Now eliminating candidates based on first bytes: removed 0 files from list. 2 files left.
Now eliminating candidates based on last bytes: removed 0 files from list. 2 files left.
Now eliminating candidates based on sha1 checksum: removed 0 files from list. 2 files left.
It seems like you have 2 files that are not unique
Totally, 223 KiB can be reduced.
Now making results file results.txt

We can also run it in dry-run mode, which reports what it would do without actually changing anything:

$ rdfind -dryrun true ~
(DRYRUN MODE) Now scanning "/home/alvin", found 12 files.
(DRYRUN MODE) Now have 12 files in total.
(DRYRUN MODE) Removed 1 files due to nonunique device and inode.
(DRYRUN MODE) Total size is 699352 bytes or 683 KiB
(DRYRUN MODE) Removed 9 files due to unique sizes from list. 2 files left.
(DRYRUN MODE) Now eliminating candidates based on first bytes: removed 0 files from list. 2 files left.
(DRYRUN MODE) Now eliminating candidates based on last bytes: removed 0 files from list. 2 files left.
(DRYRUN MODE) Now eliminating candidates based on sha1 checksum: removed 0 files from list. 2 files left.
(DRYRUN MODE) It seems like you have 2 files that are not unique
(DRYRUN MODE) Totally 223 KiB can be reduced.
(DRYRUN MODE) Now making results file results.txt

The rdfind command also provides options such as ignoring empty files (-ignoreempty) and following symbolic links (-followsymlinks), as in the sketch below.
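For instance, here is a sketch that combines those options with a dry run (the target directory is only a placeholder, not from the article):

$ rdfind -dryrun true -ignoreempty true -followsymlinks true ~/Documents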

It is important to note that rdfind offers a -deleteduplicates true option. As the name implies, using this option will delete the duplicate files automatically.

$ rdfind -deleteduplicates true .
...
Deleted 1 files.

Of course, the prerequisite is that rdfind is installed on the system.

6. Use the fdupes command

The fdupes command also makes it easy to identify duplicate files, and it offers a number of useful options. In its simplest form, it groups duplicate files together, like this:

$ fdupes ~
/home/alvin/UPGRADE
/home/alvin/mytwin

/home/alvin/lp.txt
/home/alvin/lp.man

/home/alvin/penguin.png
/home/alvin/penguin0.png
/home/alvin/hideme.png

The -r option stands for recursion: fdupes will recursively look for duplicate files under each subdirectory. However, it is important to remember that a Linux system contains many duplicate files that must not be removed (such as users' .bashrc and .profile files); deleting them can break the system.

# fdupes -r /home
/home/shark/home.html
/home/shark/index.html

/home/dory/.bashrc
/home/eel/.bashrc

/home/nemo/.profile
/home/dory/.profile
/home/shark/.profile

/home/nemo/tryme
/home/shs/tryme

/home/shs/arrow.png
/home/shs/PNGs/arrow.png

/home/shs/11/files_11.zip
/home/shs/ERIC/file_11.zip

/home/shs/penguin0.jpg
/home/shs/PNGs/penguin.jpg
/home/shs/PNGs/penguin0.jpg

/home/shs/Sandra_rotated.png
/home/shs/PNGs/Sandra_rotated.png

The fdupes command supports a number of other common options; see the sketch below for a few frequently used ones, and its man page for the full list.
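As a rough sketch (the directory is a placeholder, not from the article): -S prints the size of each duplicate, -m prints a space-usage summary, and -d combined with -N deletes duplicates without prompting, keeping only the first file in each set, so it should be used with great care:

$ fdupes -rS ~/Pictures     # list duplicates recursively, showing their sizes
$ fdupes -rm ~/Pictures     # only summarize how much space the duplicates waste
$ fdupes -rdN ~/Pictures    # delete duplicates, keeping the first in each set (dangerous)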

What is Linux? Linux is a UNIX-like operating system that is free to use and freely distributed. It is a POSIX-compliant, multi-user, multi-tasking, multi-threaded, multi-CPU operating system, and it can run most major Unix tools, applications, and network protocols.

That is how to find duplicate files on a Linux system and quickly free up disk space. The editor believes some of these points will come up in daily work, and hopes you have learned something from this article. For more details, please follow the industry information channel.
