How to identify files with the same content on Linux

2025-10-25 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

This article explains in detail how to identify files with the same content on Linux. The editor found it very practical and is sharing it here as a reference. Hopefully you will take something away from it.

Compare files with the diff command

Perhaps the easiest way to compare two files is to use the diff command. The output shows the differences between the two files. The < and > symbols indicate whether the extra lines of text are in the first (<) or second (>) file passed as an argument. In this example, the extra lines are in backup.html.

    $ diff index.html backup.html
    2438a2439,2441
    >
    > That's all there is to report.
    >

If diff has no output, it means that the two files are the same.

    $ diff home.html index.html
    $
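Because diff also reports its result through its exit status (0 when the files are the same, 1 when they differ, 2 when something went wrong), the check can be scripted. The following is a minimal sketch. The function name same_content is made up for illustration and is not part of diff.

```shell
#!/bin/sh
# Hypothetical helper (not part of diff): report whether two files have
# the same content, based only on diff's exit status.
same_content() {
    # Discard diff's normal output; only the exit status is of interest.
    diff "$1" "$2" >/dev/null 2>&1
    case $? in
        0) echo "same" ;;       # no differences found
        1) echo "different" ;;  # at least one difference
        *) echo "error" ;;      # e.g. a file is missing or unreadable
    esac
}

# Example call, assuming both files exist in the current directory:
#   same_content home.html index.html
```

The function prints one word per comparison, which makes it easy to use inside a loop over candidate file pairs.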

The disadvantage of diff is that it can only compare two files at a time and you must specify which files to compare. Some commands in this post can find multiple duplicate files for you.

Use checksums

The cksum (checksum) command calculates a checksum for each file. A checksum is a mathematical reduction of a file's content to a long number, such as 2819078353. cksum prints it followed by the file's size in bytes (228029 in this case). Although checksums are not guaranteed to be unique, the chance that two files with different content produce the same checksum is very small.

    $ cksum *.html
    2819078353 228029 backup.html
    4073570409 227985 home.html
    4073570409 227985 index.html

In the above example, the second and third files produce the same checksum and size, so they can be assumed to have the same content.
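With many files, spotting matching checksums by eye becomes tedious. As a rough sketch, the cksum output can be passed through awk so that only files sharing a checksum and size are listed. The function name list_cksum_matches is hypothetical, and the sketch assumes file names contain no newline characters.

```shell
#!/bin/sh
# Hypothetical helper: from the files given as arguments, print only those
# that share a checksum and byte count with at least one other file.
list_cksum_matches() {
    cksum "$@" | awk '
        {
            key = $1 " " $2                      # checksum plus byte count
            name = $0
            sub(/^[0-9]+ [0-9]+ /, "", name)     # the rest of the line is the file name
            count[key]++
            names[key] = names[key] "  " name "\n"
        }
        END {
            # Print each checksum seen more than once, then its files.
            for (key in count)
                if (count[key] > 1)
                    printf "%s\n%s", key, names[key]
        }'
}

# Example call, assuming the current directory holds some .html files:
#   list_cksum_matches *.html
```

Each group starts with the shared checksum and size, followed by the indented names of the matching files. Files with a unique checksum are not printed at all.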

Use the find command

Although the find command has no option for finding duplicate files, it can be used to locate files by name or type and run the cksum command on each one. For example:

    $ find . -name "*.html" -exec cksum {} \;
    4073570409 227985 ./home.html
    2819078353 228029 ./backup.html
    4073570409 227985 ./index.html

Use the fslint command

The fslint command is designed specifically to find duplicate files. Note that it is given a starting location (here, the current directory). If it has to traverse a large number of files, it can take some time to complete. Note how it lists duplicate files and also checks for other problems, such as empty directories and bad IDs.

    $ fslint .
    --file name lint
    --Invalid utf8 names
    --file case lint
    --DUPlicate files
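Where fslint is not available, its duplicate-file check can be roughly approximated by combining the find and cksum commands shown earlier, with cmp -s confirming each match byte for byte. This is only a sketch under assumptions, not how fslint itself works. The function name find_duplicates is hypothetical, and file names are assumed to contain no newlines or trailing whitespace.

```shell
#!/bin/sh
# Hypothetical helper: list pairs of files under a starting directory
# (default: the current directory) whose content is identical.
find_duplicates() {
    start=${1:-.}
    prev_key=
    prev_name=
    # Checksum every regular file, then sort numerically so that files with
    # the same checksum and size end up on adjacent lines.
    find "$start" -type f -exec cksum {} + | sort -k1,1n -k2,2n |
    while read -r sum size name; do
        # A matching checksum and size is only a strong hint; cmp -s confirms
        # that the two files really are byte-for-byte identical.
        if [ "$sum $size" = "$prev_key" ] && cmp -s "$prev_name" "$name"; then
            echo "$prev_name = $name"
        fi
        prev_key="$sum $size"
        prev_name=$name
    done
}

# Example call for the current directory:
#   find_duplicates .
```

Only neighbouring lines are compared, so a group of three identical files is reported as two pairs. The sketch favours brevity over speed and will be slow on very large directory trees, since every file is read at least once.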
