2025-03-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to delete duplicate pictures in a folder with Python. The approach has two parts: first decide whether two pictures are the same, then scan a folder and move the duplicates somewhere else.
Part 1: Determine whether two pictures are the same
To find duplicate pictures, you first need to decide whether two pictures are the same. In principle this is easy: a picture can be treated as an array, so you only need to check whether the two arrays are equal. Doing only that is crude, though, because comparing every element of two large arrays is slow. To avoid comparing large arrays wherever possible:
1. First, compare the file sizes (in bytes) of the two pictures. If the file sizes differ, the pictures are different.
2. If the file sizes match, compare the dimensions (width and height) of the two pictures. If the dimensions differ, the pictures are different.
3. If the dimensions also match, compare the contents of the two pictures (that is, the array elements). If the contents differ, the pictures are different.
This way, whenever the file sizes or the dimensions differ, the two pictures are already known to be different, so the element-by-element comparison can be skipped, which improves efficiency.

    import os
    import shutil

    import numpy as np
    from PIL import Image


    def compare_file_size(dir_image1, dir_image2):
        # Step 1: compare the file sizes in bytes.
        with open(dir_image1, "rb") as f1:
            size1 = len(f1.read())
        with open(dir_image2, "rb") as f2:
            size2 = len(f2.read())
        if size1 == size2:
            result = "same size"
        else:
            result = "different size"
        return result


    def compare_image_dimensions(dir_image1, dir_image2):
        # Step 2: compare the dimensions (width, height).
        # The with-block closes both files so they can be moved later.
        with Image.open(dir_image1) as image1, Image.open(dir_image2) as image2:
            if image1.size == image2.size:
                result = "same dimensions"
            else:
                result = "different dimensions"
        return result


    def compare_image_content(dir_image1, dir_image2):
        # Step 3: compare the pixel arrays element by element.
        with Image.open(dir_image1) as img1, Image.open(dir_image2) as img2:
            image1 = np.array(img1)
            image2 = np.array(img2)
        if np.array_equal(image1, image2):
            result = "same content"
        else:
            result = "different content"
        return result


    def compare_images(dir_image1, dir_image2):
        # Compare whether two pictures are the same.
        # Step 1: compare whether the file sizes are the same.
        # Step 2: compare whether the width and height are the same.
        # Step 3: compare whether each pixel is the same.
        # If an earlier step finds a difference, the two pictures must be different.
        result = "two pictures are different"
        size = compare_file_size(dir_image1, dir_image2)
        if size == "same size":
            dimensions = compare_image_dimensions(dir_image1, dir_image2)
            if dimensions == "same dimensions":
                content = compare_image_content(dir_image1, dir_image2)
                if content == "same content":
                    result = "two pictures are the same"
        return result

Part 2: Determine whether the folder contains duplicate pictures
To find out whether the folder contains a picture identical to picture A, you need to go through all the pictures in the folder and compare them with A one at a time. If the folder holds 1000 pictures, the first picture must be compared with the remaining 999, the second with the remaining 998, the third with the remaining 997, and so on. Instead, this program sorts all the pictures by file size (in bytes) and then makes a single pass, comparing each picture with the next one. After sorting, duplicate pictures tend to end up next to each other (because duplicates have the same file size).
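To put numbers on this, here is a minimal sketch that is not part of the original script. The function names and the sample file names and sizes are made up for illustration. It counts the comparisons needed when every picture is compared with every remaining picture, versus comparing only neighbours in a size-sorted list, and shows how sorting by size brings equal-sized files together.

```python
def all_pairs_comparisons(n):
    """Comparisons when each picture is compared with every later one: (n-1) + (n-2) + ... + 1."""
    return n * (n - 1) // 2


def adjacent_comparisons(n):
    """Comparisons when each picture is compared only with its neighbour in a sorted list."""
    return max(n - 1, 0)


# Hypothetical (path, size in bytes) pairs standing in for real files.
sample = [("c.jpg", 300), ("a.jpg", 100), ("a_copy.jpg", 100), ("b.png", 200)]
by_size = sorted(sample, key=lambda item: item[1])

print(all_pairs_comparisons(1000))        # every picture against every other
print(adjacent_comparisons(1000))         # neighbours only
print([name for name, _ in by_size])      # equal-sized files end up side by side
```

For 1000 pictures this gives 499500 comparisons for the all-pairs approach and 999 for the neighbour-only pass.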

    if __name__ == '__main__':
        # Raw strings keep the backslashes in Windows paths from being read as escapes.
        load_path = r'E:\test picture set (not duplicated)'    # folder to be deduplicated
        save_path = r'E:\test picture set (duplicate photos)'  # empty folder that receives the detected duplicates
        os.makedirs(save_path, exist_ok=True)

        # Build file_map: {file path: file size in bytes}
        file_map = {}
        # Walk the files under load_path (including subdirectories)
        for parent, dirnames, filenames in os.walk(load_path):
            for filename in filenames:
                file_path = os.path.join(parent, filename)
                file_map.setdefault(file_path, os.path.getsize(file_path))

        # Sort the pictures by file size, smallest first
        file_map = sorted(file_map.items(), key=lambda d: d[1], reverse=False)
        file_list = [filename for filename, image_size in file_map]

        # Pick out duplicates by comparing each picture with the next one.
        # Stopping at the second-to-last index avoids reading past the end of the list.
        file_repeat = []
        for currIndex in range(len(file_list) - 1):
            dir_image1 = file_list[currIndex]
            dir_image2 = file_list[currIndex + 1]
            result = compare_images(dir_image1, dir_image2)
            if result == "two pictures are the same":
                file_repeat.append(dir_image2)
                print("the same picture:", dir_image1, dir_image2)
            else:
                print("different pictures:", dir_image1, dir_image2)

        # Move the duplicates to the new folder, which deduplicates the original folder
        for image in file_repeat:
            shutil.move(image, save_path)
            print("removing duplicate photos:", image)

Part 3: Result of running the program
If the folder contains 10 copies of picture A, 5 copies of picture B, and 1 copy of picture C, then after the program finishes, one copy each of A, B, and C remains in the folder; the other copies are moved to the folder specified by save_path.
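This outcome can be checked with a small simulation. The sketch below is an illustration only, not the article's code: letters stand in for picture contents, and the hypothetical helper split_duplicates mimics the neighbour-comparison loop, treating an item as a duplicate when it equals the item just before it.

```python
def split_duplicates(sorted_items):
    """Mimic the neighbour-comparison loop: an item equal to its predecessor is a duplicate."""
    kept, moved = [], []
    for index, item in enumerate(sorted_items):
        if index > 0 and item == sorted_items[index - 1]:
            moved.append(item)
        else:
            kept.append(item)
    return kept, moved


# Equal letters mean identical pictures: 10 copies of A, 5 of B, 1 of C.
folder = ["A"] * 10 + ["B"] * 5 + ["C"]
kept, moved = split_duplicates(folder)
print(kept, len(moved))

# Caveat: identical pictures separated by a different picture are not neighbours,
# so the neighbour-only pass does not flag them.
kept_mixed, moved_mixed = split_duplicates(["A", "B", "A"])
print(kept_mixed, len(moved_mixed))
```

In the first case one A, one B, and one C are kept and 13 items are moved. The second case shows a limitation to keep in mind: if a different picture with the same file size happens to sort between two identical pictures, the pair is never compared and nothing is moved.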
Part 4: Notes
The code can be copied as is; only the load_path and save_path values need to be changed.
Make sure the load_path folder contains only image files (.jpg, .png, .jpeg); it must not contain files in other formats (for example, .mp4). Clean up the folder in the file explorer first, or, if you are comfortable with the code, modify it so that it reads only files of the specified types.
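One way to read only files of the specified types is to filter by extension while walking the folder. The following is a minimal sketch under stated assumptions: the helper name list_image_files and the IMAGE_EXTENSIONS set are hypothetical and not part of the original program, and the extension list simply repeats the formats named above.

```python
import os

# Assumed set of extensions, taken from the formats mentioned in the note.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_image_files(root):
    """Return paths under root (including subdirectories) whose extension is an image type."""
    image_paths = []
    for parent, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Lower-case the extension so that .JPG and .jpg are treated alike.
            extension = os.path.splitext(filename)[1].lower()
            if extension in IMAGE_EXTENSIONS:
                image_paths.append(os.path.join(parent, filename))
    return image_paths
```

The same extension test could be applied inside the os.walk loop of the main script before a file is added to file_map, so that files such as .mp4 are skipped instead of being opened as pictures.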
That concludes this look at how to delete duplicate pictures in a folder with Python. Combining theory with practice is the best way to learn, so go and try it on a folder of your own.