A script to locate duplicate image files between two sets of folders (e.g. Camera Roll folders vs other folders).
It is fairly common, for instance, to copy image files out of your Camera Roll, edit the copies, and then leave them scattered around other folders. This script helps to locate those duplicates, assuming that they still have their original image metadata (i.e. `Date Taken`).
- The default duplicate criterion is to match only by `Date Taken` (e.g. `2021-01-01T00:11:22+0000`).
- Choose whether the duplicate criteria should also include file size and file hash.
- Searches two groups of folders (i.e. source and other) for all descendant image files with an existing `Date Taken` attribute.
- Compares the files of the two groups of folders, identifying duplicates using the criteria you defined.
- Finally, exports duplicates into a `duplicates.json` file.
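The compare-and-export steps above can be sketched roughly as follows. This is a minimal Python illustration, not the script's actual code: `find_duplicates` and `export_duplicates` are hypothetical names, the input is assumed to be already-extracted `(key, path)` pairs, and the rule that the first source file seen for a key wins is an assumption.

```python
import json
from collections import defaultdict


def find_duplicates(source_files, other_files):
    """Return {key: [source_path, duplicate_path, ...]} for keys found in both groups.

    Both arguments are iterables of (key, path) pairs. How the key is built
    (Date Taken alone, or combined with size and hash) is up to the caller.
    """
    sources = {}
    for key, path in source_files:
        # Assumption: if several source files share a key, keep the first one.
        sources.setdefault(key, path)

    matches = defaultdict(list)
    for key, path in other_files:
        if key in sources:
            matches[key].append(path)

    # Source file first, then its duplicates, mirroring the documented layout.
    return {key: [sources[key], *paths] for key, paths in matches.items()}


def export_duplicates(duplicates, out_path="duplicates.json"):
    """Write the duplicates mapping to a JSON file."""
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(duplicates, fh, indent=4)
```

Keys that appear only in the source group, or only in the other group, are dropped, so the exported file lists only actual matches.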
For DateTaken-only criteria, the key is `DateTaken`, where `DateTaken` is in ISO 8601 format.
{
    "2021-01-01T00:11:22+0000": [
        "C:\\path\\to\\Camera Roll\\source.jpg", // The first file is the source file.
        "C:\\path\\to\\other folder\\duplicate.jpg", // The rest are duplicates.
        ...
    ],
    ...
}
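As an illustration of the key format, a timestamp can be rendered in the same shape as the example key with Python's standard library. This is only a sketch: `date_taken_key` is a hypothetical helper, and it assumes the Date Taken value has already been read from the image as a timezone-aware `datetime`. How the script itself determines the UTC offset is not covered here.

```python
from datetime import datetime, timezone


def date_taken_key(taken: datetime) -> str:
    """Format an aware datetime in the style 2021-01-01T00:11:22+0000."""
    if taken.tzinfo is None:
        # A naive datetime would produce an empty %z, so reject it explicitly.
        raise ValueError("expected a timezone-aware datetime")
    return taken.strftime("%Y-%m-%dT%H:%M:%S%z")


example = datetime(2021, 1, 1, 0, 11, 22, tzinfo=timezone.utc)
print(date_taken_key(example))
```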
For DateTaken, length, and file hash criteria, the key is `DateTaken-Length-FileHash`, where `DateTaken` is in ISO 8601 format, `Length` is an integer in bytes, and `FileHash` is an SHA-256 hash value.
{
    "2021-01-01T00:11:22+0000-1234567-XXXXXXXXXX": [
        "C:\\path\\to\\Camera Roll\\source.jpg", // The first file is the source file.
        "C:\\path\\to\\other folder\\duplicate.jpg", // The rest are duplicates.
        ...
    ],
    ...
}
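A composite key of this shape could be assembled along the following lines. This is a hedged Python sketch rather than the script's own implementation: `sha256_of_file` and `composite_key` are hypothetical names, and the uppercase hex digest is an assumption, since the example above does not show the hash casing.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    # Assumption: uppercase hex; the casing used by the script is not documented.
    return digest.hexdigest().upper()


def composite_key(date_taken, path):
    """Build a DateTaken-Length-FileHash key for one file."""
    length = os.path.getsize(path)  # file size in bytes
    return f"{date_taken}-{length}-{sha256_of_file(path)}"
```

Because the hash covers the file contents, a copy that was edited after leaving the Camera Roll will no longer match under these stricter criteria, even if its Date Taken is unchanged.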