8. Class Reference
Work in progress
The `Processor` class is the core component responsible for identifying and grouping duplicate files through a multi-step processing workflow. It combines a grouping strategy with a sequence of algorithms to classify files into sets of similar files.
Processor(Grouper grouper, Collection<Algorithm<?>> algorithms)
- Parameters:
  - `grouper` - A `Grouper` instance that performs the initial division of files based on a distinction predicate (e.g., CRC32 checksum).
  - `algorithms` - A collection of `Algorithm` objects applied to the files during the "Algorithm Application" step. The order of the algorithms matters for the processing.
- Throws:
  - `NullPointerException` - If either `grouper` or `algorithms` is `null`, or if the algorithm collection is empty or contains `null` elements.
- Purpose: Initializes the `Processor` with the provided grouping strategy and set of algorithms for processing the files.
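The argument checks described above can be sketched as follows. This is a minimal illustration, not the project's actual constructor: `ProcessorSketch` and its `algorithms()` accessor are hypothetical, and the empty `Grouper` and `Algorithm` interfaces are stand-ins for the project's own types.

```java
import java.util.Collection;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-ins for the project's Grouper and Algorithm types.
interface Grouper {}
interface Algorithm<T> {}

class ProcessorSketch {
    private final Grouper grouper;
    private final List<Algorithm<?>> algorithms;

    ProcessorSketch(Grouper grouper, Collection<Algorithm<?>> algorithms) {
        this.grouper = Objects.requireNonNull(grouper, "grouper");
        Objects.requireNonNull(algorithms, "algorithms");
        // The reference above documents NullPointerException for an empty collection too.
        if (algorithms.isEmpty()) {
            throw new NullPointerException("algorithms must not be empty");
        }
        for (Algorithm<?> algorithm : algorithms) {
            Objects.requireNonNull(algorithm, "algorithms must not contain null");
        }
        // An immutable copy that keeps the iteration order of the input collection.
        this.algorithms = List.copyOf(algorithms);
    }

    List<Algorithm<?>> algorithms() {
        return algorithms;
    }
}
```

Copying into a `List` fixes the order in which the algorithms will later be applied, which matters because the order of the algorithms affects the processing.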
Map<File, Set<File>> process(@NotNull Collection<@NotNull File> files) throws IOException
- Parameters:
  - `files` - A collection of `File` objects to be processed. Typically, these files are of the same type (e.g., images) and are grouped based on similarity.
- Returns:
  - A `Map` where each key is the file considered the "original" in a group of similar files, and each value is the set of files considered duplicates of, or similar to, that original.
- Throws:
  - `NullPointerException` - If the input collection is `null` or contains `null`.
  - `IOException` - If any I/O error occurs during processing.
- Purpose: This method processes the input collection of files through the following steps:
  - Initial Division: Files are divided into subsets based on a distinction predicate.
  - Algorithm Application: A series of algorithms is applied to the subsets to refine the grouping further.
  - Original File Identification: The first file in each group is identified as the "original", and the groups are reorganized accordingly.
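The three steps above can be sketched end to end. This is a simplified illustration under stated assumptions, not the actual `Processor` code: the distinction predicate and each algorithm are modelled as plain `Function<File, ?>` key extractors, each step splits every existing group by key, and groups left with a single file are dropped. `PipelineSketch` and `refine` are hypothetical names.

```java
import java.io.File;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class PipelineSketch {
    // Split every group by the key the function assigns; drop groups of one file.
    static Set<Set<File>> refine(Set<Set<File>> groups, Function<File, ?> keyOf) {
        Set<Set<File>> refined = new LinkedHashSet<>();
        for (Set<File> group : groups) {
            Map<Object, Set<File>> byKey = new LinkedHashMap<>();
            for (File file : group) {
                byKey.computeIfAbsent(keyOf.apply(file), k -> new LinkedHashSet<>()).add(file);
            }
            for (Set<File> subgroup : byKey.values()) {
                if (subgroup.size() > 1) refined.add(subgroup);
            }
        }
        return refined;
    }

    static Map<File, Set<File>> process(Collection<File> files,
                                        Function<File, ?> distinction,
                                        List<? extends Function<File, ?>> algorithms) {
        // Step 1: initial division.
        Set<Set<File>> groups = new LinkedHashSet<>();
        groups.add(new LinkedHashSet<>(files));
        groups = refine(groups, distinction);
        // Step 2: algorithm application, in list order.
        for (Function<File, ?> algorithm : algorithms) {
            groups = refine(groups, algorithm);
        }
        // Step 3: the first file of each group becomes the "original".
        Map<File, Set<File>> result = new LinkedHashMap<>();
        for (Set<File> group : groups) {
            result.put(group.iterator().next(), group);
        }
        return result;
    }

    public static void main(String[] args) {
        List<File> files = List.of(new File("a.jpg"), new File("b.jpg"),
                                   new File("c.png"), new File("dd.jpg"));
        // Toy keys that avoid any file I/O: extension first, then name length.
        Function<File, Object> extension =
                f -> f.getName().substring(f.getName().lastIndexOf('.') + 1);
        Function<File, Object> nameLength = f -> f.getName().length();
        System.out.println(process(files, extension, List.of(nameLength)));
    }
}
```

The toy keys stand in for real characteristics such as a CRC32 checksum or a perceptual hash, which would require reading the files.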
private Set<Set<File>> algorithmsApplication(@NotNull Set<Set<File>> groupedFiles) throws IOException
- Parameters:
  - `groupedFiles` - A set of sets of files, where each set represents a group of similar files.
- Returns:
  - A new set of sets of files after applying all algorithms and consolidating the groups.
- Throws:
  - `IOException` - If any error occurs during the algorithm application.
- Purpose: This method applies each algorithm in the `algorithms` collection to the grouped files and consolidates the results by merging groups with identical keys and removing groups with only one file.
private <T> Map<T, Set<File>> applyAlgorithm(@NotNull Algorithm<T> algorithm, @NotNull Set<Set<File>> groupedFiles)
- Parameters:
  - `algorithm` - The `Algorithm` to apply to the grouped files.
  - `groupedFiles` - A set of sets of files to process with the algorithm.
- Returns:
  - A `Map` where the key is the characteristic (e.g., perceptual hash or CRC32 checksum) and the value is a set of files sharing that characteristic.
- Purpose: This method applies a single algorithm to the grouped files and returns a map of results.
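To make the shape of the returned map concrete, here is a small sketch. It assumes, purely for illustration, that an algorithm can be modelled as a `Function<File, T>` computing one characteristic per file; the project's real `Algorithm` interface may look different, and `ApplyAlgorithmSketch` is a hypothetical name.

```java
import java.io.File;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class ApplyAlgorithmSketch {
    // Maps each characteristic to the files that share it.
    static <T> Map<T, Set<File>> applyAlgorithm(Function<File, T> algorithm,
                                                Set<Set<File>> groupedFiles) {
        Map<T, Set<File>> result = new HashMap<>();
        for (Set<File> group : groupedFiles) {
            for (File file : group) {
                T characteristic = algorithm.apply(file);
                result.computeIfAbsent(characteristic, k -> new HashSet<>()).add(file);
            }
        }
        return result;
    }
}
```

Because the result is a single map, files from different input groups that yield the same characteristic land in the same entry in this sketch.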
private Set<Set<File>> postAlgorithmConsolidation(@NotNull Map<?, Set<File>> algorithmOutput)
- Parameters:
  - `algorithmOutput` - A map containing the results of the algorithm application, where the key is a shared characteristic and the value is a set of files that share that characteristic.
- Returns:
  - A set of sets of files after consolidating the results by removing groups with only one file and merging groups with identical keys.
- Purpose: This method consolidates the results of an algorithm by eliminating groups that contain only one file and merging groups with identical keys.
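The single-file filtering can be sketched with a stream over the map values. This is an illustration only (`ConsolidationSketch` is a hypothetical name); since a `Map` already holds one entry per key, the sketch assumes groups with identical keys were merged when the map was built.

```java
import java.io.File;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class ConsolidationSketch {
    // Keep only groups with at least two files; the keys are no longer needed.
    static Set<Set<File>> postAlgorithmConsolidation(Map<?, Set<File>> algorithmOutput) {
        return algorithmOutput.values().stream()
                .filter(group -> group.size() > 1)
                .collect(Collectors.toSet());
    }
}
```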
private Map<File, Set<File>> originalDistinction(@NotNull Set<Set<File>> groupedFiles)
- Parameters:
  - `groupedFiles` - A set of sets of files representing groups of similar files.
- Returns:
  - A new `Map` where:
    - The key is the "original" file (the first file in each group).
    - The value is a `Set` of the files in that group, including the original file itself.
- Throws:
  - `NullPointerException` - If `groupedFiles` contains `null`.
- Purpose: This method identifies the "original" file in each group and reorganizes the groups into a map, where each key is the original file and each value is a set of similar files (including the original file itself).
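A minimal sketch of this step follows, with `OriginalDistinctionSketch` as a hypothetical name. One caveat worth keeping in mind: "the first file" is only well defined when the group has a predictable iteration order (for example a `LinkedHashSet` or `TreeSet`); for a plain `HashSet` the file returned first is unspecified.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

class OriginalDistinctionSketch {
    static Map<File, Set<File>> originalDistinction(Set<Set<File>> groupedFiles) {
        Objects.requireNonNull(groupedFiles, "groupedFiles");
        Map<File, Set<File>> result = new LinkedHashMap<>();
        for (Set<File> group : groupedFiles) {
            Objects.requireNonNull(group, "groupedFiles must not contain null");
            if (group.isEmpty()) continue; // nothing to pick an original from
            // The first file in iteration order is treated as the original.
            File original = group.iterator().next();
            // The value keeps the whole group, original included.
            result.put(original, group);
        }
        return result;
    }
}
```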
private Set<File> consolidate(@NotNull Set<File> s1, @NotNull Set<File> s2)
- Parameters:
  - `s1` - The first set to merge.
  - `s2` - The second set to merge.
- Returns:
  - A new set containing all elements from both `s1` and `s2`.
- Purpose: This method merges two sets into one, ensuring that all elements from both sets are included.
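A set union like this is a natural remapping function for `Map.merge`, which is one plausible way to merge groups with identical keys. The sketch below is an assumption about usage, not the project's code; `ConsolidateSketch` and the string key are hypothetical.

```java
import java.io.File;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ConsolidateSketch {
    // Returns a new set; neither input set is modified.
    static Set<File> consolidate(Set<File> s1, Set<File> s2) {
        Set<File> merged = new HashSet<>(s1);
        merged.addAll(s2);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Set<File>> byKey = new HashMap<>();
        // The first merge stores the set; the second unions it with the existing one.
        byKey.merge("k", Set.of(new File("a.jpg")), ConsolidateSketch::consolidate);
        byKey.merge("k", Set.of(new File("b.jpg")), ConsolidateSketch::consolidate);
        System.out.println(byKey.get("k").size()); // prints 2
    }
}
```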
- The `Processor` class uses a `Logger` instance (`logger`) from the SLF4J API to log messages during the various stages of file processing. For example, it logs the start of processing, the division of files, the application of algorithms, and the identification of original files.
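Example usage:

    import java.io.File;
    import java.io.IOException;
    import java.util.*;

    // Processor, Grouper, Algorithm and the concrete classes come from the project and must be imported too.
    public class Example {
        public static void main(String[] args) throws IOException {
            Grouper grouper = new Crc32Grouper();
            List<Algorithm<?>> algorithms = List.of(new PerceptualHash(), new PixelByPixel());
            Processor processor = new Processor(grouper, algorithms);
            Collection<File> files = List.of(new File("image1.jpg"), new File("image2.jpg"));
            Map<File, Set<File>> result = processor.process(files);
            result.forEach((original, duplicates) -> {
                System.out.println("Original: " + original);
                duplicates.forEach(duplicate -> System.out.println("  Duplicate: " + duplicate));
            });
        }
    }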