4. Extending the Library
The library has been designed with extensibility in mind, allowing developers to introduce custom components to modify or extend its behavior. The key areas for extension include:
- Grouping files with a custom `Grouper`
- Defining custom algorithms with the `Algorithm` interface
- Applying custom file filters with the `FilePredicate` interface
The `Processor` class orchestrates the workflow, relying on the extensible `Grouper` and `Algorithm` components, while the `FileOperator` class manages the pre-validation process and relies on the `FilePredicate` component.
The `Grouper` interface allows grouping a collection of files into subsets that share a specific characteristic (e.g., files with the same checksum).
```java
@FunctionalInterface
public interface Grouper {
    Set<Set<File>> divide(Collection<File> col) throws IOException;
}
```
How to Extend:
Implement the `Grouper` interface to define a custom grouping strategy. For example, to group files by their size:
```java
public class SizeGrouper implements Grouper {
    @Override
    public Set<Set<File>> divide(Collection<File> col) throws IOException {
        return col.stream()
                .collect(Collectors.groupingBy(File::length)) // bucket files by size
                .values().stream()
                .map(HashSet::new)
                .collect(Collectors.toSet());
    }
}
```
Usage:
Pass your custom `Grouper` to the `Processor`:
```java
Grouper grouper = new SizeGrouper();
Set<Algorithm<?>> algorithms = Set.of(new PerceptualHash());
Processor processor = new Processor(grouper, algorithms);
```
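Note that `divide` declares `IOException` precisely so that groupers may read file contents. As an illustration, here is a sketch of a checksum-based grouper, a hypothetical stand-in in the spirit of the library's `Crc32Grouper` (the real class may differ):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.zip.CRC32;

public class ChecksumGrouper implements Grouper {
    @Override
    public Set<Set<File>> divide(Collection<File> col) throws IOException {
        // A plain loop is used because streams cannot propagate checked exceptions.
        Map<Long, Set<File>> byCrc = new HashMap<>();
        for (File file : col) {
            CRC32 crc = new CRC32();
            crc.update(Files.readAllBytes(file.toPath())); // may throw IOException
            byCrc.computeIfAbsent(crc.getValue(), k -> new HashSet<>()).add(file);
        }
        return new HashSet<>(byCrc.values());
    }
}
```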
The `Algorithm` interface allows you to refine previously grouped files further. It maps files into subsets that share a characteristic, producing a key-value map whose key is the characteristic and whose value is the set of files exhibiting it.
```java
@FunctionalInterface
public interface Algorithm<K> {
    Map<K, Set<File>> apply(Set<File> group);
}
```
How to Extend:
Implement the `Algorithm` interface to define your custom processing logic. For example, grouping files by the hash of their name:
```java
public class NameHashAlgorithm implements Algorithm<Integer> {
    @Override
    public Map<Integer, Set<File>> apply(Set<File> group) {
        return group.stream()
                .collect(Collectors.groupingBy(file -> file.getName().hashCode(),
                        Collectors.toSet()));
    }
}
```
Usage:
Add your custom algorithm to the `Processor`:
```java
Grouper grouper = new Crc32Grouper();
Algorithm<Integer> nameHashAlgorithm = new NameHashAlgorithm();
Processor processor = new Processor(grouper, List.of(nameHashAlgorithm));
```
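Since `Algorithm` is a `@FunctionalInterface`, a one-off refinement step can also be supplied as a lambda rather than a named class. For example, refining groups by file extension (an illustrative sketch, not part of the library):

```java
// Refine each group by file extension; files without one land in the "" bucket.
Algorithm<String> byExtension = group -> group.stream()
        .collect(Collectors.groupingBy(
                file -> {
                    String name = file.getName();
                    int dot = name.lastIndexOf('.');
                    return dot >= 0 ? name.substring(dot + 1) : "";
                },
                Collectors.toSet()));

Processor processor = new Processor(new SizeGrouper(), List.of(byExtension));
```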
The `FilePredicate` interface is a functional interface for applying custom validation or filtering logic to files. Unlike the standard `Predicate`, its `test` method may throw an `IOException`.
```java
@FunctionalInterface
public interface FilePredicate {
    boolean test(File file) throws IOException;
}
```
How to Extend:
Implement the `FilePredicate` interface to validate files based on specific criteria. For example, ensuring that files are readable and non-empty:
```java
public class ReadableNonEmptyFilePredicate implements FilePredicate {
    @Override
    public boolean test(File file) throws IOException {
        return file.canRead() && file.length() > 0;
    }
}
```
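The predicate above never actually throws, but the checked `IOException` is what makes `FilePredicate` suitable for content-based checks. A sketch of a predicate that inspects the file's bytes, accepting only files that start with the JPEG magic number (`FF D8`):

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

public class JpegMagicPredicate implements FilePredicate {
    @Override
    public boolean test(File file) throws IOException {
        // Opening and reading the stream may throw IOException.
        try (InputStream in = Files.newInputStream(file.toPath())) {
            return in.read() == 0xFF && in.read() == 0xD8; // JPEG files begin with FF D8
        }
    }
}
```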
Usage:
`FilePredicate` is used internally by the `FileValidator` class, which in turn is used by the `FileOperator` class to check file validity during processing. By providing custom predicates, you control which files are accepted or filtered out.
Example usage with `FileOperator`:
```java
FilePredicate predicate = new ReadableNonEmptyFilePredicate();
FileOperator fileOperator = new FileOperator(predicate, Integer.MAX_VALUE);
List<File> validFiles = fileOperator.load(Arrays.asList(
        new File("example1.txt"),
        new File("example2.txt")
));
System.out.println("Number of valid files: " + validFiles.size());
```
While the `Processor` class orchestrates the workflow of file grouping and algorithm application, its core behavior is not directly modifiable. However, by providing custom `Grouper` or `Algorithm` implementations, you can customize the processing pipeline.
Workflow Recap:
1. Initial Grouping: Files are grouped using a `Grouper`.
2. Algorithm Application: Custom algorithms further refine the file subsets.
3. Original File Identification: The first file returned by a set's iterator is treated as the 'original' file.
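The recap above can be sketched as a small pipeline. This is only a conceptual illustration of the described workflow, not the `Processor`'s actual source:

```java
import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class WorkflowSketch {
    static Map<File, Set<File>> pipeline(Grouper grouper,
                                         Collection<? extends Algorithm<?>> algorithms,
                                         Collection<File> files) throws IOException {
        Set<Set<File>> groups = grouper.divide(files);      // 1. initial grouping
        for (Algorithm<?> algorithm : algorithms) {         // 2. algorithm application
            Set<Set<File>> refined = new HashSet<>();
            for (Set<File> group : groups) {
                refined.addAll(algorithm.apply(group).values());
            }
            groups = refined;
        }
        Map<File, Set<File>> result = new HashMap<>();
        for (Set<File> group : groups) {
            if (group.size() < 2) continue;                 // singletons have no duplicates
            Iterator<File> it = group.iterator();
            File original = it.next();                      // 3. first file is the "original"
            Set<File> duplicates = new HashSet<>();
            it.forEachRemaining(duplicates::add);
            result.put(original, duplicates);
        }
        return result;
    }
}
```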
By extending the key interfaces (`Grouper`, `Algorithm`), you gain complete control over the grouping and refinement steps. The following end-to-end example ties everything together:
```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PlainTest {
    public static void main(String[] args) throws IOException {
        // Step 1: Define a custom predicate for file validation
        FilePredicate predicate = new ReadableNonEmptyFilePredicate();
        FileOperator fileOperator = new FileOperator(predicate, Integer.MAX_VALUE);

        // Step 2: Load files and filter them using the predicate
        List<File> files = fileOperator.load(Arrays.asList(
                new File("file1.jpg"),
                new File("file2.jpg")
        ));

        // Step 3: Define the Grouper and Algorithm for processing
        Grouper grouper = new SizeGrouper();                    // group files by size
        Algorithm<Integer> algorithm = new NameHashAlgorithm(); // refine groups by name hash

        // Step 4: Create a Processor with the custom Grouper and Algorithm
        Processor processor = new Processor(grouper, List.of(algorithm));

        // Step 5: Process the files and identify duplicates
        Map<File, Set<File>> result = processor.process(files);

        // Step 6: Print the results
        result.forEach((original, duplicates) -> {
            System.out.println("Original: " + original);
            duplicates.forEach(duplicate -> System.out.println("  Duplicate: " + duplicate));
        });
    }
}
```
By leveraging the flexible design of the library, you can build tailored workflows for various file-processing needs. Whether it’s grouping files, applying advanced algorithms, or validating inputs, the extensible components provide all the necessary tools.