
Extending the Library

The library has been designed with extensibility in mind, allowing developers to introduce custom components to modify or extend its behavior. The key areas for extension include:

  • Grouping files using a custom Grouper
  • Defining custom algorithms with the Algorithm interface
  • Applying custom file filters using the FilePredicate interface

The Processor class orchestrates the workflow and relies on the extensible Grouper and Algorithm components, while the FileOperator class manages the pre-validation process and relies on the FilePredicate component.


Custom Grouper Implementation

The Grouper interface allows grouping a collection of files into subsets that share a specific characteristic (e.g., files with the same checksum).

@FunctionalInterface
public interface Grouper {
    Set<Set<File>> divide(Collection<File> col) throws IOException;
}

How to Extend:

Implement the Grouper interface to define a custom grouping strategy. For example, if you want to group files based on their size, you can implement:

import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class SizeGrouper implements Grouper {
    @Override
    public Set<Set<File>> divide(Collection<File> col) throws IOException {
        // Bucket files by length, then turn each bucket into a Set
        return col.stream()
            .collect(Collectors.groupingBy(File::length))
            .values().stream()
            .map(HashSet::new)
            .collect(Collectors.toSet());
    }
}

Usage:

Pass your custom Grouper to the Processor:

Grouper grouper = new SizeGrouper();
Set<Algorithm<?>> algorithms = Set.of(new PerceptualHash());
Processor processor = new Processor(grouper, algorithms);
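
Since Grouper is a @FunctionalInterface, a one-off strategy can also be supplied as a lambda instead of a named class. A minimal sketch (the extension-based grouping here is hypothetical, not part of the library):

Grouper extensionGrouper = files -> files.stream()
    .collect(Collectors.groupingBy(file -> {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        return dot >= 0 ? name.substring(dot + 1) : ""; // "" marks "no extension"
    }))
    .values().stream()
    .map(HashSet::new)
    .collect(Collectors.toSet());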

Custom Algorithm Implementation

The Algorithm interface lets you further refine previously grouped files. It maps files into subsets that share a characteristic, producing a key-value map in which each key is a characteristic and each value is the set of files sharing it.

@FunctionalInterface
public interface Algorithm<K> {
    Map<K, Set<File>> apply(Set<File> group);
}

How to Extend:

Implement the Algorithm interface to define your custom processing logic. For example, grouping files based on their name hash:

import java.io.File;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class NameHashAlgorithm implements Algorithm<Integer> {
    @Override
    public Map<Integer, Set<File>> apply(Set<File> group) {
        // Key each subset by the hash of the file name
        return group.stream()
            .collect(Collectors.groupingBy(file -> file.getName().hashCode(),
                                           Collectors.toSet()));
    }
}

Usage:

Add your custom algorithm to the Processor:

Grouper grouper = new Crc32Grouper();
Algorithm<Integer> nameHashAlgorithm = new NameHashAlgorithm();
Processor processor = new Processor(grouper, List.of(nameHashAlgorithm));
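
Since Algorithm is also a @FunctionalInterface, a simple refinement step can be written inline as a lambda. A minimal sketch (the last-modified grouping is hypothetical, not part of the library):

Algorithm<Long> lastModifiedAlgorithm = group -> group.stream()
    .collect(Collectors.groupingBy(File::lastModified, Collectors.toSet()));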

Custom FilePredicate Implementation

The FilePredicate interface is a functional interface for applying custom validation or filtering logic to files. Unlike the standard java.util.function.Predicate, its test method may throw an IOException.

@FunctionalInterface
public interface FilePredicate {
    boolean test(File file) throws IOException;
}

How to Extend:

You can implement the FilePredicate interface to validate files based on specific criteria. For example, ensuring that files are readable and non-empty:

import java.io.File;
import java.io.IOException;

public class ReadableNonEmptyFilePredicate implements FilePredicate {
    @Override
    public boolean test(File file) throws IOException {
        // Accept only files that are readable and contain data
        return file.canRead() && file.length() > 0;
    }
}

Usage:

The FilePredicate is used by the FileValidator class, which in turn is used by the FileOperator class, to check file validity during processing. By providing custom predicates, you control which files are accepted and which are filtered out.

Example usage with FileOperator:

FilePredicate predicate = new ReadableNonEmptyFilePredicate();
FileOperator fileOperator = new FileOperator(predicate, Integer.MAX_VALUE);

List<File> validFiles = fileOperator.load(Arrays.asList(
    new File("example1.txt"),
    new File("example2.txt")
));

System.out.println("Number of valid files: " + validFiles.size());
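
Because test is declared to throw IOException, a predicate can perform real I/O directly, which the standard java.util.function.Predicate would force you to wrap in a try/catch. A minimal sketch (the image-decoding check is hypothetical and uses the standard javax.imageio API, not this library):

import javax.imageio.ImageIO;
import java.io.File;
import java.io.IOException;

// Accept only files that actually decode as images. ImageIO.read
// throws IOException on I/O failure and returns null when no
// registered reader understands the format.
public class DecodableImagePredicate implements FilePredicate {
    @Override
    public boolean test(File file) throws IOException {
        return ImageIO.read(file) != null;
    }
}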

Extending Processor Behavior

While the Processor class orchestrates the workflow of file grouping and algorithm application, its core behavior is not directly modifiable. However, by providing custom Grouper or Algorithm implementations, you can customize the processing pipeline.

Workflow Recap:

  1. Initial Grouping: Files are grouped using a Grouper.
  2. Algorithm Application: Custom algorithms further refine file subsets.
  3. Original File Identification: The first file returned by the iterator of a set is treated as the 'original' file.
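
Since the iterator's first file becomes the 'original' (step 3 above), an Algorithm can influence that choice by emitting sorted sets. A minimal sketch (hypothetical, assuming you want the oldest file treated as the original; needs java.util.TreeSet and java.util.Comparator):

// The explicit type witness pins the map's value type to Set<File>,
// matching Algorithm's signature, while each value is a TreeSet
// ordered oldest-first by last-modified time.
Algorithm<Long> oldestFirstAlgorithm = group -> group.stream()
    .collect(Collectors.groupingBy(
        File::length,
        Collectors.<File, Set<File>>toCollection(
            () -> new TreeSet<>(Comparator.comparingLong(File::lastModified)))));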

By extending the key interfaces (Grouper, Algorithm), you gain complete control over the grouping logic and refinement steps.

Example: Full Custom Workflow

import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PlainTest {
    public static void main(String[] args) throws IOException {
        // Step 1: Define a custom predicate for file validation
        FilePredicate predicate = new ReadableNonEmptyFilePredicate();
        FileOperator fileOperator = new FileOperator(predicate, Integer.MAX_VALUE);

        // Step 2: Load files and filter them using the predicate
        List<File> files = fileOperator.load(Arrays.asList(
            new File("file1.jpg"),
            new File("file2.jpg")
        ));

        // Step 3: Define the Grouper and Algorithm for processing
        Grouper grouper = new SizeGrouper();                    // Group files by size
        Algorithm<Integer> algorithm = new NameHashAlgorithm(); // Refine groups by name hash

        // Step 4: Create a Processor with the custom Grouper and Algorithm
        Processor processor = new Processor(grouper, List.of(algorithm));

        // Step 5: Process the files and identify duplicates
        Map<File, Set<File>> result = processor.process(files);

        // Step 6: Print the results
        result.forEach((original, duplicates) -> {
            System.out.println("Original: " + original);
            duplicates.forEach(duplicate -> System.out.println("  Duplicate: " + duplicate));
        });
    }
}

By leveraging the flexible design of the library, you can build tailored workflows for various file-processing needs. Whether it’s grouping files, applying advanced algorithms, or validating inputs, the extensible components provide all the necessary tools.