Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the Partition module performance #2

Open
DanGould opened this issue Sep 28, 2024 · 1 comment
Open

Improve the Partition module performance #2

DanGould opened this issue Sep 28, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@DanGould
Copy link
Contributor

Suggestions for Improvement

@nothingmuch has suggested the following improvements in the pull request #1:

  1. The Filter implementations rely on Bloom filters to skip checking negatives. However, the filters are arguably sized inappropriately, specifying a false positive rate for the size of the set, not its corresponding sumset, which may be substantially larger. This is particularly problematic with Bloom filters, as opposed to say cuckoo or quotient filters which are both resizable, and their FPR does not asymptotically approach 100% when close to capacity.

  2. An LFU cache would handle positive matches, avoiding expensive computations (e.g., is_subset_sum).

  3. If adjusting the bit vector-based powerset enumeration to use a grey code instead of 2's complement (i.e., normal integer incrementing), the sums could be adjusted by subtracting or adding one element at a time, allowing the previous sum value to be reused more easily. This might make an LRU cache a better fit than LFU for positive cases.

  4. Filter.contains is invoked repeatedly on a fixed value (left set of SumFilteredPartitionIterator).

  5. The trait object for Filter seems unnecessary, preventing monomorphisation and any optimizations that would enable, and adding indirection in fairly hot code paths.

@DanGould DanGould added the enhancement New feature or request label Sep 28, 2024
@nothingmuch
Copy link
Collaborator

FWIW the reason I didn't open an issue is (and i need to sleep on this i have zombie brain after long flight) that i think a better approach would be to refactor the iterators so that they have a simpler structure first, then think about performance improvements

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants