Optimize clump #656

Open
kordejong opened this issue Jun 7, 2024 · 1 comment

Comments

@kordejong
Member

Clump contains a serial step that stitches together the local clumps determined in parallel. Part of this serial step is the most expensive part of the whole algorithm, and it prevents good performance and scalability. Revisit the code and try to make this step less expensive.

@kordejong
Member Author

Use the fact that, when checking whether a collection of global clump IDs overlaps with the collections used in neighbouring partitions, we can stop comparing once there is no overlap. We don't have to compare each collection with every other collection. More distant collections are less likely to contain clumps that should be merged with clumps in the current partition. The strategy is to decrease the number of times sets need to be compared.
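
A rough sketch of this idea in Python, not the actual implementation. It assumes partitions are addressed by hypothetical `(row, col)` indices, that each partition has a set of global clump IDs, and that a clump spanning several partitions is contiguous, so every partition sharing an ID with the current one can be reached through a chain of adjacent partitions that also share an ID. The name `overlapping_partitions` and the data layout are made up for illustration. The search walks outward from the current partition and only continues past a neighbour if its set overlaps, so sets of distant partitions are never compared once the overlap stops.

```python
from collections import deque


def overlapping_partitions(ids_by_partition, origin):
    """Find partitions whose clump ID set overlaps the one of `origin`.

    ids_by_partition: dict mapping (row, col) -> set of global clump IDs
        (hypothetical layout, assumed for this sketch).
    Returns the set of overlapping partitions and the number of set
    comparisons that were needed.
    """
    origin_ids = ids_by_partition[origin]
    seen = {origin}
    queue = deque([origin])
    overlapping = set()
    nr_comparisons = 0

    while queue:
        row, col = queue.popleft()

        # Visit the (up to) 8 partitions surrounding this one
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                neighbour = (row + d_row, col + d_col)

                if neighbour in seen or neighbour not in ids_by_partition:
                    continue

                seen.add(neighbour)
                nr_comparisons += 1

                # Only expand the search past partitions that overlap.
                # Once there is no overlap, stop in that direction.
                if not origin_ids.isdisjoint(ids_by_partition[neighbour]):
                    overlapping.add(neighbour)
                    queue.append(neighbour)

    return overlapping, nr_comparisons


# One row of five partitions. Only the direct neighbour shares an ID
# with partition (0, 0).
example = {
    (0, 0): {1},
    (0, 1): {1, 2},
    (0, 2): {2},
    (0, 3): {3},
    (0, 4): {3},
}
result, count = overlapping_partitions(example, (0, 0))
```

In this example two set comparisons are made for partition `(0, 0)`, where comparing against every other partition would take four. How much this saves in practice depends on how far clumps extend across partitions.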
