Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DisjointSets: template for extension? #911

Closed
jacob-roth opened this issue Sep 6, 2024 · 3 comments
Closed

DisjointSets: template for extension? #911

jacob-roth opened this issue Sep 6, 2024 · 3 comments

Comments

@jacob-roth
Copy link

jacob-roth commented Sep 6, 2024

I have an application where I need to extend the functionality of IntDisjointSets. In one use case, I need to handle the following:

  1. track the min and max element of each set
  2. track the average of elements in each set

My first thought was to allocate additional arrays for min, max, count, and avg and then modify root_union! in the following way, based on 298:

function root_union!(s::IntDisjointSet{T}, x::T, y::T) where {T<:Integer}
    parents = s.parents
    rks = s.ranks
    # start modifications
    count = s.count
    mn = s.min
    mx = s.max
    avg = s.avg
    # end modifications
    @inbounds xrank = rks[x]
    @inbounds yrank = rks[y]

    if xrank < yrank
        x, y = y, x
    elseif xrank == yrank
        rks[x] += one(T)
    end
    @inbounds parents[y] = x

    # start modifications
    cx = count[x]
    cy = count[y]
    cxy = cx + cy
    @inbounds mn[x] = min(mn[x], mn[y])
    @inbounds mx[x] = max(mx[x], mx[y])
    @inbounds avg[x] = cx/cxy * avg[x] + cy/cxy * avg[y]
    @inbounds count[x] = cxy
    # end modifications

    s.ngroups -= one(T)
    return x
end

I was wondering (1) if there might be any optimizations to this code and (2) if it could be useful for having a template indicating how to best modify the functionality of disjoint sets? My actual use case is slightly more complicated than this, but if this is a "good" template for min/max/avg, then I will follow it for my use case.

@eulerkochy
Copy link
Member

eulerkochy commented Sep 6, 2024

  1. One minor optimisation I can think of, is keeping the sum and count, and calculate average out of it. Will help to reduce precision errors.
  2. Not sure, if I fully understand what you mean by "template" in this context, but we can sort of have a merge / union function of two DisjointSubset, which can be overloaded. The base merge/union function can do the re-ranking, while nuanced functions can handle (commutative?) operations.

@jacob-roth
Copy link
Author

jacob-roth commented Sep 6, 2024

Ah good call about sum/count for precision, thanks! In terms of template, what I meant to ask was: should I write a new root_union! function to overload? Or what do you mean about the nuanced functions?

I guess if I'm tracking size, I can do union-by-size and eliminate rank allocations and update the root_union! accordingly. Do you know if there was a reason rank was chosen over size initially (besides being based on Boost per 297)?

Also, another question I have is related to an earlier discourse post of mine: would implementing path-compression + splitting/halving be something worth considering?

Maybe I'll close this issue and put the path-compression+splitting question in another issue.

@eulerkochy
Copy link
Member

Hi @jacob-roth, sorry missed your reply.

I answered your question on Discourse. Yes, go ahead for a PR implementing path-compression. Make sure to benchmark it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants