Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SortedSet confusion with keys that sort (but not compare) the same #528

Closed
maleadt opened this issue Sep 12, 2019 · 4 comments
Closed

SortedSet confusion with keys that sort (but not compare) the same #528

maleadt opened this issue Sep 12, 2019 · 4 comments

Comments

@maleadt
Copy link

maleadt commented Sep 12, 2019

I'm having some issues with SortedSet. Reading the docs, I came up with the following:

julia> struct Foobar
       foo::Int
       bar::Int
       end

julia> foo(foobar::Foobar) = foobar.foo
foo (generic function with 1 method)

julia> const ByFoo = Base.By(foo)
Base.Order.By{typeof(foo)}(foo)


julia> a = Foobar(1,2)
Foobar(1, 2)

julia> push!(set, a)
SortedSet(Foobar[Foobar(1, 2)],
Base.Order.By{typeof(foo)}(foo))

julia> b = Foobar(1, 3)
Foobar(1, 3)

julia> push!(set, b)
SortedSet(Foobar[Foobar(1, 3)],
Base.Order.By{typeof(foo)}(foo))

So when using this ordering, for which two objects a and b sort identically, the SortedSet overwrites a when inserting b... This seems counter-intuitive, and not mentioned in the docs?

Anyhow, on a second read-through I noticed some details I missed (but IIUC do not explain the above) about Base.lt and DataStructures.eq, so I tried tweaking my comparator to have a proper egality function:

julia> DataStructures.eq(::typeof(ByFoo), a, b) = a == b

julia> DataStructures.eq(ByFoo, a, b)
false

# keys are different!


julia> set = SortedSet{Foobar}(ByFoo)
SortedSet(Foobar[],
Base.Order.By{typeof(foo)}(foo))

julia> push!(set, a)
SortedSet(Foobar[Foobar(1, 2)],
Base.Order.By{typeof(foo)}(foo))

julia> push!(set, b)
SortedSet(Foobar[Foobar(1, 2), Foobar(1, 3)],
Base.Order.By{typeof(foo)}(foo))

julia> # yay!

julia> haskey(set, b)
true

julia> haskey(set, a)
false

julia> # booo

julia> delete!(set, a)
ERROR: KeyError: key Foobar(1, 2) not found

This seems even worse, especially I can actually get a and b by iterating the set!

Did I confuse SortedSet by defining an invalid comparator? Either way, maybe I've missed something but I think the docs could use some elaboration here.

@StephenVavasis
Copy link
Contributor

SortedSet has the property that if two keys a and b are identical according the sort ordering, and first a is pushed then b, then indeed b will overwrite a. I believe this is what you observed. You are correct that this behavior is not documented, and perhaps the documentation should be clarified. This behavior is the only reasonable behavior when you consider that SortedSet is implemented as a SortedDict in which the value field is nothing. In the case of SortedDict, it is more apparent that the new (k,v) pair needs to overwrite the old (k,v) pair.

Defining an equality comparison is not necessary (and does not change the behavior in the last paragraph), but may improve performance as noted in the docs. However, the equality comparison must be consistent with the lt comparison. In other words, it must be the case that eq(a,b) if and only if !(a<b) and !(b<a). It appears that the equality comparison that you defined is not consistent with the lt comparison, so you are breaking the rules for key ordering and therefore the data structure does not work properly.

@maleadt
Copy link
Author

maleadt commented Sep 12, 2019

This behavior is the only reasonable behavior when you consider that SortedSet is implemented as a SortedDict in which the value field is nothing. In the case of SortedDict, it is more apparent that the new (k,v) pair needs to overwrite the old (k,v) pair.

It wasn't to me because even though the keys have the same ordering, isequal/== returned false. Of course, I don't have experience with how SortedDict is implemented, so it probably makes sense.

if and only if

I suspected as much, but missed it in the docs...

Thanks for the quick response!

@StephenVavasis
Copy link
Contributor

I can try to improve the docs on this topic, but I'm not sure how. I see some of the docs in-line with the code, other docs in the github directory docs/src, and then there is also documentation on github.io linked from the data structures home page that has a link labeled "edit on github". Could someone who understands how the docs are laid out explain the correct way to create a PR for documentation corrections?

@StephenVavasis
Copy link
Contributor

Closed via #787

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants