Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor distances between geometries #98

Open
juliohm opened this issue Mar 30, 2021 · 12 comments
Open

Refactor distances between geometries #98

juliohm opened this issue Mar 30, 2021 · 12 comments
Labels
enhancement New feature or request

Comments

@juliohm
Copy link
Member

juliohm commented Mar 30, 2021

Currently we have a single evaluate(distance, g1, g2) but as pointed out by @lassepe over Zulip, we could refactor this into at least two types of distances between geometries: mindistance and maxdistance. We could also consider Hausdorff distances in the future.

I think we can preserve evaluate(distance, p1, p2) where p1 and p2 are points, and also provide the colwise and pairwise variants like in Distances.jl.

@juliohm juliohm added the enhancement New feature or request label Mar 30, 2021
@serenity4
Copy link
Member

serenity4 commented Mar 30, 2021

What should maxdistance do? Return the distance between a point and e.g. the farthest part of another geometry? Or is it intended for a particular set of metrics (non-Euclidean for example)?

Also, from a naming perspective, why was evaluate chosen instead of simply distance?

@serenity4
Copy link
Member

For example, I could have seen distance(metric, p1, p2), or am I wrong about the use of metric and distance?

@juliohm
Copy link
Member Author

juliohm commented Mar 30, 2021

What should maxdistance do? Return the distance between a point and e.g. the farthest part of another geometry? Or is it intended for a particular set of metrics (non-Euclidean for example)?

Exactly. The distance between the point and the farthest point in the geometry.

Also, from a naming perspective, why was evaluate chosen instead of simply distance?

I tried to follow the Distances.jl API so that whenever we pass points as inputs we can call evaluate, colwise and pairwise with any of the distances in that package.

@lassepe
Copy link
Contributor

lassepe commented Mar 30, 2021

Here are two cents from my perspective:

  1. The correct way to implement the Distances seems to be to add a dispatch to the functor for the metric, rather than Distances.evaluate. By this means, the generic fallbacks for evaluate and pairwise et al should work out of the box.

  2. I would vote for only using the metric types from Distances but not implement their distance API for anything beyond Meshes.Point because I would consider it function piracy to add dispatches beyond vector-like types. Instead, it seems cleaner to have package internal mindistance and maxdistance that share the metric types with Distances.

  3. It seems natural to have something like closest that returns a NamedTuple of the farthest point and distance since both pieces of information are naturally computed in one pass. Then mindistance can just fallback to a call of that function and discard the point.

@juliohm
Copy link
Member Author

juliohm commented Mar 30, 2021

All very good points @lassepe. Can you submit a PR to each item separately so that we can more easily review and brainstorm? That would be great. There are lot of breaking changes, and we want to make sure we are not missing anything in the transition.

@lassepe
Copy link
Contributor

lassepe commented Apr 6, 2021

In view of @juliohm's comment in #108

Regarding the iterator interface for points, we can't add it. A point is not the same thing as its coordinates in a coordinate reference system. A tuple of coordinates can be seen as iterable, but a point is an abstract concept with multiple types of coordinates.

I would argue that we should not implement Distances.evaluate at all for Point. The README explicitly states that Distances.evaluate et al. is meant to be called on iterators or vectors. In fact, the design of the package makes it hard to properly add additional distances because much of the functionality is created using @eval to add dispatches for concrete types. Thus, the only really clean way to overload the functionality properly is by also using @eval in our package, e.g.

for M in (Distances.metrics..., Distances.weightedmetrics...)
    @eval @inline (dist::$M)(a, b) = dist(coordinates(a), coordinates(b))
end

I think this is not desirable because the tuples metrics and weightedmetrics are not exportet and thus wouldn't consider them to be subject to SemVer. Julia 1.5 admits functors on abstract types so we we could have

(dist::PreMetric)(a::Point, b::Point) = dist(coordinates(a), coordinates(b))

However, this inevitably creates an ambiguity error since the @eval code in Distances.jl creates functors for concrete subtypes of PreMetrics but with the abstract Any argument type.


Therefore, I would suggest to deprecate Distances.evaluate(::PreMetric, ::Point, ::Point) all together and instead use mindistance for points as well. By this means, there is one consistent function to be used with Union{Geometry,Point} and we don't end up with a partially overloaded interface to Distances.jl. We would still use their metric types and mindistance should call into Distances.evaluate to compute the distances for different metrics.

@juliohm
Copy link
Member Author

juliohm commented Apr 6, 2021

Instead of thinking in terms of distance metrics, maybe we should think in terms of the underlying geometric objects involved. For example, the minimum distance between two geometries is the length of the arc connecting these geometries. In a Euclidean space, this arc is a Segment and the length is the Euclidean distance between the segment endpoints. If we were on a spherical domain, this segment would be replaced by a more general arc and the length would be the arc length.

Based on this viewpoint, we should consider addressing this issue in steps:

  1. Introduce a shortestarc(point, geometry; metric=Euclidean()) function that connects the point to the geometry with the shortest arc (e.g. Segment) according to the metric.
  2. Deprecate evaluate methods involving geometries in favor of a more explicit call to measure(shortestarc(point, geometry)).
  3. Possibly add helper functions to the Euclidean space case

What do you think?

@serenity4
Copy link
Member

serenity4 commented Apr 6, 2021

I like the idea, but there are situations where such an arc isn't unique. I don't know if it would be guaranteed to exist either.

In general, I am quite uncomfortable with the idea of discussing mindistance and maxdistance as if both were distances. Fundamentally, a distance (or metric, it's the same) is expressed between points. Now, we can surely extend the notion of distance to be applicable for point/set and set/set pairs, but even there as you pointed out we can define several extensions: a distance between a point and the nearest point of a set, a distance between a point and the farthest point of a set, and the distance between two sets by considering nearest or farthest point pairs. What I mean is, fundamentally you just want a function to compute distances between points and functions to compute the nearest/farthest point/point pairs between a point/set and a set. And, as I saw mentioned in Zulip, if you want the distance between a point and the centroid of a geometry, you compute first the centroid then the distance on the resulting points instead of trying to define a distance directly.

We can then have something like my_distance(p::Point, g::Geometry; metric=Euclidean())::Number = my_distance(p, nearest(p, g; metric)::Point; metric) (replace nearest by farthest/centroid depending on your use case). Even there a distance is just an alias for a metric really. So, we can be more concise if we let the user do something like

my_distance = Euclidean()
my_distance(p, nearest(p, g, distance))

The drawback is, we lose the default option for Euclidean. But I believe the specification of the type of metric we use for geometries to be much more fundamental than just querying distances after some geometry computations, and I am ready to bet that there are places where we hardcode (even implicitly) distance computations with a euclidean metric (I'd be pleasantly surprised to be wrong). As a whole, I'd say that we need to chose a metric space before doing geometry processing, not just at one point to query distances. What I'm trying to say is that I see this more as a global/package option of some kind. Let's keep that for another time though.

@juliohm
Copy link
Member Author

juliohm commented Apr 6, 2021

Very good thoughts as usual @serenity4 and I fully support the idea of encoding somehow the metric space as a type that we can pass to algorithms. I wonder if we should open a separate issue to discuss this metric space representation before we come back to this issue on distances. We could also discuss it here directly since distances and metric spaces are intimately connected.

I started thinking about the nearest point function above, then switched to thinking about arcs. You think it is better if we stick to nearest points and then compute distances between points only as opposed to measures of geometries? Can you elaborate on the corner cases where shortest arcs fall apart (non-unique)? We should really dive into this exotic metric spaces, or at least the spherical one in the near future to achieve a flexible design in the project. It is a challenging task.

@serenity4
Copy link
Member

serenity4 commented Apr 6, 2021

Can you elaborate on the corner cases where shortest arcs fall apart (non-unique)?

In the picture I linked above, if you take a Manhattan/taxicab metric, you have multiple shortest paths (all red/orange/blue paths yield the same distance). It's a figure on a discrete domain, but in the continuous case it's the same; just that you have an infinite number of shortest paths.
You can also take the example of antipodal points on a sphere using the Haversine distance. Their distance is unique (pi * R), but you can take any arc that joins the two poles that will realize this distance.

Perhaps you could first compute one possible shortest arc (even if multiple may exist) according to a heuristic, and take the measure of that arc to return a distance; I think that's doable. But maybe we'd have more direct formulas rather than trying to figure out this arc in the first place?

I am also wondering whether we should use Distances.jl or reimplement metrics. The library has been defined to work very well for computations with iterators as was pointed out in this issue (it's like most of the library really), and (correct me if I'm wrong) we don't need that. The small effort we would spend in reimplementing their core types/formulas for points could be balanced out with not having to interface with this external library and having plain control over the type hierarchy.
At some point, if there is a need, we could provide an extension with Requires.jl to make our distances work with pairwise/colwise from Distances.jl via converting our metric types to their metrics.

I wonder if we should open a separate issue to discuss this metric space representation before we come back to this issue on distances.

From my point of view, having a way to define a metric space first is the more natural way. Once we have access to a metric, no need to ask for a metric parameter for evaluating distances. If one wants to do geometry processing in Euclidean space and then compute e.g. a spherical distance at some time, he/she should just construct another metric (e.g. from Distances.jl or Meshes.jl if we reimplement them) and use it directly.

I think it's best to open a separate issue, for there will surely be some brainstorming required.

@lassepe
Copy link
Contributor

lassepe commented Apr 6, 2021

FWIW: I ended up drafting the nearest feature in #113 (there called closest_point) and while doing so I also realized that even the nearest point is not necessarily unique (which of course is obvious as soon as one considers non-convex geometries).

@serenity4
Copy link
Member

serenity4 commented Apr 6, 2021

Yes, that's not going to be easy. If we just want distances it's fine, but otherwise it is more involved. Hopefully we'll be able to rely on primitives for the most part where solutions are easy to find. Just as a quick note for later, we may want to define something like nearest_any and nearest_all when there is more than one solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants