-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Simplex tree] Generalization of filtration values in Simplex_tree #1122
base: master
Are you sure you want to change the base?
[Simplex tree] Generalization of filtration values in Simplex_tree #1122
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments.
* If the filtration is 1-critical and totally ordered (as in one parameter persistence), the union is simply | ||
* the minimum of the two values. If the filtration is 1-critical, but not totally ordered (possible for | ||
* multi-persistence), than the union is also the minium if the two given values are comparable and the | ||
* method should throw an error if they are not, as a same simplex should not exist at those two values. | ||
* Finally, if the filtration is k-critical, FiltrationValue should be able to store an union of values and this | ||
* method adds the values of @p f2 in @p f1 and removes the values from @p f1 which are comparable and greater than | ||
* other values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would prefer describing the general case (k-critical), and then say that if the type cannot represent this result (limited to 1-critical), it should throw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the formulation better now?
@@ -1948,8 +2024,8 @@ class Simplex_tree { | |||
* than it was before. However, `upper_bound_dimension()` will return the old value, which remains a valid upper | |||
* bound. If you care, you can call `dimension()` to recompute the exact dimension. | |||
*/ | |||
bool prune_above_filtration(Filtration_value filtration) { | |||
if (std::numeric_limits<Filtration_value>::has_infinity && filtration == std::numeric_limits<Filtration_value>::infinity()) | |||
bool prune_above_filtration(const Filtration_value_rep filtration) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the multiparameter case (partial order), I wonder what makes the most sense: removing everything that is larger than something, or removing everything that is not smaller than something? (In 1d the difference only matters for NaN)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The less risky would be to remove everything larger than something as those have to be pruned for sure.
A compromise would be to remove everything larger and to set to infinity all parameters of a filtration value whose value is larger.
src/Simplex_tree/include/gudhi/Simplex_tree/Simplex_tree_siblings.h
Outdated
Show resolved
Hide resolved
…b.com:hschreiber/gudhi-devel into simplex_tree_filtration_value_generalization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments (on everything but the crucial issue of naming the 2 functions 😉 )
if (!(has_children(res_insert.first))) { | ||
res_insert.first->second.assign_children(new Siblings(curr_sib, *vi)); | ||
} | ||
res_insert = insert_node_<false, true, false>(curr_sib, *vi, filtration); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first reaction is that insert_node_<false, true, false>
is not very readable. But after looking at it for a while, I'd say ok, we can always revisit this later if we have a better idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree... But I found it better than having 4 methods doing more or less the same with just small variations. The other solution would have been to just keep the try_emplace
and the update_simplex_tree_after_node_insertion
in the method and letting the rest in the calling methods, but this would have lessen the point of insert_node_
. Methods like insert_simplex_raw
are so much more readable now.
If someone has a nicer solution, they are welcome!
// dimension may need to be lowered | ||
dimension_to_be_lowered_ = true; | ||
return true; | ||
//if filt and simplex.second.filtration() are non comparable, should return false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 1-parameter, the old behavior is to remove (return true) a simplex with filtration value NaN. Do we really want to change that?
What is the main motivation
- matching the concept ?
- keeping NaN
- preparing for multiparameter (but the version David described requires more work, at least in the k-critical case, so we will likely disable this function in the beginning)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was to match the multi-parameter case and I forgot somehow about the NaN case... If we disable this for multi-parameter instead, I would roll back the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a condition to includes the NaNs also. If you prefer that we disable this methods for any filtration value which is not an arithmetic native type, I will just put back the old conditions. Otherwise, I would add in the doc of prune_above_filtration
that operator<
is used to judge if a simplex should be removed or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"arithmetic native type" is not the right condition. It is perfectly fine to use this function when the filtration value type is CGAL::Lazy_exact_nt
or mpq_class
or any other non-native scalar type.
The problematic case is multiparameter (which in the other branch can be tested with something like Options::is_multi_parameter
), where the code will need heavier modifications (according to David's description of what the function should do, for instance), and thus it is safer to disable the function for now.
The main drawback of the old code is that it uses <=
which isn't currently part of the concept (discussed in another comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, "arithmetic native type" is not the right condition, but I don't see how to discriminate the "right" classes from the "wrong" classes otherwise (I can imagine that someone can come up with other weird cases for filtration other than multi-peristence). In particular, if the reason we want to forbid multi-persistence is that the condition "<" is not always the right choice. "<" is not wrong per-se for multi-persistence, we are just not sure if it is the usual use case for it, if there is one.
So, I was just suggesting that we either explicitly forbid it for everything non trivial or that we allow it for everyone, but we highlight in the documentation, that "above" means "operator<" and if it is not what the user wants, he shouldn't use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forbidding non-builtin Filtration_value would be a regression. It should remain possible to use for instance CGAL::Gmpq
, it represents a subset of reals and is perfectly fine for 1-parameter filtrations. Trying to list of options
- We could explain in the changelog that people need to implement unify/intersect functions for their code to keep working (the doc already says it).
- We could remove the restriction to arithmetic from the default unify/intersect functions. Now only users who want something different from min/max (typically for multipersistence) need to provide their own implementation (which C++ will automatically pick instead of the generic one if it is more specialized).
- if
!Options::is_multi_parameter
, do not call unify/intersect but the default min/max instead (we can have a wrapper in Simplex_tree so we don't have to repeat that test for every use).
I like option 2, I think that's what you were suggesting in the comment above? If not, can you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also like solution 2 the best. Even though I was referring to rec_prune_above_filtration
in the comment above which does not make use of intersect/unify. But the spirit is the same.
src/Simplex_tree/include/gudhi/Simplex_tree/simplex_tree_options.h
Outdated
Show resolved
Hide resolved
src/Simplex_tree/include/gudhi/Simplex_tree/simplex_tree_options.h
Outdated
Show resolved
Hide resolved
src/Simplex_tree/include/gudhi/Simplex_tree/simplex_tree_options.h
Outdated
Show resolved
Hide resolved
/** | ||
* @private | ||
* @brief Given two filtration values, stores in the first value the greatest common upper bound of the two values. | ||
* If a filtration value has value `NaN`, it should be considered as the lowest value possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that? Ah, that's what make_filtration_non_decreasing expects (IIRC we only added a little bit of support for NaN in Simplex_tree to help Alpha_complex). Well actually, NaN means "not set" there (so it should be overridden by anything), which happens to be (almost?) equivalent to -inf in this context, but if we wanted to handle it in unify_births it might correspond to +inf in that context.
I wonder if we should try to formalize a bit what NaN means in the Simplex_tree before extending it to more than a hack in make_filtration_non_decreasing? On the other hand, as long as we don't document the behavior publicly, we can still change it later.
I think in the current uses only f1 can be a NaN (in case we decide to give an asymmetric definition...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was just to comply to what make_filtration_non_decreasing
expected.
NaN meaning "not set" makes sense to me. Otherwise, I am not sure what other use NaN could have when not "there is a problem here".
But you are right, we should decide, if we want to handle NaN in general or not and how (except if we keep everything in the backend as for now and in general, we just pretend that NaN is not possible). For example, unify_birth
does not handle NaN, because we didn't before, but if we introduce NaN to push_to_smallest_common_upper_bound
because needed in make_filtration_non_decreasing
, it looks a bit inconsistent for an "outside use".
* its subsimplices of same filtration value) provides an indexing scheme | ||
* (see IndexingTag). | ||
*/ | ||
struct FiltrationValue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sometimes wonder if we should rename it to FiltrationValueForSimplexTree. But the name of concepts is not that important, it can always be changed later since it does not appear in users' code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the concepts of FiltrationValue are not the same for other classes, it would make sense to rename it for the long run. But I have the feeling that for now, everything is more or less complying with the needs of the simplex tree, as it is mostly used as double
.
src/Simplex_tree/include/gudhi/Simplex_tree/simplex_tree_options.h
Outdated
Show resolved
Hide resolved
// dimension may need to be lowered | ||
dimension_to_be_lowered_ = true; | ||
return true; | ||
//if filt and simplex.second.filtration() are non comparable, should return false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forbidding non-builtin Filtration_value would be a regression. It should remain possible to use for instance CGAL::Gmpq
, it represents a subset of reals and is perfectly fine for 1-parameter filtrations. Trying to list of options
- We could explain in the changelog that people need to implement unify/intersect functions for their code to keep working (the doc already says it).
- We could remove the restriction to arithmetic from the default unify/intersect functions. Now only users who want something different from min/max (typically for multipersistence) need to provide their own implementation (which C++ will automatically pick instead of the generic one if it is more specialized).
- if
!Options::is_multi_parameter
, do not call unify/intersect but the default min/max instead (we can have a wrapper in Simplex_tree so we don't have to repeat that test for every use).
I like option 2, I think that's what you were suggesting in the comment above? If not, can you clarify?
* @param ignore_infinite_values If true, simplices with filtration values at infinity are not cached. | ||
*/ | ||
template<typename Comparator> | ||
void initialize_filtration(Comparator&& is_before_in_filtration, bool ignore_infinite_values = false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "ignore_infinite_values" enough, or do we want to take the chance to allow some arbitrary predicate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we could also use some arbitrary predicate. Some method telling us if a simplex should be ignored or not. Would it replace the ignore_infinite_values
version or add another overload with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. initialize_filtration(bool ignore_infinite_values=false)
is already in a release, so we shouldn't break it. But the new overload that takes a comparator is for advanced users and already takes a functor, so it shouldn't hurt much to pass it a lambda as second argument. So I would tend to say "replace", not "add". But I am open to other choices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I replaced the second parameter with a functor in the new overload of initialize_filtration
and also added some unit tests to make sure it works as intended.
@@ -2595,17 +2676,15 @@ class Simplex_tree { | |||
if (members_size > 0) { | |||
if constexpr (!Options::stable_simplex_handles) sib->members_.reserve(members_size); | |||
Vertex_handle vertex; | |||
Filtration_value filtration; | |||
Filtration_value filtration(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value of the multi-parameter filtration values is {-inf}
not {}
. It seemed more appropriate here to have an empty vector. But I also didn't really verified what deserialize_trivial
does, or more precisely, how memcpy
works for vectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Yes, an empty vector is good, since the only thing we will do with it is overwrite it. That constructor is confusing though. When I see Filtration_value(n)
, I don't really expect {-inf,-inf,...,-inf} n times. For plain double
, Filtration_value filtration;
(without (0)
) is nicer because it gives sanitizers a chance to warn if there is a case where we forget to assign a value.
But ok for now, we can rediscuss that in the multiparameter branch.
Actually, this seems to be in the deserialization function, does that work in the multiparameter branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember now another reason why I did this. Sometimes filtration
does not get initialized by default with 0 exactly, but with just a very small value close to 0. And then, the constructor of Filtration_simplex_base_dummy
complains and throws in Debug mode.
Actually, this seems to be in the deserialization function, does that work in the multiparameter branch?
I naively though that it does as the methods looked quite general, but it does not... So I added some methods to the FiltrationValue concept (saying that here are only necessary in the case of de/serialization) and also some tests in simplex_tree_serialization_unit_test.cpp
to make it work.
* @brief Given two filtration values, stores in the first value the lowest common upper bound of the two values. | ||
* The overload for arithmetic types like `double` or `int` is already implemented as the maximum of the two | ||
* given values and can also be used for non native arithmetic types like `CGAL::Gmpq` as long as it has an | ||
* `operator<`. The overload is available with @ref Gudhi::intersect in @ref simplex_tree_options.h "". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should mention simplex_tree_options.h here
- the name of the file does not match the content (the 2 functions look like they could have their own header, although it doesn't matter if we don't advertise their location too much)
- users should
#include <Gudhi/Simplex_tree.h>
if they want this function, they should not directly include simplex_tree_options.h which is an implementation detail. Doxygen "forces" us to show more details than desirable, but let's limit the damage where possible.
Making the doc of the functions public means that the NaN behavior is documented now. I hope this won't come back to bite us in the [rear end]...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the mention of the file. For the NaN behaviour, I could take it out of the description and put it aside in a simple //
comment which will be ignored by doxygen. Otherwise, I did not found a way to mix public and private with doxygen for a same description.
…version of initialize_filtration + change is_floating_point to has_quit_NaN for more generality + additional unit tests
* @param ignore_simplex Method taking a simplex handle as input and returns true if and only if the corresponding | ||
* simplex should be ignored in the filtration and therefore not be cached. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could remind users here that it is their responsibility to ensure that the result is a filtration, i.e. when a face of a simplex is ignored, the simplex should be ignored as well.
Co-authored-by: Marc Glisse <marc.glisse@inria.fr>
Enables
Filtration_value
to be something else thandouble
/float
in the simplex tree. In particular, is thought as a preparation for the multi simplex tree of PR #976, where the filtration value will be entirely new classes.