Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Simplex tree] Generalization of filtration values in Simplex_tree #1122

Open
wants to merge 28 commits into
base: master
Choose a base branch
from

Conversation

hschreiber
Copy link
Collaborator

Enables Filtration_value to be something else than double / float in the simplex tree. In particular, is thought as a preparation for the multi simplex tree of PR #976, where the filtration value will be entirely new classes.

Copy link
Member

@mglisse mglisse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments.

src/Simplex_tree/concept/FiltrationValue.h Outdated Show resolved Hide resolved
src/Simplex_tree/concept/FiltrationValue.h Outdated Show resolved Hide resolved
src/Simplex_tree/concept/FiltrationValue.h Outdated Show resolved Hide resolved
Comment on lines 54 to 60
* If the filtration is 1-critical and totally ordered (as in one parameter persistence), the union is simply
* the minimum of the two values. If the filtration is 1-critical, but not totally ordered (possible for
* multi-persistence), than the union is also the minium if the two given values are comparable and the
* method should throw an error if they are not, as a same simplex should not exist at those two values.
* Finally, if the filtration is k-critical, FiltrationValue should be able to store an union of values and this
* method adds the values of @p f2 in @p f1 and removes the values from @p f1 which are comparable and greater than
* other values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would prefer describing the general case (k-critical), and then say that if the type cannot represent this result (limited to 1-critical), it should throw.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the formulation better now?

src/Simplex_tree/concept/FiltrationValue.h Outdated Show resolved Hide resolved
src/Simplex_tree/concept/FiltrationValue.h Outdated Show resolved Hide resolved
src/Simplex_tree/include/gudhi/Simplex_tree.h Outdated Show resolved Hide resolved
@@ -1948,8 +2024,8 @@ class Simplex_tree {
* than it was before. However, `upper_bound_dimension()` will return the old value, which remains a valid upper
* bound. If you care, you can call `dimension()` to recompute the exact dimension.
*/
bool prune_above_filtration(Filtration_value filtration) {
if (std::numeric_limits<Filtration_value>::has_infinity && filtration == std::numeric_limits<Filtration_value>::infinity())
bool prune_above_filtration(const Filtration_value_rep filtration) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the multiparameter case (partial order), I wonder what makes the most sense: removing everything that is larger than something, or removing everything that is not smaller than something? (In 1d the difference only matters for NaN)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The less risky would be to remove everything larger than something as those have to be pruned for sure.

A compromise would be to remove everything larger and to set to infinity all parameters of a filtration value whose value is larger.

Copy link
Member

@mglisse mglisse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments (on everything but the crucial issue of naming the 2 functions 😉 )

src/Simplex_tree/include/gudhi/Simplex_tree.h Outdated Show resolved Hide resolved
if (!(has_children(res_insert.first))) {
res_insert.first->second.assign_children(new Siblings(curr_sib, *vi));
}
res_insert = insert_node_<false, true, false>(curr_sib, *vi, filtration);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first reaction is that insert_node_<false, true, false> is not very readable. But after looking at it for a while, I'd say ok, we can always revisit this later if we have a better idea.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree... But I found it better than having 4 methods doing more or less the same with just small variations. The other solution would have been to just keep the try_emplace and the update_simplex_tree_after_node_insertion in the method and letting the rest in the calling methods, but this would have lessen the point of insert_node_. Methods like insert_simplex_raw are so much more readable now.
If someone has a nicer solution, they are welcome!

src/Simplex_tree/concept/FiltrationValue.h Outdated Show resolved Hide resolved
// dimension may need to be lowered
dimension_to_be_lowered_ = true;
return true;
//if filt and simplex.second.filtration() are non comparable, should return false.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 1-parameter, the old behavior is to remove (return true) a simplex with filtration value NaN. Do we really want to change that?
What is the main motivation

  • matching the concept ?
  • keeping NaN
  • preparing for multiparameter (but the version David described requires more work, at least in the k-critical case, so we will likely disable this function in the beginning)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was to match the multi-parameter case and I forgot somehow about the NaN case... If we disable this for multi-parameter instead, I would roll back the changes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a condition to includes the NaNs also. If you prefer that we disable this methods for any filtration value which is not an arithmetic native type, I will just put back the old conditions. Otherwise, I would add in the doc of prune_above_filtration that operator< is used to judge if a simplex should be removed or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"arithmetic native type" is not the right condition. It is perfectly fine to use this function when the filtration value type is CGAL::Lazy_exact_nt or mpq_class or any other non-native scalar type.
The problematic case is multiparameter (which in the other branch can be tested with something like Options::is_multi_parameter), where the code will need heavier modifications (according to David's description of what the function should do, for instance), and thus it is safer to disable the function for now.
The main drawback of the old code is that it uses <= which isn't currently part of the concept (discussed in another comment).

Copy link
Collaborator Author

@hschreiber hschreiber Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, "arithmetic native type" is not the right condition, but I don't see how to discriminate the "right" classes from the "wrong" classes otherwise (I can imagine that someone can come up with other weird cases for filtration other than multi-peristence). In particular, if the reason we want to forbid multi-persistence is that the condition "<" is not always the right choice. "<" is not wrong per-se for multi-persistence, we are just not sure if it is the usual use case for it, if there is one.
So, I was just suggesting that we either explicitly forbid it for everything non trivial or that we allow it for everyone, but we highlight in the documentation, that "above" means "operator<" and if it is not what the user wants, he shouldn't use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forbidding non-builtin Filtration_value would be a regression. It should remain possible to use for instance CGAL::Gmpq, it represents a subset of reals and is perfectly fine for 1-parameter filtrations. Trying to list of options

  1. We could explain in the changelog that people need to implement unify/intersect functions for their code to keep working (the doc already says it).
  2. We could remove the restriction to arithmetic from the default unify/intersect functions. Now only users who want something different from min/max (typically for multipersistence) need to provide their own implementation (which C++ will automatically pick instead of the generic one if it is more specialized).
  3. if !Options::is_multi_parameter, do not call unify/intersect but the default min/max instead (we can have a wrapper in Simplex_tree so we don't have to repeat that test for every use).

I like option 2, I think that's what you were suggesting in the comment above? If not, can you clarify?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like solution 2 the best. Even though I was referring to rec_prune_above_filtration in the comment above which does not make use of intersect/unify. But the spirit is the same.

src/Simplex_tree/concept/FiltrationValue.h Outdated Show resolved Hide resolved
src/Simplex_tree/include/gudhi/Simplex_tree.h Outdated Show resolved Hide resolved
/**
* @private
* @brief Given two filtration values, stores in the first value the greatest common upper bound of the two values.
* If a filtration value has value `NaN`, it should be considered as the lowest value possible.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that? Ah, that's what make_filtration_non_decreasing expects (IIRC we only added a little bit of support for NaN in Simplex_tree to help Alpha_complex). Well actually, NaN means "not set" there (so it should be overridden by anything), which happens to be (almost?) equivalent to -inf in this context, but if we wanted to handle it in unify_births it might correspond to +inf in that context.
I wonder if we should try to formalize a bit what NaN means in the Simplex_tree before extending it to more than a hack in make_filtration_non_decreasing? On the other hand, as long as we don't document the behavior publicly, we can still change it later.
I think in the current uses only f1 can be a NaN (in case we decide to give an asymmetric definition...).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was just to comply to what make_filtration_non_decreasing expected.
NaN meaning "not set" makes sense to me. Otherwise, I am not sure what other use NaN could have when not "there is a problem here".

But you are right, we should decide, if we want to handle NaN in general or not and how (except if we keep everything in the backend as for now and in general, we just pretend that NaN is not possible). For example, unify_birth does not handle NaN, because we didn't before, but if we introduce NaN to push_to_smallest_common_upper_bound because needed in make_filtration_non_decreasing, it looks a bit inconsistent for an "outside use".

* its subsimplices of same filtration value) provides an indexing scheme
* (see IndexingTag).
*/
struct FiltrationValue {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sometimes wonder if we should rename it to FiltrationValueForSimplexTree. But the name of concepts is not that important, it can always be changed later since it does not appear in users' code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the concepts of FiltrationValue are not the same for other classes, it would make sense to rename it for the long run. But I have the feeling that for now, everything is more or less complying with the needs of the simplex tree, as it is mostly used as double.

src/Simplex_tree/concept/FiltrationValue.h Outdated Show resolved Hide resolved
src/Simplex_tree/concept/FiltrationValue.h Outdated Show resolved Hide resolved
src/Simplex_tree/concept/FiltrationValue.h Outdated Show resolved Hide resolved
// dimension may need to be lowered
dimension_to_be_lowered_ = true;
return true;
//if filt and simplex.second.filtration() are non comparable, should return false.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forbidding non-builtin Filtration_value would be a regression. It should remain possible to use for instance CGAL::Gmpq, it represents a subset of reals and is perfectly fine for 1-parameter filtrations. Trying to list of options

  1. We could explain in the changelog that people need to implement unify/intersect functions for their code to keep working (the doc already says it).
  2. We could remove the restriction to arithmetic from the default unify/intersect functions. Now only users who want something different from min/max (typically for multipersistence) need to provide their own implementation (which C++ will automatically pick instead of the generic one if it is more specialized).
  3. if !Options::is_multi_parameter, do not call unify/intersect but the default min/max instead (we can have a wrapper in Simplex_tree so we don't have to repeat that test for every use).

I like option 2, I think that's what you were suggesting in the comment above? If not, can you clarify?

src/Simplex_tree/include/gudhi/Simplex_tree.h Outdated Show resolved Hide resolved
* @param ignore_infinite_values If true, simplices with filtration values at infinity are not cached.
*/
template<typename Comparator>
void initialize_filtration(Comparator&& is_before_in_filtration, bool ignore_infinite_values = false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "ignore_infinite_values" enough, or do we want to take the chance to allow some arbitrary predicate?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we could also use some arbitrary predicate. Some method telling us if a simplex should be ignored or not. Would it replace the ignore_infinite_values version or add another overload with it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. initialize_filtration(bool ignore_infinite_values=false) is already in a release, so we shouldn't break it. But the new overload that takes a comparator is for advanced users and already takes a functor, so it shouldn't hurt much to pass it a lambda as second argument. So I would tend to say "replace", not "add". But I am open to other choices.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I replaced the second parameter with a functor in the new overload of initialize_filtration and also added some unit tests to make sure it works as intended.

src/Simplex_tree/include/gudhi/Simplex_tree.h Outdated Show resolved Hide resolved
@@ -2595,17 +2676,15 @@ class Simplex_tree {
if (members_size > 0) {
if constexpr (!Options::stable_simplex_handles) sib->members_.reserve(members_size);
Vertex_handle vertex;
Filtration_value filtration;
Filtration_value filtration(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value of the multi-parameter filtration values is {-inf} not {}. It seemed more appropriate here to have an empty vector. But I also didn't really verified what deserialize_trivial does, or more precisely, how memcpy works for vectors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Yes, an empty vector is good, since the only thing we will do with it is overwrite it. That constructor is confusing though. When I see Filtration_value(n), I don't really expect {-inf,-inf,...,-inf} n times. For plain double, Filtration_value filtration; (without (0)) is nicer because it gives sanitizers a chance to warn if there is a case where we forget to assign a value.
But ok for now, we can rediscuss that in the multiparameter branch.
Actually, this seems to be in the deserialization function, does that work in the multiparameter branch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember now another reason why I did this. Sometimes filtration does not get initialized by default with 0 exactly, but with just a very small value close to 0. And then, the constructor of Filtration_simplex_base_dummy complains and throws in Debug mode.

Actually, this seems to be in the deserialization function, does that work in the multiparameter branch?

I naively though that it does as the methods looked quite general, but it does not... So I added some methods to the FiltrationValue concept (saying that here are only necessary in the case of de/serialization) and also some tests in simplex_tree_serialization_unit_test.cpp to make it work.

* @brief Given two filtration values, stores in the first value the lowest common upper bound of the two values.
* The overload for arithmetic types like `double` or `int` is already implemented as the maximum of the two
* given values and can also be used for non native arithmetic types like `CGAL::Gmpq` as long as it has an
* `operator<`. The overload is available with @ref Gudhi::intersect in @ref simplex_tree_options.h "".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should mention simplex_tree_options.h here

  • the name of the file does not match the content (the 2 functions look like they could have their own header, although it doesn't matter if we don't advertise their location too much)
  • users should #include <Gudhi/Simplex_tree.h> if they want this function, they should not directly include simplex_tree_options.h which is an implementation detail. Doxygen "forces" us to show more details than desirable, but let's limit the damage where possible.

Making the doc of the functions public means that the NaN behavior is documented now. I hope this won't come back to bite us in the [rear end]...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the mention of the file. For the NaN behaviour, I could take it out of the description and put it aside in a simple // comment which will be ignored by doxygen. Otherwise, I did not found a way to mix public and private with doxygen for a same description.

…version of initialize_filtration + change is_floating_point to has_quit_NaN for more generality + additional unit tests
src/Simplex_tree/include/gudhi/Simplex_tree.h Outdated Show resolved Hide resolved
Comment on lines 1246 to 1247
* @param ignore_simplex Method taking a simplex handle as input and returns true if and only if the corresponding
* simplex should be ignored in the filtration and therefore not be cached.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could remind users here that it is their responsibility to ensure that the result is a filtration, i.e. when a face of a simplex is ignored, the simplex should be ignored as well.

src/Simplex_tree/include/gudhi/Simplex_tree.h Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants