Adding the function to compute the topological disorder of a persistence diagram #1053

Open
TristanMG wants to merge 1 commit into master

Conversation

TristanMG

Adding the function to compute the topological disorder of a persistence diagram. The function is located in "vector_methods.py"
mglisse (Member) commented Jun 3, 2024

Hello,
thank you for your contribution.
I am trying to think about the best way to integrate topological disorder into gudhi; the following is just a brain dump.
The gudhi.representations module does not look like a great fit: it contains sklearn-compatible classes, works on lists of diagrams, and expects a diagram to be a list of (birth,death) (usually numpy.array) without dimension.
We are trying to move away from the format (dim,(birth,death)). See https://gudhi.inria.fr/python/latest/cubical_complex_sklearn_itf_ref.html or #925 for the new format.
If we assume a format like [dgm_dim0, dgm_dim1, ...], then using sklearn.compose.ColumnTransformer with the current Entropy class almost computes the right thing: all that is missing is the normalization (1 - entropy/log(n)) and the sum. We could add an option to Entropy to do this specific normalization and leave it to the user to compute the sum if they want to. I think that's my current preference (a rough sketch of the user-side computation is at the end of this comment).
We could consider having sklearn-like transformers that work on lists of "complete diagrams" (when we have the diagrams in several dimensions as in the new format), not just lists of single diagrams as currently, for cases where ColumnTransformer is not appropriate.
We could also consider a function that takes one full diagram (old or new format?) and computes the TD directly, as in the current code in this PR, but then it would probably need to go to a separate submodule, not representations, since it would not fit (pun!) with the sklearn interface.
Once the API is decided, a simple test exercising the function will be needed before the PR can be merged (an example in the doc may work if it is written in such a way that it is actually run and checked, or something in src/python/test).
We tend to put references in biblio/bibliography.bib and reference them with :cite:. That's not as nice for people who read the doc through help(function_name), but it yields a nicer web doc.
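
To make the first option concrete, here is a minimal user-side sketch. It is not code from this PR: the helper name topological_disorder, the per-dimension loop, and the final sum are illustrative assumptions; only the Entropy(mode="scalar") call is existing gudhi API.

```python
# Minimal sketch: per-dimension topological disorder TD = 1 - H / log(n),
# where H is the scalar persistent entropy already computed by
# gudhi.representations.Entropy and n is the number of points in that
# dimension's diagram. Diagrams with fewer than 2 finite points are not handled.
import numpy as np
from gudhi.representations import Entropy

def topological_disorder(diagrams_per_dim):
    # diagrams_per_dim: [dgm_dim0, dgm_dim1, ...], each an (n, 2) array of (birth, death)
    ent = Entropy(mode="scalar")
    td = []
    for dgm in diagrams_per_dim:
        h = ent.fit_transform([dgm])[0, 0]     # scalar persistent entropy of one diagram
        td.append(1.0 - h / np.log(len(dgm)))  # the normalization discussed above
    return td, sum(td)                         # per-dimension values and their sum
```

Roughly speaking, ColumnTransformer+Entropy would replace the explicit loop here, and only the normalization and the final sum would remain to be added.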

TristanMG (Author) commented Jun 18, 2024 via email (the question is quoted at the top of the next comment)

mglisse (Member) commented Jun 22, 2024

> Would you prefer us to modify the function Entropy.transform, adding the option for the user to either compute the persistent entropy or the topological disorder, or to create a new function Entropy.topological_disorder?

What I had in mind, in order to stick to the scikit-learn API, was to pass an option to __init__ saying that we want the disorder instead of the entropy (maybe allowing mode="disorder" would be better than a separate Boolean argument, but I didn't think about it very long). transform, after it has computed the scalar entropy, would check whether this option was set, and in that case replace the entropy ent with 1 - ent/log(n) before storing it in the output list.
This would require a few tweaks (e.g. changing the mode=="scalar" test to mode!="vector", plus various things in the doc), but nothing big.
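
A rough sketch of that option, written from scratch rather than taken from gudhi's actual Entropy code (the class name EntropySketch is made up, and the "vector" branch and input checks of the real class are omitted):

```python
# Sketch of an Entropy-like transformer with the proposed mode="disorder".
import numpy as np

class EntropySketch:
    def __init__(self, mode="scalar"):
        self.mode = mode  # "scalar" or the proposed "disorder"

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        out = []
        for dgm in X:                     # dgm: (n, 2) array of (birth, death) pairs
            pers = dgm[:, 1] - dgm[:, 0]
            p = pers / pers.sum()
            ent = -np.sum(p * np.log(p))  # scalar persistent entropy
            if self.mode == "disorder":   # proposed post-processing: 1 - H / log(n)
                ent = 1.0 - ent / np.log(len(dgm))
            out.append([ent])
        return np.array(out)
```

In the real class, the change would presumably amount to the guarded normalization above plus the relaxed mode test mentioned earlier; everything else would stay as it is.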

The nx2 arrays that Entropy takes are already the new format — for a single dimension. With the new format for multiple dimensions, ColumnTransformer+Entropy should give the "split" version from your code, and it would still be up to the user to sum that to a single scalar. If you think it is useful, we could consider a convenience function that calls ColumnTransformer+Entropy+sum (or something roughly equivalent), but for that one I'd rather wait until the new format becomes the main one in Gudhi.

Does that sound OK? Don't hesitate to speak up if you have a different opinion about what would be useful.
