Adding the function to compute the topological disorder of a persistence diagram #1053
base: master
Conversation
Adding the function to compute the topological disorder of a persistence diagram. The function is located in "vector_methods.py"
Hello,
Dear Marc,
Thank you for your email. We have discussed your proposed solutions and we
have a few questions.
We first thought of putting the function that computes topological disorder in
the same file as the Entropy class, given their similarities. We are fine with
adding an option to Entropy instead of a stand-alone function. With this
approach, would you prefer us to modify Entropy.transform, adding an option
that lets the user compute either the persistent entropy or the topological
disorder, or to create a new method Entropy.topological_disorder?
I assume that if we choose this approach, the transition to the new format of
persistence diagrams will be easier, since only the Entropy class would need
to be modified?
We will add a test file in src/python/test/.
Best wishes,
Tristan
On Mon, Jun 3, 2024 at 23:20, Marc Glisse ***@***.***> wrote:
… Hello,
thank you for your contribution.
I am trying to think what the best way would be to integrate topological
disorder in gudhi, the following is just a brain dump.
The gudhi.representations module does not look like a great fit: it
contains sklearn-compatible classes, works on lists of diagrams, and
expects a diagram to be a list of (birth,death) (usually numpy.array)
without dimension.
We are trying to move away from the format (dim,(birth,death)). See
https://gudhi.inria.fr/python/latest/cubical_complex_sklearn_itf_ref.html
or #925 for the new format.
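(For illustration only, not part of Marc's message: roughly what the two formats look like for a toy diagram; the variable names and values here are made up.)

```python
import numpy as np

# Old format: one list mixing all homology dimensions,
# each entry being (dim, (birth, death)).
old_style = [(0, (0.0, 0.6)), (0, (0.1, 0.4)), (1, (0.2, 0.7))]

# New format: one (n, 2) array of (birth, death) pairs per homology
# dimension, e.g. [dgm_dim0, dgm_dim1, ...], without the dimension column.
dgm_dim0 = np.array([[0.0, 0.6], [0.1, 0.4]])
dgm_dim1 = np.array([[0.2, 0.7]])
new_style = [dgm_dim0, dgm_dim1]
```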
If we assume a format like [dgm_dim0, dgm_dim1, ...], then using
sklearn.compose.ColumnTransformer with the current Entropy class almost
computes the right thing: all that is missing is the normalization
(1-entropy/log(n)), and the sum. We could add an option to Entropy to do
this specific normalization, and leave it to the user to compute the sum if
they want to. I think that's my current preference.
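(Again not from the original message: a minimal numpy sketch of the two missing pieces named above, the 1 - entropy/log(n) normalization and the final sum, assuming finite (birth, death) pairs in the [dgm_dim0, dgm_dim1, ...] format. The function name is made up and this is not the code from this PR.)

```python
import numpy as np

def topological_disorder(dgms):
    """Sketch: sum over dimensions k of 1 - H_k / log(n_k), where H_k is the
    persistent entropy of the (n_k, 2) diagram dgms[k]."""
    total = 0.0
    for dgm in dgms:
        dgm = np.asarray(dgm, dtype=float)
        if len(dgm) < 2:
            continue  # log(1) = 0, nothing to normalize for a single bar
        lifetimes = dgm[:, 1] - dgm[:, 0]
        p = lifetimes / lifetimes.sum()
        entropy = -np.sum(p * np.log(p))           # persistent entropy H_k
        total += 1.0 - entropy / np.log(len(dgm))  # normalization, then sum
    return total
```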
We could consider having sklearn-like transformers that work on lists of
"complete diagrams" (when we have the diagrams in several dimensions as in
the new format), not just lists of single diagrams as currently, for cases
where ColumnTransformer is not appropriate.
We could also consider a function that takes one full diagram (old or new
format?) and computes the TD directly, as in the current code in this PR,
but then it would probably need to go to a separate submodule, not
representations, since it would not fit (pun!) with the sklearn interface.
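(A sketch of that alternative, assuming the old (dim, (birth, death)) format and reusing the topological_disorder sketch above; illustrative only, not the function from this PR.)

```python
from collections import defaultdict

def topological_disorder_from_full_diagram(diag):
    """Split a (dim, (birth, death)) diagram by dimension, then apply the
    per-dimension computation sketched above."""
    by_dim = defaultdict(list)
    for dim, (birth, death) in diag:
        by_dim[dim].append((birth, death))
    return topological_disorder([by_dim[d] for d in sorted(by_dim)])
```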
Once the API is decided, a simple test exercising the function will be
needed before the PR can be merged (an example in the doc may work if it is
written in such a way that it is actually run and checked, or something in
src/python/test).
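(A possible shape for such a test, assuming the topological_disorder sketch above were importable; the file name and values are made up.)

```python
# e.g. src/python/test/test_topological_disorder.py (hypothetical)
# (import of the topological_disorder sketch above omitted)
import numpy as np
import pytest

def test_equal_bars_give_zero_disorder():
    # Two bars of equal length: entropy = log(2), so 1 - H/log(2) == 0.
    dgm = [np.array([[0.0, 1.0], [0.5, 1.5]])]
    assert topological_disorder(dgm) == pytest.approx(0.0)

def test_unequal_bars_give_positive_disorder():
    # Bars of lengths 1 and 3: entropy < log(2), so the disorder is > 0.
    dgm = [np.array([[0.0, 1.0], [0.0, 3.0]])]
    assert topological_disorder(dgm) > 0.0
```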
We tend to put references in biblio/bibliography.bib and reference them
with :cite:. That's not as nice for people who read the doc through
help(function_name), but it yields a nicer web doc.
What I had in mind, in order to stick to the scikit-learn API, was to pass an option to Entropy.
The nx2 arrays that Entropy takes are already the new format, for a single dimension. With the new format for multiple dimensions, ColumnTransformer+Entropy should give the "split" version from your code, and it would still be up to the user to sum that to a single scalar.
If you think it is useful, we could consider a convenience function that calls ColumnTransformer+Entropy+sum (or something roughly equivalent), but for that one I'd rather wait until the new format becomes the main one in Gudhi.
Does that sound ok? Don't hesitate to speak up if you have a different opinion about what would be useful.
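(To make the suggestion concrete, a sketch of that workflow on made-up data: Entropy in scalar mode per dimension, the 1 - entropy/log(n) normalization done by hand since the discussed option does not exist yet, and the final sum left to the user. A plain loop over dimensions stands in for ColumnTransformer here.)

```python
import numpy as np
from gudhi.representations import Entropy

# Made-up samples in the new format: one (n, 2) array per homology dimension.
samples = [
    [np.array([[0.0, 1.0], [0.2, 0.9]]), np.array([[0.1, 0.4], [0.3, 0.8]])],
    [np.array([[0.0, 0.5], [0.1, 0.8]]), np.array([[0.3, 0.6], [0.2, 0.9]])],
]

disorder = np.zeros(len(samples))
for k in range(2):  # homology dimensions 0 and 1
    dgms_k = [s[k] for s in samples]      # the "column" ColumnTransformer would select
    entropies = Entropy(mode="scalar").fit_transform(dgms_k).ravel()
    sizes = np.array([len(d) for d in dgms_k])
    disorder += 1.0 - entropies / np.log(sizes)  # the sum is left to the user
print(disorder)  # one topological-disorder value per sample
```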