Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New format for persistence #925

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Conversation

mglisse
Copy link
Member

@mglisse mglisse commented Jul 28, 2023

(not ready: missing tests at least, and doing the same for cubical)
As a step towards using more numpy arrays, this provides an option to SimplexTree.persistence() to output a list of arrays (1 numpy array per dimension), instead of our current list of tuple (all together, with the dimension being part of the tuple).
I open the PR to start the discussion.

return self.pcohptr.get_persistence()
if output_type == 'array by dimension':
v = self.pcohptr.intervals_by_dimension()
return [ np.asarray(dgm) for dgm in v ] # std::move(dgm)?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably do the usual workaround for empty dgm to get a shape (0,2).

@VincentRouvreau
Copy link
Contributor

(this PR requires to merge master for compilation to pass)

import gudhi
rips_complex = gudhi.RipsComplex(points=[[1, 1], [7, 0], [4, 6], [9, 6], [0, 14], [2, 19], [9, 17]],
                                 max_edge_length=12.0)

simplex_tree = rips_complex.create_simplex_tree(max_dimension=1)
simplex_tree.persistence(output_type='old')
# [(0, (0.0, inf)), (0, (0.0, 8.94427190999916)), (0, (0.0, 7.280109889280518)), (0, (0.0, 6.082762530298219)),
#  (0, (0.0, 5.830951894845301)), (0, (0.0, 5.385164807134504)), (0, (0.0, 5.0))]
simplex_tree.persistence(output_type='array by dimension')
# [array([[0.        , 5.        ],
#        [0.        , 5.38516481],
#        [0.        , 5.83095189],
#        [0.        , 6.08276253],
#        [0.        , 7.28010989],
#        [0.        , 8.94427191],
#        [0.        ,        inf]])]

This format is quite interesting and was initiated from discussion on #395

@martinroyer
Copy link
Collaborator

I don't have much to add except that I am greatly in favor of this and already using the new format using the following simple piece of translator code, if it is any help to anyone / anywhere:

def _diag_to_list_by_dim_format(pdiagram, tda_max_dim=None):
    """ transforms list of Tuple(dimension, Tuple(x,y)) into list of ndarray[[x_0, y_0], [x_1, y_1], ...]
     for each dimension, so if Tuple(x,y) has dimension i it will be found in the ith ndarray of
     the resulting list, e.g.:
         [(2, (1.0577792405537423, 1.1003878733068035)),
          (0, (0.0, np.inf)),
          (0, (0.0, 1.0556057201636535)),
          (0, (0.0, 0.8756047102452433))]
     transforms into
         [array([[0.        ,        inf],
                 [0.        , 1.05560572],
                 [0.        , 0.87560471]]),
         array([], shape=(0, 2), dtype=float64),
         array([[1.05777924, 1.10038787]])].
    """
    max_dimension = max(dim for (dim, _) in pdiagram) if tda_max_dim is None else tda_max_dim
    by_dim_pdiagram = [np.array([_ for (dim, _) in pdiagram if dim == i]) for i in range(0, max_dimension + 1)]
    return [diag if len(diag) else np.empty(shape=[0, 2]) for diag in by_dim_pdiagram]

I also suggested future tests for diagram vectorizers in the new format there: https://github.com/GUDHI/gudhi-devel/pull/1017/files#diff-509c3dbe85dd6b515d6d3e33d0f6c074686dcb27367baa653eeedeeec534f1c8. As Marc hinted it allows for nice interactions with sklearn.compose.ColumnTransformer for instance.

:returns: The persistence of the simplicial complex.
:rtype: list of pairs(dimension, pair(birth, death))
"""
self.compute_persistence(homology_coeff_field, min_persistence, persistence_dim_max)
return self.pcohptr.get_persistence()
if output_type == 'array by dimension':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"array by dimension" seems a bit long to me, is "arrays" ok ? and 'old' isn't very telling : "tuples" ?
Also, should we raise a DeprecationWarning to change the default to the array one afterward ?

return self.pcohptr.get_persistence()
if output_type == 'array by dimension':
v = self.pcohptr.intervals_by_dimension()
return [ np.asarray(dgm) for dgm in v ] # std::move(dgm)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tuples are slightly faster than lists in python, but that's negligible here.

@@ -612,7 +612,7 @@ cdef class SimplexTree:
"""
self.get_ptr().expansion_with_blockers_callback(max_dim, callback, <void*>blocker_func)

def persistence(self, homology_coeff_field=11, min_persistence=0, persistence_dim_max = False):
def persistence(self, homology_coeff_field=11, min_persistence=0, persistence_dim_max = False, output_type = 'old'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- Not completely related to this PR, but I think we should do type annotations here, i.e.,

from typing import Optional,Literal, Union
def persistence(self, homology_coeff_field:int=11, min_persistence:float=0., persistence_dim_max:Optional[int] = None, output_type:Literal['old','array by dimension'] = 'old') -> list:
    pass

especially to have auto-completion for long arguments such as "array by dimension".
I also changed the False to a None.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants