New format for persistence #925

mglisse · 2023-07-28T11:19:06Z

(not ready: missing tests at least, and doing the same for cubical)
As a step towards using more numpy arrays, this provides an option to SimplexTree.persistence() to output a list of arrays (1 numpy array per dimension), instead of our current list of tuple (all together, with the dimension being part of the tuple).
I open the PR to start the discussion.

mglisse · 2023-07-28T11:21:45Z

src/python/gudhi/simplex_tree.pyx

-        return self.pcohptr.get_persistence()
+        if output_type == 'array by dimension':
+            v = self.pcohptr.intervals_by_dimension()
+            return [ np.asarray(dgm) for dgm in v ] # std::move(dgm)?


We should probably do the usual workaround for empty dgm to get a shape (0,2).

VincentRouvreau · 2023-11-29T08:19:56Z

(this PR requires to merge master for compilation to pass)

import gudhi
rips_complex = gudhi.RipsComplex(points=[[1, 1], [7, 0], [4, 6], [9, 6], [0, 14], [2, 19], [9, 17]],
                                 max_edge_length=12.0)

simplex_tree = rips_complex.create_simplex_tree(max_dimension=1)
simplex_tree.persistence(output_type='old')
# [(0, (0.0, inf)), (0, (0.0, 8.94427190999916)), (0, (0.0, 7.280109889280518)), (0, (0.0, 6.082762530298219)),
#  (0, (0.0, 5.830951894845301)), (0, (0.0, 5.385164807134504)), (0, (0.0, 5.0))]
simplex_tree.persistence(output_type='array by dimension')
# [array([[0.        , 5.        ],
#        [0.        , 5.38516481],
#        [0.        , 5.83095189],
#        [0.        , 6.08276253],
#        [0.        , 7.28010989],
#        [0.        , 8.94427191],
#        [0.        ,        inf]])]

This format is quite interesting and was initiated from discussion on #395

martinroyer · 2024-06-25T10:18:48Z

I don't have much to add except that I am greatly in favor of this and already using the new format using the following simple piece of translator code, if it is any help to anyone / anywhere:

def _diag_to_list_by_dim_format(pdiagram, tda_max_dim=None):
    """ transforms list of Tuple(dimension, Tuple(x,y)) into list of ndarray[[x_0, y_0], [x_1, y_1], ...]
     for each dimension, so if Tuple(x,y) has dimension i it will be found in the ith ndarray of
     the resulting list, e.g.:
         [(2, (1.0577792405537423, 1.1003878733068035)),
          (0, (0.0, np.inf)),
          (0, (0.0, 1.0556057201636535)),
          (0, (0.0, 0.8756047102452433))]
     transforms into
         [array([[0.        ,        inf],
                 [0.        , 1.05560572],
                 [0.        , 0.87560471]]),
         array([], shape=(0, 2), dtype=float64),
         array([[1.05777924, 1.10038787]])].
    """
    max_dimension = max(dim for (dim, _) in pdiagram) if tda_max_dim is None else tda_max_dim
    by_dim_pdiagram = [np.array([_ for (dim, _) in pdiagram if dim == i]) for i in range(0, max_dimension + 1)]
    return [diag if len(diag) else np.empty(shape=[0, 2]) for diag in by_dim_pdiagram]

I also suggested future tests for diagram vectorizers in the new format there: https://github.com/GUDHI/gudhi-devel/pull/1017/files#diff-509c3dbe85dd6b515d6d3e33d0f6c074686dcb27367baa653eeedeeec534f1c8. As Marc hinted it allows for nice interactions with sklearn.compose.ColumnTransformer for instance.

DavidLapous · 2024-06-25T12:00:20Z

src/python/gudhi/simplex_tree.pyx

        :returns: The persistence of the simplicial complex.
        :rtype:  list of pairs(dimension, pair(birth, death))
        """
        self.compute_persistence(homology_coeff_field, min_persistence, persistence_dim_max)
-        return self.pcohptr.get_persistence()
+        if output_type == 'array by dimension':


"array by dimension" seems a bit long to me, is "arrays" ok ? and 'old' isn't very telling : "tuples" ?
Also, should we raise a DeprecationWarning to change the default to the array one afterward ?

DavidLapous · 2024-06-25T12:08:14Z

src/python/gudhi/simplex_tree.pyx

-        return self.pcohptr.get_persistence()
+        if output_type == 'array by dimension':
+            v = self.pcohptr.intervals_by_dimension()
+            return [ np.asarray(dgm) for dgm in v ] # std::move(dgm)?


Tuples are slightly faster than lists in python, but that's negligible here.

DavidLapous · 2024-06-25T12:17:53Z

src/python/gudhi/simplex_tree.pyx

@@ -612,7 +612,7 @@ cdef class SimplexTree:
        """
        self.get_ptr().expansion_with_blockers_callback(max_dim, callback, <void*>blocker_func)

-    def persistence(self, homology_coeff_field=11, min_persistence=0, persistence_dim_max = False):
+    def persistence(self, homology_coeff_field=11, min_persistence=0, persistence_dim_max = False, output_type = 'old'):


-- Not completely related to this PR, but I think we should do type annotations here, i.e.,

from typing import Optional,Literal, Union def persistence(self, homology_coeff_field:int=11, min_persistence:float=0., persistence_dim_max:Optional[int] = None, output_type:Literal['old','array by dimension'] = 'old') -> list: pass

especially to have auto-completion for long arguments such as "array by dimension".
I also changed the False to a None.

New format for persistence

f0c851f

mglisse commented Jul 28, 2023

View reviewed changes

mglisse mentioned this pull request Jan 4, 2024

Archipelago - dict transformer for vectorizing persistence diagrams #1017

Closed

mglisse mentioned this pull request Jun 3, 2024

Adding the function to compute the topological disorder of a persiste… #1053

Open

DavidLapous reviewed Jun 25, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New format for persistence #925

New format for persistence #925

mglisse commented Jul 28, 2023

mglisse Jul 28, 2023

VincentRouvreau commented Nov 29, 2023

martinroyer commented Jun 25, 2024

DavidLapous Jun 25, 2024

DavidLapous Jun 25, 2024

DavidLapous Jun 25, 2024

New format for persistence #925

Are you sure you want to change the base?

New format for persistence #925

Conversation

mglisse commented Jul 28, 2023

mglisse Jul 28, 2023

Choose a reason for hiding this comment

VincentRouvreau commented Nov 29, 2023

martinroyer commented Jun 25, 2024

DavidLapous Jun 25, 2024

Choose a reason for hiding this comment

DavidLapous Jun 25, 2024

Choose a reason for hiding this comment

DavidLapous Jun 25, 2024

Choose a reason for hiding this comment