The recently developed field of Topological Data Analysis (TDA) provides multiple computational methods that can be used to understand complex systems. These techniques include Mapper, persistent homology, path signatures, and sheaf theory. The goal of this repository is to offer an introduction to these ideas by way of a simple example. All notebooks use a Julia kernel, but a few include calls to Python packages.
Please contact annsize at seas dot upenn dot edu for questions.
This collection of notebooks can be used as standalone files, but for convenience they can also be run inside a docker container. The following are instructions for running in a docker container (recommended).
- Install docker (I use Docker Desktop).
- In the terminal, navigate to this directory and execute
  make build
- Once the docker image has finished building, you can execute
  make run
  to use a read-only (no saving) version. I prefer to mount the directory by instead executing
  docker run -it --rm -p 8888:8888 -v $(pwd):/home/jovyan/ ph_tutorial:deploy0520
- Copy the http://127.0.0... path into your favorite browser (warning: this code has only been tested on Chrome).
- Enjoy learning about topological data analysis!
In brief, persistent homology chronicles evolving cavities within a filtered simplicial complex. Please see 1, 2, 3 for an introduction. In the PersistentHomologyExample.ipynb notebook, we use the Eirene package to compute the persistent homology of a subset of the NYC food data from Open Food Facts. Other packages to compute persistent homology include Gudhi (C++, Python), Javaplex (MATLAB), TDA (R), Dionysus (C++, Python), and Ripser (C++, Python).
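To make the idea of tracking features across a filtration concrete, here is a minimal sketch in plain Python that computes only the dimension-0 bars (connected components) of a Vietoris-Rips filtration using union-find. It is an illustration, not how Eirene works internally; the function name `h0_persistence` and the toy point cloud are assumptions made for this example.

```python
from itertools import combinations
from math import dist, inf


def h0_persistence(points):
    """Return (birth, death) bars for connected components (H0).

    Every point is born at scale 0. An edge enters the filtration at the
    distance between its endpoints; when it joins two components, one dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj               # merge the two components
            bars.append((0.0, length))    # one component dies at this scale
    bars.append((0.0, inf))               # the last component never dies
    return bars


# Toy data: a tight cluster of three points and a distant pair.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
bars = h0_persistence(points)
```

In this toy example, the one bar that dies much later than the others reflects the gap between the two clusters. Higher-dimensional cavities need the full boundary-matrix reduction that the packages listed above provide.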
The Mapper algorithm helps to visualize and build intuition for high-dimensional data. It is effectively a clustering algorithm, but it outputs a graph (or simplicial complex) in which nodes are clusters and connections link similar clusters. More explicitly, Mapper segments the data into overlapping bins, clusters within each bin, and then creates the output graph (or simplicial complex) with each node corresponding to a cluster and connections corresponding to clusters that share data points. For more information, please see 4, 5. The documentation of kmapper is extensive, so we include only a short example in the MapperExamples.ipynb notebook. The giotto-tda package also provides a generic Mapper implementation.
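The three Mapper steps described above (overlapping bins, clustering within each bin, linking clusters that share points) can be sketched in plain Python. This is a toy version under stated assumptions, not the kmapper or giotto-tda API: the filter is a single real value per point, clustering is single linkage at a fixed threshold `eps`, and the names `mapper_1d` and `single_linkage` are made up for this example.

```python
from itertools import combinations
from math import cos, sin, pi, dist


def single_linkage(points, members, eps):
    """Group `members` into clusters: points within `eps` are linked."""
    unvisited = set(members)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_1d(points, filter_values, n_intervals, overlap, eps):
    """Return (nodes, edges): nodes are clusters, edges join clusters sharing points."""
    lo, hi = min(filter_values), max(filter_values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Each bin is widened on both sides so that neighbouring bins overlap.
        a = lo + k * width - overlap * width
        b = lo + (k + 1) * width + overlap * width
        members = [i for i, f in enumerate(filter_values) if a <= f <= b]
        nodes.extend(single_linkage(points, members, eps))
    edges = {
        (u, v)
        for u, v in combinations(range(len(nodes)), 2)
        if nodes[u] & nodes[v]
    }
    return nodes, edges


# Toy data: 12 points on the unit circle, filtered by their x-coordinate.
points = [(cos(2 * pi * k / 12), sin(2 * pi * k / 12)) for k in range(12)]
xs = [p[0] for p in points]
nodes, edges = mapper_1d(points, xs, n_intervals=3, overlap=0.4, eps=0.6)
```

With these parameters the middle bin splits into a top and a bottom cluster, so the output graph has four nodes joined in a cycle, which mirrors the shape of the sampled circle. Real data usually needs more care in choosing the filter, the cover, and the clustering method.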
Computing elements of the path signature helps us understand relationships between elements of a time series, specifically lead-lag relationships. Using ideas from 6, 7, the authors of 8 used path signatures to understand differences between brain activity from individuals with tinnitus and from healthy controls. The example in PathSignaturesExamples.ipynb uses the iisignature package and calculates the lead matrix (a matrix showing lead-lag relationships) for generated paths. Additionally, the example examines the effect of random noise in the sample on the calculated lead matrix.
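As a rough illustration of what a lead matrix measures, the sketch below computes the signed (Lévy) area between each pair of channels of a sampled path directly in plain Python. This area equals half the antisymmetric part of the level-2 signature terms, so it should agree with what one would extract from a depth-2 signature, but it does not call iisignature. The function name `lead_matrix` and the sign convention (positive entry `A[i][j]` read as channel `i` leading channel `j`) are assumptions for this example.

```python
from math import sin, pi


def lead_matrix(path):
    """Signed-area matrix of a piecewise-linear path.

    `path` is a list of samples, each a sequence of d channel values.
    A[i][j] = 1/2 * integral of (x_i dx_j - x_j dx_i), measured from the
    starting point of the path.
    """
    d = len(path[0])
    start = path[0]
    centered = [[x[c] - start[c] for c in range(d)] for x in path]
    A = [[0.0] * d for _ in range(d)]
    for p, q in zip(centered, centered[1:]):
        for i in range(d):
            for j in range(d):
                # Exact signed-area contribution of one linear segment p -> q.
                A[i][j] += 0.5 * (p[i] * q[j] - q[i] * p[j])
    return A


# Toy data: channel 1 is channel 0 delayed by a quarter period.
n = 200
ts = [2 * pi * k / n for k in range(n + 1)]
path = [(sin(t), sin(t - pi / 2)) for t in ts]
A = lead_matrix(path)
```

Here the two channels trace out the unit circle once, so `A[0][1]` is close to pi and `A[1][0]` is its negative. Adding noise to `path` and recomputing is one simple way to probe how stable the entries are.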
Graphs with data atop nodes that satisfy constraints across edges can often be modeled with the sheaf formalism. For an introduction to sheaves, please see 9, 10, 11. The brief example in SheavesExample.ipynb demonstrates how to construct a sheaf and calculate the consistency radius using pysheaf. The pysheaf package has multiple examples, and ours draws from this example but uses the specific case of Figure 3a in Blevins and Bassett 2020. To work with sheaf Laplacians, please check out SheafLearning.jl.
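To give a feel for the consistency radius, here is a minimal sketch in plain Python for a sheaf on a graph with scalar data. It assumes one common definition: given an assignment of a value to every vertex and edge, the consistency radius is the largest mismatch between a vertex value pushed along its restriction map and the value stored on the incident edge. This is not the pysheaf API, and pysheaf's own computation may use different conventions; the function name, the dictionary layout, and the toy restriction maps are all made up for this example.

```python
def consistency_radius(restrictions, assignment):
    """Largest disagreement between restricted vertex data and edge data.

    restrictions: {(vertex, edge): function mapping vertex data to edge data}
    assignment:   {cell: number} for every vertex and edge
    """
    return max(
        abs(rho(assignment[v]) - assignment[e])
        for (v, e), rho in restrictions.items()
    )


# Toy sheaf on a single edge "e" between vertices "a" and "b".
restrictions = {
    ("a", "e"): lambda x: 2 * x,  # data at a is doubled when restricted to e
    ("b", "e"): lambda x: x,      # data at b is copied unchanged to e
}

consistent = {"a": 1.0, "b": 2.0, "e": 2.0}
perturbed = {"a": 1.5, "b": 2.0, "e": 2.0}

r_consistent = consistency_radius(restrictions, consistent)
r_perturbed = consistency_radius(restrictions, perturbed)
```

The first assignment satisfies every constraint, so its radius is 0. In the second, the value at `a` restricts to 3 while the edge holds 2, giving a radius of 1.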