Extracting topological features to enhance standard algorithms with qualitative geometric information
This repository contains source code for a tutorial on applying computational topology in machine learning. To demonstrate the use of persistent homology, we apply it to the MNIST dataset of handwritten digits.
For more information, you can read the blog post or check out the interactive example.
Ensure you have the following dependencies installed:
- Python 3
- Dionysus 2 for computing persistent homology
- Boost version 1.55 or higher for Dionysus 2
- NumPy for loading data and computing
- Scikit-learn for machine learning algorithms
- Scikit-image for image pre-processing
- Matplotlib for plotting
- Networkx for drawing graphs
To get the data, run the prepare_data.py
script:
cd scripts
python3 prepare_data.py
This script downloads and saves 10,000 images of digits as NumPy arrays X_10000.npy
and y_10000.npy
in the data directory.
To extract the features, run the tda_digits.py
script:
$ cd src
$ python3 tda_digits.py
This generates figures for digit 8, which can be found in the example directory.
For detailed instructions on using the functions and classes, refer to the Jupyter notebooks: Example.ipynb
and Classification.ipynb
located in the notebooks directory.
-
Aaron Adcock, Erik Carlsson, Gunnar Carlsson, "The Ring of Algebraic Functions on Persistence Bar Codes", Apr 2013. https://arxiv.org/abs/1304.0530
-
Dmitriy Morozov, "Dionysus 2 documentation". https://mrzv.org/software/dionysus2/