Skip to content
This repository has been archived by the owner on Mar 2, 2023. It is now read-only.

Commit

Permalink
Merge branch 'release/0.0.1'
Browse files Browse the repository at this point in the history
  • Loading branch information
jmbr committed May 20, 2017
2 parents 706cb07 + 07d47ac commit 6a021be
Show file tree
Hide file tree
Showing 29 changed files with 1,858 additions and 0 deletions.
10 changes: 10 additions & 0 deletions GNUmakefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
#############################################################################
# Development make file.
#############################################################################

SOURCE_DIR := diffusion_maps

$(SOURCE_DIR)/version.py:
@scripts/make-version > $@

.PHONY: $(SOURCE_DIR)/version.py
21 changes: 21 additions & 0 deletions LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
The MIT License (MIT)
Copyright (c) 2017 Juan M. Bello Rivas <jmbr@superadditive.com>

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
65 changes: 65 additions & 0 deletions README.org
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
#+TITLE: Diffusion Maps and Geometric Harmonics for Python
#+AUTHOR: Juan M. Bello Rivas
#+EMAIL: jmbr@superadditive.com
#+DATE: <2017-05-20 Sat>

* Overview

The =diffusion-maps= library for Python provides a fast and accurate implementation of diffusion maps[fn:1] and geometric harmonics[fn:2]. Its speed stems from the use of sparse linear algebra and (optionally) graphics processing units to accelerate computations.
The included code routinely solves eigenvalue problems 3 x faster than SciPy using GPUs on matrices with over 200 million non-zero entries.

The package includes a command-line utility for the quick calculation of diffusion maps on data sets.

Some of the features of the =diffusion-maps= module include:

- Fast evaluation of distance matrices using nearest neighbors.

- Fast and accurate computation of eigenvalue/eigenvector pairs using sparse linear algebra.

- Optional GPU-accelerated sparse linear algebra routines.

- Optional interface to the [[https://github.com/opencollab/arpack-ng][ARPACK-NG]] library.

- Simple and easily modifiable code.

[fn:1] Coifman, R. R., & Lafon, S. (2006). Diffusion maps. Applied and Computational Harmonic Analysis, 21(1), 5–30. http://doi.org/10.1016/j.acha.2006.04.006

[fn:2] Coifman, R. R., & Lafon, S. (2006). Geometric harmonics: A novel tool for multiscale out-of-sample extension of empirical functions. Applied and Computational Harmonic Analysis, 21(1), 31–52. http://doi.org/10.1016/j.acha.2005.07.005

#+CAPTION: Geometric harmonics for $z = sin(x^2 + y^2)$.
#+NAME: fig:geometric-harmonics
[[./geometric-harmonics.png]]

* Prerequisites

The library is implemented in Python 3.5+ and uses [[http://www.numpy.org/][NumPy]] and [[https://www.scipy.org/][SciPy]]. It is recommended to install [[https://mathema.tician.de/software/pycuda/][PyCUDA]] to enable the GPU-accelerated eigenvalue solver.

The =diffusion-maps= command can display the resulting diffusion maps using [[https://matplotlib.org/][Matplotlib]] if it is available.

* Installation

Use ~python setup.py install~ to install on your system or ~python setup.py install --user~ for a user-specific installation.

* Command-line utility

The ~diffusion-maps~ command reads data sets stored in NPY format. The simplest way to use it is to invoke it as follows:

#+BEGIN_SRC bash
diffusion-maps DATA-SET.NPY EPSILON-VALUE
#+END_SRC

There exist parameters to save and visualize different types of results, to specify how many eigenvalue/eigenvector pairs to compute, etc. See the help page displayed by:

#+BEGIN_SRC bash
diffusion-maps --help
#+END_SRC

* Additional documentation

[[http://www.sphinx-doc.org/en/stable/][Sphinx]]-based API documentation is available in the =doc/= folder. Run

#+BEGIN_SRC bash
make -C doc html
#+END_SRC

to build the documentation.
7 changes: 7 additions & 0 deletions diffusion_maps/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
"""Diffusion maps module.
"""

from .diffusion_maps import *
from .geometric_harmonics import *
from .version import *
61 changes: 61 additions & 0 deletions diffusion_maps/clock.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
"""Clock for keeping track of the wall time.
"""


__all__ = ['ClockError', 'Clock']

import datetime
import time

from typing import Optional # noqa: F401. Used for mypy.


class ClockError(Exception):
"""Invalid clock operation."""
pass


class Clock:
"""Clock for keeping track of time.
"""
def __init__(self) -> None:
self.start = None # type: Optional[float]
self.stop = None # type: Optional[float]

def tic(self) -> None:
"""Start the clock."""
self.start = time.monotonic()
self.stop = None

def toc(self) -> None:
"""Stop the clock."""
assert self.start is not None
self.stop = time.monotonic()

def __str__(self) -> str:
"""Human-readable representation of elapsed time."""
if self.start is None:
raise ClockError('The clock has not been started')
else:
start = datetime.datetime.fromtimestamp(self.start)

if self.stop is None:
stop = datetime.datetime.fromtimestamp(time.monotonic())
else:
stop = datetime.datetime.fromtimestamp(self.stop)

delta = stop - start

return str(delta)

def __enter__(self):
if self.start is None and self.stop is None:
self.tic()

return self

def __exit__(self, exc_type, exc_value, traceback):
if self.start is not None:
self.toc()
152 changes: 152 additions & 0 deletions diffusion_maps/command_line_interface.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,152 @@
#!/usr/bin/env python
# PYTHON_ARGCOMPLETE_OK -*- mode: python -*-

"""Command line interface for the computation of diffusion maps.
"""

import argparse
try:
import argcomplete
except ImportError:
argcomplete = None
import logging
import os
import sys

import numpy as np

import diffusion_maps.default as default
import diffusion_maps.version as version
from diffusion_maps import downsample, DiffusionMaps
from diffusion_maps.profiler import Profiler
from diffusion_maps.plot import plot_eigenvectors


def output_eigenvalues(ew: np.array) -> None:
"""Output the table of eigenvalues.
"""
logging.info('List of eigenvalues:')
fields = ['Real part', 'Imaginary part']
fmt = '{:>12} | {:<21}'
logging.info('-' * 30)
logging.info(fmt.format(*fields))
logging.info('-' * 30)
fmt = '{:+2.9f} | {:+2.9f}'
for eigenvalue in ew:
logging.info(fmt.format(eigenvalue.real, eigenvalue.imag))


def use_cuda(args: argparse.Namespace) -> bool:
"""Determine whether to use GPU-accelerated code or not.
"""
try:
import pycuda # noqa
use_cuda = True and not args.no_gpu
except ImportError:
use_cuda = False

return use_cuda


def main():
parser = argparse.ArgumentParser(description='Diffusion maps')
parser.add_argument('data_file', metavar='FILE', type=str,
help='process %(metavar)s (should be in NPY format)')
parser.add_argument('epsilon', metavar='VALUE', type=float,
help='kernel bandwidth')
parser.add_argument('-n', '--num-samples', type=float, metavar='NUM',
required=False, help='number of data points to use')
parser.add_argument('-e', '--num-eigenpairs', type=int, metavar='NUM',
required=False, default=default.num_eigenpairs,
help='number of eigenvalue/eigenvector pairs to '
'compute')
parser.add_argument('-c', '--cut-off', type=float, required=False,
metavar='DISTANCE', help='cut-off to use to enforce '
'sparsity in the diffusion maps computation.')
parser.add_argument('-o', '--output-data', type=str, required=False,
default='actual-data.npy', metavar='FILE', help='save '
'actual data used in computation to %(metavar)s')
parser.add_argument('-w', '--eigenvalues', type=str,
default='eigenvalues.dat', required=False,
metavar='FILE', help='save eigenvalues to '
'%(metavar)s')
parser.add_argument('-v', '--eigenvectors', type=str,
default='eigenvectors.npy', required=False,
metavar='FILE', help='save eigenvectors to '
'%(metavar)s')
parser.add_argument('-m', '--matrix', type=str, required=False,
metavar='FILE', help='save transition matrix to '
'%(metavar)s')
parser.add_argument('-p', '--plot', action='store_true', default=False,
help='plot first two eigenvectors')
parser.add_argument('--no-gpu', action='store_true', required=False,
help='disable GPU eigensolver')
parser.add_argument('--debug', action='store_true', required=False,
help='print debugging information')
parser.add_argument('--profile', required=False, metavar='FILE',
type=argparse.FileType('w', encoding='utf-8'),
help='run under profiler and save report to '
'%(metavar)s')

args = parser.parse_args(sys.argv[1:])

if args.debug is True:
logging.basicConfig(level=logging.DEBUG, format='%(message)s')
else:
logging.basicConfig(level=logging.INFO, format='%(message)s')

prog_name = os.path.basename(sys.argv[0])
logging.info('{} {}'.format(prog_name, version.v_long))
logging.info('')
logging.info('Reading data from {!r}...'.format(args.data_file))

orig_data = np.load(args.data_file)
if args.num_samples:
data = downsample(orig_data, int(args.num_samples))
else:
data = orig_data

logging.info('Computing {} diffusion maps with epsilon = {:g} '
'on {} data points...'
.format(args.num_eigenpairs-1, args.epsilon, data.shape[0]))

with Profiler(args.profile):
dm = DiffusionMaps(data, args.epsilon,
num_eigenpairs=args.num_eigenpairs,
use_cuda=use_cuda(args))

if args.profile:
args.profile.close()

output_eigenvalues(dm.eigenvalues)

if args.matrix:
logging.info('Saving transition matrix to {!r}'
.format(args.matrix))
import scipy.io
scipy.io.mmwrite(args.matrix, dm.kernel_matrix)

if args.eigenvalues:
logging.info('Saving eigenvalues to {!r}'
.format(args.eigenvalues))
np.savetxt(args.eigenvalues, dm.eigenvalues)

if args.eigenvectors:
logging.info('Saving eigenvectors to {!r}'
.format(args.eigenvectors))
np.save(args.eigenvectors, dm.eigenvectors)

if args.output_data and args.num_samples:
logging.info('Saving downsampled data to {!r}'
.format(args.output_data))
np.save(args.output_data, data)

if args.plot is True:
plot_eigenvectors(data, dm.eigenvectors)


if __name__ == '__main__':
main()
49 changes: 49 additions & 0 deletions diffusion_maps/cpu_eigensolver.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
"""Compute dominant eigenvectors of a sparse matrix.
"""

__all__ = ['eigensolver']

from typing import Optional, Tuple

import numpy as np
import scipy.sparse
import scipy.sparse.linalg

from . import default


def eigensolver(matrix: scipy.sparse.csr_matrix,
num_eigenpairs: int = default.num_eigenpairs,
sigma: Optional[float] = None,
initial_vector: Optional[np.array] = None) \
-> Tuple[np.array, np.array]:
"""Solve eigenvalue problem for sparse matrix.
Parameters
----------
matrix : scipy.sparse.csr_matrix
A matrix in compressed sparse row format.
num_eigenpairs : int, optional
Number of eigenvalue/eigenvector pairs to obtain.
sigma : float, optional
Find eigenvalues close to the value of sigma. The default value is
near 1.0.
initial_vector : np.array, optional
Initial vector to use in the Arnoldi iteration. If not set, a vector
with all entries equal to one will be used.
Returns
-------
ew : np.array
Eigenvalues in descending order of magnitude.
ev : np.array
Eigenvectors corresponding to the eigenvalues in `ew`.
"""
if initial_vector is None:
initial_vector = np.ones(matrix.shape[0])
ew, ev = scipy.sparse.linalg.eigs(matrix, k=num_eigenpairs, which='LM',
sigma=sigma, v0=initial_vector)
ii = np.argsort(np.abs(ew))[::-1]
return ew[ii], ev[:, ii].T
Loading

0 comments on commit 6a021be

Please sign in to comment.