notebooks: refactor basic - add n-dimensional (carsonfarmer#46)
jGaboardi authored Jun 17, 2024
1 parent ee2c8cb commit c451f25
Showing 3 changed files with 1,178 additions and 463 deletions.
112 changes: 27 additions & 85 deletions README.md
@@ -15,13 +15,27 @@ In the limit of arbitrarily many subsets, each new addition or point moved by a
all nearest neighbors, to keep the in-degree of each point low, and
2. When we insert a point, we don't bother updating other points' neighbors.

**Total space:** $20n$ bytes (could be reduced to $4n$ at some cost in update time).

**Time per insertion or single distance update:** $O(n)$.

**Time per deletion or point update:** $O(n)$ expected, $O(n^2)$ worst case.

**Time per closest pair:** $O(n)$.
<table>
<tr>
<td align="center" colspan="2"><b>Complexity</b></td>
</tr>
<tr>
<td><i>Total space</i></td>
<td>$20n$ bytes (could be reduced to $4n$ at some cost in update time)</td>
</tr>
<tr>
<td><i>Time per insertion or single distance update</i></td>
<td>$O(n)$ </td>
</tr>
<tr>
<td><i>Time per deletion or point update</i></td>
<td>$O(n)$ expected, $O(n^2)$ worst case</td>
</tr>
<tr>
<td><i>Time per closest pair</i></td>
<td>$O(n)$</td>
</tr>
</table>
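
The bounds above can be illustrated with a toy neighbor table. This is a minimal sketch in pure Python, not the `fastpair` implementation: the class `NeighborTable` and its methods are hypothetical names, points are assumed to be distinct tuples, and distances are assumed Euclidean. It keeps one stored neighbor per point, leaves other points alone on insertion, and on deletion recomputes only the points whose stored neighbor was removed.

```python
import math


class NeighborTable:
    """Toy closest-pair structure (illustrative only, not fastpair's code)."""

    def __init__(self, points=()):
        # Maps each point to (distance, neighbor); (inf, None) if no neighbor yet.
        self.nbr = {}
        for p in points:
            self.add(p)

    def _nearest(self, p):
        # O(n) scan for the nearest other point currently stored.
        best_dist, best_pt = math.inf, None
        for q in self.nbr:
            if q != p:
                d = math.dist(p, q)
                if d < best_dist:
                    best_dist, best_pt = d, q
        return best_dist, best_pt

    def add(self, p):
        # Insertion: O(n). Only the new point's neighbor is computed;
        # the neighbors of existing points are left untouched.
        self.nbr[p] = self._nearest(p)

    def remove(self, p):
        # Deletion: recompute only the points whose stored neighbor was p.
        # Few points usually qualify, but all of them might (O(n^2) worst case).
        del self.nbr[p]
        for q, (_, neighbor) in list(self.nbr.items()):
            if neighbor == p:
                self.nbr[q] = self._nearest(q)

    def closest_pair(self):
        # O(n) scan over the stored neighbor distances.
        p = min(self.nbr, key=lambda q: self.nbr[q][0])
        dist, q = self.nbr[p]
        return dist, (p, q)
```

The result stays correct because, for every pair of stored points, at least one of the two considered the other when its neighbor was last computed. With fewer than two points stored, `closest_pair` in this sketch returns an infinite distance and a `None` partner.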

This `Python` version of the algorithm combines ideas and code from the [closest-pair data structure testbed (C++)](https://www.ics.uci.edu/~eppstein/projects/pairs/Source/testbed/) developed around a [series of papers](https://www.ics.uci.edu/~eppstein/projects/pairs/Papers/) by Eppstein *et al.*

@@ -62,88 +76,16 @@ pytest -v fastpair --cov fastpair

Currently `fastpair` is tested against Python 3.{10,11,12}.

## Features

In the following examples we use the `random` module to generate data.

```python
from fastpair import FastPair, interact
import random

def rand_tuple(dim=2):
    return tuple(random.random() for _ in range(dim))
```

### Basics

The simplest way to use a `FastPair` data-structure is to initialize one and then update it with data points (via the `+=` operator).
In this first example, we create a sequence of 50 uniform random points in 10 dimensions and add them to a `FastPair` object:

```python
points = [rand_tuple(10) for _ in range(50)]
# Create empty data-structure with `min_points=10` and
# using a Euclidean distance metric
fp = FastPair()
fp.build(points) # Add points all at once and build conga line to start
```

You can then add additional points, and start to query the data-structure for
the closest pair of points. As points are added, the data-structure responds
and updates accordingly
(see [this paper](http://dl.acm.org/citation.cfm?id=351829) for details):

```python
fp += rand_tuple(10)
fp += rand_tuple(10)

# This is the 'FastPair' algorithm, should be fast for large n
fp.closest_pair()
# There is also a brute-force version, can be fast for smaller n
fp.closest_pair_brute_force()
```
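
For comparison, a brute-force closest pair can be sketched without the library by checking every pair of points. The helper `brute_force_closest_pair` below is a hypothetical, standalone function that assumes Euclidean distance. It is not part of `fastpair`, and the library's own `closest_pair_brute_force` may be implemented differently.

```python
import itertools
import math


def brute_force_closest_pair(points):
    """Return (distance, (a, b)) for the closest pair, checking all pairs.

    Takes O(n^2) distance evaluations and raises ValueError when given
    fewer than two points.
    """
    return min(
        ((math.dist(a, b), (a, b)) for a, b in itertools.combinations(points, 2)),
        key=lambda item: item[0],
    )


dist, (a, b) = brute_force_closest_pair([(0, 0), (3, 4), (0, 1)])
```

The return value mirrors the `dist, (a, b)` unpacking used with `fp.closest_pair()` elsewhere in this README, so a check like this can serve as a baseline when testing on small inputs.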

`FastPair` has several useful properties and methods, including checking the size of the data-structure (i.e., how many points are currently stored), testing for containment of a given point, various methods for computing the closest pair, finding the neighbor of a given point, computing multiple distances at once, and even merging points (clusters):
## Utilizing `FastPair`

```python
len(fp)
rando = rand_tuple(10)
points[0] in fp # True
rando in fp # False
fp() # Compute closest pair
neigh = fp.find_neighbor(rando) # Neighbor of 'outside' point
fp.sdist(rando) # Compute distances from rando to all points in fp
```
The notebooks linked below are designed as interactive, minimal tutorials for working with `fastpair`. They require additional dependencies, which can be installed with:

To illustrate the `merge`ing methods, here is a simple example of hierarchical clustering, treating `points` as the 'centroids' of various clusters:

```python
for i in range(len(fp)-1):
    # First method... do it manually:
    dist, (a, b) = fp.closest_pair()
    c = interact(a, b)  # Compute mean centroid
    fp -= b
    fp -= a
    fp += c
    # Alternatively... do it all in one step:
    # fp.merge_closest()
len(fp)  # 1
```
```bash
pip install -e .[tests,notebooks]
```

Finally, plotting should be pretty obvious to those familiar with `matplotlib` (or other `Python` plotting facilities).

```python
import matplotlib.pyplot as plt

points = [rand_tuple(2) for _ in range(50)] # 2D points
fp = FastPair().build(points)
dist, (a, b) = fp.closest_pair()

plt.figure()
plt.scatter(*zip(*fp.points))
plt.scatter(*zip(a, b), color="red")
plt.title("Closest pair is {:.2} units apart.".format(dist))
plt.show()
```
* [`basics_usage.ipynb`](https://github.com/carsonfarmer/fastpair/notebooks/basics_usage.iypnb): Understanding the `fastpair` functionality and data structure
* [`n-dimensional_pointsets`](https://github.com/carsonfarmer/fastpair/notebooks/n-dimensional_pointsets.iypnb): Querying point clouds

## License
