Skip to content

antononcube/Raku-Algorithm-KDimensionalTree

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

55 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Algorithm::KDimensionalTree

Raku package with implementations of the K-Dimensional Tree (K-D Tree) algorithm.

Remark: This package should not be confused with "Algorithm::KdTree", [ITp1], which provides Raku bindings to a C-implementation of the K-D Tree algorithm. (A primary motivation for making this package, "Algorithm::KDimensionalTree", was to have a pure-Raku implementation.)

Remark: The versions "ver<0.1.0+>" with the API "api<1>" of this package are better utilized via the package "Math::Nearest".

Compared to the simple scanning algorithm this implementation of the K-D Tree algorithm is often 2÷15 times faster for randomly generated low dimensional data. (Say. less than 6D.) It can be 200+ times faster for "real life" data, like, Geo-locations of USA cities.

The implementation is tested for correctness against Mathematica's Nearest. (See the resource files.)

Features

  • Finds both top-k Nearest Neighbors (NNs).
    • See the method k-nearest.
  • Finds NNs within a ball with a given radius.
    • See the method nearest-within-ball.
  • Can return indexes and distances for the found NNs.
    • The result shape is controlled with the adverb :values.
  • Works with arrays of numbers, arrays of arrays of numbers, and arrays of pairs.
  • Utilizes distance functions from "Math::DistanceFunctions".
    • Which can be specified by their string names, like, "bray-curtis" or "cosine-distance".
  • Allows custom distance functions to be used.

Installation

From Zef ecosystem:

zef install Algorithm::KDimensionalTree

From GitHub:

zef install https://github.com/antononcube/Raku-Algorithm-KDimensionalTree.git

Usage examples

Setup

use Algorithm::KDimensionalTree;
use Data::TypeSystem;
use Text::Plot;
# (Any)

Set of points

Make a random set of points:

my @points = ([(^100).rand, (^100).rand] xx 30).unique;
deduce-type(@points);
# Vector(Vector(Atom((Numeric)), 2), 30)

Create the K-dimensional tree object

my $kdTree = Algorithm::KDimensionalTree.new(@points);

say $kdTree;
# Algorithm::KDimensionalTree(points => 30, distance-function => &euclidean-distance)

Nearest k-neighbors

Use as a search point one from the points set:

my @searchPoint = |@points.head;
# [89.27640360563157 68.36845738910054]

Find 6 nearest neighbors:

my @res = $kdTree.k-nearest(@searchPoint, 6);
.say for @res;
# [89.27640360563157 68.36845738910054]
# [91.96352893754062 73.56174634865347]
# [80.0160216888336 69.80400458389097]
# [99.2564424012451 76.5797786385005]
# [79.5081497932748 52.564329376808196]
# [76.82139591710131 47.8616016873568]

Plot

Plot the points, the found nearest neighbors, and the search point:

my @point-char =  <* ⏺ ▲>;
say <data nns search> Z=> @point-char;
say text-list-plot(
[@points, @res, [@searchPoint,]],
:@point-char,
x-limit => (0, 100),
y-limit => (0, 100),
width => 60,
height => 20);
# (data => * nns => ⏺ search => ▲)
# ++----------+-----------+----------+-----------+----------++       
# +                                                          + 100.00
# |             *                                            |       
# |           *                          *                   |       
# +                                                          +  80.00
# |                                                    ⏺    ⏺|       
# |               *                    *         ⏺    ▲      |       
# |                     *           *                        |       
# +                                                          +  60.00
# |      *      *                   *           ⏺            |       
# |                                            ⏺            *|       
# +    *                                 **       *          +  40.00
# |                                                          |       
# |                            *       *           *         |       
# |                   *                                      |       
# +              *                                           +  20.00
# |               *                                          |       
# | *                              *           *             |       
# +                                                          +   0.00
# ++----------+-----------+----------+-----------+----------++       
#  0.00       20.00       40.00      60.00       80.00      100.00

TODO

  • TODO Implementation
    • DONE Using distance functions from an "universal" package
      • E.g. "Math::DistanceFunctions"
    • DONE Using distance functions other than Euclidean distance
    • DONE Returning properties
      • DONE Points
      • DONE Indexes
      • DONE Distances
      • DONE Labels
      • DONE Combinations of those
      • This is implemented by should be removed.
        • There is another package -- "Math::Nearest" -- that handles all nearest neighbors finders.
        • Version "0.1.0 with "api<1>" is without the .nearest method.
    • DONE Having an umbrella function nearest
      • Instead of creating a KDTree object etc.
      • This might require making a functor nearest-function
      • This is better done in a different package
    • TODO Make the nearest methods work with strings
      • For example, using Hamming distance over a collection of words.
      • Requires using the distance function as a comparator for the splitting hyperplanes.
      • This means, any objects can be used as long as they provide a distance function.
  • DONE Extensive correctness tests
    • Derived with Mathematica / WL (see the resources)
  • TODO Documentation
    • DONE Basic usage examples with text plots
    • TODO More extensive documentation with a Jupyter notebook
      • Using "JavaScript::D3".
    • TODO Corresponding blog post
    • MAYBE Corresponding video

References

[AAp1] Anton Antonov, Math::DistanceFunctions Raku package, (2024), GitHub/antononcube.

[AAp2] Anton Antonov, Math::Nearest Raku package, (2024), GitHub/antononcube.

[AAp3] Anton Antonov, Data::TypeSystem Raku package, (2023), GitHub/antononcube.

[AAp4] Anton Antonov, Text::Plot Raku package, (2022), GitHub/antononcube.

[ITp1] Itsuki Toyota, Algorithm::KdTree Raku package, (2016-2024), GitHub/titsuki.