neighbourhood_watch

Introduction

The first week's programming assignment for the UC San Diego online course centres on nearest-neighbour classifiers.

Classifiers

We provide three types of classifiers:

  1. A naive (brute-force) 1-NN classifier with no preprocessing, using either the L1 or the L2 distance function
  2. A 1-NN BallTree classifier
  3. A 1-NN KDTree classifier
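
To illustrate the naive variant, here is a minimal sketch of a brute-force 1-NN prediction under the L1 and L2 distances. It assumes NumPy is available; the function name `nearest_neighbour_predict` and the toy data are hypothetical and are not taken from this repository's code.

```python
import numpy as np


def nearest_neighbour_predict(train_x, train_y, test_x, metric="l2"):
    """Label each test point with the label of its closest training point."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    test_x = np.asarray(test_x, dtype=float)

    predictions = []
    for point in test_x:
        diff = train_x - point
        if metric == "l1":
            # Manhattan distance: sum of absolute coordinate differences.
            dist = np.abs(diff).sum(axis=1)
        elif metric == "l2":
            # Squared Euclidean distance; the square root does not change the argmin.
            dist = (diff ** 2).sum(axis=1)
        else:
            raise ValueError(f"unknown metric: {metric!r}")
        predictions.append(train_y[np.argmin(dist)])
    return np.array(predictions)


# Toy data (hypothetical): the two metrics disagree on the nearest neighbour.
train_x = [[3, 3], [0, 5]]
train_y = [0, 1]
query = [[0, 0]]

print(nearest_neighbour_predict(train_x, train_y, query, metric="l1"))  # [1]
print(nearest_neighbour_predict(train_x, train_y, query, metric="l2"))  # [0]
```

From the origin, the point (3, 3) is at L1 distance 6 and L2 distance of about 4.24, while (0, 5) is at distance 5 under both, so the choice of metric can change the predicted label.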

Datasets

So far, we have focused our tests on two datasets:

  1. MNIST
  2. Spine

We use a subset of the well-known MNIST dataset, consisting of 7500 training examples and 1000 test examples.

The Spine dataset contains information from 310 patients. For each patient, there are six measurements (the x) and a label (the y). The label has three possible values: 'NO' (normal), 'DH' (herniated disk), or 'SL' (spondylolisthesis).

Processing methods

The MNIST data already comes split into training and test sets. For the Spine dataset, which has no such split, we evaluate our models with 5-fold cross-validation.
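
As a rough sketch of what 5-fold cross-validation involves, the snippet below splits the sample indices into five folds, holds each fold out in turn, and averages the per-fold error rates. It assumes NumPy; the helper names (`k_fold_indices`, `cross_validated_error`, `one_nn_predict`), the shuffling with a fixed seed, and the synthetic data are all assumptions for illustration, not a description of how this repository does it.

```python
import numpy as np


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle the sample indices and split them into n_folds nearly equal parts."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)


def one_nn_predict(train_x, train_y, test_x):
    """Brute-force 1-NN under the (squared) L2 distance."""
    dists = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    return train_y[dists.argmin(axis=1)]


def cross_validated_error(x, y, predict, n_folds=5, seed=0):
    """Average misclassification rate over the held-out folds."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    folds = k_fold_indices(len(x), n_folds, seed)
    errors = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the one currently held out.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        predicted = predict(x[train_idx], y[train_idx], x[test_idx])
        errors.append(np.mean(predicted != y[test_idx]))
    return float(np.mean(errors))


# Synthetic stand-in data (hypothetical): two well-separated clusters.
rng = np.random.default_rng(1)
demo_x = np.concatenate([rng.normal(0, 0.1, (20, 2)), rng.normal(10, 0.1, (20, 2))])
demo_y = np.array([0] * 20 + [1] * 20)

print([len(fold) for fold in k_fold_indices(310)])  # [62, 62, 62, 62, 62]
print(cross_validated_error(demo_x, demo_y, one_nn_predict))
```

With 310 patients, each of the five folds holds 62 samples, so every model is fitted on 248 samples and tested on the remaining 62.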

Dependencies

This project runs on Python 3.10.12. For the list of required packages, see requirements.txt.

Run

To run the project, set the preprocessing method and the dataset features in conf.yaml, then run the entry point, classification.py.

Results

The following table shows the average error and the cumulative running time of each classifier on the two datasets.

| Classifier | MNIST time (s) | MNIST avg. error | Spine time (s) | Spine avg. error |
| --- | --- | --- | --- | --- |
| Naive | 33.88242554664612 | 0.046 | 0.2945997714996338 | 0.36129032258064514 |
| BallTree | 6.562798976898193 | 0.046 | 0.0025920867919921875 | 0.36129032258064514 |
| KDTree | 8.31318998336792 | 0.046 | 0.002149820327758789 | 0.36129032258064514 |
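
The identical error across the three classifiers is what one would expect if all three perform an exact nearest-neighbour search under the same metric, so that the tree structures change only the lookup speed. The sketch below illustrates this on random data. It assumes scikit-learn's `BallTree` and `KDTree` (the source does not say which implementation the project uses), and it times index construction plus querying together, which may differ from how the figures above were measured.

```python
import time

import numpy as np
from sklearn.neighbors import BallTree, KDTree

# Random stand-in data (hypothetical), not MNIST or Spine.
rng = np.random.default_rng(0)
train_x = rng.normal(size=(1000, 8))
test_x = rng.normal(size=(100, 8))


def brute_force_indices(train_x, test_x):
    """Index of the nearest training point for each test point (L2)."""
    dists = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def tree_indices(tree_cls, train_x, test_x):
    """Build the tree, then query the single nearest neighbour (L2)."""
    tree = tree_cls(train_x, metric="euclidean")
    _, idx = tree.query(test_x, k=1)
    return idx[:, 0]


methods = {
    "Naive": lambda: brute_force_indices(train_x, test_x),
    "BallTree": lambda: tree_indices(BallTree, train_x, test_x),
    "KDTree": lambda: tree_indices(KDTree, train_x, test_x),
}

results = {}
for name, run in methods.items():
    start = time.perf_counter()
    indices = run()
    results[name] = (indices, time.perf_counter() - start)
    print(f"{name}: {results[name][1]:.6f} s")

# All three searches are exact, so they should agree on every neighbour.
print(np.array_equal(results["Naive"][0], results["BallTree"][0]))
print(np.array_equal(results["Naive"][0], results["KDTree"][0]))
```

Timings on such a small random sample say little about MNIST-sized inputs; the point of the sketch is only that the neighbours, and therefore the error rates, match.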