k-NN is a supervised, lazy learning algorithm: it classifies a test instance by comparing it with the training data and finding similar instances. For a given test instance, the algorithm searches the training data for the k closest instances (neighbors). Once this set of neighbors is obtained, the class label that occurs most often among them is assigned to the test instance; ties are broken arbitrarily.
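As a concrete illustration, here is a minimal sketch of that idea in Python. The function names (`knn_predict`, `euclidean`) and the toy data are illustrative assumptions, not the repository's exact implementation:

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, test_x, k, distance=euclidean):
    # Sort training instances by distance to the test instance and keep the k closest.
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], test_x))
    k_labels = [label for _, label in neighbors[:k]]
    # Majority vote over the k nearest labels; ties are broken arbitrarily.
    return Counter(k_labels).most_common(1)[0][0]

# Tiny usage example with made-up points.
train_X = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (7.9, 8.2)]
train_y = ["A", "A", "B", "B"]
print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))  # -> A
```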
The three distance/similarity measures implemented here are (each is sketched as a small function after this list):
- Euclidean distance: the ordinary, straight-line distance between two points.
- Polynomial kernel: defined by k(x, y) = (x * y + 1) ^ d, where x * y denotes the dot product of the two feature vectors and d is the degree of the polynomial, specified by the kernel degree parameter. These kernels are well suited to problems where all the training data is normalized.
- Radial basis kernel: defined by k(x, y) = exp(-|| x - y ||^2 / sigma^2). Sigma is an adjustable parameter that plays a major role in the kernel's performance and should be tuned carefully to the problem at hand. For this implementation, sigma = 0.97 was found to give good accuracy.
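The three measures could be written as plain functions along the following lines; the parameter names (`degree`, `sigma`) are illustrative and may differ from the repository's code:

```python
import math

def euclidean_distance(x, y):
    # Ordinary straight-line distance between two feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def polynomial_kernel(x, y, degree=2):
    # k(x, y) = (x . y + 1) ^ d, with d given by the kernel degree parameter.
    return (sum(a * b for a, b in zip(x, y)) + 1) ** degree

def rbf_kernel(x, y, sigma=0.97):
    # k(x, y) = exp(-||x - y||^2 / sigma^2); sigma = 0.97 as noted above.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (sigma ** 2))
```

Note that with the two kernels, "closest" means largest similarity rather than smallest distance; one common alternative is to rank neighbors by the induced distance d(x, y)^2 = k(x, x) - 2 * k(x, y) + k(y, y).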
The following are the sources for the three datasets used to test the implementation and its classification accuracy (a small loading sketch follows the list):
- https://archive.ics.uci.edu/ml/machine-learning-databases/ecoli/ecoli.names
- https://archive.ics.uci.edu/ml/datasets/Glass+Identification
- https://archive.ics.uci.edu/ml/datasets/Yeast
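Loading these files might look roughly like the sketch below. The column layout is an assumption about the UCI files (glass is comma-separated, ecoli and yeast are whitespace-separated; all three have a leading name/ID column and the class label in the last column), and `load_dataset` is an illustrative helper, not the repository's loader:

```python
def load_dataset(path, delimiter=None, skip_first_column=True):
    """Read a delimited text file into feature vectors X and class labels y."""
    X, y = [], []
    with open(path) as f:
        for line in f:
            parts = line.strip().split(delimiter)  # delimiter=None splits on whitespace
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            if skip_first_column:
                parts = parts[1:]  # drop the leading name/ID column
            X.append([float(v) for v in parts[:-1]])  # numeric features
            y.append(parts[-1])                       # class label in the last column
    return X, y
```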
Input to the code:
- Dataset
- Number of folds for k-fold cross validation
- Number of nearest neighbors (the k in k-NN)
Output:
- Each test record/instance and its predicted class label
- Overall accuracy for each distance/similarity measure (see the evaluation sketch after this list)
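A hedged sketch of how these inputs and outputs might fit together: a k-fold cross-validation loop that prints each test record's prediction and the overall accuracy for one measure. The contiguous fold split and the helper names are illustrative assumptions, not necessarily how the repository partitions the data; the small helpers from the first sketch are repeated here so the block is self-contained.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, test_x, k, distance=euclidean):
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], test_x))
    return Counter(label for _, label in neighbors[:k]).most_common(1)[0][0]

def cross_validate(X, y, n_folds, k_neighbors, distance=euclidean):
    # Simple contiguous folds; any leftover records beyond fold_size * n_folds
    # are only ever used for training in this sketch.
    fold_size = len(X) // n_folds
    correct = 0
    for fold in range(n_folds):
        start, end = fold * fold_size, (fold + 1) * fold_size
        test_X, test_y = X[start:end], y[start:end]
        train_X, train_y = X[:start] + X[end:], y[:start] + y[end:]
        for record, actual in zip(test_X, test_y):
            predicted = knn_predict(train_X, train_y, record, k_neighbors, distance)
            print(record, "->", predicted)   # each test record and its prediction
            correct += predicted == actual
    accuracy = correct / (fold_size * n_folds)
    print("Overall accuracy:", accuracy)     # accuracy for this distance/similarity measure
    return accuracy
```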