NUMPY FORMULAS

Implementations of well-known mathematical formulas in NumPy

I made this repo to improve my mathematical Python skills. I found it necessary because I was taking the Data Mining course at my university. In this course I learned a lot about distances, matrices, proximities, and more, and I took the opportunity to have a little fun with the NumPy library. Feel free to use it if you find it useful, or to improve it if you see a way! ✌

Image taken from realpython.com/numpy-tutorial/

If you like this repo, please click the ⭐

Distances

Distance measures play an important role in machine learning.

A distance measure is an objective score that summarizes the relative difference between two objects in a problem domain. Most commonly, the two objects are rows of data that describe a subject (such as a person, car, or house), or an event (such as a purchase, a claim, or a diagnosis).

Euclidean
Manhattan
Minkowski
Supremum/Chebyshev
Cosine
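
As a rough illustration of the measures listed above, here is a minimal NumPy sketch. The function names (`minkowski`, `euclidean`, `manhattan`, `chebyshev`, `cosine_distance`) are chosen for this example and are not necessarily the names used in this repo. Cosine is shown as a distance, that is, one minus the cosine similarity, which is an assumption about the intended form.

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance of order p between two 1-D vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def euclidean(x, y):
    """Euclidean distance: the Minkowski distance with p = 2."""
    return minkowski(x, y, 2)

def manhattan(x, y):
    """Manhattan distance: the Minkowski distance with p = 1."""
    return minkowski(x, y, 1)

def chebyshev(x, y):
    """Supremum (Chebyshev) distance: the largest absolute difference."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.max(np.abs(x - y))

def cosine_distance(x, y):
    """One minus the cosine similarity; undefined for zero vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

a, b = [0, 0], [3, 4]
print(euclidean(a, b), manhattan(a, b), chebyshev(a, b))
```

Writing Euclidean and Manhattan in terms of `minkowski` makes it explicit that both are special cases of the same formula, while Chebyshev is its limit as p grows.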

Normalization

Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to use a common scale, without distorting differences in the ranges of values or losing information.

Min-Max
Z-Norm
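
The two techniques above can be sketched in a few lines of NumPy. This is only an illustrative version: the names `min_max` and `z_norm` are made up for the example, each column is normalized independently, and the z-score uses the population standard deviation (`ddof=0`), which is an assumption. Constant columns would cause a division by zero and are not handled here.

```python
import numpy as np

def min_max(X):
    """Rescale each column of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

def z_norm(X):
    """Center each column on 0 and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])
print(min_max(data))
print(z_norm(data))
```

Both columns of `data` end up on the same scale even though their original ranges differ by a factor of ten.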

Proximity Measure

Proximity measures are measures of similarity and dissimilarity. They are important because they are used by a number of data mining techniques, such as clustering, nearest-neighbour classification, and anomaly detection.

Nominal
Ordinal
Binary
Mixed
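
A hedged sketch of three of the attribute types listed above, using the textbook definitions commonly taught in data mining courses. The function names are hypothetical, and the binary case assumes asymmetric attributes (Jaccard-style, ignoring 0/0 matches), which may differ from what this repo implements. Mixed-type dissimilarity, which combines these per-attribute values, is left out.

```python
import numpy as np

def nominal_dissimilarity(a, b):
    """Fraction of nominal attributes on which the two objects differ."""
    a = np.asarray(a)
    b = np.asarray(b)
    return np.mean(a != b)

def binary_asymmetric_dissimilarity(a, b):
    """Mismatches divided by positions where at least one value is 1."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    mismatches = np.sum(a != b)
    either_positive = np.sum(a | b)
    # Assumption: two all-zero objects are treated as identical.
    return mismatches / either_positive if either_positive else 0.0

def ordinal_to_unit(ranks, n_levels):
    """Map ranks 1..n_levels onto [0, 1] so numeric distances apply."""
    ranks = np.asarray(ranks, dtype=float)
    return (ranks - 1.0) / (n_levels - 1.0)

print(nominal_dissimilarity(["red", "small", "round"],
                            ["red", "large", "round"]))
print(binary_asymmetric_dissimilarity([1, 0, 1, 0, 0], [1, 1, 0, 0, 0]))
print(ordinal_to_unit([1, 2, 3], n_levels=3))
```

Once ordinal ranks are mapped onto [0, 1], any numeric distance (for example Manhattan) can be applied to them.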

Impurity Measure

A measure of impurity is very important for any tree-based algorithm; it mainly helps us decide which attribute to use at the root node.

In a dataset whose predicted (dependent) variable takes class values (such as Yes, No, or Neutral), we can measure the homogeneity or heterogeneity of the table based on those classes. A dataset is pure, or homogeneous, if it contains only a single class (either Yes or No). If it contains several classes, the table is impure, or heterogeneous (a combination of Yes and No). There are several ways to measure the degree of impurity. The best-known measures are given below:

Gini index
Entropy/Information Gain
Classification Error
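
The three measures above can all be computed from the class proportions of a label column. The following is a minimal sketch with illustrative function names, not necessarily the ones used in this repo; entropy is taken in bits (base 2), which is an assumption.

```python
import numpy as np

def class_probabilities(labels):
    """Proportion of each distinct class in a 1-D array of labels."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    return counts / counts.sum()

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    p = class_probabilities(labels)
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy in bits; only observed classes contribute."""
    p = class_probabilities(labels)
    return -np.sum(p * np.log2(p))

def classification_error(labels):
    """1 minus the proportion of the majority class."""
    return 1.0 - np.max(class_probabilities(labels))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

mixed = ["yes", "yes", "no", "no"]
print(gini(mixed), entropy(mixed), classification_error(mixed))
print(information_gain(mixed, [["yes", "yes"], ["no", "no"]]))
```

A pure table scores 0 on all three measures, and a split that separates the classes perfectly recovers the full parent entropy as information gain.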

Contact

Miguel Ángel Macías - 👨‍💻Linkedin

My Personal Website: ✨mangelladev.com
