NUMPY FORMULAS

Implementations of well-known math formulas in NumPy


I made this repo to improve my mathematical Python skills. I found it necessary while taking the Data Mining course at my university, where I learned a lot about distances, matrices, proximities, and more, and I took the opportunity to have a little fun with the NumPy library. Feel free to use it if it is useful to you, or to improve it if you see a way! ✌

Logo

Image taken from realpython.com/numpy-tutorial/


If you like this repo, please click the ⭐

Contents

Distances

Distance measures play an important role in machine learning.

A distance measure is an objective score that summarizes the relative difference between two objects in a problem domain. Most commonly, the two objects are rows of data that describe a subject (such as a person, car, or house), or an event (such as a purchase, a claim, or a diagnosis).
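As a minimal sketch of the idea, two common distance measures between a pair of data rows can be written in NumPy as follows. The function names `euclidean_distance` and `manhattan_distance` are illustrative assumptions and are not necessarily the names used in this repo.

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))


def manhattan_distance(a, b):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b))


# Two hypothetical rows of data describing the same kind of object.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

print(euclidean_distance(x, y))  # 5.0
print(manhattan_distance(x, y))  # 7.0
```

Both functions assume the two rows have the same number of numeric features.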

Normalization

Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to use a common scale, without distorting differences in the ranges of values or losing information.
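One common way to bring columns onto a shared scale is min-max normalization, which maps each column to the range [0, 1]. The sketch below is only an illustration under that assumption; `min_max_normalize` is a hypothetical name, and the repo may implement other techniques as well.

```python
import numpy as np


def min_max_normalize(X):
    """Rescale each column of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Constant columns have zero range; divide by 1 so they map to 0
    # instead of producing NaN.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return (X - col_min) / col_range


# Hypothetical dataset: two columns on very different scales.
data = np.array([
    [1.0, 200.0],
    [2.0, 400.0],
    [3.0, 600.0],
])

print(min_max_normalize(data))
```

After normalization both columns contain the values 0, 0.5, and 1, so neither dominates a distance computation merely because of its units.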

Proximity Measure

Proximity measures are measures of similarity and dissimilarity. They are important because they are used by a number of data mining techniques, such as clustering, nearest-neighbour classification, and anomaly detection.
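As an example of a similarity measure and its matching dissimilarity, here is a small sketch of cosine similarity. The names `cosine_similarity` and `cosine_dissimilarity` are illustrative assumptions, and the code assumes neither vector is all zeros.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1 = same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def cosine_dissimilarity(a, b):
    """Dissimilarity derived from the similarity: 0 means same direction."""
    return 1.0 - cosine_similarity(a, b)


u = np.array([1.0, 2.0])
v = np.array([2.0, 4.0])   # same direction as u
w = np.array([-2.0, 1.0])  # perpendicular to u

print(cosine_similarity(u, v))     # close to 1
print(cosine_similarity(u, w))     # close to 0
print(cosine_dissimilarity(u, w))  # close to 1
```

Because cosine similarity ignores vector length, `u` and `v` count as maximally similar even though their magnitudes differ.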

Impurity Measure

Impurity measures are very important for tree-based algorithms; they mainly help us choose the root node.

In a dataset whose predicted (dependent) variable takes class values (such as Yes, No, or Neutral), we can measure the homogeneity or heterogeneity of the table based on those classes. We say a dataset is pure, or homogeneous, if it contains only a single class (either YES or NO). If it contains several classes, we say the table is impure, or heterogeneous (a combination of YES and NO). There are several ways to measure the degree of impurity. The best-known measures are given below:
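To make the pure versus impure distinction concrete, the sketch below computes two widely used impurity measures, the Gini index and entropy, from a column of class labels. Picking these two measures is an assumption made for illustration, and the function names are hypothetical.

```python
import numpy as np


def class_probabilities(labels):
    """Fraction of rows belonging to each distinct class."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()


def gini(labels):
    """Gini index: 0 for a pure dataset, larger for more mixed classes."""
    p = class_probabilities(labels)
    return 1.0 - np.sum(p ** 2)


def entropy(labels):
    """Shannon entropy in bits: 0 for a pure dataset."""
    p = class_probabilities(labels)
    # np.unique only returns classes that occur, so every p is > 0
    # and log2 is always defined.
    return -np.sum(p * np.log2(p))


pure = np.array(["YES", "YES", "YES", "YES"])
mixed = np.array(["YES", "NO", "YES", "NO"])

print(gini(mixed))     # 0.5
print(entropy(mixed))  # 1.0
print(gini(pure))      # 0.0
```

A tree-based algorithm would typically prefer the split whose resulting subsets have the lowest impurity.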

Contact

Miguel Ángel Macías - 👨‍💻Linkedin

My Personal Website: ✨mangelladev.com