Welcome to my collection of machine learning and algorithm implementations. This repository contains various Python scripts that solve specific tasks and demonstrate different algorithms. Each script is accompanied by a brief explanation and an example of usage.
- Overview
- Files and Implementations
- 1. BFS Deque Implementation
- 2. Dots generating function
- 3. Restoring function coefficients
- 4. Levenshtein distance
- 5. Tags accuracy
- 6. Easy convolution
- 7. Reverse-engineer the convolution kernel
- 8. K-means clustering algorithm
- 9. Compression without information loss
- 10. Restore damaged images
- 11. Find cat
- 12. Binary classification example
This repository showcases a variety of implementations, ranging from basic algorithmic tasks to more complex machine learning models. Each file is designed to be standalone, allowing for easy understanding and execution of the code.
I called it a SANDBOX because it's my space where I can play around and test new approaches.
Filename: bfs_deque_implementation_01.py
Description: The script finds the shortest paths on an n×m grid from multiple starting cells to a target cell. The legal moves are the knight's moves in chess.
It returns -1 if the target cannot be reached from at least one of the starting cells.
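A minimal sketch of the idea, assuming the script reports the minimal number of knight moves for every starting cell. Knight moves are symmetric, so a single BFS that starts at the target (with `collections.deque` as the FIFO queue) gives the distance from every cell at once. The function name `knight_distances` and its return format are hypothetical.

```python
from collections import deque

# The eight knight moves as (row, column) offsets
KNIGHT_MOVES = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def knight_distances(n, m, starts, target):
    """Minimal knight moves from each start to target on an n x m grid, or -1 if any start is unreachable."""
    dist = [[-1] * m for _ in range(n)]  # -1 marks "not visited yet"
    tr, tc = target
    dist[tr][tc] = 0
    queue = deque([target])
    while queue:
        r, c = queue.popleft()  # O(1) pop from the left is why deque is used
        for dr, dc in KNIGHT_MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < m and dist[nr][nc] == -1:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    result = [dist[r][c] for r, c in starts]
    return -1 if -1 in result else result


print(knight_distances(8, 8, [(0, 0), (7, 7)], (1, 2)))
```

Running BFS from the target instead of from each start keeps the cost at one traversal of the grid, regardless of how many starting cells there are.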
Filename: dots_generator_02.py
Description: Given 2000 sets of 100 dots each on a 2D plane, the task is to predict the function that generated each set.
Filename: restoring_coefficients_03.py
Description: A chunk of code that restores the function coefficients from the given data points.
The formula of the given function:
Where:
- $\epsilon_i$ are random variables that take values from the interval $[-0.001, 0.001]$.
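The exact formula of the function is not reproduced here, so the sketch below only illustrates the general technique under an assumption: that the function is linear in its unknown coefficients, i.e. a weighted sum of known basis functions. The basis `[1, x, sin(x)]` and the coefficient values are hypothetical. With noise bounded by $0.001$, ordinary least squares (`np.linalg.lstsq`) recovers the coefficients closely.

```python
import numpy as np


def restore_coefficients(x, y, basis):
    """Least-squares estimate of coefficients a_k in y = sum_k a_k * f_k(x) + eps."""
    # Design matrix: one column per basis function evaluated at every x
    A = np.column_stack([f(x) for f in basis])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs


rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
basis = [np.ones_like, lambda t: t, np.sin]  # hypothetical basis functions
true_coeffs = np.array([0.5, -1.2, 3.0])     # hypothetical ground truth
eps = rng.uniform(-0.001, 0.001, size=x.size)  # noise from [-0.001, 0.001]
y = true_coeffs[0] + true_coeffs[1] * x + true_coeffs[2] * np.sin(x) + eps

coeffs = restore_coefficients(x, y, basis)
print(coeffs)
```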
Filename: Levenshtein_distance_04.py
Description: Computes the Levenshtein distance (edit distance), i.e. the minimum number of single-character operations (insertions, deletions, substitutions) needed to convert one string into another.
The script returns the count of such operations.
The algorithm works by building up a distance matrix.
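Given strings $S_1$ and $S_2$ of lengths $M$ and $N$, the distance is $D(S_1, S_2) = d(M, N)$,
where

$$
d(i, j) =
\begin{cases}
0, & i = 0,\ j = 0 \\
i, & j = 0,\ i > 0 \\
j, & i = 0,\ j > 0 \\
\min\bigl(d(i, j-1) + 1,\ d(i-1, j) + 1,\ d(i-1, j-1) + m(S_1[i], S_2[j])\bigr), & i > 0,\ j > 0
\end{cases}
$$

Where:
- $m(a,b) = 0$ if $a = b$ and $1$ otherwise;
- $\min(a,b,c)$ returns the minimal value of its arguments.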
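A minimal Python sketch of the matrix recurrence above (the Wagner-Fischer scheme): `d[i][j]` holds the distance between the first `i` characters of `s1` and the first `j` characters of `s2`. The function name `levenshtein` is illustrative and not necessarily the one used in the script.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Levenshtein distance computed by filling the (len1 + 1) x (len2 + 1) matrix d."""
    len1, len2 = len(s1), len(s2)
    d = [[0] * (len2 + 1) for _ in range(len1 + 1)]
    for i in range(len1 + 1):
        d[i][0] = i  # delete all i characters
    for j in range(len2 + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, len1 + 1):
        for j in range(1, len2 + 1):
            mismatch = 0 if s1[i - 1] == s2[j - 1] else 1  # m(a, b)
            d[i][j] = min(
                d[i][j - 1] + 1,             # insertion
                d[i - 1][j] + 1,             # deletion
                d[i - 1][j - 1] + mismatch,  # substitution or match
            )
    return d[len1][len2]


print(levenshtein("kitten", "sitting"))
```

Only the previous row is ever read, so the memory can be reduced to two rows if the full matrix is not needed.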
Filename: tags_accuracy_05.py
Description: A custom metric that measures the similarity between two sets of labels.
It counts the pairs of tags that are equal in both sets of predictions, relative to the total number of pairs.
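The description leaves some room for interpretation, so this sketch rests on an assumption: a "pair" is a pair of items $(i, j)$, and it counts when the two items share a tag in the first set of predictions *and* share a tag in the second set. The score is that count divided by the number of all item pairs. The name `tags_accuracy` is hypothetical.

```python
from itertools import combinations


def tags_accuracy(tags_a, tags_b):
    """Fraction of item pairs whose tags are equal in both label sets (assumed reading of the metric)."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both label sets must tag the same items")
    if len(tags_a) < 2:
        raise ValueError("at least two items are needed to form a pair")
    pairs = list(combinations(range(len(tags_a)), 2))
    agreeing = sum(
        1 for i, j in pairs
        if tags_a[i] == tags_a[j] and tags_b[i] == tags_b[j]
    )
    return agreeing / len(pairs)


print(tags_accuracy([1, 1, 2], [3, 3, 4]))
```

Because only equality within each set is compared, the metric does not care what the tag values are, only how items are grouped.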
Filename: convolve_06.py
Description: A 2D convolution implemented in pure NumPy.
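A minimal sketch of a "valid" 2D convolution in pure NumPy, assuming the convention used in section 7 ($O(x, y)$ is a sum of $K(i, j)\, I(x+i, y+j)$, with no kernel flip, as is common in machine learning). `sliding_window_view` (NumPy 1.20+) exposes every kernel-sized patch without copying, and `np.einsum` does the multiply-and-sum. The function name `convolve2d` is illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2D convolution without kernel flipping (cross-correlation)."""
    # windows[x, y] is the kernel-sized patch whose top-left corner is (x, y)
    windows = sliding_window_view(image, kernel.shape)
    # O(x, y) = sum_i sum_j K(i, j) * I(x + i, y + j)
    return np.einsum("xyij,ij->xy", windows, kernel)


image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))
print(convolve2d(image, kernel))
```

For a flipped (textbook) convolution, pass `kernel[::-1, ::-1]` instead.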
Filename: reverse_engineer_convolution_kernel_07.py
Description: Given an input image and its convolved output, find the convolution kernel.
Convolution operation:

$$
O(x, y) = \sum_{i} \sum_{j} K(i, j)\, I(x+i, y+j)
$$

where:
- $O(x, y)$ is the output image pixel at position $(x, y)$;
- $K(i, j)$ are the elements of the kernel;
- $I(x+i, y+j)$ are the pixels from the input image that overlap with the kernel;
- the summations are over the kernel dimensions.
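Finding the kernel:
Every output pixel gives one linear equation in the unknown kernel elements, so the convolution operation yields the system

$$
\sum_{i} \sum_{j} K(i, j)\, I(x+i, y+j) = O(x, y) \quad \text{for every output position } (x, y).
$$

Solve this system of equations to find the values of $K(i, j)$.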
More explanation is given in the file reverse_engineer_convolution_kernel_07.py.
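A minimal sketch of solving that linear system with NumPy, assuming a "valid" convolution without kernel flipping and a known kernel shape. Each row of the matrix `A` is one flattened image patch, so `A @ K.ravel()` equals the flattened output; `np.linalg.lstsq` solves it and also handles the usual overdetermined case (many more output pixels than kernel entries). The name `find_kernel` and the random test data are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def find_kernel(image, output, kernel_shape):
    """Recover K from I and O, assuming O(x, y) = sum_i sum_j K(i, j) * I(x + i, y + j)."""
    windows = sliding_window_view(image, kernel_shape)
    # One equation per output pixel: row = patch under the kernel, unknowns = K(i, j)
    A = windows.reshape(-1, kernel_shape[0] * kernel_shape[1])
    b = output.reshape(-1)
    k, *_ = np.linalg.lstsq(A, b, rcond=None)
    return k.reshape(kernel_shape)


rng = np.random.default_rng(42)
image = rng.random((12, 12))
true_kernel = rng.normal(size=(3, 3))
# Forward pass with the same convention, to produce a known output
output = np.einsum("xyij,ij->xy", sliding_window_view(image, true_kernel.shape), true_kernel)

recovered = find_kernel(image, output, true_kernel.shape)
print(np.round(recovered, 6))
```

With noisy outputs the least-squares solution is still the best fit in the mean-squared sense, which is why `lstsq` is used here rather than `np.linalg.solve`.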
Filename: K_means_clustering_08.py
Description: The code solves the problem of maximizing a retailer's profit. The task is to place pickup points on a map, with the retailer's profit defined as:
where:
- $k_i$ is the set of houses for which the delivery point $c_i$ is the closest;
- $d_{h,c} = (x_h - c_x)^2 + (y_h - c_y)^2$ is the squared Euclidean distance from house $h$ in district $k$ to the delivery point $c$ in the same district;
- $C$ is the expected profit from all open delivery points;
- $c_{\text{cost}}$ is the cost of opening one delivery point.
I implement the k-means clustering algorithm to find the optimal placement of pickup points on the map.
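A minimal sketch of Lloyd's k-means in NumPy for a fixed number of points $k$, using the squared distance $d_{h,c}$ defined above. It alternates two steps: assign every house to its closest delivery point (this forms the sets $k_i$), then move each point to the mean of its houses. The farthest-point initialization and the names `kmeans` and `houses` are assumptions for illustration; choosing $k$ by comparing profits is not shown.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on an (n, 2) array of house coordinates; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from a random house, then repeatedly
    # add the house farthest from all centers chosen so far.
    centers = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        d2 = ((points[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2)
        centers.append(points[d2.min(axis=1).argmax()])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # d[h, c] = (x_h - c_x)^2 + (y_h - c_y)^2
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move every center to the mean of its houses (keep it if it has none)
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Final assignment so labels always match the returned centers
    labels = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return centers, labels


rng = np.random.default_rng(1)
houses = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(10.0, 10.0), scale=0.5, size=(50, 2)),
])
centers, labels = kmeans(houses, k=2)
print(centers)
```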
Filename: compression_medal_09.py
Description: Performs PCA feature reduction while keeping the mean linear regression error on the cross-validation set within a given benchmark.
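One way to sketch this with plain NumPy, under assumptions: a single hold-out split stands in for the cross-validation set, and the benchmark is a maximum allowed mean squared error (`max_error`, hypothetical). PCA is fitted on the training part via SVD, the target is regressed on the first `k` component scores, and the smallest `k` whose held-out error stays within the benchmark is returned. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the column means and the top principal axes (as rows)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal axes sorted by decreasing singular value
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def regression_mse(Z_train, y_train, Z_test, y_test):
    """Fit linear regression with intercept on train, return mean squared error on test."""
    A = np.column_stack([np.ones(len(Z_train)), Z_train])
    w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    pred = np.column_stack([np.ones(len(Z_test)), Z_test]) @ w
    return float(np.mean((pred - y_test) ** 2))


def smallest_pca_dim(X_train, y_train, X_test, y_test, max_error):
    """Smallest number of components whose held-out regression error is <= max_error."""
    for k in range(1, X_train.shape[1] + 1):
        mean, axes = pca_fit(X_train, k)
        err = regression_mse((X_train - mean) @ axes.T, y_train,
                             (X_test - mean) @ axes.T, y_test)
        if err <= max_error:
            return k, err
    return X_train.shape[1], err  # benchmark not reached: keep all features


# Synthetic data: 10 features driven by 2 latent factors (hypothetical example)
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 2))
W = rng.normal(size=(2, 10))
X = Z @ W + 1e-3 * rng.normal(size=(300, 10))
y = Z @ np.array([1.0, -2.0])

k, err = smallest_pca_dim(X[:200], y[:200], X[200:], y[200:], max_error=1e-2)
print(k, err)
```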
Filename: image_denoiser_10.py
Description: A class for image restoration. It implements a simple algorithm that restores images in which 85% of the pixel values are missing, by filling each missing pixel with a randomly chosen non-zero value from its nearest surroundings.
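A minimal sketch of the described fill rule, written as a function rather than a class and assuming a grayscale image where missing pixels are stored as `0`. For each missing pixel it looks at a square window of growing radius in the damaged image and copies a random non-zero value from it. The names `restore_image` and `max_radius` are hypothetical.

```python
import numpy as np


def restore_image(img, max_radius=5, seed=None):
    """Fill zero pixels with a random non-zero neighbor from the smallest window that has one."""
    rng = np.random.default_rng(seed)
    damaged = np.asarray(img)
    restored = damaged.copy()
    for y, x in zip(*np.nonzero(damaged == 0)):
        for r in range(1, max_radius + 1):
            # Neighbors are taken from the damaged image, so filled values are not reused
            window = damaged[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            candidates = window[window != 0]
            if candidates.size:
                restored[y, x] = rng.choice(candidates)
                break  # pixel stays 0 if nothing is found within max_radius
    return restored


img = np.array([[1, 0, 2],
                [0, 0, 0],
                [3, 0, 4]])
print(restore_image(img, seed=0))
```

With 85% of pixels missing, a radius of 1 often contains no known pixel, which is why the window grows until a candidate appears.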
Some examples of restored images:
Filename: find_cat_11.py
Description: The script is an example of using a deque for an efficient BFS implementation on a chess-like grid.
Filename: binary_classification_12.py
Description: An example of a data-mining pipeline that tests different models (logistic regression, gradient boosting, random tree) on a classification task.
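A minimal sketch of comparing the three model families with scikit-learn cross-validation. The synthetic dataset from `make_classification` is a stand-in for the real data, and "random tree" is interpreted here as `RandomForestClassifier`; both are assumptions. Logistic regression is wrapped with `StandardScaler` because it is sensitive to feature scale, while the tree ensembles are not.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, cv=5):
    """Mean cross-validated accuracy for each candidate model."""
    models = {
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "gradient_boosting": GradientBoostingClassifier(random_state=0),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


X, y = make_classification(n_samples=300, n_features=10, random_state=0)
scores = compare_models(X, y)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```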