GitHub - kuta-ndze/Dimensionality_Reduction_Techniques: Dimensionality reduction techniques with UCIML datasets from Kirill Eremenko intuition

`Dimensionality_Reduction_Techniques`

Principal Component Analysis (PCA)

The goal is to detect strong correlations among variables, i.e. to find the directions of maximum variance, and to reduce a d-dimensional dataset by projecting it onto a k-dimensional subspace (where k < d).
Unlike linear regression, which learns how Y depends on X, PCA learns the relationships among the variables themselves, quantified by a list of principal axes.
PCA can be highly affected by outliers in the data.
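The projection onto the directions of maximum variance can be sketched with NumPy alone. This is a minimal illustration, not the code used in this repository: the helper name `pca_project` and the toy data are hypothetical, and the eigendecomposition of the covariance matrix is one common way to obtain the principal axes.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the k directions of largest variance."""
    X_centered = X - X.mean(axis=0)          # PCA works on mean-centered data
    cov = np.cov(X_centered, rowvar=False)   # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    components = eigvecs[:, order]           # d x k matrix of principal axes
    return X_centered @ components, eigvals[order]

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 samples, 3 features, two of them strongly correlated
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200), rng.normal(size=200)])

Z, variances = pca_project(X, k=2)
print(Z.shape)     # (200, 2)
print(variances)   # variance captured along each retained axis, largest first
```

Because the first two columns are nearly collinear, most of the variance should end up on the first axis, which is the "strong correlation" PCA is meant to detect.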
The example here analyses the UCIML Wine dataset by applying PCA with 2 components and then building a Logistic Regression classifier.
- PCAclassifier (figures: Visualising the Train set, Visualising the Test set)
With this type of algorithm we can start with 2 components and, if the model underperforms, try larger numbers of components to find the optimal number of extracted features.

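from sklearn.decomposition import PCA
# Start with 2 components; increase the number if the model underperforms
pca = PCA(n_components = 2)
# Fit on the training set only, then apply the same projection to the test set
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)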
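Trying several numbers of components could look roughly like the sketch below. It makes several assumptions that are not taken from this repository: scikit-learn's bundled `load_wine` stands in for the Wine data, features are standardized before PCA, and 5-fold cross-validated accuracy is used as the comparison metric.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: load_wine stands in for the repository's Wine data
X, y = load_wine(return_X_y=True)

scores = {}
for k in range(2, 7):
    # Scale, reduce to k components, then classify; the pipeline refits
    # the scaler and PCA inside every cross-validation fold
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=k),
        LogisticRegression(max_iter=1000),
    )
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

for k, acc in scores.items():
    print(f"n_components={k}: mean CV accuracy={acc:.3f}")

# Smallest k among those with the best mean accuracy
best_k = max(scores, key=scores.get)
print("best number of components:", best_k)
```

Keeping the scaler and PCA inside the pipeline avoids fitting them on held-out folds, which mirrors the `fit_transform` on train and `transform` on test pattern above.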

Linear Discriminant Analysis (LDA)

LDA differs from PCA in that, in addition to finding the component axes, we are interested in the axes that maximize the separation between multiple classes.
Both are linear transformation techniques used for dimensionality reduction.
PCA is unsupervised, whereas LDA is supervised because it uses the dependent variable (the class labels).
The goal of LDA is to project a feature space (a dataset of n-dimensional samples) onto a smaller subspace of dimension k (where k <= n-1) while preserving the class-discriminatory information. In practice k is also at most the number of classes minus one.
Like PCA, the algorithm can be described as a five-step method. In the example below, LDA is applied before the classifier.
- LDAclassifier (figures: Visualising the Train set, Visualising the Test set)
LDA is imported from a different scikit-learn module than PCA, and fitting it requires the labels as well as the features.

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components = 2)
# LDA is supervised: fitting needs both the features and the dependent variable
X_train = lda.fit_transform(X_train, y_train)
# The test set is only transformed, so its labels are never used
X_test = lda.transform(X_test)
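A self-contained sketch of the same steps follows, showing where the limit on the number of LDA components comes from. The use of `load_wine`, the 80/20 split and the variable names are assumptions made for illustration, not details taken from this repository.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: load_wine stands in for the repository's Wine data
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Standardize using statistics from the training set only
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# LDA can produce at most min(n_classes - 1, n_features) discriminant axes
n_classes = len(np.unique(y_train))
max_components = min(n_classes - 1, X_train_s.shape[1])

lda = LDA(n_components=max_components)
X_train_lda = lda.fit_transform(X_train_s, y_train)  # labels required here
X_test_lda = lda.transform(X_test_s)                 # labels not used here
print(max_components, X_train_lda.shape, X_test_lda.shape)
```

With three wine classes the limit works out to two components, which is why `n_components = 2` is the largest value usable on this dataset.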

Kernel Principal Component Analysis

Kernel PCA often outperforms standard PCA when the data are not linearly separable, though this is not guaranteed.
We have applied KernelPCA to the UCIML Wine dataset.
- KernelPCAclassifier (figures: Visualising the Train set, Visualising the Test set)
The implementation of KernelPCA

from sklearn.decomposition import KernelPCA
kpca = KernelPCA(n_components = 2, kernel = 'rbf')  # radial basis function kernel
# Fit on the training set only, then apply the same mapping to the test set
X_train = kpca.fit_transform(X_train)
X_test = kpca.transform(X_test)
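To illustrate when a kernel can help, the sketch below compares linear PCA and RBF KernelPCA on a synthetic dataset of two concentric circles, which no linear projection can separate. The dataset (`make_circles`), the `gamma` value and the helper `accuracy_after` are assumptions chosen for illustration; they are not part of this repository's Wine analysis.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical nonlinear data: an inner circle (class 1) inside an outer circle (class 0)
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

def accuracy_after(reducer):
    """Fit the reducer on the train set, then score a linear classifier on the test set."""
    Z_train = reducer.fit_transform(X_train)
    Z_test = reducer.transform(X_test)
    clf = LogisticRegression().fit(Z_train, y_train)
    return clf.score(Z_test, y_test)

pca_acc = accuracy_after(PCA(n_components=2))
kpca_acc = accuracy_after(KernelPCA(n_components=2, kernel="rbf", gamma=10))
print(f"PCA + LogisticRegression accuracy:       {pca_acc:.2f}")
print(f"KernelPCA + LogisticRegression accuracy: {kpca_acc:.2f}")
```

On data that are already close to linearly separable the two approaches may perform similarly, so it is worth comparing both on the dataset at hand.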
