This project delves into clustering, a core unsupervised learning technique, to group data points based on inherent similarities. Various clustering algorithms are applied to datasets to identify patterns and meaningful structures without labeled data.
-
Clustering Techniques
- K-Means Clustering: Partitions data into a predefined number of clusters using centroid minimization.
- Hierarchical Clustering: Forms a dendrogram to group similar points hierarchically.
- DBSCAN (Density-Based Spatial Clustering): Groups points based on density and separates noise.
-
Data Preprocessing
- Handled missing values and normalized data to improve clustering accuracy.
-
Evaluation Metrics
- Silhouette scores to assess cluster quality.
- Visualization techniques (e.g., scatter plots) to understand cluster distribution.
UnsupervisedLearning.py
- The Jupyter Notebook containing:
- Data preprocessing steps.
- Implementation of clustering algorithms.
- Evaluation and visualizations.
- The Jupyter Notebook containing:
- K-Means: Effective for well-separated, spherical clusters but requires specifying the number of clusters.
- Hierarchical Clustering: Useful for visualizing relationships between clusters but computationally expensive for large datasets.
- DBSCAN: Handles noisy data well and discovers clusters of arbitrary shapes.
Install required libraries with:
pip install numpy pandas matplotlib scikit-learn scipy