K-Means is a popular unsupervised machine learning algorithm used for clustering data into groups based on their features. It partitions the data into ( k ) distinct clusters, where each data point belongs to the cluster with the nearest mean.
To run the K-Means clustering algorithm, make sure you have the following libraries installed:
pip install numpy pandas matplotlib scikit-learn
You can use the K-Means clustering algorithm on any dataset. Here's a simple example:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Sample data
data = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
# Create K-Means model
kmeans = KMeans(n_clusters=2)
kmeans.fit(data)
# Predict cluster labels
labels = kmeans.predict(data)
# Plot the clusters
plt.scatter(data[:, 0], data[:, 1], c=labels, s=100, cmap='viridis')
plt.title('K-Means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
- Initialization: Randomly select ( k ) initial centroids.
- Assignment Step: Assign each data point to the nearest centroid.
- Update Step: Calculate the mean of the assigned points and update the centroids.
- Repeat: Repeat the assignment and update steps until convergence (when the centroids no longer change significantly).
K-Means clustering is a powerful tool for data analysis and can be applied to various problems, from customer segmentation to image compression. Experiment with different datasets to see how K-Means performs in various scenarios!