---
title: "K Means Clustering"
author: "Kundan K. Rao"
date: "01/12/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Clustering
## K-means clustering
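Deciding the optimal number of clusters with the elbow method (total within-cluster sum of squares, `wss`). `data.scaled` is assumed to be the standardised version of `data` prepared earlier:-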
```{r}
library(factoextra)
fviz_nbclust(data.scaled, kmeans, method = "wss")
```
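For readers who want to see what the `wss` curve is made of, here is a minimal sketch that computes the total within-cluster sum of squares by hand for k = 1 to 10. It is not the internals of `fviz_nbclust()`, only an illustration of the same quantity. The object `x` (the standardised numeric columns of the built-in `iris` data) is a stand-in assumption for `data.scaled`, and `k.values` and `wss` are illustrative names.

```r
# Stand-in data (assumption): the four numeric columns of iris, standardised.
# In this document that role is played by data.scaled.
x <- scale(iris[, 1:4])

set.seed(123)
k.values <- 1:10

# Total within-cluster sum of squares for each candidate k
wss <- sapply(k.values, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is the k after which the curve flattens out
plot(k.values, wss, type = "b", pch = 19,
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

With k = 1 the value equals the total sum of squares of the data, and it falls as k grows, so the useful signal is where the decrease slows down rather than where the value is smallest.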
Observation:-
The curve bends (the "elbow") at k = 3, so we use three clusters.
Modified plot, with a dashed vertical line marking k = 3:-
```{r}
fviz_nbclust(data.scaled, kmeans, method = "wss") +
  geom_vline(xintercept = 3, linetype = 2)
```
```{r}
# Compute k-means with k = 3
set.seed(123)
km.res <- kmeans(data.scaled, 3, nstart = 25)
print(km.res)
```
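`print()` shows everything at once; the individual components of a `kmeans` result can also be read directly. The sketch below again uses standardised `iris` columns as a stand-in assumption for `data.scaled`, and `fit` is an illustrative name playing the role of `km.res`.

```r
# Stand-in for data.scaled (assumption): standardised numeric iris columns
x <- scale(iris[, 1:4])

set.seed(123)
fit <- kmeans(x, centers = 3, nstart = 25)

fit$size                    # number of observations in each cluster
fit$centers                 # cluster centroids, in standardised units
fit$tot.withinss            # total within-cluster sum of squares
fit$betweenss / fit$totss   # between-cluster share of the total sum of squares
```

The last ratio is the "between_SS / total_SS" figure that `print()` reports; values closer to 1 mean the clusters account for more of the spread in the data.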
```{r}
# Mean of each original (unscaled) variable within each cluster
aggregate(data, by = list(cluster = km.res$cluster), FUN = mean)
```
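The same `aggregate()` call applied to the scaled data should reproduce the centroids stored in the fitted object, which makes a handy sanity check. This is a small sketch under the assumption that standardised `iris` columns stand in for `data.scaled`; `fit` and `means.scaled` are illustrative names.

```r
# Stand-in for data.scaled (assumption): standardised numeric iris columns
x <- scale(iris[, 1:4])

set.seed(123)
fit <- kmeans(x, centers = 3, nstart = 25)

# Per-cluster means of the scaled variables
means.scaled <- aggregate(as.data.frame(x),
                          by = list(cluster = fit$cluster),
                          FUN = mean)

# Compare with the centroids reported by kmeans (the cluster column is dropped)
all.equal(as.matrix(means.scaled[, -1]), fit$centers,
          check.attributes = FALSE)
```

Aggregating the unscaled data, as done above, gives the same grouping expressed in the original units, which is usually easier to interpret.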
```{r}
# Attach each observation's cluster label to the original data
data.clust.kmeans <- cbind(data, cluster = km.res$cluster)
head(data.clust.kmeans)
```
Visualising k-means:-
```{r}
fviz_cluster(km.res, data = data.scaled,
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             ellipse.type = "euclid", # Circle around each cluster centre (Euclidean distance)
             star.plot = TRUE,        # Add segments from centroids to items
             repel = TRUE,            # Avoid label overplotting (slow)
             ggtheme = theme_minimal()
)
```
```
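When the data have more than two variables, `fviz_cluster()` draws the observations on the first two principal components. A rough base-R equivalent is sketched below. It is not the function's actual implementation, and it assumes standardised `iris` columns as a stand-in for `data.scaled`; `fit`, `pca`, `scores` and `centers.pc` are illustrative names.

```r
# Stand-in for data.scaled (assumption): standardised numeric iris columns
x <- scale(iris[, 1:4])

set.seed(123)
fit <- kmeans(x, centers = 3, nstart = 25)

# Project the observations onto the first two principal components
pca <- prcomp(x)
scores <- pca$x[, 1:2]

# Project the cluster centroids into the same two-dimensional space
centers.pc <- predict(pca, newdata = fit$centers)[, 1:2]

plot(scores, col = fit$cluster + 1, pch = 20,
     xlab = "PC1", ylab = "PC2")
points(centers.pc, pch = 4, cex = 2, lwd = 2)  # centroids marked with crosses
```

Because the plot is a two-dimensional projection, clusters that look overlapping here may still be separated along the remaining components.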
```{r}
par(mfrow=c(1,2))
# Colour each point by its k-means cluster (the fitted object is km.res)
plot(data, col = (km.res$cluster + 1), pch = 20, cex = 2)
```