---
title: "Correspondence analysis"
author: "liuc"
date: "11/9/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
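## Correspondence analysis (CA)
Correspondence analysis (CA) and detrended correspondence analysis (DCA) are commonly used for categorical data. When studying the relationships among several categorical variables, chi-square tests or log-linear models do not readily give a simple, intuitive picture of how the variables relate, whereas correspondence analysis produces results that are easy to interpret visually. It combines R-mode and Q-mode factor analysis, using a small number of common factors as summary indices to describe how the objects under study relate to one another in space.
As an extension of PCA, CA presents its results in much the same way as PCA.
CA sometimes produces an arch effect; DCA can be used to remove it.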
Correspondence Analysis is a multivariate statistical technique used to explore relationships between categorical variables. It is often applied to contingency tables, where the rows and columns represent different categories, and the cell values represent the frequencies or counts.
The FactoMineR package can compute CA and MCA; DCA is available elsewhere, for example through `vegan::decorana`.
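To make the computation behind CA concrete, here is a minimal base-R sketch. It assumes a made-up 3 x 3 table `toy_N` (hypothetical counts, not taken from any dataset used here) and follows the standard textbook formulation, the singular value decomposition of the standardized residuals. It is an illustration under those assumptions, not necessarily how `FactoMineR::CA` works internally.

```{r}
# Hypothetical contingency table, invented for illustration only
toy_N <- matrix(c(20,  5, 10,
                   8, 15, 12,
                   6,  9, 25),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("a", "b", "c"), c("x", "y", "z")))

P        <- toy_N / sum(toy_N)      # correspondence matrix (relative frequencies)
row_mass <- rowSums(P)              # row masses
col_mass <- colSums(P)              # column masses
E        <- outer(row_mass, col_mass)   # expected proportions under independence
S        <- (P - E) / sqrt(E)       # standardized residuals

sv  <- svd(S)
eig <- sv$d^2                       # principal inertias (eigenvalues)

# Principal coordinates of the rows and columns
row_coord <- sweep(sv$u, 1, sqrt(row_mass), "/") %*% diag(sv$d)
col_coord <- sweep(sv$v, 1, sqrt(col_mass), "/") %*% diag(sv$d)
rownames(row_coord) <- rownames(toy_N)
rownames(col_coord) <- colnames(toy_N)

round(eig, 4)
round(row_coord[, 1:2], 3)
round(col_coord[, 1:2], 3)
```

For an I x J table at most min(I, J) - 1 dimensions carry inertia, so the last eigenvalue here is numerically zero and only the first two columns of the coordinates are informative.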
```{r, include=FALSE}
library(medicaldata)
library(FactoMineR)
```
### A small correspondence analysis example
```{r}
# Use contingency-table data as the example
# http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/113-ca-correspondence-analysis-in-r-essentials/
data(housetasks, package = 'factoextra')
housetasks
# The dataset can be viewed as a contingency table of several row categories against several column categories
gplots::balloonplot(t(as.table(as.matrix(housetasks))),
                    xlab = "", ylab = "",
                    label = FALSE, show.margins = FALSE)
```
```{r}
# housetasks is clearly count data shaped like a contingency table, so how does it
# differ from what table() returns? It is a data.frame rather than a `table` object,
# hence the conversions below.
# vcd::mosaic(~ Wife + Alternating, data = housetasks)
foo <- housetasks |>
  tibble::rownames_to_column() |>
  tidyr::pivot_longer(cols = -rowname)
xtabs(value ~ rowname + name, data = foo) |> vcd::mosaic()
# or convert the data.frame to a table directly
as.table(as.matrix(housetasks)) |> vcd::mosaic()
```
```{r}
res.ca <- FactoMineR::CA(housetasks, graph = TRUE)
# The printed result includes the chi-square statistic
res.ca # a list with many components
```
```{r}
# Eigenvalue
# eig.val = factoextra::get_eigenvalue(res.ca)
eig.val <- res.ca$eig
eig.val
```
```{r}
chisq <- chisq.test(housetasks)
chisq
```
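The chi-square test and CA are closely linked: the total inertia that CA decomposes equals the chi-square statistic divided by the grand total of the table. The sketch below illustrates this with base R only, using a made-up table `toy_tab` (hypothetical counts, chosen purely for illustration), and also shows how the Pearson residuals indicate which cells drive the association.

```{r}
# Hypothetical contingency table, invented for illustration only
toy_tab <- matrix(c(30, 10,  5,
                    12, 25,  8,
                     6,  9, 28),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("a", "b", "c"), c("x", "y", "z")))

toy_test <- chisq.test(toy_tab)
chi2     <- unname(toy_test$statistic)

# Total inertia of the corresponding CA
total_inertia <- chi2 / sum(toy_tab)
total_inertia

# Pearson residuals: (observed - expected) / sqrt(expected)
round(toy_test$residuals, 2)

# Share of the chi-square statistic contributed by each cell
cell_contrib <- toy_test$residuals^2 / chi2
round(100 * cell_contrib, 1)
```

Cells with large positive residuals are over-represented relative to independence; these are the row and column categories that a CA map tends to place close together.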
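_Interpretation_: Correspondence analysis examines the relationships among several variables, much like a chi-square test. In the scatter plot, categories that fall in the same quadrant are the ones most closely related.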
*MCA*: multiple correspondence analysis
MCA handles the relationships among more than two categorical variables.
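One common way to view MCA is as a CA applied to the indicator (one-hot) matrix of the categorical variables. The following base-R sketch assumes a small made-up data frame `toy_df` and a helper `indicator()`, both hypothetical and introduced only for illustration. It is a sketch of that textbook formulation, not a description of how `FactoMineR::MCA` is implemented.

```{r}
# Hypothetical data: 6 individuals, 3 categorical variables
toy_df <- data.frame(
  smoker = factor(c("yes", "no", "no", "yes", "no", "yes")),
  diet   = factor(c("a", "b", "a", "c", "b", "c")),
  sex    = factor(c("f", "m", "f", "m", "m", "f"))
)

# Hypothetical helper: one 0/1 column per category of a factor
indicator <- function(x, name) {
  z <- outer(as.character(x), levels(x), "==") * 1
  colnames(z) <- paste(name, levels(x), sep = "_")
  z
}
Z <- do.call(cbind, unname(Map(indicator, toy_df, names(toy_df))))
Z

# CA of the indicator matrix via the SVD of the standardized residuals
P        <- Z / sum(Z)
row_mass <- rowSums(P)
col_mass <- colSums(P)
E        <- outer(row_mass, col_mass)
mca_eig  <- svd((P - E) / sqrt(E))$d^2

# Total inertia equals (number of categories / number of variables) - 1
sum(mca_eig)
ncol(Z) / ncol(toy_df) - 1
```

Because the total inertia of an indicator matrix depends only on the number of categories and variables, the percentages of inertia per dimension in MCA are typically low and should be read less literally than in simple CA.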
```{r}
data(poison)
poison.active <- poison[1:55, 5:15]
poison.active
```
```{r}
res.ca2 <- FactoMineR::MCA(poison.active, graph = TRUE)
factoextra::fviz_screeplot(res.ca2, addlabels = TRUE, ylim = c(0, 40))
# Extract the results for the variable categories
var <- factoextra::get_mca_var(res.ca2)
# Correlation between variables and principal dimensions
factoextra::fviz_mca_var(res.ca2, choice = "mca.cor", repel = TRUE)
```
### Canonical Correspondence Analysis