---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# `wflsa`: Weighted Fused LASSO Signal Approximator
The **wflsa** R package provides an efficient solver for the Weighted Fused LASSO Signal Approximator (wFLSA) problem. The algorithm is based on the Alternating Direction Method of Multipliers (ADMM) and estimates a vector of coefficients subject to sparsity and smoothness penalties.
## Installation
You can install the package from GitHub using `pak`:
```r
# install.packages("pak")
pak::pak("bips-hb/wflsa")
```
## Problem Formulation
The problem solved by the wFLSA algorithm is formulated as follows:
$$
\hat{\beta} = \underset{\beta}{\arg\min} \left( \frac{1}{2} \| y - \beta \|_2^2 + \lambda_1 \| \beta \|_1 + \lambda_2 \sum_{i < j} w_{ij} | \beta_i - \beta_j | \right)
$$
Where:
- $y$ is the response variable with mean 0.
- $\beta$ is the vector of coefficients to be estimated.
- $\| \cdot \|_1$ and $\| \cdot \|_2$ are the $L_1$- and $L_2$-norms, respectively.
- $\lambda_1 > 0$ is the regularization parameter controlling the strength of the sparsity penalty.
- $\lambda_2 > 0$ is the regularization parameter controlling the smoothness.
- $w_{ij} \in [0,1]$ is the weight between the $i$-th and $j$-th coefficient.
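To make the objective concrete, the following is a minimal base-R sketch that evaluates it for a candidate coefficient vector. The helper `wflsa_objective()` and the toy inputs `y_toy` and `W_toy` are hypothetical names used for illustration only; they are not part of the package, and the sketch makes no claim about how the package computes the objective internally.

```r
# Hypothetical helper (not part of the wflsa package): evaluates the
# objective from the problem formulation for a candidate vector `beta`.
wflsa_objective <- function(beta, y, W, lambda1, lambda2) {
  stopifnot(length(beta) == length(y), all(dim(W) == length(y)))
  # squared-error term: 1/2 * ||y - beta||_2^2
  fit_term <- 0.5 * sum((y - beta)^2)
  # sparsity term: lambda1 * ||beta||_1
  sparsity_term <- lambda1 * sum(abs(beta))
  # fusion term: lambda2 * sum_{i < j} w_ij * |beta_i - beta_j|
  pairwise_diff <- abs(outer(beta, beta, "-"))
  upper <- upper.tri(W)
  fusion_term <- lambda2 * sum(W[upper] * pairwise_diff[upper])
  fit_term + sparsity_term + fusion_term
}

# Toy input: three coefficients, fully connected weight matrix
y_toy <- c(1, -2, 0.5)
W_toy <- matrix(1, 3, 3) - diag(3)

# beta = y: no fitting error, but both penalties are active
# (0 + 0.1 * 3.5 + 0.2 * 6 = 1.55)
obj_at_y <- wflsa_objective(y_toy, y_toy, W_toy, lambda1 = 0.1, lambda2 = 0.2)

# beta = 0: both penalties vanish, only the fitting error remains (2.625)
obj_at_zero <- wflsa_objective(rep(0, 3), y_toy, W_toy, lambda1 = 0.1, lambda2 = 0.2)
```

Comparing such values for different candidates shows the trade-off the two regularization parameters control: larger $\lambda_1$ favors coefficients at zero, larger $\lambda_2$ favors connected coefficients that are equal.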
## Example Use
A simple example:
```{r}
library(wflsa)
set.seed(1)
# number of covariates
p <- 10
# the response vector
y <- rnorm(p)
# fully connected weight matrix: all off-diagonal weights are 1
W <- matrix(rep(1, p*p), ncol = p) - diag(p)
# lambda values:
lambda1 <- c(0.01, 0.1, 0.2)
lambda2 <- c(0.01, 0.1, 0.2)
# Solve the weighted Fused LASSO Signal Approximator
wflsa(y, W, lambda1, lambda2)
```
A more involved example:
```{r}
library(flsa)
library(ggplot2)
# number of parameters (fixed in this example)
p <- 100
# the true mean: 10 blocks of 10 equal values (10, 20, ..., 100), centered at 0
mu <- rep(1:10 * 10, each = 10)
mu <- mu - mean(mu)
# generate the data
y <- rnorm(p, mean = mu)
# band_matrix() creates a symmetric p x p 0/1 matrix with a band around the
# diagonal; the diagonal itself is 0.
# Parameters:
# - p: size of the square matrix.
# - band_width: number of off-diagonals on each side (default: 1).
# Usage: band_matrix(p)
band_matrix <- function(p, band_width = 1) {
  # fill the band, including the diagonal, with 1s
  my_matrix <- matrix(0, nrow = p, ncol = p)
  for (i in seq_len(p)) {
    lower <- max(1, i - band_width)
    upper <- min(p, i + band_width)
    my_matrix[i, lower:upper] <- 1
  }
  # the loop also sets the diagonal to 1; reset it to 0
  diag(my_matrix) <- 0
  my_matrix
}
# creating the weight matrix commonly used for the 1-dimensional fused lasso signal approximator
W <- band_matrix(p)
fit <- wflsa(y, W, lambda1 = 1, lambda2 = 4)
fit_flsa <- as.vector(flsa(y, lambda1 = .1, lambda2 = 2))
data <- data.frame(
  index = 1:p,
  mu = mu,
  y = y,
  beta_fit = fit$betas[[1]],
  beta_fit_flsa = fit_flsa
)
ggplot2::ggplot(data) +
  geom_point(mapping = aes(index, y), color = 'blue') +
  geom_point(mapping = aes(index, beta_fit), color = 'red') +
  geom_point(mapping = aes(index, mu), color = 'purple') +
  ylab("value") +
  xlab("variable") +
  ggtitle("Raw data (blue), fitted values (red), and true mean (purple)")
```
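With a band width of 1, the weight matrix links only neighboring coefficients, so the weighted penalty $\sum_{i < j} w_{ij} | \beta_i - \beta_j |$ reduces to the sum of absolute successive differences, which is the penalty of the standard one-dimensional fused lasso signal approximator. The base-R sketch below checks this on a small vector. `band_weights()` is a hypothetical loop-free alternative to `band_matrix()` above, and `W_chain` and `beta_toy` are illustrative names; none of them are part of the package.

```r
# Hypothetical loop-free alternative to band_matrix(): weight 1 for all
# pairs (i, j) with 1 <= |i - j| <= band_width, and 0 elsewhere.
band_weights <- function(p, band_width = 1) {
  index_gap <- abs(outer(seq_len(p), seq_len(p), "-"))
  (index_gap >= 1 & index_gap <= band_width) * 1
}

W_chain <- band_weights(5)
beta_toy <- c(0, 0, 2, 2, -1)

# weighted pairwise penalty, summed over all pairs with i < j
upper <- upper.tri(W_chain)
weighted_penalty <- sum(W_chain[upper] * abs(outer(beta_toy, beta_toy, "-"))[upper])

# sum of absolute successive differences |beta_i - beta_{i+1}|
chain_penalty <- sum(abs(diff(beta_toy)))

# both penalties agree for the chain-structured weight matrix
all.equal(weighted_penalty, chain_penalty)
```

Widening the band (for example `band_weights(5, band_width = 2)`) additionally penalizes differences between coefficients two positions apart.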
## Contact
Louis Dijkstra
Leibniz Institute for Prevention Research & Epidemiology
E-mail: dijkstra (at) leibniz-bips.de