README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# `wflsa`: Weighted Fused LASSO Signal Approximator

The **wflsa** R package provides an efficient implementation of an algorithm for solving the Weighted Fused LASSO Signal Approximator problem. This algorithm is based on an ADMM (Alternating Direction Method of Multipliers) approach and is designed to estimate a vector of coefficients with sparsity and smoothness constraints.


## Installation

You can install the package from GitHub using `pak`:

```r
# install.packages("pak")
pak::pak("bips-hb/wflsa")
```

## Problem Formulation

The problem solved by the wFLSA algorithm is formulated as follows:

$$
\hat{\beta} = \underset{\beta}{\arg\min} \left( \frac{1}{2} \| y - \beta \|_2^2 + \lambda_1 \| \beta \|_1 + \lambda_2 \sum_{i < j} w_{ij} | \beta_i - \beta_j | \right) 
$$

Where:
- $y$ is the response variable with mean 0.
- $\beta$ is the vector of coefficients to be estimated.
- $|| \cdot ||_1$ and $|| \cdot ||_2$ are the $L_1$- and $L_2$-norms, respectively.
- $\lambda_1 > 0$ is the regularization parameter controlling the strength of the sparsity penalty.
- $\lambda_2 > 0$ is the regularization parameter controlling the smoothness.
- $w_{ij} \in [0,1]$ is the weight between the $i$-th and $j$-th coefficient.

## Example Use

A simple example: 

```{r}
library(wflsa)

set.seed(1)

# number of covariates
p <- 10

# the response vector
y <- rnorm(p)

# Fully connected (Weight matrix is 1)
W <- matrix(rep(1, p*p), ncol = p) - diag(p)

# lambda values:
lambda1 <- c(0.01, 0.1, 0.2)
lambda2 <- c(0.01, 0.1, 0.2)

# Solve the weighted Fused LASSO Signal Approximator
wflsa(y, W, lambda1, lambda2)
```

A more involved exaples:

```{r}
library(flsa)
library(ggplot2)

# number of parameters. NOTE: is fixed here
p <- 100

# the average
mu <- as.vector(sapply(1:10 * 10, function(mu) rep(mu, 10)))
mu <- mu - mean(mu)

# generating the data
y <- rnorm(100, mean = mu)

# band_matrix function creates a square matrix with a band around the diagonal.
# Parameters:
#   - p: Size of the square matrix.
# Usage: band_matrix(p)
# Default band width is set to 1, but it can be adjusted as needed.

band_matrix <- function(p) {
  band_width <- 1  # Adjust the band width as needed
  
  # Create a matrix with a band around the diagonal
  my_matrix <- matrix(0, nrow = p, ncol = p)
  for (i in 1:p) {
    lower <- max(1, i - band_width)
    upper <- min(p, i + band_width)
    my_matrix[i, lower:upper] <- 1
    
  }
  diag(my_matrix) <- 0  # Set diagonal elements to 0 (optional, as they are already 1)
  my_matrix
}

# creating the weight matrix commonly used for the 1-dimensional fused lasso signal approximator
W <- band_matrix(p)

fit <- wflsa(y, W, lambda1 = 1, lambda2 = 4)
fit_flsa <- as.vector(flsa(y, lambda1 = .1, lambda2 = 2))

data <- data.frame(
  index = 1:p, 
  mu = mu,
  y = y, 
  beta_fit = fit$betas[[1]], 
  beta_fit_flsa = fit_flsa
)

ggplot2::ggplot(data) + 
  geom_point(mapping = aes(index, y), color = 'blue') + 
  geom_point(mapping = aes(index, beta_fit), color = 'red') + 
  geom_point(mapping = aes(index, mu), color = 'purple') + 
  ylab("value") + 
  xlab("variable") + 
  ggtitle("Raw data (blue) and fitted values (red)")
```

## Contact

Louis Dijkstra

Leibniz Institute for Prevention Research & Epidemiology
E-mail: dijkstra (at) leibniz-bips.de