
Commit

Merge pull request #104 from b-rodrigues/with-nix-vignette
add Nix-to-Nix example using subshell per subfolder
philipp-baumann authored Jan 21, 2024
2 parents 54d05b9 + 1748aff commit 2105208
Showing 2 changed files with 200 additions and 43 deletions.
119 changes: 97 additions & 22 deletions dev/running_r_or_shell_code_in_nix_from_r.Rmd
@@ -1,19 +1,19 @@
---
title: "Running R or shell code in Nix from R"
title: "Running R or Shell Code in Nix from R"
output: html_document
editor_options:
chunk_output_type: console
---

## **Testing Code in Evolving Software Dependency Environments with Confidence**
## **Testing code in evolving software dependency environments with confidence**

Adhering to sound versioning practices is crucial for ensuring the reproducibility of software. Despite the expertise in software engineering, the ever-growing complexity and continuous development of new, potentially disruptive features present significant challenges in maintaining code functionality over time. This pertains not only to backward compatibility but also to future-proofing. When code handles critical production loads and relies on numerous external software libraries, it's likely that these dependencies will evolve. Infrastructure-as-code and other DevOps principles shine in addressing these challenges. However, they may appear less approachable and more labor-intensive to set up for the average R developer.

Are you ready to test your custom R functions and system commands in a different environment with isolated software builds that are pure both at build time and at runtime, without leaving the R console?

Let's introduce `with_nix()`. `with_nix()` will evaluate custom R code or shell commands with command line interfaces provided by Nixpkgs in a Nix environment, and thereby bring the read-eval-print-loop feeling. Not only can you evaluate custom R functions or shell commands in Nix environments, but you can also bring the results back to your current R session as R objects.

## **Two Operational Modes of Computations in Environments: 'System-to-Nix' and 'Nix-to-Nix'**
## **Two operational modes of computations in environments: 'System-to-Nix' and 'Nix-to-Nix'**

We aim to accommodate various use cases, considering a gradient of declarativity in individual or sets of software environments based on personal preferences. There are two main modes for defining and comparing code running through R and system commands (command line interfaces; CLIs):

@@ -26,46 +26,61 @@ Carefully curated software improves over time, so does R. We pick an example fro

- "`as.vector()` gains a `data.frame` method which returns a simple named list, also clearing a long standing 'FIXME' to enable `as.vector(<data.frame>, mode ="list")`. This breaks code relying on `as.vector(<data.frame>)` to return the unchanged data frame."

The goal is to illustrate this change in behavior before and after R version 4.2.0.
The goal is to illustrate this change in behavior from R versions 4.1.3 and before to R versions 4.2.0 and later.

### Setting up the software environment with Nix
### Setting up the (R) software environment with Nix

We first create an isolated directory to prepare for a Nix environment, and write a custom `.Rprofile` file as well. By default, the R derivation in Nixpkgs includes the user library at first position (returned by `.libPaths()`). This default is convenient for installing packages from a Nix-R session in an ad hoc and interactive manner. However, it comes at the cost that one needs to be aware of potential run-time pollution by packages from outside the pool of per-package paths in the Nix store. On macOS, for example, we experienced a high chance of segmentation faults when accidentally loading packages and linked system libraries from the system's user library. Startup code written to the local `.Rprofile` therefore makes sure that the system's user library (R_LIBS_USER) is excluded from the library paths that packages are loaded from. rix::init() writes a configuration that takes care of runtime-pure R package libraries from declaratively defined Nix builds. Additionally, it modifies `.libPaths()` in the running R session.
We first create an isolated directory to prepare for a Nix environment, and write a custom `.Rprofile` file as well. By default, the R derivation in Nixpkgs includes the user library at first position (returned by `.libPaths()`). This default is convenient for installing packages from a Nix-R session in an ad hoc and interactive manner. However, it comes at the cost that one needs to be aware of potential run-time pollution by packages from outside the pool of per-package paths in the Nix store. On macOS, for example, we experienced a high chance of segmentation faults when accidentally loading packages and linked system libraries from the system's user library. Startup code written to the local `.Rprofile` therefore makes sure that the system's user library (`R_LIBS_USER`) is excluded from the library paths that packages are loaded from. `rix::init()` writes a configuration that takes care of runtime-pure R package libraries from declaratively defined Nix builds. Additionally, it modifies `.libPaths()` in the running R session.
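To see whether the user library is currently on the search path, the library paths of a running session can be inspected directly. The following is a minimal sketch using base R only; it assumes the default Nix store location `/nix/store`, and the variable names are purely illustrative.

```{r, eval=FALSE}
# Illustrative check, not part of {rix}: which library paths
# come from the Nix store, and is the user library among them?
user_lib <- Sys.getenv("R_LIBS_USER")
lib_paths <- .libPaths()

# assumption: the Nix store lives at the default `/nix/store`
in_nix_store <- startsWith(lib_paths, "/nix/store")
data.frame(path = lib_paths, in_nix_store = in_nix_store)

# `TRUE` means the user library is on the search path
user_lib_active <- normalizePath(user_lib, mustWork = FALSE) %in%
  normalizePath(lib_paths, mustWork = FALSE)
user_lib_active
```

In a session started with the generated `.Rprofile`, one would expect `user_lib_active` to be `FALSE`.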

```{r, eval=FALSE}
```{r}
library("rix")
path_env_1 <- file.path(".", "_env_1_R-4-2-0")
path_env_1 <- file.path(".", "_env_1_R-4-1-3")
init(
project_path = path_env_1,
rprofile_action = "overwrite",
message_type = "simple"
)
list.files(path = path_env_1, all.files = TRUE)
```

This will generate the following `.Rprofile` file.

```{r, echo=FALSE}
cat(readLines(file.path(path_env_1, ".Rprofile")), sep = "\n")
```

Next, we write a `default.nix` file containing Nix expressions that pin R version 4.1.3 from Nixpkgs.

```{r, eval=FALSE}
```{r}
rix(
r_ver = "4.2.0",
r_ver = "4.1.3",
overwrite = TRUE,
project_path = path_env_1
)
```

The following expression is written to `default.nix` in the subfolder `./_env_1_R-4-1-3/`.

```{r, echo=FALSE}
cat(readLines(file.path(path_env_1, "default.nix")), sep = "\n")
```

### Defining and interactively testing custom R code with function(s)

We now have the configuration for R 4.2.0 set up in a `default.nix` file in the folder `./_env_1_R-4-2-0`. Provided that the R version available on your system is 4.2.0 or higher, you can check that the `as.vector.data.frame()` S3 method returns a list.
We now have the configuration for R 4.1.3 set up in a `default.nix` file in the folder `./_env_1_R-4-1-3`. Provided that the R version available on your system is 4.2.0 or higher, you can check that the `as.vector.data.frame()` S3 method returns a list.

```{r, eval=FALSE}
```{r}
df <- data.frame(a = 1:3, b = 4:6)
(out <- as.vector(x = df, mode = "list"))
as.vector(x = df, mode = "list")
```

This is different for R versions 4.1.3 and below, where you should get an identical data frame back.
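The expected behavior on either side of the change can be written down as a version-aware check. This is a minimal sketch based on the R 4.2.0 NEWS entry quoted earlier; it only uses base R and runs in whatever R version the current session provides.

```{r, eval=FALSE}
df <- data.frame(a = 1:3, b = 4:6)
out <- as.vector(x = df, mode = "list")

if (getRversion() >= "4.2.0") {
  # R >= 4.2.0: a simple named list, no longer a data frame
  stopifnot(is.list(out), !is.data.frame(out))
} else {
  # R <= 4.1.3: the data frame comes back unchanged
  stopifnot(identical(out, df))
}
```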

### Run function-wrapped code and investigate results produced in pure Nix R software environments

To formally validate in a 'System-to-Nix' approach that the `out` object is identical since `R` \>= 4.2.0, we define a function that runs the computation above.
To formally validate in a 'System-to-Nix' approach that the object returned from `as.vector.data.frame()` is different for `R` \< 4.2.0, we define a function that runs the computation above.

```{r, eval=FALSE}
```{r}
df_as_vector <- function(x) {
out <- as.vector(x = x, mode = "list")
return(out)
@@ -81,7 +96,7 @@ Then, we will evaluate this test code through a `nix-shell` R session. This adds

3. **Serialization of Dependent R objects:** Saving them to disk and deserializing them back into the R session's RAM via a temporary folder. This process establishes isolation between two distinct computational environments, accommodating both 'System-to-Nix' and 'Nix-to-Nix' computational modes. Simultaneously, it facilitates the transfer of input arguments, dependencies across the call stack, and outputs of `expr` between the Nix-R and the system's R sessions.

This approach guarantees reproducible side effects and effectively streams messages and errors into the R session. Thereby, the {sys} package facilitates capturing standard outputs and errors as text output messages.
This approach guarantees reproducible side effects and effectively streams messages and errors into the R session. Thereby, the {sys} package facilitates capturing standard outputs and errors as text output messages. Please be aware that `with_nix()` will invoke `nix-shell`, which will itself run `nix-build` in case the Nix derivation (package) for R version 4.1.3 is not yet in your Nix store. This will take a bit of time to get the cache. When you use the `exec_mode = "non-blocking"` argument of `with_nix()`, you will see in your current R console the specific Nix paths that will be downloaded and copied into your Nix store automatically.
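The serialization round trip described above can be illustrated with base R alone. The sketch below is not how `with_nix()` is implemented internally; it is a simplified stand-in in which a plain child `Rscript` process takes the place of the Nix-R session, and all file and variable names are hypothetical.

```{r, eval=FALSE}
# Simplified illustration only; a child R process stands in
# for the Nix-R session that `with_nix()` would launch.
df <- data.frame(a = 1:3, b = 4:6)

tmp_dir <- tempfile(pattern = "with_nix_sketch_")
dir.create(tmp_dir)
in_file <- file.path(tmp_dir, "input.Rds")
out_file <- file.path(tmp_dir, "output.Rds")
script <- file.path(tmp_dir, "run.R")

# 1. serialize the input object to disk
saveRDS(df, in_file)

# 2. script for the child process: read input, compute, write output
writeLines(c(
  "args <- commandArgs(trailingOnly = TRUE)",
  "x <- readRDS(args[1])",
  "saveRDS(as.vector(x = x, mode = \"list\"), args[2])"
), script)

# 3. run the child process and wait for it to finish
rscript <- file.path(R.home("bin"), "Rscript")
status <- system2(rscript, args = shQuote(c(script, in_file, out_file)))

# 4. deserialize the result back into the current session
out_child <- readRDS(out_file)
identical(out_child, as.vector(x = df, mode = "list"))
```

Because the child process here uses the same R installation as the parent, both results should agree; the comparison only becomes interesting once the child runs in a different software environment.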

```{r, eval=FALSE}
# now run it in `nix-shell`; `with_nix()` takes care
@@ -97,13 +112,15 @@ out_nix_1 <- with_nix(
# compare results of custom codebase with identical
# inputs and different software environments
identical(out_system_1, out_nix_1)
# should return `TRUE` if your system's R version in
# should return `FALSE` if your system's R version in
# current interactive R session is R >= 4.2.0
```

As an alternative to wrap your final function with input arguments that produces the results in `function()` or `function(){}`, you can also provide default arguments when assigning the function used as `expr` input like this:
### Syntax option for specifying function in `expr` argument of `with_nix()`

```{r, eval=FALSE}
In the previous code snippet we wrapped the top-level `expr` function with `function()` or `function(){}`. As an alternative, you can also provide default arguments when assigning the function used as `expr` input like this:

```{r}
df_as_vector <- function(x = df) {
out <- as.vector(x = x, mode = "list")
return(out)
@@ -113,8 +130,8 @@ df_as_vector <- function(x = df) {
Then, you just supply the name of the function to evaluate with default arguments.

```{r, eval=FALSE}
out_nix_1_2 <- with_nix(
expr = function() df_as_vector, # provide name of function
out_nix_1_b <- with_nix(
expr = df_as_vector, # provide name of function
program = "R",
exec_mode = "non-blocking", # run as background process
project_path = path_env_1,
@@ -125,7 +142,65 @@ out_nix_1_2 <- with_nix(
It yields the same results.

```{r, eval=FALSE}
Reduce(f = identical, list(out_system_1, out_nix_1, out_nix_1_2))
Reduce(f = identical, list(out_nix_1, out_nix_1_b))
```

### Comparing `as.vector.data.frame()` for both R versions 4.1.3 and 4.2.0 from Nixpkgs

Here follows an example of a 'Nix-to-Nix' solution, with two subshells to track the evolution of base R in this specific case. We can verify the breaking changes in case study 1 in a more declarative manner when we use both R 4.1.3 and R 4.2.0 from Nixpkgs. Since we have already defined R 4.1.3 in the `_env_1_R-4-1-3` subshell, we can use it as the source environment from which `with_nix()` is launched. Accordingly, we define the R 4.2.0 environment in `_env_1_2_R-4-2-0` using Nix via `rix::rix()`. The latter will be the target environment in which `df_as_vector()` is evaluated.

```{r}
library("rix")
path_env_1_2 <- file.path(".", "_env_1_2_R-4-2-0")
init(
project_path = path_env_1_2,
rprofile_action = "overwrite",
message_type = "simple"
)
rix(
r_ver = "4.2.0",
overwrite = TRUE,
project_path = path_env_1_2,
shell_hook = "R"
)
list.files(path_env_1_2)
```

Now, initiate a new R session as development environment using `nix-shell`. Open a new terminal at the current working directory of your R session. The provided expression `default.nix` defines R 4.1.3 in a "subfolder per subshell" approach. `nix-shell` will use the expression in `default.nix` and prefer it over any other `.nix` files, except when you put a `shell.nix` file in that folder, which takes precedence.
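The lookup order between `shell.nix` and `default.nix` can be checked from R before launching the shell. The helper below is a hypothetical convenience function, not part of {rix} or Nix; it is a sketch that only mirrors the precedence rule stated above.

```{r, eval=FALSE}
# Hypothetical helper: which file would `nix-shell <dir>` pick up?
# `shell.nix` takes precedence over `default.nix`.
nix_shell_file <- function(dir) {
  candidates <- file.path(dir, c("shell.nix", "default.nix"))
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0L) {
    return(NA_character_)
  }
  found[1]
}

nix_shell_file(file.path(".", "_env_1_R-4-1-3"))
```

If this returns a path ending in `shell.nix`, the pinned `default.nix` in that folder would not be the expression that `nix-shell` evaluates.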

```{sh, eval=FALSE}
nix-shell --pure ./_env_1_R-4-1-3
```

After some time downloading caches and doing builds, you will enter an R console session with R 4.1.3. You did not need to type in `R` first, because we set up an R shell hook via `rix::rix()`. Next, we define again the target function to test in R 4.2.0, too.

```{r, eval=FALSE}
# current Nix-R session with R 4.1.3
df_as_vector <- function(x) {
out <- as.vector(x = x, mode = "list")
return(out)
}
(out_nix_1 <- df_as_vector(x = df))
```

```{r, eval=FALSE}
out_nix_1_2 <- with_nix(
expr = function() df_as_vector(x = df),
program = "R",
exec_mode = "non-blocking", # run as background process
project_path = path_env_1_2,
message_type = "simple" # you can do `"verbose"`, too
)
```

You can now formally compare the outputs of the computation of the same code in R 4.1.3 vs. R 4.2.0 environments controlled by Nix.

```{r, eval=FALSE}
identical(out_nix_1, out_nix_1_2)
# yields FALSE
```
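When `identical()` yields `FALSE`, it can help to look at how the two objects differ. The sketch below uses stand-in objects constructed in a single R session to mimic the two results (a data frame for R 4.1.3, a plain named list for R 4.2.0); these stand-ins are an assumption for illustration and not captured output from the Nix environments.

```{r, eval=FALSE}
df <- data.frame(a = 1:3, b = 4:6)
out_nix_1 <- df            # stand-in for the R 4.1.3 result
out_nix_1_2 <- as.list(df) # stand-in for the R 4.2.0 result

# the objects differ as a whole ...
identical(out_nix_1, out_nix_1_2)

# ... because of class and attributes, not because of the values
class(out_nix_1)
class(out_nix_1_2)
setdiff(
  names(attributes(out_nix_1)),
  names(attributes(out_nix_1_2))
)

# after dropping the data frame attributes, the contents agree
identical(as.list(out_nix_1), out_nix_1_2)
```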

## **Case study 2: Breaking changes in {stringr} 1.5.0**