update vignette: "Running R or Shell Code in Nix from R" #104

Merged
merged 3 commits into from
Jan 21, 2024
119 changes: 97 additions & 22 deletions dev/running_r_or_shell_code_in_nix_from_r.Rmd
@@ -1,19 +1,19 @@
---
title: "Running R or shell code in Nix from R"
title: "Running R or Shell Code in Nix from R"
output: html_document
editor_options:
chunk_output_type: console
---

## **Testing Code in Evolving Software Dependency Environments with Confidence**
## **Testing code in evolving software dependency environments with confidence**

Adhering to sound versioning practices is crucial for ensuring the reproducibility of software. Even with solid software engineering expertise, the ever-growing complexity and the continuous development of new, potentially disruptive features make it challenging to keep code working over time. This pertains not only to backward compatibility but also to future-proofing. When code handles critical production loads and relies on numerous external software libraries, it is likely that these dependencies will evolve. Infrastructure-as-code and other DevOps principles shine in addressing these challenges. However, they may appear less approachable and more labor-intensive to set up for the average R developer.

Are you ready to test your custom R functions and system commands in a different environment with isolated software builds that are pure at both build time and runtime, without leaving the R console?

Let's introduce `with_nix()`. `with_nix()` evaluates custom R code, or shell commands with command line interfaces provided by Nixpkgs, in a Nix environment, and thereby preserves the read-eval-print-loop feeling. Not only can you evaluate custom R functions or shell commands in Nix environments, but you can also bring the results back to your current R session as R objects.

## **Two Operational Modes of Computations in Environments: 'System-to-Nix' and 'Nix-to-Nix'**
## **Two operational modes of computations in environments: 'System-to-Nix' and 'Nix-to-Nix'**

We aim to accommodate various use cases, considering a gradient of declarativity in individual or sets of software environments based on personal preferences. There are two main modes for defining and comparing code running through R and system commands (command line interfaces; CLIs):

@@ -26,46 +26,61 @@ Carefully curated software improves over time, so does R. We pick an example fro

- "`as.vector()` gains a `data.frame` method which returns a simple named list, also clearing a long standing 'FIXME' to enable `as.vector(<data.frame>, mode ="list")`. This breaks code relying on `as.vector(<data.frame>)` to return the unchanged data frame."

The goal is to illustrate this change in behavior before and after R version 4.2.0.
The goal is to illustrate this change in behavior from R versions 4.1.3 and before to R versions 4.2.0 and later.

### Setting up the software environment with Nix
### Setting up the (R) software environment with Nix

We first create a isolated directory to prepare for a Nix environment, and write a custom `.Rprofile` file as well. By default, the R derivation in Nixpkgs includes the user library at first position (returned by `.libPaths()`). Startup code written to this local `.Rprofile` will make sure that the system's user library (R_LIBS_USER) is excluded from library paths to load packages from. This is nice to install packages from a Nix-R session environment in ad-hoc and interactive manner. However, this comes at the cost that one needs be aware of potential run-time pollution of packages outside the pool of paths per package from the nix store. On macOS, we experienced a high-chance of segmentation faults when accidentally loading packages and linked system libraries from the system's user library, to give an example. rix::init() writes a configuration that takes care of runtime-pure R package libraries from declaratively defined Nix builds. Additionally, it modifies `.libPaths()` in the running R session.
We first create an isolated directory to prepare for a Nix environment, and write a custom `.Rprofile` file as well. By default, the R derivation in Nixpkgs includes the user library in first position of the library paths (as returned by `.libPaths()`). This default is convenient for installing packages from a Nix-R session in an ad-hoc and interactive manner. However, it comes at a cost: one needs to be aware of potential run-time pollution by packages outside the pool of per-package paths from the Nix store. On macOS, for example, we experienced a high chance of segmentation faults when accidentally loading packages and linked system libraries from the system's user library. Startup code written to the local `.Rprofile` therefore makes sure that the system's user library (`R_LIBS_USER`) is excluded from the library paths that packages are loaded from. `rix::init()` writes this configuration, which takes care of runtime-pure R package libraries from declaratively defined Nix builds. Additionally, it modifies `.libPaths()` in the running R session.

```{r, eval=FALSE}
```{r}
library("rix")
path_env_1 <- file.path(".", "_env_1_R-4-2-0")
path_env_1 <- file.path(".", "_env_1_R-4-1-3")
init(
project_path = path_env_1,
rprofile_action = "overwrite",
message_type = "simple"
)
list.files(path = path_env_1, all.files = TRUE)
```

This will generate the following `.Rprofile` file.

```{r, echo=FALSE}
cat(readLines(file.path(path_env_1, ".Rprofile")), sep = "\n")
```
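
To check whether the user library is still on the library paths of a given session, one can compare `.libPaths()` against `R_LIBS_USER`. The following is a minimal sketch in base R; the helper name `user_lib_on_path()` is hypothetical and not part of {rix}.

```{r}
# Hypothetical helper (not part of {rix}): is the user library
# (R_LIBS_USER) among the library paths of this session?
user_lib_on_path <- function(lib_paths = .libPaths(),
                             user_lib = Sys.getenv("R_LIBS_USER")) {
  if (!nzchar(user_lib)) {
    return(FALSE) # no user library configured
  }
  # R_LIBS_USER may hold several paths separated by the path separator
  user_libs <- strsplit(user_lib, .Platform$path.sep, fixed = TRUE)[[1]]
  # normalize both sides so that `~` and relative paths compare equal
  user_libs <- normalizePath(path.expand(user_libs), mustWork = FALSE)
  lib_paths <- normalizePath(lib_paths, mustWork = FALSE)
  any(user_libs %in% lib_paths)
}

user_lib_on_path()
```

In a session started with the generated `.Rprofile`, one would expect this check to return `FALSE`; in a default system R session it often returns `TRUE`.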

Next, we write a `default.nix` file containing Nix expressions that pin R version 4.1.3 from Nixpkgs.

```{r, eval=FALSE}
```{r}
rix(
r_ver = "4.2.0",
r_ver = "4.1.3",
overwrite = TRUE,
project_path = path_env_1
)
```

The following expression is written to `default.nix` in the subfolder `./_env_1_R-4-1-3/`.

```{r, echo=FALSE}
cat(readLines(file.path(path_env_1, "default.nix")), sep = "\n")
```

### Defining and interactively testing custom R code with function(s)

We know have set up the configuration for R 4.2.0 set up in a `default.nix` file in the folder `./_env_1_R-4-2-0`. Since you are sure you are using an R version higher 4.2.0 available on your system, you can check what that `as.vector.data.frame()` S3 method returns a list.
We have now set up the configuration for R 4.1.3 in a `default.nix` file in the folder `./_env_1_R-4-1-3`. Provided that the R version available on your system is 4.2.0 or higher, you can check that the `as.vector.data.frame()` S3 method returns a list.

```{r, eval=FALSE}
```{r}
df <- data.frame(a = 1:3, b = 4:6)
(out <- as.vector(x = df, mode ="list"))
as.vector(x = df, mode ="list")
```

This is different for R versions 4.1.3 and below, where you should get an identical data frame back.
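
The version dependence can also be written down as an explicit check. This is a small sketch in base R; the variable names `returns_data_frame` and `expect_old_behavior` are made up for illustration.

```{r}
df <- data.frame(a = 1:3, b = 4:6)
res <- as.vector(x = df, mode = "list")

# before R 4.2.0 the data frame comes back unchanged;
# from R 4.2.0 on, a plain named list is returned
returns_data_frame <- is.data.frame(res)
expect_old_behavior <- getRversion() < "4.2.0"

# TRUE when the running R version behaves as described in the NEWS entry
identical(returns_data_frame, expect_old_behavior)
```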

### Run function-wrapped code and investigate results produced in pure Nix R software environments

To formally validate in a 'System-to-Nix' approach that the `out` object is identical since `R` \>= 4.2.0, we define a function that runs the computation above.
To formally validate in a 'System-to-Nix' approach that the object returned from `as.vector.data.frame()` is different for `R` \< 4.2.0, we define a function that runs the computation above.

```{r, eval=FALSE}
```{r}
df_as_vector <- function(x) {
out <- as.vector(x = x, mode = "list")
return(out)
@@ -81,7 +96,7 @@ Then, we will evaluate this test code through a `nix-shell` R session. This adds

3. **Serialization of Dependent R objects:** Saving them to disk and deserializing them back into the R session's RAM via a temporary folder. This process establishes isolation between two distinct computational environments, accommodating both 'System-to-Nix' and 'Nix-to-Nix' computational modes. Simultaneously, it facilitates the transfer of input arguments, dependencies across the call stack, and outputs of `expr` between the Nix-R and the system's R sessions.

This approach guarantees reproducible side effects and effectively streams messages and errors into the R session. Thereby, the {sys} package facilitates capturing standard outputs and errors as text output messages.
This approach guarantees reproducible side effects and effectively streams messages and errors into the R session. To that end, the {sys} package facilitates capturing standard output and standard error as text messages. Please be aware that `with_nix()` invokes `nix-shell`, which itself runs `nix-build` in case the Nix derivation (package) for R version 4.1.3 is not yet in your Nix store. Fetching it from the cache takes a bit of time. When you set the `exec_mode = "non-blocking"` argument of `with_nix()`, you will see in your current R console the specific Nix paths that are downloaded and copied into your Nix store automatically.
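
The serialization step can be pictured with a plain base R round trip through a temporary folder. This is only an illustrative sketch: it runs in a single R session, the file names are invented, and it does not reproduce the internals of `with_nix()`.

```{r}
# stand-in for the exchange folder (hypothetical name)
tmp_dir <- tempfile(pattern = "exchange_")
dir.create(tmp_dir)

df <- data.frame(a = 1:3, b = 4:6)
df_as_vector <- function(x) as.vector(x = x, mode = "list")

# "sending" side: serialize the function and its input to disk
saveRDS(df_as_vector, file.path(tmp_dir, "fun.Rds"))
saveRDS(df, file.path(tmp_dir, "arg_x.Rds"))

# "receiving" side: deserialize, evaluate, and serialize the result
fun <- readRDS(file.path(tmp_dir, "fun.Rds"))
x <- readRDS(file.path(tmp_dir, "arg_x.Rds"))
saveRDS(fun(x), file.path(tmp_dir, "out.Rds"))

# back on the "sending" side: read the result as an R object
out <- readRDS(file.path(tmp_dir, "out.Rds"))
identical(out, df_as_vector(df))
```

Because both sides only share files on disk, the evaluating session can in principle run a different R version than the session that wrote the inputs.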

```{r, eval=FALSE}
# now run it in `nix-shell`; `with_nix()` takes care
@@ -97,13 +112,15 @@ out_nix_1 <- with_nix(
# compare results of custom codebase with identical
# inputs and different software environments
identical(out_system_1, out_nix_1)
# should return `TRUE` if your system's R versions in
# should return `FALSE` if the R version of your
# current interactive R session is R >= 4.2.0
```

As an alternative to wrap your final function with input arguments that produces the results in `function()` or `function(){}`, you can also provide default arguments when assigning the function used as `expr` input like this:
### Syntax option for specifying function in `expr` argument of `with_nix()`

```{r, eval=FALSE}
In the previous code snippet we wrapped the top-level `expr` function with `function()` or `function(){}`. As an alternative, you can also provide default arguments when assigning the function used as `expr` input like this:

```{r}
df_as_vector <- function(x = df) {
out <- as.vector(x = x, mode = "list")
return(out)
@@ -113,8 +130,8 @@ df_as_vector <- function(x = df) {
Then, you just supply the name of the function to evaluate with default arguments.

```{r, eval=FALSE}
out_nix_1_2 <- with_nix(
expr = function() df_as_vector, # provide name of function
out_nix_1_b <- with_nix(
expr = df_as_vector, # provide name of function
program = "R",
exec_mode = "non-blocking", # run as background process
project_path = path_env_1,
@@ -125,7 +142,65 @@ out_nix_1_2 <- with_nix(
It yields the same results.

```{r, eval=FALSE}
Reduce(f = identical, list(out_system_1, out_nix_1, out_nix_1_2))
Reduce(f = identical, list(out_nix_1, out_nix_1_b))
```
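
With two results, `Reduce(f = identical, ...)` works as intended. With three or more, `Reduce()` passes the logical result of the first comparison on to the next one, so comparing every element against the first is safer. A minimal sketch follows; the helper `all_identical()` is hypothetical.

```{r}
# Hypothetical helper: TRUE if all elements of `objs` are identical
all_identical <- function(objs) {
  if (length(objs) < 2L) {
    return(TRUE)
  }
  all(vapply(
    objs[-1L],
    function(obj) identical(objs[[1L]], obj),
    logical(1L)
  ))
}

a <- list(a = 1:3, b = 4:6)
all_identical(list(a, a, a))

# here `Reduce()` evaluates identical(identical(a, a), a),
# that is, it compares a logical value with the third element
Reduce(f = identical, list(a, a, a))
```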

### Comparing `as.vector.data.frame()` for both R versions 4.1.3 and 4.2.0 from Nixpkgs

Here follows an example of a 'Nix-to-Nix' solution, with two subshells to track the evolution of base R in this specific case. We can verify the breaking changes in case study 1 in a more declarative manner when we use both R 4.1.3 and R 4.2.0 from Nixpkgs. Since we have already defined R 4.1.3 in the `_env_1_R-4-1-3` subshell, we can use it as the source environment from which `with_nix()` is launched. Accordingly, we define the R 4.2.0 environment in `_env_1_2_R-4-2-0` using Nix via `rix::rix()`. The latter is the target environment in which `df_as_vector()` will be evaluated.

```{r}
library("rix")
path_env_1_2 <- file.path(".", "_env_1_2_R-4-2-0")

init(
project_path = path_env_1_2,
rprofile_action = "overwrite",
message_type = "simple"
)

rix(
r_ver = "4.2.0",
overwrite = TRUE,
project_path = path_env_1_2,
shell_hook = "R"
)

list.files(path_env_1_2)
```

Now, initiate a new R session as development environment using `nix-shell`. Open a new terminal at the current working directory of your R session. The expression in `./_env_1_R-4-1-3/default.nix` defines R 4.1.3 in a "subfolder per subshell" approach. `nix-shell` uses the expression in `default.nix` and prefers it over any other `.nix` files, except when you put a `shell.nix` file in that folder, which takes precedence.

```{sh, eval=FALSE}
nix-shell --pure ./_env_1_R-4-1-3
```
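
Which file `nix-shell` would pick up for a given folder can be checked from R beforehand. This is a small sketch of the lookup order described above (`shell.nix` first, then `default.nix`); the helper `nix_shell_file()` is hypothetical and does not call Nix.

```{r}
# Hypothetical helper: which file would `nix-shell <dir>` use?
nix_shell_file <- function(dir) {
  # `shell.nix` takes precedence over `default.nix`
  candidates <- file.path(dir, c("shell.nix", "default.nix"))
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0L) NA_character_ else found[[1L]]
}

nix_shell_file(file.path(".", "_env_1_R-4-1-3"))
```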

After some time downloading caches and doing builds, you will enter an R console session with R 4.1.3. You do not need to type `R` first, because we set up an R shell hook via `rix::rix()`. Next, we again define the target function, which we also want to test in R 4.2.0.

```{r, eval=FALSE}
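# current Nix-R session with R 4.1.3
# define the input data again in case it is not available in this session
df <- data.frame(a = 1:3, b = 4:6)
df_as_vector <- function(x) {
  out <- as.vector(x = x, mode = "list")
  return(out)
}
(out_nix_1 <- df_as_vector(x = df))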
```

```{r, eval=FALSE}
out_nix_1_2 <- with_nix(
expr = function() df_as_vector(x = df),
program = "R",
exec_mode = "non-blocking", # run as background process
project_path = path_env_1_2,
message_type = "simple" # you can do `"verbose"`, too
)
```

You can now formally compare the outputs of the computation of the same code in R 4.1.3 vs. R 4.2.0 environments controlled by Nix.

```{r, eval=FALSE}
identical(out_nix_1, out_nix_1_2)
# yields FALSE
```

## **Case study 2: Breaking changes in {stringr} 1.5.0**