formatting
Bruno Rodrigues committed Jan 22, 2024
1 parent adeb5b1 commit 9895155
Showing 2 changed files with 254 additions and 62 deletions.
158 changes: 127 additions & 31 deletions dev/running_r_or_shell_code_in_nix_from_r.Rmd
@@ -7,30 +7,77 @@ editor_options:

## **Testing code in evolving software dependency environments with confidence**

Adhering to sound versioning practices is crucial for ensuring the reproducibility of software. Despite the expertise in software engineering, the ever-growing complexity and continuous development of new, potentially disruptive features present significant challenges in maintaining code functionality over time. This pertains not only to backward compatibility but also to future-proofing. When code handles critical production loads and relies on numerous external software libraries, it's likely that these dependencies will evolve. Infrastructure-as-code and other DevOps principles shine in addressing these challenges. However, they may appear less approachable and more labor-intensive to set up for the average R developer.

Are you ready to test your custom R functions and system commands in a different environment with isolated software builds that are both pure at build and at runtime, without leaving the R console?

Let's introduce `with_nix()`. `with_nix()` will evaluate custom R code or shell commands with command line interfaces provided by Nixpkgs in a Nix environment, and thereby bring the read-eval-print-loop feeling. Not only can you evaluate custom R functions or shell commands in Nix environments, but you can also bring the results back to your current R session as R objects.
Adhering to sound versioning practices is crucial for ensuring the
reproducibility of software. Despite the expertise in software engineering, the
ever-growing complexity and continuous development of new, potentially
disruptive features present significant challenges in maintaining code
functionality over time. This pertains not only to backward compatibility but
also to future-proofing. When code handles critical production loads and relies
on numerous external software libraries, it's likely that these dependencies
will evolve. Infrastructure-as-code and other DevOps principles shine in
addressing these challenges. However, they may appear less approachable and more
labor-intensive to set up for the average R developer.

Are you ready to test your custom R functions and system commands in a
different environment with isolated software builds that are both pure at build
and at runtime, without leaving the R console?

Let's introduce `with_nix()`. `with_nix()` will evaluate custom R code or shell
commands with command line interfaces provided by Nixpkgs in a Nix environment,
and thereby bring the read-eval-print-loop feeling. Not only can you evaluate
custom R functions or shell commands in Nix environments, but you can also bring
the results back to your current R session as R objects.

## **Two operational modes of computations in environments: 'System-to-Nix' and 'Nix-to-Nix'**

We aim to accommodate various use cases, considering a gradient of declarativity in individual or sets of software environments based on personal preferences. There are two main modes for defining and comparing code running through R and system commands (command line interfaces; CLIs):

1. **'System-to-Nix'** environments: We assume that you launch an R session with an R version defined on your host operating system, either from the terminal or an integrated development environment like RStudio. You need to make sure that you actively control and know where you installed R and R packages from, and at what versions. You may have interactively tested that your custom function pipeline worked for the current setup. Most importantly, you want to check whether you get your computations running and achieve identical results when going back to a Nix revision that represents either newer or also older versions of R and package sources.
2. **'Nix-to-Nix'** environments: Your goals of testing code are the same as in 1., but you want more fine-grained control in the source environment where you launch `with_nix()` from, too. You are probably on your way to becoming a passionate Nix user.
We aim to accommodate various use cases, considering a gradient of declarativity
in individual or sets of software environments based on personal preferences.
There are two main modes for defining and comparing code running through R and
system commands (command line interfaces; CLIs):

1. **'System-to-Nix'** environments: We assume that you launch an R session
with an R version defined on your host operating system, either from the
terminal or an integrated development environment like RStudio. You need to
make sure that you actively control and know where you installed R and R
packages from, and at what versions. You may have interactively tested that
your custom function pipeline worked for the current setup. Most
importantly, you want to check whether you get your computations running and
achieve identical results when going back to a Nix revision that represents
either newer or also older versions of R and package sources.
2. **'Nix-to-Nix'** environments: Your goals of testing code are the same as in
1., but you want more fine-grained control in the source environment where
you launch `with_nix()` from, too. You are probably on your way to becoming a
passionate Nix user.
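
To tell which of the two modes a session starts from, one option is to look at where the running R installation lives. The chunk below is a minimal sketch, assuming the Nix store sits at its default location `/nix/store`; `is_nix_store_path()` and `source_mode()` are hypothetical helpers for illustration and are not part of {rix}.

```{r}
# Hypothetical helper (not part of {rix}): does a path point into the Nix
# store? Assumes the default store location /nix/store.
is_nix_store_path <- function(path) {
  startsWith(path, "/nix/store/")
}

# Hypothetical helper: label the mode implied by the R installation that
# the current session (the source environment) runs from.
source_mode <- function(r_home = R.home()) {
  if (is_nix_store_path(r_home)) "Nix-to-Nix" else "System-to-Nix"
}

# Inspect the current session
source_mode()
```

A session started via `nix-shell` would typically report 'Nix-to-Nix' here, while an R installed through the host's package manager would report 'System-to-Nix'.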

## **Case study 1: Evolution of base R**

Carefully curated software improves over time, and so does R. We pick an example from the R changelog, the following [literal entry in R 4.2.0](https://cran.r-project.org/doc/manuals/r-release/NEWS.html):
Carefully curated software improves over time, and so does R. We pick an example
from the R changelog, the following [literal entry in R
4.2.0](https://cran.r-project.org/doc/manuals/r-release/NEWS.html):

- "`as.vector()` gains a `data.frame` method which returns a simple named list, also clearing a long standing 'FIXME' to enable `as.vector(<data.frame>, mode ="list")`. This breaks code relying on `as.vector(<data.frame>)` to return the unchanged data frame."
- "`as.vector()` gains a `data.frame` method which returns a simple named
list, also clearing a long standing 'FIXME' to enable
`as.vector(<data.frame>, mode ="list")`. This breaks code relying on
`as.vector(<data.frame>)` to return the unchanged data frame."

The goal is to illustrate this change in behavior from R versions 4.1.3 and before to R versions 4.2.0 and later.
The goal is to illustrate this change in behavior from R versions 4.1.3 and
before to R versions 4.2.0 and later.

### Setting up the (R) software environment with Nix

We first create an isolated directory to prepare for a Nix environment, and write a custom `.Rprofile` file as well. By default, the R derivation in Nixpkgs includes the user library at first position (returned by `.libPaths()`). This is convenient for installing packages from a Nix-R session in an ad-hoc and interactive manner. However, it comes at the cost that one needs to be aware of potential run-time pollution by packages from outside the per-package paths in the Nix store. On macOS, for example, we experienced a high chance of segmentation faults when accidentally loading packages and linked system libraries from the system's user library. Startup code written to the local `.Rprofile` makes sure that the system's user library (`R_LIBS_USER`) is excluded from the library paths that packages are loaded from. `rix::init()` writes a configuration that takes care of runtime-pure R package libraries from declaratively defined Nix builds. Additionally, it modifies `.libPaths()` in the running R session.
We first create an isolated directory to prepare for a Nix environment, and
write a custom `.Rprofile` file as well. By default, the R derivation in Nixpkgs
includes the user library at first position (returned by `.libPaths()`). This is
convenient for installing packages from a Nix-R session in an ad-hoc and
interactive manner. However, it comes at the cost that one needs to be aware of
potential run-time pollution by packages from outside the per-package paths in
the Nix store. On macOS, for example, we experienced a high chance of
segmentation faults when accidentally loading packages and linked system
libraries from the system's user library. Startup code written to the local
`.Rprofile` makes sure that the system's user library (`R_LIBS_USER`) is
excluded from the library paths that packages are loaded from. `rix::init()`
writes a configuration that takes care of runtime-pure R package libraries from
declaratively defined Nix builds. Additionally, it modifies `.libPaths()` in the
running R session.

```{r}
library("rix")
@@ -49,7 +96,8 @@ This will generate the following `.Rprofile` file.
cat(readLines(file.path(path_env_1, ".Rprofile")), sep = "\n")
```
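
The effect of such startup code can be approximated in a few lines. This is a sketch of the idea only and not the code that `rix::init()` writes; `drop_user_lib()` is a hypothetical helper that filters the directories listed in `R_LIBS_USER` out of a vector of library paths.

```{r}
# Hypothetical helper, not the code written by rix::init(): remove the user
# library (R_LIBS_USER) from a vector of library paths.
drop_user_lib <- function(paths, user_lib = Sys.getenv("R_LIBS_USER")) {
  # R_LIBS_USER may hold several directories, separated by the platform's
  # path separator (":" on Unix-alikes, ";" on Windows)
  user_dirs <- strsplit(user_lib, .Platform$path.sep, fixed = TRUE)[[1]]
  # normalize both sides so that "~" and trailing slashes compare equal
  norm <- function(p) normalizePath(p, winslash = "/", mustWork = FALSE)
  paths[!norm(paths) %in% norm(user_dirs)]
}

# Inspect which of the current library paths would remain
drop_user_lib(.libPaths())
```

The helper only returns the filtered vector; applying it to the session would be a separate step.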

Next, we write a `default.nix` file containing Nix expressions that pin R version 4.1.3 from Nixpkgs.
Next, we write a `default.nix` file containing Nix expressions that pin R
version 4.1.3 from Nixpkgs.

```{r}
rix(
@@ -59,26 +107,33 @@ rix(
)
```

The following expression is written to default.nix in the subfolder `./_env_1_R-4-1-3/`.
The following expression is written to default.nix in the subfolder
`./_env_1_R-4-1-3/`.

```{r, echo=FALSE}
cat(readLines(file.path(path_env_1, "default.nix")), sep = "\n")
```

### Defining and interactively testing custom R code with function(s)

We now have the configuration for R 4.1.3 set up in a `default.nix` file in the folder `./_env_1_R-4-1-3`. Provided that the R version available on your system is 4.2.0 or higher, you can check that the `as.vector.data.frame()` S3 method returns a list.
We now have the configuration for R 4.1.3 set up in a `default.nix` file in the
folder `./_env_1_R-4-1-3`. Provided that the R version available on your system
is 4.2.0 or higher, you can check that the `as.vector.data.frame()` S3 method
returns a list.

```{r}
df <- data.frame(a = 1:3, b = 4:6)
as.vector(x = df, mode ="list")
```

This is different for R versions 4.1.3 and below, where you should get an identical data frame back.
This is different for R versions 4.1.3 and below, where you should get an
identical data frame back.
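
A version-aware check makes this expectation explicit and runs unchanged on either side of the 4.2.0 boundary. The sketch below only encodes the behavior described in the NEWS entry quoted above; the variable names are chosen for illustration.

```{r}
df <- data.frame(a = 1:3, b = 4:6)
res <- as.vector(x = df, mode = "list")

# Per the NEWS entry: the data frame comes back unchanged before R 4.2.0,
# and a plain named list is returned from R 4.2.0 onwards
expect_data_frame <- getRversion() < "4.2.0"
matches_news <- identical(is.data.frame(res), expect_data_frame)
matches_news
```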

### Run functioned up code and investigate results produced in pure Nix R software environments

To formally validate in a 'System-to-Nix' approach that the object returned from `as.vector()` for a data frame is different in `R` \< 4.2.0, we define a function that runs the computation above.
To formally validate in a 'System-to-Nix' approach that the object returned from
`as.vector()` for a data frame is different in `R` \< 4.2.0, we define a
function that runs the computation above.

```{r}
df_as_vector <- function(x) {
@@ -88,15 +143,34 @@ df_as_vector <- function(x) {
(out_system_1 <- df_as_vector(x = df))
```

Then, we will evaluate this test code through a `nix-shell` R session. This adds both build-time and run-time purity with the declarative Nix software configuration we have made earlier. `with_nix()` leverages the following principles under the hood:
Then, we will evaluate this test code through a `nix-shell` R session. This adds
both build-time and run-time purity with the declarative Nix software
configuration we have made earlier. `with_nix()` leverages the following
principles under the hood:

1. **Computing on the Language:** Manipulating language objects using code.

2. **Static Code Analysis:** Detecting global objects and package environments in the function call stack of 'expr'. This involves utilizing essential functionality from the 'codetools' package, which is recursively iterated.

3. **Serialization of Dependent R objects:** Saving them to disk and deserializing them back into the R session's RAM via a temporary folder. This process establishes isolation between two distinct computational environments, accommodating both 'System-to-Nix' and 'Nix-to-Nix' computational modes. Simultaneously, it facilitates the transfer of input arguments, dependencies across the call stack, and outputs of `expr` between the Nix-R and the system's R sessions.

This approach guarantees reproducible side effects and effectively streams messages and errors into the R session. Thereby, the {sys} package facilitates capturing standard outputs and errors as text output messages. Please be aware that `with_nix()` will invoke `nix-shell`, which will itself run `nix-build` in case the Nix derivation (package) for R version 4.1.3 is not yet in your Nix store. This will take a bit of time to get the cache. When you use the `exec_mode = "non-blocking"` argument of `with_nix()`, you will see in your current R console the specific Nix paths that will be downloaded and copied into your Nix store automatically.
2. **Static Code Analysis:** Detecting global objects and package environments
in the function call stack of 'expr'. This involves utilizing essential
functionality from the 'codetools' package, which is recursively iterated.

3. **Serialization of Dependent R objects:** Saving them to disk and
deserializing them back into the R session's RAM via a temporary folder.
This process establishes isolation between two distinct computational
environments, accommodating both 'System-to-Nix' and 'Nix-to-Nix'
computational modes. Simultaneously, it facilitates the transfer of input
arguments, dependencies across the call stack, and outputs of `expr` between
the Nix-R and the system's R sessions.
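
The three principles above can be sketched in plain R within a single session. This is an illustration of the general technique under simplifying assumptions, not the implementation of `with_nix()`: `shift_by` and `add_shift()` are made-up examples, and the "second session" is imitated by a fresh environment instead of a separate Nix-R process.

```{r}
library(codetools)

shift_by <- 10
add_shift <- function(x) x + shift_by

# 1. Static code analysis: which global names does the function rely on?
globals <- findGlobals(add_shift, merge = FALSE)
globals$variables # global variables used by the function, here "shift_by"

# 2. Serialization: write the function, its inputs and its globals to disk
payload <- list(
  fun = add_shift,
  args = list(x = 1:3),
  globals = mget(
    globals$variables,
    envir = environment(add_shift),
    inherits = TRUE
  )
)
rds_file <- tempfile(fileext = ".Rds")
saveRDS(payload, rds_file)

# 3. Deserialization, as a second R session would do it, followed by
#    evaluation in a fresh environment that only sees base R plus the
#    restored globals
restored <- readRDS(rds_file)
sandbox <- list2env(restored$globals, parent = baseenv())
fun <- restored$fun
environment(fun) <- sandbox
result <- do.call(fun, restored$args)
result
```

Making `baseenv()` the parent of the sandbox means the function can only find what was explicitly shipped with it, which mimics the isolation between two computational environments on a small scale.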

This approach guarantees reproducible side effects and effectively streams
messages and errors into the R session. Thereby, the {sys} package facilitates
capturing standard outputs and errors as text output messages. Please be aware
that `with_nix()` will invoke `nix-shell`, which will itself run `nix-build` in
case the Nix derivation (package) for R version 4.1.3 is not yet in your Nix
store. This will take a bit of time to get the cache. When you use the
`exec_mode = "non-blocking"` argument of `with_nix()`, you will see in your
current R console the specific Nix paths that will be downloaded and copied into
your Nix store automatically.

```{r, eval=FALSE}
# now run it in `nix-shell`; `with_nix()` takes care
@@ -112,13 +186,16 @@ out_nix_1 <- with_nix(
# compare results of custom codebase with identical
# inputs and different software environments
identical(out_system_1, out_nix_1)
# should return `FALSE` if your system's R versions in
# current interactive R session is R >= 4.2.0
```
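
When `identical()` returns `FALSE`, it helps to see where the two objects differ. The sketch below uses hand-built stand-ins with the shapes described in the NEWS entry; they are not actual `with_nix()` outputs, and `out_old` and `out_new` are illustrative names.

```{r}
# Hand-built stand-ins for the two outputs (not actual with_nix() results)
out_old <- data.frame(a = 1:3, b = 4:6) # shape described for R <= 4.1.3
out_new <- list(a = 1:3, b = 4:6) # shape described for R >= 4.2.0

# the objects differ as a whole ...
identical(out_old, out_new)
class(out_old)
class(out_new)

# ... but do the column values agree once the data frame attributes
# (class and row names) are dropped?
same_values <- identical(as.list(out_old), out_new)
same_values
```

Separating "same values" from "same class and attributes" in this way shows that the change concerns the container type and leaves the data itself untouched.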

### Syntax option for specifying function in `expr` argument of `with_nix()`

In the previous code snippet we wrapped the top-level `expr` function with `function()` or `function(){}`. As an alternative, you can also provide default arguments when assigning the function used as `expr` input like this:
In the previous code snippet we wrapped the top-level `expr` function with
`function()` or `function(){}`. As an alternative, you can also provide default
arguments when assigning the function used as `expr` input like this:

```{r}
df_as_vector <- function(x = df) {
@@ -127,7 +204,8 @@ df_as_vector <- function(x = df) {
}
```

Then, you just supply the name of the function to evaluate with default arguments.
Then, you just supply the name of the function to evaluate with default
arguments.

```{r, eval=FALSE}
out_nix_1_b <- with_nix(
@@ -147,7 +225,14 @@ Reduce(f = identical, list(out_nix_1, out_nix_1_b))

### Comparing `as.vector.data.frame()` for both R versions 4.1.3 and 4.2.0 from Nixpkgs

Here follows an example of a `Nix-to-Nix` solution, with two subshells to track the evolution of base R in this specific case. We can verify the breaking changes in case study 1 in a more declarative manner when we use both R 4.1.3 and R 4.2.0 from Nixpkgs. Since we have already defined R 4.1.3 in the `_env_1_R-4-1-3` subshell, we can use it as the source environment where `with_nix()` is launched from. Accordingly, we define the R 4.2.0 environment in `_env_1_2_R-4-2-0` using Nix via `rix::rix()`. The latter environment will be the target environment in which `df_as_vector()` is evaluated.
Here follows an example of a `Nix-to-Nix` solution, with two subshells to track
the evolution of base R in this specific case. We can verify the breaking
changes in case study 1 in a more declarative manner when we use both R 4.1.3
and R 4.2.0 from Nixpkgs. Since we have already defined R 4.1.3 in the
`_env_1_R-4-1-3` subshell, we can use it as the source environment where
`with_nix()` is launched from. Accordingly, we define the R 4.2.0 environment in
`_env_1_2_R-4-2-0` using Nix via `rix::rix()`. The latter environment will be
the target environment in which `df_as_vector()` is evaluated.

```{r}
library("rix")
@@ -169,13 +254,21 @@ rix(
list.files(path_env_1_2)
```

Now, initiate a new R session as a development environment using `nix-shell`. Open a new terminal at the current working directory of your R session. The provided expression `default.nix` defines R 4.1.3 in a "subfolder per subshell" approach. `nix-shell` will use the expression in `default.nix` and prefer it over any other `.nix` files, except when you put a `shell.nix` file in that folder, which takes precedence.
Now, initiate a new R session as a development environment using `nix-shell`.
Open a new terminal at the current working directory of your R session. The
provided expression `default.nix` defines R 4.1.3 in a "subfolder per subshell"
approach. `nix-shell` will use the expression in `default.nix` and prefer it
over any other `.nix` files, except when you put a `shell.nix` file in that
folder, which takes precedence.

```{sh, eval=FALSE}
nix-shell --pure ./_env_1_R-4-1-3
```

After some time downloading caches and doing builds, you will enter an R console session with R 4.1.3. You did not need to type in `R` first, because we set up an R shell hook via `rix::rix()`. Next, we again define the target function, this time to test it in R 4.2.0, too.
After some time downloading caches and doing builds, you will enter an R console
session with R 4.1.3. You did not need to type in `R` first, because we set up
an R shell hook via `rix::rix()`. Next, we again define the target function,
this time to test it in R 4.2.0, too.

```{r, eval=FALSE}
# current Nix-R session with R 4.1.3
@@ -196,7 +289,8 @@ out_nix_1_2 <- with_nix(
)
```

You can now formally compare the outputs of the computation of the same code in R 4.1.3 vs. R 4.2.0 environments controlled by Nix.
You can now formally compare the outputs of the computation of the same code in
R 4.1.3 vs. R 4.2.0 environments controlled by Nix.

```{r, eval=FALSE}
identical(out_nix_1, out_nix_1_2)
@@ -205,4 +299,6 @@ identical(out_nix_1, out_nix_1_2)

## **Case study 2: Breaking changes in {stringr} 1.5.0**

We add one more layer to the reproducibility of the R ecosystem: user libraries from CRAN or GitHub. One thing that makes R shine is the huge collection of software packages available from the community.
We add one more layer to the reproducibility of the R ecosystem: user libraries
from CRAN or GitHub. One thing that makes R shine is the huge collection of
software packages available from the community.