Commit

fix conflicts
philipp-baumann committed Jan 19, 2024
1 parent 0cb1086 commit 18d4a98
Showing 2 changed files with 101 additions and 20 deletions.
60 changes: 50 additions & 10 deletions dev/running_r_or_shell_code_in_nix_from_r.Rmd
@@ -5,11 +5,13 @@ editor_options:
chunk_output_type: console
---

## Testing Code in Evolving Software Dependency Environments with Confidence
## **Testing Code in Evolving Software Dependency Environments with Confidence**

Adhering to sound versioning practices is crucial for ensuring the reproducibility of software. Despite the expertise in software engineering, the ever-growing complexity and continuous development of new, potentially disruptive features present significant challenges in maintaining code functionality over time. This pertains not only to backward compatibility but also to future-proofing. When code handles critical production loads and relies on numerous external software libraries, it's likely that these dependencies will evolve. Infrastructure-as-code and other DevOps principles shine in addressing these challenges. However, they may appear less approachable and more labor-intensive for the average R developer.
Adhering to sound versioning practices is crucial for ensuring the reproducibility of software. Despite the expertise in software engineering, the ever-growing complexity and continuous development of new, potentially disruptive features present significant challenges in maintaining code functionality over time. This pertains not only to backward compatibility but also to future-proofing. When code handles critical production loads and relies on numerous external software libraries, it's likely that these dependencies will evolve. Infrastructure-as-code and other DevOps principles shine in addressing these challenges. However, they may appear less approachable and more labor-intensive to set up for the average R developer.

Are you ready to test your custom R functions and system commands in a different environment with isolated software builds that are pure at both build time and runtime, without leaving the R console? Let's introduce `with_nix()`. `with_nix()` evaluates custom R code, or shell commands that use command line interfaces provided by Nixpkgs, in a Nix environment, and thereby brings a read-eval-print-loop feel to the workflow. Not only can you evaluate custom R functions or shell commands in Nix environments, but you can also bring the results back to your current R session as R objects.
Are you ready to test your custom R functions and system commands in a different environment with isolated software builds that are pure at both build time and runtime, without leaving the R console?

Let's introduce `with_nix()`. `with_nix()` evaluates custom R code, or shell commands that use command line interfaces provided by Nixpkgs, in a Nix environment, and thereby brings a read-eval-print-loop feel to the workflow. Not only can you evaluate custom R functions or shell commands in Nix environments, but you can also bring the results back to your current R session as R objects.

## **Two Operational Modes of Computations in Environments: 'System-to-Nix' and 'Nix-to-Nix'**

@@ -18,15 +20,15 @@ We aim to accommodate various use cases, considering a gradient of declarativity
1. **'System-to-Nix'** environments: We assume that you launch an R session with an R version defined on your host operating system, either from the terminal or an integrated development environment like RStudio. You need to make sure that you actively control and know where you installed R and R packages from, and at what versions. You may have interactively tested that your custom function pipeline worked for the current setup. Most importantly, you want to check whether your computations still run and produce identical results when going back to a Nix revision that represents either newer or older versions of R and package sources.
2. **'Nix-to-Nix'** environments: Your goals for testing code are the same as in 1., but you also want more fine-grained control over the source environment from which you launch `with_nix()`. You are probably on your way to becoming a passionate Nix user.

## Case study 1: Evolution of base R
## **Case study 1: Evolution of base R**

Carefully curated software improves over time, and so does R. We pick an example from the R changelog, the following [literal entry in R 4.2.0](https://cran.r-project.org/doc/manuals/r-release/NEWS.html):

- "`as.vector()` gains a `data.frame` method which returns a simple named list, also clearing a long standing 'FIXME' to enable `as.vector(<data.frame>, mode="list")`. This breaks code relying on `as.vector(<data.frame>)` to return the unchanged data frame."

The goal is to illustrate this change in behavior before and after R version 4.2.0.
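
As a minimal sketch of what this change means in practice (it is not part of the original example, and the helper name `returns_plain_list` is hypothetical), the following check asks whether `as.vector(<data.frame>, mode = "list")` returns a plain list or the data frame itself. Based on the NEWS entry quoted above, we would expect a plain list from R 4.2.0 onwards and the unchanged data frame before that.

```r
# Hypothetical helper for illustration: does
# `as.vector(<data.frame>, mode = "list")` return a plain list
# rather than the data frame itself?
returns_plain_list <- function(x) {
  out <- as.vector(x = x, mode = "list")
  is.list(out) && !is.data.frame(out)
}

df <- data.frame(a = 1:3, b = 4:6)
plain <- returns_plain_list(df)

# Expectation derived from the R 4.2.0 NEWS entry: a plain named list
# from R 4.2.0 onwards, the unchanged data frame in earlier versions
expected <- getRversion() >= "4.2.0"
```

Comparing `plain` with `expected` in the current session is a quick local check before moving the same computation into a Nix environment.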

## Setting up the software environment
### Setting up the software environment with Nix

We first create an isolated directory to prepare for a Nix environment, and write a custom `.Rprofile` file as well. Startup code written to this local `.Rprofile` makes sure that the system's user library (`R_LIBS_USER`) is excluded from the library paths that packages are loaded from. The R derivation in Nixpkgs includes the user library in first position (as returned by `.libPaths()`). This is convenient for installing packages from a Nix-R session in an ad hoc, interactive manner. However, it comes at a cost: one needs to be aware of potential run-time pollution by packages from outside the pool of per-package paths in the Nix store. On macOS, for example, we experienced a high chance of segmentation faults when accidentally loading packages and linked system libraries from the system's user library. `rix::init()` writes a configuration that takes care of run-time-pure R package libraries from declaratively defined Nix builds. Additionally, it modifies `.libPaths()` in the running R session.
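
To make the idea of excluding the user library more concrete, here is a minimal sketch. It is not the code that rix writes to `.Rprofile`; the helper `drop_user_library()` and the example paths are hypothetical and only illustrate how entries matching `R_LIBS_USER` could be filtered out of a vector of library paths.

```r
# Hypothetical helper, for illustration only: drop user-library
# entries from a character vector of library paths.
drop_user_library <- function(paths, user_lib = Sys.getenv("R_LIBS_USER")) {
  # R_LIBS_USER may hold several paths, separated by the platform's
  # path separator (":" on Unix-alikes, ";" on Windows)
  user_paths <- strsplit(user_lib, .Platform$path.sep, fixed = TRUE)[[1]]
  user_paths <- user_paths[nzchar(user_paths)]
  if (length(user_paths) == 0L) {
    return(paths)
  }
  # normalize both sides so that "~" and relative paths compare equal
  norm <- function(p) normalizePath(path.expand(p), mustWork = FALSE)
  paths[!norm(paths) %in% norm(user_paths)]
}

# Made-up paths: one user library, one library inside the Nix store
example_paths <- c(
  "/home/user/R/library",
  "/nix/store/example-r-pkg/library"
)
pure_paths <- drop_user_library(
  example_paths,
  user_lib = "/home/user/R/library"
)
```

In an interactive session, something like `.libPaths(drop_user_library(.libPaths()))` would apply the filter to the live library paths; note that `.libPaths()` always keeps the default system library when it is set this way.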

@@ -50,22 +52,60 @@ rix(
)
```

We have now set up the configuration for R 4.2.0 in a `default.nix` file in the folder `./env-1R-4-2-0`. Assuming the R version available on your system is 4.2.0 or higher, you can check that the `as.vector.data.frame()` S3 method returns a list.
### Defining and interactively testing custom R code with function(s)

We have now set up the configuration for R 4.2.0 in a `default.nix` file in the folder `./_env_1_R-4-2-0`. Assuming the R version available on your system is 4.2.0 or higher, you can check that the `as.vector.data.frame()` S3 method returns a list.

```{r, eval=FALSE}
df <- data.frame(a = 1:3, b = 4:6)
(out <- as.vector(x = df, mode ="list"))
```

To formally confirm in a 'System-to-Nix' approach that the `out` object is identical since `R` \>= 4.2.0, we define a function that runs the computation above. Then, we will evaluate it through a `nix-shell` R session. This adds both build-time and run-time purity with the declarative Nix software configuration we have made above. Leveraging computing on the language, static code analysis, detecting and serializing as well as deserializing global objects of a function `expr` (save R objects), and system command execution via `Rscript` in a `nix-shell` environment we can achieve perfect isolation. At the same time, we can shuffle and input arguments of `expr` , its call stack and as well its outputs between the Nix-R and the system's R sessions.
### Run function-wrapped code and investigate results produced in pure Nix R software environments

To formally validate in a 'System-to-Nix' approach that the `out` object is identical since `R` \>= 4.2.0, we define a function that runs the computation above.

```{r}
```{r, eval=FALSE}
df_as_vector <- function(x) {
out <- as.vector(x = x, mode = "list")
return(out)
}
(out_system <- as.vector(x = df, mode ="list"))
```

Then, we will evaluate this test code through a `nix-shell` R session. This adds both build-time and run-time purity with the declarative Nix software configuration we have made above. `with_nix()` leverages the following principles under the hood:

- **Computing on the language**: manipulate language objects using code

- **Static code analysis**: detect global objects and package environments in the function call stack of `expr`. This relies on functionality from {codetools}, which is applied recursively.

- **Serializing** dependent **R objects** (saving them to disk) and **deserializing** them (reading them back into the RAM of an R session) via a temporary folder. This isolates the two distinct computational environments, in both the 'System-to-Nix' and 'Nix-to-Nix' computational modes. At the same time, it allows the input arguments of `expr`, the dependencies across its call stack, and its outputs to be shuffled back and forth between the Nix-R session and the system's R session.

This approach guarantees reproducible side effects, and effectively streams messages and errors into the R session. The {sys} package facilitates capturing standard output and standard error as text messages.
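
The principles listed above can be sketched in a few lines of plain R. This is an illustration under simplifying assumptions, not the implementation of `with_nix()`: everything runs in a single R process, a fresh environment (`target`, a hypothetical name) stands in for the second R session, and the globals of `expr` are detected one level deep only instead of recursively. `codetools::findGlobals()`, `saveRDS()` and `readRDS()` are existing functions; all other names are made up for the example.

```r
library(codetools)

df <- data.frame(a = 1:3, b = 4:6)
df_as_vector <- function(x) {
  as.vector(x = x, mode = "list")
}
expr <- function() df_as_vector(x = df)

# Static code analysis: names that `expr` uses but does not define itself
globals <- findGlobals(expr, merge = FALSE)
global_names <- c(globals$functions, globals$variables)

# Serialize each global object to a temporary folder
tmp_dir <- tempfile("with_nix_sketch_")
dir.create(tmp_dir)
for (nm in global_names) {
  obj <- get(nm, envir = environment(expr))
  saveRDS(obj, file = file.path(tmp_dir, paste0(nm, ".Rds")))
}

# Deserialize into a fresh environment that stands in for the
# second R session (assumption: same process, for illustration only)
target <- new.env(parent = baseenv())
for (nm in global_names) {
  obj <- readRDS(file.path(tmp_dir, paste0(nm, ".Rds")))
  # re-home closures so that they look up their globals in `target`
  if (is.function(obj) && !is.primitive(obj)) {
    environment(obj) <- target
  }
  assign(nm, obj, envir = target)
}

# Evaluate a copy of `expr` that only sees the deserialized objects
expr_target <- expr
environment(expr_target) <- target
out_roundtrip <- expr_target()

unlink(tmp_dir, recursive = TRUE)
```

Because both evaluations happen in the same R version here, `out_roundtrip` matches `expr()`; the interesting comparison arises once the second evaluation runs in a different, Nix-provided R.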

```{r, eval=FALSE}
df_as_vector <- function(x) {
out <- as.vector(x = x, mode = "list")
return(out)
}
# now run it in `nix-shell`; `with_nix()` takes care
# of exporting global objects of `df_as_vector` recursively
out_nix <- with_nix(
expr = function() df_as_vector(x = df), # wrap to avoid evaluation
program = "R",
exec_mode = "non-blocking", # run as background process
project_path = path_env_1,
message_type = "simple" # you can do `"verbose"`, too
)
# compare results of the custom codebase with identical
# inputs and different software environments
identical(out_system, out_nix)
# should return `TRUE` if the R version of your
# current interactive R session is R >= 4.2.0
```

## Case study 2: Breaking changes in {stringr} 1.5.0
## **Case study 2: Breaking changes in {stringr} 1.5.0**

We add one more layer to the reproducibility of R: user libraries from CRAN or GitHub. One thing that makes R shine is the huge collection of software packages available from the community. Despite
We add one more layer to the reproducibility of the R ecosystem: user libraries from CRAN or GitHub. One thing that makes R shine is the huge collection of software packages available from the community. Despite
61 changes: 51 additions & 10 deletions vignettes/running-r-or-shell-code-in-nix-from-r.Rmd
@@ -20,11 +20,13 @@ library(rix)

<!-- WARNING - This vignette is generated by {fusen} from dev/running_r_or_shell_code_in_nix_from_r.Rmd: do not edit by hand -->

## Testing Code in Evolving Software Dependency Environments with Confidence
## **Testing Code in Evolving Software Dependency Environments with Confidence**

Adhering to sound versioning practices is crucial for ensuring the reproducibility of software. Despite the expertise in software engineering, the ever-growing complexity and continuous development of new, potentially disruptive features present significant challenges in maintaining code functionality over time. This pertains not only to backward compatibility but also to future-proofing. When code handles critical production loads and relies on numerous external software libraries, it's likely that these dependencies will evolve. Infrastructure-as-code and other DevOps principles shine in addressing these challenges. However, they may appear less approachable and more labor-intensive for the average R developer.
Adhering to sound versioning practices is crucial for ensuring the reproducibility of software. Despite the expertise in software engineering, the ever-growing complexity and continuous development of new, potentially disruptive features present significant challenges in maintaining code functionality over time. This pertains not only to backward compatibility but also to future-proofing. When code handles critical production loads and relies on numerous external software libraries, it's likely that these dependencies will evolve. Infrastructure-as-code and other DevOps principles shine in addressing these challenges. However, they may appear less approachable and more labor-intensive to set up for the average R developer.

Are you ready to test your custom R functions and system commands in a different environment with isolated software builds that are pure at both build time and runtime, without leaving the R console? Let's introduce `with_nix()`. `with_nix()` evaluates custom R code, or shell commands that use command line interfaces provided by Nixpkgs, in a Nix environment, and thereby brings a read-eval-print-loop feel to the workflow. Not only can you evaluate custom R functions or shell commands in Nix environments, but you can also bring the results back to your current R session as R objects.
Are you ready to test your custom R functions and system commands in a different environment with isolated software builds that are pure at both build time and runtime, without leaving the R console?

Let's introduce `with_nix()`. `with_nix()` evaluates custom R code, or shell commands that use command line interfaces provided by Nixpkgs, in a Nix environment, and thereby brings a read-eval-print-loop feel to the workflow. Not only can you evaluate custom R functions or shell commands in Nix environments, but you can also bring the results back to your current R session as R objects.


## **Two Operational Modes of Computations in Environments: 'System-to-Nix' and 'Nix-to-Nix'**
@@ -35,7 +37,7 @@ We aim to accommodate various use cases, considering a gradient of declarativity
2. **'Nix-to-Nix'** environments: Your goals for testing code are the same as in 1., but you also want more fine-grained control over the source environment from which you launch `with_nix()`. You are probably on your way to becoming a passionate Nix user.


## Case study 1: Evolution of base R
## **Case study 1: Evolution of base R**

Carefully curated software improves over time, and so does R. We pick an example from the R changelog, the following [literal entry in R 4.2.0](https://cran.r-project.org/doc/manuals/r-release/NEWS.html):

@@ -44,7 +46,7 @@ Carefully curated software improves over time, so does R. We pick an example fro
The goal is to illustrate this change in behavior before and after R version 4.2.0.


## Setting up the software environment
### Setting up the software environment with Nix

We first create an isolated directory to prepare for a Nix environment, and write a custom `.Rprofile` file as well. Startup code written to this local `.Rprofile` makes sure that the system's user library (`R_LIBS_USER`) is excluded from the library paths that packages are loaded from. The R derivation in Nixpkgs includes the user library in first position (as returned by `.libPaths()`). This is convenient for installing packages from a Nix-R session in an ad hoc, interactive manner. However, it comes at a cost: one needs to be aware of potential run-time pollution by packages from outside the pool of per-package paths in the Nix store. On macOS, for example, we experienced a high chance of segmentation faults when accidentally loading packages and linked system libraries from the system's user library. `rix::init()` writes a configuration that takes care of run-time-pure R package libraries from declaratively defined Nix builds. Additionally, it modifies `.libPaths()` in the running R session.

@@ -70,25 +72,64 @@ rix(
)
```

We have now set up the configuration for R 4.2.0 in a `default.nix` file in the folder `./env-1R-4-2-0`. Assuming the R version available on your system is 4.2.0 or higher, you can check that the `as.vector.data.frame()` S3 method returns a list.
### Defining and interactively testing custom R code with function(s)

We have now set up the configuration for R 4.2.0 in a `default.nix` file in the folder `./_env_1_R-4-2-0`. Assuming the R version available on your system is 4.2.0 or higher, you can check that the `as.vector.data.frame()` S3 method returns a list.


```{r eval = FALSE}
df <- data.frame(a = 1:3, b = 4:6)
(out <- as.vector(x = df, mode ="list"))
```

To formally confirm in a 'System-to-Nix' approach that the `out` object is identical since `R` \>= 4.2.0, we define a function that runs the computation above. Then, we will evaluate it through a `nix-shell` R session. This adds both build-time and run-time purity with the declarative Nix software configuration we have made above. Leveraging computing on the language, static code analysis, detecting and serializing as well as deserializing global objects of a function `expr` (save R objects), and system command execution via `Rscript` in a `nix-shell` environment we can achieve perfect isolation. At the same time, we can shuffle and input arguments of `expr` , its call stack and as well its outputs between the Nix-R and the system's R sessions.
### Run function-wrapped code and investigate results produced in pure Nix R software environments

To formally validate in a 'System-to-Nix' approach that the `out` object is identical since `R` \>= 4.2.0, we define a function that runs the computation above.


```{r eval = FALSE}
df_as_vector <- function(x) {
out <- as.vector(x = x, mode = "list")
return(out)
}
(out_system <- as.vector(x = df, mode ="list"))
```

Then, we will evaluate this test code through a `nix-shell` R session. This adds both build-time and run-time purity with the declarative Nix software configuration we have made above. `with_nix()` leverages the following principles under the hood:

- **Computing on the language**: manipulate language objects using code

- **Static code analysis**: detect global objects and package environments in the function call stack of `expr`. This relies on functionality from {codetools}, which is applied recursively.

- **Serializing** dependent **R objects** (saving them to disk) and **deserializing** them (reading them back into the RAM of an R session) via a temporary folder. This isolates the two distinct computational environments, in both the 'System-to-Nix' and 'Nix-to-Nix' computational modes. At the same time, it allows the input arguments of `expr`, the dependencies across its call stack, and its outputs to be shuffled back and forth between the Nix-R session and the system's R session.

This approach guarantees reproducible side effects, and effectively streams messages and errors into the R session. The {sys} package facilitates capturing standard output and standard error as text messages.

```{r}

```{r eval = FALSE}
df_as_vector <- function(x) {
out <- as.vector(x = x, mode = "list")
return(out)
}
# now run it in `nix-shell`; `with_nix()` takes care
# of exporting global objects of `df_as_vector` recursively
out_nix <- with_nix(
expr = function() df_as_vector(x = df), # wrap to avoid evaluation
program = "R",
exec_mode = "non-blocking", # run as background process
project_path = path_env_1,
message_type = "simple" # you can do `"verbose"`, too
)
# compare results of the custom codebase with identical
# inputs and different software environments
identical(out_system, out_nix)
# should return `TRUE` if the R version of your
# current interactive R session is R >= 4.2.0
```

## Case study 2: Breaking changes in {stringr} 1.5.0
## **Case study 2: Breaking changes in {stringr} 1.5.0**

We add one more layer to the reproducibility of R: user libraries from CRAN or GitHub. One thing that makes R shine is the huge collection of software packages available from the community. Despite
We add one more layer to the reproducibility of the R ecosystem: user libraries from CRAN or GitHub. One thing that makes R shine is the huge collection of software packages available from the community. Despite
