updated readme
Bruno Rodrigues committed Jul 11, 2024
1 parent 30d4cd9 commit f85fb62
Showing 2 changed files with 197 additions and 74 deletions.
133 changes: 76 additions & 57 deletions README.Rmd
@@ -231,69 +231,86 @@ You can also try out Nix inside Docker. To know more, read

### Docker and renv

Let's start with arguably the most popular combo for reproducibility in the R
ecosystem, Docker+`{renv}` (it is also possible to add `{rspm}` or `{bspm}` in
combination with `{renv}`, which will install the required system-level
dependencies automatically).

`{renv}` snapshots the state of the library of R packages for a project, nothing
more, nothing less. It can then be used to restore the library of packages on
another machine, but it is the user's responsibility to ensure that the right
version of R and system-level dependencies are available on that other machine.
This is why `{renv}` is often coupled with a versioned Docker image, such as the
images from the [Rocker project](https://hub.docker.com/r/rocker/r-ver).
Combining both provides a very robust way to serve applications such as Shiny
apps, but it can be awkward to develop interactively with this setup, which is
why most of the time, people work on their current setup, and *dockerize* the
setup once they're done. However, you need to make sure to keep updating
the image, as the underlying operating system will eventually reach end of life.
Eventually, you might even have to update the whole stack as it could become
impossible to install the version of R and R packages you used on a recent
Docker image. This can be a good thing actually; it could be the opportunity to
update your app and make sure that it benefits from the latest security patches.
However, for reproducibility in research, this is not something that you should
be doing because it could have an impact on historical results.

What we suggest instead is to keep using Docker if you are already invested in
the ecosystem, and continue to use it to deploy and serve applications and
archive research. But instead of using `{renv}` to get the right packages, you
combine Docker and Nix. This way, you have a nice separation of concerns: Docker
will only be used as a platter to serve code, while the environment will be
handled by Nix. You could even use an image that gets continuously updated such
as `ubuntu:latest` as a base: it doesn’t matter that the image is always
changing, since the environment that will be doing the heavy lifting inside the
container is completely reproducible thanks to Nix.
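
As a rough illustration of this separation of concerns, the sketch below writes a Dockerfile that starts from `ubuntu:latest`, installs Nix and then builds the environment from a `default.nix`. Several things here are assumptions rather than part of the original text: that a `default.nix` (for example one generated by `{rix}`) sits in the current directory, that the Determinate Systems installer is used to get Nix into the image, and the names `/project` and `nix-r-env`, which are purely illustrative.

```shell
#!/bin/sh
set -eu

# Write a Dockerfile that uses a moving base image and lets Nix build
# the actual environment. The quoted 'EOF' keeps ${PATH} unexpanded.
cat > Dockerfile <<'EOF'
FROM ubuntu:latest

# curl is only needed to fetch the Nix installer
RUN apt-get update -y && apt-get install -y curl

# Install Nix without an init system, since containers usually lack one
RUN curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | \
    sh -s -- install linux --extra-conf "sandbox = false" --init none --no-confirm
ENV PATH="${PATH}:/nix/var/nix/profiles/default/bin"

# Copy the environment definition (assumed to exist) and build it
WORKDIR /project
COPY default.nix .
RUN nix-build

# Drop into the Nix environment when the container starts
CMD ["nix-shell"]
EOF

# Build the image only if Docker and a default.nix are actually available
if command -v docker >/dev/null 2>&1 && [ -f default.nix ]; then
  docker build -t nix-r-env .
fi
```

Nothing in this Dockerfile pins R or any R package: all of that lives in `default.nix`, which is why it does not matter that the base image keeps changing.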

Exactly the same reasoning applies when `{groundhog}`, `{rang}` or Posit's CRAN
snapshots are combined with Docker instead of `{renv}`.

### Ana/Mini-conda and Mamba

Anaconda, Miniconda, Mamba, Micromamba... (henceforth we'll refer to these as
Conda) and Nix have much in common: they are multiplatform package managers and
both can be used to set up reproducible development environments for many
languages, such as R or Python. Using
[conda-lock](https://github.com/conda/conda-lock) one can generate fully
reproducible lock files that can then be used by Conda to build the environment
as defined in the lock file. The main difference between Conda and Nix is
conceptual and might not seem that important for end-users: Conda is a
procedural package manager, while Nix is a functional package manager. In
practice this means that environments managed by Conda are mutable and users are
not prevented from changing their environment interactively and then
regenerating the lock file. This is quite comfortable when working interactively,
but can lead to issues where dependency management breaks.
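
The conda-lock workflow mentioned above can be sketched as follows. This is a minimal example under stated assumptions: the environment name `my-analysis`, the package list and the `linux-64` target are illustrative, and the `conda-lock` commands only run if the tool is installed.

```shell
#!/bin/sh
set -eu

# A small environment definition; names and versions are illustrative
cat > environment.yml <<'EOF'
name: my-analysis
channels:
  - conda-forge
dependencies:
  - r-base=4.3.1
  - r-dplyr
EOF

if command -v conda-lock >/dev/null 2>&1; then
  # Resolve the environment once and record the result in conda-lock.yml
  conda-lock -f environment.yml -p linux-64
  # Recreate the environment from the lock file, not from environment.yml
  conda-lock install -n my-analysis conda-lock.yml
fi
```

Nothing stops a user from running `conda install` inside `my-analysis` afterwards, at which point the environment no longer matches `conda-lock.yml` until the lock file is regenerated. This is the mutability described above.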

In the case of Nix, however, environments are immutable: you cannot add software
into a running Nix environment. You will need to stop working, re-define the
environment, rebuild it and then use it. While this might sound more tedious (it
is), it forces users to work more "cleanly" and avoids many issues from
dynamically changing an environment. If it is not possible to build that
environment, it fails as early as possible and forces you to deal with the
issue. A mutating environment could lull you into a false sense of security.

Another major difference is that Conda, unlike Nix, does not include the
entirety of CRAN or Bioconductor. According to [Anaconda's
Documentation](https://docs.anaconda.com/working-with-conda/packages/using-r-language/),
6000 CRAN packages are available through Conda (at the time of writing, in July
2024, CRAN has 21'000+ packages). Nix also includes almost all Bioconductor
packages. Conda provides them through the Bioconda project, but we were not able
to find out whether Bioconda contains all of Bioconductor. According to
Bioconda's FAQ, [Bioconductor data packages are not
included.](https://bioconda.github.io/faqs.html#why-are-bioconductor-data-packages-failing-to-install)

### How is Nix different from Guix?

Just like Nix, Guix is a functional package manager with a focus on reproducible
builds. We won't go into technical differences and similarities, only the
practical ones for end-users of the R programming language. If you want to know
about technical aspects, read this
[Hacker News post by one of the authors of Guix](https://news.ycombinator.com/item?id=18910683).
The main shortcoming of Guix for R users is that not all CRAN
or Bioconductor packages are included, nor is Guix available on Windows or
macOS.

## Contributing

@@ -312,6 +329,8 @@ Lackerbauer](https://github.com/ciil),
[MrTarantoga](https://github.com/MrTarantoga) and every other person from the
[Matrix Nixpkgs R channel](https://matrix.to/#/#r:nixos.org)).

Finally, thanks to [David Solito](https://x.com/dsolito) for creating `{rix}`'s logo!

## Recommended reading

- [NixOS’s website](https://nixos.org/)