Commit

Merge pull request #700 from njtierney/add-installation-vignette
Adds installation vignette / more information on installing dependencies in greta
njtierney authored Aug 21, 2024
2 parents f31a0db + d481c9f commit 845746e
Showing 9 changed files with 155 additions and 54 deletions.
2 changes: 1 addition & 1 deletion NAMESPACE
@@ -236,6 +236,7 @@ export(nelder_mead)
export(newton_cg)
export(normal)
export(ones)
+export(open_greta_install_log)
export(opt)
export(ordered_variable)
export(pareto)
@@ -244,7 +245,6 @@ export(powell)
export(proximal_adagrad)
export(proximal_gradient_descent)
export(rdist)
-export(read_greta_install_log)
export(reinstall_greta_deps)
export(reinstall_greta_env)
export(reinstall_miniconda)
4 changes: 2 additions & 2 deletions NEWS.md
@@ -42,9 +42,9 @@ This release provides a few improvements to installation in greta. It should now
* Added checking suite to ensure you are using valid versions of TF, TFP, and Python(#666)
* Added data `greta_deps_tf_tfp` (#666), which contains valid versions combinations of TF, TFP, and Python.
* remove `greta_nodes_install/conda_*()` options as #493 makes them defunct.
-* Added option to write to a single logfile with `greta_set_install_logfile()`, and `write_greta_install_log()`, and `read_greta_install_log()` (#493)
+* Added option to write to a single logfile with `greta_set_install_logfile()`, and `write_greta_install_log()`, and `open_greta_install_log()` (#493)
* Added `destroy_greta_deps()` function to remove miniconda and python conda environment
-* Improved `write_greta_install_log()` and `read_greta_install_log()` to use `tools::R_user_dir()` to always write to a file location. `read_greta_install_log()` will open one found from an environment variable or go to the default location. (#703)
+* Improved `write_greta_install_log()` and `open_greta_install_log()` to use `tools::R_user_dir()` to always write to a file location. `open_greta_install_log()` will open one found from an environment variable or go to the default location. (#703)

## Minor

2 changes: 1 addition & 1 deletion R/install_greta_deps.R
@@ -12,7 +12,7 @@
#' `tools::R_user_dir("greta")` as the directory to save a logfile named
#' "greta-installation-logfile.html". To see installation notes or errors,
#' after installation you can open the logfile with
-#' [read_greta_install_log()], or you can navigate to the logfile and open
+#' [open_greta_install_log()], or you can navigate to the logfile and open
#' it in a browser.
#'
#' @param deps object created with [greta_deps_spec()] where you
4 changes: 2 additions & 2 deletions R/write-logfiles.R
@@ -30,7 +30,7 @@ write_greta_install_log <- function(path = greta_logfile) {
)

cli::cli_progress_step(
-msg = "Open with: {.run read_greta_install_log()}"
+msg = "Open with: {.run open_greta_install_log()}"
)

template <- '
@@ -152,7 +152,7 @@ sys_get_env <- function(envvar){
#'
#' @return opens a URL in your default browser
#' @export
-read_greta_install_log <- function(){
+open_greta_install_log <- function(){

greta_logfile <- sys_get_env("GRETA_INSTALLATION_LOG")

4 changes: 2 additions & 2 deletions README.md
@@ -42,7 +42,7 @@ devtools::install_github("greta-dev/greta")

The `install_greta_deps()` function helps install the Python dependencies (Google's [TensorFlow](https://www.tensorflow.org/) and [tensorflow-probability](https://github.com/tensorflow/probability)).

-By default, `install_greta_deps()` installs versions TF 2.15.0, and TFP version 0.23.0, using python 3.10. To change the versions of TF, TFP, or python that you want to use, you specify the `python_deps` argument of `install_greta_deps()`, which used `greta_python_deps()`. See `?install_greta_deps()` or `?greta_python_deps()` for more information.
+By default, `install_greta_deps()` installs versions TF 2.15.0, and TFP version 0.23.0, using python 3.10. To change the versions of TF, TFP, or python that you want to use, you specify the `deps` argument of `install_greta_deps()`, which used `greta_deps_spec()`. See `?install_greta_deps()` or `?greta_deps_spec()` for more information.

This helper function, `install_greta_deps()`, installs the exact Python package versions needed. It also places these inside a conda environment, "greta-env-tf2". This isolates these exact Python modules from other Python installations, so that only `greta` will see them. This helps avoid installation issues, where previously you might update TensorFlow on your computer and overwrite the current version needed by `greta`. Using this "greta-env-tf2" conda environment means installing other Python packages should not impact the Python packages needed by `greta`.

@@ -51,7 +51,7 @@ If these python modules aren't yet installed, when `greta` is used, it provides
<!-- badges: start -->
[![Codecov test coverage](https://codecov.io/gh/greta-dev/greta/branch/master/graph/badge.svg)](https://app.codecov.io/gh/greta-dev/greta?branch=master)
[![R-CMD-check](https://github.com/greta-dev/greta/workflows/R-CMD-check/badge.svg)](https://github.com/greta-dev/greta/actions)
-[![cran version](http://www.r-pkg.org/badges/version/greta)](https://CRAN.R-project.org/package=greta)
+[![cran-version](http://www.r-pkg.org/badges/version/greta)](https://CRAN.R-project.org/package=greta)
[![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/license/apache-2-0)
[![doi](https://zenodo.org/badge/73758247.svg)](https://zenodo.org/badge/latestdoi/73758247)
[![joss](https://joss.theoj.org/papers/10.21105/joss.01601/status.svg)](https://joss.theoj.org/papers/10.21105/joss.01601)
2 changes: 1 addition & 1 deletion man/install_greta_deps.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

44 changes: 2 additions & 42 deletions vignettes/get_started.Rmd
@@ -50,52 +50,12 @@ library(greta)

### Helper functions to install TensorFlow

-Before you can fit models with `greta`, you will also need to have a working installation of Google's [TensorFlow](https://www.tensorflow.org/) python package (version 1.14.0) and the [tensorflow-probability](https://github.com/tensorflow/probability) python package (version 0.7.0). In the future we will support different versions of Tensorflow and Tensorflow Probability, but currently we need these exact versions.
+Before you can fit models with `greta`, you will also need to have a working installation of Google's [TensorFlow](https://www.tensorflow.org/) python package (version >= 2.0.0) and the [tensorflow-probability](https://github.com/tensorflow/probability) python package (version >= 0.8.0).

To assist with installing these Python packages, `greta` provides an installation helper, `install_greta_deps()`, which installs the exact Python package versions needed. It also places these inside a "greta-env" conda environment. This isolates these exact Python modules from other Python installations, so that only `greta` will see them. This helps avoid installation issues, where previously you might update TensorFlow on your computer and overwrite the current version needed by `greta`. Using this "greta-env" conda environment means installing other Python packages should not impact the Python packages needed by `greta`.

-If these python modules aren't yet installed, when `greta` is used, it provides instructions on how to install them for your system. If in doubt follow those.
+If these python modules aren't yet installed when `greta` is used, it suggests to use `install_greta_deps()` to install the dependencies. We recommend using this function to install dependencies. For more detail on installation, see the vignette "installation".

-<!-- If you want `greta` to run as fast as possible on your computer's CPUs, it would be worth installing python and TensorFlow using Anaconda since they will be automatically configured to use Intel's MKL routines, which provide a 2-8 fold sampling speedup on most models. -->
-
-#### Standard installation
-
-If the previous installation helper did not work, you can try the following:
-
-```{r install_tensorflow, eval = FALSE}
-reticulate::install_miniconda()
-reticulate::conda_create(
-envname = "greta-env",
-python_version = "3.7"
-)
-reticulate::conda_install(
-envname = "greta-env",
-packages = c(
-"numpy==1.16.4",
-"tensorflow-probability==0.7.0",
-"tensorflow==1.14.0"
-)
-)
-```
-
-Which will install the python modules into a conda environment named "greta-env".
-
-You can also not install these not into a special conda environment "greta-env",
-like so:
-
-```{r install-deps-plain, eval = FALSE}
-reticulate::install_miniconda()
-reticulate::conda_install(
-packages = c(
-"numpy==1.16.4",
-"tensorflow-probability==0.7.0",
-"tensorflow==1.14.0"
-)
-)
-```
-
-<!-- You can also use `install_tensorflow()` to install different versions of TensorFlow, including versions with GPU acceleration. If you're having trouble with this step, [this guide](https://tensorflow.rstudio.com/installation/) may help. -->
-
<hr>

### DiagrammeR
141 changes: 141 additions & 0 deletions vignettes/installation.Rmd
@@ -0,0 +1,141 @@
---
title: "Installing Dependencies"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{installation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(greta)
```

# Why we need to install dependencies

The greta package uses Google's [TensorFlow (TF)](https://www.tensorflow.org/) and [TensorFlow Probability (TFP)](https://github.com/tensorflow/probability) under the hood to do efficient, fast, and scalable linear algebra and MCMC. TF and TFP are Python packages, so they need to be installed separately. This differs from how normal R package dependencies work, where CRAN builds and manages the dependencies automatically.

Unfortunately, there isn't an automatic, reliable way to ensure that these are provided when you install greta, so we need to take an additional step to install them. We have tried to make the process as easy as possible by providing a helper function, `install_greta_deps()`.

# How to install Python dependencies using `install_greta_deps()`

We recommend running:

```{r}
#| eval: FALSE
install_greta_deps()
```

Then follow any prompts to install dependencies. You will then need to restart R and run `library(greta)` to start using greta.

# How `install_greta_deps()` works

The `install_greta_deps()` function installs the Python dependencies TF and TFP.
By default it installs TF version 2.15.0 and TFP version 0.23.0, and places them inside a conda environment named "greta-env-tf2". Under the default settings, this environment uses Python 3.10. Using a conda environment isolates these exact Python modules from other Python installations, so only `greta` will see them.

We do this because it helps avoid installation issues, where previously you might update TF on your computer and overwrite the version needed by `greta`. Using the "greta-env-tf2" conda environment means that installing other Python packages should not impact the Python packages needed by `greta`. It is part of the way to [manage python dependencies in an R package](https://rstudio.github.io/reticulate/articles/python_dependencies.html) recommended by the team at Posit.

## Using different versions of TF, TFP, and Python

The `install_greta_deps()` function takes three arguments:

1. `deps`: the dependencies to install, specified with `greta_deps_spec()`
2. `timeout`: time in minutes to wait for installation before failing/exiting
3. `restart`: whether to restart R after installation: "force" restarts R, "no" does not restart, and "ask" (the default) asks the user

You specify the versions of TF, TFP, or Python that you want to use with `greta_deps_spec()`, which has arguments:

- `tf_version`
- `tfp_version`
- `python_version`

If you specify versions of TF, TFP, and Python that are not compatible with each other, greta will error before starting installation. We determined the appropriate versions of Python, TF, and TFP from https://www.tensorflow.org/install/source#tested_build_configurations and https://www.tensorflow.org/install/source_windows#tested_build_configurations, and by inspecting TFP release notes. We put this information together into a dataset, `greta_deps_tf_tfp`. You can inspect this with `View(greta_deps_tf_tfp)`.

If you provide an invalid combination of versions, greta will error and suggest some alternative versions.
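
As a sketch of how these arguments fit together, the call below spells out the default versions named in this vignette explicitly. The `timeout` value of 5 minutes is an illustrative choice rather than a documented default, and the versions are written as character strings on the assumption that `greta_deps_spec()` accepts them in that form (a bare `3.10` would be read by R as the number `3.1`).

```{r}
#| eval: FALSE
# Versions are the defaults stated above; swap in another compatible
# combination from greta_deps_tf_tfp to install something different.
deps <- greta_deps_spec(
  tf_version = "2.15.0",
  tfp_version = "0.23.0",
  python_version = "3.10"
)

install_greta_deps(
  deps = deps,
  timeout = 5,     # minutes; illustrative value
  restart = "ask"  # one of "force", "no", or "ask"
)
```

Creating the `deps` object first means an incompatible combination should be reported before any installation work starts.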

## How we install dependencies

This section is for users who want to know more about how greta installs its dependencies.

We create a separate R instance using [`callr`](https://callr.r-lib.org/index.html) and install the Python dependencies from there, using `reticulate` to talk to Python and the R package `tensorflow` to install the TensorFlow Python module. We use `callr` so that the installation happens in a clean R session that doesn't have Python or reticulate already loaded. It also means that we can hide the large amount of console output produced during installation. This output is written to a logfile that you can read with `open_greta_install_log()`.
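
The general pattern of running work in a clean R session and capturing its output can be sketched with `callr::r()`. This is only an illustration of the technique, not greta's actual installation code: the function body and the `log_file` path are placeholders.

```{r}
#| eval: FALSE
# Sketch only: run a function in a fresh R session and send everything it
# prints to a logfile instead of the console.
log_file <- tempfile(fileext = ".txt")

result <- callr::r(
  func = function() {
    cat("pretend installation output\n")     # written to standard output
    message("pretend installation message")  # written to standard error
    "done"
  },
  stdout = log_file,  # capture standard output in the logfile
  stderr = "2>&1",    # merge standard error into the same file
  timeout = 60        # seconds before the subprocess is stopped
)

result
readLines(log_file)
```

Because the function runs in a separate process, nothing it loads or prints affects the calling session; only the return value comes back.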

If miniconda isn't installed, we install miniconda. Miniconda is a minimal installer for conda that bundles Python and only a small number of other packages.

If "greta-env-tf2" isn't found, then we create a new conda environment named "greta-env-tf2", for a version of Python that works with the specified versions of TF and TFP.

Then we install the TF and TFP python modules, using the versions specified in `greta_deps_spec()`.

After installation, we ask users if they want to restart R. This only happens in interactive sessions, and only if the user is in RStudio. This avoids potential issues where the function is called from non-interactive batch scripts.

## Troubleshooting installation

Installation doesn't always go to plan. Here are some approaches to getting your dependencies working.

- Check you have restarted R after installing dependencies
  - After you have installed dependencies with `install_greta_deps()`, you will be prompted to restart R. To use greta you must restart R after installing dependencies, as this allows greta to connect to the installed Python dependencies.

- Use `greta_sitrep()` to check dependencies
  - `greta_sitrep()` provides information about your installed versions of Python, TF, and TFP, and whether a conda environment is used. This can help troubleshoot some installation issues.

- Check the installation logfile
  - During installation we write a logfile that records all of the steps taken. This can provide useful clues as to what went awry. Open the logfile in a browser window with `open_greta_install_log()`, then search with Ctrl/Cmd+F for terms like "error", "Error", "ERROR", or "warn" to find problems. There might not be a clear solution, but the logfile might provide clues that you can share on a forum or in an issue on the greta GitHub repository.

- Reinstall greta dependencies with `reinstall_greta_deps()`
  - Sometimes we just need to "turn it off and on again". Use `reinstall_greta_deps()` to remove miniconda and the greta conda environment, and install them again.

- Manually remove the Python installation
  - You can manually remove the Python installation with:
    - `remove_greta_env()`
    - `remove_miniconda()`
    - or `destroy_greta_deps()`, which does both of these steps.
  - Then install the dependencies with `install_greta_deps()`.
  - Note that this is functionally what `reinstall_greta_deps()` does, but sometimes it can be useful to run these as separate steps.

- Check internet access
  - Installing these dependencies requires an internet connection, and sometimes a network provider (perhaps your IT department) blocks downloads from sites such as conda's. In the past we have encountered this issue and found that it can be avoided by reinstalling with `reinstall_greta_deps()`.
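
For the logfile step, searching can also be done from R rather than in the browser. The sketch below is not part of greta: `greta_log_path()` and `find_log_problems()` are hypothetical helper names. It assumes, based on the package documentation, that the log is saved under `tools::R_user_dir("greta")` as "greta-installation-logfile.html" unless the `GRETA_INSTALLATION_LOG` environment variable points elsewhere.

```{r}
#| eval: FALSE
# Sketch only: these helper names are hypothetical and not part of greta.

# Locate the logfile: use GRETA_INSTALLATION_LOG if it is set, otherwise
# fall back to the assumed default location.
greta_log_path <- function() {
  env_path <- Sys.getenv("GRETA_INSTALLATION_LOG", unset = "")
  if (nzchar(env_path)) {
    return(env_path)
  }
  file.path(tools::R_user_dir("greta"), "greta-installation-logfile.html")
}

# Return the line numbers and text of log lines that mention errors or
# warnings, ignoring case.
find_log_problems <- function(path = greta_log_path(),
                              pattern = "error|warn") {
  if (!file.exists(path)) {
    stop("No logfile found at: ", path, call. = FALSE)
  }
  log_lines <- readLines(path, warn = FALSE)
  hits <- grep(pattern, log_lines, ignore.case = TRUE)
  data.frame(
    line = hits,
    text = trimws(log_lines[hits]),
    stringsAsFactors = FALSE
  )
}

# Example use: find_log_problems()
```

The resulting data frame can be pasted into a forum post or issue alongside the output of `greta_sitrep()`.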

If `install_greta_deps()` did not work, you can try installing the dependencies manually with `reticulate`:

```{r install_tensorflow, eval = FALSE}
reticulate::install_miniconda()
reticulate::conda_create(
  envname = "greta-env-tf2",
  python_version = "3.10"
)
reticulate::conda_install(
  envname = "greta-env-tf2",
  packages = c(
    "tensorflow-probability==0.23.0",
    "tensorflow==2.15.0"
  )
)
```

This installs the Python modules into a conda environment named "greta-env-tf2".
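
One way to check that the manual installation can be found is to point `reticulate` at the new environment and ask whether the modules are importable. This is a sketch of a sanity check only, and it assumes a fresh R session in which Python has not yet been initialised. It does not show how greta itself locates the environment.

```{r}
#| eval: FALSE
# Sketch only: bind reticulate to the environment created above, then check
# that each Python module can be found.
reticulate::use_condaenv("greta-env-tf2", required = TRUE)

reticulate::py_module_available("tensorflow")
reticulate::py_module_available("tensorflow_probability")
```

After restarting R, `greta_sitrep()` reports what greta itself detects.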

You can also install these without creating a dedicated conda environment, like so:

```{r install-deps-plain, eval = FALSE}
reticulate::install_miniconda()
reticulate::conda_install(
  packages = c(
    "tensorflow-probability==0.23.0",
    "tensorflow==2.15.0"
  )
)
```

<!-- You can also use `install_tensorflow()` to install different versions of TensorFlow, including versions with GPU acceleration. If you're having trouble with this step, [this guide](https://tensorflow.rstudio.com/installation/) may help. -->

<hr>

<!-- If you want `greta` to run as fast as possible on your computer's CPUs, it would be worth installing python and TensorFlow using Anaconda since they will be automatically configured to use Intel's MKL routines, which provide a 2-8 fold sampling speedup on most models. -->
