Skip to content

Using alternate database backends for Cache

Alex Chubaty edited this page Nov 28, 2022 · 1 revision

Cache database backends

By default, caching relies on a sqlite database for it's backend. While this works in many situations, there are some important limitations of using sqlite for caching, including 1) speed; 2) concurrent transactions; 3) sharing database across machines or projects. Fortunately, Cache makes use of DBI package and thus supports several database backends, including mysql and postgresql.

Storing database credentials

Be careful not to save database credentials in version control. We recommend per-project credentials saved either in config files or environment variables, described below.

See https://solutions.rstudio.com/db/best-practices/managing-credentials/ for other important best practices.

Using config files

Add the following to your project's config.yml file:

default:
  cachedb:
    driver: 'postgres' 
    server: 'localhost'
    user: 'mydbuser'
    password: 'mysecurepassword'  
    port: 5432
    database: 'mydatabase'

Then, in your main project script, add the following:

cachedb <- config::get("cachedb")

conn <- DBI::dbConnect(drv = RPostgres::Postgres(),
                      host = cachedb$server,
                      port = cachedb$port,
                      dbname = cachedb$database,
                      user = cachedb$user,
                      password = cachedb$password)

options("reproducible.conn" = conn) # sets the default connection globally

PostgreSQL

Using environment variables

Add the following to your project's .Renviron file:

PGHOST="localhost"
PGPORT=5432
PGDATABASE="mydatabase"
PGUSER="mydbuser"
PGPASSWORD="mysecurepassword"

Then, in your main project script, add the following:

readRenviron(".Renviron") ## alternatively, use global ~/.Renviron

conn <-   DBI::dbConnect(drv = RPostgres::Postgres(),
                         host = Sys.getenv("PGHOST"),
                         port = Sys.getenv("PGPORT"),
                         dbname = Sys.getenv("PGDATABASE"),
                         user = Sys.getenv("PGUSER"),
                         password = Sys.getenv("PGPASSWORD"))

options("reproducible.conn" = conn) # sets the default connection globally

NOTE: database connections are not serializable, which means that they cannot be passed to e.g., spawned child processes if running in parallel. A new database connection will need to be established in each of the child processes using the DBI::dbConnect() call and setting the reproducible.conn option per above.