Feature suggestion: Inline annotations and command line #286

Open
jrosell opened this issue Aug 30, 2024 · 14 comments
Labels
enhancement New feature or request

Comments

@jrosell
Contributor

jrosell commented Aug 30, 2024

I tried the {rix} package today and thought of two features that could make it even more useful for R development.

Let me give an example here.

data-visualize.R file

library(here)
library(dplyr)
library(tidyr)
library(ggplot2)
library(palmerpenguins)
library(ggthemes)
library(R.devices)

str(penguins)
p <-
  penguins |> 
  drop_na() |> 
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()
suppressGraphics(ggsave(filename = 'penguin-plot.png', plot = p))
if (interactive()) {
  utils::browseURL(here('penguin-plot.png'))
}

rix file

if [ "$#" -eq 0 ]
then
  echo "Please provide the file to run as the first argument."
  echo "For example: bash rix \$(pwd)/data-visualize.R"
  exit 1
fi
FILE_TO_RUN="$1"
CODE_TO_RUN=$(cat "$FILE_TO_RUN")

nix-shell \
 	--expr "$(Rscript -e 'rix::rix(r_ver = '\"'4.3.3'\"', r_pkgs = c('\"'here'\"','\"'ggplot2'\"', '\"'dplyr'\"', '\"'tidyr'\"', '\"'palmerpenguins'\"', '\"'ggthemes'\"', '\"'R.devices'\"'), system_pkgs = NULL, git_pkgs = NULL, ide = '\"'code'\"', overwrite = TRUE, print = TRUE)')" \
 	--run "Rscript -e "'"'"$CODE_TO_RUN"'"'"" 

The first feature is a rix command line tool. For example, one can run bash rix $(pwd)/data-visualize.R to generate the 'penguin-plot.png' plot.

The second feature is inline script metadata for R, like Python already has.

If you look at my code for the rix file, the rix R command is hard-coded in the nix-shell call, but I think it could instead be annotated in some way in the file to be run.
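
One way to picture the inline annotation idea, loosely modelled on Python's inline script metadata: the R script carries its environment description in a comment block at the top, and a runner extracts it before building the environment. The `# /// rix` and `# ///` delimiters and the `r_ver` / `r_pkgs` keys below are hypothetical, not an existing rix feature. This sketches only the parsing step, in plain shell.

```shell
# Hypothetical annotation format (an assumption, not part of rix):
#   # /// rix
#   # r_ver = "4.3.3"
#   # r_pkgs = "here, ggplot2"
#   # ///

# Print the lines between the two markers, with the leading "# " stripped.
extract_rix_block() {
  sed -n '/^# \/\/\/ rix$/,/^# \/\/\/$/p' "$1" |
    sed -e '1d' -e '$d' -e 's/^# \{0,1\}//'
}

# Print the value of one key ($2) from the block in file $1, without quotes.
rix_value() {
  extract_rix_block "$1" |
    sed -n 's/^'"$2"' *= *"\(.*\)"$/\1/p'
}

# Example: rix_value data-visualize.R r_ver
```

A runner could then pass the extracted values on to rix::rix() instead of hard-coding them in the nix-shell call.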

Let me know what you think.

@philipp-baumann
Collaborator

Hey @jrosell, thanks for your ideas.
I'm not sure yet: do I understand correctly that you imagine a wrapper function around rix::rix() that generates the above shell script? I think it's sufficient and easier to just write a really short R script that defines the environment:

# env.R
rix::rix(
  r_ver = "4.3.2",
  r_pkgs = "data.table",
  overwrite = TRUE,
  project_path = "./my_proj_subdir"
)

Run that env.R in your R session or via Rscript.

Then use a custom bash script, nix-rscript.sh, with nix-shebang syntax. It could ship in inst/extdata, together with a helper (still to be implemented) that copies it to the current project directory. Make it executable with chmod +x:

#!/usr/bin/env nix-shell
#! nix-shell -i bash --pure default.nix
Rscript \
  --no-site-file \
  --no-environ \
  --no-restore \
  "${1}"

And just

./nix-rscript.sh data-visualize.R
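
A slightly more defensive variant of the same runner might check its argument before handing over to nix-shell. This is only a sketch: the function name `run_in_nix` and the `DRY_RUN` switch are made up for illustration, and it assumes a `default.nix` generated by rix::rix() sits in the current directory.

```shell
# Sketch of a checked wrapper around the nix-shell + Rscript call.
# run_in_nix and DRY_RUN are illustrative names, not part of rix.
run_in_nix() {
  if [ "$#" -ne 1 ]; then
    echo "usage: run_in_nix <script.R>" >&2
    return 2
  fi
  if [ ! -f "$1" ]; then
    echo "error: '$1' is not a file" >&2
    return 1
  fi
  # Same Rscript flags as the script above. The path is single-quoted for
  # the inner shell, so paths containing a single quote are not supported.
  inner="Rscript --no-site-file --no-environ --no-restore '$1'"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only show what would be executed.
    printf 'nix-shell default.nix --pure --run "%s"\n' "$inner"
    return 0
  fi
  nix-shell default.nix --pure --run "$inner"
}

# Example: DRY_RUN=1 run_in_nix data-visualize.R
```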

@philipp-baumann
Collaborator

To sum up, maybe something like a littler-style script helper. https://nix.dev/tutorials/first-steps/reproducible-scripts.html is a nice reference.

@jrosell
Contributor Author

jrosell commented Sep 7, 2024

Well, the goal is to have rix annotations at script level so one can run something like: rix run script.R

In Python one can use inline annotations and run: uv run script.py

@philipp-baumann
Collaborator

Well, these are at least two different pairs of shoes. I like those inline annotations, but they need a lot on top of the nix-Rscript runner, probably tooling on the Nix side. Nix ensures reproducibility via output hashes, based on the inputs in the expression supplied. Currently, the boilerplate that rix generates assumes one fixed nixpkgs revision (and fixed git hashes for specific packages), but in principle it could be extended to multiple. Quite a bit of tooling is needed before we can leverage the renv-lockfile-to-nix work (see #5). Ideas and PRs are very welcome. Want to join the Nixpkgs R Matrix channel? It could be a good place to brainstorm, too.

@jrosell
Contributor Author

jrosell commented Sep 13, 2024

Here is what I have. It works on Ubuntu: https://github.com/jrosell/rix-run

@b-rodrigues
Contributor

That's really cool. I must admit that I didn't really understand what you meant, but now that I see it, it's really nice!

How would you like to move forward with this? Would you like to have it included into rix? We are in the process of submitting to CRAN very soon so now wouldn't be the right moment to add a completely new feature, however if you want to continue to work on it feel free, and we could merge a PR for a next release.

@jrosell
Contributor Author

jrosell commented Sep 13, 2024

I think the rix-run script belongs in rix, but I believe it should first work well on more systems. So we can wait.

@philipp-baumann philipp-baumann added the enhancement New feature or request label Sep 20, 2024
@jrosell
Contributor Author

jrosell commented Sep 20, 2024

To keep you updated: it turns out that rix-run plays well with targets script files too. I really like the ability to have multiple targets scripts in the same project.

https://github.com/jrosell/rix-run?tab=readme-ov-file#targets-single-file

@jrosell
Contributor Author

jrosell commented Oct 15, 2024

I thought about this idea and I think it could be taken further using {processx}, as {callr} does.

I imagine something like this for testing the same function on different R versions using nix-shell processes.

bench::mark(
  rix::run(rix::rix(r_ver = "4.3.1"), my_function),
  rix::run(rix::rix(r_ver = "4.4.1"), my_function)
)

What do you think?

@philipp-baumann
Collaborator

philipp-baumann commented Oct 22, 2024

Running R functions in different Nix R environments is exactly what with_nix(), which I implemented, does. See e.g.

rix/R/with_nix.R

Lines 284 to 292 in 287e8bd

cmd_rnix_deparsed <- c(
  file.path(project_path, "default.nix"),
  "--pure", # required so that nix glibc is used
  "--run",
  sprintf(
    "Rscript --no-site-file --no-environ --no-restore '%s'",
    rnix_file
  )
)

We also have docs for it.

https://docs.ropensci.org/rix/articles/z-advanced-topic-running-r-or-shell-code-in-nix-from-r.html

We do it via {sys} and have some safe defaults for running code in different Nix shells, with proper recursive detection of globals etc. The approach works really well and I don't think it's necessary to have duplicate functionality.

For functionality under the hood, see https://github.com/ropensci/rix/blob/main/R/with_nix_helpers.R

Cheers,
Philipp

@jrosell
Contributor Author

jrosell commented Oct 22, 2024

Thanks, Philipp. I tested it a bit and I get some weird results with this approach. I assume it's because it doesn't make sense to benchmark with less than 10s precision with this implementation.

benchmark_dummy <- \(){
  invisible(NULL)
}
benchmark_memCompress <- \(){  
  txt <- readLines(file.path(R.home(), "COPYING"))
  for(i in 1:100) {    
    memCompress(txt, "g")
  }
  invisible(NULL)
}
results_r <- bench::mark(
  dummy = {    
    benchmark_dummy()
  },
  memCompress ={  
    benchmark_memCompress()
  },
  check = FALSE,
  memory = FALSE,
  min_time = 10
)
results_r[,c("expression", "median")]
#> # A tibble: 2 × 2
#>   expression    median
#>   <bch:expr>  <bch:tm>
#> 1 dummy        250.1ns
#> 2 memCompress   43.4ms

# Configuring and initial set up of the two environments
rix::rix(r_ver = "3.6.3", project_path = "/tmp/R/3.6.3", overwrite = TRUE)
rix::with_nix(benchmark_dummy, project_path = "/tmp/R/3.6.3", program = "R")
rix::rix(r_ver = "latest", project_path = "/tmp/R/latest", overwrite = TRUE)
rix::with_nix(benchmark_dummy, project_path = "/tmp/R/latest", program = "R")

# Get the fastest time
results_dummy <- bench::mark(
  old_dummy = {    
    rix::with_nix(benchmark_dummy, project_path = "/tmp/R/3.6.3", program = "R")
  },
  new_dummy ={ 
    rix::with_nix(benchmark_dummy, project_path = "/tmp/R/latest", program = "R")
  },
  check = FALSE,
  memory = FALSE,
  min_time = 30
)
results_dummy[,c("expression", "median")]
#> # A tibble: 2 × 2
#>   expression   median
#>   <bch:expr> <bch:tm>
#> 1 old_dummy     5.05s
#> 2 new_dummy     7.05s

# Get the benchmark times
results_memCompress <- bench::mark(
  old_memCompress = {    
    rix::with_nix(benchmark_memCompress, project_path = "/tmp/R/3.6.3")
  },
  new_memCompress ={ 
    rix::with_nix(benchmark_memCompress, project_path = "/tmp/R/latest")
  },
  check = FALSE,
  memory = FALSE,
  min_time = 30
)
results_memCompress[,c("expression", "median")]
#> # A tibble: 2 × 2
#>   expression        median
#>   <bch:expr>      <bch:tm>
#> 1 old_memCompress    5.69s
#> 2 new_memCompress    8.36s

@philipp-baumann
Collaborator

Yes, exactly, it doesn't make sense to benchmark this way, because there is serialization/deserialization overhead (including detecting and assigning globals recursively beforehand), plus the time to invoke nix-shell, which is known to be relatively slow in the C++ implementation of Nix.

@philipp-baumann
Collaborator

On my aarch64 MacBook M2 I currently get a median time of about 2.5s (my Rocky Linux machine on my home network is currently not reachable via ssh). I had to switch to microbenchmark::microbenchmark() because bench::mark() errored with a file unlinking problem, and I only tested dummy in "latest" R because R 3.6.3 predates nixpkgs support for that architecture. The 2.5 seconds I got would also match the overhead reported when benchmarking the Haskell build tool with nix-shell invocation: commercialhaskell/stack#4406

benchmark_dummy <- \(){
  invisible(NULL)
}

benchmark_memCompress <- \(){
  txt <- readLines(file.path(R.home(), "COPYING"))
  for (i in 1:100) {    
    memCompress(txt, "g")
  }
  invisible(NULL)
}

results_r <- bench::mark(
  dummy = {    
    benchmark_dummy()
  },
  memCompress ={  
    benchmark_memCompress()
  },
  check = FALSE,
  memory = FALSE,
  min_time = 10
)

r_latest_path <- file.path("latest")
r_3_6_3_path <- file.path("3.6.3")

results_r[, c("expression", "median")]

# Configuring and initial set up of the two environments
# R 3.6.3 is not available for aarch64-darwin; it will not build because at that
# time nixpkgs did not yet support the Apple Silicon architecture
# rix::rix(r_ver = "3.6.3", project_path = r_3_6_3_path, overwrite = TRUE)
# rix::nix_build(project_path = r_3_6_3_path)

rix::rix(r_ver = "latest", project_path = r_latest_path, overwrite = TRUE)
rix::nix_build(project_path = r_latest_path)

# Get the fastest time
results_dummy <- bench::mark(
  # old_dummy = {   
  #   rix::with_nix(benchmark_dummy, project_path = "/tmp/R/3.6.3", program = "R")
  # },
  new_dummy ={ 
    rix::with_nix(benchmark_dummy, project_path = r_latest_path, program = "R")
  },
  check = FALSE,
  memory = FALSE,
  filter_gc = FALSE,
  min_time = 10
)

benchmark_new <- microbenchmark::microbenchmark(
  new_dummy ={ 
    rix::with_nix(benchmark_dummy, project_path = r_latest_path, program = "R")
  },
  times = 20
)

where I get

> benchmark_new
Unit: seconds
      expr      min     lq    mean   median       uq      max neval
 new_dummy 2.379217 2.4867 2.54477 2.503232 2.584663 2.785343    20

Whatever you do, you will have the overhead of nix-shell, which is significant, when you launch everything from the same session. Alternatively, you can open two Nix R sessions in different subfolders and run the same R scripts for benchmarking in separate R environments.
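
The second option, running the same benchmark script once per environment instead of nesting with_nix() calls inside one benchmark, could be driven from the shell roughly like this. It is a sketch under assumptions: each subfolder (for example `latest` and `3.6.3`) contains a `default.nix` generated by rix::rix(), the benchmark itself (e.g. bench::mark()) lives inside the R script, and the `RUNNER` variable is invented here so the loop can be tried without Nix.

```shell
# Run one R script once per Nix environment, each in its own process.
# RUNNER is a hypothetical hook: by default it enters the environment's
# nix-shell; override it to try the loop without Nix.
RUNNER=${RUNNER:-nix_runner}

nix_runner() {
  # $1: environment directory containing default.nix, $2: R script
  nix-shell "$1/default.nix" --pure \
    --run "Rscript --no-site-file --no-environ --no-restore '$2'"
}

run_in_each_env() {
  script=$1
  shift
  for env_dir in "$@"; do
    echo "== $env_dir =="
    "$RUNNER" "$env_dir" "$script" || echo "run failed in $env_dir" >&2
  done
}

# Example: run_in_each_env benchmark.R latest 3.6.3
```

Because the timing happens inside each R process, the nix-shell start-up cost is paid once per environment and stays out of the measurements.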

@jrosell
Contributor Author

jrosell commented Oct 22, 2024

I'm not sure I understood what you said in the last paragraph. Do you mean running two separate benchmarks in two different scripts? I think I can try it with my rix-run tool. It could make sense.


3 participants