Feature suggestion: Inline annotations and command line #286

jrosell · 2024-08-30T11:31:11Z

I tried the {rix} package today and thought of two features that could make it even better for R development.

Let me give an example here.

data-visualize.R file

library(here)
library(dplyr)
library(tidyr)
library(ggplot2)
library(palmerpenguins)
library(ggthemes)
library(R.devices)

str(penguins)
p <-
  penguins |> 
  drop_na() |> 
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()
suppressGraphics(ggsave(filename = 'penguin-plot.png', plot = p))
if (interactive()) {
  utils::browseURL(here('penguin-plot.png'))
}

rix file

if [ "$#" -eq 0 ]
then
  echo "Please provide the file to run as the first argument."
  echo "For example: bash rix \$(pwd)/data-visualize.R"
  exit 1
fi
FILE_TO_RUN="$1"
CODE_TO_RUN=$(cat "$FILE_TO_RUN")

nix-shell \
 	--expr "$(Rscript -e 'rix::rix(r_ver = '\"'4.3.3'\"', r_pkgs = c('\"'here'\"','\"'ggplot2'\"', '\"'dplyr'\"', '\"'tidyr'\"', '\"'palmerpenguins'\"', '\"'ggthemes'\"', '\"'R.devices'\"'), system_pkgs = NULL, git_pkgs = NULL, ide = '\"'code'\"', overwrite = TRUE, print = TRUE)')" \
 	--run "Rscript -e "'"'"$CODE_TO_RUN"'"'""

The first feature is a rix command line tool. For example, one can run bash rix $(pwd)/data-visualize.R to generate the penguin-plot.png plot.

The second feature is inline script metadata for R, like Python already has.

If you look at my code for the rix file, I already set the rix R command in the nix-shell call, but I think it could be annotated in some way in the file to be run.
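To make the annotation idea a bit more concrete, here is a minimal base R sketch. The `#| rix.<key>: <value>` comment syntax and the `parse_rix_annotations()` helper are made up for illustration only; rix does not define or read such a header.

```r
# Hypothetical annotation syntax (an assumption, not part of rix):
#   #| rix.r_ver: 4.3.3
#   #| rix.r_pkgs: here, dplyr
parse_rix_annotations <- function(lines) {
  # Match lines of the form "#| rix.<key>: <value>"
  pattern <- "^#\\|[[:space:]]*rix\\.([A-Za-z_]+):[[:space:]]*(.*)$"
  hits <- regmatches(lines, regexec(pattern, lines))
  # Non-matching lines yield character(0); keep full match + 2 groups only
  hits <- Filter(function(m) length(m) == 3, hits)
  keys <- vapply(hits, function(m) m[2], character(1))
  # Split comma-separated values and trim surrounding whitespace
  values <- lapply(hits, function(m) trimws(strsplit(m[3], ",")[[1]]))
  stats::setNames(values, keys)
}

script <- c(
  "#| rix.r_ver: 4.3.3",
  "#| rix.r_pkgs: here, dplyr, ggplot2",
  "library(here)"
)
meta <- parse_rix_annotations(script)
str(meta)
```

A runner could then hand the parsed list to rix::rix(), for example with do.call(), adding arguments such as overwrite and print as in the nix-shell call above.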

Let me know what you think.

philipp-baumann · 2024-09-07T13:53:26Z

Hey @jrosell thanks for your ideas.
I'm not yet sure: do I understand correctly that you imagine a wrapper function around rix::rix() that generates the above shell script? I think it's sufficient and easier to just write a really short R script that defines the environment:

# env.R
rix::rix(
  r_ver = "4.3.2",
  r_pkgs = "data.table",
  overwrite = TRUE,
  project_path = "./my_proj_subdir"
)

Run that env.R in your R session or via Rscript.

Then use a custom bash script, nix-rscript.sh, with nix-shebang syntax. It could be part of inst/extdata, with a helper (to be implemented) that copies it to the current project directory. Make it executable with chmod +x:

#!/usr/bin/env nix-shell
#! nix-shell -i bash --pure default.nix
# Run the R script given as the first argument, ignoring site and user startup files
Rscript \
  --no-site-file \
  --no-environ \
  --no-restore \
  "${1}"

And just

./nix-rscript.sh data-visualize.R
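The copy helper mentioned above does not exist yet. As a rough sketch of what it might do, the following base R function writes the runner script into a directory and marks it executable. The name `write_nix_rscript()` is hypothetical, and the script text is inlined here rather than read from inst/extdata.

```r
# Hypothetical helper (not part of rix): write the nix-shebang runner
# into `dir` and make it executable.
write_nix_rscript <- function(dir = ".") {
  script <- c(
    "#!/usr/bin/env nix-shell",
    "#! nix-shell -i bash --pure default.nix",
    "Rscript --no-site-file --no-environ --no-restore \"${1}\""
  )
  path <- file.path(dir, "nix-rscript.sh")
  writeLines(script, path)
  Sys.chmod(path, mode = "0755") # same effect as chmod +x
  invisible(path)
}

# Try it in a temporary directory
path <- write_nix_rscript(tempdir())
readLines(path)
```

Running the generated script still requires Nix and a default.nix in the same directory.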

philipp-baumann · 2024-09-07T13:57:41Z

To sum up, maybe something like a littler script helper. https://nix.dev/tutorials/first-steps/reproducible-scripts.html is a nice reference.

jrosell · 2024-09-07T14:02:58Z

Well, the goal is to have rix annotations at the script level so one can run something like: rix run script.R

In Python one can write inline annotations and run: uv run script.py

philipp-baumann · 2024-09-07T14:34:03Z

Well, these are at least two different things. I like those inline annotations, but they need a lot on top of the nix-Rscript runner. Tooling in Nix? Nix ensures reproducibility via output hashes, based on the inputs in the supplied expression. Currently, the boilerplate that rix generates assumes one fixed nixpkgs revision/git hash for specific packages, but in principle it could be extended to multiple. Quite a bit of tooling is needed so we can leverage the renv lockfile-to-nix work (see #5). Ideas and PRs are very welcome. Wanna join the Nixpkgs R Matrix channel? It could be a good place to brainstorm, too.

jrosell · 2024-09-13T14:26:46Z

Here is what I have. It works on Ubuntu: https://github.com/jrosell/rix-run

b-rodrigues · 2024-09-13T16:32:48Z

that's really cool, I must admit that I didn't really understand what you meant but now that I see it, it's really nice!

How would you like to move forward with this? Would you like to have it included into rix? We are in the process of submitting to CRAN very soon so now wouldn't be the right moment to add a completely new feature, however if you want to continue to work on it feel free, and we could merge a PR for a next release.

jrosell · 2024-09-13T17:39:55Z

I think the rix-run script belongs in rix, but I believe it should first work well on more systems. So, we can wait.

jrosell · 2024-09-20T09:00:04Z

To keep you updated: it turns out that rix-run plays well with targets script files too. I really like the ability to have multiple targets scripts in the same project.

https://github.com/jrosell/rix-run?tab=readme-ov-file#targets-single-file

jrosell · 2024-10-15T18:58:02Z

I thought about this idea and I think it could be taken further using {processx}, as {callr} does.

I imagine something like this for testing the same function on different R versions using nix-shell processes.

# rix::run() is a proposed function; it does not exist in rix
bench::mark(
  rix::run(rix::rix(r_ver = "4.3.1"), my_function),
  rix::run(rix::rix(r_ver = "4.4.1"), my_function)
)

What do you think?

philipp-baumann · 2024-10-22T08:04:14Z

Running R functions in different Nix R environments is exactly what with_nix(), which I implemented, does. See e.g.

rix/R/with_nix.R

Lines 284 to 292 in 287e8bd

    
cmd_rnix_deparsed <- c(
  file.path(project_path, "default.nix"),
  "--pure", # required so that nix glibc is used
  "--run",
  sprintf(
    "Rscript --no-site-file --no-environ --no-restore '%s'",
    rnix_file
  )
)

We also have docs for it.

https://docs.ropensci.org/rix/articles/z-advanced-topic-running-r-or-shell-code-in-nix-from-r.html

We do it via {sys} and have some safe defaults for running code in different nix shells, with proper recursive detection of globals etc. The approach works really well and I don't think it's necessary to duplicate the functionality.

For functionality under the hood, see https://github.com/ropensci/rix/blob/main/R/with_nix_helpers.R

Cheers,
Philipp

jrosell · 2024-10-22T12:31:24Z

Thanks, Philipp. I tested it a bit and I get some weird results with this approach. I assume it's because it doesn't make sense to benchmark with less than 10s precision with this implementation.

benchmark_dummy <- \(){
  invisible(NULL)
}
benchmark_memCompress <- \(){  
  txt <- readLines(file.path(R.home(), "COPYING"))
  for(i in 1:100) {    
    memCompress(txt, "g")
  }
  invisible(NULL)
}
results_r <- bench::mark(
  dummy = {    
    benchmark_dummy()
  },
  memCompress ={  
    benchmark_memCompress()
  },
  check = FALSE,
  memory = FALSE,
  min_time = 10
)
results_r[,c("expression", "median")]
#> # A tibble: 2 × 2
#>   expression    median
#>   <bch:expr>  <bch:tm>
#> 1 dummy        250.1ns
#> 2 memCompress   43.4ms

# Configuring and initial set up of the two environments
rix::rix(r_ver = "3.6.3", project_path = "/tmp/R/3.6.3", overwrite = TRUE)
rix::with_nix(benchmark_dummy, project_path = "/tmp/R/3.6.3", program = "R")
rix::rix(r_ver = "latest", project_path = "/tmp/R/latest", overwrite = TRUE)
rix::with_nix(benchmark_dummy, project_path = "/tmp/R/latest", program = "R")

# Get the fastest time
results_dummy <- bench::mark(
  old_dummy = {    
    rix::with_nix(benchmark_dummy, project_path = "/tmp/R/3.6.3", program = "R")
  },
  new_dummy ={ 
    rix::with_nix(benchmark_dummy, project_path = "/tmp/R/latest", program = "R")
  },
  check = FALSE,
  memory = FALSE,
  min_time = 30
)
results_dummy[,c("expression", "median")]
#> # A tibble: 2 × 2
#>   expression   median
#>   <bch:expr> <bch:tm>
#> 1 old_dummy     5.05s
#> 2 new_dummy     7.05s

# Get the benchmark times
results_memCompress <- bench::mark(
  old_memCompress = {    
    rix::with_nix(benchmark_memCompress, project_path = "/tmp/R/3.6.3")
  },
  new_memCompress ={ 
    rix::with_nix(benchmark_memCompress, project_path = "/tmp/R/latest")
  },
  check = FALSE,
  memory = FALSE,
  min_time = 30
)
results_memCompress[,c("expression", "median")]
#> # A tibble: 2 × 2
#>   expression        median
#>   <bch:expr>      <bch:tm>
#> 1 old_memCompress    5.69s
#> 2 new_memCompress    8.36s

philipp-baumann · 2024-10-22T17:02:42Z

Yes, exactly. It doesn't make sense to benchmark this way, because there is serialization/deserialization overhead (including detecting and assigning globals recursively beforehand), plus the time to invoke nix-shell (which is known to be relatively slow as packaged in NixCpp).
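To get a feel for the serialization part alone, here is a minimal base R sketch. It assumes the step resembles a saveRDS()/readRDS() round trip through a temporary file; that is an assumption for illustration and is not taken from the with_nix() source. The `round_trip()` helper is hypothetical.

```r
benchmark_dummy <- function() invisible(NULL)

# Write a function to disk, read it back, and call it, roughly what a
# parent/child process hand-over would involve (illustrative assumption).
round_trip <- function(fun) {
  tmp <- tempfile(fileext = ".Rds")
  on.exit(unlink(tmp))
  saveRDS(fun, tmp)        # serialize to disk
  restored <- readRDS(tmp) # deserialize again
  restored()
}

# Average wall-clock time per round trip, in seconds
timing <- system.time(for (i in 1:100) round_trip(benchmark_dummy))
timing[["elapsed"]] / 100
```

If this comes out far below the seconds-level medians reported above, that would point to nix-shell start-up as the larger share of the overhead, but the numbers will vary by machine.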

philipp-baumann · 2024-10-22T18:42:48Z

On my aarch64 MacBook M2 I currently get a median time of about 2.5 s (my Rocky Linux machine on my home network is currently disconnected from SSH access). I had to switch to microbenchmark::microbenchmark() because bench::mark() errored with a file unlinking problem, and I only tested the dummy in "latest" R, because back then that architecture did not exist in nixpkgs. The 2.5 seconds I got would also match a similar overhead reported between the Haskell build tool and nix-shell invocation: commercialhaskell/stack#4406

benchmark_dummy <- \(){
  invisible(NULL)
}

benchmark_memCompress <- \(){
  txt <- readLines(file.path(R.home(), "COPYING"))
  for (i in 1:100) {    
    memCompress(txt, "g")
  }
  invisible(NULL)
}

results_r <- bench::mark(
  dummy = {    
    benchmark_dummy()
  },
  memCompress ={  
    benchmark_memCompress()
  },
  check = FALSE,
  memory = FALSE,
  min_time = 10
)

r_latest_path <- file.path("latest")
r_3_6_3_path <- file.path("3.6.3")

results_r[, c("expression", "median")]

# Configuring and initial set up of the two environments
# R 3.6.3 is not available for aarch64-darwin; it will not build because at that
# time nixpkgs did not yet support the Apple Silicon architecture
# rix::rix(r_ver = "3.6.3", project_path = r_3_6_3_path, overwrite = TRUE)
# rix::nix_build(project_path = r_3_6_3_path)

rix::rix(r_ver = "latest", project_path = r_latest_path, overwrite = TRUE)
rix::nix_build(project_path = r_latest_path)

# Get the fastest time
results_dummy <- bench::mark(
  # old_dummy = {   
  #   rix::with_nix(benchmark_dummy, project_path = "/tmp/R/3.6.3", program = "R")
  # },
  new_dummy ={ 
    rix::with_nix(benchmark_dummy, project_path = r_latest_path, program = "R")
  },
  check = FALSE,
  memory = FALSE,
  filter_gc = FALSE,
  min_time = 10
)

benchmark_new <- microbenchmark::microbenchmark(
  new_dummy ={ 
    rix::with_nix(benchmark_dummy, project_path = r_latest_path, program = "R")
  },
  times = 20
)

Where I get

> benchmark_new
Unit: seconds
      expr      min     lq    mean   median       uq      max neval
 new_dummy 2.379217 2.4867 2.54477 2.503232 2.584663 2.785343    20

Whatever you do, you will have the overhead of nix-shell, which is significant, when you launch everything from the same session. Alternatively, you can open two nix-R sessions in different subfolders and run the same R scripts for benchmarking in separate R environments.
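As a sketch of the second option, the following base R snippet only builds and prints one nix-shell command per environment, mirroring the argument layout of the with_nix.R excerpt quoted earlier in this thread. The helper name `nix_rscript_args()` and the script name bench.R are hypothetical, and nothing is executed here.

```r
# Build nix-shell arguments for running one script in one environment.
nix_rscript_args <- function(project_path, script) {
  c(
    file.path(project_path, "default.nix"),
    "--pure",
    "--run",
    sprintf("Rscript --no-site-file --no-environ --no-restore '%s'", script)
  )
}

# Project paths as used in the benchmark above; bench.R is a placeholder
envs <- c(old = "/tmp/R/3.6.3", new = "/tmp/R/latest")
cmds <- lapply(envs, nix_rscript_args, script = "bench.R")

# Print the commands. To actually run one (requires Nix), one could use
# system2("nix-shell", args = shQuote(cmds$new))
for (name in names(cmds)) {
  cat(name, ": nix-shell ",
      paste(shQuote(cmds[[name]]), collapse = " "), "\n", sep = "")
}
```

Each command starts its own nix-shell once, so the benchmark inside bench.R would measure only the R code and not the shell start-up.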

jrosell · 2024-10-22T19:15:32Z

I'm not sure I fully understand what you said in the last paragraph. Do you mean running two separate benchmarks in two different scripts? I think I can try that with my rix-run tool. It could make sense.

philipp-baumann added the enhancement New feature or request label Sep 20, 2024
