[bed_intersect] consider warning user when input data is passed multiple times through %>% operator #412

Closed
kriemo opened this issue Apr 3, 2024 · 1 comment

kriemo commented Apr 3, 2024

A user can mistakenly pass a dataset to bed_intersect() twice when using the %>% operator. This happens when the . placeholder appears only inside a nested function call within the bed_intersect() call: the data is passed implicitly as the x argument and then again, through the nested call, as a dots (...) argument. I'm not sure what options we have to detect this on our end, and for some users this might be a feature rather than a bug. The native pipe operator (|>) won't allow this, as the data/placeholder can't be passed twice.
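
The magrittr rule behind this can be seen in isolation. The sketch below assumes only that magrittr is installed; `count_inputs()` is a hypothetical stand-in for any function with an `x, ...` signature and is not part of valr.

```r
library(magrittr)

# Hypothetical stand-in for a function with an `x, ...` signature:
# it only reports how many inputs it received.
count_inputs <- function(x, ...) 1L + length(list(...))

# Dot used only inside a nested call: the left-hand side is still inserted
# as the first argument, so the function receives two inputs.
n_nested <- 1:3 %>% count_inputs(rev(.))

# Dot used as a direct argument: it replaces the implicit first argument.
n_direct <- 1:3 %>% count_inputs(.)

# Braces around the right-hand side suppress the implicit first argument.
n_braced <- 1:3 %>% { count_inputs(rev(.)) }

c(nested = n_nested, direct = n_direct, braced = n_braced)
```

Only the nested form delivers the data twice. The braced form is one way to opt out of the implicit first argument while keeping `%>%`.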

library(valr)
library(dplyr)
library(tibble)

x <- tribble(
  ~chrom, ~start, ~end, ~strand,
  "1", 0L, 5L, "+"
)

# data from x interpreted as two inputs
x %>%
  bed_intersect(group_by(., strand))
#> # A tibble: 1 × 9
#>   chrom start.x end.x strand.x start.y end.y strand.y .source .overlap
#>   <chr>   <int> <int> <chr>      <int> <int> <chr>    <chr>      <int>
#> 1 1           0     5 +              0     5 +        1              5

# is equivalent to
bed_intersect(x, group_by(x, strand))
#> # A tibble: 1 × 9
#>   chrom start.x end.x strand.x start.y end.y strand.y .source .overlap
#>   <chr>   <int> <int> <chr>      <int> <int> <chr>    <chr>      <int>
#> 1 1           0     5 +              0     5 +        1              5

# this behavior can be confusing if intersecting with another tibble
y <- tribble(
  ~chrom, ~start, ~end, ~nonsense, ~strand,
  "XX", 100L, 500L,  "hello!", "-"
)

x %>%
  bed_intersect(group_by(., strand), group_by(y, strand))
#> # A tibble: 1 × 10
#>   chrom start.x end.x strand.x start.y end.y strand.y nonsense.y .source
#>   <chr>   <int> <int> <chr>      <int> <int> <chr>    <chr>      <chr>  
#> 1 1           0     5 +              0     5 +        <NA>       1      
#> # ℹ 1 more variable: .overlap <int>

# is equivalent to:
bed_intersect(x, group_by(x, strand), group_by(y, strand))
#> # A tibble: 1 × 10
#>   chrom start.x end.x strand.x start.y end.y strand.y nonsense.y .source
#>   <chr>   <int> <int> <chr>      <int> <int> <chr>    <chr>      <chr>  
#> 1 1           0     5 +              0     5 +        <NA>       1      
#> # ℹ 1 more variable: .overlap <int>
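
One possible way to detect the repeated input is sketched here, purely as an illustration. `column_data()` and `warn_if_repeated_input()` are hypothetical helpers, not part of valr, and the check assumes that "same data" means identical column names and values once attributes such as grouping are ignored. It would also flag a deliberate self-intersection, so it is a heuristic at best.

```r
# Hypothetical helpers, not part of valr.

# Reduce a data frame to a bare named list of columns, dropping class,
# row names, and any other attributes (for example grouping metadata).
column_data <- function(df) {
  cols <- unclass(df)
  attributes(cols) <- list(names = names(df))
  cols
}

# Warn when any input passed via `...` carries the same column data as `x`.
# Returns, invisibly, one logical flag per `...` input.
warn_if_repeated_input <- function(x, ...) {
  others <- list(...)
  repeated <- vapply(
    others,
    function(y) identical(column_data(x), column_data(y)),
    logical(1)
  )
  if (any(repeated)) {
    warning(
      "`x` also appears in `...` at position ",
      paste(which(repeated), collapse = ", "),
      "; was it passed twice through `%>%`?",
      call. = FALSE
    )
  }
  invisible(repeated)
}

# Base R stand-ins for the tibbles above; the extra attribute mimics
# a grouped copy of `x` that differs only in metadata.
x_df <- data.frame(chrom = "1", start = 0L, end = 5L, strand = "+")
x_tagged <- structure(x_df, note = "stand-in for a grouped copy")
y_df <- data.frame(chrom = "XX", start = 100L, end = 500L, strand = "-")

flags <- suppressWarnings(warn_if_repeated_input(x_df, x_tagged, y_df))
```

Here `flags` marks the first `...` input as a repeat of `x_df` and the second as distinct.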

The double passing doesn't happen with the native pipe, as you can't (implicitly or explicitly) pass the data twice: the _ placeholder can't be used inside a nested call.

x |> bed_intersect(group_by(.data = _, strand))
Error in bed_intersect(x, group_by(.data = "_", strand)) : 
  invalid use of pipe placeholder (<input>:1:0)

Created on 2024-04-03 with reprex v2.1.0
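
The native-pipe restriction is enforced when the code is parsed, so it can be checked without calling any function. This is a minimal sketch assuming R >= 4.2 (where the `_` placeholder is available); the variable names are illustrative only.

```r
# Requires R >= 4.2. Parsing from text lets the script continue
# after the syntax error instead of stopping at it.
nested <- tryCatch(
  parse(text = "x |> bed_intersect(group_by(.data = _, strand))"),
  error = function(e) conditionMessage(e)
)
nested  # an error message string rather than a parsed expression

# The placeholder is accepted when it is a named, top-level argument
# of the right-hand side call.
top_level <- parse(text = "x |> group_by(.data = _, strand)")
top_level[[1]]
```

Because the nested form is rejected at parse time, bed_intersect() is never called and cannot receive the data twice.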

kriemo commented Apr 26, 2024

Closing, as I think the current behavior is desirable.

kriemo closed this as completed Apr 26, 2024