Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fortify.default() accepts data-frame-like objects #5404

Merged
merged 5 commits into from
Sep 20, 2023
Merged

fortify.default() accepts data-frame-like objects #5404

merged 5 commits into from
Sep 20, 2023

Conversation

hpages
Copy link
Contributor

@hpages hpages commented Sep 1, 2023

fortify.default() now accepts a data-frame-like object granted the object exhibits healthy dim(), colnames(), and as.data.frame() behaviors. Closes #5390.

Some notes:

  • Internal helper .as_data_frame_trust_no_one() can fail in many ways and I'm using detailed error messages that I hope are informative/useful. However I'm not sure if that level of verbosity is ok. Let me know if I should use shorter error messages.
  • In the meantime my expect_error() statements in the unit tests don't check the error message but I can change that.
  • The primary motivation for this change was to make ggplot() work directly on a DataFrame object (and now it works 😉) but I didn't add unit tests for this because that would require Suggests'ing the S4Vectors package where the DataFrame class is implemented.
  • fortify.default() is now a no-op on a data.frame object so fortify.data.frame() is no longer needed but this PR keeps it anyways in order to keep it focused and minimalist.

Thanks

`fortify.default()` now accepts a data-frame-like object granted the
object exhibits healthy `dim()`, `colnames()`, and `as.data.frame()`
behaviors. Closes #5390.
@hpages
Copy link
Contributor Author

hpages commented Sep 1, 2023

Looks like the R-CMD-check / ubuntu-latest (devel) (pull_request) check is timing out. Not sure what to do about that 😞 Any idea?

Copy link
Collaborator

@teunbrand teunbrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Hervé,

Thanks so much for putting in the work for this PR!
I think the validation of the data.frame looks great. In fact, I think its usefulness might extend beyond just the use in fortify() to check any 'untrusted' data.frame.

I understand that this isn't an immediate concern for S4Vectors compatibility, so I'm not asking you to implement all the comments. I'd be happy to take on this extension myself in a follow up PR if this isn't in your direct line of interest, in which case you can consider the comments as 'notes to self'.

From the BioC side, the requirement of having to return appropriate dim() might mean that we cannot directly put a GenomicRanges::GRanges class into the data argument, even though it has an as.data.frame method. Is this intentional?

R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
@hpages
Copy link
Contributor Author

hpages commented Sep 4, 2023

Thanks for the feeback and happy to revise as suggested.

It's good to know about vec_is() and try_fetch(), thanks for the pointers!

About your suggestion to not do any error cacthing in .as_data_frame_trust_no_one() itself, but to wrap the call to .as_data_frame_trust_no_one(model) in a try_fetch() statement instead. This is something I initially considered but I found it diffcult to provide error messages that would be informative enough with this approach.

Let me explain: I usually advocate against excessive use of try() statements, and my preference is to let things just fail naturally. However, in this case, the supplied object is untrusted so dim(), colnames(), or as.data.frame() can fail in unpredictable/nasty ways with error messages that can be hard to interpret. In particular it might be hard for the user to tell whether the error is caused by an issue with the object they supplied or by a bug in ggplot2. This is especially true for complex S4 objects. For example:

setClass("OnDiskRectangularBlob", slots=c(stuff="ANY"))
setMethod("dim", "OnDiskRectangularBlob", function(x)  stop("failed to access resource X for reason Y"))
blob <- new("OnDiskRectangularBlob")

Letting dim() fail naturally would produce the following error:

library(ggplot2)
ggplot(blob, mapping=aes(x, y)) + geom_point()
# Error in dim(model) : failed to access resource X for reason Y

But catching the error with try() and issuing our own error message produces:

library(ggplot2)
ggplot(blob, mapping=aes(x, y)) + geom_point()
# Error in `.as_data_frame_trust_no_one()`:
# ! No `fortify()` method found for `data` (a <OnDiskRectangularBlob>
#   object), and the object does not look like it can be treated as a valid
#   data-frame-like object either (calling `dim()` on the object returned an
#   error).
# Run `rlang::last_trace()` to see where the error occurred.

This error message makes it clear that this is not a ggplot2 problem and it also gives a precise description of what ggplot()'s expectations are.

Now I need to get more familiar with try_fetch() and see how I can achieve this kind of disambiguation with a single try_fetch() statement around the call to .as_data_frame_trust_no_one() (which I will rename validate_as_data_frame()).

Best,
H.

@teunbrand
Copy link
Collaborator

teunbrand commented Sep 6, 2023

Now I need to get more familiar with try_fetch() and see how I can achieve this kind of disambiguation with a single try_fetch() statement around the call to .as_data_frame_trust_no_one() (which I will rename validate_as_data_frame()).

I think the trick to this is using parent = cnd in the error handler: it mentions the offending function. See below for an example using mocked functions.

library(rlang)

setClass("OnDiskRectangularBlob", slots=c(stuff="ANY"))
setMethod("dim", "OnDiskRectangularBlob", function(x)  stop("failed to access resource X for reason Y"))
blob <- new("OnDiskRectangularBlob")

validate <- function(x) {
  my_dims <- dim(x)
  if (length(my_dims) != 2) {
    stop("Invalid dims")
  }
  my_dims
}

wrapper <- function(x) {
  try_fetch(
    validate(x),
    error = function(cnd) cli::cli_abort("Cannot wrap", parent = cnd)
  )
}

# Failure in upstream functions
wrapper(blob)
#> Error in `wrapper()`:
#> ! Cannot wrap
#> Caused by error in `dim()`:
#> ! failed to access resource X for reason Y

# Not valid
wrapper(array(dim = 2:4))
#> Error in `wrapper()`:
#> ! Cannot wrap
#> Caused by error in `validate()`:
#> ! Invalid dims

# Valid
wrapper(matrix())
#> [1] 1 1

Created on 2023-09-06 with reprex v2.0.2

@hpages
Copy link
Contributor Author

hpages commented Sep 6, 2023

That's very useful, thank you!

@hpages
Copy link
Contributor Author

hpages commented Sep 20, 2023

Hi @teunbrand,

Thanks again for the feedback. I've implemented your suggestions:

  • Internal helper .as_data_frame_trust_no_one() is named validate_as_data_frame() and it's no longer trying to catch any error. The only error catching is now in the caller (fortify.default()), and the catching is done via try_fetch().
  • Various checks were collapsed and simplified via the use of vec_is().

Overall this makes the code significantly simpler/shorter.

2 small things I didn't implement:

  • I'm not checking that the colnames are syntactically valid. This is for consistency with ggplot2:::fortify.data.frame() which just accepts whatever colnames are on the user-supplied data frame.

  • I'm not using check_data_frame() instead of my current if (!is.data.frame(df)) cli::cli_abort(...). The former does a fine job at erroring on bad user input but in this case I'm checking that the result of calling as.data.frame() on the user-supplied input did something sensible. Using check_data_frame() for that produces a somewhat obscure error message:

    > fortify(blob)
    Error in `fortify()`:
    ! `data` must be a <data.frame>, or an object coercible by `fortify()`,
      or a valid <data.frame>-like object coercible by `as.data.frame()`.
    Caused by error in `.postvalidate_data_frame_like_object()`:
    ! `df` must be a data frame, not `NULL`.
    Run `rlang::last_trace()` to see where the error occurred.
    

    In particular it's not clear what the relationship between df and the user-supplied input (data) is, and so it's easy to jump to the conclusion that ggplot2 is doing something wrong.
    The current error message tries to avoid the confusion:

    > fortify(blob)
    Error in `fortify()`:
    ! `data` must be a <data.frame>, or an object coercible by `fortify()`,
      or a valid <data.frame>-like object coercible by `as.data.frame()`.
    Caused by error in `.postvalidate_data_frame_like_object()`:
    ! `as.data.frame(data)` did not return a <data.frame>
    Run `rlang::last_trace()` to see where the error occurred.
    

Let me know if you think I should make these changes nevertheless, and I'll be happy to do it.

Thanks again for taking the time to look at this.

Best,
H.

@hpages
Copy link
Contributor Author

hpages commented Sep 20, 2023

Note that the R-CMD-check / ubuntu-latest (devel) (pull_request) check is still timing out. Otherwise all the automated checks were successful ✔️

Copy link
Collaborator

@teunbrand teunbrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Hervé,

Thank you for incorporating the suggestions!
I think the code looks good as a whole and your latest changes definitely made it more maintainable.
I agree with your assessment about the column names and check_data_frame().
I only have a few nit-picks with regards to phrasing and some {cli} specific fidgeting.
The devel check time-out is because of some {vdiffr} compilation issue that is unrelated to this PR.
If you could have a look at those then I think this should be ready to merge.

Best,
Teun

R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
R/fortify.R Outdated Show resolved Hide resolved
@hpages
Copy link
Contributor Author

hpages commented Sep 20, 2023

Done. Thanks again!

Copy link
Collaborator

@teunbrand teunbrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks for the contribution Hervé, I hope this makes BioC <> Tidyverse interface a little bit easier :)

@teunbrand teunbrand merged commit de7bad6 into tidyverse:main Sep 20, 2023
11 of 12 checks passed
@hpages
Copy link
Contributor Author

hpages commented Sep 20, 2023

I hope this makes BioC <> Tidyverse interface a little bit easier :)

It definitely will. There was a strong demand for this feature in the BioC community. Thanks again for the guidance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for formats other than data.frame
2 participants