Feature request: support for tibble aesthetics #4189

Open
davidchall opened this issue Sep 4, 2020 · 14 comments · May be fixed by #6076
Labels
feature a feature request or enhancement internals 🔎

Comments

@davidchall

I'm developing a ggplot2 extension where it would be helpful to pass a tibble column as an aesthetic.

As a simple motivating example, you could imagine a geom_box() layer with a bounds aesthetic that expects a tibble with columns xmin, ymin, xmax and ymax. Then I could simply do geom_box(bounds = bbox) and the layer will use the nested columns to draw the box.

To do this, we need to make use of nested tibbles (i.e. a tibble column within a tibble).

library(tidyverse)

dat <- tibble(a = tibble(x = 1:5, y = 6:10))

# works fine
ggplot(dat) +
  geom_point(aes(x = a$x, y = a$y))

So far, so good. But these aesthetics are still just vectors. Let's try using the tibble column a.

(Obviously, geom_point() doesn't expect a tibble column for its x aesthetic, but it does generate the relevant error.)

ggplot(dat) +
  geom_point(aes(x = a, y = a$y))
#> Don't know how to automatically pick scale for object of type tbl_df/tbl/data.frame. Defaulting to continuous.
#> Error: Aesthetics must be either length 1 or the same as the data (5): x

The warning can be removed with a new scale_type. But the error is generated by this line:

ns <- vapply(x, length, numeric(1))

The length of a data frame is the number of columns (2) instead of the number of rows (5), so we get this error. This is the same problem that vctrs::vec_size() addresses.

Would it be possible to simply avoid this error by replacing length() with nrow(), NROW() or vec_size()? Or would there be other repercussions?
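To make the size mismatch concrete, here is a minimal base R sketch. The `aesthetics` list is a made-up stand-in for the evaluated aesthetics, not ggplot2's internal object.

```r
# A hypothetical list of evaluated aesthetics: one data frame, one plain vector.
aesthetics <- list(
  x = data.frame(x = 1:5, y = 6:10),
  y = 6:10
)

# length() counts columns for a data frame, so the two sizes disagree.
sizes_length <- vapply(aesthetics, length, numeric(1))
print(sizes_length)
#> x y 
#> 2 5 

# NROW() counts rows for data frames and elements for plain vectors.
sizes_nrow <- vapply(aesthetics, NROW, numeric(1))
print(sizes_nrow)
#> x y 
#> 5 5 

# nrow() is not a drop-in replacement: it returns NULL for plain vectors.
print(is.null(nrow(aesthetics$y)))
#> [1] TRUE
```

Of the three candidates, only `NROW()` and `vec_size()` treat plain vectors and data frames uniformly, which is presumably why `nrow()` alone would not be enough here.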

@yutannihilation
Member

Thanks. I'm not sure if this is really doable, but I think data frame column ("nested tibble," or "packed column"? I don't know the proper term for this...) should be supported. This might be a nice reason to start importing vctrs.

@davidchall
Author

To understand the motivation with a more fleshed-out example, please take a look at this vignette. This package solves the problem using a vctrs-based class that behaves effectively like a data frame whose length() returns vec_size().

@teunbrand
Collaborator

Disclaimer: the following is mostly out of selfish reasons, so feel free to dismiss entirely.

I think replacing the length() with NROW() is more appropriate than replacing it with vec_size(). Consider the following line from the vec_size() documentation:

vec_size() is equivalent to NROW() but has a name that is easier to pronounce, and throws an error when passed non-vector inputs.

The 'throws an error when passed non-vector inputs' part will cause problems for things that are effectively vectors but aren't implemented as typical vectors (example: #3835).
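A small sketch of the difference, using a function as a deliberately simple non-vector input. This is an illustration only and is not the case discussed in #3835. The vctrs call is guarded so the snippet still runs when vctrs is not installed.

```r
# NROW() falls back to length() for anything without dimensions,
# so even a function gets a size.
size_base <- NROW(mean)
print(size_base)
#> [1] 1

# vctrs::vec_size() signals an error for the same input instead.
if (requireNamespace("vctrs", quietly = TRUE)) {
  res <- tryCatch(
    vctrs::vec_size(mean),
    error = function(e) "vec_size() signalled an error"
  )
  print(res)
}
```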

@clauswilke
Member

I hadn't really processed the existence of NROW(). Is there any reason not to use it right now?

@teunbrand Why don't you create a PR so we can see if any tests fail.

@teunbrand
Collaborator

teunbrand commented Oct 8, 2020

Alright I created the PR and experimented a bit. Here are some problems I ran into while attempting to make tibble aesthetics work.

data.frame constructor

In the code below:

lengths <- vapply(x, length, integer(1))

we would run into the same problem, so if tibble aesthetics were to be supported, this would have to change too.

scales

The next issue I ran into was that the scales didn't accept the tibble, as was to be expected. As I imagined an extension supporting tibble aesthetics, I made a quick and dirty scale_type.tbl_df(), scale_x_tibble_continuous() and ScaleContinuousTibble ggproto to test with (based on scale_x_continuous() and ScaleContinuousPosition).

transformation checking

Next, I ran into a problem with transformation checking. In particular, in the lines below:

if (any(is.finite(x) != is.finite(transformed))) {

we get the same error as the following code produces:

is.finite(tibble::tibble(x = 1))
#> Error in is.finite(tibble::tibble(x = 1)): default method not implemented for type 'list'

I commented out the check_transformation() line in the ScaleContinuousTibble$transform() method.

scale_apply

Lastly, I got stuck at the scale training part in the scale_apply() function. In the following piece of code:

ggplot2/R/layout.R

Lines 302 to 304 in ac2b5a7

pieces <- lapply(seq_along(scales), function(i) {
  scales[[i]][[method]](data[[var]][scale_index[[i]]])
})

If you can parse this with all the brackets and parentheses (or use the RStudio debugger), the data[[var]] object is still an intact tibble. However, indexing that tibble with [scale_index[[i]]] subsets by column, whereas for scale training you'd want to subset by observation/row. At this point I decided to stop exploring.
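The column-versus-row behaviour can be reproduced in base R. In the sketch below, `slice_obs()` is a hypothetical helper written for this illustration and is not part of ggplot2; `vctrs::vec_slice()` offers the same observation-wise behaviour.

```r
df <- data.frame(x = 1:5, y = 6:10)
idx <- c(1L, 2L)

# A single index inside `[` treats the data frame as a list: columns 1 and 2.
by_column <- df[idx]
print(dim(by_column))
#> [1] 5 2

# Selecting observations needs the row position of `[`.
by_row <- df[idx, , drop = FALSE]
print(dim(by_row))
#> [1] 2 2

# Hypothetical helper that slices observations for both plain vectors
# and data frames.
slice_obs <- function(x, i) {
  if (is.data.frame(x)) x[i, , drop = FALSE] else x[i]
}

print(slice_obs(df, idx)$y)
#> [1] 6 7
print(slice_obs(df$y, idx))
#> [1] 6 7
```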

@teunbrand
Collaborator

teunbrand commented Oct 8, 2020

Giving this another thought, the vctrs rcrd S3 class is essentially a data.frame in that it is a collection of fields with vectors of equal length, just as the data.frame/tibble is practically a list with vectors of equal length. I can imagine it being easier to implement a rcrd subclass than to change the internals of ggplot to comply with rectangular data structures.
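A minimal sketch of the rcrd idea, assuming vctrs is installed (the code is guarded otherwise). The class name `bbox` and its fields are made up for illustration, echoing the bounds example from the top of the thread.

```r
if (requireNamespace("vctrs", quietly = TRUE)) {
  # `bbox` is a hypothetical class name used only for this illustration.
  bbox <- vctrs::new_rcrd(
    list(xmin = c(0, 1), ymin = c(0, 1), xmax = c(2, 3), ymax = c(2, 3)),
    class = "bbox"
  )

  # The record reports its number of observations, not its number of fields.
  n_obs <- length(bbox)
  print(n_obs)
  #> [1] 2

  # Individual fields are still accessible by name.
  xmax <- vctrs::field(bbox, "xmax")
  print(xmax)
  #> [1] 2 3
}
```

Because `length()` already returns the number of observations, an object like this passes a length-based size check without changes on the ggplot2 side.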

Potential downside is that you'd probably have to mirror quite a few of the scales package's functions that aren't S3 generics.

@davidchall
Author

Hi @teunbrand - thank you for digging into this problem so deeply! It turned out to be quite complex, so I can understand if you wouldn't want to merge this.

I found it reassuring that you came to the vctrs_rcrd solution too - this is exactly what I've been doing up until now 👍

@yutannihilation
Member

Thanks for your efforts! Let me leave some quick comments.

transformation checking

We can define another generic function like is_finite() to handle data frames transparently. So, this doesn't seem a serious problem to me.
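As a rough sketch of what such a generic could look like: the name `finite_obs()` is hypothetical, and the rule that a row counts as finite only when every column is finite is an assumption made for this example.

```r
# Hypothetical S3 generic; neither the name nor the behaviour is ggplot2 API.
finite_obs <- function(x, ...) UseMethod("finite_obs")

# Plain vectors defer to base R.
finite_obs.default <- function(x, ...) is.finite(x)

# Assumption: an observation (row) is finite when all of its columns are.
finite_obs.data.frame <- function(x, ...) {
  if (ncol(x) == 0) {
    return(rep(TRUE, nrow(x)))
  }
  Reduce(`&`, lapply(x, finite_obs))
}

df <- data.frame(x = c(1, NA, 3), y = c(Inf, 2, 3))
result <- finite_obs(df)
print(result)
#> [1] FALSE FALSE  TRUE
```

Since the data frame method calls the generic on each column, nested data frame columns are handled recursively.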

scale_apply

This made me think ggplot2 needs a proper integration with vctrs (so that we can use vec_slice()?) to achieve this feature request.

One more thing I want to emphasize is that it would be a "data.frame" column, not a "tibble" column, if we support this. I once experimented with using tibble inside ggplot2, but it seems impossible because many functions convert a tibble to a data.frame.

c.f. #3048

@teunbrand
Collaborator

Yes an (exported) is_finite() generic for which people can write is_finite.myclass() methods seems like a good idea to me; I had to work around this before as well.

... ggplot2 needs a proper integration with vctrs ...

The scales package would then also need to support vctrs, or convert some of its functions to (S3) generics (out-of-bounds handling, range expansion, scale training, scale transformations, etc.).

@clauswilke
Member

I'm not sure that exporting a function named is_finite() is a good idea. It seems likely to clash with a function in some other package, if not now then at some point in the future. Some other name for the function would be fine.

@hadley
Member

hadley commented Feb 11, 2021

I suspect it isn’t worth doing this piecemeal but would be better left until we attack a vctrs integration.

@thomasp85 thomasp85 added feature a feature request or enhancement internals 🔎 labels Mar 25, 2021
@thomasp85
Member

As the vctrs integration is happening now, I'm reading through these older issues. While vctrs would solve some of the issues described herein, this is obviously deeper than that and touches on the basis of the API itself.

There is no concept of multivalue scales in ggplot2 so training a scale on a tibble column makes zero sense. In the example with a bounds aesthetic you'd also want to use it to train the x and y scale rather than a bounds scale so this is even more out-of-line with how aesthetics work currently.

We will be facing similar issues when thinking about how to e.g. support grid gradients, because those are made up of several different values: some relate to position, others to colour mapping, etc.

All of this is to say that the impending vctrs integration will do very little to move this issue forward, and what is really needed is not coding but deep thought about how this should conceptually work.

One small idea we could discuss is to have something like "aesthetic unpacking" in layers, where you can unpack a tibble column into separate aesthetics automatically. This could be done in a single step, after which everything would proceed as normal.
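A base R sketch of what the unpacking step might amount to. `unpack_aes()` is a hypothetical helper, not an existing ggplot2 function; `tidyr::unpack()` performs a similar operation on tibbles.

```r
# Layer data with a packed data frame column called `bounds`.
data <- data.frame(id = 1:2)
data$bounds <- data.frame(
  xmin = c(0, 1), ymin = c(0, 1),
  xmax = c(2, 3), ymax = c(2, 3)
)

# Hypothetical helper: replace one data frame column by its own columns.
unpack_aes <- function(data, col) {
  packed <- data[[col]]
  stopifnot(is.data.frame(packed))
  data[[col]] <- NULL
  cbind(data, packed)
}

unpacked <- unpack_aes(data, "bounds")
print(names(unpacked))
#> [1] "id"   "xmin" "ymin" "xmax" "ymax"
```

After this step every column is an ordinary vector again, so the usual per-aesthetic scale training could apply to xmin, ymin, xmax and ymax.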

@clauswilke
Member

@thomasp85 It's definitely possible to write scales that train on multi-dimensional input. I experimented with this a long time ago. At the time, the biggest challenge was actually to pass the data through.

https://github.com/clauswilke/multiscales

If you think about it, the geometry column in sf objects is also multi-dimensional input.

In general, I think there are two distinct categories of cases:

  1. Multi-dimensional input gets mapped onto a single output. E.g., two dimensions of input determine a specific color. This can be handled simply by implementing a scale function that can perform the mapping. This doesn't require any other changes in ggplot, and it's the scenario I played around with in my linked demo.

  2. Multi-dimensional input gets mapped onto multiple outputs, and possibly x/y coordinates. This requires more work, and typically an appropriate geom and/or coord. Example is the entire geom_sf() infrastructure.
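The first category can be sketched with a plain function that turns two input dimensions into one colour per observation. `map_bivariate()` and its red/blue encoding are assumptions made for this example and are not the API of the linked multiscales package.

```r
# Hypothetical mapping: first column drives red, second column drives blue.
map_bivariate <- function(df) {
  rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
  grDevices::rgb(
    red   = rescale01(df[[1]]),
    green = rep(0, nrow(df)),
    blue  = rescale01(df[[2]])
  )
}

colours <- map_bivariate(data.frame(a = c(0, 1), b = c(1, 0)))
print(colours)
#> [1] "#0000FF" "#FF0000"
```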

@teunbrand
Collaborator

Let's leave the scale-side of the problem to extension devs and just move any barriers on the ggplot2 side out of the way.
I feel that this should just return the 2-column data.frame that we put in (label was chosen because I think no computation occurs on that column).

library(ggplot2)

data <- mtcars
data$df <- data.frame(x = mtcars$cyl, y = mtcars$carb)

p <- ggplot(data, aes(disp, mpg, label = df)) +
  geom_text()

layer_data(p)$label
#> Don't know how to automatically pick scale for object of type <data.frame>.
#> Defaulting to continuous.
#> Error in `geom_text()`:
#> ! Problem while computing aesthetics.
#> ℹ Error occurred in the 1st layer.
#> Caused by error in `check_aesthetics()` at ggplot2/R/layer.R:334:5:
#> ! Aesthetics must be either length 1 or the same as the data (32).
#> ✖ Fix the following mappings: `label`.

Created on 2024-09-04 with reprex v2.1.1

@teunbrand teunbrand linked a pull request Sep 4, 2024 that will close this issue