Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw an error or warning when mapping to a variable not in data (if length > 1) #5391

Closed
eliocamp opened this issue Aug 15, 2023 · 6 comments

Comments

@eliocamp
Copy link
Contributor

eliocamp commented Aug 15, 2023

Would it be possible/desirable to make this kind of usage a warning or an error?

library(ggplot2)

cylinders <- mtcars$cyl

ggplot(mtcars, aes(cylinders)) +
  geom_density()

I've seen many beginner write this mistake. They create a new variable outside the data.frame and then use it as data. It would be very helpful if ggplot2 could detect it and issue a warning.

I'm not sure if this would break a lot of existing code. In case there are legitimate uses, maybe ggplot2 could export a particular function to let users explicitly use external variables along these lines:

ggplot(mtcars, aes(external(cylinders))) +
  geom_density()

Something like this could allow advanced usages that might not be possible otherwise and guard regular users against this common issue.

@mjskay
Copy link
Contributor

mjskay commented Aug 15, 2023

Can I vehemently vote against this? In the modern tidyverse version of aes(), it is consistent that expressions in aes() follow the same rules as other functions that use data masking. Having aes() be different would be surprising.

Plus this would add syntactic noise to valid constructions (eg any expressions involving a data column and an external variable, like a constant).

@eliocamp
Copy link
Contributor Author

eliocamp commented Aug 15, 2023

Constants (length 1 vectors) should be allowed since that's a natural use case.

I don't understand what you mean by aes() being consistent with data masking, though.

@mjskay
Copy link
Contributor

mjskay commented Aug 15, 2023

Constants (length 1 vectors) should be allowed since that's a natural use case.

I'd argue there are cases where non-constants might be reasonable as well (maybe that's controversial). But also, if the condition becomes "external variables as long as they're length 1", that makes the rules for understanding how to construct valid expressions in aes() more complex. Personally I find that trying to guess the user's intent through complex heuristics, despite good intentions, often leads to harder-to-use APIs in practice, because your users have to learn those heuristics. Better to establish a simple set of rules (which yes, must also be learned --- but hopefully are more predictable) and apply them consistently.

I don't understand what you mean by aes() being consistent with data masking, though.

I mean that aes() follows the data masking rules used throughout the tidyverse to resolve variable names either to variables in a "masked" data frame or in another environment. If someone learns how those rules work in other tidy functions, they will understand how aes() resolves names. If you add an exception to those rules for aes(), that makes name resolution for aes() more complicated, as one needs to learn a new exception for just one function.

@teunbrand
Copy link
Collaborator

I think the trade-off here isn't worth it. There are valid use-cases for using orphan variables that don't exist in a data.frame in the mapping. I'd also argue that the beginners aren't wrong per se; there are just cleaner ways that are recommended.

@teunbrand
Copy link
Collaborator

Just to give a brief example of a valid use-case:

ggplot(mtcars, aes(disp, mpg)) +
  geom_text(aes(label = rownames(mtcars)))

Arguably one should make the rownames a proper column in the data, but we don't live in a perfect world, so it is convenient to make it slightly less of a struggle.

@teunbrand
Copy link
Collaborator

Sorry Elio I'm going to close this issue as it is unlikely that ggplot2 is going to change this behaviour. Thanks for your ideas though!

@teunbrand teunbrand closed this as not planned Won't fix, can't repro, duplicate, stale Oct 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants