Contours for non-axis aligned grids #5911

Open: wants to merge 6 commits into main

teunbrand
Collaborator

This PR aims to fix #4320.

Briefly, it attempts to 'unrotate' the data, do the contour calculation on the unrotated grid, and then re-apply the rotation so the result is back in the original orientation. It does not implement the interpolation suggested in the issue.
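
As a minimal sketch of the 'unrotate, then rotate back' idea (not the code in this PR), the round trip can be written with a plain 2D rotation. The helper `rotate_xy()` is a hypothetical name used for illustration only, and the angle is assumed to be known here rather than estimated.

```r
# Hypothetical helper (not part of ggplot2): rotate (x, y) by `angle` radians
rotate_xy <- function(x, y, angle) {
  list(
    x = cos(angle) * x - sin(angle) * y,
    y = sin(angle) * x + cos(angle) * y
  )
}

angle <- 30 * pi / 180
grid  <- expand.grid(x = 1:4, y = 1:3)

# A rotated grid, using the same rotation as the demonstration in this PR
rot <- rotate_xy(grid$x, grid$y, angle)

# 'Unrotate' by applying the negative angle; the grid is axis-aligned again
unrot <- rotate_xy(rot$x, rot$y, -angle)
all.equal(unrot$x, as.numeric(grid$x))
all.equal(unrot$y, as.numeric(grid$y))

# Contour coordinates computed on `unrot` would then be passed through
# rotate_xy(..., angle) to place them back in the original coordinate system.
```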

Estimating the rotation of the data is the tricky bit and it is done as follows:

  • I guessed that in about 90% of cases, the data comes in an ordered fashion. When the data is ordered, two subsequent points in the data are frequently nearest neighbours (except when a new row/column begins). This PR exploits this by trying the most frequently occurring angle between subsequent points.
  • When the data is unordered, the approach above is invalid. The fallback for this case is to compute a convex hull which should give us the coordinates of the 4 corner points of the grid (in a perfect world with no numerical imprecision). The angle is then inferred from the longest segment on the hull.
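
The two strategies above can be sketched roughly as follows. This is an illustration under assumptions, not the implementation in this PR: the function names and the rounding tolerance (`digits`) are made up for the example, and the hull version relies on `grDevices::chull()`. Note that both estimates are only meaningful up to a quarter turn, since a grid looks the same after rotating by 90 degrees.

```r
# Hypothetical sketch: most frequent angle between subsequent points.
# Works when the data is ordered row by row (or column by column).
estimate_angle_ordered <- function(x, y, digits = 6) {
  # Round so that floating point noise does not split identical angles
  step <- round(atan2(diff(y), diff(x)), digits)
  vals <- unique(step)
  vals[which.max(tabulate(match(step, vals)))]
}

# Hypothetical sketch of the fallback: angle of the longest convex hull segment
estimate_angle_hull <- function(x, y) {
  hull <- grDevices::chull(x, y)   # indices of the points on the hull
  hull <- c(hull, hull[1])         # close the polygon
  dx <- diff(x[hull])
  dy <- diff(y[hull])
  longest <- which.max(dx^2 + dy^2)
  atan2(dy[longest], dx[longest])
}

angle <- 30 * pi / 180
grid  <- expand.grid(x = 1:6, y = 1:4)
rx <- cos(angle) * grid$x - sin(angle) * grid$y
ry <- sin(angle) * grid$x + cos(angle) * grid$y

# Ordered data: the most frequent step angle recovers the rotation
est_ordered <- estimate_angle_ordered(rx, ry)

# Shuffled data: fall back to the convex hull
set.seed(1)
idx <- sample(length(rx))
est_hull <- estimate_angle_hull(rx[idx], ry[idx])

# Difference to the true angle, modulo a quarter turn
quarter_turn_diff <- function(a, b) {
  d <- (a - b) %% (pi / 2)
  min(d, pi / 2 - d)
}
quarter_turn_diff(est_ordered, angle)
quarter_turn_diff(est_hull, angle)
```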

A demonstration:

devtools::load_all("~/packages/ggplot2")
#> ℹ Loading ggplot2
angle <- 30 * pi / 180

df <- 
  data.frame(
    x = as.vector(row(volcano)),
    y = as.vector(col(volcano)),
    z = as.vector(volcano)
  ) |>
  transform(
    x = cos(angle) * x - sin(angle) * y,
    y = sin(angle) * x + cos(angle) * y
  )

ggplot(df, aes(x, y, z = z)) +
  geom_contour_filled() +
  coord_equal()

Created on 2024-05-28 with reprex v2.1.0

@thomasp85
Member

Since the angle estimation happens in every case, could we consider adding an upper bound on the number of points used for the estimation? Contours are often used to declutter huge amounts of data, so I am a bit concerned about the performance implications of this PR.

@teunbrand
Collaborator Author

Alright, given the following plot of a 328 x 612 raster:

image

We benchmark for this PR:

devtools::load_all("~/packages/ggplot2")
#> ℹ Loading ggplot2

asia <-  system.file("extdata/asia.tif", package = "tidyterra")
asia <- as.matrix(terra::rast(asia), wide = TRUE)

# Make larger for good measure
asia <- cbind(rbind(-asia, asia), rbind(asia, -asia))
dim(asia)
#> [1] 328 612

angle <- 30 * pi / 180

rectangle <- 
  data.frame(
    x = as.vector(row(asia)),
    y = as.vector(col(asia)),
    z = as.vector(asia)
  )

rotated <- rectangle |>
  transform(
    x = cos(angle) * x - sin(angle) * y,
    y = sin(angle) * x + cos(angle) * y
  )

disordered <- rotated[sample(nrow(rotated)), ]

template <- ggplot(mapping = aes(x, y, z = z)) +
  geom_contour_filled() +
  coord_equal()

bench::mark(
  rectangle  = ggplot_build(template %+% rectangle),
  rotated    = ggplot_build(template %+% rotated),
  disordered = ggplot_build(template %+% disordered), 
  check = FALSE, min_iterations = 10
)

#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 rectangle     593ms    689ms      1.43     272MB     7.03
#> 2 rotated       633ms    724ms      1.37     306MB     7.13
#> 3 disordered    668ms    761ms      1.30     307MB     6.66

Created on 2024-07-11 with reprex v2.1.0

The same benchmark for the rectangle option in current main branch (can't compute the rotated options in main):

#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 rectangle     596ms    774ms      1.41     250MB     7.04

Doing the initial estimation of angles on head(x, 20) and head(y, 20) instead of the full coordinates gives the following benchmarks:

#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 rectangle     575ms    592ms      1.47     250MB     6.03
#> 2 rotated       624ms    627ms      1.46     284MB     6.42
#> 3 disordered    629ms    645ms      1.37     281MB     6.70

Conclusion: the base case of an axis-aligned rectangle isn't necessarily slower than on main. Looking only at the first 20 points for the initial angle estimate speeds it up a little bit.
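
For illustration, capping the initial estimate at the first 20 points could look roughly like the sketch below. The helper `most_frequent_angle()` is a hypothetical name and not the function used in this PR; the example only shows that, for ordered data, the capped estimate agrees with the one computed from all points.

```r
# Hypothetical helper: most frequent angle between subsequent points
most_frequent_angle <- function(x, y, digits = 6) {
  step <- round(atan2(diff(y), diff(x)), digits)
  vals <- unique(step)
  vals[which.max(tabulate(match(step, vals)))]
}

angle <- 30 * pi / 180
x0 <- as.vector(row(volcano))
y0 <- as.vector(col(volcano))
x <- cos(angle) * x0 - sin(angle) * y0
y <- sin(angle) * x0 + cos(angle) * y0

# Estimate from all points versus only the first 20 points
full   <- most_frequent_angle(x, y)
capped <- most_frequent_angle(head(x, 20), head(y, 20))
isTRUE(all.equal(full, capped))
```

For shuffled data the first 20 points carry no such ordering information, which is the case the convex hull fallback described earlier is meant to cover.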

@thomasp85
Member

I think the speedup is significant enough so that we should do it

@teunbrand
Collaborator Author

gotcha

Successfully merging this pull request may close these issues.

contour fails when coordinates are not aligned with axes
2 participants