Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

summary.rspec rewrite #250

Merged
merged 18 commits into from
Aug 30, 2023
Merged

summary.rspec rewrite #250

merged 18 commits into from
Aug 30, 2023

Conversation

thomased
Copy link
Collaborator

@thomased thomased commented Jun 16, 2023

summary.rspec rejig (fixes #165)

  • Deprecate wlmin, wlmax, in favour of lim = c() argument.
  • Only calculate the variable(s) being returned, when using subset.

@thomased thomased mentioned this pull request Jun 16, 2023
4 tasks
@thomased
Copy link
Collaborator Author

thomased commented Jun 21, 2023

Current state

Slower when computing full suite of variables (as expected, owing to duplication of calcs)

library(microbenchmark)

data(sicalis)

microbenchmark(
  summary(sicalis),
  summary2.rspec(sicalis)
)
Unit: milliseconds
                    expr       min        lq      mean   median        uq       max neval cld
        summary(sicalis)  8.952917  9.063292  9.882924  9.18223  9.353521  17.08933   100  a 
 summary2.rspec(sicalis) 30.926543 31.235022 34.230939 31.66886 37.546584 117.12217   100   b
> 

Faster when computing subset (as expected, since that's all it's calculating)

microbenchmark(
  summary(sicalis, subset = 'S2'),
  summary2.rspec(sicalis, subset = 'S2')
)
Unit: microseconds
                                   expr      min        lq      mean   median        uq       max neval cld
        summary(sicalis, subset = "S2") 8895.251 9075.2505 9830.7253 9189.292 9503.1675 16579.334   100  a 
 summary2.rspec(sicalis, subset = "S2")  591.917  628.5425  659.8965  652.376  678.9175   811.125   100   b

Some further refactoring can probably bring down that difference for the full set

@thomased
Copy link
Collaborator Author

thomased commented Jul 4, 2023

Alright, I think I've gone as far as I can & have the patience for at the moment. So it now only calculates the specified variable(s) (and any dependent variables), which I think is good news just from a design perspective. E.g. it shouldn't matter if you don't have the wavelength range for 'green chroma' if you just want B3.

It also means some changes to its speed. It's now slower when calculating all variables — about ~half the speed:

> microbenchmark(
+   summary(sicalis),
+   summary_old.rspec(sicalis)
+ )
Unit: milliseconds
                       expr       min        lq     mean    median        uq      max neval cld
           summary(sicalis) 20.536917 20.844168 24.35726 21.463855 23.064938 149.1463   100  a 
 summary_old.rspec(sicalis)  8.915709  9.040937 10.95369  9.219043  9.444459 120.3357   100   b

But ~10x faster when calculating only a subset:

> microbenchmark(
+   summary(sicalis, subset = 'S2'),
+   summary_old.rspec(sicalis, subset = 'S2')
+ )
Unit: microseconds
                                      expr      min        lq     mean   median        uq       max neval cld
           summary(sicalis, subset = "S2")  768.876  807.7095  848.439  838.730  866.2295  1122.376   100  a 
 summary_old.rspec(sicalis, subset = "S2") 8912.167 9070.2920 9940.778 9255.084 9542.8550 18332.750   100   b

I didn't think it'd be quite so slow for the full-variable-set use case, but I might be missing something obvious. I'm sure there's more tweaking & tightening which can be done - feel free to shout if anything obvious leaps out at you.

@thomased thomased marked this pull request as ready for review July 4, 2023 22:55
@thomased thomased requested a review from Bisaloo July 4, 2023 23:48
Copy link
Collaborator

@Bisaloo Bisaloo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it! I don't have any major comments on code.

I feel like the structure and design is slightly more complex though. Is it okay if I push directly some code comments on this branch to make sure the intent and possible next steps are recorded?

Comment on lines 189 to 194
if (lambdamin > lim[1]) {
stop("Minimum specified wavelength is smaller than the range of spectral data. Check the lim() argument.")
}
if (lambdamax < lim[2]) {
stop("Maximum specified wavelength is larger than the range of spectral data. Check the lim() argument.")
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (lambdamin > lim[1]) {
stop("Minimum specified wavelength is smaller than the range of spectral data. Check the lim() argument.")
}
if (lambdamax < lim[2]) {
stop("Maximum specified wavelength is larger than the range of spectral data. Check the lim() argument.")
}
if (lambdamin > lim[1]) {
stop("Minimum specified wavelength is smaller than the range of spectral data. Check the lim argument.")
}
if (lambdamax < lim[2]) {
stop("Maximum specified wavelength is larger than the range of spectral data. Check the lim argument.")
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be useful to add tests for the deprecation warnings as well.

@thomased
Copy link
Collaborator Author

Absolutely — feel free to push/tweak anything. Thanks!

@Bisaloo
Copy link
Collaborator

Bisaloo commented Jul 17, 2023

I had another, in-depth, look and I'm now not sure more comments are needed. I've pushed a minor update but I think it already looks very good as it is 👍.

One problematic edge case I've found while trying to add deprecation tests:

summary(sicalis, wlmin = 400)

Error in if (lambdamax < lim[2]) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In summary.rspec(sicalis, wlmin = 400) :
The 'wlmin' and 'wlmax' arguments are deprecated as of v2.9.0,
and will be removed in future releases. Use the lim argument instead.

@thomased
Copy link
Collaborator Author

Thanks! Should be fixed with that next commit.

I'll merge this in shortly then, and start prepping a release soon too.

@thomased thomased merged commit ac699ef into master Aug 30, 2023
5 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Only compute required variables in summary.rspec()
2 participants