-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reviewing calibration vignette #111
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -61,7 +61,7 @@ mk_sim <- function(init_state = c(S = 99, I = 1, R = 0)) { | |
sir_simulator <- mk_sim() | ||
## `.phases = "during"` is important so that the number of observations matches the number of time steps | ||
sir_results = sir_simulator$report(.phases = "during") | ||
``` | ||
``` | ||
|
||
**fixme**: providing mismatched time series (e.g. by forgetting to specify `.phases = "during"` when generating simulated data) gives cryptic/confusing errors and behaviour (warnings about failure to recycle, objective function values equal to zero). (We can make a reprex for this by leaving out `.phases = "during"` above ... | ||
|
||
|
@@ -93,8 +93,26 @@ mk_calibrate(sir_simulator, | |
) | ||
``` | ||
|
||
--- | ||
|
||
IP: It would be good to explain what each argument above is and point out that the starting values are assumed to already be transformed when transformations are provided, _i.e._ that you're giving $\log(\beta) = 1$ as the starting value not $\beta = 1 \Rightarrow \log(\beta) = 0$. I actually think it would be more natural to have the user specify the starting value on the original parameter scale and then transform it if a transform is provided. | ||
|
||
IP: `mk_calibrate()` is not the most descriptive function name here... Based on the current functionality, something more like `update_to_calibrate()` is a bit more clear, though probably not perfect. We need something that gets at the fact that this function is setting up the simulator for calibration (and modifying it in place!!), but not actually performing the fit itself. | ||
|
||
IP: It may be obvious to some users why the log-likelihood expression above makes sense, but I think it merits a little explaining. | ||
|
||
--- | ||
|
||
Unlike typical R functions, this function modifies the `sim` object *in place* (!!) | ||
|
||
--- | ||
|
||
IP: This is scary and needs to be emphasized more!! Can you explain more what gets modified in the simulator object? | ||
|
||
IP: What's the rationale of not just having `mk_calibrate()` return a new simulator? I feel like this approach makes reproducibility harder. I kind of liked the approach in macpan1.5 where you specify a simulator (`flexmodel`), then you attach stuff to it to update it to a `model_to_calibrate`, and then after fitting you have a `model_calibrated` object. This sequence means that you can use the same base `flexmodel` to perform various independent calibrations, without inadvertently undermining yourself because you modified the underlying simulator and forgot... This is less of a concern in a production pipeline, but could certainly be a problem in an interactive work setting. | ||
|
||
--- | ||
|
||
A sanity check: make sure that the starting values give a reasonable-looking trajectory. | ||
|
||
```{r plot1} | ||
|
@@ -108,13 +126,36 @@ The simulated trajectory is **not** sensible in this case - the `log_beta` value | |
|
||
Setting $\log(\beta)=0$ instead gives us a trajectory that is still very unrealistic (the peak of our observed prevalence is only 16.7), but at least it's smooth. As it turns out this will be good enough, but finding appropriate starting values (based on external information and some trial and error) is often a significant part of a modeling workflow. | ||
|
||
--- | ||
|
||
IP: Wait, why did simulator return a jagged line at all? Isn't the simulator deterministic? | ||
|
||
--- | ||
|
||
```{r plot2} | ||
(sir_simulator$report(c(0,0)) |> | ||
filter(matrix == "state", row == "I") |> | ||
ggplot(aes(time, value)) + geom_line() | ||
) | ||
``` | ||
|
||
--- | ||
|
||
IP: What is the argument of `report()` above specifying? | ||
|
||
IP: This is a bit of a larger point that Steve may want to weigh in on, but the object-oriented approach here makes it a little more difficult to understand how to get help with various functions. For instance, I can't just do `?report` to get info on the arguments for the function above. Perhaps including a short section on the home page of `macpan2` quickly explaining that `macpan2` is object-oriented and how one can get help on object methods would be helpful. I know I had [something about this in the first Quickstart guide](https://canmod.github.io/macpan2/articles/quickstart.html#side-note-macpan2-is-object-oriented), but it didn't go as far as to explain how to get help, which I think is as simple as | ||
|
||
``` | ||
> class(sir_simulator) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes I completely agree @papsti . Something like this in the vignette makes it even more important to make sure that the method documentation is up-to-date. It also makes me want to write a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you write your own object-specific There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes that's a better idea. But not as S3 methods. It would be more object$help(). |
||
[1] "TMBSimulator" "TMBSimulationFormatter" | ||
[3] "Base" | ||
?TMBSimulator | ||
``` | ||
|
||
and then scrolling down to the Methods section of the help page. It's easy once you know to do it, but it's not going to be intuitive to many R users. Even better would be to write a help method that looks up the most specific class of the object and then displays the corresponding help page. | ||
|
||
--- | ||
|
||
Let's replace the starting value for `log_beta` with 0: | ||
|
||
```{r repl_param} | ||
|
@@ -129,7 +170,9 @@ sir_simulator$replace$params(c(0, 1), c("log_beta", "log_I_sd")) | |
fit <- sir_simulator$optimize$nlminb() | ||
``` | ||
|
||
Print the results of `nlminb` - **always check the value of the convergence code** (if it's not 0, then something *may* have gone wrong ...) | ||
IP: A bit more exposition on `nlminb()` and what it's doing + a reference to read? | ||
|
||
Print the results of `nlminb` - **always check the value of the convergence code** (if it's not 0, then something *may* have gone wrong ...) | ||
|
||
```{r check_fit} | ||
print(fit) | ||
|
@@ -162,12 +205,28 @@ class(ff) <- "TMB" | |
) | ||
``` | ||
|
||
IP: Explain the output here a bit more? What are the `outer mgc` values? | ||
|
||
These correspond to true values of 0.2, 1, so pretty close. | ||
|
||
**fixme**: can we handle back-transformation/parameter naming more nicely? `coef()`, `summary()` methods ... ?? | ||
|
||
IP: We had methods for back-transformation/parameter naming in macpan1.5 and I definitely appreciated them. If you're going to bother doing that for the use case here, then the UI of `mk_calibrate()` could also be simplified as follows: | ||
|
||
``` | ||
mk_calibrate(sir_simulator, | ||
data = data.frame(I_obs = sir_prevalence$obs_val), | ||
params = list(log_beta = 1, log_I_sd = 1), | ||
exprs = list(log_lik ~ dnorm(I_obs, I, I_sd)), | ||
) | ||
``` | ||
|
||
This specification also makes it clearer that the initial conditions are already transformed. | ||
|
||
The best-fit parameters are stored *internally*, so if we re-run the `$report()` method we will get information about the predicted best-fit trajectory: | ||
|
||
IP: By the above, do you mean that the line `fit <- sir_simulator$optimize$nlminb()` _also_ updated the simulator invisibly in place? If so, be explicit about what was modified (I assume just that the fitted parameter values were updated to the best fit values?) | ||
|
||
```{r plot_results} | ||
sim_vals <- (sir_simulator$report(.phases = "during") | ||
|> filter(matrix == "state", row == "I") | ||
|
@@ -192,7 +251,7 @@ mk_calibrate(sir_simulator, | |
sir_simulator$objective(log(c(1, 0.5, 1))) | ||
``` | ||
|
||
How does this look with our default parameters? (To save typing I'm writing a generic function that runs the model with a specified set of parameters, then plots the results along with the data.) | ||
How does this look with our initial parameters? (To save typing I'm writing a generic function that runs the model with a specified set of parameters, then plots the results along with the data.) | ||
|
||
**fixme**: some basic plotting functionality in `macpan2helpers` ? | ||
|
||
|
@@ -484,6 +543,8 @@ Likelihood profile CIs ... | |
|
||
This section explains what is going on under the hood in `macpan2helpers::mk_calibrate()` | ||
|
||
IP: this almost feels like it could be its own vignette for those that want to build the calibration up from scratch... | ||
|
||
### Step 0: recreate the simulator | ||
|
||
Since trying to add the same matrix to a simulator twice causes an error, we'll re-run the `mk_sim()` function to create a new instance of the simulator: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be good to point to the documentation that explains what
.phases = "during"
means hereThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as an aside: why is
.phases = "during"
not the default?