Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RMSE and MAE sdmTMB_cv examples to cross validation vignette #322

Open
seanhardison1 opened this issue Mar 18, 2024 · 3 comments
Open

Comments

@seanhardison1
Copy link

seanhardison1 commented Mar 18, 2024

Hello! I'm a big fan of the sdmTMB_cv function, but I find the likelihood-based measures that are currently returned with the function to be a bit unintuitive. For a future iteration, it would be nice to have the function also return RMSE and MAE (by fold? averaged across folds?) in addition to the likelihood-based measures, or perhaps allow for user defined functions.

@seananderson
Copy link
Member

We can certainly get some common ones in there and/or allow a user-specified function. In the meantime, it's relatively easy to calculate whatever you need yourself with the cv_predicted column. The log likelihood of the left-out data has the advantage of being consistent with the what the model is trying to maximize in the data portion. It is similar to the log score (not sure what the classic paper is, but Draper often wrote about it) or the ELPD in Bayesian stats. But, yes, I can see the benefit of calculating other classic measures on the scale of the data.

library(sdmTMB)
library(dplyr)

m <- sdmTMB_cv(
  density ~ depth_scaled + depth_scaled2,
  data = pcod, mesh = make_mesh(pcod, c("X", "Y"), cutoff = 25),
  family = tweedie(link = "log"), k_folds = 3
)
#> Running fits with `future.apply()`.
#> Set a parallel `future::plan()` to use parallel processing.

names(m$data)
#>  [1] "year"          "X"             "Y"             "depth"        
#>  [5] "density"       "present"       "lat"           "lon"          
#>  [9] "depth_mean"    "depth_sd"      "depth_scaled"  "depth_scaled2"
#> [13] "cv_fold"       "cv_predicted"  "cv_loglik"

# see cv_predicted for the prediction within each left-out fold

d <- m$data

# RMSE:
sqrt(mean((d$density - d$cv_predicted)^2)) 
#> [1] 194.1635

# MAE:
mean(abs(d$density - d$cv_predicted))
#> [1] 49.19603

# by fold:
group_by(d, cv_fold) |> 
  summarize(
    rmse = sqrt(mean((density - cv_predicted)^2)),
    mae = mean(abs(density - cv_predicted))
  )
#> # A tibble: 3 × 3
#>   cv_fold  rmse   mae
#>     <int> <dbl> <dbl>
#> 1       1 176.   47.4
#> 2       2 270.   55.5
#> 3       3  87.8  44.5

Created on 2024-03-18 with reprex v2.1.0

@seanhardison1
Copy link
Author

Great, thank you Sean! Seeing how easy it is to do this with the current function, I think just adding this example to the CV vignette would be a fine alternative.

@seananderson seananderson changed the title RMSE and MAE for sdmTMB_cv Add RMSE and MAE sdmTMB_cv examples to cross validation vignette Mar 18, 2024
@seananderson
Copy link
Member

Good idea. I've re-opened the issue and renamed it.

@seananderson seananderson reopened this Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants