Currently int_conformal_quantile() seems limited in that it:
- doesn't work for arbitrary model types
- doesn't use components of the workflow, etc. in producing the calibrated quantiles (e.g. the recipe, or the holdout samples if doing resampling, ...)
- may be inconsistent with the point estimate from the base model
Ideally, for any model/workflow fit that is set up to output quantiles (or intervals), int_conformal_quantile() would simply use the calibration data (or the available held-out data, as int_conformal_cv() does when resamples are set up) to adjust the quantiles output by the fitted workflow.
As described from 31:00 to 37:00 by Angelopoulos and Bates (https://www.youtube.com/watch?v=nql000Lu_iE&list=PLXs7Va5fWFZ72DTVcx4qIvny1xNrl68PK&index=1), the steps would then be:
1. (with parsnip / workflows) train an arbitrary model that is capable of optimizing a pinball loss function / outputting quantiles / intervals
2. pass the resulting object into (a generalized version of) int_conformal_quantile(), whose responsibility would be to calibrate the quantiles from the model/workflow (a similar set-up to the current one, just without the probably:::quant_train() step, so more similar to how the other int_conformal_*() functions work)
3. use the calibrated object to produce well-calibrated intervals on new data
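To make the calibration step concrete, here is a minimal base R sketch of conformalized quantile regression as I understand it from the talk. It assumes the fitted model has already produced lower and upper quantile predictions for the calibration set. The helper name `conformalize_quantiles()`, its arguments, and the toy data are all hypothetical and not part of probably.

```r
# Hypothetical helper (not a probably function): calibrate model-produced
# quantiles on a held-out calibration set.
conformalize_quantiles <- function(cal_lower, cal_upper, cal_truth, level = 0.90) {
  stopifnot(
    length(cal_lower) == length(cal_truth),
    length(cal_upper) == length(cal_truth)
  )
  n <- length(cal_truth)
  # Score: distance by which the truth falls outside the model's interval
  # (negative when the truth lies inside it).
  scores <- pmax(cal_lower - cal_truth, cal_truth - cal_upper)
  # Finite-sample corrected rank of the score to use as the adjustment.
  k <- ceiling((n + 1) * level)
  q_hat <- if (k > n) Inf else sort(scores)[k]
  # Widen (or shrink, if q_hat < 0) the model's quantiles on new data by q_hat.
  adjust <- function(new_lower, new_upper) {
    data.frame(.pred_lower = new_lower - q_hat, .pred_upper = new_upper + q_hat)
  }
  list(q_hat = q_hat, adjust = adjust)
}

# Toy data: noise grows with x. The "model" quantiles below are made up and
# deliberately too narrow (about +/- 1 sd), so calibration has to widen them.
set.seed(1)
n_cal <- 200
x_cal <- runif(n_cal, 0, 10)
y_cal <- 2 * x_cal + rnorm(n_cal, sd = 1 + 0.3 * x_cal)
cal_lower <- 2 * x_cal - (1 + 0.3 * x_cal)
cal_upper <- 2 * x_cal + (1 + 0.3 * x_cal)

cqr <- conformalize_quantiles(cal_lower, cal_upper, y_cal, level = 0.90)

# Adjust the (made-up) model quantiles at three new points.
x_new <- c(1, 5, 9)
new_int <- cqr$adjust(
  2 * x_new - (1 + 0.3 * x_new),
  2 * x_new + (1 + 0.3 * x_new)
)
new_int
```

The adjusted intervals keep the shape the model gave them (wider where x is large), and only a single constant `q_hat` is learned from the calibration data. No refitting of the model is needed.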
The adaptability of the intervals would then come from the model in the workflow being able to output quantiles / intervals (rather than from overriding the workflow and retraining for the interval). Even if the underlying workflow isn't that adaptive (e.g. say the user has a workflow for an lm model that just returns standard prediction intervals based on variance), the approach described above would likely do a slightly better job of factoring in the epistemic uncertainty in the model estimation than just using int_conformal_split(), because it would allow for wider intervals further from the data centroid, which doesn't happen with int_conformal_split().
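A small base R illustration of the lm point, under made-up data: calibrating lm's own prediction intervals keeps them wider away from the centroid, while absolute-residual split conformal gives the same width everywhere. This is only a sketch of the idea, not how int_conformal_split() or int_conformal_quantile() are implemented.

```r
set.seed(42)
n <- 60
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)
train <- dat[1:30, ]
cal <- dat[31:60, ]

fit <- lm(y ~ x, data = train)
level <- 0.90
# Finite-sample corrected rank used by both approaches below.
k <- ceiling((nrow(cal) + 1) * level)

# (a) Split conformal on absolute residuals: one constant half-width.
abs_res <- abs(cal$y - predict(fit, cal))
q_split <- sort(abs_res)[k]

# (b) Calibrate lm's own prediction intervals with the quantile-style score.
cal_pi <- predict(fit, cal, interval = "prediction", level = level)
scores <- pmax(cal_pi[, "lwr"] - cal$y, cal$y - cal_pi[, "upr"])
q_cqr <- sort(scores)[k]

# Compare interval widths at the training centroid and far away from it.
new <- data.frame(x = c(mean(train$x), mean(train$x) + 4))
new_pi <- predict(fit, new, interval = "prediction", level = level)

width_split <- rep(2 * q_split, nrow(new))
width_cqr <- unname((new_pi[, "upr"] + q_cqr) - (new_pi[, "lwr"] - q_cqr))

data.frame(x = new$x, width_split = width_split, width_cqr = width_cqr)
```

The `width_split` column is identical for both rows, while `width_cqr` is larger for the point far from the centroid, because lm's prediction interval already widens there and the calibration only shifts both bounds by a constant.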
brshallo changed the title from "Generalize conformal_infer_quantile()" to "Generalize int_conformal_quantile()" on Feb 16, 2024
I imagine this would be dependent on integrated support in parsnip for quantiles (tidymodels/parsnip#119, tidymodels/parsnip#465). Figured I may as well open an issue though.
Rough ex with a ranger workflow: