The rsample package provides functions to create different types of resamples and corresponding classes for their analysis. The goal is to have a modular set of methods that can be used for:
- resampling for estimating the sampling distribution of a statistic
- estimating model performance using a holdout set
The scope of rsample is to provide the basic building blocks for creating and analyzing resamples of a data set, but this package does not include code for modeling or calculating statistics. The Working with Resample Sets vignette gives a demonstration of how rsample tools can be used when building models.
Note that resampled data sets created by rsample are directly accessible in a resampling object but do not contain much overhead in memory. Since the original data is not modified, R does not make an automatic copy.
For example, creating 50 bootstraps of a data set does not create an object that is 50-fold larger in memory:
library(rsample)
library(mlbench)
data(LetterRecognition)
lobstr::obj_size(LetterRecognition)
#> 2,644,640 B
set.seed(35222)
boots <- bootstraps(LetterRecognition, times = 50)
lobstr::obj_size(boots)
#> 6,686,776 B
# Object size per resample
lobstr::obj_size(boots)/nrow(boots)
#> 133,735.5 B
# Fold increase is <<< 50
as.numeric(lobstr::obj_size(boots)/lobstr::obj_size(LetterRecognition))
#> [1] 2.528426
Created on 2022-02-28 by the reprex package (v2.0.1)
The memory usage for 50 bootstrap samples is less than 3-fold more than the original data set.
To install it, use:
install.packages("rsample")
And the development version from GitHub with:
# install.packages("devtools")
install_dev("rsample")
This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
-
For questions and discussions about tidymodels packages, modeling, and machine learning, please post on RStudio Community.
-
If you think you have encountered a bug, please submit an issue.
-
Either way, learn how to create and share a reprex (a minimal, reproducible example), to clearly communicate about your code.
-
Check out further details on contributing guidelines for tidymodels packages and how to get help.