fit <- orsf(data = pbc_orsf,
formula = Surv(time, status) ~ . - id,
@@ -182,7 +183,7 @@ User-supplied out-of-bag
In some cases, you may want more control over how out-of-bag error is
estimated. For example, let’s use the Brier score from the SurvMetrics
package:
-
+
oobag_brier_surv <- function(y_mat, w_vec, s_vec){
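  # Full body of this evaluation function, following the article's own
  # definition. y_mat is a two-column matrix ('time', 'status'), w_vec
  # carries case weights (unused here), and s_vec holds predicted survival
  # probabilities at the prediction horizon.

  # use the Brier score from SurvMetrics if the package is available
  if(requireNamespace("SurvMetrics")){
    return(
      # output is a numeric vector of length 1
      as.numeric(
        SurvMetrics::Brier(
          object = Surv(time = y_mat[, 1], event = y_mat[, 2]),
          pre_sp = s_vec,
          # t_star in Brier() should match oobag_pred_horizon in orsf()
          t_star = 2000
        )
      )
    )
  }

  # fallback if SurvMetrics is not installed: a simple squared-error version
  mean( (y_mat[, 2] - (1 - s_vec))^2 )

}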
@@ -212,7 +213,7 @@ User-supplied out-of-bag
There are two ways to apply your own function to compute out-of-bag
error. First, you can apply your function to the out-of-bag survival
predictions that are stored in ‘aorsf’ objects, e.g.:
-
+
oobag_brier_surv(y_mat = pbc_orsf[,c('time', 'status')],
s_vec = fit$pred_oobag)
@@ -220,7 +221,7 @@ User-supplied out-of-bag
#> [1] 0.11869
Second, you can pass your function into orsf(), and it
will be used in place of Harrell’s C-statistic:
-
+
# instead of copy/pasting the modeling code and then modifying it,
# you can just use orsf_update.
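# A minimal sketch of that refit, following the article's example
# (assumes `fit` and `oobag_brier_surv` are defined as above):
fit_brier <- orsf_update(fit, oobag_fun = oobag_brier_surv)

# the out-of-bag statistic is now the Brier score rather than the C-index
fit_brier$eval_oobag$stat_values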
diff --git a/articles/pd.html b/articles/pd.html
index 604e6dc4..c4ff680a 100644
--- a/articles/pd.html
+++ b/articles/pd.html
@@ -35,7 +35,7 @@
aorsf
- 0.1.4.9001
+ 0.1.5
diff --git a/reference/orsf_update.html b/reference/orsf_update.html
index cde91f9b..d248149f 100644
--- a/reference/orsf_update.html
+++ b/reference/orsf_update.html
@@ -10,7 +10,7 @@
aorsf
- 0.1.4.9001
+ 0.1.5
diff --git a/reference/orsf_vi.html b/reference/orsf_vi.html
index dc28dc69..4c703887 100644
--- a/reference/orsf_vi.html
+++ b/reference/orsf_vi.html
@@ -12,7 +12,7 @@
aorsf
- 0.1.4.9001
+ 0.1.5
diff --git a/reference/orsf_vint.html b/reference/orsf_vint.html
index 8e379481..b4c5f0df 100644
--- a/reference/orsf_vint.html
+++ b/reference/orsf_vint.html
@@ -14,7 +14,7 @@
aorsf
- 0.1.4.9001
+ 0.1.5
diff --git a/reference/orsf_vs.html b/reference/orsf_vs.html
index 660d33f6..6e342d47 100644
--- a/reference/orsf_vs.html
+++ b/reference/orsf_vs.html
@@ -10,7 +10,7 @@
aorsf
- 0.1.4.9001
+ 0.1.5
diff --git a/reference/pbc_orsf.html b/reference/pbc_orsf.html
index b5dead67..84928c0b 100644
--- a/reference/pbc_orsf.html
+++ b/reference/pbc_orsf.html
@@ -12,7 +12,7 @@
aorsf
- 0.1.4.9001
+ 0.1.5
diff --git a/reference/penguins_orsf.html b/reference/penguins_orsf.html
index 14e7f541..93877cae 100644
--- a/reference/penguins_orsf.html
+++ b/reference/penguins_orsf.html
@@ -20,7 +20,7 @@
aorsf
- 0.1.4.9001
+ 0.1.5
diff --git a/reference/pred_spec_auto.html b/reference/pred_spec_auto.html
index a90e3558..a6ed4383 100644
--- a/reference/pred_spec_auto.html
+++ b/reference/pred_spec_auto.html
@@ -18,7 +18,7 @@
aorsf
- 0.1.4.9001
+ 0.1.5
diff --git a/reference/predict.ObliqueForest.html b/reference/predict.ObliqueForest.html
index 0040973c..53904b62 100644
--- a/reference/predict.ObliqueForest.html
+++ b/reference/predict.ObliqueForest.html
@@ -14,7 +14,7 @@
aorsf
- 0.1.4.9001
+ 0.1.5
diff --git a/reference/print.ObliqueForest.html b/reference/print.ObliqueForest.html
index 6934b4e7..a192e3a7 100644
--- a/reference/print.ObliqueForest.html
+++ b/reference/print.ObliqueForest.html
@@ -38,7 +38,7 @@
aorsf
- 0.1.4.9001
+ 0.1.5
diff --git a/reference/print.orsf_summary_uni.html b/reference/print.orsf_summary_uni.html
index 4cee11bc..8f7d7a22 100644
--- a/reference/print.orsf_summary_uni.html
+++ b/reference/print.orsf_summary_uni.html
@@ -10,7 +10,7 @@
aorsf
- 0.1.4.9001
+ 0.1.5
diff --git a/search.json b/search.json
index afee3c38..ab934912 100644
--- a/search.json
+++ b/search.json
@@ -1 +1 @@
-[{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":null,"dir":"","previous_headings":"","what":"Contributing to aorsf","title":"Contributing to aorsf","text":"Want contribute aorsf? Great! aorsf initially stable state development, great deal active subsequent development envisioned. outline propose change aorsf. detailed info contributing , tidyverse packages, please see development contributing guide.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"fixing-typos","dir":"","previous_headings":"","what":"Fixing typos","title":"Contributing to aorsf","text":"can fix typos, spelling mistakes, grammatical errors documentation directly using GitHub web interface, long changes made source file. generally means ’ll need edit roxygen2 comments .R, .Rd file. can find .R file generates .Rd reading comment first line.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"bigger-changes","dir":"","previous_headings":"","what":"Bigger changes","title":"Contributing to aorsf","text":"want make bigger change, ’s good idea first file issue make sure someone team agrees ’s needed. ’ve found bug, please file issue illustrates bug minimal reprex (also help write unit test, needed).","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"pull-request-process","dir":"","previous_headings":"Bigger changes","what":"Pull request process","title":"Contributing to aorsf","text":"Fork package clone onto computer. haven’t done , recommend using usethis::create_from_github(\"ropensci/aorsf\", fork = TRUE). Install development dependencies devtools::install_dev_deps(), make sure package passes R CMD check running devtools::check(). R CMD check doesn’t pass cleanly, ’s good idea ask help continuing. Create Git branch pull request (PR). recommend using usethis::pr_init(\"brief-description--change\"). Make changes, commit git, create PR running usethis::pr_push(), following prompts browser. title PR briefly describe change. body PR contain Fixes #issue-number. user-facing changes, add bullet top NEWS.md (.e. just first header). Follow style described https://style.tidyverse.org/news.html.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"code-style","dir":"","previous_headings":"Bigger changes","what":"Code style","title":"Contributing to aorsf","text":"New code follow tidyverse style guide. can use styler package apply styles, please don’t restyle code nothing PR. use roxygen2, Markdown syntax, documentation. use testthat unit tests. Contributions test cases included easier accept.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Contributing to aorsf","text":"Please note aorsf project released Contributor Code Conduct. contributing project agree abide terms.","code":""},{"path":"https://bcjaeger.github.io/aorsf/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2022 aorsf authors (Byron C. Jaeger, Sawyer Welden, Nicholas M. 
Pajewski) Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"background","dir":"Articles","previous_headings":"","what":"Background","title":"Introduction to aorsf","text":"oblique random forest (RF) extension traditional (axis-based) RF. Instead using single variable split data grow new branches, trees oblique RF use weighted combination multiple variables.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"oblique-rfs-for-survival-classification-and-regression","dir":"Articles","previous_headings":"","what":"Oblique RFs for survival, classification, and regression","title":"Introduction to aorsf","text":"purpose aorsf (‘’ short accelerated) provide unifying framework fit oblique RFs can scale adequately large data sets. fastest algorithms available package used default often equivalent prediction accuracy computational approaches. center piece aorsf orsf() function. initial versions aorsf, orsf() function fit oblique random survival forests, now allows classification, regression, survival forests. (may introduce orf() function future name orsf() misleading users.) classification, fit oblique RF predict penguin species using penguin data magnificent palmerpenguins R package regression, use data predict bill length penguins: personal favorite oblique survival RF accelerated Cox regression great combination prediction accuracy computational efficiency (see JCGS paper). , predict mortality risk following diagnosis primary biliary cirrhosis: may notice first input aorsf data. design choice makes easier use orsf pipes (.e., %>% |>). instance,","code":"# An oblique classification RF penguin_fit <- orsf(data = penguins_orsf, formula = species ~ .) penguin_fit #> ---------- Oblique random classification forest #> #> Linear combinations: Accelerated Logistic regression #> N observations: 333 #> N classes: 3 #> N trees: 500 #> N predictors total: 7 #> N predictors per node: 3 #> Average leaves per tree: 5.542 #> Min observations in leaf: 5 #> OOB stat value: 1.00 #> OOB stat type: AUC-ROC #> Variable importance: anova #> #> ----------------------------------------- # An oblique regression RF bill_fit <- orsf(data = penguins_orsf, formula = bill_length_mm ~ .) bill_fit #> ---------- Oblique random regression forest #> #> Linear combinations: Accelerated Linear regression #> N observations: 333 #> N trees: 500 #> N predictors total: 7 #> N predictors per node: 3 #> Average leaves per tree: 49.958 #> Min observations in leaf: 5 #> OOB stat value: 0.81 #> OOB stat type: RSQ #> Variable importance: anova #> #> ----------------------------------------- # An oblique survival RF pbc_fit <- orsf(data = pbc_orsf, n_tree = 5, formula = Surv(time, status) ~ . 
- id) pbc_fit #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated Cox regression #> N observations: 276 #> N events: 111 #> N trees: 5 #> N predictors total: 17 #> N predictors per node: 5 #> Average leaves per tree: 21.6 #> Min observations in leaf: 5 #> Min events in leaf: 1 #> OOB stat value: 0.77 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> #> ----------------------------------------- library(dplyr) pbc_fit <- pbc_orsf |> select(-id) |> orsf(formula = Surv(time, status) ~ ., n_tree = 5)"},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"interpretation","dir":"Articles","previous_headings":"","what":"Interpretation","title":"Introduction to aorsf","text":"aorsf includes several functions dedicated interpretation ORSFs, estimation partial dependence variable importance.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"variable-importance","dir":"Articles","previous_headings":"Interpretation","what":"Variable importance","title":"Introduction to aorsf","text":"multiple methods compute variable importance, can applied type oblique forest. compute negation importance, ORSF multiplies coefficient variable -1 re-computes --sample (sometimes referred --bag) accuracy ORSF model. can also compute variable importance using permutation, classical approach noises predictor assigned resulting degradation prediction accuracy importance predictor. faster alternative permutation negation importance ANOVA importance, computes proportion times variable obtains low p-value (p < 0.01) forest grown.","code":"orsf_vi_negate(pbc_fit) #> bili age copper ast sex #> 0.1468851774 0.0606952129 0.0246435580 0.0224269123 0.0175587328 #> trig alk.phos protime edema chol #> 0.0096895007 0.0093198869 0.0086039712 0.0006382134 -0.0015687436 #> ascites platelet hepato spiders trt #> -0.0060269468 -0.0102280228 -0.0108549805 -0.0113883544 -0.0201827916 #> stage albumin #> -0.0221462608 -0.0224072750 orsf_vi_permute(penguin_fit) #> bill_length_mm flipper_length_mm bill_depth_mm island #> 0.1724983056 0.1024126291 0.0751508005 0.0676077927 #> body_mass_g sex year #> 0.0626576714 0.0186787401 0.0009286133 orsf_vi_anova(bill_fit) #> species sex island flipper_length_mm #> 0.34861430 0.21055730 0.11626929 0.08843136 #> body_mass_g bill_depth_mm year #> 0.07642887 0.06077348 0.01475293"},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"partial-dependence-pd","dir":"Articles","previous_headings":"Interpretation","what":"Partial dependence (PD)","title":"Introduction to aorsf","text":"Partial dependence (PD) shows expected prediction model function single predictor multiple predictors. expectation marginalized values predictors, giving something like multivariable adjusted estimate model’s prediction. PD, see vignette","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"individual-conditional-expectations-ice","dir":"Articles","previous_headings":"Interpretation","what":"Individual conditional expectations (ICE)","title":"Introduction to aorsf","text":"Unlike partial dependence, shows expected prediction function one multiple predictors, individual conditional expectations (ICE) show prediction individual observation function predictor. 
ICE, see vignette","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"what-about-the-original-orsf","dir":"Articles","previous_headings":"","what":"What about the original ORSF?","title":"Introduction to aorsf","text":"original ORSF (.e., obliqueRSF) used glmnet find linear combinations inputs. aorsf allows users implement approach using orsf_control_survival(method = 'net') function: net forests fit lot faster original ORSF function obliqueRSF. However, net forests still much slower cph ones.","code":"orsf_net <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id, control = orsf_control_survival(method = 'net'))"},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"aorsf-and-other-machine-learning-software","dir":"Articles","previous_headings":"","what":"aorsf and other machine learning software","title":"Introduction to aorsf","text":"unique feature aorsf fast algorithms fit ORSF ensembles. RLT obliqueRSF fit oblique random survival forests, aorsf faster. ranger randomForestSRC fit survival forests, neither package supports oblique splitting. obliqueRF fits oblique random forests classification regression, survival. PPforest fits oblique random forests classification survival. Note: default prediction behavior aorsf models produce predicted risk specific prediction horizon, default ranger randomForestSRC. think change future, computing time independent predictions aorsf helpful.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"learning-more","dir":"Articles","previous_headings":"","what":"Learning more","title":"Introduction to aorsf","text":"aorsf began dedicated package oblique random survival forests, papers published far focused survival analysis risk prediction. However, routines regression classification oblique RFs aorsf high overlap survival ones. See orsf details oblique random survival forests. see JCGS paper details algorithms used specifically aorsf.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"go-faster","dir":"Articles","previous_headings":"","what":"Go faster","title":"Tips to speed up computation","text":"Analyses can slow crawl models need hours run. article find tricks prevent bottleneck using orsf().","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"dont-specify-a-control","dir":"Articles","previous_headings":"","what":"Don’t specify a control","title":"Tips to speed up computation","text":"default control orsf() NULL , unspecified, orsf() pick fastest possible control depending type forest grown. default control run-time compared approaches can striking. example:","code":"time_fast <- system.time( expr = orsf(pbc_orsf, formula = time+status~. -id, n_tree = 5) ) time_net <- system.time( expr = orsf(pbc_orsf, formula = time+status~. -id, control = orsf_control_survival(method = 'net'), n_tree = 5) ) # unspecified control is much faster time_net['elapsed'] / time_fast['elapsed'] #> elapsed #> 45.80952"},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"use-n_thread","dir":"Articles","previous_headings":"","what":"Use n_thread","title":"Tips to speed up computation","text":"n_thread argument uses multi-threading run aorsf functions parallel possible. know many threads want, e.g. want exactly 5, set n_thread = 5. aren’t sure many threads available want use feasible amount, using n_thread = 0 (default) tells aorsf . Note: sometimes multi-threading possible. 
example, R single threaded language, multi-threading applied orsf() needs call R functions C++, occurs customized R function used find linear combination variables compute prediction accuracy.","code":"# automatically pick number of threads based on amount available orsf(pbc_orsf, formula = time+status~. -id, n_tree = 5, n_thread = 0)"},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"do-less","dir":"Articles","previous_headings":"","what":"Do less","title":"Tips to speed up computation","text":"inputs orsf() can adjusted make run faster: set n_retry 0 set oobag_pred_type 'none' set importance 'none' increase split_min_events, split_min_obs, leaf_min_events, leaf_min_obs make trees stop growing sooner increase split_min_stat enforce strict requirements growing deeper trees. Applying tips: modifying inputs can make orsf() run faster, can also impact prediction accuracy.","code":"orsf(pbc_orsf, formula = time+status~., n_thread = 0, n_tree = 5, n_retry = 0, oobag_pred_type = 'none', importance = 'none', split_min_events = 20, leaf_min_events = 10, split_min_stat = 10)"},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"show-progress","dir":"Articles","previous_headings":"","what":"Show progress","title":"Tips to speed up computation","text":"Setting verbose_progress = TRUE doesn’t make anything run faster, can help make feel like things running less slow.","code":"verbose_fit <- orsf(pbc_orsf, formula = time+status~. -id, n_tree = 5, verbose_progress = TRUE) #> Growing trees: 100%. #> Computing predictions: 100%."},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"dont-wait--estimate","dir":"Articles","previous_headings":"","what":"Don’t wait. Estimate!","title":"Tips to speed up computation","text":"Instead running model hoping fast, can estimate long specification model take using no_fit = TRUE call orsf().","code":"fit_spec <- orsf(pbc_orsf, formula = time+status~. -id, control = orsf_control_survival(method = 'net'), n_tree = 2000, no_fit = TRUE) # how much time it takes to estimate training time: system.time( time_est <- orsf_time_to_train(fit_spec, n_tree_subset = 5) ) #> user system elapsed #> 0.267 0.001 0.267 # the estimated training time: time_est #> Time difference of 106.6964 secs"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"out-of-bag-data","dir":"Articles","previous_headings":"","what":"Out-of-bag data","title":"Out-of-bag predictions and evaluation","text":"random forests, tree grown bootstrapped version training set. bootstrap samples selected replacement, bootstrapped training set contains two-thirds instances original training set. ‘--bag’ data instances bootstrapped training set.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"out-of-bag-predictions-and-error","dir":"Articles","previous_headings":"","what":"Out-of-bag predictions and error","title":"Out-of-bag predictions and evaluation","text":"tree random forest can make predictions --bag data, --bag predictions can aggregated make ensemble --bag prediction. Since --bag data used grow tree, accuracy ensemble --bag predictions approximate generalization error random forest. --bag prediction error plays central role routines estimate variable importance, e.g. negation importance. fit oblique random survival forest plot distribution ensemble --bag predictions. 
Next, let’s check --bag accuracy fit: --bag estimate Harrell’s C-index (default method evaluate --bag predictions) 0.7419135.","code":"fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id, oobag_pred_type = 'surv', n_tree = 5, oobag_pred_horizon = 2000) hist(fit$pred_oobag, main = 'Out-of-bag survival predictions at t=2,000') # what function is used to evaluate out-of-bag predictions? fit$eval_oobag$stat_type #> [1] \"Harrell's C-index\" # what is the output from this function? fit$eval_oobag$stat_values #> [,1] #> [1,] 0.7419135"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"monitoring-out-of-bag-error","dir":"Articles","previous_headings":"","what":"Monitoring out-of-bag error","title":"Out-of-bag predictions and evaluation","text":"--bag data set contains one-third training set, --bag error estimate usually converges stable value trees added forest. want monitor convergence --bag error oblique random survival forest, can set oobag_eval_every compute --bag error every oobag_eval_every tree. example, let’s compute --bag error fitting tree forest 50 trees: general, least 500 trees recommended random forest fit. ’re just using 10 illustration.","code":"fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id, n_tree = 20, tree_seeds = 2, oobag_pred_type = 'surv', oobag_pred_horizon = 2000, oobag_eval_every = 1) plot( x = seq(1, 20, by = 1), y = fit$eval_oobag$stat_values, main = 'Out-of-bag C-statistic computed after each new tree is grown.', xlab = 'Number of trees grown', ylab = fit$eval_oobag$stat_type ) lines(x=seq(1, 20), y = fit$eval_oobag$stat_values)"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"user-supplied-out-of-bag-evaluation-functions","dir":"Articles","previous_headings":"","what":"User-supplied out-of-bag evaluation functions","title":"Out-of-bag predictions and evaluation","text":"cases, may want control --bag error estimated. example, let’s use Brier score SurvMetrics package: two ways apply function compute --bag error. First, can apply function --bag survival predictions stored ‘aorsf’ objects, e.g: Second, can pass function orsf(), used place Harrell’s C-statistic:","code":"oobag_brier_surv <- function(y_mat, w_vec, s_vec){ # use if SurvMetrics is available if(requireNamespace(\"SurvMetrics\")){ return( # output is numeric vector of length 1 as.numeric( SurvMetrics::Brier( object = Surv(time = y_mat[, 1], event = y_mat[, 2]), pre_sp = s_vec, # t_star in Brier() should match oob_pred_horizon in orsf() t_star = 2000 ) ) ) } # if not available, use a dummy version mean( (y_mat[,2] - (1-s_vec))^2 ) } oobag_brier_surv(y_mat = pbc_orsf[,c('time', 'status')], s_vec = fit$pred_oobag) #> Loading required namespace: SurvMetrics #> [1] 0.11869 # instead of copy/pasting the modeling code and then modifying it, # you can just use orsf_update. 
fit_brier <- orsf_update(fit, oobag_fun = oobag_brier_surv) plot( x = seq(1, 20, by = 1), y = fit_brier$eval_oobag$stat_values, main = 'Out-of-bag error computed after each new tree is grown.', sub = 'For the Brier score, lower values indicate more accurate predictions', xlab = 'Number of trees grown', ylab = \"Brier score\" ) lines(x=seq(1, 20), y = fit_brier$eval_oobag$stat_values)"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"specific-instructions-on-user-supplied-functions","dir":"Articles","previous_headings":"User-supplied out-of-bag evaluation functions","what":"Specific instructions on user-supplied functions","title":"Out-of-bag predictions and evaluation","text":"use oobag_fun note following: oobag_fun three inputs: y_mat, w_vec, s_vec survival trees, y_mat two column matrix first column named ‘time’ second named ‘status’. classification trees, y_mat matrix number columns = number distinct classes outcome. regression, y_mat matrix one column. s_vec numeric vector containing predictions oobag_fun return numeric output length 1","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"notes","dir":"Articles","previous_headings":"","what":"Notes","title":"Out-of-bag predictions and evaluation","text":"evaluating --bag error: oobag_pred_horizon input orsf() determines prediction horizon --bag predictions. prediction horizon needs specified evaluate prediction accuracy cases, examples . sure check case using functions, , sure oobag_pred_horizon matches prediction horizon used custom function. functions expect predicted risk (.e., 1 - predicted survival), others expect predicted survival.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"partial-dependence-pd","dir":"Articles","previous_headings":"","what":"Partial dependence (PD)","title":"PD and ICE curves with ORSF","text":"Partial dependence (PD) shows expected prediction model function single predictor multiple predictors. expectation marginalized values predictors, giving something like multivariable adjusted estimate model’s prediction. can compute PD individual conditional expectation (ICE) three ways: using -bag predictions training data. -bag PD indicates relationships model learned training. helpful goal interpret model. using --bag predictions training data. --bag PD indicates relationships model learned training using --bag data simulates application model new data. helpful want test model’s reliability fairness new data don’t access large testing set. using predictions new set data. New data PD shows model predicts outcomes observations seen. helpful want test model’s reliability fairness.","code":"library(aorsf) library(ggplot2)"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"classification","dir":"Articles","previous_headings":"Partial dependence (PD)","what":"Classification","title":"PD and ICE curves with ORSF","text":"Begin fitting oblique classification random forest: Compute PD using --bag data flipper_length_mm = c(190, 210). Note predicted probabilities returned class probabilities mean column sum 1 take sum class specific value pred_spec variables. example, isn’t case median predicted probability!","code":"set.seed(329) index_train <- sample(nrow(penguins_orsf), 150) penguins_orsf_train <- penguins_orsf[index_train, ] penguins_orsf_test <- penguins_orsf[-index_train, ] fit_clsf <- orsf(data = penguins_orsf_train, formula = species ~ .) 
pred_spec <- list(flipper_length_mm = c(190, 210)) pd_oob <- orsf_pd_oob(fit_clsf, pred_spec = pred_spec) pd_oob #> Key: #> class flipper_length_mm mean lwr medn upr #> #> 1: Adelie 190 0.6182417 0.206899034 0.75537171 0.9796439 #> 2: Adelie 210 0.4348386 0.019519733 0.56802082 0.8620694 #> 3: Chinstrap 190 0.2114905 0.018420139 0.15561560 0.7174734 #> 4: Chinstrap 210 0.1806274 0.020409141 0.09928047 0.6990198 #> 5: Gentoo 190 0.1702678 0.001281382 0.02830728 0.5733438 #> 6: Gentoo 210 0.3845340 0.072260715 0.20258335 0.9519486 sum(pd_oob[flipper_length_mm == 190, mean]) #> [1] 1 sum(pd_oob[flipper_length_mm == 190, medn]) #> [1] 0.9392946"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"regression","dir":"Articles","previous_headings":"Partial dependence (PD)","what":"Regression","title":"PD and ICE curves with ORSF","text":"Begin fitting oblique regression random forest: Compute PD using new data flipper_length_mm = c(190, 210). can also let pred_spec_auto pick reasonable values like : default, combinations variables used. However, can also look variables one one, separately, like : can also bypass bells whistles using data.frame pred_spec. (Just make sure request values exist training data.)","code":"set.seed(329) index_train <- sample(nrow(penguins_orsf), 150) penguins_orsf_train <- penguins_orsf[index_train, ] penguins_orsf_test <- penguins_orsf[-index_train, ] fit_regr <- orsf(data = penguins_orsf_train, formula = bill_length_mm ~ .) pred_spec <- list(flipper_length_mm = c(190, 210)) pd_new <- orsf_pd_new(fit_regr, pred_spec = pred_spec, new_data = penguins_orsf_test) pd_new #> flipper_length_mm mean lwr medn upr #> #> 1: 190 42.96571 37.09805 43.69769 48.72301 #> 2: 210 45.66012 40.50693 46.31577 51.65163 pred_spec = pred_spec_auto(species, island, body_mass_g) pd_new <- orsf_pd_new(fit_regr, pred_spec = pred_spec, new_data = penguins_orsf_test) pd_new #> species island body_mass_g mean lwr medn upr #> #> 1: Adelie Biscoe 3200 40.31374 37.24373 40.31967 44.22824 #> 2: Chinstrap Biscoe 3200 45.10582 42.63342 45.10859 47.60119 #> 3: Gentoo Biscoe 3200 42.81649 40.19221 42.55664 46.84035 #> 4: Adelie Dream 3200 40.16219 36.95895 40.34633 43.90681 #> 5: Chinstrap Dream 3200 46.21778 43.53954 45.90929 49.19173 #> 6: Gentoo Dream 3200 42.60465 39.89647 42.63520 46.28769 #> 7: Adelie Torgersen 3200 39.91652 36.80227 39.79806 43.68842 #> 8: Chinstrap Torgersen 3200 44.27807 41.95470 44.40742 46.68848 #> 9: Gentoo Torgersen 3200 42.09510 39.49863 41.80049 45.81833 #> 10: Adelie Biscoe 3550 40.77971 38.04027 40.59561 44.57505 #> 11: Chinstrap Biscoe 3550 45.81304 43.52102 45.73116 48.36366 #> 12: Gentoo Biscoe 3550 43.31233 40.77355 43.03077 47.22936 #> 13: Adelie Dream 3550 40.77741 38.07399 40.78175 44.37273 #> 14: Chinstrap Dream 3550 47.30926 44.80493 46.77540 50.47092 #> 15: Gentoo Dream 3550 43.26955 40.86119 43.16204 46.89190 #> 16: Adelie Torgersen 3550 40.25780 37.35251 40.07871 44.04576 #> 17: Chinstrap Torgersen 3550 44.77911 42.60161 44.81944 47.14986 #> 18: Gentoo Torgersen 3550 42.49520 39.95866 42.14160 46.26237 #> 19: Adelie Biscoe 3975 41.61744 38.94515 41.36634 45.38752 #> 20: Chinstrap Biscoe 3975 46.59363 44.59970 46.44923 49.11457 #> 21: Gentoo Biscoe 3975 44.07857 41.60792 43.74562 47.85109 #> 22: Adelie Dream 3975 41.50511 39.06187 41.24741 45.13027 #> 23: Chinstrap Dream 3975 48.14978 45.87390 47.54867 51.50683 #> 24: Gentoo Dream 3975 44.01928 41.70577 43.84099 47.50470 #> 25: Adelie Torgersen 3975 40.94764 38.12519 40.66759 44.73689 #> 26: 
Chinstrap Torgersen 3975 45.44820 43.49986 45.44036 47.63243 #> 27: Gentoo Torgersen 3975 43.13791 40.70628 42.70627 46.87306 #> 28: Adelie Biscoe 4700 42.93914 40.48463 42.44768 46.81756 #> 29: Chinstrap Biscoe 4700 47.18517 45.40866 47.07739 49.55747 #> 30: Gentoo Biscoe 4700 45.32541 43.08173 44.93498 49.23391 #> 31: Adelie Dream 4700 42.73806 40.44229 42.22226 46.49936 #> 32: Chinstrap Dream 4700 48.37278 46.34335 48.00781 51.18955 #> 33: Gentoo Dream 4700 45.09132 42.88328 44.79530 48.82180 #> 34: Adelie Torgersen 4700 42.09349 39.72074 41.56168 45.68838 #> 35: Chinstrap Torgersen 4700 46.16807 44.38410 46.09525 48.35127 #> 36: Gentoo Torgersen 4700 44.31621 42.18968 43.81773 47.98024 #> 37: Adelie Biscoe 5300 43.89769 41.43335 43.28504 48.10892 #> 38: Chinstrap Biscoe 5300 47.53721 45.66038 47.52770 49.88701 #> 39: Gentoo Biscoe 5300 46.16115 43.81722 45.59309 50.57469 #> 40: Adelie Dream 5300 43.59846 41.25825 43.24518 47.46193 #> 41: Chinstrap Dream 5300 48.48139 46.36282 48.25679 51.02996 #> 42: Gentoo Dream 5300 45.91819 43.62832 45.54110 49.91622 #> 43: Adelie Torgersen 5300 42.92879 40.66576 42.31072 46.76406 #> 44: Chinstrap Torgersen 5300 46.59576 44.80400 46.49196 49.03906 #> 45: Gentoo Torgersen 5300 45.11384 42.95190 44.51289 49.27629 #> species island body_mass_g mean lwr medn upr pd_new <- orsf_pd_new(fit_regr, expand_grid = FALSE, pred_spec = pred_spec, new_data = penguins_orsf_test) pd_new #> variable value level mean lwr medn upr #> #> 1: species NA Adelie 41.90271 37.10417 41.51723 48.51478 #> 2: species NA Chinstrap 47.11314 42.40419 46.96478 51.51392 #> 3: species NA Gentoo 44.37038 39.87306 43.89889 51.21635 #> 4: island NA Biscoe 44.21332 37.22711 45.27862 51.21635 #> 5: island NA Dream 44.43354 37.01471 45.57261 51.51392 #> 6: island NA Torgersen 43.29539 37.01513 44.26924 49.84391 #> 7: body_mass_g 3200 42.84625 37.03978 43.95991 49.19173 #> 8: body_mass_g 3550 43.53326 37.56730 44.43756 50.47092 #> 9: body_mass_g 3975 44.30431 38.31567 45.22089 51.50683 #> 10: body_mass_g 4700 45.22525 39.88199 46.34680 51.18955 #> 11: body_mass_g 5300 45.91412 40.84742 46.95327 51.48851 custom_pred_spec <- data.frame(species = 'Adelie', island = 'Biscoe') pd_new <- orsf_pd_new(fit_regr, pred_spec = custom_pred_spec, new_data = penguins_orsf_test) pd_new #> species island mean lwr medn upr #> #> 1: Adelie Biscoe 41.98024 37.22711 41.65252 48.51478"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"survival","dir":"Articles","previous_headings":"Partial dependence (PD)","what":"Survival","title":"PD and ICE curves with ORSF","text":"Begin fitting oblique survival random forest: Compute PD using -bag data bili = c(1,2,3,4,5): don’t specific values variable mind, let pred_spec_auto pick : Specify pred_horizon get PD value:","code":"set.seed(329) index_train <- sample(nrow(pbc_orsf), 150) pbc_orsf_train <- pbc_orsf[index_train, ] pbc_orsf_test <- pbc_orsf[-index_train, ] fit_surv <- orsf(data = pbc_orsf_train, formula = Surv(time, status) ~ . 
- id, oobag_pred_horizon = 365.25 * 5) pd_train <- orsf_pd_inb(fit_surv, pred_spec = list(bili = 1:5)) pd_train #> pred_horizon bili mean lwr medn upr #> #> 1: 1826.25 1 0.2575450 0.02234786 0.1334170 0.8917942 #> 2: 1826.25 2 0.3130469 0.06853733 0.1906695 0.9203372 #> 3: 1826.25 3 0.3711963 0.11409793 0.2582027 0.9416791 #> 4: 1826.25 4 0.4248968 0.15648381 0.3334579 0.9591581 #> 5: 1826.25 5 0.4671699 0.20123406 0.3855137 0.9655296 pd_train <- orsf_pd_inb(fit_surv, pred_spec_auto(bili)) pd_train #> pred_horizon bili mean lwr medn upr #> #> 1: 1826.25 0.590 0.2493753 0.02035041 0.1250263 0.8823385 #> 2: 1826.25 0.725 0.2517103 0.02060111 0.1281814 0.8836536 #> 3: 1826.25 1.500 0.2807082 0.03964900 0.1601715 0.9040617 #> 4: 1826.25 3.500 0.3968251 0.13431288 0.2934565 0.9501230 #> 5: 1826.25 7.210 0.5352155 0.27869513 0.4658256 0.9782084 pd_train <- orsf_pd_inb(fit_surv, pred_spec_auto(bili), pred_horizon = seq(500, 3000, by = 500)) pd_train #> pred_horizon bili mean lwr medn upr #> #> 1: 500 0.590 0.06217164 0.0004433990 0.008765301 0.5918852 #> 2: 1000 0.590 0.14282695 0.0057937418 0.056509484 0.7381953 #> 3: 1500 0.590 0.20944972 0.0136094784 0.092379507 0.8577223 #> 4: 2000 0.590 0.26917477 0.0230476894 0.146421502 0.8918696 #> 5: 2500 0.590 0.31901518 0.0631155452 0.203673185 0.9034059 #> 6: 3000 0.590 0.39244000 0.0911566314 0.302726475 0.9239494 #> 7: 500 0.725 0.06287876 0.0004462367 0.009001904 0.5980510 #> 8: 1000 0.725 0.14409310 0.0063321712 0.056833294 0.7448126 #> 9: 1500 0.725 0.21143724 0.0140736894 0.093685200 0.8597396 #> 10: 2000 0.725 0.27150368 0.0235448705 0.147022224 0.8940497 #> 11: 2500 0.725 0.32014805 0.0626303822 0.203946002 0.9073003 #> 12: 3000 0.725 0.39518173 0.0911457406 0.308428469 0.9252028 #> 13: 500 1.500 0.06712295 0.0012717884 0.011028398 0.6240769 #> 14: 1000 1.500 0.15802582 0.0114789623 0.068332010 0.7683888 #> 15: 1500 1.500 0.23407183 0.0287320952 0.117289745 0.8789647 #> 16: 2000 1.500 0.30235436 0.0467927208 0.180096425 0.9143235 #> 17: 2500 1.500 0.35354874 0.0845866747 0.238415966 0.9265099 #> 18: 3000 1.500 0.43604287 0.1311103304 0.348078730 0.9438196 #> 19: 500 3.500 0.08677320 0.0052087533 0.028244374 0.6741102 #> 20: 1000 3.500 0.22427808 0.0519179775 0.139857107 0.8277541 #> 21: 1500 3.500 0.32788654 0.0901983241 0.217982772 0.9371150 #> 22: 2000 3.500 0.41708208 0.1445328597 0.313224605 0.9566091 #> 23: 2500 3.500 0.49334883 0.2195110942 0.402932569 0.9636221 #> 24: 3000 3.500 0.56094391 0.2647541788 0.503509668 0.9734948 #> 25: 500 7.210 0.12591911 0.0220920570 0.063283130 0.7522611 #> 26: 1000 7.210 0.32642477 0.1353851175 0.259731888 0.8879218 #> 27: 1500 7.210 0.46409472 0.2181840827 0.387142510 0.9700903 #> 28: 2000 7.210 0.55116942 0.2912654769 0.484118150 0.9811496 #> 29: 2500 7.210 0.62008114 0.3709845684 0.568822502 0.9844945 #> 30: 3000 7.210 0.68030697 0.4247511750 0.646009789 0.9888637 #> pred_horizon bili mean lwr medn upr"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"one-variable-moving-horizon","dir":"Articles","previous_headings":"Partial dependence (PD)","what":"One variable, moving horizon","title":"PD and ICE curves with ORSF","text":"next sections, update orsf_fit include data pbc_orsf instead just training sample: effect predictor varies time? Partial dependence can show . inspection, can see males higher risk females difference risk grows time. can also seen viewing ratio expected risk time: get view PD number variables training data, use orsf_summarize_uni(). 
function computes --bag PD important n_variables returns nicely formatted view output: ‘summary’ object can converted data.table downstream plotting tables.","code":"# a rare case of modify_in_place = TRUE orsf_update(fit_surv, data = pbc_orsf, modify_in_place = TRUE) fit_surv #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated Cox regression #> N observations: 276 #> N events: 111 #> N trees: 500 #> N predictors total: 17 #> N predictors per node: 5 #> Average leaves per tree: 21.038 #> Min observations in leaf: 5 #> Min events in leaf: 1 #> OOB stat value: 0.84 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> #> ----------------------------------------- pd_sex_tv <- orsf_pd_oob(fit_surv, pred_spec = pred_spec_auto(sex), pred_horizon = seq(365, 365*5)) ggplot(pd_sex_tv) + aes(x = pred_horizon, y = mean, color = sex) + geom_line() + labs(x = 'Time since baseline', y = 'Expected risk') library(data.table) ratio_tv <- pd_sex_tv[ , .(ratio = mean[sex == 'm'] / mean[sex == 'f']), by = pred_horizon ] ggplot(ratio_tv, aes(x = pred_horizon, y = ratio)) + geom_line(color = 'grey') + geom_smooth(color = 'black', se = FALSE) + labs(x = 'time since baseline', y = 'ratio in expected risk for males versus females') pd_smry <- orsf_summarize_uni(fit_surv, n_variables = 4) pd_smry #> #> -- ascites (VI Rank: 1) ------------------------- #> #> |---------------- Risk ----------------| #> Value Mean Median 25th % 75th % #> #> 0 0.3083328 0.1985589 0.06581247 0.5241336 #> 1 0.4702396 0.3975953 0.27481738 0.6564321 #> #> -- bili (VI Rank: 2) ---------------------------- #> #> |---------------- Risk ----------------| #> Value Mean Median 25th % 75th % #> #> 0.60 0.2356543 0.1536301 0.05872720 0.3719578 #> 0.80 0.2398021 0.1609720 0.06167673 0.3776136 #> 1.40 0.2613612 0.1809950 0.07893386 0.4064484 #> 3.52 0.3702763 0.3118827 0.17050712 0.5447088 #> 7.25 0.4780580 0.4406202 0.29442977 0.6434075 #> #> -- edema (VI Rank: 3) --------------------------- #> #> |---------------- Risk ----------------| #> Value Mean Median 25th % 75th % #> #> 0 0.3035731 0.1840849 0.06509174 0.5228237 #> 0.5 0.3558716 0.2649457 0.11132293 0.5831396 #> 1 0.4693915 0.3961470 0.28211662 0.6331870 #> #> -- copper (VI Rank: 4) -------------------------- #> #> |---------------- Risk ----------------| #> Value Mean Median 25th % 75th % #> #> 25.5 0.2632768 0.1622871 0.05581251 0.4308234 #> 42.8 0.2707739 0.1703028 0.05887747 0.4418590 #> 74.0 0.2908707 0.1940176 0.07155433 0.4768302 #> 129 0.3444258 0.2651729 0.11918406 0.5574967 #> 214 0.4245218 0.3577346 0.21408331 0.6238041 #> #> Predicted risk at time t = 1826.25 for top 4 predictors head(as.data.table(pd_smry)) #> variable importance Value Mean Median 25th % 75th % #> #> 1: ascites 0.4960630 0 0.3083328 0.1985589 0.06581247 0.5241336 #> 2: ascites 0.4960630 1 0.4702396 0.3975953 0.27481738 0.6564321 #> 3: bili 0.4160074 0.60 0.2356543 0.1536301 0.05872720 0.3719578 #> 4: bili 0.4160074 0.80 0.2398021 0.1609720 0.06167673 0.3776136 #> 5: bili 0.4160074 1.40 0.2613612 0.1809950 0.07893386 0.4064484 #> 6: bili 0.4160074 3.52 0.3702763 0.3118827 0.17050712 0.5447088 #> pred_horizon level #> #> 1: 1826.25 0 #> 2: 1826.25 1 #> 3: 1826.25 #> 4: 1826.25 #> 5: 1826.25 #> 6: 1826.25 "},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"multiple-variables-jointly","dir":"Articles","previous_headings":"Partial dependence (PD)","what":"Multiple variables, jointly","title":"PD and ICE curves with ORSF","text":"Partial dependence can 
show expected value model’s predictions function specific predictor, function multiple predictors. instance, can estimate predicted risk joint function bili, edema, trt: inspection, model’s predictions indicate slightly lower risk placebo group, seem change much different values bili edema. clear increase predicted risk higher levels edema higher levels bili slope predicted risk function bili appears highest among patients edema 0.5. effect bili modified edema 0.5? quick sanity check coxph suggests .","code":"pred_spec = pred_spec_auto(bili, edema, trt) pd_bili_edema <- orsf_pd_oob(fit_surv, pred_spec) ggplot(pd_bili_edema) + aes(x = bili, y = medn, col = trt, linetype = edema) + geom_line() + labs(y = 'Expected predicted risk') library(survival) pbc_orsf$edema_05 <- ifelse(pbc_orsf$edema == '0.5', 'yes', 'no') fit_cph <- coxph(Surv(time,status) ~ edema_05 * bili, data = pbc_orsf) anova(fit_cph) #> Analysis of Deviance Table #> Cox model: response is Surv(time, status) #> Terms added sequentially (first to last) #> #> loglik Chisq Df Pr(>|Chi|) #> NULL -550.19 #> edema_05 -546.83 6.7248 1 0.009508 ** #> bili -513.59 66.4689 1 3.555e-16 *** #> edema_05:bili -510.54 6.1112 1 0.013433 * #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"find-interactions-using-pd","dir":"Articles","previous_headings":"Partial dependence (PD)","what":"Find interactions using PD","title":"PD and ICE curves with ORSF","text":"Random forests good using interactions, less good telling . Use orsf_vint() apply method variable interaction scoring PD described Greenwell et al (2018). can take little lots predictors, seems work best continuous continuous interactions. Interactions categorical variables sometimes - - scored. scores include partial dependence values can pull plot: use sanity check coxph see interactions detected using standard test: Note: Caution warranted interpreting statistical hypotheses motivated data tested . 
Results like p-values interaction shown interpreted exploratory.","code":"# use just the continuous variables preds <- names(fit_surv$get_means()) vint_scores <- orsf_vint(fit_surv, predictors = preds) vint_scores #> interaction score pd_values #> #> 1: albumin..protime 1.15973071 #> 2: copper..protime 0.79587419 #> 3: bili..chol 0.74163213 #> 4: age..bili 0.74097713 #> 5: bili..copper 0.71610872 #> 6: bili..albumin 0.67849272 #> 7: bili..protime 0.59576252 #> 8: albumin..ast 0.59439149 #> 9: bili..platelet 0.56627946 #> 10: ast..protime 0.56220910 #> 11: albumin..copper 0.54057277 #> 12: bili..trig 0.52794450 #> 13: copper..trig 0.50661291 #> 14: age..protime 0.45818900 #> 15: age..ast 0.44410913 #> 16: age..platelet 0.42607794 #> 17: albumin..platelet 0.41293884 #> 18: chol..albumin 0.39547725 #> 19: platelet..protime 0.38674364 #> 20: age..copper 0.36230121 #> 21: copper..ast 0.35089611 #> 22: trig..protime 0.29339926 #> 23: bili..alk.phos 0.25729691 #> 24: chol..protime 0.24424042 #> 25: copper..alk.phos 0.22156162 #> 26: bili..ast 0.21483757 #> 27: chol..trig 0.20737852 #> 28: trig..platelet 0.18819009 #> 29: age..alk.phos 0.17844523 #> 30: chol..copper 0.17025610 #> 31: copper..platelet 0.16009542 #> 32: age..albumin 0.15186211 #> 33: alk.phos..trig 0.14212275 #> 34: age..trig 0.12185330 #> 35: albumin..alk.phos 0.12061152 #> 36: chol..ast 0.10767371 #> 37: chol..alk.phos 0.10712377 #> 38: ast..platelet 0.09157413 #> 39: alk.phos..protime 0.08277287 #> 40: alk.phos..ast 0.08062752 #> 41: ast..trig 0.07157470 #> 42: age..chol 0.05564449 #> 43: chol..platelet 0.04813670 #> 44: alk.phos..platelet 0.04760897 #> 45: albumin..trig 0.04689324 #> interaction score pd_values # top scoring interaction pd_top <- vint_scores$pd_values[[1]] # center pd values so it's easier to see the interaction effect pd_top[, mean := mean - mean[1], by = var_2_value] ggplot(pd_top) + aes(x = var_1_value, y = mean, color = factor(var_2_value), group = factor(var_2_value)) + geom_line() + labs(x = \"albumin\", y = \"predicted mortality (centered)\", color = \"protime\") # test the top score (expect strong interaction) fit_cph <- coxph(Surv(time,status) ~ albumin * protime, data = pbc_orsf) anova(fit_cph) #> Analysis of Deviance Table #> Cox model: response is Surv(time, status) #> Terms added sequentially (first to last) #> #> loglik Chisq Df Pr(>|Chi|) #> NULL -550.19 #> albumin -526.29 47.801 1 4.717e-12 *** #> protime -514.89 22.806 1 1.792e-06 *** #> albumin:protime -511.76 6.252 1 0.01241 * #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"individual-conditional-expectations-ice","dir":"Articles","previous_headings":"","what":"Individual conditional expectations (ICE)","title":"PD and ICE curves with ORSF","text":"Unlike partial dependence, shows expected prediction function one multiple predictors, individual conditional expectations (ICE) show prediction individual observation function predictor.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"classification-1","dir":"Articles","previous_headings":"Individual conditional expectations (ICE)","what":"Classification","title":"PD and ICE curves with ORSF","text":"Compute ICE using --bag data flipper_length_mm = c(190, 210). two identifiers output: id_variable identifier current value variable(s) data. redundant one variable, helpful multiple variables. id_row identifier observation original data. 
Note predicted probabilities returned class observation data. Predicted probabilities given observation given variable value sum 1. example,","code":"pred_spec <- list(flipper_length_mm = c(190, 210)) ice_oob <- orsf_ice_oob(fit_clsf, pred_spec = pred_spec) ice_oob #> Key: #> id_variable id_row class flipper_length_mm pred #> #> 1: 1 1 Adelie 190 0.92045213 #> 2: 1 2 Adelie 190 0.80427932 #> 3: 1 3 Adelie 190 0.84342550 #> 4: 1 4 Adelie 190 0.93514694 #> 5: 1 5 Adelie 190 0.97172229 #> --- #> 896: 2 146 Gentoo 210 0.25779089 #> 897: 2 147 Gentoo 210 0.04806888 #> 898: 2 148 Gentoo 210 0.07926342 #> 899: 2 149 Gentoo 210 0.84597108 #> 900: 2 150 Gentoo 210 0.10191162 ice_oob %>% .[flipper_length_mm == 190] %>% .[id_row == 1] %>% .[['pred']] %>% sum() #> [1] 1"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"regression-1","dir":"Articles","previous_headings":"Individual conditional expectations (ICE)","what":"Regression","title":"PD and ICE curves with ORSF","text":"Compute ICE using new data flipper_length_mm = c(190, 210). can also let pred_spec_auto pick reasonable values like : default, combinations variables used. However, can also look variables one one, separately, like : can also bypass bells whistles using data.frame pred_spec. (Just make sure request values exist training data.)","code":"pred_spec <- list(flipper_length_mm = c(190, 210)) ice_new <- orsf_ice_new(fit_regr, pred_spec = pred_spec, new_data = penguins_orsf_test) ice_new #> id_variable id_row flipper_length_mm pred #> #> 1: 1 1 190 37.94483 #> 2: 1 2 190 37.61595 #> 3: 1 3 190 37.53681 #> 4: 1 4 190 39.49476 #> 5: 1 5 190 38.95635 #> --- #> 362: 2 179 210 51.80471 #> 363: 2 180 210 47.27183 #> 364: 2 181 210 47.05031 #> 365: 2 182 210 50.39028 #> 366: 2 183 210 48.44774 pred_spec = pred_spec_auto(species, island, body_mass_g) ice_new <- orsf_ice_new(fit_regr, pred_spec = pred_spec, new_data = penguins_orsf_test) ice_new #> id_variable id_row species island body_mass_g pred #> #> 1: 1 1 Adelie Biscoe 3200 37.78339 #> 2: 1 2 Adelie Biscoe 3200 37.73273 #> 3: 1 3 Adelie Biscoe 3200 37.71248 #> 4: 1 4 Adelie Biscoe 3200 40.25782 #> 5: 1 5 Adelie Biscoe 3200 40.04074 #> --- #> 8231: 45 179 Gentoo Torgersen 5300 46.14559 #> 8232: 45 180 Gentoo Torgersen 5300 43.98050 #> 8233: 45 181 Gentoo Torgersen 5300 44.59837 #> 8234: 45 182 Gentoo Torgersen 5300 44.85146 #> 8235: 45 183 Gentoo Torgersen 5300 44.23710 ice_new <- orsf_ice_new(fit_regr, expand_grid = FALSE, pred_spec = pred_spec, new_data = penguins_orsf_test) ice_new #> id_variable id_row variable value level pred #> #> 1: 1 1 species NA Adelie 37.74136 #> 2: 1 2 species NA Adelie 37.42367 #> 3: 1 3 species NA Adelie 37.04598 #> 4: 1 4 species NA Adelie 39.89602 #> 5: 1 5 species NA Adelie 39.14848 #> --- #> 2009: 5 179 body_mass_g 5300 51.50196 #> 2010: 5 180 body_mass_g 5300 47.27055 #> 2011: 5 181 body_mass_g 5300 48.34064 #> 2012: 5 182 body_mass_g 5300 48.75828 #> 2013: 5 183 body_mass_g 5300 48.11020 custom_pred_spec <- data.frame(species = 'Adelie', island = 'Biscoe') ice_new <- orsf_ice_new(fit_regr, pred_spec = custom_pred_spec, new_data = penguins_orsf_test) ice_new #> id_variable id_row species island pred #> #> 1: 1 1 Adelie Biscoe 38.52327 #> 2: 1 2 Adelie Biscoe 38.32073 #> 3: 1 3 Adelie Biscoe 37.71248 #> 4: 1 4 Adelie Biscoe 41.68380 #> 5: 1 5 Adelie Biscoe 40.91140 #> --- #> 179: 1 179 Adelie Biscoe 43.09493 #> 180: 1 180 Adelie Biscoe 38.79455 #> 181: 1 181 Adelie Biscoe 39.37734 #> 182: 1 182 Adelie Biscoe 40.71952 #> 183: 1 183 Adelie 
Biscoe 39.34501"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"survival-1","dir":"Articles","previous_headings":"Individual conditional expectations (ICE)","what":"Survival","title":"PD and ICE curves with ORSF","text":"Compute ICE using -bag data bili = c(1,2,3,4,5): don’t specific values variable mind, let pred_spec_auto pick : Specify pred_horizon get ICE value: Multi-prediction horizon ice comes minimal extra computational cost. Use fine grid time values assess whether predictors time-varying effects.","code":"ice_train <- orsf_ice_inb(fit_surv, pred_spec = list(bili = 1:5)) ice_train #> id_variable id_row pred_horizon bili pred #> #> 1: 1 1 1826.25 1 0.9015162 #> 2: 1 2 1826.25 1 0.1019426 #> 3: 1 3 1826.25 1 0.6821646 #> 4: 1 4 1826.25 1 0.3623411 #> 5: 1 5 1826.25 1 0.1374271 #> --- #> 1376: 5 272 1826.25 5 0.2650957 #> 1377: 5 273 1826.25 5 0.3065318 #> 1378: 5 274 1826.25 5 0.3503776 #> 1379: 5 275 1826.25 5 0.1652897 #> 1380: 5 276 1826.25 5 0.3549165 ice_train <- orsf_ice_inb(fit_surv, pred_spec_auto(bili)) ice_train #> id_variable id_row pred_horizon bili pred #> #> 1: 1 1 1826.25 0.60 0.89210440 #> 2: 1 2 1826.25 0.60 0.09186876 #> 3: 1 3 1826.25 0.60 0.65503431 #> 4: 1 4 1826.25 0.60 0.34622748 #> 5: 1 5 1826.25 0.60 0.13310425 #> --- #> 1376: 5 272 1826.25 7.25 0.31258148 #> 1377: 5 273 1826.25 7.25 0.35478676 #> 1378: 5 274 1826.25 7.25 0.41559176 #> 1379: 5 275 1826.25 7.25 0.25301890 #> 1380: 5 276 1826.25 7.25 0.44533769 ice_train <- orsf_ice_inb(fit_surv, pred_spec_auto(bili), pred_horizon = seq(500, 3000, by = 500)) ice_train #> id_variable id_row pred_horizon bili pred #> #> 1: 1 1 500 0.60 0.5949598 #> 2: 1 1 1000 0.60 0.7652137 #> 3: 1 1 1500 0.60 0.8751746 #> 4: 1 1 2000 0.60 0.9057135 #> 5: 1 1 2500 0.60 0.9231915 #> --- #> 8276: 5 276 1000 7.25 0.2111306 #> 8277: 5 276 1500 7.25 0.3642278 #> 8278: 5 276 2000 7.25 0.4850492 #> 8279: 5 276 2500 7.25 0.5720362 #> 8280: 5 276 3000 7.25 0.6206786"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"visualizing-ice-curves","dir":"Articles","previous_headings":"Individual conditional expectations (ICE)","what":"Visualizing ICE curves","title":"PD and ICE curves with ORSF","text":"Inspecting ICE curves observation can help identify whether heterogeneity model’s predictions. .e., effect variable follow pattern data, groups variable impacts risk differently? going turn boundary checking orsf_ice_oob setting boundary_checks = FALSE, allow generate ICE curves go beyond 90th percentile bili. plots, helpful scale ICE data. subtract initial value predicted risk (.e., bili = 1) observation’s conditional expectation values. , Every curve start 0 plot shows change predicted risk function bili. Now can visualize curves. inspection figure, individual slopes cluster around overall trend - Good! small number individual slopes appear flat. 
may helpful investigate .","code":"pred_spec <- list(bili = seq(1, 10, length.out = 25)) ice_oob <- orsf_ice_oob(fit_surv, pred_spec, boundary_checks = FALSE) ice_oob #> id_variable id_row pred_horizon bili pred #> #> 1: 1 1 1826.25 1 0.8790861 #> 2: 1 2 1826.25 1 0.8132035 #> 3: 1 3 1826.25 1 0.6240238 #> 4: 1 4 1826.25 1 0.7461603 #> 5: 1 5 1826.25 1 0.5754091 #> --- #> 6896: 25 272 1826.25 10 0.7018976 #> 6897: 25 273 1826.25 10 0.4606246 #> 6898: 25 274 1826.25 10 0.3347082 #> 6899: 25 275 1826.25 10 0.6046024 #> 6900: 25 276 1826.25 10 0.2789017 ice_oob[, pred_subtract := rep(pred[id_variable==1], times=25)] ice_oob[, pred := pred - pred_subtract] ggplot(ice_oob, aes(x = bili, y = pred, group = id_row)) + geom_line(alpha = 0.15) + labs(y = 'Change in predicted risk') + geom_smooth(se = FALSE, aes(group = 1))"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"limitations-of-pd","dir":"Articles","previous_headings":"Individual conditional expectations (ICE)","what":"Limitations of PD","title":"PD and ICE curves with ORSF","text":"Partial dependence number known limitations assumptions users aware (see Hooker, 2021). particular, partial dependence less intuitive >2 predictors examined jointly, assumed feature(s) partial dependence computed correlated features (likely true many cases). Accumulated local effect plots can used (see ) case feature independence valid assumption.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"references","dir":"Articles","previous_headings":"Individual conditional expectations (ICE)","what":"References","title":"PD and ICE curves with ORSF","text":"Hooker, Giles, Mentch, Lucas, Zhou, Siyu (2021). “Unrestricted permutation forces extrapolation: variable importance requires least one model, free variable importance.” Statistics Computing, 31, 1-16.","code":""},{"path":"https://bcjaeger.github.io/aorsf/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Byron Jaeger. Author, maintainer. Nicholas Pajewski. Contributor. Sawyer Welden. Contributor. Christopher Jackson. Reviewer. Marvin Wright. Reviewer. Lukas Burk. Reviewer.","code":""},{"path":"https://bcjaeger.github.io/aorsf/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Jaeger et al. (2022). aorsf: R package supervised learning using oblique random survival forest. Journal Open Source Software, 7(77), 4705. https://doi.org/10.21105/joss.04705. Jaeger BC, Welden S, Lenoir K, Speiser JL, Segar MW, Pandey , Pajewski NM. Accelerated interpretable oblique random survival forests. Journal Computational Graphical Statistics. 2023 Aug 3:1-6. Jaeger BC, Long DL, Long DM, Sims M, Szychowski JM, Min YI, Mcclure LA, Howard G, Simon N. Oblique Random Survival Forests. Annals Applied Statistics. 13(3): 1847-1883. URL https://doi.org/10.1214/19-AOAS1261 DOI: 10.1214/19-AOAS1261","code":"@Article{, title = {aorsf: An R package for supervised learning using the oblique random survival forest}, author = {Byron C. Jaeger and Sawyer Welden and Kristin Lenoir and Nicholas M. Pajewski}, journal = {Journal of Open Source Software}, year = {2022}, volume = {7}, number = {77}, pages = {4705}, url = {https://doi.org/10.21105/joss.04705}, } @Article{, title = {Accelerated and interpretable oblique random survival forests}, author = {Byron C. Jaeger and Sawyer Welden and Kristin Lenoir and Jaime L. Speiser and Matthew W. Segar and Ambarish Pandey and Nicholas M. 
Pajewski}, journal = {Journal of Computational and Graphical Statistics}, year = {2023}, url = {https://doi.org/10.1080/10618600.2023.2231048}, } @Article{, title = {Oblique Random Survival Forests}, author = {Byron C. Jaeger and D. Leann Long and Dustin M. Long and Mario Sims and Jeff M. Szychowski and Yuan-I Min and Leslie A. Mcclure and George Howard and Noah Simon}, journal = {Annals of Applied Statistics}, year = {2019}, volume = {13}, number = {3}, pages = {1847--1883}, url = {https://doi.org/10.1214/19-AOAS1261}, }"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"aorsf-","dir":"","previous_headings":"","what":"Accelerated Oblique Random Forests","title":"Accelerated Oblique Random Forests","text":"Fit, interpret, make predictions oblique random forests (RFs).","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"why-aorsf","dir":"","previous_headings":"","what":"Why aorsf?","title":"Accelerated Oblique Random Forests","text":"Fast versatile tools oblique RFs.1 Accurate predictions.2 Intuitive design formula based interface. Extensive input checks informative error messages. Compatible tidymodels mlr3","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Accelerated Oblique Random Forests","text":"can install aorsf CRAN using can install development version aorsf GitHub :","code":"install.packages(\"aorsf\") # install.packages(\"remotes\") remotes::install_github(\"ropensci/aorsf\")"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"get-started","dir":"","previous_headings":"","what":"Get started","title":"Accelerated Oblique Random Forests","text":"aorsf fits several types oblique RFs orsf() function, including classification, regression, survival RFs. classification, fit oblique RF predict penguin species using penguin data magnificent palmerpenguins R package regression, use data predict bill length penguins: personal favorite oblique survival RF accelerated Cox regression first type oblique RF aorsf provided (see JCGS paper). , use predict mortality risk following diagnosis primary biliary cirrhosis:","code":"library(aorsf) library(tidyverse) # An oblique classification RF penguin_fit <- orsf(data = penguins_orsf, n_tree = 5, formula = species ~ .) penguin_fit #> ---------- Oblique random classification forest #> #> Linear combinations: Accelerated Logistic regression #> N observations: 333 #> N classes: 3 #> N trees: 5 #> N predictors total: 7 #> N predictors per node: 3 #> Average leaves per tree: 6 #> Min observations in leaf: 5 #> OOB stat value: 0.99 #> OOB stat type: AUC-ROC #> Variable importance: anova #> #> ----------------------------------------- # An oblique regression RF bill_fit <- orsf(data = penguins_orsf, n_tree = 5, formula = bill_length_mm ~ .) bill_fit #> ---------- Oblique random regression forest #> #> Linear combinations: Accelerated Linear regression #> N observations: 333 #> N trees: 5 #> N predictors total: 7 #> N predictors per node: 3 #> Average leaves per tree: 42.6 #> Min observations in leaf: 5 #> OOB stat value: 0.76 #> OOB stat type: RSQ #> Variable importance: anova #> #> ----------------------------------------- # An oblique survival RF pbc_fit <- orsf(data = pbc_orsf, n_tree = 5, formula = Surv(time, status) ~ . 
- id) pbc_fit #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated Cox regression #> N observations: 276 #> N events: 111 #> N trees: 5 #> N predictors total: 17 #> N predictors per node: 5 #> Average leaves per tree: 20.4 #> Min observations in leaf: 5 #> Min events in leaf: 1 #> OOB stat value: 0.79 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> #> -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"what-does-oblique-mean","dir":"","previous_headings":"","what":"What does “oblique” mean?","title":"Accelerated Oblique Random Forests","text":"Decision trees grown splitting set training data non-overlapping subsets, goal similarity within new subsets . subsets created single predictor, decision tree axis-based subset boundaries perpendicular axis predictor. linear combinations (.e., weighted sum) variables used instead single variable, tree oblique boundaries neither parallel perpendicular axis. Figure: Decision trees classification axis-based splitting (left) oblique splitting (right). Cases orange squares; controls purple circles. trees partition predictor space defined variables X1 X2, oblique splits better job separating two classes. , difference translate real data, impact random forests comprising hundreds axis-based oblique trees? demonstrate using penguin data.3 also use function make several plots: also use grid points plotting decision surfaces: use orsf mtry=1 fit axis-based trees: Next use orsf_update copy modify original model, expanding fit oblique tree using mtry=2 instead mtry=1, include 500 trees instead 1: now need visualize decision surfaces using predictions four fits: Figure: Axis-based oblique decision surfaces single tree ensemble 500 trees. Axis-based trees boundaries perpendicular predictor axes, whereas oblique trees can boundaries neither parallel perpendicular predictor axes. Axis-based forests tend ‘step-function’ decision boundaries, oblique forests tend smooth decision boundaries.","code":"plot_decision_surface <- function(predictions, title, grid){ # this is not a general function for plotting # decision surfaces. It just helps to minimize # copying and pasting of code. 
class_preds <- bind_cols(grid, predictions) %>% pivot_longer(cols = c(Adelie, Chinstrap, Gentoo)) %>% group_by(flipper_length_mm, bill_length_mm) %>% arrange(desc(value)) %>% slice(1) cols <- c(\"darkorange\", \"purple\", \"cyan4\") ggplot(class_preds, aes(bill_length_mm, flipper_length_mm)) + geom_contour_filled(aes(z = value, fill = name), alpha = .25) + geom_point(data = penguins_orsf, aes(color = species, shape = species), alpha = 0.5) + scale_color_manual(values = cols) + scale_fill_manual(values = cols) + labs(x = \"Bill length, mm\", y = \"Flipper length, mm\") + theme_minimal() + scale_x_continuous(expand = c(0,0)) + scale_y_continuous(expand = c(0,0)) + theme(panel.grid = element_blank(), panel.border = element_rect(fill = NA), legend.position = '') + labs(title = title) } grid <- expand_grid( flipper_length_mm = seq(min(penguins_orsf$flipper_length_mm), max(penguins_orsf$flipper_length_mm), len = 200), bill_length_mm = seq(min(penguins_orsf$bill_length_mm), max(penguins_orsf$bill_length_mm), len = 200) ) fit_axis_tree <- penguins_orsf %>% orsf(species ~ bill_length_mm + flipper_length_mm, n_tree = 1, mtry = 1, tree_seeds = 106760) fit_axis_forest <- fit_axis_tree %>% orsf_update(n_tree = 500) fit_oblique_tree <- fit_axis_tree %>% orsf_update(mtry = 2) fit_oblique_forest <- fit_oblique_tree %>% orsf_update(n_tree = 500) preds <- list(fit_axis_tree, fit_axis_forest, fit_oblique_tree, fit_oblique_forest) %>% map(predict, new_data = grid, pred_type = 'prob') titles <- c(\"Axis-based tree\", \"Axis-based forest\", \"Oblique tree\", \"Oblique forest\") plots <- map2(preds, titles, plot_decision_surface, grid = grid)"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"variable-importance","dir":"","previous_headings":"","what":"Variable importance","title":"Accelerated Oblique Random Forests","text":"importance individual predictor variables can estimated three ways using aorsf can used type oblique RF. Also, variable importance functions always return named character vector negation2: variable assessed separately multiplying variable’s coefficients -1 determining much model’s performance changes. worse model’s performance negating coefficients given variable, important variable. technique promising b/c require permutation emphasizes variables larger coefficients linear combinations, also relatively new hasn’t studied much permutation importance. See Jaeger, (2023) details technique. permutation: variable assessed separately randomly permuting variable’s values determining much model’s performance changes. worse model’s performance permuting values given variable, important variable. technique flexible, intuitive, frequently used. also several known limitations analysis variance (ANOVA)4: p-value computed coefficient linear combination variables decision tree. Importance individual predictor variable proportion times p-value coefficient < 0.01. technique efficient computationally, may effective permutation negation terms selecting signal noise variables. See Menze, 2011 details technique. 
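The importance functions described above all return a named numeric vector sorted from most to least important, which makes downstream plotting straightforward. A minimal sketch, reusing the pbc_fit survival forest fitted earlier on this page; the reshaping and plotting choices are illustrative and not part of aorsf:

library(ggplot2)

# negation importance: a named numeric vector, sorted from most
# to least important variable
vi <- orsf_vi_negate(pbc_fit)

# reshape to a data frame so ggplot2 can use it
vi_df <- data.frame(variable = names(vi), importance = as.numeric(vi))

# dot plot with the most important variable at the top
ggplot(vi_df, aes(x = importance, y = reorder(variable, importance))) +
  geom_point() +
  labs(x = "Negation importance", y = NULL)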
can supply R function estimate --bag error (see oob vignette) estimate --bag variable importance (see orsf_vi examples)","code":"orsf_vi_negate(pbc_fit) #> bili copper stage sex age #> 0.1552460736 0.1156218837 0.0796917628 0.0533427094 0.0283132385 #> albumin trt chol alk.phos platelet #> 0.0279823814 0.0168238416 0.0153010749 0.0148718669 0.0094582765 #> edema ascites spiders protime hepato #> 0.0067975986 0.0065505801 0.0062356214 -0.0004653046 -0.0026664147 #> ast trig #> -0.0028902524 -0.0106616501 orsf_vi_permute(penguin_fit) #> bill_length_mm bill_depth_mm body_mass_g island #> 0.121351910 0.101846889 0.097822451 0.080772909 #> sex flipper_length_mm year #> 0.035053517 0.008270751 -0.008058339 orsf_vi_anova(bill_fit) #> species sex bill_depth_mm flipper_length_mm #> 0.51652893 0.27906977 0.06315789 0.04950495 #> body_mass_g island year #> 0.04807692 0.02687148 0.00000000"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"partial-dependence-pd","dir":"","previous_headings":"","what":"Partial dependence (PD)","title":"Accelerated Oblique Random Forests","text":"Partial dependence (PD) shows expected prediction model function single predictor multiple predictors. expectation marginalized values predictors, giving something like multivariable adjusted estimate model’s prediction.. can use specific values predictor compute PD let aorsf pick reasonable values use pred_spec_auto(): summary function, orsf_summarize_uni(), computes PD many variables ask , using sensible values. PD, see vignette","code":"# pick your own values orsf_pd_oob(bill_fit, pred_spec = list(species = c(\"Adelie\", \"Gentoo\"))) #> species mean lwr medn upr #> #> 1: Adelie 39.99394 35.76532 39.80782 46.13931 #> 2: Gentoo 46.66565 40.02938 46.88517 51.61367 # let aorsf pick reasonable values for you: orsf_pd_oob(bill_fit, pred_spec = pred_spec_auto(bill_depth_mm, island)) #> bill_depth_mm island mean lwr medn upr #> #> 1: 14.3 Biscoe 43.94960 35.90421 45.30159 51.05109 #> 2: 15.6 Biscoe 44.24705 36.62759 45.57321 51.08020 #> 3: 17.3 Biscoe 44.84757 36.53804 45.62910 53.93833 #> 4: 18.7 Biscoe 45.08939 36.35893 46.16893 54.42075 #> 5: 19.5 Biscoe 45.13608 36.21033 46.08023 54.42075 #> --- #> 11: 14.3 Torgersen 43.55984 35.47143 44.18127 51.05109 #> 12: 15.6 Torgersen 43.77317 35.44683 44.28406 51.08020 #> 13: 17.3 Torgersen 44.56465 35.84585 44.83694 53.93833 #> 14: 18.7 Torgersen 44.68367 35.44010 44.86667 54.42075 #> 15: 19.5 Torgersen 44.64605 35.44010 44.86667 54.42075 orsf_summarize_uni(pbc_fit, n_variables = 2) #> #> -- bili (VI Rank: 1) ----------------------------- #> #> |----------------- Risk -----------------| #> Value Mean Median 25th % 75th % #> #> 0.60 0.2098108 0.07168855 0.01138461 0.2860450 #> 0.80 0.2117933 0.07692308 0.01709469 0.2884990 #> 1.40 0.2326560 0.08445419 0.02100837 0.3563622 #> 3.55 0.4265979 0.35820106 0.05128824 0.7342923 #> 7.30 0.4724608 0.44746241 0.11759259 0.8039683 #> #> -- copper (VI Rank: 2) --------------------------- #> #> |----------------- Risk -----------------| #> Value Mean Median 25th % 75th % #> #> 25.0 0.2332412 0.04425936 0.01587919 0.3888304 #> 42.5 0.2535448 0.07417582 0.01754386 0.4151786 #> 74.0 0.2825471 0.11111111 0.01988069 0.4770833 #> 130 0.3259604 0.18771003 0.04658385 0.5054348 #> 217 0.4213303 0.28571429 0.13345865 0.6859423 #> #> Predicted risk at time t = 1788 for top 2 predictors"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"individual-conditional-expectations-ice","dir":"","previous_headings":"","what":"Individual conditional 
expectations (ICE)","title":"Accelerated Oblique Random Forests","text":"Unlike partial dependence, shows expected prediction function one multiple predictors, individual conditional expectations (ICE) show prediction individual observation function predictor. ICE, see vignette","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"interaction-scores","dir":"","previous_headings":"","what":"Interaction scores","title":"Accelerated Oblique Random Forests","text":"orsf_vint() function computes score possible interaction model based PD using method described Greenwell et al, 2018.5 can slow larger datasets, substantial speedups occur making use multi-threading restricting search smaller set predictors. values score mean? values average standard deviation standard deviation PD one variable conditional variable. interpreted relative one another, .e., higher scoring interaction likely reflect real interaction two variables lower scoring one. interaction scores make sense? Let’s test top scoring lowest scoring interactions using coxph(). Note: exploratory true null hypothesis test. ? used data generate test null hypothesis. much conducting statistical inference test interactions coxph demonstrating interaction scores orsf_vint() provides consistent tests models.","code":"preds_interaction <- c(\"albumin\", \"protime\", \"bili\", \"spiders\", \"trt\") # While it is tempting to speed up `orsf_vint()` by growing a smaller # number of trees, results may become unstable with this shortcut. pbc_interactions <- pbc_fit %>% orsf_update(n_tree = 500, tree_seeds = 329) %>% orsf_vint(n_thread = 0, predictors = preds_interaction) pbc_interactions #> interaction score #> #> 1: albumin..protime 0.97837184 #> 2: protime..bili 0.78999788 #> 3: albumin..bili 0.59128756 #> 4: bili..spiders 0.13192184 #> 5: bili..trt 0.13192184 #> 6: albumin..spiders 0.06578222 #> 7: albumin..trt 0.06578222 #> 8: protime..spiders 0.03012718 #> 9: protime..trt 0.03012718 #> 10: spiders..trt 0.00000000 library(survival) # the top scoring interaction should get a lower p-value anova(coxph(Surv(time, status) ~ protime * albumin, data = pbc_orsf)) #> Analysis of Deviance Table #> Cox model: response is Surv(time, status) #> Terms added sequentially (first to last) #> #> loglik Chisq Df Pr(>|Chi|) #> NULL -550.19 #> protime -538.51 23.353 1 1.349e-06 *** #> albumin -514.89 47.255 1 6.234e-12 *** #> protime:albumin -511.76 6.252 1 0.01241 * #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 # the bottom scoring interaction should get a higher p-value anova(coxph(Surv(time, status) ~ spiders * trt, data = pbc_orsf)) #> Analysis of Deviance Table #> Cox model: response is Surv(time, status) #> Terms added sequentially (first to last) #> #> loglik Chisq Df Pr(>|Chi|) #> NULL -550.19 #> spiders -538.58 23.2159 1 1.448e-06 *** #> trt -538.39 0.3877 1 0.5335 #> spiders:trt -538.29 0.2066 1 0.6494 #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"comparison-to-existing-software","dir":"","previous_headings":"","what":"Comparison to existing software","title":"Accelerated Oblique Random Forests","text":"survival analysis, comparisons aorsf existing software presented JCGS paper. paper: describes aorsf detail summary procedures used tree fitting algorithm runs general benchmark comparing aorsf obliqueRSF several learners reports prediction accuracy computational efficiency learners. 
runs simulation study comparing variable importance techniques oblique survival RFs, axis based survival RFs, boosted trees. reports probability variable importance technique rank relevant variable higher importance irrelevant variable.","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"references","dir":"","previous_headings":"","what":"References","title":"Accelerated Oblique Random Forests","text":"Jaeger BC, Long DL, Long DM, Sims M, Szychowski JM, Min Y, Mcclure LA, Howard G, Simon N (2019). “Oblique random survival forests.” Annals Applied Statistics, 13(3). doi:10.1214/19-aoas1261 https://doi.org/10.1214/19-aoas1261. Jaeger BC, Welden S, Lenoir K, Speiser JL, Segar MW, Pandey , Pajewski NM (2023). “Accelerated interpretable oblique random survival forests.” Journal Computational Graphical Statistics, 1-16. doi:10.1080/10618600.2023.2231048 https://doi.org/10.1080/10618600.2023.2231048. Horst , Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0, https://allisonhorst.github.io/palmerpenguins/. Menze, H B, Kelm, Michael B, Splitthoff, N D, Koethe, Ullrich, Hamprecht, F (2011). “oblique random forests.” Machine Learning Knowledge Discovery Databases: European Conference, ECML PKDD 2011, Athens, Greece, September 5-9, 2011, Proceedings, Part II 22, 453-469. Springer. Greenwell, M B, Boehmke, C B, McCarthy, J (2018). “simple effective model-based variable importance measure.” arXiv preprint arXiv:1805.04755.","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"funding","dir":"","previous_headings":"","what":"Funding","title":"Accelerated Oblique Random Forests","text":"developers aorsf received financial support Center Biomedical Informatics, Wake Forest University School Medicine. also received support National Center Advancing Translational Sciences National Institutes Health Award Number UL1TR001420. content solely responsibility authors necessarily represent official views National Institutes Health.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/aorsf-package.html","id":null,"dir":"Reference","previous_headings":"","what":"aorsf: Accelerated Oblique Random Forests — aorsf-package","title":"aorsf: Accelerated Oblique Random Forests — aorsf-package","text":"Fit, interpret, compute predictions oblique random forests. Includes support partial dependence, variable importance, passing customized functions variable importance identification linear combinations features. 
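Since the package description above covers fitting, interpretation, and prediction, a minimal prediction sketch follows, reusing the pbc_fit survival forest fitted earlier on this page; the assumption here is that predict() for survival forests accepts a pred_horizon argument mirroring oobag_pred_horizon in orsf():

# predicted risk at roughly five years for the first few rows
predict(pbc_fit,
        new_data = pbc_orsf[1:5, ],
        pred_type = 'risk',
        pred_horizon = 1826.25)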
Methods oblique random survival forest described Jaeger et al., (2023) doi:10.1080/10618600.2023.2231048 .","code":""},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/aorsf-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"aorsf: Accelerated Oblique Random Forests — aorsf-package","text":"Maintainer: Byron Jaeger bjaeger@wakehealth.edu (ORCID) contributors: Nicholas Pajewski [contributor] Sawyer Welden swelden@wakehealth.edu [contributor] Christopher Jackson chris.jackson@mrc-bsu.cam.ac.uk [reviewer] Marvin Wright [reviewer] Lukas Burk [reviewer]","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":null,"dir":"Reference","previous_headings":"","what":"Coerce to data.table — as.data.table.orsf_summary_uni","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"Convert 'orsf_summary' object data.table object.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"","code":"# S3 method for orsf_summary_uni as.data.table(x, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"x object class 'orsf_summary_uni' ... used","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"data.table","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"","code":"if (FALSE) { library(data.table) object <- orsf(pbc_orsf, Surv(time, status) ~ . - id, n_tree = 25) smry <- orsf_summarize_uni(object, n_variables = 2) as.data.table(smry) }"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":null,"dir":"Reference","previous_headings":"","what":"Oblique Random Forests — orsf","title":"Oblique Random Forests — orsf","text":"Grow specify oblique random forest. name orsf() implies function works survival forests, can used classification, regression, survival forests.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Oblique Random Forests — orsf","text":"","code":"orsf( data, formula, control = NULL, weights = NULL, n_tree = 500, n_split = 5, n_retry = 3, n_thread = 0, mtry = NULL, sample_with_replacement = TRUE, sample_fraction = 0.632, leaf_min_events = 1, leaf_min_obs = 5, split_rule = NULL, split_min_events = 5, split_min_obs = 10, split_min_stat = NULL, oobag_pred_type = NULL, oobag_pred_horizon = NULL, oobag_eval_every = NULL, oobag_fun = NULL, importance = \"anova\", importance_max_pvalue = 0.01, group_factors = TRUE, tree_seeds = NULL, attach_data = TRUE, no_fit = FALSE, na_action = \"fail\", verbose_progress = FALSE, ... 
) orsf_train(object, attach_data = TRUE)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Oblique Random Forests — orsf","text":"data data.frame, tibble, data.table contains relevant variables. formula (formula) Two sided formula single outcome. terms right names predictor variables, symbol '.' may used indicate variables data except response. symbol '-' may also used indicate removal predictor. Details response vary depending forest type: Classification: response single variable, variable type factor data. Regression: response single variable, variable typee double integer least 10 unique numeric values data. Survival: response include time variable, followed status variable, may written inside call Surv (see examples). control (orsf_control) object returned one orsf_control functions: orsf_control_survival, orsf_control_classification, orsf_control_regression. NULL (default) use accelerated control, fastest available option. survival classification, Cox Logistic regression 1 iteration, regression ordinary least squares. weights (numeric vector) Optional. given, input length equal nrow(data) complete imputed data length equal nrow(na.omit(data)) na_action \"omit\". weights vector used count observations events prior growing node tree, orsf() scales weights sum(weights) == nrow(data). helps make tree depth consistent weighted un-weighted fits. n_tree (integer) number trees grow. Default n_tree = 500. n_split (integer) number cut-points assessed splitting node decision trees. Default n_split = 5. n_retry (integer) node splittable, current linear combination inputs unable provide valid split, orsf try new linear combination based different set randomly selected predictors, n_retry times. Default n_retry = 3. Set n_retry = 0 prevent retries. n_thread (integer) number threads use growing trees, computing predictions, computing importance. Default 0, allows suitable number threads used based availability. mtry (integer) Number predictors randomly included candidates splitting node. default smallest integer greater square root number total predictors, .e., mtry = ceiling(sqrt(number predictors)) sample_with_replacement (logical) TRUE (default), observations sampled replacement -bag sample created decision tree. FALSE, observations sampled without replacement tree -bag sample containing sample_fraction% original sample. sample_fraction (double) proportion observations trees' -bag sample contain, relative number rows data. used sample_with_replacement FALSE. Default value 0.632. leaf_min_events (integer) input relevant survival analysis, specifies minimum number events leaf node. Default leaf_min_events = 1 leaf_min_obs (integer) minimum number observations leaf node. Default leaf_min_obs = 5. split_rule (character) assess quality potential splitting rule node. Valid options survival : 'logrank' : log-rank test statistic (default). 'cstat' : Harrell's concordance statistic. classification, valid options : 'gini' : gini impurity (default) 'cstat' : area underneath ROC curve (AUC-ROC) regression, valid options : 'variance' : variance reduction (default) split_min_events (integer) minimum number events required node consider splitting . Default split_min_events = 5. input relevant survival trees. split_min_obs (integer) minimum number observations required node consider splitting . Default split_min_obs = 10. split_min_stat (double) minimum test statistic required split node. 
splits found statistic exceeding split_min_stat, given node either becomes leaf retry occurs (n_retry retries). Defaults 3.84 split_rule = 'logrank' 0.55 split_rule = 'cstat' (see first note ) 0.00 split_rule = 'gini' (see second note ) 0.00 split_rule = 'variance' Note 1 C-statistic splitting, C < 0.50, consider statistic value 1 - C allow good 'anti-predictive' splits. , C-statistic initially computed 0.1, considered 1 - 0.10 = 0.90. Note 2 Gini impurity, value 0 1 usually indicate best worst possible scores, respectively. make things simple avoid introducing split_max_stat input, flip values Gini impurity 1 0 indicate best worst possible scores, respectively. oobag_pred_type (character) type --bag predictions compute fitting ensemble. Valid options tree type: 'none' : compute --bag predictions 'leaf' : ID predicted leaf returned tree Valid options survival: 'risk' : probability event occurring oobag_pred_horizon (default). 'surv' : 1 - risk. 'chf' : cumulative hazard function oobag_pred_horizon. 'mort' : mortality, .e., number events expected observations training data identical given observation. Valid options classification: 'prob' : probability class (default) 'class' : class (.e., .max(prob)) Valid options regression: 'mean' : mean value (default) oobag_pred_horizon (numeric) numeric value indicating time used --bag predictions. Default median observed times, .e., oobag_pred_horizon = median(time). input relevant survival trees prediction type 'risk', 'surv', 'chf'. oobag_eval_every (integer) --bag performance ensemble checked every oobag_eval_every trees. , oobag_eval_every = 10, --bag performance checked growing 10th tree, 20th tree, . Default oobag_eval_every = n_tree. oobag_fun (function) used evaluating --bag prediction accuracy every oobag_eval_every trees. oobag_fun = NULL (default), evaluation statistic selected based tree type survival: Harrell's C-statistic (1982) classification: Area underneath ROC curve (AUC-ROC) regression: Traditional prediction R-squared use oobag_fun note following: oobag_fun three inputs: y_mat, w_vec, s_vec survival trees, y_mat two column matrix first column named 'time' second named 'status'. classification trees, y_mat matrix number columns = number distinct classes outcome. regression, y_mat matrix one column. s_vec numeric vector containing predictions oobag_fun return numeric output length 1 details, see --bag vignette. importance (character) Indicate method variable importance: 'none': variable importance computed. 'anova': compute analysis variance (ANOVA) importance 'negate': compute negation importance 'permute': compute permutation importance details methods, see orsf_vi. importance_max_pvalue (double) relevant importance \"anova\". maximum p-value register positive case counting number times variable found 'significant' tree growth. Default 0.01, recommended Menze et al. group_factors (logical) relevant variable importance estimated. TRUE, importance factor variables reported overall aggregating importance individual levels factor. FALSE, importance individual factor levels returned. tree_seeds (integer vector) Optional. specified, random seeds set using values tree_seeds[] growing tree . Two forests grown number trees seeds exact --bag samples, making --bag error estimates forests comparable. NULL (default), seeds picked random. attach_data (logical) TRUE, copy training data attached output. required plan using functions like orsf_pd_oob orsf_summarize_uni interpret forest using training data. Default TRUE. 
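Given the oobag_fun contract described above (three arguments, y_mat, w_vec, and s_vec, returning a single numeric value), a minimal sketch of a user-supplied function follows. The function is a toy placeholder that ignores censoring, so treat it as an illustration of the required signature rather than a recommended accuracy metric; oobag_toy and fit_custom_oob are names introduced only for this example:

# y_mat: two-column matrix with 'time' and 'status' (survival forests)
# w_vec: case weights
# s_vec: out-of-bag predictions (risk, by default)
oobag_toy <- function(y_mat, w_vec, s_vec){
  # weighted mean squared difference between status and predicted risk;
  # ignores censoring, shown only to illustrate the interface
  weighted.mean((y_mat[, 'status'] - s_vec)^2, w = w_vec)
}

fit_custom_oob <- orsf(pbc_orsf,
                       formula = Surv(time, status) ~ . - id,
                       n_tree = 25,
                       oobag_fun = oobag_toy)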
no_fit (logical) TRUE, model fitting steps defined saved, training initiated. object returned can directly submitted orsf_train() long attach_data TRUE. na_action (character) happen data contains missing values (.e., NA values). Valid options : 'fail' : error thrown data contains NA values 'omit' : rows data incomplete data dropped 'impute_meanmode' : missing values continuous categorical variables data imputed using mean mode, respectively. verbose_progress (logical) TRUE, progress messages printed console. FALSE (default), nothing printed. ... arguments passed methods (currently used). object untrained 'aorsf' object, created setting no_fit = TRUE orsf().","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Oblique Random Forests — orsf","text":"obliqueForest object","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Oblique Random Forests — orsf","text":"function called orf()? earlier versions, aorsf package exclusively oblique random survival forests. formula survival oblique RFs: response formula can survival object returned Surv function, can also just time status variables. .e., Surv(time, status) ~ . works time + status ~ . works response can also survival object stored data. example, y ~ . valid formula data$y inherits Surv class. mtry: mtry parameter may temporarily reduced ensure linear models used find combinations predictors remain stable. occurs coefficients linear model fitting algorithms may become infinite number predictors exceeds number observations. oobag_fun: oobag_fun specified, used compute negation importance permutation importance, role ANOVA importance. n_thread: R function called C++ (.e., user-supplied function compute --bag error identify linear combinations variables), n_thread automatically set 1 attempting run R functions multiple threads cause R session crash.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"what-is-an-oblique-decision-tree-","dir":"Reference","previous_headings":"","what":"What is an oblique decision tree?","title":"Oblique Random Forests — orsf","text":"Decision trees developed splitting set training data two new subsets, goal similarity within new subsets . splitting process repeated resulting subsets data stopping criterion met. new subsets data formed based single predictor, decision tree said axis-based splits data appear perpendicular axis predictor. linear combinations variables used instead single variable, tree oblique splits data neither parallel right angle axis Figure : Decision trees classification axis-based splitting (left) oblique splitting (right). Cases orange squares; controls purple circles. trees partition predictor space defined variables X1 X2, oblique splits better job separating two classes.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"what-is-a-random-forest-","dir":"Reference","previous_headings":"","what":"What is a random forest?","title":"Oblique Random Forests — orsf","text":"Random forests collections de-correlated decision trees. Predictions tree aggregated make ensemble prediction forest. 
details, see Breiman el, 2001.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"training-out-of-bag-error-and-testing","dir":"Reference","previous_headings":"","what":"Training, out-of-bag error, and testing","title":"Oblique Random Forests — orsf","text":"random forests, tree grown bootstrapped version training set. bootstrap samples selected replacement, bootstrapped training set contains two-thirds instances original training set. '--bag' data instances bootstrapped training set. tree random forest can make predictions --bag data, --bag predictions can aggregated make ensemble --bag prediction. Since --bag data used grow tree, accuracy ensemble --bag predictions approximate generalization error random forest. Generalization error refers error random forest's predictions applied predict outcomes data used train , .e., testing data.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Oblique Random Forests — orsf","text":"orsf() entry-point aorsf package. can used fit classification, regression, survival forests. classification, fit oblique RF predict penguin species using penguin data magnificent palmerpenguins R package regression, use data predict bill length penguins: personal favorite oblique survival RF accelerated Cox regression first type oblique RF aorsf provided (see ArXiv paper; paper also published Journal Computational Graphical Statistics publicly available ). , use predict mortality risk following diagnosis primary biliary cirrhosis:","code":"library(aorsf) library(magrittr) # for %>% ## ## Attaching package: 'magrittr' ## The following object is masked from 'package:tidyr': ## ## extract ## The following objects are masked from 'package:testthat': ## ## equals, is_less_than, not # An oblique classification RF penguin_fit <- orsf(data = penguins_orsf, n_tree = 5, formula = species ~ .) penguin_fit ## ---------- Oblique random classification forest ## ## Linear combinations: Accelerated Logistic regression ## N observations: 333 ## N classes: 3 ## N trees: 5 ## N predictors total: 7 ## N predictors per node: 3 ## Average leaves per tree: 4.6 ## Min observations in leaf: 5 ## OOB stat value: 0.99 ## OOB stat type: AUC-ROC ## Variable importance: anova ## ## ----------------------------------------- # An oblique regression RF bill_fit <- orsf(data = penguins_orsf, n_tree = 5, formula = bill_length_mm ~ .) bill_fit ## ---------- Oblique random regression forest ## ## Linear combinations: Accelerated Linear regression ## N observations: 333 ## N trees: 5 ## N predictors total: 7 ## N predictors per node: 3 ## Average leaves per tree: 51 ## Min observations in leaf: 5 ## OOB stat value: 0.70 ## OOB stat type: RSQ ## Variable importance: anova ## ## ----------------------------------------- # An oblique survival RF pbc_fit <- orsf(data = pbc_orsf, n_tree = 5, formula = Surv(time, status) ~ . 
- id) pbc_fit ## ---------- Oblique random survival forest ## ## Linear combinations: Accelerated Cox regression ## N observations: 276 ## N events: 111 ## N trees: 5 ## N predictors total: 17 ## N predictors per node: 5 ## Average leaves per tree: 22.2 ## Min observations in leaf: 5 ## Min events in leaf: 1 ## OOB stat value: 0.78 ## OOB stat type: Harrell's C-index ## Variable importance: anova ## ## -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"more-than-one-way-to-grow-a-forest","dir":"Reference","previous_headings":"","what":"More than one way to grow a forest","title":"Oblique Random Forests — orsf","text":"can use orsf(no_fit = TRUE) make specification grow forest instead fitted forest. ? Two reasons: computational tasks, may want check long take fit forest commit : fitting multiple forests, use blueprint along orsf_train() orsf_update() simplify code:","code":"orsf_spec <- orsf(pbc_orsf, formula = time + status ~ . - id, no_fit = TRUE) orsf_spec ## Untrained oblique random survival forest ## ## Linear combinations: Accelerated Cox regression ## N observations: 276 ## N events: 111 ## N trees: 500 ## N predictors total: 17 ## N predictors per node: 5 ## Average leaves per tree: 0 ## Min observations in leaf: 5 ## Min events in leaf: 1 ## OOB stat value: none ## OOB stat type: Harrell's C-index ## Variable importance: anova ## ## ----------------------------------------- orsf_spec %>% orsf_update(n_tree = 10000) %>% orsf_time_to_train() ## Time difference of 2.429678 secs orsf_fit <- orsf_train(orsf_spec) orsf_fit_10 <- orsf_update(orsf_fit, leaf_min_obs = 10) orsf_fit_20 <- orsf_update(orsf_fit, leaf_min_obs = 20) orsf_fit$leaf_min_obs ## [1] 5 orsf_fit_10$leaf_min_obs ## [1] 10 orsf_fit_20$leaf_min_obs ## [1] 20"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"tidymodels","dir":"Reference","previous_headings":"","what":"tidymodels","title":"Oblique Random Forests — orsf","text":"tidymodels includes support aorsf computational engine: Prediction aorsf models different times also supported:","code":"library(tidymodels) library(censored) library(yardstick) pbc_tidy <- pbc_orsf %>% mutate(event_time = Surv(time, status), .before = 1) %>% select(-c(id, time, status)) %>% as_tibble() split <- initial_split(pbc_tidy) orsf_spec <- rand_forest() %>% set_engine(\"aorsf\") %>% set_mode(\"censored regression\") orsf_fit <- fit(orsf_spec, formula = event_time ~ ., data = training(split)) time_points <- seq(500, 3000, by = 500) test_pred <- augment(orsf_fit, new_data = testing(split), eval_time = time_points) brier_scores <- test_pred %>% brier_survival(truth = event_time, .pred) brier_scores ## # A tibble: 6 x 4 ## .metric .estimator .eval_time .estimate ## ## 1 brier_survival standard 500 0.0597 ## 2 brier_survival standard 1000 0.0943 ## 3 brier_survival standard 1500 0.0883 ## 4 brier_survival standard 2000 0.102 ## 5 brier_survival standard 2500 0.137 ## 6 brier_survival standard 3000 0.153 roc_scores <- test_pred %>% roc_auc_survival(truth = event_time, .pred) roc_scores ## # A tibble: 6 x 4 ## .metric .estimator .eval_time .estimate ##