diff --git a/articles/aorsf.html b/articles/aorsf.html index f723da52..f3961c84 100644 --- a/articles/aorsf.html +++ b/articles/aorsf.html @@ -136,10 +136,10 @@

Oblique RFs for #> N trees: 5 #> N predictors total: 17 #> N predictors per node: 5 -#> Average leaves per tree: 19.4 +#> Average leaves per tree: 19.8 #> Min observations in leaf: 5 #> Min events in leaf: 1 -#> OOB stat value: 0.77 +#> OOB stat value: 0.76 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> @@ -159,9 +159,9 @@

Oblique RFs for #> N trees: 5 #> N predictors total: 7 #> N predictors per node: 3 -#> Average leaves per tree: 6.4 +#> Average leaves per tree: 6 #> Min observations in leaf: 5 -#> OOB stat value: 0.98 +#> OOB stat value: 0.99 #> OOB stat type: AUC-ROC #> Variable importance: anova #> @@ -183,7 +183,7 @@

Oblique RFs for #> N predictors per node: 4 #> Average leaves per tree: 4.8 #> Min observations in leaf: 5 -#> OOB stat value: 0.74 +#> OOB stat value: 0.75 #> OOB stat type: RSQ #> Variable importance: anova #> @@ -220,12 +220,14 @@

Variable importance
 
 orsf_vi_negate(pbc_fit)
-#>         bili       copper        stage          age      protime          sex 
-#>  0.114050616  0.058879090  0.046094524  0.038011836  0.028853966  0.020874091 
-#>          trt     platelet        edema      albumin      ascites     alk.phos 
-#>  0.017129950  0.013471420  0.012868400  0.012394794  0.010231183  0.009874770 
-#>      spiders         trig          ast         chol       hepato 
-#>  0.005304760  0.005121708  0.003857380  0.001530643 -0.006896236
+#> bili copper age protime spiders +#> 0.1168221744 0.0640918012 0.0318717527 0.0295703184 0.0199482278 +#> ascites stage trig ast hepato +#> 0.0145030496 0.0138362817 0.0093934850 0.0081600305 0.0081045745 +#> edema albumin trt chol platelet +#> 0.0074171879 0.0070565813 0.0049965458 0.0043845830 0.0007886543 +#> sex alk.phos +#> -0.0023614972 -0.0040932561
  • You can also compute variable importance using @@ -235,12 +237,14 @@

    Variable importance
     
     orsf_vi_permute(pbc_fit)
    -#>         bili        stage       copper          age      albumin         chol 
    -#>  0.081386482  0.027461497  0.026676054  0.024209679  0.018792662  0.012066683 
    -#>     alk.phos          ast         trig      spiders      ascites     platelet 
    -#>  0.009972421  0.009671777  0.006336094  0.004536710  0.004102601  0.003379097 
    -#>          trt          sex        edema      protime       hepato 
    -#>  0.001788329  0.000976518 -0.000169104 -0.002325850 -0.007547886
    +#> bili copper age ascites albumin +#> 0.0681612536 0.0264039589 0.0154990015 0.0145135549 0.0128863883 +#> ast spiders stage edema protime +#> 0.0112889819 0.0042083643 0.0036260906 0.0031464934 0.0029252926 +#> trt platelet sex hepato chol +#> -0.0002451595 -0.0002523982 -0.0005419264 -0.0010103185 -0.0012341940 +#> alk.phos trig +#> -0.0033725370 -0.0039837212

  • A faster alternative to permutation and negation importance is @@ -249,12 +253,12 @@

    Variable importance
     
     orsf_vi_anova(pbc_fit)
    -#>    ascites       bili     copper        ast    albumin      stage      edema 
    -#> 0.44444444 0.40000000 0.33333333 0.26666667 0.26666667 0.25000000 0.24047619 
    -#>        age       chol        sex    spiders    protime   alk.phos   platelet 
    -#> 0.19047619 0.18750000 0.18181818 0.16666667 0.15789474 0.13043478 0.12500000 
    -#>     hepato       trig        trt 
    -#> 0.12500000 0.06666667 0.04761905
    +#> ascites copper albumin bili edema age hepato +#> 0.50000000 0.41176471 0.35294118 0.35294118 0.29417989 0.26315789 0.23529412 +#> spiders protime chol stage alk.phos ast platelet +#> 0.21428571 0.21052632 0.16666667 0.13333333 0.06250000 0.05263158 0.04545455 +#> trig sex trt +#> 0.04545455 0.00000000 0.00000000

  • diff --git a/articles/fast.html b/articles/fast.html index 352f99b2..b125925e 100644 --- a/articles/fast.html +++ b/articles/fast.html @@ -127,7 +127,7 @@

    Don’t specify a control# control_fast() is much faster time_net['elapsed'] / time_fast['elapsed'] #> elapsed -#> 56.9 +#> 53.4

    Use n_thread @@ -155,10 +155,10 @@

    Use n_thread#> N trees: 5 #> N predictors total: 17 #> N predictors per node: 5 -#> Average leaves per tree: 22 +#> Average leaves per tree: 21.6 #> Min observations in leaf: 5 #> Min events in leaf: 1 -#> OOB stat value: 0.77 +#> OOB stat value: 0.78 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> @@ -206,7 +206,7 @@

    Do less#> N trees: 5 #> N predictors total: 18 #> N predictors per node: 5 -#> Average leaves per tree: 5.6 +#> Average leaves per tree: 6.8 #> Min observations in leaf: 5 #> Min events in leaf: 10 #> OOB stat value: none diff --git a/articles/oobag.html b/articles/oobag.html index adc88dcf..834d245e 100644 --- a/articles/oobag.html +++ b/articles/oobag.html @@ -140,9 +140,9 @@

    Out-of-bag predictions and error# what is the output from this function? fit$eval_oobag$stat_values #> [,1] -#> [1,] 0.7490789

    +#> [1,] 0.7923697

    The out-of-bag estimate of 1 (the default method to evaluate -out-of-bag predictions) is 0.7490789.

    +out-of-bag predictions) is 0.7923697.

    Monitoring out-of-bag error diff --git a/articles/oobag_files/figure-html/unnamed-chunk-2-1.png b/articles/oobag_files/figure-html/unnamed-chunk-2-1.png index 1f939662..0e8a59b1 100644 Binary files a/articles/oobag_files/figure-html/unnamed-chunk-2-1.png and b/articles/oobag_files/figure-html/unnamed-chunk-2-1.png differ diff --git a/pkgdown.yml b/pkgdown.yml index ef1b556a..25e72899 100644 --- a/pkgdown.yml +++ b/pkgdown.yml @@ -6,7 +6,7 @@ articles: fast: fast.html oobag: oobag.html pd: pd.html -last_built: 2023-11-16T04:39Z +last_built: 2023-11-17T15:46Z urls: reference: https://bcjaeger.github.io/aorsf/reference article: https://bcjaeger.github.io/aorsf/articles diff --git a/reference/as.data.table.orsf_summary_uni.html b/reference/as.data.table.orsf_summary_uni.html index c6875dcd..9f1e94dc 100644 --- a/reference/as.data.table.orsf_summary_uni.html +++ b/reference/as.data.table.orsf_summary_uni.html @@ -96,14 +96,14 @@

    Examples as.data.table(smry) #> variable importance value mean medn lwr upr -#> 1: ascites 0.4916091 0 0.6923353 0.8129007 0.4723085 0.9385501 -#> 2: ascites 0.4916091 1 0.5332829 0.6019530 0.3495088 0.7316614 -#> 3: bili 0.4264008 0.80 0.7651674 0.8500293 0.6487390 0.9437248 -#> 4: bili 0.4264008 1.40 0.7431069 0.8221693 0.6026852 0.9308567 -#> 5: bili 0.4264008 3.52 0.6241095 0.6858060 0.4487054 0.8175104 -#> 6: edema 0.3064635 0 0.6989076 0.8175546 0.4723085 0.9399742 -#> 7: edema 0.3064635 0.5 0.5913448 0.6411541 0.3889686 0.7984051 -#> 8: edema 0.3064635 1 0.5876363 0.6248548 0.3604497 0.8203510 +#> 1: ascites 0.4924319 0 0.6900374 0.8002266 0.4747794 0.9329735 +#> 2: ascites 0.4924319 1 0.5276879 0.5921952 0.3307928 0.7210135 +#> 3: bili 0.4179174 0.80 0.7588320 0.8409506 0.6293245 0.9337076 +#> 4: bili 0.4179174 1.40 0.7387011 0.8207813 0.5972521 0.9214494 +#> 5: bili 0.4179174 3.52 0.6272947 0.6917634 0.4626750 0.8291606 +#> 6: edema 0.3079184 0 0.6948409 0.8086279 0.4904330 0.9359587 +#> 7: edema 0.3079184 0.5 0.5942084 0.6362531 0.3830464 0.7948496 +#> 8: edema 0.3079184 1 0.5920168 0.6305997 0.3821132 0.8220468 #> pred_horizon level #> 1: 1788 0 #> 2: 1788 1 diff --git a/reference/orsf.html b/reference/orsf.html index eebf2477..7a6bf5c1 100644 --- a/reference/orsf.html +++ b/reference/orsf.html @@ -106,34 +106,45 @@

    ArgumentsSurv (see examples). -The terms on the right are names of predictor variables.

    +

    (formula) Two sided formula with a single outcome. +The terms on the right are names of predictor variables, and the +symbol '.' may be used to indicate all variables in the data +except the response. The symbol '-' may also be used to indicate +removal of a predictor. Details on the response vary depending +on forest type:

    • Survival: The response should include a time variable, +followed by a status variable, and may be written inside a +call to Surv (see examples).

    • Classification: The response should be a single variable, +and that variable should have type factor in data.

    • Regression: The response should be a single variable, and +that variable should have type double or integer with at +least 10 unique numeric values in data.

    control

    (orsf_control) An object returned from one of the -orsf_control functions:

    • orsf_control_fast (the default) uses a single iteration of Newton -Raphson scoring to identify a linear combination of predictors.

    • orsf_control_cph uses Newton Raphson scoring until a convergence -criteria is met.

    • orsf_control_net uses glmnet to identify linear combinations of -predictors, similar to Jaeger (2019).

    • orsf_control_custom allows the user to apply their own function -to create linear combinations of predictors.

    +orsf_control functions: orsf_control_survival, +orsf_control_classification, and orsf_control_regression. If +NULL (the default), orsf will use an accelerated control, which is the +fastest available option. For survival and classification, this is +Cox and Logistic regression with 1 iteration, and for regression +it is ordinary least squares.

    weights

    (numeric vector) Optional. If given, this input should -have length equal to nrow(data). Values in weights are treated like -replication weights, i.e., a value of 2 is the same thing as having 2 -observations in data, each containing a copy of the corresponding -person's data.

    +have length equal to nrow(data) for complete or imputed data and should +have length equal to nrow(na.omit(data)) if na_action is "omit". +Values in weights are treated like replication weights, i.e., a value +of 2 is the same thing as having 2 observations in data, each +containing a copy of the corresponding person's data.

    Use weights cautiously, as orsf will count the number of observations and events prior to growing a node for a tree, so higher -values in weights will lead to deeper trees.

    +values in weights will lead to deeper trees. If you use this +input, it is highly recommended you scale the weights so that +sum(weights) == nrow(data), as this will help make tree depth +consistent with the default weights = rep(1, nrow(data))
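
    The rescaling recommended for weights above can be sketched in base R. This is a minimal illustration, not code from the package: the vector weights_raw and the name weights_scaled are hypothetical, and n_obs stands in for nrow(data).

    ```r
    # Hypothetical replication weights for a data set with 5 rows
    # (assumption: in practice, one weight per row of `data`).
    weights_raw <- c(1, 2, 2, 5, 10)
    n_obs <- length(weights_raw)  # stands in for nrow(data)

    # Rescale so the weights sum to the number of observations.
    # Relative weighting between rows is unchanged; only the total changes.
    weights_scaled <- weights_raw * n_obs / sum(weights_raw)

    # Check the property the documentation asks for, allowing for
    # floating point error rather than testing exact equality.
    isTRUE(all.equal(sum(weights_scaled), n_obs))
    ```

    The scaled vector could then be passed as the weights argument of orsf(). Comparing with all.equal() rather than == avoids a spurious mismatch when the rescaled sum differs from nrow(data) only by rounding.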

    n_tree
    @@ -147,7 +158,7 @@


    @@ -277,12 +313,12 @@

    Argumentsorsf_pd_oob or orsf_summarize_uni to interpret the forest using its training data. Default is TRUE.

    @@ -296,9 +332,7 @@

    ArgumentsExamples orsf_summarize_uni(object, n_variables = 3) #> -#> -- ascites (VI Rank: 1) ----------------------- +#> -- bili (VI Rank: 1) -------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % -#> 0 0.6983709 0.7910345 0.5139682 0.9597133 -#> 1 0.5305812 0.5669828 0.3465873 0.7291136 +#> 0.80 0.7777513 0.8532309 0.6835249 0.9682600 +#> 1.40 0.7582240 0.8409578 0.6680678 0.9512311 +#> 3.52 0.6225496 0.6849123 0.4373820 0.8388405 #> -#> -- bili (VI Rank: 2) -------------------------- +#> -- ascites (VI Rank: 2) ----------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % -#> 0.80 0.7639857 0.8330195 0.6562526 0.9672986 -#> 1.40 0.7457920 0.8115713 0.6375998 0.9532067 -#> 3.52 0.6452987 0.7032310 0.4779466 0.8497936 +#> 0 0.6939223 0.7974914 0.4696991 0.9522116 +#> 1 0.5588307 0.6131822 0.3830100 0.7809927 #> -#> -- copper (VI Rank: 3) ------------------------ +#> -- edema (VI Rank: 3) ------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % -#> 42.8 0.7320214 0.8202788 0.5875294 0.9642543 -#> 74.0 0.7172284 0.8027461 0.5795696 0.9589637 -#> 129 0.6719926 0.7326571 0.5150342 0.9232044 +#> 0 0.6982167 0.8062842 0.4935759 0.9528082 +#> 0.5 0.6074940 0.6611793 0.3983464 0.8624686 +#> 1 0.6000213 0.6666069 0.3625468 0.8498958 #> #> Predicted survival at time t = 1788 for top 3 predictors @@ -183,24 +183,25 @@

    Examples#> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % -#> 0.80 0.7639857 0.8330195 0.6562526 0.9672986 -#> 1.40 0.7457920 0.8115713 0.6375998 0.9532067 -#> 3.52 0.6452987 0.7032310 0.4779466 0.8497936 +#> 0.80 0.7777513 0.8532309 0.6835249 0.9682600 +#> 1.40 0.7582240 0.8409578 0.6680678 0.9512311 +#> 3.52 0.6225496 0.6849123 0.4373820 0.8388405 #> #> -- copper (VI Rank: 2) ------------------------ #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % -#> 42.8 0.7320214 0.8202788 0.5875294 0.9642543 -#> 74.0 0.7172284 0.8027461 0.5795696 0.9589637 -#> 129 0.6719926 0.7326571 0.5150342 0.9232044 +#> 42.8 0.7284832 0.8305973 0.5811309 0.9723858 +#> 74.0 0.7079694 0.8118548 0.5308118 0.9430437 +#> 129 0.6578487 0.7345022 0.4382455 0.9122239 #> -#> -- sex (VI Rank: 3) --------------------------- +#> -- age (VI Rank: 3) --------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % -#> m 0.6473382 0.7232680 0.4510599 0.8898719 -#> f 0.6949818 0.7990302 0.5011564 0.9597133 +#> 41.5 0.7385443 0.8644422 0.5752826 0.9697622 +#> 49.7 0.7081352 0.8355585 0.4877352 0.9656811 +#> 56.6 0.6674529 0.7596919 0.4288566 0.9388434 #> #> Predicted survival at time t = 1788 for top 3 predictors diff --git a/reference/orsf_time_to_train.html b/reference/orsf_time_to_train.html index 791c04e5..d72fa5fa 100644 --- a/reference/orsf_time_to_train.html +++ b/reference/orsf_time_to_train.html @@ -95,7 +95,7 @@

    Examplestime_estimated <- orsf_time_to_train(object, n_tree_subset = 50) print(time_estimated) -#> Time difference of 0.1945901 secs +#> Time difference of 0.1957631 secs # let's see how close the approximation was time_true_start <- Sys.time() @@ -105,11 +105,11 @@

    Examplestime_true <- time_true_stop - time_true_start print(time_true) -#> Time difference of 0.2257481 secs +#> Time difference of 0.2327383 secs # error abs(time_true - time_estimated) -#> Time difference of 0.03115797 secs +#> Time difference of 0.03697515 secs

    diff --git a/reference/orsf_vi.html b/reference/orsf_vi.html index d25a7ede..c3134b7a 100644 --- a/reference/orsf_vi.html +++ b/reference/orsf_vi.html @@ -117,10 +117,13 @@

    ArgumentsExamples orsf_vs(object, n_predictor_min = 15) #> n_predictors stat_value predictors_included -#> 1: 15 0.8409292 age,sex_f,ascites_1,hepato_1,spiders_1,edema_0.5,... -#> 2: 16 0.8290536 age,sex_f,ascites_1,hepato_1,spiders_1,edema_0.5,... -#> 3: 17 0.8334809 id,age,sex_f,ascites_1,hepato_1,spiders_1,... -#> 4: 18 0.8315537 id,trt_placebo,age,sex_f,ascites_1,hepato_1,... +#> 1: 15 0.8355123 age,sex_f,ascites_1,hepato_1,spiders_1,edema_0.5,... +#> 2: 16 0.8367102 id,age,sex_f,ascites_1,hepato_1,spiders_1,... +#> 3: 17 0.8344185 id,age,sex_f,ascites_1,hepato_1,spiders_1,... +#> 4: 18 0.8323350 id,trt_placebo,age,sex_f,ascites_1,hepato_1,... #> predictor_dropped -#> 1: alk.phos -#> 2: platelet -#> 3: id +#> 1: stage +#> 2: id +#> 3: platelet #> 4: trt_placebo diff --git a/reference/print.ObliqueForest.html b/reference/print.ObliqueForest.html index 5709dd08..44331df9 100644 --- a/reference/print.ObliqueForest.html +++ b/reference/print.ObliqueForest.html @@ -136,10 +136,10 @@

    Examples#> N trees: 5 #> N predictors total: 17 #> N predictors per node: 5 -#> Average leaves per tree: 20.2 +#> Average leaves per tree: 22 #> Min observations in leaf: 5 #> Min events in leaf: 1 -#> OOB stat value: 0.77 +#> OOB stat value: 0.79 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> diff --git a/reference/print.orsf_summary_uni.html b/reference/print.orsf_summary_uni.html index e9513a83..5ab844db 100644 --- a/reference/print.orsf_summary_uni.html +++ b/reference/print.orsf_summary_uni.html @@ -102,24 +102,24 @@

    Examples#> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % -#> 0 0.6917550 0.8087376 0.4765359 0.9345688 -#> 1 0.5285449 0.5892286 0.3475842 0.7204037 +#> 0 0.6938291 0.7966843 0.4904799 0.9324773 +#> 1 0.5405888 0.6094190 0.3515579 0.7427231 #> #> -- bili (VI Rank: 2) -------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % -#> 0.80 0.7633098 0.8440832 0.6344838 0.9373619 -#> 1.40 0.7417346 0.8219937 0.6015828 0.9208078 -#> 3.52 0.6270205 0.6849291 0.4633733 0.8186574 +#> 0.80 0.7608539 0.8443806 0.6311260 0.9380968 +#> 1.40 0.7400537 0.8204494 0.5978744 0.9198812 +#> 3.52 0.6338126 0.7007166 0.4658716 0.8327246 #> #> -- edema (VI Rank: 3) ------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % -#> 0 0.6972450 0.8146059 0.4781274 0.9387859 -#> 0.5 0.5935130 0.6384339 0.3769381 0.8029641 -#> 1 0.5909165 0.6272369 0.3947232 0.8189532 +#> 0 0.6998313 0.8067008 0.5026885 0.9348496 +#> 0.5 0.5901635 0.6295332 0.3767099 0.7948707 +#> 1 0.5891139 0.6247675 0.3836066 0.8193277 #> #> Predicted survival at time t = 1788 for top 3 predictors diff --git a/search.json b/search.json index a7fe5e46..d7578984 100644 --- a/search.json +++ b/search.json @@ -1 +1 @@ -[{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":null,"dir":"","previous_headings":"","what":"Contributing to aorsf","title":"Contributing to aorsf","text":"Want contribute aorsf? Great! aorsf initially stable state development, great deal active subsequent development envisioned. outline propose change aorsf. 
detailed info contributing , tidyverse packages, please see development contributing guide.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"fixing-typos","dir":"","previous_headings":"","what":"Fixing typos","title":"Contributing to aorsf","text":"can fix typos, spelling mistakes, grammatical errors documentation directly using GitHub web interface, long changes made source file. generally means ’ll need edit roxygen2 comments .R, .Rd file. can find .R file generates .Rd reading comment first line.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"bigger-changes","dir":"","previous_headings":"","what":"Bigger changes","title":"Contributing to aorsf","text":"want make bigger change, ’s good idea first file issue make sure someone team agrees ’s needed. ’ve found bug, please file issue illustrates bug minimal reprex (also help write unit test, needed).","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"pull-request-process","dir":"","previous_headings":"Bigger changes","what":"Pull request process","title":"Contributing to aorsf","text":"Fork package clone onto computer. haven’t done , recommend using usethis::create_from_github(\"ropensci/aorsf\", fork = TRUE). Install development dependencies devtools::install_dev_deps(), make sure package passes R CMD check running devtools::check(). R CMD check doesn’t pass cleanly, ’s good idea ask help continuing. Create Git branch pull request (PR). recommend using usethis::pr_init(\"brief-description--change\"). Make changes, commit git, create PR running usethis::pr_push(), following prompts browser. title PR briefly describe change. body PR contain Fixes #issue-number. user-facing changes, add bullet top NEWS.md (.e. just first header). 
Follow style described https://style.tidyverse.org/news.html.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"code-style","dir":"","previous_headings":"Bigger changes","what":"Code style","title":"Contributing to aorsf","text":"New code follow tidyverse style guide. can use styler package apply styles, please don’t restyle code nothing PR. use roxygen2, Markdown syntax, documentation. use testthat unit tests. Contributions test cases included easier accept.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Contributing to aorsf","text":"Please note aorsf project released Contributor Code Conduct. contributing project agree abide terms.","code":""},{"path":"https://bcjaeger.github.io/aorsf/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2022 aorsf authors (Byron C. Jaeger, Sawyer Welden, Nicholas M. Pajewski) Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. 
EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"background","dir":"Articles","previous_headings":"","what":"Background","title":"Introduction to aorsf","text":"oblique random forest (RF) extension traditional (axis-based) RF. Instead using single variable split data grow new branches, trees oblique RF use weighted combination multiple variables.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"oblique-rfs-for-survival-classification-and-regression","dir":"Articles","previous_headings":"","what":"Oblique RFs for survival, classification, and regression","title":"Introduction to aorsf","text":"purpose aorsf (‘’ short accelerated) provide unifying framework fit oblique RFs can scale adequately large data sets. fastest algorithms available package used default often equivalent prediction accuracy computational approaches. Everything aorsf begins orsf() function. begin oblique RF survival using pbc_orsf data, oblique RF classification using penguins_orsf data, FILL REGRESSION. Note n_tree 5 convenience examples, >= 500 practice. may notice first input aorsf data. design choice makes easier use orsf pipes (.e., %>% |>). instance,","code":"library(aorsf) # An oblique survival RF pbc_fit <- orsf(data = pbc_orsf, n_tree = 5, formula = Surv(time, status) ~ . 
- id) pbc_fit #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated Cox regression #> N observations: 276 #> N events: 111 #> N trees: 5 #> N predictors total: 17 #> N predictors per node: 5 #> Average leaves per tree: 19.4 #> Min observations in leaf: 5 #> Min events in leaf: 1 #> OOB stat value: 0.77 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> #> ----------------------------------------- # An oblique classification RF penguin_fit <- orsf(data = penguins_orsf, n_tree = 5, formula = species ~ .) penguin_fit #> ---------- Oblique random classification forest #> #> Linear combinations: Accelerated Logistic regression #> N observations: 333 #> N classes: 3 #> N trees: 5 #> N predictors total: 7 #> N predictors per node: 3 #> Average leaves per tree: 6.4 #> Min observations in leaf: 5 #> OOB stat value: 0.98 #> OOB stat type: AUC-ROC #> Variable importance: anova #> #> ----------------------------------------- # An oblique regression RF cars_fit <- orsf(data = mtcars, n_tree = 5, formula = mpg ~ .) 
cars_fit #> ---------- Oblique random regression forest #> #> Linear combinations: Accelerated Linear regression #> N observations: 32 #> N trees: 5 #> N predictors total: 10 #> N predictors per node: 4 #> Average leaves per tree: 4.8 #> Min observations in leaf: 5 #> OOB stat value: 0.74 #> OOB stat type: RSQ #> Variable importance: anova #> #> ----------------------------------------- library(dplyr) pbc_fit <- pbc_orsf |> select(-id) |> orsf(formula = Surv(time, status) ~ ., n_tree = 5)"},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"interpretation","dir":"Articles","previous_headings":"","what":"Interpretation","title":"Introduction to aorsf","text":"aorsf includes several functions dedicated interpretation ORSFs, estimation partial dependence variable importance.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"variable-importance","dir":"Articles","previous_headings":"Interpretation","what":"Variable importance","title":"Introduction to aorsf","text":"multiple methods compute variable importance. compute negation importance, ORSF multiplies coefficient variable -1 re-computes --sample (sometimes referred --bag) accuracy ORSF model. can also compute variable importance using permutation, classical approach noises predictor assigned resulting degradation prediction accuracy importance predictor. 
faster alternative permutation negation importance ANOVA importance, computes proportion times variable obtains low p-value (p < 0.01) forest grown.","code":"orsf_vi_negate(pbc_fit) #> bili copper stage age protime sex #> 0.114050616 0.058879090 0.046094524 0.038011836 0.028853966 0.020874091 #> trt platelet edema albumin ascites alk.phos #> 0.017129950 0.013471420 0.012868400 0.012394794 0.010231183 0.009874770 #> spiders trig ast chol hepato #> 0.005304760 0.005121708 0.003857380 0.001530643 -0.006896236 orsf_vi_permute(pbc_fit) #> bili stage copper age albumin chol #> 0.081386482 0.027461497 0.026676054 0.024209679 0.018792662 0.012066683 #> alk.phos ast trig spiders ascites platelet #> 0.009972421 0.009671777 0.006336094 0.004536710 0.004102601 0.003379097 #> trt sex edema protime hepato #> 0.001788329 0.000976518 -0.000169104 -0.002325850 -0.007547886 orsf_vi_anova(pbc_fit) #> ascites bili copper ast albumin stage edema #> 0.44444444 0.40000000 0.33333333 0.26666667 0.26666667 0.25000000 0.24047619 #> age chol sex spiders protime alk.phos platelet #> 0.19047619 0.18750000 0.18181818 0.16666667 0.15789474 0.13043478 0.12500000 #> hepato trig trt #> 0.12500000 0.06666667 0.04761905"},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"partial-dependence-pd","dir":"Articles","previous_headings":"Interpretation","what":"Partial dependence (PD)","title":"Introduction to aorsf","text":"Partial dependence (PD) shows expected prediction model function single predictor multiple predictors. expectation marginalized values predictors, giving something like multivariable adjusted estimate model’s prediction. 
PD, see vignette","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"individual-conditional-expectations-ice","dir":"Articles","previous_headings":"Interpretation","what":"Individual conditional expectations (ICE)","title":"Introduction to aorsf","text":"Unlike partial dependence, shows expected prediction function one multiple predictors, individual conditional expectations (ICE) show prediction individual observation function predictor. ICE, see vignette","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"what-about-the-original-orsf","dir":"Articles","previous_headings":"","what":"What about the original ORSF?","title":"Introduction to aorsf","text":"original ORSF (.e., obliqueRSF) used glmnet find linear combinations inputs. aorsf allows users implement approach using orsf_control_survival(method = 'net') function: net forests fit lot faster original ORSF function obliqueRSF. However, net forests still much slower cph ones.","code":"orsf_net <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id, control = orsf_control_survival(method = 'net'))"},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"aorsf-and-other-machine-learning-software","dir":"Articles","previous_headings":"","what":"aorsf and other machine learning software","title":"Introduction to aorsf","text":"unique feature aorsf fast algorithms fit ORSF ensembles. RLT obliqueRSF fit oblique random survival forests, aorsf faster. ranger randomForestSRC fit survival forests, neither package supports oblique splitting. obliqueRF fits oblique random forests classification regression, survival. PPforest fits oblique random forests classification survival. Note: default prediction behavior aorsf models produce predicted risk specific prediction horizon, default ranger randomForestSRC. 
think change future, computing time independent predictions aorsf helpful.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"learning-more","dir":"Articles","previous_headings":"","what":"Learning more","title":"Introduction to aorsf","text":"aorsf began dedicated package oblique random survival forests, papers published far focused survival analysis risk prediction. However, routines regression classification oblique RFs aorsf high overlap survival ones. See orsf details oblique random survival forests. see JCGS paper details algorithms used specifically aorsf.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"go-faster","dir":"Articles","previous_headings":"","what":"Go faster","title":"Tips to speed up computation","text":"Analyses can slow crawl models need hours run. article find tricks prevent bottleneck using orsf().","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"dont-specify-a-control","dir":"Articles","previous_headings":"","what":"Don’t specify a control","title":"Tips to speed up computation","text":"default control orsf() NULL , unspecified, orsf() pick fastest possible control depending type forest grown. default control run-time compared approaches can striking. example:","code":"time_fast <- system.time( expr = orsf(pbc_orsf, formula = time+status~. -id, n_tree = 5) ) time_net <- system.time( expr = orsf(pbc_orsf, formula = time+status~. -id, control = orsf_control_survival(method = 'net'), n_tree = 5) ) # control_fast() is much faster time_net['elapsed'] / time_fast['elapsed'] #> elapsed #> 56.9"},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"use-n_thread","dir":"Articles","previous_headings":"","what":"Use n_thread","title":"Tips to speed up computation","text":"n_thread argument uses multi-threading run aorsf functions parallel possible. know many threads want, e.g. want exactly 5, just say n_thread = 5. 
aren’t sure many threads available want use many can, say n_thread = 0 aorsf figure number . R single threaded language, multi-threading applied orsf() needs call R functions C++, occurs customized R function used find linear combination variables compute prediction accuracy.","code":"# automatically pick number of threads based on amount available orsf(pbc_orsf, formula = time+status~. -id, n_tree = 5, n_thread = 0) #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated Cox regression #> N observations: 276 #> N events: 111 #> N trees: 5 #> N predictors total: 17 #> N predictors per node: 5 #> Average leaves per tree: 22 #> Min observations in leaf: 5 #> Min events in leaf: 1 #> OOB stat value: 0.77 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> #> -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"do-less","dir":"Articles","previous_headings":"","what":"Do less","title":"Tips to speed up computation","text":"defaults orsf() can adjusted make run faster: set n_retry 0 instead 3 (default) set oobag_pred_type ‘none’ instead ‘surv’ (default) set ‘importance’ ‘none’ instead ‘anova’ (default) increase split_min_events, split_min_obs, leaf_min_events, leaf_min_obs make trees stop growing sooner increase split_min_stat make trees stop growing sooner Applying tips: default values make orsf() run slower, also usually make predictions accurate make fit easier interpret.","code":"orsf(pbc_orsf, formula = time+status~., n_thread = 0, n_tree = 5, n_retry = 0, oobag_pred_type = 'none', importance = 'none', split_min_events = 20, leaf_min_events = 10, split_min_stat = 10) #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated Cox regression #> N observations: 276 #> N events: 111 #> N trees: 5 #> N predictors total: 18 #> N predictors per node: 5 #> Average leaves per tree: 5.6 #> Min observations in leaf: 5 #> Min events in leaf: 10 #> OOB 
stat value: none #> OOB stat type: none #> Variable importance: none #> #> -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"show-progress","dir":"Articles","previous_headings":"","what":"Show progress","title":"Tips to speed up computation","text":"Setting verbose_progress = TRUE doesn’t make anything run faster, can help make feel like things running less slow.","code":"verbose_fit <- orsf(pbc_orsf, formula = time+status~. -id, n_tree = 5, verbose_progress = TRUE) #> Growing trees: 100%. #> Computing predictions: 100%."},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"out-of-bag-data","dir":"Articles","previous_headings":"","what":"Out-of-bag data","title":"Out-of-bag predictions and evaluation","text":"random forests, tree grown bootstrapped version training set. bootstrap samples selected replacement, bootstrapped training set contains two-thirds instances original training set. ‘--bag’ data instances bootstrapped training set.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"out-of-bag-predictions-and-error","dir":"Articles","previous_headings":"","what":"Out-of-bag predictions and error","title":"Out-of-bag predictions and evaluation","text":"tree random forest can make predictions --bag data, --bag predictions can aggregated make ensemble --bag prediction. Since --bag data used grow tree, accuracy ensemble --bag predictions approximate generalization error random forest. --bag prediction error plays central role routines estimate variable importance, e.g. negation importance. Let’s fit oblique random survival forest plot distribution ensemble --bag predictions. surprisingly, survival predictions 0 1. Next, let’s check --bag accuracy fit: --bag estimate 1 (default method evaluate --bag predictions) 0.7490789.","code":"fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . 
- id, oobag_pred_type = 'surv', n_tree = 5, oobag_pred_horizon = 2000) hist(fit$pred_oobag, main = 'Ensemble out-of-bag survival predictions at t=2,000') # what function is used to evaluate out-of-bag predictions? fit$eval_oobag$stat_type #> [1] 1 # what is the output from this function? fit$eval_oobag$stat_values #> [,1] #> [1,] 0.7490789"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"monitoring-out-of-bag-error","dir":"Articles","previous_headings":"","what":"Monitoring out-of-bag error","title":"Out-of-bag predictions and evaluation","text":"--bag data set contains one-third training set, --bag error estimate usually converges stable value trees added forest. want monitor convergence --bag error oblique random survival forest, can set oobag_eval_every compute --bag error every oobag_eval_every tree. example, let’s compute --bag error fitting tree forest 20 trees: general, least 500 trees recommended random forest fit. ’re just using 20 illustration.","code":"fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id, n_tree = 20, tree_seeds = 2, oobag_pred_type = 'surv', oobag_pred_horizon = 2000, oobag_eval_every = 1) plot( x = seq(1, 20, by = 1), y = fit$eval_oobag$stat_values, main = 'Out-of-bag C-statistic computed after each new tree is grown.', xlab = 'Number of trees grown', ylab = fit$eval_oobag$stat_type ) lines(x=seq(1, 20), y = fit$eval_oobag$stat_values)"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"user-supplied-out-of-bag-evaluation-functions","dir":"Articles","previous_headings":"","what":"User-supplied out-of-bag evaluation functions","title":"Out-of-bag predictions and evaluation","text":"cases, may want control --bag error estimated. example, let’s use Brier score SurvMetrics package: two ways apply function compute --bag error. 
First, can apply function --bag survival predictions stored ‘aorsf’ objects, e.g: Second, can pass function orsf(), used place Harrell’s C-statistic:","code":"oobag_fun_brier <- function(y_mat, w_vec, s_vec){ # output is numeric vector of length 1 as.numeric( SurvMetrics::Brier( object = Surv(time = y_mat[, 1], event = y_mat[, 2]), pre_sp = s_vec, # t_star in Brier() should match oobag_pred_horizon in orsf() t_star = 2000 ) ) } oobag_fun_brier(y_mat = pbc_orsf[,c('time', 'status')], s_vec = fit$pred_oobag) #> [1] 0.117498 fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id, n_tree = 20, tree_seeds = 2, oobag_pred_horizon = 2000, oobag_fun = oobag_fun_brier, oobag_eval_every = 1) plot( x = seq(1, 20, by = 1), y = fit$eval_oobag$stat_values, main = 'Out-of-bag error computed after each new tree is grown.', sub = 'For the Brier score, lower values indicate more accurate predictions', xlab = 'Number of trees grown', ylab = \"Brier score\" ) lines(x=seq(1, 20), y = fit$eval_oobag$stat_values)"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"specific-instructions-on-user-supplied-functions","dir":"Articles","previous_headings":"User-supplied out-of-bag evaluation functions","what":"Specific instructions on user-supplied functions","title":"Out-of-bag predictions and evaluation","text":"User-supplied functions must: exactly three arguments named y_mat, w_vec, s_vec. return numeric output length 1 either conditions true, error occur. 
simple test make sure user-supplied function work aorsf package :","code":"# Helper code to make sure your oobag_fun function will work with aorsf # time and status values test_time <- seq(from = 1, to = 5, length.out = 100) test_status <- rep(c(0,1), each = 50) # y-matrix is presumed to contain time and status (with column names) y_mat <- cbind(time = test_time, status = test_status) # s_vec is presumed to be a vector of survival probabilities s_vec <- seq(0.9, 0.1, length.out = 100) # see 1 in the checklist above names(formals(oobag_fun_brier)) == c(\"y_mat\", \"w_vec\", \"s_vec\") #> [1] TRUE TRUE TRUE test_output <- oobag_fun_brier(y_mat = y_mat, w_vec = w_vec, s_vec = s_vec) # test output should be numeric is.numeric(test_output) #> [1] TRUE # test_output should be a numeric value of length 1 length(test_output) == 1 #> [1] TRUE"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"notes","dir":"Articles","previous_headings":"","what":"Notes","title":"Out-of-bag predictions and evaluation","text":"evaluating --bag error: oobag_pred_horizon input orsf() determines prediction horizon --bag predictions. prediction horizon needs specified evaluate prediction accuracy cases, examples . sure check case using functions, , sure oobag_pred_horizon matches prediction horizon used custom function. functions expect predicted risk (.e., 1 - predicted survival), others expect predicted survival. cases, also able use function whatsoever compute --bag prediction error estimating negation permutation importance, assuming passes tests . Unfortunately, exception riskRegression::Score(), one favorites. experimented riskRegression::Score found work try run C++. 
sure case.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"partial-dependence-pd","dir":"Articles","previous_headings":"","what":"Partial dependence (PD)","title":"PD and ICE curves with ORSF","text":"Partial dependence (PD) shows expected prediction model function single predictor multiple predictors. expectation marginalized values predictors, giving something like multivariable adjusted estimate model’s prediction. Begin fitting ORSF ensemble. Set prediction horizon 5 years fit ensemble aorsf function pass ensemble assume want compute predictions 5 years.","code":"library(aorsf) pred_horizon <- 365.25 * 5 set.seed(329730) index_train <- sample(nrow(pbc_orsf), 150) pbc_orsf_train <- pbc_orsf[index_train, ] pbc_orsf_test <- pbc_orsf[-index_train, ] fit <- orsf(data = pbc_orsf_train, formula = Surv(time, status) ~ . - id, n_tree = 50, oobag_pred_horizon = pred_horizon) fit #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated Cox regression #> N observations: 150 #> N events: 52 #> N trees: 50 #> N predictors total: 17 #> N predictors per node: 5 #> Average leaves per tree: 10.26 #> Min observations in leaf: 5 #> Min events in leaf: 1 #> OOB stat value: 0.82 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> #> -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"three-ways-to-compute-pd","dir":"Articles","previous_headings":"","what":"Three ways to compute PD","title":"PD and ICE curves with ORSF","text":"can compute PD three ways aorsf: using -bag predictions training data using --bag predictions training data using predictions new set data -bag PD indicates relationships model learned training. helpful goal interpret model. --bag PD indicates relationships model learned training using --bag data simulates application model new data. want test model’s reliability fairness new data don’t access large testing set. 
new data PD shows model predicts outcomes observations seen. helpful want test model’s reliability fairness. Let’s re-fit ORSF model available data proceeding next sections.","code":"pd_inb <- orsf_pd_inb(fit, pred_spec = list(bili = 1:5)) pd_inb #> pred_horizon bili mean lwr medn upr #> 1: 1826.25 1 0.7907730 0.2481813 0.8937603 0.9844993 #> 2: 1826.25 2 0.7601163 0.2197790 0.8682995 0.9727126 #> 3: 1826.25 3 0.7285689 0.2036100 0.8217230 0.9565828 #> 4: 1826.25 4 0.6746859 0.1718164 0.7566957 0.9123469 #> 5: 1826.25 5 0.6432024 0.1594598 0.7325270 0.8812735 pd_oob <- orsf_pd_oob(fit, pred_spec = list(bili = 1:5)) pd_oob #> pred_horizon bili mean lwr medn upr #> 1: 1826.25 1 0.7881621 0.2863629 0.8597642 0.9894571 #> 2: 1826.25 2 0.7555353 0.2442537 0.8200408 0.9748819 #> 3: 1826.25 3 0.7255229 0.2015414 0.8066997 0.9652856 #> 4: 1826.25 4 0.6627814 0.1825518 0.7251021 0.9222259 #> 5: 1826.25 5 0.6312946 0.1513669 0.6887701 0.9008661 pd_test <- orsf_pd_new(fit, new_data = pbc_orsf_test, pred_spec = list(bili = 1:5)) pd_test #> pred_horizon bili mean lwr medn upr #> 1: 1826.25 1 0.7431989 0.2273887 0.7909497 0.9839732 #> 2: 1826.25 2 0.7109910 0.1997981 0.7468900 0.9641052 #> 3: 1826.25 3 0.6810310 0.2020585 0.7197035 0.9462750 #> 4: 1826.25 4 0.6286608 0.1842981 0.6652184 0.8986032 #> 5: 1826.25 5 0.5963168 0.1735789 0.6239932 0.8723134 set.seed(329730) fit <- orsf(pbc_orsf, Surv(time, status) ~ . 
-id, n_tree = 50, oobag_pred_horizon = pred_horizon)"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"one-variable-one-horizon","dir":"Articles","previous_headings":"","what":"One variable, one horizon","title":"PD and ICE curves with ORSF","text":"Computing PD single variable straightforward: output shows expected predicted mortality risk men substantially higher women 5 years baseline.","code":"pd_sex <- orsf_pd_oob(fit, pred_spec = list(sex = c(\"m\", \"f\"))) pd_sex #> pred_horizon sex mean lwr medn upr #> 1: 1826.25 m 0.6435480 0.07603602 0.7117799 0.9703216 #> 2: 1826.25 f 0.6856422 0.05828848 0.8006676 0.9897476"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"one-variable-moving-horizon","dir":"Articles","previous_headings":"","what":"One variable, moving horizon","title":"PD and ICE curves with ORSF","text":"effect predictor varies time? PD can show . inspection, can see males higher risk females difference risk grows time. can also seen viewing ratio expected risk time:","code":"pd_sex_tv <- orsf_pd_oob(fit, pred_spec = list(sex = c(\"m\", \"f\")), pred_horizon = seq(365, 365*5)) ggplot(pd_sex_tv, aes(x = pred_horizon, y = mean, color = sex)) + geom_line() + labs(x = 'Time since baseline', y = 'Expected risk') library(data.table) ratio_tv <- pd_sex_tv[ , .(ratio = mean[sex == 'm'] / mean[sex == 'f']), by = pred_horizon ] ggplot(ratio_tv, aes(x = pred_horizon, y = ratio)) + geom_line(color = 'grey') + geom_smooth(color = 'black', se = FALSE) + labs(x = 'time since baseline', y = 'ratio in expected risk for males versus females') #> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"multiple-variables-marginally","dir":"Articles","previous_headings":"","what":"Multiple variables, marginally","title":"PD and ICE curves with ORSF","text":"want compute PD marginally multiple variables, just list variable values pred_spec specify 
expand_grid = FALSE. Now tedious wanted variables? bet. ’s made function . bonus, printed output sorted least important variables. ’s easy enough turn ‘summary’ object data.table downstream plotting tables.","code":"pd_two_vars <- orsf_pd_oob(fit, pred_spec = list(sex = c(\"m\", \"f\"), bili = 1:5), expand_grid = FALSE) pd_two_vars #> pred_horizon variable value level mean lwr medn upr #> 1: 1826.25 sex NA m 0.6435480 0.07603602 0.7117799 0.9703216 #> 2: 1826.25 sex NA f 0.6856422 0.05828848 0.8006676 0.9897476 #> 3: 1826.25 bili 1 0.7578522 0.14149456 0.8228529 0.9879458 #> 4: 1826.25 bili 2 0.7007216 0.09924444 0.7792010 0.9688718 #> 5: 1826.25 bili 3 0.6463678 0.08210453 0.7216161 0.9398680 #> 6: 1826.25 bili 4 0.6058594 0.06268395 0.6709035 0.9324928 #> 7: 1826.25 bili 5 0.5760493 0.05933883 0.6265555 0.9027867 pd_smry <- orsf_summarize_uni(fit, n_variables = 4) pd_smry #> #> -- ascites (VI Rank: 1) ----------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0 0.6881650 0.7943433 0.4746425 0.9449489 #> 1 0.5111958 0.5708393 0.3199671 0.7159134 #> #> -- bili (VI Rank: 2) -------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0.80 0.7651622 0.8323025 0.6370644 0.9563400 #> 1.40 0.7343138 0.8090864 0.5983719 0.9342372 #> 3.52 0.6271650 0.6967314 0.4420082 0.8455518 #> #> -- edema (VI Rank: 3) ------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0 0.6941026 0.8006676 0.4662736 0.9449489 #> 0.5 0.5978522 0.6289791 0.3879660 0.8228830 #> 1 0.5981885 0.6461966 0.3858379 0.8131762 #> #> -- copper (VI Rank: 4) ------------------------ #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 42.8 0.7397492 0.8450985 0.5764946 0.9533662 #> 74.0 0.7168641 0.8286188 0.5265654 0.9448001 #> 129 0.6565337 0.7409069 0.4453135 0.8936874 #> #> Predicted survival at time t = 1826.25 for top 4 
predictors head(as.data.table(pd_smry)) #> variable importance Value Mean Median 25th % 75th % #> 1: ascites 0.5225225 0 0.6881650 0.7943433 0.4746425 0.9449489 #> 2: ascites 0.5225225 1 0.5111958 0.5708393 0.3199671 0.7159134 #> 3: bili 0.3807339 0.80 0.7651622 0.8323025 0.6370644 0.9563400 #> 4: bili 0.3807339 1.40 0.7343138 0.8090864 0.5983719 0.9342372 #> 5: bili 0.3807339 3.52 0.6271650 0.6967314 0.4420082 0.8455518 #> 6: edema 0.2965828 0 0.6941026 0.8006676 0.4662736 0.9449489 #> pred_horizon level #> 1: 1826.25 0 #> 2: 1826.25 1 #> 3: 1826.25 #> 4: 1826.25 #> 5: 1826.25 #> 6: 1826.25 0"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"multiple-variables-jointly","dir":"Articles","previous_headings":"","what":"Multiple variables, jointly","title":"PD and ICE curves with ORSF","text":"PD can show expected value model’s predictions function specific predictor, function multiple predictors. instance, can estimate predicted risk joint function bili, edema, trt: inspection, model’s predictions indicate slightly lower risk placebo group, seem change much different values bili edema. clear increase predicted risk higher levels edema higher levels bili slope predicted risk function bili appears highest among patients edema 0.5. effect bili modified edema 0.5? 
quick sanity check coxph suggests .","code":"pred_spec = list(bili = seq(1, 5, length.out = 20), edema = levels(pbc_orsf_train$edema), trt = levels(pbc_orsf$trt)) pd_bili_edema <- orsf_pd_oob(fit, pred_spec) library(ggplot2) ggplot(pd_bili_edema, aes(x = bili, y = medn, col = trt, linetype = edema)) + geom_line() + labs(y = 'Expected predicted risk') library(survival) pbc_orsf$edema_05 <- ifelse(pbc_orsf$edema == '0.5', 'yes', 'no') fit_cph <- coxph(Surv(time,status) ~ edema_05 * bili, data = pbc_orsf) anova(fit_cph) #> Analysis of Deviance Table #> Cox model: response is Surv(time, status) #> Terms added sequentially (first to last) #> #> loglik Chisq Df Pr(>|Chi|) #> NULL -550.19 #> edema_05 -546.83 6.7248 1 0.009508 ** #> bili -513.59 66.4689 1 3.555e-16 *** #> edema_05:bili -510.54 6.1112 1 0.013433 * #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"individual-conditional-expectations-ice","dir":"Articles","previous_headings":"","what":"Individual conditional expectations (ICE)","title":"PD and ICE curves with ORSF","text":"Unlike partial dependence, shows expected prediction function one multiple predictors, individual conditional expectations (ICE) show prediction individual observation function predictor. Just like PD, can compute ICE using -bag, --bag, testing data, principles apply. ’ll use --bag estimates .","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"visualizing-ice-curves","dir":"Articles","previous_headings":"","what":"Visualizing ICE curves","title":"PD and ICE curves with ORSF","text":"Inspecting ICE curves observation can help identify whether heterogeneity model’s predictions. .e., effect variable follow pattern data, groups variable impacts risk differently? going turn boundary checking orsf_ice_oob setting boundary_checks = FALSE, allow generate ICE curves go beyond 90th percentile bili. 
id_variable identifier current value variable(s) data. redundant one variable, helpful multiple variables. id_row identifier observation original data. used group observation’s predictions together plots. plots, helpful scale ICE data. subtract initial value predicted risk (.e., bili = 1) observation’s conditional expectation values. , Every curve start 0 plot shows change predicted risk function bili. Now can visualize curves. inspection figure, individual slopes cluster around overall trend - Good! small number individual slopes appear flat. may helpful investigate .","code":"pred_spec <- list(bili = seq(1, 10, length.out = 25)) ice_oob <- orsf_ice_oob(fit, pred_spec, boundary_checks = FALSE) ice_oob #> id_variable id_row pred_horizon bili pred #> 1: 1 1 1826.25 1 0.1256531 #> 2: 1 2 1826.25 1 0.1266534 #> 3: 1 3 1826.25 1 0.2493624 #> 4: 1 4 1826.25 1 0.1644180 #> 5: 1 5 1826.25 1 0.4634519 #> --- #> 6896: 25 272 1826.25 10 0.4739199 #> 6897: 25 273 1826.25 10 0.5088416 #> 6898: 25 274 1826.25 10 0.8041698 #> 6899: 25 275 1826.25 10 0.3732715 #> 6900: 25 276 1826.25 10 0.7077810 ice_oob[, pred_subtract := rep(pred[id_variable==1], times=25)] ice_oob[, pred := pred - pred_subtract] library(ggplot2) ggplot(ice_oob, aes(x = bili, y = pred, group = id_row)) + geom_line(alpha = 0.15) + labs(y = 'Change in predicted risk') + geom_smooth(se = FALSE, aes(group = 1)) #> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"limitations-of-pd","dir":"Articles","previous_headings":"","what":"Limitations of PD","title":"PD and ICE curves with ORSF","text":"Partial dependence number known limitations assumptions users aware (see Hooker, 2021). particular, partial dependence less intuitive >2 predictors examined jointly, assumed feature(s) partial dependence computed correlated features (likely true many cases). 
Accumulated local effect plots can used (see ) case feature independence valid assumption.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"PD and ICE curves with ORSF","text":"Giles Hooker, Lucas Mentch, Siyu Zhou. Unrestricted Permutation forces Extrapolation: Variable Importance Requires least One Model, Free Variable Importance. arXiv e-prints 2021 Oct; arXiv-1905. URL: https://doi.org/10.48550/arXiv.1905.03151","code":""},{"path":"https://bcjaeger.github.io/aorsf/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Byron Jaeger. Author, maintainer. Nicholas Pajewski. Contributor. Sawyer Welden. Contributor. Christopher Jackson. Reviewer. Marvin Wright. Reviewer. Lukas Burk. Reviewer.","code":""},{"path":"https://bcjaeger.github.io/aorsf/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Jaeger et al. (2022). aorsf: R package supervised learning using oblique random survival forest. Journal Open Source Software, 7(77), 4705. https://doi.org/10.21105/joss.04705. Jaeger BC, Welden S, Lenoir K, Speiser JL, Segar MW, Pandey , Pajewski NM. Accelerated interpretable oblique random survival forests. Journal Computational Graphical Statistics. 2023 Aug 3:1-6. Jaeger BC, Long DL, Long DM, Sims M, Szychowski JM, Min YI, Mcclure LA, Howard G, Simon N. Oblique Random Survival Forests. Annals Applied Statistics. 13(3): 1847-1883. URL https://doi.org/10.1214/19-AOAS1261 DOI: 10.1214/19-AOAS1261","code":"@Article{, title = {aorsf: An R package for supervised learning using the oblique random survival forest}, author = {Byron C. Jaeger and Sawyer Welden and Kristin Lenoir and Nicholas M. 
Pajewski}, journal = {Journal of Open Source Software}, year = {2022}, volume = {7}, number = {77}, pages = {4705}, url = {https://doi.org/10.21105/joss.04705}, } @Article{, title = {Accelerated and interpretable oblique random survival forests}, author = {Byron C. Jaeger and Sawyer Welden and Kristin Lenoir and Jaime L. Speiser and Matthew W. Segar and Ambarish Pandey and Nicholas M. Pajewski}, journal = {Journal of Computational and Graphical Statistics}, year = {2023}, url = {https://doi.org/10.1080/10618600.2023.2231048}, } @Article{, title = {Oblique Random Survival Forests}, author = {Byron C. Jaeger and D. Leann Long and Dustin M. Long and Mario Sims and Jeff M. Szychowski and Yuan-I Min and Leslie A. Mcclure and George Howard and Noah Simon}, journal = {Annals of Applied Statistics}, year = {2019}, volume = {13}, number = {3}, pages = {1847--1883}, url = {https://doi.org/10.1214/19-AOAS1261}, }"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"aorsf-","dir":"","previous_headings":"","what":"Accelerated Oblique Random Survival Forests","title":"Accelerated Oblique Random Survival Forests","text":"Fit, interpret, make predictions oblique random survival forests (ORSFs).","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"why-aorsf","dir":"","previous_headings":"","what":"Why aorsf?","title":"Accelerated Oblique Random Survival Forests","text":"Hundreds times faster obliqueRSF.1 Accurate predictions censored outcomes.2 Negation importance, novel technique estimate variable importance ORSFs.2 Intuitive API formula based interface. 
Extensive input checks informative error messages.","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Accelerated Oblique Random Survival Forests","text":"can install aorsf CRAN using can install development version aorsf GitHub :","code":"install.packages(\"aorsf\") # install.packages(\"remotes\") remotes::install_github(\"ropensci/aorsf\")"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"what-is-an-oblique-decision-tree","dir":"","previous_headings":"","what":"What is an oblique decision tree?","title":"Accelerated Oblique Random Survival Forests","text":"Decision trees developed splitting set training data two new subsets, goal similarity within new subsets . splitting process repeated resulting subsets data stopping criterion met. new subsets data formed based single predictor, decision tree said axis-based splits data appear perpendicular axis predictor. linear combinations variables used instead single variable, tree oblique splits data neither parallel right angle axis. Figure: Decision trees classification axis-based splitting (left) oblique splitting (right). Cases orange squares; controls purple circles. trees partition predictor space defined variables X1 X2, oblique splits better job separating two classes.","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"examples","dir":"","previous_headings":"","what":"Examples","title":"Accelerated Oblique Random Survival Forests","text":"orsf() function can fit several types ORSF ensembles. personal favorite accelerated ORSF great combination prediction accuracy computational efficiency (see JCGS paper).2","code":"library(aorsf) set.seed(329730) index_train <- sample(nrow(pbc_orsf), 150) pbc_orsf_train <- pbc_orsf[index_train, ] pbc_orsf_test <- pbc_orsf[-index_train, ] fit <- orsf(data = pbc_orsf_train, formula = Surv(time, status) ~ . 
- id, oobag_pred_horizon = 365.25 * 5)"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"inspect","dir":"","previous_headings":"Examples","what":"Inspect","title":"Accelerated Oblique Random Survival Forests","text":"Printing output orsf() give information descriptive statistics ensemble. See print.orsf_fit description line printed output. See orsf examples details controlling ORSF ensemble fits using prediction modeling workflows.","code":"fit #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated #> N observations: 150 #> N events: 52 #> N trees: 500 #> N predictors total: 17 #> N predictors per node: 5 #> Average leaves per tree: 10 #> Min observations in leaf: 5 #> Min events in leaf: 1 #> OOB stat value: 0.83 #> OOB stat type: Harrell's C-statistic #> Variable importance: anova #> #> -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"variable-importance","dir":"","previous_headings":"Examples","what":"Variable importance","title":"Accelerated Oblique Random Survival Forests","text":"importance individual variables can estimated three ways using aorsf: negation2: variable assessed separately multiplying variable’s coefficients -1 determining much model’s performance changes. worse model’s performance negating coefficients given variable, important variable. technique promising b/c require permutation emphasizes variables larger coefficients linear combinations, also relatively new hasn’t studied much permutation importance. See Jaeger, (2023) details technique. permutation: variable assessed separately randomly permuting variable’s values determining much model’s performance changes. worse model’s performance permuting values given variable, important variable. technique flexible, intuitive, frequently used. also several known limitations analysis variance (ANOVA)3: p-value computed coefficient linear combination variables decision tree. 
Importance individual predictor variable proportion times p-value coefficient < 0.01. technique efficient computationally, may effective permutation negation terms selecting signal noise variables. See Menze, 2011 details technique. can supply R function estimate --bag error using negation permutation importance (see oob vignette)","code":"orsf_vi_negate(fit) #> bili sex copper stage age #> 0.1162463868 0.0517905362 0.0375565841 0.0240450064 0.0239056901 #> ast protime hepato edema ascites #> 0.0191083400 0.0158014897 0.0139536512 0.0119264604 0.0100865906 #> albumin chol spiders trt trig #> 0.0085394443 0.0037903802 0.0030727468 0.0020617896 0.0018361632 #> alk.phos platelet #> 0.0006586211 -0.0044967624 orsf_vi_permute(fit) #> bili copper age stage sex #> 0.0523994364 0.0187964038 0.0152246586 0.0115192591 0.0110127557 #> ast hepato edema ascites albumin #> 0.0100104477 0.0082889176 0.0079183046 0.0077834483 0.0070642325 #> protime trig chol spiders alk.phos #> 0.0066513097 0.0015656325 0.0014474560 0.0006015308 0.0001369292 #> trt platelet #> -0.0013984860 -0.0022427356 orsf_vi_anova(fit) #> bili ascites edema copper stage sex age #> 0.48778004 0.44943820 0.41677872 0.31865585 0.26675095 0.26458616 0.25448430 #> ast hepato albumin chol protime trig spiders #> 0.21743929 0.19945726 0.18191604 0.15240328 0.15076561 0.13709677 0.11833550 #> alk.phos platelet trt #> 0.10113636 0.06302021 0.05019305"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"partial-dependence-pd","dir":"","previous_headings":"Examples","what":"Partial dependence (PD)","title":"Accelerated Oblique Random Survival Forests","text":"Partial dependence (PD) shows expected prediction model function single predictor multiple predictors. expectation marginalized values predictors, giving something like multivariable adjusted estimate model’s prediction. summary function, orsf_summarize_uni(), computes PD many variables ask , using sensible values. 
PD, see vignette","code":"orsf_summarize_uni(fit, n_variables = 2) #> #> -- bili (VI Rank: 1) --------------------------- #> #> |---------------- Risk ----------------| #> Value Mean Median 25th % 75th % #> 0.70 0.1986719 0.1044026 0.04354701 0.2968290 #> 1.3 0.2132847 0.1210276 0.05245387 0.3208855 #> 3.2 0.2883814 0.1917119 0.11951296 0.4147258 #> #> -- sex (VI Rank: 2) ---------------------------- #> #> |---------------- Risk ----------------| #> Value Mean Median 25th % 75th % #> m 0.3394141 0.2313787 0.13762339 0.5311308 #> f 0.2390067 0.1112093 0.04782891 0.3773551 #> #> Predicted risk at time t = 1826.25 for top 2 predictors"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"individual-conditional-expectations-ice","dir":"","previous_headings":"Examples","what":"Individual conditional expectations (ICE)","title":"Accelerated Oblique Random Survival Forests","text":"Unlike partial dependence, shows expected prediction function one multiple predictors, individual conditional expectations (ICE) show prediction individual observation function predictor. ICE, see vignette","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"comparison-to-existing-software","dir":"","previous_headings":"","what":"Comparison to existing software","title":"Accelerated Oblique Random Survival Forests","text":"Comparisons aorsf existing software presented JCGS paper. paper: describes aorsf detail summary procedures used tree fitting algorithm runs general benchmark comparing aorsf obliqueRSF several learners reports prediction accuracy computational efficiency learners. runs simulation study comparing variable importance techniques ORSFs, axis based RSFs, boosted trees. reports probability variable importance technique rank relevant variable higher importance irrelevant variable. 
hands-comparison aorsf R packages provided orsf examples","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"references","dir":"","previous_headings":"","what":"References","title":"Accelerated Oblique Random Survival Forests","text":"Jaeger BC, Long DL, Long DM, Sims M, Szychowski JM, Min YI, Mcclure LA, Howard G, Simon N. Oblique random survival forests. Annals applied statistics 2019 Sep; 13(3):1847-83. DOI: 10.1214/19-AOAS1261 Jaeger BC, Welden S, Lenoir K, Speiser JL, Segar MW, Pandey , Pajewski NM. Accelerated interpretable oblique random survival forests. Journal Computational Graphical Statistics Published online 08 Aug 2023. DOI: 10.1080/10618600.2023.2231048 Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA. oblique random forests. Joint European Conference Machine Learning Knowledge Discovery Databases 2011 Sep 4; pp. 453-469. DOI: 10.1007/978-3-642-23783-6_29","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"funding","dir":"","previous_headings":"","what":"Funding","title":"Accelerated Oblique Random Survival Forests","text":"developers aorsf receive financial support Center Biomedical Informatics, Wake Forest University School Medicine. also receive support National Center Advancing Translational Sciences National Institutes Health Award Number UL1TR001420. content solely responsibility authors necessarily represent official views National Institutes Health.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/aorsf-package.html","id":null,"dir":"Reference","previous_headings":"","what":"aorsf: Accelerated Oblique Random Survival Forests — aorsf-package","title":"aorsf: Accelerated Oblique Random Survival Forests — aorsf-package","text":"Fit, interpret, make predictions oblique random survival forests. Oblique decision trees notoriously slow compared axis based counterparts, 'aorsf' runs fast faster axis-based decision tree algorithms right-censored time--event outcomes. 
Methods accelerate interpret oblique random survival forest described Jaeger et al., (2023) doi:10.1080/10618600.2023.2231048 .","code":""},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/aorsf-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"aorsf: Accelerated Oblique Random Survival Forests — aorsf-package","text":"Maintainer: Byron Jaeger bjaeger@wakehealth.edu (ORCID) contributors: Nicholas Pajewski [contributor] Sawyer Welden swelden@wakehealth.edu [contributor] Christopher Jackson chris.jackson@mrc-bsu.cam.ac.uk [reviewer] Marvin Wright [reviewer] Lukas Burk [reviewer]","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":null,"dir":"Reference","previous_headings":"","what":"Coerce to data.table — as.data.table.orsf_summary_uni","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"Convert 'orsf_summary' object data.table object.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"","code":"# S3 method for orsf_summary_uni as.data.table(x, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"x object class 'orsf_summary_uni' ... 
used","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"data.table","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"","code":"library(data.table) object <- orsf(pbc_orsf, Surv(time, status) ~ . - id) smry <- orsf_summarize_uni(object, n_variables = 3) as.data.table(smry) #> variable importance value mean medn lwr upr #> 1: ascites 0.4916091 0 0.6923353 0.8129007 0.4723085 0.9385501 #> 2: ascites 0.4916091 1 0.5332829 0.6019530 0.3495088 0.7316614 #> 3: bili 0.4264008 0.80 0.7651674 0.8500293 0.6487390 0.9437248 #> 4: bili 0.4264008 1.40 0.7431069 0.8221693 0.6026852 0.9308567 #> 5: bili 0.4264008 3.52 0.6241095 0.6858060 0.4487054 0.8175104 #> 6: edema 0.3064635 0 0.6989076 0.8175546 0.4723085 0.9399742 #> 7: edema 0.3064635 0.5 0.5913448 0.6411541 0.3889686 0.7984051 #> 8: edema 0.3064635 1 0.5876363 0.6248548 0.3604497 0.8203510 #> pred_horizon level #> 1: 1788 0 #> 2: 1788 1 #> 3: 1788 #> 4: 1788 #> 5: 1788 #> 6: 1788 0 #> 7: 1788 0.5 #> 8: 1788 1"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":null,"dir":"Reference","previous_headings":"","what":"Oblique Random Survival Forest (ORSF) — orsf","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Fit oblique random survival forest","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"","code":"orsf( data, formula, control = NULL, weights = NULL, n_tree = 500, n_split = 5, n_retry = 3, n_thread = 1, mtry = NULL, sample_with_replacement = TRUE, 
sample_fraction = 0.632, leaf_min_events = 1, leaf_min_obs = 5, split_rule = NULL, split_min_events = 5, split_min_obs = 10, split_min_stat = NULL, oobag_pred_type = NULL, oobag_pred_horizon = NULL, oobag_eval_every = n_tree, oobag_fun = NULL, importance = \"anova\", importance_max_pvalue = 0.01, group_factors = TRUE, tree_seeds = NULL, attach_data = TRUE, no_fit = FALSE, na_action = \"fail\", verbose_progress = FALSE, ... ) orsf_train(object, attach_data = TRUE)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"data data.frame, tibble, data.table contains relevant variables. formula (formula) response left hand side include time variable, followed status variable, may written inside call Surv (see examples). terms right names predictor variables. control (orsf_control) object returned one orsf_control functions: orsf_control_fast (default) uses single iteration Newton Raphson scoring identify linear combination predictors. orsf_control_cph uses Newton Raphson scoring convergence criteria met. orsf_control_net uses glmnet identify linear combinations predictors, similar Jaeger (2019). orsf_control_custom allows user apply function create linear combinations predictors. weights (numeric vector) Optional. given, input length equal nrow(data). Values weights treated like replication weights, .e., value 2 thing 2 observations data, containing copy corresponding person's data. Use weights cautiously, orsf count number observations events prior growing node tree, higher values weights lead deeper trees. n_tree (integer) number trees grow. Default n_tree = 500. n_split (integer) number cut-points assessed splitting node decision trees. Default n_split = 5. 
n_retry (integer) node can split, current linear combination inputs unable provide valid split, orsf try new linear combination based different set randomly selected predictors, n_retry times. Default n_retry = 3. Set n_retry = 0 prevent retries. n_thread (integer) number threads use growing trees, computing predictions, computing importance. Default one thread. use maximum number threads system provides concurrent execution, set n_thread = 0. mtry (integer) Number predictors randomly included candidates splitting node. default smallest integer greater square root number total predictors, .e., mtry = ceiling(sqrt(number predictors)) sample_with_replacement (logical) TRUE (default), observations sampled replacement -bag sample created decision tree. FALSE, observations sampled without replacement tree -bag sample containing sample_fraction% original sample. sample_fraction (double) proportion observations trees' -bag sample contain, relative number rows data. used sample_with_replacement FALSE. Default value 0.632. leaf_min_events (integer) minimum number events leaf node. Default leaf_min_events = 1 leaf_min_obs (integer) minimum number observations leaf node. Default leaf_min_obs = 5. split_rule (character) assess quality potential splitting rule node. Valid options 'logrank' : log-rank test statistic. 'cstat' : Harrell's concordance statistic. split_min_events (integer) minimum number events required node consider splitting . Default split_min_events = 5 split_min_obs (integer) minimum number observations required node consider splitting . Default split_min_obs = 10. split_min_stat (double) minimum test statistic required split node. Default 3.841459 split_rule = 'logrank' 0.50 split_rule = 'cstat'. splits found statistic exceeding split_min_stat, given node either becomes leaf retry occurs (n_retry retries). oobag_pred_type (character) type --bag predictions compute fitting ensemble. 
Valid options 'none' : compute --bag predictions 'risk' : probability event occurring oobag_pred_horizon. 'surv' : 1 - risk. 'chf' : cumulative hazard function oobag_pred_horizon. 'mort' : mortality, .e., number events expected observations training data identical given observation. oobag_pred_horizon (numeric) numeric value indicating time used --bag predictions. Default median observed times, .e., oobag_pred_horizon = median(time). oobag_eval_every (integer) --bag performance ensemble checked every oobag_eval_every trees. , oobag_eval_every = 10, --bag performance checked growing 10th tree, 20th tree, . Default oobag_eval_every = n_tree. oobag_fun (function) used evaluating --bag prediction accuracy every oobag_eval_every trees. oobag_fun = NULL (default), Harrell's C-statistic (1982) used evaluate accuracy. use oobag_fun note following: oobag_fun two inputs: y_mat s_vec y_mat two column matrix first column named 'time', second named 'status' s_vec numeric vector containing predicted survival probabilities. oobag_fun return numeric output length 1 details, see --bag vignette. importance (character) Indicate method variable importance: 'none': variable importance computed. 'anova': compute analysis variance (ANOVA) importance 'negate': compute negation importance 'permute': compute permutation importance details methods, see orsf_vi. importance_max_pvalue (double) relevant importance \"anova\". maximum p-value register positive case counting number times variable found 'significant' tree growth. Default 0.01, recommended Menze et al. group_factors (logical) relevant variable importance estimated. TRUE, importance factor variables reported overall aggregating importance individual levels factor. FALSE, importance individual factor levels returned. tree_seeds (integer vector) Optional. specified, random seeds set using values tree_seeds[] growing tree . Two forests grown number trees seeds exact --bag samples, making --bag error estimates forests comparable. 
NULL (default), seeds set training process. attach_data (logical) TRUE, copy training data attached output. helpful plan using functions like orsf_pd_oob orsf_summarize_uni interpret forest using training data. Default TRUE. no_fit (logical) TRUE, model fitting steps defined saved, training initiated. object returned can directly submitted orsf_train() long attach_data TRUE. na_action (character) happen data contains missing values (.e., NA values). Valid options : 'fail' : error thrown data contains NA values 'omit' : rows data incomplete data dropped 'impute_meanmode' : missing values continuous categorical variables data imputed using mean mode, respectively. Note option selected attach_data TRUE, data attached output imputed version data. verbose_progress (logical) TRUE, progress messages printed console. FALSE (default), nothing printed. ... arguments passed methods (currently used). object untrained 'aorsf' object, created setting no_fit = TRUE orsf().","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"accelerated oblique RSF object (aorsf)","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"function based similar ORSF function obliqueRSF R package. primary difference function runs much faster. speed increase attributable better management memory (.e., unnecessary copies inputs) using Newton Raphson scoring algorithm identify linear combinations inputs rather performing penalized regression using routines glmnet.modified Newton Raphson scoring algorithm function applies adaptation C++ routine developed Terry M. 
Therneau fits Cox proportional hazards models (see survival::coxph() specifically survival::coxph.fit()).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"details-on-inputs","dir":"Reference","previous_headings":"","what":"Details on inputs","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"formula: response formula can survival object returned Surv function, can also just time status variables. .e., Surv(time, status) ~ . works just like time + status ~ . . symbol right hand side short-hand using variables data (omitting left hand side formula) predictors. order variables left hand side matters. .e., writing status + time ~ . make orsf assume status variable actually time variable. response variable can survival object stored data. example, y ~ . valid formula data$y inherits Surv class. Although can fit oblique random survival forest 1 predictor variable, formula least 2 predictors. reason recommendation linear combination predictors trivial one predictor. mtry: mtry parameter may temporarily reduced ensure least 2 events per predictor variable. occurs using orsf_control_cph coefficients Newton Raphson scoring algorithm may become unstable number covariates greater equal number events. reduction occur using orsf_control_net. oobag_fun: oobag_fun specified, used compute negation importance permutation importance, role ANOVA importance. n_thread: R function must called C++ (.e., user-supplied function compute --bag error identify linear combinations variables), n_thread automatically set 1 attempting run R functions multiple threads cause R session crash.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"what-is-an-oblique-decision-tree-","dir":"Reference","previous_headings":"","what":"What is an oblique decision tree?","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Decision trees developed splitting set training data two new subsets, goal similarity within new subsets . 
splitting process repeated resulting subsets data stopping criterion met. new subsets data formed based single predictor, decision tree said axis-based splits data appear perpendicular axis predictor. linear combinations variables used instead single variable, tree oblique splits data neither parallel right angle axis Figure : Decision trees classification axis-based splitting (left) oblique splitting (right). Cases orange squares; controls purple circles. trees partition predictor space defined variables X1 X2, oblique splits better job separating two classes.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"what-is-a-random-forest-","dir":"Reference","previous_headings":"","what":"What is a random forest?","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Random forests collections de-correlated decision trees. Predictions tree aggregated make ensemble prediction forest. details, see Breiman, 2001.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"training-out-of-bag-error-and-testing","dir":"Reference","previous_headings":"","what":"Training, out-of-bag error, and testing","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"random forests, tree grown bootstrapped version training set. bootstrap samples selected replacement, bootstrapped training set contains two-thirds instances original training set. '--bag' data instances bootstrapped training set. tree random forest can make predictions --bag data, --bag predictions can aggregated make ensemble --bag prediction. Since --bag data used grow tree, accuracy ensemble --bag predictions approximate generalization error random forest. 
Generalization error refers error random forest's predictions applied predict outcomes data used train , .e., testing data.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"missing-data","dir":"Reference","previous_headings":"","what":"Missing data","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Data passed aorsf functions allowed missing values. user impute missing values using R package purpose, recipes mlr3pipelines.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"First load relevant packages entry-point aorsf standard call orsf(): printing fit provides quick descriptive summaries:","code":"set.seed(329730) suppressPackageStartupMessages({ library(aorsf) library(survival) library(tidymodels) library(tidyverse) library(randomForestSRC) library(ranger) library(riskRegression) library(obliqueRSF) }) fit <- orsf(pbc_orsf, Surv(time, status) ~ . - id) fit ## ---------- Oblique random survival forest ## ## Linear combinations: Accelerated Cox regression ## N observations: 276 ## N events: 111 ## N trees: 500 ## N predictors total: 17 ## N predictors per node: 5 ## Average leaves per tree: 20.98 ## Min observations in leaf: 5 ## Min events in leaf: 1 ## OOB stat value: 0.84 ## OOB stat type: Harrell's C-index ## Variable importance: anova ## ## -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"model-control","dir":"Reference","previous_headings":"","what":"Model control","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"examples make use orsf_control_ functions build compare models based --bag predictions. 
also standardize --bag samples using input argument tree_seeds","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"accelerated-linear-combinations","dir":"Reference","previous_headings":"","what":"Accelerated linear combinations","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"accelerated ORSF ensemble default nice balance computational speed prediction accuracy. runs single iteration Newton Raphson scoring Cox partial likelihood function find linear combinations predictors.","code":"fit_accel <- orsf(pbc_orsf, control = orsf_control_survival(), formula = Surv(time, status) ~ . - id, tree_seeds = 329)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"linear-combinations-with-cox-regression","dir":"Reference","previous_headings":"","what":"Linear combinations with Cox regression","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"orsf_control_cph runs Cox regression non-terminal node survival tree, using regression coefficients create linear combinations predictors:","code":"control_cph <- orsf_control_survival(method = 'glm', scale_x = TRUE, max_iter = 20) fit_cph <- orsf(pbc_orsf, control = control_cph, formula = Surv(time, status) ~ . - id, tree_seeds = 329)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"linear-combinations-with-penalized-cox-regression","dir":"Reference","previous_headings":"","what":"Linear combinations with penalized cox regression","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"orsf_control_net runs penalized Cox regression non-terminal node survival tree, using regression coefficients create linear combinations predictors. can really helpful want feature selection within node, lot slower options.","code":"# select 3 predictors out of 5 to be used in # each linear combination of predictors. 
control_net <- orsf_control_survival(method = 'net', target_df = 3) fit_net <- orsf(pbc_orsf, control = control_net, formula = Surv(time, status) ~ . - id, tree_seeds = 329)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"linear-combinations-with-your-own-function","dir":"Reference","previous_headings":"","what":"Linear combinations with your own function","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Let’s make three customized functions identify linear combinations predictors. first uses random coefficients second derives coefficients principal component analysis. third uses orsf() inside orsf(). can plug functions orsf_control_custom(), pass result orsf(): fit seems work best example? Let’s find evaluating --bag survival predictions. AUC values, highest lowest: indices prediction accuracy: inspection, net, accel, rlt high discrimination index prediction accuracy. rando pca less well, aren’t bad.","code":"f_rando <- function(x_node, y_node, w_node){ matrix(runif(ncol(x_node)), ncol=1) } f_pca <- function(x_node, y_node, w_node) { # estimate two principal components. pca <- stats::prcomp(x_node, rank. = 2) # use the first principal component to split the node pca$rotation[, 1L, drop = FALSE] } # This approach is known as reinforcement learning trees. # some special care is taken to prevent your R session from crashing. # Specifically, random coefficients are used when n_obs <= 10 # or n_events <= 5. 
f_aorsf <- function(x_node, y_node, w_node){ colnames(y_node) <- c('time', 'status') colnames(x_node) <- paste(\"x\", seq(ncol(x_node)), sep = '') data <- as.data.frame(cbind(y_node, x_node)) if(nrow(data) <= 10 || sum(y_node[,'status']) <= 5) return(matrix(runif(ncol(x_node)), ncol = 1)) fit <- orsf(data, time + status ~ ., weights = as.numeric(w_node), n_tree = 25, importance = 'permute') out <- orsf_vi(fit) # drop the least two important variables n_vars <- length(out) out[c(n_vars, n_vars-1)] <- 0 # ensure out has same variable order as input out <- out[colnames(x_node)] matrix(out, ncol = 1) } fit_rando <- orsf(pbc_orsf, Surv(time, status) ~ . - id, control = orsf_control_survival(method = f_rando), tree_seeds = 329) fit_pca <- orsf(pbc_orsf, Surv(time, status) ~ . - id, control = orsf_control_survival(method = f_pca), tree_seeds = 329) fit_rlt <- orsf(pbc_orsf, time + status ~ . - id, control = orsf_control_survival(method = f_aorsf), tree_seeds = 329) risk_preds <- list( accel = 1 - fit_accel$pred_oobag, cph = 1 - fit_cph$pred_oobag, net = 1 - fit_net$pred_oobag, rando = 1 - fit_rando$pred_oobag, pca = 1 - fit_pca$pred_oobag, rlt = 1 - fit_rlt$pred_oobag ) sc <- Score(object = risk_preds, formula = Surv(time, status) ~ 1, data = pbc_orsf, summary = 'IPA', times = fit_accel$pred_horizon) sc$AUC$score[order(-AUC)] ## model times AUC se lower upper ## 1: net 1788 0.9134593 0.02079935 0.8726933 0.9542253 ## 2: rlt 1788 0.9129537 0.01979016 0.8741657 0.9517417 ## 3: accel 1788 0.9112315 0.02098077 0.8701099 0.9523530 ## 4: cph 1788 0.9063871 0.02165434 0.8639453 0.9488288 ## 5: rando 1788 0.9023489 0.02218936 0.8588586 0.9458393 ## 6: pca 1788 0.8994220 0.02201713 0.8562692 0.9425748 sc$Brier$score[order(-IPA), .(model, times, IPA)] ## model times IPA ## 1: net 1788 0.4916038 ## 2: accel 1788 0.4879683 ## 3: cph 1788 0.4751883 ## 4: rlt 1788 0.4640282 ## 5: pca 1788 0.4370592 ## 6: rando 1788 0.4258344 ## 7: Null model 1788 
0.0000000"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"tidymodels","dir":"Reference","previous_headings":"","what":"tidymodels","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"example uses tidymodels functions stops short using official tidymodels workflow. working getting aorsf pulled censored package update real workflows happens!","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"comparing-orsf-with-other-learners","dir":"Reference","previous_headings":"","what":"Comparing ORSF with other learners","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Start recipe pre-process data Next create 10-fold cross validation object pre-process data: Define functions ‘workflow’ randomForestSRC, ranger, aorsf. Run ‘workflows’ fold: Next unnest column get back tibble testing data predictions. finish aggregating predictions computing performance testing data. Note computing one statistic predictions instead computing one statistic fold. approach fine smaller testing sets /small event counts. inspection, aorsf obtained slightly higher discrimination (AUC) aorsf obtained higher index prediction accuracy (IPA)","code":"imputer <- recipe(pbc_orsf, formula = time + status ~ .) 
%>% step_rm(id) %>% step_impute_mean(all_numeric_predictors()) %>% step_impute_mode(all_nominal_predictors()) # 10-fold cross validation; make a container for the pre-processed data analyses <- vfold_cv(data = pbc_orsf, v = 10) %>% mutate(recipe = map(splits, ~prep(imputer, training = training(.x))), train = map(recipe, juice), test = map2(splits, recipe, ~bake(.y, new_data = testing(.x)))) analyses ## # 10-fold cross-validation ## # A tibble: 10 x 5 ## splits id recipe train test ## ## 1 Fold01 ## 2 Fold02 ## 3 Fold03 ## 4 Fold04 ## 5 Fold05 ## 6 Fold06 ## 7 Fold07 ## 8 Fold08 ## 9 Fold09 ## 10 Fold10 rfsrc_wf <- function(train, test, pred_horizon){ # rfsrc does not like tibbles, so cast input data into data.frames train <- as.data.frame(train) test <- as.data.frame(test) rfsrc(formula = Surv(time, status) ~ ., data = train) %>% predictRisk(newdata = test, times = pred_horizon) %>% as.numeric() } ranger_wf <- function(train, test, pred_horizon){ ranger(Surv(time, status) ~ ., data = train) %>% predictRisk(newdata = test, times = pred_horizon) %>% as.numeric() } aorsf_wf <- function(train, test, pred_horizon){ train %>% orsf(Surv(time, status) ~ .,) %>% predict(new_data = test, pred_type = 'risk', pred_horizon = pred_horizon) %>% as.numeric() } # 5 year risk prediction ph <- 365.25 * 5 results <- analyses %>% transmute(test, pred_aorsf = map2(train, test, aorsf_wf, pred_horizon = ph), pred_rfsrc = map2(train, test, rfsrc_wf, pred_horizon = ph), pred_ranger = map2(train, test, ranger_wf, pred_horizon = ph)) results <- results %>% unnest(everything()) glimpse(results) ## Rows: 276 ## Columns: 23 ## $ id 1, 8, 15, 17, 23, 35, 56, 68, 92, 94, 97, 109, 116, 130, 1~ ## $ trt d_penicill_main, placebo, d_penicill_main, placebo, placeb~ ## $ age 58.76523, 53.05681, 64.64613, 52.18344, 55.96715, 48.61875~ ## $ sex f, f, f, f, f, f, f, f, f, f, m, f, f, f, f, f, f, f, f, f~ ## $ ascites 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~ ## $ hepato 1, 0, 0, 1, 1, 0, 
1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0~ ## $ spiders 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0~ ## $ edema 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0.5, 0, 0.5, 0, 0, 0, 0.5, 0~ ## $ bili 14.5, 0.3, 0.8, 2.7, 17.4, 1.2, 1.1, 0.7, 1.4, 3.2, 2.0, 0~ ## $ chol 261, 280, 231, 274, 395, 314, 498, 174, 206, 201, 420, 120~ ## $ albumin 2.60, 4.00, 3.87, 3.15, 2.94, 3.20, 3.80, 4.09, 3.13, 3.11~ ## $ copper 156, 52, 173, 159, 558, 201, 88, 58, 36, 178, 62, 53, 74, ~ ## $ alk.phos 1718.0, 4651.2, 9009.8, 1533.0, 6064.8, 12258.8, 13862.4, ~ ## $ ast 137.95, 28.38, 127.71, 117.80, 227.04, 72.24, 95.46, 71.30~ ## $ trig 172, 189, 96, 128, 191, 151, 319, 46, 70, 69, 91, 52, 382,~ ## $ platelet 190, 373, 295, 224, 214, 431, 365, 203, 145, 188, 344, 271~ ## $ protime 12.2, 11.0, 11.0, 10.5, 11.7, 10.6, 10.6, 10.6, 12.2, 11.8~ ## $ stage 4, 3, 3, 4, 4, 3, 2, 3, 4, 4, 3, 3, 3, 3, 2, 2, 3, 2, 3, 2~ ## $ time 400, 2466, 3584, 769, 264, 2847, 1847, 4039, 388, 750, 611~ ## $ status 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0~ ## $ pred_aorsf 0.95858574, 0.05070837, 0.22924369, 0.47048899, 0.94442624~ ## $ pred_rfsrc 0.85474291, 0.05480510, 0.31751746, 0.45156052, 0.85784657~ ## $ pred_ranger 0.77897322, 0.04599438, 0.30267242, 0.43467381, 0.77914735~ Score( object = list(aorsf = results$pred_aorsf, rfsrc = results$pred_rfsrc, ranger = results$pred_ranger), formula = Surv(time, status) ~ 1, data = results, summary = 'IPA', times = ph ) ## ## Metric AUC: ## ## Results by model: ## ## model times AUC lower upper ## 1: aorsf 1826 90.7 86.4 95.0 ## 2: rfsrc 1826 89.9 85.6 94.2 ## 3: ranger 1826 89.7 85.5 94.0 ## ## Results of model comparisons: ## ## times model reference delta.AUC lower upper p ## 1: 1826 rfsrc aorsf -0.8 -2.2 0.5 0.2 ## 2: 1826 ranger aorsf -1.0 -2.5 0.5 0.2 ## 3: 1826 ranger rfsrc -0.1 -1.3 1.0 0.8 ## ## NOTE: Values are multiplied by 100 and given in %. ## NOTE: The higher AUC the better. 
## ## Metric Brier: ## ## Results by model: ## ## model times Brier lower upper IPA ## 1: Null model 1826.25 20.5 18.1 22.9 0.0 ## 2: aorsf 1826.25 10.8 8.6 13.0 47.3 ## 3: rfsrc 1826.25 11.8 9.7 13.9 42.5 ## 4: ranger 1826.25 11.9 9.8 14.0 41.9 ## ## Results of model comparisons: ## ## times model reference delta.Brier lower upper p ## 1: 1826.25 aorsf Null model -9.7 -12.3 -7.1 2.179169e-13 ## 2: 1826.25 rfsrc Null model -8.7 -11.0 -6.4 3.915661e-14 ## 3: 1826.25 ranger Null model -8.6 -10.9 -6.2 8.869452e-13 ## 4: 1826.25 rfsrc aorsf 1.0 0.3 1.7 5.383491e-03 ## 5: 1826.25 ranger aorsf 1.1 0.5 1.7 7.890574e-04 ## 6: 1826.25 ranger rfsrc 0.1 -0.4 0.6 6.310608e-01 ## ## NOTE: Values are multiplied by 100 and given in %. ## NOTE: The lower Brier the better, the higher IPA the better."},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"mlr-pipelines","dir":"Reference","previous_headings":"","what":"mlr3 pipelines","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Warning: code may may run depending current version mlr3proba. First load additional mlr3 libraries. Next ’ll define tasks learners engage . 
Now can make benchmark designed compare three favorite learners: Let’s look overall results: inspection, aorsf higher expected value ‘surv.cindex’ (higher better) aorsf lower expected value ‘surv.graf’ (lower better)","code":"suppressPackageStartupMessages({ library(mlr3verse) library(mlr3proba) library(mlr3extralearners) library(mlr3viz) library(mlr3benchmark) }) # Mayo Clinic Primary Biliary Cholangitis Data task_pbc <- TaskSurv$new( id = 'pbc', backend = select(pbc_orsf, -id) %>% mutate(stage = as.numeric(stage)), time = \"time\", event = \"status\" ) # Veteran's Administration Lung Cancer Trial data(veteran, package = \"randomForestSRC\") task_veteran <- TaskSurv$new( id = 'veteran', backend = veteran, time = \"time\", event = \"status\" ) # NKI 70 gene signature data_nki <- OpenML::getOMLDataSet(data.id = 1228) task_nki <- TaskSurv$new( id = 'nki', backend = data_nki$data, time = \"time\", event = \"event\" ) # Gene Expression-Based Survival Prediction in Lung Adenocarcinoma data_lung <- OpenML::getOMLDataSet(data.id = 1245) task_lung <- TaskSurv$new( id = 'lung', backend = data_lung$data %>% mutate(OS_event = as.numeric(OS_event) -1), time = \"OS_years\", event = \"OS_event\" ) # Chemotherapy for Stage B/C colon cancer # (there are two rows per person, one for death # and the other for recurrence, hence the two tasks) task_colon_death <- TaskSurv$new( id = 'colon_death', backend = survival::colon %>% filter(etype == 2) %>% drop_na() %>% # drop id, redundant variables select(-id, -study, -node4, -etype), time = \"time\", event = \"status\" ) task_colon_recur <- TaskSurv$new( id = 'colon_recur', backend = survival::colon %>% filter(etype == 1) %>% drop_na() %>% # drop id, redundant variables select(-id, -study, -node4, -etype), time = \"time\", event = \"status\" ) # putting them all together tasks <- list(task_pbc, task_veteran, task_nki, task_lung, task_colon_death, 
task_colon_recur, # add a few more pre-made ones tsk(\"actg\"), tsk('gbcs'), tsk('grace'), tsk(\"unemployment\"), tsk(\"whas\")) # Learners with default parameters learners <- lrns(c(\"surv.ranger\", \"surv.rfsrc\", \"surv.aorsf\")) # Brier (Graf) score, c-index and training time as measures measures <- msrs(c(\"surv.graf\", \"surv.cindex\", \"time_train\")) # Benchmark with 5-fold CV design <- benchmark_grid( tasks = tasks, learners = learners, resamplings = rsmps(\"cv\", folds = 5) ) benchmark_result <- benchmark(design) bm_scores <- benchmark_result$score(measures, predict_sets = \"test\") bm_scores %>% select(task_id, learner_id, surv.graf, surv.cindex, time_train) %>% group_by(learner_id) %>% filter(!is.infinite(surv.graf)) %>% summarize( across( .cols = c(surv.graf, surv.cindex, time_train), .fns = ~mean(.x, na.rm = TRUE) ) ) ## # A tibble: 3 x 4 ## learner_id surv.graf surv.cindex time_train ## ## 1 surv.aorsf 0.150 0.734 0.383 ## 2 surv.ranger 0.164 0.716 2.04 ## 3 surv.rfsrc 0.155 0.724 0.757"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Harrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating Yield Medical Tests. JAMA 1982; 247(18):2543-2546. DOI: 10.1001/jama.1982.03320430047030 Breiman L. Random forests. Machine learning 2001 Oct; 45(1):5-32. DOI: 10.1023/:1010933404324 Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Annals applied statistics 2008 Sep; 2(3):841-60. DOI: 10.1214/08-AOAS169 Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA. oblique random forests. Joint European Conference Machine Learning Knowledge Discovery Databases 2011 Sep 4; pp. 453-469. DOI: 10.1007/978-3-642-23783-6_29 Jaeger BC, Long DL, Long DM, Sims M, Szychowski JM, Min YI, Mcclure LA, Howard G, Simon N. Oblique random survival forests. 
Annals applied statistics 2019 Sep; 13(3):1847-83. DOI: 10.1214/19-AOAS1261 Jaeger BC, Welden S, Lenoir K, Speiser JL, Segar MW, Pandey A, Pajewski NM. Accelerated interpretable oblique random survival forests. Journal Computational Graphical Statistics Published online 08 Aug 2023. DOI: 10.1080/10618600.2023.2231048","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control.html","id":null,"dir":"Reference","previous_headings":"","what":"Oblique random forest control — orsf_control","title":"Oblique random forest control — orsf_control","text":"Oblique random forest control","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Oblique random forest control — orsf_control","text":"","code":"orsf_control( tree_type, method, scale_x, ties, net_mix, target_df, max_iter, epsilon, ... ) orsf_control_classification( method = \"glm\", scale_x = TRUE, net_mix = 0.5, target_df = NULL, max_iter = 20, epsilon = 1e-09, ... ) orsf_control_regression( method = \"glm\", scale_x = TRUE, net_mix = 0.5, target_df = NULL, max_iter = 20, epsilon = 1e-09, ... ) orsf_control_survival( method = \"glm\", scale_x = TRUE, ties = \"efron\", net_mix = 0.5, target_df = NULL, max_iter = 20, epsilon = 1e-09, ... )"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Oblique random forest control — orsf_control","text":"tree_type (character) type tree. Valid options \"classification\", i.e., categorical outcomes \"regression\", i.e., continuous outcomes \"survival\", i.e., time-event outcomes method (character function) identify linear linear combinations predictors.
method character value, must one : 'glm': linear, logistic, cox regression 'net': 'glm' penalty terms 'pca': principal component analysis 'random': random draw uniform distribution method function, used identify linear combinations predictor variables. method must case accept three inputs named x_node, y_node w_node, expect following types dimensions: x_node (matrix; n rows, p columns) y_node (matrix; n rows, 2 columns) w_node (matrix; n rows, 1 column) addition, method must return matrix p rows 1 column. scale_x (logical) TRUE, values predictors scaled prior instance finding linear combination predictors, using summary values data current node decision tree. ties (character) character string specifying method tie handling. relevant modeling survival outcomes using method engages tied outcome times. ties, methods equivalent. Valid options 'breslow' 'efron'. Efron approximation default accurate dealing tied event times similar computational efficiency compared Breslow method. net_mix (double) elastic net mixing parameter. value 1 gives lasso penalty, value 0 gives ridge penalty. multiple values net_mix given, penalized model fit using net_mix value prior splitting node. target_df (integer) Preferred number variables used linear combination. example, mtry 5 target_df 3, sample 5 predictors look best linear combination using 3 . max_iter (integer) iteration continues convergence (see epsilon) number attempted iterations equal max_iter. epsilon (double) using modeling based method, iteration continues algorithm relative change kind objective less epsilon, absolute change less sqrt(epsilon). ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Oblique random forest control — orsf_control","text":"object class 'orsf_control', used input control argument orsf.
Components : tree_type: type trees fit lincomb_type: method linear combinations lincomb_eps: epsilon convergence lincomb_iter_max: max iterations lincomb_scale: scale . lincomb_alpha: mixing parameter lincomb_df_target: target degrees freedom lincomb_ties_method: method ties survival time lincomb_R_function: R function custom splits","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Oblique random forest control — orsf_control","text":"Adjust scale_x risk. Setting scale_x = FALSE reduce computation time also make orsf model dependent scale data, default value TRUE.","code":""},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_cph.html","id":null,"dir":"Reference","previous_headings":"","what":"Cox regression ORSF control — orsf_control_cph","title":"Cox regression ORSF control — orsf_control_cph","text":"Use coefficients proportional hazards model create linear combinations predictor variables fitting orsf model.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_cph.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Cox regression ORSF control — orsf_control_cph","text":"","code":"orsf_control_cph(method = \"efron\", eps = 1e-09, iter_max = 20, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_cph.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Cox regression ORSF control — orsf_control_cph","text":"method (character) character string specifying method tie handling. ties, methods equivalent. Valid options 'breslow' 'efron'. Efron approximation default accurate dealing tied event times similar computational efficiency compared Breslow method. 
eps (double) using Newton Raphson scoring identify linear combinations inputs, iteration continues algorithm relative change log partial likelihood less eps, absolute change less sqrt(eps). Must positive. default value 1e-09 used consistency survival::coxph.control. iter_max (integer) iteration continues convergence (see eps ) number attempted iterations equal iter_max. ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_cph.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Cox regression ORSF control — orsf_control_cph","text":"object class 'orsf_control', used input control argument orsf.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_cph.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Cox regression ORSF control — orsf_control_cph","text":"code survival package modified make routine. details Cox proportional hazards model, see coxph /Therneau Grambsch (2000).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_cph.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Cox regression ORSF control — orsf_control_cph","text":"Therneau T.M., Grambsch P.M. (2000) Cox Model. : Modeling Survival Data: Extending Cox Model. Statistics Biology Health. Springer, New York, NY. 
DOI: 10.1007/978-1-4757-3294-8_3","code":""},{"path":[]},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_custom.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Custom ORSF control — orsf_control_custom","text":"","code":"orsf_control_custom(beta_fun, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_custom.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Custom ORSF control — orsf_control_custom","text":"beta_fun (function) function define coefficients used linear combinations predictor variables. beta_fun must accept three inputs named x_node, y_node w_node, expect following types dimensions: x_node (matrix; n rows, p columns) y_node (matrix; n rows, 2 columns) w_node (matrix; n rows, 1 column) addition, beta_fun must return matrix p rows 1 column. conditions met, orsf_control_custom() let know. ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_custom.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Custom ORSF control — orsf_control_custom","text":"object class 'orsf_control', used input control argument orsf.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_custom.html","id":"examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Custom ORSF control — orsf_control_custom","text":"Two customized functions identify linear combinations predictors shown . 
first uses random coefficients second derives coefficients principal component analysis.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_custom.html","id":"random-coefficients","dir":"Reference","previous_headings":"","what":"Random coefficients","title":"Custom ORSF control — orsf_control_custom","text":"f_rando() function get random coefficients: can plug f_rando orsf_control_survival(), pass result orsf():","code":"f_rando <- function(x_node, y_node, w_node){ matrix(runif(ncol(x_node)), ncol=1) } library(aorsf) fit_rando <- orsf(pbc_orsf, Surv(time, status) ~ . - id, control = orsf_control_survival(method = f_rando), n_tree = 500) fit_rando ## ---------- Oblique random survival forest ## ## Linear combinations: Custom user function ## N observations: 276 ## N events: 111 ## N trees: 500 ## N predictors total: 17 ## N predictors per node: 5 ## Average leaves per tree: 19.682 ## Min observations in leaf: 5 ## Min events in leaf: 1 ## OOB stat value: 0.83 ## OOB stat type: Harrell's C-index ## Variable importance: anova ## ## -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_custom.html","id":"principal-components","dir":"Reference","previous_headings":"","what":"Principal components","title":"Custom ORSF control — orsf_control_custom","text":"Follow steps , starting custom function: plug function orsf_control_survival() pass result orsf():","code":"f_pca <- function(x_node, y_node, w_node) { # estimate two principal components. pca <- stats::prcomp(x_node, rank. = 2) # use the second principal component to split the node pca$rotation[, 2L, drop = FALSE] } fit_pca <- orsf(pbc_orsf, Surv(time, status) ~ . 
- id, control = orsf_control_survival(method = f_pca), n_tree = 500)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_custom.html","id":"evaluate","dir":"Reference","previous_headings":"","what":"Evaluate","title":"Custom ORSF control — orsf_control_custom","text":"well two customized ORSFs ? Let’s compute indices prediction accuracy based --bag predictions: PCA ORSF quite well! (higher IPA better)","code":"library(riskRegression) ## riskRegression version 2023.09.08 library(survival) risk_preds <- list(rando = 1 - fit_rando$pred_oobag, pca = 1 - fit_pca$pred_oobag) sc <- Score(object = risk_preds, formula = Surv(time, status) ~ 1, data = pbc_orsf, summary = 'IPA', times = fit_pca$pred_horizon) sc$Brier ## ## Results by model: ## ## model times Brier lower upper IPA ## 1: Null model 1788 20.479 18.090 22.868 0.000 ## 2: rando 1788 11.872 9.771 13.972 42.031 ## 3: pca 1788 12.990 10.971 15.009 36.569 ## ## Results of model comparisons: ## ## times model reference delta.Brier lower upper p ## 1: 1788 rando Null model -8.607 -10.809 -6.406 1.832790e-14 ## 2: 1788 pca Null model -7.489 -9.213 -5.765 1.664802e-17 ## 3: 1788 pca rando 1.118 0.258 1.979 1.087482e-02 ## ## NOTE: Values are multiplied by 100 and given in %. 
## NOTE: The lower Brier the better, the higher IPA the better."},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_fast.html","id":null,"dir":"Reference","previous_headings":"","what":"Accelerated ORSF control — orsf_control_fast","title":"Accelerated ORSF control — orsf_control_fast","text":"Fast methods identify linear combinations predictors fitting orsf model.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_fast.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Accelerated ORSF control — orsf_control_fast","text":"","code":"orsf_control_fast(method = \"efron\", do_scale = TRUE, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_fast.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Accelerated ORSF control — orsf_control_fast","text":"method (character) character string specifying method tie handling. ties, methods equivalent. Valid options 'breslow' 'efron'. Efron approximation default accurate dealing tied event times similar computational efficiency compared Breslow method. do_scale (logical) TRUE, values predictors scaled prior instance Newton Raphson scoring, using summary values data current node decision tree. ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_fast.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Accelerated ORSF control — orsf_control_fast","text":"object class 'orsf_control', used input control argument orsf.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_fast.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Accelerated ORSF control — orsf_control_fast","text":"code survival package modified make routine. Adjust do_scale risk. 
Setting do_scale = FALSE reduce computation time also make orsf model dependent scale data, default value TRUE.","code":""},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_net.html","id":null,"dir":"Reference","previous_headings":"","what":"Penalized Cox regression ORSF control — orsf_control_net","title":"Penalized Cox regression ORSF control — orsf_control_net","text":"Use regularized Cox proportional hazard models identify linear combinations input variables fitting orsf model.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_net.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Penalized Cox regression ORSF control — orsf_control_net","text":"","code":"orsf_control_net(alpha = 1/2, df_target = NULL, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_net.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Penalized Cox regression ORSF control — orsf_control_net","text":"alpha (double) elastic net mixing parameter. value 1 gives lasso penalty, value 0 gives ridge penalty. multiple values alpha given, penalized model fit using alpha value prior splitting node. df_target (integer) Preferred number variables used linear combination. ... 
arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_net.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Penalized Cox regression ORSF control — orsf_control_net","text":"object class 'orsf_control', used input control argument orsf.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_net.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Penalized Cox regression ORSF control — orsf_control_net","text":"df_target less mtry, separate argument orsf indicates number variables chosen random prior finding linear combination variables.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_net.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Penalized Cox regression ORSF control — orsf_control_net","text":"Simon N, Friedman J, Hastie T, Tibshirani R. Regularization paths Cox's proportional hazards model via coordinate descent. Journal statistical software 2011 Mar; 39(5):1. DOI: 10.18637/jss.v039.i05","code":""},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_ice_oob.html","id":null,"dir":"Reference","previous_headings":"","what":"ORSF Individual Conditional Expectations — orsf_ice_oob","title":"ORSF Individual Conditional Expectations — orsf_ice_oob","text":"Compute individual conditional expectations ORSF model. Unlike partial dependence, shows expected prediction function one multiple predictors, individual conditional expectations (ICE) show prediction individual observation function predictor. 
can compute individual conditional expectations three ways using random forest: using -bag predictions training data using --bag predictions training data using predictions new set data See examples details","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_ice_oob.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"ORSF Individual Conditional Expectations — orsf_ice_oob","text":"","code":"orsf_ice_oob( object, pred_spec, pred_horizon = NULL, pred_type = NULL, expand_grid = TRUE, boundary_checks = TRUE, n_thread = 1, ... ) orsf_ice_inb( object, pred_spec, pred_horizon = NULL, pred_type = NULL, expand_grid = TRUE, boundary_checks = TRUE, n_thread = 1, ... ) orsf_ice_new( object, pred_spec, new_data, pred_horizon = NULL, pred_type = NULL, na_action = \"fail\", expand_grid = TRUE, boundary_checks = TRUE, n_thread = 1, ... )"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_ice_oob.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"ORSF Individual Conditional Expectations — orsf_ice_oob","text":"object (orsf_fit) trained oblique random survival forest (see orsf). pred_spec (named list data.frame). pred_spec named list, item list vector values used points partial dependence function. name item list indicate variable modified take corresponding values. pred_spec data.frame, columns indicate variable names, values indicate variable values, partial dependence computed using inputs row. pred_horizon (double) value vector indicating time(s) predictions calibrated . E.g., predicting risk incident heart failure within next 10 years, pred_horizon = 10. pred_horizon can NULL pred_type 'mort', since mortality predictions aggregated event times pred_type (character) type predictions compute. Valid options 'risk' : probability event pred_horizon. 'surv' : 1 - risk. 
'chf': cumulative hazard function 'mort': mortality prediction expand_grid (logical) TRUE, partial dependence computed possible combinations inputs pred_spec. FALSE, partial dependence computed variable pred_spec, separately. boundary_checks (logical) TRUE, pred_spec checked make sure requested values 10th 90th percentile object's training data. FALSE, checks skipped. n_thread (integer) number threads use computing predictions. Default one thread. use maximum number threads system provides concurrent execution, set n_thread = 0. ... arguments passed methods (currently used). new_data data.frame, tibble, data.table compute predictions . na_action (character) happen new_data contains missing values (.e., NA values). Valid options : 'fail' : error thrown new_data contains NA values 'omit' : rows new_data incomplete data dropped","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_ice_oob.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"ORSF Individual Conditional Expectations — orsf_ice_oob","text":"data.table containing individual conditional expectations specified variable(s) specified prediction horizon(s).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_ice_oob.html","id":"examples","dir":"Reference","previous_headings":"","what":"Examples","title":"ORSF Individual Conditional Expectations — orsf_ice_oob","text":"Begin fitting ORSF ensemble Use ensemble compute ICE values using --bag predictions: Much detailed examples given vignette","code":"library(aorsf) set.seed(329) fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . 
- id) fit ## ---------- Oblique random survival forest ## ## Linear combinations: Accelerated Cox regression ## N observations: 276 ## N events: 111 ## N trees: 500 ## N predictors total: 17 ## N predictors per node: 5 ## Average leaves per tree: 21.026 ## Min observations in leaf: 5 ## Min events in leaf: 1 ## OOB stat value: 0.84 ## OOB stat type: Harrell's C-index ## Variable importance: anova ## ## ----------------------------------------- pred_spec <- list(bili = seq(1, 10, length.out = 25)) ice_oob <- orsf_ice_oob(fit, pred_spec, boundary_checks = FALSE) ice_oob ## id_variable id_row pred_horizon bili pred ## 1: 1 1 1788 1 0.1264442 ## 2: 1 2 1788 1 0.1739727 ## 3: 1 3 1788 1 0.3904517 ## 4: 1 4 1788 1 0.2874752 ## 5: 1 5 1788 1 0.4398522 ## --- ## 6896: 25 272 1788 10 0.3076971 ## 6897: 25 273 1788 10 0.4942110 ## 6898: 25 274 1788 10 0.6407498 ## 6899: 25 275 1788 10 0.3871298 ## 6900: 25 276 1788 10 0.6479179"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":null,"dir":"Reference","previous_headings":"","what":"ORSF partial dependence — orsf_pd_oob","title":"ORSF partial dependence — orsf_pd_oob","text":"Compute partial dependence ORSF model. Partial dependence (PD) shows expected prediction model function single predictor multiple predictors. expectation marginalized values predictors, giving something like multivariable adjusted estimate model's prediction. 
can compute partial dependence three ways using random forest: using -bag predictions training data using --bag predictions training data using predictions new set data See examples details","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"ORSF partial dependence — orsf_pd_oob","text":"","code":"orsf_pd_oob( object, pred_spec, pred_horizon = NULL, pred_type = NULL, expand_grid = TRUE, prob_values = c(0.025, 0.5, 0.975), prob_labels = c(\"lwr\", \"medn\", \"upr\"), boundary_checks = TRUE, n_thread = 1, ... ) orsf_pd_inb( object, pred_spec, pred_horizon = NULL, pred_type = NULL, expand_grid = TRUE, prob_values = c(0.025, 0.5, 0.975), prob_labels = c(\"lwr\", \"medn\", \"upr\"), boundary_checks = TRUE, n_thread = 1, ... ) orsf_pd_new( object, pred_spec, new_data, pred_horizon = NULL, pred_type = NULL, na_action = \"fail\", expand_grid = TRUE, prob_values = c(0.025, 0.5, 0.975), prob_labels = c(\"lwr\", \"medn\", \"upr\"), boundary_checks = TRUE, n_thread = 1, ... )"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"ORSF partial dependence — orsf_pd_oob","text":"object (orsf_fit) trained oblique random survival forest (see orsf). pred_spec (named list data.frame). pred_spec named list, item list vector values used points partial dependence function. name item list indicate variable modified take corresponding values. pred_spec data.frame, columns indicate variable names, values indicate variable values, partial dependence computed using inputs row. pred_horizon (double) value vector indicating time(s) predictions calibrated . E.g., predicting risk incident heart failure within next 10 years, pred_horizon = 10. pred_horizon can NULL pred_type 'mort', since mortality predictions aggregated event times pred_type (character) type predictions compute. 
Valid options 'risk' : probability event pred_horizon. 'surv' : 1 - risk. 'chf': cumulative hazard function 'mort': mortality prediction expand_grid (logical) TRUE, partial dependence computed possible combinations inputs pred_spec. FALSE, partial dependence computed variable pred_spec, separately. prob_values (numeric) vector values 0 1, indicating quantiles used summarize partial dependence values set inputs. prob_values length prob_labels. quantiles calculated based predictions object set values indicated pred_spec. prob_labels (character) vector labels length prob_values, label indicating corresponding value prob_values labelled summarized outputs. prob_labels length prob_values. boundary_checks (logical) TRUE, pred_spec checked make sure requested values 10th 90th percentile object's training data. FALSE, checks skipped. n_thread (integer) number threads use computing predictions. Default one thread. use maximum number threads system provides concurrent execution, set n_thread = 0. ... arguments passed methods (currently used). new_data data.frame, tibble, data.table compute predictions . na_action (character) happen new_data contains missing values (.e., NA values). Valid options : 'fail' : error thrown new_data contains NA values 'omit' : rows new_data incomplete data dropped","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"ORSF partial dependence — orsf_pd_oob","text":"data.table containing partial dependence values specified variable(s) specified prediction horizon(s).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"ORSF partial dependence — orsf_pd_oob","text":"Partial dependence number known limitations assumptions users aware (see Hooker, 2021). 
particular, partial dependence less intuitive >2 predictors examined jointly, assumed feature(s) partial dependence computed correlated features (likely true many cases). Accumulated local effect plots can used (see ) case feature independence valid assumption.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":"examples","dir":"Reference","previous_headings":"","what":"Examples","title":"ORSF partial dependence — orsf_pd_oob","text":"Begin fitting ORSF ensemble:","code":"library(aorsf) set.seed(329730) index_train <- sample(nrow(pbc_orsf), 150) pbc_orsf_train <- pbc_orsf[index_train, ] pbc_orsf_test <- pbc_orsf[-index_train, ] fit <- orsf(data = pbc_orsf_train, formula = Surv(time, status) ~ . - id, oobag_pred_horizon = 365.25 * 5)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":"three-ways-to-compute-pd-and-ice","dir":"Reference","previous_headings":"","what":"Three ways to compute PD and ICE","title":"ORSF partial dependence — orsf_pd_oob","text":"can compute partial dependence ICE three ways aorsf: using -bag predictions training data using --bag predictions training data using predictions new set data -bag partial dependence indicates relationships model learned training. helpful goal interpret model. --bag partial dependence indicates relationships model learned training using --bag data simulates application model new data. want test model’s reliability fairness new data don’t access large testing set. new data partial dependence shows model predicts outcomes observations seen. 
helpful want test model’s reliability fairness.","code":"pd_train <- orsf_pd_inb(fit, pred_spec = list(bili = 1:5)) pd_train ## pred_horizon bili mean lwr medn upr ## 1: 1826.25 1 0.7932390 0.2177461 0.9060625 0.9816153 ## 2: 1826.25 2 0.7642403 0.1988035 0.8717127 0.9710504 ## 3: 1826.25 3 0.7240284 0.1770122 0.8303501 0.9480047 ## 4: 1826.25 4 0.6744615 0.1615326 0.7599508 0.9088882 ## 5: 1826.25 5 0.6313355 0.1553589 0.7152580 0.8658139 pd_train <- orsf_pd_oob(fit, pred_spec = list(bili = 1:5)) pd_train ## pred_horizon bili mean lwr medn upr ## 1: 1826.25 1 0.7840481 0.2727537 0.8694252 0.9809905 ## 2: 1826.25 2 0.7549406 0.2525478 0.8333524 0.9693362 ## 3: 1826.25 3 0.7158234 0.2364582 0.7890158 0.9461864 ## 4: 1826.25 4 0.6656823 0.2260407 0.7158336 0.9151153 ## 5: 1826.25 5 0.6225353 0.2071656 0.6734005 0.8681677 pd_test <- orsf_pd_new(fit, new_data = pbc_orsf_test, pred_spec = list(bili = 1:5)) pd_test ## pred_horizon bili mean lwr medn upr ## 1: 1826.25 1 0.7524101 0.1868769 0.8121185 0.9803382 ## 2: 1826.25 2 0.7234050 0.1759562 0.7754099 0.9653244 ## 3: 1826.25 3 0.6816975 0.1581292 0.7224945 0.9403449 ## 4: 1826.25 4 0.6339907 0.1467816 0.6598026 0.9000773 ## 5: 1826.25 5 0.5911775 0.1387876 0.6186801 0.8504577"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"ORSF partial dependence — orsf_pd_oob","text":"Giles Hooker, Lucas Mentch, Siyu Zhou. Unrestricted Permutation forces Extrapolation: Variable Importance Requires least One Model, Free Variable Importance. arXiv e-prints 2021 Oct; arXiv-1905. 
URL: https://doi.org/10.48550/arXiv.1905.03151","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_scale_cph.html","id":null,"dir":"Reference","previous_headings":"","what":"Scale input data — orsf_scale_cph","title":"Scale input data — orsf_scale_cph","text":"functions exported users may access internal routines used scale inputs orsf_control_cph used.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_scale_cph.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Scale input data — orsf_scale_cph","text":"","code":"orsf_scale_cph(x_mat, w_vec = NULL) orsf_unscale_cph(x_mat)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_scale_cph.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Scale input data — orsf_scale_cph","text":"x_mat (numeric matrix) matrix values scaled unscaled. Note orsf_unscale_cph accept x_mat inputs attribute containing transform values, added automatically orsf_scale_cph. w_vec (numeric vector) optional vector weights. weights supplied (default), observations equally weighted. supplied, w_vec must length equal nrow(x_mat).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_scale_cph.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Scale input data — orsf_scale_cph","text":"scaled unscaled x_mat.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_scale_cph.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Scale input data — orsf_scale_cph","text":"data transformed first subtracting mean multiplying scale. inverse transform can completed using orsf_unscale_cph dividing column corresponding scale adding mean. 
values means scales stored attribute output returned orsf_scale_cph (see examples)","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_scale_cph.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Scale input data — orsf_scale_cph","text":"","code":"x_mat <- as.matrix(pbc_orsf[, c('bili', 'age', 'protime')]) head(x_mat) #> bili age protime #> 1 14.5 58.76523 12.2 #> 2 1.1 56.44627 10.6 #> 3 1.4 70.07255 12.0 #> 4 1.8 54.74059 10.3 #> 5 3.4 38.10541 10.9 #> 7 1.0 55.53457 9.7 x_scaled <- orsf_scale_cph(x_mat) head(x_scaled) #> bili age protime #> [1,] 3.77308887 1.0412574 1.9694656 #> [2,] -0.75476469 0.7719344 -0.1822316 #> [3,] -0.65339483 2.3544852 1.7005035 #> [4,] -0.51823502 0.5738373 -0.5856748 #> [5,] 0.02240421 -1.3581657 0.2212116 #> [6,] -0.78855464 0.6660494 -1.3925613 attributes(x_scaled) # note the transforms attribute #> $dim #> [1] 276 3 #> #> $dimnames #> $dimnames[[1]] #> NULL #> #> $dimnames[[2]] #> [1] \"bili\" \"age\" \"protime\" #> #> #> $transforms #> mean scale #> [1,] 3.333696 0.3378995 #> [2,] 49.799661 0.1161396 #> [3,] 10.735507 1.3448108 #> x_unscaled <- orsf_unscale_cph(x_scaled) head(x_unscaled) #> bili age protime #> [1,] 14.5 58.76523 12.2 #> [2,] 1.1 56.44627 10.6 #> [3,] 1.4 70.07255 12.0 #> [4,] 1.8 54.74059 10.3 #> [5,] 3.4 38.10541 10.9 #> [6,] 1.0 55.53457 9.7 # numeric difference in x_mat and x_unscaled should be practically 0 max(abs(x_mat - x_unscaled)) #> [1] 3.552714e-15"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_summarize_uni.html","id":null,"dir":"Reference","previous_headings":"","what":"ORSF summary; univariate — orsf_summarize_uni","title":"ORSF summary; univariate — orsf_summarize_uni","text":"Summarize univariate information ORSF object","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_summarize_uni.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"ORSF summary; univariate — 
orsf_summarize_uni","text":"","code":"orsf_summarize_uni( object, n_variables = NULL, pred_horizon = NULL, pred_type = NULL, importance = NULL, ... )"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_summarize_uni.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"ORSF summary; univariate — orsf_summarize_uni","text":"object (orsf_fit) trained oblique random survival forest (see orsf). n_variables (integer) many variables summarized? Setting input lower number reduce computation time. pred_horizon (double) value vector indicating time(s) predictions calibrated . E.g., predicting risk incident heart failure within next 10 years, pred_horizon = 10. pred_horizon can NULL pred_type 'mort', since mortality predictions aggregated event times pred_type (character) type predictions compute. Valid options 'risk' : probability event pred_horizon. 'surv' : 1 - risk. 'chf': cumulative hazard function 'mort': mortality prediction importance (character) Indicate method variable importance: 'none': variable importance computed. 'anova': compute analysis variance (ANOVA) importance 'negate': compute negation importance 'permute': compute permutation importance details methods, see orsf_vi. ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_summarize_uni.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"ORSF summary; univariate — orsf_summarize_uni","text":"object class 'orsf_summary', includes data importance individual predictors. expected values predictions specific values predictors.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_summarize_uni.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"ORSF summary; univariate — orsf_summarize_uni","text":"pred_horizon left unspecified, median value time--event variable object's training data used. 
recommended always specify prediction horizon, median time may especially meaningful horizon compute predicted risk values . object already variable importance values, can safely bypass computation variable importance function setting importance = 'none'.","code":""},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_summarize_uni.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"ORSF summary; univariate — orsf_summarize_uni","text":"","code":"object <- orsf(pbc_orsf, Surv(time, status) ~ . - id, n_tree = 25) # since anova importance was used to make object, it is also # used for ranking variables in the summary, unless we specify # a different type of importance orsf_summarize_uni(object, n_variables = 3) #> #> -- ascites (VI Rank: 1) ----------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0 0.6983709 0.7910345 0.5139682 0.9597133 #> 1 0.5305812 0.5669828 0.3465873 0.7291136 #> #> -- bili (VI Rank: 2) -------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0.80 0.7639857 0.8330195 0.6562526 0.9672986 #> 1.40 0.7457920 0.8115713 0.6375998 0.9532067 #> 3.52 0.6452987 0.7032310 0.4779466 0.8497936 #> #> -- copper (VI Rank: 3) ------------------------ #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 42.8 0.7320214 0.8202788 0.5875294 0.9642543 #> 74.0 0.7172284 0.8027461 0.5795696 0.9589637 #> 129 0.6719926 0.7326571 0.5150342 0.9232044 #> #> Predicted survival at time t = 1788 for top 3 predictors # if we want to summarize object according to variables # ranked by negation importance, we can compute negation # importance within orsf_summarize_uni() as follows: orsf_summarize_uni(object, n_variables = 3, importance = 'negate') #> #> -- bili (VI Rank: 1) -------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 
0.80 0.7639857 0.8330195 0.6562526 0.9672986 #> 1.40 0.7457920 0.8115713 0.6375998 0.9532067 #> 3.52 0.6452987 0.7032310 0.4779466 0.8497936 #> #> -- copper (VI Rank: 2) ------------------------ #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 42.8 0.7320214 0.8202788 0.5875294 0.9642543 #> 74.0 0.7172284 0.8027461 0.5795696 0.9589637 #> 129 0.6719926 0.7326571 0.5150342 0.9232044 #> #> -- sex (VI Rank: 3) --------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> m 0.6473382 0.7232680 0.4510599 0.8898719 #> f 0.6949818 0.7990302 0.5011564 0.9597133 #> #> Predicted survival at time t = 1788 for top 3 predictors"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_time_to_train.html","id":null,"dir":"Reference","previous_headings":"","what":"Estimate training time — orsf_time_to_train","title":"Estimate training time — orsf_time_to_train","text":"Estimate training time","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_time_to_train.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Estimate training time — orsf_time_to_train","text":"","code":"orsf_time_to_train(object, n_tree_subset = 50)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_time_to_train.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Estimate training time — orsf_time_to_train","text":"object untrained aorsf object n_tree_subset (integer) many trees fit order estimate time needed train object. 
default value 50, usually gives good enough approximation.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_time_to_train.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Estimate training time — orsf_time_to_train","text":"difftime object.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_time_to_train.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Estimate training time — orsf_time_to_train","text":"","code":"# specify but do not train the model by setting no_fit = TRUE. object <- orsf(pbc_orsf, Surv(time, status) ~ . - id, n_tree = 500, no_fit = TRUE) # grow 50 trees to approximate the time it will take to grow 500 trees time_estimated <- orsf_time_to_train(object, n_tree_subset = 50) print(time_estimated) #> Time difference of 0.1945901 secs # let's see how close the approximation was time_true_start <- Sys.time() fit <- orsf_train(object) time_true_stop <- Sys.time() time_true <- time_true_stop - time_true_start print(time_true) #> Time difference of 0.2257481 secs # error abs(time_true - time_estimated) #> Time difference of 0.03115797 secs"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":null,"dir":"Reference","previous_headings":"","what":"ORSF variable importance — orsf_vi","title":"ORSF variable importance — orsf_vi","text":"Estimate importance individual variables using oblique random survival forests.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"ORSF variable importance — orsf_vi","text":"","code":"orsf_vi( object, group_factors = TRUE, importance = NULL, oobag_fun = NULL, n_thread = 1, verbose_progress = FALSE, ... ) orsf_vi_negate( object, group_factors = TRUE, oobag_fun = NULL, n_thread = 1, verbose_progress = FALSE, ... 
) orsf_vi_permute( object, group_factors = TRUE, oobag_fun = NULL, n_thread = 1, verbose_progress = FALSE, ... ) orsf_vi_anova(object, group_factors = TRUE, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"ORSF variable importance — orsf_vi","text":"object (orsf_fit) trained oblique random survival forest (see orsf). group_factors (logical) TRUE, importance factor variables reported overall aggregating importance individual levels factor. FALSE, importance individual factor levels returned. importance (character) Indicate method variable importance: 'anova': compute analysis variance (ANOVA) importance 'negate': compute negation importance 'permute': compute permutation importance oobag_fun (function) used evaluating --bag prediction accuracy negating coefficients (importance = 'negate') permuting values predictor (importance = 'permute') oobag_fun = NULL (default), Harrell's C-statistic (1982) used evaluate accuracy. use oobag_fun note following: oobag_fun two inputs: y_mat s_vec y_mat two column matrix first column named 'time', second named 'status' s_vec numeric vector containing predicted survival probabilities. oobag_fun return numeric output length 1 oobag_fun used created object initial value --bag prediction accuracy consistent values computed variable importance estimated. details, see --bag vignette. n_thread (integer) number threads use computing predictions. Default one thread. use maximum number threads system provides concurrent execution, set n_thread = 0. verbose_progress (logical) TRUE, progress messages printed console. FALSE (default), nothing printed. ... 
arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"ORSF variable importance — orsf_vi","text":"orsf_vi functions return named numeric vector. Names vector predictor variables used object Values vector estimated importance given predictor. returned vector sorted highest lowest value, higher values indicating higher importance.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"ORSF variable importance — orsf_vi","text":"orsf_fit object fitted importance = 'anova', 'negate', 'permute', output vector importance values based requested type importance. However, may still want call orsf_vi() output want group factor levels one overall importance value. orsf_vi() general purpose function extract compute variable importance estimates 'orsf_fit' object (see orsf). orsf_vi_negate(), orsf_vi_permute(), orsf_vi_anova() wrappers orsf_vi(). way functions work depends whether object given already variable importance estimates (see examples).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"variable-importance-methods","dir":"Reference","previous_headings":"","what":"Variable importance methods","title":"ORSF variable importance — orsf_vi","text":"negation importance: variable assessed separately multiplying variable's coefficients -1 determining much model's performance changes. worse model's performance negating coefficients given variable, important variable. technique promising b/c require permutation emphasizes variables larger coefficients linear combinations, also relatively new studied much permutation importance. See Jaeger, (2023) details technique. permutation importance: variable assessed separately randomly permuting variable's values determining much model's performance changes. 
worse model's performance permuting values given variable, important variable. technique flexible, intuitive, frequently used. also several known limitations analysis variance (ANOVA) importance: p-value computed coefficient linear combination variables decision tree. Importance individual predictor variable proportion times p-value coefficient < 0.01. technique efficient computationally, may effective permutation negation terms selecting signal noise variables. See Menze, 2011 details technique.","code":""},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"anova-importance","dir":"Reference","previous_headings":"","what":"ANOVA importance","title":"ORSF variable importance — orsf_vi","text":"default variable importance technique, ANOVA, calculated fit ORSF ensemble. ANOVA default fast, may decisive permutation negation techniques variable selection.","code":"fit <- orsf(pbc_orsf, Surv(time, status) ~ . - id) fit ## ---------- Oblique random survival forest ## ## Linear combinations: Accelerated Cox regression ## N observations: 276 ## N events: 111 ## N trees: 500 ## N predictors total: 17 ## N predictors per node: 5 ## Average leaves per tree: 21.114 ## Min observations in leaf: 5 ## Min events in leaf: 1 ## OOB stat value: 0.84 ## OOB stat type: Harrell's C-index ## Variable importance: anova ## ## -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"raw-vi-values","dir":"Reference","previous_headings":"","what":"Raw VI values","title":"ORSF variable importance — orsf_vi","text":"‘raw’ variable importance values can accessed fit object ‘raw’ values factors aggregated single value. Currently one value k-1 levels k level factor. 
example, can see edema_1 edema_0.5 importance values edema factor variable levels 0, 0.5, 1.","code":"attr(fit, 'importance_values') ## NULL"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"collapse-vi-across-factor-levels","dir":"Reference","previous_headings":"","what":"Collapse VI across factor levels","title":"ORSF variable importance — orsf_vi","text":"get aggregated values across levels factor, access importance element orsf fit: use orsf_vi() group_factors set TRUE (default) Note can make default returned importance values ungrouped setting group_factors FALSE orsf_vi functions orsf function.","code":"fit$importance ## ascites bili edema copper age albumin protime ## 0.49306931 0.40601166 0.30339953 0.26839623 0.26685796 0.25804901 0.22825024 ## chol stage ast spiders hepato sex trig ## 0.19290780 0.18941048 0.17435648 0.16943522 0.15678670 0.13905325 0.12965799 ## alk.phos platelet trt ## 0.11338661 0.09012876 0.06778770 orsf_vi(fit) ## ascites bili edema copper age albumin protime ## 0.49306931 0.40601166 0.30339953 0.26839623 0.26685796 0.25804901 0.22825024 ## chol stage ast spiders hepato sex trig ## 0.19290780 0.18941048 0.17435648 0.16943522 0.15678670 0.13905325 0.12965799 ## alk.phos platelet trt ## 0.11338661 0.09012876 0.06778770"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"add-vi-to-an-orsf","dir":"Reference","previous_headings":"","what":"Add VI to an ORSF","title":"ORSF variable importance — orsf_vi","text":"can fit ORSF without VI, add VI later","code":"fit_no_vi <- orsf(pbc_orsf, Surv(time, status) ~ . - id, importance = 'none') # Note: you can't call orsf_vi_anova() on fit_no_vi because anova # VI can only be computed while the forest is being grown. 
orsf_vi_negate(fit_no_vi) ## bili copper sex protime albumin age ## 0.122344895 0.047850279 0.035986359 0.023711711 0.021831451 0.021503160 ## stage ascites chol ast spiders hepato ## 0.019718835 0.012550534 0.011115307 0.009845811 0.007601474 0.007055077 ## edema trt alk.phos trig platelet ## 0.006411580 0.003666224 0.002388178 0.001156845 -0.001214167 orsf_vi_permute(fit_no_vi) ## bili copper protime age ascites ## 5.513908e-02 2.181846e-02 1.246900e-02 1.192659e-02 1.176139e-02 ## albumin stage chol spiders edema ## 1.175554e-02 9.479348e-03 6.215674e-03 5.752179e-03 4.960035e-03 ## ast hepato sex trig alk.phos ## 4.647971e-03 3.594325e-03 2.477936e-03 1.162558e-03 6.778008e-05 ## platelet trt ## -1.132546e-03 -1.376816e-03"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"orsf-and-vi-all-at-once","dir":"Reference","previous_headings":"","what":"ORSF and VI all at once","title":"ORSF variable importance — orsf_vi","text":"fit ORSF compute vi time can still get negation VI fit, needs computed","code":"fit_permute_vi <- orsf(pbc_orsf, Surv(time, status) ~ . 
- id, importance = 'permute') # get the vi instantly (i.e., it doesn't need to be computed again) orsf_vi_permute(fit_permute_vi) ## bili copper protime albumin age ## 0.0582460338 0.0255992039 0.0130100780 0.0129532316 0.0121027391 ## ascites stage chol ast edema ## 0.0119289124 0.0084185175 0.0071302967 0.0053592731 0.0051471990 ## spiders hepato sex trig alk.phos ## 0.0046418826 0.0036776097 0.0026334550 0.0024978806 0.0013078222 ## platelet trt ## 0.0003504423 -0.0013892173 orsf_vi_negate(fit_permute_vi) ## bili copper sex protime age albumin ## 0.1259391254 0.0507141085 0.0363834330 0.0235136073 0.0233592840 0.0225371677 ## stage chol ascites ast spiders edema ## 0.0211978251 0.0141956334 0.0141890702 0.0108977272 0.0073762768 0.0070333453 ## hepato alk.phos trig trt platelet ## 0.0050661672 0.0048879157 0.0044980321 0.0039418881 0.0007189274"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"ORSF variable importance — orsf_vi","text":"Harrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating Yield Medical Tests. JAMA 1982; 247(18):2543-2546. DOI: 10.1001/jama.1982.03320430047030 Breiman L. Random forests. Machine learning 2001 Oct; 45(1):5-32. DOI: 10.1023/A:1010933404324 Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA. oblique random forests. Joint European Conference Machine Learning Knowledge Discovery Databases 2011 Sep 4; pp. 453-469. DOI: 10.1007/978-3-642-23783-6_29 Jaeger BC, Welden S, Lenoir K, Speiser JL, Segar MW, Pandey A, Pajewski NM. Accelerated interpretable oblique random survival forests. Journal Computational Graphical Statistics Published online 08 Aug 2023. 
DOI: 10.1080/10618600.2023.2231048","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vs.html","id":null,"dir":"Reference","previous_headings":"","what":"Variable selection — orsf_vs","title":"Variable selection — orsf_vs","text":"Variable selection","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Variable selection — orsf_vs","text":"","code":"orsf_vs(object, n_predictor_min = 3, verbose_progress = FALSE)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vs.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Variable selection — orsf_vs","text":"object (orsf_fit) trained oblique random survival forest (see orsf). n_predictor_min (integer) minimum number predictors allowed verbose_progress (logical) implemented yet. progress printed console?","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vs.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Variable selection — orsf_vs","text":"data.table four columns: n_predictors: number predictors used stat_value: --bag statistic predictors_included: names predictors included predictor_dropped: predictor selected dropped","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vs.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Variable selection — orsf_vs","text":"tree_seeds specified object successive run orsf evaluated --bag samples initial run.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vs.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Variable selection — orsf_vs","text":"","code":"object <- orsf(formula = time + status ~ ., data = pbc_orsf, n_tree = 25, importance = 'anova') orsf_vs(object, n_predictor_min = 15) #> n_predictors stat_value predictors_included #> 1: 15 
0.8409292 age,sex_f,ascites_1,hepato_1,spiders_1,edema_0.5,... #> 2: 16 0.8290536 age,sex_f,ascites_1,hepato_1,spiders_1,edema_0.5,... #> 3: 17 0.8334809 id,age,sex_f,ascites_1,hepato_1,spiders_1,... #> 4: 18 0.8315537 id,trt_placebo,age,sex_f,ascites_1,hepato_1,... #> predictor_dropped #> 1: alk.phos #> 2: platelet #> 3: id #> 4: trt_placebo"},{"path":"https://bcjaeger.github.io/aorsf/reference/pbc_orsf.html","id":null,"dir":"Reference","previous_headings":"","what":"Mayo Clinic Primary Biliary Cholangitis Data — pbc_orsf","title":"Mayo Clinic Primary Biliary Cholangitis Data — pbc_orsf","text":"data light modification survival::pbc data. modifications :","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/pbc_orsf.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Mayo Clinic Primary Biliary Cholangitis Data — pbc_orsf","text":"","code":"pbc_orsf"},{"path":"https://bcjaeger.github.io/aorsf/reference/pbc_orsf.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Mayo Clinic Primary Biliary Cholangitis Data — pbc_orsf","text":"data frame 276 rows 20 variables: id case number time number days registration earlier death, transplantation, study analysis July, 1986 status status endpoint, 0 censored transplant, 1 dead trt randomized treatment group: D-penicillamine placebo age years sex m/f ascites presence ascites hepato presence hepatomegaly enlarged liver spiders blood vessel malformations skin edema 0 edema, 0.5 untreated successfully treated, 1 edema despite diuretic therapy bili serum bilirubin (mg/dl) chol serum cholesterol (mg/dl) albumin serum albumin (g/dl) copper urine copper (ug/day) alk.phos alkaline phosphatase (U/liter) ast aspartate aminotransferase, called SGOT (U/ml) trig triglycerides (mg/dl) platelet platelet count protime standardized blood clotting time stage histologic stage disease (needs 
biopsy)","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/pbc_orsf.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Mayo Clinic Primary Biliary Cholangitis Data — pbc_orsf","text":"T Therneau P Grambsch (2000), Modeling Survival Data: Extending Cox Model, Springer-Verlag, New York. ISBN: 0-387-98784-3.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/pbc_orsf.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Mayo Clinic Primary Biliary Cholangitis Data — pbc_orsf","text":"removed rows missing data converted status 0 censor transplant, 1 dead converted stage ordered factor. converted trt, ascites, hepato, spiders, edema factors.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/penguins_orsf.html","id":null,"dir":"Reference","previous_headings":"","what":"Size measurements for adult foraging penguins near Palmer Station, Antarctica — penguins_orsf","title":"Size measurements for adult foraging penguins near Palmer Station, Antarctica — penguins_orsf","text":"data copied lightly modified penguins data palmerpenguins R package. modification removal rows missing data. 
data include measurements penguin species, island Palmer Archipelago, size (flipper length, body mass, bill dimensions), sex.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/penguins_orsf.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Size measurements for adult foraging penguins near Palmer Station, Antarctica — penguins_orsf","text":"","code":"penguins_orsf"},{"path":"https://bcjaeger.github.io/aorsf/reference/penguins_orsf.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Size measurements for adult foraging penguins near Palmer Station, Antarctica — penguins_orsf","text":"tibble 333 rows 8 variables: species factor denoting penguin species (Adélie, Chinstrap Gentoo) island factor denoting island Palmer Archipelago, Antarctica (Biscoe, Dream Torgersen) bill_length_mm number denoting bill length (millimeters) bill_depth_mm number denoting bill depth (millimeters) flipper_length_mm integer denoting flipper length (millimeters) body_mass_g integer denoting body mass (grams) sex factor denoting penguin sex (female, male) year integer denoting study year (2007, 2008, 2009)","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/penguins_orsf.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Size measurements for adult foraging penguins near Palmer Station, Antarctica — penguins_orsf","text":"Adélie penguins: Palmer Station Antarctica LTER K. Gorman. 2020. Structural size measurements isotopic signatures foraging among adult male female Adélie penguins (Pygoscelis adeliae) nesting along Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative. doi:10.6073/pasta/98b16d7d563f265cb52372c8ca99e60f Gentoo penguins: Palmer Station Antarctica LTER K. Gorman. 2020. 
Structural size measurements isotopic signatures foraging among adult male female Gentoo penguin (Pygoscelis papua) nesting along Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative. doi:10.6073/pasta/7fca67fb28d56ee2ffa3d9370ebda689 Chinstrap penguins: Palmer Station Antarctica LTER K. Gorman. 2020. Structural size measurements isotopic signatures foraging among adult male female Chinstrap penguin (Pygoscelis antarcticus) nesting along Palmer Archipelago near Palmer Station, 2007-2009 ver 6. Environmental Data Initiative. doi:10.6073/pasta/c14dfcfada8ea13a17536e73eb6fbe9e Originally published : Gorman KB, Williams TD, Fraser WR (2014) Ecological Sexual Dimorphism Environmental Variability within Community Antarctic Penguins (Genus Pygoscelis). PLoS ONE 9(3): e90081. doi:10.1371/journal.pone.0090081","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/predict.ObliqueForest.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute predictions using ORSF — predict.ObliqueForest","title":"Compute predictions using ORSF — predict.ObliqueForest","text":"Predicted risk, survival, hazard, mortality ORSF model.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/predict.ObliqueForest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute predictions using ORSF — predict.ObliqueForest","text":"","code":"# S3 method for ObliqueForest predict( object, new_data, pred_horizon = NULL, pred_type = NULL, na_action = \"fail\", boundary_checks = TRUE, n_thread = 1, verbose_progress = FALSE, pred_aggregate = TRUE, ... )"},{"path":"https://bcjaeger.github.io/aorsf/reference/predict.ObliqueForest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute predictions using ORSF — predict.ObliqueForest","text":"object (orsf_fit) trained oblique random survival forest (see orsf). 
new_data data.frame, tibble, data.table compute predictions . pred_horizon (double) value vector indicating time(s) predictions calibrated . E.g., predicting risk incident heart failure within next 10 years, pred_horizon = 10. pred_horizon can NULL pred_type 'mort', since mortality predictions aggregated event times pred_type (character) type predictions compute. Valid options 'risk' : probability event pred_horizon. 'surv' : 1 - risk. 'chf': cumulative hazard function 'mort': mortality prediction na_action (character) happen new_data contains missing values (.e., NA values). Valid options : 'fail' : error thrown new_data contains NA values 'pass' : output NA rows new_data 1 NA value predictors used object 'omit' : rows new_data incomplete data dropped 'impute_meanmode' : missing values continuous categorical variables new_data imputed using mean mode, respectively. clarify, mean mode used impute missing values training data object, new_data. boundary_checks (logical) TRUE, pred_horizon checked make sure requested values less maximum observed time object's training data. FALSE, checks skipped. n_thread (integer) number threads use computing predictions. Default one thread. use maximum number threads system provides concurrent execution, set n_thread = 0. verbose_progress (logical) TRUE, progress messages printed console. FALSE (default), nothing printed. pred_aggregate (logical) TRUE (default), predictions aggregated trees taking mean. FALSE, returned output contain one row per observation one column tree. length pred_horizon two pred_aggregate FALSE, result list matrices, 'th item list corresponding 'th value pred_horizon. ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/predict.ObliqueForest.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute predictions using ORSF — predict.ObliqueForest","text":"matrix predictions. 
Column j matrix corresponds value j pred_horizon. Row matrix corresponds row new_data.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/predict.ObliqueForest.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Compute predictions using ORSF — predict.ObliqueForest","text":"new_data must columns equivalent types data used train object. Also, factors new_data must levels data used train object. pred_horizon values exceed maximum follow-time object's training data, truly want , set boundary_checks = FALSE can use pred_horizon large want. Note predictions beyond maximum follow-time object's training data equal predictions maximum follow-time, aorsf estimate survival beyond maximum observed time. unspecified, pred_horizon may automatically specified value used oobag_pred_horizon object created (see orsf).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/predict.ObliqueForest.html","id":"examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute predictions using ORSF — predict.ObliqueForest","text":"Begin fitting ORSF ensemble: Predict risk, survival, cumulative hazard one several times: Predict mortality, defined number events forest’s population observations characteristics like current observation. type prediction require specify prediction horizon","code":"library(aorsf) set.seed(329730) index_train <- sample(nrow(pbc_orsf), 150) pbc_orsf_train <- pbc_orsf[index_train, ] pbc_orsf_test <- pbc_orsf[-index_train, ] fit <- orsf(data = pbc_orsf_train, formula = Surv(time, status) ~ . 
- id, oobag_pred_horizon = 365.25 * 5) # predicted risk, the default predict(fit, new_data = pbc_orsf_test[1:5, ], pred_type = 'risk', pred_horizon = c(500, 1000, 1500)) ## [,1] [,2] [,3] ## [1,] 0.45965512 0.73309199 0.89715078 ## [2,] 0.03235764 0.09091330 0.18045864 ## [3,] 0.12091603 0.25919883 0.39403239 ## [4,] 0.01488893 0.03745896 0.07571412 ## [5,] 0.01279842 0.02623832 0.06015808 # predicted survival, i.e., 1 - risk predict(fit, new_data = pbc_orsf_test[1:5, ], pred_type = 'surv', pred_horizon = c(500, 1000, 1500)) ## [,1] [,2] [,3] ## [1,] 0.5403449 0.2669080 0.1028492 ## [2,] 0.9676424 0.9090867 0.8195414 ## [3,] 0.8790840 0.7408012 0.6059676 ## [4,] 0.9851111 0.9625410 0.9242859 ## [5,] 0.9872016 0.9737617 0.9398419 # predicted cumulative hazard function # (expected number of events for person i at time j) predict(fit, new_data = pbc_orsf_test[1:5, ], pred_type = 'chf', pred_horizon = c(500, 1000, 1500)) ## [,1] [,2] [,3] ## [1,] 0.65381651 1.28606246 1.75476570 ## [2,] 0.03531788 0.10967272 0.24697387 ## [3,] 0.15371784 0.36989220 0.65462524 ## [4,] 0.01549537 0.04229610 0.09352493 ## [5,] 0.01290261 0.02687956 0.06916273 predict(fit, new_data = pbc_orsf_test[1:5, ], pred_type = 'mort') ## [,1] ## [1,] 79.795533 ## [2,] 22.393743 ## [3,] 38.749709 ## [4,] 13.552788 ## [5,] 9.984989"},{"path":"https://bcjaeger.github.io/aorsf/reference/print.ObliqueForest.html","id":null,"dir":"Reference","previous_headings":"","what":"Inspect your ORSF model — print.ObliqueForest","title":"Inspect your ORSF model — print.ObliqueForest","text":"Printing ORSF model tells : Linear combinations: identified? 
N observations: Number rows training data N events: Number events training data N trees: Number trees forest N predictors total: Total number columns predictor matrix N predictors per node: Number variables used linear combinations Average leaves per tree: proxy depth trees Min observations leaf: See leaf_min_obs orsf Min events leaf: See leaf_min_events orsf OOB stat value: --bag error fitting trees OOB stat type: --bag error computed? Variable importance: variable importance computed?","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/print.ObliqueForest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Inspect your ORSF model — print.ObliqueForest","text":"","code":"# S3 method for ObliqueForest print(x, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/print.ObliqueForest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Inspect your ORSF model — print.ObliqueForest","text":"x (orsf_fit) oblique random survival forest (ORSF; see orsf). ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/print.ObliqueForest.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Inspect your ORSF model — print.ObliqueForest","text":"x, invisibly.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/print.ObliqueForest.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Inspect your ORSF model — print.ObliqueForest","text":"","code":"object <- orsf(pbc_orsf, Surv(time, status) ~ . 
- id, n_tree = 5) print(object) #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated Cox regression #> N observations: 276 #> N events: 111 #> N trees: 5 #> N predictors total: 17 #> N predictors per node: 5 #> Average leaves per tree: 20.2 #> Min observations in leaf: 5 #> Min events in leaf: 1 #> OOB stat value: 0.77 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> #> -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/reference/print.orsf_summary_uni.html","id":null,"dir":"Reference","previous_headings":"","what":"Print ORSF summary — print.orsf_summary_uni","title":"Print ORSF summary — print.orsf_summary_uni","text":"Print ORSF summary","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/print.orsf_summary_uni.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Print ORSF summary — print.orsf_summary_uni","text":"","code":"# S3 method for orsf_summary_uni print(x, n_variables = NULL, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/print.orsf_summary_uni.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Print ORSF summary — print.orsf_summary_uni","text":"x object class 'orsf_summary' n_variables number variables print ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/print.orsf_summary_uni.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Print ORSF summary — print.orsf_summary_uni","text":"invisibly, x","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/print.orsf_summary_uni.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Print ORSF summary — print.orsf_summary_uni","text":"","code":"object <- orsf(pbc_orsf, Surv(time, status) ~ . 
- id) smry <- orsf_summarize_uni(object, n_variables = 3) print(smry) #> #> -- ascites (VI Rank: 1) ----------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0 0.6917550 0.8087376 0.4765359 0.9345688 #> 1 0.5285449 0.5892286 0.3475842 0.7204037 #> #> -- bili (VI Rank: 2) -------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0.80 0.7633098 0.8440832 0.6344838 0.9373619 #> 1.40 0.7417346 0.8219937 0.6015828 0.9208078 #> 3.52 0.6270205 0.6849291 0.4633733 0.8186574 #> #> -- edema (VI Rank: 3) ------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0 0.6972450 0.8146059 0.4781274 0.9387859 #> 0.5 0.5935130 0.6384339 0.3769381 0.8029641 #> 1 0.5909165 0.6272369 0.3947232 0.8189532 #> #> Predicted survival at time t = 1788 for top 3 predictors"},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-012-unreleased","dir":"Changelog","previous_headings":"","what":"aorsf 0.1.2 (unreleased)","title":"aorsf 0.1.2 (unreleased)","text":"Added orsf_control functions classification, regression, survival (https://github.com/ropensci/aorsf/pull/25). optimization implemented matrix multiplication prediction (https://github.com/ropensci/aorsf/pull/20)","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-011","dir":"Changelog","previous_headings":"","what":"aorsf 0.1.1","title":"aorsf 0.1.1","text":"CRAN release: 2023-10-26 fixed uninitialized value pd_type","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-010","dir":"Changelog","previous_headings":"","what":"aorsf 0.1.0","title":"aorsf 0.1.0","text":"CRAN release: 2023-10-13 Re-worked internal C++ routines following design ranger. Re-worked progress printed console verbose_progress TRUE, following design ranger. Messages now indicate action taken, % complete, approximate time finishing action. 
Improved variable importance, following design ranger. Importance now computed tree--tree instead aggregate. Additionally, mortality type prediction used importance survival trees, since mortality depend pred_horizon. Allowed multi-threading performed orsf(), predict.orsf_fit(), functions orsf_vi() orsf_pd() family. Allowed sampling without replacement sampling specific fraction observations orsf() Included Harrell’s C-statistic option assessing goodness splits growing trees. Fixed issue uninformative error message occur pred_horizon > max(time) orsf_summarize_uni. Thanks @JyHao1 @DustinMLong finding !","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-007","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.7","title":"aorsf 0.0.7","text":"CRAN release: 2023-01-12 Additional changes internal testing avoid problems ATLAS","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-006","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.6","title":"aorsf 0.0.6","text":"CRAN release: 2023-01-06 Minor fix internal tests failing run ATLAS","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-005","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.5","title":"aorsf 0.0.5","text":"CRAN release: 2022-12-14 orsf() longer throws errors warnings try give single predictor. note added documentation details ?orsf explains using single predictor orsf() somewhat useless. done resolve https://github.com/mlr-org/mlr3extralearners/issues/259. predict.orsf_fit now accepts pred_horizon = 0 returns sensible values. Thanks @mattwarkentin feature request. added function perform variable selection, orsf_vs(). Made variable importance consistent respect group_factors. Originally, output orsf ungrouped VI values orsf_vi grouped values. update, orsf defaults grouped values. ungrouped values can still recovered. 
Fixed issue orsf_pd functions output data returned original scale.","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-004","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.4","title":"aorsf 0.0.4","text":"CRAN release: 2022-11-07 orsf formulas now accepts Surv objects (see https://github.com/ropensci/aorsf/issues/11) Added verbose_progress input orsf, prints messages console indicating progress. Allowance missing values orsf. Mean mode imputation performed observations missing data. values can also used impute new data missing values. Centering scaling predictors now done prior growing forest.","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-003","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.3","title":"aorsf 0.0.3","text":"CRAN release: 2022-10-09 Included rOpenSci reviewers Christopher Jackson, Marvin N Wright, Lukas Burk DESCRIPTION reviewers. Thank ! Added clarification docs pros/cons different variable importance techniques Added regression tests aorsf versus obliqueRSF (similar) Additional support tests functions long right hand sides Updated --bag vignette appropriate custom functions. Allow status values input data general, .e., just 0 1. Allow missing values predict functions, including partial dependence.","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-002","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.2","title":"aorsf 0.0.2","text":"CRAN release: 2022-09-05 Modified unit tests compatibility extra checks run CRAN.","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-001","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.1","title":"aorsf 0.0.1","text":"CRAN release: 2022-08-23 Added orsf_control_custom(), allows users submit custom functions identifying linear combinations inputs growing oblique decision trees. 
Added weights input orsf, allowing users fit orsf specific data training set. Added chf mort options predict.orsf_fit(). Mortality predictions fully implemented yet - supported partial dependence --bag error estimates. features added future update.","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-0009000","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.0.9000","title":"aorsf 0.0.0.9000","text":"Core features implemented: fit, interpret, predict using oblique random survival forests. Vignettes + Readme covering usage core features. Website hosted GitHub pages, managed pkgdown.","code":""}] +[{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":null,"dir":"","previous_headings":"","what":"Contributing to aorsf","title":"Contributing to aorsf","text":"Want contribute aorsf? Great! aorsf initially stable state development, great deal active subsequent development envisioned. outline propose change aorsf. detailed info contributing , tidyverse packages, please see development contributing guide.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"fixing-typos","dir":"","previous_headings":"","what":"Fixing typos","title":"Contributing to aorsf","text":"can fix typos, spelling mistakes, grammatical errors documentation directly using GitHub web interface, long changes made source file. generally means ’ll need edit roxygen2 comments .R, .Rd file. can find .R file generates .Rd reading comment first line.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"bigger-changes","dir":"","previous_headings":"","what":"Bigger changes","title":"Contributing to aorsf","text":"want make bigger change, ’s good idea first file issue make sure someone team agrees ’s needed. 
’ve found bug, please file issue illustrates bug minimal reprex (also help write unit test, needed).","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"pull-request-process","dir":"","previous_headings":"Bigger changes","what":"Pull request process","title":"Contributing to aorsf","text":"Fork package clone onto computer. haven’t done , recommend using usethis::create_from_github(\"ropensci/aorsf\", fork = TRUE). Install development dependencies devtools::install_dev_deps(), make sure package passes R CMD check running devtools::check(). R CMD check doesn’t pass cleanly, ’s good idea ask help continuing. Create Git branch pull request (PR). recommend using usethis::pr_init(\"brief-description--change\"). Make changes, commit git, create PR running usethis::pr_push(), following prompts browser. title PR briefly describe change. body PR contain Fixes #issue-number. user-facing changes, add bullet top NEWS.md (.e. just first header). Follow style described https://style.tidyverse.org/news.html.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"code-style","dir":"","previous_headings":"Bigger changes","what":"Code style","title":"Contributing to aorsf","text":"New code follow tidyverse style guide. can use styler package apply styles, please don’t restyle code nothing PR. use roxygen2, Markdown syntax, documentation. use testthat unit tests. Contributions test cases included easier accept.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Contributing to aorsf","text":"Please note aorsf project released Contributor Code Conduct. contributing project agree abide terms.","code":""},{"path":"https://bcjaeger.github.io/aorsf/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2022 aorsf authors (Byron C. Jaeger, Sawyer Welden, Nicholas M. 
Pajewski) Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"background","dir":"Articles","previous_headings":"","what":"Background","title":"Introduction to aorsf","text":"oblique random forest (RF) extension traditional (axis-based) RF. Instead using single variable split data grow new branches, trees oblique RF use weighted combination multiple variables.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"oblique-rfs-for-survival-classification-and-regression","dir":"Articles","previous_headings":"","what":"Oblique RFs for survival, classification, and regression","title":"Introduction to aorsf","text":"purpose aorsf (‘’ short accelerated) provide unifying framework fit oblique RFs can scale adequately large data sets. fastest algorithms available package used default often equivalent prediction accuracy computational approaches. Everything aorsf begins orsf() function. begin oblique RF survival using pbc_orsf data, oblique RF classification using penguins_orsf data, FILL REGRESSION. Note n_tree 5 convenience examples, >= 500 practice. may notice first input aorsf data. design choice makes easier use orsf pipes (.e., %>% |>). 
instance,","code":"library(aorsf) # An oblique survival RF pbc_fit <- orsf(data = pbc_orsf, n_tree = 5, formula = Surv(time, status) ~ . - id) pbc_fit #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated Cox regression #> N observations: 276 #> N events: 111 #> N trees: 5 #> N predictors total: 17 #> N predictors per node: 5 #> Average leaves per tree: 19.8 #> Min observations in leaf: 5 #> Min events in leaf: 1 #> OOB stat value: 0.76 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> #> ----------------------------------------- # An oblique classification RF penguin_fit <- orsf(data = penguins_orsf, n_tree = 5, formula = species ~ .) penguin_fit #> ---------- Oblique random classification forest #> #> Linear combinations: Accelerated Logistic regression #> N observations: 333 #> N classes: 3 #> N trees: 5 #> N predictors total: 7 #> N predictors per node: 3 #> Average leaves per tree: 6 #> Min observations in leaf: 5 #> OOB stat value: 0.99 #> OOB stat type: AUC-ROC #> Variable importance: anova #> #> ----------------------------------------- # An oblique regression RF cars_fit <- orsf(data = mtcars, n_tree = 5, formula = mpg ~ .) 
cars_fit #> ---------- Oblique random regression forest #> #> Linear combinations: Accelerated Linear regression #> N observations: 32 #> N trees: 5 #> N predictors total: 10 #> N predictors per node: 4 #> Average leaves per tree: 4.8 #> Min observations in leaf: 5 #> OOB stat value: 0.75 #> OOB stat type: RSQ #> Variable importance: anova #> #> ----------------------------------------- library(dplyr) pbc_fit <- pbc_orsf |> select(-id) |> orsf(formula = Surv(time, status) ~ ., n_tree = 5)"},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"interpretation","dir":"Articles","previous_headings":"","what":"Interpretation","title":"Introduction to aorsf","text":"aorsf includes several functions dedicated interpretation ORSFs, estimation partial dependence variable importance.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"variable-importance","dir":"Articles","previous_headings":"Interpretation","what":"Variable importance","title":"Introduction to aorsf","text":"multiple methods compute variable importance. compute negation importance, ORSF multiplies coefficient variable -1 re-computes --sample (sometimes referred --bag) accuracy ORSF model. can also compute variable importance using permutation, classical approach noises predictor assigned resulting degradation prediction accuracy importance predictor. 
faster alternative permutation negation importance ANOVA importance, computes proportion times variable obtains low p-value (p < 0.01) forest grown.","code":"orsf_vi_negate(pbc_fit) #> bili copper age protime spiders #> 0.1168221744 0.0640918012 0.0318717527 0.0295703184 0.0199482278 #> ascites stage trig ast hepato #> 0.0145030496 0.0138362817 0.0093934850 0.0081600305 0.0081045745 #> edema albumin trt chol platelet #> 0.0074171879 0.0070565813 0.0049965458 0.0043845830 0.0007886543 #> sex alk.phos #> -0.0023614972 -0.0040932561 orsf_vi_permute(pbc_fit) #> bili copper age ascites albumin #> 0.0681612536 0.0264039589 0.0154990015 0.0145135549 0.0128863883 #> ast spiders stage edema protime #> 0.0112889819 0.0042083643 0.0036260906 0.0031464934 0.0029252926 #> trt platelet sex hepato chol #> -0.0002451595 -0.0002523982 -0.0005419264 -0.0010103185 -0.0012341940 #> alk.phos trig #> -0.0033725370 -0.0039837212 orsf_vi_anova(pbc_fit) #> ascites copper albumin bili edema age hepato #> 0.50000000 0.41176471 0.35294118 0.35294118 0.29417989 0.26315789 0.23529412 #> spiders protime chol stage alk.phos ast platelet #> 0.21428571 0.21052632 0.16666667 0.13333333 0.06250000 0.05263158 0.04545455 #> trig sex trt #> 0.04545455 0.00000000 0.00000000"},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"partial-dependence-pd","dir":"Articles","previous_headings":"Interpretation","what":"Partial dependence (PD)","title":"Introduction to aorsf","text":"Partial dependence (PD) shows expected prediction model function single predictor multiple predictors. expectation marginalized values predictors, giving something like multivariable adjusted estimate model’s prediction. 
PD, see vignette","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"individual-conditional-expectations-ice","dir":"Articles","previous_headings":"Interpretation","what":"Individual conditional expectations (ICE)","title":"Introduction to aorsf","text":"Unlike partial dependence, shows expected prediction function one multiple predictors, individual conditional expectations (ICE) show prediction individual observation function predictor. ICE, see vignette","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"what-about-the-original-orsf","dir":"Articles","previous_headings":"","what":"What about the original ORSF?","title":"Introduction to aorsf","text":"original ORSF (.e., obliqueRSF) used glmnet find linear combinations inputs. aorsf allows users implement approach using orsf_control_survival(method = 'net') function: net forests fit lot faster original ORSF function obliqueRSF. However, net forests still much slower cph ones.","code":"orsf_net <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id, control = orsf_control_survival(method = 'net'))"},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"aorsf-and-other-machine-learning-software","dir":"Articles","previous_headings":"","what":"aorsf and other machine learning software","title":"Introduction to aorsf","text":"unique feature aorsf fast algorithms fit ORSF ensembles. RLT obliqueRSF fit oblique random survival forests, aorsf faster. ranger randomForestSRC fit survival forests, neither package supports oblique splitting. obliqueRF fits oblique random forests classification regression, survival. PPforest fits oblique random forests classification survival. Note: default prediction behavior aorsf models produce predicted risk specific prediction horizon, default ranger randomForestSRC. 
think change future, computing time independent predictions aorsf helpful.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"learning-more","dir":"Articles","previous_headings":"","what":"Learning more","title":"Introduction to aorsf","text":"aorsf began dedicated package oblique random survival forests, papers published far focused survival analysis risk prediction. However, routines regression classification oblique RFs aorsf high overlap survival ones. See orsf details oblique random survival forests. see JCGS paper details algorithms used specifically aorsf.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"go-faster","dir":"Articles","previous_headings":"","what":"Go faster","title":"Tips to speed up computation","text":"Analyses can slow crawl models need hours run. article find tricks prevent bottleneck using orsf().","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"dont-specify-a-control","dir":"Articles","previous_headings":"","what":"Don’t specify a control","title":"Tips to speed up computation","text":"default control orsf() NULL , unspecified, orsf() pick fastest possible control depending type forest grown. default control run-time compared approaches can striking. example:","code":"time_fast <- system.time( expr = orsf(pbc_orsf, formula = time+status~. -id, n_tree = 5) ) time_net <- system.time( expr = orsf(pbc_orsf, formula = time+status~. -id, control = orsf_control_survival(method = 'net'), n_tree = 5) ) # control_fast() is much faster time_net['elapsed'] / time_fast['elapsed'] #> elapsed #> 53.4"},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"use-n_thread","dir":"Articles","previous_headings":"","what":"Use n_thread","title":"Tips to speed up computation","text":"n_thread argument uses multi-threading run aorsf functions parallel possible. know many threads want, e.g. want exactly 5, just say n_thread = 5. 
aren’t sure many threads available want use many can, say n_thread = 0 aorsf figure number . R single threaded language, multi-threading applied orsf() needs call R functions C++, occurs customized R function used find linear combination variables compute prediction accuracy.","code":"# automatically pick number of threads based on amount available orsf(pbc_orsf, formula = time+status~. -id, n_tree = 5, n_thread = 0) #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated Cox regression #> N observations: 276 #> N events: 111 #> N trees: 5 #> N predictors total: 17 #> N predictors per node: 5 #> Average leaves per tree: 21.6 #> Min observations in leaf: 5 #> Min events in leaf: 1 #> OOB stat value: 0.78 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> #> -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"do-less","dir":"Articles","previous_headings":"","what":"Do less","title":"Tips to speed up computation","text":"defaults orsf() can adjusted make run faster: set n_retry 0 instead 3 (default) set oobag_pred_type ‘none’ instead ‘surv’ (default) set ‘importance’ ‘none’ instead ‘anova’ (default) increase split_min_events, split_min_obs, leaf_min_events, leaf_min_obs make trees stop growing sooner increase split_min_stat make trees stop growing sooner Applying tips: default values make orsf() run slower, also usually make predictions accurate make fit easier interpret.","code":"orsf(pbc_orsf, formula = time+status~., n_thread = 0, n_tree = 5, n_retry = 0, oobag_pred_type = 'none', importance = 'none', split_min_events = 20, leaf_min_events = 10, split_min_stat = 10) #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated Cox regression #> N observations: 276 #> N events: 111 #> N trees: 5 #> N predictors total: 18 #> N predictors per node: 5 #> Average leaves per tree: 6.8 #> Min observations in leaf: 5 #> Min events in leaf: 10 #> 
OOB stat value: none #> OOB stat type: none #> Variable importance: none #> #> -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"show-progress","dir":"Articles","previous_headings":"","what":"Show progress","title":"Tips to speed up computation","text":"Setting verbose_progress = TRUE doesn’t make anything run faster, can help make feel like things running less slow.","code":"verbose_fit <- orsf(pbc_orsf, formula = time+status~. -id, n_tree = 5, verbose_progress = TRUE) #> Growing trees: 100%. #> Computing predictions: 100%."},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"out-of-bag-data","dir":"Articles","previous_headings":"","what":"Out-of-bag data","title":"Out-of-bag predictions and evaluation","text":"random forests, tree grown bootstrapped version training set. bootstrap samples selected replacement, bootstrapped training set contains two-thirds instances original training set. ‘--bag’ data instances bootstrapped training set.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"out-of-bag-predictions-and-error","dir":"Articles","previous_headings":"","what":"Out-of-bag predictions and error","title":"Out-of-bag predictions and evaluation","text":"tree random forest can make predictions --bag data, --bag predictions can aggregated make ensemble --bag prediction. Since --bag data used grow tree, accuracy ensemble --bag predictions approximate generalization error random forest. --bag prediction error plays central role routines estimate variable importance, e.g. negation importance. Let’s fit oblique random survival forest plot distribution ensemble --bag predictions. surprisingly, survival predictions 0 1. Next, let’s check --bag accuracy fit: --bag estimate 1 (default method evaluate --bag predictions) 0.7923697.","code":"fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . 
- id, oobag_pred_type = 'surv', n_tree = 5, oobag_pred_horizon = 2000) hist(fit$pred_oobag, main = 'Ensemble out-of-bag survival predictions at t=3,500') # what function is used to evaluate out-of-bag predictions? fit$eval_oobag$stat_type #> [1] 1 # what is the output from this function? fit$eval_oobag$stat_values #> [,1] #> [1,] 0.7923697"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"monitoring-out-of-bag-error","dir":"Articles","previous_headings":"","what":"Monitoring out-of-bag error","title":"Out-of-bag predictions and evaluation","text":"--bag data set contains one-third training set, --bag error estimate usually converges stable value trees added forest. want monitor convergence --bag error oblique random survival forest, can set oobag_eval_every compute --bag error every oobag_eval_every tree. example, let’s compute --bag error fitting tree forest 50 trees: general, least 500 trees recommended random forest fit. ’re just using 10 illustration.","code":"fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id, n_tree = 20, tree_seeds = 2, oobag_pred_type = 'surv', oobag_pred_horizon = 2000, oobag_eval_every = 1) plot( x = seq(1, 20, by = 1), y = fit$eval_oobag$stat_values, main = 'Out-of-bag C-statistic computed after each new tree is grown.', xlab = 'Number of trees grown', ylab = fit$eval_oobag$stat_type ) lines(x=seq(1, 20), y = fit$eval_oobag$stat_values)"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"user-supplied-out-of-bag-evaluation-functions","dir":"Articles","previous_headings":"","what":"User-supplied out-of-bag evaluation functions","title":"Out-of-bag predictions and evaluation","text":"cases, may want control --bag error estimated. example, let’s use Brier score SurvMetrics package: two ways apply function compute --bag error. 
First, can apply function --bag survival predictions stored ‘aorsf’ objects, e.g: Second, can pass function orsf(), used place Harrell’s C-statistic:","code":"oobag_fun_brier <- function(y_mat, w_vec, s_vec){ # output is numeric vector of length 1 as.numeric( SurvMetrics::Brier( object = Surv(time = y_mat[, 1], event = y_mat[, 2]), pre_sp = s_vec, # t_star in Brier() should match oob_pred_horizon in orsf() t_star = 2000 ) ) } oobag_fun_brier(y_mat = pbc_orsf[,c('time', 'status')], s_vec = fit$pred_oobag) #> [1] 0.117498 fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id, n_tree = 20, tree_seeds = 2, oobag_pred_horizon = 2000, oobag_fun = oobag_fun_brier, oobag_eval_every = 1) plot( x = seq(1, 20, by = 1), y = fit$eval_oobag$stat_values, main = 'Out-of-bag error computed after each new tree is grown.', sub = 'For the Brier score, lower values indicate more accurate predictions', xlab = 'Number of trees grown', ylab = \"Brier score\" ) lines(x=seq(1, 20), y = fit$eval_oobag$stat_values)"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"specific-instructions-on-user-supplied-functions","dir":"Articles","previous_headings":"User-supplied out-of-bag evaluation functions","what":"Specific instructions on user-supplied functions","title":"Out-of-bag predictions and evaluation","text":"User-supplied functions must: exactly three arguments named y_mat, w_vec, s_vec. return numeric output length 1 either conditions true, error occur. 
simple test make sure user-supplied function work aorsf package :","code":"# Helper code to make sure your oobag_fun function will work with aorsf # time and status values test_time <- seq(from = 1, to = 5, length.out = 100) test_status <- rep(c(0,1), each = 50) # y-matrix is presumed to contain time and status (with column names) y_mat <- cbind(time = test_time, status = test_status) # s_vec is presumed to be a vector of survival probabilities s_vec <- seq(0.9, 0.1, length.out = 100) # see 1 in the checklist above names(formals(oobag_fun_brier)) == c(\"y_mat\", \"w_vec\", \"s_vec\") #> [1] TRUE TRUE TRUE test_output <- oobag_fun_brier(y_mat = y_mat, w_vec = w_vec, s_vec = s_vec) # test output should be numeric is.numeric(test_output) #> [1] TRUE # test_output should be a numeric value of length 1 length(test_output) == 1 #> [1] TRUE"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"notes","dir":"Articles","previous_headings":"","what":"Notes","title":"Out-of-bag predictions and evaluation","text":"evaluating --bag error: oobag_pred_horizon input orsf() determines prediction horizon --bag predictions. prediction horizon needs specified evaluate prediction accuracy cases, examples . sure check case using functions, , sure oobag_pred_horizon matches prediction horizon used custom function. functions expect predicted risk (.e., 1 - predicted survival), others expect predicted survival. cases, also able use function whatsoever compute --bag prediction error estimating negation permutation importance, assuming passes tests . Unfortunately, exception riskRegression::Score(), one favorites. experimented riskRegression::Score found work try run C++. 
sure case.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"partial-dependence-pd","dir":"Articles","previous_headings":"","what":"Partial dependence (PD)","title":"PD and ICE curves with ORSF","text":"Partial dependence (PD) shows expected prediction model function single predictor multiple predictors. expectation marginalized values predictors, giving something like multivariable adjusted estimate model’s prediction. Begin fitting ORSF ensemble. Set prediction horizon 5 years fit ensemble aorsf function pass ensemble assume want compute predictions 5 years.","code":"library(aorsf) pred_horizon <- 365.25 * 5 set.seed(329730) index_train <- sample(nrow(pbc_orsf), 150) pbc_orsf_train <- pbc_orsf[index_train, ] pbc_orsf_test <- pbc_orsf[-index_train, ] fit <- orsf(data = pbc_orsf_train, formula = Surv(time, status) ~ . - id, n_tree = 50, oobag_pred_horizon = pred_horizon) fit #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated Cox regression #> N observations: 150 #> N events: 52 #> N trees: 50 #> N predictors total: 17 #> N predictors per node: 5 #> Average leaves per tree: 10.26 #> Min observations in leaf: 5 #> Min events in leaf: 1 #> OOB stat value: 0.82 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> #> -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"three-ways-to-compute-pd","dir":"Articles","previous_headings":"","what":"Three ways to compute PD","title":"PD and ICE curves with ORSF","text":"can compute PD three ways aorsf: using -bag predictions training data using --bag predictions training data using predictions new set data -bag PD indicates relationships model learned training. helpful goal interpret model. --bag PD indicates relationships model learned training using --bag data simulates application model new data. want test model’s reliability fairness new data don’t access large testing set. 
new data PD shows model predicts outcomes observations seen. helpful want test model’s reliability fairness. Let’s re-fit ORSF model available data proceeding next sections.","code":"pd_inb <- orsf_pd_inb(fit, pred_spec = list(bili = 1:5)) pd_inb #> pred_horizon bili mean lwr medn upr #> 1: 1826.25 1 0.7907730 0.2481813 0.8937603 0.9844993 #> 2: 1826.25 2 0.7601163 0.2197790 0.8682995 0.9727126 #> 3: 1826.25 3 0.7285689 0.2036100 0.8217230 0.9565828 #> 4: 1826.25 4 0.6746859 0.1718164 0.7566957 0.9123469 #> 5: 1826.25 5 0.6432024 0.1594598 0.7325270 0.8812735 pd_oob <- orsf_pd_oob(fit, pred_spec = list(bili = 1:5)) pd_oob #> pred_horizon bili mean lwr medn upr #> 1: 1826.25 1 0.7881621 0.2863629 0.8597642 0.9894571 #> 2: 1826.25 2 0.7555353 0.2442537 0.8200408 0.9748819 #> 3: 1826.25 3 0.7255229 0.2015414 0.8066997 0.9652856 #> 4: 1826.25 4 0.6627814 0.1825518 0.7251021 0.9222259 #> 5: 1826.25 5 0.6312946 0.1513669 0.6887701 0.9008661 pd_test <- orsf_pd_new(fit, new_data = pbc_orsf_test, pred_spec = list(bili = 1:5)) pd_test #> pred_horizon bili mean lwr medn upr #> 1: 1826.25 1 0.7431989 0.2273887 0.7909497 0.9839732 #> 2: 1826.25 2 0.7109910 0.1997981 0.7468900 0.9641052 #> 3: 1826.25 3 0.6810310 0.2020585 0.7197035 0.9462750 #> 4: 1826.25 4 0.6286608 0.1842981 0.6652184 0.8986032 #> 5: 1826.25 5 0.5963168 0.1735789 0.6239932 0.8723134 set.seed(329730) fit <- orsf(pbc_orsf, Surv(time, status) ~ . 
-id, n_tree = 50, oobag_pred_horizon = pred_horizon)"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"one-variable-one-horizon","dir":"Articles","previous_headings":"","what":"One variable, one horizon","title":"PD and ICE curves with ORSF","text":"Computing PD single variable straightforward: output shows expected predicted mortality risk men substantially higher women 5 years baseline.","code":"pd_sex <- orsf_pd_oob(fit, pred_spec = list(sex = c(\"m\", \"f\"))) pd_sex #> pred_horizon sex mean lwr medn upr #> 1: 1826.25 m 0.6435480 0.07603602 0.7117799 0.9703216 #> 2: 1826.25 f 0.6856422 0.05828848 0.8006676 0.9897476"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"one-variable-moving-horizon","dir":"Articles","previous_headings":"","what":"One variable, moving horizon","title":"PD and ICE curves with ORSF","text":"effect predictor varies time? PD can show . inspection, can see males higher risk females difference risk grows time. can also seen viewing ratio expected risk time:","code":"pd_sex_tv <- orsf_pd_oob(fit, pred_spec = list(sex = c(\"m\", \"f\")), pred_horizon = seq(365, 365*5)) ggplot(pd_sex_tv, aes(x = pred_horizon, y = mean, color = sex)) + geom_line() + labs(x = 'Time since baseline', y = 'Expected risk') library(data.table) ratio_tv <- pd_sex_tv[ , .(ratio = mean[sex == 'm'] / mean[sex == 'f']), by = pred_horizon ] ggplot(ratio_tv, aes(x = pred_horizon, y = ratio)) + geom_line(color = 'grey') + geom_smooth(color = 'black', se = FALSE) + labs(x = 'time since baseline', y = 'ratio in expected risk for males versus females') #> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"multiple-variables-marginally","dir":"Articles","previous_headings":"","what":"Multiple variables, marginally","title":"PD and ICE curves with ORSF","text":"want compute PD marginally multiple variables, just list variable values pred_spec specify 
expand_grid = FALSE. Now tedious wanted variables? bet. ’s made function . bonus, printed output sorted least important variables. ’s easy enough turn ‘summary’ object data.table downstream plotting tables.","code":"pd_two_vars <- orsf_pd_oob(fit, pred_spec = list(sex = c(\"m\", \"f\"), bili = 1:5), expand_grid = FALSE) pd_two_vars #> pred_horizon variable value level mean lwr medn upr #> 1: 1826.25 sex NA m 0.6435480 0.07603602 0.7117799 0.9703216 #> 2: 1826.25 sex NA f 0.6856422 0.05828848 0.8006676 0.9897476 #> 3: 1826.25 bili 1 0.7578522 0.14149456 0.8228529 0.9879458 #> 4: 1826.25 bili 2 0.7007216 0.09924444 0.7792010 0.9688718 #> 5: 1826.25 bili 3 0.6463678 0.08210453 0.7216161 0.9398680 #> 6: 1826.25 bili 4 0.6058594 0.06268395 0.6709035 0.9324928 #> 7: 1826.25 bili 5 0.5760493 0.05933883 0.6265555 0.9027867 pd_smry <- orsf_summarize_uni(fit, n_variables = 4) pd_smry #> #> -- ascites (VI Rank: 1) ----------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0 0.6881650 0.7943433 0.4746425 0.9449489 #> 1 0.5111958 0.5708393 0.3199671 0.7159134 #> #> -- bili (VI Rank: 2) -------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0.80 0.7651622 0.8323025 0.6370644 0.9563400 #> 1.40 0.7343138 0.8090864 0.5983719 0.9342372 #> 3.52 0.6271650 0.6967314 0.4420082 0.8455518 #> #> -- edema (VI Rank: 3) ------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0 0.6941026 0.8006676 0.4662736 0.9449489 #> 0.5 0.5978522 0.6289791 0.3879660 0.8228830 #> 1 0.5981885 0.6461966 0.3858379 0.8131762 #> #> -- copper (VI Rank: 4) ------------------------ #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 42.8 0.7397492 0.8450985 0.5764946 0.9533662 #> 74.0 0.7168641 0.8286188 0.5265654 0.9448001 #> 129 0.6565337 0.7409069 0.4453135 0.8936874 #> #> Predicted survival at time t = 1826.25 for top 4 
predictors head(as.data.table(pd_smry)) #> variable importance Value Mean Median 25th % 75th % #> 1: ascites 0.5225225 0 0.6881650 0.7943433 0.4746425 0.9449489 #> 2: ascites 0.5225225 1 0.5111958 0.5708393 0.3199671 0.7159134 #> 3: bili 0.3807339 0.80 0.7651622 0.8323025 0.6370644 0.9563400 #> 4: bili 0.3807339 1.40 0.7343138 0.8090864 0.5983719 0.9342372 #> 5: bili 0.3807339 3.52 0.6271650 0.6967314 0.4420082 0.8455518 #> 6: edema 0.2965828 0 0.6941026 0.8006676 0.4662736 0.9449489 #> pred_horizon level #> 1: 1826.25 0 #> 2: 1826.25 1 #> 3: 1826.25 #> 4: 1826.25 #> 5: 1826.25 #> 6: 1826.25 0"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"multiple-variables-jointly","dir":"Articles","previous_headings":"","what":"Multiple variables, jointly","title":"PD and ICE curves with ORSF","text":"PD can show expected value model’s predictions function specific predictor, function multiple predictors. instance, can estimate predicted risk joint function bili, edema, trt: inspection, model’s predictions indicate slightly lower risk placebo group, seem change much different values bili edema. clear increase predicted risk higher levels edema higher levels bili slope predicted risk function bili appears highest among patients edema 0.5. effect bili modified edema 0.5? 
quick sanity check coxph suggests .","code":"pred_spec = list(bili = seq(1, 5, length.out = 20), edema = levels(pbc_orsf_train$edema), trt = levels(pbc_orsf$trt)) pd_bili_edema <- orsf_pd_oob(fit, pred_spec) library(ggplot2) ggplot(pd_bili_edema, aes(x = bili, y = medn, col = trt, linetype = edema)) + geom_line() + labs(y = 'Expected predicted risk') library(survival) pbc_orsf$edema_05 <- ifelse(pbc_orsf$edema == '0.5', 'yes', 'no') fit_cph <- coxph(Surv(time,status) ~ edema_05 * bili, data = pbc_orsf) anova(fit_cph) #> Analysis of Deviance Table #> Cox model: response is Surv(time, status) #> Terms added sequentially (first to last) #> #> loglik Chisq Df Pr(>|Chi|) #> NULL -550.19 #> edema_05 -546.83 6.7248 1 0.009508 ** #> bili -513.59 66.4689 1 3.555e-16 *** #> edema_05:bili -510.54 6.1112 1 0.013433 * #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"individual-conditional-expectations-ice","dir":"Articles","previous_headings":"","what":"Individual conditional expectations (ICE)","title":"PD and ICE curves with ORSF","text":"Unlike partial dependence, shows expected prediction function one multiple predictors, individual conditional expectations (ICE) show prediction individual observation function predictor. Just like PD, can compute ICE using -bag, --bag, testing data, principles apply. ’ll use --bag estimates .","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"visualizing-ice-curves","dir":"Articles","previous_headings":"","what":"Visualizing ICE curves","title":"PD and ICE curves with ORSF","text":"Inspecting ICE curves observation can help identify whether heterogeneity model’s predictions. .e., effect variable follow pattern data, groups variable impacts risk differently? going turn boundary checking orsf_ice_oob setting boundary_checks = FALSE, allow generate ICE curves go beyond 90th percentile bili. 
id_variable identifier current value variable(s) data. redundant one variable, helpful multiple variables. id_row identifier observation original data. used group observation’s predictions together plots. plots, helpful scale ICE data. subtract initial value predicted risk (.e., bili = 1) observation’s conditional expectation values. , Every curve start 0 plot shows change predicted risk function bili. Now can visualize curves. inspection figure, individual slopes cluster around overall trend - Good! small number individual slopes appear flat. may helpful investigate .","code":"pred_spec <- list(bili = seq(1, 10, length.out = 25)) ice_oob <- orsf_ice_oob(fit, pred_spec, boundary_checks = FALSE) ice_oob #> id_variable id_row pred_horizon bili pred #> 1: 1 1 1826.25 1 0.1256531 #> 2: 1 2 1826.25 1 0.1266534 #> 3: 1 3 1826.25 1 0.2493624 #> 4: 1 4 1826.25 1 0.1644180 #> 5: 1 5 1826.25 1 0.4634519 #> --- #> 6896: 25 272 1826.25 10 0.4739199 #> 6897: 25 273 1826.25 10 0.5088416 #> 6898: 25 274 1826.25 10 0.8041698 #> 6899: 25 275 1826.25 10 0.3732715 #> 6900: 25 276 1826.25 10 0.7077810 ice_oob[, pred_subtract := rep(pred[id_variable==1], times=25)] ice_oob[, pred := pred - pred_subtract] library(ggplot2) ggplot(ice_oob, aes(x = bili, y = pred, group = id_row)) + geom_line(alpha = 0.15) + labs(y = 'Change in predicted risk') + geom_smooth(se = FALSE, aes(group = 1)) #> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"limitations-of-pd","dir":"Articles","previous_headings":"","what":"Limitations of PD","title":"PD and ICE curves with ORSF","text":"Partial dependence number known limitations assumptions users aware (see Hooker, 2021). particular, partial dependence less intuitive >2 predictors examined jointly, assumed feature(s) partial dependence computed correlated features (likely true many cases). 
Accumulated local effect plots can used (see ) case feature independence valid assumption.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"PD and ICE curves with ORSF","text":"Giles Hooker, Lucas Mentch, Siyu Zhou. Unrestricted Permutation forces Extrapolation: Variable Importance Requires least One Model, Free Variable Importance. arXiv e-prints 2021 Oct; arXiv-1905. URL: https://doi.org/10.48550/arXiv.1905.03151","code":""},{"path":"https://bcjaeger.github.io/aorsf/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Byron Jaeger. Author, maintainer. Nicholas Pajewski. Contributor. Sawyer Welden. Contributor. Christopher Jackson. Reviewer. Marvin Wright. Reviewer. Lukas Burk. Reviewer.","code":""},{"path":"https://bcjaeger.github.io/aorsf/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Jaeger et al. (2022). aorsf: R package supervised learning using oblique random survival forest. Journal Open Source Software, 7(77), 4705. https://doi.org/10.21105/joss.04705. Jaeger BC, Welden S, Lenoir K, Speiser JL, Segar MW, Pandey , Pajewski NM. Accelerated interpretable oblique random survival forests. Journal Computational Graphical Statistics. 2023 Aug 3:1-6. Jaeger BC, Long DL, Long DM, Sims M, Szychowski JM, Min YI, Mcclure LA, Howard G, Simon N. Oblique Random Survival Forests. Annals Applied Statistics. 13(3): 1847-1883. URL https://doi.org/10.1214/19-AOAS1261 DOI: 10.1214/19-AOAS1261","code":"@Article{, title = {aorsf: An R package for supervised learning using the oblique random survival forest}, author = {Byron C. Jaeger and Sawyer Welden and Kristin Lenoir and Nicholas M. 
Pajewski}, journal = {Journal of Open Source Software}, year = {2022}, volume = {7}, number = {77}, pages = {4705}, url = {https://doi.org/10.21105/joss.04705}, } @Article{, title = {Accelerated and interpretable oblique random survival forests}, author = {Byron C. Jaeger and Sawyer Welden and Kristin Lenoir and Jaime L. Speiser and Matthew W. Segar and Ambarish Pandey and Nicholas M. Pajewski}, journal = {Journal of Computational and Graphical Statistics}, year = {2023}, url = {https://doi.org/10.1080/10618600.2023.2231048}, } @Article{, title = {Oblique Random Survival Forests}, author = {Byron C. Jaeger and D. Leann Long and Dustin M. Long and Mario Sims and Jeff M. Szychowski and Yuan-I Min and Leslie A. Mcclure and George Howard and Noah Simon}, journal = {Annals of Applied Statistics}, year = {2019}, volume = {13}, number = {3}, pages = {1847--1883}, url = {https://doi.org/10.1214/19-AOAS1261}, }"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"aorsf-","dir":"","previous_headings":"","what":"Accelerated Oblique Random Survival Forests","title":"Accelerated Oblique Random Survival Forests","text":"Fit, interpret, make predictions oblique random survival forests (ORSFs).","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"why-aorsf","dir":"","previous_headings":"","what":"Why aorsf?","title":"Accelerated Oblique Random Survival Forests","text":"Hundreds times faster obliqueRSF.1 Accurate predictions censored outcomes.2 Negation importance, novel technique estimate variable importance ORSFs.2 Intuitive API formula based interface. 
Extensive input checks informative error messages.","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Accelerated Oblique Random Survival Forests","text":"can install aorsf CRAN using can install development version aorsf GitHub :","code":"install.packages(\"aorsf\") # install.packages(\"remotes\") remotes::install_github(\"ropensci/aorsf\")"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"what-is-an-oblique-decision-tree","dir":"","previous_headings":"","what":"What is an oblique decision tree?","title":"Accelerated Oblique Random Survival Forests","text":"Decision trees developed splitting set training data two new subsets, goal similarity within new subsets . splitting process repeated resulting subsets data stopping criterion met. new subsets data formed based single predictor, decision tree said axis-based splits data appear perpendicular axis predictor. linear combinations variables used instead single variable, tree oblique splits data neither parallel right angle axis. Figure: Decision trees classification axis-based splitting (left) oblique splitting (right). Cases orange squares; controls purple circles. trees partition predictor space defined variables X1 X2, oblique splits better job separating two classes.","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"examples","dir":"","previous_headings":"","what":"Examples","title":"Accelerated Oblique Random Survival Forests","text":"orsf() function can fit several types ORSF ensembles. personal favorite accelerated ORSF great combination prediction accuracy computational efficiency (see JCGS paper).2","code":"library(aorsf) set.seed(329730) index_train <- sample(nrow(pbc_orsf), 150) pbc_orsf_train <- pbc_orsf[index_train, ] pbc_orsf_test <- pbc_orsf[-index_train, ] fit <- orsf(data = pbc_orsf_train, formula = Surv(time, status) ~ . 
- id, oobag_pred_horizon = 365.25 * 5)"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"inspect","dir":"","previous_headings":"Examples","what":"Inspect","title":"Accelerated Oblique Random Survival Forests","text":"Printing output orsf() give information descriptive statistics ensemble. See print.orsf_fit description line printed output. See orsf examples details controlling ORSF ensemble fits using prediction modeling workflows.","code":"fit #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated #> N observations: 150 #> N events: 52 #> N trees: 500 #> N predictors total: 17 #> N predictors per node: 5 #> Average leaves per tree: 10 #> Min observations in leaf: 5 #> Min events in leaf: 1 #> OOB stat value: 0.83 #> OOB stat type: Harrell's C-statistic #> Variable importance: anova #> #> -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"variable-importance","dir":"","previous_headings":"Examples","what":"Variable importance","title":"Accelerated Oblique Random Survival Forests","text":"importance individual variables can estimated three ways using aorsf: negation2: variable assessed separately multiplying variable’s coefficients -1 determining much model’s performance changes. worse model’s performance negating coefficients given variable, important variable. technique promising b/c require permutation emphasizes variables larger coefficients linear combinations, also relatively new hasn’t studied much permutation importance. See Jaeger, (2023) details technique. permutation: variable assessed separately randomly permuting variable’s values determining much model’s performance changes. worse model’s performance permuting values given variable, important variable. technique flexible, intuitive, frequently used. also several known limitations analysis variance (ANOVA)3: p-value computed coefficient linear combination variables decision tree. 
Importance individual predictor variable proportion times p-value coefficient < 0.01. technique efficient computationally, may effective permutation negation terms selecting signal noise variables. See Menze, 2011 details technique. can supply R function estimate --bag error using negation permutation importance (see oob vignette)","code":"orsf_vi_negate(fit) #> bili sex copper stage age #> 0.1162463868 0.0517905362 0.0375565841 0.0240450064 0.0239056901 #> ast protime hepato edema ascites #> 0.0191083400 0.0158014897 0.0139536512 0.0119264604 0.0100865906 #> albumin chol spiders trt trig #> 0.0085394443 0.0037903802 0.0030727468 0.0020617896 0.0018361632 #> alk.phos platelet #> 0.0006586211 -0.0044967624 orsf_vi_permute(fit) #> bili copper age stage sex #> 0.0523994364 0.0187964038 0.0152246586 0.0115192591 0.0110127557 #> ast hepato edema ascites albumin #> 0.0100104477 0.0082889176 0.0079183046 0.0077834483 0.0070642325 #> protime trig chol spiders alk.phos #> 0.0066513097 0.0015656325 0.0014474560 0.0006015308 0.0001369292 #> trt platelet #> -0.0013984860 -0.0022427356 orsf_vi_anova(fit) #> bili ascites edema copper stage sex age #> 0.48778004 0.44943820 0.41677872 0.31865585 0.26675095 0.26458616 0.25448430 #> ast hepato albumin chol protime trig spiders #> 0.21743929 0.19945726 0.18191604 0.15240328 0.15076561 0.13709677 0.11833550 #> alk.phos platelet trt #> 0.10113636 0.06302021 0.05019305"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"partial-dependence-pd","dir":"","previous_headings":"Examples","what":"Partial dependence (PD)","title":"Accelerated Oblique Random Survival Forests","text":"Partial dependence (PD) shows expected prediction model function single predictor multiple predictors. expectation marginalized values predictors, giving something like multivariable adjusted estimate model’s prediction. summary function, orsf_summarize_uni(), computes PD many variables ask , using sensible values. 
PD, see vignette","code":"orsf_summarize_uni(fit, n_variables = 2) #> #> -- bili (VI Rank: 1) --------------------------- #> #> |---------------- Risk ----------------| #> Value Mean Median 25th % 75th % #> 0.70 0.1986719 0.1044026 0.04354701 0.2968290 #> 1.3 0.2132847 0.1210276 0.05245387 0.3208855 #> 3.2 0.2883814 0.1917119 0.11951296 0.4147258 #> #> -- sex (VI Rank: 2) ---------------------------- #> #> |---------------- Risk ----------------| #> Value Mean Median 25th % 75th % #> m 0.3394141 0.2313787 0.13762339 0.5311308 #> f 0.2390067 0.1112093 0.04782891 0.3773551 #> #> Predicted risk at time t = 1826.25 for top 2 predictors"},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"individual-conditional-expectations-ice","dir":"","previous_headings":"Examples","what":"Individual conditional expectations (ICE)","title":"Accelerated Oblique Random Survival Forests","text":"Unlike partial dependence, shows expected prediction function one multiple predictors, individual conditional expectations (ICE) show prediction individual observation function predictor. ICE, see vignette","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"comparison-to-existing-software","dir":"","previous_headings":"","what":"Comparison to existing software","title":"Accelerated Oblique Random Survival Forests","text":"Comparisons aorsf existing software presented JCGS paper. paper: describes aorsf detail summary procedures used tree fitting algorithm runs general benchmark comparing aorsf obliqueRSF several learners reports prediction accuracy computational efficiency learners. runs simulation study comparing variable importance techniques ORSFs, axis based RSFs, boosted trees. reports probability variable importance technique rank relevant variable higher importance irrelevant variable. 
hands-comparison aorsf R packages provided orsf examples","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"references","dir":"","previous_headings":"","what":"References","title":"Accelerated Oblique Random Survival Forests","text":"Jaeger BC, Long DL, Long DM, Sims M, Szychowski JM, Min YI, Mcclure LA, Howard G, Simon N. Oblique random survival forests. Annals applied statistics 2019 Sep; 13(3):1847-83. DOI: 10.1214/19-AOAS1261 Jaeger BC, Welden S, Lenoir K, Speiser JL, Segar MW, Pandey , Pajewski NM. Accelerated interpretable oblique random survival forests. Journal Computational Graphical Statistics Published online 08 Aug 2023. DOI: 10.1080/10618600.2023.2231048 Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA. oblique random forests. Joint European Conference Machine Learning Knowledge Discovery Databases 2011 Sep 4; pp. 453-469. DOI: 10.1007/978-3-642-23783-6_29","code":""},{"path":"https://bcjaeger.github.io/aorsf/index.html","id":"funding","dir":"","previous_headings":"","what":"Funding","title":"Accelerated Oblique Random Survival Forests","text":"developers aorsf receive financial support Center Biomedical Informatics, Wake Forest University School Medicine. also receive support National Center Advancing Translational Sciences National Institutes Health Award Number UL1TR001420. content solely responsibility authors necessarily represent official views National Institutes Health.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/aorsf-package.html","id":null,"dir":"Reference","previous_headings":"","what":"aorsf: Accelerated Oblique Random Survival Forests — aorsf-package","title":"aorsf: Accelerated Oblique Random Survival Forests — aorsf-package","text":"Fit, interpret, make predictions oblique random survival forests. Oblique decision trees notoriously slow compared axis based counterparts, 'aorsf' runs fast faster axis-based decision tree algorithms right-censored time--event outcomes. 
Methods accelerate interpret oblique random survival forest described Jaeger et al., (2023) doi:10.1080/10618600.2023.2231048 .","code":""},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/aorsf-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"aorsf: Accelerated Oblique Random Survival Forests — aorsf-package","text":"Maintainer: Byron Jaeger bjaeger@wakehealth.edu (ORCID) contributors: Nicholas Pajewski [contributor] Sawyer Welden swelden@wakehealth.edu [contributor] Christopher Jackson chris.jackson@mrc-bsu.cam.ac.uk [reviewer] Marvin Wright [reviewer] Lukas Burk [reviewer]","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":null,"dir":"Reference","previous_headings":"","what":"Coerce to data.table — as.data.table.orsf_summary_uni","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"Convert 'orsf_summary' object data.table object.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"","code":"# S3 method for orsf_summary_uni as.data.table(x, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"x object class 'orsf_summary_uni' ... 
used","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"data.table","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/as.data.table.orsf_summary_uni.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Coerce to data.table — as.data.table.orsf_summary_uni","text":"","code":"library(data.table) object <- orsf(pbc_orsf, Surv(time, status) ~ . - id) smry <- orsf_summarize_uni(object, n_variables = 3) as.data.table(smry) #> variable importance value mean medn lwr upr #> 1: ascites 0.4924319 0 0.6900374 0.8002266 0.4747794 0.9329735 #> 2: ascites 0.4924319 1 0.5276879 0.5921952 0.3307928 0.7210135 #> 3: bili 0.4179174 0.80 0.7588320 0.8409506 0.6293245 0.9337076 #> 4: bili 0.4179174 1.40 0.7387011 0.8207813 0.5972521 0.9214494 #> 5: bili 0.4179174 3.52 0.6272947 0.6917634 0.4626750 0.8291606 #> 6: edema 0.3079184 0 0.6948409 0.8086279 0.4904330 0.9359587 #> 7: edema 0.3079184 0.5 0.5942084 0.6362531 0.3830464 0.7948496 #> 8: edema 0.3079184 1 0.5920168 0.6305997 0.3821132 0.8220468 #> pred_horizon level #> 1: 1788 0 #> 2: 1788 1 #> 3: 1788 #> 4: 1788 #> 5: 1788 #> 6: 1788 0 #> 7: 1788 0.5 #> 8: 1788 1"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":null,"dir":"Reference","previous_headings":"","what":"Oblique Random Survival Forest (ORSF) — orsf","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Fit oblique random survival forest","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"","code":"orsf( data, formula, control = NULL, weights = NULL, n_tree = 500, n_split = 5, n_retry = 3, n_thread = 1, mtry = NULL, sample_with_replacement = TRUE, 
sample_fraction = 0.632, leaf_min_events = 1, leaf_min_obs = 5, split_rule = NULL, split_min_events = 5, split_min_obs = 10, split_min_stat = NULL, oobag_pred_type = NULL, oobag_pred_horizon = NULL, oobag_eval_every = n_tree, oobag_fun = NULL, importance = \"anova\", importance_max_pvalue = 0.01, group_factors = TRUE, tree_seeds = NULL, attach_data = TRUE, no_fit = FALSE, na_action = \"fail\", verbose_progress = FALSE, ... ) orsf_train(object, attach_data = TRUE)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"data data.frame, tibble, data.table contains relevant variables. formula (formula) Two sided formula single outcome. terms right names predictor variables, symbol '.' may used indicate variables data except response. symbol '-' may also used indicate removal predictor. Details response vary depending forest type: Survival: response include time variable, followed status variable, may written inside call Surv (see examples). Classification: response single variable, variable type factor data. Regression: response single variable, variable type double integer least 10 unique numeric values data. control (orsf_control) object returned one orsf_control functions: orsf_control_survival, orsf_control_classification, orsf_control_regression. NULL (default) use accelerated control, fastest available option. survival classification, Cox Logistic regression 1 iteration, regression ordinary least squares. weights (numeric vector) Optional. given, input length equal nrow(data) complete imputed data length equal nrow(na.omit(data)) na_action \"omit\". Values weights treated like replication weights, .e., value 2 thing 2 observations data, containing copy corresponding person's data. Use weights cautiously, orsf count number observations events prior growing node tree, higher values weights lead deeper trees. 
use input, highly recommended scale weights sum(weights) == nrow(data), help make tree depth consistent default weights = rep(1, nrow(data)) n_tree (integer) number trees grow. Default n_tree = 500. n_split (integer) number cut-points assessed splitting node decision trees. Default n_split = 5. n_retry (integer) node splittable, current linear combination inputs unable provide valid split, orsf try new linear combination based different set randomly selected predictors, n_retry times. Default n_retry = 3. Set n_retry = 0 prevent retries. n_thread (integer) number threads use growing trees, computing predictions, computing importance. Default one thread. use maximum number threads system provides concurrent execution, set n_thread = 0. mtry (integer) Number predictors randomly included candidates splitting node. default smallest integer greater square root number total predictors, .e., mtry = ceiling(sqrt(number predictors)) sample_with_replacement (logical) TRUE (default), observations sampled replacement -bag sample created decision tree. FALSE, observations sampled without replacement tree -bag sample containing sample_fraction% original sample. sample_fraction (double) proportion observations trees' -bag sample contain, relative number rows data. used sample_with_replacement FALSE. Default value 0.632. leaf_min_events (integer) minimum number events leaf node. Default leaf_min_events = 1 leaf_min_obs (integer) minimum number observations leaf node. Default leaf_min_obs = 5. split_rule (character) assess quality potential splitting rule node. Valid options survival : 'logrank' : log-rank test statistic (default). 'cstat' : Harrell's concordance statistic. classification, valid options : 'gini' : gini impurity (default) 'cstat' : area underneath ROC curve (AUC-ROC) regression, valid options : 'variance' : variance reduction (default) split_min_events (integer) minimum number events required node consider splitting . Default split_min_events = 5. 
input relevant survival trees. split_min_obs (integer) minimum number observations required node consider splitting . Default split_min_obs = 10. split_min_stat (double) minimum test statistic required split node. splits found statistic exceeding split_min_stat, given node either becomes leaf retry occurs (n_retry retries). Defaults 3.84 split_rule = 'logrank' 0.50 split_rule = 'cstat' (see first note ) 0.00 split_rule = 'gini' (see second note ) 0.00 split_rule = 'variance' Note 1 C-statistic splitting, C < 0.50, consider statistic value 1 - C allow good 'anti-predictive' splits. , C-statistic initially computed 0.1, considered 1 - 0.10 = 0.90. Note 2 Gini impurity, value 0 1 usually indicate best worst possible scores, respectively. make things simple avoid introducing split_max_stat input, flip values Gini impurity 1 0 indicate best worst possible scores, respectively. oobag_pred_type (character) type --bag predictions compute fitting ensemble. Valid options tree type: 'none' : compute --bag predictions 'leaf' : ID predicted leaf returned tree Valid options survival: 'risk' : probability event occurring oobag_pred_horizon (default). 'surv' : 1 - risk. 'chf' : cumulative hazard function oobag_pred_horizon. 'mort' : mortality, .e., number events expected observations training data identical given observation. Valid options classification: 'prob' : probability class (default) 'class' : class (.e., .max(prob)) Valid options regression: 'mean' : mean value (default) oobag_pred_horizon (numeric) numeric value indicating time used --bag predictions. Default median observed times, .e., oobag_pred_horizon = median(time). input relevant survival trees prediction type 'risk', 'surv', 'chf'. oobag_eval_every (integer) --bag performance ensemble checked every oobag_eval_every trees. , oobag_eval_every = 10, --bag performance checked growing 10th tree, 20th tree, . Default oobag_eval_every = n_tree. 
oobag_fun (function) used evaluating --bag prediction accuracy every oobag_eval_every trees. oobag_fun = NULL (default), evaluation statistic selected based tree type survival: Harrell's C-statistic (1982) classification: Area underneath ROC curve (AUC-ROC) regression: Traditional prediction R-squared use oobag_fun note following: oobag_fun three inputs: y_mat, w_vec, s_vec survival trees, y_mat two column matrix first column named 'time' second named 'status'. classification trees, y_mat matrix number columns = number distinct classes outcome. regression, y_mat matrix one column. s_vec numeric vector containing predictions oobag_fun return numeric output length 1 details, see --bag vignette. importance (character) Indicate method variable importance: 'none': variable importance computed. 'anova': compute analysis variance (ANOVA) importance 'negate': compute negation importance 'permute': compute permutation importance details methods, see orsf_vi. importance_max_pvalue (double) relevant importance \"anova\". maximum p-value register positive case counting number times variable found 'significant' tree growth. Default 0.01, recommended Menze et al. group_factors (logical) relevant variable importance estimated. TRUE, importance factor variables reported overall aggregating importance individual levels factor. FALSE, importance individual factor levels returned. tree_seeds (integer vector) Optional. specified, random seeds set using values tree_seeds[] growing tree . Two forests grown number trees seeds exact --bag samples, making --bag error estimates forests comparable. NULL (default), seeds picked random. attach_data (logical) TRUE, copy training data attached output. required plan using functions like orsf_pd_oob orsf_summarize_uni interpret forest using training data. Default TRUE. no_fit (logical) TRUE, model fitting steps defined saved, training initiated. object returned can directly submitted orsf_train() long attach_data TRUE. 
na_action (character) happen data contains missing values (.e., NA values). Valid options : 'fail' : error thrown data contains NA values 'omit' : rows data incomplete data dropped 'impute_meanmode' : missing values continuous categorical variables data imputed using mean mode, respectively. verbose_progress (logical) TRUE, progress messages printed console. FALSE (default), nothing printed. ... arguments passed methods (currently used). object untrained 'aorsf' object, created setting no_fit = TRUE orsf().","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"accelerated oblique RSF object (aorsf)","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"function based similar ORSF function obliqueRSF R package. primary difference function runs much faster. speed increase attributable better management memory (.e., unnecessary copies inputs) using Newton Raphson scoring algorithm identify linear combinations inputs rather performing penalized regression using routines glmnet.modified Newton Raphson scoring algorithm function applies adaptation C++ routine developed Terry M. Therneau fits Cox proportional hazards models (see survival::coxph() specifically survival::coxph.fit()).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"details-on-inputs","dir":"Reference","previous_headings":"","what":"Details on inputs","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"formula: response formula can survival object returned Surv function, can also just time status variables. .e., Surv(time, status) ~ . works just like time + status ~ . . 
symbol right hand side short-hand using variables data (omitting left hand side formula) predictors. order variables left hand side matters. .e., writing status + time ~ . make orsf assume status variable actually time variable. response variable can survival object stored data. example, y ~ . valid formula data$y inherits Surv class. Although can fit oblique random survival forest 1 predictor variable, formula least 2 predictors. reason recommendation linear combination predictors trivial one predictor. mtry: mtry parameter may temporarily reduced ensure least 2 events per predictor variable. occurs using orsf_control_cph coefficients Newton Raphson scoring algorithm may become unstable number covariates greater equal number events. reduction occur using orsf_control_net. oobag_fun: oobag_fun specified, used compute negation importance permutation importance, role ANOVA importance. n_thread: R function must called C++ (.e., user-supplied function compute --bag error identify linear combinations variables), n_thread automatically set 1 attempting run R functions multiple threads cause R session crash.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"what-is-an-oblique-decision-tree-","dir":"Reference","previous_headings":"","what":"What is an oblique decision tree?","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Decision trees developed splitting set training data two new subsets, goal similarity within new subsets . splitting process repeated resulting subsets data stopping criterion met. new subsets data formed based single predictor, decision tree said axis-based splits data appear perpendicular axis predictor. linear combinations variables used instead single variable, tree oblique splits data neither parallel right angle axis Figure : Decision trees classification axis-based splitting (left) oblique splitting (right). Cases orange squares; controls purple circles. 
trees partition predictor space defined variables X1 X2, oblique splits better job separating two classes.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"what-is-a-random-forest-","dir":"Reference","previous_headings":"","what":"What is a random forest?","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Random forests collections de-correlated decision trees. Predictions tree aggregated make ensemble prediction forest. details, see Breiman, 2001.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"training-out-of-bag-error-and-testing","dir":"Reference","previous_headings":"","what":"Training, out-of-bag error, and testing","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"random forests, tree grown bootstrapped version training set. bootstrap samples selected replacement, bootstrapped training set contains two-thirds instances original training set. '--bag' data instances bootstrapped training set. tree random forest can make predictions --bag data, --bag predictions can aggregated make ensemble --bag prediction. Since --bag data used grow tree, accuracy ensemble --bag predictions approximate generalization error random forest. Generalization error refers error random forest's predictions applied predict outcomes data used train , .e., testing data.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"missing-data","dir":"Reference","previous_headings":"","what":"Missing data","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Data passed aorsf functions allowed missing values. 
user impute missing values using R package purpose, recipes mlr3pipelines.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"First load relevant packages entry-point aorsf standard call orsf(): printing fit provides quick descriptive summaries:","code":"set.seed(329730) suppressPackageStartupMessages({ library(aorsf) library(survival) library(tidymodels) library(tidyverse) library(randomForestSRC) library(ranger) library(riskRegression) library(obliqueRSF) }) fit <- orsf(pbc_orsf, Surv(time, status) ~ . - id) fit ## ---------- Oblique random survival forest ## ## Linear combinations: Accelerated Cox regression ## N observations: 276 ## N events: 111 ## N trees: 500 ## N predictors total: 17 ## N predictors per node: 5 ## Average leaves per tree: 20.98 ## Min observations in leaf: 5 ## Min events in leaf: 1 ## OOB stat value: 0.84 ## OOB stat type: Harrell's C-index ## Variable importance: anova ## ## -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"model-control","dir":"Reference","previous_headings":"","what":"Model control","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"examples make use orsf_control_ functions build compare models based --bag predictions. also standardize --bag samples using input argument tree_seeds","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"accelerated-linear-combinations","dir":"Reference","previous_headings":"","what":"Accelerated linear combinations","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"accelerated ORSF ensemble default nice balance computational speed prediction accuracy. 
runs single iteration Newton Raphson scoring Cox partial likelihood function find linear combinations predictors.","code":"fit_accel <- orsf(pbc_orsf, control = orsf_control_survival(), formula = Surv(time, status) ~ . - id, tree_seeds = 329)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"linear-combinations-with-cox-regression","dir":"Reference","previous_headings":"","what":"Linear combinations with Cox regression","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"orsf_control_cph runs Cox regression non-terminal node survival tree, using regression coefficients create linear combinations predictors:","code":"control_cph <- orsf_control_survival(method = 'glm', scale_x = TRUE, max_iter = 20) fit_cph <- orsf(pbc_orsf, control = control_cph, formula = Surv(time, status) ~ . - id, tree_seeds = 329)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"linear-combinations-with-penalized-cox-regression","dir":"Reference","previous_headings":"","what":"Linear combinations with penalized cox regression","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"orsf_control_net runs penalized Cox regression non-terminal node survival tree, using regression coefficients create linear combinations predictors. can really helpful want feature selection within node, lot slower options.","code":"# select 3 predictors out of 5 to be used in # each linear combination of predictors. control_net <- orsf_control_survival(method = 'net', target_df = 3) fit_net <- orsf(pbc_orsf, control = control_net, formula = Surv(time, status) ~ . - id, tree_seeds = 329)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"linear-combinations-with-your-own-function","dir":"Reference","previous_headings":"","what":"Linear combinations with your own function","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Let’s make three customized functions identify linear combinations predictors. 
first uses random coefficients second derives coefficients principal component analysis. third uses orsf() inside orsf(). can plug functions orsf_control_survival(), pass result orsf(): fit seems work best example? Let’s find evaluating --bag survival predictions. AUC values, highest lowest: indices prediction accuracy: inspection, net, accel, rlt high discrimination index prediction accuracy. rando pca less well, aren’t bad.","code":"f_rando <- function(x_node, y_node, w_node){ matrix(runif(ncol(x_node)), ncol=1) } f_pca <- function(x_node, y_node, w_node) { # estimate two principal components. pca <- stats::prcomp(x_node, rank. = 2) # use the first principal component to split the node pca$rotation[, 1L, drop = FALSE] } # This approach is known as reinforcement learning trees. # some special care is taken to prevent your R session from crashing. # Specifically, random coefficients are used when n_obs <= 10 # or n_events <= 5. f_aorsf <- function(x_node, y_node, w_node){ colnames(y_node) <- c('time', 'status') colnames(x_node) <- paste(\"x\", seq(ncol(x_node)), sep = '') data <- as.data.frame(cbind(y_node, x_node)) if(nrow(data) <= 10 || sum(y_node[,'status']) <= 5) return(matrix(runif(ncol(x_node)), ncol = 1)) fit <- orsf(data, time + status ~ ., weights = as.numeric(w_node), n_tree = 25, importance = 'permute') out <- orsf_vi(fit) # drop the two least important variables n_vars <- length(out) out[c(n_vars, n_vars-1)] <- 0 # ensure out has same variable order as input out <- out[colnames(x_node)] matrix(out, ncol = 1) } fit_rando <- orsf(pbc_orsf, Surv(time, status) ~ . - id, control = orsf_control_survival(method = f_rando), tree_seeds = 329) fit_pca <- orsf(pbc_orsf, Surv(time, status) ~ . - id, control = orsf_control_survival(method = f_pca), tree_seeds = 329) fit_rlt <- orsf(pbc_orsf, time + status ~ . 
- id, control = orsf_control_survival(method = f_aorsf), tree_seeds = 329) risk_preds <- list( accel = 1 - fit_accel$pred_oobag, cph = 1 - fit_cph$pred_oobag, net = 1 - fit_net$pred_oobag, rando = 1 - fit_rando$pred_oobag, pca = 1 - fit_pca$pred_oobag, rlt = 1 - fit_rlt$pred_oobag ) sc <- Score(object = risk_preds, formula = Surv(time, status) ~ 1, data = pbc_orsf, summary = 'IPA', times = fit_accel$pred_horizon) sc$AUC$score[order(-AUC)] ## model times AUC se lower upper ## 1: net 1788 0.9134593 0.02079935 0.8726933 0.9542253 ## 2: rlt 1788 0.9129537 0.01979016 0.8741657 0.9517417 ## 3: accel 1788 0.9112315 0.02098077 0.8701099 0.9523530 ## 4: cph 1788 0.9063871 0.02165434 0.8639453 0.9488288 ## 5: rando 1788 0.9023489 0.02218936 0.8588586 0.9458393 ## 6: pca 1788 0.8994220 0.02201713 0.8562692 0.9425748 sc$Brier$score[order(-IPA), .(model, times, IPA)] ## model times IPA ## 1: net 1788 0.4916038 ## 2: accel 1788 0.4879683 ## 3: cph 1788 0.4751883 ## 4: rlt 1788 0.4640282 ## 5: pca 1788 0.4370592 ## 6: rando 1788 0.4258344 ## 7: Null model 1788 0.0000000"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"tidymodels","dir":"Reference","previous_headings":"","what":"tidymodels","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"example uses tidymodels functions stops short using official tidymodels workflow. working getting aorsf pulled censored package update real workflows happens!","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"comparing-orsf-with-other-learners","dir":"Reference","previous_headings":"","what":"Comparing ORSF with other learners","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Start recipe pre-process data Next create 10-fold cross validation object pre-process data: Define functions ‘workflow’ randomForestSRC, ranger, aorsf. Run ‘workflows’ fold: Next unnest column get back tibble testing data predictions. 
finish aggregating predictions computing performance testing data. Note computing one statistic predictions instead computing one statistic fold. approach fine smaller testing sets /small event counts. inspection, aorsf obtained slightly higher discrimination (AUC) aorsf obtained higher index prediction accuracy (IPA)","code":"imputer <- recipe(pbc_orsf, formula = time + status ~ .) %>% step_rm(id) %>% step_impute_mean(all_numeric_predictors()) %>% step_impute_mode(all_nominal_predictors()) # 10-fold cross validation; make a container for the pre-processed data analyses <- vfold_cv(data = pbc_orsf, v = 10) %>% mutate(recipe = map(splits, ~prep(imputer, training = training(.x))), train = map(recipe, juice), test = map2(splits, recipe, ~bake(.y, new_data = testing(.x)))) analyses ## # 10-fold cross-validation ## # A tibble: 10 x 5 ## splits id recipe train test ## ## 1 Fold01 ## 2 Fold02 ## 3 Fold03 ## 4 Fold04 ## 5 Fold05 ## 6 Fold06 ## 7 Fold07 ## 8 Fold08 ## 9 Fold09 ## 10 Fold10 rfsrc_wf <- function(train, test, pred_horizon){ # rfsrc does not like tibbles, so cast input data into data.frames train <- as.data.frame(train) test <- as.data.frame(test) rfsrc(formula = Surv(time, status) ~ ., data = train) %>% predictRisk(newdata = test, times = pred_horizon) %>% as.numeric() } ranger_wf <- function(train, test, pred_horizon){ ranger(Surv(time, status) ~ ., data = train) %>% predictRisk(newdata = test, times = pred_horizon) %>% as.numeric() } aorsf_wf <- function(train, test, pred_horizon){ train %>% orsf(Surv(time, status) ~ .,) %>% predict(new_data = test, pred_type = 'risk', pred_horizon = pred_horizon) %>% as.numeric() } # 5 year risk prediction ph <- 365.25 * 5 results <- analyses %>% transmute(test, pred_aorsf = map2(train, test, aorsf_wf, pred_horizon = ph), pred_rfsrc = map2(train, test, rfsrc_wf, pred_horizon = ph), pred_ranger = map2(train, test, ranger_wf, pred_horizon = ph)) results <- results %>% unnest(everything()) glimpse(results) ## Rows: 276 ## 
Columns: 23 ## $ id 1, 8, 15, 17, 23, 35, 56, 68, 92, 94, 97, 109, 116, 130, 1~ ## $ trt d_penicill_main, placebo, d_penicill_main, placebo, placeb~ ## $ age 58.76523, 53.05681, 64.64613, 52.18344, 55.96715, 48.61875~ ## $ sex f, f, f, f, f, f, f, f, f, f, m, f, f, f, f, f, f, f, f, f~ ## $ ascites 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~ ## $ hepato 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0~ ## $ spiders 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0~ ## $ edema 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0.5, 0, 0.5, 0, 0, 0, 0.5, 0~ ## $ bili 14.5, 0.3, 0.8, 2.7, 17.4, 1.2, 1.1, 0.7, 1.4, 3.2, 2.0, 0~ ## $ chol 261, 280, 231, 274, 395, 314, 498, 174, 206, 201, 420, 120~ ## $ albumin 2.60, 4.00, 3.87, 3.15, 2.94, 3.20, 3.80, 4.09, 3.13, 3.11~ ## $ copper 156, 52, 173, 159, 558, 201, 88, 58, 36, 178, 62, 53, 74, ~ ## $ alk.phos 1718.0, 4651.2, 9009.8, 1533.0, 6064.8, 12258.8, 13862.4, ~ ## $ ast 137.95, 28.38, 127.71, 117.80, 227.04, 72.24, 95.46, 71.30~ ## $ trig 172, 189, 96, 128, 191, 151, 319, 46, 70, 69, 91, 52, 382,~ ## $ platelet 190, 373, 295, 224, 214, 431, 365, 203, 145, 188, 344, 271~ ## $ protime 12.2, 11.0, 11.0, 10.5, 11.7, 10.6, 10.6, 10.6, 12.2, 11.8~ ## $ stage 4, 3, 3, 4, 4, 3, 2, 3, 4, 4, 3, 3, 3, 3, 2, 2, 3, 2, 3, 2~ ## $ time 400, 2466, 3584, 769, 264, 2847, 1847, 4039, 388, 750, 611~ ## $ status 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0~ ## $ pred_aorsf 0.95858574, 0.05070837, 0.22924369, 0.47048899, 0.94442624~ ## $ pred_rfsrc 0.85474291, 0.05480510, 0.31751746, 0.45156052, 0.85784657~ ## $ pred_ranger 0.77897322, 0.04599438, 0.30267242, 0.43467381, 0.77914735~ Score( object = list(aorsf = results$pred_aorsf, rfsrc = results$pred_rfsrc, ranger = results$pred_ranger), formula = Surv(time, status) ~ 1, data = results, summary = 'IPA', times = ph ) ## ## Metric AUC: ## ## Results by model: ## ## model times AUC lower upper ## 1: aorsf 1826 90.7 86.4 95.0 ## 2: rfsrc 1826 89.9 85.6 94.2 ## 3: 
ranger 1826 89.7 85.5 94.0 ## ## Results of model comparisons: ## ## times model reference delta.AUC lower upper p ## 1: 1826 rfsrc aorsf -0.8 -2.2 0.5 0.2 ## 2: 1826 ranger aorsf -1.0 -2.5 0.5 0.2 ## 3: 1826 ranger rfsrc -0.1 -1.3 1.0 0.8 ## ## NOTE: Values are multiplied by 100 and given in %. ## NOTE: The higher AUC the better. ## ## Metric Brier: ## ## Results by model: ## ## model times Brier lower upper IPA ## 1: Null model 1826.25 20.5 18.1 22.9 0.0 ## 2: aorsf 1826.25 10.8 8.6 13.0 47.3 ## 3: rfsrc 1826.25 11.8 9.7 13.9 42.5 ## 4: ranger 1826.25 11.9 9.8 14.0 41.9 ## ## Results of model comparisons: ## ## times model reference delta.Brier lower upper p ## 1: 1826.25 aorsf Null model -9.7 -12.3 -7.1 2.179169e-13 ## 2: 1826.25 rfsrc Null model -8.7 -11.0 -6.4 3.915661e-14 ## 3: 1826.25 ranger Null model -8.6 -10.9 -6.2 8.869452e-13 ## 4: 1826.25 rfsrc aorsf 1.0 0.3 1.7 5.383491e-03 ## 5: 1826.25 ranger aorsf 1.1 0.5 1.7 7.890574e-04 ## 6: 1826.25 ranger rfsrc 0.1 -0.4 0.6 6.310608e-01 ## ## NOTE: Values are multiplied by 100 and given in %. ## NOTE: The lower Brier the better, the higher IPA the better."},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"mlr-pipelines","dir":"Reference","previous_headings":"","what":"mlr3 pipelines","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Warning: code may may run depending current version mlr3proba. First load additional mlr3 libraries. Next ’ll define tasks learners engage . 
Now can make benchmark designed compare three favorite learners: Let’s look overall results: inspection, aorsf higher expected value ‘surv.cindex’ (higher better) aorsf lower expected value ‘surv.graf’ (lower better)","code":"suppressPackageStartupMessages({ library(mlr3verse) library(mlr3proba) library(mlr3extralearners) library(mlr3viz) library(mlr3benchmark) }) # Mayo Clinic Primary Biliary Cholangitis Data task_pbc <- TaskSurv$new( id = 'pbc', backend = select(pbc_orsf, -id) %>% mutate(stage = as.numeric(stage)), time = \"time\", event = \"status\" ) # Veteran's Administration Lung Cancer Trial data(veteran, package = \"randomForestSRC\") task_veteran <- TaskSurv$new( id = 'veteran', backend = veteran, time = \"time\", event = \"status\" ) # NKI 70 gene signature data_nki <- OpenML::getOMLDataSet(data.id = 1228) task_nki <- TaskSurv$new( id = 'nki', backend = data_nki$data, time = \"time\", event = \"event\" ) # Gene Expression-Based Survival Prediction in Lung Adenocarcinoma data_lung <- OpenML::getOMLDataSet(data.id = 1245) task_lung <- TaskSurv$new( id = 'lung', backend = data_lung$data %>% mutate(OS_event = as.numeric(OS_event) -1), time = \"OS_years\", event = \"OS_event\" ) # Chemotherapy for Stage B/C colon cancer # (there are two rows per person, one for death # and the other for recurrence, hence the two tasks) task_colon_death <- TaskSurv$new( id = 'colon_death', backend = survival::colon %>% filter(etype == 2) %>% drop_na() %>% # drop id, redundant variables select(-id, -study, -node4, -etype), time = \"time\", event = \"status\" ) task_colon_recur <- TaskSurv$new( id = 'colon_recur', backend = survival::colon %>% filter(etype == 1) %>% drop_na() %>% # drop id, redundant variables select(-id, -study, -node4, -etype), time = \"time\", event = \"status\" ) # putting them all together tasks <- list(task_pbc, task_veteran, task_nki, task_lung, task_colon_death, 
task_colon_recur, # add a few more pre-made ones tsk(\"actg\"), tsk('gbcs'), tsk('grace'), tsk(\"unemployment\"), tsk(\"whas\")) # Learners with default parameters learners <- lrns(c(\"surv.ranger\", \"surv.rfsrc\", \"surv.aorsf\")) # Brier (Graf) score, c-index and training time as measures measures <- msrs(c(\"surv.graf\", \"surv.cindex\", \"time_train\")) # Benchmark with 5-fold CV design <- benchmark_grid( tasks = tasks, learners = learners, resamplings = rsmps(\"cv\", folds = 5) ) benchmark_result <- benchmark(design) bm_scores <- benchmark_result$score(measures, predict_sets = \"test\") bm_scores %>% select(task_id, learner_id, surv.graf, surv.cindex, time_train) %>% group_by(learner_id) %>% filter(!is.infinite(surv.graf)) %>% summarize( across( .cols = c(surv.graf, surv.cindex, time_train), .fns = ~mean(.x, na.rm = TRUE) ) ) ## # A tibble: 3 x 4 ## learner_id surv.graf surv.cindex time_train ## ## 1 surv.aorsf 0.150 0.734 0.383 ## 2 surv.ranger 0.164 0.716 2.04 ## 3 surv.rfsrc 0.155 0.724 0.757"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Oblique Random Survival Forest (ORSF) — orsf","text":"Harrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating Yield Medical Tests. JAMA 1982; 247(18):2543-2546. DOI: 10.1001/jama.1982.03320430047030 Breiman L. Random forests. Machine learning 2001 Oct; 45(1):5-32. DOI: 10.1023/A:1010933404324 Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Annals applied statistics 2008 Sep; 2(3):841-60. DOI: 10.1214/08-AOAS169 Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA. On oblique random forests. Joint European Conference Machine Learning Knowledge Discovery Databases 2011 Sep 4; pp. 453-469. DOI: 10.1007/978-3-642-23783-6_29 Jaeger BC, Long DL, Long DM, Sims M, Szychowski JM, Min YI, Mcclure LA, Howard G, Simon N. Oblique random survival forests. 
Annals applied statistics 2019 Sep; 13(3):1847-83. DOI: 10.1214/19-AOAS1261 Jaeger BC, Welden S, Lenoir K, Speiser JL, Segar MW, Pandey A, Pajewski NM. Accelerated interpretable oblique random survival forests. Journal Computational Graphical Statistics Published online 08 Aug 2023. DOI: 10.1080/10618600.2023.2231048","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control.html","id":null,"dir":"Reference","previous_headings":"","what":"Oblique random forest control — orsf_control","title":"Oblique random forest control — orsf_control","text":"Oblique random forest control","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Oblique random forest control — orsf_control","text":"","code":"orsf_control( tree_type, method, scale_x, ties, net_mix, target_df, max_iter, epsilon, ... ) orsf_control_classification( method = \"glm\", scale_x = TRUE, net_mix = 0.5, target_df = NULL, max_iter = 20, epsilon = 1e-09, ... ) orsf_control_regression( method = \"glm\", scale_x = TRUE, net_mix = 0.5, target_df = NULL, max_iter = 20, epsilon = 1e-09, ... ) orsf_control_survival( method = \"glm\", scale_x = TRUE, ties = \"efron\", net_mix = 0.5, target_df = NULL, max_iter = 20, epsilon = 1e-09, ... )"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Oblique random forest control — orsf_control","text":"tree_type (character) type tree. Valid options \"classification\", .e., categorical outcomes \"regression\", .e., continuous outcomes \"survival\", .e., time-event outcomes method (character function) identify linear combinations predictors. 
method character value, must one : 'glm': linear, logistic, cox regression 'net': 'glm' penalty terms 'pca': principal component analysis 'random': random draw uniform distribution method function, used identify linear combinations predictor variables. method must case accept three inputs named x_node, y_node w_node, expect following types dimensions: x_node (matrix; n rows, p columns) y_node (matrix; n rows, 2 columns) w_node (matrix; n rows, 1 column) addition, method must return matrix p rows 1 column. scale_x (logical) TRUE, values predictors scaled prior instance finding linear combination predictors, using summary values data current node decision tree. ties (character) character string specifying method tie handling. relevant modeling survival outcomes using method engages tied outcome times. ties, methods equivalent. Valid options 'breslow' 'efron'. Efron approximation default accurate dealing tied event times similar computational efficiency compared Breslow method. net_mix (double) elastic net mixing parameter. value 1 gives lasso penalty, value 0 gives ridge penalty. multiple values alpha given, penalized model fit using alpha value prior splitting node. target_df (integer) Preferred number variables used linear combination. example, mtry 5 target_df 3, sample 5 predictors look best linear combination using 3 . max_iter (integer) iteration continues convergence (see epsilon) number attempted iterations equal max_iter. epsilon (double) using modeling based method, iteration continues algorithm relative change kind objective less epsilon, absolute change less sqrt(epsilon). ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Oblique random forest control — orsf_control","text":"object class 'orsf_control', used input control argument orsf. 
Components : tree_type: type trees fit lincomb_type: method linear combinations lincomb_eps: epsilon convergence lincomb_iter_max: max iterations lincomb_scale: scale . lincomb_alpha: mixing parameter lincomb_df_target: target degrees freedom lincomb_ties_method: method ties survival time lincomb_R_function: R function custom splits","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Oblique random forest control — orsf_control","text":"Adjust scale_x risk. Setting scale_x = FALSE reduce computation time also make orsf model dependent scale data, default value TRUE.","code":""},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_cph.html","id":null,"dir":"Reference","previous_headings":"","what":"Cox regression ORSF control — orsf_control_cph","title":"Cox regression ORSF control — orsf_control_cph","text":"Use coefficients proportional hazards model create linear combinations predictor variables fitting orsf model.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_cph.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Cox regression ORSF control — orsf_control_cph","text":"","code":"orsf_control_cph(method = \"efron\", eps = 1e-09, iter_max = 20, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_cph.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Cox regression ORSF control — orsf_control_cph","text":"method (character) character string specifying method tie handling. ties, methods equivalent. Valid options 'breslow' 'efron'. Efron approximation default accurate dealing tied event times similar computational efficiency compared Breslow method. 
eps (double) using Newton Raphson scoring identify linear combinations inputs, iteration continues algorithm relative change log partial likelihood less eps, absolute change less sqrt(eps). Must positive. default value 1e-09 used consistency survival::coxph.control. iter_max (integer) iteration continues convergence (see eps ) number attempted iterations equal iter_max. ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_cph.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Cox regression ORSF control — orsf_control_cph","text":"object class 'orsf_control', used input control argument orsf.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_cph.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Cox regression ORSF control — orsf_control_cph","text":"code survival package modified make routine. details Cox proportional hazards model, see coxph /Therneau Grambsch (2000).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_cph.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Cox regression ORSF control — orsf_control_cph","text":"Therneau T.M., Grambsch P.M. (2000) Cox Model. : Modeling Survival Data: Extending Cox Model. Statistics Biology Health. Springer, New York, NY. 
DOI: 10.1007/978-1-4757-3294-8_3","code":""},{"path":[]},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_custom.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Custom ORSF control — orsf_control_custom","text":"","code":"orsf_control_custom(beta_fun, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_custom.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Custom ORSF control — orsf_control_custom","text":"beta_fun (function) function define coefficients used linear combinations predictor variables. beta_fun must accept three inputs named x_node, y_node w_node, expect following types dimensions: x_node (matrix; n rows, p columns) y_node (matrix; n rows, 2 columns) w_node (matrix; n rows, 1 column) addition, beta_fun must return matrix p rows 1 column. conditions met, orsf_control_custom() let know. ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_custom.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Custom ORSF control — orsf_control_custom","text":"object class 'orsf_control', used input control argument orsf.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_custom.html","id":"examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Custom ORSF control — orsf_control_custom","text":"Two customized functions identify linear combinations predictors shown . 
first uses random coefficients second derives coefficients principal component analysis.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_custom.html","id":"random-coefficients","dir":"Reference","previous_headings":"","what":"Random coefficients","title":"Custom ORSF control — orsf_control_custom","text":"f_rando() function get random coefficients: can plug f_rando orsf_control_survival(), pass result orsf():","code":"f_rando <- function(x_node, y_node, w_node){ matrix(runif(ncol(x_node)), ncol=1) } library(aorsf) fit_rando <- orsf(pbc_orsf, Surv(time, status) ~ . - id, control = orsf_control_survival(method = f_rando), n_tree = 500) fit_rando ## ---------- Oblique random survival forest ## ## Linear combinations: Custom user function ## N observations: 276 ## N events: 111 ## N trees: 500 ## N predictors total: 17 ## N predictors per node: 5 ## Average leaves per tree: 19.682 ## Min observations in leaf: 5 ## Min events in leaf: 1 ## OOB stat value: 0.83 ## OOB stat type: Harrell's C-index ## Variable importance: anova ## ## -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_custom.html","id":"principal-components","dir":"Reference","previous_headings":"","what":"Principal components","title":"Custom ORSF control — orsf_control_custom","text":"Follow steps , starting custom function: plug function orsf_control_survival() pass result orsf():","code":"f_pca <- function(x_node, y_node, w_node) { # estimate two principal components. pca <- stats::prcomp(x_node, rank. = 2) # use the second principal component to split the node pca$rotation[, 2L, drop = FALSE] } fit_pca <- orsf(pbc_orsf, Surv(time, status) ~ . 
- id, control = orsf_control_survival(method = f_pca), n_tree = 500)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_custom.html","id":"evaluate","dir":"Reference","previous_headings":"","what":"Evaluate","title":"Custom ORSF control — orsf_control_custom","text":"well two customized ORSFs ? Let’s compute indices prediction accuracy based --bag predictions: PCA ORSF quite well! (higher IPA better)","code":"library(riskRegression) ## riskRegression version 2023.09.08 library(survival) risk_preds <- list(rando = 1 - fit_rando$pred_oobag, pca = 1 - fit_pca$pred_oobag) sc <- Score(object = risk_preds, formula = Surv(time, status) ~ 1, data = pbc_orsf, summary = 'IPA', times = fit_pca$pred_horizon) sc$Brier ## ## Results by model: ## ## model times Brier lower upper IPA ## 1: Null model 1788 20.479 18.090 22.868 0.000 ## 2: rando 1788 11.872 9.771 13.972 42.031 ## 3: pca 1788 12.990 10.971 15.009 36.569 ## ## Results of model comparisons: ## ## times model reference delta.Brier lower upper p ## 1: 1788 rando Null model -8.607 -10.809 -6.406 1.832790e-14 ## 2: 1788 pca Null model -7.489 -9.213 -5.765 1.664802e-17 ## 3: 1788 pca rando 1.118 0.258 1.979 1.087482e-02 ## ## NOTE: Values are multiplied by 100 and given in %. 
## NOTE: The lower Brier the better, the higher IPA the better."},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_fast.html","id":null,"dir":"Reference","previous_headings":"","what":"Accelerated ORSF control — orsf_control_fast","title":"Accelerated ORSF control — orsf_control_fast","text":"Fast methods identify linear combinations predictors fitting orsf model.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_fast.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Accelerated ORSF control — orsf_control_fast","text":"","code":"orsf_control_fast(method = \"efron\", do_scale = TRUE, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_fast.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Accelerated ORSF control — orsf_control_fast","text":"method (character) character string specifying method tie handling. ties, methods equivalent. Valid options 'breslow' 'efron'. Efron approximation default accurate dealing tied event times similar computational efficiency compared Breslow method. do_scale (logical) TRUE, values predictors scaled prior instance Newton Raphson scoring, using summary values data current node decision tree. ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_fast.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Accelerated ORSF control — orsf_control_fast","text":"object class 'orsf_control', used input control argument orsf.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_fast.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Accelerated ORSF control — orsf_control_fast","text":"code survival package modified make routine. Adjust do_scale risk. 
Setting do_scale = FALSE reduce computation time also make orsf model dependent scale data, default value TRUE.","code":""},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_net.html","id":null,"dir":"Reference","previous_headings":"","what":"Penalized Cox regression ORSF control — orsf_control_net","title":"Penalized Cox regression ORSF control — orsf_control_net","text":"Use regularized Cox proportional hazard models identify linear combinations input variables fitting orsf model.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_net.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Penalized Cox regression ORSF control — orsf_control_net","text":"","code":"orsf_control_net(alpha = 1/2, df_target = NULL, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_net.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Penalized Cox regression ORSF control — orsf_control_net","text":"alpha (double) elastic net mixing parameter. value 1 gives lasso penalty, value 0 gives ridge penalty. multiple values alpha given, penalized model fit using alpha value prior splitting node. df_target (integer) Preferred number variables used linear combination. ... 
arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_net.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Penalized Cox regression ORSF control — orsf_control_net","text":"object class 'orsf_control', used input control argument orsf.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_net.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Penalized Cox regression ORSF control — orsf_control_net","text":"df_target less mtry, separate argument orsf indicates number variables chosen random prior finding linear combination variables.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_control_net.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Penalized Cox regression ORSF control — orsf_control_net","text":"Simon N, Friedman J, Hastie T, Tibshirani R. Regularization paths Cox's proportional hazards model via coordinate descent. Journal statistical software 2011 Mar; 39(5):1. DOI: 10.18637/jss.v039.i05","code":""},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_ice_oob.html","id":null,"dir":"Reference","previous_headings":"","what":"ORSF Individual Conditional Expectations — orsf_ice_oob","title":"ORSF Individual Conditional Expectations — orsf_ice_oob","text":"Compute individual conditional expectations ORSF model. Unlike partial dependence, shows expected prediction function one multiple predictors, individual conditional expectations (ICE) show prediction individual observation function predictor. 
can compute individual conditional expectations three ways using random forest: using -bag predictions training data using --bag predictions training data using predictions new set data See examples details","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_ice_oob.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"ORSF Individual Conditional Expectations — orsf_ice_oob","text":"","code":"orsf_ice_oob( object, pred_spec, pred_horizon = NULL, pred_type = NULL, expand_grid = TRUE, boundary_checks = TRUE, n_thread = 1, ... ) orsf_ice_inb( object, pred_spec, pred_horizon = NULL, pred_type = NULL, expand_grid = TRUE, boundary_checks = TRUE, n_thread = 1, ... ) orsf_ice_new( object, pred_spec, new_data, pred_horizon = NULL, pred_type = NULL, na_action = \"fail\", expand_grid = TRUE, boundary_checks = TRUE, n_thread = 1, ... )"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_ice_oob.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"ORSF Individual Conditional Expectations — orsf_ice_oob","text":"object (orsf_fit) trained oblique random survival forest (see orsf). pred_spec (named list data.frame). pred_spec named list, item list vector values used points partial dependence function. name item list indicate variable modified take corresponding values. pred_spec data.frame, columns indicate variable names, values indicate variable values, partial dependence computed using inputs row. pred_horizon (double) value vector indicating time(s) predictions calibrated . E.g., predicting risk incident heart failure within next 10 years, pred_horizon = 10. pred_horizon can NULL pred_type 'mort', since mortality predictions aggregated event times pred_type (character) type predictions compute. Valid options 'risk' : probability event pred_horizon. 'surv' : 1 - risk. 
'chf': cumulative hazard function 'mort': mortality prediction expand_grid (logical) TRUE, partial dependence computed possible combinations inputs pred_spec. FALSE, partial dependence computed variable pred_spec, separately. boundary_checks (logical) TRUE, pred_spec checked make sure requested values 10th 90th percentile object's training data. FALSE, checks skipped. n_thread (integer) number threads use computing predictions. Default one thread. use maximum number threads system provides concurrent execution, set n_thread = 0. ... arguments passed methods (currently used). new_data data.frame, tibble, data.table compute predictions . na_action (character) happen new_data contains missing values (.e., NA values). Valid options : 'fail' : error thrown new_data contains NA values 'omit' : rows new_data incomplete data dropped","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_ice_oob.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"ORSF Individual Conditional Expectations — orsf_ice_oob","text":"data.table containing individual conditional expectations specified variable(s) specified prediction horizon(s).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_ice_oob.html","id":"examples","dir":"Reference","previous_headings":"","what":"Examples","title":"ORSF Individual Conditional Expectations — orsf_ice_oob","text":"Begin fitting ORSF ensemble Use ensemble compute ICE values using --bag predictions: Much detailed examples given vignette","code":"library(aorsf) set.seed(329) fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . 
- id) fit ## ---------- Oblique random survival forest ## ## Linear combinations: Accelerated Cox regression ## N observations: 276 ## N events: 111 ## N trees: 500 ## N predictors total: 17 ## N predictors per node: 5 ## Average leaves per tree: 21.026 ## Min observations in leaf: 5 ## Min events in leaf: 1 ## OOB stat value: 0.84 ## OOB stat type: Harrell's C-index ## Variable importance: anova ## ## ----------------------------------------- pred_spec <- list(bili = seq(1, 10, length.out = 25)) ice_oob <- orsf_ice_oob(fit, pred_spec, boundary_checks = FALSE) ice_oob ## id_variable id_row pred_horizon bili pred ## 1: 1 1 1788 1 0.1264442 ## 2: 1 2 1788 1 0.1739727 ## 3: 1 3 1788 1 0.3904517 ## 4: 1 4 1788 1 0.2874752 ## 5: 1 5 1788 1 0.4398522 ## --- ## 6896: 25 272 1788 10 0.3076971 ## 6897: 25 273 1788 10 0.4942110 ## 6898: 25 274 1788 10 0.6407498 ## 6899: 25 275 1788 10 0.3871298 ## 6900: 25 276 1788 10 0.6479179"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":null,"dir":"Reference","previous_headings":"","what":"ORSF partial dependence — orsf_pd_oob","title":"ORSF partial dependence — orsf_pd_oob","text":"Compute partial dependence ORSF model. Partial dependence (PD) shows expected prediction model function single predictor multiple predictors. expectation marginalized values predictors, giving something like multivariable adjusted estimate model's prediction. 
can compute partial dependence three ways using random forest: using -bag predictions training data using --bag predictions training data using predictions new set data See examples details","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"ORSF partial dependence — orsf_pd_oob","text":"","code":"orsf_pd_oob( object, pred_spec, pred_horizon = NULL, pred_type = NULL, expand_grid = TRUE, prob_values = c(0.025, 0.5, 0.975), prob_labels = c(\"lwr\", \"medn\", \"upr\"), boundary_checks = TRUE, n_thread = 1, ... ) orsf_pd_inb( object, pred_spec, pred_horizon = NULL, pred_type = NULL, expand_grid = TRUE, prob_values = c(0.025, 0.5, 0.975), prob_labels = c(\"lwr\", \"medn\", \"upr\"), boundary_checks = TRUE, n_thread = 1, ... ) orsf_pd_new( object, pred_spec, new_data, pred_horizon = NULL, pred_type = NULL, na_action = \"fail\", expand_grid = TRUE, prob_values = c(0.025, 0.5, 0.975), prob_labels = c(\"lwr\", \"medn\", \"upr\"), boundary_checks = TRUE, n_thread = 1, ... )"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"ORSF partial dependence — orsf_pd_oob","text":"object (orsf_fit) trained oblique random survival forest (see orsf). pred_spec (named list data.frame). pred_spec named list, item list vector values used points partial dependence function. name item list indicate variable modified take corresponding values. pred_spec data.frame, columns indicate variable names, values indicate variable values, partial dependence computed using inputs row. pred_horizon (double) value vector indicating time(s) predictions calibrated . E.g., predicting risk incident heart failure within next 10 years, pred_horizon = 10. pred_horizon can NULL pred_type 'mort', since mortality predictions aggregated event times pred_type (character) type predictions compute. 
Valid options 'risk' : probability event pred_horizon. 'surv' : 1 - risk. 'chf': cumulative hazard function 'mort': mortality prediction expand_grid (logical) TRUE, partial dependence computed possible combinations inputs pred_spec. FALSE, partial dependence computed variable pred_spec, separately. prob_values (numeric) vector values 0 1, indicating quantiles used summarize partial dependence values set inputs. prob_values length prob_labels. quantiles calculated based predictions object set values indicated pred_spec. prob_labels (character) vector labels length prob_values, label indicating corresponding value prob_values labelled summarized outputs. prob_labels length prob_values. boundary_checks (logical) TRUE, pred_spec checked make sure requested values 10th 90th percentile object's training data. FALSE, checks skipped. n_thread (integer) number threads use computing predictions. Default one thread. use maximum number threads system provides concurrent execution, set n_thread = 0. ... arguments passed methods (currently used). new_data data.frame, tibble, data.table compute predictions . na_action (character) happen new_data contains missing values (.e., NA values). Valid options : 'fail' : error thrown new_data contains NA values 'omit' : rows new_data incomplete data dropped","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"ORSF partial dependence — orsf_pd_oob","text":"data.table containing partial dependence values specified variable(s) specified prediction horizon(s).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"ORSF partial dependence — orsf_pd_oob","text":"Partial dependence number known limitations assumptions users aware (see Hooker, 2021). 
particular, partial dependence less intuitive >2 predictors examined jointly, assumed feature(s) partial dependence computed correlated features (likely true many cases). Accumulated local effect plots can used (see ) case feature independence valid assumption.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":"examples","dir":"Reference","previous_headings":"","what":"Examples","title":"ORSF partial dependence — orsf_pd_oob","text":"Begin fitting ORSF ensemble:","code":"library(aorsf) set.seed(329730) index_train <- sample(nrow(pbc_orsf), 150) pbc_orsf_train <- pbc_orsf[index_train, ] pbc_orsf_test <- pbc_orsf[-index_train, ] fit <- orsf(data = pbc_orsf_train, formula = Surv(time, status) ~ . - id, oobag_pred_horizon = 365.25 * 5)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":"three-ways-to-compute-pd-and-ice","dir":"Reference","previous_headings":"","what":"Three ways to compute PD and ICE","title":"ORSF partial dependence — orsf_pd_oob","text":"can compute partial dependence ICE three ways aorsf: using -bag predictions training data using --bag predictions training data using predictions new set data -bag partial dependence indicates relationships model learned training. helpful goal interpret model. --bag partial dependence indicates relationships model learned training using --bag data simulates application model new data. want test model’s reliability fairness new data don’t access large testing set. new data partial dependence shows model predicts outcomes observations seen. 
helpful want test model’s reliability fairness.","code":"pd_train <- orsf_pd_inb(fit, pred_spec = list(bili = 1:5)) pd_train ## pred_horizon bili mean lwr medn upr ## 1: 1826.25 1 0.7932390 0.2177461 0.9060625 0.9816153 ## 2: 1826.25 2 0.7642403 0.1988035 0.8717127 0.9710504 ## 3: 1826.25 3 0.7240284 0.1770122 0.8303501 0.9480047 ## 4: 1826.25 4 0.6744615 0.1615326 0.7599508 0.9088882 ## 5: 1826.25 5 0.6313355 0.1553589 0.7152580 0.8658139 pd_oob <- orsf_pd_oob(fit, pred_spec = list(bili = 1:5)) pd_oob ## pred_horizon bili mean lwr medn upr ## 1: 1826.25 1 0.7840481 0.2727537 0.8694252 0.9809905 ## 2: 1826.25 2 0.7549406 0.2525478 0.8333524 0.9693362 ## 3: 1826.25 3 0.7158234 0.2364582 0.7890158 0.9461864 ## 4: 1826.25 4 0.6656823 0.2260407 0.7158336 0.9151153 ## 5: 1826.25 5 0.6225353 0.2071656 0.6734005 0.8681677 pd_test <- orsf_pd_new(fit, new_data = pbc_orsf_test, pred_spec = list(bili = 1:5)) pd_test ## pred_horizon bili mean lwr medn upr ## 1: 1826.25 1 0.7524101 0.1868769 0.8121185 0.9803382 ## 2: 1826.25 2 0.7234050 0.1759562 0.7754099 0.9653244 ## 3: 1826.25 3 0.6816975 0.1581292 0.7224945 0.9403449 ## 4: 1826.25 4 0.6339907 0.1467816 0.6598026 0.9000773 ## 5: 1826.25 5 0.5911775 0.1387876 0.6186801 0.8504577"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_pd_oob.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"ORSF partial dependence — orsf_pd_oob","text":"Giles Hooker, Lucas Mentch, Siyu Zhou. Unrestricted Permutation forces Extrapolation: Variable Importance Requires least One Model, Free Variable Importance. arXiv e-prints 2021 Oct; arXiv-1905. 
URL: https://doi.org/10.48550/arXiv.1905.03151","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_scale_cph.html","id":null,"dir":"Reference","previous_headings":"","what":"Scale input data — orsf_scale_cph","title":"Scale input data — orsf_scale_cph","text":"functions exported users may access internal routines used scale inputs orsf_control_cph used.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_scale_cph.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Scale input data — orsf_scale_cph","text":"","code":"orsf_scale_cph(x_mat, w_vec = NULL) orsf_unscale_cph(x_mat)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_scale_cph.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Scale input data — orsf_scale_cph","text":"x_mat (numeric matrix) matrix values scaled unscaled. Note orsf_unscale_cph accept x_mat inputs attribute containing transform values, added automatically orsf_scale_cph. w_vec (numeric vector) optional vector weights. weights supplied (default), observations equally weighted. supplied, w_vec must length equal nrow(x_mat).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_scale_cph.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Scale input data — orsf_scale_cph","text":"scaled unscaled x_mat.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_scale_cph.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Scale input data — orsf_scale_cph","text":"data transformed first subtracting mean multiplying scale. inverse transform can completed using orsf_unscale_cph dividing column corresponding scale adding mean. 
values means scales stored attribute output returned orsf_scale_cph (see examples)","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_scale_cph.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Scale input data — orsf_scale_cph","text":"","code":"x_mat <- as.matrix(pbc_orsf[, c('bili', 'age', 'protime')]) head(x_mat) #> bili age protime #> 1 14.5 58.76523 12.2 #> 2 1.1 56.44627 10.6 #> 3 1.4 70.07255 12.0 #> 4 1.8 54.74059 10.3 #> 5 3.4 38.10541 10.9 #> 7 1.0 55.53457 9.7 x_scaled <- orsf_scale_cph(x_mat) head(x_scaled) #> bili age protime #> [1,] 3.77308887 1.0412574 1.9694656 #> [2,] -0.75476469 0.7719344 -0.1822316 #> [3,] -0.65339483 2.3544852 1.7005035 #> [4,] -0.51823502 0.5738373 -0.5856748 #> [5,] 0.02240421 -1.3581657 0.2212116 #> [6,] -0.78855464 0.6660494 -1.3925613 attributes(x_scaled) # note the transforms attribute #> $dim #> [1] 276 3 #> #> $dimnames #> $dimnames[[1]] #> NULL #> #> $dimnames[[2]] #> [1] \"bili\" \"age\" \"protime\" #> #> #> $transforms #> mean scale #> [1,] 3.333696 0.3378995 #> [2,] 49.799661 0.1161396 #> [3,] 10.735507 1.3448108 #> x_unscaled <- orsf_unscale_cph(x_scaled) head(x_unscaled) #> bili age protime #> [1,] 14.5 58.76523 12.2 #> [2,] 1.1 56.44627 10.6 #> [3,] 1.4 70.07255 12.0 #> [4,] 1.8 54.74059 10.3 #> [5,] 3.4 38.10541 10.9 #> [6,] 1.0 55.53457 9.7 # numeric difference in x_mat and x_unscaled should be practically 0 max(abs(x_mat - x_unscaled)) #> [1] 3.552714e-15"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_summarize_uni.html","id":null,"dir":"Reference","previous_headings":"","what":"ORSF summary; univariate — orsf_summarize_uni","title":"ORSF summary; univariate — orsf_summarize_uni","text":"Summarize univariate information ORSF object","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_summarize_uni.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"ORSF summary; univariate — 
orsf_summarize_uni","text":"","code":"orsf_summarize_uni( object, n_variables = NULL, pred_horizon = NULL, pred_type = NULL, importance = NULL, ... )"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_summarize_uni.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"ORSF summary; univariate — orsf_summarize_uni","text":"object (orsf_fit) trained oblique random survival forest (see orsf). n_variables (integer) many variables summarized? Setting input lower number reduce computation time. pred_horizon (double) value vector indicating time(s) predictions calibrated . E.g., predicting risk incident heart failure within next 10 years, pred_horizon = 10. pred_horizon can NULL pred_type 'mort', since mortality predictions aggregated event times pred_type (character) type predictions compute. Valid options 'risk' : probability event pred_horizon. 'surv' : 1 - risk. 'chf': cumulative hazard function 'mort': mortality prediction importance (character) Indicate method variable importance: 'none': variable importance computed. 'anova': compute analysis variance (ANOVA) importance 'negate': compute negation importance 'permute': compute permutation importance details methods, see orsf_vi. ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_summarize_uni.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"ORSF summary; univariate — orsf_summarize_uni","text":"object class 'orsf_summary', includes data importance individual predictors. expected values predictions specific values predictors.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_summarize_uni.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"ORSF summary; univariate — orsf_summarize_uni","text":"pred_horizon left unspecified, median value time--event variable object's training data used. 
recommended always specify prediction horizon, median time may especially meaningful horizon compute predicted risk values . object already variable importance values, can safely bypass computation variable importance function setting importance = 'none'.","code":""},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_summarize_uni.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"ORSF summary; univariate — orsf_summarize_uni","text":"","code":"object <- orsf(pbc_orsf, Surv(time, status) ~ . - id, n_tree = 25) # since anova importance was used to make object, it is also # used for ranking variables in the summary, unless we specify # a different type of importance orsf_summarize_uni(object, n_variables = 3) #> #> -- bili (VI Rank: 1) -------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0.80 0.7777513 0.8532309 0.6835249 0.9682600 #> 1.40 0.7582240 0.8409578 0.6680678 0.9512311 #> 3.52 0.6225496 0.6849123 0.4373820 0.8388405 #> #> -- ascites (VI Rank: 2) ----------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0 0.6939223 0.7974914 0.4696991 0.9522116 #> 1 0.5588307 0.6131822 0.3830100 0.7809927 #> #> -- edema (VI Rank: 3) ------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0 0.6982167 0.8062842 0.4935759 0.9528082 #> 0.5 0.6074940 0.6611793 0.3983464 0.8624686 #> 1 0.6000213 0.6666069 0.3625468 0.8498958 #> #> Predicted survival at time t = 1788 for top 3 predictors # if we want to summarize object according to variables # ranked by negation importance, we can compute negation # importance within orsf_summarize_uni() as follows: orsf_summarize_uni(object, n_variables = 3, importance = 'negate') #> #> -- bili (VI Rank: 1) -------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0.80 
0.7777513 0.8532309 0.6835249 0.9682600 #> 1.40 0.7582240 0.8409578 0.6680678 0.9512311 #> 3.52 0.6225496 0.6849123 0.4373820 0.8388405 #> #> -- copper (VI Rank: 2) ------------------------ #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 42.8 0.7284832 0.8305973 0.5811309 0.9723858 #> 74.0 0.7079694 0.8118548 0.5308118 0.9430437 #> 129 0.6578487 0.7345022 0.4382455 0.9122239 #> #> -- age (VI Rank: 3) --------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 41.5 0.7385443 0.8644422 0.5752826 0.9697622 #> 49.7 0.7081352 0.8355585 0.4877352 0.9656811 #> 56.6 0.6674529 0.7596919 0.4288566 0.9388434 #> #> Predicted survival at time t = 1788 for top 3 predictors"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_time_to_train.html","id":null,"dir":"Reference","previous_headings":"","what":"Estimate training time — orsf_time_to_train","title":"Estimate training time — orsf_time_to_train","text":"Estimate training time","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_time_to_train.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Estimate training time — orsf_time_to_train","text":"","code":"orsf_time_to_train(object, n_tree_subset = 50)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_time_to_train.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Estimate training time — orsf_time_to_train","text":"object untrained aorsf object n_tree_subset (integer) many trees fit order estimate time needed train object. 
default value 50, usually gives good enough approximation.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_time_to_train.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Estimate training time — orsf_time_to_train","text":"difftime object.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_time_to_train.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Estimate training time — orsf_time_to_train","text":"","code":"# specify but do not train the model by setting no_fit = TRUE. object <- orsf(pbc_orsf, Surv(time, status) ~ . - id, n_tree = 500, no_fit = TRUE) # grow 50 trees to approximate the time it will take to grow 500 trees time_estimated <- orsf_time_to_train(object, n_tree_subset = 50) print(time_estimated) #> Time difference of 0.1957631 secs # let's see how close the approximation was time_true_start <- Sys.time() fit <- orsf_train(object) time_true_stop <- Sys.time() time_true <- time_true_stop - time_true_start print(time_true) #> Time difference of 0.2327383 secs # error abs(time_true - time_estimated) #> Time difference of 0.03697515 secs"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":null,"dir":"Reference","previous_headings":"","what":"ORSF variable importance — orsf_vi","title":"ORSF variable importance — orsf_vi","text":"Estimate importance individual variables using oblique random survival forests.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"ORSF variable importance — orsf_vi","text":"","code":"orsf_vi( object, group_factors = TRUE, importance = NULL, oobag_fun = NULL, n_thread = 1, verbose_progress = FALSE, ... ) orsf_vi_negate( object, group_factors = TRUE, oobag_fun = NULL, n_thread = 1, verbose_progress = FALSE, ... 
) orsf_vi_permute( object, group_factors = TRUE, oobag_fun = NULL, n_thread = 1, verbose_progress = FALSE, ... ) orsf_vi_anova(object, group_factors = TRUE, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"ORSF variable importance — orsf_vi","text":"object (orsf_fit) trained oblique random survival forest (see orsf). group_factors (logical) TRUE, importance factor variables reported overall aggregating importance individual levels factor. FALSE, importance individual factor levels returned. importance (character) Indicate method variable importance: 'anova': compute analysis variance (ANOVA) importance 'negate': compute negation importance 'permute': compute permutation importance oobag_fun (function) used evaluating --bag prediction accuracy negating coefficients (importance = 'negate') permuting values predictor (importance = 'permute') oobag_fun = NULL (default), evaluation statistic selected based tree type survival: Harrell's C-statistic (1982) classification: Area underneath ROC curve (AUC-ROC) regression: Traditional prediction R-squared use oobag_fun note following: oobag_fun three inputs: y_mat, w_vec, s_vec survival trees, y_mat two column matrix first column named 'time' second named 'status'. classification trees, y_mat matrix number columns = number distinct classes outcome. regression, y_mat matrix one column. s_vec numeric vector containing predictions oobag_fun return numeric output length 1 oobag_fun used created object initial value --bag prediction accuracy consistent values computed variable importance estimated. details, see --bag vignette. n_thread (integer) number threads use computing predictions. Default one thread. use maximum number threads system provides concurrent execution, set n_thread = 0. verbose_progress (logical) TRUE, progress messages printed console. FALSE (default), nothing printed. ... 
arguments passed methods (not currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"ORSF variable importance — orsf_vi","text":"orsf_vi functions return named numeric vector. Names vector predictor variables used object. Values vector estimated importance given predictor. returned vector sorted highest lowest value, higher values indicating higher importance.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"ORSF variable importance — orsf_vi","text":"orsf_fit object fitted importance = 'anova', 'negate', 'permute', output vector importance values based requested type importance. However, may still want call orsf_vi() output want group factor levels one overall importance value. orsf_vi() general purpose function extract compute variable importance estimates 'orsf_fit' object (see orsf). orsf_vi_negate(), orsf_vi_permute(), orsf_vi_anova() wrappers orsf_vi(). way functions work depends whether object given already variable importance estimates (see examples).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"variable-importance-methods","dir":"Reference","previous_headings":"","what":"Variable importance methods","title":"ORSF variable importance — orsf_vi","text":"negation importance: variable assessed separately multiplying variable's coefficients -1 determining much model's performance changes. worse model's performance negating coefficients given variable, important variable. technique promising because does not require permutation emphasizes variables larger coefficients linear combinations, also relatively new not studied much permutation importance. See Jaeger (2023) details technique. permutation importance: variable assessed separately randomly permuting variable's values determining much model's performance changes. 
worse model's performance permuting values given variable, important variable. technique flexible, intuitive, frequently used. also several known limitations. analysis variance (ANOVA) importance: p-value computed coefficient linear combination variables decision tree. Importance individual predictor variable proportion times p-value coefficient < 0.01. technique efficient computationally, may not effective permutation negation terms selecting signal noise variables. See Menze (2011) details technique.","code":""},{"path":[]},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"anova-importance","dir":"Reference","previous_headings":"","what":"ANOVA importance","title":"ORSF variable importance — orsf_vi","text":"default variable importance technique, ANOVA, calculated fit ORSF ensemble. ANOVA default fast, may not decisive permutation negation techniques variable selection.","code":"fit <- orsf(pbc_orsf, Surv(time, status) ~ . - id) fit ## ---------- Oblique random survival forest ## ## Linear combinations: Accelerated Cox regression ## N observations: 276 ## N events: 111 ## N trees: 500 ## N predictors total: 17 ## N predictors per node: 5 ## Average leaves per tree: 21.114 ## Min observations in leaf: 5 ## Min events in leaf: 1 ## OOB stat value: 0.84 ## OOB stat type: Harrell's C-index ## Variable importance: anova ## ## -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"raw-vi-values","dir":"Reference","previous_headings":"","what":"Raw VI values","title":"ORSF variable importance — orsf_vi","text":"‘raw’ variable importance values can accessed fit object. ‘raw’ values factors not aggregated single value. Currently one value k-1 levels k level factor. 
example, can see edema_1 edema_0.5 importance values edema factor variable levels 0, 0.5, 1.","code":"attr(fit, 'importance_values') ## NULL"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"collapse-vi-across-factor-levels","dir":"Reference","previous_headings":"","what":"Collapse VI across factor levels","title":"ORSF variable importance — orsf_vi","text":"get aggregated values across levels factor, access importance element orsf fit: use orsf_vi() group_factors set TRUE (default) Note can make default returned importance values ungrouped setting group_factors FALSE orsf_vi functions orsf function.","code":"fit$importance ## ascites bili edema copper age albumin protime ## 0.49306931 0.40601166 0.30339953 0.26839623 0.26685796 0.25804901 0.22825024 ## chol stage ast spiders hepato sex trig ## 0.19290780 0.18941048 0.17435648 0.16943522 0.15678670 0.13905325 0.12965799 ## alk.phos platelet trt ## 0.11338661 0.09012876 0.06778770 orsf_vi(fit) ## ascites bili edema copper age albumin protime ## 0.49306931 0.40601166 0.30339953 0.26839623 0.26685796 0.25804901 0.22825024 ## chol stage ast spiders hepato sex trig ## 0.19290780 0.18941048 0.17435648 0.16943522 0.15678670 0.13905325 0.12965799 ## alk.phos platelet trt ## 0.11338661 0.09012876 0.06778770"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"add-vi-to-an-orsf","dir":"Reference","previous_headings":"","what":"Add VI to an ORSF","title":"ORSF variable importance — orsf_vi","text":"can fit ORSF without VI, add VI later","code":"fit_no_vi <- orsf(pbc_orsf, Surv(time, status) ~ . - id, importance = 'none') # Note: you can't call orsf_vi_anova() on fit_no_vi because anova # VI can only be computed while the forest is being grown. 
orsf_vi_negate(fit_no_vi) ## bili copper sex protime albumin age ## 0.122344895 0.047850279 0.035986359 0.023711711 0.021831451 0.021503160 ## stage ascites chol ast spiders hepato ## 0.019718835 0.012550534 0.011115307 0.009845811 0.007601474 0.007055077 ## edema trt alk.phos trig platelet ## 0.006411580 0.003666224 0.002388178 0.001156845 -0.001214167 orsf_vi_permute(fit_no_vi) ## bili copper protime age ascites ## 5.513908e-02 2.181846e-02 1.246900e-02 1.192659e-02 1.176139e-02 ## albumin stage chol spiders edema ## 1.175554e-02 9.479348e-03 6.215674e-03 5.752179e-03 4.960035e-03 ## ast hepato sex trig alk.phos ## 4.647971e-03 3.594325e-03 2.477936e-03 1.162558e-03 6.778008e-05 ## platelet trt ## -1.132546e-03 -1.376816e-03"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"orsf-and-vi-all-at-once","dir":"Reference","previous_headings":"","what":"ORSF and VI all at once","title":"ORSF variable importance — orsf_vi","text":"fit ORSF compute vi time can still get negation VI fit, needs computed","code":"fit_permute_vi <- orsf(pbc_orsf, Surv(time, status) ~ . 
- id, importance = 'permute') # get the vi instantly (i.e., it doesn't need to be computed again) orsf_vi_permute(fit_permute_vi) ## bili copper protime albumin age ## 0.0582460338 0.0255992039 0.0130100780 0.0129532316 0.0121027391 ## ascites stage chol ast edema ## 0.0119289124 0.0084185175 0.0071302967 0.0053592731 0.0051471990 ## spiders hepato sex trig alk.phos ## 0.0046418826 0.0036776097 0.0026334550 0.0024978806 0.0013078222 ## platelet trt ## 0.0003504423 -0.0013892173 orsf_vi_negate(fit_permute_vi) ## bili copper sex protime age albumin ## 0.1259391254 0.0507141085 0.0363834330 0.0235136073 0.0233592840 0.0225371677 ## stage chol ascites ast spiders edema ## 0.0211978251 0.0141956334 0.0141890702 0.0108977272 0.0073762768 0.0070333453 ## hepato alk.phos trig trt platelet ## 0.0050661672 0.0048879157 0.0044980321 0.0039418881 0.0007189274"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vi.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"ORSF variable importance — orsf_vi","text":"Harrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating Yield Medical Tests. JAMA 1982; 247(18):2543-2546. DOI: 10.1001/jama.1982.03320430047030 Breiman L. Random forests. Machine learning 2001 Oct; 45(1):5-32. DOI: 10.1023/A:1010933404324 Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA. On oblique random forests. Joint European Conference Machine Learning Knowledge Discovery Databases 2011 Sep 4; pp. 453-469. DOI: 10.1007/978-3-642-23783-6_29 Jaeger BC, Welden S, Lenoir K, Speiser JL, Segar MW, Pandey A, Pajewski NM. Accelerated interpretable oblique random survival forests. Journal Computational Graphical Statistics Published online 08 Aug 2023. 
DOI: 10.1080/10618600.2023.2231048","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vs.html","id":null,"dir":"Reference","previous_headings":"","what":"Variable selection — orsf_vs","title":"Variable selection — orsf_vs","text":"Variable selection","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Variable selection — orsf_vs","text":"","code":"orsf_vs(object, n_predictor_min = 3, verbose_progress = FALSE)"},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vs.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Variable selection — orsf_vs","text":"object (orsf_fit) trained oblique random survival forest (see orsf). n_predictor_min (integer) minimum number predictors allowed verbose_progress (logical) implemented yet. progress printed console?","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vs.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Variable selection — orsf_vs","text":"data.table four columns: n_predictors: number predictors used stat_value: --bag statistic predictors_included: names predictors included predictor_dropped: predictor selected dropped","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vs.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Variable selection — orsf_vs","text":"tree_seeds specified object successive run orsf evaluated --bag samples initial run.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/orsf_vs.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Variable selection — orsf_vs","text":"","code":"object <- orsf(formula = time + status ~ ., data = pbc_orsf, n_tree = 25, importance = 'anova') orsf_vs(object, n_predictor_min = 15) #> n_predictors stat_value predictors_included #> 1: 15 
0.8355123 age,sex_f,ascites_1,hepato_1,spiders_1,edema_0.5,... #> 2: 16 0.8367102 id,age,sex_f,ascites_1,hepato_1,spiders_1,... #> 3: 17 0.8344185 id,age,sex_f,ascites_1,hepato_1,spiders_1,... #> 4: 18 0.8323350 id,trt_placebo,age,sex_f,ascites_1,hepato_1,... #> predictor_dropped #> 1: stage #> 2: id #> 3: platelet #> 4: trt_placebo"},{"path":"https://bcjaeger.github.io/aorsf/reference/pbc_orsf.html","id":null,"dir":"Reference","previous_headings":"","what":"Mayo Clinic Primary Biliary Cholangitis Data — pbc_orsf","title":"Mayo Clinic Primary Biliary Cholangitis Data — pbc_orsf","text":"data light modification survival::pbc data. modifications:","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/pbc_orsf.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Mayo Clinic Primary Biliary Cholangitis Data — pbc_orsf","text":"","code":"pbc_orsf"},{"path":"https://bcjaeger.github.io/aorsf/reference/pbc_orsf.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Mayo Clinic Primary Biliary Cholangitis Data — pbc_orsf","text":"data frame 276 rows 20 variables: id case number time number days registration earlier death, transplantation, study analysis July, 1986 status status endpoint, 0 censored transplant, 1 dead trt randomized treatment group: D-penicillamine placebo age years sex m/f ascites presence ascites hepato presence hepatomegaly enlarged liver spiders blood vessel malformations skin edema 0 no edema, 0.5 untreated successfully treated, 1 edema despite diuretic therapy bili serum bilirubin (mg/dl) chol serum cholesterol (mg/dl) albumin serum albumin (g/dl) copper urine copper (ug/day) alk.phos alkaline phosphatase (U/liter) ast aspartate aminotransferase, called SGOT (U/ml) trig triglycerides (mg/dl) platelet platelet count protime standardized blood clotting time stage histologic stage disease (needs 
biopsy)","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/pbc_orsf.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Mayo Clinic Primary Biliary Cholangitis Data — pbc_orsf","text":"T Therneau P Grambsch (2000), Modeling Survival Data: Extending Cox Model, Springer-Verlag, New York. ISBN: 0-387-98784-3.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/pbc_orsf.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Mayo Clinic Primary Biliary Cholangitis Data — pbc_orsf","text":"removed rows missing data converted status 0 censor transplant, 1 dead converted stage ordered factor. converted trt, ascites, hepato, spiders, edema factors.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/penguins_orsf.html","id":null,"dir":"Reference","previous_headings":"","what":"Size measurements for adult foraging penguins near Palmer Station, Antarctica — penguins_orsf","title":"Size measurements for adult foraging penguins near Palmer Station, Antarctica — penguins_orsf","text":"data copied lightly modified penguins data palmerpenguins R package. modification removal rows missing data. 
data include measurements penguin species, island Palmer Archipelago, size (flipper length, body mass, bill dimensions), sex.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/penguins_orsf.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Size measurements for adult foraging penguins near Palmer Station, Antarctica — penguins_orsf","text":"","code":"penguins_orsf"},{"path":"https://bcjaeger.github.io/aorsf/reference/penguins_orsf.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Size measurements for adult foraging penguins near Palmer Station, Antarctica — penguins_orsf","text":"tibble 333 rows 8 variables: species factor denoting penguin species (Adélie, Chinstrap Gentoo) island factor denoting island Palmer Archipelago, Antarctica (Biscoe, Dream Torgersen) bill_length_mm number denoting bill length (millimeters) bill_depth_mm number denoting bill depth (millimeters) flipper_length_mm integer denoting flipper length (millimeters) body_mass_g integer denoting body mass (grams) sex factor denoting penguin sex (female, male) year integer denoting study year (2007, 2008, 2009)","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/penguins_orsf.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Size measurements for adult foraging penguins near Palmer Station, Antarctica — penguins_orsf","text":"Adélie penguins: Palmer Station Antarctica LTER K. Gorman. 2020. Structural size measurements isotopic signatures foraging among adult male female Adélie penguins (Pygoscelis adeliae) nesting along Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative. doi:10.6073/pasta/98b16d7d563f265cb52372c8ca99e60f Gentoo penguins: Palmer Station Antarctica LTER K. Gorman. 2020. 
Structural size measurements isotopic signatures foraging among adult male female Gentoo penguin (Pygoscelis papua) nesting along Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative. doi:10.6073/pasta/7fca67fb28d56ee2ffa3d9370ebda689 Chinstrap penguins: Palmer Station Antarctica LTER K. Gorman. 2020. Structural size measurements isotopic signatures foraging among adult male female Chinstrap penguin (Pygoscelis antarcticus) nesting along Palmer Archipelago near Palmer Station, 2007-2009 ver 6. Environmental Data Initiative. doi:10.6073/pasta/c14dfcfada8ea13a17536e73eb6fbe9e Originally published : Gorman KB, Williams TD, Fraser WR (2014) Ecological Sexual Dimorphism Environmental Variability within Community Antarctic Penguins (Genus Pygoscelis). PLoS ONE 9(3): e90081. doi:10.1371/journal.pone.0090081","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/predict.ObliqueForest.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute predictions using ORSF — predict.ObliqueForest","title":"Compute predictions using ORSF — predict.ObliqueForest","text":"Predicted risk, survival, hazard, mortality ORSF model.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/predict.ObliqueForest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute predictions using ORSF — predict.ObliqueForest","text":"","code":"# S3 method for ObliqueForest predict( object, new_data, pred_horizon = NULL, pred_type = NULL, na_action = \"fail\", boundary_checks = TRUE, n_thread = 1, verbose_progress = FALSE, pred_aggregate = TRUE, ... )"},{"path":"https://bcjaeger.github.io/aorsf/reference/predict.ObliqueForest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute predictions using ORSF — predict.ObliqueForest","text":"object (orsf_fit) trained oblique random survival forest (see orsf). 
new_data data.frame, tibble, data.table compute predictions . pred_horizon (double) value vector indicating time(s) predictions calibrated . E.g., predicting risk incident heart failure within next 10 years, pred_horizon = 10. pred_horizon can NULL pred_type 'mort', since mortality predictions aggregated event times pred_type (character) type predictions compute. Valid options 'risk' : probability event pred_horizon. 'surv' : 1 - risk. 'chf': cumulative hazard function 'mort': mortality prediction na_action (character) happen new_data contains missing values (.e., NA values). Valid options : 'fail' : error thrown new_data contains NA values 'pass' : output NA rows new_data 1 NA value predictors used object 'omit' : rows new_data incomplete data dropped 'impute_meanmode' : missing values continuous categorical variables new_data imputed using mean mode, respectively. clarify, mean mode used impute missing values training data object, new_data. boundary_checks (logical) TRUE, pred_horizon checked make sure requested values less maximum observed time object's training data. FALSE, checks skipped. n_thread (integer) number threads use computing predictions. Default one thread. use maximum number threads system provides concurrent execution, set n_thread = 0. verbose_progress (logical) TRUE, progress messages printed console. FALSE (default), nothing printed. pred_aggregate (logical) TRUE (default), predictions aggregated trees taking mean. FALSE, returned output contain one row per observation one column tree. length pred_horizon two pred_aggregate FALSE, result list matrices, 'th item list corresponding 'th value pred_horizon. ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/predict.ObliqueForest.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute predictions using ORSF — predict.ObliqueForest","text":"matrix predictions. 
Column j matrix corresponds value j pred_horizon. Row i matrix corresponds row i new_data.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/predict.ObliqueForest.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Compute predictions using ORSF — predict.ObliqueForest","text":"new_data must columns equivalent types data used train object. Also, factors new_data must levels data used train object. pred_horizon values exceed maximum follow-time object's training data, truly want this, set boundary_checks = FALSE can use pred_horizon large want. Note predictions beyond maximum follow-time object's training data equal predictions maximum follow-time, aorsf does not estimate survival beyond maximum observed time. unspecified, pred_horizon may automatically specified value used oobag_pred_horizon object created (see orsf).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/predict.ObliqueForest.html","id":"examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute predictions using ORSF — predict.ObliqueForest","text":"Begin fitting ORSF ensemble: Predict risk, survival, cumulative hazard one several times: Predict mortality, defined number events forest’s population observations characteristics like current observation. type prediction does not require specify prediction horizon","code":"library(aorsf) set.seed(329730) index_train <- sample(nrow(pbc_orsf), 150) pbc_orsf_train <- pbc_orsf[index_train, ] pbc_orsf_test <- pbc_orsf[-index_train, ] fit <- orsf(data = pbc_orsf_train, formula = Surv(time, status) ~ . 
- id, oobag_pred_horizon = 365.25 * 5) # predicted risk, the default predict(fit, new_data = pbc_orsf_test[1:5, ], pred_type = 'risk', pred_horizon = c(500, 1000, 1500)) ## [,1] [,2] [,3] ## [1,] 0.45965512 0.73309199 0.89715078 ## [2,] 0.03235764 0.09091330 0.18045864 ## [3,] 0.12091603 0.25919883 0.39403239 ## [4,] 0.01488893 0.03745896 0.07571412 ## [5,] 0.01279842 0.02623832 0.06015808 # predicted survival, i.e., 1 - risk predict(fit, new_data = pbc_orsf_test[1:5, ], pred_type = 'surv', pred_horizon = c(500, 1000, 1500)) ## [,1] [,2] [,3] ## [1,] 0.5403449 0.2669080 0.1028492 ## [2,] 0.9676424 0.9090867 0.8195414 ## [3,] 0.8790840 0.7408012 0.6059676 ## [4,] 0.9851111 0.9625410 0.9242859 ## [5,] 0.9872016 0.9737617 0.9398419 # predicted cumulative hazard function # (expected number of events for person i at time j) predict(fit, new_data = pbc_orsf_test[1:5, ], pred_type = 'chf', pred_horizon = c(500, 1000, 1500)) ## [,1] [,2] [,3] ## [1,] 0.65381651 1.28606246 1.75476570 ## [2,] 0.03531788 0.10967272 0.24697387 ## [3,] 0.15371784 0.36989220 0.65462524 ## [4,] 0.01549537 0.04229610 0.09352493 ## [5,] 0.01290261 0.02687956 0.06916273 predict(fit, new_data = pbc_orsf_test[1:5, ], pred_type = 'mort') ## [,1] ## [1,] 79.795533 ## [2,] 22.393743 ## [3,] 38.749709 ## [4,] 13.552788 ## [5,] 9.984989"},{"path":"https://bcjaeger.github.io/aorsf/reference/print.ObliqueForest.html","id":null,"dir":"Reference","previous_headings":"","what":"Inspect your ORSF model — print.ObliqueForest","title":"Inspect your ORSF model — print.ObliqueForest","text":"Printing ORSF model tells : Linear combinations: identified? 
N observations: Number rows training data N events: Number events training data N trees: Number trees forest N predictors total: Total number columns predictor matrix N predictors per node: Number variables used linear combinations Average leaves per tree: proxy depth trees Min observations leaf: See leaf_min_obs orsf Min events leaf: See leaf_min_events orsf OOB stat value: --bag error fitting trees OOB stat type: --bag error computed? Variable importance: variable importance computed?","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/print.ObliqueForest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Inspect your ORSF model — print.ObliqueForest","text":"","code":"# S3 method for ObliqueForest print(x, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/print.ObliqueForest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Inspect your ORSF model — print.ObliqueForest","text":"x (orsf_fit) oblique random survival forest (ORSF; see orsf). ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/print.ObliqueForest.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Inspect your ORSF model — print.ObliqueForest","text":"x, invisibly.","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/print.ObliqueForest.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Inspect your ORSF model — print.ObliqueForest","text":"","code":"object <- orsf(pbc_orsf, Surv(time, status) ~ . 
- id, n_tree = 5) print(object) #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated Cox regression #> N observations: 276 #> N events: 111 #> N trees: 5 #> N predictors total: 17 #> N predictors per node: 5 #> Average leaves per tree: 22 #> Min observations in leaf: 5 #> Min events in leaf: 1 #> OOB stat value: 0.79 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> #> -----------------------------------------"},{"path":"https://bcjaeger.github.io/aorsf/reference/print.orsf_summary_uni.html","id":null,"dir":"Reference","previous_headings":"","what":"Print ORSF summary — print.orsf_summary_uni","title":"Print ORSF summary — print.orsf_summary_uni","text":"Print ORSF summary","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/print.orsf_summary_uni.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Print ORSF summary — print.orsf_summary_uni","text":"","code":"# S3 method for orsf_summary_uni print(x, n_variables = NULL, ...)"},{"path":"https://bcjaeger.github.io/aorsf/reference/print.orsf_summary_uni.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Print ORSF summary — print.orsf_summary_uni","text":"x object class 'orsf_summary' n_variables number variables print ... arguments passed methods (currently used).","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/print.orsf_summary_uni.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Print ORSF summary — print.orsf_summary_uni","text":"invisibly, x","code":""},{"path":"https://bcjaeger.github.io/aorsf/reference/print.orsf_summary_uni.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Print ORSF summary — print.orsf_summary_uni","text":"","code":"object <- orsf(pbc_orsf, Surv(time, status) ~ . 
- id) smry <- orsf_summarize_uni(object, n_variables = 3) print(smry) #> #> -- ascites (VI Rank: 1) ----------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0 0.6938291 0.7966843 0.4904799 0.9324773 #> 1 0.5405888 0.6094190 0.3515579 0.7427231 #> #> -- bili (VI Rank: 2) -------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0.80 0.7608539 0.8443806 0.6311260 0.9380968 #> 1.40 0.7400537 0.8204494 0.5978744 0.9198812 #> 3.52 0.6338126 0.7007166 0.4658716 0.8327246 #> #> -- edema (VI Rank: 3) ------------------------- #> #> |-------------- Survival --------------| #> Value Mean Median 25th % 75th % #> 0 0.6998313 0.8067008 0.5026885 0.9348496 #> 0.5 0.5901635 0.6295332 0.3767099 0.7948707 #> 1 0.5891139 0.6247675 0.3836066 0.8193277 #> #> Predicted survival at time t = 1788 for top 3 predictors"},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-012-unreleased","dir":"Changelog","previous_headings":"","what":"aorsf 0.1.2 (unreleased)","title":"aorsf 0.1.2 (unreleased)","text":"Added orsf_control functions classification, regression, survival (https://github.com/ropensci/aorsf/pull/25). optimization implemented matrix multiplication prediction (https://github.com/ropensci/aorsf/pull/20)","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-011","dir":"Changelog","previous_headings":"","what":"aorsf 0.1.1","title":"aorsf 0.1.1","text":"CRAN release: 2023-10-26 fixed uninitialized value pd_type","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-010","dir":"Changelog","previous_headings":"","what":"aorsf 0.1.0","title":"aorsf 0.1.0","text":"CRAN release: 2023-10-13 Re-worked internal C++ routines following design ranger. Re-worked progress printed console verbose_progress TRUE, following design ranger. Messages now indicate action taken, % complete, approximate time finishing action. 
Improved variable importance, following design ranger. Importance now computed tree--tree instead aggregate. Additionally, mortality type prediction used importance survival trees, since mortality depend pred_horizon. Allowed multi-threading performed orsf(), predict.orsf_fit(), functions orsf_vi() orsf_pd() family. Allowed sampling without replacement sampling specific fraction observations orsf() Included Harrell’s C-statistic option assessing goodness splits growing trees. Fixed issue uninformative error message occur pred_horizon > max(time) orsf_summarize_uni. Thanks @JyHao1 @DustinMLong finding !","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-007","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.7","title":"aorsf 0.0.7","text":"CRAN release: 2023-01-12 Additional changes internal testing avoid problems ATLAS","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-006","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.6","title":"aorsf 0.0.6","text":"CRAN release: 2023-01-06 Minor fix internal tests failing run ATLAS","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-005","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.5","title":"aorsf 0.0.5","text":"CRAN release: 2022-12-14 orsf() longer throws errors warnings try give single predictor. note added documentation details ?orsf explains using single predictor orsf() somewhat useless. done resolve https://github.com/mlr-org/mlr3extralearners/issues/259. predict.orsf_fit now accepts pred_horizon = 0 returns sensible values. Thanks @mattwarkentin feature request. added function perform variable selection, orsf_vs(). Made variable importance consistent respect group_factors. Originally, output orsf ungrouped VI values orsf_vi grouped values. update, orsf defaults grouped values. ungrouped values can still recovered. 
Fixed issue orsf_pd functions output data returned original scale.","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-004","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.4","title":"aorsf 0.0.4","text":"CRAN release: 2022-11-07 orsf formulas now accepts Surv objects (see https://github.com/ropensci/aorsf/issues/11) Added verbose_progress input orsf, prints messages console indicating progress. Allowance missing values orsf. Mean mode imputation performed observations missing data. values can also used impute new data missing values. Centering scaling predictors now done prior growing forest.","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-003","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.3","title":"aorsf 0.0.3","text":"CRAN release: 2022-10-09 Included rOpenSci reviewers Christopher Jackson, Marvin N Wright, Lukas Burk DESCRIPTION reviewers. Thank ! Added clarification docs pros/cons different variable importance techniques Added regression tests aorsf versus obliqueRSF (similar) Additional support tests functions long right hand sides Updated --bag vignette appropriate custom functions. Allow status values input data general, .e., just 0 1. Allow missing values predict functions, including partial dependence.","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-002","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.2","title":"aorsf 0.0.2","text":"CRAN release: 2022-09-05 Modified unit tests compatibility extra checks run CRAN.","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-001","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.1","title":"aorsf 0.0.1","text":"CRAN release: 2022-08-23 Added orsf_control_custom(), allows users submit custom functions identifying linear combinations inputs growing oblique decision trees. 
Added weights input orsf, allowing users fit orsf specific data training set. Added chf mort options predict.orsf_fit(). Mortality predictions fully implemented yet - supported partial dependence --bag error estimates. features added future update.","code":""},{"path":"https://bcjaeger.github.io/aorsf/news/index.html","id":"aorsf-0009000","dir":"Changelog","previous_headings":"","what":"aorsf 0.0.0.9000","title":"aorsf 0.0.0.9000","text":"Core features implemented: fit, interpret, predict using oblique random survival forests. Vignettes + Readme covering usage core features. Website hosted GitHub pages, managed pkgdown.","code":""}]