diff --git a/articles/aorsf.html b/articles/aorsf.html index 50f00f1b..e5c2edee 100644 --- a/articles/aorsf.html +++ b/articles/aorsf.html @@ -173,13 +173,13 @@

Variable importance orsf_vi_negate(orsf_fit) #> bili copper protime stage sex -#> 1.129201e-01 5.143202e-02 2.985467e-02 2.913153e-02 2.648666e-02 +#> 1.129815e-01 5.195790e-02 2.979064e-02 2.948335e-02 2.646991e-02 #> age albumin ascites ast chol -#> 2.257257e-02 2.222867e-02 1.560638e-02 1.231634e-02 1.203531e-02 +#> 2.274087e-02 2.194915e-02 1.566330e-02 1.225677e-02 1.211119e-02 #> edema trt hepato spiders trig -#> 9.463853e-03 7.772744e-03 6.663188e-03 6.162035e-03 5.138559e-03 +#> 9.582936e-03 7.903782e-03 6.753772e-03 6.166109e-03 5.264650e-03 #> alk.phos platelet -#> 3.549245e-03 -6.782850e-06 +#> 3.522158e-03 -1.435386e-05
  • You can also compute variable importance using permutation, a @@ -188,13 +188,13 @@

    Variable importance orsf_vi_permute(orsf_fit) #> bili copper protime albumin age -#> 0.0507342790 0.0239177369 0.0163535749 0.0125096251 0.0119269625 +#> 0.0511008531 0.0240482013 0.0163342275 0.0124345455 0.0119496253 #> stage ascites ast edema chol -#> 0.0115229977 0.0104655581 0.0077846459 0.0058569776 0.0048923838 -#> spiders hepato sex trig alk.phos -#> 0.0035124258 0.0034740982 0.0021089828 0.0019406658 0.0011960153 +#> 0.0118205689 0.0104882375 0.0076936637 0.0060404151 0.0049699614 +#> hepato spiders sex trig alk.phos +#> 0.0035818563 0.0034644598 0.0021725738 0.0018539837 0.0012967692 #> platelet trt -#> -0.0004113343 -0.0008678193 +#> -0.0003284245 -0.0009058986
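    The idea behind permutation importance can be sketched in base R. This is a minimal illustration, not the aorsf implementation: it assumes a linear model on the built-in `mtcars` data and measures the increase in in-sample mean squared error, whereas `orsf_vi_permute()` works with out-of-bag prediction accuracy of the forest. The helper names `mse` and `permute_vi` are hypothetical.

```r
# Permutation importance, sketched with a linear model (illustration only).
set.seed(329)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# mean squared error of a model on a data set
mse <- function(model, data) {
  mean((data$mpg - predict(model, newdata = data))^2)
}

baseline <- mse(fit, mtcars)

# shuffle one predictor, re-predict, and record how much the error grows;
# averaging over n_rep shuffles reduces the noise of a single permutation
permute_vi <- function(variable, n_rep = 20) {
  increases <- replicate(n_rep, {
    shuffled <- mtcars
    shuffled[[variable]] <- sample(shuffled[[variable]])
    mse(fit, shuffled) - baseline
  })
  mean(increases)
}

vi <- sapply(c("wt", "hp"), permute_vi)
print(sort(vi, decreasing = TRUE))
```

    Because each shuffle is random, repeated runs with different seeds give slightly different values, which is the same kind of run-to-run variation visible in the small numeric changes in this diff.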

  • A faster alternative to permutation and negation importance is @@ -204,11 +204,11 @@

    Variable importance orsf_vi_anova(orsf_fit) #> ascites edema bili copper albumin age protime spiders -#> 0.4248497 0.2966406 0.2962241 0.2200782 0.2056028 0.2044682 0.1952912 0.1701295 -#> chol ast stage hepato sex trig alk.phos platelet -#> 0.1698671 0.1575173 0.1568678 0.1479081 0.1460362 0.1277078 0.1160302 0.1098039 +#> 0.4244652 0.2965880 0.2955043 0.2192982 0.2059560 0.2034433 0.1946442 0.1717902 +#> chol stage ast hepato sex trig alk.phos platelet +#> 0.1686003 0.1575789 0.1573754 0.1476569 0.1462905 0.1272040 0.1161886 0.1089885 #> trt -#> 0.1094891 +#> 0.1086503

  • @@ -260,7 +260,7 @@

    What about the original ORSF?) ) #> user system elapsed -#> 5.129 0.000 5.130 +#> 6.498 0.020 6.523 # and how long it takes to fit 50 cph trees print( @@ -272,11 +272,11 @@

    What about the original ORSF?) ) #> user system elapsed -#> 0.053 0.000 0.053 +#> 0.068 0.000 0.067 t1['elapsed'] / t2['elapsed'] #> elapsed -#> 96.79245 +#> 97.35821

    aorsf and other machine learning software diff --git a/articles/oobag.html b/articles/oobag.html index e8fca287..f36a8122 100644 --- a/articles/oobag.html +++ b/articles/oobag.html @@ -138,9 +138,9 @@

    Out-of-bag predictions and error
     # what is the output from this function?
     fit$eval_oobag$stat_values
     #>           [,1]
    -#> [1,] 0.8404084

    +#> [1,] 0.8405646

    The out-of-bag estimate of Harrell’s C-statistic (the default method -to evaluate out-of-bag predictions) is 0.8404084.

    +to evaluate out-of-bag predictions) is 0.8405646.
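    For readers unfamiliar with Harrell’s C-statistic, here is a minimal base R sketch of the usual definition for right-censored data. It is an assumption-laden illustration, not the code aorsf runs: the function name `harrell_c` is hypothetical, tied event times are ignored, and the input is a predicted risk (for a fitted forest that would be one minus the out-of-bag survival probability).

```r
# Harrell's C-statistic for right-censored data (illustrative sketch).
# time:   follow-up time
# status: 1 = event observed, 0 = censored
# risk:   predicted risk, where higher means the event is expected sooner
harrell_c <- function(time, status, risk) {
  concordant <- 0
  usable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    # a pair is usable only when the shorter follow-up time ended in an event
    if (status[i] != 1) next
    for (j in seq_len(n)) {
      if (time[i] < time[j]) {
        usable <- usable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5   # tied predictions count as half
        }
      }
    }
  }
  concordant / usable
}

time   <- c(2, 4, 6, 8)
status <- c(1, 1, 0, 1)
risk   <- c(0.9, 0.7, 0.8, 0.1)

c_stat <- harrell_c(time, status, risk)
print(c_stat)  # 4 of the 5 usable pairs are concordant, so this prints 0.8
```

    A value of 0.5 corresponds to random ordering and 1 to perfect ordering, so an out-of-bag value near 0.84 indicates that the forest ranks most usable pairs correctly.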

    Monitoring out-of-bag error @@ -203,7 +203,7 @@

    User-supplied out-of-bag oobag_fun_brier(y_mat = pbc_orsf[,c('time', 'status')], s_vec = fit$pred_oobag) -#> [1] 0.11724

    +#> [1] 0.113018

    Second, you can pass your function into orsf(), and it will be used in place of Harrell’s C-statistic:

    @@ -326,12 +326,12 @@ 

    User-supplied function importance = 'negate') fit_tdep_cstat$importance -#> bili copper sex protime age ascites -#> 0.130946460 0.044500890 0.033850120 0.022515610 0.019551930 0.017677020 -#> stage albumin chol spiders edema ast -#> 0.017561950 0.016692050 0.011163150 0.007158130 0.007008088 0.006360200 -#> trig hepato trt alk.phos platelet -#> 0.005541530 0.004885160 0.002620090 0.001023750 -0.002403190

    +#> bili copper stage sex albumin protime age +#> 0.12277976 0.05474438 0.03624949 0.03600352 0.02799870 0.02613815 0.02258938 +#> ascites ast chol edema spiders hepato trig +#> 0.01396824 0.01370726 0.01291091 0.01011906 0.00679223 0.00659164 0.00615851 +#> platelet trt alk.phos +#> 0.00490489 0.00373202 0.00066513

    Notes diff --git a/articles/oobag_files/figure-html/unnamed-chunk-2-1.png b/articles/oobag_files/figure-html/unnamed-chunk-2-1.png index 4c80a487..07170287 100644 Binary files a/articles/oobag_files/figure-html/unnamed-chunk-2-1.png and b/articles/oobag_files/figure-html/unnamed-chunk-2-1.png differ diff --git a/articles/oobag_files/figure-html/unnamed-chunk-4-1.png b/articles/oobag_files/figure-html/unnamed-chunk-4-1.png index 166abf87..f30c55ba 100644 Binary files a/articles/oobag_files/figure-html/unnamed-chunk-4-1.png and b/articles/oobag_files/figure-html/unnamed-chunk-4-1.png differ diff --git a/articles/oobag_files/figure-html/unnamed-chunk-7-1.png b/articles/oobag_files/figure-html/unnamed-chunk-7-1.png index 3591d935..f5014cb7 100644 Binary files a/articles/oobag_files/figure-html/unnamed-chunk-7-1.png and b/articles/oobag_files/figure-html/unnamed-chunk-7-1.png differ diff --git a/articles/oobag_files/figure-html/unnamed-chunk-8-1.png b/articles/oobag_files/figure-html/unnamed-chunk-8-1.png index 776c15e5..c9d387de 100644 Binary files a/articles/oobag_files/figure-html/unnamed-chunk-8-1.png and b/articles/oobag_files/figure-html/unnamed-chunk-8-1.png differ diff --git a/articles/pd.html b/articles/pd.html index c573bf73..43603ef2 100644 --- a/articles/pd.html +++ b/articles/pd.html @@ -153,11 +153,11 @@

    Three ways to compute PD pd_inb #> pred_horizon bili mean lwr medn upr -#> 1: 1826.25 1 0.2156600 0.02011013 0.09831473 0.8008680 -#> 2: 1826.25 2 0.2577688 0.03774419 0.15474649 0.8221929 -#> 3: 1826.25 3 0.2999502 0.06374133 0.20647008 0.8435691 -#> 4: 1826.25 4 0.3392310 0.08411776 0.25351577 0.8591273 -#> 5: 1826.25 5 0.3697834 0.10610430 0.28239158 0.8696440

    +#> 1: 1826.25 1 0.2154847 0.02028479 0.09620362 0.7999464 +#> 2: 1826.25 2 0.2580146 0.03766695 0.15454947 0.8215570 +#> 3: 1826.25 3 0.3001896 0.06432488 0.20728050 0.8429332 +#> 4: 1826.25 4 0.3394211 0.08427149 0.25388024 0.8601380 +#> 5: 1826.25 5 0.3703022 0.10680098 0.28301801 0.8696998
  • using out-of-bag predictions for the training data

    @@ -166,12 +166,12 @@

    Three ways to compute PD
     pd_oob <- orsf_pd_oob(fit, pred_spec = list(bili = 1:5))
     pd_oob
    -#>    pred_horizon bili      mean        lwr       medn       upr
    -#> 1:      1826.25    1 0.2151532 0.01827735 0.09723974 0.7980629
    -#> 2:      1826.25    2 0.2568631 0.03685066 0.14354244 0.8181867
    -#> 3:      1826.25    3 0.2986274 0.05900059 0.20896839 0.8335823
    -#> 4:      1826.25    4 0.3386074 0.07902099 0.24601781 0.8482977
    -#> 5:      1826.25    5 0.3696689 0.10423692 0.27977338 0.8523756
    +#>    pred_horizon bili      mean        lwr      medn       upr
    +#> 1:      1826.25    1 0.2151526 0.01835000 0.0961149 0.7980629
    +#> 2:      1826.25    2 0.2572420 0.03685020 0.1444598 0.8181867
    +#> 3:      1826.25    3 0.2990080 0.05900059 0.2069944 0.8335823
    +#> 4:      1826.25    4 0.3388657 0.07887323 0.2434497 0.8486574
    +#> 5:      1826.25    5 0.3701697 0.10614495 0.2805791 0.8523756

  • using predictions for a new set of data

    @@ -183,11 +183,11 @@

    Three ways to compute PD pd_test #> pred_horizon bili mean lwr medn upr -#> 1: 1826.25 1 0.2543878 0.02896385 0.1940600 0.8149734 -#> 2: 1826.25 2 0.2954799 0.05000806 0.2471401 0.8323505 -#> 3: 1826.25 3 0.3389602 0.07334663 0.3010648 0.8494444 -#> 4: 1826.25 4 0.3800023 0.10459104 0.3519063 0.8597879 -#> 5: 1826.25 5 0.4119512 0.12113773 0.3895548 0.8693355 +#> 1: 1826.25 1 0.2543006 0.02901386 0.1943949 0.8140307 +#> 2: 1826.25 2 0.2956375 0.05072616 0.2473845 0.8314078 +#> 3: 1826.25 3 0.3389084 0.07453896 0.3032327 0.8485016 +#> 4: 1826.25 4 0.3800621 0.10565022 0.3516712 0.8588451 +#> 5: 1826.25 5 0.4125041 0.12292465 0.3918400 0.8694518

  • in-bag PD indicates relationships that the model has learned during training. This is helpful if your goal is to interpret the @@ -221,8 +221,8 @@

    One variable, one horizon
     pd_sex
     #>    pred_horizon sex      mean        lwr      medn       upr
    -#> 1:      1826.25   m 0.3556338 0.03685843 0.2388210 0.9403551
    -#> 2:      1826.25   f 0.3027138 0.01063007 0.1525368 0.9548015
    +#> 1:      1826.25   m 0.3564912 0.03712878 0.2369997 0.9398747
    +#> 2:      1826.25   f 0.3035109 0.01053790 0.1568024 0.9545274

    The output shows that the expected predicted mortality risk at 5 years after baseline is higher for men than for women (a mean of roughly 0.36 versus 0.30).
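    How a partial dependence table like this is built can be sketched in base R. The sketch below is an illustration under stated assumptions, not the aorsf code: it uses a linear model on `mtcars` instead of a survival forest, the function `pd_manual` is hypothetical, and it assumes that `lwr`, `medn`, and `upr` are the 2.5th, 50th, and 97.5th percentiles of the predictions.

```r
# Partial dependence by brute force (illustrative sketch with a linear model).
fit <- lm(mpg ~ wt + hp, data = mtcars)

pd_manual <- function(model, data, variable, values,
                      probs = c(0.025, 0.5, 0.975)) {
  rows <- lapply(values, function(v) {
    data_v <- data
    # set the variable to the same value for every row, leave the rest as-is
    data_v[[variable]] <- rep(v, nrow(data_v))
    pred <- predict(model, newdata = data_v)
    qs <- unname(quantile(pred, probs = probs))
    data.frame(value = v,
               mean = mean(pred),
               lwr  = qs[1],
               medn = qs[2],
               upr  = qs[3])
  })
  do.call(rbind, rows)
}

pd_wt <- pd_manual(fit, mtcars, variable = "wt", values = 2:4)
print(pd_wt)
```

    For a linear model the mean column changes by exactly the coefficient of `wt` per unit step, which makes the sketch easy to check; with a forest the same procedure can reveal nonlinear relationships.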

    @@ -275,13 +275,13 @@

    Multiple variables, marginallypd_two_vars #> pred_horizon variable value level mean lwr medn upr -#> 1: 1826.25 sex NA m 0.3556338 0.03685843 0.2388210 0.9403551 -#> 2: 1826.25 sex NA f 0.3027138 0.01063007 0.1525368 0.9548015 -#> 3: 1826.25 bili 1 <NA> 0.2452129 0.01605591 0.1285905 0.8946964 -#> 4: 1826.25 bili 2 <NA> 0.3013639 0.04076854 0.2001183 0.9148909 -#> 5: 1826.25 bili 3 <NA> 0.3508351 0.05863497 0.2592947 0.9201450 -#> 6: 1826.25 bili 4 <NA> 0.3941197 0.08353114 0.3193513 0.9270573 -#> 7: 1826.25 bili 5 <NA> 0.4286023 0.10893179 0.3624517 0.9263783 +#> 1: 1826.25 sex NA m 0.3564912 0.03712878 0.2369997 0.9398747 +#> 2: 1826.25 sex NA f 0.3035109 0.01053790 0.1568024 0.9545274 +#> 3: 1826.25 bili 1 <NA> 0.2461638 0.01583046 0.1295294 0.8963509 +#> 4: 1826.25 bili 2 <NA> 0.3023356 0.03962917 0.2026023 0.9165164 +#> 5: 1826.25 bili 3 <NA> 0.3519626 0.06060907 0.2635266 0.9220564 +#> 6: 1826.25 bili 4 <NA> 0.3947579 0.08420548 0.3188143 0.9267172 +#> 7: 1826.25 bili 5 <NA> 0.4293114 0.10880143 0.3618061 0.9255556

    Now would it be tedious if you wanted to do this for all the variables? You bet. That’s why we made a function for that. As a bonus, the printed output is sorted from most to least important variables.

    @@ -295,133 +295,133 @@

    Multiple variables, marginally#> #> |---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> 0.80 0.2376199 0.1175227 0.05112242 0.3796784 -#> 1.4 0.2639813 0.1491127 0.07001711 0.4313834 -#> 3.5 0.3750645 0.2959060 0.16393912 0.5687616 +#> 0.80 0.2384834 0.1161447 0.05083804 0.3819801 +#> 1.4 0.2650961 0.1520876 0.07038703 0.4319252 +#> 3.5 0.3757328 0.2962150 0.16422471 0.5693036 #> #> -- copper (VI Rank: 2) ----------------------------------- #> #> |---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> 43 0.2669529 0.1343897 0.05166615 0.4478705 -#> 74 0.2910106 0.1560732 0.06745016 0.4930575 -#> 129 0.3458559 0.2276424 0.10882113 0.5518172 +#> 43 0.2674680 0.1379648 0.05160351 0.4454352 +#> 74 0.2916843 0.1599240 0.06762896 0.4988722 +#> 129 0.3468187 0.2274629 0.11003062 0.5575685 #> #> -- sex (VI Rank: 3) -------------------------------------- #> #> |---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> m 0.3556338 0.2388210 0.10652523 0.5860674 -#> f 0.3027138 0.1525368 0.05384831 0.5617088 +#> m 0.3564912 0.2369997 0.10787696 0.5872914 +#> f 0.3035109 0.1568024 0.05434944 0.5526196 #> #> -- stage (VI Rank: 4) ------------------------------------ #> #> |---------------- risk ----------------| -#> Value Mean Median 25th % 75th % -#> 1 0.5826297 0.5285612 0.3996661 0.7538155 -#> 2 0.5826297 0.5285612 0.3996661 0.7538155 -#> 3 0.5826297 0.5285612 0.3996661 0.7538155 -#> 4 0.5826297 0.5285612 0.3996661 0.7538155 +#> Value Mean Median 25th % 75th % +#> 1 0.5847147 0.52941 0.3996593 0.7568616 +#> 2 0.5847147 0.52941 0.3996593 0.7568616 +#> 3 0.5847147 0.52941 0.3996593 0.7568616 +#> 4 0.5847147 0.52941 0.3996593 0.7568616 #> #> -- age (VI Rank: 5) -------------------------------------- #> #> |---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> 42 0.2759266 0.1460143 0.04501236 0.4636428 -#> 50 0.3096357 0.1961013 0.05299745 0.5246981 -#> 57 0.3454927 0.2378101 
0.08210052 0.5872838 +#> 42 0.2769998 0.1475089 0.04641807 0.4656813 +#> 50 0.3105800 0.1962849 0.05238234 0.5260957 +#> 57 0.3458171 0.2416166 0.08073215 0.5838783 #> #> -- protime (VI Rank: 6) ---------------------------------- #> #> |---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> 10 0.2863813 0.1552936 0.05388682 0.5048504 -#> 11 0.3040031 0.1631759 0.05897970 0.5387056 -#> 11 0.3268747 0.1901532 0.06986700 0.5837946 +#> 10 0.2868796 0.1565407 0.05348739 0.5082739 +#> 11 0.3049211 0.1630376 0.05660416 0.5398259 +#> 11 0.3279135 0.1941268 0.07126535 0.5817064 #> #> -- albumin (VI Rank: 7) ---------------------------------- #> #> |---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> 3.3 0.3262871 0.1903782 0.06054024 0.6108546 -#> 3.5 0.3036352 0.1556788 0.05654525 0.5458010 -#> 3.8 0.2855665 0.1540199 0.05336999 0.5033495 +#> 3.3 0.3268318 0.1898688 0.06087156 0.6078501 +#> 3.5 0.3040046 0.1564467 0.05753060 0.5400452 +#> 3.8 0.2861643 0.1576060 0.05303617 0.5026478 #> #> -- ascites (VI Rank: 8) ---------------------------------- #> #> |---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> 0 0.3020660 0.1499472 0.05384831 0.5426342 -#> 1 0.4518423 0.3638425 0.24341274 0.6433297 +#> 0 0.3028570 0.1542828 0.05434944 0.5393245 +#> 1 0.4525678 0.3667405 0.24229217 0.6413067 #> -#> -- ast (VI Rank: 9) -------------------------------------- +#> -- chol (VI Rank: 9) ------------------------------------- #> #> |---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> 82 0.2892887 0.1466898 0.05112062 0.5135115 -#> 117 0.3067945 0.1587376 0.05443293 0.5522869 -#> 153 0.3298465 0.1791157 0.07169660 0.6069204 +#> 250 0.2931503 0.1529514 0.04780508 0.5046722 +#> 310 0.3029789 0.1662991 0.05563008 0.5144736 +#> 401 0.3256828 0.1983822 0.07294784 0.5492762 #> -#> -- chol (VI Rank: 10) ------------------------------------ +#> -- ast (VI Rank: 10) 
------------------------------------- #> #> |---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> 250 0.2927177 0.1496405 0.04763950 0.5048974 -#> 310 0.3023735 0.1576780 0.05465480 0.5158055 -#> 401 0.3248603 0.1923752 0.07233946 0.5526811 +#> 82 0.2903896 0.1490766 0.05113070 0.5159897 +#> 117 0.3076024 0.1648002 0.05698760 0.5501132 +#> 153 0.3307164 0.1800020 0.07086705 0.6059466 #> #> -- edema (VI Rank: 11) ----------------------------------- #> #> |---------------- risk ----------------| -#> Value Mean Median 25th % 75th % -#> 0 0.2959085 0.1499472 0.05285512 0.5406550 -#> 0.5 0.3637250 0.2550075 0.10860243 0.6126169 -#> 1 0.4575163 0.3709865 0.25421885 0.6489456 +#> Value Mean Median 25th % 75th % +#> 0 0.2968877 0.1534589 0.0539005 0.5366297 +#> 0.5 0.3639845 0.2581248 0.1059203 0.6140513 +#> 1 0.4571985 0.3727463 0.2549830 0.6499621 #> #> -- spiders (VI Rank: 12) --------------------------------- #> #> |---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> 0 0.2976539 0.1499472 0.05137698 0.5310312 -#> 1 0.3426720 0.2230450 0.09443604 0.5593377 +#> 0 0.2984039 0.1528105 0.05324006 0.5321606 +#> 1 0.3438490 0.2238338 0.09693236 0.5660642 #> #> -- hepato (VI Rank: 13) ---------------------------------- #> #> |---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> 0 0.2912547 0.1493194 0.05056845 0.5242976 -#> 1 0.3268190 0.1830504 0.07537275 0.5490061 +#> 0 0.2924451 0.1513041 0.05269621 0.5291405 +#> 1 0.3274109 0.1826162 0.07423325 0.5488156 #> #> -- trt (VI Rank: 14) ------------------------------------- #> #> |---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> d_penicill_main 0.3139442 0.1728924 0.06002565 0.5541061 -#> placebo 0.3083033 0.1575168 0.05420006 0.5508226 +#> d_penicill_main 0.3146055 0.1808374 0.06225388 0.5581896 +#> placebo 0.3092247 0.1616993 0.05455801 0.5462009 #> #> -- trig (VI Rank: 15) ------------------------------------ #> #> 
|---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> 85 0.3011420 0.1531589 0.05085489 0.5339691 -#> 108 0.3087066 0.1626084 0.05129435 0.5384522 -#> 151 0.3229588 0.1833097 0.06437458 0.5452205 +#> 85 0.3020625 0.1594529 0.04947666 0.5368518 +#> 108 0.3094585 0.1642559 0.05138019 0.5381577 +#> 151 0.3236035 0.1919256 0.06579255 0.5484254 #> #> -- alk.phos (VI Rank: 16) -------------------------------- #> #> |---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> 922 0.3123888 0.1719807 0.05639797 0.5655804 -#> 1278 0.3140034 0.1688451 0.05799607 0.5687561 -#> 2068 0.3170148 0.1684038 0.05899212 0.5944319 +#> 922 0.3125063 0.1746096 0.05547749 0.5655240 +#> 1278 0.3143534 0.1730928 0.06012625 0.5716067 +#> 2068 0.3175794 0.1719252 0.05909212 0.5968980 #> #> -- platelet (VI Rank: 17) -------------------------------- #> #> |---------------- risk ----------------| #> Value Mean Median 25th % 75th % -#> 200 0.3169215 0.1771280 0.05563475 0.5902139 -#> 257 0.3112003 0.1692829 0.05472123 0.5747745 -#> 318 0.3078816 0.1724278 0.05521177 0.5625068 +#> 200 0.3178789 0.1797536 0.05520746 0.5855790 +#> 257 0.3120217 0.1700296 0.05532263 0.5736814 +#> 318 0.3087421 0.1739535 0.05590358 0.5649877 #> #> Predicted risk at time t = 1826.25 for top 17 predictors

    It’s easy enough to turn this ‘summary’ object into a @@ -430,12 +430,12 @@

    Multiple variables, marginally head(as.data.table(pd_smry)) #> variable importance Value Mean Median 25th % 75th % -#> 1: bili 0.11347395 0.80 0.2376199 0.1175227 0.05112242 0.3796784 -#> 2: bili 0.11347395 1.4 0.2639813 0.1491127 0.07001711 0.4313834 -#> 3: bili 0.11347395 3.5 0.3750645 0.2959060 0.16393912 0.5687616 -#> 4: copper 0.04899814 43 0.2669529 0.1343897 0.05166615 0.4478705 -#> 5: copper 0.04899814 74 0.2910106 0.1560732 0.06745016 0.4930575 -#> 6: copper 0.04899814 129 0.3458559 0.2276424 0.10882113 0.5518172 +#> 1: bili 0.11319592 0.80 0.2384834 0.1161447 0.05083804 0.3819801 +#> 2: bili 0.11319592 1.4 0.2650961 0.1520876 0.07038703 0.4319252 +#> 3: bili 0.11319592 3.5 0.3757328 0.2962150 0.16422471 0.5693036 +#> 4: copper 0.04949475 43 0.2674680 0.1379648 0.05160351 0.4454352 +#> 5: copper 0.04949475 74 0.2916843 0.1599240 0.06762896 0.4988722 +#> 6: copper 0.04949475 129 0.3468187 0.2274629 0.11003062 0.5575685 #> pred_horizon level #> 1: 1826.25 <NA> #> 2: 1826.25 <NA> @@ -531,17 +531,17 @@

    Visualizing ICE curves ice_oob #> id_variable id_row pred_horizon bili pred -#> 1: 1 1 1826.25 1 0.9216359 -#> 2: 1 2 1826.25 1 0.1141265 -#> 3: 1 3 1826.25 1 0.7337662 -#> 4: 1 4 1826.25 1 0.3585570 -#> 5: 1 5 1826.25 1 0.1417314 +#> 1: 1 1 1826.25 1 0.9194969 +#> 2: 1 2 1826.25 1 0.1136944 +#> 3: 1 3 1826.25 1 0.7413338 +#> 4: 1 4 1826.25 1 0.3671091 +#> 5: 1 5 1826.25 1 0.1439086 #> --- -#> 6896: 25 272 1826.25 10 0.3264152 -#> 6897: 25 273 1826.25 10 0.4338510 -#> 6898: 25 274 1826.25 10 0.4856256 -#> 6899: 25 275 1826.25 10 0.3136700 -#> 6900: 25 276 1826.25 10 0.5347941 +#> 6896: 25 272 1826.25 10 0.3218517 +#> 6897: 25 273 1826.25 10 0.4362345 +#> 6898: 25 274 1826.25 10 0.4962449 +#> 6899: 25 275 1826.25 10 0.3131265 +#> 6900: 25 276 1826.25 10 0.5433389

    We can plug these functions into orsf_control_custom(), and then pass the result into orsf():

    fit_rando <- orsf(pbc_orsf,
    @@ -533,6 +559,10 @@ 

    Linear combinations with you fit_pca <- orsf(pbc_orsf, Surv(time, status) ~ . - id, control = orsf_control_custom(beta_fun = f_pca), + tree_seeds = 1:500) + +fit_rlt <- orsf(pbc_orsf, time + status ~ . - id, + control = orsf_control_custom(beta_fun = f_aorsf), tree_seeds = 1:500)

    So which fit seems to work best in this example? Let’s find out by evaluating the out-of-bag survival predictions.

    @@ -541,7 +571,8 @@

    Linear combinations with you cph = 1 - fit_cph$pred_oobag, net = 1 - fit_net$pred_oobag, rando = 1 - fit_rando$pred_oobag, - pca = 1 - fit_pca$pred_oobag + pca = 1 - fit_pca$pred_oobag, + rlt = 1 - fit_rlt$pred_oobag ) sc <- Score(object = risk_preds, @@ -555,8 +586,9 @@

    Linear combinations with you ## 1: net 1788 0.9179396 0.02012887 0.8784877 0.9573915 ## 2: accel 1788 0.9106396 0.02076004 0.8699507 0.9513286 ## 3: cph 1788 0.9061167 0.02277540 0.8614777 0.9507556 -## 4: rando 1788 0.8997729 0.02201363 0.8566270 0.9429188 -## 5: pca 1788 0.8996927 0.02245483 0.8556821 0.9437034

    +## 4: rlt 1788 0.9012605 0.02178982 0.8585533 0.9439678 +## 5: rando 1788 0.8997729 0.02201363 0.8566270 0.9429188 +## 6: pca 1788 0.8996927 0.02245483 0.8556821 0.9437034

    And the indices of prediction accuracy:

    sc$Brier$score[order(-IPA), .(model, times, IPA)]

    ##         model times       IPA
    @@ -564,11 +596,11 @@ 

    Linear combinations with you ## 2: cph 1788 0.4759061 ## 3: accel 1788 0.4743392 ## 4: pca 1788 0.4398468 -## 5: rando 1788 0.4219209 -## 6: Null model 1788 0.0000000

    +## 5: rlt 1788 0.4373910 +## 6: rando 1788 0.4219209 +## 7: Null model 1788 0.0000000

    From inspection,

    • the glmnet approach has the highest discrimination and index of prediction accuracy.

    • -
    • the accelerated ORSF is a close second.

    • the random coefficients don’t do that well, but they aren’t bad.

    @@ -659,29 +691,29 @@

    Comparing ORSF with other learners
    glimpse(results)

    ## Rows: 276
     ## Columns: 23
    -## $ id          <int> 8, 13, 31, 33, 35, 38, 83, 120, 127, 133, 143, 163, 165, 1~
    -## $ trt         <fct> placebo, placebo, placebo, placebo, placebo, placebo, d_pe~
    -## $ age         <dbl> 53.05681, 45.68925, 41.55236, 51.28268, 48.61875, 36.62697~
    -## $ sex         <fct> f, f, f, f, f, f, f, m, f, m, f, f, m, f, f, f, f, f, f, f~
    -## $ ascites     <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
    -## $ hepato      <fct> 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1~
    -## $ spiders     <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1~
    -## $ edema       <fct> 0, 0, 0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
    -## $ bili        <dbl> 0.3, 0.7, 4.7, 0.8, 1.2, 3.3, 1.3, 3.5, 0.5, 1.5, 2.9, 0.3~
    -## $ chol        <int> 280, 281, 296, 210, 314, 383, 250, 325, 268, 331, 332, 233~
    -## $ albumin     <dbl> 4.00, 3.85, 3.44, 3.19, 3.20, 3.53, 3.50, 3.98, 4.08, 3.95~
    -## $ copper      <int> 52, 40, 114, 82, 201, 102, 48, 444, 9, 13, 86, 20, 80, 67,~
    -## $ alk.phos    <dbl> 4651.2, 1181.0, 9933.2, 1592.0, 12258.8, 1234.0, 1138.0, 7~
    -## $ ast         <dbl> 28.38, 88.35, 206.40, 218.55, 72.24, 137.95, 71.30, 130.20~
    -## $ trig        <int> 189, 130, 101, 113, 151, 87, 100, 210, 95, 99, 103, 68, 14~
    -## $ platelet    <int> 373, 244, 195, 180, 431, 234, 81, 344, 453, 165, 277, 358,~
    -## $ protime     <dbl> 11.0, 10.6, 10.3, 12.0, 10.6, 11.0, 12.9, 10.6, 10.0, 10.1~
    -## $ stage       <ord> 3, 3, 2, 3, 3, 4, 4, 3, 2, 4, 4, 3, 4, 3, 2, 3, 4, 3, 3, 3~
    -## $ time        <int> 2466, 3577, 3839, 3170, 2847, 3244, 4050, 2033, 3255, 2796~
    -## $ status      <dbl> 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0~
    -## $ pred_aorsf  <dbl> 0.06002419, 0.01954988, 0.35024244, 0.29486541, 0.23418878~
    -## $ pred_rfsrc  <dbl> 0.052628661, 0.010204564, 0.401535927, 0.259857534, 0.3263~
    -## $ pred_ranger <dbl> 0.040042884, 0.012915865, 0.392153766, 0.347688672, 0.3015~

    +## $ id <int> 16, 29, 43, 62, 79, 82, 103, 105, 111, 114, 115, 139, 141,~ +## $ trt <fct> placebo, placebo, d_penicill_main, placebo, d_penicill_mai~ +## $ age <dbl> 40.44353, 63.87680, 48.87064, 60.70637, 46.51608, 67.31006~ +## $ sex <fct> f, f, f, f, f, f, f, f, f, m, f, f, f, f, f, f, f, f, f, f~ +## $ ascites <fct> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0~ +## $ hepato <fct> 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1~ +## $ spiders <fct> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1~ +## $ edema <fct> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~ +## $ bili <dbl> 0.7, 0.7, 1.1, 1.3, 0.8, 4.5, 2.5, 1.1, 5.5, 3.2, 0.7, 1.1~ +## $ chol <int> 204, 370, 361, 302, 315, 472, 188, 464, 528, 259, 303, 328~ +## $ albumin <dbl> 3.66, 3.78, 3.64, 2.75, 4.24, 4.09, 3.67, 4.20, 4.18, 4.30~ +## $ copper <int> 28, 24, 36, 58, 13, 154, 57, 38, 77, 208, 81, 159, 59, 76,~ +## $ alk.phos <dbl> 685.0, 5833.0, 5430.2, 1523.0, 1637.0, 1580.0, 1273.0, 164~ +## $ ast <dbl> 72.85, 73.53, 67.08, 43.40, 170.50, 117.80, 119.35, 151.90~ +## $ trig <int> 58, 86, 89, 112, 70, 272, 102, 102, 78, 78, 156, 134, 56, ~ +## $ platelet <int> 198, 390, 203, 329, 426, 412, 110, 348, 467, 268, 307, 142~ +## $ protime <dbl> 10.8, 10.6, 10.6, 13.2, 10.9, 11.1, 11.1, 10.3, 10.7, 11.7~ +## $ stage <ord> 3, 2, 2, 4, 3, 3, 4, 3, 3, 3, 3, 4, 2, 2, 3, 4, 2, 3, 4, 4~ +## $ time <int> 3672, 4509, 4556, 3090, 3707, 3574, 110, 3092, 2350, 3395,~ +## $ status <dbl> 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0~ +## $ pred_aorsf <dbl> 0.02210163, 0.12510110, 0.07571520, 0.59580668, 0.12839078~ +## $ pred_rfsrc <dbl> 0.01861595, 0.15632904, 0.07635485, 0.62281617, 0.19145913~ +## $ pred_ranger <dbl> 0.02143363, 0.13367920, 0.05892584, 0.54481330, 0.21380654~

    And finish by aggregating the predictions and computing performance in the testing data. Note that I am computing one statistic for all predictions instead of computing one statistic for each fold. This @@ -702,16 +734,16 @@

    Comparing ORSF with other learners## Results by model: ## ## model times AUC lower upper -## 1: aorsf 1826 91.3 87.2 95.5 -## 2: rfsrc 1826 90.0 85.8 94.3 -## 3: ranger 1826 90.6 86.6 94.7 +## 1: aorsf 1826 91.0 86.8 95.2 +## 2: rfsrc 1826 89.2 84.8 93.7 +## 3: ranger 1826 89.6 85.3 94.0 ## ## Results of model comparisons: ## -## times model reference delta.AUC lower upper p -## 1: 1826 rfsrc aorsf -1.3 -2.9 0.3 0.1 -## 2: 1826 ranger aorsf -0.7 -2.3 0.9 0.4 -## 3: 1826 ranger rfsrc 0.6 -0.5 1.7 0.3 +## times model reference delta.AUC lower upper p +## 1: 1826 rfsrc aorsf -1.7 -3.4 -0.1 0.04 +## 2: 1826 ranger aorsf -1.3 -2.9 0.2 0.08 +## 3: 1826 ranger rfsrc 0.4 -0.8 1.6 0.52 ## ## NOTE: Values are multiplied by 100 and given in %. @@ -725,19 +757,19 @@

    Comparing ORSF with other learners## ## model times Brier lower upper IPA ## 1: Null model 1826.25 20.5 18.1 22.9 0.0 -## 2: aorsf 1826.25 10.6 8.5 12.8 48.0 -## 3: rfsrc 1826.25 11.8 9.7 13.9 42.4 -## 4: ranger 1826.25 11.5 9.5 13.5 44.0 +## 2: aorsf 1826.25 10.9 8.7 13.1 46.9 +## 3: rfsrc 1826.25 12.0 9.9 14.2 41.3 +## 4: ranger 1826.25 12.0 9.9 14.1 41.5 ## ## Results of model comparisons: ## ## times model reference delta.Brier lower upper p -## 1: 1826.25 aorsf Null model -9.8 -12.5 -7.2 1.872048e-13 -## 2: 1826.25 rfsrc Null model -8.7 -10.9 -6.5 2.176184e-14 -## 3: 1826.25 ranger Null model -9.0 -11.3 -6.7 1.387967e-14 -## 4: 1826.25 rfsrc aorsf 1.1 0.3 2.0 8.934160e-03 -## 5: 1826.25 ranger aorsf 0.8 0.1 1.6 3.287486e-02 -## 6: 1826.25 ranger rfsrc -0.3 -0.9 0.2 2.459287e-01 +## 1: 1826.25 aorsf Null model -9.6 -12.2 -7.0 9.364941e-13 +## 2: 1826.25 rfsrc Null model -8.5 -10.7 -6.2 2.074175e-13 +## 3: 1826.25 ranger Null model -8.5 -10.8 -6.2 3.712823e-13 +## 4: 1826.25 rfsrc aorsf 1.1 0.3 2.0 1.075856e-02 +## 5: 1826.25 ranger aorsf 1.1 0.3 1.9 4.825778e-03 +## 6: 1826.25 ranger rfsrc -0.1 -0.6 0.5 8.429772e-01 ## ## NOTE: Values are multiplied by 100 and given in %. diff --git a/reference/orsf_control_custom.html b/reference/orsf_control_custom.html index b4238ecd..73363b63 100644 --- a/reference/orsf_control_custom.html +++ b/reference/orsf_control_custom.html @@ -123,7 +123,7 @@

    Random coefficients## Average leaves per tree: 20 ## Min observations in leaf: 5 ## Min events in leaf: 1 -## OOB stat value: 0.83 +## OOB stat value: 0.84 ## OOB stat type: Harrell's C-statistic ## Variable importance: anova ## @@ -157,9 +157,8 @@

    Evaluate
    How well do our two customized ORSFs do? Let’s compute their indices of prediction accuracy based on out-of-bag predictions:

    -

    -

    ## riskRegression version 2023.03.22

    -

    library(survival)
    +

    library(riskRegression)
    +library(survival)
     
     risk_preds <- list(rando = 1 - fit_rando$pred_oobag,
                         pca = 1 - fit_pca$pred_oobag)
    @@ -175,23 +174,21 @@ 

    Evaluate## Results by model: ## ## model times Brier lower upper IPA -## <fctr> <num> <char> <char> <char> <char> -## 1: Null model 1788 20.479 18.090 22.868 0.000 -## 2: rando 1788 11.672 9.596 13.748 43.006 -## 3: pca 1788 12.917 10.885 14.950 36.924 -## -## Results of model comparisons: -## -## times model reference delta.Brier lower upper p -## <num> <fctr> <fctr> <char> <char> <char> <num> -## 1: 1788 rando Null model -8.807 -10.905 -6.709 1.896108e-16 -## 2: 1788 pca Null model -7.562 -9.235 -5.888 8.331729e-19 -## 3: 1788 pca rando 1.245 0.439 2.052 2.476657e-03 - -## -## NOTE: Values are multiplied by 100 and given in %. - -## NOTE: The lower Brier the better, the higher IPA the better.

    +## 1: Null model 1788 20.479 18.090 22.868 0.000 +## 2: rando 1788 11.554 9.476 13.631 43.584 +## 3: pca 1788 12.673 10.692 14.654 38.118 +## +## Results of model comparisons: +## +## times model reference delta.Brier lower upper p +## 1: 1788 rando Null model -8.926 -11.071 -6.780 3.491749e-16 +## 2: 1788 pca Null model -7.806 -9.534 -6.079 8.192570e-19 +## 3: 1788 pca rando 1.119 0.350 1.889 4.354090e-03 + +## +## NOTE: Values are multiplied by 100 and given in %. + +## NOTE: The lower Brier the better, the higher IPA the better.
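    The IPA column follows from the Brier scores: it is one minus the ratio of a model’s Brier score to the Null model’s. The short sketch below recomputes it from the updated values in the table; the helper name `ipa` is hypothetical, and because the inputs are already rounded to three decimals the result agrees with the printed IPA only to about two decimal places.

```r
# Index of prediction accuracy (IPA) recomputed from Brier scores.
# Brier scores (in %) are copied from the updated table above.
brier <- c("Null model" = 20.479, rando = 11.554, pca = 12.673)

# IPA = 1 - Brier(model) / Brier(null model)
ipa <- function(brier_model, brier_null) {
  1 - brier_model / brier_null
}

ipa_pct <- 100 * ipa(brier[c("rando", "pca")], brier[["Null model"]])
print(round(ipa_pct, 2))
```

    An IPA of 0 means no improvement over the Null model, which is why the Null model row itself reports 0.000.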

    diff --git a/reference/orsf_control_net.html b/reference/orsf_control_net.html index 3b3d0a78..f0897fc4 100644 --- a/reference/orsf_control_net.html +++ b/reference/orsf_control_net.html @@ -125,10 +125,10 @@

    Examples#> N trees: 25 #> N predictors total: 17 #> N predictors per node: 5 -#> Average leaves per tree: 25 +#> Average leaves per tree: 24 #> Min observations in leaf: 5 #> Min events in leaf: 1 -#> OOB stat value: 0.84 +#> OOB stat value: 0.83 #> OOB stat type: Harrell's C-statistic #> Variable importance: anova #> diff --git a/reference/orsf_ice_oob.html b/reference/orsf_ice_oob.html index d4dd4133..0bbd7d80 100644 --- a/reference/orsf_ice_oob.html +++ b/reference/orsf_ice_oob.html @@ -227,19 +227,18 @@

    Examples
    ice_oob <- orsf_ice_oob(fit, pred_spec, boundary_checks = FALSE)
    ice_oob

    -

    ##       id_variable id_row pred_horizon  bili      pred
    -##             <int> <fctr>        <num> <num>     <num>
    -##    1:           1      1         1788     1 0.9011797
    -##    2:           1      2         1788     1 0.1096207
    -##    3:           1      3         1788     1 0.7646444
    -##    4:           1      4         1788     1 0.3531060
    -##    5:           1      5         1788     1 0.1228441
    -##   ---                                                
    -## 6896:          25    272         1788    10 0.3089586
    -## 6897:          25    273         1788    10 0.4005430
    -## 6898:          25    274         1788    10 0.4933945
    -## 6899:          25    275         1788    10 0.3134373
    -## 6900:          25    276         1788    10 0.5002014

    +

    ##       id_variable id_row pred_horizon bili      pred
    +##    1:           1      1         1788    1 0.9011797
    +##    2:           1      2         1788    1 0.1096207
    +##    3:           1      3         1788    1 0.7646444
    +##    4:           1      4         1788    1 0.3531060
    +##    5:           1      5         1788    1 0.1228441
    +##   ---                                               
    +## 6896:          25    272         1788   10 0.3089586
    +## 6897:          25    273         1788   10 0.4005430
    +## 6898:          25    274         1788   10 0.4933945
    +## 6899:          25    275         1788   10 0.3134373
    +## 6900:          25    276         1788   10 0.5002014

    Much more detailed examples are given in the vignette

    diff --git a/reference/orsf_pd_oob.html b/reference/orsf_pd_oob.html index 6255ce3e..75e92101 100644 --- a/reference/orsf_pd_oob.html +++ b/reference/orsf_pd_oob.html @@ -243,37 +243,34 @@

    Three ways to compute PD and ICE

    pd_train <- orsf_pd_inb(fit, pred_spec = list(bili = 1:5))
     
     pd_train

    -

    ##    pred_horizon  bili      mean        lwr       medn       upr
    -##           <num> <num>     <num>      <num>      <num>     <num>
    -## 1:      1826.25     1 0.2151663 0.02028479 0.09634648 0.7997269
    -## 2:      1826.25     2 0.2576618 0.03766695 0.15497447 0.8211875
    -## 3:      1826.25     3 0.2998484 0.06436773 0.20771324 0.8425637
    -## 4:      1826.25     4 0.3390664 0.08427149 0.25401067 0.8589590
    -## 5:      1826.25     5 0.3699045 0.10650098 0.28284427 0.8689855

  • +

    ##    pred_horizon bili      mean        lwr       medn       upr
    +## 1:      1826.25    1 0.2151663 0.02028479 0.09634648 0.7997269
    +## 2:      1826.25    2 0.2576618 0.03766695 0.15497447 0.8211875
    +## 3:      1826.25    3 0.2998484 0.06436773 0.20771324 0.8425637
    +## 4:      1826.25    4 0.3390664 0.08427149 0.25401067 0.8589590
    +## 5:      1826.25    5 0.3699045 0.10650098 0.28284427 0.8689855

  • using out-of-bag predictions for the training data

    pd_train <- orsf_pd_oob(fit, pred_spec = list(bili = 1:5))
     
     pd_train

    -

    ##    pred_horizon  bili      mean        lwr       medn       upr
    -##           <num> <num>     <num>      <num>      <num>     <num>
    -## 1:      1826.25     1 0.2145044 0.01835000 0.09619052 0.7980629
    -## 2:      1826.25     2 0.2566241 0.03535358 0.14185734 0.8173143
    -## 3:      1826.25     3 0.2984693 0.05900059 0.20515477 0.8334243
    -## 4:      1826.25     4 0.3383547 0.07887323 0.24347513 0.8469769
    -## 5:      1826.25     5 0.3696260 0.10450534 0.28065473 0.8523756

  • +

    ##    pred_horizon bili      mean        lwr       medn       upr
    +## 1:      1826.25    1 0.2145044 0.01835000 0.09619052 0.7980629
    +## 2:      1826.25    2 0.2566241 0.03535358 0.14185734 0.8173143
    +## 3:      1826.25    3 0.2984693 0.05900059 0.20515477 0.8334243
    +## 4:      1826.25    4 0.3383547 0.07887323 0.24347513 0.8469769
    +## 5:      1826.25    5 0.3696260 0.10450534 0.28065473 0.8523756

  • using predictions for a new set of data

    pd_test <- orsf_pd_new(fit, 
                            new_data = pbc_orsf_test, 
                            pred_spec = list(bili = 1:5))
     
     pd_test

    -

    ##    pred_horizon  bili      mean        lwr      medn       upr
    -##           <num> <num>     <num>      <num>     <num>     <num>
    -## 1:      1826.25     1 0.2542230 0.02901386 0.1943767 0.8143912
    -## 2:      1826.25     2 0.2955726 0.05037316 0.2474559 0.8317684
    -## 3:      1826.25     3 0.3388434 0.07453896 0.3010898 0.8488622
    -## 4:      1826.25     4 0.3800254 0.10565022 0.3516805 0.8592057
    -## 5:      1826.25     5 0.4124587 0.12292465 0.3915066 0.8690074

  • +

    ##    pred_horizon bili      mean        lwr      medn       upr
    +## 1:      1826.25    1 0.2542230 0.02901386 0.1943767 0.8143912
    +## 2:      1826.25    2 0.2955726 0.05037316 0.2474559 0.8317684
    +## 3:      1826.25    3 0.3388434 0.07453896 0.3010898 0.8488622
    +## 4:      1826.25    4 0.3800254 0.10565022 0.3516805 0.8592057
    +## 5:      1826.25    5 0.4124587 0.12292465 0.3915066 0.8690074

  • in-bag partial dependence indicates relationships that the model has learned during training. This is helpful if your goal is to interpret the model.

  • diff --git a/reference/orsf_scale_cph.html b/reference/orsf_scale_cph.html index 5ef85390..f9c4269e 100644 --- a/reference/orsf_scale_cph.html +++ b/reference/orsf_scale_cph.html @@ -157,7 +157,7 @@

    Examples # numeric difference in x_mat and x_unscaled should be practically 0 max(abs(x_mat - x_unscaled)) -#> [1] 8.881784e-16 +#> [1] 3.552714e-15
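    Both the old and the new value are small multiples of double-precision machine epsilon (about 2.22e-16), so the change reflects floating-point rounding rather than a difference in behavior. A minimal base R sketch of the same round trip is shown below; it uses `scale()` and `sweep()` as stand-ins and does not call the aorsf scaling functions.

```r
# Scale a matrix, undo the scaling, and check the round-trip error.
set.seed(1)
x_mat <- matrix(rnorm(50, mean = 100, sd = 20), ncol = 5)

x_scaled <- scale(x_mat)                      # center and scale each column
centers  <- attr(x_scaled, "scaled:center")
scales   <- attr(x_scaled, "scaled:scale")

# undo the transformation: multiply by the scale, then add the center back
x_unscaled <- sweep(sweep(x_scaled, 2, scales, "*"), 2, centers, "+")

# the difference should be practically 0, on the order of machine epsilon
max_diff <- max(abs(x_mat - x_unscaled))
print(max_diff)
print(max_diff / .Machine$double.eps)
```

    The exact value depends on the data and on the order of the arithmetic operations, so it can differ between builds while staying negligible.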