diff --git a/articles/aorsf.html b/articles/aorsf.html
index 50f00f1b..e5c2edee 100644
--- a/articles/aorsf.html
+++ b/articles/aorsf.html
@@ -173,13 +173,13 @@
You can also compute variable importance using permutation, a
@@ -188,13 +188,13 @@
A faster alternative to permutation and negation importance is
@@ -204,11 +204,11 @@
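For quick reference, each importance technique has its own function in aorsf; a minimal sketch, assuming a fitted forest named fit as in the examples above:
library(aorsf)

# negation importance: negate each predictor's coefficients in the forest
orsf_vi_negate(fit)

# permutation importance: permute each predictor's out-of-bag values
orsf_vi_permute(fit)

# ANOVA importance is computed while the forest is grown, so the fit
# may need to be created with importance = 'anova'
orsf_vi_anova(fit)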
The out-of-bag estimate of Harrell’s C-statistic (the default method
-to evaluate out-of-bag predictions) is 0.8404084.
+to evaluate out-of-bag predictions) is 0.8405646.
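If you want this estimate programmatically, the out-of-bag evaluation is stored on the fitted forest; a minimal sketch, assuming the eval_oobag slot of an orsf fit:
# out-of-bag statistic values (Harrell's C-statistic by default)
fit$eval_oobag$stat_values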
oobag_fun_brier(y_mat = pbc_orsf[,c('time', 'status')],
                s_vec = fit$pred_oobag)
-#> [1] 0.11724
Second, you can pass your function into orsf(), and it will be used in
place of Harrell’s C-statistic:
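For context, a user-supplied out-of-bag function takes the observed outcomes and the out-of-bag survival predictions and returns one statistic. Below is a minimal sketch of what oobag_fun_brier might look like, assuming the two-argument (y_mat, s_vec) interface shown above and ignoring censoring for simplicity; the article's actual definition is elided from this diff:
oobag_fun_brier <- function(y_mat, s_vec){
  # predicted risk is 1 minus predicted survival
  r_vec <- 1 - s_vec
  # mean squared difference between observed status and predicted risk
  mean( (y_mat[, 'status'] - r_vec)^2 )
}

# plugging the function into orsf() makes it replace Harrell's C-statistic
fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            oobag_fun = oobag_fun_brier)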
@@ -326,12 +326,12 @@ User-supplied function
importance = 'negate')
fit_tdep_cstat$importance
-#> bili copper sex protime age ascites
-#> 0.130946460 0.044500890 0.033850120 0.022515610 0.019551930 0.017677020
-#> stage albumin chol spiders edema ast
-#> 0.017561950 0.016692050 0.011163150 0.007158130 0.007008088 0.006360200
-#> trig hepato trt alk.phos platelet
-#> 0.005541530 0.004885160 0.002620090 0.001023750 -0.002403190
using out-of-bag predictions for the training data
@@ -166,12 +166,12 @@
using predictions for a new set of data
@@ -183,11 +183,11 @@
in-bag PD indicates relationships that the model has learned during
training. This is helpful if your goal is to interpret the model.
@@ -221,8 +221,8 @@
The output shows that the expected predicted mortality risk for men is
substantially higher than it is for women at 5 years after baseline.
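A minimal sketch of the kind of call that produces this output, assuming a fitted forest named fit (the article's exact code is elided from this diff):
# out-of-bag partial dependence for sex at the default prediction horizon
pd_sex <- orsf_pd_oob(fit, pred_spec = list(sex = c("m", "f")))
pd_sex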
@@ -275,13 +275,13 @@
Now would it be tedious if you wanted to do this for all the variables?
You bet. That’s why we made a function for that. As a bonus, the
printed output is sorted from most to least important variables.
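That function is presumably orsf_summarize_uni(), given the ‘summary’ object discussed next; a minimal sketch, where n_variables limits the output to the top few variables:
# summarize partial dependence for each predictor, one variable at a time
orsf_summarize_uni(fit, n_variables = 3)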
@@ -295,133 +295,133 @@
It’s easy enough to turn this ‘summary’ object into a
@@ -430,12 +430,12 @@
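The conversion presumably goes through the as.data.table method for orsf_summary_uni objects, whose reference page is updated later in this diff; a minimal sketch:
library(data.table)

# convert the one-variable summary into a data.table for plots and tables
smry <- orsf_summarize_uni(fit, n_variables = 3)
as.data.table(smry)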
id_variable is an identifier for the current value
of the variable(s) that are in the data. It is redundant if you only
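For reference, a minimal sketch of generating individual conditional expectation (ICE) data containing these columns, assuming the out-of-bag ICE interface mirrors the partial dependence one used below:
# out-of-bag ICE: one predicted value per observation per variable value
ice_oob <- orsf_ice_oob(fit, pred_spec = list(bili = 1:5))
ice_oob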
diff --git a/articles/pd_files/figure-html/-orsf_ice-1.png b/articles/pd_files/figure-html/-orsf_ice-1.png
index ae8eb0ed..9c3f9d5c 100644
Binary files a/articles/pd_files/figure-html/-orsf_ice-1.png and b/articles/pd_files/figure-html/-orsf_ice-1.png differ
diff --git a/articles/pd_files/figure-html/unnamed-chunk-13-1.png b/articles/pd_files/figure-html/unnamed-chunk-13-1.png
index 006434ff..c0236a1e 100644
Binary files a/articles/pd_files/figure-html/unnamed-chunk-13-1.png and b/articles/pd_files/figure-html/unnamed-chunk-13-1.png differ
diff --git a/articles/pd_files/figure-html/unnamed-chunk-8-1.png b/articles/pd_files/figure-html/unnamed-chunk-8-1.png
index ed459527..d0e755f0 100644
Binary files a/articles/pd_files/figure-html/unnamed-chunk-8-1.png and b/articles/pd_files/figure-html/unnamed-chunk-8-1.png differ
diff --git a/articles/pd_files/figure-html/unnamed-chunk-9-1.png b/articles/pd_files/figure-html/unnamed-chunk-9-1.png
index f50e35e4..fea4b651 100644
Binary files a/articles/pd_files/figure-html/unnamed-chunk-9-1.png and b/articles/pd_files/figure-html/unnamed-chunk-9-1.png differ
diff --git a/pkgdown.yml b/pkgdown.yml
index c7395436..7f116c56 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -5,7 +5,7 @@ articles:
aorsf: aorsf.html
oobag: oobag.html
pd: pd.html
-last_built: 2023-10-04T14:00Z
+last_built: 2023-10-05T03:45Z
urls:
reference: https://bcjaeger.github.io/aorsf/reference
article: https://bcjaeger.github.io/aorsf/articles
diff --git a/reference/as.data.table.orsf_summary_uni.html b/reference/as.data.table.orsf_summary_uni.html
index 84995cfd..73155432 100644
--- a/reference/as.data.table.orsf_summary_uni.html
+++ b/reference/as.data.table.orsf_summary_uni.html
@@ -94,28 +94,24 @@
The third uses orsf() inside of orsf()
+(aka reinforcement learning trees, RLTs).
+# Some special care is taken to prevent your R session from crashing.
+# Specifically, random coefficients are used when n_obs <= 10
+# or n_events <= 5.
+
+f_aorsf <- function(x_node, y_node, w_node){
+
+ colnames(y_node) <- c('time', 'status')
+ colnames(x_node) <- paste("x", seq(ncol(x_node)), sep = '')
+
+ data <- as.data.frame(cbind(y_node, x_node))
+
+ if(nrow(data) <= 10 || sum(y_node[,'status']) <= 5)
+ return(matrix(runif(ncol(x_node)), ncol = 1))
+
+ fit <- orsf(data, time + status ~ .,
+ weights = as.numeric(w_node),
+ n_tree = 25,
+ importance = 'anova')
+
+ out <- orsf_vi(fit)[colnames(x_node)]
+
+ matrix(out, ncol = 1)
+
+}
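For completeness, the random-coefficient function used by fit_rando below might be as simple as this sketch (an assumption; its definition is elided from this diff), drawing one uniform coefficient per predictor:
f_rando <- function(x_node, y_node, w_node){
  matrix(runif(ncol(x_node)), ncol = 1)
}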
We can plug these functions into orsf_control_custom(), and then pass
the result into orsf():
fit_rando <- orsf(pbc_orsf,
@@ -533,6 +559,10 @@ Linear combinations with you
fit_pca <- orsf(pbc_orsf,
                Surv(time, status) ~ . - id,
                control = orsf_control_custom(beta_fun = f_pca),
+               tree_seeds = 1:500)
+
+fit_rlt <- orsf(pbc_orsf, time + status ~ . - id,
+               control = orsf_control_custom(beta_fun = f_aorsf),
                tree_seeds = 1:500)
So which fit seems to work best in this example? Let’s find out by evaluating the out-of-bag survival predictions.
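A minimal sketch of that evaluation, assuming riskRegression::Score() as used elsewhere in these pages; the full model list is elided from this diff, so risk_preds here is a hypothetical stand-in:
library(riskRegression)
library(survival)

# out-of-bag predicted risks (1 minus predicted survival) for each fit
risk_preds <- list(rando = 1 - fit_rando$pred_oobag,
                   pca = 1 - fit_pca$pred_oobag)

sc <- Score(object = risk_preds,
            formula = Surv(time, status) ~ 1,
            data = pbc_orsf,
            summary = 'IPA',
            times = fit_rando$pred_horizon)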
@@ -541,7 +571,8 @@
And the indices of prediction accuracy:
sc$Brier$score[order(-IPA), .(model, times, IPA)]
## model times IPA
@@ -564,11 +596,11 @@ Linear combinations with you
## 2: cph 1788 0.4759061
## 3: accel 1788 0.4743392
## 4: pca 1788 0.4398468
-## 5: rando 1788 0.4219209
-## 6: Null model 1788 0.0000000
From inspection, the glmnet approach has the highest discrimination and
index of prediction accuracy. The accelerated ORSF is a close second.
The random coefficients don’t do that well, but they aren’t bad.
## Rows: 276
## Columns: 23
-## $ id <int> 8, 13, 31, 33, 35, 38, 83, 120, 127, 133, 143, 163, 165, 1~
-## $ trt <fct> placebo, placebo, placebo, placebo, placebo, placebo, d_pe~
-## $ age <dbl> 53.05681, 45.68925, 41.55236, 51.28268, 48.61875, 36.62697~
-## $ sex <fct> f, f, f, f, f, f, f, m, f, m, f, f, m, f, f, f, f, f, f, f~
-## $ ascites <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
-## $ hepato <fct> 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1~
-## $ spiders <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1~
-## $ edema <fct> 0, 0, 0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
-## $ bili <dbl> 0.3, 0.7, 4.7, 0.8, 1.2, 3.3, 1.3, 3.5, 0.5, 1.5, 2.9, 0.3~
-## $ chol <int> 280, 281, 296, 210, 314, 383, 250, 325, 268, 331, 332, 233~
-## $ albumin <dbl> 4.00, 3.85, 3.44, 3.19, 3.20, 3.53, 3.50, 3.98, 4.08, 3.95~
-## $ copper <int> 52, 40, 114, 82, 201, 102, 48, 444, 9, 13, 86, 20, 80, 67,~
-## $ alk.phos <dbl> 4651.2, 1181.0, 9933.2, 1592.0, 12258.8, 1234.0, 1138.0, 7~
-## $ ast <dbl> 28.38, 88.35, 206.40, 218.55, 72.24, 137.95, 71.30, 130.20~
-## $ trig <int> 189, 130, 101, 113, 151, 87, 100, 210, 95, 99, 103, 68, 14~
-## $ platelet <int> 373, 244, 195, 180, 431, 234, 81, 344, 453, 165, 277, 358,~
-## $ protime <dbl> 11.0, 10.6, 10.3, 12.0, 10.6, 11.0, 12.9, 10.6, 10.0, 10.1~
-## $ stage <ord> 3, 3, 2, 3, 3, 4, 4, 3, 2, 4, 4, 3, 4, 3, 2, 3, 4, 3, 3, 3~
-## $ time <int> 2466, 3577, 3839, 3170, 2847, 3244, 4050, 2033, 3255, 2796~
-## $ status <dbl> 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0~
-## $ pred_aorsf <dbl> 0.06002419, 0.01954988, 0.35024244, 0.29486541, 0.23418878~
-## $ pred_rfsrc <dbl> 0.052628661, 0.010204564, 0.401535927, 0.259857534, 0.3263~
-## $ pred_ranger <dbl> 0.040042884, 0.012915865, 0.392153766, 0.347688672, 0.3015~
And finish by aggregating the predictions and computing performance in
the testing data. Note that I am computing one statistic for all
predictions instead of computing one statistic for each fold. This
@@ -702,16 +734,16 @@
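To make the pooling step concrete, here is a hypothetical sketch; fold_results stands in for the per-fold test data produced by the (elided) cross-validation loop:
# hypothetical: fold_results is a list of per-fold test data, each with
# the pred_aorsf, pred_rfsrc, and pred_ranger columns shown above
test_preds <- do.call(rbind, fold_results)

# scoring test_preds once yields one statistic for all predictions,
# rather than one statistic per fold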
## riskRegression version 2023.03.22
library(survival)
+library(riskRegression)
+library(survival)
risk_preds <- list(rando = 1 - fit_rando$pred_oobag,
                   pca = 1 - fit_pca$pred_oobag)
@@ -175,23 +174,21 @@ Evaluate
## Results by model:
##
## model times Brier lower upper IPA
-## <fctr> <num> <char> <char> <char> <char>
-## 1: Null model 1788 20.479 18.090 22.868 0.000
-## 2: rando 1788 11.672 9.596 13.748 43.006
-## 3: pca 1788 12.917 10.885 14.950 36.924
-##
-## Results of model comparisons:
-##
-## times model reference delta.Brier lower upper p
-## <num> <fctr> <fctr> <char> <char> <char> <num>
-## 1: 1788 rando Null model -8.807 -10.905 -6.709 1.896108e-16
-## 2: 1788 pca Null model -7.562 -9.235 -5.888 8.331729e-19
-## 3: 1788 pca rando 1.245 0.439 2.052 2.476657e-03
-
-##
-## NOTE: Values are multiplied by 100 and given in %.
-
-## NOTE: The lower Brier the better, the higher IPA the better.
+## 1: Null model 1788 20.479 18.090 22.868 0.000
+## 2: rando 1788 11.554 9.476 13.631 43.584
+## 3: pca 1788 12.673 10.692 14.654 38.118
+##
+## Results of model comparisons:
+##
+## times model reference delta.Brier lower upper p
+## 1: 1788 rando Null model -8.926 -11.071 -6.780 3.491749e-16
+## 2: 1788 pca Null model -7.806 -9.534 -6.079 8.192570e-19
+## 3: 1788 pca rando 1.119 0.350 1.889 4.354090e-03
+
+##
+## NOTE: Values are multiplied by 100 and given in %.
+
+## NOTE: The lower Brier the better, the higher IPA the better.
## id_variable id_row pred_horizon bili pred
-## <int> <fctr> <num> <num> <num>
-## 1: 1 1 1788 1 0.9011797
-## 2: 1 2 1788 1 0.1096207
-## 3: 1 3 1788 1 0.7646444
-## 4: 1 4 1788 1 0.3531060
-## 5: 1 5 1788 1 0.1228441
-## ---
-## 6896: 25 272 1788 10 0.3089586
-## 6897: 25 273 1788 10 0.4005430
-## 6898: 25 274 1788 10 0.4933945
-## 6899: 25 275 1788 10 0.3134373
-## 6900: 25 276 1788 10 0.5002014
## id_variable id_row pred_horizon bili pred
+## 1: 1 1 1788 1 0.9011797
+## 2: 1 2 1788 1 0.1096207
+## 3: 1 3 1788 1 0.7646444
+## 4: 1 4 1788 1 0.3531060
+## 5: 1 5 1788 1 0.1228441
+## ---
+## 6896: 25 272 1788 10 0.3089586
+## 6897: 25 273 1788 10 0.4005430
+## 6898: 25 274 1788 10 0.4933945
+## 6899: 25 275 1788 10 0.3134373
+## 6900: 25 276 1788 10 0.5002014
Much more detailed examples are given in the vignette
diff --git a/reference/orsf_pd_oob.html b/reference/orsf_pd_oob.html
index 6255ce3e..75e92101 100644
--- a/reference/orsf_pd_oob.html
+++ b/reference/orsf_pd_oob.html
@@ -243,37 +243,34 @@
pd_train <- orsf_pd_inb(fit, pred_spec = list(bili = 1:5))
pd_train
## pred_horizon bili mean lwr medn upr
-## <num> <num> <num> <num> <num> <num>
-## 1: 1826.25 1 0.2151663 0.02028479 0.09634648 0.7997269
-## 2: 1826.25 2 0.2576618 0.03766695 0.15497447 0.8211875
-## 3: 1826.25 3 0.2998484 0.06436773 0.20771324 0.8425637
-## 4: 1826.25 4 0.3390664 0.08427149 0.25401067 0.8589590
-## 5: 1826.25 5 0.3699045 0.10650098 0.28284427 0.8689855
## pred_horizon bili mean lwr medn upr
+## 1: 1826.25 1 0.2151663 0.02028479 0.09634648 0.7997269
+## 2: 1826.25 2 0.2576618 0.03766695 0.15497447 0.8211875
+## 3: 1826.25 3 0.2998484 0.06436773 0.20771324 0.8425637
+## 4: 1826.25 4 0.3390664 0.08427149 0.25401067 0.8589590
+## 5: 1826.25 5 0.3699045 0.10650098 0.28284427 0.8689855
using out-of-bag predictions for the training data
pd_train <- orsf_pd_oob(fit, pred_spec = list(bili = 1:5))
pd_train
## pred_horizon bili mean lwr medn upr
-## <num> <num> <num> <num> <num> <num>
-## 1: 1826.25 1 0.2145044 0.01835000 0.09619052 0.7980629
-## 2: 1826.25 2 0.2566241 0.03535358 0.14185734 0.8173143
-## 3: 1826.25 3 0.2984693 0.05900059 0.20515477 0.8334243
-## 4: 1826.25 4 0.3383547 0.07887323 0.24347513 0.8469769
-## 5: 1826.25 5 0.3696260 0.10450534 0.28065473 0.8523756
## pred_horizon bili mean lwr medn upr
+## 1: 1826.25 1 0.2145044 0.01835000 0.09619052 0.7980629
+## 2: 1826.25 2 0.2566241 0.03535358 0.14185734 0.8173143
+## 3: 1826.25 3 0.2984693 0.05900059 0.20515477 0.8334243
+## 4: 1826.25 4 0.3383547 0.07887323 0.24347513 0.8469769
+## 5: 1826.25 5 0.3696260 0.10450534 0.28065473 0.8523756
using predictions for a new set of data
pd_test <- orsf_pd_new(fit,
new_data = pbc_orsf_test,
pred_spec = list(bili = 1:5))
pd_test
## pred_horizon bili mean lwr medn upr
-## <num> <num> <num> <num> <num> <num>
-## 1: 1826.25 1 0.2542230 0.02901386 0.1943767 0.8143912
-## 2: 1826.25 2 0.2955726 0.05037316 0.2474559 0.8317684
-## 3: 1826.25 3 0.3388434 0.07453896 0.3010898 0.8488622
-## 4: 1826.25 4 0.3800254 0.10565022 0.3516805 0.8592057
-## 5: 1826.25 5 0.4124587 0.12292465 0.3915066 0.8690074
## pred_horizon bili mean lwr medn upr
+## 1: 1826.25 1 0.2542230 0.02901386 0.1943767 0.8143912
+## 2: 1826.25 2 0.2955726 0.05037316 0.2474559 0.8317684
+## 3: 1826.25 3 0.3388434 0.07453896 0.3010898 0.8488622
+## 4: 1826.25 4 0.3800254 0.10565022 0.3516805 0.8592057
+## 5: 1826.25 5 0.4124587 0.12292465 0.3915066 0.8690074
in-bag partial dependence indicates relationships that the model has learned during training. This is helpful if your goal is to interpret the model.