diff --git a/articles/aorsf.html b/articles/aorsf.html
index f723da52..f3961c84 100644
--- a/articles/aorsf.html
+++ b/articles/aorsf.html
@@ -136,10 +136,10 @@
orsf_vi_negate(pbc_fit)
-#> bili copper stage age protime sex
-#> 0.114050616 0.058879090 0.046094524 0.038011836 0.028853966 0.020874091
-#> trt platelet edema albumin ascites alk.phos
-#> 0.017129950 0.013471420 0.012868400 0.012394794 0.010231183 0.009874770
-#> spiders trig ast chol hepato
-#> 0.005304760 0.005121708 0.003857380 0.001530643 -0.006896236
+#> bili copper age protime spiders
+#> 0.1168221744 0.0640918012 0.0318717527 0.0295703184 0.0199482278
+#> ascites stage trig ast hepato
+#> 0.0145030496 0.0138362817 0.0093934850 0.0081600305 0.0081045745
+#> edema albumin trt chol platelet
+#> 0.0074171879 0.0070565813 0.0049965458 0.0043845830 0.0007886543
+#> sex alk.phos
+#> -0.0023614972 -0.0040932561
You can also compute variable importance using
@@ -235,12 +237,14 @@
orsf_vi_permute(pbc_fit)
-#> bili stage copper age albumin chol
-#> 0.081386482 0.027461497 0.026676054 0.024209679 0.018792662 0.012066683
-#> alk.phos ast trig spiders ascites platelet
-#> 0.009972421 0.009671777 0.006336094 0.004536710 0.004102601 0.003379097
-#> trt sex edema protime hepato
-#> 0.001788329 0.000976518 -0.000169104 -0.002325850 -0.007547886
+#> bili copper age ascites albumin
+#> 0.0681612536 0.0264039589 0.0154990015 0.0145135549 0.0128863883
+#> ast spiders stage edema protime
+#> 0.0112889819 0.0042083643 0.0036260906 0.0031464934 0.0029252926
+#> trt platelet sex hepato chol
+#> -0.0002451595 -0.0002523982 -0.0005419264 -0.0010103185 -0.0012341940
+#> alk.phos trig
+#> -0.0033725370 -0.0039837212
A faster alternative to permutation and negation importance is
@@ -249,12 +253,12 @@
orsf_vi_anova(pbc_fit)
-#> ascites bili copper ast albumin stage edema
-#> 0.44444444 0.40000000 0.33333333 0.26666667 0.26666667 0.25000000 0.24047619
-#> age chol sex spiders protime alk.phos platelet
-#> 0.19047619 0.18750000 0.18181818 0.16666667 0.15789474 0.13043478 0.12500000
-#> hepato trig trt
-#> 0.12500000 0.06666667 0.04761905
+#> ascites copper albumin bili edema age hepato
+#> 0.50000000 0.41176471 0.35294118 0.35294118 0.29417989 0.26315789 0.23529412
+#> spiders protime chol stage alk.phos ast platelet
+#> 0.21428571 0.21052632 0.16666667 0.13333333 0.06250000 0.05263158 0.04545455
+#> trig sex trt
+#> 0.04545455 0.00000000 0.00000000
control
# control_fast() is much faster
time_net['elapsed'] / time_fast['elapsed']
#> elapsed
-#> 56.9
+#> 53.4
n_thread
@@ -155,10 +155,10 @@ n_thread
#> N trees: 5
#> N predictors total: 17
#> N predictors per node: 5
-#> Average leaves per tree: 22
+#> Average leaves per tree: 21.6
#> Min observations in leaf: 5
#> Min events in leaf: 1
-#> OOB stat value: 0.77
+#> OOB stat value: 0.78
#> OOB stat type: Harrell's C-index
#> Variable importance: anova
#>
@@ -206,7 +206,7 @@
The out-of-bag estimate of 1 (the default method to evaluate
-out-of-bag predictions) is 0.7490789.
+out-of-bag predictions) is 0.7923697.
(formula) Two sided formula with a single outcome.
+The terms on the right are names of predictor variables, and the
+symbol '.' may be used to indicate all variables in the data
+except the response. The symbol '-' may also be used to indicate
+removal of a predictor. Details on the response vary depending
+on forest type:
Survival: The response should include a time variable,
+followed by a status variable, and may be written inside a
+call to Surv (see examples).
Classification: The response should be a single variable,
+and that variable should have type factor
in data
.
Regression: The response should be a single variable, and
+that variable should have type double
or integer
with at
+least 10 unique numeric values in data
.
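As a rough illustration of the '.' and '-' conventions described above, base R's terms() can expand a formula against a data frame. The data frame and its column names below are made up for the example, and the sketch only shows the general formula syntax; it is not necessarily how orsf processes the formula internally.

```r
# Hypothetical toy data; the column names are assumptions for illustration
toy <- data.frame(
  id     = 1:4,
  time   = c(5, 8, 12, 3),
  status = c(1, 0, 1, 1),
  age    = c(50, 61, 47, 58),
  bili   = c(1.1, 0.8, 3.2, 2.4)
)

# '.' stands for every column except those in the response; '- id' drops id.
# terms() does not evaluate Surv(), so the survival package is not needed here.
f <- Surv(time, status) ~ . - id

# Predictor names that remain after expanding '.' and applying '- id'
predictors <- attr(terms(f, data = toy), "term.labels")
print(predictors)
```

The same formula shape (time variable first, status variable second, optionally wrapped in Surv) matches the survival case described above.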
(orsf_control) An object returned from one of the
-orsf_control
functions:
orsf_control_fast (the default) uses a single iteration of Newton
-Raphson scoring to identify a linear combination of predictors.
orsf_control_cph uses Newton Raphson scoring until a convergence
-criteria is met.
orsf_control_net uses glmnet
to identify linear combinations of
-predictors, similar to Jaeger (2019).
orsf_control_custom allows the user to apply their own function
-to create linear combinations of predictors.
orsf_control
functions: orsf_control_survival,
+orsf_control_classification, and orsf_control_regression. If
+NULL
(the default), an accelerated control is used, which is the
+fastest available option. For survival and classification, this is
+Cox and logistic regression with 1 iteration, and for regression
+it is ordinary least squares.
(numeric vector) Optional. If given, this input should
-have length equal to nrow(data)
. Values in weights
are treated like
-replication weights, i.e., a value of 2 is the same thing as having 2
-observations in data
, each containing a copy of the corresponding
-person's data.
nrow(data)
for complete or imputed data and should
+have length equal to nrow(na.omit(data))
if na_action
is "omit"
.
+Values in weights
are treated like replication weights, i.e., a value
+of 2 is the same thing as having 2 observations in data
, each
+containing a copy of the corresponding person's data.
Use weights
cautiously, as orsf
will count the number of
observations and events prior to growing a node for a tree, so higher
-values in weights
will lead to deeper trees.
weights
will lead to deeper trees. If you use this
+input, it is highly recommended you scale the weights so that
+sum(weights) == nrow(data)
, as this will help make tree depth
+consistent with the default weights = rep(1, nrow(data))
TRUE
.
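The weight-scaling recommendation above can be sketched in base R. The raw weights here are invented for the example, and n_obs stands in for nrow(data); this is a minimal sketch of the rescaling step, not code taken from the package.

```r
# Hypothetical raw weights, one per row of a data set with 6 rows
raw_weights <- c(2, 1, 1, 4, 1, 3)
n_obs <- length(raw_weights)  # stands in for nrow(data)

# Rescale so the weights sum to the number of rows while
# keeping their relative sizes unchanged
scaled_weights <- raw_weights * n_obs / sum(raw_weights)

# Compare with a tolerance, since exact == can fail for floating point sums
stopifnot(isTRUE(all.equal(sum(scaled_weights), n_obs)))
print(scaled_weights)
```

The rescaled vector could then be supplied as the weights input, so the total weight matches the default of one per observation.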
@@ -296,9 +332,7 @@