From 105560c86129edf343879e9b68936f91ef48aa85 Mon Sep 17 00:00:00 2001
From: Casey Greene
Date: Wed, 6 Nov 2024 12:17:25 -0700
Subject: [PATCH] abstract -> 150 words

---
 content/01.abstract.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/content/01.abstract.md b/content/01.abstract.md
index f05d0ba..0fd8690 100644
--- a/content/01.abstract.md
+++ b/content/01.abstract.md
@@ -18,10 +18,9 @@ If the goal is to achieve robust performance across contexts or datasets, whenev
 Guidelines in statistical modeling for genomics hold that simpler models have advantages over more complex ones.
 Potential advantages include cost, interpretability, and improved generalization across datasets or biological contexts.
-Gene signatures in cancer transcriptomics tend to include small subsets of genes for these reasons, and algorithms for defining signatures are often designed with these ideas in mind.
-To directly test the latter assumption, that small gene signatures generalize better, we examined the generalization of mutation status prediction models across datasets (from cell lines to human tumors and vice-versa) and biological contexts (holding out entire cancer types from pan-cancer data).
-We compared two simple procedures for model selection, one that exclusively relies on cross-validation performance and one that combines cross-validation performance with regularization strength.
+We directly tested the assumption that small gene signatures generalize better by examining the generalization of mutation status prediction models across datasets (from cell lines to human tumors and vice-versa) and biological contexts (holding out entire cancer types from pan-cancer data).
+We compared model selection between solely cross-validation performance and combining cross-validation performance with regularization strength.
 We did not observe that more regularized signatures generalized better.
 This result held across both generalization problems and for both linear models (LASSO logistic regression) and non-linear ones (neural networks).
-When the goal of an analysis is to produce generalizable predictive models, we recommend choosing the ones that perform best on held-out data or in cross-validation, instead of those that are smaller or more regularized.
+When the goal of an analysis is to produce generalizable predictive models, we recommend choosing the ones that perform best on held-out data or in cross-validation instead of those that are smaller or more regularized.
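
For readers of this patch, a minimal sketch of the two model selection rules the revised abstract contrasts: picking the LASSO logistic regression model with the best cross-validation performance versus additionally favoring stronger regularization (a smaller gene signature). This is an illustration only, not the repository's analysis code; the synthetic data, the `C` grid, and the use of a one-standard-error-style rule to "combine cross-validation performance with regularization strength" are assumptions.

```python
# Hypothetical illustration of the two selection rules; not the paper's actual pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a gene expression matrix with binary mutation labels.
X, y = make_classification(n_samples=500, n_features=1000, n_informative=20,
                           random_state=0)

# Candidate regularization strengths: smaller C = stronger L1 penalty = smaller signature.
c_grid = np.logspace(-3, 2, 11)

mean_scores, se_scores = [], []
for c in c_grid:
    model = LogisticRegression(penalty="l1", solver="liblinear", C=c, max_iter=5000)
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    mean_scores.append(scores.mean())
    se_scores.append(scores.std(ddof=1) / np.sqrt(len(scores)))
mean_scores, se_scores = np.array(mean_scores), np.array(se_scores)

# Rule 1: choose the model with the best mean cross-validation performance.
best_idx = int(np.argmax(mean_scores))
best_c = c_grid[best_idx]

# Rule 2 (one-standard-error-style): among models whose performance is within one
# standard error of the best, choose the most regularized one (smallest C).
threshold = mean_scores[best_idx] - se_scores[best_idx]
most_regularized_c = c_grid[mean_scores >= threshold].min()

print(f"best-CV C: {best_c:.3g}, most-regularized-within-1-SE C: {most_regularized_c:.3g}")
```

Under the abstract's recommendation, rule 1 (best held-out or cross-validation performance) would be preferred when the goal is a generalizable predictive model, rather than the smaller, more regularized model rule 2 selects.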