Why are the results of AUCell not centered around 0.5 (for a random "regulon")? #440
Unanswered
scyrusm
asked this question in
* General SCENIC questions
Hi there,
I'm trying to better understand how to interpret the output of AUCell. In particular, if we rank the genes in a cell and plot the rank on the x axis against the cumulative number of regulon genes at or above that rank on the y axis, we get a typical ROC curve, whose area is the AUC. A "regulon" consisting of randomly selected genes would have an AUC of 0.5, the maximally enriched regulon would have an AUC of 1, and the maximally suppressed regulon would have an AUC bounded below by 0. This is equivalent (ignoring the case of ties) to the common language effect size between the rankings of the genes in the regulon and the genes not in the regulon, on a cell-by-cell basis.
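To make the conventional picture concrete, here is a minimal sketch of what I mean, using simulated rankings rather than real data. The function names (`full_ranking_auc`, `common_language_effect_size`) and the simulation sizes are my own illustrative choices and are not part of AUCell or pySCENIC.

```python
import numpy as np

def full_ranking_auc(ranks, in_regulon):
    """Area under the ROC-style recovery curve over the complete ranking.

    ranks: 0-based rank of every gene in one cell (0 = top of the ranking).
    in_regulon: boolean mask marking the genes that belong to the regulon.
    """
    order = np.argsort(ranks)                  # walk genes from best to worst rank
    hits = in_regulon[order]
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~hits) / (~hits).sum()])
    # Trapezoidal rule written out by hand to avoid version-specific NumPy helpers.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def common_language_effect_size(ranks, in_regulon):
    """P(a random regulon gene is ranked above a random non-regulon gene)."""
    inside = ranks[in_regulon]
    outside = ranks[~in_regulon]
    return float(np.mean(inside[:, None] < outside[None, :]))

rng = np.random.default_rng(0)
n_genes, n_regulon, n_cells = 2000, 50, 200    # illustrative sizes only

in_regulon = np.zeros(n_genes, dtype=bool)
in_regulon[rng.choice(n_genes, size=n_regulon, replace=False)] = True

# Random "regulon": every cell gets an independent random ranking of the genes.
random_aucs = []
for _ in range(n_cells):
    ranks = rng.permutation(n_genes)
    random_aucs.append(full_ranking_auc(ranks, in_regulon))
mean_random_auc = float(np.mean(random_aucs))
last_cles = common_language_effect_size(ranks, in_regulon)

# Maximally enriched regulon: its genes occupy the very top of the ranking.
enriched_ranks = np.empty(n_genes, dtype=int)
enriched_ranks[in_regulon] = np.arange(n_regulon)
enriched_ranks[~in_regulon] = np.arange(n_regulon, n_genes)
enriched_auc = full_ranking_auc(enriched_ranks, in_regulon)

# Maximally suppressed regulon: reverse the ranking so its genes are at the bottom.
suppressed_auc = full_ranking_auc(n_genes - 1 - enriched_ranks, in_regulon)

print("mean AUC, random regulon:", mean_random_auc)
print("AUC vs. effect size (last cell):", random_aucs[-1], last_cles)
print("enriched / suppressed:", enriched_auc, suppressed_auc)
```

With no ties in the ranking, the AUC and the common language effect size agree for each cell, and the random regulon averages out close to 0.5.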
I suspect that this is due to the parameters `--rank_threshold` and `--auc_threshold`, but I am not sure. In practice, the output of AUCell seems to mostly be between 0 and 0.5, including for genes that seem to be "enriched." And, from what I can tell, downstream analysis seems to suggest using the AUC more as a summary statistic (for example, by using mixture models to binarize the distribution of the AUC).
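To illustrate my suspicion, here is a rough sketch of an AUC restricted to the top fraction of the ranking. It rests on two assumptions of mine that I have not verified against the AUCell source: that only genes ranked above `auc_threshold * n_genes` contribute, and that the area is normalized by the largest area attainable inside that window. The function name `truncated_auc` is hypothetical.

```python
import numpy as np

def truncated_auc(ranks, in_regulon, auc_threshold=0.05):
    """Recovery-curve area over the top fraction of the ranking only.

    Assumption: normalization by the maximum attainable area in the window,
    which may not match the exact normalization AUCell uses.
    """
    n_genes = ranks.size
    rank_cutoff = int(round(auc_threshold * n_genes))
    regulon_ranks = ranks[in_regulon]
    recovered = regulon_ranks[regulon_ranks < rank_cutoff]
    # A regulon gene at rank r keeps the curve one step higher for the
    # remaining (rank_cutoff - r) positions inside the window.
    area = np.sum(rank_cutoff - recovered)
    max_area = rank_cutoff * in_regulon.sum()
    return float(area / max_area)

rng = np.random.default_rng(0)
n_genes, n_regulon, n_cells = 2000, 50, 200    # illustrative sizes only

in_regulon = np.zeros(n_genes, dtype=bool)
in_regulon[rng.choice(n_genes, size=n_regulon, replace=False)] = True
cells = [rng.permutation(n_genes) for _ in range(n_cells)]

# Random regulon, scored on the top 5% of the ranking vs. the whole ranking.
mean_truncated = float(np.mean([truncated_auc(r, in_regulon, 0.05) for r in cells]))
mean_full = float(np.mean([truncated_auc(r, in_regulon, 1.0) for r in cells]))

print("random regulon, top 5% only:", mean_truncated)
print("random regulon, full ranking:", mean_full)
```

Under these assumptions, a random regulon scores far below 0.5 when only the top 5% of the ranking is used, and returns to roughly 0.5 when the window covers the whole ranking, which would be consistent with what I see in practice.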
It seems that this doesn't agree with the more conventional definition of the AUC/ROC (here). Why was this chosen? And, given that, how do we interpret the outputs of pyscenic aucell?
EDIT:
As an added note, the following code from ctxcore leads to the perverse situation where setting `auc_threshold` to 1 will lead to an assertion error: if `auc_threshold` were 1, `rank_cutoff` will be equal to `total_genes`, but `rank_threshold` has been set to `total_genes - 1`. So `rank_threshold` can at most be `total_genes - 1`, but then in the final line, `rank_cutoff` is decremented by 1. It seems that there was a double attempt to fix the 0- vs 1-indexing discrepancy between Python and R, leading to an off-by-one error...
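To show the failure mode in isolation, here is a paraphrase of the logic as I described it, not a copy of the ctxcore source. The function name `derive_rank_cutoff_sketch` is hypothetical, and I am assuming that the default `rank_threshold` is `total_genes - 1` and that the final decrement is by 1.

```python
def derive_rank_cutoff_sketch(auc_threshold, total_genes, rank_threshold=None):
    """Hypothetical reconstruction of the cutoff logic described in this post."""
    if rank_threshold is None:
        # Assumed default: the last valid 0-based rank.
        rank_threshold = total_genes - 1

    rank_cutoff = int(round(auc_threshold * total_genes))
    # With auc_threshold == 1 this compares total_genes against total_genes - 1.
    assert 0 < rank_cutoff <= rank_threshold, "rank cutoff exceeds rank threshold"

    # Second indexing correction: shift the cutoff to 0-based indexing.
    rank_cutoff -= 1
    return rank_cutoff

print(derive_rank_cutoff_sketch(0.05, 1000))

try:
    derive_rank_cutoff_sketch(1.0, 1000)
except AssertionError as err:
    print("auc_threshold=1 fails:", err)
```

In this sketch the assertion fires before the decrement is applied, so the largest usable cutoff is one lower than the check would suggest.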