upload data: improve filtering definitions and report excluded data #458
Comments
Irrelevant for now; needs further discussion.
Reopened, as this will increase transparency. In dataview > resources, we could add the genes removed and the reasons for removal, along with other related dataset statistics.
There is already a slot in pgx named "filtered" (pgx$filtered), which contains the names of the genes/features that are filtered out by the low.expressed filter. We may add more filter results, such as "non-coding" or "not known gene", and use this slot for reporting. The other thing is that pgx$counts generally consists of non-filtered values close to the original input matrix. I think rows that are completely zero are deleted to save memory, but all other genes are retained. It is the pgx$X matrix (log-transformed, normalized) that is filtered (and thus smaller), either for statistical reasons or because the features are not of interest.
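To illustrate the idea of recording filter results per reason, here is a minimal Python sketch. It is not the project's R code: the function names `apply_filters` and `why_filtered`, the toy counts, the threshold of 10, and the `known_coding` annotation set are all hypothetical. Only the reason labels "low.expressed" and "non-coding" come from the comment above. The returned `filtered` dict plays the role that pgx$filtered would play.

```python
from collections import defaultdict


def apply_filters(counts, filters):
    """Split genes into kept and filtered, remembering why each was removed.

    counts  : dict mapping gene name -> list of raw counts (one per sample)
    filters : dict mapping reason -> predicate(gene, values) returning True
              when the gene should be removed for that reason
    """
    kept = {}
    filtered = defaultdict(list)
    for gene, values in counts.items():
        # Record only the first filter (in dict order) that removes the gene.
        reason = next(
            (r for r, drop in filters.items() if drop(gene, values)), None
        )
        if reason is None:
            kept[gene] = values
        else:
            filtered[reason].append(gene)
    return kept, dict(filtered)


def why_filtered(gene, filtered):
    """Return the recorded reason a gene was removed, or None if it was kept."""
    for reason, genes in filtered.items():
        if gene in genes:
            return reason
    return None


# Hypothetical toy data, for illustration only.
counts = {
    "TP53": [120, 98, 143],
    "GENE_LOW": [0, 1, 0],
    "LINC0001": [50, 60, 55],
}
known_coding = {"TP53", "GENE_LOW"}  # assumed annotation, not from the source

filters = {
    "low.expressed": lambda g, v: sum(v) < 10,  # assumed threshold
    "non-coding": lambda g, v: g not in known_coding,
}

kept, filtered = apply_filters(counts, filters)
print(why_filtered("LINC0001", filtered))
```

With a structure like this, a report could list each reason with its gene names, and a user could look up a single gene directly instead of checking by hand.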
Archiving as stale issue.
[Note: this was a request from a user, who is often asked why a certain favorite gene cannot be found in the results on OPG. She too often had to check manually why a gene was filtered out.]