upload data: improve filtering definitions and report excluded data #458

Closed
mauromiguelm opened this issue Jun 12, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@mauromiguelm
Contributor

mauromiguelm commented Jun 12, 2023

  • Make it clearer which genes are filtered out
  • Output the list of filtered genes and place it under dataview > resources

[Note: this was requested by a user who is often asked why a certain favorite gene cannot be found in the results on OPG, and who too often has to check manually why a gene was filtered out.]

@mauromiguelm added the "enhancement" (New feature or request) label on Jun 12, 2023
@mauromiguelm
Contributor Author

Not relevant for now; needs further discussion.

@mauromiguelm
Contributor Author

Reopened, as this will increase transparency. Under dataview > resources, we could list the removed genes and the reasons for their removal, along with other related dataset statistics.

@mauromiguelm changed the title from "upload data: improve filtering definitions and default options" to "upload data: improve filtering definitions and report excluded data" on Aug 10, 2023
@ivokwee
Member

ivokwee commented Aug 24, 2023

There is already a slot in pgx named "filtered" (pgx$filtered), which contains the names of the genes/features that are filtered out by the low.expressed filter. We could add more filter results, such as "non-coding" or "not known gene", and use this slot for reporting.
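To make the reporting idea concrete, here is a minimal sketch in R of how such a slot could be turned into a per-gene report. It assumes that pgx$filtered is a named list with one character vector of feature names per filter; the actual layout of the slot may differ. The toy object, the extra "non-coding" entry, the gene names, and the helper functions `filtered_report()` and `why_filtered()` are all hypothetical and not part of the existing code base.

```r
# Toy stand-in for a pgx object. The layout of `filtered` (a named list,
# one character vector per filter) is an assumption for illustration.
pgx <- list(
  filtered = list(
    "low.expressed" = c("GENE_A", "GENE_B"),
    "non-coding"    = c("GENE_C")  # hypothetical additional filter
  )
)

# Flatten the per-filter vectors into one table: one row per excluded feature.
filtered_report <- function(pgx) {
  reasons <- pgx$filtered
  if (length(reasons) == 0) {
    return(data.frame(feature = character(0), reason = character(0),
                      stringsAsFactors = FALSE))
  }
  data.frame(
    feature = unlist(reasons, use.names = FALSE),
    reason  = rep(names(reasons), lengths(reasons)),
    stringsAsFactors = FALSE
  )
}

# Look up why a single gene is missing from the results.
# Returns character(0) if the gene was not removed by any recorded filter.
why_filtered <- function(pgx, gene) {
  report <- filtered_report(pgx)
  report$reason[report$feature == gene]
}

report <- filtered_report(pgx)
print(report)
print(why_filtered(pgx, "GENE_C"))
```

A table like `report` could then be written out (for example with `write.csv()`) and offered under dataview > resources, so that users can check for themselves why a gene is absent.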

The other thing is that pgx$counts generally holds unfiltered values close to the original input matrix. I think all-zero rows are deleted to save memory, but all other genes are retained. It is the pgx$X matrix (log-transformed, normalized) that is filtered, and thus smaller; genes are dropped from it either for statistical reasons or because they are not of interest.
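Given that relationship, the excluded genes and some basic statistics could in principle be derived by comparing the row names of the two matrices. The sketch below assumes that both matrices are features x samples and share the same feature identifiers as row names, which may not hold if features are renamed or collapsed during processing. The toy data and the helper `excluded_summary()` are hypothetical. All-zero rows that were already dropped from pgx$counts would not show up in this comparison.

```r
# Toy matrices standing in for pgx$counts and pgx$X (features x samples).
counts <- matrix(
  c(10, 12,
     0,  1,
   250, 300,
    40, 35),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C", "GENE_D"), c("s1", "s2"))
)
pgx <- list(
  counts = counts,
  # Pretend GENE_B was removed by a filter before log transformation.
  X = log2(counts[c("GENE_A", "GENE_C", "GENE_D"), ] + 1)
)

# Compare row names to find features present in counts but absent from X.
excluded_summary <- function(pgx) {
  excluded <- setdiff(rownames(pgx$counts), rownames(pgx$X))
  list(
    excluded          = excluded,
    n_input           = nrow(pgx$counts),
    n_retained        = nrow(pgx$X),
    fraction_excluded = length(excluded) / nrow(pgx$counts)
  )
}

stats <- excluded_summary(pgx)
str(stats)
```

This only reports which features were excluded, not why; the reasons would still have to come from a slot such as pgx$filtered.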

@ivokwee
Copy link
Member

ivokwee commented Sep 6, 2024

Archiving as a stale issue.

@ivokwee closed this as completed on Sep 6, 2024

2 participants