Clarifications about LINCS data / evaluation #159

Open
rvinas opened this issue Feb 2, 2024 · 2 comments

@rvinas

rvinas commented Feb 2, 2024

Hi, I would be grateful if you could clarify the following points regarding LINCS data:

  1. The README file mentions that you used data from Phase I and Phase II. What level (1-5)?
  2. Apart from cell line and compound information (perturbation ID and dose), did you consider any other variables when modeling this data?
  3. How did you compute the differentially expressed genes? Did you compute separate ranks for each cell line?
  4. In terms of evaluation, the baseline in the paper employs the expression decoded from the basal state (i.e. excluding perturbation information). Does this baseline preserve the ground-truth cell line information when decoding the basal state or is this information also removed?
  5. Did you consider computing the R2 scores between the raw control data (i.e. no autoencoder involved) and the ground-truth post-perturbation profiles, as a reference point for chemCPA's predictions?

Any clarification on these points would be greatly appreciated.

@rvinas
Author

rvinas commented Feb 14, 2024

@MxMstrmn, I would appreciate your answers to the questions above.

@MxMstrmn
Collaborator

MxMstrmn commented Mar 4, 2024

Hi @rvinas,

  1. We use level 2 (the GEX equivalent).
  2. No, for the LINCS data we only considered compound, dosage, and cell line information.
  3. We approximated the differentially expressed genes in this part of the notebook: 1_lincs.py, L93-L110.
  4. The baseline is the composition of basal state + cell line information. Effectively, we simply check how similar the control distribution is to the perturbed state.
  5. No, I did not make this check myself but relied on the original analysis in the Sci-Plex data.
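
For readers following points 4 and 5, the raw-control check asked about in the question could be sketched roughly as below. This is a minimal illustration, not chemCPA's evaluation code: the function names (`r2_score`, `gene_means`, `control_baseline_r2`), the toy expression values, and the choice to compare per-gene means are all assumptions made here. It computes R2 between the mean control profile and the mean ground-truth perturbed profile, optionally restricted to a subset of gene indices such as the differentially expressed genes.

```python
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def gene_means(profiles):
    """Per-gene mean over a list of samples (each sample is a list of genes)."""
    return [fmean(column) for column in zip(*profiles)]


def control_baseline_r2(control, perturbed, gene_idx=None):
    """R2 of the mean control profile used as a 'prediction' of the
    mean perturbed profile. gene_idx (hypothetical) restricts the score
    to a subset of genes, e.g. differentially expressed ones."""
    ctrl_mean = gene_means(control)
    pert_mean = gene_means(perturbed)
    if gene_idx is not None:
        ctrl_mean = [ctrl_mean[i] for i in gene_idx]
        pert_mean = [pert_mean[i] for i in gene_idx]
    return r2_score(pert_mean, ctrl_mean)


# Toy data (made up): 2 samples x 4 genes, only the last gene responds.
control = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
perturbed = [[1.0, 2.0, 3.0, 8.0], [1.0, 2.0, 3.0, 8.0]]

r2_all = control_baseline_r2(control, perturbed)
r2_deg = control_baseline_r2(control, perturbed, gene_idx=[2, 3])
print(f"all genes: {r2_all:.3f}, DEG subset: {r2_deg:.3f}")
```

In a sketch like this, the score over all genes can look deceptively good when most genes barely change, which is why a baseline restricted to the differentially expressed genes is often the more informative comparison.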
