Initial gene embedding for chemcpa #158

Open
sepidism opened this issue Jan 22, 2024 · 4 comments

Comments

@sepidism

Hi there, I wanted to double-check something and would appreciate your help. What is the initial embedding for each gene? I see you have "self.genes = torch.Tensor(data.X.A)", which is then passed through the encoder and the rest of your architecture. My concern is that, this way, your model sees the gene expression of the treated cell lines, which could influence your final results. Is that not a concern, or am I missing something here?

@sepidism
Author

Specifically, in evaluate_r2_sc, the input to compute_prediction() is y_true, which means the input and the output are basically the same.

@MxMstrmn
Collaborator

MxMstrmn commented Jan 22, 2024

Hi @sepidism,

I am not sure I understood your question correctly, but the encoder takes in the treated cells, which are then embedded in a disentangled fashion (latent space arithmetic over basal state, perturbation state, and cell state). After training, for evaluation, we check to what extent the model is able to decode the ground truth cell signal by comparing its output to the originally measured gene expression.
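
For readers following along, the latent space arithmetic mentioned above can be illustrated with a minimal NumPy sketch. This is not the chemCPA implementation: random linear maps stand in for the trained encoder and decoder, and every name here (`encode`, `decode`, `drug_emb`, `cell_emb`) and every dimension is a hypothetical choice made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, latent_dim = 50, 8

# Hypothetical stand-ins for trained networks: random linear maps.
W_enc = rng.normal(size=(n_genes, latent_dim))
W_dec = rng.normal(size=(latent_dim, n_genes))


def encode(x):
    """Map gene expression (cells x genes) to a basal latent state."""
    return x @ W_enc


def decode(z):
    """Map a latent state (cells x latent_dim) back to gene expression."""
    return z @ W_dec


# Hypothetical learned embeddings, one row per drug and per cell line.
drug_emb = rng.normal(size=(3, latent_dim))
cell_emb = rng.normal(size=(2, latent_dim))

x = rng.normal(size=(4, n_genes))   # expression of 4 cells
drug_idx = np.array([0, 2, 1, 0])   # drug labels, taken from metadata
cell_idx = np.array([1, 1, 0, 0])   # cell-line labels, taken from metadata

# Additive composition in latent space: basal + perturbation + cell state.
z_basal = encode(x)
z = z_basal + drug_emb[drug_idx] + cell_emb[cell_idx]
x_hat = decode(z)                   # reconstructed expression, 4 x n_genes
```

The point of the sketch is only that the drug and cell-line information enters through embeddings looked up from metadata and added to the basal state, rather than through the expression matrix itself.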

@sepidism
Author

sepidism commented Jan 22, 2024

Hi @MxMstrmn
Sorry for the confusion. My question is: during test/evaluation, is the input again the treated cell? In your code, the input to model.predict() during evaluation in evaluate_r2_sc is the treated cell line.

@MxMstrmn
Collaborator

MxMstrmn commented Mar 4, 2024

Hi @sepidism,

The input to the model is simply the control genes of all cell lines present in the dataset, with no treated cells at all. The treatment is inferred only from the metadata, through the resulting embeddings, which are added in the latent space.
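
A minimal sketch of an evaluation that follows this description, assuming the model receives only control cells plus a drug label and is scored against the mean expression of the measured treated cells. It is not the repository's evaluate_r2_sc: `r2_score`, `evaluate_condition`, and `toy_predict` are hypothetical helpers, and the data are synthetic.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination between two 1-D arrays."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def evaluate_condition(predict_fn, x_control, x_treated, drug_idx):
    """Predict from control cells only, then score against treated means."""
    x_pred = predict_fn(x_control, drug_idx)
    return r2_score(x_treated.mean(axis=0), x_pred.mean(axis=0))


rng = np.random.default_rng(0)
n_genes = 50

# Synthetic ground truth: each drug shifts expression by a fixed vector.
shifts = rng.normal(size=(3, n_genes))


def toy_predict(x, drug_idx):
    """Hypothetical model: control expression plus a per-drug shift."""
    return x + shifts[drug_idx]


x_control = rng.normal(size=(100, n_genes))              # untreated cells
x_treated = rng.normal(size=(80, n_genes)) + shifts[1]   # measured, drug 1

score = evaluate_condition(toy_predict, x_control, x_treated, drug_idx=1)
```

In this setup the treated expression appears only as the scoring target, never as model input, which is the distinction the question above is asking about.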


2 participants