
Different results Mixscape from PertPy vs Seurat #679

Open
albacorp opened this issue Nov 12, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@albacorp

Hello,
We (@jameshyojaelee and I) are running pertpy's Mixscape and Seurat's Mixscape with the same parameters, and we are getting very different results: pertpy calls more perturbed cells than Seurat, which would be ideal, but we want to be sure this is real. Could you help us out, please?

Or, at the least, help us understand what the differences between Seurat's Mixscape code and pertpy's Mixscape are, if there are any.

Please find our code attached, along with examples of the different guide barplots we get when running PertPy vs. Seurat:

241112_Scanpy_vs._Seurat.txt
guide_barplot_PertPy.pdf
guide_barplot_Seurat.pdf

Version information

No response

@albacorp albacorp added the bug Something isn't working label Nov 12, 2024
@Zethson Zethson changed the title from "Different results Mixscape from PertPy vs Seurat - Help" to "Different results Mixscape from PertPy vs Seurat" Nov 12, 2024
@Zethson
Member

Zethson commented Nov 13, 2024

Thanks for reporting! We will have a look.

@Lilly-May
Collaborator

Hi @albacorp! Could you provide more information on the data you're using, please? Are you loading it via the pertpy dataloader? If not, it would be great if you could share the data with me. If sharing it publicly isn't possible, feel free to reach out to me at lilly.may@tum.de.
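
For reference, loading via the pertpy dataloader looks like this; a minimal sketch using the `papalexi_2021` example dataset from the Mixscape tutorial, purely for illustration:

```python
import pertpy as pt

# The pertpy dataloader returns a ready-made MuData object with the
# individual modalities; papalexi_2021 is the ECCITE-seq example dataset
# used in the Mixscape tutorial.
mdata = pt.dt.papalexi_2021()
print(mdata)
```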

@jameshyojaelee

Hello @Zethson and @Lilly-May, thank you so much for such a quick response. We've been enjoying your PertPy package a lot!
@albacorp generated the dataset from her experiment on the 10x platform. We used Cell Ranger multi to generate the h5 matrix, which contains both the RNA and the CRISPR guide data. Below is the code I wrote to import the data.

If this is not enough info, I can also subsample one of our datasets and email it to you. Thank you!

```python
import anndata as ad
import muon as mu

samples = {
    "Sample1": "/count/filtered_feature_bc_matrix.h5",
    "Sample2": "/count/filtered_feature_bc_matrix.h5"
}

rna_dict = {}
crispr_dict = {}

for sample_id, filepath in samples.items():
    sample_data = mu.read_10x_h5(filepath)
    rna_data = sample_data.mod['rna']
    crispr_data = sample_data.mod['CRISPR Guide Capture']
    # Append the sample ID to each barcode so obs_names stay unique across samples
    rna_data.obs_names = [f"{barcode}_{sample_id}" for barcode in rna_data.obs_names]
    crispr_data.obs_names = [f"{barcode}_{sample_id}" for barcode in crispr_data.obs_names]
    rna_data.var_names_make_unique()
    crispr_data.var_names_make_unique()
    rna_data.obs['sample'] = sample_id
    crispr_data.obs['sample'] = sample_id
    rna_dict[sample_id] = rna_data
    crispr_dict[sample_id] = crispr_data

rna_combined = ad.concat(rna_dict, label='sample', join='outer')
crispr_combined = ad.concat(crispr_dict, label='sample', join='outer')
combined_mdata = mu.MuData({"rna": rna_combined, "gdo": crispr_combined})
combined_mdata.write("combined_mdata.h5mu")
```
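
For the subsampling mentioned above, something like the following should work; this is only a sketch, assuming scanpy's `sc.pp.subsample` and an illustrative cell count:

```python
import scanpy as sc

# Downsample the RNA modality to ~5,000 cells (illustrative number) so the
# object is small enough to share; random_state makes the subsample reproducible.
sc.pp.subsample(combined_mdata.mod["rna"], n_obs=5000, random_state=0)

# Keep the guide modality aligned with the remaining RNA barcodes.
combined_mdata.mod["gdo"] = combined_mdata.mod["gdo"][
    combined_mdata.mod["rna"].obs_names
].copy()
combined_mdata.update()
combined_mdata.write("combined_mdata_subsampled.h5mu")
```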

@Zethson
Member

Zethson commented Nov 13, 2024

Thank you!

I'm afraid we do need a downsampled version of the dataset with which you can reproduce the issue described above. We have done extensive tests on other datasets and verified that the R and Python implementations produce very similar results, so your dataset is essential for debugging this.

@jameshyojaelee

Got it! We just sent a sample matrix file to lilly.may@tum.de. Thank you!

@Lilly-May
Collaborator

Hi @albacorp and @jameshyojaelee! I finally found the time to look into this (sorry it took me so long!). Unfortunately, I can't fully reproduce your steps, as there seems to be some part of the code missing. For example, in the original .txt file you sent, you call `pt.tl.Mixscape.perturbation_signature` on the perturbation column and split by sample. However, as far as I can see, these columns are neither generated during preprocessing nor present in the data excerpt you shared with me.

That said, while reviewing your code, I noticed the following: you run pertpy's Mixscape implementation on scaled data. Although the data is also scaled in Seurat's implementation, note that the perturbation signature is calculated from unscaled data in Seurat's version (`combined <- CalcPerturbSig(..., slot = "data", ...)`). I tested this on another dataset and found that running pertpy's Mixscape on scaled data produced quite different results compared to running Seurat's Mixscape on unscaled data, which might explain why you get different results. Keep in mind that PCA is indeed calculated on scaled data in Seurat's implementation, so you might still want to scale the data for PCA with pertpy/scanpy, but consider storing the scaled data in a separate layer.
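
To make this concrete, here is a minimal sketch of that suggestion on the pertpy side. The column names (`perturbation`, `gene_target`), the control label `NT`, the preprocessing, and a recent scanpy where `sc.pp.pca` accepts a `layer` argument are all assumptions for illustration, not taken from the original code:

```python
import scanpy as sc
import pertpy as pt

adata = combined_mdata.mod["rna"]

# Keep unscaled (e.g. log-normalized) values in .X and store a scaled copy
# in a separate layer that is used only for PCA, mirroring Seurat's behavior.
adata.layers["scaled"] = sc.pp.scale(adata.X, copy=True)
sc.pp.pca(adata, layer="scaled")

# Compute the perturbation signature on the unscaled data in .X, analogous
# to Seurat's CalcPerturbSig(..., slot = "data", ...).
ms = pt.tl.Mixscape()
ms.perturbation_signature(
    adata,
    pert_key="perturbation",  # assumed column holding perturbation labels
    control="NT",             # assumed label of non-targeting control cells
    split_by="sample",
)

# Run Mixscape on the resulting X_pert layer.
ms.mixscape(adata=adata, control="NT", labels="gene_target", layer="X_pert")
```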

If you'd like me to take a closer look, it would be helpful if you could send the entire code you used, from the raw data all the way to the Mixscape results (by email is also fine if you don't want to share it publicly). Additionally, please try adjusting the scaling on your end and report back whether that resolves the issue.

One final note: some differences between Mixscape results are expected due to several non-deterministic steps in the function. Unless you specify a seed, the results might vary slightly between runs, even for the same implementation. However, these differences should generally be minimal.

@albacorp
Author

albacorp commented Dec 9, 2024 via email

@Zethson
Member

Zethson commented Dec 9, 2024

Dear @albacorp,

> We don't want to add more work on your end

No, we are actually very interested in making 100% sure that our results match expectations, so please do not be afraid to add more work to our pile :)
Tiny deviations are expected because the R and Python numerical calculations differ slightly, but the overall results simply must be very comparable.
