Downsampling output missing #1731

lauratwomey · 2024-07-31T02:10:14Z


Hi!

First of all, thank you very much for developing MiXCR, it's a fantastic tool and the documentation is great!
I'm relatively new to it so hopefully I'm not asking something obvious.

I would like to compare several statistics (# clonotypes, # IGH sequences, # TRA, etc.) across bulk RNA-seq samples. I've run the analysis with the rnaseq preset, but coverage differs quite a lot between samples, so I thought the downsampling function might be the best way to make the samples comparable.

When I run downsampling, the samples appear to be downsampled correctly, but I don't get the stats I need.

What I get

.clns file (downsampled)
summary_downsampling.csv: contains nElements before and after (corresponding to the number of clones) as well as the weights before and after. nElements after equals the number of clones, but I was hoping for more stats.

What I would like to get

The align and assemble reports (.txt), just like when running MiXCR normally.

Note that if I run mixcr exportReports on the downsampled .clns files, I get exactly the same numbers as with the original files.
I've also tried mixcr exportClones on the downsampled .clns files, but the output isn't exactly what I'm looking for.

Is there something I'm missing? Thank you so much in advance!


mizraelson · 2024-08-02T22:46:49Z

Collaborator

Hi Laura,
Thank you so much for the kind words about MiXCR. We are doing our best.

The downsample function does not re-align or re-assemble anything. It takes an already processed sample (a .clns file with clones) and downsamples it based on criteria selected by the user (for example, randomly picking a certain number of reads from the clonotype table, NOT from the initial fastq file). That is why the align and assemble reports do not change. What does change is shown in summary_downsampling.csv: the number of clones (nElements) and the number of reads in clones (sumWeight) before and after downsampling.
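
To make the idea concrete, here is a minimal Python sketch of read-level downsampling applied to a clonotype table. It is only an illustration of the principle described above, not MiXCR's actual implementation: the function names, the toy table and the fixed target of 100 reads are all assumptions made for the example.

```python
import random
from collections import Counter


def downsample_reads(clone_reads, target_reads, seed=0):
    """Randomly keep `target_reads` reads from a clonotype table.

    `clone_reads` maps a clone id to its read count. Reads are drawn
    without replacement, so rare clones may disappear entirely.
    (Illustrative only; not MiXCR's implementation.)
    """
    total = sum(clone_reads.values())
    if target_reads > total:
        raise ValueError("target exceeds the number of reads in clones")
    rng = random.Random(seed)
    # Expand the table to one entry per read, then sample from that pool.
    pool = [clone for clone, n in clone_reads.items() for _ in range(n)]
    return dict(Counter(rng.sample(pool, target_reads)))


def summary(clone_reads):
    """Mimic the two quantities reported before/after downsampling."""
    return {"nElements": len(clone_reads), "sumWeight": sum(clone_reads.values())}


# Hypothetical clonotype table: clone id -> number of reads
table = {"clone_A": 500, "clone_B": 120, "clone_C": 3, "clone_D": 1, "clone_E": 1}
down = downsample_reads(table, target_reads=100)

print("before:", summary(table))
print("after: ", summary(down))
```

The alignment and assembly steps never appear in this sketch, which is why their reports stay the same: only the clone counts and read totals of the table change.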

I hope this clarifies things. Feel free to reach out if you have more questions!

Sincerely,
Mark


lauratwomey · 2024-08-05T06:53:42Z

Author

Thanks so much for your fast answer Mark!

I see, then perhaps it isn't exactly what I need.

What I need:
I'm trying to compare the number and abundance of clonotypes, chains, etc. between different survival groups, and I have ~130 RNA-seq samples. If I just run MiXCR, the differences in coverage between samples mean I cannot compare them "at the same level".

What I have tried so far:

MiXCR's downsampling function, but that doesn't really give me, e.g., the number of distinct IGH or TRA chains.
Normalising MiXCR's results by the read count (not sure whether normalising by the number of aligned reads would be more appropriate).

Not sure if you have done similar analyses - how do you compare results between samples in terms of # clones, chains, etc.? What do you recommend?


mizraelson Aug 5, 2024
Collaborator

I think the best approach is to use the mixcr postanalysis function, which includes downsampling:

mixcr postanalysis individual \
  --default-downsampling count-read-auto \
  --default-weight-function read \
  --metadata metadata.tsv \
  *.clns \
  result.json.gz

This will create multiple postanalysis metrics for each chain, including diversity indices. Observed diversity is the number of distinct clonotypes for a given chain.

You can then use mixcr exportPlots with the postanalysis output to generate plots comparing diversity between groups of samples defined in the metadata table.
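
For readers wondering why downsampling matters before comparing observed diversity, the following Python sketch may help. It brings two toy samples to a common read depth and counts the distinct clonotypes that remain. Everything in it is an assumption for illustration: the sample tables are made up, and the common depth is simply the smallest sample's read total, which is not necessarily the rule MiXCR applies when it picks a threshold automatically.

```python
import random


def observed_diversity_at_depth(clone_reads, depth, seed=0):
    """Count distinct clonotypes left after keeping `depth` random reads.

    `clone_reads` maps a clonotype id to its read count.
    (Illustrative only; not MiXCR's implementation.)
    """
    rng = random.Random(seed)
    # One pool entry per read; sampling is without replacement.
    pool = [clone for clone, n in clone_reads.items() for _ in range(n)]
    return len(set(rng.sample(pool, depth)))


# Hypothetical samples with very different coverage
samples = {
    "deep": {f"c{i}": 10 for i in range(50)},    # 500 reads, 50 clonotypes
    "shallow": {f"c{i}": 5 for i in range(20)},  # 100 reads, 20 clonotypes
}

# Assumption: use the smallest read total as the common depth.
common_depth = min(sum(table.values()) for table in samples.values())

for name, table in samples.items():
    raw = len(table)
    normalised = observed_diversity_at_depth(table, common_depth)
    print(f"{name}: raw clonotypes={raw}, at depth {common_depth}={normalised}")
```

The raw clonotype counts partly reflect sequencing depth, whereas the counts at the common depth are taken from the same number of reads per sample and can be compared between groups.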

Answer selected by lauratwomey

lauratwomey Aug 5, 2024
Author

I see, makes sense, thank you so much Mark!
