Problem with mixcr analyze command #1354

BulushevaIrina · 2023-09-16T15:41:23Z

Hi! I am trying to analyze MiSeq human TCR data: 100 bp paired-end reads, both with a 12 bp UMI.
I tried several variants and got empty clonotype files and 0% alignment as a result.

My latest independent attempts hit a different problem:
mixcr analyze rna-seq --species hsa Control1_TRB_R1.fastq Control1_TRB_R2.fastq Control1_TRB_analyze
mixcr analyze takara-human-rna-tcr-umi-smarter-v2 Control1_TRB_R1.fastq Control1_TRB_R2.fastq output_file
mixcr align --preset milab-human-rna-tcr-umi-multiplex -OallowPartialAlignments=true Control1_TRB_R1.fastq.gz Control1_TRB_R2.fastq.gz Control1_TRB_align.vdjca

All of them were interrupted with this error:
Alignment: 65.5% ETA: 00:00:41
4.4.2; built=Sun Jul 30 12:17:59 AMT 2023; rev=cd7ea52e83; lib=repseqio.v3.0.1
picocli.CommandLine$ExecutionException: Error while running command align com.milaboratory.o.aL: No '+' character found in the beginning of the third line of the fastq record.

But I don't see this problem in the file:
head -n 8 Control1_TRB_R1.fastq
@mig UMI:GCCTTATAGAAA:2
AAAAGATGTAAGCAGTGGTATCAACGCAGAGTGCCTTTATATGAAATCTTGGGGACAGTGACACTGATCTGGTAAAGCCCCCATCCTGGCCTGACCCTGC
+
II..IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII.IIIIIIIIIIIIIIIIIIIIIIIII
@mig UMI:CGCTTGATTCTG:3
AGTCGATGTAAGCAGTGGTATCAACGCAGAGTCGCTTTGATTTCTGTCTTGGGGGGGTTCCCCGACGTGCTGCAGCAAGTGCCTTTGCCCTGCCTGTGGGC
+
%7%7IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII%

Could you please help me put together the correct pipeline commands for my type of data?

Answered by mizraelson

BulushevaIrina (Author) · 2023-09-17T00:00:31Z

Maybe the problem is with the parsing of ' ' and '\n'?
I tried shortened versions of the files without the ' ', like:
@migumi:GCCTTATAGAAA:2
AAAAGATGTAAGCAGTGGTATCAACGCAGAGTGCCTTTATATGAAATCTTGGGGACAGTGACACTGATCTGGTAAAGCCCCCATCCTGGCCTGACCCTGC
+
II..IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII.IIIIIIIIIIIIIIIIIIIIIIIII

With these I did not hit the error.

mizraelson (Collaborator) · 2023-09-17T00:38:10Z

Hi,
Are those preprocessed FASTQ files? Is there a reason you don't use the raw data?

Also, do you mean that every read (R1 and R2) starts with a 12 nt UMI? Is it a 5'RACE or a multiplex protocol?

BulushevaIrina (Author) · 2023-09-17T01:40:54Z

Hi!
Unfortunately, I received only these files, without any explanation. I'll try to ask for additional info.
From inspecting the files, both files of the pair (R1 and R2) have 100 bp reads and a 12 bp UMI (in the lines starting with '@').
I don't know whether it is 5'RACE or multiplex. Would it be useful to try both variants in this case?

$ head Control2_TRB_R1.fastq
@mig UMI:CGCTGCATTTAG:4
TCAGGCAAAAAGCAGTGGTATCAACGCAGAGTCGCTTGCATTTTAGTCTTGGGGGATGGGCACCAGTCTCCTATGCTGGGTGGTCCTGGGTTTCCTAGGG
+
;.;;IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@mig UMI:GTGAGGCCCGCC:5
AATCGCAAAAAGCAGTGGTATCAACGCAGAGTGTGATGGCCTCGCCTCTTGGGGTTCACGGAAGATGCTGCTGCTTCTGCTGCTTCTGGGGCCAGGCTCC
+
))3)IIIIIIIIIIIIII>IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII>IIIIIIIIIIII
@mig UMI:TCCGTTGGGCAG:3
TTCTGCAAAAAGCAGTGGTATCAACGCAGAGTTCCGTTTGGTGCAGTCTTGGGGACAGTGACCCTGATCTGGTAAAGCTCCCATCCTGCCCTGACTCTGT
$ head Control2_TRB_R2.fastq
@mig UMI:CGCTGCATTTAG:4
GCACAGAGCAGCGGGACTCGGCCATGTATCGCTGTGCCAGCAGCCCCGGGAGGGGTCGGGCCATTGGGTCACCCCTCCACTTTGGGAACGGGACCAATGA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII.IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII;IIIIIII..;.
@mig UMI:GTGAGGCCCGCC:5
GTGACCAGTGCCCATCCTGAAGACAGCAGCTTCTACATCTGCAGTGCTAGAGTCCTCGGGGGAGCCACAGATACGCAGTATTTTGGCCCAGGCACCCCTTG
+
#IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII>IIIIIIIIIIIIIIIIIIIIIIIIIIIIII>I>>I>>I>>>II>)33)
@mig UMI:TCCGTTGGGCAG:3
AGATCCAGCGCACAGAGCGGGGGGACTCAGCCGTGTATCTCTGTGCCAGCAGCTTAACGGACAGTATCTATGGCTACACCTTCGGTTCGGGGACCAGAAT
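
The header layout and read length described above can be checked over a whole file rather than the first few records. This is a minimal sketch, assuming plain (uncompressed) 4-line FASTQ records and the `@mig UMI:<UMI>:<number>` header layout seen in the excerpts; `summarize_fastq` is a hypothetical helper name, and the trailing number is treated as an opaque field.

```shell
# Sketch only: summarize_fastq is a hypothetical helper, not a MiXCR command.
# Assumes 4-line FASTQ records and headers shaped like "@mig UMI:<UMI>:<number>".
summarize_fastq() {
    awk '
        # Header line: split on ":" and check for a 12-character UMI field.
        NR % 4 == 1 {
            n = split($0, f, ":")
            if (n == 3 && f[1] == "@mig UMI" && length(f[2]) == 12) umi12++
            else other++
        }
        # Sequence line: tabulate read lengths.
        NR % 4 == 2 { len[length($0)]++ }
        END {
            printf "headers_with_12nt_umi %d\n", umi12
            printf "headers_other %d\n", other
            for (l in len) printf "read_length_%s %d\n", l, len[l]
        }
    ' "$1"
}
```

Running `summarize_fastq Control2_TRB_R1.fastq` and `summarize_fastq Control2_TRB_R2.fastq` would show whether every header carries a 12 nt UMI and whether all reads really share one length. The order of the `read_length_` rows is not guaranteed.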

mizraelson (Collaborator) · 2023-09-17T20:01:38Z

It appears the files have undergone processing, including UMI correction, and the reads are merged. I strongly recommend retrieving the original files, so MiXCR can handle the correction internally.

The error you've mentioned seems to stem from preprocessing. While I didn't notice ' ' and '\n' symbols in your initial message, the presence of these symbols could disrupt the original FASTQ format.
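
Note that `head -n 8` shows only the first two records, while the failure was reported about 65% of the way through alignment, so the malformed record is probably much deeper in the file. One way to locate it is to scan every record. The following is a minimal sketch, assuming plain (uncompressed) FASTQ with exactly four lines per record and no wrapped sequences; `check_fastq` is a hypothetical helper name, not part of MiXCR.

```shell
# Sketch only: check_fastq is a hypothetical helper, not a MiXCR command.
# Prints the first structural problem found, or "OK"; exit status is 1 on a problem.
check_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: header does not start with @\n", NR; bad = 1; exit
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: separator does not start with +\n", NR; bad = 1; exit
        }
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality length %d differs from sequence length %d\n", NR, length($0), seqlen
            bad = 1; exit
        }
        END {
            # A line count that is not a multiple of 4 means a truncated record.
            if (!bad && NR % 4 != 0) { printf "file ends mid-record at line %d\n", NR; bad = 1 }
            if (!bad) print "OK"
            exit bad + 0
        }
    ' "$1"
}
```

For example, `check_fastq Control1_TRB_R1.fastq` would print the line number of the first broken record, which can then be inspected with `sed -n` around that line. A `.fastq.gz` file would need to be decompressed first.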

Understanding the library preparation protocol is crucial, as it guides the subsequent analysis. Based on the reads you provided, my guess is that you're working with a 5'RACE protocol. Given that R1 has a short sequence that likely only covers the 5'UTR, the original UMI was probably sequenced in R1. Meanwhile, R2 starts with a C gene-specific primer and spans the CDR3 region. As a result, you can disregard the R1 file since it doesn't sufficiently cover the V gene.

I suggest using the following command:

mixcr analyze generic-amplicon \
    --species hsa \
    --rna \
    --rigid-left-alignment-boundary \
    --floating-right-alignment-boundary C \
      input_R2.fastq.gz \
      result

Nevertheless, the ideal approach involves locating the original raw FASTQ files and performing UMI correction within MiXCR. Once you have those files and a description of the library structure, I'll be happy to assist further with the command.

4 replies

BulushevaIrina Sep 18, 2023
Author

Thank you! I ran this command on the R2 (not raw) file and it finished successfully. Here is the result. Is it good, or should I do anything more in this case? I also asked the laboratory for the raw data, but I doubt they will be able to find it :(

mixcr analyze generic-amplicon \
    --species hsa \
    --rna \
    --rigid-left-alignment-boundary \
    --floating-right-alignment-boundary C \
      Control1_R2.fastq \
      result
IMPORTANT: MiXCR will use at most 12000MB of RAM,
           use -Xmx to change automatically set heap size (i.e. -Xmx50g to set heap size to 50 Gb)

>>>>>>>>>>>>>>>>>>>>>>> mixcr align <<<<<<<<<<<<<<<<<<<<<<<
Running:
mixcr align --report result.align.report.txt --json-report result.align.report.json --preset generic-amplicon --rigid-left-alignment-boundary --floating-right-alignment-boundary C --rna --species hsa Control1_R2.fastq result.vdjca
Alignment: 0%
Alignment: 12.3%  ETA: 00:00:42
Alignment: 24.5%  ETA: 00:00:30
Alignment: 35.4%  ETA: 00:00:23
Alignment: 46.3%  ETA: 00:00:19
Alignment: 57.2%  ETA: 00:00:15
Alignment: 68.1%  ETA: 00:00:11
Alignment: 79%  ETA: 00:00:07
Alignment: 89.9%  ETA: 00:00:03
====================== report: align ======================
Analysis time: 39.63s
Total sequencing reads: 1344837
Successfully aligned reads: 1284624 (95.52%)
Coverage (percent of successfully aligned):
  CDR3: 99.88%
  FR3_TO_FR4: 0%
  CDR2_TO_FR4: 0%
  FR2_TO_FR4: 0%
  CDR1_TO_FR4: 0%
  VDJRegion: 0%
Alignment failed: no hits (not TCR/IG?): 3477 (0.26%)
Alignment failed: absence of J hits: 56736 (4.22%)
Overlapped: 0 (0%)
Overlapped and aligned: 0 (0%)
Overlapped and not aligned: 0 (0%)
Alignment-aided overlaps, percent of overlapped and aligned: 0 (NaN%)
Partial aligned reads, percent of successfully aligned: 1542 (0.12%)
Realigned with forced non-floating bound: 0 (0%)
Realigned with forced non-floating right bound in left read: 0 (0%)
Realigned with forced non-floating left bound in right read: 0 (0%)
TRA chains: 2 (0%)
TRA non-functional: 1 (50%)
TRB chains: 1284622 (100%)
TRB non-functional: 32586 (2.54%)
Trimming report:
  R1 reads trimmed left: 110734 (8.23%)
  R1 reads trimmed right: 268680 (19.98%)
  Average R1 nucleotides trimmed left: 0.28057824108051754
  Average R1 nucleotides trimmed right: 0.3801456979544733

>>>>>>>>>>>>>>>>>>>>>> mixcr assemble <<<<<<<<<<<<<<<<<<<<<<
Running:
mixcr assemble --report result.assemble.report.txt --json-report result.assemble.report.json result.vdjca result.clns
Initialization: progress unknown
Assembling initial clonotypes: 30.4%
Assembling initial clonotypes: 65.5%  ETA: 00:00:00
Assembling initial clonotypes: 99.8%  ETA: 00:00:00
Mapping low quality reads: 9%
Mapping low quality reads: 43.2%  ETA: 00:00:01
Mapping low quality reads: 77.8%  ETA: 00:00:00
Pre-clustering: progress unknown
Clustering: 0%
Clustering: 10.5%  ETA: 00:03:17
Clustering: 20.6%  ETA: 00:02:59
Clustering: 30.7%  ETA: 00:01:22
Clustering: 40.8%  ETA: 00:01:39
Clustering: 51%  ETA: 00:01:41
Clustering: 61.1%  ETA: 00:01:32
Clustering: 71.1%  ETA: 00:01:09
Clustering: 81.4%  ETA: 00:00:32
Clustering: 91.8%  ETA: 00:00:12
Building clones: 0.3%
Building clones: 10.3%  ETA: 00:01:02
Building clones: 21.7%  ETA: 00:00:55
Building clones: 32.7%  ETA: 00:00:49
Building clones: 43.2%  ETA: 00:00:43
Building clones: 54.1%  ETA: 00:00:33
Building clones: 64.8%  ETA: 00:00:26
Building clones: 75.5%  ETA: 00:00:18
Building clones: 86.1%  ETA: 00:00:10
Building clones: 96.9%  ETA: 00:00:02
===================== report: assemble =====================
Analysis time: 4.54m
Final clonotype count: 560371
Reads used in clonotypes, percent of total: 1213829 (90.26%)
Average number of reads per clonotype: 2.17
Reads dropped due to the lack of a clone sequence, percent of total: 1542 (0.11%)
Reads dropped due to a too short clonal sequence, percent of total: 14 (0%)
Reads dropped due to low quality, percent of total: 0 (0%)
Reads dropped due to failed mapping, percent of total: 55448 (4.12%)
Reads dropped with low quality clones, percent of total: 3973 (0.3%)
Aligned reads processed: 1283082
Reads used in clonotypes before clustering, percent of total: 1223647 (90.99%)
Number of reads used as a core, percent of used: 1199591 (98.03%)
Mapped low quality reads, percent of used: 24056 (1.97%)
Reads clustered in PCR error correction, percent of used: 9818 (0.8%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Clonotypes dropped as low quality: 3877
Clonotypes eliminated by PCR error correction: 3304
Clonotypes pre-clustered due to the similar VJC-lists: 0
Clones dropped in post filtering: 0 (0%)
Reads dropped in post filtering: 0.0 (0%)
Alignments filtered by tag prefix: 0 (0%)
TRA chains: 2 (0%)
TRA non-functional: 1 (50%)
TRB chains: 560369 (100%)
TRB non-functional: 17103 (3.05%)

>>>>>>>>>>>>>>>>>>>>>>>>> mixcr qc <<<<<<<<<<<<<<<<<<<<<<<<<
Running:
mixcr qc --print-to-stdout result.clns result.qc.txt

  Successfully aligned reads:                     95.52% [OK]
  Off target (non TCR/IG) reads:                  0.25%  [OK]
  Reads with no V or J hits:                      4.21%  [OK]
  Reads used in clonotypes:                       90.25% [OK]
  Alignments that do not cover CDR3:              0.11%  [OK]
  Alignments dropped due to low sequence quality: 0.0%   [OK]
  Clones dropped in post-filtering:               0.0%   [OK]
  Alignments dropped in clones post-filtering:    0.0%   [OK]

>>>>>>>>>>>>>>>>>>>> mixcr exportClones <<<<<<<<<<<<<<<<<<<<
Running:
mixcr exportClones result.clns result.clones.tsv
Exporting TRB
Exporting clones: 0%
Exporting clones: 33%  ETA: 00:00:02
Exporting clones: 74.9%  ETA: 00:00:00
Filtered 560369 of 560371 clones (0%).
Filtered 1213827.0 of 1213829.0 reads (0%).
Exporting TRAD
Exporting clones: 0%
Filtered 2 of 560371 clones (100%).
Filtered 2.0 of 1213829.0 reads (100%).
Analysis finished successfully.
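
As a quick sanity check, the headline percentages in the reports above can be re-derived from the raw counts they list. This is only a sketch: `pct` and `ratio` are hypothetical helper names, and the numbers are copied from the align and assemble reports in this post.

```shell
# Sketch only: pct and ratio are hypothetical helpers for re-deriving report figures.
pct() {
    # Print $1 as a percentage of $2 with two decimals.
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", 100 * n / d }'
}
ratio() {
    # Print $1 divided by $2 with two decimals.
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", n / d }'
}

pct 1284624 1344837    # successfully aligned reads / total reads  -> 95.52
pct 1213829 1344837    # reads used in clonotypes / total reads    -> 90.26
ratio 1213829 560371   # reads per clonotype                       -> 2.17
```

These match the 95.52%, 90.26% and 2.17 values printed by `mixcr align` and `mixcr assemble`.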

BulushevaIrina Sep 18, 2023
Author

I also have a problem with the quality of results for other samples (for these I know all the details of the preparation): the reads are 150+150 bp from a Qiagen kit, which does not fit the Qiagen preset designed for 41+250 bp reads, and I see many ALERTs or the process is interrupted. May I also ask you here about a better command, or would it be better to open a new discussion?

mizraelson Sep 18, 2023
Collaborator

The results look fine. You can proceed with this command.

Let's start a separate thread for the Qiagen kit in case it helps someone in the future.

BulushevaIrina Sep 18, 2023
Author

Thank you very much!
