MiXCR v4.5.0
🚀 New features
Multi-chain clone assembly for single-cell data
MiXCR now calculates combined Heavy-Light antibody clones and Alpha-Beta and Gamma-Delta TCR clones for single-cell data. Two new commands were introduced to enable this functionality:
- groupClones: calculates multi-chain clones from assembled clonotypes and writes the result in a binary format;
- exportCloneGroups: exports information about combined clonotypes.
All single-cell presets now automatically produce combined multi-chain output in both binary and textual formats; see files with names matching the *.clone.groups.tsv pattern in the output folder.
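As a minimal sketch of picking up the textual output downstream, the following Python snippet collects every file matching the *.clone.groups.tsv pattern in a folder and reads it as a tab-separated table. The function name read_clone_group_tables is hypothetical, and no column names are assumed: they are taken from each file's header row.

```python
import csv
from pathlib import Path


def read_clone_group_tables(output_dir):
    """Return {file name: list of row dicts} for every *.clone.groups.tsv file.

    Hypothetical helper for illustration; it only looks at the top level
    of output_dir and does not depend on any particular column names.
    """
    tables = {}
    for path in sorted(Path(output_dir).glob("*.clone.groups.tsv")):
        with path.open(newline="", encoding="utf-8") as handle:
            # Column names come from the header row of each file.
            tables[path.name] = list(csv.DictReader(handle, delimiter="\t"))
    return tables
```

Calling read_clone_group_tables on the analysis output folder returns one list of rows per matching file, which can then be passed to any tabular tool.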
New characteristics in clonotype export
- Export biochemical properties of gene regions with the -biochemicalProperty <geneFeature> <property> or -baseBiochemicalProperties <geneFeature> export options. Available in export for alignments, clones and SHM tree nodes. Available properties: Hydrophobicity, Charge, Polarity, Volume, Strength, MjEnergy, Kf1, Kf2, Kf3, Kf4, Kf5, Kf6, Kf7, Kf8, Kf9, Kf10, Rim, Surface, Turn, Alpha, Beta, Core, Disorder, N2Strength, N2Hydrophobicity, N2Volume, N2Surface.
- Export isotype with -isotype [<(primary|subclass|auto)>]
- Export -mutationRate [<gene_feature>] in the exportShmTreesWithNodes, exportClones and exportCloneGroups commands: the number of mutations relative to the corresponding germline divided by the target sequence size. For exportClones and exportCloneGroups, CDR3 is not included in the calculation.
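The mutation rate definition above (mutations relative to germline divided by target sequence size) can be illustrated with a small sketch. It assumes two already aligned, equal-length sequences and counts substitutions only; the function name mutation_rate is hypothetical, and this is not MiXCR's implementation, which works from its own alignments and may also involve insertions and deletions.

```python
def mutation_rate(germline: str, target: str) -> float:
    """Substitutions relative to germline divided by the target length.

    Illustrative only: assumes equal-length, pre-aligned sequences
    and ignores insertions and deletions.
    """
    if len(germline) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    if not target:
        raise ValueError("target sequence must not be empty")
    # Count positions where the target differs from the germline.
    mismatches = sum(1 for g, t in zip(germline, target) if g != t)
    return mismatches / len(target)
```

For example, a single substitution in an 8-nucleotide region gives a rate of 1/8.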
Support for wider set of input formats
- Support for cram files as input for the analyze and align commands. Optionally, a reference to the genome can be specified with --reference-for-cram.
- Fixed usage of BAM input for analyze and align when the file contains both paired and single reads.
Algorithm enhancements
- The global consensus assembly algorithm, applied in assemble to collapse UMI/Cell groups into contigs, now has a much better empirical seed selection step for multi-consensus assembly scenarios. This significantly increases sensitivity during assembly of secondary consensuses from the same group of sequences.
- New constraint in the low-quality reads mapping procedure that prevents cross-cell read mapping.
📚 Preset updates
- Additional improvement of clone filters in the 10x-sc-xcr-vdj preset.
- Tag pattern upgrade for cellecta-human-rna-xcr-umi-drivermap-air. The UMI now includes a part of the C-gene primer to increase diversity, and R2 is also used for payload.
- Assembling feature fix for the irepertoire-human-rna-xcr-repseq-plus preset. It is now {CDR2Begin:FR4End}.
- New preset for the BD full-length protocol with enhanced beads V2 featuring B384 whitelists: bd-sc-xcr-rhapsody-full-length-enhanced-bead-v2.
- New preset for Takara Bio SMART-Seq Mouse TCR (with UMIs): takara-mouse-rna-tcr-umi-smarseq.
- Presets for new Cellecta kits: cellecta-human-dna-xcr-umi-drivermap-air, cellecta-human-rna-xcr-full-length-umi-drivermap-air, cellecta-mouse-rna-xcr-umi-drivermap-air.
- Presets for iRepertoire RepSeq+ kits with UMI: irepertoire-mouse-rna-xcr-repseq-plus-umi-pe, irepertoire-human-rna-xcr-repseq-plus-umi-se, irepertoire-human-rna-xcr-repseq-plus-umi-pe.
- isotype field added to exportClones for presets supporting isotype identification.
- Split by C-gene enabled in the thermofisher-human-rna-igh-oncomine-lr and cellecta-human-rna-xcr-umi-drivermap-air presets to facilitate isotype separation.
- Default consensus assembly parameters maxNormalizedAlignmentPenalty and altSeedPenaltyTolerance are adjusted to increase sensitivity.
- The --split-by-sample option is now set to true by default for all align presets, as well as all presets that inherit from them. This new default behavior applies unless it is directly overridden in the preset or with the --dont-split-by-sample mix-in.
- exportAlignments now reports UMI and/or Cell barcodes by default for presets with barcodes.
🛠️ Minor improvements & fixes
- Fixed a possible crash with the --dry-run option in analyze.
- More informative help message when using a deprecated preset; the previous message incorrectly suggested using --assemble-contigs-by instead of --assemble-clonotypes-by.
- When split-by-tags is enabled, exportClones and exportShmTreesWithNodes now output read count as the sum of reads for the given tag selection; a more complicated formula was used in previous versions.
- exportAlignments now includes the topChains column by default.
- exportClones reports topChains for single-cell presets.
- Fixed calculation of geneFamilyName for genes like IGHA*00 (without a number before the * symbol).
- Better formatting in the listPresets command. Added grouping by vendor, labels and optional filtering.
- Validation of input types in align or analyze against the given tag pattern.
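To make the geneFamilyName fix listed above more concrete, here is a minimal sketch of one plausible way to derive a family name from an allele name. The rule used here (take everything before the first "-" or "*") is an assumption made for illustration, and gene_family_name is a hypothetical helper; MiXCR's actual logic may differ.

```python
import re


def gene_family_name(allele_name: str) -> str:
    """Return the text before the first '-' or '*' in an allele name.

    Assumed rule for illustration only; not MiXCR's implementation.
    """
    # Match the leading run of characters that are neither '-' nor '*'.
    match = re.match(r"[^-*]+", allele_name)
    if match is None:
        raise ValueError(f"cannot derive a family name from {allele_name!r}")
    return match.group(0)
```

Under this assumed rule, a name such as IGHA*00, which has no number before the * symbol, still yields a non-empty family name (IGHA), and IGHV3-23*01 yields IGHV3.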