Order of operations in CIGAR #1118

Open
baraaorabi opened this issue Sep 27, 2023 · 4 comments

@baraaorabi

I found that Minimap2 sometimes generates an unusual order of CIGAR operations: some Ns, followed by some Is, then followed by more Ns, instead of the more expected order with all the Ns combined together.

Here is an example that produces this result:

$ minimap2 -c -x splice -t 32 data/refs/homo_sapiens.dna.fa <(echo -e ">1\nCATGGACTCATCCAGGATACATCAGGAAC
TCAATGGCAAAAAAATCCAATTTAAAAATGAACAAAATACCTAAATAGACACCTCTCAAAGAAGAAACACAAGGCCAGGCGCAGTGGCTCATGCCTATAATCCCAGCACTTTGGGAGGCTGAGGCAGGAGGATTGCCTGAGCTCGGGAGTTCGAGACC
AGCCTGGGCAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAATTAGCTGGGCATGGTGGCGCACGCGTGTACGCCAGCTACTGGAAGGCTGAGACAGGAGAATCACTGGAACCTGGGAGGCGGAGGTTTCAGTAAGCCAAGATTGCACCACTG
CACTCCAGCCTGGGTGACAAAGCAAGACTCCATCTCAAAAAAAAAAAAAAAAAAAAAAAAAGCTAGATTCACAACATCGATGTTCCGGAGATCGGAAGAGCGTCGTGTAGAGGTTAAACACCCAAGCAGACGCCGCAATATCAGCACCAACAGAAAAC
AGACGACTACAAACGGAATCGAGTCACGGTAGGCGATAATCATAAACACCAGCTCCTAGCAACTGAACGAAGCACACTGATGACAAGAAAGTTGTCGGTGTCTTTGTGACTTGCCTGTCGCTCTATCTTCGGCGTCTGCTTGGGTGTTTAACCTCTGC
CACGACGCTCTTCCGATCTCAGCAGCAGATGCTTCTATTCTCTCGGTTTTTTTTTTTTTTTTTTTTTTTTTTACATTTCAAAATATTTAACAAAGTCAAACTTTCTCACCATGGTTTCAGTTTAGTGGAAGCATTTACTAAAGTACAAAAAGCCTCAG
AAAACGTGATGGGCAATATCTGGGCCCCAAGTTACCAGAAAGGGCACCAGCCAATATAGCACTGGCAGAGGTTTTCATGGGATGTCGCTTGTTTGATGAGCAGCTCAACTTGCGTTGGAACATTCAAAGTGTCATCATGAGAGAAGTCCCGACCAGTG
AGCTTATCTCTGAACCCTGTTAATAATCTGATAGCTTTTCTTCCTGGCGTGTACTCTGCGTTGATACCACCAAGCTAGGTTAAACACCCAAGCAGACGCC")
[M::mm_idx_gen::89.011*1.72] collected minimizers
[M::mm_idx_gen::99.459*3.62] sorted minimizers
[M::main::99.459*3.62] loaded/built the index for 194 target sequence(s)
[M::mm_mapopt_update::106.313*3.45] mid_occ = 765
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 194
[M::mm_idx_stat::109.605*3.38] distinct minimizers: 167225302 (35.46% are singletons); average occurrences: 6.030; average spacing: 3.074; total length: 3099750718
1       1077    101     1015    +       1       248956422       10975125        11109332        438     920     39      NM:i:482        ms:i:332        AS:i:182      nn:i:0  ts:A:-  tp:A:P  cm:i:50 s1:i:219        s2:i:101        de:f:0.0894     rl:i:320        cg:Z:61M2I99M2D1M1I9M1D24M183N437I131864N72M1I11M2D14M1I32M680N106M1003N15M1I18M1D9M\

You can see that after the first exon, 101S61M2I99M2D1M1I9M1D24M, there is a 183N skip, followed by a 437I insertion, followed by another skip of 131864N.
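To make the pattern easier to spot in other alignments, here is a minimal Python sketch that scans a `cg:Z:` string for gap runs containing more than one N. The helper names `parse_cigar` and `split_skips` are made up for this illustration and are not part of minimap2.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def split_skips(cigar):
    """Return gap runs that contain more than one N operation.

    A "run" is a maximal stretch of operations between two
    (mis)match operations (M, = or X).
    """
    runs, current = [], []
    for length, op in parse_cigar(cigar):
        if op in "M=X":
            if sum(1 for _, o in current if o == "N") > 1:
                runs.append(current)
            current = []
        else:
            current.append((length, op))
    # Handle a run that reaches the end of the string.
    if sum(1 for _, o in current if o == "N") > 1:
        runs.append(current)
    return runs


# The cg tag from the PAF record above.
cg = ("61M2I99M2D1M1I9M1D24M183N437I131864N72M1I11M2D14M1I32M"
      "680N106M1003N15M1I18M1D9M")

for run in split_skips(cg):
    print("".join(f"{n}{op}" for n, op in run))
```

On the record above this reports the single run `183N437I131864N`; the later `680N` and `1003N` skips each sit alone between matches and are not flagged.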

@baraaorabi
Author

P.S.: This might be related to #502?

lh3 added the enhancement label Nov 2, 2023
@lh3
Owner

lh3 commented Nov 2, 2023

Yes, related but not the same. We may move 437I around in this case, though there will still be a CIGAR like xxxIyyyN, which is hard to resolve.

@baraaorabi
Author

Can it not be resolved by linear-time postprocessing? Something like anchoring the (mis)matches in the CIGAR string, and then sorting and merging any other CIGAR operations between the (mis)match anchors?
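For concreteness, the idea could be sketched in Python roughly as below. This is only an illustration of the proposal, not anything minimap2 does: `normalize_gaps` and its `order` parameter are hypothetical, and the choice to emit N before D before I is arbitrary.

```python
import re
from itertools import groupby


def normalize_gaps(cigar, order="NDI"):
    """Merge and reorder gap operations between anchors.

    Anchors are all operations not listed in `order` (M, =, X, clips).
    Within each run of gap operations, lengths of the same operation
    are summed and the operations are emitted in the given order.
    """
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]
    out = []
    for is_gap, group in groupby(ops, key=lambda x: x[1] in order):
        group = list(group)
        if not is_gap:
            out.extend(group)  # anchors pass through unchanged
            continue
        totals = {}
        for length, op in group:
            totals[op] = totals.get(op, 0) + length
        out.extend((totals[op], op) for op in sorted(totals, key=order.index))
    return "".join(f"{n}{op}" for n, op in out)


print(normalize_gaps("24M183N437I131864N72M"))
```

On the run from the example this yields `24M132047N437I72M`. The total reference and query spans are preserved, but the result still has an insertion directly next to a skip, and the sketch assumes the two Ns belong to one intron and makes no attempt to decide on which side of it the inserted bases belong.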

@lh3
Owner

lh3 commented Nov 3, 2023

No, can't.
