Sequence and reverse complement generating different unitigs #10

Open
krobison13 opened this issue Dec 8, 2021 · 1 comment


@krobison13

If I give themisto build a file containing just a sequence and its reverse complement, extract-unitigs generates two different unitigs. Is this the expected behavior?

For example, if 80.fna contains

>k80
ATCAGCAGCGACATGGCGGTCATCACCGTAGTCGAGGCAAGCAATAATGGACGGCGCCCG
ACGTGGTCGATGATCGCAGA
>rc.k80
TCTGCGATCATCGACCACGTCGGGCGCCGTCCATTATTGCTTGCCTCGACTACGGTGATG
ACCGCCATGTCGCTGCTGAT

and I then run
themisto build -k 31 -i 80.fna -o 80.k31 --temp-dir .
themisto extract-unitigs -i 80.k31 --colors-out 80.k31.colors --gfa-out 80.k31.gfa

I get two lines in the colors file and two segments in the GFA file:

H VN:Z:1.0
S 86 ATCAGCAGCGACATGGCGGTCATCACCGTAGTCGAGGCAAGCAATAATGGACGGCGCCCGACGTGGTCGATGATCGCAGA
S 77 TCTGCGATCATCGACCACGTCGGGCGCCGTCCATTATTGCTTGCCTCGACTACGGTGATGACCGCCATGTCGCTGCTGAT

@jnalanko
Collaborator

jnalanko commented Dec 8, 2021

Yes, this is expected. Our index structure is not aware of reverse complements.

We could add a flag to extract-unitigs to compute the bidirected de Bruijn graph for better interoperability with other tools. Meanwhile, you can work around this by appending the reverse complement of every input sequence to the input before building the index. This will create two copies of each unitig: one for the forward strand and one for the reverse complement (except for unitigs that are their own reverse complement, which appear once). You can extract the bidirected de Bruijn graph from this with some post-processing.
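
A minimal sketch of that workaround in Python, for illustration only. None of this is part of Themisto: the helper names (reverse_complement, read_fasta, write_with_reverse_complements, collapse_strands) and the ".rc" header suffix are assumptions made up for this example. It assumes plain A/C/G/T sequences, and the post-processing step only keeps one representative per unitig and reverse-complement pair. It does not rebuild the edges of a bidirected graph, and it does not address how colors should be assigned to the added records.

```python
# Sketch only: every helper name here is hypothetical, not a Themisto API.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(path):
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_with_reverse_complements(in_path, out_path):
    """Write each record followed by its reverse complement.

    The ".rc" suffix on the added header is an arbitrary choice for this sketch.
    """
    with open(out_path, "w") as out:
        for header, seq in read_fasta(in_path):
            out.write(f">{header}\n{seq}\n")
            out.write(f">{header}.rc\n{reverse_complement(seq)}\n")


def collapse_strands(unitigs):
    """Keep one representative per unitig / reverse-complement pair.

    The representative is the lexicographically smaller of the two strands;
    a unitig that equals its own reverse complement is kept once.
    """
    seen = set()
    kept = []
    for seq in unitigs:
        canonical = min(seq, reverse_complement(seq))
        if canonical not in seen:
            seen.add(canonical)
            kept.append(canonical)
    return kept
```

The file written by write_with_reverse_complements would then be passed to themisto build with -i in place of the original input, and collapse_strands could be applied to the segment sequences read from the GFA output.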
