Sequence and reverse complement generating different unitigs #10
Yes, this is expected. Our index structure is not aware of reverse complements. We could add a flag to extract-unitigs to compute the bidirected de Bruijn graph for better interoperability with other tools. Meanwhile, you can work around this by concatenating the input with its reverse complement before building the index. This will create two copies of each unitig: one for the forward strand and one for its reverse complement (except for unitigs that are their own reverse complement). You can extract the bidirected de Bruijn graph from this with some post-processing.
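A minimal sketch of the concatenation workaround, assuming a plain FASTA input. The helper names (`read_fasta`, `write_with_reverse_complements`) and the `_rc` header suffix are made up for illustration and are not part of Themisto. How colors get assigned to the added records depends on how the index is built and is not handled here.

```python
import io

# Translation table for complementing DNA; other characters (e.g. N) pass through.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def write_with_reverse_complements(in_handle, out_handle):
    """Write every record followed by its reverse complement (hypothetical '_rc' suffix)."""
    for name, seq in read_fasta(in_handle):
        out_handle.write(f">{name}\n{seq}\n")
        out_handle.write(f">{name}_rc\n{reverse_complement(seq)}\n")


if __name__ == "__main__":
    # Tiny in-memory demo; with real data, pass open file handles instead.
    src = io.StringIO(">s1\nACGTT\nGG\n")
    dst = io.StringIO()
    write_with_reverse_complements(src, dst)
    print(dst.getvalue(), end="")
```

The resulting file would then be given to `themisto build` in place of the original input.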
If I give themisto build a file containing just a sequence and its reverse complement, extract-unitigs generates two different unitigs. Is this the expected behavior?
e.g. if 80.fna contains
and then run
themisto build -k 31 -i 80.fna -o 80.k31 --temp-dir .
themisto extract-unitigs -i 80.k31 --colors-out 80.k31.colors --gfa-out 80.k31.gfa
I get two lines in the colors file and two segments in the GFA file:
H VN:Z:1.0
S 86 ATCAGCAGCGACATGGCGGTCATCACCGTAGTCGAGGCAAGCAATAATGGACGGCGCCCGACGTGGTCGATGATCGCAGA
S 77 TCTGCGATCATCGACCACGTCGGGCGCCGTCCATTATTGCTTGCCTCGACTACGGTGATGACCGCCATGTCGCTGCTGAT
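The two segments above are exact reverse complements of each other. One possible first step of the post-processing is to group segments by a canonical form, here taken to be the lexicographically smaller of a sequence and its reverse complement. This is only a sketch: `canonical` and `collapse_segments` are hypothetical helpers, and a full bidirected graph would also need the link orientations remapped, which is not shown.

```python
# Translation table for complementing DNA; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq):
    """Return the lexicographically smaller of seq and its reverse complement."""
    return min(seq, reverse_complement(seq))


def collapse_segments(gfa_lines):
    """Map each canonical sequence to the list of GFA segment IDs that share it."""
    groups = {}
    for line in gfa_lines:
        fields = line.split()  # handles both tab- and space-separated fields
        if len(fields) >= 3 and fields[0] == "S":
            groups.setdefault(canonical(fields[2]), []).append(fields[1])
    return groups


gfa = [
    "H VN:Z:1.0",
    "S 86 ATCAGCAGCGACATGGCGGTCATCACCGTAGTCGAGGCAAGCAATAATGGACGGCGCCCGACGTGGTCGATGATCGCAGA",
    "S 77 TCTGCGATCATCGACCACGTCGGGCGCCGTCCATTATTGCTTGCCTCGACTACGGTGATGACCGCCATGTCGCTGCTGAT",
]
groups = collapse_segments(gfa)
for seq, ids in groups.items():
    print(ids, seq)
```

For the example in this issue, both segment IDs end up in a single group.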