Bug with extra large k-mers #204
Comments
Hello Jamshed, I have tried to reproduce this using the current master branch, but the results seem to be correct.
Could you please show me the full command line, and also confirm that you are using the same code as I am (the current master branch)? Best,
Hi Marek, I'm using the latest code from the master branch, with the following command: `k=1001; /usr/bin/time ./KMC/bin/kmc -v -k$k -fm -ci1 -t1 ./ecoli.fna /mnt/scratch4/jamshed/ecoli.k$k.kmc ./temp/` This is the output log:
Hi, thanks. I think I know why we are getting different results.
Hello Marek, thanks. I made a clean installation by directly setting
Hello Jamshed, the bug was quite hard to track down, but I think I have found it now, and it is fixed in 726ecbf.
Thanks, Marek! It works correctly on some example datasets where the earlier commits produced incorrect results.
Hello Marek,
I was wondering if KMC could be used to enumerate very large k-mers, e.g. with k ≥ 1000. The need arose from this issue: COMBINE-lab/cuttlefish#22. To make KMC support k up to 1024, I compiled it using `make CFLAGS=-DMAX_K=1024`. Now, here are some test results for the E. coli genome reference, with k = 31, k = 501, and k = 1001.
Note that we can compute the total length of the input sequence from the KMC stats results as the following quantity:
Total no. of k-mers + Total no. of sequences * (k - 1)
With k = 31, 501, and 1001, we get input lengths of 4641652, 4641652, and 3952267, respectively. If the input sequences do not contain any N / indeterminate characters, this quantity should be the same for every k, yet we observe a mismatching value with k = 1001.
On top of that, I'm also observing different (and nondeterministic) results for k = 1001 when the number of threads is increased beyond 1. For example, two different runs with `-t16` gave different results.
I wonder if you might have any insights into this behavior?
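A minimal sketch of the length check described above, as an illustration only (not KMC code). The function names `total_kmers` and `length_from_stats` are hypothetical. It assumes every sequence is N-free and at least k long, so a sequence of length L yields L - k + 1 k-mers. Summing over all sequences gives total length = total k-mers + number of sequences * (k - 1), which should not depend on k.

```python
from typing import Iterable


def total_kmers(seqs: Iterable[str], k: int) -> int:
    """Count k-mer occurrences (with multiplicity) over all sequences.

    Assumes no N/indeterminate characters; a sequence shorter than k
    contributes no k-mers.
    """
    return sum(max(len(s) - k + 1, 0) for s in seqs)


def length_from_stats(num_kmers: int, num_sequences: int, k: int) -> int:
    """Recover the total input length from the two totals.

    Each sequence of length L >= k yields L - k + 1 k-mers, so
    total_length = num_kmers + num_sequences * (k - 1).
    """
    return num_kmers + num_sequences * (k - 1)


if __name__ == "__main__":
    # Toy input: two N-free sequences of lengths 10 and 11 (total 21).
    seqs = ["ACGTACGTAC", "GGGTTTAAACC"]
    for k in (3, 5, 10):
        n = total_kmers(seqs, k)
        # The recovered length should be the same for every k.
        print(k, n, length_from_stats(n, len(seqs), k))
```

Every line of the toy run reports the same recovered length, 21, even though the number of k-mers changes with k. A k-dependent value for real input (as with k = 1001 here) therefore points at the counts rather than the input, provided the N-free and length ≥ k assumptions hold.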
Regards!