Can I submit a subset of kmers from the reads? #30
Richard:
Hello Rayan et al,

I want the unitigs induced by a subset of the kmers from a read set. Is there any way to create a file of kmers that can be read into bcalm2 and then compacted into unitigs?

In fact I want to pick out kmers that I think are single-copy in a diploid, based on depth. A simple way to do that would be for you to allow me to give a maximum as well as a minimum copy number for each kmer; I would then set those by looking at a histogram. But ultimately I would like the freedom to correct the kmer counts by GC content (of their parent read pairs), so it would be more general to allow me to pass you a set of kmers.

I guess I could make a fasta file out of them, with one kmer per sequence...

By the way, thanks for the algorithm and tool. It is very elegant.

Thanks, Richard

Rayan:
Hi Richard, and thanks for the kind words! For now, you can specify minimum/maximum kmer abundance thresholds via command-line options. Otherwise, the input formats of Bcalm are fasta/fastq reads, or a custom HDF5 file that contains counted k-mers. That format isn't easy to create from scratch, but it wouldn't be impossible for us to write a custom program for it. In that format, Bcalm expects the kmers to be partitioned according to their minimizer. Giving as input a fasta file with one kmer per sequence would be an acceptable solution, bearing in mind that abundance information would be lost (unless each kmer is repeated as many times as its abundance, which isn't very elegant). What's the volume of kmers you wish to give to Bcalm: tens of millions? Billions?
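The fasta-with-one-kmer-per-sequence workaround discussed in this thread could be sketched as below. Note this is illustrative only: the input (an iterable of kmer/count pairs), the depth cutoffs, and the option to repeat kmers by abundance are assumptions for the example, not bcalm2 features.

```python
def write_kmer_fasta(counts, out, min_depth=10, max_depth=40,
                     repeat_by_abundance=False):
    """Write kmers whose count lies in [min_depth, max_depth] to `out`
    as FASTA, one kmer per record.

    counts -- iterable of (kmer, count) pairs, e.g. parsed from a
              kmer-counter's text dump (format is an assumption here)
    repeat_by_abundance -- if True, emit each kmer `count` times so the
              abundance survives in the file (the inelegant option
              mentioned in the thread)
    Returns the number of records written.
    """
    n = 0
    for kmer, count in counts:
        if min_depth <= count <= max_depth:
            copies = count if repeat_by_abundance else 1
            for _ in range(copies):
                out.write(f">kmer{n}\n{kmer}\n")
                n += 1
    return n
```

The min/max cutoffs stand in for values one would read off a kmer-count histogram; GC-corrected counts could be substituted for the raw counts before calling this.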