
What should we benchmark? #232

Open
ccbaumler opened this issue Dec 20, 2022 · 3 comments
Comments

@ccbaumler
Contributor

Following the same process as sourmash-bio/sourmash#2410, we will benchmark the charcoal workflow with the demo directory and/or the six signatures included in that issue. Suggested runs:

  1. Run the demo repo
  2. Run each sequence alone
  3. Run a variety of sequence sets, from small to large
  4. Run all six together

It may also be interesting to compare the results of `sourmash search --containment` to `charcoal.contigs_list_contaminents.py` in this repo.
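For reference, the containment score that `sourmash search --containment` reports is, conceptually, the fraction of the query's hashes present in the subject; a from-scratch sketch on plain hash sets (not sourmash's actual implementation):

```python
def containment(query_hashes, subject_hashes):
    """Fraction of the query's hashes found in the subject sketch.

    This is what a containment search scores, conceptually:
    |query ∩ subject| / |query|.
    """
    query = set(query_hashes)
    if not query:
        return 0.0
    return len(query & set(subject_hashes)) / len(query)
```

Comparing this number against what the charcoal script reports per contig would show whether the two approaches agree.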

@ccbaumler
Contributor Author

It would also be interesting to compare the accuracy of `sourmash gather` and genome-grist MinSetCov taxonomic outputs with and without charcoal.

@ctb
Member

ctb commented Dec 21, 2022

It sounds like you might be trying to benchmark both computational performance and classification performance. Those are pretty different things.

I don't think that charcoal has any individually expensive steps or computationally complex scripts that are part of it; it's just the workflow overall that involves an awful lot of steps, much like genome-grist. That may change your benchmarking strategy.

@ccbaumler
Contributor Author

I agree! They are completely different benchmarks. Mostly I wanted to jot down the notion before it left me forever.
Additionally, since we will be writing an analytical benchmark for computational performance first, we will have a foundation to come back to later, when we are ready for a biological (classification-accuracy) benchmark.

Would you suggest forking and adding `benchmark:` directives throughout the Snakefile instead of a single global benchmark?
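For context, a per-rule `benchmark:` directive in Snakemake looks like the sketch below; Snakemake records wall-clock time and memory usage for each execution of the rule in the benchmark file. The rule name, paths, and shell command here are hypothetical, not charcoal's actual rules:

```
rule contigs_search:
    input:
        "outputs/{genome}.sig"
    output:
        "outputs/{genome}.search.csv"
    benchmark:
        "benchmarks/{genome}.contigs_search.txt"
    shell:
        "sourmash search --containment {input} db.zip -o {output}"
```

A global timing can still come from wrapping the top-level `snakemake` call, so the two approaches are complementary rather than exclusive.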
