Bump CPUs for fetch-and-ingest workflows #447

Merged: 1 commit into master from bump-cpus, Jun 12, 2024
Conversation

@joverlee521 (Contributor) commented Jun 11, 2024

I've only been bumping the memory, not the CPUs, for the fetch-and-ingest workflows. We might as well use all of the compute we're paying for: GenBank should be using c5.9xlarge and GISAID c5.12xlarge, so this bumps the CPUs to match those instances (36 and 48 vCPUs, respectively).¹
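For concreteness, this is roughly the shape of the change expressed as nextstrain CLI invocations. It's a sketch only: the real change lives in the workflow files, and the working-directory argument here is hypothetical.

```sh
# Request the full vCPU count of each target instance type:
# c5.9xlarge has 36 vCPUs, c5.12xlarge has 48.¹
nextstrain build --aws-batch --cpus 36 .   # GenBank fetch-and-ingest
nextstrain build --aws-batch --cpus 48 .   # GISAID fetch-and-ingest
```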

Maybe this will magically help #446?

¹ https://aws.amazon.com/ec2/instance-types/c5/

@joverlee521 (Contributor, Author) commented

Last night's run finished under 12h 🎉
I'm going to dig into the logs a little bit, but at least this is an improvement, so I'm merging this ahead of today's run.

@joverlee521 joverlee521 requested a review from a team June 12, 2024 17:06
@joverlee521 joverlee521 merged commit de35468 into master Jun 12, 2024
10 checks passed
@joverlee521 joverlee521 deleted the bump-cpus branch June 12, 2024 17:06
@corneliusroemer (Member) commented

We could probably scrap those CPU limits altogether. All they do is make Snakemake restrict the number of jobs it runs in parallel.

This has obvious downsides and only rare benefits.

The CPU scheduler figures out ways to give all jobs some share when oversubscribed. We don't really need Snakemake to do pessimistic scheduling.
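To make that concrete, here is a sketch of how Snakemake treats the limit (the 4-thread rule is hypothetical):

```sh
# --cores is a hard scheduling budget, not a soft hint: given rules that
# declare `threads: 4`, Snakemake starts at most --cores / 4 of those jobs
# at once, even if the running jobs spend most of their time blocked on I/O.
snakemake --cores 8    # at most two 4-thread jobs in flight
snakemake --cores 36   # up to nine 4-thread jobs in flight
```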

@joverlee521 (Contributor, Author) commented

> We could probably scrap those CPU limits altogether.

For the AWS Batch runtime, the --cpus option is used to override the default nextstrain-job job definition, which only allots 4 vCPUs.
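A sketch of that override (the memory value is illustrative):

```sh
# Without --cpus/--memory, jobs run with the defaults baked into the
# nextstrain-job Batch job definition (4 vCPUs); these flags raise the
# resource requirements per submission instead of per job definition.
nextstrain build --aws-batch --cpus 36 --memory 68GiB .
```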

@tsibley (Member) commented Jun 14, 2024

> The CPU scheduler figures out ways to give all jobs some share when oversubscribed.

Yes, progress will still be made with oversubscription, but the run time of the whole workflow will increase, sometimes substantially, depending on the kind of workload. It's still better not to oversubscribe when you can avoid it.
