
Define threads for Nextclade rules #459

Closed · joverlee521 opened this issue on Jul 9, 2024 · 8 comments · Fixed by #460
Labels: enhancement (New feature or request)

@joverlee521 (Contributor)

Prompted by #456 (comment)

From my reading of the Snakemake docs on threads:

> Also, setting a threads: maximum is required to achieve parallelism in tools that (often implicitly and without the user knowing) rely on an environment variable for the maximum of cores to use.

This makes me think we need to define threads for the Nextclade Snakemake rules in order for Nextclade to parallelize.

Proposing that we split the workflow cores between the two Nextclade rules:

diff --git a/workflow/snakemake_rules/nextclade.smk b/workflow/snakemake_rules/nextclade.smk
index 8d24c23..e97c8aa 100644
--- a/workflow/snakemake_rules/nextclade.smk
+++ b/workflow/snakemake_rules/nextclade.smk
@@ -193,6 +193,8 @@ rule run_wuhan_nextclade:
             temp(f"data/{database}/nextclade.translation_{gene}.upd.fasta")
             for gene in GENE_LIST
         ],
+    threads:
+        workflow.cores * 0.5
     benchmark:
         f"benchmarks/run_wuhan_nextclade_{database}.txt"
     shell:
@@ -224,6 +226,8 @@ rule run_21L_nextclade:
         sequences=f"data/{database}/nextclade_21L.sequences.fasta",
     output:
         info=f"data/{database}/nextclade_21L_new_raw.tsv",
+    threads:
+        workflow.cores * 0.5
     benchmark:
         f"benchmarks/run_21L_nextclade_{database}.txt"
     shell:
joverlee521 added the enhancement (New feature or request) label on Jul 9, 2024
@corneliusroemer (Member)

Good idea, but I don't think we need to set these threads. All they do, in my experience, is limit parallelism.

Have you checked that there's a need for this? I'm pretty sure things have always been running in parallel just fine.

Nextclade uses all cores by default unless you tell it otherwise via -j. It doesn't read env variables, AFAIK.
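
For illustration, capping Nextclade's internal parallelism directly looks like this (the dataset and file names are placeholders for the example, not the workflow's real inputs):

# Limit Nextclade to 4 worker jobs regardless of how many cores the machine has
nextclade run -j 4 -d nextstrain/sars-cov-2 --output-tsv nextclade.tsv sequences.fasta

Left at its default, the job count equals the number of available cores.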

@joverlee521 (Contributor, Author)

> Have you checked that there's a need for this? I'm pretty sure things have always been running in parallel just fine.

Hmm, I'm checking with a small example below. Note that rule a does not define threads and defaults to 1 core.

rule all:
    input:
        b = "b.txt",
        c = "c.txt"


rule a:
    output: touch("a.txt")
    shell:
        """
        echo rule a nproc "$(nproc)"
        """


rule b:
    input: "a.txt"
    output: touch("b.txt")
    threads: workflow.cores * 0.5
    shell:
        """
        echo rule b nproc "$(nproc)"
        """


rule c:
    input: "a.txt"
    output: touch("c.txt")
    threads: 2
    shell:
        """
        echo rule c nproc "$(nproc)"
        """

Running with nextstrain build --cpus 8

$ nextstrain build --cpus 8 snakemake/threads/ --forceall --quiet
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job stats:
job      count
-----  -------
a            1
all          1
b            1
c            1
total        4

Select jobs to execute...

        echo rule a nproc "$(nproc)"
        
rule a nproc 1
Touching output file a.txt.
Select jobs to execute...

        echo rule b nproc "$(nproc)"
        

        echo rule c nproc "$(nproc)"
        
rule b nproc 4
Touching output file b.txt.
rule c nproc 2
Touching output file c.txt.
Select jobs to execute...
Complete log: .snakemake/log/2024-07-09T175422.806795.snakemake.log

joverlee521 added a commit that referenced this issue on Jul 9, 2024
@corneliusroemer (Member)

I still don't think this is necessary. I've never used nproc before, and it seems to report something that Nextclade doesn't care about, at least on macOS.

See how these three rules all run in parallel, even though one doesn't mention threads at all.

They all run at the same time, using as many jobs as there are cores (unless limited by -j). So the threads variable has no effect other than limiting parallelism.

Though threads does indeed seem to set some env variables that make nproc report a reduced number of cores. But this is not done through cgroups, so it probably has no effect on programs that don't look at those env variables.

Snakefile:

rule all:
    input:
        a="a.txt",
        c="c.txt",
        d="d.txt",


rule a:
    input:
        "input.fasta",
    output:
        touch("a.txt"),
    shell:
        """
        echo rule a nproc "$(nproc)"
        time nextclade run -d nextstrain/mpox {input} --output-all nextclade_a
        """


rule c:
    input:
        "input.fasta",
    output:
        touch("c.txt"),
    threads: 4
    shell:
        """
        echo rule c nproc "$(nproc)"
        time nextclade run -d nextstrain/mpox {input} --output-all nextclade_c
        """


rule d:
    input:
        "input.fasta",
    output:
        touch("d.txt"),
    threads: 4
    shell:
        """
        echo rule d nproc "$(nproc)"
        time nextclade run -j {threads} -d nextstrain/mpox {input} --output-all nextclade_d
        """

Output:

$ snakemake --force
Assuming unrestricted shared filesystem usage.
Building DAG of jobs...
Using shell: /opt/homebrew/bin/bash
Provided cores: 10
Rules claiming more threads will be scaled down.
Job stats:
job      count
-----  -------
a            1
all          1
c            1
d            1
total        4

Select jobs to execute...
Execute 3 jobs...

[Tue Jul  9 20:22:54 2024]
localrule a:
    input: input.fasta
    output: a.txt
    jobid: 1
    reason: Forced execution
    resources: tmpdir=/var/folders/qf/4kkcfypx0gbfb0t9336522_r0000gn/T


[Tue Jul  9 20:22:54 2024]
localrule d:
    input: input.fasta
    output: d.txt
    jobid: 3
    reason: Forced execution
    threads: 4
    resources: tmpdir=/var/folders/qf/4kkcfypx0gbfb0t9336522_r0000gn/T


[Tue Jul  9 20:22:54 2024]
rule a nproc 1
localrule c:
    input: input.fasta
    output: c.txt
    jobid: 2
    reason: Forced execution
    threads: 4
    resources: tmpdir=/var/folders/qf/4kkcfypx0gbfb0t9336522_r0000gn/T

rule d nproc 4
rule c nproc 4

real    0m3.644s
user    0m9.996s
sys     0m0.405s
Touching output file a.txt.
[Tue Jul  9 20:22:58 2024]
Finished job 1.
1 of 4 steps (25%) done

real    0m3.654s
user    0m9.974s
sys     0m0.423s
Touching output file c.txt.
[Tue Jul  9 20:22:58 2024]
Finished job 2.
2 of 4 steps (50%) done

real    0m4.888s
user    0m9.459s
sys     0m0.376s
Touching output file d.txt.
[Tue Jul  9 20:22:59 2024]
Finished job 3.
3 of 4 steps (75%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Jul  9 20:22:59 2024]
localrule all:
    input: a.txt, c.txt, d.txt
    jobid: 0
    reason: Forced execution
    resources: tmpdir=/var/folders/qf/4kkcfypx0gbfb0t9336522_r0000gn/T

[Tue Jul  9 20:22:59 2024]
Finished job 0.
4 of 4 steps (100%) done
Complete log: .snakemake/log/2024-07-09T202254.489461.snakemake.log

Note that the slowest rule is the one that gets 4 threads and limits Nextclade jobs via -j 4.

This is slower than not defining threads and not passing -j to Nextclade at all. So passing the thread count along doesn't make anything faster; it just adds lines of code.

@corneliusroemer (Member)

I printed env for each rule, and these are the differences:

$ diff env_a.txt env_d.txt 
10c10
< NUMEXPR_NUM_THREADS=1
---
> NUMEXPR_NUM_THREADS=4
28c28
< OPENBLAS_NUM_THREADS=1
---
> OPENBLAS_NUM_THREADS=4
40,41c40,41
< VECLIB_MAXIMUM_THREADS=1
< GOTO_NUM_THREADS=1
---
> VECLIB_MAXIMUM_THREADS=4
> GOTO_NUM_THREADS=4
63c63
< OMP_NUM_THREADS=1
---
> OMP_NUM_THREADS=4
69c69
< MKL_NUM_THREADS=1
---
> MKL_NUM_THREADS=4

So it looks like nproc reports what one of these variables says, but Nextclade ignores these variables, and hence threads only has the effect of limiting parallelism (the default is 1 if omitted).
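
This is easy to double-check outside of Snakemake, since GNU coreutils' nproc honors OMP_NUM_THREADS (the output below is hypothetical, assuming an 8-core machine):

$ nproc
8
$ OMP_NUM_THREADS=4 nproc
4

So Snakemake exporting these variables explains the reduced nproc readings, while a tool like Nextclade that ignores them still sees every core.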

@joverlee521 (Contributor, Author)

Hmm, I'm getting confused. Aren't there two levels of parallelism going on?

  1. Parallel Snakemake rules
  2. Parallel Nextclade processes within a Snakemake rule

As you stated, [1] is limited by threads. However, I thought [2] relied on Snakemake reserving the number of cores defined by threads for that single rule? Or am I completely misunderstanding how Snakemake works under the hood?

@joverlee521 (Contributor, Author)

> Or am I completely misunderstanding how Snakemake works under the hood?

Chatted with @tsibley about this in our 1:1 today, and I am indeed misunderstanding. Snakemake's threads are just "guidelines" the way we use them, so the value needs to be passed along as nextclade run -j {threads} to be respected by Nextclade. Just defining threads is not enough, as Nextclade would still oversubscribe.

I am still curious whether limiting the threads will be faster than letting the two nextclade jobs compete for resources, so I'll update the workflow to pass in threads to Nextclade tomorrow and see how that affects the runtime.
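
For concreteness, a minimal sketch of the shape that change takes (the paths and dataset name here are placeholders, not the workflow's real values):

rule run_wuhan_nextclade:
    input:
        sequences="data/nextclade.sequences.fasta",
    output:
        info="data/nextclade_new_raw.tsv",
    threads: workflow.cores * 0.5
    shell:
        """
        nextclade run \
            -j {threads} \
            -d nextstrain/sars-cov-2 \
            --output-tsv {output.info} \
            {input.sequences}
        """

This way threads both reserves cores at the Snakemake scheduling level and, via -j {threads}, actually tells Nextclade to stay within that reservation.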

joverlee521 reopened this on Jul 9, 2024
joverlee521 self-assigned this on Jul 9, 2024
@joverlee521 (Contributor, Author)

> I am still curious whether limiting the threads will be faster than letting the two nextclade jobs compete for resources, so I'll update the workflow to pass in threads to Nextclade tomorrow and see how that affects the runtime.

I took a quick look at run times for this week after adding threads. The runtime when using the Nextclade cache is fast enough that there's no noticeable difference.

However, there is a difference in a full Nextclade run. Comparing just the run_wuhan_nextclade job runtimes:

@joverlee521 (Contributor, Author)

With plans to ignore the cache when new Nextclade versions come out, I think we will do full runs more often, so it makes sense to keep the defined threads.
