
scaling issues due to prolog tagging api #34

Open
rvencu opened this issue Jul 22, 2022 · 3 comments

Comments

@rvencu
Contributor

rvencu commented Jul 22, 2022

We ran into a scaling issue with the tagging in the prolog script.

As I understand it, the prolog runs at every step, and when many nodes are involved the job fails with timeouts.

We need to find another place to do the tagging. I understand the comment tag is job-related, but some other tags could be applied only once, when the instances are created (either because of the min value in the configuration, or because Slurm created them).

I am looking at places where this could be done.

Maybe it could be done on the head node instead, in PrologSlurmctld: https://slurm.schedmd.com/prolog_epilog.html
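For reference, the head-node variant mentioned above would be wired up with something like the following slurm.conf fragment (a sketch only; /opt/slurm/etc/tag_job.sh is a hypothetical script path, not part of this project):

```
# slurm.conf sketch: run the tagging script once per job on the head
# node, instead of in the per-node Prolog.
PrologSlurmctld=/opt/slurm/etc/tag_job.sh

# Optionally, PrologFlags=Alloc runs the regular Prolog at job
# allocation only, rather than at every step launch.
PrologFlags=Alloc
```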

@rvencu
Contributor Author

rvencu commented Jul 22, 2022

Some comments on this topic from the Slurm support team:

Considering the nature of this command in that it needs to run in parallel but async from the other prologs/epilogs. I think a SPANK plugin would fit better than a PREP plugin and avoid the need to write any non-trivial code.

For instance, this is a popular plugin to use lua with SPANK:

https://github.com/stanford-rc/slurm-spank-lua

I think slurm_spank_init_post_opt() is likely the function from which to call the tagging command.

@rvencu
Contributor Author

rvencu commented Jul 23, 2022

Looking more closely, I notice the loop in the prolog script. The prolog script runs on every compute node and at every step execution, and:

  • RPC calls to the head node (with scontrol) are discouraged
  • tagging all nodes from every node makes the problem quadratic (n^2)

I think we can still keep this in the prolog: have each node find its own instance ID with curl and tag itself with a single call. That way there are only n calls to the tagging API.

Not as good as async tagging, but much better anyway, I think.

@rvencu
Contributor Author

rvencu commented Jul 24, 2022

I changed the prolog script to PrologSlurmctld, and any job larger than 30 nodes crashes.

Then I tried this approach inside prolog.sh:

host=$(curl http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 create-tags --region "${cfn_region}" --resources "${host}" --tags Key=aws-parallelcluster-username,Value=${SLURM_JOB_USER} Key=aws-parallelcluster-jobid,Value=${SLURM_JOBID} Key=aws-parallelcluster-partition,Value=${SLURM_JOB_PARTITION}

This works for 40 nodes; I will test with larger jobs too. But I could not find a way to transport the comments yet.
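A slightly hardened version of the same self-tagging idea is sketched below. It assumes IMDSv2 is enabled and the node's instance profile allows ec2:CreateTags; RUN_TAGGING is a hypothetical guard variable (not part of ParallelCluster) so the functions can be sourced and inspected without touching AWS, and cfn_region is assumed to be set by the cluster environment as in the snippet above.

```shell
#!/bin/bash
# Sketch of a per-node self-tagging prolog (not the project's official
# script). Each node tags only itself, so n nodes make n API calls
# instead of n^2.
set -euo pipefail

build_tags() {
  # Emit one tag specification per line, read from the Slurm job env.
  printf '%s\n' \
    "Key=aws-parallelcluster-username,Value=${SLURM_JOB_USER}" \
    "Key=aws-parallelcluster-jobid,Value=${SLURM_JOBID}" \
    "Key=aws-parallelcluster-partition,Value=${SLURM_JOB_PARTITION}"
}

tag_self() {
  # Fetch an IMDSv2 session token, then this node's own instance ID,
  # and issue a single CreateTags call.
  local token host
  token=$(curl -sf -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
  host=$(curl -sf -H "X-aws-ec2-metadata-token: ${token}" \
    "http://169.254.169.254/latest/meta-data/instance-id")

  local -a tags
  mapfile -t tags < <(build_tags)
  aws ec2 create-tags --region "${cfn_region}" \
    --resources "${host}" --tags "${tags[@]}"
}

# RUN_TAGGING is a hypothetical guard: only call AWS when explicitly
# enabled, e.g. when this file actually runs as the prolog.
if [ -n "${RUN_TAGGING:-}" ]; then
  tag_self
fi
```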
