MULTINODE ON PBS CLUSTER ENVIRONMENT

This repo contains instructions for launching multi-node scripts on a PBS cluster, setting all the environment variables needed by a multi-GPU script written with PyTorch or PyTorch Lightning.

STEPS TO TAKE

test.sh: launches the PBS command and sets up the Python script.
multinode_scripts/multinode.sh: finds the environment variables to be defined and launches the mpirun command from the master node. Check line 75, which loads the appropriate module (on my cluster), and the --prefix option of the mpirun command at line 71 (you will probably need to change it on a different cluster).
multinode_scripts/run.sh: launches the final script on each node, adding the final environment variable.

ENV VARIABLES TO BE SET

MASTER_ADDR: address of the master node.
MASTER_PORT: free communication port on the master node.
WORLD_SIZE: total number of processes used (usually GPUs per node * number of nodes).
NODE_RANK: rank of the node, different for each node (the master is usually 0).
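
As a minimal sketch of how a Python script might consume these variables, the snippet below reads them from the environment and derives a global process rank. The names `read_dist_env` and `global_rank`, the `local_rank` and `gpus_per_node` parameters, the one-process-per-GPU layout and the example values are illustrative assumptions, not taken from this repo's scripts.

```python
import os

REQUIRED_VARS = ("MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "NODE_RANK")


def read_dist_env(env=None):
    """Return the four variables as a dict, casting the numeric ones to int."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "master_addr": env["MASTER_ADDR"],
        "master_port": int(env["MASTER_PORT"]),
        "world_size": int(env["WORLD_SIZE"]),
        "node_rank": int(env["NODE_RANK"]),
    }


def global_rank(node_rank, local_rank, gpus_per_node):
    """Global rank of a process, assuming one process per GPU (an assumption)."""
    return node_rank * gpus_per_node + local_rank


if __name__ == "__main__":
    # Hypothetical values for a job with 2 nodes and 4 GPUs per node.
    example = {
        "MASTER_ADDR": "gnode01",
        "MASTER_PORT": "29500",
        "WORLD_SIZE": "8",
        "NODE_RANK": "1",
    }
    cfg = read_dist_env(example)
    print(cfg, global_rank(cfg["node_rank"], local_rank=2, gpus_per_node=4))
```

Calling `read_dist_env()` with no argument reads from `os.environ`, so it fails early with a clear message if one of the launch scripts did not export a variable.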

USEFUL RESOURCES

USEFUL `PBS` COMMANDS

qstat -fQ: see the permissions of the queues (e.g. max number of parallel jobs).
pbsnodes -aSj | grep -F 'gnode' | grep -F 'free': list all free nodes whose name contains 'gnode'.
qstat -wan1 -u $user: monitor all launched jobs and requested resources by $user.
qstat -u $user | grep "$user" | cut -d"." -f1 | xargs qdel: kill all jobs of $user.
qstat -u $user | grep "R" | cut -d"." -f1 | xargs qdel: kill all the running jobs of $user.
qstat -u $user | grep "Q" | cut -d"." -f1 | xargs qdel: kill all the queued jobs of $user.
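
Note that `grep "R"` and `grep "Q"` match any line containing that letter, including job names. A stricter filter could compare the state column only. The Python sketch below does that on captured `qstat -u $user` text. The function name `job_ids_in_state`, the made-up `SAMPLE` listing and the assumption that the job state is the second-to-last column are all illustrative, so check them against the output of your own cluster first.

```python
# Made-up listing for illustration only; real column layouts may differ.
SAMPLE = """\
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1001.pbsserver  alice    gpu      train       12345   2   8   16gb 01:00 R 00:10
1002.pbsserver  alice    gpu      RESNET         --   2   8   16gb 01:00 Q    --
"""


def job_ids_in_state(qstat_output, state):
    """Return the numeric job IDs whose state column equals `state`."""
    ids = []
    for line in qstat_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # Same effect as `cut -d"." -f1`: keep the part before the first dot.
        job_id = fields[0].split(".")[0]
        # Job lines start with digits; header and separator lines do not.
        if not job_id.isdigit():
            continue
        # Assumption: the state is the second-to-last column.
        if fields[-2] == state:
            ids.append(job_id)
    return ids


if __name__ == "__main__":
    print(job_ids_in_state(SAMPLE, "R"))
    print(job_ids_in_state(SAMPLE, "Q"))
```

In the sample, the queued job is named RESNET, so a plain `grep "R"` would also select it, while the column comparison selects only the running job. The returned IDs could then be passed to `qdel`.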

SerezD/multinode_pbs
