Home
Greg Wilson edited this page Jun 20, 2015 · 1 revision
- Remote Connections and File Transfers
  - tasks: ssh, scp, tar
- Introduction to the Cluster, Queue, and Partitions/Allocations
  - tasks: view the queue, view list of servers, view partition/allocation info
- Submit and Monitor a Job
  - tasks: write a submit file, submit it, view it in the queue, make sense of log/out/err files
- Interactive Sessions
  - tasks: run a command-line job through the scheduler, run a GUI tool within an interactive session
- Software and MPI
  - tasks: view modules or software installed; submit a job using installed software; install your own software; submit a job that depends on MPI
- File Handling
  - tasks: submit jobs with each of 2-3 file availability methods (different file systems, etc.) (still in progress)
- batch scripts/queues
- monitoring jobs' use of resources: CPU, memory, I/O
- parallel/concurrent serial strategies/workflows
- data movement (rsync, Globus, tar, etc.)
- evaluating program scaling
- diagnosing job failures
- Accessing the cluster/transferring data to the cluster
- Understanding the queue/viewing the status of your job
- Submitting jobs
- Understanding types of parallelism
- Writing a shell script that launches independent parallel jobs
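
For the last item above, a minimal sketch of what such a script might look like on a single machine, using only bash background jobs and `wait`. The function `process_input` and the input names are hypothetical stand-ins for a real serial program and its data; on a cluster the same pattern would normally sit inside a job submitted through the scheduler.

```shell
#!/usr/bin/env bash
# Sketch: run several independent tasks at the same time, then wait for all.
# process_input is a hypothetical stand-in for a real serial program.
set -u

process_input() {
    # Pretend to do work on one input and write one result file.
    local name="$1" outdir="$2"
    echo "processed ${name}" > "${outdir}/${name}.out"
}

outdir="$(mktemp -d)"
inputs=(sample_a sample_b sample_c)   # hypothetical input names

pids=()
for name in "${inputs[@]}"; do
    process_input "${name}" "${outdir}" &   # '&' runs the task in the background
    pids+=("$!")                            # remember its process ID
done

# Wait for each task in turn and count the ones that failed.
failures=0
for pid in "${pids[@]}"; do
    wait "${pid}" || failures=$((failures + 1))
done

echo "finished ${#inputs[@]} tasks, ${failures} failed; results in ${outdir}"
```

Because the tasks share nothing, they can run in any order; collecting the process IDs and waiting on each one is what lets the script report failures instead of exiting early.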
- Introduction to how the cluster is structured: login nodes, backend nodes, job scheduler (explain the jargon before diving in head first)
- SSH, file system (log in, explore the file system with ls/df, building on the Unix shell lessons)
- file transfer: scp, rsync, Globus (transfer files and download files to laptop, set up a Globus account & transfer files)
- interactive session / debugging, software modules (run interactive job, load modules, queue/partition view)
- batch job submission, job monitoring (batch script creation, submit batch job)
- job arrays (create a job array script, submit it, review output)
- visualization: IPython, RStudio, VisIt... (run an interactive session and launch visual tools)
- Logging in/SSH keys
  - log in via ssh, keygen, copying keys, scp
- Submitting jobs & understanding queues/resources
  - submit batch job
  - submit interactive job
  - submit jobs to different queues
  - submit jobs using different resources
- Job control
  - submit examples
  - kill jobs
  - get job status
  - modify jobs
  - hold/resume
- MPI & multithreaded
  - submit jobs having different shapes
  - talk about how runs differ on a cluster
- for loops / job arrays
  - submit jobs using bash for loop
  - submit jobs via job arrays
- job dependencies
  - submit jobs using dependency conditions
  - ADV: submit job arrays that depend on other job arrays
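
The three submission styles in the last two topics (for loop, job array, dependency) might be contrasted roughly as follows. This is a sketch under a loud assumption: the outline names no scheduler, so Slurm's `sbatch` is used here purely for illustration, and `job.sh` and `summarize.sh` are hypothetical batch scripts. The commands are only printed, not run, so the sketch works without a cluster.

```shell
#!/usr/bin/env bash
# Sketch only: assumes a Slurm scheduler (sbatch); the outline names no scheduler.
# job.sh and summarize.sh are hypothetical batch scripts.
# Commands are collected and printed rather than executed.
set -u

plan=()

# 1. Bash for loop: one separate job submission per input.
for sample in a b c; do
    plan+=("sbatch --export=ALL,SAMPLE=${sample} job.sh")
done

# 2. Job array: a single submission that creates three tasks;
#    each task can read its index from SLURM_ARRAY_TASK_ID.
plan+=("sbatch --array=1-3 job.sh")

# 3. Dependency: start summarize.sh only after job <jobid> ends successfully.
#    <jobid> is a placeholder for the ID that sbatch reports at submission.
plan+=("sbatch --dependency=afterok:<jobid> summarize.sh")

printf '%s\n' "${plan[@]}"
```

The for loop produces one queue entry per input, while the array produces one entry with several tasks, which is usually easier to monitor and cancel as a unit.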
- AWS: half-hour intro, description of nodes and clusters, and setup of an AWS account, possibly with preinstalled software so that all students, regardless of their home institution, have access to the same kind of environment/topology; login with SSH keys, key management....
- Deploy code on AWS from the Git repository where it is stored
- Parallel workflow (data preprocessing/preparation, planning the assignment of data subsets to parallel routines)
- Parallel processing in R (and possibly other packages): an extension of modular programming, because modular programming is a way of organizing code that makes parallelization easier
- How to run the code: a shell script that distributes the execution
- How to have the parallel executing programs write their results to a single database
- "Intro" - what is a node, what is a cluster (diagrams!)
- Logging into remote servers (probably ssh) -> activity: log into the training cluster
- Dealing with files - where to store, access times, restrictions of file size, backups -> activity: compare accessing a file on one system versus another
- Queuing system -> activity: write a submission script (and submit?)
- MPI concepts - what is a task, comparison to shared memory -> activity: write a basic MPI script?