2014 05 15
Andre Merzky edited this page May 22, 2014 · 3 revisions
-
Who: Mark, Matteo, Shantenu, Ole, AndreM, Antons
-
Agenda:
- open TODOs
- TODO OW: get quantitative EnMDTK requirements
- TODO AM: micro benchmarks for RP
- TODO AM: re-check 2048 ceiling
- DONE OW: provide MPI prototype for stampede
- TODO MS: base MPI agent on this
- TODO OW: start on Cray agent, based on AT's scripts
- DONE AM/OW: set up regular 10min meets between AT and OW
- TODO AT: expand scripts toward MPI jobs, and further to inter-node MPI jobs
- TODO SJ: check pipeline example (MTMS)
- DONE MS: repost data proposal on list
- TODO ALL: provide feedback
-
MS-7 checkpoints:
- May 8:
- OW: simple MPI support for Stampede complete (prototype)
- AT, OW: draft architecture for Cray agent
- May 15:
- MS: implementation proposal for MPI support beyond stampede
- AM: MPI integration tests set up
- OW: first prototype of non-MPI agent for cray
- ALL: agree on implementation plan for Cray agent
- May 8:
- status reports
- discussion on Mark's data proposal
- benchmarking plans
- (?) what role does scheduling play at the agent level?
- open TODOs
-
Notes:
- TODO MS, OW: check module load / shell startup issues
- ibrun vs. mpiexec
- TODO OW: bootstrap for agent on archer
- mongodb on headnode of archer
- DONE OW: email about port forwarding to Iain(?)
- Antons integrates scripts in agent, expands towards MPI / aprun
- Antons: might not need agent hierarchy
- data feedback:
  - OW: clunky, decoupled from CU (cannot refer to data from other CUs)
  - MS: it acts within the sandbox, which was not possible before; it's a building block
  - MS: CU deps can / will be addressed at a higher level
  - OW: actual deps are out of scope anyway...
  - MS: next: higher abstraction, implicit data locations for intermediate data
  - MS: lifetime management of the staging area is up to higher levels
  - implementation: now agent can also pull data and copy/link/move
  - adds saga dependency to agent: should be optional then
  - OW: staging-area is transient, may want to use proper object?
  - next steps: come up with serious pilot data
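
To make the copy/link/move point from the data feedback concrete, here is a minimal sketch of what such a staging step could look like using only the Python standard library. The function name `stage` and the action names are hypothetical and are not taken from the RP agent code; this sketch only covers local files, whereas the notes mention a saga dependency for the real implementation.

```python
import os
import shutil

# Hypothetical action names for illustration; not taken from the RP code base.
COPY, LINK, MOVE = "copy", "link", "move"


def stage(src, dst, action=COPY):
    """Place the local file src at dst (e.g. in a CU sandbox) via copy, link or move."""
    # Make sure the target directory exists before staging.
    os.makedirs(os.path.dirname(os.path.abspath(dst)), exist_ok=True)
    if action == COPY:
        shutil.copy2(src, dst)
    elif action == LINK:
        # Link to the absolute source path so the link resolves from any cwd.
        os.symlink(os.path.abspath(src), dst)
    elif action == MOVE:
        shutil.move(src, dst)
    else:
        raise ValueError("unknown staging action: %r" % (action,))
    return dst
```

Note that a link only stays valid as long as the staging area exists, which ties in with the open question about who manages the staging area's lifetime.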
- MS: agent is very stand-alone in terms of code: it shares neither constants nor the data-db abstraction layer; should be addressed in the long run
- benchmarking: benchmarks != tracing
- MT: want cancel() on any state
  - TODO OW: yes, makes sense, will do
- MS: RP state model: doesn't easily cover actively staging agent
  - TODO MS: proposal
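
As an illustration of the "cancel() on any state" request, the following sketch shows a toy state machine in which cancellation is accepted from every non-final state. The state names and the `Unit` class are hypothetical and do not reflect the actual RP state model. Treating cancel() on an already finished unit as a no-op is an assumption of this sketch.

```python
# Hypothetical state names for illustration; not the actual RP state model.
NEW = "New"
STAGING_INPUT = "StagingInput"
EXECUTING = "Executing"
STAGING_OUTPUT = "StagingOutput"
DONE = "Done"
FAILED = "Failed"
CANCELED = "Canceled"

FINAL = {DONE, FAILED, CANCELED}
ORDER = [NEW, STAGING_INPUT, EXECUTING, STAGING_OUTPUT, DONE]


class Unit:
    def __init__(self):
        self.state = NEW

    def advance(self):
        """Move to the next state of the regular (non-canceled) progression."""
        if self.state in FINAL:
            raise RuntimeError("cannot advance from final state %s" % self.state)
        self.state = ORDER[ORDER.index(self.state) + 1]
        return self.state

    def cancel(self):
        """Cancel from any non-final state; a no-op once a final state is reached."""
        if self.state not in FINAL:
            self.state = CANCELED
        return self.state
```

The staging states appear here only as ordinary steps; how a state model should represent an agent that actively stages data is exactly the open question noted above.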