2014 09 18
andre-merzky edited this page Sep 18, 2014
·
18 revisions
- Agenda:
- open TODOs
- TODO SJ: doc is missing a 10,000-foot overview (SJ?)
- DONE SJ: contact Helmut about SuperMUC allocation
- DONE AM: reduce testing frequency further after release
- WIP AM: follow up with Anjani on EC2
- WIP AM: RP slide decks
- DONE AM: test supermuc, check out gsi contexts
- DONE AM: suspend testing on FG
- DONE MS: verify state model wrt. data staging
- TODO AM: better name for STATE_X
- TODO SJ: obtain credentials for theradicalgroup@gmail.com
- open agenda items
- WIP AM: AGENDA: cleanup
- WIP AM: AGENDA: configuration files
- TODO AM: AGENDA: test suite granularity
- TODO AM: AGENDA: performance PTY / SHELL / SAGA
- TODO AM: AGENDA: discuss how to ensure test coverage
- TODO AM: AGENDA: discuss #307, async call semantics
- TODO AM: AGENDA: student project: plotting
- MS.9
- What are goals for the next couple of weeks?
- (check on open tickets)
- eval tutorial with Indiana: online, Scott and Abhinav
- TODO IU: provide application code
- TODO RADICAL: CCM
- TODO AM: follow up on progress
- configuration files
- AM: we could re-use what we had in OWMS (code exists in utils)
- RP resource configs remain as is
- user configs can be used to overwrite those default settings, like:

    # $HOME/radical/pilot.cfg
    {
        "resources" : {

            # add a custom host
            "boskop" : {
                "defaults"           : "localhost",
                "pilot_agent"        : "rp-agent-testing.py",
                "lrms"               : "TORQUE",
                "task_launch_method" : "SSH",
                "mpi_launch_method"  : "MPIRUN",
                "global_virtenv"     : "$HOME/ve/"
            },

            # change some user specific variable in
            # existing RP config entries
            "*.futuregrid.org" : {
                "username"           : "merzky"
            },
            "sierra.futuregrid.org" : {
                "default_queue"      : "batch"
            }
        }
    }

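How such a user config might be layered over the default resource configs can be sketched as follows. This is only an illustration under assumptions, not the OWMS code in utils: the function name `apply_user_config`, the glob-style matching of keys such as `*.futuregrid.org`, the reading of `"defaults"` as "copy the settings of that existing entry", and all default values shown are hypothetical.

```python
import copy
import fnmatch


def apply_user_config(defaults, user):
    """Return a copy of `defaults` with the user's resource entries applied.

    Hypothetical helper: keys in user["resources"] are treated as glob
    patterns over the names of the existing resource configs.
    """
    merged = copy.deepcopy(defaults)
    for pattern, overrides in user.get("resources", {}).items():
        matches = [name for name in merged if fnmatch.fnmatch(name, pattern)]
        if matches:
            # the key names existing resources: overwrite individual settings
            for name in matches:
                merged[name].update(overrides)
        else:
            # unknown key: add a new resource, seeded from the entry named
            # by its "defaults" setting (assumed meaning of that key)
            base = merged.get(overrides.get("defaults"), {})
            entry = copy.deepcopy(base)
            entry.update({k: v for k, v in overrides.items() if k != "defaults"})
            merged[pattern] = entry
    return merged


# made-up default resource configs, for illustration only
defaults = {
    "localhost": {"lrms": "FORK", "task_launch_method": "LOCAL"},
    "sierra.futuregrid.org": {"lrms": "PBS", "default_queue": "short"},
    "india.futuregrid.org": {"lrms": "PBS", "default_queue": "short"},
}

# user config, following the example in the agenda
user = {
    "resources": {
        "boskop": {
            "defaults": "localhost",
            "lrms": "TORQUE",
            "task_launch_method": "SSH",
        },
        "*.futuregrid.org": {"username": "merzky"},
        "sierra.futuregrid.org": {"default_queue": "batch"},
    }
}

merged = apply_user_config(defaults, user)
```

In this sketch the default configs are never modified in place, so RP resource configs "remain as is" and the user file only adds hosts or overrides individual settings.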
- cleanup modes
- 1: cleanup database entries: session.close (cleanup=TRUE)
- 2: terminate pilots: session.close (terminate=TRUE)
- 3: clean pilot sandbox: pilot_description.cleanup = TRUE
- 4: clean unit sandbox: unit_description.cleanup = TRUE
- 1, 2 are enacted by RP/Application on clean application shutdown
- 3, 4 are enacted by agent on clean pilot shutdown
- 1, 2 can be performed after application finishes, via radicalpilot-cleanup
- 3, 4 cannot be performed after application finishes (yet)
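The four cleanup modes and who enacts them can be summarized in a small lookup table. This is a sketch for orientation only: the `CleanupMode` tuple and the two helper functions are hypothetical names, and the `setting` strings are copied from the notes as written rather than taken from the RP API.

```python
from collections import namedtuple

# one row per cleanup mode discussed above
CleanupMode = namedtuple(
    "CleanupMode", "number action setting enacted_by after_exit")

CLEANUP_MODES = [
    CleanupMode(1, "cleanup database entries",
                "session.close (cleanup=TRUE)", "RP/Application", True),
    CleanupMode(2, "terminate pilots",
                "session.close (terminate=TRUE)", "RP/Application", True),
    CleanupMode(3, "clean pilot sandbox",
                "pilot_description.cleanup = TRUE", "agent", False),
    CleanupMode(4, "clean unit sandbox",
                "unit_description.cleanup = TRUE", "agent", False),
]


def modes_available_after_exit(modes):
    """Numbers of the modes that can still run once the application is done."""
    return [m.number for m in modes if m.after_exit]


def modes_enacted_by(modes, component):
    """Numbers of the modes carried out by the given component."""
    return [m.number for m in modes if m.enacted_by == component]
```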
- STATE_X:
- AM: should be SCHEDULING: the CU has reached the scheduler but has not yet been assigned to a pilot (e.g., if none is free to run the CU).

    # scheduler
    for task in wait_q :
        task.state = SCHEDULING
        while True :
            pilot = find_free_pilot (task)
            if pilot :
                task.pilot = pilot
                break
        task.state = PENDING_EXECUTION
        submit_task_to_pilot (task)

- but: SCHEDULING is already used within the agent:

    # agent
    for task in mongodb.find ({'pid' : my_pid, 'state' : PENDING_EXECUTION}) :
        task.state = SCHEDULING
        while True :
            cores = find_free_cores (task)
            if cores :
                task.cores = cores
                break
        task.state = EXECUTING
        submit_task_to_cores (task)

- Notes:
- MS.9:
- MS: CU startup timing
- MS: (understanding of) perf
- MS: data: pushing caps into pilot level, data intell.
- MS: docs, tickets
- OW: documentation, tutorial
- OW: incremental steps, stability
- OW: don't pack feature tickets
- SJ: bigger, better, faster, self-maintaining ;)
- SJ: careful what we want to achieve
- SJ: ease of use, feature complete documentation
- SJ: mission: replace BJ, serve our own needs
- SJ: testing
- configuration:
- agree on adding resources
- no urgent need for changing individual settings
- TODO AM: reply on #327 with app level config
- cleanup:
- complete
- unit sandbox: pilot cleanup is maybe too late to enforce unit cleanup
- Anton's ticket, Matteo's ticket
- MS: address issues incompletely
- STATE_X:
- both are SCHEDULING
- MS: SCHEDULING / ALLOCATION
- TODO SJ: cleanup new member information