2014 09 18

Agenda:
- open TODOs
  - TODO SJ: docs are missing a 10,000-foot overview (SJ?)
  - DONE SJ: contact Helmut about SuperMUC allocation (theradicalgroup)
  - DONE AM: reduce testing frequency further after release
  - WIP AM: follow up with Anjani on EC2
  - WIP AM: RP slide decks
  - DONE AM: test supermuc, check out gsi contexts
  - DONE AM: suspend testing on FG
  - DONE MS: verify state model wrt. data staging
  - TODO AM: better name for STATE_X
  - TODO SJ: obtain credentials for theradicalgroup@gmail.com
- open agenda items
  - WIP AM: AGENDA: cleanup
  - WIP AM: AGENDA: configuration files
  - TODO AM: AGENDA: test suite granularity
  - TODO AM: AGENDA: performance PTY / SHELL / SAGA
  - TODO AM: AGENDA: discuss how to ensure test coverage
  - TODO AM: AGENDA: discuss #307, async call semantics
  - TODO AM: AGENDA: student project: plotting
- MS.9
  - What are our goals for the next couple of weeks?
  - (check on open tickets)
- eval tutorial with Indiana: online, Scott and Abhinav
  - TODO AM: provide application code
  - TODO RADICAL: CCM
  - TODO AM: follow up on progress
- configuration files
  - AM: we could re-use what we had in OWMS (code exists in utils)
    - RP resource configs remain as is
    - user configs can be used to override those default settings, e.g.:
```
# $HOME/radical/pilot.cfg
{
    "resources" : {
        # add a custom host
        "boskop" : {
            "defaults"           : "localhost",
            "pilot_agent"        : "rp-agent-testing.py",
            "lrms"               : "TORQUE",
            "task_launch_method" : "SSH",
            "mpi_launch_method"  : "MPIRUN",
            "global_virtenv"     : "$HOME/ve/"
        },
        # change some user-specific variables in
        # existing RP config entries
        "*.futuregrid.org" : {
            "username" : "merzky"
        },
        "sierra.futuregrid.org" : {
            "default_queue" : "batch"
        }
    }
}
```
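A minimal Python sketch of how such a user config could be merged into the default resource configs. The merge rules are assumptions, not RP behavior: a key with shell wildcards patches every matching existing entry, a plain key patches or creates its entry, and `"defaults"` names an existing entry to copy from. `merge_resource_configs` and the sample default values are made up for illustration.

```python
import copy
import fnmatch


def merge_resource_configs(defaults, user_cfg):
    """Return a copy of `defaults` with the user's "resources" section applied.

    Hypothetical merge rules (not RP's actual implementation):
      * a key with shell wildcards patches every matching existing entry
      * a plain key patches that entry, or creates it if it does not exist
      * a new entry may name an existing one under "defaults" to start from
    """
    merged = copy.deepcopy(defaults)
    for key, overrides in user_cfg.get("resources", {}).items():
        if any(ch in key for ch in "*?["):
            targets = [name for name in merged if fnmatch.fnmatchcase(name, key)]
        else:
            targets = [key]
        for name in targets:
            if name not in merged:
                base = overrides.get("defaults")
                merged[name] = copy.deepcopy(merged.get(base, {})) if base else {}
            merged[name].update(
                {k: v for k, v in overrides.items() if k != "defaults"})
    return merged


# made-up default entries, standing in for RP's shipped resource configs
defaults = {
    "localhost": {"lrms": "FORK", "task_launch_method": "LOCAL"},
    "sierra.futuregrid.org": {"lrms": "TORQUE", "default_queue": "debug"},
    "india.futuregrid.org": {"lrms": "TORQUE"},
}
# a shortened version of the user config, with comments dropped
user_cfg = {"resources": {
    "boskop": {"defaults": "localhost", "lrms": "TORQUE"},
    "*.futuregrid.org": {"username": "merzky"},
    "sierra.futuregrid.org": {"default_queue": "batch"},
}}
cfg = merge_resource_configs(defaults, user_cfg)
```

Entries are applied in file order, so the wildcard entry sets the shared `username` first and the host-specific `sierra.futuregrid.org` entry can still set its own queue.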
- cleanup modes
  - 1: clean up database entries: session.close(cleanup=True)
  - 2: terminate pilots: session.close(terminate=True)
  - 3: clean pilot sandbox: pilot_description.cleanup = True
  - 4: clean unit sandbox: unit_description.cleanup = True
  - 1, 2 are enacted by RP/Application on clean application shutdown
  - 3, 4 are enacted by agent on clean pilot shutdown
  - 1, 2 can be performed after application finishes, via radicalpilot-cleanup
  - 3, 4 cannot be performed after application finishes (yet)
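Modes 3 and 4 depend on the agent reaching a clean shutdown. Below is a minimal sketch of that step. It assumes, hypothetically, that unit sandboxes live inside the pilot sandbox and that the agent receives the `cleanup` flags as plain booleans. `agent_shutdown_cleanup` is an illustrative name, not an RP function.

```python
import os
import shutil
import tempfile


def agent_shutdown_cleanup(pilot_sandbox, pilot_cleanup, units):
    """Remove sandboxes whose cleanup flag is set (modes 3 and 4).

    `units` is a list of (unit_sandbox, unit_cleanup) pairs. Unit sandboxes
    are handled first, so mode 4 still works when mode 3 is off.
    """
    for sandbox, cleanup in units:
        if cleanup and os.path.isdir(sandbox):
            shutil.rmtree(sandbox)
    if pilot_cleanup and os.path.isdir(pilot_sandbox):
        shutil.rmtree(pilot_sandbox)


# demo on a throwaway directory tree
root = tempfile.mkdtemp()
pilot = os.path.join(root, "pilot.0000")
unit_a = os.path.join(pilot, "unit.0000")
unit_b = os.path.join(pilot, "unit.0001")
os.makedirs(unit_a)
os.makedirs(unit_b)
agent_shutdown_cleanup(pilot, False, [(unit_a, True), (unit_b, False)])
```

Because this hook only runs at pilot shutdown, unit sandboxes linger for the pilot's whole lifetime. If the pilot dies uncleanly, nothing is removed at all.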
- STATE_X:
  - AM: should be SCHEDULING: the CU has reached the scheduler but has not yet been assigned to a pilot (e.g., if none is free to run the CU).
```
# scheduler (client side)
for task in wait_q:
    task.state = SCHEDULING

    # busy-wait until some pilot can take the task
    while True:
        pilot = find_free_pilot(task)
        if pilot:
            task.pilot = pilot
            break

    task.state = PENDING_EXECUTION
    submit_task_to_pilot(task)
```
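Here is a runnable Python sketch of the scheduler loop above. Two things are assumptions rather than RP code. First, tasks that cannot be placed stay in SCHEDULING and are retried on the next pass instead of busy-waiting. Second, `Pilot`, `Task`, `find_free_pilot` (first pilot with a free slot) and `schedule_pass` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SCHEDULING = "SCHEDULING"
PENDING_EXECUTION = "PENDING_EXECUTION"


@dataclass
class Pilot:
    uid: str
    free_slots: int
    submitted: List["Task"] = field(default_factory=list)


@dataclass
class Task:
    uid: str
    state: str = "NEW"
    pilot: Optional[Pilot] = None


def find_free_pilot(task, pilots):
    """Hypothetical policy: first pilot with a free slot, else None."""
    for pilot in pilots:
        if pilot.free_slots > 0:
            return pilot
    return None


def schedule_pass(wait_q, pilots):
    """One scheduler pass; returns the tasks that are still unassigned."""
    still_waiting = []
    for task in wait_q:
        task.state = SCHEDULING  # reached the scheduler, no pilot yet
        pilot = find_free_pilot(task, pilots)
        if pilot is None:
            still_waiting.append(task)  # retry next pass, don't spin
            continue
        pilot.free_slots -= 1
        task.pilot = pilot
        task.state = PENDING_EXECUTION
        pilot.submitted.append(task)  # stands in for submit_task_to_pilot()
    return still_waiting


pilots = [Pilot("pilot.0000", free_slots=1)]
tasks = [Task("unit.0000"), Task("unit.0001")]
waiting = schedule_pass(tasks, pilots)
```

With one free slot, the first unit moves to PENDING_EXECUTION. The second stays in SCHEDULING, which is exactly the state STATE_X is meant to name.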
  - but: SCHEDULING already used within the agent:
```
# agent
for task in mongodb.find({"pid": my_pid, "state": PENDING_EXECUTION}):
    task.state = SCHEDULING

    # busy-wait until enough cores are free
    while True:
        cores = find_free_cores(task)
        if cores:
            task.cores = cores
            break

    task.state = EXECUTING
    submit_task_to_cores(task)
```
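The agent side follows the same pattern at core granularity. Below is a minimal sketch. It assumes a plain list of dicts in place of the MongoDB query and a hypothetical `cores_needed` field. `Agent` and `agent_pass` are illustrative names only. The sketch also shows the naming clash: a unit waiting for cores reports the same SCHEDULING state as one waiting for a pilot.

```python
SCHEDULING = "SCHEDULING"
PENDING_EXECUTION = "PENDING_EXECUTION"
EXECUTING = "EXECUTING"


class Agent:
    def __init__(self, pid, n_cores):
        self.pid = pid
        self.free = set(range(n_cores))

    def find_free_cores(self, n):
        """Lowest-numbered free cores, or None if fewer than n are free."""
        if len(self.free) < n:
            return None
        return sorted(self.free)[:n]


def agent_pass(agent, pending):
    """Place pending tasks on free cores; return those still waiting."""
    waiting = []
    for task in pending:
        task["state"] = SCHEDULING  # same name as the client-side wait
        cores = agent.find_free_cores(task["cores_needed"])
        if cores is None:
            waiting.append(task)
            continue
        agent.free.difference_update(cores)
        task["cores"] = cores
        task["state"] = EXECUTING  # stands in for submit_task_to_cores()
    return waiting


# stands in for the MongoDB collection of units
db = [
    {"uid": "unit.0000", "pid": "pilot.0000", "state": PENDING_EXECUTION, "cores_needed": 3},
    {"uid": "unit.0001", "pid": "pilot.0000", "state": PENDING_EXECUTION, "cores_needed": 2},
    {"uid": "unit.0002", "pid": "pilot.0001", "state": PENDING_EXECUTION, "cores_needed": 1},
]
agent = Agent("pilot.0000", n_cores=4)
# equivalent of mongodb.find({"pid": my_pid, "state": PENDING_EXECUTION})
pending = [t for t in db if t["pid"] == agent.pid and t["state"] == PENDING_EXECUTION]
waiting = agent_pass(agent, pending)
```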
Notes:
- MS.9:
  - MS: CU startup timing
  - MS: (understanding of) perf
  - MS: data: push capabilities down to the pilot level, data intelligence
  - MS: docs, tickets
  - OW: documentation, tutorial
  - OW: incremental steps, stability
  - OW: don't pack feature tickets
  - SJ: bigger, better, faster, self-maintaining ;)
  - SJ: careful what we want to achieve
  - SJ: ease of use, feature-complete documentation
  - SJ: mission: replace BJ, serve our own needs
  - SJ: testing
- configuration:
  - agree on adding resources
  - no urgent need for changing individual settings
  - TODO AM: reply on #327 with app level config
- cleanup:
  - complete
  - unit sandbox: pilot cleanup may be too late to enforce unit cleanup
  - Anton's ticket, Matteo's ticket
  - MS: address issues incompletely
- STATE_X
  - both are SCHEDULING
  - MS: SCHEDULING / ALLOCATION
- TODO SJ: cleanup new member information
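If the client-side and agent-side waits for STATE_X get separate names, as MS suggests (SCHEDULING / ALLOCATION), the CU state sequence could be written down as an ordered enum. This is a sketch only. Three things are assumptions: that ALLOCATION names the agent-side wait, the forward-by-one transition rule, and the omission of failure and cancel states. `UnitState` and `advance` are hypothetical names.

```python
from enum import Enum


class UnitState(Enum):
    """Hypothetical CU state sequence, in order (failure states omitted)."""
    NEW = "NEW"
    SCHEDULING = "SCHEDULING"                # client: waiting for a pilot
    PENDING_EXECUTION = "PENDING_EXECUTION"  # assigned, handed to the agent
    ALLOCATION = "ALLOCATION"                # agent: waiting for cores
    EXECUTING = "EXECUTING"
    DONE = "DONE"


ORDER = list(UnitState)


def advance(current, new):
    """Permit only a move to the immediately following state."""
    if ORDER.index(new) != ORDER.index(current) + 1:
        raise ValueError(f"invalid transition {current.name} -> {new.name}")
    return new
```

With distinct names, the state alone tells whether a unit is waiting for a pilot or for cores. The agreed option (both SCHEDULING) needs the component as extra context.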
