2014 09 18

andre-merzky edited this page Sep 24, 2014 · 18 revisions
  • Agenda:

    • open TODOs
      • TODO SJ: doc is missing 10,000-foot overview (SJ?)
      • DONE SJ: contact Helmut about SuperMuc allocation
      • DONE AM: reduce testing frequency further after release
      • WIP AM: follow up with Anjani on EC2
      • WIP AM: RP slide decks
      • DONE AM: test supermuc, check out gsi contexts
      • DONE AM: suspend testing on FG
      • DONE MS: verify state model wrt. data staging
      • TODO AM: better name for STATE_X
      • TODO SJ: obtain credentials for theradicalgroup@gmail.com
    • open agenda items
      • WIP AM: AGENDA: cleanup
      • WIP AM: AGENDA: configuration files
      • TODO AM: AGENDA: test suite granularity
      • TODO AM: AGENDA: performance PTY / SHELL / SAGA
      • TODO AM: AGENDA: discuss how to ensure test coverage
      • TODO AM: AGENDA: discuss #307, async call semantics
      • TODO AM: AGENDA: student project: plotting
    • MS.9
      • What are goals for the next couple of weeks?
      • (check on open tickets)
    • eval tutorial with Indiana: online, Scott and Abhinav
      • TODO AM: provide application code
      • TODO RADICAL: CCM
      • TODO AM: follow up on progress
    • configuration files
      • AM: we could re-use what we had in OWMS (code exists in utils)
        • RP resource configs remain as is
        • user configs can be used to overwrite those default settings, like:
          # $HOME/radical/pilot.cfg
          {
              "resources" : {
          
                  # add a custom host
                  "boskop" : {
                      "defaults"           : "localhost",
                      "pilot_agent"        : "rp-agent-testing.py",
                      "lrms"               : "TORQUE",
                      "task_launch_method" : "SSH",
                      "mpi_launch_method"  : "MPIRUN",
                      "global_virtenv"     : "$HOME/ve/"
                  },
          
                  # change some user specific variable in 
                  # existing RP config entries
                  "*.futuregrid.org" : {
                      "username" : "merzky"
                  },
                  "sierra.futuregrid.org" : {
                      "default_queue"    : "batch"
                  }
              }
          }
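        • a minimal sketch of how such an overlay could be applied (not the OWMS/utils code; `apply_user_config` and the default values in `rp_defaults` are made up for illustration): wildcard keys patch every matching resource, a new key starts from the entry named in "defaults"

```python
import copy
import fnmatch


def apply_user_config(defaults, user):
    """Return a copy of `defaults` with the user's resource entries applied."""
    merged = copy.deepcopy(defaults)
    for pattern, overrides in user.get("resources", {}).items():
        matches = [name for name in merged if fnmatch.fnmatch(name, pattern)]
        if matches:
            # Existing entries: overwrite only the keys the user has set.
            for name in matches:
                merged[name].update(overrides)
        elif not any(ch in pattern for ch in "*?["):
            # New host: start from the resource named in "defaults", if any.
            entry = dict(overrides)
            base = entry.pop("defaults", None)
            merged[pattern] = {**merged.get(base, {}), **entry}
        # A wildcard that matches nothing is ignored.
    return merged


# Hypothetical built-in resource configs (values are placeholders).
rp_defaults = {
    "localhost": {"lrms": "FORK", "task_launch_method": "LOCAL"},
    "sierra.futuregrid.org": {"lrms": "PBS", "default_queue": "default"},
}

# User config, following the structure shown above.
user_cfg = {
    "resources": {
        "boskop": {"defaults": "localhost", "lrms": "TORQUE"},
        "*.futuregrid.org": {"username": "merzky"},
        "sierra.futuregrid.org": {"default_queue": "batch"},
    }
}

resources = apply_user_config(rp_defaults, user_cfg)
```

        • the built-in configs are copied rather than modified, so RP resource configs remain as is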
          
    • cleanup modes
      • 1: cleanup database entries: session.close (cleanup=TRUE)
      • 2: terminate pilots: session.close (terminate=TRUE)
      • 3: clean pilot sandbox: pilot_description.cleanup = TRUE
      • 4: clean unit sandbox: unit_description.cleanup = TRUE
      • 1, 2 are enacted by RP/Application on clean application shutdown
      • 3, 4 are enacted by agent on clean pilot shutdown
      • 1, 2 can be performed after application finishes, via radicalpilot-cleanup
      • 3, 4 cannot be performed after application finishes (yet)
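      • the four modes above, restated as a small lookup table (a sketch only; `CleanupMode` and `post_mortem_modes` are illustrative names, not part of RP):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CleanupMode:
    target: str        # what gets cleaned up
    trigger: str       # setting that requests it, as listed above
    enacted_by: str    # component that performs it on clean shutdown
    post_mortem: bool  # can it still be done after the application finished?


CLEANUP_MODES = {
    1: CleanupMode("database entries", "session.close (cleanup=TRUE)",
                   "RP/Application", True),
    2: CleanupMode("pilots", "session.close (terminate=TRUE)",
                   "RP/Application", True),
    3: CleanupMode("pilot sandbox", "pilot_description.cleanup = TRUE",
                   "agent", False),
    4: CleanupMode("unit sandbox", "unit_description.cleanup = TRUE",
                   "agent", False),
}


def post_mortem_modes():
    """Modes that can still be performed after the application has finished."""
    return sorted(n for n, mode in CLEANUP_MODES.items() if mode.post_mortem)
```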
    • STATE_X:
      • AM: should be SCHEDULING: the CU has reached the scheduler but has not yet been assigned to a pilot (e.g., if none is free to run the CU).
        # scheduler
        for task in wait_q :
            task.state = SCHEDULING
        
            # wait until a pilot is free to run this task
            while True :
                pilot = find_free_pilot (task)
                if  pilot :
                    task.pilot = pilot
                    break
        
            # hand over each task as soon as it has a pilot
            task.state = PENDING_EXECUTION
            submit_task_to_pilot (task)
        
      • but: SCHEDULING already used within the agent:
        # agent
        for task in mongodb.find (pid  : my_pid, 
                                  state: PENDING_EXECUTION) :
            task.state = SCHEDULING
        
            # wait until enough cores are free for this task
            while True :
                cores = find_free_cores (task)
                if  cores :
                    task.cores = cores
                    break
        
            # start each task as soon as it has cores
            task.state = EXECUTING
            submit_task_to_cores (task)
        
  • Notes:

    • MS.9:
      • MS: CU startup timing
      • MS: (understanding of) perf
      • MS: data: pushing caps into pilot level, data intell.
      • MS: docs, tickets
      • OW: documentation, tutorial
      • OW: incremental steps, stability
      • OW: don't pack feature tickets
      • SJ: bigger, better, faster, self-maintaining ;)
      • SJ: careful what we want to achieve
      • SJ: ease of use, feature complete documentation
      • SJ: mission: replace BJ, serve our own needs
      • SJ: testing
    • configuration:
      • agree on adding resources
      • no urgent need for changing individual settings
      • TODO AM: reply on #327 with app level config
    • cleanup:
      • complete
      • unit sandbox: pilot cleanup is maybe too late to enforce unit cleanup
      • Anton's ticket, Matteo's ticket
      • MS: address issues incompletely
    • STATE_X
      • both are SCHEDULING
      • MS: SCHEDULING / ALLOCATION
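      • one possible reading of the SCHEDULING / ALLOCATION suggestion, as a sketch: keep SCHEDULING for the pilot assignment and use a second name for the core assignment in the agent (`ALLOCATING`, `UnitState` and `advance` are assumptions, no naming was decided):

```python
from enum import Enum


class UnitState(Enum):
    SCHEDULING = 1         # scheduler: waiting to be assigned to a pilot
    PENDING_EXECUTION = 2  # assigned to a pilot, not yet picked up by the agent
    ALLOCATING = 3         # agent: waiting for free cores (hypothetical name)
    EXECUTING = 4          # running on the assigned cores


_ORDER = list(UnitState)


def advance(state):
    """Return the state that follows `state` in the sequence above."""
    index = _ORDER.index(state)
    if index + 1 == len(_ORDER):
        raise ValueError(f"{state.name} has no successor in this sketch")
    return _ORDER[index + 1]
```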
    • TODO SJ: cleanup new member information