
2015 10 08

Andre Merzky edited this page Nov 19, 2015 · 6 revisions
  • Agenda:
    • open TODOs:
      • TODO MT: add allocation info to resource doc

      • HOLD AM/MS: prepare action/support plan for activities on BW
        • objectives, challenges, timelines, phase 1
      • HOLD AM: check if we can switch to HeartbeatMonitor for pilot health checks
      • HOLD AM: suggest alternatives for PTY layer resource consumption
      • HOLD MS: Anaconda/SuperMUC (October)
      • HOLD MS: add NAMD examples eventually? (Tom Bishop)
      • HOLD AM: set up example on how to use synapse as RP workload
      • HOLD AM: check documentation of state diagram in released docs
      • HOLD MT: move semantic elements of tools into RP.utils
      • HOLD AM: proposal to json export to persistent storage
      • HOLD MS: proposal for persistent experimental data storage
    • Development Progress:
      • release plan:
        • 0.36: W2 September
          • testing done, release done
          • hotfixes upcoming (aprun, log verbosity, archer shutdown)
        • 0.37: W3 September
          • documentation, examples, tutorials
        • 0.38: end October
          • module refactor
          • final state model
          • -> as planned
      • testing (AT):
        • DONE AT: move to RADICAL-Jenkins (with one fixture)
        • DONE AT: get stable (blue)
        • DONE AT: look into mail notifications
      • Anaconda (IP): anaconda support on client side? *
      • Yarn (IP):
        • TODO IP: toward dynamic multi node (lower priority)
        • TODO IP: pull request for launcher...
        • TODO AM: daemon startup over LMs?
        • reduced number of scp calls -> stabilizes
        • chameleon tests with longer CUs up to 64
      • Spark
        • TODO GC: compare to Yarn integration
      • BW: *
      • State of application kernels?
      • CECAM
        • Agenda
        • Documentation Tickets
          • which is the target env for installation?
          • workflow.iu.edu -> 50 tutorial accounts
            • DONE SJ: clarify account usage and XSEDE allocation
            • same accounts for Extasy
          • TODO AM: pre/post exec: not after application error
          • TODO AM: how is RTD to be synced to devel
        • conceptual section is missing
          • what problem do we address?
          • what is a pilot?
          • what is a CU?
          • what is this MongoDB thing?
          • how do I know what goes on in the pilot? With my CUs?
          • what is a scheduler? Why are there multiple schedulers?
          • how about data?
          • DONE AM: create that structure
        • ordering
          • there is no single 'good order'
          • examples and best practices are different, as is the tutorial
          • should the tutorial be separated out in the first place?
          • rename this document to 'user guide'
          • SJ: user guide is pre-requisite for tutorial
          • no need to have release notes in this doc
        • Intro: SJ
        • install: VB
          • branch, some changes,
          • DONE VB/AM: add ssh-config
        • resources: MT
          • DONE SJ: review after
          • DONE: link auth links from (II) into that section
        • data: AT
          • links should be clickable
          • move callbacks elsewhere? Most basic examples start with those...
          • data examples are on localhost only. Uhm.
          • DONE AM: split into concepts and examples? Let's see after concepts are in place
        • examples: MS
          • getting started needs to go much earlier
          • merge 5.2 (error handling) with 5.7 (app flow)
          • axe 5.3 (reconnect)
          • 5.4 should be merged into resource section
          • 5.5: is more an FAQ - move it there?
          • add pre/post exec discussion
          • 5.6 (MPI) sooner and implicit?
        • tutorial: AM
        • DONE VB: next Thu: switch from RP testing to ENMD testing protocol
        • DONE AT: next Thu: draft user guide, two weeks: testing protocol
      • TODO SJ: review of docs by Software institute? (Neil)
      • TODO AGENDA: RTD procedures
      • TODO AGENDA: where do user credentials go? context vs. user pilot description.
    • Data Roadmap:
    • Experiments:
      • What metrics should RP provide to estimate its 'overhead' / 'contribution'?
      • HOLD: micro vs. macro benchmarks
      • HOLD: profile status
    • Publications:
    • AOB:
      • CECAM Tutorial
        • online documentation vs. online tutorial
        • begin to work on interactive examples (which involve user activity)
          • how to submit n tasks of size A and m tasks of size B, toward hosts X and Y
          • DONE AT: simple repex example
            • DONE AT: check with SJ about suitable example / exercise mode
          • DONE VB: simple MD example
          • DONE AM: simple RP example
        • execution env, software stack, applications/libraries
        • DONE SJ: landing page
        • DONE SJ: confirm that XUP accounts are valid on workflow machine
      • SC15 Tutorial
  • Notes: *