2018 05 30

Andre Merzky edited this page May 30, 2018 · 5 revisions

RCT Links:

RP/RS/RU:

  • Ioannis:
    • TODO AM: timeline for implementing partitions
    • RS troubles on SAGA (heterogeneous cluster)
  • Jumana:
    • EnTK troubles
      • only first stage, then PIKA/RMQ problems
      • also scaling issues (ID limits?)
    • impact on paper timelines, but initial data exists
    • BW ok for JD, not for Kristof, not sure about the difference
  • George:
    • streaming experiments:
      • deadline on June 18
      • looking into Lustre behavior
  • Vivek:
    • rebuttal

Any Other Business:

  • GPU release
    • end of week
    • after prof fix
    • documentation: how to run examples on remote machines
  • Partitioning
    • VB:
      • API ok
      • revisit use case 'watcher unit'?
      • scalability -> multiple concurrent agents
      • PENDING -> STARTING?
      • API can be kept backward compatible
      • introduces new failure mode
      • we need more test coverage first, lower priority
        • AM: partial implementations to stretch over time
    • JD:
      • no partitioning of disk space? :-)
    • IP:
      • partition description
        • be specific about partition sizes, not something like 50%
      • direct binding
    • VB:
      • auto-account of agent nodes?
        • AM: consider
      • will agents compete for resources?
        • AM: maybe
    • Timelines
      • only active use case: Ioannis
      • target: end of summer?
      • timeline for stability / testing:
      • priority on ensuring correctness (versus good failure modes)
      • extend testing to EnTK
      • TODO VB: wishlist for test coverage
      • TODO AM: list of partial releases (configs, bootstrap, ...)