Roadmap

pcuzner edited this page Jul 11, 2017 · 2 revisions

This page lists the work currently under consideration.

Short term

  • make it easier to define the data retention policy in Graphite from the Ansible playbook. Caveat: the finest granularity is 10s, since it must stay in sync with the collectd interval
  • add an alerting dashboard: create a notification channel, and send basic alerts on key metrics to this channel
  • examine whether dashUpdater should be used to set up the notification channel for alerting through Grafana's HTTP API
  • update the OSD collector to report recovery and backfill latency per OSD, then review how and where these metrics can be used (the at-a-glance and osd-latency dashboards to begin with)
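
For the retention policy item, a check like the following could be run before the playbook writes the Graphite retention setting. It is only a sketch: the function names and the `COLLECTD_INTERVAL` constant are hypothetical and not part of the project, only the suffixed `precision:duration` form of a retention string (for example `10s:7d,1m:30d`) is handled, and the rule that coarser archives should be multiples of the interval is an assumption based on how whisper archives are normally laid out.

```python
import re

# Seconds per unit suffix used in Graphite retention strings.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

# Assumption: the collectd interval is the 10s named in the roadmap item.
COLLECTD_INTERVAL = 10


def to_seconds(token):
    """Convert a value such as '10s' or '7d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhdwy])", token)
    if not match:
        raise ValueError(f"unrecognised time value: {token!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def parse_retentions(spec):
    """Split 'precision:duration,...' into (precision_s, duration_s) tuples."""
    archives = []
    for part in spec.split(","):
        precision, _, duration = part.strip().partition(":")
        archives.append((to_seconds(precision), to_seconds(duration)))
    return archives


def check_retentions(spec, interval=COLLECTD_INTERVAL):
    """Return a list of problems; an empty list means the policy looks usable."""
    archives = parse_retentions(spec)
    problems = []
    finest = archives[0][0]
    if finest != interval:
        problems.append(
            f"finest precision is {finest}s but the collectd interval is {interval}s"
        )
    # Assumed rule: coarser archives should be whole multiples of the interval.
    for precision, _ in archives[1:]:
        if precision % interval:
            problems.append(f"{precision}s is not a multiple of {interval}s")
    return problems


if __name__ == "__main__":
    for policy in ("10s:7d,1m:30d,15m:5y", "5s:7d,1m:30d"):
        print(policy, "->", check_retentions(policy) or "ok")
```

A playbook task could fail early when `check_retentions` returns a non-empty list, rather than deploying a policy that does not line up with the collection interval.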

Medium/Long term

  • investigate tagging: can it be used to tag NIC interfaces as public/cluster for frontend/backend charts?
  • review a migration to Prometheus, replacing Graphite but retaining Grafana for visualisation
  • review the benefit of including HAProxy-based statistics within the dashboard(s)
  • review the integration options with cephmgr