Add cluster leader election #6

woodsaj · 2016-04-04T15:51:54Z

Issue by woodsaj
Friday Jun 19, 2015 at 20:22 GMT
Originally opened as raintank/grafana#228

There are a few features within the code base that should only run on one node at a time.
This requires the nodes to co-ordinate this role amongst themselves.

Raft seems to be the new hotness when it comes to these things, so we should use that. CoreOS's etcd package has an implementation of Raft:
https://godoc.org/github.com/coreos/etcd/raft

woodsaj · 2016-04-04T15:51:55Z

Comment by Dieterbe
Tuesday Jun 23, 2015 at 00:06 GMT

Mind sharing a little bit about what those features are?
Can we get away with transactions on the database?
For alerting, I noticed you mentioned somewhere running only 1 job producer, but I thought we decided we actually wanted to run multiple alert job producers for HA: if jobs get consistently routed (by key), the consumers will drop jobs they've already processed anyway. This is a fairly simplistic method of HA. If you're thinking of running only 1 producer, and it dies and restarts somewhere else, then we also need to keep track of the last timestamp at which jobs were scheduled. In case it takes several seconds to restart a producer, the new producer should also process the missed ticks from the last few seconds. (I actually like this approach; it seems more efficient, but it also requires more operations/automation. Perhaps we should postpone this improvement until we're at a point where multiple producers bring too much overhead?)
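
As an illustration of the multiple-producer idea above, here is a minimal sketch of a consumer that drops jobs it has already processed. The `Job`, `Key`, and `Consumer` names are hypothetical and not taken from the code base, and the sketch assumes every producer derives the same key for the same check and tick, so that duplicates are routed to the same consumer.

```go
package main

import "fmt"

// Job is a hypothetical alert job: one check evaluated at one tick.
type Job struct {
	CheckID int64
	Tick    int64 // unix timestamp of the scheduled tick
}

// Key identifies a job. Every producer emits the same key for the same
// check and tick, so duplicates can be routed to the same consumer.
func (j Job) Key() string {
	return fmt.Sprintf("%d-%d", j.CheckID, j.Tick)
}

// Consumer remembers which job keys it has already handled.
type Consumer struct {
	seen map[string]struct{}
}

// NewConsumer returns a consumer with an empty set of seen keys.
func NewConsumer() *Consumer {
	return &Consumer{seen: make(map[string]struct{})}
}

// Handle returns true the first time a key is seen and false for repeats,
// which are dropped without doing any work.
func (c *Consumer) Handle(j Job) bool {
	k := j.Key()
	if _, dup := c.seen[k]; dup {
		return false
	}
	c.seen[k] = struct{}{}
	// The real work (executing the alert check) would go here.
	return true
}

func main() {
	c := NewConsumer()
	// Two producers emit the same job for the same tick, then a new tick arrives.
	fmt.Println(c.Handle(Job{CheckID: 7, Tick: 1435017600}))
	fmt.Println(c.Handle(Job{CheckID: 7, Tick: 1435017600}))
	fmt.Println(c.Handle(Job{CheckID: 7, Tick: 1435017610}))
}
```

The set of seen keys grows without bound in this sketch; a real consumer would presumably expire old keys once their ticks can no longer be redelivered.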

woodsaj · 2016-04-04T15:51:56Z

Comment by woodsaj
Tuesday Jun 23, 2015 at 15:10 GMT

This is a long term goal to meet future scalability needs.

1. Alerting scheduler and also collector session management.
2. Yes, and that is likely what will be deployed first, but it does not scale. So long term we need a better solution.
3. Also true, but as with 2, it does not scale. If we are running 10 instances of grafana, we don't want all 10 pushing the same messages into the queue.

woodsaj · 2016-04-04T15:51:58Z

Comment by Dieterbe
Tuesday Jun 23, 2015 at 19:59 GMT

> This is a long term goal to meet future scalability needs.

I believe @nopzor1200 described the raft leader election as a high-prio item that was a must before we can launch.

I agree with your reasoning @woodsaj, but we should make sure we're on the same page regarding the urgency and timeline of this.
Also, to make this viable I will need to make the alerting scheduler stateful, keeping track of the last successfully processed timestamp. Perhaps this could go into the raft log, in etcd, or in the database. Will we have an HA transactional database?
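
To make the stateful scheduler idea concrete, here is a minimal sketch of how a newly started scheduler could compute the ticks it missed from the last successfully processed timestamp. `MissedTicks` is a hypothetical helper; the sketch assumes non-negative unix timestamps in seconds and ticks aligned to a fixed interval, and it leaves open where the timestamp is stored (raft log, etcd, or the database).

```go
package main

import "fmt"

// MissedTicks returns every tick after lastDone up to and including now,
// aligned to multiples of interval (all values in seconds). lastDone is the
// last tick the previous scheduler recorded as successfully processed.
func MissedTicks(lastDone, now, interval int64) []int64 {
	if interval <= 0 {
		return nil
	}
	var ticks []int64
	// First aligned tick strictly after lastDone.
	next := (lastDone/interval + 1) * interval
	for t := next; t <= now; t += interval {
		ticks = append(ticks, t)
	}
	return ticks
}

func main() {
	// The old scheduler last processed tick 100; the new one starts at 135
	// with a 10 second interval.
	fmt.Println(MissedTicks(100, 135, 10))
}
```

A new leader would schedule jobs for each returned tick and then persist the newest one before resuming its regular ticker.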

woodsaj · 2016-04-04T15:51:59Z

Comment by Dieterbe
Wednesday Jul 08, 2015 at 02:30 GMT

Just saw a Docker talk at the pre-GopherCon party about libkv, which provides a nice abstraction for leader election (supports etcd, Consul and ZooKeeper).

woodsaj · 2016-04-04T15:52:00Z

Comment by nopzor1200
Saturday Jul 18, 2015 at 05:23 GMT

I originally misunderstood whether this was a high prio vs low prio item. @woodsaj, can you confirm that it (raft or the like) is not something we need to worry about for now?

woodsaj · 2016-04-04T15:52:02Z

Comment by woodsaj
Sunday Jul 19, 2015 at 13:24 GMT

This is low prio.

woodsaj · 2016-04-04T15:52:03Z

Comment by Dieterbe
Friday Jul 31, 2015 at 15:54 GMT

(Interestingly, this ticket was in "to do" in codetree. When I moved it to backlog, it removed the backlog milestone. I guess that's because it doesn't use milestones for backlog.)
