
tracking important errors #1

Open
woodsaj opened this issue Apr 4, 2016 · 21 comments

Comments

@woodsaj
Contributor

woodsaj commented Apr 4, 2016

Issue by Dieterbe
Tuesday May 12, 2015 at 22:17 GMT
Originally opened as raintank/grafana#91


do we have any convention on how we will keep tabs on critical/important error events in grafana?
maybe log to a file and then use heka or logstash to shove them into ES?

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by woodsaj
Wednesday May 13, 2015 at 09:22 GMT


You can use the existing events in Grafana.

https://github.com/raintank/grafana/blob/master/pkg/events/events.go

If enabled, these events will be pushed to a rabbitmq Topic Exchange with the routingKey set to Priority.eventType. This is handled by eventpublisher.eventListener
https://github.com/raintank/grafana/blob/master/pkg/services/eventpublisher/eventpublisher.go#L135

So an event defined as

type OrgCreated struct {
    Timestamp time.Time `json:"timestamp"`
    Id        int64     `json:"id"`
    Name      string    `json:"name"`
}

will use the routingKey INFO.org.created

The events package will need some updating, as it hard-codes the event.Priority to 'INFO'.
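To make the routing-key convention above concrete, here is a minimal sketch of how a CamelCase event type name like `OrgCreated` could be turned into `INFO.org.created`. The `routingKey` helper and the regex are illustrative only, not the actual code in eventpublisher.go:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type Priority string

const PRIO_INFO Priority = "INFO"

// camelRe finds lower/upper boundaries in CamelCase names like "OrgCreated".
var camelRe = regexp.MustCompile("([a-z0-9])([A-Z])")

// routingKey builds "<Priority>.<dotted lowercase type name>",
// e.g. ("INFO", "OrgCreated") -> "INFO.org.created".
func routingKey(prio Priority, eventType string) string {
	dotted := strings.ToLower(camelRe.ReplaceAllString(eventType, "$1.$2"))
	return fmt.Sprintf("%s.%s", prio, dotted)
}

func main() {
	fmt.Println(routingKey(PRIO_INFO, "OrgCreated")) // INFO.org.created
	fmt.Println(routingKey(PRIO_INFO, "UserUpdated")) // INFO.user.updated
}
```

With a topic exchange, consumers can then bind selectively, e.g. `ERROR.#` for only error events or `*.org.*` for all org events regardless of priority.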

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Wednesday May 13, 2015 at 11:43 GMT


@woodsaj not sure the events bus is optimal for this; it's more meant for business/application events. But I guess it could be used for logging events as well.

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Wednesday May 13, 2015 at 12:16 GMT


so do we have consensus that this is the best approach? is it still the right approach when we consider regular grafana users who want to run grafana on their own server and have a different way to track errors? they often use something like logstash, so in that case a different event listener would be needed that pushes to a logstash queue, or perhaps an event listener that writes events to a text file log? (not something for us to worry too much about now, but good to keep in the back of our mind)

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Wednesday May 13, 2015 at 12:28 GMT


one solution is to have something external (like logstash), tail log files and push to ES
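The tail-and-ship setup described above would look roughly like this in a Logstash pipeline. The file path, grok pattern, and ES host are placeholders, not a tested config:

```
input {
  file { path => "/var/log/grafana/grafana.log" }
}
filter {
  # split the line into an indexable level + message (pattern is illustrative)
  grok { match => { "message" => "%{LOGLEVEL:level} %{GREEDYDATA:msg}" } }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```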

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Wednesday May 13, 2015 at 12:29 GMT


but I guess pushing directly to rabbit -> logstash -> ES has some advantages in that you can log more rich data (as json) and have that data indexed and searchable in ES, without going through logfile -> logstash parsing -> ES

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Friday May 15, 2015 at 16:01 GMT


not sure how i should go about the actual log calls.
all the current stuff seems to use a session and sess.publishAfterCommit() but that seems overkill and i don't grok it yet.

FWIW my current idea is (not tested yet)

diff --git a/pkg/events/events.go b/pkg/events/events.go
index c3dcac3..bb3ea82 100644
--- a/pkg/events/events.go
+++ b/pkg/events/events.go
@@ -35,7 +35,6 @@ func ToOnWriteEvent(event interface{}) (*OnTheWireEvent, error) {
        eventType := reflect.TypeOf(event).Elem()

        wireEvent := OnTheWireEvent{
-               Priority:  PRIO_INFO,
                EventType: eventType.Name(),
                Payload:   event,
        }
@@ -47,6 +46,13 @@ func ToOnWriteEvent(event interface{}) (*OnTheWireEvent, error) {
                wireEvent.Timestamp = time.Now()
        }

+       baseField = reflect.Indirect(reflect.ValueOf(event)).FieldByName("Priority")
+       if baseField.IsValid() {
+               wireEvent.Priority = baseField.Interface().(Priority)
+       } else {
+               wireEvent.Priority = PRIO_INFO
+       }
+
        return &wireEvent, nil
 }

@@ -77,3 +83,9 @@ type UserUpdated struct {
        Login     string    `json:"login"`
        Email     string    `json:"email"`
 }
+
+type Error struct {
+       Timestamp time.Time `json:"timestamp"`
+       Title     string    `json:"title"`
+       Body      string    `json:"body"` // optional
+}
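The effect of the diff above can be shown in isolation: events that carry their own Priority field override the default, everything else falls back to PRIO_INFO. A self-contained sketch of that reflection logic (type names borrowed from the thread; not the full events package):

```go
package main

import (
	"fmt"
	"reflect"
)

type Priority string

const (
	PRIO_INFO  Priority = "INFO"
	PRIO_ERROR Priority = "ERROR"
)

// priorityOf mirrors the reflection in the diff: if the event struct has a
// Priority field, use its value; otherwise default to PRIO_INFO.
func priorityOf(event interface{}) Priority {
	field := reflect.Indirect(reflect.ValueOf(event)).FieldByName("Priority")
	if field.IsValid() {
		return field.Interface().(Priority)
	}
	return PRIO_INFO
}

type OrgCreated struct{ Name string }

type Error struct {
	Priority Priority
	Title    string
}

func main() {
	fmt.Println(priorityOf(&OrgCreated{Name: "raintank"}))               // INFO
	fmt.Println(priorityOf(&Error{Priority: PRIO_ERROR, Title: "boom"})) // ERROR
}
```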

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Friday May 15, 2015 at 17:02 GMT


the bus Publish code does not use publishAfterCommit; publishAfterCommit is just a utility function in the sqlstore package.

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Friday May 15, 2015 at 17:04 GMT


not sure I understand your comment "my current idea is"? just looks like code paste from the events.go

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Friday May 15, 2015 at 17:04 GMT


no it's a diff that shows some additions, basically an error type and a way to override the priority

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Friday May 15, 2015 at 17:22 GMT


k i'll use bus.Publish

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Friday May 15, 2015 at 17:23 GMT


@Dieterbe but bus Publish is for business events, for logging use the log package

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Friday May 15, 2015 at 17:24 GMT


if you want to pipe log messages to rabbitmq you could write a rabbitmq log writer
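The "rabbitmq log writer" idea is essentially an io.Writer that publishes each log line instead of writing it to a file. A sketch, with the publish step abstracted behind a function so the amqp wiring (e.g. a channel publish to a topic exchange) stays out of the example; `amqpLogWriter` and `publishFunc` are hypothetical names:

```go
package main

import "fmt"

// publishFunc stands in for a real amqp publish call (routing key + body).
type publishFunc func(routingKey string, body []byte) error

// amqpLogWriter satisfies io.Writer, so it can be handed to a logger;
// each Write forwards the log line to rabbit instead of a file.
type amqpLogWriter struct {
	routingKey string
	publish    publishFunc
}

func (w *amqpLogWriter) Write(p []byte) (int, error) {
	if err := w.publish(w.routingKey, p); err != nil {
		return 0, err
	}
	return len(p), nil
}

func main() {
	var got string
	w := &amqpLogWriter{
		routingKey: "ERROR.grafana.log",
		// capture instead of publishing, to keep the example self-contained
		publish: func(key string, body []byte) error {
			got = key + " " + string(body)
			return nil
		},
	}
	fmt.Fprintf(w, "alert check failed")
	fmt.Println(got) // ERROR.grafana.log alert check failed
}
```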

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Friday May 15, 2015 at 19:00 GMT


hm not sure if i'll get to that before end of next week, but i guess that could fairly easily be done by one of you guys if needed. so i'll focus more on the specifics of alerting itself for now.

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Friday May 15, 2015 at 19:13 GMT


@Dieterbe yea, I think that is probably best. Getting grafana logs to elasticsearch can be fixed later

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by nopzor1200
Monday May 18, 2015 at 19:58 GMT


Can the messages generated by Litmus go into the same storage backend as the messages that are generated from the Collectors? (so they can be viewed in the events panel from elasticsearch)?

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Monday May 18, 2015 at 20:03 GMT


i don't see why they couldn't, but one thing to keep in mind is decoupling the monitoring system from the system being monitored. if prod ES goes down for whatever reason then we'll want to look at events in a monitoring system, which probably should not be the same ES instance.
(also, if you want to log events like "error: can't write to ES", then logging those to the same ES would only make it worse)

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by nopzor1200
Tuesday Jun 23, 2015 at 16:18 GMT


@woodsaj to make a call on this

potentially loggly, potentially ELK...

but all in agreement that centralized logging for * is required

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Friday Jul 10, 2015 at 18:48 GMT


have we made a decision on this? @woodsaj @ctdk

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by woodsaj
Wednesday Jul 29, 2015 at 05:23 GMT


Given that we now have ELK set up, i think the best approach is to just log warnings and errors. We just need to ensure that the log messages contain all relevant data.

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Wednesday Jul 29, 2015 at 08:40 GMT


yep. do we need to do a big code review to make sure we have the right log calls in all places or are we pretty confident we're in good shape? I'll review the alerting pkg to be sure of that, at least.

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by ctdk
Thursday Jul 30, 2015 at 06:34 GMT


We can do some processing and mutating of the logs as they come into logstash too. I'll have that set up soon.
