
tracking important errors #1

Open
woodsaj opened this issue Apr 4, 2016 · 21 comments

Comments

@woodsaj
Contributor

woodsaj commented Apr 4, 2016

Issue by Dieterbe
Tuesday May 12, 2015 at 22:17 GMT
Originally opened as raintank/grafana#91


do we have any convention on how we will keep tabs on critical/important error events in grafana?
maybe log to a file and then use heka or logstash to shove them into ES?

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by woodsaj
Wednesday May 13, 2015 at 09:22 GMT


You can use the existing events in Grafana.

https://github.com/raintank/grafana/blob/master/pkg/events/events.go

If enabled, these events will be pushed to a rabbitmq Topic Exchange with the routingKey set to Priority.eventType. This is handled by eventpublisher.eventListener
https://github.com/raintank/grafana/blob/master/pkg/services/eventpublisher/eventpublisher.go#L135

So an event defined as

type OrgCreated struct {
    Timestamp time.Time `json:"timestamp"`
    Id        int64     `json:"id"`
    Name      string    `json:"name"`
}

will use the routingKey INFO.org.created

The events package will need some updating, as it hard-codes the event.Priority to 'INFO'.
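To make the routing-key convention above concrete, here is a minimal sketch of how a CamelCase event type name like `OrgCreated` could be turned into `INFO.org.created`. The `routingKey` helper and the regex are illustrative only, not the actual code in eventpublisher.go:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type Priority string

const PRIO_INFO Priority = "INFO"

// camelRe finds lower/upper boundaries in CamelCase names like "OrgCreated".
var camelRe = regexp.MustCompile("([a-z0-9])([A-Z])")

// routingKey builds "<Priority>.<dotted lowercase type name>",
// e.g. ("INFO", "OrgCreated") -> "INFO.org.created".
func routingKey(prio Priority, eventType string) string {
	dotted := strings.ToLower(camelRe.ReplaceAllString(eventType, "$1.$2"))
	return fmt.Sprintf("%s.%s", prio, dotted)
}

func main() {
	fmt.Println(routingKey(PRIO_INFO, "OrgCreated")) // INFO.org.created
	fmt.Println(routingKey(PRIO_INFO, "UserUpdated")) // INFO.user.updated
}
```

With a topic exchange, consumers can then bind selectively, e.g. `ERROR.#` for only error events or `*.org.*` for all org events regardless of priority.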

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Wednesday May 13, 2015 at 11:43 GMT


@woodsaj not sure the events bus is optimal for this; it's more meant for business/application events. But I guess it could be used for logging events as well.

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Wednesday May 13, 2015 at 12:16 GMT


so do we have consensus that this is the best approach? is it still the right approach when we consider regular grafana users who want to run grafana on their own server and have a different way to track errors? they often use something like logstash, so in that case a different event listener would be needed that pushes to a logstash queue, or perhaps an event listener that writes events to a text file log? (not something for us to worry too much about now, but good to keep in the back of our mind)

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Wednesday May 13, 2015 at 12:28 GMT


one solution is to have something external (like logstash), tail log files and push to ES
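The tail-and-ship setup described above would look roughly like this in a Logstash pipeline. The file path, grok pattern, and ES host are placeholders, not a tested config:

```
input {
  file { path => "/var/log/grafana/grafana.log" }
}
filter {
  # split the line into an indexable level + message (pattern is illustrative)
  grok { match => { "message" => "%{LOGLEVEL:level} %{GREEDYDATA:msg}" } }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```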

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Wednesday May 13, 2015 at 12:29 GMT


but I guess pushing directly to rabbit -> logstash -> ES has some advantages in that you can log more rich data (as json) and have that data indexed and searchable in ES, without going through logfile -> logstash parsing -> ES

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Friday May 15, 2015 at 16:01 GMT


not sure how i should go about the actual log calls.
all the current stuff seems to use a session and sess.publishAfterCommit() but that seems overkill and i don't grok it yet.

FWIW my current idea is (not tested yet)

diff --git a/pkg/events/events.go b/pkg/events/events.go
index c3dcac3..bb3ea82 100644
--- a/pkg/events/events.go
+++ b/pkg/events/events.go
@@ -35,7 +35,6 @@ func ToOnWriteEvent(event interface{}) (*OnTheWireEvent, error) {
        eventType := reflect.TypeOf(event).Elem()

        wireEvent := OnTheWireEvent{
-               Priority:  PRIO_INFO,
                EventType: eventType.Name(),
                Payload:   event,
        }
@@ -47,6 +46,13 @@ func ToOnWriteEvent(event interface{}) (*OnTheWireEvent, error) {
                wireEvent.Timestamp = time.Now()
        }

+       baseField = reflect.Indirect(reflect.ValueOf(event)).FieldByName("Priority")
+       if baseField.IsValid() {
+               wireEvent.Priority = baseField.Interface().(Priority)
+       } else {
+               wireEvent.Priority = PRIO_INFO
+       }
+
        return &wireEvent, nil
 }

@@ -77,3 +83,9 @@ type UserUpdated struct {
        Login     string    `json:"login"`
        Email     string    `json:"email"`
 }
+
+type Error struct {
+       Timestamp time.Time `json:"timestamp"`
+       Title     string    `json:"title"`
+       Body      string    `json:"body"` // optional
+}
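The effect of the diff above can be shown in isolation: events that carry their own Priority field override the default, everything else falls back to PRIO_INFO. A self-contained sketch of that reflection logic (type names borrowed from the thread; not the full events package):

```go
package main

import (
	"fmt"
	"reflect"
)

type Priority string

const (
	PRIO_INFO  Priority = "INFO"
	PRIO_ERROR Priority = "ERROR"
)

// priorityOf mirrors the reflection in the diff: if the event struct has a
// Priority field, use its value; otherwise default to PRIO_INFO.
func priorityOf(event interface{}) Priority {
	field := reflect.Indirect(reflect.ValueOf(event)).FieldByName("Priority")
	if field.IsValid() {
		return field.Interface().(Priority)
	}
	return PRIO_INFO
}

type OrgCreated struct{ Name string }

type Error struct {
	Priority Priority
	Title    string
}

func main() {
	fmt.Println(priorityOf(&OrgCreated{Name: "raintank"}))               // INFO
	fmt.Println(priorityOf(&Error{Priority: PRIO_ERROR, Title: "boom"})) // ERROR
}
```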

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Friday May 15, 2015 at 17:02 GMT


the bus Publish code does not use publishAfterCommit; publishAfterCommit is just a utility function in the sqlstore package.

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Friday May 15, 2015 at 17:04 GMT


not sure I understand your comment "my current idea is"? just looks like code paste from the events.go

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Friday May 15, 2015 at 17:04 GMT


no it's a diff that shows some additions, basically an error type and a way to override the priority

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Friday May 15, 2015 at 17:22 GMT


k i'll use bus.Publish

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Friday May 15, 2015 at 17:23 GMT


@Dieterbe but bus Publish is for business events, for logging use the log package

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Friday May 15, 2015 at 17:24 GMT


if you want to pipe log messages to rabbitmq you could write a rabbitmq log writer
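The "rabbitmq log writer" idea is essentially an io.Writer that publishes each log line instead of writing it to a file. A sketch, with the publish step abstracted behind a function so the amqp wiring (e.g. a channel publish to a topic exchange) stays out of the example; `amqpLogWriter` and `publishFunc` are hypothetical names:

```go
package main

import "fmt"

// publishFunc stands in for a real amqp publish call (routing key + body).
type publishFunc func(routingKey string, body []byte) error

// amqpLogWriter satisfies io.Writer, so it can be handed to a logger;
// each Write forwards the log line to rabbit instead of a file.
type amqpLogWriter struct {
	routingKey string
	publish    publishFunc
}

func (w *amqpLogWriter) Write(p []byte) (int, error) {
	if err := w.publish(w.routingKey, p); err != nil {
		return 0, err
	}
	return len(p), nil
}

func main() {
	var got string
	w := &amqpLogWriter{
		routingKey: "ERROR.grafana.log",
		// capture instead of publishing, to keep the example self-contained
		publish: func(key string, body []byte) error {
			got = key + " " + string(body)
			return nil
		},
	}
	fmt.Fprintf(w, "alert check failed")
	fmt.Println(got) // ERROR.grafana.log alert check failed
}
```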

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Friday May 15, 2015 at 19:00 GMT


hm not sure if i'll get to that before end of next week, but i guess that could fairly easily be done by one of you guys if needed. so i'll focus more on the specifics of alerting itself for now.

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by torkelo
Friday May 15, 2015 at 19:13 GMT


@Dieterbe yea, I think that is probably best. Getting grafana logs to elasticsearch can be fixed later

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by nopzor1200
Monday May 18, 2015 at 19:58 GMT


Can the messages generated by Litmus go into the same storage backend as the messages that are generated from the Collectors? (so they can be viewed in the events panel from elasticsearch)?

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Monday May 18, 2015 at 20:03 GMT


i don't see why they couldn't, but one thing to keep in mind is decoupling the monitoring system from the system being monitored. if prod ES goes down for whatever reason then we'll want to look at events in a monitoring system, which probably should not be the same ES instance.
(also, if you want to log events like "error: can't write to ES", then logging those to the same ES would only make it worse)

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by nopzor1200
Tuesday Jun 23, 2015 at 16:18 GMT


@woodsaj to make a call on this

potentially loggly, potentially ELK...

but all in agreement that centralized logging for * is required

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Friday Jul 10, 2015 at 18:48 GMT


have we made a decision on this? @woodsaj @ctdk

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by woodsaj
Wednesday Jul 29, 2015 at 05:23 GMT


Given that we now have ELK set up, i think the best approach is to just log warnings and errors. We just need to ensure that the log messages contain all relevant data.

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by Dieterbe
Wednesday Jul 29, 2015 at 08:40 GMT


yep. do we need to do a big code review to make sure we have the right log calls in all places or are we pretty confident we're in good shape? I'll review the alerting pkg to be sure of that, at least.

@woodsaj
Contributor Author

woodsaj commented Apr 4, 2016

Comment by ctdk
Thursday Jul 30, 2015 at 06:34 GMT


We can do some processing and mutating of the logs as they come into logstash too. I'll have that set up soon.
