The program is run on each log line from start to finish, with no loops. The only state emitted by the program is the content of the exported metrics. Metrics can be read by the program, though, so exported metrics are the place to keep state between lines of input.
It's often the case that a log line is printed by an application at the start of some session-like interaction, and another at the end. Often these sessions have a session identifier, and every intermediate event in the same session is tagged with that identifier. Using map-valued exported metrics is the way to store session information keyed by session identifier.
The example program rsyncd.mtail
shows how to use a session tracking metric for measuring the total user session time.
counter connection_time_total
hidden gauge connection_time by pid
/connect from \S+/ {
connection_time[$pid] = timestamp()
del connection_time[$pid] after 72h
}
/sent .* bytes received .* bytes total size \d+/ {
connection_time_total += timestamp() - connection_time[$pid]
del connection_time[$pid]
}
rsyncd
uses a child process for each session so the pid
field of the log format contains the session identifier in this example.
hidden metrics
A hidden metric is only visible to the mtail program, it is hidden from export. Internal state can be kept out of the metric collection system to avoid unnecessary memory and network costs.
Hidden metrics are declared by prepending the word hidden
to the declaration:
hidden gauge connection_time by pid
The maps can grow unbounded with a key for every session identifier created as the logs are read. If you see mtail
consuming a lot of memory, it is likely that there's one or more of these maps consuming memory.
(You can remove the hidden
keyword from the declaration, and let mtail
reload the program without restarting and the contents of the session information metric will appear on the exported metrics page. Be warned, that if it's very big, even loading this page may take a long time and cause mtail to crash.)
mtail
can't know when a map value is ready to be garbage collected, so you need to tell it. One way is to defer deletion of the key and its value if it is not updated for some duration of time. The other way is to immediately delete it when the key and value are no longer needed.
del connection_time[$pid] after 72h
Upon creation of a connection time entry, the rsyncd.mtail
program instructs mtail to remove it 72 hours after it's no longer updated. This means that the programmer expects, in this case, that sessions typically do not last longer than 72 hours because mtail
does not track the timestamps for all accesses of metrics, only writes to them.
del connection_time[$pid]
The other form indicates that when the session is closed, the key and value can be removed. The caveat here is that logs can be lossy due to problems with the application restarting, mtail restarting, or the log delivery system (e.g. syslog) losing the messages too. Thus it is recommended to use both forms in programs.
del ... after
form when the metric is created, giving it an expiration time longer than the expected lifespan of the session.del
form when the session is ended, explicitly removing it before the expiration time is up.
It is not an error to delete a nonexistent key from a map.
Expiry is only processed once ever hour, so durations shorter than 1h won't take effect until the next hour has passed.