Alarm Logging

Kay Kasemir edited this page May 25, 2022 · 5 revisions

Alarm Logging Notes

The alarm logger forwards status and configuration updates from one or more alarm servers to ElasticSearch. Kibana is a web-based tool for viewing and analyzing data in ElasticSearch. It can, for example, list recent alarms or show the top 10 alarms of the last 24 hours.

Initial Setup

For basic setup of the alarm server see https://github.com/ControlSystemStudio/phoebus/tree/master/app/alarm

For the alarm logger see https://github.com/ControlSystemStudio/phoebus/tree/master/services/alarm-logger

A freshly downloaded Kibana, started with its default configuration, connects to elastic on localhost and only allows web access to its GUI from localhost. If remote access is required, edit kibana/config/kibana.yml:

# Default value "localhost" will only allow local access.
# Open to remote access:
server.host: "0.0.0.0"

Open web browser to http://localhost:5601 (unless remote access was enabled as shown above).

The raw elastic data is in indices named like xxx_alarms_state_2021-11-01, with a new index created each month or week depending on the logger settings. Kibana uses index patterns to combine data from, for example, all *_alarms_state_* indices. If you have multiple alarm setups, you can combine data from all alarm systems that way.
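To see which indices such a pattern would cover, the naming scheme can be turned into a small helper. This is only a sketch: the function names and the ES_URL variable are made up for illustration, and the index kinds (state, cmd, config) are taken from the index names used elsewhere on this page.

```shell
# Build the wildcard that matches all indices of one kind for one alarm setup.
# $1: setup name (placeholder, e.g. xxx), $2: kind (state, cmd or config)
alarm_index_pattern() {
    printf '%s_alarms_%s_*\n' "$1" "$2"
}

# List the matching indices via the _cat API.
# ES_URL is an assumed variable; it defaults to the local elastic node.
list_alarm_indices() {
    curl -s "${ES_URL:-http://localhost:9200}/_cat/indices/$(alarm_index_pattern "$1" "$2")?v"
}

# Example: list_alarm_indices xxx state
```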

After a restart with empty ElasticSearch data, an index pattern with a strange ID is auto-created when the "Discover" link is first used. Visualizations created after that refer to the strange ID and cannot be exported/imported without adjusting it.

To avoid problems, create index patterns with known IDs:

  • Kibana Management, Index Patterns, Create index pattern:
  • Index Pattern: *alarms_state*, Next
  • Time filter field: Select message_time.
  • Under "show advanced settings", enter ID "alarms_state"

If you have multiple alarm setups, you can decide to either combine all indices via a *alarms_state* pattern, or use a pattern xxx_alarms_state* to combine only messages for setup xxx.
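The same index pattern can probably also be created without the GUI. The sketch below assumes the Kibana 7.x saved objects REST API, where the ID is the last path element, and that no authentication is required. The function names and the KIBANA_URL variable are made up for illustration.

```shell
# Build the JSON body for an index pattern saved object.
# $1: index pattern title, $2: time filter field
index_pattern_body() {
    printf '{"attributes":{"title":"%s","timeFieldName":"%s"}}\n' "$1" "$2"
}

# Create the index pattern under a fixed, known ID.
# $1: ID, $2: title, $3: time filter field
# KIBANA_URL is an assumed variable; it defaults to the local Kibana.
create_index_pattern() {
    index_pattern_body "$2" "$3" |
        curl -s -X POST \
            -H 'kbn-xsrf: true' \
            -H 'Content-Type: application/json' \
            --data-binary @- \
            "${KIBANA_URL:-http://localhost:5601}/api/saved_objects/index-pattern/$1"
}

# Example: create_index_pattern alarms_state '*alarms_state*' message_time
```

Check the result under Kibana Management, Index Patterns before building visualizations on top of it.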

Manage Services

In the following we assume that the alarm logger, elastic and kibana are all running as Linux systemd services which can be checked like this:

systemctl status elasticsearch
systemctl status alarm-logger
systemctl status kibana

To start over, stop all services:

sudo systemctl stop kibana
sudo systemctl stop alarm-logger
sudo systemctl stop elasticsearch

Start elasticsearch back up and verify that it's running:

sudo systemctl start elasticsearch
lynx -dump http://localhost:9200
lynx -dump 'http://localhost:9200/_cat/indices?v'
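Elasticsearch can take a while before it answers on port 9200. When scripting the restart, a polling loop along these lines may help. The function name and its arguments are made up for illustration.

```shell
# Poll a URL until it answers successfully or the attempts run out.
# $1: URL, $2: number of attempts, $3: seconds to wait between attempts
wait_for_url() {
    attempts=$2
    while [ "$attempts" -gt 0 ]; do
        # -f makes curl fail on HTTP errors, -o /dev/null drops the response
        if curl -s -f -o /dev/null "$1"; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep "$3"
        fi
    done
    return 1
}

# Example: wait_for_url http://localhost:9200 30 2 && sudo systemctl start alarm-logger
```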

Start logger:

sudo systemctl start alarm-logger

After a short while, check that the logger is connected to elasticsearch (port 9200) and that elasticsearch knows the alarm templates:

netstat -an | fgrep 9200
curl http://localhost:9200
curl 'http://localhost:9200/_template/*alarm*?pretty'

If the alarm logger has seen any alarm traffic, it should have created some alarm indices:

curl 'http://localhost:9200/_cat/indices?v'

Pick a specific index and dump its data:

curl 'http://localhost:9200/XXX_alarms_config_2019-04-26/_search?format=json&pretty'
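Dumping a whole index gets unwieldy quickly. To look at only the latest messages for one PV, a search body can be sent along. This sketch assumes the state indices have the "pv" and "message_time" fields used in the Kibana recipes on this page, and that the PV name contains no characters that need JSON escaping. The function names and ES_URL are made up for illustration.

```shell
# Build a search body for the newest messages of one PV.
# $1: PV name, $2: number of messages to return
pv_search_body() {
    printf '{"size":%s,"sort":[{"message_time":"desc"}],"query":{"match":{"pv":"%s"}}}\n' "$2" "$1"
}

# Run the search against an index or index wildcard.
# $1: index or wildcard, $2: PV name, $3: number of messages
# ES_URL is an assumed variable; it defaults to the local elastic node.
search_pv() {
    pv_search_body "$2" "$3" |
        curl -s -H 'Content-Type: application/json' --data-binary @- \
            "${ES_URL:-http://localhost:9200}/$1/_search?pretty"
}

# Example (the PV name is a placeholder): search_pv 'xxx_alarms_state_*' 'Some:PV:Name' 5
```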

To check log messages, for example to see if kibana complains about an incompatible version of elasticsearch:

sudo journalctl -u kibana

Kibana Recipes

Open a web browser to http://localhost:5601 (unless remote access was enabled as shown above).

Basic list of recent alarms:

  • From top-left menu, select Kibana, Discover
  • Select the alarms_state index pattern
  • From available fields, select config, current_severity, severity, latch
  • Select desired time range in upper right corner

To then narrow the alarm listing to just one PV:

  • "Add filter"
  • Select "pv" field
  • Select "is" operator
  • Select one of the suggested PVs for a value

Instead of, or in addition to, filtering on the PV name, show only the first occurrence of an alarm:

  • "Add filter"
  • Select "latch" field
  • Select "is" operator
  • Select "true"

Table of recent alarm counts:

Kibana keeps changing. The "Visualize" link used to be right in the sidebar. Then it moved to "Kibana, Visualize" and with 7.17 it's under "Kibana, Visualize Library, Create new ..., Aggregation Based".

  • Visualize, add a "Data Table"
  • Select the "alarms_state" data source
  • Leave "Metrics" on "Count"
  • Under Buckets, add "Split Rows"
  • Aggregation "Terms", selecting the field "pv", leaving metric: count
  • "Add sub-buckets", "Split Rows", "Terms", "current_severity"
  • "Add sub-buckets", "Split Rows", "Terms", "severity"
  • Finally under "Options" increase the "Per Page" count to 20

To colorize table cells with "OK", "MINOR", "MAJOR" as green, orange, red, a global setting has to be changed. Under Management, find Kibana Index Patterns, then "alarms_state". In the table of fields, locate "current_severity". Click the edit "pencil" at the right edge of the row:

  • Format: Color
  • Add pattern OK with color green
  • Add pattern MINOR with color orange
  • Add pattern MAJOR with color red

Repeat for the "severity" field. The GUI may be iffy: the color had to be picked by clicking in the popup because entering RGB values did not work, so it is hard to get exactly the same coloring for "current_severity" and "severity".

Basic "how many alarms over time" plot:

  • Visualize, add a "Line" graph
  • Select the "alarms_state" data source
  • Add filter: current_severity is one of MINOR, MAJOR, INVALID, UNDEFINED (could also filter on current_severity is not OK)
  • Data Metrics: Y-Axis Aggregation "Count", label "Alarm Count"
  • Data Buckets: X-Axis Aggregation "Date Histogram", field "message_time", Interval "Auto", label "Time"
  • Save as "Timeline (PV)". With newer versions, it can directly be added to a new or existing dashboard, but selecting "Add to library" makes it easier to reach later, either to add it to multiple dashboards or to edit it.
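To cross-check such a plot against the raw data, roughly the same counts can be requested from elastic directly. This sketch uses the "current_severity is not OK" variant of the filter and assumes a version of elastic that supports "fixed_interval" in date histograms (7.2 or later). The function names and ES_URL are made up for illustration.

```shell
# Build an aggregation body that counts non-OK messages per time bucket.
# $1: bucket width, for example 1h or 1d
alarm_timeline_body() {
    printf '{"size":0,"query":{"bool":{"must_not":[{"match":{"current_severity":"OK"}}]}},"aggs":{"alarms_over_time":{"date_histogram":{"field":"message_time","fixed_interval":"%s"}}}}\n' "$1"
}

# Run the aggregation against an index or index wildcard.
# $1: index or wildcard, $2: bucket width
# ES_URL is an assumed variable; it defaults to the local elastic node.
alarm_timeline() {
    alarm_timeline_body "$2" |
        curl -s -H 'Content-Type: application/json' --data-binary @- \
            "${ES_URL:-http://localhost:9200}/$1/_search?pretty"
}

# Example: alarm_timeline '*alarms_state*' 1h
```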

Basic "how many latched alarms over time" plot:

  • As above, but filter on "latch is true"

Plot of Top Alarm Trigger PVs:

  • Visualize, add a "Vertical Bar" graph
  • Select the "alarms_state" data source
  • Add filter: current_severity is one of MINOR, MAJOR, INVALID, UNDEFINED
  • Data Metrics: Y-Axis Aggregation "Count", label "Alarm Count"
  • Data Buckets: X-Axis Aggregation "Terms", field "pv", Order by "Metric: Alarm Count", order "Descending", size "10", label "Alarm PV"
  • Under "Metrics & axes", change X-axis "Align" to "Angled"
  • Save as "Top Alarm Trigger PVs"
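The bar graph corresponds to a "terms" aggregation on the "pv" field, which by default returns the buckets with the highest counts first. A rough REST equivalent, again with the "current_severity is not OK" variant of the filter, might look like this. The function names and ES_URL are made up for illustration.

```shell
# Build an aggregation body for the PVs with the most non-OK messages.
# $1: number of PVs to return
top_alarm_pvs_body() {
    printf '{"size":0,"query":{"bool":{"must_not":[{"match":{"current_severity":"OK"}}]}},"aggs":{"top_pvs":{"terms":{"field":"pv","size":%s}}}}\n' "$1"
}

# Run the aggregation against an index or index wildcard.
# $1: index or wildcard, $2: number of PVs
# ES_URL is an assumed variable; it defaults to the local elastic node.
top_alarm_pvs() {
    top_alarm_pvs_body "$2" |
        curl -s -H 'Content-Type: application/json' --data-binary @- \
            "${ES_URL:-http://localhost:9200}/$1/_search?pretty"
}

# Example: top_alarm_pvs '*alarms_state*' 10
```

To restrict the counts to a time range like the one selected in Kibana, a range filter on "message_time" would have to be added to the query.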

Plot of Top Latched Alarms:

  • Same as Top Alarm Trigger PVs, but add a filter where "latch" is "true"

Dashboard:

  • Create dashboard, add/position visualizations
  • Select time range 24 hours
  • Save with "Store time with dashboard" selected

To export/import, use Kibana Management, Saved Objects, select all visualizations and dashboards to export, or import a previously exported file.
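Export and import can presumably also be scripted. The sketch below assumes the Kibana 7.x saved objects REST API without authentication. The function names and the KIBANA_URL variable are made up for illustration.

```shell
# Export all index patterns, visualizations and dashboards to a file.
# $1: output file; use a name ending in .ndjson so it can be imported again
# KIBANA_URL is an assumed variable; it defaults to the local Kibana.
export_saved_objects() {
    curl -s -X POST \
        -H 'kbn-xsrf: true' \
        -H 'Content-Type: application/json' \
        -d '{"type":["index-pattern","visualization","dashboard"]}' \
        -o "$1" \
        "${KIBANA_URL:-http://localhost:5601}/api/saved_objects/_export"
}

# Import a previously exported file, overwriting objects with the same ID.
# $1: input file
import_saved_objects() {
    curl -s -X POST \
        -H 'kbn-xsrf: true' \
        --form "file=@$1" \
        "${KIBANA_URL:-http://localhost:5601}/api/saved_objects/_import?overwrite=true"
}

# Example: export_saved_objects alarm_dashboards.ndjson
```

Because the import overwrites objects with matching IDs, this only works smoothly when the index patterns were created with known IDs as described under Initial Setup.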

Data Management

Older indices need to be deleted to improve Kibana response time and to save disk space. This can be done via Kibana, Management, Stack Management, Data, Index Management, or via the REST interface:

curl -s 'http://localhost:9200/_cat/indices?v' | fgrep xxx_alarms | sort
curl -X DELETE "localhost:9200/xxx_alarms_state_2019-07-*"
curl -X DELETE "localhost:9200/xxx_alarms_cmd_2019-07-*"
curl -X DELETE "localhost:9200/xxx_alarms_config_2019-07-*"
curl -s 'http://localhost:9200/_cat/indices?v' | fgrep xxx_alarms | sort
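For regular cleanup, the indices to delete can be selected by the date at the end of their name instead of typing wildcards by hand. This is a sketch under the assumption that every index name ends in _YYYY-MM-DD as shown above. The function names and ES_URL are made up for illustration. Deleting by full name also avoids wildcard deletes, which newer elastic versions may reject depending on the action.destructive_requires_name setting.

```shell
# Read index names on stdin, print those dated before the cutoff.
# $1: cutoff date as YYYY-MM-DD; names without a trailing date are skipped
indices_older_than() {
    awk -v cutoff="$1" '{
        n = split($0, parts, "_")
        d = parts[n]
        # ISO dates compare correctly as plain strings
        if (d ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ && (d "") < (cutoff "")) print
    }'
}

# Delete every index named on stdin, one request per index.
# ES_URL is an assumed variable; it defaults to the local elastic node.
delete_indices() {
    while IFS= read -r name; do
        curl -s -X DELETE "${ES_URL:-http://localhost:9200}/$name"
    done
}

# Example, dry run first:
#   curl -s 'http://localhost:9200/_cat/indices/xxx_alarms_*?h=index' | indices_older_than 2019-08-01
# then append "| delete_indices" once the list looks right.
```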