Alarm Logging
The alarm logger forwards status and configuration updates from one or more alarm servers to ElasticSearch. Kibana is a web-based tool for viewing and analyzing data in ElasticSearch. It can for example provide a list of recent alarms, or show the top-10 alarms of the last 24 hours.
For basic setup of the alarm server see https://github.com/ControlSystemStudio/phoebus/tree/master/app/alarm
For the alarm logger see https://github.com/ControlSystemStudio/phoebus/tree/master/services/alarm-logger
When simply downloading and starting Kibana, it will connect to elastic on localhost and allow web access to its GUI on localhost. If that is insufficient and remote access is required, edit kibana/config/kibana.yml:
# Default value "localhost" will only allow local access.
# Open to remote access:
server.host: "0.0.0.0"
Open web browser to http://localhost:5601 (unless remote access was enabled as shown above).
The raw elastic data is in indices with names like xxx_alarms_state_2021-11-01, with new indices created each month or week depending on logger settings.
Kibana uses index patterns to combine data from, for example, all *_alarms_state_* indices.
If you have multiple alarm setups, you can combine data from all alarm systems that way.
After a restart with empty ElasticSearch data, when first using the "Discover" link, an index pattern with an auto-generated, unpredictable ID is created. Visualizations that are then created refer to that ID, so they cannot be exported/imported without adjusting the IDs.
To avoid problems, create index patterns with known IDs:
- Kibana Management, Index Patterns, Create index pattern
- Index Pattern: *alarms_state*, then Next
- Time filter field: Select message_time
- Under "show advanced settings", enter ID "alarms_state"
If you have multiple alarm setups, you can decide to either combine all indices via a *alarms_state* pattern, or use a pattern xxx_alarms_state* to combine only messages for setup xxx.
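The same index pattern can probably also be created without the GUI. The following is a minimal sketch, assuming Kibana 7.x on localhost:5601 without authentication and using its saved objects REST API. The helper names `index_pattern_body` and `create_index_pattern` are made up for this example, and the API may differ or be deprecated in other Kibana versions.

```shell
#!/bin/sh
# Sketch: create a Kibana index pattern with a fixed, known ID.
# Assumptions: Kibana 7.x, no authentication, default port.
KIBANA_URL="${KIBANA_URL:-http://localhost:5601}"

# index_pattern_body TITLE
# Print the JSON body for an index pattern that uses message_time
# as its time filter field.
index_pattern_body() {
    printf '{"attributes":{"title":"%s","timeFieldName":"message_time"}}\n' "$1"
}

# create_index_pattern ID TITLE
# Hypothetical helper: POST the body to the saved objects API so that
# the index pattern gets the given ID instead of a generated one.
create_index_pattern() {
    index_pattern_body "$2" |
        curl -s -X POST "$KIBANA_URL/api/saved_objects/index-pattern/$1" \
             -H 'kbn-xsrf: true' \
             -H 'Content-Type: application/json' \
             --data-binary @-
}

# Example (requires a running Kibana):
#   create_index_pattern alarms_state '*alarms_state*'
```

Quote the pattern so that the shell does not expand the `*` characters.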
In the following we assume that the alarm logger, elastic and kibana are all running as Linux systemd services which can be checked like this:
systemctl status elasticsearch
systemctl status alarm-logger
systemctl status kibana
To start over, stop all services:
sudo systemctl stop kibana
sudo systemctl stop alarm-logger
sudo systemctl stop elasticsearch
Start back up, verify that it's running:
sudo systemctl start elasticsearch
lynx -dump http://localhost:9200
lynx -dump http://localhost:9200/_cat/indices?v
Start logger:
sudo systemctl start alarm-logger
After a short while check that it's running and knows the alarm templates:
netstat -an | fgrep 9200
curl http://localhost:9200
curl 'http://localhost:9200/_template/*alarm*?pretty'
If the alarm logger has seen any alarm traffic, it should have created some alarm indices:
curl http://localhost:9200/_cat/indices?v
Pick a specific index and dump its data:
curl 'http://localhost:9200/XXX_alarms_config_2019-04-26/_search?format=json&pretty'
To check log messages, for example to see if kibana complains about an incompatible version of elasticsearch:
sudo journalctl -u kibana
Open web browser to http://localhost:5601 (unless remote access was enabled as shown above).
Basic list of recent alarms:
- From top-left menu, select Kibana, Discover
- Select the alarms_state index pattern
- From available fields, select config, current_severity, severity, latch
- Select desired time range in upper right corner
To then narrow the alarm listing to just one PV:
- "Add filter"
- Select "pv" field
- Select "is" operator
- Select one of the suggested PVs for a value
Instead of, or in addition to, filtering on the PV name, show only the first occurrence of an alarm:
- "Add filter"
- Select "latch" field
- Select "is" operator
- Select "true"
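For a quick check outside of Kibana, the same two filters can be approximated with an Elasticsearch URI search. This is only a sketch: it assumes elastic on localhost:9200, the field names "pv", "latch" and "message_time" as shown in Kibana, and a made-up PV name. The helper names `alarm_query` and `search_alarms` are hypothetical.

```shell
#!/bin/sh
# Sketch: query alarm state messages for one PV, optionally latched only.
ES_URL="${ES_URL:-http://localhost:9200}"

# alarm_query PV [latched]
# Print a query string. The PV name is quoted because PV names
# often contain ':' which is otherwise special in the query syntax.
alarm_query() {
    q="pv:\"$1\""
    if [ "${2:-}" = "latched" ]; then
        q="$q AND latch:true"
    fi
    printf '%s\n' "$q"
}

# search_alarms PV [latched]
# Hypothetical helper: fetch the 20 most recent matching messages.
search_alarms() {
    curl -s -G "$ES_URL/*alarms_state*/_search" \
         --data-urlencode "q=$(alarm_query "$1" "${2:-}")" \
         --data-urlencode 'sort=message_time:desc' \
         --data-urlencode 'size=20' \
         --data-urlencode 'pretty=true'
}

# Example (requires a running elastic, PV name is made up):
#   search_alarms 'SIM:PV1' latched
```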
Table of recent alarm counts:
Kibana keeps changing. The "Visualize" link used to be right in the sidebar. Then it moved to "Kibana, Visualize" and with 7.17 it's under "Kibana, Visualize Library, Create new ..., Aggregation Based".
- Visualize, add a "Data Table"
- Select the "alarms_state" data source
- Leave "Metrics" on "Count"
- Under Buckets, add "Split Rows"
- Aggregation "Terms", selecting the field "pv", leaving metric: count
- "Add sub-buckets", "Split Rows", "Terms", "current_severity"
- "Add sub-buckets", "Split Rows", "Terms", "severity"
- Finally under "Options" increase the "Per Page" count to 20
To colorize table cells with "OK", "MINOR", "MAJOR" as green, orange, red: This has to be done via a global setting. Under Management, find Kibana Index Patterns, then the "alarms_state". In the table of fields, locate "current_severity". Click the edit "pencil" at the right edge of the row:
- Format: Color
- Add pattern OK with color green
- Add pattern MINOR with color orange
- Add pattern MAJOR with color red
Repeat for the "severity" field. The color picker can be awkward: the color may have to be picked by clicking in the popup because entering RGB values does not always work, which makes it hard to get exactly the same coloring for "current_severity" and "severity".
Basic "how many alarms over time" plot:
- Visualize, add a "Line" graph
- Select the "alarms_state" data source
- Add filter: current_severity is one of MINOR, MAJOR, INVALID, UNDEFINED (could also filter on current_severity is not OK)
- Data Metrics: Y-Axis Aggregation "Count", label "Alarm Count"
- Data Buckets: X-Axis Aggregation "Date Histogram", field "message_time", Interval "Auto", label "Time"
- Save as "Timeline (PV)". With newer versions, it can directly be added to a new or existing dashboard, but selecting "Add to library" makes it easier to reach for addition to multiple dashboards or to edit.
Basic "how many latched alarms over time" plot:
- As above, but filter on "latch is true"
Plot of Top Alarm Trigger PVs:
- Visualize, add a "Vertical Bar" graph
- Select the "alarms_state" data source
- Add filter: current_severity is one of MINOR, MAJOR, INVALID, UNDEFINED
- Data Metrics: Y-Axis Aggregation "Count", label "Alarm Count"
- Data Buckets: X-Axis Aggregation "Terms", field "pv", Order by "Metric: Alarm Count", order "Descending", size "10", label "Alarm PV"
- Under "Metrics & axes", change X-axis "Align" to "Angled"
- Save as "Top Alarm Trigger PVs"
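To cross-check what the bar graph shows, roughly the same numbers can be requested from elastic directly with a terms aggregation. This is a sketch under assumptions: elastic on localhost:9200, the "pv" field being aggregatable under that name (as it is in the Kibana "Terms" selection above), a fixed 24 hour time range, and the "is not OK" variant of the severity filter. The helper names `top_pvs_body` and `top_pvs` are hypothetical.

```shell
#!/bin/sh
# Sketch: top N alarm PVs of the last 24 hours via the REST interface.
ES_URL="${ES_URL:-http://localhost:9200}"

# top_pvs_body [N]
# Print a search request that returns no documents, only the count
# per PV for the N most frequent PVs (default 10), skipping messages
# where current_severity is OK.
top_pvs_body() {
    cat <<EOF
{
  "size": 0,
  "query": {
    "bool": {
      "filter": { "range": { "message_time": { "gte": "now-24h" } } },
      "must_not": { "match": { "current_severity": "OK" } }
    }
  },
  "aggs": { "top_pvs": { "terms": { "field": "pv", "size": ${1:-10} } } }
}
EOF
}

# top_pvs [N]
# Hypothetical helper: send the request to all alarm state indices.
top_pvs() {
    top_pvs_body "${1:-}" |
        curl -s -X POST "$ES_URL/*alarms_state*/_search?pretty" \
             -H 'Content-Type: application/json' \
             --data-binary @-
}

# Example (requires a running elastic):
#   top_pvs 10
```

The counts appear under "aggregations", "top_pvs", "buckets" in the reply.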
Plot of Top Latched Alarms:
- Same as Top Alarm Trigger PVs, but add a filter where "latch" is "true"
Dashboard:
- Create dashboard, add/position visualizations
- Select time range 24 hours
- Save with "Store time with dashboard" selected
To export/import, use Kibana Management, Saved Objects, select all visualizations and dashboards to export, or import a previously exported file.
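Export and import can likely be scripted as well. The following sketch assumes Kibana 7.x on localhost:5601 without authentication and uses its saved objects export/import API. The file name and the helper names are made up for this example, and other Kibana versions may behave differently.

```shell
#!/bin/sh
# Sketch: export/import index patterns, visualizations and dashboards.
KIBANA_URL="${KIBANA_URL:-http://localhost:5601}"

# export_body
# Print the request body that selects the object types to export.
export_body() {
    printf '%s\n' '{"type":["index-pattern","visualization","dashboard"]}'
}

# export_saved_objects FILE
# Hypothetical helper: write the exported objects to FILE.
export_saved_objects() {
    export_body |
        curl -s -X POST "$KIBANA_URL/api/saved_objects/_export" \
             -H 'kbn-xsrf: true' \
             -H 'Content-Type: application/json' \
             --data-binary @- -o "$1"
}

# import_saved_objects FILE
# Hypothetical helper: import a previously exported FILE,
# overwriting objects that have the same ID.
import_saved_objects() {
    curl -s -X POST "$KIBANA_URL/api/saved_objects/_import?overwrite=true" \
         -H 'kbn-xsrf: true' \
         --form "file=@$1"
}

# Example (requires a running Kibana, file name is made up):
#   export_saved_objects alarm_dashboards.ndjson
#   import_saved_objects alarm_dashboards.ndjson
```

This only works smoothly when the index pattern was created with a known ID as described above.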
Older indices need to be deleted to improve Kibana response time and to save disk space. This can be done via Kibana, Management, Stack Management, Data, Index Management, or via the REST interface:
curl http://localhost:9200/_cat/indices?v | fgrep xxx_alarms | sort
curl -X DELETE "localhost:9200/xxx_alarms_state_2019-07-*"
curl -X DELETE "localhost:9200/xxx_alarms_cmd_2019-07-*"
curl -X DELETE "localhost:9200/xxx_alarms_config_2019-07-*"
lynx -dump http://localhost:9200/_cat/indices?v | fgrep xxx_alarms | sort
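The three DELETE calls can be wrapped in a small helper so that the setup name and month only need to be typed once. This is a sketch, assuming the index naming shown above and elastic on localhost:9200. The helper names `delete_urls` and `delete_month` are hypothetical. Depending on the elastic version and its `action.destructive_requires_name` setting, deleting with a wildcard may be rejected.

```shell
#!/bin/sh
# Sketch: delete all alarm indices of one setup for one month.
ES_URL="${ES_URL:-http://localhost:9200}"

# delete_urls SETUP MONTH
# Print one URL per index family (state, cmd, config),
# for example .../xxx_alarms_state_2019-07-*
delete_urls() {
    for kind in state cmd config; do
        printf '%s/%s_alarms_%s_%s-*\n' "$ES_URL" "$1" "$kind" "$2"
    done
}

# delete_month SETUP MONTH
# Hypothetical helper: issue a DELETE for each of those URLs.
delete_month() {
    delete_urls "$1" "$2" | while read -r url; do
        curl -s -X DELETE "$url"
        echo
    done
}

# Example: review the URLs first, then delete (requires a running elastic):
#   delete_urls  xxx 2019-07
#   delete_month xxx 2019-07
```

Printing the URLs with `delete_urls` before calling `delete_month` serves as a dry run.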