This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

Merge pull request #25 from TechNative-B-V/feature/alarms-json-variable
Feature/alarms json variable
AndrNgg authored Aug 19, 2024
2 parents 720041c + 874f1c6 commit 05f5f46
Showing 9 changed files with 189 additions and 90 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -16,4 +16,4 @@ repos:
- id: trailing-whitespace
- id: detect-aws-credentials
- id: check-json
- id: pretty-format-json
# - id: pretty-format-json
15 changes: 11 additions & 4 deletions README.md
@@ -15,10 +15,10 @@ The file contains the alarms per service.
In the example below you see the EC2 service that contains the CPU Utilization alarm. This will create the CPU Utilization alarm for every EC2 instance.
```
"EC2" : { <- Service
"CPUUtilization": { <- Alarmname
"AlarmThresholds" : {
"CPUUtilization": { <- Alarmname
"AlarmThresholds" : {
"priority": ["P1", "P2", "P3"], <- for every priority there needs to be a threshold and vice versa
"alarm_threshold": ["90", "80", "75"]
"alarm_threshold": ["90", "80", "75"]
},
"ComparisonOperator" : "GreaterThanThreshold",
"Description" : { <- Description is used for naming the alarm in cloudwatch
@@ -76,6 +76,7 @@ module "observability_sender" {

| Name | Version |
|------|---------|
| <a name="provider_archive"></a> [archive](#provider\_archive) | n/a |
| <a name="provider_aws"></a> [aws](#provider\_aws) | > 4.3.0 |

## Modules
@@ -84,7 +85,7 @@ module "observability_sender" {
|------|--------|---------|
| <a name="module_iam_role_lambda_cw_alarm_creator"></a> [iam\_role\_lambda\_cw\_alarm\_creator](#module\_iam\_role\_lambda\_cw\_alarm\_creator) | git@github.com:TechNative-B-V/modules-aws.git//identity_and_access_management/iam_role | v1.1.7 |
| <a name="module_iam_role_lambda_payload_forwarder"></a> [iam\_role\_lambda\_payload\_forwarder](#module\_iam\_role\_lambda\_payload\_forwarder) | git@github.com:TechNative-B-V/modules-aws.git//identity_and_access_management/iam_role | v1.1.7 |
| <a name="module_lambda_cw_alarm_creator"></a> [lambda\_cw\_alarm\_creator](#module\_lambda\_cw\_alarm\_creator) | git@github.com:TechNative-B-V/modules-aws.git//lambda | v1.1.7 |
| <a name="module_lambda_cw_alarm_creator"></a> [lambda\_cw\_alarm\_creator](#module\_lambda\_cw\_alarm\_creator) | git@github.com:wearetechnative/terraform-aws-lambda.git | 13eda5f9e8ae40e51f66a45837cd41a6b35af988 |
| <a name="module_lambda_payload_forwarder"></a> [lambda\_payload\_forwarder](#module\_lambda\_payload\_forwarder) | git@github.com:TechNative-B-V/modules-aws.git//lambda | v1.1.7 |

## Resources
@@ -98,20 +99,25 @@ module "observability_sender" {
| [aws_cloudwatch_event_target.lambda_target](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_target) | resource |
| [aws_cloudwatch_event_target.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_target) | resource |
| [aws_kms_grant.give_lambda_role_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kms_grant) | resource |
| [aws_lambda_layer_version.custom_actions](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_layer_version) | resource |
| [aws_lambda_permission.allow_eventbridge](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_permission) | resource |
| [aws_lambda_permission.allow_eventbridge_instance_terminate_rule](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_permission) | resource |
| [aws_lambda_permission.payload_forwarder](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_permission) | resource |
| [aws_sns_topic.notification_receiver](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sns_topic) | resource |
| [aws_sns_topic_policy.allow_lambda_sns_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sns_topic_policy) | resource |
| [aws_sns_topic_subscription.lambda_eventbridge_forwarder](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sns_topic_subscription) | resource |
| [archive_file.custom_action](https://registry.terraform.io/providers/hashicorp/archive/latest/docs/data-sources/file) | data source |
| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
| [aws_iam_policy_document.cloudwatch_alarms](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.eventbus](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.kms](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.lambda_cw_alarm_creator_dlq_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.lambda_ec2_read_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.lambda_ecs_read_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.lambda_elasticache_read_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.lambda_monitoring_account_sqs_access_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.lambda_payload_forwarder_dlq_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.lambda_rds_read_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.sns_topic_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_partition.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/partition) | data source |
| [aws_region.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/region) | data source |
@@ -123,6 +129,7 @@ module "observability_sender" {
| <a name="input_eventbridge_rules"></a> [eventbridge\_rules](#input\_eventbridge\_rules) | EventBridge rule settings. | <pre>map(object({<br> description : string<br> state : string<br> event_pattern : string<br> })<br> )</pre> | `{}` | no |
| <a name="input_kms_key_arn"></a> [kms\_key\_arn](#input\_kms\_key\_arn) | ARN of the KMS key. | `string` | n/a | yes |
| <a name="input_monitoring_account_configuration"></a> [monitoring\_account\_configuration](#input\_monitoring\_account\_configuration) | Configuration settings of the monitoring account. | <pre>object({<br> sqs_name = string<br> sqs_region = string<br> sqs_account = number<br> })</pre> | n/a | yes |
| <a name="input_source_directory_location"></a> [source\_directory\_location](#input\_source\_directory\_location) | Source Directory location for the custom alarm creator actions.py. | `string` | `null` | no |
| <a name="input_sqs_dlq_arn"></a> [sqs\_dlq\_arn](#input\_sqs\_dlq\_arn) | ARN of the Dead Letter Queue. | `string` | n/a | yes |
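
The alarms file described in this README needs exactly one threshold per priority. The alarm creator pairs the two lists with `zip`, which silently drops unmatched entries, so a mismatch would go unnoticed. A minimal sketch of a pre-deployment check follows. The function name `validate_alarm_thresholds` and the sample data are hypothetical and not part of this module.

```python
import json


def validate_alarm_thresholds(alarms):
    """Return a list of problems found; an empty list means every alarm
    has the same number of priorities and thresholds.

    Hypothetical helper: it only assumes the alarms.json layout shown in
    the README (service -> alarm name -> AlarmThresholds).
    """
    problems = []
    for service, service_alarms in alarms.items():
        for alarm_name, alarm in service_alarms.items():
            thresholds = alarm.get("AlarmThresholds", {})
            priorities = thresholds.get("priority", [])
            values = thresholds.get("alarm_threshold", [])
            if len(priorities) != len(values):
                problems.append(
                    f"{service}/{alarm_name}: {len(priorities)} priorities "
                    f"but {len(values)} thresholds"
                )
    return problems


# Illustrative data only, modelled on the README example.
sample = json.loads("""
{
  "EC2": {
    "CPUUtilization": {
      "AlarmThresholds": {
        "priority": ["P1", "P2", "P3"],
        "alarm_threshold": ["90", "80", "75"]
      }
    }
  }
}
""")

print(validate_alarm_thresholds(sample))
```

Running a check like this in CI or a pre-commit hook would catch a missing threshold before the Lambda is deployed.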

## Outputs
159 changes: 104 additions & 55 deletions alarm_creator/actions.py
@@ -1,81 +1,122 @@
import boto3, json
import boto3, json, subprocess, os

from pip import main

# environment_variables
custom_alert_action = os.environ['CUSTOM_ALERT_ACTION']

# Create boto3 clients
CWclient = boto3.client("cloudwatch")
ec2 = boto3.resource("ec2")
rds = boto3.client("rds")
ec2client = boto3.client("ec2")
ecsclient = boto3.client("ecs")
elasticlient = boto3.client("elasticache")

# Create Lambda layer create if statement to choose which one depending on which variable is enabled.

# Load json file containing the alarms
with open('./alarms.json') as alarms_file:
alarms = json.load(alarms_file)

# Load json file containing the alarms, checks if it needs to use a custom alarms json or default json.
if custom_alert_action == "true":
with open('/opt/custom_alarms.json') as alarms_file:
alarms = json.load(alarms_file)
else:
with open('./alarms.json') as alarms_file:
alarms = json.load(alarms_file)
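
Review note: the choice between the default and the custom alarms file could be isolated in a small function, which makes it testable without a Lambda environment. The sketch below keeps the two paths from the diff. The helper names are hypothetical, and two behaviours are assumptions that differ from the commit: a missing `CUSTOM_ALERT_ACTION` variable falls back to the default file instead of raising `KeyError`, and the comparison ignores case.

```python
import json
import os

DEFAULT_ALARMS_PATH = "./alarms.json"
CUSTOM_ALARMS_PATH = "/opt/custom_alarms.json"  # files from a Lambda layer are mounted under /opt


def select_alarms_path(env=None):
    """Pick the alarms file based on CUSTOM_ALERT_ACTION.

    Assumption: an unset variable means "use the default file".
    """
    env = os.environ if env is None else env
    use_custom = env.get("CUSTOM_ALERT_ACTION", "false").strip().lower() == "true"
    return CUSTOM_ALARMS_PATH if use_custom else DEFAULT_ALARMS_PATH


def load_alarms(path):
    """Read and parse the alarms JSON file at the given path."""
    with open(path) as alarms_file:
        return json.load(alarms_file)


# Example: resolve the path without touching the real environment.
print(select_alarms_path({"CUSTOM_ALERT_ACTION": "true"}))
print(select_alarms_path({}))
```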

# Alarm creator
def AWS_Alarms():
for service in alarms:

# Fill instances variable with Running instances per service
dimensionlist = []
# instances = None
#Fill instances variable with Running instances per service
if service == "EC2":
instances = GetRunningInstances()
elif service == "RDS":
instances = GetRunningDBInstances()
elif service == "ECS":
instances = GetRunningClusters()
for alarm in alarms[service]:
elif service == "CWAgent":
instances = GetRunningInstances()
# elif service == "ECS":
# instances = GetRunningClusters()
# elif service == "ElastiCache":
# instances = GetRunningCacheClusters()

for alarm in alarms[service]:
# Query the namespaces in CloudWatch Metrics
response = CWclient.list_metrics(Namespace=f"{alarms[service][alarm]['Namespace']}", RecentlyActive='PT3H',)
response = CWclient.list_metrics(Namespace=f"{alarms[service][alarm]['Namespace']}", RecentlyActive='PT3H')

for metrics in response["Metrics"]:

# Check if any of the found metricnames are equal to metric names in alarms file
# Check if any of the found metric names are equal to metric names in alarms file
if metrics["MetricName"] == alarms[service][alarm]['MetricName']:
for dimensions in metrics["Dimensions"]:
if dimensions["Name"] == alarms[service][alarm]['Dimensions']:
for priority, threshold in zip(alarms[service][alarm]['AlarmThresholds']["priority"], alarms[service][alarm]['AlarmThresholds']["alarm_threshold"]):

# To make alarmnames pretty, 'MB/GB' is used instead of 1000000/1000000000 bytes, needs to be in bytes for actual threshold
if alarms[service][alarm]['Description']['ThresholdUnit'] == "GB":
cw_threshold = int(threshold) * 1000000000
elif alarms[service][alarm]['Description']['ThresholdUnit'] == "MB":
cw_threshold = int(threshold) * 1000000
else:
cw_threshold = int(threshold)

# Handling dimensions
instanceDimensions = {
"Name": f"{dimensions['Name']}",
"Value": f"{dimensions['Value']}"
for priority, threshold in zip(alarms[service][alarm]['AlarmThresholds']["priority"], alarms[service][alarm]['AlarmThresholds']["alarm_threshold"]):
# Convert thresholds to bytes if needed
if alarms[service][alarm]['Description']['ThresholdUnit'] == "GB":
cw_threshold = int(threshold) * 1000000000
elif alarms[service][alarm]['Description']['ThresholdUnit'] == "MB":
cw_threshold = int(threshold) * 1000000
else:
cw_threshold = int(threshold)

# Handling dimensions
for instance in instances:

instanceDimensions = {
"Name": f"{alarms[service][alarm]['Dimensions']}",
"Value": instance
}

#Add any additional disk-related dimensions if present
if 'ExtraDimensions' in alarms[service][alarm]:
dimensionlist.extend(alarms[service][alarm]['ExtraDimensions'])

for dimension in dimensionlist:
if dimension["Name"] == "path" and dimension["Value"] == "/":
# Query the namespaces in CloudWatch Metrics
# Find the correct device dimension for the root volume
response_2 = CWclient.list_metrics(Namespace=f"{alarms[service][alarm]['Namespace']}", RecentlyActive='PT3H',
Dimensions=[instanceDimensions, {'Name': 'path', 'Value': '/'}]
)

for metrics in response_2["Metrics"]:
for dimension in metrics["Dimensions"]:
if dimension['Name'] == "device":

dimensionlist = [
instanceDimensions,
{
"Name": "device",
"Value": f"{dimension['Value']}"
}
dimensionlist = []
# For disk alarms there are more dimensions than other alarms
try:
for item in alarms[service][alarm]['DiskDimensions']:
dimensionlist.append(item)
except KeyError: #
dimensionlist = []
dimensionlist.insert(0, instanceDimensions)

for instance in instances:

# Create alarms
CWclient.put_metric_alarm(
AlarmName=f"{instance}-{alarm} {alarms[service][alarm]['Description']['Operatorsymbol']} {threshold} {alarms[service][alarm]['Description']['ThresholdUnit']}",
ComparisonOperator=alarms[service][alarm]['ComparisonOperator'],
EvaluationPeriods=alarms[service][alarm]['EvaluationPeriods'],
MetricName=alarms[service][alarm]['MetricName'],
Namespace=alarms[service][alarm]['Namespace'],
Period=alarms[service][alarm]['Period'],
Statistic=alarms[service][alarm]['Statistic'],
Threshold=cw_threshold,
ActionsEnabled=True,
TreatMissingData=alarms[service][alarm]['TreatMissingData'],
AlarmDescription=f"{priority}",
Dimensions=dimensionlist,
Tags=[{"Key": "CreatedbyLambda", "Value": "True"}],
)
]
dimensionlist.extend(alarms[service][alarm]['ExtraDimensions'])
else:
continue
else:
# Reset dimensionlist when no extra dimensions are present and only add the instance dimension
dimensionlist = []
dimensionlist = [instanceDimensions]


# Create the alarms
CWclient.put_metric_alarm(
AlarmName=f"{instance}-{alarm} {alarms[service][alarm]['Description']['Operatorsymbol']} {threshold} {alarms[service][alarm]['Description']['ThresholdUnit']}",
ComparisonOperator=alarms[service][alarm]['ComparisonOperator'],
EvaluationPeriods=alarms[service][alarm]['EvaluationPeriods'],
MetricName=alarms[service][alarm]['MetricName'],
Namespace=alarms[service][alarm]['Namespace'],
Period=alarms[service][alarm]['Period'],
Statistic=alarms[service][alarm]['Statistic'],
Threshold=cw_threshold,
ActionsEnabled=True,
TreatMissingData=alarms[service][alarm]['TreatMissingData'],
AlarmDescription=f"{priority}",
Dimensions=dimensionlist,
Tags=[{"Key": "CreatedbyLambda", "Value": "True"}],
)
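
Review note: the threshold conversion and the alarm name are both computed inline inside several nested loops. As a sketch, they could be pulled into two pure functions that mirror the logic in this diff (decimal GB/MB multipliers, and the `{instance}-{alarm} {symbol} {threshold} {unit}` name format). The function names are hypothetical.

```python
# Decimal multipliers, matching the values used in the diff.
UNIT_MULTIPLIERS = {"GB": 1_000_000_000, "MB": 1_000_000}


def to_cloudwatch_threshold(threshold, unit):
    """Convert a threshold from alarms.json into the number CloudWatch expects.

    GB and MB are expanded to bytes; any other unit is passed through as an int.
    """
    return int(threshold) * UNIT_MULTIPLIERS.get(unit, 1)


def build_alarm_name(instance, alarm, operator_symbol, threshold, unit):
    """Build the alarm name in the same format as the put_metric_alarm call."""
    return f"{instance}-{alarm} {operator_symbol} {threshold} {unit}"


# Illustrative values only.
print(to_cloudwatch_threshold("5", "GB"))
print(build_alarm_name("i-0abc", "CPUUtilization", ">", "90", "%"))
```

The name keeps the human-readable threshold and unit, while only the `Threshold` argument receives the converted byte value.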



def GetRunningInstances():
get_running_instances = ec2client.describe_instances(
@@ -109,18 +150,26 @@ def GetRunningClusters():

return RunningClusterNames

def GetRunningCacheClusters():
get_running_cacheclusters = elasticlient.describe_cache_clusters()
RunningCacheClusters = []
for cachecluster in get_running_cacheclusters["CacheClusters"]:
RunningCacheClusters.append(cachecluster['CacheClusterId'])

return RunningCacheClusters

def DeleteAlarms():
get_alarm_info = CWclient.describe_alarms()
RunningInstances = GetRunningInstances()
RunningRDSInstances = GetRunningDBInstances()
RunningClusters = GetRunningClusters()

# Collect the metric alarms and compare each alarm's InstanceId dimension with the IDs of the running instances. If the state reason is breaching and the instance no longer exists, delete the alarm.
for metricalarm in get_alarm_info["MetricAlarms"]:
instance_id = list(filter(lambda x: x["Name"] == "InstanceId", metricalarm["Dimensions"]))
rds_instance_name = list(filter(lambda x: x["Name"] == "DBInstanceIdentifier", metricalarm["Dimensions"]))
cluster_name = list(filter(lambda x: x["Name"] == "ClusterName", metricalarm["Dimensions"]))

if len(instance_id) == 1:
if instance_id[0]["Value"] not in RunningInstances:
CWclient.delete_alarms(AlarmNames=[metricalarm["AlarmName"]])
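
Review note: the visible part of `DeleteAlarms` repeats the same filter for each dimension name. The comparison could be sketched as one data-driven function that returns the names of alarms whose resource no longer exists. The function names and sample data are hypothetical. Unlike the diff, this sketch takes the first matching dimension instead of requiring exactly one match, and it does not look at the alarm's state reason.

```python
def dimension_value(alarm, dimension_name):
    """Return the value of the named dimension on an alarm, or None if absent."""
    for dimension in alarm.get("Dimensions", []):
        if dimension["Name"] == dimension_name:
            return dimension["Value"]
    return None


def find_stale_alarm_names(metric_alarms, live_ids_by_dimension):
    """Return names of alarms that point at a resource that is no longer live.

    live_ids_by_dimension maps a dimension name (e.g. "InstanceId") to the
    collection of identifiers that currently exist.
    """
    stale = []
    for alarm in metric_alarms:
        for dimension_name, live_ids in live_ids_by_dimension.items():
            value = dimension_value(alarm, dimension_name)
            if value is not None and value not in live_ids:
                stale.append(alarm["AlarmName"])
                break  # one missing resource is enough to mark the alarm stale
    return stale


# Illustrative data shaped like the MetricAlarms list from describe_alarms.
sample_alarms = [
    {"AlarmName": "a", "Dimensions": [{"Name": "InstanceId", "Value": "i-1"}]},
    {"AlarmName": "b", "Dimensions": [{"Name": "InstanceId", "Value": "i-2"}]},
    {"AlarmName": "c", "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": "db-1"}]},
]
live = {"InstanceId": {"i-1"}, "DBInstanceIdentifier": {"db-1"}, "ClusterName": set()}
print(find_stale_alarm_names(sample_alarms, live))
```

The resulting names could then be passed to `CWclient.delete_alarms(AlarmNames=...)`, keeping the AWS call separate from the decision logic.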

0 comments on commit 05f5f46
