CloudWatch Integration¶
This document covers CloudWatch Logs, Metrics, and Alarms for the Terraformer module.
CloudWatch Logs¶
Log Group¶
The module creates a dedicated log group:
- Name:
/aws/ec2/terraformer - Retention: Configurable (default: 365 days)
- Encryption: Default AWS encryption
What Gets Logged¶
The terraformer instance can write logs to CloudWatch via the CloudWatch agent (configured by Puppet):
- System logs:
/var/log/syslog,/var/log/auth.log - Application logs: Terraform runs, AWS CLI commands
- Puppet logs: Puppet agent runs and errors
- Custom application logs
Accessing Logs¶
- Go to CloudWatch → Log groups
- Find
/aws/ec2/terraformer - Browse log streams (organized by instance ID)
# List log streams
aws logs describe-log-streams \
--log-group-name "/aws/ec2/terraformer" \
--order-by LastEventTime \
--descending \
--max-items 10
# Tail logs
aws logs tail "/aws/ec2/terraformer" --follow
# Search for specific pattern
aws logs filter-log-events \
--log-group-name "/aws/ec2/terraformer" \
--filter-pattern "ERROR"
Testing Log Integration¶
From the terraformer instance, test writing logs:
# Write test log to CloudWatch
aws logs put-log-events \
--log-group-name "/aws/ec2/terraformer" \
--log-stream-name "test-stream" \
--log-events timestamp=$(date +%s)000,message="Test log message"
CloudWatch Metrics¶
Custom Metrics Namespace¶
The module configures a custom namespace for terraformer metrics using the convention Service/Component:
Publishing Metrics¶
From the terraformer instance:
# Publish a custom metric
aws cloudwatch put-metric-data \
--namespace "Terraformer/System" \
--metric-name "TerraformRunDuration" \
--value 120 \
--unit Seconds
IAM Permissions¶
The instance has permission to publish metrics only to its configured namespace:
{
"Effect": "Allow",
"Action": "cloudwatch:PutMetricData",
"Resource": "*",
"Condition": {
"StringEquals": {
"cloudwatch:namespace": "Terraformer/System"
}
}
}
CloudWatch Alarms¶
Auto-Recovery Alarms¶
System Status Check Alarm¶
Purpose: Detect and recover from hardware failures.
resource "aws_cloudwatch_metric_alarm" "terraformer_system_auto_recovery" {
alarm_name = "terraformer-system-auto-recovery-${instance_id}"
alarm_description = "Auto recover Terraformer instance when underlying hardware fails"
namespace = "AWS/EC2"
metric_name = "StatusCheckFailed_System"
statistic = "Maximum"
period = 60
evaluation_periods = 2
threshold = 0.5
comparison_operator = "GreaterThanOrEqualToThreshold"
dimensions = {
InstanceId = aws_instance.terraformer.id
}
alarm_actions = ["arn:aws:automate:${region}:ec2:recover"]
}
What it monitors:
- Host hardware issues
- Network path failures
- Power problems on underlying host
Action: Migrates instance to healthy hardware (preserves instance ID, IP, volumes).
Instance Status Check Alarm¶
Purpose: Detect and recover from software failures.
resource "aws_cloudwatch_metric_alarm" "terraformer_instance_check" {
alarm_name = "terraformer-instance-status-check-${instance_id}"
alarm_description = "Auto-reboot Terraformer instance on status check failures"
namespace = "AWS/EC2"
metric_name = "StatusCheckFailed_Instance"
statistic = "Maximum"
period = 60
evaluation_periods = 3
threshold = 0.5
comparison_operator = "GreaterThanOrEqualToThreshold"
dimensions = {
InstanceId = aws_instance.terraformer.id
}
alarm_actions = ["arn:aws:automate:${region}:ec2:reboot"]
}
What it monitors:
- Kernel panics
- Out of memory conditions
- Network misconfiguration
- Failed instance checks
Action: Reboots the instance.
CPU Utilization Alarm¶
The module automatically creates a CPU utilization alarm that sends notifications to the configured alarm_emails:
Email Confirmation Required
AWS SNS sends confirmation emails to each address. Recipients MUST click the confirmation link to receive notifications.
Configuration:
- Threshold: 90% CPU
- Evaluation: 1 period of 60 seconds
- Action: SNS notification to all confirmed subscribers
Monitoring Examples¶
Create Custom Alarms¶
For Terraform run failures:
resource "aws_cloudwatch_metric_alarm" "terraform_failures" {
alarm_name = "terraformer-terraform-failures"
alarm_description = "Alert on Terraform apply failures"
namespace = "Terraformer/System"
metric_name = "TerraformApplyFailures"
statistic = "Sum"
period = 300
evaluation_periods = 1
threshold = 1
comparison_operator = "GreaterThanOrEqualToThreshold"
# Use the module's SNS topic output
alarm_actions = [module.terraformer.sns_topic_arn]
}
Then from terraformer instance:
#!/bin/bash
# terraform-wrapper.sh
terraform apply -auto-approve
EXIT_CODE=$?
if [ $EXIT_CODE -ne 0 ]; then
# Publish failure metric
aws cloudwatch put-metric-data \
--namespace "Terraformer/System" \
--metric-name "TerraformApplyFailures" \
--value 1 \
--unit Count
fi
exit $EXIT_CODE
Monitor Instance Health¶
Create a dashboard:
resource "aws_cloudwatch_dashboard" "terraformer" {
dashboard_name = "terraformer-health"
dashboard_body = jsonencode({
widgets = [
{
type = "metric"
properties = {
metrics = [
["AWS/EC2", "CPUUtilization", { stat = "Average", label = "CPU" }],
[".", "StatusCheckFailed_System", { stat = "Maximum", label = "System Check" }],
[".", "StatusCheckFailed_Instance", { stat = "Maximum", label = "Instance Check" }]
]
period = 300
region = var.region
title = "Terraformer Health"
yAxis = {
left = {
min = 0
}
}
}
}
]
})
}
Log Insights Queries¶
Failed SSH Attempts¶
fields @timestamp, @message
| filter @message like /Failed password/
| sort @timestamp desc
| limit 20
Terraform Apply Commands¶
Sudo Commands¶
fields @timestamp, @message
| filter @message like /sudo:/
| parse @message "* : * : * ; *" as timestamp, user, pwd, command
| display timestamp, user, command
Instance Reboots¶
fields @timestamp, @message
| filter @message like /reboot/ or @message like /shutdown/
| sort @timestamp desc
Cost Optimization¶
Log Retention¶
Balance compliance requirements with cost:
# ISO 27001: 365 days
cloudwatch_log_retention = 365
# GDPR: 90-180 days typical
cloudwatch_log_retention = 90
# PCI DSS: 90 days minimum
cloudwatch_log_retention = 90
Pricing (us-west-2):
- Ingestion: $0.50 per GB
- Storage: $0.03 per GB per month
- 365-day retention: ~$0.93 per GB total
Selective Logging¶
Configure CloudWatch agent to log only what you need:
{
"logs": {
"logs_collected": {
"files": {
"collect_list": [
{
"file_path": "/var/log/auth.log",
"log_group_name": "/aws/ec2/terraformer",
"log_stream_name": "{instance_id}/auth"
},
{
"file_path": "/var/log/terraform.log",
"log_group_name": "/aws/ec2/terraformer",
"log_stream_name": "{instance_id}/terraform"
}
]
}
}
}
}
Troubleshooting¶
Logs Not Appearing¶
-
Check IAM permissions:
-
Verify CloudWatch agent is running:
-
Check agent configuration:
Metrics Not Publishing¶
-
Test permissions:
-
Check for errors:
Alarm Not Triggering¶
-
Check alarm state:
-
View alarm history:
-
Test alarm: