Troubleshooting

Common issues and their solutions.

DNS Records Not Created

Symptoms

  • Instance launches but no A record appears in Route53
  • ASG instances stuck in Pending:Wait state

Check 1: Lambda Logs

aws logs tail /aws/lambda/update-dns-<asg_name> --follow

Look for:

  • Permission errors (IAM policy issues)
  • Route53 zone not found
  • Instance not found (timing issue)

Check 2: EventBridge Rule

Verify the EventBridge rule exists and is enabled:

aws events list-rules --name-prefix asg-scale
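
The listing above shows that a rule exists, but not at a glance whether it is enabled. As a minimal sketch, the state can be pulled out with a `--query` expression and filtered. The function names `disabled_rules` and `list_disabled` are hypothetical and not part of the module; the `asg-scale` prefix is taken from the command above.

```shell
# Hypothetical helper: prints the names of rules that are not enabled.
# Expects "rule-name<TAB>state" rows on stdin, as printed by --output text.
disabled_rules() {
  awk '$2 == "DISABLED" { print $1 }'
}

# Hypothetical wrapper: lists disabled rules whose name starts with a prefix.
# Usage: list_disabled asg-scale
list_disabled() {
  aws events list-rules \
    --name-prefix "$1" \
    --query "Rules[].[Name,State]" \
    --output text | disabled_rules
}
```

If `list_disabled asg-scale` prints nothing, every matching rule is enabled, or no rule matched the prefix at all, so it is worth running the plain listing first.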

Check 3: Lifecycle Hook

Verify the lifecycle hook is configured:

aws autoscaling describe-lifecycle-hooks \
  --auto-scaling-group-name <asg_name>

The hook names should match the module outputs (lifecycle_name_launching and lifecycle_name_terminating).

Common Fixes

Wrong ASG name: The asg_name variable must exactly match the Auto Scaling Group name. The EventBridge rule filters on this name.

Route53 zone permissions: Ensure the Lambda role has permissions on the correct hosted zone ID.

Instance has no public IP: If route53_public_ip = true but the instance has no public IP, the Lambda will fail. Set route53_public_ip = false for instances in private subnets.
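
Before changing route53_public_ip, it may help to confirm whether the instance really lacks a public address. The sketch below assumes the AWS CLI is configured for the right account and region; `has_public_ip` and `instance_public_ip` are hypothetical helper names, not part of the module.

```shell
# Hypothetical helper: succeeds if the value looks like a real address.
# In text mode the AWS CLI prints "None" (or nothing) when the field is absent.
has_public_ip() {
  case "$1" in
    ""|None) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical wrapper: looks up the public IP of a single instance.
# Usage: instance_public_ip i-0123456789abcdef0
instance_public_ip() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query "Reservations[0].Instances[0].PublicIpAddress" \
    --output text
}
```

A possible use is `has_public_ip "$(instance_public_ip "$id")" || echo "no public IP"`, where `$id` holds the instance ID from the Lambda logs.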

DNS Records Not Deleted on Termination

Symptoms

  • Instance terminates but DNS records remain
  • Stale records pointing to non-existent IPs

Check: Terminating Lifecycle Hook

Ensure you've created both launching AND terminating hooks:

resource "aws_autoscaling_lifecycle_hook" "terminating" {
  autoscaling_group_name = aws_autoscaling_group.web.name
  lifecycle_transition   = "autoscaling:EC2_INSTANCE_TERMINATING"
  name                   = module.update-dns.lifecycle_name_terminating
  heartbeat_timeout      = 3600
}

If the terminating hook is missing, the Lambda is never invoked on termination and records are left behind.
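
To check which hooks an existing group actually has, the transitions can be read back from the API and compared against the two expected values. This is a sketch only; `missing_transitions` and `check_hooks` are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: prints each expected lifecycle transition that is
# absent from the whitespace-separated list of transitions read on stdin.
missing_transitions() {
  present=$(tr -s '[:space:]' '\n')
  for t in autoscaling:EC2_INSTANCE_LAUNCHING autoscaling:EC2_INSTANCE_TERMINATING; do
    printf '%s\n' "$present" | grep -qxF "$t" || echo "$t"
  done
}

# Hypothetical wrapper: reports missing transitions for one ASG.
# Usage: check_hooks my-asg
check_hooks() {
  aws autoscaling describe-lifecycle-hooks \
    --auto-scaling-group-name "$1" \
    --query "LifecycleHooks[].LifecycleTransition" \
    --output text | missing_transitions
}
```

No output means both transitions have a hook. It does not verify that the hook names match the module outputs, which still needs a manual comparison.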

Lambda Timeout

Symptoms

  • CloudWatch duration alarm fires
  • Lifecycle hook times out instead of completing

Possible Causes

DynamoDB lock contention: During rapid scale events, multiple Lambda invocations may compete for the lock. The default Lambda timeout is 60 seconds, which should be sufficient in most cases.

Route53 API throttling: If you're creating many records simultaneously, Route53 may throttle requests. Check Lambda logs for Throttling errors.
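
One way to search the logs for throttling is `aws logs filter-log-events`, which takes its start time in epoch milliseconds. The sketch below is illustrative: `start_time_ms` and `find_throttling` are hypothetical helpers, and the filter pattern `Throttling` is an assumption that may need adjusting to the exact wording of the error in your logs.

```shell
# Hypothetical helper: converts "N hours ago" into epoch milliseconds.
# Arguments: hours back, and optionally the current time in epoch seconds.
start_time_ms() {
  hours=$1
  now=${2:-$(date +%s)}
  echo $(( (now - hours * 3600) * 1000 ))
}

# Hypothetical wrapper: searches the last hour of a log group for "Throttling".
# Usage: find_throttling /aws/lambda/update-dns-my-asg
find_throttling() {
  aws logs filter-log-events \
    --log-group-name "$1" \
    --start-time "$(start_time_ms 1)" \
    --filter-pattern "Throttling" \
    --query "events[].message" \
    --output text
}
```

The time arithmetic is kept in its own function so the window can be widened without touching the CLI call.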

Lifecycle Hook Timeout

Symptoms

  • Instances stuck in Pending:Wait or Terminating:Wait

Fix

Set heartbeat_timeout to at least 3600 seconds (1 hour) so the Lambda has enough time to process the event, including retries:

resource "aws_autoscaling_lifecycle_hook" "launching" {
  # ...
  heartbeat_timeout = 3600
}

If an instance is still waiting on the hook, you can complete the lifecycle action manually:

aws autoscaling complete-lifecycle-action \
  --lifecycle-action-result CONTINUE \
  --lifecycle-hook-name <hook_name> \
  --auto-scaling-group-name <asg_name> \
  --instance-id <instance_id>
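
The command above needs the ID of each waiting instance. As a rough sketch, those IDs can be listed by filtering the group's instances on their lifecycle state. `waiting_instances` and `list_waiting` are hypothetical names, not part of the module.

```shell
# Hypothetical helper: keeps only rows whose lifecycle state is a wait state.
# Expects "instance-id<TAB>state" rows on stdin, as printed by --output text.
waiting_instances() {
  awk '$2 == "Pending:Wait" || $2 == "Terminating:Wait" { print $1, $2 }'
}

# Hypothetical wrapper: lists instances of one ASG that are waiting on a hook.
# Usage: list_waiting my-asg
list_waiting() {
  aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$1" \
    --query "AutoScalingGroups[0].Instances[].[InstanceId,LifecycleState]" \
    --output text | waiting_instances
}
```

Each printed line pairs an instance ID with its state, which also shows whether the launching or the terminating hook is the one being held.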

CloudWatch Alarm Notifications Not Received

Symptoms

  • Lambda errors occur but no email notifications

Fix

After the first terraform apply, check your email for the SNS subscription confirmation. You must click the confirmation link before notifications will be delivered.
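
If the confirmation email cannot be found, the subscription status can also be checked from the CLI: an unconfirmed subscription reports `PendingConfirmation` in place of its ARN. The sketch below assumes you know the ARN of the module's SNS topic; `pending_endpoints` and `list_pending` are hypothetical names.

```shell
# Hypothetical helper: prints endpoints whose subscription is unconfirmed.
# Expects "endpoint<TAB>subscription-arn" rows on stdin.
pending_endpoints() {
  awk '$2 == "PendingConfirmation" { print $1 }'
}

# Hypothetical wrapper: lists unconfirmed endpoints for one topic.
# Usage: list_pending "$TOPIC_ARN"
list_pending() {
  aws sns list-subscriptions-by-topic \
    --topic-arn "$1" \
    --query "Subscriptions[].[Endpoint,SubscriptionArn]" \
    --output text | pending_endpoints
}
```

Any address printed here has not yet confirmed and will not receive alarm notifications.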

Multiple Prefixes Not Working

Symptoms

  • Only one DNS record created instead of multiple

Check

The route53_hostname_prefixes variable only takes effect when route53_hostname is set to _PrivateDnsName_ or _PublicDnsName_. If route53_hostname is set to a custom string, the prefixes are ignored:

# This creates THREE records per instance:
route53_hostname          = "_PublicDnsName_"
route53_hostname_prefixes = ["ip", "api", "web"]

# This creates ONE record per instance (prefixes ignored):
route53_hostname          = "myhost"
route53_hostname_prefixes = ["ip", "api", "web"]  # ignored

Getting Help

Collect Debug Information

# Lambda logs
aws logs tail /aws/lambda/update-dns-<asg_name> --since 1h

# ASG lifecycle hooks
aws autoscaling describe-lifecycle-hooks \
  --auto-scaling-group-name <asg_name>

# EventBridge rules
aws events list-rules --name-prefix asg-scale

# Route53 records
aws route53 list-resource-record-sets \
  --hosted-zone-id <zone_id> \
  --query "ResourceRecordSets[?Type=='A']"

Open an Issue

Open a GitHub issue with:

  • Module version
  • Terraform plan/apply output
  • Lambda logs
  • ASG and lifecycle hook configuration