Troubleshooting¶
Common issues and their solutions.
DNS Records Not Created¶
Symptoms¶
- Instance launches but no A record appears in Route53
- ASG instances stuck in
Pending:Waitstate
Check 1: Lambda Logs¶
Look for:
- Permission errors (IAM policy issues)
- Route53 zone not found
- Instance not found (timing issue)
Check 2: EventBridge Rule¶
Verify the EventBridge rule exists and is enabled:
Check 3: Lifecycle Hook¶
Verify the lifecycle hook is configured:
The hook names should match the module outputs (lifecycle_name_launching and lifecycle_name_terminating).
Common Fixes¶
Wrong ASG name: The asg_name variable must exactly match the Auto Scaling Group name. The EventBridge rule filters on this name.
Route53 zone permissions: Ensure the Lambda role has permissions on the correct hosted zone ID.
Instance has no public IP: If route53_public_ip = true but the instance has no public IP, the Lambda will fail. Set route53_public_ip = false for instances in private subnets.
DNS Records Not Deleted on Termination¶
Symptoms¶
- Instance terminates but DNS records remain
- Stale records pointing to non-existent IPs
Check: Terminating Lifecycle Hook¶
Ensure you've created both launching AND terminating hooks:
resource "aws_autoscaling_lifecycle_hook" "terminating" {
autoscaling_group_name = aws_autoscaling_group.web.name
lifecycle_transition = "autoscaling:EC2_INSTANCE_TERMINATING"
name = module.update-dns.lifecycle_name_terminating
heartbeat_timeout = 3600
}
If the terminating hook is missing, the Lambda is never invoked on termination and records are left behind.
Lambda Timeout¶
Symptoms¶
- CloudWatch duration alarm fires
- Lifecycle hook times out instead of completing
Possible Causes¶
DynamoDB lock contention: During rapid scale events, multiple Lambda invocations may compete for the lock. The default Lambda timeout is 60 seconds, which should be sufficient in most cases.
Route53 API throttling: If you're creating many records simultaneously, Route53 may throttle requests. Check Lambda logs for Throttling errors.
Lifecycle Hook Timeout¶
Symptoms¶
- Instances stuck in
Pending:WaitorTerminating:Wait
Fix¶
The default heartbeat_timeout should be at least 3600 seconds (1 hour) to give the Lambda sufficient time to process, including retries:
If the hook times out, you can manually complete it:
aws autoscaling complete-lifecycle-action \
--lifecycle-action-result CONTINUE \
--lifecycle-hook-name <hook_name> \
--auto-scaling-group-name <asg_name> \
--instance-id <instance_id>
CloudWatch Alarm Notifications Not Received¶
Symptoms¶
- Lambda errors occur but no email notifications
Fix¶
After the first terraform apply, check your email for the SNS subscription confirmation. You must click the confirmation link before notifications will be delivered.
Multiple Prefixes Not Working¶
Symptoms¶
- Only one DNS record created instead of multiple
Check¶
The route53_hostname_prefixes variable only works with _PrivateDnsName_ or _PublicDnsName_. If route53_hostname is set to a custom string, prefixes are ignored:
# This creates THREE records per instance:
route53_hostname = "_PublicDnsName_"
route53_hostname_prefixes = ["ip", "api", "web"]
# This creates ONE record per instance (prefixes ignored):
route53_hostname = "myhost"
route53_hostname_prefixes = ["ip", "api", "web"] # ignored
Getting Help¶
Collect Debug Information¶
# Lambda logs
aws logs tail /aws/lambda/update-dns-<asg_name> --since 1h
# ASG lifecycle hooks
aws autoscaling describe-lifecycle-hooks \
--auto-scaling-group-name <asg_name>
# EventBridge rules
aws events list-rules --name-prefix asg-scale
# Route53 records
aws route53 list-resource-record-sets \
--hosted-zone-id <zone_id> \
--query "ResourceRecordSets[?Type=='A']"
Open an Issue¶
Open a GitHub issue with:
- Module version
- Terraform plan/apply output
- Lambda logs
- ASG and lifecycle hook configuration