Troubleshooting
This guide covers common issues and their solutions when using the terraform-aws-website-pod module.
Deployment Issues
Certificate Validation Timeout
Symptoms:
- Terraform hangs at aws_acm_certificate_validation.website
- Error: "timeout while waiting for state to become 'ISSUED'"
Causes:
- DNS propagation delay
- Incorrect Route 53 zone ID
- Cross-account DNS misconfiguration
Solutions:
- Verify that the zone ID is correct
- Check whether the validation records were created
- For cross-account DNS, verify the provider configuration
- Increase the timeout in your configuration
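For the cross-account DNS case listed above, the module expects a second provider passed as aws.dns. A minimal sketch, assuming the Route 53 zone lives in another account that you reach through an IAM role. The region, account ID, and role name are placeholders, not values taken from this module's documentation:

```hcl
# Default provider: the account that hosts the ALB and the instances.
provider "aws" {
  region = "us-west-2" # placeholder region
}

# Aliased provider for the account that owns the Route 53 zone.
provider "aws" {
  alias  = "dns"
  region = "us-west-2" # placeholder region

  assume_role {
    # Hypothetical role; it must be allowed to manage records in the zone.
    role_arn = "arn:aws:iam::111111111111:role/route53-admin"
  }
}

module "website" {
  # source and other arguments omitted

  providers = {
    aws     = aws
    aws.dns = aws.dns # DNS resources are created through this provider
  }
}
```

If the zone ID belongs to an account that the aws.dns provider cannot reach, the validation records cannot be created and the certificate may never reach the ISSUED state.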
Instances Not Becoming Healthy
Symptoms:
- Terraform hangs at aws_autoscaling_group.website
- Error: "timeout while waiting for state to become 'healthy'"
- Instances keep getting replaced
Causes:
- Health check path returns non-200 status
- Application not starting properly
- Security group blocking traffic
- Instance failing to provision
Solutions:
- Check instance status in the AWS Console or CLI
- Connect to an instance and check the logs
- Test the health check endpoint manually
- Verify that the security groups allow traffic from the ALB
- Temporarily increase the timeouts
Provider Configuration Errors
Symptoms:
- Error: "Provider configuration not present"
- Error: "Configuration for provider 'aws.dns' is not present"
Solution:
Always pass both providers:
module "website" {
  providers = {
    aws     = aws
    aws.dns = aws # Can be the same provider if same account/region
  }
  # ...
}
Runtime Issues
High Error Rate Alarms
Symptoms:
- CloudWatch alarm: "Low Success Rate"
- 5xx errors in ALB access logs
Diagnosis:
- Check ALB access logs
- Look for error patterns
- Check target health
Solutions:
- If instances are unhealthy, check the instance logs
- If instances are healthy but returning errors, debug the application
- If specific instances are problematic, terminate them and let the ASG replace them
High Latency Alarms
Symptoms:
- CloudWatch alarm: "Target Response Time"
- Slow page loads
Diagnosis:
- Check CloudWatch metrics:
  - TargetResponseTime - Time to first byte from targets
  - RequestCount - Traffic volume
  - ActiveConnectionCount - Concurrent connections
- Check instance CPU utilization
Solutions:
- Scale up instance type if CPU is consistently high
- Increase asg_max_size if hitting scaling limits
- Optimize application code for slow endpoints
- Consider using the least_outstanding_requests algorithm
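The two configuration-level fixes above could look like the following in the module block. This is a sketch under assumptions: asg_max_size is the input named above, while load_balancing_algorithm_type is a guess at the input name, borrowed from the aws_lb_target_group argument of the same name. Check the module's variables before relying on it. The values are illustrative.

```hcl
module "website" {
  # source, providers, and other arguments omitted

  # Raise the scaling ceiling if the ASG is already at its maximum.
  asg_max_size = 6 # illustrative value

  # Assumed input name; mirrors aws_lb_target_group.load_balancing_algorithm_type.
  # Routes each request to the target with the fewest in-flight requests.
  load_balancing_algorithm_type = "least_outstanding_requests"
}
```

Least outstanding requests tends to help when request durations vary widely, because slow requests no longer pile up on the same target as they can with round robin.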
Unhealthy Host Alarms
Symptoms:
- CloudWatch alarm: "Unhealthy Host Count"
- Some instances marked unhealthy in target group
Diagnosis:
- Check target group health
- Check instance status
- SSH to an unhealthy instance and check that:
  - The application is running
  - The health endpoint responds
  - There are no disk space issues
  - There are no memory issues
Solutions:
- If transient (during deployments), adjust the threshold
- If persistent, investigate and fix the root cause
No Email Notifications
Symptoms:
- Alarms are firing but no emails are received
- SNS subscription shows "PendingConfirmation"
Solution:
- Check for the confirmation email in your spam folder
- Resend the confirmation
- Or recreate the subscription via Terraform (destroy and apply)
Security Issues
Cannot SSH to Instances
Symptoms:
- SSH connection timeout
- "Connection refused"
Causes:
- Security group blocking SSH
- No route to instance (private subnet without bastion)
- Wrong key pair
Solutions:
- Check that the security group allows SSH
- If the instance is in a private subnet, use Session Manager
- Or deploy a bastion host in a public subnet
- Add ssh_cidr_block for your IP
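A sketch of the ssh_cidr_block option, assuming the input accepts a single CIDR string. 203.0.113.10 is a documentation address standing in for your own public IP:

```hcl
module "website" {
  # source, providers, and other arguments omitted

  # Allow SSH only from one workstation; /32 limits the rule to a single address.
  ssh_cidr_block = "203.0.113.10/32"
}
```

A narrow CIDR is preferable to 0.0.0.0/0, which would expose SSH to the whole internet.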
Certificate Not Working
Symptoms:
- Browser shows "Certificate Invalid"
- curl fails with SSL error
Diagnosis:
- Check certificate status
- Verify that DNS resolves to the ALB
- Test SSL
Solutions:
- If certificate is PENDING_VALIDATION, wait for DNS propagation
- If certificate is FAILED, check validation records
- If using wrong certificate, verify dns_a_records includes all hostnames
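For the wrong-certificate case, a sketch of listing every hostname the site should answer on. It assumes dns_a_records takes record names relative to the hosted zone, with an empty string standing for the zone apex. Confirm this against the module's variable description:

```hcl
module "website" {
  # source, providers, and other arguments omitted

  # Assumed semantics: "" is the zone apex (example.com), "www" is www.example.com.
  dns_a_records = ["", "www"]
}
```

A hostname that is served by the ALB but missing from the certificate typically shows up as a name mismatch error in the browser or in curl.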
Cost Issues
Unexpected Charges
Common causes and solutions:
- ALB charges: ALBs have hourly charges plus LCU charges
  - Review traffic patterns
  - Consider combining multiple services behind one ALB
- Data transfer: Check CloudWatch for data transfer metrics
  - Enable compression in your application
  - Use CloudFront for static assets
- Spot instance interruptions: Frequent replacements increase costs
  - Increase on_demand_base_capacity for stability
  - Use multiple instance types (requires custom configuration)
- S3 access logs: Large log volumes increase storage costs
  - Set up lifecycle rules to delete old logs
  - Consider sampling in high-traffic scenarios
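Two of the fixes above can be expressed in Terraform. The sketch below assumes on_demand_base_capacity takes a number of instances and that the access log bucket is one you manage yourself. The bucket name and the 30-day retention are placeholders. If the module creates and manages the log bucket, a separate lifecycle configuration may conflict with it.

```hcl
module "website" {
  # source, providers, and other arguments omitted

  # Keep a baseline of on-demand instances so Spot interruptions
  # cannot replace the whole fleet at once.
  on_demand_base_capacity = 1 # illustrative value
}

# Expire old ALB access logs. Hypothetical bucket name.
resource "aws_s3_bucket_lifecycle_configuration" "access_logs" {
  bucket = "my-alb-access-logs"

  rule {
    id     = "expire-old-logs"
    status = "Enabled"

    filter {} # apply to every object in the bucket

    expiration {
      days = 30 # placeholder retention period
    }
  }
}
```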
Getting Help
If you're still experiencing issues:
- Check existing issues: GitHub Issues
- Open a new issue with:
  - Terraform version
  - Module version
  - Relevant configuration (sanitized)
  - Error messages
  - Steps to reproduce