Architecture¶
This document explains how the InfraHouse GitHub Actions Runner module works.
Overview¶
Components¶
Auto Scaling Group¶
The core of the module is an AWS Auto Scaling Group that manages EC2 instances:
- Launch Template: Defines instance configuration (AMI, instance type, security groups)
- Scaling Policies: Step scaling based on CloudWatch alarms
- Lifecycle Hooks: Ensure proper registration/deregistration with GitHub
- Warm Pool: Keeps hibernated instances ready for fast scaling
Lambda Functions¶
Three Lambda functions manage the runner lifecycle:
1. Registration Lambda (runner_registration)¶
Triggered by ASG lifecycle hook when an instance launches:
- Retrieves GitHub credentials from Secrets Manager
- Generates a runner registration token
- Stores token in Secrets Manager for the instance to retrieve
- Completes the lifecycle hook
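The registration steps above can be sketched roughly as follows. This is a minimal illustration, not the module's actual handler: the three callables, the secret naming scheme, and the choice to abandon the launch on failure are all assumptions. Only the `detail` keys and the keyword arguments passed to `complete_hook` follow the shape of Auto Scaling lifecycle events and the `CompleteLifecycleAction` API.

```python
from typing import Callable, Dict


def register_instance(
    detail: Dict[str, str],
    create_token: Callable[[], str],
    store_secret: Callable[[str, str], None],
    complete_hook: Callable[..., None],
) -> str:
    """Run the registration steps for one launching instance.

    The callables are hypothetical stand-ins for the GitHub API call,
    the Secrets Manager write, and the ASG CompleteLifecycleAction call.
    """
    instance_id = detail["EC2InstanceId"]
    # Hypothetical naming scheme: one secret per instance.
    secret_name = f"runner-registration/{instance_id}"
    result = "CONTINUE"
    try:
        token = create_token()
        store_secret(secret_name, token)
    except Exception:
        # Assumption: on failure, abandon the launch so the ASG replaces it.
        result = "ABANDON"
    # Always complete the hook so the instance does not wait for the timeout.
    complete_hook(
        AutoScalingGroupName=detail["AutoScalingGroupName"],
        LifecycleHookName=detail["LifecycleHookName"],
        LifecycleActionToken=detail["LifecycleActionToken"],
        LifecycleActionResult=result,
        InstanceId=instance_id,
    )
    return result
```

Passing the dependencies in as callables keeps the flow testable without AWS or GitHub access; a real handler would presumably wire in boto3 clients and a GitHub API client instead.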
2. Deregistration Lambda (runner_deregistration)¶
Triggered by ASG lifecycle hook when an instance terminates:
- Sends SSM command to gracefully stop the runner service
- Deregisters the runner from GitHub
- Cleans up the registration token from Secrets Manager
- Completes the lifecycle hook
Also runs on a schedule to clean up orphaned runners.
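One way the scheduled cleanup could identify orphaned runners is to compare the runners GitHub knows about with the instances that still exist. The sketch below assumes, purely for illustration, that each runner's name equals the ID of the EC2 instance hosting it; the module may use a different mapping. The `name` and `busy` fields match the runner objects returned by the GitHub REST API.

```python
from typing import Any, Dict, Iterable, List


def find_orphaned_runners(
    runners: Iterable[Dict[str, Any]],
    live_instance_ids: Iterable[str],
) -> List[Dict[str, Any]]:
    """Return runners whose backing instance no longer exists.

    Assumption: a runner's "name" is the EC2 instance ID that hosts it.
    """
    live = set(live_instance_ids)
    return [
        r
        for r in runners
        # Skip busy runners as a safety guard against deleting active work.
        if r["name"] not in live and not r.get("busy", False)
    ]
```

Each runner returned here would then be deregistered from GitHub in the same way as during a normal termination.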
3. Record Metric Lambda (record_metric)¶
Runs on a schedule (default: every minute):
- Queries GitHub API for current runner status
- Counts idle runners
- Publishes metric to CloudWatch
- CloudWatch alarms trigger scaling based on this metric
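As an illustration of the counting and publishing steps, the sketch below treats a runner as idle when it is online and not busy, using the `status` and `busy` fields that the GitHub REST API returns for self-hosted runners. The metric name `IdleRunners` and the `AutoScalingGroupName` dimension are assumptions chosen for the example; the returned list has the shape expected by the `MetricData` parameter of CloudWatch `PutMetricData`.

```python
from typing import Any, Dict, Iterable, List


def count_idle_runners(runners: Iterable[Dict[str, Any]]) -> int:
    """Count runners that are online and not currently running a job."""
    return sum(
        1
        for r in runners
        if r.get("status") == "online" and not r.get("busy", False)
    )


def build_metric_data(idle_count: int, asg_name: str) -> List[Dict[str, Any]]:
    """Build a MetricData payload for CloudWatch PutMetricData.

    The metric name and dimension are hypothetical, not the module's own.
    """
    return [
        {
            "MetricName": "IdleRunners",
            "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
            "Value": float(idle_count),
            "Unit": "Count",
        }
    ]
```

Offline runners are excluded from the count because they cannot pick up jobs, so counting them would hide a shortage of usable capacity.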
CloudWatch Alarms¶
Two alarms control scaling:
- idle_runners_low: Triggers scale-out when idle runners < target
- idle_runners_high: Triggers scale-in when idle runners > target
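The decision the two alarms encode can be expressed as a small function. This is only a model of the thresholds described above; in the module the comparison is performed by CloudWatch, and the function name and return values are illustrative.

```python
def scaling_action(idle_runners: int, target: int) -> str:
    """Model the two alarms: too few idle runners scales out, too many scales in."""
    if idle_runners < target:
        return "scale_out"  # idle_runners_low would be in ALARM
    if idle_runners > target:
        return "scale_in"  # idle_runners_high would be in ALARM
    return "none"  # idle count matches the target, neither alarm fires
```

For example, with a target of 2 idle runners, `scaling_action(0, 2)` yields `"scale_out"` and `scaling_action(5, 2)` yields `"scale_in"`.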
Warm Pool¶
The warm pool keeps instances in a hibernated state:
- Instances are fully booted and configured
- When needed, they wake up in seconds (vs. minutes for cold start)
- Disabled automatically when using spot instances (AWS warm pools do not support Spot Instances)
Instance Lifecycle¶
Launch Sequence¶
1. ASG decides to launch instance
│
▼
2. Instance starts from launch template
│
▼
3. registration lifecycle hook fires
│
▼
4. Registration Lambda:
- Gets GitHub token/App credentials
- Creates runner registration token
- Stores token in Secrets Manager
- Completes hook
│
▼
5. Instance cloud-init runs:
- Installs packages
- Runs Puppet (if configured)
- Retrieves registration token
- Registers runner with GitHub
- Starts runner service
│
▼
6. bootstrap lifecycle hook fires
│
▼
7. Instance calls ih-aws to complete hook
│
▼
8. Instance enters InService state
│
▼
9. Runner picks up jobs from GitHub
Termination Sequence¶
1. ASG decides to terminate instance
(scale-in, max lifetime, health check)
│
▼
2. deregistration lifecycle hook fires
│
▼
3. Deregistration Lambda:
- Sends SSM command: stop runner service
- Waits for running job to complete
- Deregisters runner from GitHub
- Deletes registration token
- Completes hook
│
▼
4. Instance terminates
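The "waits for running job to complete" step in the sequence above amounts to polling with a deadline. The following sketch shows one way to write such a wait; the function name, the default timeout and interval, and the `is_busy` callable (which would query the runner's state) are assumptions, not the module's implementation.

```python
import time
from typing import Callable


def wait_until_idle(
    is_busy: Callable[[], bool],
    timeout_s: float = 900.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the runner is idle; return False if the deadline passes first."""
    deadline = clock() + timeout_s
    while is_busy():
        if clock() >= deadline:
            return False  # job still running, caller decides what to do
        sleep(poll_s)
    return True
```

Whatever timeout is chosen has to fit within the lifecycle hook's heartbeat timeout, since the ASG proceeds with termination once that expires.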
Scaling Behavior¶
Scale Out (Add Runners)¶
1. record_metric Lambda publishes idle runner count
2. idle_runners_low alarm enters ALARM state
3. ASG scaling policy adds instances
   - If warm pool has instances: wake from hibernation (fast)
   - If warm pool empty: launch new instance (slower)
Scale In (Remove Runners)¶
1. record_metric Lambda publishes idle runner count
2. idle_runners_high alarm enters ALARM state
3. ASG scaling policy removes instances
4. Lifecycle hook ensures graceful deregistration
5. Instance returns to warm pool (if enabled) or terminates
Security Model¶
Instance Permissions¶
Runners have minimal IAM permissions:
- SSM for command execution
- Secrets Manager read for registration token only
- CloudWatch for metrics
Lambda Permissions¶
Each Lambda has scoped permissions:
- Registration: Secrets Manager write, GitHub API
- Deregistration: SSM commands, Secrets Manager delete, GitHub API
- Record Metric: GitHub API read, CloudWatch put metric
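To make "scoped permissions" concrete, here is what a narrowly scoped IAM policy for the Record Metric Lambda might look like, built as a Python dictionary. It is a sketch under assumptions: the namespace `GitHubRunners` is hypothetical and the module's real policy may differ. `cloudwatch:PutMetricData` does not support resource-level restrictions, so the statement uses `"Resource": "*"` and narrows access with the `cloudwatch:namespace` condition key instead.

```python
import json
from typing import Any, Dict


def record_metric_policy(namespace: str) -> Dict[str, Any]:
    """Build an IAM policy allowing PutMetricData in a single namespace."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "cloudwatch:PutMetricData",
                # PutMetricData has no resource-level permissions.
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"cloudwatch:namespace": namespace}
                },
            }
        ],
    }


# "GitHubRunners" is a hypothetical namespace used only for this example.
print(json.dumps(record_metric_policy("GitHubRunners"), indent=2))
```

GitHub API access is not governed by IAM; it depends on the GitHub credentials the Lambda retrieves.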
Network¶
- Runners in private subnets with NAT gateway
- Lambdas in VPC with access to AWS services
- Security group allows only outbound traffic
