Scaling¶

This guide covers how to configure scaling behavior, warm pools, and spot instances.

Warm Pool¶

The warm pool keeps instances in a hibernated state, ready to wake up in seconds.

How It Works¶

Instances boot fully and configure themselves
Instead of terminating on scale-in, they hibernate to the warm pool
On scale-out, hibernated instances wake up (~10-30 seconds)
Cold launches happen only when the warm pool is empty

Configuration¶

module "actions-runner" {
  # ... required variables ...

  # Keep 2 instances warm, allow up to 5
  warm_pool_min_size = 2
  warm_pool_max_size = 5

  # Target 1 idle runner at all times
  idle_runners_target_count = 1
}

Default Behavior¶

If not specified:

warm_pool_min_size = idle_runners_target_count + 1
warm_pool_max_size = asg_max_size
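
One way such defaults could be derived is sketched below. This is an illustration, not the module's actual source: the variable declarations, their default values, and the `locals` names are assumptions, and only the two default formulas come from the text above.

```hcl
# Hypothetical sketch: variable and local names are assumptions for illustration.
variable "idle_runners_target_count" {
  type    = number
  default = 1
}

variable "asg_max_size" {
  type    = number
  default = 3
}

variable "warm_pool_min_size" {
  type    = number
  default = null # null means "use the computed default"
}

variable "warm_pool_max_size" {
  type    = number
  default = null # null means "use the computed default"
}

locals {
  # coalesce() returns its first non-null argument, so an explicit
  # value wins and the formula applies only when the variable is unset.
  warm_pool_min_size = coalesce(var.warm_pool_min_size, var.idle_runners_target_count + 1)
  warm_pool_max_size = coalesce(var.warm_pool_max_size, var.asg_max_size)
}
```

With the assumed values above (idle_runners_target_count = 1, asg_max_size = 3) and neither warm pool variable set, the locals evaluate to 2 and 3.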

Limitations¶

Spot Instances

The warm pool is automatically disabled when using spot instances (that is, when on_demand_base_capacity is set). This reflects an AWS limitation: warm pools are not supported for Auto Scaling groups that request spot instances.
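
A conditional warm pool of this kind could be expressed with a dynamic block, as in the sketch below. It is an assumption about how such a switch might look, not the module's actual code: the resource name, the variables, and the fixed sizes are hypothetical, and the mixed_instances_policy that a spot configuration would need is omitted for brevity.

```hcl
# Hypothetical sketch: resource and variable names are assumptions.
variable "on_demand_base_capacity" {
  type    = number
  default = null # null means on-demand only; any number enables spot
}

variable "launch_template_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

resource "aws_autoscaling_group" "runners" {
  name                = "actions-runner"
  min_size            = 1
  max_size            = 5
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  # Emit the warm_pool block only when spot is not in use.
  dynamic "warm_pool" {
    for_each = var.on_demand_base_capacity == null ? [1] : []
    content {
      pool_state                  = "Hibernated"
      min_size                    = 2
      max_group_prepared_capacity = 5
    }
  }
}
```

The `for_each` list is either one element or empty, so the block is rendered once or not at all.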

Spot Instances¶

Reduce costs by using spot instances for your runners.

Configuration¶

module "actions-runner" {
  # ... required variables ...

  # Use spot instances with 1 on-demand as fallback
  on_demand_base_capacity = 1

  asg_min_size = 2
  asg_max_size = 10
}

This configuration:

Always keeps 1 on-demand instance (reliability)
Uses spot for all additional capacity
Disables the warm pool

Cost Savings¶

Spot instances typically cost 60-90% less than on-demand instances, which makes them a good fit for CI/CD workloads that can tolerate interruption.

Spot Interruption Handling¶

When AWS reclaims a spot instance:

ASG lifecycle hook fires
Deregistration Lambda gracefully stops the runner
A running job may fail (GitHub Actions does not retry failed jobs automatically, so the job may need to be re-run on another runner)
ASG launches replacement instance

Graceful Drain Time

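Configure allowed_drain_time (default: 900 seconds) to give running jobs time to complete before termination. Note that AWS gives only about two minutes' notice before reclaiming a spot instance, so a job interrupted this way may not get the full drain time.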
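
The termination lifecycle hook described above could be declared roughly as follows. This is a minimal sketch under stated assumptions: the hook name and the `asg_name` variable are hypothetical, and the module's real hook may be wired differently.

```hcl
# Hypothetical sketch: names are assumptions for illustration.
variable "asg_name" {
  type = string
}

variable "allowed_drain_time" {
  type    = number
  default = 900
}

resource "aws_autoscaling_lifecycle_hook" "terminating" {
  name                   = "runner-deregistration"
  autoscaling_group_name = var.asg_name
  lifecycle_transition   = "autoscaling:EC2_INSTANCE_TERMINATING"

  # Hold the instance in Terminating:Wait for up to this many seconds
  # while the runner is deregistered and its job drains.
  heartbeat_timeout = var.allowed_drain_time

  # If nothing completes the hook in time, proceed with termination.
  default_result = "CONTINUE"
}
```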

Autoscaling¶

The module uses CloudWatch alarms to scale based on idle runner count.

How It Works¶

record_metric Lambda (every minute)
        │
        ▼
Publishes: IdleRunnersCount = N
        │
        ▼
CloudWatch Alarms evaluate:
  - idle_runners_low:  N < target → Scale OUT
  - idle_runners_high: N > target → Scale IN
        │
        ▼
ASG Step Scaling Policy executes
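
The scale-out half of this flow could be wired up as in the sketch below. It is illustrative only: the resource names, the `GitHubRunners` namespace, the statistic, and the variable declarations are assumptions. Only the metric name IdleRunnersCount, the alarm name idle_runners_low, and the step scaling approach come from the diagram above.

```hcl
# Hypothetical sketch: resource names, the metric namespace, and the
# variable declarations are assumptions for illustration.
variable "asg_name" {
  type = string
}

variable "idle_runners_target_count" {
  type    = number
  default = 2
}

variable "autoscaling_step" {
  type    = number
  default = 1
}

resource "aws_autoscaling_policy" "scale_out" {
  name                   = "idle-runners-scale-out"
  autoscaling_group_name = var.asg_name
  policy_type            = "StepScaling"
  adjustment_type        = "ChangeInCapacity"

  step_adjustment {
    # Bounds are relative to the alarm threshold; an upper bound of 0
    # covers every metric value below the threshold.
    metric_interval_upper_bound = 0
    scaling_adjustment          = var.autoscaling_step
  }
}

resource "aws_cloudwatch_metric_alarm" "idle_runners_low" {
  alarm_name          = "idle_runners_low"
  namespace           = "GitHubRunners" # assumed namespace
  metric_name         = "IdleRunnersCount"
  statistic           = "Minimum"
  period              = 60
  evaluation_periods  = 1
  comparison_operator = "LessThanThreshold"
  threshold           = var.idle_runners_target_count
  alarm_actions       = [aws_autoscaling_policy.scale_out.arn]
}
```

The scale-in side would mirror this with a GreaterThanThreshold alarm, a lower bound of 0, and a negative scaling adjustment.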

Configuration¶

module "actions-runner" {
  # ... required variables ...

  # Target idle runner count
  idle_runners_target_count = 2

  # Add/remove this many instances per scaling action
  autoscaling_step = 1

  # Wait this long before evaluating scale-out
  autoscaling_scaleout_evaluation_period = 60
}

Scaling Behavior¶

| Scenario | Action |
|---|---|
| 0 idle runners, target is 2 | Scale out by `autoscaling_step` |
| 5 idle runners, target is 2 | Scale in by `autoscaling_step` |
| 2 idle runners, target is 2 | No action |

Tuning Tips¶

For bursty workloads:

autoscaling_step = 3                          # Add more runners at once
autoscaling_scaleout_evaluation_period = 30   # React faster
idle_runners_target_count = 3                 # Keep more idle

For steady workloads:

autoscaling_step = 1                          # Gradual scaling
autoscaling_scaleout_evaluation_period = 120  # Avoid thrashing
idle_runners_target_count = 1                 # Minimal idle capacity

ASG Sizing¶

Basic Configuration¶

module "actions-runner" {
  # Minimum instances (always running)
  asg_min_size = 1

  # Maximum instances (cost control)
  asg_max_size = 10
}

Default Behavior¶

If not specified:

asg_min_size = number of subnets
asg_max_size = number of subnets + 1

Instance Lifetime¶

Instances are automatically recycled to pick up updates:

# Recycle instances every 30 days (default)
max_instance_lifetime_days = 30

# Disable recycling
max_instance_lifetime_days = 0
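
The Auto Scaling group itself takes its maximum instance lifetime in seconds, so a value in days has to be converted somewhere. The sketch below shows one way to do that. The local and output names are hypothetical, and the module's real conversion may differ.

```hcl
# Hypothetical sketch: local and output names are assumptions.
variable "max_instance_lifetime_days" {
  type    = number
  default = 30
}

locals {
  # aws_autoscaling_group.max_instance_lifetime is expressed in seconds;
  # 0 disables recycling, so 0 days maps cleanly to 0 seconds.
  max_instance_lifetime = var.max_instance_lifetime_days * 24 * 60 * 60
}

output "max_instance_lifetime_seconds" {
  value = local.max_instance_lifetime
}
```

With the default of 30 days, the output is 2592000 seconds.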

Drain Time¶

When an instance is terminating, give running jobs time to complete:

# Allow 15 minutes for jobs to finish (default: 900 seconds)
allowed_drain_time = 900

Note

Maximum allowed value is 900 seconds (AWS limitation).

Example: High-Availability Setup¶

module "actions-runner" {
  source  = "registry.infrahouse.com/infrahouse/actions-runner/aws"
  version = "~> 3.2"

  environment     = "production"
  github_org_name = "my-org"
  subnet_ids      = data.aws_subnets.private.ids  # Multiple AZs
  alarm_emails    = ["oncall@example.com"]

  github_token_secret_arn = aws_secretsmanager_secret.token.arn

  # Always have runners ready
  asg_min_size              = 2
  asg_max_size              = 20
  idle_runners_target_count = 3

  # Fast scaling for CI spikes
  autoscaling_step                       = 2
  autoscaling_scaleout_evaluation_period = 30

  # Warm pool for instant availability
  warm_pool_min_size = 3
  warm_pool_max_size = 10
}

Example: Cost-Optimized Setup¶

module "actions-runner" {
  source  = "registry.infrahouse.com/infrahouse/actions-runner/aws"
  version = "~> 3.2"

  environment     = "development"
  github_org_name = "my-org"
  subnet_ids      = [data.aws_subnets.private.ids[0]]  # Single AZ
  alarm_emails    = ["dev@example.com"]

  github_token_secret_arn = aws_secretsmanager_secret.token.arn

  # Minimal capacity
  asg_min_size              = 0
  asg_max_size              = 5
  # Keep the target at 1 or more: scale-out triggers only when idle < target
  idle_runners_target_count = 1

  # Use spot instances
  on_demand_base_capacity = 0  # All spot

  # Slower scaling (save money)
  autoscaling_step                       = 1
  autoscaling_scaleout_evaluation_period = 120
}