LPX-620: ECS web autoscaling + yolo scale command #64

Open
stevethomas wants to merge 2 commits into main from steve/lpx-620-yolo-ecs-web-scaling-target-tracking-autoscaling-yolo-scale

stevethomas commented May 29, 2026

Hey, I made a thing! 🥳

Implements Phases 1–4 of LPX-620 — target-tracking autoscaling for the web ECS service plus a yolo scale command. Phase 5 (seeding/tuning the request-count target from a load test) is intentionally deferred.

What problems are you solving?

  • The web service ran as a fixed single task with no way to scale automatically, and desired-count is create-only (it is applied only when the service is first created), leaving a real gap once traffic grows.
  • There was no first-class way to change running capacity without editing the manifest and redeploying.

What's in it

  • Aws/ApplicationAutoScaling wrapper + ScalableTarget / ScalingPolicy reconcilers — QueueAlarm/Dashboard-style standalone reconcilers (not the Resource contract, since App Auto Scaling isn't RGT-taggable), dry-run honest (diff live state, only write on drift).
  • sync:app steps SyncScalableTargetStep + SyncScalingPoliciesStep, wired right after SyncEcsServiceStep and gated on a tasks.web.autoscaling block — completely inert without it (today's single-task behaviour preserved).
  • yolo scale <env> [count] — mirrors env:push's note → table → confirm → 🐥 yolo UX. Autoscaling-aware: no scalable target → sets ECS desired count; target registered → raises the target's min capacity (a raw desired set is overridden on the next evaluation) and renders desired as — (autoscaling-managed).
  • Metrics: CPU (ECSServiceAverageCPUUtilization) is always on and needs no tuning; ALBRequestCountPerTarget is added only once request-count-per-target is set (the leading indicator — its target comes from a load test, Phase 5). Both predefined → no custom metrics, no scaling-role IAM.
  • Retired dead EC2 code: the Aws\AutoScaling\AutoScalingClient (EC2 ASG) registration/accessor and the aws.autoscaling.combine / ServerGroup branch.
  • No deployer-policy change. yolo deploy never touches autoscaling (it rolls a task-def revision + UpdateService without desiredCount; App Auto Scaling keeps owning capacity across the rollout). The scaling APIs are exercised only by yolo sync / yolo scale, which run with admin creds — so the deployer role needs nothing.
  • Scheduler safety = docs + a soft, non-blocking sync-time advisory (use ->onOneServer(); separate the scheduler if needed). No Redis lock, no hard guard. Service separation is the follow-up, LPX-649.
  • Docs: yolo scale + tasks.web.autoscaling reference, new Scaling guide (covers the metric choice, choosing the request-count target, and the scheduler caveat).
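
To make the autoscaling-aware branch of yolo scale concrete, here is a minimal Python sketch of the decision described above. It is not the PR's PHP implementation: `ScalableTarget`, `ScalePlan`, and `plan_scale` are hypothetical names, and rejecting a count above the target's max capacity is an assumption the description does not state. The two `api_call` values name the real ECS `UpdateService` and Application Auto Scaling `RegisterScalableTarget` operations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScalableTarget:
    """Hypothetical snapshot of a registered App Auto Scaling target."""
    min_capacity: int
    max_capacity: int


@dataclass(frozen=True)
class ScalePlan:
    """What a scale command would do, decided before anything is written."""
    api_call: str              # AWS operation the plan maps to
    value: int                 # new desired count, or new min capacity
    autoscaling_managed: bool  # True when desired count is left to autoscaling


def plan_scale(count: int, target: Optional[ScalableTarget]) -> ScalePlan:
    """Pick the knob to turn for a requested task count."""
    if count < 0:
        raise ValueError("count must be non-negative")

    if target is None:
        # No scalable target registered: setting the ECS desired count sticks.
        return ScalePlan("UpdateService", count, autoscaling_managed=False)

    # Assumption: a floor above the ceiling is treated as an error.
    if count > target.max_capacity:
        raise ValueError("count exceeds the scalable target's max capacity")

    # Target registered: a raw desired count would be overridden on the next
    # evaluation, so move the target's min capacity instead.
    return ScalePlan("RegisterScalableTarget", count, autoscaling_managed=True)
```

With no target, `plan_scale(3, None)` plans a desired-count change; with `ScalableTarget(1, 10)` registered, `plan_scale(4, ...)` plans a min-capacity change and marks desired count as autoscaling-managed.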

Is there anything the reviewer needs to know to deploy this?

  • Zero behaviour change for existing apps. Every code path is gated on a tasks.web.autoscaling manifest block; no current manifest has one, so sync/deploy/scale behave exactly as today until someone opts in.
  • desired-count stays create-only — autoscaling takes ownership of capacity only after a scalable target is registered; sync/deploy never fight it.
  • Audit blind spot (documented): App Auto Scaling targets/policies can't be tagged, so they don't appear in yolo audit; teardown must DeregisterScalableTarget.
  • Not yet exercised against live AWS — no sync/scale was run against a real account. First real run will be CL with a tasks.web.autoscaling block once Phase 5 has a load-test number for request-count-per-target. (Provisioning runs under admin/local creds, not CI.)
  • 440 Pest green (1021 assertions), Pint + PHPStan clean, VitePress builds (no dead links).
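
The opt-in gating and metric selection can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PR's code: `target_tracking_policies`, the policy names, the `cpu-target` key, and its default of 60 are all hypothetical. Only `request-count-per-target` and the two predefined metric types come from the description. `ALBRequestCountPerTarget` needs a resource label identifying the load balancer and target group, passed in here as an opaque string.

```python
from typing import Any, Dict, List, Optional


def target_tracking_policies(
    autoscaling: Optional[Dict[str, Any]],
    alb_resource_label: str,
) -> List[Dict[str, Any]]:
    """Derive the desired target-tracking policies from the manifest block."""
    # No tasks.web.autoscaling block: completely inert, nothing to reconcile.
    if autoscaling is None:
        return []

    # CPU is always on. "cpu-target" and its default are hypothetical.
    policies: List[Dict[str, Any]] = [
        {
            "PolicyName": "web-cpu",  # hypothetical name
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            "TargetValue": float(autoscaling.get("cpu-target", 60)),
        }
    ]

    # The request-count policy is added only once a target value is set.
    request_count = autoscaling.get("request-count-per-target")
    if request_count is not None:
        policies.append(
            {
                "PolicyName": "web-request-count",  # hypothetical name
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                "ResourceLabel": alb_resource_label,
                "TargetValue": float(request_count),
            }
        )

    return policies
```

An empty block therefore yields only the CPU policy, and a manifest without the block yields no policies at all, which matches the zero-behaviour-change claim above.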

🤖 Generated with Claude Code

stevethomas and others added 2 commits May 29, 2026 17:47
Adds Application Auto Scaling for the web ECS service (target tracking) and
a `yolo scale` command for out-of-band capacity changes. Phases 1-4 of
LPX-620; Phase 5 (load-test tuning of the request-count target) is deferred.

- Aws/ApplicationAutoScaling wrapper; ScalableTarget + ScalingPolicy
  reconcilers (QueueAlarm/Dashboard-style, dry-run honest)
- sync:app SyncScalableTargetStep + SyncScalingPoliciesStep, gated on a
  tasks.web.autoscaling manifest block (inert without it)
- yolo scale command mirroring env:push's compare->confirm UX,
  autoscaling-aware (desired count vs scalable-target min capacity)
- CPU policy always on; ALBRequestCountPerTarget added once
  request-count-per-target is set
- retired the dead EC2 AutoScalingClient + aws.autoscaling.combine
- deployer IAM (application-autoscaling, cloudwatch alarms, scoped
  CreateServiceLinkedRole) gated on autoscaling being configured
- scheduler safety via docs + a soft sync-time advisory (onOneServer
  recommended; service separation tracked in LPX-649)
- docs: yolo scale + tasks.web.autoscaling reference, new Scaling guide

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Deploy never touches autoscaling: it rolls a task-def revision and calls
UpdateService without desiredCount, so the scalable target and its policies
are untouched and App Auto Scaling keeps owning capacity across the rollout.
The scaling APIs are exercised only by `yolo sync` / `yolo scale`, which run
with admin creds, not the GitHub Actions deployer role. Granting them here
contradicted the policy's "exactly what deploy exercises" doctrine.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>