add module for existing ECS service #22

Open
flybayer wants to merge 1 commit into main from bb-existing-ecs

@flybayer flybayer commented Jun 12, 2026

Member

Greptile Summary

This PR introduces a new rvn-ecs-existing-service module that deploys releases to an externally managed ECS service without owning any infrastructure, and refactors the existing ECS web module's inline CloudWatch metrics into a shared ecs-service-metrics.yml template.

  • New module (modules_without_stack/ecs_existing_service): accepts cluster/service name, an optional target group ARN suffix, an image repository, and a full task definition template; on each deployment it registers a new task definition revision with the image overridden for the named container and updates the service.
  • Metrics template (partials/templates/ecs-service-metrics.yml): extracts 11 CloudWatch metrics (CPU, memory, running tasks, Container Insights networking, and ALB metrics) into a reusable parameterised template consumed by both the new module and the refactored ECS web definition.
  • Tooling (compiler.ts / guardrails.ts): modules_without_stack is added to MODULE_CATEGORIES in both files so definition files in the new directory are discovered, compiled, and validated correctly.

Confidence Score: 4/5

The change is additive and isolated; the new module does not own any AWS infrastructure, and the metrics refactor in the existing ECS web module is a drop-in equivalent substitution.

The core logic — task definition rendering, image override, and service update — looks correct. The open question is how the platform handles the four ALB metrics when target_group_arn_suffix is left blank: if it silently skips metrics with empty dimensions the behavior is fine, but if it forwards them to CloudWatch the queries will carry an empty TargetGroup dimension. The root README.md is also not updated as required by AGENTS.md, leaving documentation out of sync.

modules_without_stack/ecs_existing_service/rvn-ecs-existing-service-definition.yml — confirm platform handling of empty CloudWatch dimensions and add the new module to the root README.md.

Important Files Changed

| Filename | Overview |
| -------- | -------- |
| modules_without_stack/ecs_existing_service/rvn-ecs-existing-service-definition.yml | New module definition for deploying to an existing ECS service; introduces optional target_group that is always passed through to the metrics template without a nil guard, and root README.md is not updated per AGENTS.md requirements. |
| partials/templates/ecs-service-metrics.yml | New reusable CloudWatch metrics template extracted from the ECS web module; correctly parameterises all 11 metrics but lacks conditional logic to suppress ALB metrics when target_group is empty. |
| compute/ecs_service/rvn-ecs-web-definition.yml | Replaces 130 lines of inline metric definitions with the new shared template; semantically equivalent since the web module always has a target group, no behavioral change. |
| tools/ravion-modules/src/compiler.ts | Adds modules_without_stack to MODULE_CATEGORIES so definition files in that directory are discovered and compiled; one-line, safe addition. |
| tools/ravion-modules/src/guardrails.ts | Mirrors the compiler change by adding modules_without_stack to its own MODULE_CATEGORIES, ensuring the new category is treated as an allowed colocated definition directory; safe. |

Sequence Diagram

sequenceDiagram
    participant User
    participant Ravion
    participant ECS as AWS ECS
    participant CW as CloudWatch

    User->>Ravion: "Deploy (image_ref = tag or digest)"
    Ravion->>Ravion: Render task_definition_template
    Note over Ravion: Override deploy-target container image
    Ravion->>ECS: RegisterTaskDefinition (new revision)
    ECS-->>Ravion: task_definition_arn
    Ravion->>ECS: UpdateService (existing cluster/service)
    ECS-->>Ravion: deployment started
    Ravion->>ECS: Poll until stable (timeout 1800s)
    ECS-->>Ravion: service stable
    Ravion-->>User: Deployment complete

    User->>Ravion: View metrics
    Ravion->>CW: Query ECS metrics (CPU, memory, tasks)
    Ravion->>CW: Query Container Insights (network I/O)
    alt target_group configured
        Ravion->>CW: Query ALB metrics (requests, errors, latency)
    end
    CW-->>Ravion: metric data
    Ravion-->>User: Metrics dashboard
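
The deployment half of the diagram can be read as register, update, then poll with a deadline. The following is a minimal sketch of that flow, not Ravion's implementation: `EcsClientLike`, `DeployOptions`, `deploy`, and `isServiceStable` are hypothetical names introduced here, and a real client would wrap the AWS SDK calls named in the diagram.

```typescript
// Hypothetical minimal client; the method names are assumptions for illustration only.
interface EcsClientLike {
  registerTaskDefinition(taskDefinition: object): Promise<{ taskDefinitionArn: string }>;
  updateService(args: { cluster: string; service: string; taskDefinition: string }): Promise<void>;
  isServiceStable(args: { cluster: string; service: string }): Promise<boolean>;
}

interface DeployOptions {
  cluster: string;
  service: string;
  timeoutSeconds: number; // the compiled definition uses 1800
  pollIntervalMs: number; // assumed value; the PR does not state a poll interval
}

async function deploy(
  ecs: EcsClientLike,
  taskDefinition: object,
  options: DeployOptions
): Promise<string> {
  const { cluster, service } = options;

  // 1. Register a new revision from the rendered template.
  const { taskDefinitionArn } = await ecs.registerTaskDefinition(taskDefinition);

  // 2. Point the existing service at the new revision.
  await ecs.updateService({ cluster, service, taskDefinition: taskDefinitionArn });

  // 3. Poll until the service is stable or the deadline passes.
  const deadline = Date.now() + options.timeoutSeconds * 1000;
  while (Date.now() < deadline) {
    if (await ecs.isServiceStable({ cluster, service })) {
      return taskDefinitionArn;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, options.pollIntervalMs));
  }
  throw new Error(`Service ${service} did not stabilize within ${options.timeoutSeconds}s`);
}
```

Passing the client in as an interface keeps the sketch testable with a fake; nothing here creates or deletes the cluster or service, which matches the module's no-stack design.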
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
modules_without_stack/ecs_existing_service/rvn-ecs-existing-service-definition.yml:1-5
**Root README.md not updated**

`AGENTS.md` explicitly marks it as critical to update the root `README.md` Module Directory table whenever a module is added, modified, or removed. This PR introduces a new module (`rvn-ecs-existing-service`) in a new category (`modules_without_stack`) but the root `README.md` is not updated, leaving it out of sync.

### Issue 2 of 2
modules_without_stack/ecs_existing_service/rvn-ecs-existing-service-definition.yml:196-203
**ALB metrics emitted when target group is absent**

`target_group_arn_suffix` is optional (`required: false`), so it can be empty when the service has no load balancer. The shared `ecs-service-metrics.yml` template has no conditional guard and always emits all four ALB metrics (request count, 4xx/5xx errors, response time, healthy/unhealthy hosts) with `TargetGroup: $with.target_group`. When `target_group_arn_suffix` is blank, the compiled output will carry an empty CloudWatch dimension, which may result in API errors or permanently empty metric graphs visible to users. The README acknowledges this as intentional, but it is worth confirming the platform silently suppresses metrics with nil/empty dimensions rather than surfacing errors.

Reviews (1): Last reviewed commit: "add module for existing ECS service"

Greptile also left 2 inline comments on this PR.

Context used: AGENTS.md (source)

@github-actions


Ravion Module Publish Plan

Dry run only. No Ravion API mutations were made.

| Module | Current Version | New Version | Description |
| ------ | --------------- | ----------- | ----------- |
| rvn-ecs-existing-service | n/a | 0.0.1 | Initial existing ECS service module definition. |

Diffs

rvn-ecs-existing-service n/a -> 0.0.1

--- remote
+++ compiled
-
+description: Deploys releases to an existing, externally managed Amazon ECS service using a user-provided task definition template.
+name: Existing ECS Service
+type: rvn-ecs-existing-service

rvn-ecs-existing-service n/a -> 0.0.1

--- remote
+++ compiled
+deploy:
+  concurrency:
+    queue_overflow: oldest
+    queue_size: 1
+  infrastructure:
+    ecs_cluster_arn: arn:aws:ecs:<<module.input.aws_region>>:<<module.input.aws_account_id>>:cluster/<<module.input.cluster_name>>
+    ecs_service_arns:
+      - arn:aws:ecs:<<module.input.aws_region>>:<<module.input.aws_account_id>>:service/<<module.input.cluster_name>>/<<module.input.service_name>>
+  inputs:
+    - description: Image tag or digest to deploy, resolved in the image repository configured on the module. Do not pass a full image URI.
+      id: image_ref
+      label: Image tag or digest
+      placeholder: sha256:... or v1.2.3
+      required: true
+      type: string
+  task_definition: '<< module.input.task_definition_template | toPairs() | concat(toPairs({"container_definitions": (module.input.task_definition_template.container_definitions != nil ? map(module.input.task_definition_template.container_definitions, #.name == module.input.container_name ? fromPairs(concat(toPairs(#), toPairs(module.input.image_registry_credentials_secret_arn ? {"image": (deploy.input.image_ref contains "sha256:" ? module.input.image_repository + "@" + deploy.input.image_ref : module.input.image_repository + ":" + deploy.input.image_ref), "repository_credentials": {"credentials_parameter": module.input.image_registry_credentials_secret_arn}} : {"image": (deploy.input.image_ref contains "sha256:" ? module.input.image_repository + "@" + deploy.input.image_ref : module.input.image_repository + ":" + deploy.input.image_ref)}))) : #) : [])})) | fromPairs() >>'
+  timeout: 1800
+  type: aws:ecs
+inputs:
+  - id: section_aws
+    label: AWS account & region
+    type: section
+  - id: aws_account_id
+    immutable: true
+    label: AWS account
+    required: true
+    type: string
+    values: $values:ravion/aws_accounts
+  - id: aws_region
+    immutable: true
+    label: Region
+    required: true
+    type: string
+    values: $values:aws/regions
+  - description: Identify the existing ECS cluster and service Ravion should deploy to. Ravion does not create, change, or destroy these resources.
+    id: section_service
+    label: ECS service
+    type: section
+  - description: Name of the existing ECS cluster that contains the service.
+    id: cluster_name
+    immutable: true
+    label: Cluster name
+    patterns:
+      - message: "Use 1-255 characters: letters, numbers, hyphens, and underscores only."
+        pattern: ^[A-Za-z0-9_-]{1,255}$
+    placeholder: production-cluster
+    required: true
+    type: string
+  - description: Name of the existing ECS service to update on each deployment.
+    id: service_name
+    immutable: true
+    label: Service name
+    patterns:
+      - message: "Use 1-255 characters: letters, numbers, hyphens, and underscores only."
+        pattern: ^[A-Za-z0-9_-]{1,255}$
+    placeholder: my-service
+    required: true
+    type: string
+  - collapsible: true
+    description: ARN suffix of the load balancer target group serving this service, such as `targetgroup/my-tg/1234567890abcdef`. Used only for load balancer metrics; leave blank if the service is not behind a load balancer.
+    id: target_group_arn_suffix
+    label: Target group ARN suffix
+    placeholder: targetgroup/my-tg/1234567890abcdef
+    required: false
+    type: string
+  - description: Images are built outside Ravion. Configure the registry repository here and provide only the tag or digest at deploy time.
+    id: section_image
+    label: Image registry
+    type: section
+  - description: Image repository without a tag or digest, such as `nginx`, `ghcr.io/org/app`, or `123456789012.dkr.ecr.us-east-1.amazonaws.com/app`.
+    id: image_repository
+    label: Image repository
+    placeholder: 123456789012.dkr.ecr.us-east-1.amazonaws.com/app
+    required: true
+    type: string
+  - collapsible: true
+    description: Secrets Manager secret ARN for private registries such as GHCR or Docker Hub. The secret must use the ECS repository credentials JSON format. Not needed for public images or normal same-account ECR.
+    id: image_registry_credentials_secret_arn
+    label: Registry credentials secret ARN
+    placeholder: arn:aws:secretsmanager:us-east-1:123456789012:secret:registry-creds
+    required: false
+    type: string
+  - description: Each deployment registers a new task definition revision from the template below, in the template's family, and updates the service to use it.
+    id: section_task_definition
+    label: Task definition
+    type: section
+  - description: Name of the container in the task definition template that receives the deployed image. The template image value for this container is overridden at deploy time.
+    id: container_name
+    label: Deploy target container name
+    patterns:
+      - message: "Use 1-255 characters: letters, numbers, hyphens, and underscores only."
+        pattern: ^[A-Za-z0-9_-]{1,255}$
+    placeholder: app
+    required: true
+    type: string
+  - description: Task definition body registered on every deployment, using snake_case keys (family, container_definitions, port_mappings, log_configuration, and so on). Must include family, usually the family the service already uses. The whole template is registered as written, except the image of the deploy target container is replaced at deploy time.
+    id: task_definition_template
+    label: Task definition template
+    placeholder: |-
+      family: my-service
+      container_definitions:
+        - name: app
+          image: overridden-at-deploy-time
+          essential: true
+          port_mappings:
+            - container_port: 80
+              protocol: tcp
+          environment:
+            - name: NODE_ENV
+              value: production
+          secrets:
+            - name: DATABASE_URL
+              value_from: arn:aws:ssm:us-east-1:123456789012:parameter/database-url
+          log_configuration:
+            log_driver: awslogs
+            options:
+              awslogs-group: /ecs/my-service
+              awslogs-region: us-east-1
+              awslogs-stream-prefix: app
+      cpu: "512"
+      memory: "1024"
+      network_mode: awsvpc
+      requires_compatibilities:
+        - FARGATE
+      runtime_platform:
+        cpu_architecture: X86_64
+        operating_system_family: LINUX
+      execution_role_arn: arn:aws:iam::123456789012:role/my-execution-role
+      task_role_arn: arn:aws:iam::123456789012:role/my-task-role
+    required: true
+    type: object
+readme: |
+  Deploys releases to an existing, externally managed Amazon ECS service using a user-provided task definition template.
 
+  ## Overview
+
+  The Existing ECS Service module connects Ravion deployments to an ECS service that was created outside Ravion. There is no infrastructure stack: Ravion does not create, change, or destroy the cluster, service, load balancer, IAM roles, or networking. You provide the AWS account, region, cluster name, service name, and a task definition template.
+
+  On each deployment, Ravion renders the template, replaces the image of the deploy target container with the image provided at deploy time, registers a new task definition revision in the template's family, and updates the existing ECS service to use it.
+
+  ## What you must provide
+
+  | Field                        | Required | Description                                                        |
+  | ---------------------------- | -------- | ------------------------------------------------------------------ |
+  | AWS account                  | Yes      | Connected AWS account that owns the ECS service                    |
+  | Region                       | Yes      | Region where the cluster and service run                           |
+  | Cluster name                 | Yes      | Existing ECS cluster name                                          |
+  | Service name                 | Yes      | Existing ECS service name                                          |
+  | Target group ARN suffix      | No       | Enables load balancer metrics for the service                      |
+  | Image repository             | Yes      | Image repository without a tag or digest                           |
+  | Registry credentials secret ARN | No    | ECS repository credentials secret for private registries           |
+  | Deploy target container name | Yes      | Container in the template that receives the deployed image         |
+  | Task definition template     | Yes      | Full task definition body registered on every deployment           |
+
+  ## Image registry
+
+  Configure the image repository on the module without a tag or digest, such as `nginx`, `ghcr.io/org/app`, or `123456789012.dkr.ecr.us-east-1.amazonaws.com/app`. For private registries such as GHCR or Docker Hub, provide a Secrets Manager secret ARN in the ECS repository credentials JSON format; it is attached to the deploy target container as repository credentials. Same-account ECR repositories normally need no credentials, but the template's execution role must be able to pull the image.
+
+  ## Task definition template
+
+  The template is the task definition body in snake_case, mirroring the ECS RegisterTaskDefinition API: family, container_definitions, cpu, memory, network_mode, requires_compatibilities, runtime_platform, task_role_arn, execution_role_arn, volumes, and so on. It must include family, usually the family the service already uses, so new revisions land in the right place.
+
+  The entire template is passed through as the registered task definition. Ravion overrides only the deploy target container: its image is replaced with the image resolved at deploy time, and registry credentials are attached to it when a registry credentials secret ARN is configured. Everything else, including additional containers and sidecars, is registered exactly as written.
+
+  Because the service and its IAM roles are externally managed, the template must reference an execution role that can pull the deployed image and write to the configured log destination, and a task role with whatever AWS permissions the application needs.
+
+  ## Deployment
+
+  At deploy time, provide only the image tag or digest, such as `v1.2.3` or `sha256:...`. It is resolved in the image repository configured on the module: digests are joined with `@` and tags with `:`. Image builds happen outside Ravion in your own pipeline.
+
+  Deployments are queued one at a time per module instance; stale queued deployments are collapsed in favor of the newest.
+
+  ## Design decisions
+
+  - No stack: the module never owns or mutates the underlying AWS resources, so destroying the module instance leaves the ECS service untouched.
+  - No build: images come from an external pipeline. The registry repository is module configuration; deploys only choose the tag or digest, mirroring the prebuilt-image mode of the ECS Web Server module.
+  - The task definition template is the single source of truth for everything except the deployed image, keeping drift between Ravion and the external service explicit and reviewable.
+  - The Running tasks, Network in, and Network out metrics use ECS Container Insights and only report data when Container Insights is enabled on the cluster.
+  - Load balancer metrics (request count, 4xx/5xx errors, response time, healthy/unhealthy hosts) read from the optional target group ARN suffix and stay empty when it is not configured.
+
+  ## Learn more
+
+  - [Amazon ECS services](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_services.html)
+  - [RegisterTaskDefinition API](https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_RegisterTaskDefinition.html)
+  - [Amazon ECS Container Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html)
+ui:
+  links:
+    - href: https://<<module.input.aws_region>>.console.aws.amazon.com/ecs/v2/clusters/<<module.inpu
... diff truncated ...
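
The single-line `task_definition` expression in the diff above is hard to read. As a rough TypeScript equivalent, assuming the semantics visible in that expression (later pairs win when merged, and a missing `container_definitions` becomes an empty list), the override might be sketched as follows. The type and function names (`ModuleInput`, `resolveImage`, `renderTaskDefinition`) are hypothetical and not part of the module.

```typescript
// Hypothetical types; the field names follow the snake_case inputs in the compiled diff.
type ContainerDefinition = { name: string; image?: string; [key: string]: unknown };
type TaskDefinitionTemplate = {
  container_definitions?: ContainerDefinition[];
  [key: string]: unknown;
};

interface ModuleInput {
  image_repository: string;
  container_name: string;
  image_registry_credentials_secret_arn?: string;
  task_definition_template: TaskDefinitionTemplate;
}

// Digests are joined with "@" and tags with ":", using the same
// "contains sha256:" test as the compiled expression.
function resolveImage(repository: string, imageRef: string): string {
  const separator = imageRef.includes("sha256:") ? "@" : ":";
  return `${repository}${separator}${imageRef}`;
}

function renderTaskDefinition(input: ModuleInput, imageRef: string): TaskDefinitionTemplate {
  const image = resolveImage(input.image_repository, imageRef);
  const containers = (input.task_definition_template.container_definitions ?? []).map(
    (container) => {
      // Sidecars and other containers are passed through exactly as written.
      if (container.name !== input.container_name) return container;

      const overridden: ContainerDefinition = { ...container, image };
      // Repository credentials are attached only when a secret ARN is configured.
      if (input.image_registry_credentials_secret_arn) {
        overridden.repository_credentials = {
          credentials_parameter: input.image_registry_credentials_secret_arn,
        };
      }
      return overridden;
    }
  );
  // Everything else in the template is kept; only container_definitions is replaced.
  return { ...input.task_definition_template, container_definitions: containers };
}
```

With the placeholder template from the diff, an `image_repository` of `ghcr.io/org/app`, and an `image_ref` of `v1.2.3`, the `app` container would be registered with image `ghcr.io/org/app:v1.2.3`, while any other container keeps its template image.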

Comment on lines +1 to +5
definition:
type: rvn-ecs-existing-service
name: Existing ECS Service
description: Deploys releases to an existing, externally managed Amazon ECS service using a user-provided task definition template.
release:


P2 Root README.md not updated

AGENTS.md explicitly marks it as critical to update the root README.md Module Directory table whenever a module is added, modified, or removed. This PR introduces a new module (rvn-ecs-existing-service) in a new category (modules_without_stack) but the root README.md is not updated, leaving it out of sync.

Context Used: AGENTS.md (source)


Comment on lines +196 to +203
- The Running tasks, Network in, and Network out metrics use ECS Container Insights and only report data when Container Insights is enabled on the cluster.
- Load balancer metrics (request count, 4xx/5xx errors, response time, healthy/unhealthy hosts) read from the optional target group ARN suffix and stay empty when it is not configured.

## Learn more

- [Amazon ECS services](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_services.html)
- [RegisterTaskDefinition API](https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_RegisterTaskDefinition.html)
- [Amazon ECS Container Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html)


P2 ALB metrics emitted when target group is absent

target_group_arn_suffix is optional (required: false), so it can be empty when the service has no load balancer. The shared ecs-service-metrics.yml template has no conditional guard and always emits all four ALB metrics (request count, 4xx/5xx errors, response time, healthy/unhealthy hosts) with TargetGroup: $with.target_group. When target_group_arn_suffix is blank, the compiled output will carry an empty CloudWatch dimension, which may result in API errors or permanently empty metric graphs visible to users. The README acknowledges this as intentional, but it is worth confirming the platform silently suppresses metrics with nil/empty dimensions rather than surfacing errors.
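
One possible shape for the guard, sketched under assumptions: if the platform does not already skip such metrics, a filter over the compiled metric list could drop any metric whose dimensions include a blank value. `MetricDefinition` and both function names below are hypothetical; the real compiled metric shape is not shown in this PR.

```typescript
// Hypothetical shape of a compiled metric; not taken from the Ravion codebase.
interface MetricDefinition {
  id: string;
  namespace: string;
  dimensions: Record<string, string | null | undefined>;
}

// A metric is queryable only if every dimension has a non-blank value.
function hasCompleteDimensions(metric: MetricDefinition): boolean {
  return Object.values(metric.dimensions).every(
    (value) => typeof value === "string" && value.trim() !== ""
  );
}

// Drops metrics such as the ALB ones when target_group_arn_suffix was left blank.
function dropMetricsWithEmptyDimensions(metrics: MetricDefinition[]): MetricDefinition[] {
  return metrics.filter(hasCompleteDimensions);
}
```

Filtering at this level would leave the shared template unconditional, so the ECS web module, which always has a target group, would see no change.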

