Skip to content

feat(manifests): rewrite SaaS templates to match hcmais deployment#1659

Open
jsell-rh wants to merge 3 commits into
mainfrom
jsell/update-hcmais-templates
Open

feat(manifests): rewrite SaaS templates to match hcmais deployment#1659
jsell-rh wants to merge 3 commits into
mainfrom
jsell/update-hcmais-templates

Conversation

@jsell-rh
Copy link
Copy Markdown
Contributor

@jsell-rh jsell-rh commented Jun 5, 2026

Summary

Replaces legacy OpenShift SaaS templates (from PR #1642) with templates matching the actual hcmais deployment (ambient-api namespace).

Removed: template-operator.yaml (backend-api, frontend, operator, public-api, CRDs, ConfigMaps — none deployed on hcmais)

Rewritten: template-services.yaml now contains only:

  • ambient-api-server — JWT+gRPC, init migration, credential encryption env vars
  • ambient-api-server-db — RHEL PostgreSQL 16, emptyDir, Recreate strategy
  • ambient-control-plane — runner/sidecar image config (all parameterized), Vertex AI
  • ambient-ui — SSO from sso-credentials Secret, readOnlyRootFilesystem

All values parameterized: IMAGE_TAG, KEYCLOAK_REALM_URL, route hostnames, SSO callback, Vertex project ID. CRDs dropped (already in kustomize base/crds/).

Secrets (provisioned separately)

ambient-api-server-db, ambient-control-plane-token, ambient-cp-token-keypair, ambient-vertex, ambient-anthropic, credential-encryption-key (optional), sso-credentials

Test plan

  • bash components/manifests/templates/validate.sh — passes
  • oc process with test params produces valid manifests matching live hcmais state
  • Matt Knop review (original template author)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Restructured deployment templates to focus on core services: API server, control plane, and UI components.
    • Enhanced security configurations with initialization procedures and improved access control management.
    • Updated service parameters and routing configuration for improved deployment flexibility and connectivity.

jsell-rh and others added 3 commits June 5, 2026 13:36
Replace legacy templates (backend-api, frontend, operator, public-api)
with the actual stack running on hcmais (ambient-api namespace):

- ambient-api-server: JWT+gRPC, init migration, credential encryption
- ambient-api-server-db: RHEL PostgreSQL 16, emptyDir
- ambient-control-plane: runner/sidecar image config, Vertex AI
- ambient-ui: SSO credentials, readOnlyRootFilesystem

All configuration parameterized — IMAGE_TAG, KEYCLOAK_REALM_URL,
route hostnames, SSO callback, Vertex project. CRDs dropped from
templates (already in kustomize base/crds/).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@netlify
Copy link
Copy Markdown

netlify Bot commented Jun 5, 2026

Deploy Preview for cheerful-kitten-f556a0 canceled.

Name Link
🔨 Latest commit 5ff1e42
🔍 Latest deploy log https://app.netlify.com/projects/cheerful-kitten-f556a0/deploys/6a23157af52ddf000856fd78

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 5, 2026

📝 Walkthrough

Walkthrough

The PR removes the operator template entirely and restructures the services template to consolidate database, API server, control plane, and UI deployments with updated security contexts, init containers, environment variable wiring for JWT/Keycloak/Vertex, routes, and template parameters. The validation script is updated to validate only the services template with expanded parameter arguments.

Changes

Kubernetes Manifest Consolidation

Layer / File(s) Summary
Database infrastructure and template metadata
components/manifests/templates/template-services.yaml
Template name/description updated; PostgreSQL database Deployment and Service defined with image selection, secret-backed credentials, resource requests/limits, probes, and pod/container security context.
API server deployment with migrations and auth wiring
components/manifests/templates/template-services.yaml
API server Deployment adds serviceAccountName, security context, and init container for database migrations; container spec redefined with JWT/JWKS/keyring environment variables, DB connectivity flags, volume mounts for secrets/configs, updated probes, and resource limits; Service updated to match declared ports.
Control plane and UI deployments
components/manifests/templates/template-services.yaml
Control plane Deployment rewritten to wire platform config, API/gRPC endpoints, tokens, MCP/credential images, and Vertex project/region via environment variables with updated security settings; UI Deployment updated with new container image/env variables, volume mounts, and read-only root filesystem security context.
OpenShift routes for API server and UI
components/manifests/templates/template-services.yaml
Route resources defined for API server and UI with service backends, route hostnames, TLS edge termination, and insecure redirect behavior.
Template parameters and validation script
components/manifests/templates/template-services.yaml, components/manifests/templates/validate.sh
Template parameters replaced with new set (image names for components, required IMAGE_TAG, namespace, Keycloak realm URL, route hostnames, OIDC redirect URI, Vertex project/region, preview allowed hosts); validate.sh narrowed to template-services.yaml only and expanded to pass parameters explicitly (--local > /dev/null).

Possibly related PRs

  • ambient-code/platform#1450: Both PRs modify Kubernetes ambient-api-server and control-plane Deployment specs by adjusting securityContext, initContainer configuration, and Vertex/runner environment variable wiring.
  • ambient-code/platform#1642: This PR directly rewrites the OpenShift templates introduced in #1642, removes template-operator.yaml, and updates validate.sh which previously validated both templates in #1642.

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

Check name Status Explanation Resolution
Security And Secret Handling ❌ Error Line 233: --enable-authz=false disables authorization, violating the security requirement for auth/authz on API endpoints. ServiceAccounts defined without RBAC creates privilege escalation risk. Enable authz by removing --enable-authz=false or setting it to true. Add ClusterRoles/ClusterRoleBindings for service accounts with least-privilege policies.
✅ Passed checks (7 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title follows Conventional Commits format (feat(manifests): ...) and accurately describes the main change: rewriting SaaS templates to match the hcmais deployment.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Performance And Algorithmic Complexity ✅ Passed PR modifies K8s YAML manifests only, not algorithmic code. No performance issues detected: resource limits configured, connection pooling at 50, no N+1 patterns or unbounded operations.
Kubernetes Resource Safety ✅ Passed All containers have resource limits/requests; proper namespace scoping via ${NAMESPACE}; pod security contexts defined (runAsNonRoot, seccompProfile); no RBAC with wildcard verbs/resources.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch jsell/update-hcmais-templates
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch jsell/update-hcmais-templates

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jsell-rh jsell-rh requested review from maknop and markturansky June 5, 2026 18:30
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/manifests/templates/template-services.yaml`:
- Line 252: The template currently sets an extremely verbose log flag "-v=10"
which is too high for production; change it to a safer default (e.g., match the
base "-v=4") and make it configurable instead of hard-coded. Locate the "-v=10"
entry in template-services.yaml and replace it with a parameterized log level
(use a values/config variable such as logLevel or verbosity) and ensure the
template falls back to a sensible default (4) when the config value is not
provided.
- Around line 191-195: The migration init container ("migration") and the main
API container ("api-server") are missing readOnlyRootFilesystem in their
securityContext; add readOnlyRootFilesystem: true to both containers'
securityContext blocks, and if either binary needs writable paths, create
corresponding emptyDir volumes and mount them only at those specific writable
paths (mirror the UI deployment pattern for ephemeral writable mounts) so the
rest of the filesystem remains read-only.
- Around line 112-114: The manifest defines a volume named "data" using emptyDir
(volumes -> name: data -> emptyDir) which makes PostgreSQL data ephemeral;
replace or parameterize this with a persistentVolumeClaim
(persistentVolumeClaim: claimName: ambient-api-server-db-data) so the DB uses
the same PVC as the base manifest, or add a clear comment and a template/flag to
choose between emptyDir for previews and a PVC for production to avoid data loss
on pod restart.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4c718f9f-c57c-40da-a76b-ab9ad0a3e55a

📥 Commits

Reviewing files that changed from the base of the PR and between cd9d5d9 and 5ff1e42.

📒 Files selected for processing (3)
  • components/manifests/templates/template-operator.yaml
  • components/manifests/templates/template-services.yaml
  • components/manifests/templates/validate.sh
💤 Files with no reviewable changes (1)
  • components/manifests/templates/template-operator.yaml

Comment on lines +112 to +114
volumes:
- name: data
emptyDir: {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Database uses emptyDir—data lost on pod restart.

Per context snippet 1, the base manifest uses a PVC (persistentVolumeClaim: claimName: ambient-api-server-db-data). This template uses emptyDir, meaning all PostgreSQL data is ephemeral.

If this is intentional for the SaaS preview environment, consider adding a comment or providing a parameterized option for persistent storage in production scenarios.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/manifests/templates/template-services.yaml` around lines 112 -
114, The manifest defines a volume named "data" using emptyDir (volumes -> name:
data -> emptyDir) which makes PostgreSQL data ephemeral; replace or parameterize
this with a persistentVolumeClaim (persistentVolumeClaim: claimName:
ambient-api-server-db-data) so the DB uses the same PVC as the base manifest, or
add a clear comment and a template/flag to choose between emptyDir for previews
and a PVC for production to avoid data loss on pod restart.

Comment on lines +191 to +195
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

API server containers missing readOnlyRootFilesystem.

Per coding guidelines, all containers should have readOnlyRootFilesystem: true. Both the migration init container and the api-server container are missing this.

If the binary writes to the filesystem, add emptyDir volumes for writable paths (similar to the UI deployment pattern at lines 560-564).

🛡️ Suggested fix
           securityContext:
+              readOnlyRootFilesystem: true
               allowPrivilegeEscalation: false
               capabilities:
                 drop:
                   - ALL

Apply to both the init container (line 191) and main container (line 297) security contexts.

Also applies to: 297-301

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/manifests/templates/template-services.yaml` around lines 191 -
195, The migration init container ("migration") and the main API container
("api-server") are missing readOnlyRootFilesystem in their securityContext; add
readOnlyRootFilesystem: true to both containers' securityContext blocks, and if
either binary needs writable paths, create corresponding emptyDir volumes and
mount them only at those specific writable paths (mirror the UI deployment
pattern for ephemeral writable mounts) so the rest of the filesystem remains
read-only.

- --cors-allowed-headers=X-Ambient-Project
- --grpc-server-bindaddress=:9000
- --alsologtostderr
- -v=10
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Verbose logging -v=10 in production template.

Log level 10 is very verbose. Context snippet 3 shows base uses -v=4. High verbosity can impact performance and storage costs. Consider parameterizing or defaulting to a lower level.

♻️ Suggested fix
-              - -v=10
+              - -v=4
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- -v=10
- -v=4
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/manifests/templates/template-services.yaml` at line 252, The
template currently sets an extremely verbose log flag "-v=10" which is too high
for production; change it to a safer default (e.g., match the base "-v=4") and
make it configurable instead of hard-coded. Locate the "-v=10" entry in
template-services.yaml and replace it with a parameterized log level (use a
values/config variable such as logLevel or verbosity) and ensure the template
falls back to a sensible default (4) when the config value is not provided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant