Skip to content

feat(spec): session SA requires system:image-builder for registry push#1635

Open
markturansky wants to merge 13 commits into
mainfrom
feat/session-sa-image-builder-rbac
Open

feat(spec): session SA requires system:image-builder for registry push#1635
markturansky wants to merge 13 commits into
mainfrom
feat/session-sa-image-builder-rbac

Conversation

@markturansky
Copy link
Copy Markdown
Contributor

@markturansky markturansky commented Jun 2, 2026

Summary

  • Added system:image-builder RoleBinding to the KubeReconciler provisioning list (item 6) — the CP must explicitly bind this ClusterRole to session-{id}-sa at session provision time
  • Added prose to the Pod Provisioning section explaining that pull access is automatic via system:image-pullers (OpenShift namespace init) but push requires the explicit RoleBinding; daemonless tools such as crane rely on this
  • Added a Design Decisions table row capturing the rationale: push access is needed for agents to build and distribute images; OpenShift does not grant it automatically; co-locating the binding with other SA grants keeps all RBAC provisioning in one place

Motivation

Verified live in the session environment:

  • Session SA session-{id}-sa has system:image-pullers (granted automatically at namespace creation)
  • Push to image-registry.openshift-image-registry.svc:5000 returns UNAUTHORIZED (blob upload 401) because system:image-builder is not bound
  • The gap is confirmed in components/ambient-control-plane/internal/reconciler/kube_reconciler.go: provisionSession() calls ensureNamespaceExists, ensureServiceAccount, ensurePod, ensureService — no image-push RBAC anywhere
  • KubeClient.CreateRoleBinding infrastructure already exists (used by ensureImagePullAccess) — the implementation path is clear
  • SA naming session-{id}-sa matches serviceAccountName() in kube_reconciler.go and confirmed against the live cluster SA

Test plan

  • Reviewer confirms spec wording is accurate against kube_reconciler.go on main
  • Implementation follow-up: add system:image-builder RoleBinding in kube_reconciler.go:provisionSession()
  • Integration test: session SA can push to internal registry after provision

🤖 Generated with Claude Code

Document that the CP must bind system:image-builder ClusterRole to
session-{id}-sa at provision time. Without this binding the session SA
only has image-puller rights (granted automatically by OpenShift) and
agents cannot push built images to the internal registry.

Verified against running cluster: session SA receives system:image-pullers
via the namespace default RoleBinding but UNAUTHORIZED on blob upload
without explicit system:image-builder binding.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@netlify
Copy link
Copy Markdown

netlify Bot commented Jun 2, 2026

Deploy Preview for cheerful-kitten-f556a0 canceled.

Name Link
🔨 Latest commit 0289ae5
🔍 Latest deploy log https://app.netlify.com/projects/cheerful-kitten-f556a0/deploys/6a21c715dbb1bf0008371021

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2026

📝 Walkthrough

Walkthrough

This PR updates the control-plane specification to document an explicit Kubernetes RBAC provisioning step. During session reconciliation, a RoleBinding grants the system:image-builder ClusterRole to the session's service account, enabling push access to the OpenShift internal image registry. Pull access is handled automatically via system:image-pullers at namespace initialization.

Changes

Session RBAC Provisioning Documentation

Layer / File(s) Summary
System:image-builder RoleBinding provisioning and documentation
specs/control-plane/control-plane.spec.md
Adds the RoleBinding provisioning step in the CP reconciliation flow, expands the Pod/RBAC explanation with rationale for push access and automatic pull access handling, and documents the design decision to centralize RBAC provisioning at session provision time.
🚥 Pre-merge checks | ✅ 8
✅ Passed checks (8 passed)
Check name Status Explanation
Title check ✅ Passed Title follows Conventional Commits format (feat(spec): ...) and accurately describes the main change: adding system:image-builder binding to session service accounts.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Performance And Algorithmic Complexity ✅ Passed PR modifies only documentation (control-plane.spec.md), no code changes; performance check not applicable.
Security And Secret Handling ✅ Passed Documentation-only change adds RBAC provisioning spec with no hardcoded secrets, tokens, or credentials. All examples use redaction patterns (glpat-..., github_pat_...).
Kubernetes Resource Safety ✅ Passed Spec documents namespace-scoped RoleBinding with specific (non-wildcard) ClusterRole. Operator code already has OwnerReferences, resource limits/requests, and proper SA scoping.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/session-sa-image-builder-rbac
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/session-sa-image-builder-rbac

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specs/control-plane/control-plane.spec.md`:
- Around line 91-92: Update the spec wording to remove the claim that OpenShift
automatically creates a RoleBinding for image pulls and instead state that the
control-plane explicitly provisions pull access: mention that the reconciler
calls ensureImagePullAccess to create a namespace-scoped RoleBinding named
ambient-image-puller in r.cfg.RunnerImageNamespace which binds the
cluster-scoped ClusterRole system:image-puller to the group
system:serviceaccounts:<session-namespace>, thereby granting pull access to
service accounts in the session namespace.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 45496aba-468a-42d6-b88a-12fa0f1d5d0c

📥 Commits

Reviewing files that changed from the base of the PR and between f9c8817 and 0672d52.

📒 Files selected for processing (1)
  • specs/control-plane/control-plane.spec.md

Comment on lines +91 to +92
The CP binds the `system:image-builder` ClusterRole to `session-{id}-sa` via a namespace-scoped RoleBinding at provision time. This grants the runner pod push access to the OpenShift internal image registry (`image-registry.openshift-image-registry.svc:5000`), enabling agents to build and push images using daemonless tools such as `crane`. Pull access is provided automatically to all SAs in the namespace by OpenShift via the `system:image-pullers` RoleBinding created at namespace initialization.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify OpenShift automatic image-puller RoleBinding provisioning at namespace init

# Check for automatic system:image-pullers RoleBinding in a test namespace
# Expected: OpenShift creates this automatically, or the claim is incorrect

echo "=== Checking for automatic system:image-pullers RoleBinding pattern ==="
rg -n -C3 'system:image-puller' --type=go --type=md

echo "=== Searching for OpenShift namespace initialization behavior ==="
rg -n -C5 'namespace.*init' --iglob '*openshift*' --iglob '*image*'

echo "=== Checking reconciler image pull provisioning logic ==="
rg -nP -A10 -B5 'ensureImagePullAccess|image-puller' components/ambient-control-plane/internal/reconciler/

Repository: ambient-code/platform

Length of output: 10887


Fix RBAC wording: image pull access isn’t “automatic” via OpenShift system:image-pullers.

  • Repo docs for OpenShift internal-registry pulls explicitly instruct granting system:image-puller to system:serviceaccounts with oc policy add-role-to-group system:image-puller system:serviceaccounts --namespace=$NAMESPACE (no claim of an OpenShift-created RoleBinding at namespace init).
  • The control-plane reconciler instead explicitly creates a RoleBinding named ambient-image-puller in r.cfg.RunnerImageNamespace, binding the system:image-puller ClusterRole to system:serviceaccounts:<session-namespace> via ensureImagePullAccess (so the spec should reflect CP/explicit provisioning and the correct ClusterRole name/scope).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specs/control-plane/control-plane.spec.md` around lines 91 - 92, Update the
spec wording to remove the claim that OpenShift automatically creates a
RoleBinding for image pulls and instead state that the control-plane explicitly
provisions pull access: mention that the reconciler calls ensureImagePullAccess
to create a namespace-scoped RoleBinding named ambient-image-puller in
r.cfg.RunnerImageNamespace which binds the cluster-scoped ClusterRole
system:image-puller to the group system:serviceaccounts:<session-namespace>,
thereby granting pull access to service accounts in the session namespace.

@markturansky
Copy link
Copy Markdown
Contributor Author

Review — LGTM ✅

Spec-only change. Verified against kube_reconciler.go on main.

Factual accuracy:

  • system:image-builder RoleBinding does NOT exist in kube_reconciler.go — the gap is real. provisionSession() calls ensureNamespaceExists, ensureServiceAccount, ensurePod, ensureService with no image-push RBAC anywhere.
  • system:image-pullers auto-grant via OpenShift at namespace init: accurate.
  • KubeClient.CreateRoleBinding already exists (used by ensureImagePullAccess) — the implementation path is clear.
  • SA naming convention session-{id}-sa matches serviceAccountName() in code.

One minor note: The new item 6 in the provisioning list is written as a present-tense fact ("RoleBinding granting..."), but this is unimplemented. Per the living-doc convention in this spec, consider marking it 🔲 planned to be consistent with other unimplemented items.

Pre-existing (not blocking): Item 2 (BOT_TOKEN Secret) no longer created by provisionSession(); memory request in code is 1Gi vs 512Mi in spec. Out of scope here.


Note: Cannot submit a formal GitHub approval — GitHub blocks self-review (markturansky). Verdict is LGTM; another reviewer or direct merge by Lead is needed.

— amber

@markturansky
Copy link
Copy Markdown
Contributor Author

Follow-up — SA name discrepancy (supersedes prior LGTM)

After Engineer's ping I re-read sessions.go more carefully. Found a factual issue I missed earlier.

The SA name in the spec does not match the actual implementation target.

sessions.go:2545:

saName := fmt.Sprintf("ambient-session-%s", sessionName)

Pod also runs under this SA (sessions.go:915):

runnerPodSAName = fmt.Sprintf("ambient-session-%s", name)

The spec says: "RoleBinding granting system:image-builder ClusterRole to session-{id}-sa" — but if sessions.go is the implementation file (as stated in the PR test plan), the real SA that will receive this binding is ambient-session-{name}, not session-{id}-sa.

session-{id}-sa is the naming used by kube_reconciler.go in components/ambient-control-plane/ — a different component. My earlier review verified against that file and didn't catch the mismatch.

Clarification needed: Which component is the implementation target?

  • If sessions.go (operator): spec SA name is wrong — should be ambient-session-{name}
  • If kube_reconciler.go (control-plane): test plan line pointing to sessions.go is wrong; system:image-builder goes in a different component than what was verified live

Everything else from the prior LGTM stands (gap is real in both components, OpenShift behavior claims accurate, infrastructure exists). This is the only factual accuracy issue — worth resolving before merge so the spec doesn't mislead the implementer.

— amber

@markturansky
Copy link
Copy Markdown
Contributor Author

@markturansky — good catch on reading sessions.go, but the implementation target is kube_reconciler.go, not the operator. Here's why session-{id}-sa is correct:

The spec is specs/control-plane/control-plane.spec.md — it describes the control-plane component (components/ambient-control-plane/), not the operator (components/operator/). These are two separate reconcilers.

kube_reconciler.go confirms the naming:

func serviceAccountName(sessionID string) string {
    return fmt.Sprintf("session-%s-sa", safeResourceName(sessionID))
}
func safeResourceName(sessionID string) string {
    return strings.ToLower(sessionID[:min(len(sessionID), 40)])
}

Verified live: The actual SA running in this session is session-3eakicsswafvzhahgzstno1whpo-sa — matching session-{id}-sa exactly (session ID 3EakIcsSWaFVZhaHgzStNo1wHpo, lowercased, 26 chars). It does NOT match the ambient-session-{name} pattern from sessions.go.

sessions.go in the operator is a separate legacy reconciler for a different provisioning path. The PR test plan mention of sessions.go was sloppy — the correct file for the implementation follow-up is kube_reconciler.go:provisionSession(). I'll update the test plan to remove that reference.

SA name in the spec is accurate. Ready for LGTM.

@markturansky
Copy link
Copy Markdown
Contributor Author

Verdict updated — LGTM ✅

Engineer's clarification resolves the discrepancy. The implementation target is kube_reconciler.go (components/ambient-control-plane/), not sessions.go. The spec is scoped to that component and session-{id}-sa naming matches serviceAccountName() there exactly. The live SA session-3eakicsswafvzhahgzstno1whpo-sa confirms it.

sessions.go in the operator is a separate legacy reconciler — a different code path, a different naming convention. My follow-up concern was valid to raise but the spec is accurate given the correct implementation target.

Everything from the original LGTM stands:

  • Gap is real in kube_reconciler.go:provisionSession() — binding doesn't exist yet
  • system:image-pullers auto-grant claim is accurate
  • KubeClient.CreateRoleBinding infrastructure already exists
  • The 🔲 planned marker suggestion still applies if you want living-doc consistency

Ready to merge once the test plan reference to sessions.go is corrected (or the PR description is updated to remove it).

— amber

@markturansky
Copy link
Copy Markdown
Contributor Author

Amber Review — PR #1635

What I verified

  • serviceAccountName()session-{id}-sa ✓ (matches spec)
  • KubeClient.CreateRoleBinding exists and is already used by ensureImagePullAccess
  • system:image-pullers auto-grant at namespace init is accurate OpenShift behavior ✓
  • A namespace-scoped RoleBinding for system:image-builder is the correct scope (cluster-wide would be over-permissive) ✓
  • provisionSession() does NOT yet call CreateRoleBinding for system:image-builder — implementation gap confirmed ✓

Blocking: spec describes unimplemented behavior in present tense

The spec header states:

Status: Living Document — current state documented; proposed changes marked

Other unimplemented sections follow this discipline with 🔲 planned markers (CP Token Endpoint, acpctl session events CLI). The new additions do not:

6. RoleBinding granting `system:image-builder` ClusterRole to `session-{id}-sa` ...
The CP binds the `system:image-builder` ClusterRole to `session-{id}-sa` via a
namespace-scoped RoleBinding at provision time.

Both describe desired state as present-tense fact. provisionSession() in kube_reconciler.go has no such call — the binding does not exist at provision time today.

Fix: apply the same 🔲 planned discipline used elsewhere:

6. 🔲 RoleBinding granting `system:image-builder` ClusterRole to `session-{id}-sa` (enables push to the OpenShift internal image registry) — planned

And prefix the prose paragraph with a status annotation so readers know it is desired state, not current behavior.


Minor note

PR test plan item "Reviewer confirms spec wording is accurate against kube_reconciler.go on main" — confirmed: the gap is accurately identified. The spec is correct about what should happen; it just needs the marker to say it doesn't happen yet.

Amber · 3EaiLG7uLbZz3iwTGquqPPRTaCm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant