feat(providers): add dynamic identity sources to Oauth2TokenExchange strategy #1736

@usize

Problem Statement

PR #1681 adds an Oauth2TokenExchange refresh strategy that sends standard RFC 8693 form fields with configurable token_url, client_id, audience, scope, and subject_token.
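For reference, the form described above can be sketched as follows. This is a minimal illustration, not the code from PR #1681: `ExchangeConfig` and `build_exchange_form` are hypothetical names, and the sketch assumes client authentication (for example the optional basic auth in request_token) is handled outside the form. The parameter names and the grant type URN are the ones RFC 8693 defines; `subject_token_type` is required by the RFC even though it is not listed among the configurable fields above.

```rust
/// Hypothetical configuration for one exchange; field names mirror the
/// options listed in the issue, not the actual types from PR #1681.
pub struct ExchangeConfig {
    pub audience: Option<String>,
    pub scope: Option<String>,
    /// Today this is a static string captured at configuration time.
    pub subject_token: String,
    /// RFC 8693 requires a token type identifier alongside subject_token.
    pub subject_token_type: String,
}

/// Grant type URN defined by RFC 8693.
pub const GRANT_TYPE_TOKEN_EXCHANGE: &str = "urn:ietf:params:oauth:grant-type:token-exchange";
/// Token type URN for OAuth 2.0 access tokens, defined by RFC 8693.
pub const TOKEN_TYPE_ACCESS_TOKEN: &str = "urn:ietf:params:oauth:token-type:access_token";

/// Assemble the form fields for a token exchange request.
/// Optional fields are only sent when configured.
pub fn build_exchange_form(cfg: &ExchangeConfig) -> Vec<(&'static str, String)> {
    let mut form = vec![
        ("grant_type", GRANT_TYPE_TOKEN_EXCHANGE.to_string()),
        ("subject_token", cfg.subject_token.clone()),
        ("subject_token_type", cfg.subject_token_type.clone()),
    ];
    if let Some(audience) = &cfg.audience {
        form.push(("audience", audience.clone()));
    }
    if let Some(scope) = &cfg.scope {
        form.push(("scope", scope.clone()));
    }
    form
}
```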

However, the subject_token is captured once at configuration time as a static string. When it expires, the exchange breaks. There also appears to be no actor_token for delegation chains via act claims (RFC 8693 §4.1), no client-assertion path for secret-free STS authentication, and no way to source a per-sandbox SPIFFE SVID as the subject identity.

This limits the strategy to short-lived interactive sessions and prevents integration with workload identity systems (#1665, PR #1414) where the sandbox proves its identity via attestation rather than a captured user token.

See the review comment on PR #1681 and the related discussion on #896.

Proposed Design

Extend the Oauth2TokenExchange strategy incrementally with new identity source types, delivered in phases rather than a single abstraction:

Phase 1:

Add client_assertion support (private_key_jwt client auth). This is orthogonal to subject_token sourcing and follows the existing pattern: store a PEM key in material, then sign a JWT at mint time, as GoogleServiceAccountJwt does.

For SPIFFE-federated IdPs, the assertion type would be urn:ietf:params:oauth:client-assertion-type:jwt-spiffe.
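A rough sketch of how Phase 1 could attach the assertion to the outgoing form is below. Everything named here (`AssertionClaims`, `AssertionSigner`, `append_client_assertion`, `StubSigner`) is hypothetical and does not exist in the codebase. The sketch assumes the usual private_key_jwt claim set from RFC 7523 (`iss` and `sub` set to the client ID, `aud` set to the token endpoint) and leaves the actual PEM parsing and signing behind a trait, since that part would reuse whatever GoogleServiceAccountJwt already does.

```rust
/// Assertion type URN for JWT client authentication (RFC 7523).
pub const CLIENT_ASSERTION_TYPE_JWT_BEARER: &str =
    "urn:ietf:params:oauth:client-assertion-type:jwt-bearer";

/// Claims for a private_key_jwt client assertion (hypothetical struct).
pub struct AssertionClaims {
    pub iss: String, // the client ID
    pub sub: String, // also the client ID
    pub aud: String, // the token endpoint the assertion is intended for
    pub jti: String, // unique ID so the IdP can reject replays
    pub iat: u64,
    pub exp: u64,
}

/// Hypothetical seam for signing; a real implementation would sign with
/// the PEM key stored in material.
pub trait AssertionSigner {
    fn sign(&self, claims: &AssertionClaims) -> Result<String, String>;
}

/// Stand-in signer for illustration only. It does NOT produce a valid JWT.
pub struct StubSigner;

impl AssertionSigner for StubSigner {
    fn sign(&self, claims: &AssertionClaims) -> Result<String, String> {
        Ok(format!("stub.{}.{}", claims.sub, claims.exp))
    }
}

/// Sign a fresh assertion at mint time and append the two form fields
/// that carry it.
pub fn append_client_assertion(
    form: &mut Vec<(&'static str, String)>,
    signer: &dyn AssertionSigner,
    client_id: &str,
    token_url: &str,
    jti: String,
    now_unix: u64,
    ttl_secs: u64,
) -> Result<(), String> {
    let claims = AssertionClaims {
        iss: client_id.to_string(),
        sub: client_id.to_string(),
        aud: token_url.to_string(),
        jti,
        iat: now_unix,
        exp: now_unix + ttl_secs,
    };
    let jwt = signer.sign(&claims)?;
    form.push(("client_assertion_type", CLIENT_ASSERTION_TYPE_JWT_BEARER.to_string()));
    form.push(("client_assertion", jwt));
    Ok(())
}
```

For the SPIFFE case the assertion type would presumably become a parameter rather than a constant, with the JWT-SVID taking the place of the locally signed JWT.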

Phase 2:

Add OIDC refresh token persistence for caller_oidc. The gateway stores the user's refresh token and uses it to obtain a fresh access token before each exchange, rather than relying on a static access token. This requires a security review of the stored-credential model.
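One possible shape for the Phase 2 step is sketched below. All names (`StoredOidcCredential`, `RefreshedTokens`, `fresh_access_token`) are hypothetical, and the HTTP call to the IdP is abstracted as a closure. The sketch also makes one assumption beyond the text above: it reuses the cached access token while it is still valid, rather than refreshing unconditionally before every exchange.

```rust
/// Hypothetical stored credential for a caller_oidc source.
pub struct StoredOidcCredential {
    pub refresh_token: String,
    pub access_token: String,
    /// Unix timestamp at which the cached access token expires.
    pub access_expires_at: u64,
}

/// Result of a refresh_token grant (hypothetical struct).
pub struct RefreshedTokens {
    pub access_token: String,
    pub expires_in: u64,
    /// Some IdPs rotate the refresh token on every use.
    pub refresh_token: Option<String>,
}

/// Return an access token that is valid for at least `skew_secs` more
/// seconds, calling `refresh` with the stored refresh token if needed.
pub fn fresh_access_token<F>(
    cred: &mut StoredOidcCredential,
    now_unix: u64,
    skew_secs: u64,
    refresh: F,
) -> Result<String, String>
where
    F: FnOnce(&str) -> Result<RefreshedTokens, String>,
{
    // Assumption: reuse the cached token while it is comfortably valid.
    if now_unix + skew_secs < cred.access_expires_at {
        return Ok(cred.access_token.clone());
    }
    let refreshed = refresh(&cred.refresh_token)?;
    cred.access_token = refreshed.access_token;
    cred.access_expires_at = now_unix + refreshed.expires_in;
    // Persist a rotated refresh token, otherwise the next refresh fails.
    if let Some(rotated) = refreshed.refresh_token {
        cred.refresh_token = rotated;
    }
    Ok(cred.access_token.clone())
}
```

The returned token would then be used as the subject_token for the exchange. Where the credential is persisted and how it is encrypted is exactly the part that needs the security review.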

Phase 3:

Add spiffe_svid source. The sandbox's own JWT-SVID becomes the subject_token or client assertion for outbound federation. Depends on PR #1414 merging and #1665's per-sandbox SVID infrastructure. Also requires designing the SVID-to-gateway data flow (sandbox pushes SVID to gateway via relay, or gateway requests it).

Desired end state:

Which identity source fills which RFC 8693 role (subject_token, actor_token, client_assertion) becomes a profile-level configuration decision. The exchange function resolves the configured sources at mint time and assembles the standard form.
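To make the end state concrete, here is a minimal sketch of role bindings under assumed names. `SourceKind`, `ResolvedToken`, `RoleBindings`, and `assemble_roles` are all hypothetical, and the sketch covers only the subject_token and actor_token roles. It assumes each source has already been resolved to a token plus its RFC 8693 token type before assembly.

```rust
use std::collections::HashMap;

/// Identity sources discussed in this proposal (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum SourceKind {
    Static,
    CallerOidc,
    SpiffeSvid,
}

/// A token produced by a source, with its RFC 8693 token type identifier.
#[derive(Clone, Debug, PartialEq)]
pub struct ResolvedToken {
    pub value: String,
    pub token_type: &'static str,
}

/// Profile-level choice of which source fills which role.
pub struct RoleBindings {
    pub subject_token: SourceKind,
    pub actor_token: Option<SourceKind>,
}

/// Build the role-specific form fields from already-resolved sources.
/// Fails if a bound source did not produce a token.
pub fn assemble_roles(
    bindings: &RoleBindings,
    resolved: &HashMap<SourceKind, ResolvedToken>,
) -> Result<Vec<(&'static str, String)>, String> {
    let lookup = |kind: SourceKind| -> Result<ResolvedToken, String> {
        resolved
            .get(&kind)
            .cloned()
            .ok_or_else(|| format!("source {:?} produced no token", kind))
    };
    let subject = lookup(bindings.subject_token)?;
    let mut form = vec![
        ("subject_token", subject.value),
        ("subject_token_type", subject.token_type.to_string()),
    ];
    // actor_token_type is required whenever actor_token is present.
    if let Some(kind) = bindings.actor_token {
        let actor = lookup(kind)?;
        form.push(("actor_token", actor.value));
        form.push(("actor_token_type", actor.token_type.to_string()));
    }
    Ok(form)
}
```

A delegation profile could, for example, bind `CallerOidc` to subject_token and `SpiffeSvid` to actor_token, while a pure workload-identity profile would bind `SpiffeSvid` to subject_token and leave actor_token unset.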

Prior art:

AuthBridge implements this pattern as an external sidecar that presents SPIFFE JWT-SVIDs as client assertions against Keycloak, with route-based audience mapping. The sidecar approach works but is redundant given OpenShell's existing sandbox proxy and credential refresh infrastructure.

Alternatives Considered

  • External sidecar proxy (AuthBridge). Requires its own operator, a container per workload, and iptables init containers for transparent interception. Redundant given OpenShell's existing proxy and credential refresh worker.
  • Per-IdP provider implementations (#1681 "feat(auth): support live Okta OBO token exchange" for Okta, #1424 "Feat microsoft provider v2" for Entra). Each reimplements the same RFC 8693 flow with different configuration. Source indirection subsumes these with configuration rather than code.
  • L7 middleware (#1694 "feat(sandbox): introduce middleware layer for L7 proxy request transformations") for per-request exchange. Viable for caller_oidc where user identity must come from the current request. Complementary to the refresh worker path, not a replacement: the worker handles background renewal while middleware handles request-scoped identity.
  • Full source abstraction upfront (oneof source { static | caller_oidc | spiffe_svid }). Premature — the use cases differ enough that the right abstraction shape isn't visible yet. Better to add sources as concrete variants and extract the pattern after two or three exist.

Agent Investigation

Investigated the credential refresh lifecycle, SPIFFE infrastructure, and outbound request path across the gateway, sandbox, and provider subsystems.

Key findings:

  • mint_oauth2_token_exchange in provider_refresh.rs reads subject_token from state.material (a static HashMap<String, String>) and sends the standard RFC 8693 form via request_token(). Source resolution would be inserted before the existing required_material call.
  • The refresh worker (run_refresh_worker_tick) runs as a background timer loop with access to the Store but no request context, no Principal, and no sandbox identity. Dynamic sources must be resolvable from this context.
  • SandboxIdentitySource::SpiffeSvid exists as a reserved enum variant in principal.rs but is never constructed. PR #1414 ("feat(auth): add SPIFFE supervisor authentication") adds SPIFFE auth for supervisor-to-gateway but is not yet merged.
  • No SPIFFE SVID fetching capability exists in the sandbox supervisor today. The gateway does not hold per-sandbox SVIDs. Using a sandbox's SVID as subject_token requires a new data flow: either the sandbox pushes its SVID to the gateway via the relay channel, or the gateway requests it via the supervisor gRPC connection.
  • client_assertion (private_key_jwt) follows the same pattern as GoogleServiceAccountJwt — store a PEM key, sign a JWT at mint time. Lowest complexity addition.
  • The request_token function already supports optional basic auth. Adding a client_assertion auth mode is structurally similar.
  • Test patterns use wiremock::MockServer for token endpoints and in-memory SQLite stores. New sources should follow the same pattern.
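The first two findings suggest one possible insertion point, sketched below under stated assumptions. `SubjectTokenSource`, `FixedSource`, and `material_for_mint` are hypothetical, and the `required_material` shown here is a simplified stand-in for the existing helper, whose real signature is not given in this issue. The idea is that a dynamic source, resolvable without request context, overwrites the static subject_token in a copy of the material so the rest of the mint path stays unchanged.

```rust
use std::collections::HashMap;

/// Hypothetical seam for a dynamic identity source. It takes no request
/// context, so it can be called from the background refresh worker.
pub trait SubjectTokenSource {
    fn resolve(&self) -> Result<String, String>;
}

/// Stand-in source for illustration; returns a fixed token.
pub struct FixedSource(pub String);

impl SubjectTokenSource for FixedSource {
    fn resolve(&self) -> Result<String, String> {
        Ok(self.0.clone())
    }
}

/// Simplified stand-in for the existing required_material helper.
pub fn required_material<'a>(
    material: &'a HashMap<String, String>,
    key: &str,
) -> Result<&'a str, String> {
    material
        .get(key)
        .map(String::as_str)
        .ok_or_else(|| format!("missing material: {}", key))
}

/// Resolve the dynamic source (if any) before the mint path reads
/// material. Without a source, the static value is used as it is today.
pub fn material_for_mint(
    stored: &HashMap<String, String>,
    source: Option<&dyn SubjectTokenSource>,
) -> Result<HashMap<String, String>, String> {
    let mut material = stored.clone();
    if let Some(source) = source {
        material.insert("subject_token".to_string(), source.resolve()?);
    }
    Ok(material)
}
```

Keeping the static path as the fallback means existing profiles would behave the same, and tests for new sources could still follow the wiremock::MockServer pattern for the token endpoint.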

Complexity: High overall (spans proto, profiles, refresh worker, configure handler, and potentially sandbox relay). Medium if scoped to Phase 1 (client_assertion only). Phase 3 (spiffe_svid) depends on unmerged infrastructure and needs its own spike.

Confidence: Medium — Phase 1 has a clear path. Phases 2-3 have open architectural questions about where identity material lives and how it reaches the refresh worker.

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request
