Problem Statement
Inside every sandbox the supervisor does a lot at once: it runs the proxy, evaluates policy, resolves identity, audits, manages the agent process, and builds the isolation boundary itself (it creates the network namespace and routing, which is why the agent's environment is granted elevated privileges). The driver provisions the environment the supervisor runs in, but boundary-building is the supervisor's own inline code, not a separate component it talks to.
Those are really two different concerns: the policy authority (decide what the agent may do, mediate it) should be stable across environments, while the boundary machinery (create the namespace and routing, or whatever a given environment requires) is privileged and environment-specific, and should be free to vary. RFC 0001 describes the other major subsystems as driver-backed contracts and even names an Isolation Backend for this one, but the contract was never defined; it's only a box on the diagram. So changing how the boundary is built means changing the supervisor itself.
That coupling costs us three ways:
Extensibility. Safer topologies (a separate pod, VM, or node agent) require changing the supervisor itself, not just adding a backend. This is already concrete: the split-pod proposal (Proposal: Split Supervisor and Agent into Separate Pods with gVisor Isolation #981) has to wire its topology directly into the Kubernetes driver, and the next topology would have to do the same again.
This issue asks whether we should define that interface, drawing the line between policy and mechanism, starting with the network boundary, before anyone turns it into an RFC.
Proposed Design
One possible shape is to draw that line between policy and mechanism: define an interface so the boundary becomes a pluggable Isolation Backend the supervisor drives. The backend could eventually cover more of the containment envelope (filesystem, syscall, process identity); this proposal suggests starting with the network boundary and considering the other dimensions later, but that scoping is one of the questions below.
At a sketch level, the shape looks like this:
provisioning happens at sandbox creation, through the driver (control plane);
runtime mediation stays with the supervisor (data plane), the policy authority for proxy, policy, identity, and audit;
the Isolation Backend interface sits between them; the two hand off through what provisioning leaves behind, rather than calling each other.
The goal would be for the supervisor to drive one common runtime interface while the backend's provisioning varies by environment, so the same supervisor-facing interface works whether the boundary is an in-pod network namespace or a separate pod, VM, or node agent. Today's in-container setup would become the in-pod backend; a delegated backend could build the boundary outside the agent's container, so the untrusted container would no longer need the network-boundary privileges that build its own containment boundary.
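A minimal sketch of that shape, written in Python only for brevity. Every name here (`IsolationBackend`, `BoundaryHandle`, `attach`, `is_ready`, the proxy addresses) is hypothetical and not taken from any RFC or from the codebase; the point is only that the supervisor-side code path stays the same while the backend behind it varies.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class BoundaryHandle:
    """What provisioning leaves behind for the supervisor to attach to (hypothetical shape)."""
    sandbox_id: str
    kind: str        # e.g. "in-pod-netns" or "delegated"
    proxy_addr: str  # the listener the agent workload is routed to


class IsolationBackend(Protocol):
    """Hypothetical supervisor-facing runtime interface."""

    def attach(self, sandbox_id: str) -> BoundaryHandle: ...
    def is_ready(self, handle: BoundaryHandle) -> bool: ...


class InPodBackend:
    """Stands in for today's in-container setup (netns and routing built inline)."""

    def attach(self, sandbox_id: str) -> BoundaryHandle:
        return BoundaryHandle(sandbox_id, "in-pod-netns", "10.200.0.1:3128")

    def is_ready(self, handle: BoundaryHandle) -> bool:
        return handle.kind == "in-pod-netns"


class DelegatedBackend:
    """Stands in for a boundary built outside the agent's container."""

    def __init__(self, provisioned: Dict[str, BoundaryHandle]) -> None:
        # Assumed to be filled in at sandbox creation by the driver (control plane).
        self._provisioned = provisioned

    def attach(self, sandbox_id: str) -> BoundaryHandle:
        return self._provisioned[sandbox_id]

    def is_ready(self, handle: BoundaryHandle) -> bool:
        return self._provisioned.get(handle.sandbox_id) == handle


def supervisor_start(backend: IsolationBackend, sandbox_id: str) -> str:
    """The supervisor code path is identical for every backend."""
    handle = backend.attach(sandbox_id)
    if not backend.is_ready(handle):
        raise RuntimeError("boundary not ready; refusing to start the agent")
    return f"agent for {handle.sandbox_id} routed via {handle.proxy_addr}"
```

In this sketch the two backends never call the driver; they only read what provisioning left behind, which mirrors the handoff described in the list above.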
Whatever the backend, two candidate invariants would keep the interface grounded in concrete safety properties: no unguarded agent-workload egress before the boundary is verified and ready, and no agent-workload execution before it's ready.
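One way to picture the two candidate invariants is as a small startup state machine. This is a sketch under assumptions: the state names, `SandboxStartup`, and `BoundaryNotReady` are made up for illustration, and a real interface might expose readiness quite differently.

```python
from enum import Enum, auto


class BoundaryState(Enum):
    PROVISIONED = auto()
    VERIFIED = auto()
    READY = auto()


class BoundaryNotReady(Exception):
    """Raised when something is attempted before the boundary is ready."""


class SandboxStartup:
    def __init__(self) -> None:
        self.state = BoundaryState.PROVISIONED
        self.agent_running = False

    def mark_verified(self) -> None:
        self.state = BoundaryState.VERIFIED

    def mark_ready(self) -> None:
        # Readiness is only reachable through verification.
        if self.state is not BoundaryState.VERIFIED:
            raise BoundaryNotReady("boundary must be verified before it is ready")
        self.state = BoundaryState.READY

    def allow_egress(self) -> bool:
        # Invariant 1: no agent-workload egress until verified and ready.
        return self.state is BoundaryState.READY

    def spawn_agent(self) -> None:
        # Invariant 2: no agent-workload execution until ready.
        if self.state is not BoundaryState.READY:
            raise BoundaryNotReady("refusing to spawn agent before boundary is ready")
        self.agent_running = True
```

Because `READY` can only follow `VERIFIED`, both checks reduce to a single state comparison, which keeps the invariants easy to test for any backend.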
Moving the setup out of the agent's container reduces the privilege in that container, but it doesn't make the boundary trustworthy on its own: once the backend builds the boundary somewhere the supervisor can't see, the supervisor still has to verify it was realized as admitted and fail closed if not, rather than trust an unauthenticated report. The interface has to make that verification possible, which is why this is a contract, not just a relocation.
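The fail-closed behavior described above could look roughly like this. It is only an illustration: `BoundarySpec` and its fields are hypothetical, and how the supervisor would actually observe the realized boundary is exactly what the contract would have to define.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundarySpec:
    """Hypothetical description of a boundary, as admitted or as observed."""
    sandbox_id: str
    egress_via_proxy_only: bool
    proxy_addr: str


def verify_realized(admitted: BoundarySpec, observed: Optional[BoundarySpec]) -> bool:
    """Return True only when an observation exists and matches what was admitted."""
    if observed is None:
        # No evidence is treated the same as a mismatch.
        return False
    return observed == admitted


def start_or_fail_closed(admitted: BoundarySpec, observed: Optional[BoundarySpec]) -> str:
    if not verify_realized(admitted, observed):
        return "refused"  # fail closed: the agent is never started
    return "started"
```

The design choice worth noting is that a missing observation and a mismatching one take the same path, so there is no state in which the agent starts on an unverified boundary.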
The working assumption is that the backend would be chosen by deployment and admission configuration, set by the operator, not by the workload, so an untrusted workload can't select weaker isolation for itself (worth confirming this is the right place for that choice).
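As a sketch of that working assumption, selection could read only operator-owned configuration and ignore anything the workload supplies. The key name `isolation_backend` and the backend names are assumptions made for this example, not existing configuration.

```python
ALLOWED_BACKENDS = ("in-pod", "delegated")  # hypothetical backend names


def select_backend(operator_config: dict, workload_request: dict) -> str:
    """Pick the backend from operator configuration only.

    workload_request is accepted to make the point explicit: nothing in it
    is consulted, so a workload cannot ask for weaker isolation.
    """
    backend = operator_config.get("isolation_backend")
    if backend not in ALLOWED_BACKENDS:
        raise ValueError(f"operator config names no valid backend: {backend!r}")
    return backend
```

For example, `select_backend({"isolation_backend": "delegated"}, {"isolation_backend": "in-pod"})` returns `"delegated"`, because the workload's preference is never read.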
A full design would still need to work out the trust model a delegated backend depends on: an authenticated, verifiable handoff from the control plane to the supervisor (so the supervisor knows the runtime it talks to is the one provisioned for this sandbox), and the control-plane authorization underneath it. This issue just names those as things the RFC would have to cover, not solve here.
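To make "authenticated, verifiable handoff" concrete, here is one possible stand-in using a shared-key HMAC from the Python standard library. The choice of HMAC, the payload fields, and the function names are all assumptions for illustration; an RFC might settle on mTLS, signed tokens, or something else entirely.

```python
import hashlib
import hmac
import json


def sign_handoff(key: bytes, sandbox_id: str, boundary_id: str) -> dict:
    """Control-plane side: bind the boundary to one sandbox and sign the binding."""
    payload = json.dumps(
        {"sandbox_id": sandbox_id, "boundary_id": boundary_id}, sort_keys=True
    ).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode(), "tag": tag}


def accept_handoff(key: bytes, handoff: dict, expected_sandbox_id: str) -> bool:
    """Supervisor side: accept only an authentic handoff issued for this sandbox."""
    payload = handoff["payload"].encode()
    expected_tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison; any mismatch is rejected.
    if not hmac.compare_digest(expected_tag, handoff["tag"]):
        return False
    return json.loads(payload)["sandbox_id"] == expected_sandbox_id
```

The second check matters as much as the first: an authentic handoff issued for a different sandbox is still rejected, which is the "provisioned for this sandbox" property named above.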
I've sketched a more detailed runtime contract and poked through the codebase to sanity-check feasibility, and I'm happy to share that if it's useful. I've kept this issue light on purpose: it seems worth agreeing this is the right interface to define before anyone invests in a full RFC.
What this could enable
A later delegated backend could build the network boundary outside the agent's container, so that container no longer needs the network-boundary privileges (one prerequisite for restricted Pod Security; other privileges may need follow-on work).
Today, the boundary's privilege sits with the untrusted workload. With the interface, the network-boundary privilege could move out of the agent container, out of a compromise's reach, assuming the delegated handoff is verified.
Extensibility
Today, each new topology means rewriting the supervisor. With the interface, VM, separate-pod, and node-agent backends could land behind one interface, without changing the supervisor.
A natural place to start would be the in-pod backend: wrap today's network setup behind the interface with the same privileges, startup order, behavior, and tests, then add delegated backends later. But the question for this issue is whether that's the right direction to head, not how to sequence it.
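The "wrap today's setup" step can be pictured as a plain adapter. In this sketch `legacy_network_setup` is a hypothetical stand-in for the existing inline code, and the log only records the order of steps so that the unchanged startup order is visible.

```python
from typing import Callable, List


def legacy_network_setup(log: List[str]) -> str:
    """Hypothetical stand-in for today's inline setup: create netns, then install rules."""
    log.append("create netns")
    log.append("install rules")
    return "netns-handle"


class WrappedInPodBackend:
    """Adapter: exposes the existing setup through a backend-shaped method."""

    def __init__(self, setup: Callable[[List[str]], str]) -> None:
        self._setup = setup

    def build_boundary(self, log: List[str]) -> str:
        # Delegates unchanged, so ordering and behavior stay what they were.
        return self._setup(log)


def start_sandbox(backend: WrappedInPodBackend, log: List[str]) -> str:
    handle = backend.build_boundary(log)
    log.append("spawn agent")  # the agent is spawned only after the boundary exists
    return handle
```

Running `start_sandbox(WrappedInPodBackend(legacy_network_setup), log)` leaves the log in the same order as before the wrap, which is the property existing tests would be expected to keep asserting.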
How this fits with work in flight
This isn't a new track; it names a boundary several efforts already press against from different sides.
docs(rfc): propose sandbox proxy egress adapter model #1511 scopes the proxy pipeline above the boundary and notes the nftables rules "belong to the sandbox network boundary." This names and scopes that boundary beneath the proxy, the thing that forces traffic into it. The two are closely related (and how they relate is question 2 below).
RFC 0005: Platform-managed Kubernetes sandboxes #1680 (platform-managed Kubernetes via Agent Sandbox) is one way a backend's provisioning could be arranged; the interface defines what the supervisor then attaches to.
Proposal: Split Supervisor and Agent into Separate Pods with gVisor Isolation #981 (the split-pod / gVisor proposal) is the closest neighbor, and complementary rather than competing: it designs one concrete delegated topology, and this interface could be what it fits behind. The cross-pod problems it works through (identity across pods, CA/trust handoff, NetworkPolicy/CNI enforcement) are exactly the kind of thing the interface's contract would need to name.
#1650 (sandbox into process and network subcrates) splits the supervisor's process and network responsibilities; this interface sits under the network side of that split.
This proposal also preserves the merged foundations: RFC 0001 (which draws the box and makes the supervisor the policy authority), RFC 0002 (agent-proposed policy stays the runtime authority; the backend is read-only at runtime), and RFC 0004 (typed resources; the backend references them, it doesn't redefine them).
Each of these efforts runs into the same missing interface from a different side. Defining it once could give them a shared contract to build on, instead of each one working around its absence.
This also relates to roadmap issue #1720.
Feedback requested
A few things I'd love a read on:
Does "the supervisor drives a pluggable backend, in-pod first, delegated later" sound like the right direction?
Is network-first the right scope to start with, or should the first RFC also define the filesystem, syscall, and process-identity responsibilities of the Isolation Backend?
I'd especially welcome corrections from folks closer to this code path.
Alternatives Considered
Agent Investigation
A short summary of what I found in the code; happy to share the longer write-up.
RFC 0001 specifies four driver-backed subsystems as contracts (only ComputeDriver is realized as a gRPC service today), but leaves the Isolation Backend as a box with no interface.
The supervisor builds the boundary at startup (openshell-sandbox: create netns, install rules, then spawn the agent into it). The agent's container is granted NET_ADMIN, SYS_ADMIN, SYS_PTRACE, SYSLOG, plus SETUID/SETGID/DAC_READ_SEARCH under user namespaces, for the boundary setup plus the supervisor's process-management and identity-resolution duties there.
The egress boundary is the netns + routing (the agent's traffic routes to the proxy, the only listener it can reach, given the host does not forward the sandbox subnet); nftables adds fast-fail and bypass logging on top.
The supervisor already separates a process spec from the boundary handle when it spawns the agent, so wrapping today's path as the in-pod backend looks like a real but bounded refactor, not a rewrite.
Verified against main at b7ce0be4.
Checklist
I've reviewed existing issues and the architecture docs
This is a design proposal, not a "please build this" request