feat: route workloads to city locations via distributed scheduling (foundation)#107

Merged
scotwells merged 14 commits into main from feat/federated-deployment-scheduling on Jun 11, 2026
Conversation

@scotwells scotwells commented May 18, 2026

Contributor

Summary

Workloads targeting a city location are automatically routed to the correct physical site, with instance health and readiness surfaced back to the platform in real time. This replaces the single central scheduler with per-site distributed scheduling, so each site operates independently. User-facing behavior is unchanged — city-code targeting, instance visibility, and the existing API all work as before.

This is the complete federation foundation. Decomposed from one large PR; the genuinely-independent pieces landed first and are merged:

What remains here is the full controller layer and the operational-completeness fixes that make it correct on its own:

  • Quota self-heals when a grant arrives late: a backing-off safety-net requeue, with the quota condition persisted before transient errors return so granted state can't be lost
  • Instance restart actually rolls instances (recreate instead of in-place update)
  • A downstream-WorkloadDeployment status watch, so aggregated status mirrors back immediately instead of on resync
  • Rollout progress via UpdatedReplicas/ObservedGeneration
  • Instance blocking reasons
  • instanceType vCPU/memory quota sizing

These were briefly split into a separate PR and folded back in, since the review showed the foundation is incomplete without them.

Design & docs

What's inside

14 thematic commits, intended to be reviewed commit-by-commit:

  • Remove the central WorkloadDeployment scheduler
  • WorkloadDeployment federator (project control plane → federation hub)
  • InstanceProjector (federation hub → project namespaces)
  • Distributed WorkloadDeployment and Workload reconciliation
  • Instance controller for federated scheduling
  • Webhook validation updates for federation
  • Cell and management-plane wiring with feature gates
  • Roll instances by recreate so restart actually rolls them
  • Rollout progress via UpdatedReplicas + ObservedGeneration
  • Instance blocking reasons and instanceType vCPU/memory quota claims
  • CRDs, RBAC, and kustomize overlays for federation
  • IAM: allow users to patch workloads
  • Regression tests for replica counting and scheduling-gate clearing
  • Toolchain: Go 1.25 + golangci-lint v2.12.2

Testing

Covered by unit tests here, including regression coverage for the replica-counting/gate-clearing path and for quota-condition persistence across transient reconcile errors. End-to-end coverage is deferred to #149 — the original harness ran the operators locally (go run) rather than deploying them to the cells, so it didn't exercise RBAC/manifests/image. It'll be rebuilt as a proper in-cluster harness; the deferred suites are preserved on archive/e2e-local-deferred.

Known follow-ups (from review)

Not blockers for review, tracked separately: single-cluster overlay bootability, the status interpreter not being wired into any overlay, management-plane leader-election scoping (the federation manager runs outside leader election), and observability (metrics/Events) on the federation paths.

Closes #85

@scotwells scotwells force-pushed the feat/federated-deployment-scheduling branch from 0c0d8df to 134086f Compare May 19, 2026 21:10
@scotwells scotwells changed the title feat: federated deployment scheduling across POP cells feat: Route workloads to city locations via distributed scheduling May 20, 2026
@scotwells scotwells force-pushed the feat/federated-deployment-scheduling branch 3 times, most recently from 6e9a268 to 492eb6c Compare May 20, 2026 22:19
@scotwells scotwells requested a review from mattdjenkinson May 27, 2026 00:15
mattdjenkinson
mattdjenkinson previously approved these changes May 27, 2026
privateip
privateip previously approved these changes May 28, 2026
@scotwells scotwells closed this May 28, 2026
@scotwells scotwells reopened this May 28, 2026
@scotwells scotwells marked this pull request as draft May 28, 2026 20:53
@scotwells

Contributor Author

Setting to draft while I continue to iterate on getting this working in staging.

@scotwells

Contributor Author

📦 The federation e2e chainsaw suites (~900 lines of test YAML) have been split out into a dedicated PR so this foundation reviews without them inline. The shared test/e2e/env harness stays here. See the federation-e2e PR (stacked on this branch).

Base automatically changed from split/api-rename to main June 5, 2026 19:56
@scotwells scotwells force-pushed the feat/federated-deployment-scheduling branch 3 times, most recently from 71e388c to 5718fbb Compare June 10, 2026 19:26
scotwells and others added 2 commits June 10, 2026 14:52
Bump the toolchain to Go 1.25 and golangci-lint v2.12.2 and align the
CI workflows and Makefile with the new versions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Delete the central scheduler that placed WorkloadDeployments from a
single control plane, and drop its registration from main. Placement
now happens through the distributed federator and per-cell controllers
introduced in the following commits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@scotwells scotwells force-pushed the feat/federated-deployment-scheduling branch 4 times, most recently from 9542455 to 615ef54 Compare June 10, 2026 23:33
scotwells and others added 2 commits June 10, 2026 18:47
Introduce the federator that fans a WorkloadDeployment out to the cells
selected for its placement, replacing the central scheduler. Add the
city-code field indexer it uses to map subnet/location events back to
the deployments that depend on them.

Beyond fanning the spec out, the federator watches the downstream
Karmada WorkloadDeployment (milosource cluster source with
cluster-name-preserving enqueue) so aggregated status mirrors back to
the project WorkloadDeployment immediately instead of waiting on an
informer resync. Downstream events map back to the bare project cluster
name the multicluster provider keys on, dropping events for clusters
that are not engaged yet.

The "cluster-<name>" label encoding (project path with "/" -> "_") is
centralized in EncodeClusterName/DecodeClusterName so the wire format
lives in one place; the federator wraps the shared decoder and trims to
the last path segment to recover the provider cluster key.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
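
For reviewers, a minimal sketch of the cluster-name encoding the federator commit describes: "cluster-" prefix, "/" replaced by "_", and a trim to the last path segment to recover the provider cluster key. The function names, signatures, and the sample project path are assumptions for illustration; the real EncodeClusterName/DecodeClusterName may differ. The decode step also assumes path segments never contain "_" themselves.

```go
package main

import (
	"fmt"
	"strings"
)

// clusterPrefix is the prefix named in the commit message.
const clusterPrefix = "cluster-"

// encodeClusterName (hypothetical signature) replaces "/" with "_" and adds
// the "cluster-" prefix.
func encodeClusterName(projectPath string) string {
	return clusterPrefix + strings.ReplaceAll(projectPath, "/", "_")
}

// decodeClusterName reverses encodeClusterName. It reports false when the
// value does not carry the expected prefix. Assumes segments contain no "_".
func decodeClusterName(encoded string) (string, bool) {
	if !strings.HasPrefix(encoded, clusterPrefix) {
		return "", false
	}
	rest := strings.TrimPrefix(encoded, clusterPrefix)
	return strings.ReplaceAll(rest, "_", "/"), true
}

// providerClusterKey decodes the value and keeps only the last path segment,
// which is the bare cluster name in this sketch.
func providerClusterKey(encoded string) (string, bool) {
	path, ok := decodeClusterName(encoded)
	if !ok {
		return "", false
	}
	if i := strings.LastIndex(path, "/"); i >= 0 {
		path = path[i+1:]
	}
	return path, true
}

func main() {
	enc := encodeClusterName("projects/my-project") // sample path, not from the PR
	key, ok := providerClusterKey(enc)
	fmt.Println(enc, key, ok)
}
```

Keeping encode and decode next to each other is what lets the wire format live in one place; the federator-side helper only adds the last-segment trim.
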
Add the projector that mirrors cell-side Instances back to the
management plane, writing their status (readiness, placement, blocking
reasons) onto the project-scoped Instance so callers see a single view
across cells. Include the shared controller test helpers that build the
project/Karmada fake clients and multi-cluster manager used by the
federation tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@scotwells scotwells force-pushed the feat/federated-deployment-scheduling branch from 615ef54 to 5b638bb Compare June 10, 2026 23:51
scotwells and others added 10 commits June 10, 2026 19:46
…liation

Rework the WorkloadDeployment and Workload controllers to run per cell,
resolving networks and Locations locally and driving Instance lifecycle
through the stateful instance-control logic rather than a central
scheduler. Update the instance-control packages to manage Instances
within a cell's control plane.

The reconciler requeues explicitly after adding its finalizer (the
metadata-only Update can be dropped by watch-side event filtering, which
would otherwise leave a new cell WorkloadDeployment unreconciled), and
the scheduling-gate clearing path guards the nilable Spec.Controller
that the infra provider populates independently of networking readiness.

A deployment whose city has no Location yet has no other wake-up event
(SubnetClaims/Subnets only exist after a Location resolved), so the
controller watches Locations to re-reconcile waiting deployments, and
surfaces the wait on the Available condition (NoMatchingLocation, naming
the city code) instead of only logging it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
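
The Location wake-up in the commit above can be sketched with plain structs: a waiting deployment gets an Available condition with reason NoMatchingLocation, and a Location event for a city maps back to the deployments waiting on that city. The types, the condition status and message text, and the linear scan (standing in for a field indexer) are all assumptions for illustration, not the controller's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// deployment is a simplified stand-in for a cell WorkloadDeployment.
type deployment struct {
	Name     string
	CityCode string
	Waiting  bool // true while no Location has resolved for CityCode
}

// condition mirrors the usual Kubernetes condition fields as plain strings.
type condition struct {
	Type, Status, Reason, Message string
}

// noMatchingLocation builds the condition surfaced while a deployment waits.
// Only the reason comes from the commit message; the rest is assumed.
func noMatchingLocation(cityCode string) condition {
	return condition{
		Type:    "Available",
		Status:  "False",
		Reason:  "NoMatchingLocation",
		Message: fmt.Sprintf("no Location found for city code %q", cityCode),
	}
}

// deploymentsForLocation returns the names of waiting deployments to
// re-enqueue when a Location for cityCode appears.
func deploymentsForLocation(deps []deployment, cityCode string) []string {
	var names []string
	for _, d := range deps {
		if d.Waiting && d.CityCode == cityCode {
			names = append(names, d.Name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	deps := []deployment{
		{Name: "web", CityCode: "DFW", Waiting: true},
		{Name: "api", CityCode: "LHR", Waiting: true},
		{Name: "db", CityCode: "DFW", Waiting: false},
	}
	fmt.Println(noMatchingLocation("DFW").Message)
	fmt.Println(deploymentsForLocation(deps, "DFW"))
}
```
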
Update the Instance controller to compute the Ready/Available conditions
and apply the per-project quota gate within a single reconcile pass, so
federated placement reflects real allocatable capacity.

Quota flow: the ResourceClaim is named after the Instance (unique within
the project control plane, "instance-" prefixed so it cannot collide
with other kinds' claims) and carries an instance-namespace label so a
grant event maps back to the owning Instance for immediate re-enqueue.
Because the grant lives on the project control plane and the watch event
can be missed (informer engagement races, relist gaps), a backing-off
safety-net requeue runs while QuotaGranted != True — anchored on the
Instance creation time, computed up front so every return path honors
it, logged for observability, and falling back to the bounded quota
interval on write conflicts instead of controller-runtime's error
backoff.

The controller also emits Warning events explaining why an Instance is
blocked (QuotaNoBudget, NetworkFailedToCreate, ...) so the signal
reaches kubectl describe and the activity timeline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update Workload webhook and Instance validation so the API accepts the
fields federated scheduling adds and continues to reject invalid
placement and runtime specs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wire the manager to run in either cell or management-plane mode, gating
the federator, projector, and per-cell controllers behind feature flags.
Add the feature-gate registry and extend configuration to carry the
downstream kubeconfig and discovery settings each mode needs.

Single-mode project resolution (decoding edge namespace labels into
project identity) lives in the controller package as
NewSingleModeProjectID/NewSingleModeProjectNamespace constructors;
main.go keeps only the wiring.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… them

A template-hash change (an image update, or a restartedAt annotation from
`datumctl compute restart`) previously resolved to an in-place Update of the
Instance. The unikraft provider bakes the pod at creation time and never
recomputes an existing pod's spec, so the in-place update silently failed to
roll the running workload — instances kept their old pod.

Emit a delete (recreate) for drifted Ready instances instead. The next
reconcile refills the slot via the create path with the new template, and the
provider's finalizer-gated teardown plus create-on-new-Instance roll the pod
with no provider changes. Ordered one-at-a-time pacing is preserved by the
existing descending-ordinal sort, skip-all-but-first, and the
DeletionTimestamp WaitAction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
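
The pacing rules above (wait while a delete is in flight, otherwise delete only the highest-ordinal drifted Ready instance) can be sketched as a pure function. The types and names are hypothetical and simplified; the real instance-control logic has more action kinds and inputs.

```go
package main

import (
	"fmt"
	"sort"
)

// instance is a simplified stand-in for an Instance in one deployment.
type instance struct {
	Ordinal      int
	TemplateHash string
	Ready        bool
	Deleting     bool // DeletionTimestamp is set
}

type actionKind string

const (
	actionNone   actionKind = "None"
	actionWait   actionKind = "Wait"
	actionDelete actionKind = "Delete"
)

type action struct {
	Kind    actionKind
	Ordinal int
}

// nextRollAction picks at most one step per reconcile so the roll proceeds
// one instance at a time.
func nextRollAction(instances []instance, desiredHash string) action {
	// A delete still in flight blocks further steps.
	for _, in := range instances {
		if in.Deleting {
			return action{Kind: actionWait, Ordinal: in.Ordinal}
		}
	}
	// Collect Ready instances whose template hash has drifted.
	var drifted []instance
	for _, in := range instances {
		if in.Ready && in.TemplateHash != desiredHash {
			drifted = append(drifted, in)
		}
	}
	if len(drifted) == 0 {
		return action{Kind: actionNone, Ordinal: -1}
	}
	// Descending ordinal, then act on the first only.
	sort.Slice(drifted, func(i, j int) bool {
		return drifted[i].Ordinal > drifted[j].Ordinal
	})
	return action{Kind: actionDelete, Ordinal: drifted[0].Ordinal}
}

func main() {
	instances := []instance{
		{Ordinal: 0, TemplateHash: "old", Ready: true},
		{Ordinal: 1, TemplateHash: "old", Ready: true},
		{Ordinal: 2, TemplateHash: "new", Ready: true},
	}
	fmt.Println(nextRollAction(instances, "new"))
}
```

In this sketch the deleted slot is refilled by the ordinary create path on a later reconcile, which is why no separate "update" action appears.
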
…rvedGeneration

A restart/rolling update was invisible from the project plane: there was no
status field representing how many instances are on the new template revision.
Add UpdatedReplicas (instances whose observed template hash matches the desired
template, regardless of readiness) and ObservedGeneration to both
WorkloadDeployment and Workload (plus placement) status.

UpdatedReplicas is computed on the cell WD reconcile alongside CurrentReplicas
(which is now its Programmed subset), aggregated up into the Workload, and rides
the existing status sync to the project plane. Repoint the "Up-to-date"
printcolumn to .status.updatedReplicas to match `kubectl get deployment`
semantics, so a roll is visible as the count dips below Replicas and recovers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
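
A small sketch of the counting rule stated above: UpdatedReplicas counts instances on the desired template hash regardless of readiness, and CurrentReplicas is the Programmed subset of those. The struct and function names are hypothetical; only the two status field names come from the commit message.

```go
package main

import "fmt"

// instanceState is a simplified view of one Instance in a cell.
type instanceState struct {
	ObservedTemplateHash string
	Programmed           bool
}

// replicaStatus holds the counters discussed in the commit message.
type replicaStatus struct {
	Replicas        int32
	UpdatedReplicas int32
	CurrentReplicas int32
}

// countReplicas buckets instances against the desired template hash.
func countReplicas(instances []instanceState, desiredHash string) replicaStatus {
	var s replicaStatus
	for _, in := range instances {
		s.Replicas++
		if in.ObservedTemplateHash != desiredHash {
			continue // still on an older template revision
		}
		s.UpdatedReplicas++
		if in.Programmed {
			s.CurrentReplicas++ // updated and programmed
		}
	}
	return s
}

func main() {
	instances := []instanceState{
		{ObservedTemplateHash: "new", Programmed: true},
		{ObservedTemplateHash: "new", Programmed: false},
		{ObservedTemplateHash: "old", Programmed: true},
	}
	fmt.Printf("%+v\n", countReplicas(instances, "new"))
}
```

During a roll, UpdatedReplicas in this sketch stays below Replicas until every instance has been recreated on the new template, which is the dip-and-recover behavior the "Up-to-date" column is meant to show.
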
…emory

Two Instance-controller correctness changes:

- Blocking-reason rollup: surface the most specific provider sub-condition
  (ImageUnavailable, InstanceCrashing, ConfigurationError, Provisioning) and its
  message onto the Instance Ready condition instead of a generic "Instance has
  not been programmed", so e.g. an image-pull failure reads as ImageUnavailable
  with the real message. Ranks the API reason constants in the blocking-reason
  priority.

- Quota sizing: resolve vCPU/memory for instanceType-sized instances from a new
  instanceTypeCatalog (datumcloud/d1-standard-2 = 1 vCPU / 2 GiB) so the quota
  ResourceClaim requests vcpus + memory, not just instance count. Explicit
  container limits / instance requests still take precedence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
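
Both changes in the commit above can be sketched in a few lines. The priority order of the blocking reasons is an assumption (the commit lists the reasons but does not state their ranks), and the types, function names, and catalog shape are hypothetical; only the reason names and the datumcloud/d1-standard-2 sizing (1 vCPU / 2 GiB) come from the commit message.

```go
package main

import "fmt"

// blockingPriority ranks provider sub-condition reasons, most specific
// first. The order here is assumed, not taken from the PR.
var blockingPriority = []string{
	"ImageUnavailable",
	"InstanceCrashing",
	"ConfigurationError",
	"Provisioning",
}

// rollupBlockingReason returns the highest-ranked active reason and its
// message. active maps reason to message for sub-conditions that block.
func rollupBlockingReason(active map[string]string) (reason, message string, ok bool) {
	for _, r := range blockingPriority {
		if msg, found := active[r]; found {
			return r, msg, true
		}
	}
	return "", "", false
}

// resources is the amount a quota claim would request.
type resources struct {
	VCPUs       int64
	MemoryBytes int64
}

// instanceTypeCatalog holds the one entry named in the commit message.
var instanceTypeCatalog = map[string]resources{
	"datumcloud/d1-standard-2": {VCPUs: 1, MemoryBytes: 2 << 30}, // 2 GiB
}

// quotaRequest prefers explicit sizing and falls back to the catalog.
func quotaRequest(instanceType string, explicit *resources) (resources, bool) {
	if explicit != nil {
		return *explicit, true
	}
	r, ok := instanceTypeCatalog[instanceType]
	return r, ok
}

func main() {
	reason, msg, _ := rollupBlockingReason(map[string]string{
		"Provisioning":     "waiting for provider",
		"ImageUnavailable": "image pull failed",
	})
	fmt.Println(reason, msg)

	req, ok := quotaRequest("datumcloud/d1-standard-2", nil)
	fmt.Println(req.VCPUs, req.MemoryBytes, ok)
}
```

When no ranked reason is active, the caller in this sketch would fall back to the generic "Instance has not been programmed" message.
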
Regenerate the Instance, Workload, and WorkloadDeployment CRDs for the
new API fields and add the kustomize structure that deploys the manager
in cell or management-plane mode: federation and downstream RBAC bases,
cell/management/quota-credentials components, the WorkloadDeployment
status interpreter, and the matching overlays.

The regenerated controller role also grants the event writes the
instance controller performs when surfacing blocking reasons
(QuotaNoBudget, ImageUnavailable, NetworkFailedToCreate, ...) so those
signals reach kubectl describe and the activity timeline instead of
being rejected by RBAC.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ate clearing

Adds unit coverage for the WorkloadDeployment controller's replica
bucketing (updated/current/ready/quota-blocked), the network
scheduling-gate clearing path, the nil Spec.Controller and nil
Status.Controller regressions, and the finalizer-add requeue with
status publication (ObservedGeneration, DesiredReplicas, ReplicasReady).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@scotwells scotwells force-pushed the feat/federated-deployment-scheduling branch from 5b638bb to 7dc94a0 Compare June 11, 2026 00:56
@scotwells scotwells merged commit 7db1bcb into main Jun 11, 2026
9 checks passed
@scotwells scotwells deleted the feat/federated-deployment-scheduling branch June 11, 2026 01:46
Development

Successfully merging this pull request may close these issues.

Define integration strategy with federated control plane for workload deployment scheduling

3 participants