feat: route workloads to city locations via distributed scheduling (foundation) #107
Merged
mattdjenkinson
approved these changes
May 22, 2026
mattdjenkinson
previously approved these changes
May 27, 2026
privateip
previously approved these changes
May 28, 2026
Setting to draft while I continue to iterate on getting this working in staging.
The base branch was changed.
📦 The federation e2e chainsaw suites (~900 lines of test YAML) have been split out into a dedicated PR so this foundation reviews without them inline. The shared |
Bump the toolchain to Go 1.25 and golangci-lint v2.12.2 and align the CI workflows and Makefile with the new versions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Delete the central scheduler that placed WorkloadDeployments from a single control plane, and drop its registration from main. Placement now happens through the distributed federator and per-cell controllers introduced in the following commits. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Introduce the federator that fans a WorkloadDeployment out to the cells selected for its placement, replacing the central scheduler. Add the city-code field indexer it uses to map subnet/location events back to the deployments that depend on them. Beyond fanning the spec out, the federator watches the downstream Karmada WorkloadDeployment (milosource cluster source with cluster-name-preserving enqueue) so aggregated status mirrors back to the project WorkloadDeployment immediately instead of waiting on an informer resync. Downstream events map back to the bare project cluster name the multicluster provider keys on, dropping events for clusters that are not engaged yet. The "cluster-<name>" label encoding (project path with "/" -> "_") is centralized in EncodeClusterName/DecodeClusterName so the wire format lives in one place; the federator wraps the shared decoder and trims to the last path segment to recover the provider cluster key. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
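For reviewers, a minimal sketch of the label encoding this commit describes, assuming the encoded form is simply the `cluster-` prefix followed by the project path with `/` replaced by `_`. The bodies of `EncodeClusterName`/`DecodeClusterName` and the `providerClusterKey` wrapper are illustrative guesses rather than the code in this PR, and the round trip is only lossless if path segments contain no `_`.

```go
package main

import (
	"fmt"
	"strings"
)

// clusterPrefix is the "cluster-" prefix named in the commit message.
const clusterPrefix = "cluster-"

// EncodeClusterName maps a project path such as "org/project" to
// "cluster-org_project". Assumed behavior, not the PR's implementation.
func EncodeClusterName(projectPath string) string {
	return clusterPrefix + strings.ReplaceAll(projectPath, "/", "_")
}

// DecodeClusterName reverses EncodeClusterName. ok is false when the
// prefix is missing. Assumes path segments never contain "_".
func DecodeClusterName(encoded string) (path string, ok bool) {
	rest, found := strings.CutPrefix(encoded, clusterPrefix)
	if !found {
		return "", false
	}
	return strings.ReplaceAll(rest, "_", "/"), true
}

// providerClusterKey is a hypothetical federator-side wrapper: it decodes
// the label and keeps only the last path segment, which the commit message
// says is the key the multicluster provider uses.
func providerClusterKey(encoded string) (string, bool) {
	path, ok := DecodeClusterName(encoded)
	if !ok {
		return "", false
	}
	key := path[strings.LastIndex(path, "/")+1:]
	if key == "" {
		return "", false
	}
	return key, true
}

func main() {
	encoded := EncodeClusterName("org/my-project")
	key, ok := providerClusterKey(encoded)
	fmt.Println(encoded, key, ok)
}
```

Keeping both directions next to each other is what lets the wire format live in one place; the federator only adds the trim step on top.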
Add the projector that mirrors cell-side Instances back to the management plane, writing their status (readiness, placement, blocking reasons) onto the project-scoped Instance so callers see a single view across cells. Include the shared controller test helpers that build the project/Karmada fake clients and multi-cluster manager used by the federation tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…liation Rework the WorkloadDeployment and Workload controllers to run per cell, resolving networks and Locations locally and driving Instance lifecycle through the stateful instance-control logic rather than a central scheduler. Update the instance-control packages to manage Instances within a cell's control plane. The reconciler requeues explicitly after adding its finalizer (the metadata-only Update can be dropped by watch-side event filtering, which would otherwise leave a new cell WorkloadDeployment unreconciled), and the scheduling-gate clearing path guards the nilable Spec.Controller that the infra provider populates independently of networking readiness. A deployment whose city has no Location yet has no other wake-up event (SubnetClaims/Subnets only exist after a Location resolved), so the controller watches Locations to re-reconcile waiting deployments, and surfaces the wait on the Available condition (NoMatchingLocation, naming the city code) instead of only logging it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Update the Instance controller to compute the Ready/Available conditions and apply the per-project quota gate within a single reconcile pass, so federated placement reflects real allocatable capacity. Quota flow: the ResourceClaim is named after the Instance (unique within the project control plane, "instance-" prefixed so it cannot collide with other kinds' claims) and carries an instance-namespace label so a grant event maps back to the owning Instance for immediate re-enqueue. Because the grant lives on the project control plane and the watch event can be missed (informer engagement races, relist gaps), a backing-off safety-net requeue runs while QuotaGranted != True — anchored on the Instance creation time, computed up front so every return path honors it, logged for observability, and falling back to the bounded quota interval on write conflicts instead of controller-runtime's error backoff. The controller also emits Warning events explaining why an Instance is blocked (QuotaNoBudget, NetworkFailedToCreate, ...) so the signal reaches kubectl describe and the activity timeline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update Workload webhook and Instance validation so the API accepts the fields federated scheduling adds and continues to reject invalid placement and runtime specs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wire the manager to run in either cell or management-plane mode, gating the federator, projector, and per-cell controllers behind feature flags. Add the feature-gate registry and extend configuration to carry the downstream kubeconfig and discovery settings each mode needs. Single-mode project resolution (decoding edge namespace labels into project identity) lives in the controller package as NewSingleModeProjectID/NewSingleModeProjectNamespace constructors; main.go keeps only the wiring. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… them A template-hash change (an image update, or a restartedAt annotation from `datumctl compute restart`) previously resolved to an in-place Update of the Instance. The unikraft provider bakes the pod at creation time and never recomputes an existing pod's spec, so the in-place update silently failed to roll the running workload — instances kept their old pod. Emit a delete (recreate) for drifted Ready instances instead. The next reconcile refills the slot via the create path with the new template, and the provider's finalizer-gated teardown plus create-on-new-Instance roll the pod with no provider changes. Ordered one-at-a-time pacing is preserved by the existing descending-ordinal sort, skip-all-but-first, and the DeletionTimestamp WaitAction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
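The pacing rules in this commit can be sketched as a small planning function. This is a simplified illustration, not the instance-control code in the PR: the `Instance` and `Action` types and `planRollout` are hypothetical, and drifted instances that are not Ready are ignored here for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// Instance is a hypothetical, trimmed-down view of a cell-side Instance.
type Instance struct {
	Ordinal      int
	TemplateHash string
	Ready        bool
	Deleting     bool // stands in for a non-nil DeletionTimestamp
}

// Action is the single step the planner emits per reconcile.
type Action struct {
	Kind    string // "wait", "delete", or "none"
	Ordinal int    // -1 when Kind is "none"
}

// planRollout picks at most one action so replacement proceeds one instance
// at a time, highest ordinal first.
func planRollout(instances []Instance, desiredHash string) Action {
	sorted := append([]Instance(nil), instances...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Ordinal > sorted[j].Ordinal
	})
	// A teardown in flight blocks further deletes.
	for _, in := range sorted {
		if in.Deleting {
			return Action{Kind: "wait", Ordinal: in.Ordinal}
		}
	}
	// Delete only the first drifted Ready instance; the create path refills
	// the slot with the new template on a later reconcile.
	for _, in := range sorted {
		if in.Ready && in.TemplateHash != desiredHash {
			return Action{Kind: "delete", Ordinal: in.Ordinal}
		}
	}
	return Action{Kind: "none", Ordinal: -1}
}

func main() {
	instances := []Instance{
		{Ordinal: 0, TemplateHash: "old", Ready: true},
		{Ordinal: 1, TemplateHash: "old", Ready: true},
		{Ordinal: 2, TemplateHash: "new", Ready: true},
	}
	fmt.Printf("%+v\n", planRollout(instances, "new"))
}
```

Emitting a delete rather than an update moves the roll onto the create path, which is the only place the provider actually builds a pod spec.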
…rvedGeneration A restart/rolling update was invisible from the project plane: there was no status field representing how many instances are on the new template revision. Add UpdatedReplicas (instances whose observed template hash matches the desired template, regardless of readiness) and ObservedGeneration to both WorkloadDeployment and Workload (plus placement) status. UpdatedReplicas is computed on the cell WD reconcile alongside CurrentReplicas (which is now its Programmed subset), aggregated up into the Workload, and rides the existing status sync to the project plane. Repoint the "Up-to-date" printcolumn to .status.updatedReplicas to match `kubectl get deployment` semantics, so a roll is visible as the count dips below Replicas and recovers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
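To make the counting rule concrete, here is a minimal sketch that assumes an instance is "updated" when its observed template hash equals the desired one, and "current" when it is updated and also Programmed. The `instanceState` and `replicaCounts` types and the `countReplicas` helper are hypothetical names, not the status types in this PR.

```go
package main

import "fmt"

// instanceState is a hypothetical summary of one cell-side Instance.
type instanceState struct {
	TemplateHash string
	Programmed   bool
}

// replicaCounts mirrors the status fields discussed in the commit message.
type replicaCounts struct {
	Replicas        int32
	UpdatedReplicas int32
	CurrentReplicas int32
}

// countReplicas buckets instances against the desired template hash.
func countReplicas(instances []instanceState, desiredHash string) replicaCounts {
	counts := replicaCounts{Replicas: int32(len(instances))}
	for _, in := range instances {
		if in.TemplateHash != desiredHash {
			continue // still on an older template revision
		}
		counts.UpdatedReplicas++ // counted regardless of readiness
		if in.Programmed {
			counts.CurrentReplicas++ // the Programmed subset of updated
		}
	}
	return counts
}

func main() {
	instances := []instanceState{
		{TemplateHash: "new", Programmed: true},
		{TemplateHash: "new", Programmed: false},
		{TemplateHash: "old", Programmed: true},
	}
	fmt.Printf("%+v\n", countReplicas(instances, "new"))
}
```

During a roll, `UpdatedReplicas` drops below `Replicas` as old instances are deleted and climbs back as replacements are created, which is the dip-and-recover pattern the printcolumn is meant to show.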
…emory
Two Instance-controller correctness changes:
- Blocking-reason rollup: surface the most specific provider sub-condition (ImageUnavailable, InstanceCrashing, ConfigurationError, Provisioning) and its message onto the Instance Ready condition instead of a generic "Instance has not been programmed", so e.g. an image-pull failure reads as ImageUnavailable with the real message. Ranks the API reason constants in the blocking-reason priority.
- Quota sizing: resolve vCPU/memory for instanceType-sized instances from a new instanceTypeCatalog (datumcloud/d1-standard-2 = 1 vCPU / 2 GiB) so the quota ResourceClaim requests vcpus + memory, not just instance count. Explicit container limits / instance requests still take precedence.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
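Both changes can be sketched in a few lines. The following is illustrative only: it assumes the priority order matches the order the reasons are listed in the commit message, and the `size` and `condition` types, `resolveSize`, `mostSpecific`, and the `NotProgrammed` fallback reason are hypothetical. The single catalog entry is the one quoted above.

```go
package main

import "fmt"

const gib = int64(1) << 30

// size is a hypothetical holder for the resources a quota claim requests.
type size struct {
	VCPUs       int64
	MemoryBytes int64
}

// instanceTypeCatalog carries the one entry quoted in the commit message.
var instanceTypeCatalog = map[string]size{
	"datumcloud/d1-standard-2": {VCPUs: 1, MemoryBytes: 2 * gib},
}

// resolveSize prefers explicit limits/requests and falls back to the catalog.
func resolveSize(instanceType string, explicit *size) (size, bool) {
	if explicit != nil {
		return *explicit, true
	}
	s, ok := instanceTypeCatalog[instanceType]
	return s, ok
}

// blockingPriority is an assumed ranking, most specific first.
var blockingPriority = []string{
	"ImageUnavailable", "InstanceCrashing", "ConfigurationError", "Provisioning",
}

// condition is a trimmed-down provider sub-condition.
type condition struct {
	Reason  string
	Message string
}

// mostSpecific returns the highest-ranked failing sub-condition, or the
// generic fallback when none of the known reasons is present.
func mostSpecific(failing []condition) condition {
	for _, reason := range blockingPriority {
		for _, c := range failing {
			if c.Reason == reason {
				return c
			}
		}
	}
	return condition{Reason: "NotProgrammed", Message: "Instance has not been programmed"}
}

func main() {
	s, ok := resolveSize("datumcloud/d1-standard-2", nil)
	fmt.Println(s.VCPUs, s.MemoryBytes/gib, ok)

	c := mostSpecific([]condition{
		{Reason: "Provisioning", Message: "waiting for pod"},
		{Reason: "ImageUnavailable", Message: "image pull failed"},
	})
	fmt.Println(c.Reason, c.Message)
}
```

A fixed priority list keeps the rollup deterministic when several sub-conditions fail at once, so the Ready condition does not flap between reasons across reconciles.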
Regenerate the Instance, Workload, and WorkloadDeployment CRDs for the new API fields and add the kustomize structure that deploys the manager in cell or management-plane mode: federation and downstream RBAC bases, cell/management/quota-credentials components, the WorkloadDeployment status interpreter, and the matching overlays. The regenerated controller role also grants the event writes the instance controller performs when surfacing blocking reasons (QuotaNoBudget, ImageUnavailable, NetworkFailedToCreate, ...) so those signals reach kubectl describe and the activity timeline instead of being rejected by RBAC. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ate clearing Adds unit coverage for the WorkloadDeployment controller's replica bucketing (updated/current/ready/quota-blocked), the network scheduling-gate clearing path, the nil Spec.Controller and nil Status.Controller regressions, and the finalizer-add requeue with status publication (ObservedGeneration, DesiredReplicas, ReplicasReady). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
privateip
approved these changes
Jun 11, 2026
Summary
Workloads targeting a city location are automatically routed to the correct physical site, with instance health and readiness surfaced back to the platform in real time. This replaces the single central scheduler with per-site distributed scheduling, so each site operates independently. User-facing behavior is unchanged — city-code targeting, instance visibility, and the existing API all work as before.
This is the complete federation foundation. Decomposed from one large PR; the genuinely-independent pieces landed first and are merged:
Running → Available condition rename + federation status surface → An Instance is "Available" when it's ready to serve, even when scaled to zero #150 ✅ (this branch builds on it)
What remains here is the full controller layer and the operational-completeness fixes that make it correct on its own: quota self-heals when a grant arrives late (backing-off safety-net requeue, and the quota condition is persisted before transient errors return so granted state can't be lost), instance restart actually rolls instances (recreate instead of in-place update), a downstream-WorkloadDeployment status watch so aggregated status mirrors back immediately instead of on resync, rollout progress via `UpdatedReplicas`/`ObservedGeneration`, instance blocking reasons, and `instanceType` vCPU/memory quota sizing. (These were briefly split into a separate PR and folded back in, since the review showed the foundation is incomplete without them.)
Design & docs
What's inside
14 thematic commits, intended to be reviewed commit-by-commit:
`UpdatedReplicas` + `ObservedGeneration`
`instanceType` vCPU/memory quota claims
Testing
Covered by unit tests here, including regression coverage for the replica-counting/gate-clearing path and for quota-condition persistence across transient reconcile errors. End-to-end coverage is deferred to #149: the original harness ran the operators locally (`go run`) rather than deploying them to the cells, so it didn't exercise RBAC/manifests/image. It'll be rebuilt as a proper in-cluster harness; the deferred suites are preserved on `archive/e2e-local-deferred`.
Known follow-ups (from review)
Not blockers for review, tracked separately: single-cluster overlay bootability, the status interpreter not being wired into any overlay, management-plane leader-election scoping (the federation manager runs outside leader election), and observability (metrics/Events) on the federation paths.
Closes #85