feat: support graceful scale-down for AlluxioRuntime using AdvancedStatefulSet#5805
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @jakharmonika364. Thanks for your PR. I'm waiting for a fluid-cloudnative member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Code Review
This pull request introduces graceful worker scale-down for Alluxio by decommissioning workers before they are terminated by the StatefulSet controller, ensuring cached blocks can be migrated. This functionality is gated by a new AdvancedStatefulSet feature. The changes include new decommissioning operations in AlluxioFileUtils, integration into the SyncReplicas reconciliation loop, and comprehensive unit tests. Review feedback suggests improving context propagation by replacing context.TODO(), optimizing efficiency by passing existing runtime objects to avoid redundant API lookups, and refining error handling during the draining phase to prevent log noise.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5805 +/- ##
==========================================
+ Coverage 58.17% 64.91% +6.73%
==========================================
Files 478 482 +4
Lines 32485 33616 +1131
==========================================
+ Hits 18899 21821 +2922
+ Misses 12042 10066 -1976
- Partials 1544 1729 +185 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
b77d0d4 to
c62bca3
Compare
|
cheyang
left a comment
There was a problem hiding this comment.
Review: Graceful Scale-Down for AlluxioRuntime via AdvancedStatefulSet
Overall this is a well-structured feature. The architecture (feature gate + decommission operations + reconciler integration) follows Kubernetes and Fluid patterns correctly. The decommission-before-terminate approach is the right way to prevent data loss during scale-in. A few items remain before this is merge-ready.
Architecture & Design
The layering is sound: pkg/features for the gate, pkg/ddc/alluxio/operations for the CLI wrapper, and the reconciler hook in SyncReplicas. Keeping the gate at Alpha/disabled-by-default is appropriate for a first iteration.
One design concern: the PR title references "AdvancedStatefulSet" (OpenKruise) for selective pod deletion, but the current implementation only does highest-ordinal-first decommission — it does not yet interact with OpenKruise APIs. The feature gate name suggests broader scope than what is delivered. Consider documenting (or renaming) this clearly so users understand the gate currently enables graceful scale-down without the selective-pod-deletion piece.
Correctness & Safety
-
context.TODO()indrainScalingDownWorkers(pkg/ddc/alluxio/replicas.go:142)
The reconciler already carries acruntime.ReconcileRequestContext. Thecontext.TODO()ine.Client.Get(...)should be replaced with a proper context propagated fromSyncReplicas. If the reconciler is cancelled (e.g., leader election loss), this call will hang until the Kubernetes client timeout expires. -
Redundant
getRuntime()call ingetWorkerRPCPort()(pkg/ddc/alluxio/replicas.go:174)
SyncReplicasalready fetches the runtime object. Passing it as a parameter (or caching it on the engine) would avoid an extra API round-trip on every reconcile during scale-in. -
Requeue semantics for "not yet drained" (pkg/ddc/alluxio/replicas.go:107–109)
Returningfmt.Errorf(...)from insideretry.RetryOnConflictis technically correct (non-conflict errors surface immediately), but the reconciler will log this at error-level viaLoggingErrorExceptConflict. Since "workers not yet drained" is a normal transient state, consider returning a dedicated sentinel (e.g., wrapping with a condition type) so the outer handler can log at Info level and requeue with a reasonable interval instead of exponential backoff. -
alluxio fsadmin decommission --addressesflag verification
Please confirm this CLI form is supported by the Alluxio version used in Fluid's CI (the Alluxio docs historically have varied betweendecommissionWorkeranddecommission; this could silently fail against older images).
Test Coverage (Primary Gap)
decommission_test.gothoroughly covers the operations layer — good.- Missing: unit tests for
drainScalingDownWorkers. This is the most critical function — it orchestrates decommission, polls active workers, and gates the scale-down. It needs tests covering at least:- Pod already deleted (NotFound path)
- Pod has no IP yet
- Decommission call fails
- Active count still above threshold (requeue path)
- Happy path (drained successfully)
- Missing: unit test for
getWorkerRPCPortcovering the custom-port and default-port branches.
Without these, the core reconciler logic has no direct regression coverage.
Minor Items
| File | Note |
|---|---|
pkg/ddc/alluxio/operations/decommission.go |
Copyright header says 2024; the current year is 2026. |
test/gha-e2e/curvine/read_job.yaml |
The retry-loop improvement is unrelated to the feature; consider splitting into a separate PR for cleaner history, or at minimum noting it in the PR description. |
pkg/features/features.go:44 |
The init() that registers the gate will run on import. Make sure the cmd/ entrypoint imports pkg/features so the gate is actually wired up for the controller binary. If it's only transitively imported via pkg/ddc/alluxio/replicas.go, that's fine, but worth verifying. |
Verdict
Needs work. The design is correct and safe, but unit tests for drainScalingDownWorkers are required before merge, and the context.TODO() issue should be resolved. Once those are addressed this should be ready.
|
The graceful scale-down approach using Alluxio's decommission mechanism is architecturally sound. However, two concerns remain from prior review:
The Verdict: needs-work — command format needs verification |
…atefulSet (fluid-cloudnative#4193) Signed-off-by: Monika Jakhar <jakharmonika364@gmail.com>
c62bca3 to
215e960
Compare
|
|
Pushed fixes for all of this:
@cheyang could you please review it. |



Ⅰ. Describe what this PR does
This PR implements a graceful decommissioning workflow for Alluxio workers during scale-in. It adds a new AdvancedStatefulSet feature gate that allows Fluid to leverage OpenKruise capabilities for finer pod lifecycle management. When enabled, workers are decommissioned and their cached data is migrated before the pods are terminated, ensuring cluster stability and data availability during scaling operations.
Ⅱ. Does this pull request fix one issue?
fixes #4193
Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews
The feature is currently in Alpha and disabled by default. It provides the necessary infrastructure to support selective pod deletion in later phases.