docs: add OpenSpec change for downscale improvements #253
Open
dobrerazvan wants to merge 4 commits into
Adds the design artifacts for the Kafka cluster downscale improvement initiative, covering two related fixes:

1. Batched broker removal - submit all brokers removed in a single manifest apply as one CC remove_broker operation instead of N separate operations, eliminating redundant partition movements.
2. Draining broker listener retention - keep brokers removed from spec in envoy/istio/contour config until CruiseControl completes draining (GracefulDownscaleSucceeded), preserving client connectivity.

Artifacts: proposal, design, capability specs, and implementation tasks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Control completes

Brokers removed from the KafkaCluster spec were immediately excluded from envoy, istio, and contour config, even while CruiseControl was still draining them. Clients lost connectivity to brokers that still held partition leaders.

Root cause: ShouldIncludeBroker() returned false when brokerConfig==nil (broker not in spec).

Add a fallback path: when brokerConfig is nil, check the broker's CruiseControlState in status. If the state is an active downscale (IsDownscale && !IsSucceeded) and the broker was previously bound to the requested ingressConfig, keep it in the external listener resources.

Brokers stuck in CompletedWithError or Paused are also retained, allowing manual investigation while keeping client connectivity.

ShouldIncludeBroker is the single gatekeeper for all external listener reconcilers (envoy, istio, contour), so no other files need changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
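The fallback described in this commit can be sketched roughly as follows. This is a simplified illustration, not the operator's actual code: the types `BrokerConfig` and `BrokerStatus`, their fields, the lowercase `shouldIncludeBroker` signature, and the string-based `IsDownscale`/`IsSucceeded` helpers are all assumptions made for the example. In particular, how the operator records which ingressConfig a broker was previously bound to is not stated in the commit message.

```go
package main

import "strings"

// CruiseControlState is a simplified stand-in for the operator's status type.
type CruiseControlState string

// Illustrative state values; the real operator defines its own set.
const (
	GracefulDownscaleRunning            CruiseControlState = "GracefulDownscaleRunning"
	GracefulDownscaleCompletedWithError CruiseControlState = "GracefulDownscaleCompletedWithError"
	GracefulDownscalePaused             CruiseControlState = "GracefulDownscalePaused"
	GracefulDownscaleSucceeded          CruiseControlState = "GracefulDownscaleSucceeded"
)

// IsDownscale reports whether the state belongs to a downscale (assumed naming scheme).
func (s CruiseControlState) IsDownscale() bool {
	return strings.HasPrefix(string(s), "GracefulDownscale")
}

// IsSucceeded reports whether the operation finished successfully (assumed naming scheme).
func (s CruiseControlState) IsSucceeded() bool {
	return strings.HasSuffix(string(s), "Succeeded")
}

// BrokerConfig is a hypothetical view of a broker's entry in the spec.
type BrokerConfig struct {
	IngressConfigs []string
}

// BrokerStatus is a hypothetical view of a broker's entry in the status.
type BrokerStatus struct {
	State               CruiseControlState
	BoundIngressConfigs []string // assumption: previous bindings are recorded in status
}

func contains(items []string, want string) bool {
	for _, item := range items {
		if item == want {
			return true
		}
	}
	return false
}

// shouldIncludeBroker decides whether a broker stays in the external listener config.
func shouldIncludeBroker(cfg *BrokerConfig, status *BrokerStatus, ingressConfig string) bool {
	if cfg != nil {
		// Broker is still in the spec: the usual binding check applies.
		return contains(cfg.IngressConfigs, ingressConfig)
	}
	if status == nil {
		return false
	}
	// Fallback: broker was removed from the spec but has not finished draining.
	// CompletedWithError and Paused also satisfy this condition, so those
	// brokers are retained as well.
	draining := status.State.IsDownscale() && !status.State.IsSucceeded()
	return draining && contains(status.BoundIngressConfigs, ingressConfig)
}
```

Under these assumptions, a broker in `GracefulDownscaleRunning` with a nil config is kept, while the same broker in `GracefulDownscaleSucceeded` is dropped.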
When removing multiple brokers from a KafkaCluster spec, the operator previously created one CruiseControlOperation per broker. CC can only run one operation at a time, so data was shuffled onto brokers that were themselves about to be decommissioned, causing far more partition movements than necessary.

Collect all broker IDs in GracefulDownscaleRequired state and submit them as a single remove_broker CC operation, mirroring the existing addBrokers pattern.

Because the KafkaCluster reconciler sets all removed brokers to Required via a single atomic status patch, all brokers from a single manifest apply are guaranteed to land in the same batch.

Includes unit test (createCCOperation multi-broker params), integration test (exactly one CCOperation created for two simultaneous removals), e2e test scaffold, and 5-broker sample manifest.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
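As a rough illustration of the batching idea, the sketch below collects every broker in the GracefulDownscaleRequired state and builds the parameters for one remove_broker request. The helper names `brokersPendingRemoval` and `removeBrokerParams`, the status map shape, and the use of a comma-separated `brokerid` value are assumptions for the example, not the operator's real `createCCOperation` code.

```go
package main

import (
	"sort"
	"strconv"
	"strings"
)

// CruiseControlState is a simplified stand-in for the operator's status type.
type CruiseControlState string

const (
	GracefulDownscaleRequired CruiseControlState = "GracefulDownscaleRequired"
	GracefulDownscaleRunning  CruiseControlState = "GracefulDownscaleRunning"
)

// brokersPendingRemoval returns, in ascending order, the IDs of all brokers
// whose state is GracefulDownscaleRequired. The map is a hypothetical
// flattening of the per-broker status.
func brokersPendingRemoval(states map[int32]CruiseControlState) []int32 {
	ids := make([]int32, 0, len(states))
	for id, state := range states {
		if state == GracefulDownscaleRequired {
			ids = append(ids, id)
		}
	}
	// Sort so the resulting request is deterministic across reconciles.
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

// removeBrokerParams builds the parameters for ONE remove_broker request
// covering every ID in the batch (assumed: comma-separated "brokerid").
func removeBrokerParams(ids []int32) map[string]string {
	parts := make([]string, 0, len(ids))
	for _, id := range ids {
		parts = append(parts, strconv.Itoa(int(id)))
	}
	return map[string]string{"brokerid": strings.Join(parts, ",")}
}
```

With brokers 3 and 4 both marked Required, this yields a single parameter set naming both brokers, rather than two operations that would run one after the other.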
3cf667b to
be0ad95
This was referenced Jun 22, 2026