Skip to content

Add IMDSv2 migration command for ROSA Classic clusters#909

Open
givaldolins wants to merge 4 commits into
openshift:masterfrom
givaldolins:imdsv2
Open

Add IMDSv2 migration command for ROSA Classic clusters#909
givaldolins wants to merge 4 commits into
openshift:masterfrom
givaldolins:imdsv2

Conversation

@givaldolins

@givaldolins givaldolins commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Implements automated IMDSv2 migration workflow with:

  • Pre-flight health checks (ClusterOperators, CPMS, nodes, etcd)
  • Hive MachinePool patching for infra/worker nodes
  • Sequential infra machine replacement with health validation
  • ControlPlaneMachineSet update for master node rollout
  • Post-migration validation

Supports selective migration via --nodes flag (all, master, infra, workers).

This MR automates SOP

Summary by CodeRabbit

  • New Features

    • Added osdctl cluster imdsv2 to migrate ROSA Classic clusters to enforce EC2 Instance Metadata Service v2 (IMDSv2).
    • Supports --nodes all|master|infra|workers, runs migration in ordered phases, prompts for confirmation when updates are required, and validates node + provider-spec IMDSv2 configuration.
  • Documentation

    • Added dedicated command docs and linked it from the main cluster command references, including usage, options, and examples.
  • Tests

    • Added unit tests for option validation, pre-flight checks, IMDSv2 configuration validation, and AWS Classic vs hypershift behavior.

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR introduces osdctl cluster imdsv2, a new command that orchestrates AWS ROSA Classic cluster node migration to enforce IMDSv2. The implementation provides helper utilities for cluster client setup and validation, registers the Cobra command with flag wiring, initializes OCM and Kubernetes clients, runs pre-flight health checks, executes conditional migrations on Hive MachinePool and ControlPlaneMachineSet resources, validates node and Machine provider spec compliance, and includes comprehensive test coverage and user documentation.

Changes

IMDSv2 Migration Feature

Layer / File(s) Summary
Cluster and Hive client utilities
cmd/cluster/helpers.go
Provides ClusterClients struct and SetupClusterClients/SetupHiveClients factory functions to establish read-only and elevated admin clients; includes health-check helpers (CheckClusterOperators, WaitForClusterOperatorsHealthy), CPMS monitoring (CheckCPMSState, MonitorCPMSRollout), node utilities (CountReadyNodes), and validation helpers (ValidateAWSClassicCluster, GetHiveNamespace).
Command registration and scaffolding
cmd/cluster/cmd.go, cmd/cluster/imdsv2.go (lines 1–109)
Registers newCmdIMDSv2() subcommand with Cobra; defines module constants for CPMS identity, polling timeouts, IMDSv2 authentication modes, and Hive override annotation; establishes imdsv2Options struct and wires flags (--cluster-id, --nodes, --reason) with validation for cluster key format and allowed node-role values.
Client initialization and cluster validation
cmd/cluster/imdsv2.go (lines 111–149)
Establishes OCM connection, retrieves target cluster, validates cluster ID format, rejects non-AWS and Hypershift-enabled clusters, sets up standard Kubernetes read-only and elevated admin clients, conditionally initializes Hive clients when infra and/or worker node migration is requested.
Pre-flight safety validation
cmd/cluster/imdsv2.go (lines 232–284)
Checks ClusterOperator health and availability, validates ControlPlaneMachineSet Active state with correct replica counts, confirms exactly 3 Ready master nodes, verifies all infra nodes Ready when infra is in scope, and gates master migration on CPMS readiness.
Migration orchestration and execution
cmd/cluster/imdsv2.go (lines 151–680)
Orchestrates complete workflow: validates options, initializes clients, prints cluster context, runs pre-flight checks, executes conditional infra MachinePool patching (clone and patch, wait for operators), ControlPlaneMachineSet update (unmarshal spec, patch, monitor rollout completion), and worker MachinePool patching (list, filter, user confirmation, patch via admin client); then validates node readiness with retries and Machine provider specs, reporting per-phase success summary.
Comprehensive test suite
cmd/cluster/imdsv2_test.go
Unit tests validate command option parsing for cluster ID and node-role values; validate IMDSv2 node readiness polling skipping nodes with deletion timestamp and unschedulable flag; test MachinePool IMDSv2 configuration extraction from AWS provider spec; validate preflight master node enforcement and CPMS state checking; verify node-role filtering logic; test helper utilities; and assert AWS Classic cluster validation across provider and Hypershift scenarios.
User documentation
docs/README.md, docs/osdctl_cluster.md, docs/osdctl_cluster_imdsv2.md
Adds imdsv2 to README cluster command overview with detailed workflow description, enumerated automated steps (Hive MachinePool patching, infra sequential replacement, CPMS update, validation), and pre-flight checks; updates cluster command reference with SEE ALSO entry; introduces dedicated osdctl_cluster_imdsv2.md page with usage synopsis, example invocations, command-specific flags (--cluster-id, --nodes, --reason), parent command flag inheritance note, and link to parent osdctl cluster documentation.

Sequence Diagram

sequenceDiagram
  participant User as User/CLI
  participant Cmd as osdctl imdsv2
  participant OCM as OCM API
  participant K8s as Kubernetes API
  participant Hive as Hive API

  User->>Cmd: cluster imdsv2 --cluster-id=... --nodes=...
  Cmd->>OCM: Fetch cluster and validate AWS Classic
  Cmd->>K8s: Initialize standard and admin clients
  Cmd->>Hive: Initialize Hive clients (if infra/workers)
  
  Cmd->>K8s: Pre-flight: Check operators, CPMS, nodes

  rect rgba(100, 200, 150, 0.5)
  Note over Cmd,Hive: Infra Migration (if infra/all)
  Cmd->>Hive: Get infra MachinePool
  Cmd->>Hive: Clone and patch for IMDSv2
  Cmd->>K8s: Wait for operators healthy
  end

  rect rgba(200, 150, 100, 0.5)
  Note over Cmd,K8s: CPMS Update (if master/all)
  Cmd->>K8s: Get ControlPlaneMachineSet
  Cmd->>K8s: Patch provider spec for IMDSv2
  Cmd->>K8s: Monitor rollout completion
  end

  rect rgba(150, 150, 200, 0.5)
  Note over Cmd,Hive: Worker Migration (if workers/all)
  Cmd->>Hive: List worker MachinePools
  Cmd->>Hive: Patch for IMDSv2 via admin client
  end

  Cmd->>K8s: Final validation: poll nodes Ready
  Cmd->>K8s: Check Machine provider specs
  Cmd->>User: Success/warnings report
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings)

Check name Status Explanation Resolution
No-Sensitive-Data-In-Logs ❌ Error Multiple logging statements expose sensitive data: cluster name/ID (line 162), user reason (line 164), node names (lines 615,618,630), machine names (line 672), and customer tracking info passed to... Remove or redact sensitive logging: avoid printing cluster names, node/machine names, and customer-provided reasons. Log only counts or operational status.
Docstring Coverage ⚠️ Warning Docstring coverage is 59.09% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Test file imdsv2_test.go has 3 assertions without meaningful failure messages (lines 81, 132, 573), violating the requirement that assertions include messages to help diagnose failures. Add meaningful failure messages to assert.NoError(t, err) calls at lines 81, 132, and 573 following the pattern used in lines 179 and 222 of the same file.
✅ Passed checks (12 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and concisely describes the main addition: an IMDSv2 migration command for ROSA Classic clusters, which directly matches the core changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed The PR uses Go standard testing package (not Ginkgo). All 11 test functions and their subtests use stable, deterministic names with static string literals; no dynamic values like timestamps, UUIDs,...
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. The test file imdsv2_test.go uses standard Go testing package with testing.T, not Ginkgo framework, so the check does not apply.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests were added in this PR. The test file cmd/cluster/imdsv2_test.go contains only standard Go unit tests using the testing.T pattern, not Ginkgo tests (It/Describe/Context/When).
Topology-Aware Scheduling Compatibility ✅ Passed This PR adds a CLI subcommand (osdctl cluster imdsv2) for ROSA Classic cluster management. The custom check applies only to deployment manifests, operator code, or controllers deployed to clusters....
Ote Binary Stdout Contract ✅ Passed osdctl is a standalone CLI tool, not an OTE (OpenShift Tests Extension) binary. The custom check applies only to OTE binaries that communicate with openshift-tests via JSON. This PR adds normal Cob...
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests added in PR. New test file (cmd/cluster/imdsv2_test.go) uses Go standard testing with testify, not Ginkgo. Check only applies to Ginkgo e2e tests.
No-Weak-Crypto ✅ Passed No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret/token comparisons found. All authentication comparisons are a...
Container-Privileges ✅ Passed PR contains only Go CLI source code and documentation; no container manifests or K8s configurations present with privilege escalation settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@openshift-ci openshift-ci Bot requested review from devppratik and petrkotas June 3, 2026 20:35

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
cmd/cluster/imdsv2.go (1)

348-395: ⚖️ Poor tradeoff

Consider using the Kubernetes client instead of shelling out to oc.

Shelling out to oc introduces a dependency on the CLI being installed and configured. The kubeconfig context used by oc may differ from the client initialized via backplane in init(). Consider using the o.client to fetch ClusterOperator resources directly via the config.openshift.io/v1 API group.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/cluster/imdsv2.go` around lines 348 - 395, The checkClusterOperators
function currently shells out to `oc` to list ClusterOperators; replace that
with direct API calls using the existing o.client so the same
kubeconfig/backplane client is used. Use the OpenShift config v1 ClusterOperator
type (e.g., configv1.ClusterOperatorList) or a dynamic client list against
"config.openshift.io/v1" to fetch all ClusterOperator resources with
o.client.List(ctx, &list) (or equivalent), then iterate list.Items and inspect
each item.Status.Conditions exactly as done now to build unhealthyOps, returning
errors on client.List or unmarshalling failures as appropriate; keep the same
final behavior/message and error formatting but remove the exec.CommandContext
call and JSON unmarshal logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/cluster/imdsv2.go`:
- Around line 728-754: The function monitorCPMSRollout creates a timeout context
from context.Background(), breaking cancellation propagation; change the context
creation to derive from the passed ctx (use context.WithTimeout(ctx,
imdsv2RolloutPollTimeout) and keep cancel deferred) so parent cancellations
(SIGINT or upstream cancellation) interrupt the polling loop; update the context
creation in monitorCPMSRollout where pollCtx and cancel are defined (and
continue to pass pollCtx into wait.PollUntilContextTimeout as before).

---

Nitpick comments:
In `@cmd/cluster/imdsv2.go`:
- Around line 348-395: The checkClusterOperators function currently shells out
to `oc` to list ClusterOperators; replace that with direct API calls using the
existing o.client so the same kubeconfig/backplane client is used. Use the
OpenShift config v1 ClusterOperator type (e.g., configv1.ClusterOperatorList) or
a dynamic client list against "config.openshift.io/v1" to fetch all
ClusterOperator resources with o.client.List(ctx, &list) (or equivalent), then
iterate list.Items and inspect each item.Status.Conditions exactly as done now
to build unhealthyOps, returning errors on client.List or unmarshalling failures
as appropriate; keep the same final behavior/message and error formatting but
remove the exec.CommandContext call and JSON unmarshal logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 993b8d49-1d85-43e7-9f2d-d04b1a1555f0

📥 Commits

Reviewing files that changed from the base of the PR and between 7117673 and 38748b6.

📒 Files selected for processing (2)
  • cmd/cluster/cmd.go
  • cmd/cluster/imdsv2.go

Comment thread cmd/cluster/imdsv2.go Outdated
@givaldolins

Copy link
Copy Markdown
Contributor Author

/retest

@givaldolins

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@clcollins clcollins left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions.

Also, Claude noticed ""preFlightChecks, monitorCPMSRollout, init, and Hive client setup are near-identical copies of changevolumetype.go" and suggested they should be consolidated/adjusted to work so there's less redundant code.

Comment thread cmd/cluster/imdsv2.go Outdated
}

// replaceInfraMachines replaces infra machines one at a time.
func (o *imdsv2Options) replaceInfraMachines(ctx context.Context) error {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to try to use

func RunMachinePoolDance(ctx context.Context, clients DanceClients, originalMp, newMp *hivev1.MachinePool, onTimeout func(ctx context.Context, nodes *corev1.NodeList) error) error {
for this piece.

Comment thread cmd/cluster/imdsv2.go Outdated
}

// Delete the machine (MachineSet will create replacement with IMDSv2)
if err := o.clientAdmin.Delete(ctx, machine); err != nil {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are confirmation functions elsewhere in this PR, but there's not one included here. Is that intended or desired?

Comment thread cmd/cluster/imdsv2.go Outdated
}
}

// Step 3: Clean up temporary override annotations from MachinePools

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this process fails partway through, before it gets here? The annotations would continue to live on the machine pools:

Comment thread cmd/cluster/imdsv2.go Outdated
fmt.Printf("Hive namespace: %s\n", hiveNamespace)

// Retrieve all MachinePools for this cluster
mpList := &hivev1.MachinePoolList{}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any chance this would match machinepools you don't want to patch? Are ALL machinepools in that namespace supposed to be patched, every time? (Truly asking - i'm not 100% sure)

Comment thread cmd/cluster/imdsv2.go Outdated
fmt.Println("Worker nodes must be replaced by the customer.")
}

printer.PrintlnGreen("\n✓ IMDSv2 migration completed successfully!")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Everything before this is an if statement, so there's a chance (however small - errors, or changes in the future) where you might get here with no actual operation happening, and still get the "completed successfully!" message. Is there something you can check on to ensure that it happened? Some state on the nodes, etc?

@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: givaldolins
Once this PR has been reviewed and has the lgtm label, please assign zmird-r for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
cmd/cluster/imdsv2.go (1)

628-634: 💤 Low value

time.Sleep ignores context cancellation.

The retry loop uses time.Sleep(retryDelay) which doesn't respect the passed ctx. If the user sends SIGINT during the 30-second sleep, the command won't respond until the sleep completes.

Consider using a context-aware wait:

Suggested improvement
-		if attempt < maxRetries {
-			fmt.Printf("  ⏳ Waiting for %d nodes to become Ready (attempt %d/%d): %s\n",
-				len(notReadyNodes), attempt, maxRetries, strings.Join(notReadyNodes, ", "))
-			time.Sleep(retryDelay)
-			continue
+		if attempt < maxRetries {
+			fmt.Printf("  ⏳ Waiting for %d nodes to become Ready (attempt %d/%d): %s\n",
+				len(notReadyNodes), attempt, maxRetries, strings.Join(notReadyNodes, ", "))
+			select {
+			case <-ctx.Done():
+				return ctx.Err()
+			case <-time.After(retryDelay):
+			}
+			continue
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/cluster/imdsv2.go` around lines 628 - 634, The `time.Sleep(retryDelay)`
call in the retry loop blocks indefinitely and ignores context cancellation,
preventing responsive handling of SIGINT signals. Replace the
`time.Sleep(retryDelay)` statement with a context-aware wait mechanism using a
select statement that monitors both `ctx.Done()` for cancellation and a timer
for the delay. If the context is cancelled first, break out of the retry loop;
otherwise, continue to the next retry attempt after the timer fires.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/cluster/imdsv2.go`:
- Around line 399-406: The validWorkerNames map and filtering logic in the
MachinePool iteration only matches the default "worker" pool name, silently
skipping custom worker pools like "worker-2" or "gpu-workers" that ROSA
supports. Replace the inclusion-based filter (validWorkerNames map checking
mp.Spec.Name == "worker") with an exclusion-based approach that skips only
"infra" and "master" pool types, or add logging to indicate which pools are
being skipped so operators understand which pools require separate handling.
This ensures the function processes all worker pools as its comment claims
rather than just the single default pool.

---

Nitpick comments:
In `@cmd/cluster/imdsv2.go`:
- Around line 628-634: The `time.Sleep(retryDelay)` call in the retry loop
blocks indefinitely and ignores context cancellation, preventing responsive
handling of SIGINT signals. Replace the `time.Sleep(retryDelay)` statement with
a context-aware wait mechanism using a select statement that monitors both
`ctx.Done()` for cancellation and a timer for the delay. If the context is
cancelled first, break out of the retry loop; otherwise, continue to the next
retry attempt after the timer fires.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a8204653-bfa2-4998-9304-1c072b527ed8

📥 Commits

Reviewing files that changed from the base of the PR and between 07e230f and a43353a.

📒 Files selected for processing (2)
  • cmd/cluster/helpers.go
  • cmd/cluster/imdsv2.go

Comment thread cmd/cluster/imdsv2.go Outdated
This command automates the SOP for migrating ROSA Classic cluster nodes
to enforce IMDSv2 (Instance Metadata Service v2) by:
- Replacing infra nodes using the MachinePool dance pattern
- Updating ControlPlaneMachineSet for automatic master node rollout
- Patching worker MachinePools (customer performs machine replacement)
- Validating all nodes/machines are using IMDSv2

Key features:
- Pre-flight checks verify cluster health before making changes
- Confirmation prompts before all destructive operations
- Tracks actual changes and shows accurate success/skip messages
- Supports selective migration: --nodes all|master|infra|workers
- Context-aware with proper cancellation handling
- Comprehensive test coverage (11 test cases)

Implementation follows all review comments from @clcollins:
1. Code consolidation via helpers.go (shared with changevolumetype)
2. Uses RunMachinePoolDance for atomic infra replacement
3. Confirmation prompts before destructive operations
4. No annotation cleanup risk (eliminated pattern entirely)
5. MachinePool name validation for safety
6. Accurate change tracking and reporting

Additional improvements from CodeRabbit AI review:
- Exclusion-based worker pool filtering to support custom pools
  (worker-2, gpu-workers, etc.)
- Context-aware sleep for responsive cancellation handling

Files added:
- cmd/cluster/imdsv2.go (501 lines) - main command implementation
- cmd/cluster/helpers.go (235 lines) - shared helper functions
- cmd/cluster/imdsv2_test.go (580 lines) - comprehensive tests
- docs/osdctl_cluster_imdsv2.md - generated documentation

All tests passing, builds successfully.

Signed-off-by: Givaldo Lins <glins@redhat.com>
@givaldolins

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/cluster/imdsv2_test.go`:
- Around line 224-333: The tests `TestCheckIMDSv2Configuration` and
`TestMachinePoolNameValidation` are tautological because they duplicate logic
locally instead of testing actual production functions. To fix this, identify
the real production functions or helpers in `imdsv2.go` that implement the
IMDSv2 configuration checking and machine pool name validation logic, then
rewrite these tests to call those actual functions with the test inputs and
assert on their outputs, rather than reimplementing the same logic inline. If no
suitable public functions exist, extract testable helper functions from
`imdsv2.go` and test those instead. This ensures the tests will catch actual
regressions in the production code.
- Line 397: The Name field assignment is using string(rune(i)) which converts
the integer loop counter to its corresponding Unicode character rather than its
numeric digit string representation, resulting in non-printable control
characters. Replace the rune conversion with proper numeric string formatting
such as using strconv.Itoa(i) or fmt.Sprintf with the %d verb to convert the
integer i to its string representation, so that the master node names are
generated correctly as "master-0", "master-1", etc.
- Around line 88-90: Replace all instances where `AddToScheme()` and `Build()`
errors are ignored using blank identifier assignments (like `_ =
corev1.AddToScheme(scheme)`) with proper error checking using
`require.NoError(t, err)` instead. This applies to all occurrences in
imdsv2_test.go, particularly at the locations mentioned (lines 88-90, 136-138,
183-185, 337, 366-368, and 527-553). Capture the error returned from each
AddToScheme() and Build() call and validate it immediately to ensure setup
failures are caught at the point of failure rather than causing misleading test
symptoms later.

In `@cmd/cluster/imdsv2.go`:
- Around line 514-518: Add a nil check for ProviderSpec.Value before
dereferencing Value.Raw in the code block that parses the AWS provider spec.
Before the json.Unmarshal call that accesses
cpms.Spec.Template.OpenShiftMachineV1Beta1Machine.Spec.ProviderSpec.Value.Raw,
verify that Value is not nil and return an appropriate error if it is missing.
This defensive check should follow the same pattern used in the validateIMDSv2
function to prevent potential panics from malformed or missing provider
specifications.

In `@docs/osdctl_cluster_imdsv2.md`:
- Around line 7-40: The command documentation advertises workers as a supported
value in the --nodes flag option, but the synopsis and examples only document
migration steps for infra and master nodes, creating a discrepancy. Either add a
documented worker node migration step and example to match the advertised
--nodes workers capability, or remove workers from the supported values in the
--nodes option. Since this documentation is auto-generated, make these changes
in the source code generator or template that produces this markdown file, not
directly in the emitted markdown.

In `@docs/README.md`:
- Around line 1767-1798: The documentation for the imdsv2 command advertises
support for the --nodes workers flag but does not describe the worker migration
workflow, leaving this documented CLI option undocumented. Instead of editing
the markdown file directly under docs/, locate and update the source generator
or template that produces this documentation (likely a source file or template
outside the docs/ directory). Add documentation describing the worker migration
path using the --nodes workers flag to ensure all advertised command options are
properly documented and the CLI contract is consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b71875da-6e93-49c0-bc85-cbd7a3620dc8

📥 Commits

Reviewing files that changed from the base of the PR and between a43353a and e743555.

📒 Files selected for processing (7)
  • cmd/cluster/cmd.go
  • cmd/cluster/helpers.go
  • cmd/cluster/imdsv2.go
  • cmd/cluster/imdsv2_test.go
  • docs/README.md
  • docs/osdctl_cluster.md
  • docs/osdctl_cluster_imdsv2.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • cmd/cluster/cmd.go
  • docs/osdctl_cluster.md
  • cmd/cluster/helpers.go

Comment thread cmd/cluster/imdsv2_test.go Outdated
Comment thread cmd/cluster/imdsv2_test.go
Comment thread cmd/cluster/imdsv2_test.go Outdated
Comment thread cmd/cluster/imdsv2.go
Comment thread docs/osdctl_cluster_imdsv2.md
Comment thread docs/README.md
@givaldolins

Copy link
Copy Markdown
Contributor Author

A couple of questions.

Also, Claude noticed ""preFlightChecks, monitorCPMSRollout, init, and Hive client setup are near-identical copies of changevolumetype.go" and suggested they should be consolidated/adjusted to work so there's less redundant code.

Hi @clcollins,
I made some changes and rebased the code to avoid unnecessary additional commits. I believe the changes address the observations you made. Could you please take another look whenever you can?
Thank you.

@givaldolins givaldolins requested a review from clcollins June 17, 2026 20:46
givaldolins and others added 2 commits June 22, 2026 15:51
Address four issues identified in code review:

1. Add nil check for CPMS ProviderSpec.Value
   - Prevents potential panic in updateCPMSForIMDSv2 when Value is nil
   - Matches existing validation pattern in validateIMDSv2

2. Fix tautological tests
   - TestPreFlightChecks_MasterNodesCount now calls CountReadyNodes()
   - Add clarifying comments to other tests linking to production code

3. Fix string(rune(i)) bug in test
   - Change to fmt.Sprintf("master-%d", i) for proper node names
   - Previously generated unprintable characters instead of "master-0", etc.

4. Properly handle scheme registration errors
   - Replace _ = AddToScheme() with require.NoError(t, AddToScheme())
   - Tests now fail during setup if scheme registration fails

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
CodeRabbit flagged that cmd/cluster/cmd.go contains an unused help()
function (lines 60-65) that violates the OTE Binary Stdout Contract by
using fmt.Println to write errors to stdout instead of stderr.

Since the function is not referenced anywhere in the codebase and Cobra
already provides built-in help functionality via cmd.Help(), removing
it entirely is the cleanest solution.

Changes:
- Remove unused help() function
- Remove unused fmt import
- Clean up import ordering (gofmt)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@givaldolins

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Fix two issues identified in code review:

1. No-Sensitive-Data-In-Logs violation:
   - Remove logging of cluster name/ID (line 162)
   - Remove logging of user-provided reason (line 164)
   - Replace node name lists with counts (lines 615, 618, 630)
   - Replace machine name lists with counts (line 672)

   Rationale: Avoid exposing customer-identifying information in logs.
   Log only operational metrics (counts, status) instead of names/IDs.

2. Test Structure And Quality - missing assertion messages:
   - Add descriptive failure messages to assert.NoError() calls at
     lines 81, 132, and 573 in imdsv2_test.go
   - Follows pattern from lines 179 and 222 of the same file

   Rationale: Helps diagnose test failures by explaining what was
   expected when an assertion fails.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@openshift-ci

openshift-ci Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

@givaldolins: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants