Skip to content

USHIFT-7176: Rebase 5.0 + updated OVN-K init flags#6829

Merged
openshift-merge-bot[bot] merged 10 commits into
openshift:mainfrom
pmtk:ovnk-flags
Jun 8, 2026
Merged

USHIFT-7176: Rebase 5.0 + updated OVN-K init flags#6829
openshift-merge-bot[bot] merged 10 commits into
openshift:mainfrom
pmtk:ovnk-flags

Conversation

@pmtk

@pmtk pmtk commented Jun 8, 2026

Copy link
Copy Markdown
Member

Summary by CodeRabbit

  • Chores

    • Updated container image digests and versions for multiple components including Multus, OLM, SR-IOV, and core platform services across both x86_64 and ARM64 architectures.
    • Bumped base release versions to latest nightly builds.
    • Updated Go toolchain version in development environment.
    • Refreshed embedded component commit references.
  • Bug Fixes

    • Improved SELinux volume conflict detection with optimized caching mechanism.
    • Fixed etcd storage monitor lifecycle management to properly close monitors on server shutdown.
  • Refactor

    • Simplified OVN Kubernetes initialization process with updated startup flags.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 8, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@pmtk: This pull request explicitly references no jira issue.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@pmtk

pmtk commented Jun 8, 2026

Copy link
Copy Markdown
Member Author

Obsoletes #6817

@openshift-ci openshift-ci Bot requested review from ggiguash and vanhalenar June 8, 2026 15:18
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 8, 2026
@pmtk pmtk changed the title NO-ISSUE: Rebase 5.0 + updated OVN-K init flags USHIFT-7176: Rebase 5.0 + updated OVN-K init flags Jun 8, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 8, 2026

Copy link
Copy Markdown

@pmtk: This pull request references USHIFT-7176 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

This PR updates MicroShift to a newer OCP nightly build with three major change areas: refreshing container image digests and version pins across components, changing OVN master initialization flags, refactoring the volume cache from channel-based to slice-based conflict reporting, and optimizing etcd monitor lifecycle management with caching.

Changes

Nightly Release & Component Updates

Layer / File(s) Summary
Makefile version and Git commit pins
Makefile.kube_git.var, Makefile.version.*.var
KUBE_GIT_COMMIT and OCP_VERSION for both x86_64 and aarch64 are updated to point to newer nightly build identifiers.
Multus component images and release manifests
assets/components/multus/kustomization.*.yaml, assets/components/multus/release-multus-*.json
Container image digests for multus-cni-microshift and containernetworking-plugins-microshift are refreshed in kustomization files and release manifests.
OVN master initialization flags
assets/components/ovn/*/master/daemonset.yaml
Replaces --init-master with --init-cluster-manager and --init-ovnkube-controller flags in both single-node and multi-node OVN master DaemonSet configurations.
Operator Lifecycle Manager images
assets/optional/operator-lifecycle-manager/kustomization.*.yaml, assets/optional/operator-lifecycle-manager/release-olm-*.json
OLM operator, registry, and kube-rbac-proxy image digests are updated in kustomization and release manifests for both architectures.
SR-IOV component images
assets/optional/sriov/kustomization.*.yaml, assets/optional/sriov/release-sriov-*.json
SR-IOV operator, CNI, device-plugin, webhook, and metrics image digests are refreshed across kustomization and release manifests.
Core release manifests and CRI-O config
assets/release/release-*.json, packaging/crio.conf.d/10-microshift_*.conf
Base release versions and core component digests (cli, coredns, haproxy-router, pod, service-ca-operator, csi-snapshot-controller) are updated; CRI-O pause image pins are refreshed.
Auto-rebase workflow and build configuration
scripts/auto-rebase/commits.txt, scripts/auto-rebase/changelog.txt, scripts/auto-rebase/last_rebase.sh, scripts/devenv-builder/configure-vm.sh
Embedded component and image SHAs are pinned to new versions, auto-rebase scripts reference newer nightly tags, and Go toolchain is downgraded from 1.26.3 to 1.25.8.

Volume Cache & SELinux Warning Refactor

Layer / File(s) Summary
SELinux label parsing utility
deps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/internal/parse/selinux_label.go, selinux_label_test.go
New package introduces ParseSELinuxLabel to split SELinux label strings into fixed four-element arrays of user, role, type, and level components with comprehensive test coverage.
SELinux translator pre-parsing
deps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/translator/selinux_translator.go
Translator adds ConflictsParsed method to compare pre-parsed label components; public Conflicts method now delegates to it after parsing both labels.
Volume cache core implementation
deps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/cache/volumecache.go
VolumeCache interface replaces channel-based SendConflicts with slice-returning GetConflicts; implementation adds reverse pod-to-volumes index and per-volume conflict cache; podInfo stores parsed SELinux parts; AddVolume and DeletePod maintain both indexes and conflict cache.
Volume cache test suite
deps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/cache/volumecache_test.go
Refactors conflict tests to use GetConflicts, adds reverse-index consistency verification, renames and extends conflict test, adds multi-volume conflict and delete-pod cleanup scenario tests.
Metrics and test fakes
deps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/metrics.go, selinux_warning_controller_test.go
Metrics collection iterates GetConflicts() directly; test fake implements new GetConflicts API.

Etcd Monitor Cache Optimization

Layer / File(s) Summary
Monitor cache lifecycle
deps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/options/etcd.go
Replaces per-invocation monitor getter with monitorCache that lazily initializes and caches monitors under RWMutex; monitors are closed asynchronously when server drains via DrainedNotify channel instead of on getter failure.
Monitor cache testing
deps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/options/etcd_test.go
Adds TestMonitorCache validating cached reuse, error handling post-close, and concurrent access safety; introduces fakeMonitor with atomic Close tracking.
Metrics cleanup
deps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/storage/etcd3/metrics/metrics.go
Removes explicit m.Close() from metrics collection after context cancellation; lifecycle cleanup now delegated to monitor cache.

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • openshift/microshift#6812: Updates the same nightly release pinning inputs, including OCP_VERSION, multus/OLM/SR-IOV component digests, and release metadata across both architectures.

Suggested reviewers

  • ggiguash
  • vanhalenar
🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (14 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: a rebase to version 5.0 and updates to OVN-K initialization flags, which are prominent in the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed No Ginkgo tests found in PR. All test files use standard Go testing with stable test names like TestVolumeCache_MultiVolumeConflicts and TestMonitorCache.
Test Structure And Quality ✅ Passed The PR does not contain Ginkgo test code. All modified test files use standard Go testing with testing.T; check is not applicable.
Microshift Test Compatibility ✅ Passed PR adds no new Ginkgo e2e tests in MicroShift's main codebase—only unit test updates in vendored upstream kubernetes deps/, making the check not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR contains no new Ginkgo e2e tests. All test changes are standard Go unit tests (func Test* pattern) in deps/ directory, not e2e tests requiring SNO compatibility checks.
Topology-Aware Scheduling Compatibility ✅ Passed PR introduces no new topology-incompatible scheduling constraints; OVN manifest changes only update init flags, leaving pre-existing scheduling rules unchanged.
Ote Binary Stdout Contract ✅ Passed PR is a rebase/dependency update with no OTE test binaries, test entry points, or process-level stdout writes. The OTE stdout contract check is not applicable here.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests were added in this PR; only unit tests in dependency updates use standard Go testing, making the check inapplicable.
No-Weak-Crypto ✅ Passed PR contains no weak cryptography algorithms, custom crypto implementations, or unsafe secret comparisons. Changes involve SELinux label parsing, configuration updates, and infrastructure refactoring.
Container-Privileges ✅ Passed No new privileged container configurations were added or modified; pre-existing privilege settings in OVN daemonsets remain unchanged.
No-Sensitive-Data-In-Logs ✅ Passed No passwords, tokens, API keys, PII, or session IDs are exposed in logging. Node names logged in OVN daemonsets are standard infrastructure identifiers.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=warning msg="The linter 'gomodguard' is deprecated (since v2.12.0) due to: new major version. Replaced by gomodguard_v2."
level=warning msg="Suggested new configuration:\nlinters:\n enable:\n - gomodguard_v2\n"
level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/apparentlymart/go-cidr@v1.1.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/coreos/go-systemd@v0.0.0-20190321100706-95778dfbb74e: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/google/go-cmp@v0.7.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/miekg/dns@v1.1.63: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/openshift/api@v0.0.0-20260511191110-9b69e5fa27e9: is

... [truncated 31032 characters] ...

elet: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/metrics: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/mount-utils: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/pod-security-admission: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/sample-apiserver: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/sample-cli-plugin: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/sample-controller: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n"


Comment @coderabbitai help to get the list of available commands and usage tips.

@pmtk

pmtk commented Jun 8, 2026

Copy link
Copy Markdown
Member Author

/verified by ci

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 8, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@pmtk: This PR has been marked as verified by ci.

Details

In response to this:

/verified by ci

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
assets/components/ovn/single-node/master/daemonset.yaml (1)

338-338: ⚡ Quick win

Update echo statement to reflect actual flags.

The log message still references --init-master ${K8S_NODE}, but the command now uses --init-cluster-manager and --init-ovnkube-controller.

📝 Suggested fix
-          echo "I$(date "+%m%d %H:%M:%S.%N") - ovnkube-master - start ovnkube --init-master ${K8S_NODE} --init-node ${K8S_NODE}"
+          echo "I$(date "+%m%d %H:%M:%S.%N") - ovnkube-master - start ovnkube --init-cluster-manager ${K8S_NODE} --init-ovnkube-controller ${K8S_NODE} --init-node ${K8S_NODE}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@assets/components/ovn/single-node/master/daemonset.yaml` at line 338, The
echo log still says "--init-master ${K8S_NODE}" but the actual command uses
"--init-cluster-manager" and "--init-ovnkube-controller"; update the echo string
in the startup line (the echo call that currently prints ovnkube-master - start
ovnkube --init-master ${K8S_NODE} --init-node ${K8S_NODE}) so it accurately
reflects the real flags used (e.g., mention --init-cluster-manager and
--init-ovnkube-controller with the appropriate ${K8S_NODE} placeholders) to keep
logs consistent with the invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@assets/components/ovn/multi-node/master/daemonset.yaml`:
- Around line 377-380: The startup echo message is stale and references
--init-master while the actual command invokes /usr/bin/ovnkube with
--init-cluster-manager and --init-ovnkube-controller; update the echo in the
ovnkube startup block (the line that prints "ovnkube-master - start ovnkube
--init-master ${K8S_NODE}") to accurately list the flags being passed (e.g.,
--init-cluster-manager and --init-ovnkube-controller) and include the
${K8S_NODE} variable so logs reflect the real command invoked by
/usr/bin/ovnkube.

In
`@deps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/options/etcd.go`:
- Around line 264-272: newMonitorCache/monitorCache currently allows creating
monitors even if the cache has already begun draining, causing monitors to be
returned and immediately closed which leads callers to treat them as live and
report failures; modify newMonitorCache and the monitor creation path (methods
initialize(), get(), and close()) so that before allocating/returning any
monitor it checks whether the cache is already drained (via DrainedNotify() or
the same drained flag used in close()) and if so returns an error or a
closed-indicator instead of creating a monitor; ensure initialize() does not arm
close() or spawn monitors when DrainedNotify() is closed and update get() to
consult the drained state atomically to avoid races so callers never receive a
monitor that will be closed immediately.
- Around line 308-310: The code currently ignores errors from Monitor.Close()
when iterating over the monitors slice in both the rollback and shutdown paths;
change those sites (the loops handling monitors and the corresponding rollback
block) to collect and handle Close() errors: in the rollback path (where the
function returns) aggregate the Close() errors (using a multi-error pattern or
fmt.Errorf chaining) and return a combined error instead of swallowing it, and
in the shutdown path log each Close() failure (use the package logger such as
klog.Errorf or the server's logger variable) while continuing to attempt closing
remaining monitors; reference the monitors variable and the Monitor.Close() call
to locate and update both occurrences mentioned (around the current loops at the
two locations).

---

Nitpick comments:
In `@assets/components/ovn/single-node/master/daemonset.yaml`:
- Line 338: The echo log still says "--init-master ${K8S_NODE}" but the actual
command uses "--init-cluster-manager" and "--init-ovnkube-controller"; update
the echo string in the startup line (the echo call that currently prints
ovnkube-master - start ovnkube --init-master ${K8S_NODE} --init-node
${K8S_NODE}) so it accurately reflects the real flags used (e.g., mention
--init-cluster-manager and --init-ovnkube-controller with the appropriate
${K8S_NODE} placeholders) to keep logs consistent with the invocation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 66046a82-9303-4863-a2cb-b4afb24a7768

📥 Commits

Reviewing files that changed from the base of the PR and between 524e3da and 69f2c5a.

⛔ Files ignored due to path filters (7)
  • vendor/k8s.io/apiserver/pkg/server/options/etcd.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/apiserver/pkg/storage/etcd3/metrics/metrics.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/kubernetes/pkg/controller/volume/selinuxwarning/cache/volumecache.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/kubernetes/pkg/controller/volume/selinuxwarning/internal/parse/selinux_label.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/kubernetes/pkg/controller/volume/selinuxwarning/metrics.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/kubernetes/pkg/controller/volume/selinuxwarning/translator/selinux_translator.go is excluded by !**/vendor/**, !vendor/**
  • vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (35)
  • Makefile.kube_git.var
  • Makefile.version.aarch64.var
  • Makefile.version.x86_64.var
  • assets/components/multus/kustomization.aarch64.yaml
  • assets/components/multus/kustomization.x86_64.yaml
  • assets/components/multus/release-multus-aarch64.json
  • assets/components/multus/release-multus-x86_64.json
  • assets/components/ovn/multi-node/master/daemonset.yaml
  • assets/components/ovn/single-node/master/daemonset.yaml
  • assets/optional/operator-lifecycle-manager/kustomization.aarch64.yaml
  • assets/optional/operator-lifecycle-manager/kustomization.x86_64.yaml
  • assets/optional/operator-lifecycle-manager/release-olm-aarch64.json
  • assets/optional/operator-lifecycle-manager/release-olm-x86_64.json
  • assets/optional/sriov/kustomization.aarch64.yaml
  • assets/optional/sriov/kustomization.x86_64.yaml
  • assets/optional/sriov/release-sriov-aarch64.json
  • assets/optional/sriov/release-sriov-x86_64.json
  • assets/release/release-aarch64.json
  • assets/release/release-x86_64.json
  • deps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/cache/volumecache.go
  • deps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/cache/volumecache_test.go
  • deps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/internal/parse/selinux_label.go
  • deps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/internal/parse/selinux_label_test.go
  • deps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/metrics.go
  • deps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/selinux_warning_controller_test.go
  • deps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/translator/selinux_translator.go
  • deps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/options/etcd.go
  • deps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/options/etcd_test.go
  • deps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/storage/etcd3/metrics/metrics.go
  • packaging/crio.conf.d/10-microshift_amd64.conf
  • packaging/crio.conf.d/10-microshift_arm64.conf
  • scripts/auto-rebase/changelog.txt
  • scripts/auto-rebase/commits.txt
  • scripts/auto-rebase/last_rebase.sh
  • scripts/devenv-builder/configure-vm.sh
💤 Files with no reviewable changes (1)
  • deps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/storage/etcd3/metrics/metrics.go

Comment on lines 377 to +380
echo "I$(date "+%m%d %H:%M:%S.%N") - ovnkube-master - start ovnkube --init-master ${K8S_NODE}"
exec /usr/bin/ovnkube \
--init-master "${K8S_NODE}" \
--init-cluster-manager "${K8S_NODE}" \
--init-ovnkube-controller "${K8S_NODE}" \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update startup log to match actual OVN flags.

The log message still prints --init-master while the command now uses --init-cluster-manager and --init-ovnkube-controller. This can mislead debugging.

Suggested diff
-          echo "I$(date "+%m%d %H:%M:%S.%N") - ovnkube-master - start ovnkube --init-master ${K8S_NODE}"
+          echo "I$(date "+%m%d %H:%M:%S.%N") - ovnkube-master - start ovnkube --init-cluster-manager ${K8S_NODE} --init-ovnkube-controller ${K8S_NODE}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo "I$(date "+%m%d %H:%M:%S.%N") - ovnkube-master - start ovnkube --init-master ${K8S_NODE}"
exec /usr/bin/ovnkube \
--init-master "${K8S_NODE}" \
--init-cluster-manager "${K8S_NODE}" \
--init-ovnkube-controller "${K8S_NODE}" \
echo "I$(date "+%m%d %H:%M:%S.%N") - ovnkube-master - start ovnkube --init-cluster-manager ${K8S_NODE} --init-ovnkube-controller ${K8S_NODE}"
exec /usr/bin/ovnkube \
--init-cluster-manager "${K8S_NODE}" \
--init-ovnkube-controller "${K8S_NODE}" \
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@assets/components/ovn/multi-node/master/daemonset.yaml` around lines 377 -
380, The startup echo message is stale and references --init-master while the
actual command invokes /usr/bin/ovnkube with --init-cluster-manager and
--init-ovnkube-controller; update the echo in the ovnkube startup block (the
line that prints "ovnkube-master - start ovnkube --init-master ${K8S_NODE}") to
accurately list the flags being passed (e.g., --init-cluster-manager and
--init-ovnkube-controller) and include the ${K8S_NODE} variable so logs reflect
the real command invoked by /usr/bin/ovnkube.

Comment on lines +264 to +272
func newMonitorCache(factory serverstorage.StorageFactory, stopCh <-chan struct{}) (*monitorCache, error) {
if stopCh == nil {
return nil, fmt.Errorf("stopCh is required for monitor cache cleanup")
}
cache := &monitorCache{
factory: factory,
stopCh: stopCh,
}
return cache, nil

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't initialize monitors after drain has already started.

close() is only armed after the first successful initialize(). If c.DrainedNotify() is already closed on the first get(), this still creates monitors, returns them, and then closes them immediately in the background. Downstream metrics collection treats any successful getter result as live monitors and will call Monitor(ctx) on them, so this can turn shutdown into spurious monitor failures instead of a clean "monitor cache is closed" path.

Suggested fix
 func newMonitorCache(factory serverstorage.StorageFactory, stopCh <-chan struct{}) (*monitorCache, error) {
 	if stopCh == nil {
 		return nil, fmt.Errorf("stopCh is required for monitor cache cleanup")
 	}
 	cache := &monitorCache{
 		factory: factory,
 		stopCh:  stopCh,
 	}
+	go func() {
+		<-stopCh
+		cache.close()
+	}()
 	return cache, nil
 }
 
 func (c *monitorCache) initialize() ([]metrics.Monitor, error) {
 	c.mu.Lock()
 	defer c.mu.Unlock()
+
+	select {
+	case <-c.stopCh:
+		c.closed = true
+		return nil, fmt.Errorf("monitor cache is closed")
+	default:
+	}
 
 	if c.closed {
 		return nil, fmt.Errorf("monitor cache is closed")
 	}
 	if c.monitors != nil {
@@
 	}
 	c.monitors = monitors
-
-	go func() {
-		<-c.stopCh
-		c.close()
-	}()
 
 	return c.monitors, nil
 }

Also applies to: 293-320

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@deps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/options/etcd.go`
around lines 264 - 272, newMonitorCache/monitorCache currently allows creating
monitors even if the cache has already begun draining, causing monitors to be
returned and immediately closed which leads callers to treat them as live and
report failures; modify newMonitorCache and the monitor creation path (methods
initialize(), get(), and close()) so that before allocating/returning any
monitor it checks whether the cache is already drained (via DrainedNotify() or
the same drained flag used in close()) and if so returns an error or a
closed-indicator instead of creating a monitor; ensure initialize() does not arm
close() or spawn monitors when DrainedNotify() is closed and update get() to
consult the drained state atomically to avoid races so callers never receive a
monitor that will be closed immediately.

Comment on lines +308 to 310
for _, already := range monitors {
already.Close() //nolint:errcheck
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle Close() failures instead of suppressing them.

Both the rollback path and the shutdown path drop Monitor.Close() errors with //nolint:errcheck. That makes partial cleanup and leaked monitor clients invisible right where this change is taking ownership of monitor lifetime. Please aggregate/return rollback errors and at least log shutdown close failures.

As per coding guidelines, **/*.go: Never ignore error returns.

Also applies to: 332-334

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@deps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/options/etcd.go`
around lines 308 - 310, The code currently ignores errors from Monitor.Close()
when iterating over the monitors slice in both the rollback and shutdown paths;
change those sites (the loops handling monitors and the corresponding rollback
block) to collect and handle Close() errors: in the rollback path (where the
function returns) aggregate the Close() errors (using a multi-error pattern or
fmt.Errorf chaining) and return a combined error instead of swallowing it, and
in the shutdown path log each Close() failure (use the package logger such as
klog.Errorf or the server's logger variable) while continuing to attempt closing
remaining monitors; reference the monitors variable and the Monitor.Close() call
to locate and update both occurrences mentioned (around the current loops at the
two locations).

Source: Coding guidelines

@ggiguash

ggiguash commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 8, 2026
@openshift-ci

openshift-ci Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ggiguash, pmtk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci

openshift-ci Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

@pmtk: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 07d0112 into openshift:main Jun 8, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants