USHIFT-7176: Rebase 5.0 + updated OVN-K init flags#6829
Conversation
|
@pmtk: This pull request explicitly references no jira issue. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Obsoletes #6817 |
|
@pmtk: This pull request references USHIFT-7176 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughThis PR updates MicroShift to a newer OCP nightly build with three major change areas: refreshing container image digests and version pins across components, changing OVN master initialization flags, refactoring the volume cache from channel-based to slice-based conflict reporting, and optimizing etcd monitor lifecycle management with caching. ChangesNightly Release & Component Updates
Volume Cache & SELinux Warning Refactor
Etcd Monitor Cache Optimization
🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=warning msg="The linter 'gomodguard' is deprecated (since v2.12.0) due to: new major version. Replaced by gomodguard_v2." ... [truncated 31032 characters] ... elet: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/metrics: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/mount-utils: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/pod-security-admission: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/sample-apiserver: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/sample-cli-plugin: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/sample-controller: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n" Comment |
|
/verified by ci |
|
@pmtk: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
assets/components/ovn/single-node/master/daemonset.yaml (1)
338-338: ⚡ Quick winUpdate echo statement to reflect actual flags.
The log message still references
--init-master ${K8S_NODE}, but the command now uses--init-cluster-managerand--init-ovnkube-controller.📝 Suggested fix
- echo "I$(date "+%m%d %H:%M:%S.%N") - ovnkube-master - start ovnkube --init-master ${K8S_NODE} --init-node ${K8S_NODE}" + echo "I$(date "+%m%d %H:%M:%S.%N") - ovnkube-master - start ovnkube --init-cluster-manager ${K8S_NODE} --init-ovnkube-controller ${K8S_NODE} --init-node ${K8S_NODE}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/components/ovn/single-node/master/daemonset.yaml` at line 338, The echo log still says "--init-master ${K8S_NODE}" but the actual command uses "--init-cluster-manager" and "--init-ovnkube-controller"; update the echo string in the startup line (the echo call that currently prints ovnkube-master - start ovnkube --init-master ${K8S_NODE} --init-node ${K8S_NODE}) so it accurately reflects the real flags used (e.g., mention --init-cluster-manager and --init-ovnkube-controller with the appropriate ${K8S_NODE} placeholders) to keep logs consistent with the invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@assets/components/ovn/multi-node/master/daemonset.yaml`:
- Around line 377-380: The startup echo message is stale and references
--init-master while the actual command invokes /usr/bin/ovnkube with
--init-cluster-manager and --init-ovnkube-controller; update the echo in the
ovnkube startup block (the line that prints "ovnkube-master - start ovnkube
--init-master ${K8S_NODE}") to accurately list the flags being passed (e.g.,
--init-cluster-manager and --init-ovnkube-controller) and include the
${K8S_NODE} variable so logs reflect the real command invoked by
/usr/bin/ovnkube.
In
`@deps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/options/etcd.go`:
- Around line 264-272: newMonitorCache/monitorCache currently allows creating
monitors even if the cache has already begun draining, causing monitors to be
returned and immediately closed which leads callers to treat them as live and
report failures; modify newMonitorCache and the monitor creation path (methods
initialize(), get(), and close()) so that before allocating/returning any
monitor it checks whether the cache is already drained (via DrainedNotify() or
the same drained flag used in close()) and if so returns an error or a
closed-indicator instead of creating a monitor; ensure initialize() does not arm
close() or spawn monitors when DrainedNotify() is closed and update get() to
consult the drained state atomically to avoid races so callers never receive a
monitor that will be closed immediately.
- Around line 308-310: The code currently ignores errors from Monitor.Close()
when iterating over the monitors slice in both the rollback and shutdown paths;
change those sites (the loops handling monitors and the corresponding rollback
block) to collect and handle Close() errors: in the rollback path (where the
function returns) aggregate the Close() errors (using a multi-error pattern or
fmt.Errorf chaining) and return a combined error instead of swallowing it, and
in the shutdown path log each Close() failure (use the package logger such as
klog.Errorf or the server's logger variable) while continuing to attempt closing
remaining monitors; reference the monitors variable and the Monitor.Close() call
to locate and update both occurrences mentioned (around the current loops at the
two locations).
---
Nitpick comments:
In `@assets/components/ovn/single-node/master/daemonset.yaml`:
- Line 338: The echo log still says "--init-master ${K8S_NODE}" but the actual
command uses "--init-cluster-manager" and "--init-ovnkube-controller"; update
the echo string in the startup line (the echo call that currently prints
ovnkube-master - start ovnkube --init-master ${K8S_NODE} --init-node
${K8S_NODE}) so it accurately reflects the real flags used (e.g., mention
--init-cluster-manager and --init-ovnkube-controller with the appropriate
${K8S_NODE} placeholders) to keep logs consistent with the invocation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 66046a82-9303-4863-a2cb-b4afb24a7768
⛔ Files ignored due to path filters (7)
vendor/k8s.io/apiserver/pkg/server/options/etcd.gois excluded by!**/vendor/**,!vendor/**vendor/k8s.io/apiserver/pkg/storage/etcd3/metrics/metrics.gois excluded by!**/vendor/**,!vendor/**vendor/k8s.io/kubernetes/pkg/controller/volume/selinuxwarning/cache/volumecache.gois excluded by!**/vendor/**,!vendor/**vendor/k8s.io/kubernetes/pkg/controller/volume/selinuxwarning/internal/parse/selinux_label.gois excluded by!**/vendor/**,!vendor/**vendor/k8s.io/kubernetes/pkg/controller/volume/selinuxwarning/metrics.gois excluded by!**/vendor/**,!vendor/**vendor/k8s.io/kubernetes/pkg/controller/volume/selinuxwarning/translator/selinux_translator.gois excluded by!**/vendor/**,!vendor/**vendor/modules.txtis excluded by!**/vendor/**,!vendor/**
📒 Files selected for processing (35)
Makefile.kube_git.varMakefile.version.aarch64.varMakefile.version.x86_64.varassets/components/multus/kustomization.aarch64.yamlassets/components/multus/kustomization.x86_64.yamlassets/components/multus/release-multus-aarch64.jsonassets/components/multus/release-multus-x86_64.jsonassets/components/ovn/multi-node/master/daemonset.yamlassets/components/ovn/single-node/master/daemonset.yamlassets/optional/operator-lifecycle-manager/kustomization.aarch64.yamlassets/optional/operator-lifecycle-manager/kustomization.x86_64.yamlassets/optional/operator-lifecycle-manager/release-olm-aarch64.jsonassets/optional/operator-lifecycle-manager/release-olm-x86_64.jsonassets/optional/sriov/kustomization.aarch64.yamlassets/optional/sriov/kustomization.x86_64.yamlassets/optional/sriov/release-sriov-aarch64.jsonassets/optional/sriov/release-sriov-x86_64.jsonassets/release/release-aarch64.jsonassets/release/release-x86_64.jsondeps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/cache/volumecache.godeps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/cache/volumecache_test.godeps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/internal/parse/selinux_label.godeps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/internal/parse/selinux_label_test.godeps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/metrics.godeps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/selinux_warning_controller_test.godeps/github.com/openshift/kubernetes/pkg/controller/volume/selinuxwarning/translator/selinux_translator.godeps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/options/etcd.godeps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/options/etcd_test.godeps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/storage/etcd3/metrics/metrics.gopackaging/crio.conf.d/10-microshift_amd64.confpackaging/crio.conf.d/10-microshift_arm64.confscripts/auto-rebase/changelog.txtscripts/auto-rebase/commits.txtscripts/auto-rebase/last_rebase.shscripts/devenv-builder/configure-vm.sh
💤 Files with no reviewable changes (1)
- deps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/storage/etcd3/metrics/metrics.go
| echo "I$(date "+%m%d %H:%M:%S.%N") - ovnkube-master - start ovnkube --init-master ${K8S_NODE}" | ||
| exec /usr/bin/ovnkube \ | ||
| --init-master "${K8S_NODE}" \ | ||
| --init-cluster-manager "${K8S_NODE}" \ | ||
| --init-ovnkube-controller "${K8S_NODE}" \ |
There was a problem hiding this comment.
Update startup log to match actual OVN flags.
The log message still prints --init-master while the command now uses --init-cluster-manager and --init-ovnkube-controller. This can mislead debugging.
Suggested diff
- echo "I$(date "+%m%d %H:%M:%S.%N") - ovnkube-master - start ovnkube --init-master ${K8S_NODE}"
+ echo "I$(date "+%m%d %H:%M:%S.%N") - ovnkube-master - start ovnkube --init-cluster-manager ${K8S_NODE} --init-ovnkube-controller ${K8S_NODE}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| echo "I$(date "+%m%d %H:%M:%S.%N") - ovnkube-master - start ovnkube --init-master ${K8S_NODE}" | |
| exec /usr/bin/ovnkube \ | |
| --init-master "${K8S_NODE}" \ | |
| --init-cluster-manager "${K8S_NODE}" \ | |
| --init-ovnkube-controller "${K8S_NODE}" \ | |
| echo "I$(date "+%m%d %H:%M:%S.%N") - ovnkube-master - start ovnkube --init-cluster-manager ${K8S_NODE} --init-ovnkube-controller ${K8S_NODE}" | |
| exec /usr/bin/ovnkube \ | |
| --init-cluster-manager "${K8S_NODE}" \ | |
| --init-ovnkube-controller "${K8S_NODE}" \ |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@assets/components/ovn/multi-node/master/daemonset.yaml` around lines 377 -
380, The startup echo message is stale and references --init-master while the
actual command invokes /usr/bin/ovnkube with --init-cluster-manager and
--init-ovnkube-controller; update the echo in the ovnkube startup block (the
line that prints "ovnkube-master - start ovnkube --init-master ${K8S_NODE}") to
accurately list the flags being passed (e.g., --init-cluster-manager and
--init-ovnkube-controller) and include the ${K8S_NODE} variable so logs reflect
the real command invoked by /usr/bin/ovnkube.
| func newMonitorCache(factory serverstorage.StorageFactory, stopCh <-chan struct{}) (*monitorCache, error) { | ||
| if stopCh == nil { | ||
| return nil, fmt.Errorf("stopCh is required for monitor cache cleanup") | ||
| } | ||
| cache := &monitorCache{ | ||
| factory: factory, | ||
| stopCh: stopCh, | ||
| } | ||
| return cache, nil |
There was a problem hiding this comment.
Don't initialize monitors after drain has already started.
close() is only armed after the first successful initialize(). If c.DrainedNotify() is already closed on the first get(), this still creates monitors, returns them, and then closes them immediately in the background. Downstream metrics collection treats any successful getter result as live monitors and will call Monitor(ctx) on them, so this can turn shutdown into spurious monitor failures instead of a clean "monitor cache is closed" path.
Suggested fix
func newMonitorCache(factory serverstorage.StorageFactory, stopCh <-chan struct{}) (*monitorCache, error) {
if stopCh == nil {
return nil, fmt.Errorf("stopCh is required for monitor cache cleanup")
}
cache := &monitorCache{
factory: factory,
stopCh: stopCh,
}
+ go func() {
+ <-stopCh
+ cache.close()
+ }()
return cache, nil
}
func (c *monitorCache) initialize() ([]metrics.Monitor, error) {
c.mu.Lock()
defer c.mu.Unlock()
+
+ select {
+ case <-c.stopCh:
+ c.closed = true
+ return nil, fmt.Errorf("monitor cache is closed")
+ default:
+ }
if c.closed {
return nil, fmt.Errorf("monitor cache is closed")
}
if c.monitors != nil {
@@
}
c.monitors = monitors
-
- go func() {
- <-c.stopCh
- c.close()
- }()
return c.monitors, nil
}Also applies to: 293-320
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@deps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/options/etcd.go`
around lines 264 - 272, newMonitorCache/monitorCache currently allows creating
monitors even if the cache has already begun draining, causing monitors to be
returned and immediately closed which leads callers to treat them as live and
report failures; modify newMonitorCache and the monitor creation path (methods
initialize(), get(), and close()) so that before allocating/returning any
monitor it checks whether the cache is already drained (via DrainedNotify() or
the same drained flag used in close()) and if so returns an error or a
closed-indicator instead of creating a monitor; ensure initialize() does not arm
close() or spawn monitors when DrainedNotify() is closed and update get() to
consult the drained state atomically to avoid races so callers never receive a
monitor that will be closed immediately.
| for _, already := range monitors { | ||
| already.Close() //nolint:errcheck | ||
| } |
There was a problem hiding this comment.
Handle Close() failures instead of suppressing them.
Both the rollback path and the shutdown path drop Monitor.Close() errors with //nolint:errcheck. That makes partial cleanup and leaked monitor clients invisible right where this change is taking ownership of monitor lifetime. Please aggregate/return rollback errors and at least log shutdown close failures.
As per coding guidelines, **/*.go: Never ignore error returns.
Also applies to: 332-334
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@deps/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/options/etcd.go`
around lines 308 - 310, The code currently ignores errors from Monitor.Close()
when iterating over the monitors slice in both the rollback and shutdown paths;
change those sites (the loops handling monitors and the corresponding rollback
block) to collect and handle Close() errors: in the rollback path (where the
function returns) aggregate the Close() errors (using a multi-error pattern or
fmt.Errorf chaining) and return a combined error instead of swallowing it, and
in the shutdown path log each Close() failure (use the package logger such as
klog.Errorf or the server's logger variable) while continuing to attempt closing
remaining monitors; reference the monitors variable and the Monitor.Close() call
to locate and update both occurrences mentioned (around the current loops at the
two locations).
Source: Coding guidelines
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ggiguash, pmtk The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@pmtk: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary by CodeRabbit
Chores
Bug Fixes
Refactor