Skip to content

allow the status-section to be updated on a crm_shadow commit with an explicit switch#4106

Open
wenningerk wants to merge 2 commits into
ClusterLabs:mainfrom
wenningerk:modify_fence_scsi_devices
Open

allow the status-section to be updated on a crm_shadow commit with an explicit switch#4106
wenningerk wants to merge 2 commits into
ClusterLabs:mainfrom
wenningerk:modify_fence_scsi_devices

Conversation

@wenningerk
Copy link
Copy Markdown
Contributor

@wenningerk wenningerk commented May 7, 2026

Investigating why with Pacemaker 2.1.7 the high-level-tooling approach to prevent a resource restart as a result of a parameter change in the config by in parallel changing the digests in the status section wouldn't work anymore ...
Was barking at the wrong tree investigating the commit that changes how digests are calculated.
In parallel there was as well a commit in Pacemaker 2.1.7 ( ec65417) that removes the special handling of replacing the full cib and now triggers an election + join refresh whenever an “unsafe” client modifies the status section, overwriting the manually pushed digest before the scheduler can see it.
Added a ‘--commit-status’ option to crm_shadow that prevents this behavior (client will claim to be crm_shadow_status and this is added to the safe clients list).

@wenningerk wenningerk force-pushed the modify_fence_scsi_devices branch from 62b194c to a09fa0c Compare May 20, 2026 14:37
@wenningerk wenningerk changed the title [WIP] don't trigger restart of fence_scsi when devices are modified and start digest is adapted accordingly allow the status-section to be updated on a crm_shadow commit with an explicit switch May 20, 2026
Copy link
Copy Markdown
Contributor

@nrwahl2 nrwahl2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a few comments. Two of them are requests to improve maintainability. The third comment questions this approach more broadly.

I'll try to give this issue some further thought tonight as well.

Comment thread cts/cts-cli.in
"crm_shadow --diff",
expected_rc=ExitStatus.ERROR),
], cib_gen=partial(copy_existing_cib, f"{cts_cli_data}/crm_mon.xml")),
TestGroup([
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add these tests before the fix commit, with expected_rc set to an error? Then update the expected_rc and the output in the fix commit

"crm_attribute",
"crm_node",
"crm_resource",
"crm_shadow_status",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the commit message to reference RHEL-70283, and to explain why we're doing this (since future contributors may not be able to access this RHEL Jira issue).

We've encountered numerous issues when commit messages don't explain the reasoning for the changes in the commit. We often end up, much later, removing or changing behavior that we depend on. At best, we spend a long time trying to figure out why things are the way they are, and we're often not confident in the conclusions we reach.

Comment thread tools/crm_shadow.c
return;
}

if (options.update_status) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this would fix the issue. Based on RHEL-70283, it appears that the issue is that adding devices to a fence_scsi or fence_mpath resource using pcs stonith update causes resource restarts. However, pcs doesn't use crm_shadow at all. So updating the crm_shadow CLI tool doesn't seem like it would affect the behavior of pcs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants