Implement kexec reboot support for SSH mode #20

Open
aleksandrov-denis wants to merge 6 commits into rhkdump:main from
aleksandrov-denis:kexec-reboot-feature

aleksandrov-denis commented Apr 9, 2026

Summary

REBOOT_STRATEGY=kexec was documented and configurable but not implemented — do_kexec_reboot() was a
stub that always fell back to a full system reboot. This PR implements it for SSH mode, reducing
per-iteration reboot time from ~60s to ~18s.

Changes

Implement kexec support for SSH mode

Adds two helpers to lib.sh:

  • kexec_load_kernel(kernel_release) — validates that the vmlinuz and initramfs files exist on the test
    host, reads the current kernel command line from /proc/cmdline, and loads the new kernel into memory
    with kexec -l
  • kab_kexec() — executes the loaded kernel via the existing reboot_and_wait mechanism
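
The load step described above can be sketched roughly as follows. This is not the code from the PR, only a minimal illustration under stated assumptions: `run_cmd` is replaced by a local stand-in, the `/boot/vmlinuz-<release>` and `/boot/initramfs-<release>.img` paths are the usual RHEL/Fedora layout rather than something the PR confirms, and `BOOT_DIR` and `CMDLINE_FILE` are hypothetical overrides added so the sketch can be tried without touching a real system.

```shell
# Stand-in for the project's run_cmd helper. Assumption: the real one runs
# its arguments on the test host; this version simply runs them locally.
run_cmd() { "$@"; }

# Hypothetical sketch of kexec_load_kernel. BOOT_DIR and CMDLINE_FILE are
# illustrative overrides for trying it out, not settings from the PR.
kexec_load_kernel() {
    local release vmlinuz initrd cmdline f
    release=$1
    vmlinuz="${BOOT_DIR:-/boot}/vmlinuz-$release"
    initrd="${BOOT_DIR:-/boot}/initramfs-$release.img"

    # Refuse to load anything if either file is missing on the test host.
    for f in "$vmlinuz" "$initrd"; do
        if ! run_cmd test -f "$f"; then
            echo "kexec_load_kernel: $f not found" >&2
            return 1
        fi
    done

    # Reuse the running kernel's command line for the new kernel.
    cmdline=$(run_cmd cat "${CMDLINE_FILE:-/proc/cmdline}") || return 1

    # kexec -l only stages the kernel in memory; it does not reboot.
    run_cmd kexec -l "$vmlinuz" --initrd="$initrd" --append="$cmdline"
}
```

Because `kexec -l` only stages the kernel, a failed load leaves the running system untouched, which is what makes a fallback to a full reboot possible.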

Updates do_kexec_reboot() in reboot_handler.sh:

  • In SSH mode: loads the kernel with kexec_load_kernel, falls back to a full reboot if loading fails,
    otherwise executes with kab_kexec
  • In local/CRIU mode: preserves the existing fallback to full reboot, since kexec bypasses the normal
    boot sequence that the CRIU daemon relies on to restore the bisect process
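
The control flow described above might look something like the sketch below. The helper names (`log`, `kexec_load_kernel`, `kab_kexec`, `kab_reboot`, `TESTED_KERNEL`) come from this PR, but their bodies here are stand-ins so the snippet runs on its own, and `MODE` is a hypothetical variable name used only for illustration.

```shell
# Stand-ins so the sketch runs on its own; the real helpers live in the project.
log()               { echo "LOG: $*"; }
kab_reboot()        { echo "full reboot"; }
kab_kexec()         { echo "kexec reboot"; }
kexec_load_kernel() { [ "$1" != "missing" ]; }   # pretend "missing" fails to load

# Hypothetical shape of do_kexec_reboot as described in the summary.
# MODE is an illustrative variable name, not one taken from the project.
do_kexec_reboot() {
    log "Strategy: Performing kexec reboot (fast reboot)"

    if [ "$MODE" != "ssh" ]; then
        # local/CRIU mode: keep the pre-existing fallback.
        log "kexec not used in $MODE mode, falling back to full reboot"
        kab_reboot
        return
    fi

    if ! kexec_load_kernel "$TESTED_KERNEL"; then
        log "Falling back to full reboot"
        kab_reboot
        return
    fi

    kab_kexec
}
```

Both fallback branches end in the same `kab_reboot` call, so a bisect iteration still completes even when kexec cannot be used.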

install_from_rpm: redirect dnf output on the test host

Fixes a bug where dnf install output in install_from_rpm was redirected to /var/log/install.log via a
local shell redirect, which fails when the controller runs as a non-root user. The redirect is now
passed as an argument to run_cmd so it executes on the test host — consistent with how install_from_git
handles its build log.
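
The difference between the two redirect placements can be demonstrated without SSH. In this sketch `run_cmd` is a stand-in that hands its arguments to a shell running in a separate directory; `HOST_DIR` and `CONTROLLER_DIR` are hypothetical names that play the roles of the test host and the controller. The assumption is that the real `run_cmd` forwards its arguments to a shell on the test host in a comparable way.

```shell
# Stand-in for run_cmd. Assumption: the real helper forwards its arguments to
# a shell on the test host (e.g. via ssh). Here HOST_DIR plays the test host.
HOST_DIR=$(mktemp -d)
run_cmd() { ( cd "$HOST_DIR" && sh -c "$*" ); }

CONTROLLER_DIR=$(mktemp -d)
cd "$CONTROLLER_DIR" || exit 1

# Redirect outside the arguments: the controller's shell opens the file,
# so the log lands (or fails to open) on the controller.
run_cmd echo "installing" > local.log

# Redirect inside the arguments: the shell on the test host opens the file.
run_cmd "echo installing > remote.log"

ls "$CONTROLLER_DIR"   # local.log
ls "$HOST_DIR"         # remote.log
```

With a path like /var/log/install.log, the first form needs write access on the controller, which is why it fails for a non-root controller user.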

Testing

Tested end-to-end on RHEL 9.8 in SSH mode (INSTALL_STRATEGY=rpm, TEST_STRATEGY=simple,
REBOOT_STRATEGY=kexec) bisecting across three CentOS Stream 9 kernel versions (5.14.0-687/688/689):

  • kexec loaded the target kernel and rebooted in ~18s (vs ~60s for full reboot)
  • Test ran correctly on the rebooted kernel
  • Bisect correctly identified the first bad commit

Resolves #7

coiby force-pushed the main branch 4 times, most recently from 4ed3d67 to 6d12ccf on April 16, 2026 11:30
Comment thread handlers/reboot_handler.sh Outdated
    log "Strategy: Performing kexec reboot (fast reboot)"
    if ! kexec_load_kernel "$TESTED_KERNEL"; then
        log "Falling back to full reboot"
        kab_reboot
Member

If you want to extend criu-daemon.sh to process a kexec reboot request, I guess CRIU should work as well.

Member

Fedora doesn't install kexec-tools by default, so it needs to be installed as a dependency.

Comment thread lib.sh Outdated
        return 1
    fi

    cmdline=$(run_cmd cat /proc/cmdline)
Member

/proc/cmdline contains BOOT_IMAGE=(hd0,gpt3)/vmlinuz-6.17.1-300.fc43.x86_64. I'm not sure how it will affect kexec rebooting. I think kexec --reuse-cmdline should be more robust.
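
To make the concern concrete, here is a rough sketch of what filtering the command line by hand would involve if /proc/cmdline were kept. `strip_boot_image` is a hypothetical helper, not part of this PR. It splits on whitespace, so quoted values containing spaces or glob characters would need more care, which is one more argument for letting `kexec --reuse-cmdline` handle it.

```shell
# Hypothetical helper: drop any BOOT_IMAGE=... token from a kernel command
# line string. Splits on whitespace only; quoted values with spaces and
# tokens containing glob characters are not handled.
strip_boot_image() {
    local out tok
    out=""
    for tok in $1; do
        case $tok in
            BOOT_IMAGE=*) ;;                  # bootloader-specific, skip it
            *) out="${out:+$out }$tok" ;;
        esac
    done
    printf '%s\n' "$out"
}

# The alternative suggested above avoids manual parsing altogether
# (vmlinuz and initrd paths are placeholders):
#   kexec -l "$vmlinuz" --initrd="$initrd" --reuse-cmdline
```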

Member

coiby commented Apr 16, 2026

Hi @aleksandrov-denis

Thanks for opening this PR! Good to know the reboot speed has greatly improved. Besides the suggestions in the inline code comments, you could change one of the integration tests to use the kexec reboot strategy so that this feature is covered.

Btw, you may want to link this PR to #7.

@gemini-cli /review

Denis Aleksandrov and others added 5 commits April 23, 2026 18:36
do_kexec_reboot() was a stub that always fell back to a full reboot.
Implement it properly for SSH mode, where CRIU is not involved and
kexec is safe to use.

Add kexec_load_kernel() to lib.sh which validates that the kernel and
initramfs files exist, reads the current command line from
/proc/cmdline, and loads the new kernel into memory with kexec -l.
Add kab_kexec() which executes the loaded kernel via reboot_and_wait.

In local/CRIU mode, kexec bypasses the normal boot sequence that CRIU
relies on to restart the daemon, so the existing fallback to full
reboot is preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The dnf install output was being redirected to /var/log/install.log on
the controller machine (local shell redirect), which fails when the
controller runs as a non-root user. Pass the redirect as an argument to
run_cmd so it is evaluated on the test host, consistent with how
install_from_git handles its build log.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Two improvements based on review feedback:

Use kexec --reuse-cmdline instead of reading /proc/cmdline manually.
This avoids passing bootloader-specific parameters like BOOT_IMAGE= that
kexec does not need and which can vary across boot environments.

Extend kexec support to local/CRIU mode. kexec still performs a full
Linux boot, so the CRIU daemon restarts via cron and can restore the
bisect process normally. kab_kexec() now signals the daemon with a
"kexec" checkpoint type, which triggers kexec -e after checkpointing.
criu-daemon.sh is updated to recognise and allow kexec -e commands.
do_kexec_reboot() no longer special-cases CRIU mode.
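
One way to picture the daemon-side change is a small dispatch from checkpoint type to the command run after checkpointing. This is a sketch only: `post_checkpoint_cmd` is a hypothetical function name, the "kexec" type and `kexec -e` come from the commit message above, and the "reboot" type with `systemctl reboot` is an assumption added for contrast.

```shell
# Hypothetical dispatch: map a checkpoint type to the command the daemon
# would run once the checkpoint has been written. Unknown types are rejected.
post_checkpoint_cmd() {
    case $1 in
        kexec)  echo "kexec -e" ;;          # jump into the kernel staged by kexec -l
        reboot) echo "systemctl reboot" ;;  # assumed name for the full-reboot path
        *)      return 1 ;;
    esac
}
```

Rejecting unknown types keeps the daemon from executing arbitrary commands it was not written to handle.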

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Set REBOOT_STRATEGY="kexec" in the SSH integration test so that the
kexec code path gets exercised on every test run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kexec-tools was only installed as part of kdump setup, which only runs
when TEST_STRATEGY="panic". A user running REBOOT_STRATEGY="kexec" with
TEST_STRATEGY="simple" would get a missing kexec binary.

Install kexec-tools at the start of setup_kdump() whenever
REBOOT_STRATEGY="kexec" is set, independent of the test strategy.
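
The condition can be sketched as a small guard. `ensure_kexec_tools` is a hypothetical name, `run_cmd` is a local stand-in for the project's helper, and the check for an already-present `kexec` binary is an extra assumption that the commit message does not mention.

```shell
# Stand-in for the project's run_cmd helper (assumption: runs on the test host).
run_cmd() { "$@"; }

# Hypothetical guard: install kexec-tools only when the kexec reboot
# strategy is selected, regardless of TEST_STRATEGY.
ensure_kexec_tools() {
    [ "$REBOOT_STRATEGY" = "kexec" ] || return 0

    # Assumption: skip the install if a kexec binary is already on the host.
    if run_cmd command -v kexec >/dev/null 2>&1; then
        return 0
    fi

    run_cmd dnf install -y kexec-tools
}
```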

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@aleksandrov-denis
Author

All comments should be resolved with the latest changes, let me know what you think :))

Set REBOOT_STRATEGY="kexec" so the CRIU integration test exercises the
kexec+CRIU code path: the kernel is loaded via kexec -l and the CRIU
daemon checkpoints kab before executing kexec -e, then restores it
after the system comes back up.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Implement Kexec Support

2 participants