fuzzing: Improve testcase isolation by draining the IPC staging queue #10839

Open
tmleman wants to merge 2 commits into thesofproject:main from tmleman:topic/upstream/pr/fuzzing/enhancement/part3

@tmleman tmleman commented Jun 3, 2026

Improve fuzz testcase isolation by draining the IPC staging queue between libFuzzer calls and aborting stale state when the tick budget is exhausted, fixing non-reproducible crashes caused by inter-testcase state leakage.

Copilot AI review requested due to automatic review settings June 3, 2026 15:48

Copilot AI left a comment

Pull request overview

This PR improves libFuzzer testcase isolation for the Zephyr POSIX simulator fuzz harness by ensuring staged IPC input is drained between libFuzzer calls and by explicitly aborting leftover staged state when the simulator tick budget is exhausted, reducing non-reproducible crashes due to cross-testcase state leakage.

Changes:

  • Added IPC-layer helpers to reset/observe/abort staged fuzz input state between testcases.
  • Updated LLVMFuzzerTestOneInput() to run the simulator in small time quanta, exiting early once staged input is drained, and aborting pending state when the time budget is exhausted.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/platform/posix/ipc.c | Adds testcase-isolation helper APIs that reset/observe/abort the IPC staging buffer state used by the fuzz interrupt handler. |
| src/platform/posix/fuzz.c | Implements a bounded “drain-or-abort” loop around nsi_exec_for() and calls the new IPC-layer helpers to isolate testcases. |

Comment thread src/platform/posix/ipc.c
Comment on lines +38 to +58
/*
 * Testcase-isolation helpers used by the libFuzzer entry point in
 * fuzz.c. They keep ownership of the cross-call state in one module
 * so a new testcase never observes leftovers from a previous one that
 * failed to drain inside the simulator tick budget.
 */
void posix_fuzz_case_begin(void)
{
	fuzz_in_sz = 0;
}

bool posix_fuzz_case_pending(void)
{
	return posix_fuzz_sz != 0 || fuzz_in_sz != 0;
}

void posix_fuzz_case_abort(void)
{
	posix_fuzz_sz = 0;
	fuzz_in_sz = 0;
}
tmleman added 2 commits June 3, 2026 19:08
The libFuzzer entry point in fuzz.c stages each testcase by writing
posix_fuzz_buf/sz and raising the fuzz IRQ; fuzz_isr() then drains
those bytes into the static fuzz_in[] queue and feeds them into the
IPC layer one message at a time. Two pieces of state therefore
survive across LLVMFuzzerTestOneInput() calls:

  * `posix_fuzz_sz`     - the raw input length still to consume,
  * `fuzz_in[] / _sz`   - the per-call staging queue.

The fuzzer harness has no way to inspect either of them today, which
makes it impossible to tell whether a previous testcase fully
drained before the next one begins. That is the root cause of the
"not reproducible" crashes documented in
FUZZER_ISOLATION_RESEARCH.md.

Introduce three small helpers, kept in the module that owns the
state, with no callers yet:

  posix_fuzz_case_begin()   - drop the staging queue at the start of
                              a new testcase,
  posix_fuzz_case_pending() - true while either buffer still has
                              bytes to deliver,
  posix_fuzz_case_abort()   - wipe both buffers (used when a case
                              exceeds the simulator tick budget).

A follow-up commit wires these into LLVMFuzzerTestOneInput(). This
commit is a pure code-addition refactor: no callers, no behaviour
change, the build still emits the same object code for the existing
entry points.

Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
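
The leak described in the commit message above can be reproduced in a small standalone model. The sketch below is not the code from ipc.c: the two counters only track byte counts, and `mock_isr_pass()` and `stale_bytes_after_slow_case()` are hypothetical names introduced for illustration. Only the three `posix_fuzz_case_*()` helpers mirror the bodies quoted in the review thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the state owned by ipc.c (byte counts only). */
static size_t posix_fuzz_sz; /* raw input length still to consume */
static size_t fuzz_in_sz;    /* bytes sitting in the staging queue */

void posix_fuzz_case_begin(void)
{
	fuzz_in_sz = 0;
}

bool posix_fuzz_case_pending(void)
{
	return posix_fuzz_sz != 0 || fuzz_in_sz != 0;
}

void posix_fuzz_case_abort(void)
{
	posix_fuzz_sz = 0;
	fuzz_in_sz = 0;
}

/*
 * Hypothetical model of one fuzz ISR pass: move the raw input into the
 * staging queue, then deliver at most max_deliver bytes to the IPC layer.
 */
static void mock_isr_pass(size_t max_deliver)
{
	fuzz_in_sz += posix_fuzz_sz;
	posix_fuzz_sz = 0;
	fuzz_in_sz -= (fuzz_in_sz < max_deliver) ? fuzz_in_sz : max_deliver;
}

/* Returns the number of stale bytes a second testcase would inherit. */
size_t stale_bytes_after_slow_case(bool use_helpers)
{
	posix_fuzz_sz = 0;
	fuzz_in_sz = 0;

	/* Testcase 1: 32 bytes staged, only 20 delivered within the budget. */
	posix_fuzz_sz = 32;
	mock_isr_pass(20);

	if (use_helpers && posix_fuzz_case_pending()) {
		posix_fuzz_case_abort();
		assert(!posix_fuzz_case_pending());
	}

	/* Testcase 2 begins here. */
	if (use_helpers)
		posix_fuzz_case_begin();

	return fuzz_in_sz;
}
```

In this model `stale_bytes_after_slow_case(false)` returns 12, the leftover from the first testcase, while `stale_bytes_after_slow_case(true)` returns 0.
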
The libFuzzer harness used to stage the testcase bytes, raise the
fuzz IRQ, and then unconditionally run the native_sim scheduler for
CONFIG_ZEPHYR_POSIX_FUZZ_TICKS ticks before returning. That has two
problems for reproducibility:

  * If the OS finishes draining the IPC much faster than the tick
    budget (the common case), we still burn the full budget, which
    slows exec/s without buying any coverage.
  * If the OS does NOT finish within the budget (deep handlers, long
    pipeline walks, large payloads), the staged input buffer plus
    the per-call fuzz_in[] queue carry over into the next testcase.
    That leaks state across cases and is the root cause of crashes
    that disappear when replayed individually.

Split the budget into POSIX_FUZZ_DRAIN_QUANTA (=8) quanta and after
each one ask the IPC layer whether anything is still pending; return
as soon as the queue is empty, otherwise run the abort hook to wipe
both the raw fuzz buffer and the staged IPC payload before the next
call. Together with the hooks added in the previous commit this
guarantees that LLVMFuzzerTestOneInput observes a clean staging
state on entry regardless of what the previous case did.

No protocol or coverage change is intended; the goal is reproducible
crashes and slightly higher throughput on short inputs.

Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
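
The drain-or-abort loop described in the commit message above could look roughly like the following. This is a sketch under stated assumptions, not the code from fuzz.c: `mock_exec_for()` stands in for `nsi_exec_for()`, `MOCK_BUDGET_US` and `MOCK_US_PER_BYTE` are invented constants, and `run_testcase_sketch()` returns the number of quanta used (or -1 on abort) purely so the behaviour is observable, whereas the real `LLVMFuzzerTestOneInput()` has a different signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POSIX_FUZZ_DRAIN_QUANTA 8
#define MOCK_BUDGET_US   8000u /* assumed stand-in for the tick budget */
#define MOCK_US_PER_BYTE 100u  /* assumed delivery cost in this mock */

/* Simplified stand-ins for the state owned by ipc.c (byte counts only). */
static size_t posix_fuzz_sz;
static size_t fuzz_in_sz;

void posix_fuzz_case_begin(void)
{
	fuzz_in_sz = 0;
}

bool posix_fuzz_case_pending(void)
{
	return posix_fuzz_sz != 0 || fuzz_in_sz != 0;
}

void posix_fuzz_case_abort(void)
{
	posix_fuzz_sz = 0;
	fuzz_in_sz = 0;
}

/* Mock simulator step: stage the raw input, then deliver what time allows. */
static void mock_exec_for(uint64_t us)
{
	size_t can_deliver = (size_t)(us / MOCK_US_PER_BYTE);

	fuzz_in_sz += posix_fuzz_sz;
	posix_fuzz_sz = 0;
	fuzz_in_sz -= (fuzz_in_sz < can_deliver) ? fuzz_in_sz : can_deliver;
}

/* Returns the quanta used when the input drains, or -1 if it was aborted. */
int run_testcase_sketch(size_t sz)
{
	const uint64_t quantum_us = MOCK_BUDGET_US / POSIX_FUZZ_DRAIN_QUANTA;
	int used = 0;

	posix_fuzz_case_begin();
	/* The real harness also sets posix_fuzz_buf and raises the fuzz IRQ. */
	posix_fuzz_sz = sz;

	while (used < POSIX_FUZZ_DRAIN_QUANTA) {
		mock_exec_for(quantum_us);
		used++;
		if (!posix_fuzz_case_pending())
			return used; /* drained early: skip the rest of the budget */
	}

	/* Budget exhausted with input still pending: wipe it before returning. */
	posix_fuzz_case_abort();
	assert(!posix_fuzz_case_pending());
	return -1;
}
```

With these mock constants each quantum delivers 10 bytes, so `run_testcase_sketch(16)` returns 2 (six quanta saved) and `run_testcase_sketch(200)` returns -1 after wiping the 120 undelivered bytes. Either way the next call starts from an empty staging state.
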