fuzzing: Improve testcase isolation by draining the IPC staging queue by tmleman · Pull Request #10839 · thesofproject/sof

tmleman · 2026-06-03T15:48:23Z

Improve fuzz testcase isolation by draining the IPC staging queue between libFuzzer calls and aborting stale state when the tick budget is exhausted, fixing non-reproducible crashes caused by inter-testcase state leakage.

Copilot

Pull request overview

This PR improves libFuzzer testcase isolation for the Zephyr POSIX simulator fuzz harness by ensuring staged IPC input is drained between libFuzzer calls and by explicitly aborting leftover staged state when the simulator tick budget is exhausted, reducing non-reproducible crashes due to cross-testcase state leakage.

Changes:

Added IPC-layer helpers to reset/observe/abort staged fuzz input state between testcases.
Updated LLVMFuzzerTestOneInput() to run the simulator in small time quanta, exiting early once staged input is drained, and aborting pending state when the time budget is exhausted.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`src/platform/posix/ipc.c`	Adds testcase-isolation helper APIs that reset/observe/abort the IPC staging buffer state used by the fuzz interrupt handler.
`src/platform/posix/fuzz.c`	Implements a bounded “drain-or-abort” loop around `nsi_exec_for()` and calls the new IPC-layer helpers to isolate testcases.

+/*
+ * Testcase-isolation helpers used by the libFuzzer entry point in
+ * fuzz.c. They keep ownership of the cross-call state in one module
+ * so a new testcase never observes leftovers from a previous one that
+ * failed to drain inside the simulator tick budget.
+ */
+void posix_fuzz_case_begin(void)
+{
+	fuzz_in_sz = 0;
+}
+
+bool posix_fuzz_case_pending(void)
+{
+	return posix_fuzz_sz != 0 || fuzz_in_sz != 0;
+}
+
+void posix_fuzz_case_abort(void)
+{
+	posix_fuzz_sz = 0;
+	fuzz_in_sz = 0;
+}


The libFuzzer entry point in fuzz.c stages each testcase by writing posix_fuzz_buf/sz and raising the fuzz IRQ; fuzz_isr() then drains those bytes into the static fuzz_in[] queue and feeds them into the IPC layer one message at a time.

Two pieces of state therefore survive across LLVMFuzzerTestOneInput() calls:

* `posix_fuzz_sz` - the raw input length still to consume,
* `fuzz_in[] / _sz` - the per-call staging queue.

The fuzzer harness has no way to inspect either of them today, which makes it impossible to tell whether a previous testcase fully drained before the next one begins. That is the root cause of the "not reproducible" crashes documented in FUZZER_ISOLATION_RESEARCH.md.

Introduce three small helpers, kept in the module that owns the state, with no callers yet:

* posix_fuzz_case_begin() - drop the staging queue at the start of a new testcase,
* posix_fuzz_case_pending() - true while either buffer still has bytes to deliver,
* posix_fuzz_case_abort() - wipe both buffers (used when a case exceeds the simulator tick budget).

A follow-up commit wires these into LLVMFuzzerTestOneInput(). This commit is a pure code-addition refactor: no callers, no behaviour change, the build still emits the same object code for the existing entry points.

Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
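To make the leak described above concrete, here is a minimal, self-contained model of the two pieces of cross-call state and the three helpers. It is a sketch only: `model_stage()`, `model_isr_step()` and `MODEL_QUEUE_MAX` are hypothetical stand-ins invented for illustration, and the real fuzz_isr() in ipc.c may move and consume bytes quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_QUEUE_MAX 64 /* assumed queue size, not taken from the PR */

static const uint8_t *posix_fuzz_buf;    /* raw input staged by the harness */
static size_t posix_fuzz_sz;             /* raw bytes still to consume */
static uint8_t fuzz_in[MODEL_QUEUE_MAX]; /* staging queue filled by the ISR */
static size_t fuzz_in_sz;                /* bytes currently queued */

/* Hypothetical: what the harness does when it stages a testcase. */
void model_stage(const uint8_t *data, size_t sz)
{
	assert(data != NULL || sz == 0);
	posix_fuzz_buf = data;
	posix_fuzz_sz = sz;
}

/* Hypothetical ISR step: move up to n raw bytes into the staging queue. */
void model_isr_step(size_t n)
{
	size_t room = MODEL_QUEUE_MAX - fuzz_in_sz;

	if (n > posix_fuzz_sz)
		n = posix_fuzz_sz;
	if (n > room)
		n = room;
	if (n == 0)
		return;

	memcpy(&fuzz_in[fuzz_in_sz], posix_fuzz_buf, n);
	fuzz_in_sz += n;
	posix_fuzz_buf += n;
	posix_fuzz_sz -= n;
}

/* The three helpers, with the same bodies as in the diff above. */
void posix_fuzz_case_begin(void)
{
	fuzz_in_sz = 0;
}

bool posix_fuzz_case_pending(void)
{
	return posix_fuzz_sz != 0 || fuzz_in_sz != 0;
}

void posix_fuzz_case_abort(void)
{
	posix_fuzz_sz = 0;
	fuzz_in_sz = 0;
}
```

In this model, staging a 4-byte input and running `model_isr_step(3)` leaves one raw byte and three queued bytes behind, so `posix_fuzz_case_pending()` stays true until `posix_fuzz_case_abort()` is called. Without such a hook, the next testcase would start on top of those leftovers.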

The libFuzzer harness used to stage the testcase bytes, raise the fuzz IRQ, and then unconditionally run the native_sim scheduler for CONFIG_ZEPHYR_POSIX_FUZZ_TICKS ticks before returning. That has two problems for reproducibility:

* If the OS finishes draining the IPC much faster than the tick budget (the common case), we still burn the full budget, which slows exec/s without buying any coverage.
* If the OS does NOT finish within the budget (deep handlers, long pipeline walks, large payloads), the staged input buffer plus the per-call fuzz_in[] queue carry over into the next testcase. That leaks state across cases and is the root cause of crashes that disappear when replayed individually.

Split the budget into POSIX_FUZZ_DRAIN_QUANTA (=8) quanta and after each one ask the IPC layer whether anything is still pending; return as soon as the queue is empty, otherwise run the abort hook to wipe both the raw fuzz buffer and the staged IPC payload before the next call.

Together with the hooks added in the previous commit this guarantees that LLVMFuzzerTestOneInput observes a clean staging state on entry regardless of what the previous case did. No protocol or coverage change is intended; the goal is reproducible crashes and slightly higher throughput on short inputs.

Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
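The drain-or-abort loop described above can be sketched as follows. This is an illustrative model, not the code from fuzz.c: `model_exec_for()` is a hypothetical stand-in for `nsi_exec_for()` that retires one unit of pending work per simulated microsecond, and the even split of the budget (budget divided by `POSIX_FUZZ_DRAIN_QUANTA`) is an assumption about how the quanta are sized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POSIX_FUZZ_DRAIN_QUANTA 8

static uint64_t model_pending_work;   /* work the simulated OS still has to drain */
static unsigned int model_quanta_run; /* quanta executed for the current case */

/* Hypothetical stand-in for nsi_exec_for(): retires one unit per microsecond. */
void model_exec_for(uint64_t us)
{
	model_pending_work -= (us < model_pending_work) ? us : model_pending_work;
	model_quanta_run++;
}

/* Stand-ins for posix_fuzz_case_pending() and posix_fuzz_case_abort(). */
bool model_case_pending(void)
{
	return model_pending_work != 0;
}

void model_case_abort(void)
{
	model_pending_work = 0;
}

unsigned int model_quanta_used(void)
{
	return model_quanta_run;
}

/*
 * Run one modelled testcase. Returns true if the work drained inside the
 * budget, false if the budget ran out and the leftovers were aborted.
 */
bool model_run_case(uint64_t work, uint64_t budget_us)
{
	uint64_t quantum_us = budget_us / POSIX_FUZZ_DRAIN_QUANTA;

	assert(budget_us > 0);
	if (quantum_us == 0)
		quantum_us = 1;

	model_pending_work = work;
	model_quanta_run = 0;

	for (int i = 0; i < POSIX_FUZZ_DRAIN_QUANTA; i++) {
		model_exec_for(quantum_us);
		if (!model_case_pending())
			return true; /* drained early: skip the rest of the budget */
	}

	model_case_abort(); /* budget exhausted: wipe state before the next case */
	return false;
}
```

With a budget of 800 simulated microseconds, a case with 10 units of work returns after the first quantum, while a case with 1000 units uses all eight quanta and is then aborted. Either way the next call starts with nothing pending, which is the property the commit message is after.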

Copilot AI review requested due to automatic review settings June 3, 2026 15:48

tmleman requested review from dbaluta, kv2019i, lbetlej, lgirdwood, mmaka1 and plbossart as code owners June 3, 2026 15:48

Copilot started reviewing on behalf of tmleman June 3, 2026 15:48

Copilot AI reviewed Jun 3, 2026

tmleman added 2 commits June 3, 2026 19:08

tmleman wants to merge 2 commits into thesofproject:main from tmleman:topic/upstream/pr/fuzzing/enhancement/part3
