Tamper-evident append-only fault audit log by mfaferek93 · Pull Request #487 · selfpatch/ros2_medkit

mfaferek93 · 2026-06-30T17:24:29Z

Closes #483

Optional (off by default) append-only, hash-chained audit log of fault state transitions in the fault manager: one immutable row per transition, sha256(prev_hash + canonical(event)) with a persisted chain head, a verify routine, retention that seals a segment anchor before pruning, and append-only triggers.

Threat model is documented honestly: the chain is unkeyed and lives in a single writable file, so it detects non-recomputing tampering and accidental loss; a key/signature or external anchoring is a follow-up.

Tested: 121 unit tests incl. truncation and tamper cases.

Copilot

Pull request overview

This PR adds an optional, tamper-evident (hash-chained) append-only audit log to ros2_medkit_fault_manager to make fault state-transition history independently verifiable (detecting non-recomputing edits/deletions), with configurable retention/rotation and test coverage for tamper/truncation scenarios.

Changes:

Introduces FaultAuditLog (SQLite-backed) with hash chaining, persisted head, verify routine, and sealed-anchor retention pruning.
Hooks the audit log into FaultManagerNode for write-path transitions (including timer-driven PREFAILED→CONFIRMED auto-confirmations) and adds/updates unit tests accordingly.
Updates dependencies/build/docs/config/changelog to include OpenSSL (EVP SHA-256) and document the threat model and parameters.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/ros2_medkit_fault_manager/src/fault_audit_log.cpp	Implements the SQLite-backed append-only, hash-chained audit log with verify + rotation/anchors.
src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/fault_audit_log.hpp	Public API/types for audit events, records, head, and verification results.
src/ros2_medkit_fault_manager/src/fault_manager_node.cpp	Creates/configures the audit log and appends records on fault transitions, including timer auto-confirmation.
src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/fault_manager_node.hpp	Declares audit-log plumbing and exposes test-only accessors used by new tests.
src/ros2_medkit_fault_manager/src/sqlite_fault_storage.cpp	Changes time-based confirmation to return confirmed fault codes for auditing.
src/ros2_medkit_fault_manager/src/fault_storage.cpp	Updates in-memory storage to return confirmed fault codes for auditing.
src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/fault_storage.hpp	Updates the `check_time_based_confirmation()` contract to return confirmed fault codes.
src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/sqlite_fault_storage.hpp	Updates override signature for the new confirmation return type.
src/ros2_medkit_fault_manager/test/test_fault_audit_log.cpp	Adds focused unit tests for append/verify/tamper/truncation/rotation and trigger enforcement.
src/ros2_medkit_fault_manager/test/test_fault_manager.cpp	Adds end-to-end node tests verifying audit logging (enabled/disabled/timer-confirm paths).
src/ros2_medkit_fault_manager/test/test_sqlite_storage.cpp	Updates tests to match new confirmation return type (vector of fault codes).
src/ros2_medkit_fault_manager/CMakeLists.txt	Links OpenSSL crypto, builds the new audit-log implementation and test target.
src/ros2_medkit_fault_manager/package.xml	Adds OpenSSL development dependency (`libssl-dev`).
src/ros2_medkit_fault_manager/config/fault_manager.yaml	Documents audit-log parameters and threat model in the default config.
src/ros2_medkit_fault_manager/README.md	Documents audit-log behavior, parameters, rotation, and threat model.
src/ros2_medkit_fault_manager/CHANGELOG.rst	Adds a forthcoming entry describing the new audit-log feature and behavior.

bburda

Additional findings outside the diff:

The audit log has no runtime read/verify/health surface. verify(), read(), head(), record_count(), and the dropped-writes / audit-healthy counters are reachable only as C++ getters (the test-only accessor); no runtime path calls verify(), and nothing exposes the log or its health outside the process. The README (line 119, and the fail_closed row of the parameter table) presents the dropped-writes counter as the safety net for the completeness gap, but an operator has no way to observe audit health or run verify() without stopping the node and using an offline tool that reimplements canonicalize(). The history and its verify/health result should be reachable over the gateway API (a read plus a verify/health resource), with an optional fault-manager ROS service for verify/health, so the log is observable at runtime.

mfaferek93 · 2026-07-01T19:06:43Z

Agreed, and this is intentional for this PR. #487 lands only the write-time primitive: the append-only hash chain plus the in-process verify() / read() / head APIs. Surfacing the log and its verify/health at runtime (a read plus a verify/health resource over the gateway API, and optionally a fault-manager service) is deliberately out of scope here and tracked as follow-up work.

I'll also tighten the README so the dropped-writes / audit-healthy signals aren't presented as operator-observable at this stage (they are in-process only until that runtime surface lands), and I'll address the concrete correctness findings in the diff in a follow-up commit.

Append-only, hash-chained audit log of fault state transitions in the fault manager. Each transition appends one immutable row with record_hash = sha256(prev_hash + canonical(event)) (OpenSSL EVP SHA-256), a persisted chain head that resumes across restarts, a verify routine, a read API, and retention that seals a segment anchor before pruning so the surviving tail stays verifiable. Off by default. Refs #483

verify() now reads audit_chain_head directly and fails when the head row is missing on a non-empty log, so deleting the newest row plus the head is caught. Timer-driven PREFAILED->CONFIRMED confirmations now call audit_transition. Adds append-only triggers (hardening, not a security boundary) and corrects the unkeyed/single-file threat model in the docs. Refs #483

raw_exec_rc no longer runs sqlite3_exec on a failed sqlite3_open; it records the failure, closes the handle and returns early. The audit-log enabled RCLCPP_INFO now formats int64_t retention/seq with PRId64 (<cinttypes>) to satisfy -Werror=format=2. Refs #483

Record auto-recovery as a distinct "healed" row (source auto_heal) so a fault's end is in the chain. Append failures now bump a dropped-writes health counter and clear an audit-healthy flag; audit_log.fail_closed makes them a hard error. Drop the never-written "ack" kind (clear == ack), protect audit_prune_guard with a trigger, add logging activate/deactivate markers, and tighten the threat-model README. Refs #483

canonicalize uses a non-throwing dump so non-UTF-8 content never breaks append; the sqlite handle is RAII so a throwing ctor cannot leak; rotation is best-effort after commit and never fails a durable append; anchors are pruned to the current boundary and guard-gated. Docs: fail_closed is not a rollback, health signals are in-process only, tail/prefix truncations are cheap; dropped the CIR citation. Refs #483

sqlite3_column_text is NUL-terminated, so std::string(text) truncated audit content at the first embedded NUL while bind_text stored the full bytes, making verify() falsely report a record_hash mismatch on untampered records. Read with sqlite3_column_bytes so the write/read round-trip stays byte-exact and the chain still verifies. Refs #483

Assert exactly one healed row under the HEALED latch, no duplicate confirmed rows on latched re-confirmation, and a healing_threshold == 0 single-PASSED heal audited exactly once. Refs #483

reclassify_healed_as_cleared() now returns the flipped fault codes on both backends so the node can audit each transition (source startup_reclassify) instead of silently mutating storage. Refs #483

Copilot AI review requested due to automatic review settings June 30, 2026 17:24

Copilot started reviewing on behalf of mfaferek93 June 30, 2026 17:24 View session

mfaferek93 marked this pull request as draft June 30, 2026 17:30

Copilot AI reviewed Jun 30, 2026

View reviewed changes

Comment thread src/ros2_medkit_fault_manager/test/test_fault_audit_log.cpp

Comment thread src/ros2_medkit_fault_manager/src/fault_manager_node.cpp Outdated

mfaferek93 self-assigned this Jun 30, 2026

mfaferek93 force-pushed the feat/fault-audit-log branch from dc7b64e to 835b862 Compare June 30, 2026 18:01

mfaferek93 requested a review from bburda July 1, 2026 16:49

mfaferek93 marked this pull request as ready for review July 1, 2026 16:49

bburda reviewed Jul 1, 2026

View reviewed changes

mfaferek93 added 8 commits July 1, 2026 23:08

style: apply clang-format-18

d0b5124

test(fault-audit): cover latch and single-event heal audit semantics

3aeb2ff

Assert exactly one healed row under the HEALED latch, no duplicate confirmed rows on latched re-confirmation, and a healing_threshold == 0 single-PASSED heal audited exactly once. Refs #483

mfaferek93 force-pushed the feat/fault-audit-log branch from 6579aaf to 3aeb2ff Compare July 1, 2026 21:14

fix(fault-audit): audit startup HEALED->CLEARED reclassification

92ef4b3

reclassify_healed_as_cleared() now returns the flipped fault codes on both backends so the node can audit each transition (source startup_reclassify) instead of silently mutating storage. Refs #483

mfaferek93 requested a review from bburda July 2, 2026 06:08

bburda approved these changes Jul 2, 2026

View reviewed changes

mfaferek93 merged commit fdeb89b into main Jul 2, 2026
17 of 18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Tamper-evident append-only fault audit log#487

Tamper-evident append-only fault audit log#487
mfaferek93 merged 9 commits into
mainfrom
feat/fault-audit-log

mfaferek93 commented Jun 30, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

bburda left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mfaferek93 commented Jul 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

mfaferek93 commented Jun 30, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

bburda left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mfaferek93 commented Jul 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants