Skip to content

feat(node-message-broker): inbound dead-lettering + production config fail-fast (+ agent-guide docs)#19

Open
tada5hi wants to merge 3 commits into
masterfrom
docs/agents-refresh
Open

feat(node-message-broker): inbound dead-lettering + production config fail-fast (+ agent-guide docs)#19
tada5hi wants to merge 3 commits into
masterfrom
docs/agents-refresh

Conversation

@tada5hi

@tada5hi tada5hi commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Summary

Hardening + docs for the now-complete message broker. Bundles two filed follow-ups with the agent-guide refresh.

feat: inbound poison-message dead-lettering (#17)

The inbound loop previously logged-and-skipped any per-message failure and left it unacked — so a permanently undecryptable message was redelivered by the Hub every pull cycle forever. Now failures are classified and retries bounded:

  • Permanent (no analysisId, no ciphertext, unknown sender, decrypt/parse error) → dropped on the first attempt (acked away).
  • Transient (participant-resolution / webhook outage) → retried up to maxAttempts (default 5) redeliveries, then dead-lettered; still-retrying messages stay unacked. Per-id attempt count resets on success.
  • Dropped messages are logged at error and counted (droppedCount). Adds InboundProcessingError carrying the permanent/transient flag.

fix: production config fail-fast (#14)

The broker ships dev defaults for the whole security stack (localhost Authup, system/start123 credentials, no node key). Relying on them in production would silently authenticate against the wrong IdP (or degrade authorization to fake-identity mode) with no usable E2E key. assertProductionConfig now refuses to start when NODE_ENV=production and any of AUTHUP_URL / CLIENT_ID / CLIENT_SECRET / REALM / NODE_PRIVATE_KEY is unset, listing the missing vars. Dev/test keep the intended defaults.

docs: agent-guide refresh

.agents/structure.md + .agents/architecture.md (loaded into context every session via CLAUDE.md) now match the shipped code: the real core//adapters//app/modules tree, the ports table with ICryptoService/IAnalysisClientLookup/IPermissionCheckGateway (the collapsed IAnalysisPolicy removed), the request-permission-checker auth model, and the analysisId↔HKDF binding.

Tests

88 unit tests pass (+8: dead-letter classification/retry-cap/reset + production-config guard). Lint, typecheck, and the tsdown build are clean.

Closes #14, closes #17.

Summary by CodeRabbit

  • New Features

    • Added stricter startup validation for production environments to ensure required security settings are present.
    • Added retry limits and dead-letter handling for inbound message processing, with a new dropped-message count.
  • Bug Fixes

    • Improved handling of malformed, undecryptable, and unknown-sender messages by dropping them immediately.
    • Transient delivery failures are now retried before being marked as failed.
  • Tests

    • Expanded coverage for production config validation and inbound retry/dead-letter behavior.

…roker

These guides are loaded into context every session via CLAUDE.md and had drifted from
the code:

- structure.md: per-app layout now lists the real core/ (crypto, authz, messaging,
  inbound) + adapters/ (crypto, core, authz, http/middleware) + app/modules
  (core-client, inbound) tree.
- architecture.md: ports table replaces the collapsed `IAnalysisPolicy` with
  `ICryptoService`, `IAnalysisClientLookup`, and `IPermissionCheckGateway`, and notes
  `assertClientOwnsAnalysis` is a pure function. Authorization section reflects the
  capability check via the request permission checker (not introspection) + scope via
  server-core ownership. Crypto section documents the analysisId↔HKDF-info binding.
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Updated broker security documentation and source tree docs, added production-only config validation, introduced inbound processing error and retry/dead-letter contracts, and changed inbound delivery handling plus tests for dead-lettering and retries.

Changes

Node message broker security and inbound delivery

Layer / File(s) Summary
Security model docs
.agents/architecture.md, .agents/structure.md
The broker architecture and structure docs describe the updated crypto, authorization, and core//adapters/ layout.
Production config guard
apps/node-message-broker/src/app/modules/config/guard.ts, apps/node-message-broker/src/app/modules/config/module.ts, apps/node-message-broker/test/unit/app/config-guard.spec.ts
Production config loading now validates required security environment keys, and unit tests cover production, missing, empty, and non-production cases.
Inbound error contract
apps/node-message-broker/src/core/inbound/errors.ts, apps/node-message-broker/src/core/inbound/index.ts, apps/node-message-broker/src/core/inbound/types.ts
Inbound processing now exposes a permanent/temporary error type and a maxAttempts option through the inbound public API.
Retry and dead-letter processing
apps/node-message-broker/src/core/inbound/processor.ts
InboundDeliveryProcessor tracks attempts and dropped messages, classifies failures as permanent or transient, and decides whether to ack or redeliver each message.
Inbound processor tests
apps/node-message-broker/test/unit/core/inbound/processor.spec.ts
The processor tests cover permanent dead-lettering, transient retries up to the cap, and retry counter reset after recovery.

Sequence Diagram(s)

sequenceDiagram
  participant Hub
  participant InboundDeliveryProcessor
  participant resolveSenderPublicKey
  participant deliverMessage
  Hub->>InboundDeliveryProcessor: processBatch(messages)
  InboundDeliveryProcessor->>resolveSenderPublicKey: resolve sender key
  InboundDeliveryProcessor->>deliverMessage: decrypt and post webhook
  deliverMessage-->>InboundDeliveryProcessor: success or InboundProcessingError
  alt permanent failure or attempts reached
    InboundDeliveryProcessor->>Hub: ack dead-lettered message id
  else transient failure below cap
    InboundDeliveryProcessor->>Hub: leave message unacked
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • PrivateAIM/node#16: Also changes apps/node-message-broker/src/core/inbound/processor.ts and the inbound delivery loop, so it is directly related to this retry and delivery flow.

Poem

I hopped through configs by moonlight’s glow,
then nibbled on retries, nice and slow.
Dead letters drifted off with a thump,
while good little messages got their jump.
🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main code changes: inbound dead-lettering, production config fail-fast, and related docs updates.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docs/agents-refresh

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install failed: one or more packages not found in the registry.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

tada5hi added 2 commits June 26, 2026 18:06
…bounded retries

Previously any per-message failure was logged, skipped, and left unacked — so a
permanently undecryptable ("poison") message was redelivered by the Hub on every pull
cycle forever. Classify failures and bound the retries:

- Permanent failures (no analysisId, no ciphertext, unknown sender, decrypt/parse
  error) are dropped on the first attempt — acked away so the Hub stops redelivering.
- Transient failures (participant-resolution / webhook outage) are retried up to
  `maxAttempts` (default 5) redeliveries, then dead-lettered; a still-retrying message
  is left unacked. The per-id attempt count resets on success.

Dropped messages are logged at `error` and counted (`droppedCount`) for observability.
Adds an `InboundProcessingError` carrying the permanent/transient classification.

Closes #17.
…urity stack

The broker ships development defaults for the whole security stack — localhost Authup,
`system` / `start123` node-client credentials, no node key. Relying on them in
production would silently authenticate against the wrong identity provider (or degrade
the authorization middleware to fake-identity mode) and have no usable end-to-end key.

`assertProductionConfig` (run on the env-derived startup path) now refuses to start when
`NODE_ENV=production` and any of `AUTHUP_URL`, `CLIENT_ID`, `CLIENT_SECRET`, `REALM`,
`NODE_PRIVATE_KEY` is unset, listing the missing vars. Development and test keep the
intended defaults.

Closes #14.
@tada5hi tada5hi changed the title docs(agents): sync structure + architecture guides with the shipped broker feat(node-message-broker): inbound dead-lettering + production config fail-fast (+ agent-guide docs) Jun 26, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
apps/node-message-broker/src/app/modules/config/guard.ts (1)

20-47: 🔒 Security & Privacy | 🔵 Trivial

Guard misses explicitly-set dev-default values.

The guard treats only undefined/'' as missing. If a production env explicitly sets CLIENT_ID=system and CLIENT_SECRET=start123 (dev defaults defined in apps/node-message-broker/src/app/modules/config/constants.ts), the check passes while the broker still uses insecure credentials—the exact silent-auth scenario the doc warns about.

Remediation options:

  • Extend the check to reject known dev literals for credential keys when env === PRODUCTION:
    • CLIENT_ID === 'system' or CLIENT_SECRET === 'start123' (see ConfigDefaults in constants.ts).
  • Or require operators to unset dev values when switching to production and document this constraint.

HUB_URL/CORE_URL can remain omitted; mismatched URLs fail loudly during connection attempts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/node-message-broker/src/app/modules/config/guard.ts` around lines 20 -
47, The production guard in assertProductionConfig only treats undefined and
empty strings as missing, so it can still accept explicit dev-default
credentials. Update the missing/invalid check for the credential keys in
PRODUCTION_REQUIRED_ENV to also reject the known default literals from
ConfigDefaults, especially CLIENT_ID and CLIENT_SECRET, while leaving URL fields
behavior unchanged. Keep the fix inside assertProductionConfig and use the
existing config.env and env lookup flow so production startup fails fast when
dev defaults are explicitly set.
apps/node-message-broker/src/core/inbound/processor.ts (1)

42-48: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low value

attempts Map can grow unbounded for orphaned transient failures.

Entries are removed on success, permanent drop, and transient dead-letter, but a message that fails transiently and is then never redelivered (e.g., expired/claimed elsewhere) leaves its counter behind forever. Over a long-running process this is a slow leak. Consider an eviction/TTL or periodic prune if redelivery isn't guaranteed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/node-message-broker/src/core/inbound/processor.ts` around lines 42 - 48,
The attempts map in Processor can leak entries for messages that fail
transiently and never get redelivered, so add a cleanup strategy to prevent
unbounded growth. Update the transient-failure tracking in
Processor/handleMessage flow to either evict stale entries with a TTL or
periodically prune old message IDs, while still clearing them on success,
permanent drop, and dead-letter as it does today.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@apps/node-message-broker/src/app/modules/config/guard.ts`:
- Around line 20-47: The production guard in assertProductionConfig only treats
undefined and empty strings as missing, so it can still accept explicit
dev-default credentials. Update the missing/invalid check for the credential
keys in PRODUCTION_REQUIRED_ENV to also reject the known default literals from
ConfigDefaults, especially CLIENT_ID and CLIENT_SECRET, while leaving URL fields
behavior unchanged. Keep the fix inside assertProductionConfig and use the
existing config.env and env lookup flow so production startup fails fast when
dev defaults are explicitly set.

In `@apps/node-message-broker/src/core/inbound/processor.ts`:
- Around line 42-48: The attempts map in Processor can leak entries for messages
that fail transiently and never get redelivered, so add a cleanup strategy to
prevent unbounded growth. Update the transient-failure tracking in
Processor/handleMessage flow to either evict stale entries with a TTL or
periodically prune old message IDs, while still clearing them on success,
permanent drop, and dead-letter as it does today.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0d35ed92-6b66-47af-8b11-d8be60ba091e

📥 Commits

Reviewing files that changed from the base of the PR and between f679c28 and f633c8e.

📒 Files selected for processing (10)
  • .agents/architecture.md
  • .agents/structure.md
  • apps/node-message-broker/src/app/modules/config/guard.ts
  • apps/node-message-broker/src/app/modules/config/module.ts
  • apps/node-message-broker/src/core/inbound/errors.ts
  • apps/node-message-broker/src/core/inbound/index.ts
  • apps/node-message-broker/src/core/inbound/processor.ts
  • apps/node-message-broker/src/core/inbound/types.ts
  • apps/node-message-broker/test/unit/app/config-guard.spec.ts
  • apps/node-message-broker/test/unit/core/inbound/processor.spec.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant