feat(node-message-broker): inbound dead-lettering + production config fail-fast (+ agent-guide docs)#19
feat(node-message-broker): inbound dead-lettering + production config fail-fast (+ agent-guide docs)#19tada5hi wants to merge 3 commits into
Conversation
…roker These guides are loaded into context every session via CLAUDE.md and had drifted from the code: - structure.md: per-app layout now lists the real core/ (crypto, authz, messaging, inbound) + adapters/ (crypto, core, authz, http/middleware) + app/modules (core-client, inbound) tree. - architecture.md: ports table replaces the collapsed `IAnalysisPolicy` with `ICryptoService`, `IAnalysisClientLookup`, and `IPermissionCheckGateway`, and notes `assertClientOwnsAnalysis` is a pure function. Authorization section reflects the capability check via the request permission checker (not introspection) + scope via server-core ownership. Crypto section documents the analysisId↔HKDF-info binding.
📝 WalkthroughWalkthroughUpdated broker security documentation and source tree docs, added production-only config validation, introduced inbound processing error and retry/dead-letter contracts, and changed inbound delivery handling plus tests for dead-lettering and retries. ChangesNode message broker security and inbound delivery
Sequence Diagram(s)sequenceDiagram
participant Hub
participant InboundDeliveryProcessor
participant resolveSenderPublicKey
participant deliverMessage
Hub->>InboundDeliveryProcessor: processBatch(messages)
InboundDeliveryProcessor->>resolveSenderPublicKey: resolve sender key
InboundDeliveryProcessor->>deliverMessage: decrypt and post webhook
deliverMessage-->>InboundDeliveryProcessor: success or InboundProcessingError
alt permanent failure or attempts reached
InboundDeliveryProcessor->>Hub: ack dead-lettered message id
else transient failure below cap
InboundDeliveryProcessor->>Hub: leave message unacked
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint install failed: one or more packages not found in the registry. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…bounded retries
Previously any per-message failure was logged, skipped, and left unacked — so a
permanently undecryptable ("poison") message was redelivered by the Hub on every pull
cycle forever. Classify failures and bound the retries:
- Permanent failures (no analysisId, no ciphertext, unknown sender, decrypt/parse
error) are dropped on the first attempt — acked away so the Hub stops redelivering.
- Transient failures (participant-resolution / webhook outage) are retried up to
`maxAttempts` (default 5) redeliveries, then dead-lettered; a still-retrying message
is left unacked. The per-id attempt count resets on success.
Dropped messages are logged at `error` and counted (`droppedCount`) for observability.
Adds an `InboundProcessingError` carrying the permanent/transient classification.
Closes #17.
…urity stack The broker ships development defaults for the whole security stack — localhost Authup, `system` / `start123` node-client credentials, no node key. Relying on them in production would silently authenticate against the wrong identity provider (or degrade the authorization middleware to fake-identity mode) and have no usable end-to-end key. `assertProductionConfig` (run on the env-derived startup path) now refuses to start when `NODE_ENV=production` and any of `AUTHUP_URL`, `CLIENT_ID`, `CLIENT_SECRET`, `REALM`, `NODE_PRIVATE_KEY` is unset, listing the missing vars. Development and test keep the intended defaults. Closes #14.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
apps/node-message-broker/src/app/modules/config/guard.ts (1)
20-47: 🔒 Security & Privacy | 🔵 TrivialGuard misses explicitly-set dev-default values.
The guard treats only
undefined/''as missing. If a production env explicitly setsCLIENT_ID=systemandCLIENT_SECRET=start123(dev defaults defined inapps/node-message-broker/src/app/modules/config/constants.ts), the check passes while the broker still uses insecure credentials—the exact silent-auth scenario the doc warns about.Remediation options:
- Extend the check to reject known dev literals for credential keys when
env === PRODUCTION:
CLIENT_ID === 'system'orCLIENT_SECRET === 'start123'(seeConfigDefaultsinconstants.ts).- Or require operators to unset dev values when switching to production and document this constraint.
HUB_URL/CORE_URL can remain omitted; mismatched URLs fail loudly during connection attempts.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/node-message-broker/src/app/modules/config/guard.ts` around lines 20 - 47, The production guard in assertProductionConfig only treats undefined and empty strings as missing, so it can still accept explicit dev-default credentials. Update the missing/invalid check for the credential keys in PRODUCTION_REQUIRED_ENV to also reject the known default literals from ConfigDefaults, especially CLIENT_ID and CLIENT_SECRET, while leaving URL fields behavior unchanged. Keep the fix inside assertProductionConfig and use the existing config.env and env lookup flow so production startup fails fast when dev defaults are explicitly set.apps/node-message-broker/src/core/inbound/processor.ts (1)
42-48: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low value
attemptsMap can grow unbounded for orphaned transient failures.Entries are removed on success, permanent drop, and transient dead-letter, but a message that fails transiently and is then never redelivered (e.g., expired/claimed elsewhere) leaves its counter behind forever. Over a long-running process this is a slow leak. Consider an eviction/TTL or periodic prune if redelivery isn't guaranteed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/node-message-broker/src/core/inbound/processor.ts` around lines 42 - 48, The attempts map in Processor can leak entries for messages that fail transiently and never get redelivered, so add a cleanup strategy to prevent unbounded growth. Update the transient-failure tracking in Processor/handleMessage flow to either evict stale entries with a TTL or periodically prune old message IDs, while still clearing them on success, permanent drop, and dead-letter as it does today.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@apps/node-message-broker/src/app/modules/config/guard.ts`:
- Around line 20-47: The production guard in assertProductionConfig only treats
undefined and empty strings as missing, so it can still accept explicit
dev-default credentials. Update the missing/invalid check for the credential
keys in PRODUCTION_REQUIRED_ENV to also reject the known default literals from
ConfigDefaults, especially CLIENT_ID and CLIENT_SECRET, while leaving URL fields
behavior unchanged. Keep the fix inside assertProductionConfig and use the
existing config.env and env lookup flow so production startup fails fast when
dev defaults are explicitly set.
In `@apps/node-message-broker/src/core/inbound/processor.ts`:
- Around line 42-48: The attempts map in Processor can leak entries for messages
that fail transiently and never get redelivered, so add a cleanup strategy to
prevent unbounded growth. Update the transient-failure tracking in
Processor/handleMessage flow to either evict stale entries with a TTL or
periodically prune old message IDs, while still clearing them on success,
permanent drop, and dead-letter as it does today.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0d35ed92-6b66-47af-8b11-d8be60ba091e
📒 Files selected for processing (10)
.agents/architecture.md.agents/structure.mdapps/node-message-broker/src/app/modules/config/guard.tsapps/node-message-broker/src/app/modules/config/module.tsapps/node-message-broker/src/core/inbound/errors.tsapps/node-message-broker/src/core/inbound/index.tsapps/node-message-broker/src/core/inbound/processor.tsapps/node-message-broker/src/core/inbound/types.tsapps/node-message-broker/test/unit/app/config-guard.spec.tsapps/node-message-broker/test/unit/core/inbound/processor.spec.ts
Summary
Hardening + docs for the now-complete message broker. Bundles two filed follow-ups with the agent-guide refresh.
feat: inbound poison-message dead-lettering (#17)
The inbound loop previously logged-and-skipped any per-message failure and left it unacked — so a permanently undecryptable message was redelivered by the Hub every pull cycle forever. Now failures are classified and retries bounded:
analysisId, no ciphertext, unknown sender, decrypt/parse error) → dropped on the first attempt (acked away).maxAttempts(default 5) redeliveries, then dead-lettered; still-retrying messages stay unacked. Per-id attempt count resets on success.errorand counted (droppedCount). AddsInboundProcessingErrorcarrying the permanent/transient flag.fix: production config fail-fast (#14)
The broker ships dev defaults for the whole security stack (localhost Authup,
system/start123credentials, no node key). Relying on them in production would silently authenticate against the wrong IdP (or degrade authorization to fake-identity mode) with no usable E2E key.assertProductionConfignow refuses to start whenNODE_ENV=productionand any ofAUTHUP_URL/CLIENT_ID/CLIENT_SECRET/REALM/NODE_PRIVATE_KEYis unset, listing the missing vars. Dev/test keep the intended defaults.docs: agent-guide refresh
.agents/structure.md+.agents/architecture.md(loaded into context every session viaCLAUDE.md) now match the shipped code: the realcore//adapters//app/modulestree, the ports table withICryptoService/IAnalysisClientLookup/IPermissionCheckGateway(the collapsedIAnalysisPolicyremoved), the request-permission-checker auth model, and theanalysisId↔HKDF binding.Tests
88 unit tests pass (+8: dead-letter classification/retry-cap/reset + production-config guard). Lint, typecheck, and the tsdown build are clean.
Closes #14, closes #17.
Summary by CodeRabbit
New Features
Bug Fixes
Tests