feat(data): giga-swamp phase 3 — namespace-aware layout + per-namespace locking (swamp-club#486)#1474
Conversation
…ce locking (swamp-club#486)
Phase 3 makes the datastore storage layout namespace-aware so repos sharing a
datastore land data under {datastore}/{namespace}/ and never block each other on
structural commands. Solo mode (empty namespace) is byte-identical to before.
- Path resolver: DefaultDatastorePathResolver prepends {namespace}/ as the
outermost datastore-tier segment via the single datastorePath() chokepoint.
- Catalog is repo-local at .swamp/data/_catalog.db via a centralized
catalogDbPath() helper (used by createCatalogStore and datastore_compact), so
repos sharing a datastore each own a private catalog and full-replace backfill
never clobbers another repo's rows.
- Global datastore lock becomes .locks/{namespace}.lock via a namespace-aware
createDatastoreLock + datastoreGlobalLockOptions, routed through every
global-lock site including the acquireModelLocks drain inspect so the
symmetric drain stays correct in namespaced mode. Per-model locks are left
un-namespaced (model ids are UUIDs, already globally unique).
- Namespace value threaded through the 13 direct FileSystemUnifiedDataRepository
construction sites (via namespaceFromResolver) and the 2 execution_service
sites plus their callers, so catalog stamping stays in lockstep with the
namespaced data path.
Verified: deno check/lint/fmt/test (6471 passed) + compiled-binary smoke for
solo, namespaced, and the external-filesystem catalog-relocation migration
(old catalog orphaned harmlessly, new one self-heals with identical results),
plus a live MinIO run confirming the namespace is the outermost S3 key prefix
and the local cache mirrors the remote layout.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
CLI UX Review
Blocking
None.
Suggestions
None.
Verdict
PASS — This PR has no user-visible UX impact. It is a pure infrastructure change (namespace-aware storage layout, per-namespace locking, centralized catalog path helper). No new flags, commands, or help text are introduced; no output format changes in log or JSON mode. The only observable CLI difference is a minor improvement: LockTimeoutError now reports the correct lock key (.locks/{namespace}.lock) in namespaced repos instead of the hardcoded .datastore.lock. Solo mode is byte-identical to before.
There was a problem hiding this comment.
Code Review
Well-structured Phase 3 implementation. The namespace threading is comprehensive — all 13 FileSystemUnifiedDataRepository construction sites, both execution_service paths, and all global-lock sites are correctly updated. The catalog relocation from datastore-tier to repo-local (resolvePath → localPath) is a sound design decision with proper centralization via catalogDbPath(). Solo-mode backward compatibility is preserved throughout.
Blocking Issues
None.
Suggestions
-
Defense-in-depth: validate namespace at config parse time —
datastoreGlobalLockOptions()inrepo_context.ts:1205uses the rawconfig.namespacestring in path construction (.locks/${namespace}.lock) beforecreateNamespace()validates it at the composition root. TheNamespaceregex (/^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/) rejects all path-traversal characters, so there's no exploitable vulnerability, but if a future code path ever callsdatastoreGlobalLockOptionsbefore hitting the composition root, the raw string would flow into lock paths unvalidated. Consider either (a) validating inresolve_datastore.tsat parse time, or (b) acceptingNamespaceinstead ofDatastoreConfigindatastoreGlobalLockOptionsto make the type system enforce validation order. -
Lock key uses string interpolation instead of
join()—repo_context.ts:1207:`.locks/${namespace}.lock`uses a forward-slash literal. Per CLAUDE.md, path operations should use@std/path. However, sincelockKeyis a logical key that also flows to cloud backends (S3) where forward slashes are canonical, this might be intentional. If it is, a brief inline comment noting the cross-backend rationale would prevent future reviewers from "fixing" it.
What looks good
- Test coverage: 11 unit tests + 3 integration test cases covering solo, namespaced, and two-namespace-shared-datastore scenarios. The
assertPathEqualshelper is used for cross-platform correctness. catalogDbPathcentralization: Eliminates the old drift-prone pattern wheredatastore compactindependently recomputed the catalog path. ThelocalPathusage ensures the catalog is never namespace-prefixed.namespaceFromResolverhelper: Clean extraction that keeps the 13 direct construction sites from diverging on namespace derivation.- Design doc updates: Thorough documentation of the namespace prefixing strategy, lock partitioning, and the known cosmetic artifact (empty un-namespaced dirs from per-model locks).
- Libswamp import boundary: All CLI commands import from
src/libswamp/mod.tsor infrastructure directly — no internal libswamp path violations. - License headers: Present on the new integration test file.
There was a problem hiding this comment.
Adversarial Review
Reviewed all 28 changed files. Traced namespace flow from config resolution → path resolver → repository factory → all 13 direct FileSystemUnifiedDataRepository construction sites + 5 WorkflowExecutionService callers + DefaultStepExecutor.buildDeps. Verified lock construction consistency across structural-command acquire, acquireModelLocks drain-inspect, TOCTOU retry, push-lock, and breakglass commands.
Critical / High
None.
Medium
- Stale PHASE 3 DEPENDENCY comment —
src/infrastructure/persistence/unified_data_repository.ts:84-96: The docstring on thenamespaceconstructor parameter says "PHASE 3 DEPENDENCY: those direct construction sites … must thread the configured namespace through … This is a tracked dependency, not an oversight." This PR is Phase 3 and completes that work — every direct site now passesnamespaceFromResolver(datastoreResolver). The comment is misleading; future readers will think the work is still pending. Should be updated to reflect completion.
Low
-
Namespace validation timing in
datastoreGlobalLockOptionsandDefaultDatastorePathResolver—src/cli/repo_context.ts:1202-1208,src/infrastructure/persistence/default_datastore_path_resolver.ts:52,84: Both useconfig.namespaceas a raw string (interpolated into.locks/${namespace}.lockandjoin(base, namespace, …)) beforecreateNamespace()validation runs increateRepositoryContext(line 356 ofrepository_factory.ts). InrequireInitializedRepo, the lock is acquired (viaregisterDatastoreSync) before the factory validates. An invalid namespace (e.g. containing/) would create a lock file at an unexpected path, thencreateNamespacewould throw. Mitigated by: (a) the config is a local file — this is self-inflicted; (b)createNamespacerejects all path-traversal characters (/,., spaces) via[a-z0-9][a-z0-9-]*; (c) the orphaned lock would eventually be cleaned up by staleness detection. Still, validating the namespace earlier (e.g. inresolveDatastoreConfigatsrc/cli/resolve_datastore.ts:290) would close the window entirely. -
waitForPerModelLocksscans un-namespaced root —src/cli/repo_context.ts:553,574: In a namespaced repo, per-model locks live at{ds}/data/{type}/{id}/.lock(un-namespaced), sowaitForPerModelLocks(datastoreConfig.path)drains writers across all namespaces, not just the structural command's own. This is documented as "conservative over-wait" indesign/datastores.mdand is correctness-safe (data dirs are partitioned, catalogs are repo-local, global locks are namespaced). Noting it for completeness.
Verdict
PASS — The namespace threading is consistent and complete across all code paths. The catalog relocation from resolvePath → localPath is correct and well-tested. Lock coordination properly uses datastoreGlobalLockOptions at every global-lock site. The integration test covers solo, namespaced, and shared-datastore scenarios. The one stale comment (Medium #1) is cosmetic; the validation timing (Low #1) is theoretical given the input source.
Summary
Phase 3 of giga-swamp (multi-repo shared datastores). It makes the datastore storage layout namespace-aware so repos sharing a datastore land data under
{datastore}/{namespace}/and don't block each other on structural commands. Solo mode (empty namespace) is byte-identical to before — the regression gate.Builds on Phase 1 (Namespace value object + config) and Phase 2 (catalog v4 + repository + DataRecord).
What changed
DefaultDatastorePathResolverprepends{namespace}/as the outermost datastore-tier segment via the singledatastorePath()chokepoint (@std/pathjoin; no prefix in solo mode).catalogDbPath()helper resolves_catalog.dbto.swamp/data/regardless of namespace (used bycreateCatalogStoreanddatastore compact, so the path can't drift). Each repo owns a private catalog, so per-repo full-replace backfill never clobbers another repo's rows..locks/{namespace}.lockvia a namespace-awarecreateDatastoreLock+datastoreGlobalLockOptions, routed through every global-lock site including theacquireModelLocksdrain inspect, so the symmetric-drain protocol stays correct in namespaced mode. Per-model locks are left un-namespaced (model ids are UUIDs → already globally unique).FileSystemUnifiedDataRepositoryconstruction sites (via a sharednamespaceFromResolver()helper) and the 2execution_servicesites + their callers, keeping catalog stamping in lockstep with the namespaced data path.Boundaries (deliberately out of scope)
No CEL/query changes (Phase 4), no CLI columns/namespace commands (Phase 5), no sync changes (Phase 6), no existing-data migration (Phase 7), no
DEFAULT_DATASTORE_SUBDIRS/markDirtychanges. Secrets/vault-bundles are repo-local by construction (never routed throughresolvePath). One known cosmetic artifact documented indesign/datastores.md: in namespaced mode the un-namespaced per-model lock leaves empty{ds}/data/{type}/{id}/dirs behind (no data, correctness-safe).Test Plan
deno check/deno lint/deno fmt --checkclean;deno run test6471 passed / 0 failed.catalogDbPathacross default/external-fs/namespaced/custom;namespaceFromResolver; namespacedFileLocklazy.locks/+ concurrent acquire;datastoreGlobalLockOptions) and a 3-case integration test (giga_swamp_namespace_layout_test.ts).{ds}/data/(no prefix), no.locks/,data query→namespace: "".{ds}/infra/data/...,data list/querystampednamespace: "infra", repo-local catalog,.locks/created.{ds}/data/_catalog.db); new binary transparently rebuilt it repo-local with identical query results and intact content; old catalog orphaned harmlessly.giga/infra/data/...); local cache mirrors the remote layout; catalog not synced.Closes swamp-club#486.
🤖 Generated with Claude Code