fix(graph): cloud pull sha + cross-agent VFS parity (Cursor/Codex/Hermes)#228
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Python and JavaScript extraction, Phase 1.5 cross-file resolution, VFS query/render endpoints (query/impact/neighborhood/layers/tour/path), Deeplake snapshot validation, hook wiring and safe shell output, install/build updates for tree-sitter-python, and extensive tests. ChangesMulti-language Graph Extraction and Resolution
Graph VFS and Query Surface Expansion
Hook Integration and Output Transport
Snapshot Validation and Build Configuration
Documentation and Packaging
Validation and Test Coverage
Estimated code review effort: Possibly Related PRs
Suggested Reviewers
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
Coverage ReportScope: files changed in this PR. Enforced threshold: 90% per metric (per file via
File Coverage — 28 files changed
Generated for commit 1cac9d2. |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/graph/deeplake-pull.ts (1)
168-177:⚠️ Potential issue | 🟠 Major | ⚡ Quick winVerify the payload's
graphidentity matches the selected row.The SHA check only proves the payload is self-consistent; it does not prove it belongs to this
head/repoKey. A mis-keyed row whosegraph.commit_shaorgraph.repo_keydisagrees with the SELECT key will still be written to${head}.jsonand recorded in.last-build.json, which poisons the local cache with a different snapshot. Add explicit equality checks againstheadandrepoKeybefore the write path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/graph/deeplake-pull.ts` around lines 168 - 177, The SHA check only ensures payload integrity but not that it belongs to the requested head/repoKey; add explicit identity checks before any write path: after computing and validating computedSha (using computeSnapshotSha256 and parsedSnapshot) verify parsedSnapshot.graph.commit_sha === head and parsedSnapshot.graph.repo_key === repoKey (or their canonical forms) and if either mismatches return errorOutcome("SELECT cloud row", new Error(...)) with a clear message indicating which field mismatched (commit_sha or repo_key) and the expected vs actual values so the mis-keyed row is rejected rather than written to `${head}.json` / `.last-build.json`.
🧹 Nitpick comments (1)
docs/GRAPH-ROADMAP.md (1)
9-9: ⚡ Quick winReplace machine-local
/tmpreference with a durable repo/PR artifact link.Line 9 points to
/tmp/codex-graph-output.md, which is not shareable/reproducible for reviewers. Prefer a committed file path or a PR comment/link.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/GRAPH-ROADMAP.md` at line 9, In docs/GRAPH-ROADMAP.md update the table entry that references `/tmp/codex-graph-output.md` (the line containing "graphify e Understand-Anything in `_graph-comparison/`; review Codex in `/tmp/codex-graph-output.md`") so it points to a durable, shareable artifact — either commit and reference a file in the repo (e.g., `_graph-comparison/codex-graph-output.md`) or replace the `/tmp/...` path with a PR comment/link or CI/artifact URL; ensure the new reference is accessible to reviewers and update the table cell text accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/GRAPH-ROADMAP.md`:
- Around line 159-173: Add language identifiers to the two fenced code blocks
that currently open without a language (the block containing the roadmap diagram
"Blocco A ... Blocco G" and the block starting "Query: 'pushSnapshot' — 2
matches...") so markdownlint rule MD040 is satisfied; replace the opening triple
backticks with triple backticks plus a language (use "text") for both fenced
blocks around the roadmap diagram and the query/example block.
In `@src/graph/deeplake-pull.ts`:
- Around line 158-166: After JSON.parse you must guard against non-object
results (e.g., JSON null) before accessing properties; update the code around
parsedSnapshot (the variable produced in the try/catch) to first check that
typeof parsedSnapshot === "object" && parsedSnapshot !== null and return
errorOutcome("parse cloud snapshot", new Error("snapshot not an object")) if
that fails, then perform the existing Array.isArray checks on (parsedSnapshot as
{ nodes?: unknown }).nodes and .links; ensure no property access occurs until
the non-null object guard passes so pullSnapshot returns the documented error
outcome for invalid payloads.
---
Outside diff comments:
In `@src/graph/deeplake-pull.ts`:
- Around line 168-177: The SHA check only ensures payload integrity but not that
it belongs to the requested head/repoKey; add explicit identity checks before
any write path: after computing and validating computedSha (using
computeSnapshotSha256 and parsedSnapshot) verify parsedSnapshot.graph.commit_sha
=== head and parsedSnapshot.graph.repo_key === repoKey (or their canonical
forms) and if either mismatches return errorOutcome("SELECT cloud row", new
Error(...)) with a clear message indicating which field mismatched (commit_sha
or repo_key) and the expected vs actual values so the mis-keyed row is rejected
rather than written to `${head}.json` / `.last-build.json`.
---
Nitpick comments:
In `@docs/GRAPH-ROADMAP.md`:
- Line 9: In docs/GRAPH-ROADMAP.md update the table entry that references
`/tmp/codex-graph-output.md` (the line containing "graphify e
Understand-Anything in `_graph-comparison/`; review Codex in
`/tmp/codex-graph-output.md`") so it points to a durable, shareable artifact —
either commit and reference a file in the repo (e.g.,
`_graph-comparison/codex-graph-output.md`) or replace the `/tmp/...` path with a
PR comment/link or CI/artifact URL; ensure the new reference is accessible to
reviewers and update the table cell text accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 5b2e3eb7-e710-4ef5-a809-74edffa3eec2
📒 Files selected for processing (3)
docs/GRAPH-ROADMAP.mdsrc/graph/deeplake-pull.tstests/shared/graph/deeplake-pull.test.ts
Push stores snapshot_sha256 computed over the stable fields only (excludes observation), but pull was hashing the entire payload bytes (observation included) and comparing the two. Every real cloud row was therefore rejected with 'snapshot_sha256 mismatch' — team pull silently failed and only legacy rows without a populated sha column slipped through. Pull now parses the payload, validates nodes/links shape, and recomputes the hash via computeSnapshotSha256 (the same function push uses), comparing the stable-only hash against the column. Adds a regression suite covering: a row whose observation differs accepts, a tampered payload rejects, and non-JSON / shape-invalid payloads reject.
…d wiring
Closes the silent gap where the code graph only worked on Claude Code.
- New shared parser src/graph/graph-command.ts (tryGraphRead +
parseReadTargetPath): given a rewritten shell command, detects a
cat/head/tail/ls on the /graph/* virtual subtree and synthesizes the
body via handleGraphVfs. One source of truth so per-agent hooks stop
drifting.
- Wire it into Cursor, Codex, and Hermes pre-tool-use hooks (Claude Code
already had inline handling). Each returns its agent-correct decision
shape (cursor updated_input echo; codex/hermes {action:block}).
- Cursor SessionStart now emits graphContextLine (was missing).
- Cursor auto-build: add graph-on-stop to the cursor esbuild bundle (with
tree-sitter externalized) and register it on stop + sessionEnd in
install-cursor.ts buildHookConfig().
Hardened per two rounds of Codex review: only cat<path> or a single
trailing |head/|tail is a graph read (a |grep falls through); reject ..
path traversal; strip quotes and skip leading flags; require exactly one
file operand (multi-file reads fall through). Codex hook uses input.cwd.
Read-only skill teaching the model when/how to query the code graph via the memory mount (index.md / find/ / show/), with anti-patterns: Incoming (0) != dead code (intra-file calls only), the graph can be stale, find/ is lexical not semantic, and never run a build (hooks handle it). Cursor and pi have no skill loader and rely on the SessionStart inject instead.
286d630 to
6fe819b
Compare
…tour/path)
Four zero-LLM endpoints under the graph mount, each a pure function over the
snapshot, wired into handleGraphVfs:
- neighborhood/<file> symbols in a file + its cross-file neighbors (the
deterministic neighborMap; imports-driven for now since
cross-file calls aren't resolved yet).
- layers architectural grouping by path heuristic (layerOf).
- tour entry points + Kahn topological walkthrough + cyclic
remainder; partition covers every node exactly once.
- path/<from>/<to> BFS shortest path (directed, then undirected fallback),
deterministic tie-break.
Built in parallel via agents teams (one teammate per module, isolated
worktrees), then integrated + hardened over two rounds of Codex review:
entry points no longer suppressed by unresolved-source edges (tour); only
real nodes in another file count as cross-file (neighborhood); BFS skips
phantom (unresolved) endpoints and breaks ties stably (path); layerOf
normalizes a leading slash so root-level folders classify (layers).
Tests: per-module unit suites + dispatcher routing + the codex-review
regressions. Full suite 3800 green.
The per-file extractor resolved calls intra-file only. Now it also records
unresolved call sites (raw_calls) and import bindings, and a new post-extract
resolver turns them into cross-file 'calls' edges — deterministically, no LSP.
- types.ts: optional FileExtraction.raw_calls / import_bindings + RawCall,
ImportBinding (additive; old extractions/fixtures are a no-op).
- extract/typescript.ts: extractImportBindings parses named/aliased/namespace
clauses; extractCalls records a RawCall for foo()/ns.foo() that don't resolve
in-file. Intra-file edges unchanged (resolve-first, raw-call only in else).
- resolve/cross-file.ts: export index over exported top-level nodes; relative
module resolution (.ts/.tsx/index, .js->.ts); binds named + namespace calls
to their target export. Emits EXTRACTED edges, deduped.
- snapshot.ts: buildSnapshot appends resolved edges before sorting.
High-confidence only (two Codex rounds): default imports are NOT resolved
(default export not tracked), type-only imports (import type {...} and per-
specifier { type Foo }) are skipped while a value import { type as value } is
kept, bare/aliased/barrel/dynamic are left to Phase 1.5b.
Tests: cross-file.test.ts (resolver units) + cross-file-e2e.test.ts (real
extractor -> snapshot). Full suite 3824 green.
Imports were always emitted as module -> external:<specifier>. Now, after all files are extracted, repointImportEdges rewrites a relative import whose specifier resolves (reusing B1's resolveModule) to target the real <resolvedFile>::module node, so 'imports' edges become navigable within the repo. Bare (npm) and unresolvable specifiers keep external: so the our-code-vs-dependency distinction is preserved. Pure + deterministic (new array, no mutation); the target module node always exists because resolveModule only returns known source files. Codex review: no findings. Tests: repointImportEdges units + e2e (relative -> module node, bare -> kept). Full suite 3832 green.
Heritage edges were always emitted as unresolved:<file>:<name>:<kind> and never resolved (not even same-file). resolveHeritageEdges now repoints them to the real base-type node: a same-file class/interface/type/enum declaration (intra-file, incl. non-exported), or a NAMED import resolved to its exporting file. Default/namespace/bare imports and unknown bases keep the placeholder. Extracted a shared buildExportIndex (resolveCrossFileCalls refactored onto it). Pure + deterministic; runs in buildSnapshot before sort/hash. Tests: resolveHeritageEdges units (same-file, named cross-file, implements, unknown-kept, non-heritage untouched) + e2e through the real extractor (imported base, same-file base, external base kept unresolved). Full suite 3840 green. NOTE: Codex review of this block was cut off by a Codex usage-limit error mid-run (no verdict produced); verified by self-review against the same checklist + full test suite. Re-run Codex when the limit resets.
…dex)
Codex review of B3 (on the restored quota): type-only imports were being
DROPPED, so `class Impl implements Shape` where `Shape` came from
`import type { Shape }` couldn't resolve — yet extends/implements are
type-position and a type-only import is their legitimate source.
Fix: record type-only imports with an ImportBinding.type_only flag instead of
dropping them. Calls resolution skips type_only bindings (named AND namespace
paths — a second Codex pass caught the namespace ns.foo() gap); heritage
resolution accepts them. `import { type as value }` stays a value import.
Tests: type-only named/namespace skipped for calls; type-only import resolves
implements cross-file (e2e). Full suite 3845 green. Two Codex rounds.
… (B4) Optional GraphNode fields, all additive: - signature: one-line declaration (extractor). Kind-aware: body-bearing kinds cut at the real body node (so an object-literal return type isn't truncated); const/type_alias keep their RHS. Code-point-safe truncation. - fan_in / fan_out / is_entrypoint: derived in buildSnapshot AFTER cross-file edge resolution (node-metadata.ts), so degrees reflect the full graph. is_entrypoint = exported && fan_in === 0. - show/ renders sig + entrypoint + fan_in/out; the 'Incoming (0)' caveat is reworded (cross-file resolution is partial — not proof of dead code). doc_comment deferred (needs prev-sibling comment plumbing). Two Codex rounds (signature kind-awareness, surrogate-safe truncation, body-node cut). Full suite 3852 green.
The 2-in-1: find + show in a single read. Searches with the shared findMatches (same ranking as find/), takes the top 5 matches, and expands each with its 1-hop neighborhood grouped by relation + direction (--calls-->, <--imports--, etc.), bounded at 8 neighbors/relation. Surfaces B4 metadata (signature, fan_in/out, entrypoint) per match, and saves handles so a follow-up show/<N> works just like after find/. Refactor: extracted findMatches (renderFind now uses it — behavior unchanged, Codex-confirmed). Routed query/ in handleGraphVfs; advertised in dirListing + graph-command listing. Codex review: no findings. Tests: routing, expansion, handle reuse, no-match. Full suite 3856 green.
findMatches now tokenizes the pattern on whitespace/`+`. Multi-token (e.g.
`auth+middleware`, or quoted `"auth middleware"`) requires EVERY token to
appear in a node's id or label (AND), ranked by summed per-token rank. A
single token preserves the original substring behavior. `+` lets multi-token
queries pass the VFS path intercept unquoted (which needs a single shell
operand).
Codex review: one Low note that separator-adorned single tokens (" foo ")
now trim to `foo` — intentional and strictly better (ids have no leading
spaces). Tests: AND match, whitespace separator. Full suite 3858 green.
…ll (C7) The discovery surfaces still pointed only at find/show. Now: - index.md 'How to query' leads with query/ (the 2-in-1) and lists neighborhood/layers/tour/path; cross-file caveat updated. - SessionStart inject leads with query/<pattern> and states cross-file resolution scope (relative named imports resolved; bare/aliased/dynamic not), replacing the stale 'intra-file only' line. - hivemind-graph skill (claude-code/codex/hermes) cheat sheet leads with query/ as START HERE. Advertising-only (no logic); full suite 3858 green.
The TS grammar is a superset, so .js/.jsx/.mjs/.cjs parse with the same tree-sitter pipeline (.jsx via the tsx grammar). Changes: - NodeLanguage += 'javascript'; extractTypeScript labels JS files (and their nodes) javascript after extraction. - isSourceFile + graph-on-stop SOURCE_GLOBS now include js/jsx/mjs/cjs. - resolveModule resolves JS extensions and orders candidates by the specifier's explicit extension, then the importer's own family, then the other — so a JS importer / explicit .js binds to .js, not a sibling .ts (two Codex rounds). Tests: js/jsx extraction + labeling, cross-file calls between .js files, resolveModule family/explicit-ext ordering. Full suite 3864 green.
Answers 'if I change this symbol, what's affected?' — the transitive set of DEPENDENTS via reverse BFS over the resolved edges, grouped by shortest-path depth, with the relation each dependent was reached through. Cycle-safe (first-visit wins, target excluded), deterministic (edges + frontier + depth buckets sorted), bounded (cap 80, true total still reported). Only edges whose source is a real node are traversed (defensive consistency with tour/path/neighborhood; codex P3). Honest caveat: cross-file resolution is partial, so the impact set is a LOWER BOUND. Routed impact/<pattern> + advertised in listings. Tests: transitive depth, leaf=zero, cycle no-double-count, ambiguous/no-match, cross-relation. Full suite 3873 green.
Typo tolerance without a runtime dependency (chose D3 Levenshtein over D2 fuse.js to avoid bundling a dep). findMatches falls back to fuzzyMatches ONLY when single-token substring search returns zero hits — so existing behavior is untouched when matches exist. fuzzyMatches keeps labels within edit distance max(1, floor(len/4)) of the needle (needle>=3), sorted by (distance, id), capped 25. editDistance is Levenshtein with two rolling rows + a safe row-min early-exit (Codex verified the non-decreasing-row-min property) and a length-delta short-circuit. Codex review: no correctness findings. Tests: typo→suggestion, exact-hit does not trigger fuzzy, no-near-match→No matches. Full suite 3876 green.
graph-on-stop ran only on Claude Code + Cursor. Now also on Codex and Hermes (the two other agents with the VFS graph intercept), so they self-build the graph instead of only reading one a teammate built. - esbuild: graph-on-stop added to the codex + hermes bundles, with tree-sitter/tree-sitter-typescript externalized (mirrors CC/Cursor). - Codex: registered on Stop in BOTH config sources — static codex/hooks/hooks.json AND install-codex.ts buildHooksJson (codex has two; CLI installs use the latter — Codex review P1). One Stop block carries stop.js + graph-on-stop.js. HIVEMIND_BUNDLE_FILES gains graph-on-stop.js so reinstall dedupe strips stale entries (P2). - Hermes: registered on on_session_end (session-close) in install-hermes.ts. Shared hook is agent-agnostic (ignores stdin, uses process.cwd()) and gated, so the per-turn cost stays ~5ms. OpenClaw/pi excluded (no bash intercept → graph not queryable there). Two Codex rounds. Full suite 3878 green.
Adds tree-sitter-python (same grammar-dependency pattern as tree-sitter- typescript; the native bindings are verified/rebuilt by ensure-tree-sitter.mjs) and a Python extractor producing the same FileExtraction shape: - nodes: function (def), class, method (def in class body, incl. decorated), const (module-level X = ...), module. - edges: method_of, extends (real bases only — identifier or dotted attribute's final name; keyword_argument/metaclass skipped per codex), imports (external:<spec>), calls (intra-file foo() + self.m()). - exported = name not starting with '_' (Python public-by-convention). - signature cut at the body block (not the first ':', which would truncate at a param annotation — codex). Dispatch: new src/graph/extract/index.ts extractFile() routes .py/.pyi to the Python extractor, everything else to TS. discoverSourceFiles + graph-on-stop globs include .py/.pyi; tree-sitter-python externalized in all bundles that transitively import the extractor. Cross-file Python (dotted/'.'-relative imports) is deferred: specifiers aren't './' so the shared resolver safely skips Python cross-file calls/imports while INTRA-file heritage still resolves. Two Codex rounds. Full suite 3889 green.
…indings) Found by a live smoke test on real Claude Code (--plugin-dir): - index.md 'Limitations' still claimed 'TypeScript only' and "'calls' edges are intra-file only" — false now (cross-file resolution shipped in B1-B3, JS/Python in B6/B7). The agent literally saw cross-file callers two lines above that stale caveat. Reworded to match reality. - query/<pattern> listed a repeated neighbor once per edge (a fn calling errorOutcome 3x rendered it three times). Now deduped as 'errorOutcome ×3'. Tests updated. Full suite 3890 green.
echo ${JSON.stringify(body)} only escapes " and \, not backticks or $.
Inside a double-quoted echo the shell still ran command substitution
(`find/` -> executes find/) and variable expansion, corrupting bodies
with backticks/$ (e.g. graph/index.md help text) and leaking
"find/: No such file" to stderr. JSON.stringify also turned real
newlines into literal \n that echo printed verbatim.
Add safeEchoCommand(body) -> printf '%s\n' '<single-quoted body>',
which the shell treats fully literally (no substitution/expansion,
real newlines preserved). Apply to all 10 call sites.
Tests:
- safe-echo.test.ts: 17 cases exercising backtick/$/quote/%/backslash/
multiline through a real bash AND sh, verbatim, zero stderr.
- pre-tool-use-branches.test.ts: update 3 assertions that encoded the
old buggy behavior (literal l1\nl2, echo "3") to the correct output.
The SessionStart LOCAL CODE GRAPH inject still advertised "TypeScript only" while the graph gained a Python extractor (4bc6c65) and index.md already reports "TypeScript / JavaScript / Python". An e2e run surfaced the mismatch between the system-prompt block and index.md. Align session-context.ts with the canonical wording in vfs-handler.ts: header and limitations now say TS/JS/Python, and note Python is intra-file + structure only (cross-file resolution is a follow-up). Update the rationale comment and the assertion in session-context.test.ts.
…jection cases Per codex audit. Adds the edge cases the audit flagged as missing — empty string, trailing newline, leading dash (not a printf flag) — and two explicit quote-breakout security tests (one per shell): a payload crafted to escape the single quotes and run `touch <canary>` must be emitted verbatim with the injected command never executing. These pin the anti-injection guarantee so a regression in the '\'' escaping fails loudly. No source change.
CodeRabbit (Major): JSON.parse accepts the literal "null" (and other primitives), yielding a value with no `nodes`/`links`. The subsequent `parsedSnapshot.nodes` access threw an uncaught TypeError on `null`, instead of pullSnapshot returning its documented `error` outcome. Add a `=== null || typeof !== "object"` guard right after the parse, before the Array.isArray checks. Test: a cloud row whose snapshot_jsonb is the string "null" now resolves to an error outcome (no throw, no files written) rather than rejecting.
…+ skill/eval Beats vanilla Claude Code on the comparison/enumeration question class (DeepCodeBench/graphiti, N=16: avg recall 0.787 -> 0.876, B wins 7 / loses 2). - resolve/cross-file.ts: add resolvePythonModule + dispatch resolveModule by importer extension (.py -> Python resolver). Suffix-anchored dotted-module match, drop-on-ambiguous. Wires Python through the existing cross-file calls/imports/heritage passes (0->428 cross-file calls, 0->349 repointed imports on the graphiti benchmark). +9 unit tests. - session-context.ts: reframe the SessionStart inject — the graph is a navigation INDEX to locate files, then Read them (not a grep replacement). Fixes over-reliance; `cat` not `ls`; drops the stale "Python intra-file only" wording. Inject-text test assertions updated. - skills/hivemind-graph (claude-code/codex/hermes): document all 9 endpoints (add impact/neighborhood/layers/tour/path), fix stale "TypeScript only" -> TS/JS/Python, correct the cross-file/dead-code anti-pattern, index-then-Read workflow. - evals/: A/B eval harness (vanilla vs graph) over DeepCodeBench + the step-by-step IMPROVEMENTS_LOG with the measured results.
Hermes previously spawned the graph pull worker but never rendered the graphContextLine() inject, so the agent only learned about the graph via the hivemind-graph skill (conditional). Wire graphContextLine(cwd) into the Hermes SessionStart context so the structural index is always present. Trade-off: unlike claude-code/cursor, Hermes has no model-only context channel, so this block is user-visible in the TUI. Accepted — an always-present index outweighs the extra lines. Appends nothing when no graph exists yet (null).
|
@coderabbitai review |
✅ Action performedReview finished.
|
Conflicts resolved in src/hooks/pre-tool-use.ts: - unsupported-command guidance: take main's buildRetryGuidanceDecision() (branch was unchanged from base; main refactored to the shared helper). - ls output: combine main's new Read-tool branch (writeReadCacheFileFn -> buildReadDecision) with the branch's safeEchoCommand() hardening for the Bash return path. package-lock.json regenerated against the cleanly-merged package.json. Full suite: 4027 passing.
The A/B eval harness + datasets + results are developer tooling, not product code — untrack them and gitignore evals/ so they stay local without landing in the PR. Files remain on disk; nothing is deleted.
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai — triage of the open findings, none require a code change: 1. 2. 3. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# List open review threads/comments by coderabbitai on PR `#228`
gh api repos/activeloopai/hivemind/pulls/228/reviews --jq '.[] | select(.user.login == "coderabbitai[bot]") | {id: .id, state: .state, body: .body[:80]}'Length of output: 510 🏁 Script executed: #!/bin/bash
# List review comments (inline) by coderabbitai on PR `#228`
gh api repos/activeloopai/hivemind/pulls/228/comments --jq '.[] | select(.user.login == "coderabbitai[bot]") | {id: .id, path: .path, line: .line, body: .body[:100]}'Length of output: 980 🏁 Script executed: #!/bin/bash
# Get pull request node ID and review thread IDs via GraphQL
gh api graphql -f query='
{
repository(owner: "activeloopai", name: "hivemind") {
pullRequest(number: 228) {
id
reviewThreads(first: 20) {
nodes {
id
isResolved
comments(first: 1) {
nodes {
databaseId
body
}
}
}
}
}
}
}'Length of output: 4221 🏁 Script executed: #!/bin/bash
# Resolve the open docs/GRAPH-ROADMAP.md MD040 thread (user acknowledged + deferred)
gh api graphql -f query='
mutation {
resolveReviewThread(input: { threadId: "PRRT_kwDOR5LKos6GmrT0" }) {
thread {
id
isResolved
}
}
}'Length of output: 392
✏️ Learnings added
|
cross-file.ts embedded a literal NUL byte as the dedup-key separator, which made git treat the whole file as binary, so diff tools (and CodeRabbit) could not review it, including the resolvePythonModule logic. Replace the literal NUL with an escaped form (backslash-u-0000): byte-for-byte identical runtime value, but the source is now plain UTF-8 text and fully reviewable/diffable. No behavior change (59 cross-file tests pass).
|
@coderabbitai review Commit Please focus this review on |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
cat src/graph/resolve/cross-file.tsLength of output: 14768
🐛 Bug (Medium) —
|
…thonModule CodeRabbit (Medium): when a relative specifier has more leading dots than the importer has parent directories (e.g. `....foo` from `pkg/sub/mod.py`), the climb loop clamped `dir` to "." and then resolved against the repo root, so an invalid import could spuriously match a root-level `foo.py`. Track the climb count and return null when the dots aren't fully consumed. Also clarify that ambiguity on any matchPythonSuffix target form drops the whole resolution. Adds a regression test (over-climb -> null). No behavior change for valid imports.
|
@coderabbitai review Addressed your
This commit touches only |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
cat src/graph/resolve/cross-file.tsLength of output: 15146
🐛 Bug (over-climbing) — ✅ Confirmed correct.
💬 Nit (ambiguous early-return comment) — ✅ Confirmed present. ⚡ Perf — acknowledged as deferred. All three findings triaged; the two code changes look good. 🐇 ✨ 🧠 Learnings used |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
package.json (1)
64-70:⚠️ Potential issue | 🟠 Major | ⚡ Quick winInconsistent dependency placement:
tree-sitter-pythonindependenciesbut base package inoptionalDependencies.
tree-sitter-python(line 64) is a required dependency, buttree-sitter(line 69) is optional. Sincetree-sitter-pythonrequires the basetree-sitterpackage at runtime, Python graph extraction will fail whentree-sitteris not installed. Movetree-sitter-pythontooptionalDependenciesalongside the other tree-sitter packages for consistent behavior.Suggested fix
"dependencies": { "`@anthropic-ai/sdk`": "^0.97.1", "`@huggingface/transformers`": "^3.0.0", "`@modelcontextprotocol/sdk`": "^1.29.0", "deeplake": "^0.3.30", "js-yaml": "^4.1.1", "just-bash": "^2.14.0", - "tree-sitter-python": "^0.21.0", "yargs-parser": "^22.0.0", "zod": "^4.3.6" }, "optionalDependencies": { "tree-sitter": "^0.21.1", + "tree-sitter-python": "^0.21.0", "tree-sitter-typescript": "^0.23.2" },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@package.json` around lines 64 - 70, Move the tree-sitter-python entry from dependencies into optionalDependencies so it is installed/managed together with tree-sitter and tree-sitter-typescript; update package.json by removing "tree-sitter-python": "^0.21.0" from the top-level "dependencies" object and adding the same entry to the "optionalDependencies" object where "tree-sitter" and "tree-sitter-typescript" are declared to ensure Python graph extraction has the base runtime available when tree-sitter is optional.src/commands/graph.ts (1)
44-49:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAlign CLI/help text with multi-language extraction support.
The command now discovers/extracts JS/Python too, but usage and runtime log still say “TypeScript source files,” which is user-facing and misleading.
Suggested patch
-const USAGE = `hivemind graph — codebase-graph commands (Phase 1 — TypeScript only) +const USAGE = `hivemind graph — codebase-graph commands (Phase 1.5 — TypeScript/JavaScript/Python) @@ - Walk the project for TypeScript source files, extract symbols + edges, + Walk the project for supported source files (TypeScript/JavaScript/Python), extract symbols + edges, @@ - console.log(`Discovered ${sourceFiles.length} TypeScript source files. Extracting...`); + console.log(`Discovered ${sourceFiles.length} source files. Extracting...`);Also applies to: 444-445
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/commands/graph.ts` around lines 44 - 49, Update the user-facing help and runtime messages that erroneously say "TypeScript source files" to reflect multi-language support (e.g., "source files (TypeScript, JavaScript, Python)" or "TypeScript/JavaScript/Python source files"). Change the USAGE constant string (symbol: USAGE) accordingly and search for any runtime log strings that contain "TypeScript source files" (for example in the graph build command or functions like buildGraph/discoverSources) and replace them with the new multi-language phrasing so help text and logs match actual behavior.
🧹 Nitpick comments (5)
tests/shared/graph/vfs-handler.test.ts (1)
463-468: ⚡ Quick winStrengthen the assertion to verify actual path/ output content.
The
typeof r.body === "string"assertion only confirms the response is a string, which is always true forkind: "ok". Per testing guidelines, prefer asserting on specific values. The path output should contain recognizable structure — e.g., node names from the fixture, "No path", or "ambiguous".💡 Suggested improvement
it("path/<from>/<to> renders a path or a clear no-path/ambiguous message", () => { seed(); const r = handleGraphVfs("path/foo/UserModel", cwd); expect(r.kind).toBe("ok"); - if (r.kind === "ok") expect(typeof r.body).toBe("string"); + if (r.kind === "ok") { + // Assert the output contains either path structure or a clear message + expect( + r.body.includes("foo") || r.body.includes("No path") || r.body.includes("ambiguous") + ).toBe(true); + } });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/shared/graph/vfs-handler.test.ts` around lines 463 - 468, The test uses handleGraphVfs("path/foo/UserModel", cwd) but only asserts typeof r.body === "string"; update the assertion to verify actual output content by checking that r.body contains expected node names or known messages (e.g., assert r.body includes "foo" or "UserModel", or matches "No path" or "ambiguous"). Locate the test case (the it block calling seed() and handleGraphVfs) and replace the loose typeof check with a concrete assertion such as toContain/toMatch against those expected substrings so the test validates real path/No path/ambiguous outputs.src/graph/extract/python.ts (1)
335-340: 💤 Low valueMinor:
declByNamekey extraction relies on no colons in path.Line 338 uses
node.id.split(":")[1]to extract the method key, which assumesrelativePathcontains no colons. This is documented as relative paths with forward slashes, so it's valid, but consider using a more robust separator (e.g.,::) or extracting the key during node construction to avoid future fragility.Current behavior is correct given the documented contract.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/graph/extract/python.ts` around lines 335 - 340, pushNode currently derives the method lookup key by splitting node.id on ":" (node.id.split(":")[1]) which is fragile if relativePath ever contains colons; update the code that constructs GraphNode (or wherever node.id is formed) to store the lookup key explicitly (e.g., a new property like node.key or node.lookupName) and use that property here in pushNode instead of splitting node.id, or, if you prefer a minimal change, compute the key robustly here using the known id format (e.g., take the substring after the first ":" via index lookup rather than split) so extraction does not break when paths include ":" — adjust pushNode to read the new property (or use the index-based extraction) and update callers that create GraphNode accordingly.tests/shared/graph/layers.test.ts (1)
142-144: ⚡ Quick winStrengthen this truncation check with concrete file values.
expect(out).toContain("src/graph/file")is too broad and can pass on malformed output. Assert specific rendered paths (and keep the truncation phrase assertion) to make this regression-proof.Suggested test tightening
- expect(out).toContain("src/graph/file"); + expect(out).toContain("src/graph/file0.ts"); + expect(out).toContain("src/graph/file4.ts"); expect(out).toContain("and 2 more file");As per coding guidelines,
tests/**should “Prefer asserting on specific values (paths, messages) over generic substrings.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/shared/graph/layers.test.ts` around lines 142 - 144, The test currently asserts a vague substring against the output ("src/graph/file") which can mask malformed rendering; update the assertions on the variable out to check for the exact, concrete file paths that should be rendered (e.g., the specific entries your test setup inserts) instead of the generic fragment, while preserving the truncation assertion expect(out).toContain("and 2 more file") so the test still verifies truncation; locate the two expect calls that reference out in this test and replace the first loose substring check with assertions for the precise file path strings produced by the code under test.tests/shared/graph/graph-command.test.ts (1)
99-108: ⚡ Quick winMake the no-snapshot response assertion specific
Line 107 currently accepts multiple broad substrings (
/no-graph|Code Graph/), which weakens the contract and can mask regressions. Prefer asserting the exact expected rendered value (or a single stable, explicit message/path fragment agreed as contract).As per coding guidelines: "
tests/**: Prefer asserting on specific values (paths, messages) over generic substrings."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/shared/graph/graph-command.test.ts` around lines 99 - 108, The test for tryGraphRead("cat /graph/index.md", tmp) currently uses a loose regex expect(body).toMatch(/no-graph|Code Graph/); tighten this to assert the exact stable rendered message agreed as the contract (e.g., the exact "no-graph" rendered string or the canonical "Code Graph — no snapshot" text) so the test fails on regressions; update the assertion in the "dispatches a /graph/* read to the VFS..." case to compare body === "<expected exact string>" (or use toBe with that exact string) and ensure the expected value is the canonical no-snapshot message produced by handleGraphVfs/tryGraphRead.tests/shared/graph/impact.test.ts (1)
29-37: ⚡ Quick winMake these message assertions exact instead of broad substrings.
A few checks here are loose enough to pass through formatting/message regressions. Prefer exact line/value assertions for deterministic behavior.
Suggested tightening
it("lists transitive dependents grouped by depth", () => { const body = renderImpact(chain, "d"); - expect(body).toContain("Impact of d"); - expect(body).toContain("3 dependents"); - expect(body).toContain("depth 1 (1):"); + expect(body).toContain("Impact of d"); + expect(body).toContain("3 dependents (transitive), by depth:"); + expect(body).toContain(" depth 1 (1):"); expect(body).toMatch(/depth 1 \(1\):\s*\n\s*c/); expect(body).toContain("depth 2"); expect(body).toContain("depth 3"); }); @@ it("returns a no-match message for an unknown pattern", () => { const body = renderImpact(chain, "zzz"); - expect(body).toContain("No node matches"); + expect(body).toBe('No node matches "zzz". Try cat memory/graph/find/zzz to explore.'); });As per coding guidelines
tests/**: “Prefer asserting on specific values (paths, messages) over generic substrings.”Also applies to: 58-61
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/shared/graph/impact.test.ts` around lines 29 - 37, The test "lists transitive dependents grouped by depth" uses broad toContain checks; replace them with exact string/line assertions against the output of renderImpact(chain, "d") so the test is deterministic: assert equality or use toMatch with anchored regexes for the full expected block including the header "Impact of d", the exact count "3 dependents", the exact depth lines such as "depth 1 (1):" followed immediately by the dependent "c" on the next line, and exact "depth 2" and "depth 3" lines; update the corresponding loose assertions in the later test (the similar checks at 58-61) the same way. Ensure you reference and assert the full expected output string or tightly anchored regexes rather than substrings.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/graph/graph-command.ts`:
- Around line 50-51: The code that extracts a read target uses
rewrittenCommand.replace(/\s+2>\S+/g, "") to strip only stderr redirection, so
functions parseReadTargetPath and tryGraphRead (and the logic around const cmd =
rewrittenCommand...) still see other redirection tokens (>, >>, <, >&, etc.) and
fail to detect VFS reads; update the sanitization step used before target
extraction to remove all common shell redirections (stdout, stderr, append,
input, and combined forms) from rewrittenCommand (and the same sanitization in
the other block around lines 104-112) so parseReadTargetPath/tryGraphRead
receive a cleaned cmd; ensure you update the code paths that create const cmd
and any pre-processing used by tryGraphRead/parseReadTargetPath so they all call
the new comprehensive redirection-stripping routine.
In `@src/graph/resolve/cross-file.ts`:
- Around line 32-49: buildExportIndex and the cross-file/heritage resolvers are
name-only which lets a symbol from the wrong kind win; update the data
structures and matchers to be kind-aware: change buildExportIndex to index
exported top-level nodes by (label, kind) instead of label-only (preserve
first-wins per (label,kind)), update localIndex/exportIndex consumers
(resolveCrossFileCalls and resolveHeritageEdges) to look up by both name and
kind, modify parseUnresolved to return the kind from strings like
"unresolved:<file>:<name>:<kind>", and in resolveCrossFileCalls restrict matches
to value-callable kinds (e.g., function/class/const/enum) so type-only kinds
cannot satisfy call targets.
In `@src/hooks/cursor/pre-tool-use.ts`:
- Around line 73-77: Replace the fixed heredoc delimiter usage that constructs
echoCmd with a collision-safe generated delimiter: compute a random/unique token
that is guaranteed not to appear in graphBody (and reuse the same token for the
grep branch), then build the heredoc using that generated token (e.g., `cat
<<'DELIM'` ... `DELIM`) so no user-controlled content can terminate the heredoc
early; update references to echoCmd and any grep-related command construction in
pre-tool-use.ts to use the new token-generation and reuse logic.
In `@src/hooks/graph-on-stop.ts`:
- Around line 82-85: Update the comment that documents SOURCE_GLOBS to reflect
that the glob list includes Python files (e.g., *.py and *.pyi) in addition to
TypeScript and JavaScript; find the constant SOURCE_GLOBS in graph-on-stop.ts
and change the comment text from "TypeScript + JavaScript (B7), excluding
.d.ts." to something like "TypeScript, JavaScript, and Python (B7), excluding
.d.ts." so the comment matches the actual globs.
In `@src/hooks/hermes/session-start.ts`:
- Around line 214-216: The variable additional in session-start.ts currently
appends graphNote directly to baseWithGoals (and to baseWithGoals + rulesBlock)
which can create run-on text; update the concatenation for additional so it
inserts an explicit newline before graphNote (e.g., ensure the expression builds
`${baseWithGoals}\n${graphNote}` or
`${baseWithGoals}\n\n${rulesBlock}\n${graphNote}` depending on rulesBlock) so
that baseWithGoals/rulesBlock and graphNote are always separated by a newline;
adjust the expression that defines additional accordingly (references:
additional, baseWithGoals, rulesBlock, graphNote).
In `@src/hooks/pre-tool-use.ts`:
- Around line 478-488: The not-found branch currently constructs an unsafe
host-shell command via buildAllowDecision(`echo ${JSON.stringify(notFound)}`,
...); replace that echoed literal with a command produced by safeEchoCommand so
untrusted virtualPath characters cannot be interpreted by the shell;
specifically, call safeEchoCommand(notFound) (or
safeEchoCommand(JSON.stringify(notFound)) if you want the same quoting) and pass
its return value into buildAllowDecision while leaving the Read-path
(writeReadCacheFileFn + buildReadDecision) unchanged.
In `@tests/claude-code/safe-echo.test.ts`:
- Around line 68-69: The test is brittle because it asserts a regex on shell
error text; instead assert deterministically that the sentinel file does not
exist. Replace the runVia(...) + regex assertion with a platform-independent
existence check (e.g., use Node's fs.existsSync or fs.promises.stat) against the
canary variable and assert it returns false (or rejects/not found). Update the
test that currently calls runVia(shell, `ls '${canary}' ...`) and the
expect(...) that inspects lsOut+lsErr to use the fs-based existence check so the
assertion is exact and locale-independent.
---
Outside diff comments:
In `@package.json`:
- Around line 64-70: Move the tree-sitter-python entry from dependencies into
optionalDependencies so it is installed/managed together with tree-sitter and
tree-sitter-typescript; update package.json by removing "tree-sitter-python":
"^0.21.0" from the top-level "dependencies" object and adding the same entry to
the "optionalDependencies" object where "tree-sitter" and
"tree-sitter-typescript" are declared to ensure Python graph extraction has the
base runtime available when tree-sitter is optional.
In `@src/commands/graph.ts`:
- Around line 44-49: Update the user-facing help and runtime messages that
erroneously say "TypeScript source files" to reflect multi-language support
(e.g., "source files (TypeScript, JavaScript, Python)" or
"TypeScript/JavaScript/Python source files"). Change the USAGE constant string
(symbol: USAGE) accordingly and search for any runtime log strings that contain
"TypeScript source files" (for example in the graph build command or functions
like buildGraph/discoverSources) and replace them with the new multi-language
phrasing so help text and logs match actual behavior.
---
Nitpick comments:
In `@src/graph/extract/python.ts`:
- Around line 335-340: pushNode currently derives the method lookup key by
splitting node.id on ":" (node.id.split(":")[1]) which is fragile if
relativePath ever contains colons; update the code that constructs GraphNode (or
wherever node.id is formed) to store the lookup key explicitly (e.g., a new
property like node.key or node.lookupName) and use that property here in
pushNode instead of splitting node.id, or, if you prefer a minimal change,
compute the key robustly here using the known id format (e.g., take the
substring after the first ":" via index lookup rather than split) so extraction
does not break when paths include ":" — adjust pushNode to read the new property
(or use the index-based extraction) and update callers that create GraphNode
accordingly.
In `@tests/shared/graph/graph-command.test.ts`:
- Around line 99-108: The test for tryGraphRead("cat /graph/index.md", tmp)
currently uses a loose regex expect(body).toMatch(/no-graph|Code Graph/);
tighten this to assert the exact stable rendered message agreed as the contract
(e.g., the exact "no-graph" rendered string or the canonical "Code Graph — no
snapshot" text) so the test fails on regressions; update the assertion in the
"dispatches a /graph/* read to the VFS..." case to compare body === "<expected
exact string>" (or use toBe with that exact string) and ensure the expected
value is the canonical no-snapshot message produced by
handleGraphVfs/tryGraphRead.
In `@tests/shared/graph/impact.test.ts`:
- Around line 29-37: The test "lists transitive dependents grouped by depth"
uses broad toContain checks; replace them with exact string/line assertions
against the output of renderImpact(chain, "d") so the test is deterministic:
assert equality or use toMatch with anchored regexes for the full expected block
including the header "Impact of d", the exact count "3 dependents", the exact
depth lines such as "depth 1 (1):" followed immediately by the dependent "c" on
the next line, and exact "depth 2" and "depth 3" lines; update the corresponding
loose assertions in the later test (the similar checks at 58-61) the same way.
Ensure you reference and assert the full expected output string or tightly
anchored regexes rather than substrings.
In `@tests/shared/graph/layers.test.ts`:
- Around line 142-144: The test currently asserts a vague substring against the
output ("src/graph/file") which can mask malformed rendering; update the
assertions on the variable out to check for the exact, concrete file paths that
should be rendered (e.g., the specific entries your test setup inserts) instead
of the generic fragment, while preserving the truncation assertion
expect(out).toContain("and 2 more file") so the test still verifies truncation;
locate the two expect calls that reference out in this test and replace the
first loose substring check with assertions for the precise file path strings
produced by the code under test.
In `@tests/shared/graph/vfs-handler.test.ts`:
- Around line 463-468: The test uses handleGraphVfs("path/foo/UserModel", cwd)
but only asserts typeof r.body === "string"; update the assertion to verify
actual output content by checking that r.body contains expected node names or
known messages (e.g., assert r.body includes "foo" or "UserModel", or matches
"No path" or "ambiguous"). Locate the test case (the it block calling seed() and
handleGraphVfs) and replace the loose typeof check with a concrete assertion
such as toContain/toMatch against those expected substrings so the test
validates real path/No path/ambiguous outputs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 918a860e-a946-40d9-9144-651118e1c980
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json,!**/package-lock.json
📒 Files selected for processing (54)
.gitignoreclaude-code/skills/hivemind-graph/SKILL.mdcodex/hooks/hooks.jsoncodex/skills/hivemind-graph/SKILL.mdesbuild.config.mjshermes/skills/hivemind-graph/SKILL.mdpackage.jsonscripts/ensure-tree-sitter.mjssrc/cli/install-codex.tssrc/cli/install-cursor.tssrc/cli/install-hermes.tssrc/commands/graph.tssrc/graph/deeplake-pull.tssrc/graph/extract/index.tssrc/graph/extract/python.tssrc/graph/extract/typescript.tssrc/graph/graph-command.tssrc/graph/node-metadata.tssrc/graph/render/impact.tssrc/graph/render/layers.tssrc/graph/render/neighborhood.tssrc/graph/render/path.tssrc/graph/render/tour.tssrc/graph/resolve/cross-file.tssrc/graph/session-context.tssrc/graph/snapshot.tssrc/graph/types.tssrc/graph/vfs-handler.tssrc/hooks/codex/pre-tool-use.tssrc/hooks/cursor/pre-tool-use.tssrc/hooks/cursor/session-start.tssrc/hooks/graph-on-stop.tssrc/hooks/hermes/pre-tool-use.tssrc/hooks/hermes/session-start.tssrc/hooks/pre-tool-use.tstests/claude-code/pre-tool-use-branches.test.tstests/claude-code/safe-echo.test.tstests/cli/cli-install-codex-fs.test.tstests/cli/cli-install-hermes.test.tstests/shared/graph/cross-file-e2e.test.tstests/shared/graph/cross-file.test.tstests/shared/graph/deeplake-pull.test.tstests/shared/graph/graph-command.test.tstests/shared/graph/impact.test.tstests/shared/graph/javascript.test.tstests/shared/graph/layers.test.tstests/shared/graph/neighborhood.test.tstests/shared/graph/node-metadata.test.tstests/shared/graph/path.test.tstests/shared/graph/python-cross-file.test.tstests/shared/graph/python.test.tstests/shared/graph/session-context.test.tstests/shared/graph/tour.test.tstests/shared/graph/vfs-handler.test.ts
✅ Files skipped from review due to trivial changes (8)
- .gitignore
- claude-code/skills/hivemind-graph/SKILL.md
- hermes/skills/hivemind-graph/SKILL.md
- tests/cli/cli-install-codex-fs.test.ts
- src/graph/render/neighborhood.ts
- codex/skills/hivemind-graph/SKILL.md
- tests/claude-code/pre-tool-use-branches.test.ts
- codex/hooks/hooks.json
🚧 Files skipped from review as they are similar to previous changes (2)
- src/graph/deeplake-pull.ts
- tests/shared/graph/deeplake-pull.test.ts
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
package.json (1)
64-70:⚠️ Potential issue | 🟠 Major | ⚡ Quick winInconsistent dependency placement:
tree-sitter-pythonindependenciesbut base package inoptionalDependencies.
tree-sitter-python(line 64) is a required dependency, buttree-sitter(line 69) is optional. Sincetree-sitter-pythonrequires the basetree-sitterpackage at runtime, Python graph extraction will fail whentree-sitteris not installed. Movetree-sitter-pythontooptionalDependenciesalongside the other tree-sitter packages for consistent behavior.Suggested fix
"dependencies": { "`@anthropic-ai/sdk`": "^0.97.1", "`@huggingface/transformers`": "^3.0.0", "`@modelcontextprotocol/sdk`": "^1.29.0", "deeplake": "^0.3.30", "js-yaml": "^4.1.1", "just-bash": "^2.14.0", - "tree-sitter-python": "^0.21.0", "yargs-parser": "^22.0.0", "zod": "^4.3.6" }, "optionalDependencies": { "tree-sitter": "^0.21.1", + "tree-sitter-python": "^0.21.0", "tree-sitter-typescript": "^0.23.2" },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@package.json` around lines 64 - 70, Move the tree-sitter-python entry from dependencies into optionalDependencies so it is installed/managed together with tree-sitter and tree-sitter-typescript; update package.json by removing "tree-sitter-python": "^0.21.0" from the top-level "dependencies" object and adding the same entry to the "optionalDependencies" object where "tree-sitter" and "tree-sitter-typescript" are declared to ensure Python graph extraction has the base runtime available when tree-sitter is optional.src/commands/graph.ts (1)
44-49:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAlign CLI/help text with multi-language extraction support.
The command now discovers/extracts JS/Python too, but usage and runtime log still say “TypeScript source files,” which is user-facing and misleading.
Suggested patch
-const USAGE = `hivemind graph — codebase-graph commands (Phase 1 — TypeScript only) +const USAGE = `hivemind graph — codebase-graph commands (Phase 1.5 — TypeScript/JavaScript/Python) @@ - Walk the project for TypeScript source files, extract symbols + edges, + Walk the project for supported source files (TypeScript/JavaScript/Python), extract symbols + edges, @@ - console.log(`Discovered ${sourceFiles.length} TypeScript source files. Extracting...`); + console.log(`Discovered ${sourceFiles.length} source files. Extracting...`);Also applies to: 444-445
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/commands/graph.ts` around lines 44 - 49, Update the user-facing help and runtime messages that erroneously say "TypeScript source files" to reflect multi-language support (e.g., "source files (TypeScript, JavaScript, Python)" or "TypeScript/JavaScript/Python source files"). Change the USAGE constant string (symbol: USAGE) accordingly and search for any runtime log strings that contain "TypeScript source files" (for example in the graph build command or functions like buildGraph/discoverSources) and replace them with the new multi-language phrasing so help text and logs match actual behavior.
🧹 Nitpick comments (5)
tests/shared/graph/vfs-handler.test.ts (1)
463-468: ⚡ Quick winStrengthen the assertion to verify actual path/ output content.
The
typeof r.body === "string"assertion only confirms the response is a string, which is always true forkind: "ok". Per testing guidelines, prefer asserting on specific values. The path output should contain recognizable structure — e.g., node names from the fixture, "No path", or "ambiguous".💡 Suggested improvement
it("path/<from>/<to> renders a path or a clear no-path/ambiguous message", () => { seed(); const r = handleGraphVfs("path/foo/UserModel", cwd); expect(r.kind).toBe("ok"); - if (r.kind === "ok") expect(typeof r.body).toBe("string"); + if (r.kind === "ok") { + // Assert the output contains either path structure or a clear message + expect( + r.body.includes("foo") || r.body.includes("No path") || r.body.includes("ambiguous") + ).toBe(true); + } });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/shared/graph/vfs-handler.test.ts` around lines 463 - 468, The test uses handleGraphVfs("path/foo/UserModel", cwd) but only asserts typeof r.body === "string"; update the assertion to verify actual output content by checking that r.body contains expected node names or known messages (e.g., assert r.body includes "foo" or "UserModel", or matches "No path" or "ambiguous"). Locate the test case (the it block calling seed() and handleGraphVfs) and replace the loose typeof check with a concrete assertion such as toContain/toMatch against those expected substrings so the test validates real path/No path/ambiguous outputs.src/graph/extract/python.ts (1)
335-340: 💤 Low valueMinor:
declByNamekey extraction relies on no colons in path.Line 338 uses
node.id.split(":")[1]to extract the method key, which assumesrelativePathcontains no colons. This is documented as relative paths with forward slashes, so it's valid, but consider using a more robust separator (e.g.,::) or extracting the key during node construction to avoid future fragility.Current behavior is correct given the documented contract.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/graph/extract/python.ts` around lines 335 - 340, pushNode currently derives the method lookup key by splitting node.id on ":" (node.id.split(":")[1]) which is fragile if relativePath ever contains colons; update the code that constructs GraphNode (or wherever node.id is formed) to store the lookup key explicitly (e.g., a new property like node.key or node.lookupName) and use that property here in pushNode instead of splitting node.id, or, if you prefer a minimal change, compute the key robustly here using the known id format (e.g., take the substring after the first ":" via index lookup rather than split) so extraction does not break when paths include ":" — adjust pushNode to read the new property (or use the index-based extraction) and update callers that create GraphNode accordingly.tests/shared/graph/layers.test.ts (1)
142-144: ⚡ Quick winStrengthen this truncation check with concrete file values.
expect(out).toContain("src/graph/file")is too broad and can pass on malformed output. Assert specific rendered paths (and keep the truncation phrase assertion) to make this regression-proof.Suggested test tightening
- expect(out).toContain("src/graph/file"); + expect(out).toContain("src/graph/file0.ts"); + expect(out).toContain("src/graph/file4.ts"); expect(out).toContain("and 2 more file");As per coding guidelines,
tests/**should “Prefer asserting on specific values (paths, messages) over generic substrings.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/shared/graph/layers.test.ts` around lines 142 - 144, The test currently asserts a vague substring against the output ("src/graph/file") which can mask malformed rendering; update the assertions on the variable out to check for the exact, concrete file paths that should be rendered (e.g., the specific entries your test setup inserts) instead of the generic fragment, while preserving the truncation assertion expect(out).toContain("and 2 more file") so the test still verifies truncation; locate the two expect calls that reference out in this test and replace the first loose substring check with assertions for the precise file path strings produced by the code under test.tests/shared/graph/graph-command.test.ts (1)
99-108: ⚡ Quick winMake the no-snapshot response assertion specific
Line 107 currently accepts multiple broad substrings (
/no-graph|Code Graph/), which weakens the contract and can mask regressions. Prefer asserting the exact expected rendered value (or a single stable, explicit message/path fragment agreed as contract).As per coding guidelines: "
tests/**: Prefer asserting on specific values (paths, messages) over generic substrings."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/shared/graph/graph-command.test.ts` around lines 99 - 108, The test for tryGraphRead("cat /graph/index.md", tmp) currently uses a loose regex expect(body).toMatch(/no-graph|Code Graph/); tighten this to assert the exact stable rendered message agreed as the contract (e.g., the exact "no-graph" rendered string or the canonical "Code Graph — no snapshot" text) so the test fails on regressions; update the assertion in the "dispatches a /graph/* read to the VFS..." case to compare body === "<expected exact string>" (or use toBe with that exact string) and ensure the expected value is the canonical no-snapshot message produced by handleGraphVfs/tryGraphRead.tests/shared/graph/impact.test.ts (1)
29-37: ⚡ Quick winMake these message assertions exact instead of broad substrings.
A few checks here are loose enough to pass through formatting/message regressions. Prefer exact line/value assertions for deterministic behavior.
Suggested tightening
it("lists transitive dependents grouped by depth", () => { const body = renderImpact(chain, "d"); - expect(body).toContain("Impact of d"); - expect(body).toContain("3 dependents"); - expect(body).toContain("depth 1 (1):"); + expect(body).toContain("Impact of d"); + expect(body).toContain("3 dependents (transitive), by depth:"); + expect(body).toContain(" depth 1 (1):"); expect(body).toMatch(/depth 1 \(1\):\s*\n\s*c/); expect(body).toContain("depth 2"); expect(body).toContain("depth 3"); }); @@ it("returns a no-match message for an unknown pattern", () => { const body = renderImpact(chain, "zzz"); - expect(body).toContain("No node matches"); + expect(body).toBe('No node matches "zzz". Try cat memory/graph/find/zzz to explore.'); });As per coding guidelines
tests/**: “Prefer asserting on specific values (paths, messages) over generic substrings.”Also applies to: 58-61
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/shared/graph/impact.test.ts` around lines 29 - 37, The test "lists transitive dependents grouped by depth" uses broad toContain checks; replace them with exact string/line assertions against the output of renderImpact(chain, "d") so the test is deterministic: assert equality or use toMatch with anchored regexes for the full expected block including the header "Impact of d", the exact count "3 dependents", the exact depth lines such as "depth 1 (1):" followed immediately by the dependent "c" on the next line, and exact "depth 2" and "depth 3" lines; update the corresponding loose assertions in the later test (the similar checks at 58-61) the same way. Ensure you reference and assert the full expected output string or tightly anchored regexes rather than substrings.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/graph/graph-command.ts`:
- Around line 50-51: The code that extracts a read target uses
rewrittenCommand.replace(/\s+2>\S+/g, "") to strip only stderr redirection, so
functions parseReadTargetPath and tryGraphRead (and the logic around const cmd =
rewrittenCommand...) still see other redirection tokens (>, >>, <, >&, etc.) and
fail to detect VFS reads; update the sanitization step used before target
extraction to remove all common shell redirections (stdout, stderr, append,
input, and combined forms) from rewrittenCommand (and the same sanitization in
the other block around lines 104-112) so parseReadTargetPath/tryGraphRead
receive a cleaned cmd; ensure you update the code paths that create const cmd
and any pre-processing used by tryGraphRead/parseReadTargetPath so they all call
the new comprehensive redirection-stripping routine.
In `@src/graph/resolve/cross-file.ts`:
- Around line 32-49: buildExportIndex and the cross-file/heritage resolvers are
name-only which lets a symbol from the wrong kind win; update the data
structures and matchers to be kind-aware: change buildExportIndex to index
exported top-level nodes by (label, kind) instead of label-only (preserve
first-wins per (label,kind)), update localIndex/exportIndex consumers
(resolveCrossFileCalls and resolveHeritageEdges) to look up by both name and
kind, modify parseUnresolved to return the kind from strings like
"unresolved:<file>:<name>:<kind>", and in resolveCrossFileCalls restrict matches
to value-callable kinds (e.g., function/class/const/enum) so type-only kinds
cannot satisfy call targets.
In `@src/hooks/cursor/pre-tool-use.ts`:
- Around line 73-77: Replace the fixed heredoc delimiter usage that constructs
echoCmd with a collision-safe generated delimiter: compute a random/unique token
that is guaranteed not to appear in graphBody (and reuse the same token for the
grep branch), then build the heredoc using that generated token (e.g., `cat
<<'DELIM'` ... `DELIM`) so no user-controlled content can terminate the heredoc
early; update references to echoCmd and any grep-related command construction in
pre-tool-use.ts to use the new token-generation and reuse logic.
In `@src/hooks/graph-on-stop.ts`:
- Around line 82-85: Update the comment that documents SOURCE_GLOBS to reflect
that the glob list includes Python files (e.g., *.py and *.pyi) in addition to
TypeScript and JavaScript; find the constant SOURCE_GLOBS in graph-on-stop.ts
and change the comment text from "TypeScript + JavaScript (B7), excluding
.d.ts." to something like "TypeScript, JavaScript, and Python (B7), excluding
.d.ts." so the comment matches the actual globs.
In `@src/hooks/hermes/session-start.ts`:
- Around line 214-216: The variable additional in session-start.ts currently
appends graphNote directly to baseWithGoals (and to baseWithGoals + rulesBlock)
which can create run-on text; update the concatenation for additional so it
inserts an explicit newline before graphNote (e.g., ensure the expression builds
`${baseWithGoals}\n${graphNote}` or
`${baseWithGoals}\n\n${rulesBlock}\n${graphNote}` depending on rulesBlock) so
that baseWithGoals/rulesBlock and graphNote are always separated by a newline;
adjust the expression that defines additional accordingly (references:
additional, baseWithGoals, rulesBlock, graphNote).
In `@src/hooks/pre-tool-use.ts`:
- Around line 478-488: The not-found branch currently constructs an unsafe
host-shell command via buildAllowDecision(`echo ${JSON.stringify(notFound)}`,
...); replace that echoed literal with a command produced by safeEchoCommand so
untrusted virtualPath characters cannot be interpreted by the shell;
specifically, call safeEchoCommand(notFound) (or
safeEchoCommand(JSON.stringify(notFound)) if you want the same quoting) and pass
its return value into buildAllowDecision while leaving the Read-path
(writeReadCacheFileFn + buildReadDecision) unchanged.
In `@tests/claude-code/safe-echo.test.ts`:
- Around line 68-69: The test is brittle because it asserts a regex on shell
error text; instead assert deterministically that the sentinel file does not
exist. Replace the runVia(...) + regex assertion with a platform-independent
existence check (e.g., use Node's fs.existsSync or fs.promises.stat) against the
canary variable and assert it returns false (or rejects/not found). Update the
test that currently calls runVia(shell, `ls '${canary}' ...`) and the
expect(...) that inspects lsOut+lsErr to use the fs-based existence check so the
assertion is exact and locale-independent.
---
Outside diff comments:
In `@package.json`:
- Around line 64-70: Move the tree-sitter-python entry from dependencies into
optionalDependencies so it is installed/managed together with tree-sitter and
tree-sitter-typescript; update package.json by removing "tree-sitter-python":
"^0.21.0" from the top-level "dependencies" object and adding the same entry to
the "optionalDependencies" object where "tree-sitter" and
"tree-sitter-typescript" are declared to ensure Python graph extraction has the
base runtime available when tree-sitter is optional.
In `@src/commands/graph.ts`:
- Around line 44-49: Update the user-facing help and runtime messages that
erroneously say "TypeScript source files" to reflect multi-language support
(e.g., "source files (TypeScript, JavaScript, Python)" or
"TypeScript/JavaScript/Python source files"). Change the USAGE constant string
(symbol: USAGE) accordingly and search for any runtime log strings that contain
"TypeScript source files" (for example in the graph build command or functions
like buildGraph/discoverSources) and replace them with the new multi-language
phrasing so help text and logs match actual behavior.
---
Nitpick comments:
In `@src/graph/extract/python.ts`:
- Around line 335-340: pushNode currently derives the method lookup key by
splitting node.id on ":" (node.id.split(":")[1]) which is fragile if
relativePath ever contains colons; update the code that constructs GraphNode (or
wherever node.id is formed) to store the lookup key explicitly (e.g., a new
property like node.key or node.lookupName) and use that property here in
pushNode instead of splitting node.id, or, if you prefer a minimal change,
compute the key robustly here using the known id format (e.g., take the
substring after the first ":" via index lookup rather than split) so extraction
does not break when paths include ":" — adjust pushNode to read the new property
(or use the index-based extraction) and update callers that create GraphNode
accordingly.
In `@tests/shared/graph/graph-command.test.ts`:
- Around line 99-108: The test for tryGraphRead("cat /graph/index.md", tmp)
currently uses a loose regex expect(body).toMatch(/no-graph|Code Graph/);
tighten this to assert the exact stable rendered message agreed as the contract
(e.g., the exact "no-graph" rendered string or the canonical "Code Graph — no
snapshot" text) so the test fails on regressions; update the assertion in the
"dispatches a /graph/* read to the VFS..." case to compare body === "<expected
exact string>" (or use toBe with that exact string) and ensure the expected
value is the canonical no-snapshot message produced by
handleGraphVfs/tryGraphRead.
In `@tests/shared/graph/impact.test.ts`:
- Around line 29-37: The test "lists transitive dependents grouped by depth"
uses broad toContain checks; replace them with exact string/line assertions
against the output of renderImpact(chain, "d") so the test is deterministic:
assert equality or use toMatch with anchored regexes for the full expected block
including the header "Impact of d", the exact count "3 dependents", the exact
depth lines such as "depth 1 (1):" followed immediately by the dependent "c" on
the next line, and exact "depth 2" and "depth 3" lines; update the corresponding
loose assertions in the later test (the similar checks at 58-61) the same way.
Ensure you reference and assert the full expected output string or tightly
anchored regexes rather than substrings.
In `@tests/shared/graph/layers.test.ts`:
- Around line 142-144: The test currently asserts a vague substring against the
output ("src/graph/file") which can mask malformed rendering; update the
assertions on the variable out to check for the exact, concrete file paths that
should be rendered (e.g., the specific entries your test setup inserts) instead
of the generic fragment, while preserving the truncation assertion
expect(out).toContain("and 2 more file") so the test still verifies truncation;
locate the two expect calls that reference out in this test and replace the
first loose substring check with assertions for the precise file path strings
produced by the code under test.
In `@tests/shared/graph/vfs-handler.test.ts`:
- Around line 463-468: The test uses handleGraphVfs("path/foo/UserModel", cwd)
but only asserts typeof r.body === "string"; update the assertion to verify
actual output content by checking that r.body contains expected node names or
known messages (e.g., assert r.body includes "foo" or "UserModel", or matches
"No path" or "ambiguous"). Locate the test case (the it block calling seed() and
handleGraphVfs) and replace the loose typeof check with a concrete assertion
such as toContain/toMatch against those expected substrings so the test
validates real path/No path/ambiguous outputs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 918a860e-a946-40d9-9144-651118e1c980
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json,!**/package-lock.json
📒 Files selected for processing (54)
.gitignoreclaude-code/skills/hivemind-graph/SKILL.mdcodex/hooks/hooks.jsoncodex/skills/hivemind-graph/SKILL.mdesbuild.config.mjshermes/skills/hivemind-graph/SKILL.mdpackage.jsonscripts/ensure-tree-sitter.mjssrc/cli/install-codex.tssrc/cli/install-cursor.tssrc/cli/install-hermes.tssrc/commands/graph.tssrc/graph/deeplake-pull.tssrc/graph/extract/index.tssrc/graph/extract/python.tssrc/graph/extract/typescript.tssrc/graph/graph-command.tssrc/graph/node-metadata.tssrc/graph/render/impact.tssrc/graph/render/layers.tssrc/graph/render/neighborhood.tssrc/graph/render/path.tssrc/graph/render/tour.tssrc/graph/resolve/cross-file.tssrc/graph/session-context.tssrc/graph/snapshot.tssrc/graph/types.tssrc/graph/vfs-handler.tssrc/hooks/codex/pre-tool-use.tssrc/hooks/cursor/pre-tool-use.tssrc/hooks/cursor/session-start.tssrc/hooks/graph-on-stop.tssrc/hooks/hermes/pre-tool-use.tssrc/hooks/hermes/session-start.tssrc/hooks/pre-tool-use.tstests/claude-code/pre-tool-use-branches.test.tstests/claude-code/safe-echo.test.tstests/cli/cli-install-codex-fs.test.tstests/cli/cli-install-hermes.test.tstests/shared/graph/cross-file-e2e.test.tstests/shared/graph/cross-file.test.tstests/shared/graph/deeplake-pull.test.tstests/shared/graph/graph-command.test.tstests/shared/graph/impact.test.tstests/shared/graph/javascript.test.tstests/shared/graph/layers.test.tstests/shared/graph/neighborhood.test.tstests/shared/graph/node-metadata.test.tstests/shared/graph/path.test.tstests/shared/graph/python-cross-file.test.tstests/shared/graph/python.test.tstests/shared/graph/session-context.test.tstests/shared/graph/tour.test.tstests/shared/graph/vfs-handler.test.ts
✅ Files skipped from review due to trivial changes (8)
- .gitignore
- claude-code/skills/hivemind-graph/SKILL.md
- hermes/skills/hivemind-graph/SKILL.md
- tests/cli/cli-install-codex-fs.test.ts
- src/graph/render/neighborhood.ts
- codex/skills/hivemind-graph/SKILL.md
- tests/claude-code/pre-tool-use-branches.test.ts
- codex/hooks/hooks.json
🚧 Files skipped from review as they are similar to previous changes (2)
- src/graph/deeplake-pull.ts
- tests/shared/graph/deeplake-pull.test.ts
🛑 Comments failed to post (7)
src/graph/graph-command.ts (1)
50-51:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winHandle standard shell redirections before target extraction.
parseReadTargetPath/tryGraphReadonly strip2>.... Common reads likecat /graph/index.md >/tmp/xorhead /graph/index.md < /tmp/inleave redirection tokens as operands, so this path returnsnulland misses VFS interception.Suggested patch
+function stripSimpleRedirections(cmd: string): string { + // Best-effort: remove simple stdin/stdout/stderr redirects used in read commands. + return cmd.replace(/\s+(?:\d+>>?|\d+>|>>|>|<)\s*\S+/g, ""); +} + export function parseReadTargetPath(rewrittenCommand: string): string | null { - const cmd = rewrittenCommand.replace(/\s+2>\S+/g, "").trim(); + const cmd = stripSimpleRedirections(rewrittenCommand).trim(); @@ export function tryGraphRead(rewrittenCommand: string, cwd: string): string | null { - const ls = rewrittenCommand.replace(/\s+2>\S+/g, "").trim().match(/^ls\s+(?:-\S+\s+)*(\S+)\s*$/); + const ls = stripSimpleRedirections(rewrittenCommand).trim().match(/^ls\s+(?:-\S+\s+)*(\S+)\s*$/);Also applies to: 104-112
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/graph/graph-command.ts` around lines 50 - 51, The code that extracts a read target uses rewrittenCommand.replace(/\s+2>\S+/g, "") to strip only stderr redirection, so functions parseReadTargetPath and tryGraphRead (and the logic around const cmd = rewrittenCommand...) still see other redirection tokens (>, >>, <, >&, etc.) and fail to detect VFS reads; update the sanitization step used before target extraction to remove all common shell redirections (stdout, stderr, append, input, and combined forms) from rewrittenCommand (and the same sanitization in the other block around lines 104-112) so parseReadTargetPath/tryGraphRead receive a cleaned cmd; ensure you update the code paths that create const cmd and any pre-processing used by tryGraphRead/parseReadTargetPath so they all call the new comprehensive redirection-stripping routine.src/graph/resolve/cross-file.ts (1)
32-49:
⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift🧩 Analysis chain
🏁 Script executed:
#!/bin/bash set -euo pipefail FILE="src/graph/resolve/cross-file.ts" # Show file header + the referenced regions with line numbers sed -n '1,120p' "$FILE" echo "----" sed -n '120,220p' "$FILE" echo "----" sed -n '220,320p' "$FILE" echo "----" # Also grep for the key functions to see their actual signatures/uses rg -n "buildExportIndex|parseUnresolved|resolveOne|resolveHeritageEdges|calls|implements|extends|EXPORTABLE_KINDS|HERITAGE_KINDS" "$FILE"Repository: activeloopai/hivemind
Length of output: 14689
🏁 Script executed:
#!/bin/bash set -euo pipefail rg -n "unresolved:<file>:|UNRESOLVED_PREFIX|unresolved:" src | head -n 200Repository: activeloopai/hivemind
Length of output: 1172
🏁 Script executed:
#!/bin/bash set -euo pipefail rg -n "UNRESOLVED_PREFIX|unresolved:" src | head -n 200Repository: activeloopai/hivemind
Length of output: 1172
🏁 Script executed:
#!/bin/bash set -euo pipefail FILE_TS="src/graph/extract/typescript.ts" FILE_PY="src/graph/extract/python.ts" FILE_CACHE="src/graph/cache.ts" FILE_SNAPSHOT="src/graph/snapshot.ts" FILE_CROSS="src/graph/resolve/cross-file.ts" echo "---- $FILE_TS (around unresolved generator)" sed -n '700,780p' "$FILE_TS" echo "---- $FILE_CROSS (parseUnresolved + resolveHeritageEdges context)" sed -n '140,280p' "$FILE_CROSS" echo "---- $FILE_CACHE (unresolved relocation semantics)" sed -n '180,260p' "$FILE_CACHE" echo "---- $FILE_SNAPSHOT (unresolved placeholder usage)" sed -n '50,110p' "$FILE_SNAPSHOT"Repository: activeloopai/hivemind
Length of output: 13515
Make cross-file export/heritage resolution kind-aware to prevent wrong edge targets.
In
src/graph/resolve/cross-file.ts,buildExportIndexis a name-only (source_file -> label -> id) “first wins” map and is used byresolveCrossFileCalls, so if a module exports multiple declarations with the same name but differentkind(type/value spaces), thecallsedge can bind to the wrong node. Similarly,resolveHeritageEdgesreceivesunresolved:<file>:<name>:<kind>, butparseUnresolveddiscards<kind>and both the same-filelocalIndexand cross-fileexportIndexresolve heritage by name only, soextends/implementsedges can also target the wrong kind.Suggested direction
- function buildExportIndex(nodes: readonly GraphNode[]): Map<string, Map<string, string>> { + function buildExportIndex(nodes: readonly GraphNode[]): Map<string, Map<string, Map<GraphNode["kind"], string>>> { ... - if (!m.has(n.label)) m.set(n.label, n.id); // first wins (no clashes in valid TS) + let byKind = m.get(n.label); + if (!byKind) { byKind = new Map(); m.set(n.label, byKind); } + if (!byKind.has(n.kind)) byKind.set(n.kind, n.id); } - function parseUnresolved(target: string): { file: string; name: string } | null + function parseUnresolved(target: string): { file: string; name: string; kind: GraphNode["kind"] } | nullUse kind-aware selection:
- for
resolveCrossFileCalls, restrict to value-callable kinds (e.g.,function/class/const/enum) instead of allowing type-only kinds to win onlabel;- for
resolveHeritageEdges, returnkindfromparseUnresolvedand keylocalIndex/exportIndexby(name, kind)rather than name-only.<details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In
@src/graph/resolve/cross-file.tsaround lines 32 - 49, buildExportIndex and
the cross-file/heritage resolvers are name-only which lets a symbol from the
wrong kind win; update the data structures and matchers to be kind-aware: change
buildExportIndex to index exported top-level nodes by (label, kind) instead of
label-only (preserve first-wins per (label,kind)), update localIndex/exportIndex
consumers (resolveCrossFileCalls and resolveHeritageEdges) to look up by both
name and kind, modify parseUnresolved to return the kind from strings like
"unresolved:::", and in resolveCrossFileCalls restrict matches
to value-callable kinds (e.g., function/class/const/enum) so type-only kinds
cannot satisfy call targets.</details> <!-- fingerprinting:phantom:poseidon:hawk --> <!-- This is an auto-generated comment by CodeRabbit --> </blockquote></details> <details> <summary>src/hooks/cursor/pre-tool-use.ts (1)</summary><blockquote> 73-77: _⚠️ Potential issue_ | _🔴 Critical_ | _⚡ Quick win_ **Use a collision-safe literal-output builder for heredoc commands.** The fixed delimiter allows command injection if `graphBody` contains a standalone `__HIVEMIND_RESULT__` line; trailing lines would execute in the host shell. Please switch to a generated delimiter that is guaranteed absent from payload, and reuse it for both graph and grep branches. <details> <summary>🔧 Proposed fix</summary> ```diff +function toLiteralCatCommand(body: string): string { + let marker = "__HIVEMIND_RESULT__"; + while (new RegExp(`(^|\\n)${marker}(\\n|$)`).test(body)) { + marker = `__HIVEMIND_RESULT__${Math.random().toString(36).slice(2)}`; + } + return `cat <<'${marker}'\n${body}\n${marker}`; +} + const graphBody = tryGraphRead(rewritten, input.cwd ?? process.cwd()); if (graphBody !== null) { log(`graph vfs intercept: ${command.slice(0, 80)}`); - const echoCmd = `cat <<'__HIVEMIND_RESULT__'\n${graphBody}\n__HIVEMIND_RESULT__`; + const echoCmd = toLiteralCatCommand(graphBody); process.stdout.write(JSON.stringify({ permission: "allow", updated_input: { command: echoCmd }, agent_message: "[Hivemind graph]", })); return; } @@ - const echoCmd = `cat <<'__HIVEMIND_RESULT__'\n${result}\n__HIVEMIND_RESULT__`; + const echoCmd = toLiteralCatCommand(result);As per coding guidelines, the key invariant is that a memory-touching flow must never end up executing unintended host-shell commands.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/hooks/cursor/pre-tool-use.ts` around lines 73 - 77, Replace the fixed heredoc delimiter usage that constructs echoCmd with a collision-safe generated delimiter: compute a random/unique token that is guaranteed not to appear in graphBody (and reuse the same token for the grep branch), then build the heredoc using that generated token (e.g., `cat <<'DELIM'` ... `DELIM`) so no user-controlled content can terminate the heredoc early; update references to echoCmd and any grep-related command construction in pre-tool-use.ts to use the new token-generation and reuse logic.src/hooks/graph-on-stop.ts (1)
82-85:
⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate the SOURCE_GLOBS comment to include Python.
The comment says TypeScript+JavaScript, but the glob list now also includes
*.pyand*.pyi. Keeping this aligned avoids future maintenance drift.Suggested patch
- * in src/commands/graph.ts — TypeScript + JavaScript (B7), excluding .d.ts. + * in src/commands/graph.ts — TypeScript + JavaScript + Python, excluding .d.ts.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/hooks/graph-on-stop.ts` around lines 82 - 85, Update the comment that documents SOURCE_GLOBS to reflect that the glob list includes Python files (e.g., *.py and *.pyi) in addition to TypeScript and JavaScript; find the constant SOURCE_GLOBS in graph-on-stop.ts and change the comment text from "TypeScript + JavaScript (B7), excluding .d.ts." to something like "TypeScript, JavaScript, and Python (B7), excluding .d.ts." so the comment matches the actual globs.src/hooks/hermes/session-start.ts (1)
214-216:
⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd an explicit newline before
graphNoteto avoid run-on context text.
additionalcurrently concatenatesgraphNotedirectly, which can merge lines in the user-visible Hermes context.🔧 Proposed fix
- const graphNote = graphContextLine(cwd) ?? ""; - const additional = (rulesBlock - ? `${baseWithGoals}\n\n${rulesBlock}` - : baseWithGoals) + graphNote; + const graphNote = graphContextLine(cwd); + const baseAdditional = rulesBlock + ? `${baseWithGoals}\n\n${rulesBlock}` + : baseWithGoals; + const additional = graphNote + ? `${baseAdditional}\n${graphNote}` + : baseAdditional;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/hooks/hermes/session-start.ts` around lines 214 - 216, The variable additional in session-start.ts currently appends graphNote directly to baseWithGoals (and to baseWithGoals + rulesBlock) which can create run-on text; update the concatenation for additional so it inserts an explicit newline before graphNote (e.g., ensure the expression builds `${baseWithGoals}\n${graphNote}` or `${baseWithGoals}\n\n${rulesBlock}\n${graphNote}` depending on rulesBlock) so that baseWithGoals/rulesBlock and graphNote are always separated by a newline; adjust the expression that defines additional accordingly (references: additional, baseWithGoals, rulesBlock, graphNote).src/hooks/pre-tool-use.ts (1)
478-488:
⚠️ Potential issue | 🔴 Critical | ⚡ Quick winUse
safeEchoCommandin the not-found path branch.Line 487 still emits
echo ${JSON.stringify(notFound)}. BecausenotFoundincludesvirtualPath(tool-input-derived), backticks/$()can be evaluated by the host shell in this fallback.🔧 Suggested fix
- return buildAllowDecision(`echo ${JSON.stringify(notFound)}`, `[DeepLake] not found: ${virtualPath}`); + return buildAllowDecision(safeEchoCommand(notFound), `[DeepLake] not found: ${virtualPath}`);As per coding guidelines, “the key invariant to flag is that a memory-touching command must never be handed to the real host shell.”
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.// The path was normalized to a concrete file but no VFS row exists — that // is "not found", NOT an unsupported command. Falling through to the // generic retry guidance would loop the agent on an already-valid shape. logFn(`virtual path not found: ${virtualPath}`); const notFound = `${virtualPath}: No such file or directory`; if (input.tool_name === "Read") { const file_path = writeReadCacheFileFn(input.session_id, virtualPath, notFound); return buildReadDecision(file_path, `[DeepLake] not found: ${virtualPath}`); } return buildAllowDecision(safeEchoCommand(notFound), `[DeepLake] not found: ${virtualPath}`); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/hooks/pre-tool-use.ts` around lines 478 - 488, The not-found branch currently constructs an unsafe host-shell command via buildAllowDecision(`echo ${JSON.stringify(notFound)}`, ...); replace that echoed literal with a command produced by safeEchoCommand so untrusted virtualPath characters cannot be interpreted by the shell; specifically, call safeEchoCommand(notFound) (or safeEchoCommand(JSON.stringify(notFound)) if you want the same quoting) and pass its return value into buildAllowDecision while leaving the Read-path (writeReadCacheFileFn + buildReadDecision) unchanged.tests/claude-code/safe-echo.test.ts (1)
68-69:
⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAssert canary absence with an exact sentinel value, not a generic error regex.
The current check depends on shell/locale error text. Prefer a fixed-value existence check to make this deterministic.
🔧 Suggested fix
- const { stdout: lsOut, stderr: lsErr } = runVia(shell, `ls '${canary}' 2>&1 || true`); - expect(lsOut + lsErr).toMatch(/No such file or directory/); + const { stdout: existsState } = runVia(shell, `[ -e '${canary}' ] && printf 'exists' || printf 'missing'`); + expect(existsState).toBe("missing");As per coding guidelines, “tests/**: Prefer asserting on specific values (paths, messages) over generic substrings.”
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.const { stdout: existsState } = runVia(shell, `[ -e '${canary}' ] && printf 'exists' || printf 'missing'`); expect(existsState).toBe("missing");🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/claude-code/safe-echo.test.ts` around lines 68 - 69, The test is brittle because it asserts a regex on shell error text; instead assert deterministically that the sentinel file does not exist. Replace the runVia(...) + regex assertion with a platform-independent existence check (e.g., use Node's fs.existsSync or fs.promises.stat) against the canary variable and assert it returns false (or rejects/not found). Update the test that currently calls runVia(shell, `ls '${canary}' ...`) and the expect(...) that inspects lsOut+lsErr to use the fs-based existence check so the assertion is exact and locale-independent.
…respect) Dependency / build-output directories could pollute the code graph: discovery only skipped a small hardcoded set + dotdirs, so non-dot Python virtualenvs (venv/, env/) and their site-packages were parsed into the graph. - src/graph/ignore-config.ts: user-editable ~/.deeplake/graph-ignore.json, seeded on first build with a broad default set (node_modules, venv, env, __pycache__, site-packages, dist, build, ...), listing directory names to skip (matched by basename). Conservative about source-like names (packages/bin/lib are NOT defaulted). + pathHasIgnoredSegment helper. - src/commands/graph.ts: discovery now prefers `git ls-files --cached --others --exclude-standard`, so the repo's .gitignore is honored EXACTLY (anchoring, nested rules, subtrees) rather than a naive basename parse; the name set is applied on top as a safety net for tracked junk. Manual walk() stays the fallback for non-git repos. Also fixes the stale "TypeScript source files" log. - Single discoverSourceFiles path, shared by every agent's auto-build (graph-on-stop -> runBuildCommand) -> ignore is uniform across assistants. Verified e2e: junk under venv/ env/ third_party/ leaves the snapshot unchanged (node count + zero junk markers). +13 unit tests.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/commands/graph.ts (1)
50-55:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate CLI help text to reflect multi-language support.
isSourceFile()now includes JS/Python extensions, but the help text still says “TypeScript only / TypeScript source files,” which is now inaccurate.Suggested diff
-const USAGE = `hivemind graph — codebase-graph commands (Phase 1 — TypeScript only) +const USAGE = `hivemind graph — codebase-graph commands (Phase 1) Usage: hivemind graph build [--cwd <path>] - Walk the project for TypeScript source files, extract symbols + edges, + Walk the project for supported source files (TS/JS/Python), extract symbols + edges, and write a snapshot to ~/.hivemind/graphs/<repo-key>/snapshots/<commit-sha>.json.Also applies to: 691-697
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/commands/graph.ts` around lines 50 - 55, The CLI help text in the USAGE constant (and the other help string that still says “TypeScript only”) is outdated because isSourceFile() now supports JS and Python; update the wording in USAGE and the duplicate help block (the other string that mentions “TypeScript source files”) to reflect multi-language support (e.g., “multi-language (TypeScript, JavaScript, Python)” or “TypeScript/JavaScript/Python”) so the usage message accurately describes which source files are processed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/commands/graph.ts`:
- Around line 50-55: The CLI help text in the USAGE constant (and the other help
string that still says “TypeScript only”) is outdated because isSourceFile() now
supports JS and Python; update the wording in USAGE and the duplicate help block
(the other string that mentions “TypeScript source files”) to reflect
multi-language support (e.g., “multi-language (TypeScript, JavaScript, Python)”
or “TypeScript/JavaScript/Python”) so the usage message accurately describes
which source files are processed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 89f11590-b9a2-443c-be4b-edf173a917d6
📒 Files selected for processing (3)
src/commands/graph.tssrc/graph/ignore-config.tstests/shared/graph/ignore-config.test.ts
CodeRabbit (Minor): the `hivemind graph` USAGE text and file-header comment still said "TypeScript only / TypeScript source files" after JS + Python support landed. Update both to "TypeScript / JavaScript / Python". No behavior change.
|
@coderabbitai Addressed in the latest commit: the |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
rg -n "TypeScript" src/commands/graph.tsLength of output: 276
No stale "TypeScript only" references remain in 🐇 ✨ 🧠 Learnings used |
CodeQL flagged a check-then-use race: loadGraphIgnore did existsSync(path) then later writeFileSync(path). Read the config directly instead (reacting to the read result, not a prior existsSync), and seed defaults with `flag: "wx"` — an atomic create-exclusive that fails if the file already exists, so a concurrent build can never be clobbered. Behavior preserved (9 tests pass).
…s to >=90 The PR coverage comment was red on branches (aggregate 80%). Close the gaps on the two files this PR introduces and enforce them per-file (vitest.config.ts): - ignore-config.ts: 75% -> 100% branches (per-field type fallbacks for a present config with non-array ignoreDirs / non-boolean respectGitignore). - cross-file.ts: 85.7% -> 90.2% branches (relative `.`/package __init__.py and __init__.pyi resolution, relative-resolves-to-null fall-through, empty specifier guard). Now 100% lines. Both added to the enforced thresholds block at 90/90/90/90.
kaghni
left a comment
There was a problem hiding this comment.
@coderabbitai pi and openclaw are missing for hivemind-graph skills and corresponding graph implmentation if needed can you create an issue for that?
| @@ -17,7 +17,7 @@ import { createRequire } from 'node:module'; | |||
|
|
|||
There was a problem hiding this comment.
@coderabbitai what tree-sitter is and is used for?
There was a problem hiding this comment.
@kaghni, great question! Here's a concise overview:
What is tree-sitter?
tree-sitter is an incremental parsing library and parser generator. It builds a concrete Abstract Syntax Tree (AST) from source code — a structured, hierarchical representation of the code's syntax (functions, classes, expressions, imports, etc.).
Key characteristics:
- Fast & incremental: can re-parse only the changed portions of a file.
- Error-tolerant: produces a partial tree even for syntactically invalid code.
- Language-agnostic: each language ships as a separate grammar package (e.g.,
tree-sitter-typescript,tree-sitter-python). - Native (C/C++) core with bindings for many languages including Node.js — which is why
scripts/ensure-tree-sitter.mjsexists to handle platform-specific compilation on e.g. Linux ARM64 where prebuilt binaries may not be available.
How is it used in this PR?
This PR uses tree-sitter to power the code graph extraction pipeline:
| File | Role |
|---|---|
src/graph/extract/typescript.ts |
Parses .ts/.tsx/.js/.jsx files into an AST, then walks it to extract nodes (functions, classes, types) and edges (calls, imports, inheritance) |
src/graph/extract/python.ts |
Same but for .py/.pyi files using the Python grammar |
src/graph/extract/index.ts |
Dispatches to the right extractor based on file extension |
The extracted AST data feeds into the graph snapshot stored in Deeplake, which agents (Cursor, Codex, Hermes) then query via the VFS (memory/graph/) to answer structural/relational questions about the codebase — without reading raw source files.
In short: tree-sitter turns source code into a structured graph of symbols and their relationships.
🧠 Learnings used
Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 0
File: :0-0
Timestamp: 2026-06-05T04:21:07.760Z
Learning: In activeloopai/hivemind src/graph/deeplake-pull.ts (TypeScript), the `pullSnapshot` function's SQL SELECT is already keyed on `(repo_slug, commit_sha)` matching the requested `(repoKey, head)` pair (lines 126-128). An additional in-code payload identity check (comparing payload's graph.commit_sha/graph.repo_key against the query parameters) is intentionally omitted as non-required defense-in-depth; the `snapshot_sha256` check (lines 179-183) covers payload integrity. Do not flag the absence of this identity cross-check as a correctness gap.
| // graph-on-stop.js (code-graph auto-build, G3). Single block (not two) | ||
| // mirrors the static codex/hooks/hooks.json and keeps one entry per | ||
| // event for the merge/dedupe logic. | ||
| Stop: [stopBlockWithGraph(30)], |
There was a problem hiding this comment.
@coderabbitai so do we fire graph on every stop? why?
There was a problem hiding this comment.
@kaghni — yes, it fires on every Stop, and it's intentional. Codex doesn't expose a dedicated session_end event; Stop fires after every assistant turn and is the only signal available for post-session work (the same reason stop.js also fires unconditionally — see the existing learning in context).
The cost is kept low by an internal git-diff threshold gate inside graph-on-stop (the SOURCE_GLOBS check in src/hooks/graph-on-stop.ts): the hook inspects whether any TS/JS/Python source files changed since the last build. If nothing has changed, it exits in well under a second without rebuilding anything. An actual graph rebuild only happens when the diff crosses the configured source-file count threshold.
So the pattern is:
- Fire on every Stop — because there's no other hook to attach to.
- Gate the expensive work — skip the rebuild cheaply when sources haven't changed.
This mirrors what Cursor and Hermes do with their own session-end equivalents.
🧠 Learnings used
Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 98
File: codex/bundle/stop.js:1402-1408
Timestamp: 2026-05-06T23:01:21.197Z
Learning: In `codex/bundle/stop.js` (and `src/hooks/codex/stop.ts`), `forceSessionEndTrigger` is called unconditionally on every Codex `Stop` event by deliberate design. Codex has no separate `session_end` hook — `Stop` fires on every assistant turn and is the only available signal for session completion. Using `tryStopCounterTrigger` instead would silently break skill mining on short sessions (counter never reaches threshold, session closes, worker never fires). The per-projectKey worker lock (`tryAcquireWorkerLock` with a 10-min stale TTL in `src/skilify/state.ts`) makes the over-triggering cheap: subsequent Stops within the same session find nothing past the watermark and exit in <1s. This matches the pattern used by Cursor, Hermes, and Pi agents.
Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 0
File: :0-0
Timestamp: 2026-06-05T04:21:07.760Z
Learning: In activeloopai/hivemind src/graph/deeplake-pull.ts (TypeScript), the `pullSnapshot` function's SQL SELECT is already keyed on `(repo_slug, commit_sha)` matching the requested `(repoKey, head)` pair (lines 126-128). An additional in-code payload identity check (comparing payload's graph.commit_sha/graph.repo_key against the query parameters) is intentionally omitted as non-required defense-in-depth; the `snapshot_sha256` check (lines 179-183) covers payload integrity. Do not flag the absence of this identity cross-check as a correctness gap.
There was a problem hiding this comment.
I think it can be too much is there other way closer to session end
There was a problem hiding this comment.
@kaghni — unfortunately Codex doesn't expose anything closer to session end. The full hook menu is:
| Hook | Fires when |
|---|---|
SessionStart |
Session opens |
UserPromptSubmit |
User sends a message |
PreToolUse / PostToolUse |
Around each tool call |
Stop |
After every assistant turn |
There's no SessionEnd, SessionClose, or equivalent. Stop is the last available signal — the same reason stop.js uses it too.
The practical mitigation that's already in place is the git-diff gate inside graph-on-stop.ts: on each Stop it runs a quick git diff against the last-build marker. If no TS/JS/Python source files changed, the script exits in well under a second without touching the graph. An actual rebuild only happens when source files have changed since the last build.
So the overhead on a typical "stop with no code changes" is just one node process spin-up + a git diff. That said, if even that feels like too much churn, a debounce/cooldown approach would be the cleanest middle ground — write a timestamp to a state file on each successful build and skip the git diff entirely if the last build was less than, say, 5 minutes ago. That caps process spin-up cost to at most once per cooldown window while still guaranteeing a build fires eventually. Would you like me to open a follow-up issue for that, or sketch out what the change would look like?
🧠 Learnings used
Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 98
File: codex/bundle/stop.js:1402-1408
Timestamp: 2026-05-06T23:01:21.197Z
Learning: In `codex/bundle/stop.js` (and `src/hooks/codex/stop.ts`), `forceSessionEndTrigger` is called unconditionally on every Codex `Stop` event by deliberate design. Codex has no separate `session_end` hook — `Stop` fires on every assistant turn and is the only available signal for session completion. Using `tryStopCounterTrigger` instead would silently break skill mining on short sessions (counter never reaches threshold, session closes, worker never fires). The per-projectKey worker lock (`tryAcquireWorkerLock` with a 10-min stale TTL in `src/skilify/state.ts`) makes the over-triggering cheap: subsequent Stops within the same session find nothing past the watermark and exit in <1s. This matches the pattern used by Cursor, Hermes, and Pi agents.
Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 0
File: :0-0
Timestamp: 2026-06-05T04:21:07.760Z
Learning: In activeloopai/hivemind src/graph/deeplake-pull.ts (TypeScript), the `pullSnapshot` function's SQL SELECT is already keyed on `(repo_slug, commit_sha)` matching the requested `(repoKey, head)` pair (lines 126-128). An additional in-code payload identity check (comparing payload's graph.commit_sha/graph.repo_key against the query parameters) is intentionally omitted as non-required defense-in-depth; the `snapshot_sha256` check (lines 179-183) covers payload integrity. Do not flag the absence of this identity cross-check as a correctness gap.
There was a problem hiding this comment.
@efenocchi can you check if Stop hook runs on every agent assitant turn it might be too much
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
There was a problem hiding this comment.
@CodeRabbit to keep the graph consistent maybe is better to move the logic at the stop hook in the other AI assistants as well. the price should be contained since the graph to be recomputed is just for the modified files. Am I wrong? having a different logic between different AI assistants doesn't look good to me, but I might be wrong
There was a problem hiding this comment.
@efenocchi — your intuition is right that the trigger frequency is asymmetric across agents today, and it's worth understanding exactly where:
| Agent | Graph-on-stop trigger | Fires every turn? |
|---|---|---|
| Codex | Stop |
✅ Yes (no session_end exists) |
| Cursor | stop + sessionEnd |
✅ Yes (both hooks wired) |
| Hermes | on_session_end only |
❌ No — session end only |
So the real gap is Hermes: it builds the graph once at session close, while Codex and Cursor rebuild after every turn that touched source files. For a long Hermes session with lots of edits, the graph may lag until the very end.
Your reasoning on cost is correct — the git-diff gate in graph-on-stop.ts means each firing is cheap when nothing changed. The actual rebuild is incremental in the sense that only files changed since the last build marker trigger re-extraction, so re-firing on every Hermes turn would not be wildly expensive.
Concrete suggestion: wire graph-on-stop.js into Hermes' pre_tool_call or a per-turn equivalent (if Hermes exposes one), matching the Codex/Cursor pattern. If Hermes has no per-turn hook other than on_session_end, then the current design is already using the best available signal — the asymmetry is forced by the agent's API surface, not a deliberate design choice.
Would you like me to:
- Check what per-turn hooks Hermes exposes to confirm whether parity is achievable, and open a follow-up issue to track the Hermes alignment?
- Or sketch what
install-hermes.tswould look like if a per-turn hook is available?
🧠 Learnings used
Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 98
File: codex/bundle/stop.js:1402-1408
Timestamp: 2026-05-06T23:01:21.197Z
Learning: In `codex/bundle/stop.js` (and `src/hooks/codex/stop.ts`), `forceSessionEndTrigger` is called unconditionally on every Codex `Stop` event by deliberate design. Codex has no separate `session_end` hook — `Stop` fires on every assistant turn and is the only available signal for session completion. Using `tryStopCounterTrigger` instead would silently break skill mining on short sessions (counter never reaches threshold, session closes, worker never fires). The per-projectKey worker lock (`tryAcquireWorkerLock` with a 10-min stale TTL in `src/skilify/state.ts`) makes the over-triggering cheap: subsequent Stops within the same session find nothing past the watermark and exit in <1s. This matches the pattern used by Cursor, Hermes, and Pi agents.
There was a problem hiding this comment.
@CodeRabbit is it done asynchronously for all the ai assistants, right?
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check graph-on-stop for spawn/exec/async patterns
fd 'graph-on-stop' --type f
echo "---"
cat src/hooks/graph-on-stop.ts 2>/dev/null || fd 'graph-on-stop.ts' --exec cat {}
echo "---"
# Also check if there's a spawn with detached flag anywhere in hooks
rg -n "detached|unref|spawn|fork|background" src/hooks/ -A2Length of output: 56655
@efenocchi — mostly yes, but the mechanism differs slightly per agent:
| Agent | Async mechanism | How it avoids blocking the user |
|---|---|---|
| Claude Code | Explicit async: true in hooks.json |
Agent launches the process and immediately moves on; hook runs to completion in its own lifetime |
| Codex | 30s timeout on the Stop hook; no explicit async flag |
Codex's Stop fires after the assistant turn; the gate exits in ~5ms on a skip, well before any timeout |
| Cursor | stop + sessionEnd hooks |
Cursor's hook system runs commands non-blocking |
| Hermes | on_session_end |
Session is already closing, so it's inherently non-blocking |
The key design note from the source doc (Lines 42–44 of graph-on-stop.ts):
Process model: runs in-process (no detached spawn). With
async: true, Claude Code launches us and moves on; we run to completion in our own lifetime and exit. No daemon, no PID file, no cross-platform issues.
So there's no detached background daemon for any agent — it's a regular Node process per invocation. The "non-blocking" guarantee on the user side comes from a combination of:
- The 5ms skip path (rate-limit / no-source-changes gate) — the vast majority of
Stopfirings hit this and exit almost instantly. - The
async: trueflag (Claude Code) or the agent's own post-turn hook model (Cursor/Hermes) that doesn't tie hook completion to the next user interaction.
For Codex specifically, since it lacks a native async flag, the practical safety net is purely the gate speed — but a slow rebuild (cold, ~2.5s) during a Codex session could theoretically delay the next prompt if Codex serializes hook execution. That's an area worth watching if users report sluggishness after large commits.
🧬 Code Graph Analysis Results
src/cli/install-codex.ts
Snippet: Hook command wiring (events → commands)
Lines 13-49
function hookCmd(bundleFile: string, timeout: number, matcher?: string): Record<string, unknown> {
const block: Record<string, unknown> = {
hooks: [{
type: "command",
command: `node "${join(PLUGIN_DIR, "bundle", bundleFile)}"`,
timeout,
}],
};
if (matcher) block.matcher = matcher;
return block;
}
function buildHooksJson(): Record<string, unknown> {
return {
hooks: {
SessionStart: [hookCmd("session-start.js", 10, "startup|resume")],
UserPromptSubmit: [hookCmd("capture.js", 10)],
PreToolUse: [hookCmd("pre-tool-use.js", 10, "Bash")],
PostToolUse: [hookCmd("capture.js", 15)],
// One Stop matcher-block with TWO commands — stop.js (capture) +
// graph-on-stop.js (code-graph auto-build, G3). Single block (not two)
// mirrors the static codex/hooks/hooks.json and keeps one entry per
// event for the merge/dedupe logic.
Stop: [stopBlockWithGraph(30)],
},
};
}
/** Stop block carrying both the capture stop hook and the graph auto-build. */
function stopBlockWithGraph(timeout: number): Record<string, unknown> {
return {
hooks: [
{ type: "command", command: `node "${join(PLUGIN_DIR, "bundle", "stop.js")}"`, timeout },
{ type: "command", command: `node "${join(PLUGIN_DIR, "bundle", "graph-on-stop.js")}"`, timeout },
],
};
}Snippet: Codex feature flag enabling (sync CLI call)
Lines 191-210
function tryEnableCodexHooks(): void {
// codex 0.130.0 renamed the `codex_hooks` feature flag to `hooks`. The legacy
// key still works but prints a deprecation warning on every startup, so we
// enable the new name and strip the legacy key if it lingers from an older
// install of this plugin.
//
// The strip is gated on a successful enable: on pre-0.130 codex (or if the
// codex CLI isn't on PATH) the `hooks` feature is unknown and the call
// throws — in that case we must leave any existing `codex_hooks = true`
// entry alone, otherwise we'd silently disable hooks on the user's box.
let enabled = false;
try {
execFileSync("codex", ["features", "enable", "hooks"], { stdio: "ignore" });
enabled = true;
} catch {
// codex CLI may not be on PATH (e.g., running under a separate user) or
// the codex version pre-dates the rename; not fatal.
}
if (enabled) stripLegacyCodexHooksKey();
}Snippet: Install flow (writes hooks.json, links skills, version stamp)
Lines 228-260
export function installCodex(): void {
const srcBundle = join(pkgRoot(), "codex", "bundle");
const srcSkills = join(pkgRoot(), "codex", "skills");
if (!existsSync(srcBundle)) {
throw new Error(`Codex bundle missing at ${srcBundle}. Run 'npm run build' first.`);
}
ensureDir(PLUGIN_DIR);
copyDir(srcBundle, join(PLUGIN_DIR, "bundle"));
if (existsSync(srcSkills)) copyDir(srcSkills, join(PLUGIN_DIR, "skills"));
tryEnableCodexHooks();
// Idempotent: only rewrite hooks.json when the merged result actually
// changed. An unconditional rewrite (even byte-identical) changes the file
// Codex fingerprints and re-triggers its "Hooks need review" trust prompt on
// every install/update. Skipping the no-op write keeps the user from being
// re-prompted each time.
if (!writeJsonIfChanged(HOOKS_PATH, mergeHooksJson(buildHooksJson()))) {
log(` Codex hooks.json unchanged — skipped rewrite (no re-trust prompt)`);
}
ensureDir(AGENTS_SKILLS_DIR);
const skillTarget = join(PLUGIN_DIR, "skills", "deeplake-memory");
if (existsSync(skillTarget)) {
symlinkForce(skillTarget, SKILL_LINK);
} else {
warn(` Codex skill source missing at ${skillTarget}; skipping symlink`);
}
writeVersionStamp(PLUGIN_DIR, getVersion());
log(` Codex installed -> ${PLUGIN_DIR}`);
}src/cli/util.ts
Snippet: Write hooks.json only if content changed
Lines 73-83
export function writeJsonIfChanged(path: string, obj: unknown): boolean {
const next = JSON.stringify(obj, null, 2) + "\n";
if (existsSync(path)) {
try {
if (readFileSync(path, "utf-8") === next) return false; // unchanged → no write
} catch { /* unreadable → fall through and rewrite */ }
}
ensureDir(dirname(path));
writeFileSync(path, next);
return true;
}src/cli/version.ts
Snippet: Version stamp source
Lines 5-12
export function getVersion(): string {
try {
const pkg = JSON.parse(readFileSync(join(pkgRoot(), "package.json"), "utf-8"));
return pkg.version ?? "0.0.0";
} catch {
return "0.0.0";
}
}🧠 Learnings used
Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 98
File: codex/bundle/stop.js:1402-1408
Timestamp: 2026-05-06T23:01:21.197Z
Learning: In `codex/bundle/stop.js` (and `src/hooks/codex/stop.ts`), `forceSessionEndTrigger` is called unconditionally on every Codex `Stop` event by deliberate design. Codex has no separate `session_end` hook — `Stop` fires on every assistant turn and is the only available signal for session completion. Using `tryStopCounterTrigger` instead would silently break skill mining on short sessions (counter never reaches threshold, session closes, worker never fires). The per-projectKey worker lock (`tryAcquireWorkerLock` with a 10-min stale TTL in `src/skilify/state.ts`) makes the over-triggering cheap: subsequent Stops within the same session find nothing past the watermark and exit in <1s. This matches the pattern used by Cursor, Hermes, and Pi agents.
Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 0
File: :0-0
Timestamp: 2026-06-05T04:21:07.760Z
Learning: In activeloopai/hivemind src/graph/deeplake-pull.ts (TypeScript), the `pullSnapshot` function's SQL SELECT is already keyed on `(repo_slug, commit_sha)` matching the requested `(repoKey, head)` pair (lines 126-128). An additional in-code payload identity check (comparing payload's graph.commit_sha/graph.repo_key against the query parameters) is intentionally omitted as non-required defense-in-depth; the `snapshot_sha256` check (lines 179-183) covers payload integrity. Do not flag the absence of this identity cross-check as a correctness gap.
| @@ -0,0 +1,394 @@ | |||
| /** | |||
There was a problem hiding this comment.
@coderabbitai why we have python extracter what's the usecase for hivemind
There was a problem hiding this comment.
@kaghni Great question! Here's the context:
What is Hivemind's graph feature?
Hivemind builds a deterministic, AST-derived code graph of your repository and stores it in Deeplake (a local tensor/vector DB). AI coding agents (Cursor, Codex, Hermes, Claude Code) can then query this graph through a virtual filesystem (/graph/*) to answer structural questions about the codebase — without having to read raw source files.
The graph captures:
- Nodes: functions, classes, methods, types, modules
- Edges:
calls,imports,extends,implements,method_of - Metadata: fan-in/fan-out degree, entrypoint flags, signatures
Why a Python extractor specifically?
This PR (#228) is part of a multi-language expansion (TypeScript → TypeScript + JavaScript + Python). The Python extractor (src/graph/extract/python.ts) uses tree-sitter-python to parse Python source and emit the same FileExtraction shape as the TypeScript extractor, so the rest of the pipeline (cross-file resolution, snapshot build, VFS queries) is language-agnostic.
Practical use case:
When an agent wants to know "what calls my_function across the whole repo?" or "what does MyClass inherit from?", it queries /graph/neighborhood/my_function.py instead of grep-scanning every file. This is:
- Faster (pre-indexed at build time)
- Structural (AST-accurate, not text-search)
- Token-efficient (returns a compact summary, not full file contents)
Without Python support, any repo with Python files would have an incomplete graph — missing Python symbols and their relationships to TS/JS code.
🧠 Learnings used
Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 0
File: :0-0
Timestamp: 2026-06-05T04:21:07.760Z
Learning: In activeloopai/hivemind src/graph/deeplake-pull.ts (TypeScript), the `pullSnapshot` function's SQL SELECT is already keyed on `(repo_slug, commit_sha)` matching the requested `(repoKey, head)` pair (lines 126-128). An additional in-code payload identity check (comparing payload's graph.commit_sha/graph.repo_key against the query parameters) is intentionally omitted as non-required defense-in-depth; the `snapshot_sha256` check (lines 179-183) covers payload integrity. Do not flag the absence of this identity cross-check as a correctness gap.
| @@ -30,6 +30,11 @@ import { readLastBuild } from "./last-build.js"; | |||
| import { repoDir } from "./snapshot.js"; | |||
There was a problem hiding this comment.
@coderabbitai what vfs-handler is what it does what's it's usage
There was a problem hiding this comment.
@kaghni — here's a breakdown:
src/graph/vfs-handler.ts — What It Is
It's a Virtual File System (VFS) handler for the code graph. It intercepts agent tool calls (like cat, head, tail) that target paths under ~/.deeplake/memory/graph/ and synthesizes responses on the fly from the local snapshot — no real files exist at those paths.
What It Does
The main exported function is:
handleGraphVfs(subpath: string, cwd: string): GraphVfsResultIt receives the portion of the path after .../graph/ and routes it to a renderer:
| Endpoint | What it renders |
|---|---|
index.md |
Graph overview: commit, node/edge counts, top files, edge kinds |
find/<pattern> |
Substring search on node IDs + labels; saves numbered handles |
show/<N or pattern> |
Node detail + 1-hop neighbors (callers/callees/imports/heritage) |
query/<pattern> |
Combined find + 1-hop expand of top matches (start here) |
impact/<pattern> |
Transitive dependents (blast radius) of a symbol |
neighborhood/<file> |
All symbols in a file + its cross-file neighbors |
layers |
Architectural layer breakdown by path heuristic |
tour |
Dependency-ordered walkthrough of the whole graph |
path/<from>/<to> |
Shortest path between two symbol patterns |
All rendering is local — it reads the .json snapshot file from disk (written by hivemind graph build or pull) and never makes network calls.
How It's Used
It's wired into the pre-tool-use hooks for each agent. The flow is:
sequenceDiagram
participant Agent as AI Agent (Cursor/Codex/Hermes)
participant Hook as pre-tool-use hook
participant GC as graph-command.ts (tryGraphRead)
participant VFS as vfs-handler.ts (handleGraphVfs)
participant Disk as Local snapshot .json
Agent->>Hook: Tool call: cat ~/.deeplake/memory/graph/query/foo
Hook->>GC: tryGraphRead(rewrittenCmd, cwd)
GC->>VFS: handleGraphVfs("query/foo", cwd)
VFS->>Disk: readFileSync(snapPath)
VFS-->>GC: { kind: "ok", body: "..." }
GC-->>Hook: synthesized text
Hook-->>Agent: block + return synthesized response
Specifically:
src/graph/graph-command.ts(tryGraphRead) parses the command and callshandleGraphVfssrc/hooks/cursor/pre-tool-use.ts,src/hooks/codex/pre-tool-use.ts, andsrc/hooks/hermes/pre-tool-use.tsall calltryGraphReadand block the real tool execution when it returns content
This lets agents query the code graph by simply reading virtual paths — no special API needed.
🧠 Learnings used
Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 0
File: :0-0
Timestamp: 2026-06-05T04:21:07.760Z
Learning: In activeloopai/hivemind src/graph/deeplake-pull.ts (TypeScript), the `pullSnapshot` function's SQL SELECT is already keyed on `(repo_slug, commit_sha)` matching the requested `(repoKey, head)` pair (lines 126-128). An additional in-code payload identity check (comparing payload's graph.commit_sha/graph.repo_key against the query parameters) is intentionally omitted as non-required defense-in-depth; the `snapshot_sha256` check (lines 179-183) covers payload integrity. Do not flag the absence of this identity cross-check as a correctness gap.
Summary
Code-graph fixes + cross-agent parity (no docs in this PR — the planning PRD is kept local only).
fix(graph): cloud pull sha verification — P1. Push stored
snapshot_sha256over the stable fields only (excludesobservation); pull was hashing the entire payload and comparing the two, so every real cloud row was rejected withsnapshot_sha256 mismatchand team pull silently failed. Pull now parses the payload, validatesnodes/links, and recomputes viacomputeSnapshotSha256(the same function push uses).feat(graph): cross-agent VFS parity — the code graph previously worked only on Claude Code. New shared parser
src/graph/graph-command.ts(tryGraphRead) is wired into Cursor, Codex, and Hermes pre-tool-use hooks; Cursor also gets the SessionStart graph inject and the auto-build hook (graph-on-stop) in its bundle + hook registration. Hardened over two rounds of Codex review (pipe-swallow, path traversal, quoted paths/flags, multi-file reads, per-agent cwd).feat(graph):
hivemind-graphskill for claude-code / codex / hermes (Cursor & pi have no skill loader and rely on the inject).Tests
tests/shared/graph/deeplake-pull.test.ts(+5 regression) andtests/shared/graph/graph-command.test.ts(16) green.npm run buildgreen. Pre-existing unrelated teardown error insession-notifications-hook.test.tsonly.Session Context
Session transcript
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests
Chores