index_repository(mode="moderate") silently drops entire subtrees from the indexed graph #411

@Patch76

Summary

index_repository(mode="moderate") produces a graph that's missing entire subtrees with no warning or error. Running the same call with mode="full" (otherwise identical inputs) restores the dropped content. The drop is silent — both modes report status: "indexed" with non-zero node/edge counts.

Reproduction

Repository under test: the src/ha_mcp/ sub-tree of homeassistant-ai/ha-mcp@82050348, a Python project. The sub-tree contains 87 Python files across ~11 sub-directories (notably a tools/ directory with ~47 files).

Steps:

  1. delete_project(project="<P>")
  2. index_repository(repo_path="<absolute-path-to-src/ha_mcp>", mode="moderate") → returns {status: "indexed", nodes: 1294, edges: 3060}
  3. get_architecture(project="<P>", aspects=["file_tree"]) → tools/ directory entirely absent from the returned file_tree (only auth/, client/, policy/, transforms/, utils/, dashboard_screenshot/, plus top-level files appear).
  4. delete_project(project="<P>")
  5. index_repository(repo_path="<same path>", mode="full") → returns {status: "indexed", nodes: 2372, edges: 7570}
  6. get_architecture(project="<P>", aspects=["file_tree"]) → tools/ present with all 47 files, file_tree complete.
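The comparison in steps 3 and 6 can also be done against the filesystem rather than against a second index run. The following is a minimal sketch, not part of codebase-memory-mcp: `missing_by_directory` is a hypothetical helper, and it assumes the caller has already flattened the file_tree response into a list of paths relative to repo_path (the response shape is not shown in this report).

```python
from collections import Counter
from pathlib import Path, PurePosixPath


def missing_by_directory(repo_path, indexed_files):
    """Count on-disk .py files, per top-level directory, that the index lacks.

    indexed_files: iterable of POSIX-style paths relative to repo_path
    (assumed to be extracted from the file_tree response by the caller).
    """
    root = Path(repo_path)
    indexed = {PurePosixPath(p).as_posix() for p in indexed_files}
    missing = Counter()
    for path in root.rglob("*.py"):
        rel = path.relative_to(root).as_posix()
        if rel in indexed:
            continue
        # Group by first path component; top-level files go under ".".
        top = rel.split("/", 1)[0] + "/" if "/" in rel else "."
        missing[top] += 1
    return dict(missing)
```

A non-empty result after a moderate run means the graph is incomplete for those directories. Note that `rglob` walks everything under repo_path, so ignored directories such as .venv/ would need filtering if they sit inside the indexed path.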

Net delta: moderate produced ~54 % of the nodes and ~40 % of the edges that full produced, silently — both calls reported successful indexing.
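For reference, the ratios follow directly from the two responses quoted in steps 2 and 5. The helper name `retained_fraction` is only illustrative.

```python
# Counts copied from the two index_repository responses above.
moderate = {"nodes": 1294, "edges": 3060}
full = {"nodes": 2372, "edges": 7570}


def retained_fraction(partial, complete):
    """Fraction of each count in `complete` that `partial` retained."""
    return {key: partial[key] / complete[key] for key in complete}


ratios = retained_fraction(moderate, full)
print({key: round(value, 3) for key, value in ratios.items()})
# {'nodes': 0.546, 'edges': 0.404}
```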

Expected

moderate mode is documented as filtering files (noise reduction), not as a structural drop of major source directories. A 47-file production source directory should not be silently filtered out. At minimum the response should expose which paths/files were excluded (count + sample), so callers can detect the drop.

Actual

tools/ is entirely missing from the graph after the moderate indexing. Subsequent queries like search_graph(file_pattern="tools/<anything>.py") return total: 0, indistinguishable from "this file doesn't exist in the repo", even though the file is plainly present in repo_path.
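Until the response exposes exclusions, a caller can disambiguate a zero-hit result by checking the filesystem. This is a rough sketch under stated assumptions: `classify_zero_hit` is a hypothetical helper, and `total_hits` stands for the `total` value the caller read from a search_graph response.

```python
from pathlib import Path


def classify_zero_hit(repo_path, relative_file, total_hits):
    """Tell apart 'file not indexed' from 'file does not exist'.

    total_hits: the `total` value taken from a search_graph response
    (extraction from the response is left to the caller).
    """
    if total_hits > 0:
        return "indexed"
    if (Path(repo_path) / relative_file).is_file():
        # The file exists, so an empty result says nothing about the code.
        return "on disk but not indexed"
    return "not in repository"
```

A result of "on disk but not indexed" signals that any "zero callers" answer for that file should be treated as unknown rather than as a negative.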

Why it matters

Reviewers or agents relying on moderate-indexed graph queries get false-negative answers for symbols in the dropped subtrees (zero callers, zero definitions, zero hits) with no signal that the answer is unreliable. For a PR-review workflow this means a changed file in tools/ would appear "uncalled / safe to change" when in reality the graph simply does not contain it.

.cbmignore for the test repo only contained common Python noise (.venv/, __pycache__/, dist/, build/, node_modules/, *.egg-info/, .git/) — nothing that explains dropping tools/.
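The claim that none of these patterns covers tools/ can be checked mechanically. The sketch below assumes gitignore-like semantics (a trailing slash matches a directory name at any depth), which is an assumption: the actual .cbmignore matching rules are not confirmed here. `is_ignored` and the sample paths are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Patterns listed in the test repo's .cbmignore.
IGNORE_PATTERNS = [
    ".venv/", "__pycache__/", "dist/", "build/",
    "node_modules/", "*.egg-info/", ".git/",
]


def is_ignored(rel_path, patterns=IGNORE_PATTERNS):
    """Approximate gitignore-style matching (assumed, not confirmed)."""
    parts = PurePosixPath(rel_path).parts
    directories = parts[:-1]
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match any directory component of the path.
            name = pattern.rstrip("/")
            if any(fnmatch(d, name) for d in directories):
                return True
        elif any(fnmatch(p, pattern) for p in parts):
            return True
    return False


print(is_ignored("tools/example.py"))   # False
print(is_ignored(".venv/lib/site.py"))  # True
```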

Environment

  • codebase-memory-mcp --version → 0.7.0 (latest at filing; released 2026-05-30)
  • Linux 6.12.85-haos (Home Assistant OS, Alpine-based container)
  • stdio MCP transport via Claude Code
  • Indexed repo_path: /data/home/projects/claude-code-ha/ha-mcp/src/ha_mcp (87 Python files per get_architecture in full mode)
  • Same path under mode="moderate": 54 of those 87 files present (the missing 33 are exactly src/ha_mcp/tools/)

Workaround

Use mode="full" exclusively on this codebase. Walltime delta is small (~3 s either way) but moderate loses ~46 % of nodes for no observable speed advantage.

Suggested fix direction

Whatever heuristic moderate uses to filter (file count threshold? size threshold? path-pattern match?), make the dropped path list visible in the index_repository response (e.g. excluded_paths: ["tools/"] or excluded_count: 33), so the caller can decide whether to retry with full. Silent drops are the worst failure mode.
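One possible shape for that extra information, as a sketch only: the field names `excluded_paths` and `excluded_count` come from the suggestion above, while `excluded_sample`, `summarize_exclusions`, and the input list are hypothetical and do not reflect how the indexer is implemented.

```python
from collections import Counter


def _parent_dir(rel_path):
    """Directory part of a relative POSIX path, './' for top-level files."""
    return rel_path.rsplit("/", 1)[0] + "/" if "/" in rel_path else "./"


def summarize_exclusions(excluded_files, sample_size=5):
    """Build a compact summary of filtered files for the index response."""
    by_dir = Counter(_parent_dir(f) for f in excluded_files)
    return {
        "excluded_count": len(excluded_files),
        # Directories ordered by how many files each one lost.
        "excluded_paths": [d for d, _ in by_dir.most_common()],
        "excluded_sample": sorted(excluded_files)[:sample_size],
    }
```

With a summary like this merged into the response, a caller could retry with mode="full" whenever `excluded_count` is non-zero for directories it cares about.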

Labels

  • bug: Something isn't working
  • parsing/quality: Graph extraction bugs, false positives, missing edges
