fix: prefer mental model abstraction in reflect to avoid unnecessary low-level retrieval #2012

Open
Oxygen56 wants to merge 1 commit into vectorize-io:main from Oxygen56:fix/reflect-retrieval-optimization-1971

Oxygen56 commented Jun 5, 2026

Summary

The reflect operation was unconditionally forcing the full hierarchical retrieval chain (search_mental_models -> search_observations -> recall) before allowing the agent to answer, even when mental model abstraction would provide adequate context. This PR adds budget-aware short-circuit logic to prefer higher-level abstractions when appropriate.

Problem

In run_reflect_agent, the forced_sequence always required search_mental_models, then search_observations, then recall before switching to auto mode. Even when a fresh, relevant mental model could answer the query, the agent had to continue to lower-level retrieval, which increased latency and cost and could duplicate already-synthesized knowledge.

Changes

  1. Budget-aware short-circuit in forced sequence (agent.py lines 599-618):

    • low budget: Force only search_mental_models, then allow auto mode
    • mid budget: Skip lower-level retrieval when mental models are fresh and non-empty
    • high budget: Preserve the full hierarchical verification path (existing behavior)
  2. Mental model freshness tracking (agent.py line 425):

    • Added mental_models_sufficient flag to track when retrieved mental models are sufficient
  3. Freshness evaluation (agent.py lines 982-990):

    • After search_mental_models returns, check for mental models with is_stale=False and non-empty content
    • Set mental_models_sufficient = True when at least one model meets these criteria
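The freshness check in item 3 can be sketched roughly as follows. This is an illustration only, not the code in agent.py: the helper name `mental_models_are_sufficient`, the dict-shaped results with `content` and `is_stale` keys, and the choice to treat whitespace-only content or a missing `is_stale` field as insufficient are all assumptions made for the example.

```python
from typing import Any, Iterable, Mapping


def mental_models_are_sufficient(models: Iterable[Mapping[str, Any]]) -> bool:
    """Return True when at least one model is not stale and has non-empty content.

    Hypothetical helper: the result shape (dicts with "content" and
    "is_stale") is assumed for illustration, not taken from agent.py.
    """
    for model in models:
        # Assumption: a missing is_stale field is treated as stale (conservative).
        is_stale = model.get("is_stale", True)
        content = model.get("content") or ""
        # Assumption: whitespace-only content counts as empty.
        if not is_stale and content.strip():
            return True
    return False


# Example: one fresh model with content is enough to set the flag.
results = [
    {"content": "", "is_stale": False},
    {"content": "User prefers concise answers.", "is_stale": False},
]
mental_models_sufficient = mental_models_are_sufficient(results)
```

An empty result list yields False in this sketch, so the lower retrieval layers would still be forced when no mental models come back.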

Behavior

| Budget | Before | After |
| --- | --- | --- |
| low | Force all 3 retrieval layers | Force only search_mental_models, then auto |
| mid | Force all 3 retrieval layers | Force search_mental_models; skip lower layers if fresh results found |
| high | Force all 3 retrieval layers | Force all 3 retrieval layers (unchanged) |

The default budget (low) now completes in fewer iterations when mental models are available, reducing latency and cost while maintaining answer quality.
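The per-budget behavior in the table could be expressed along these lines. This is a minimal sketch under assumptions: the function name `forced_tools`, the string budget values, and falling back to the full chain for an unrecognized budget are hypothetical and not taken from agent.py; only the three tool names and the low/mid/high rules come from the description above.

```python
from typing import Tuple

# Tool names as listed in the PR description, in hierarchical order.
FULL_CHAIN: Tuple[str, ...] = (
    "search_mental_models",
    "search_observations",
    "recall",
)


def forced_tools(budget: str, mental_models_sufficient: bool) -> Tuple[str, ...]:
    """Return the tools that must run before the agent switches to auto mode.

    Hypothetical helper illustrating the table; not the actual implementation.
    """
    if budget == "low":
        # Low budget: only the top-level abstraction is forced.
        return FULL_CHAIN[:1]
    if budget == "mid" and mental_models_sufficient:
        # Mid budget: skip lower layers only when a fresh, non-empty model was found.
        return FULL_CHAIN[:1]
    # High budget, mid budget without sufficient models, or (by assumption)
    # any unrecognized budget value: keep the full verification chain.
    return FULL_CHAIN
```

For the mid budget the flag is only known after search_mental_models has returned, so in practice the decision would be re-evaluated at that point rather than computed once up front.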

Backward Compatibility

  • high budget: Unchanged behavior (full verification chain)
  • mid budget: Conditional short-circuit only when mental models are truly fresh
  • low budget: Behavior changed for the default budget, which now prefers speed, consistent with documented budget semantics ("Prioritize speed over completeness")

Fixes #1971

The reflect operation was forcing lower-level retrieval even when
the mental model abstraction would provide adequate context. This
change adds budget-aware short-circuit logic:

- Low budget: force only search_mental_models, then allow auto
- Mid budget: skip lower retrieval when mental models are fresh
- High budget: preserve full hierarchical verification path

The check evaluates is_stale=False and non-empty content on mental
models returned by search_mental_models. When at least one model meets
these criteria, further forced retrieval steps are skipped.

Fixes vectorize-io#1971

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Reflect Forces Lower-Level Retrieval Even When a Mental Model May Be Sufficient