fix: prefer mental model abstraction in reflect to avoid unnecessary low-level retrieval #2012
Open
Oxygen56 wants to merge 1 commit into
The reflect operation was forcing lower-level retrieval even when the mental model abstraction would provide adequate context. This change adds budget-aware short-circuit logic:

- Low budget: force only search_mental_models, then allow auto
- Mid budget: skip lower retrieval when mental models are fresh
- High budget: preserve full hierarchical verification path

The check evaluates is_stale=False and non-empty content on mental models returned by search_mental_models. When at least one model meets these criteria, further forced retrieval steps are skipped.

Fixes vectorize-io#1971

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
The reflect operation was unconditionally forcing the full hierarchical retrieval chain (search_mental_models -> search_observations -> recall) before allowing the agent to answer, even when mental model abstraction would provide adequate context. This PR adds budget-aware short-circuit logic to prefer higher-level abstractions when appropriate.
Problem
In run_reflect_agent, the forced_sequence always forced search_mental_models, then search_observations, then recall before allowing auto mode. Even when a fresh, relevant mental model could answer the query, the agent was forced to continue to lower-level retrieval, increasing latency and cost and potentially duplicating already-synthesized knowledge.

Changes
Budget-aware short-circuit in forced sequence (agent.py lines 599-618):
- low budget: Force only search_mental_models, then allow auto mode
- mid budget: Skip lower-level retrieval when mental models are fresh and non-empty
- high budget: Preserve the full hierarchical verification path (existing behavior)

Mental model freshness tracking (agent.py line 425):
- Adds a mental_models_sufficient flag to track when retrieved mental models are sufficient

Freshness evaluation (agent.py lines 982-990):
- After search_mental_models returns, check for mental models with is_stale=False and non-empty content
- Set mental_models_sufficient = True when at least one model meets these criteria

Behavior
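The mid-budget behavior depends on the freshness check, so a minimal sketch of that check may help reviewers. It is not the code in agent.py: the function name, the dict-shaped results, and the "content" key are assumptions made for illustration. Only is_stale and the "non-empty content" criterion come from this PR.

```python
from typing import Any, Iterable, Mapping


def mental_models_are_sufficient(models: Iterable[Mapping[str, Any]]) -> bool:
    """Return True when at least one model is fresh and has non-empty content.

    Hypothetical helper: assumes each result is a mapping with an
    "is_stale" flag and a "content" string.
    """
    for model in models:
        # Assumption: a missing is_stale field is treated as stale (conservative).
        is_fresh = model.get("is_stale", True) is False
        content = model.get("content")
        # Assumption: whitespace-only content counts as empty.
        has_content = isinstance(content, str) and bool(content.strip())
        if is_fresh and has_content:
            return True
    return False
```

Treating a missing is_stale field as stale is a deliberately conservative choice in this sketch: when in doubt, the lower retrieval layers still run.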
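| Budget | Forced retrieval |
| --- | --- |
| low | search_mental_models, then auto |
| mid | search_mental_models; skip lower layers if fresh results found |
| high | Full chain: search_mental_models -> search_observations -> recall (unchanged) |

The default budget (low) now completes in fewer iterations when mental models are available, reducing latency and cost while maintaining answer quality.

Backward Compatibility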
- high budget: Unchanged behavior (full verification chain)
- mid budget: Conditional short-circuit only when mental models are truly fresh
- low budget: Changed default to prefer speed, consistent with documented budget semantics ("Prioritize speed over completeness")

Fixes #1971
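For reviewers, the per-budget rules described in this PR can be sketched as a small pure function. This illustrates the intended behavior and is not the implementation in agent.py: remaining_forced_tools, FULL_CHAIN, and the calls_made parameter are hypothetical names, and sending unrecognized budget values down the full chain is an assumption.

```python
from typing import List

# Retrieval order named in the PR description, highest abstraction first.
FULL_CHAIN = ["search_mental_models", "search_observations", "recall"]


def remaining_forced_tools(
    budget: str,
    calls_made: List[str],
    mental_models_sufficient: bool,
) -> List[str]:
    """Return the forced tool calls still pending; an empty list means auto mode.

    Hypothetical helper illustrating the low/mid/high rules.
    """
    if budget == "low":
        # low: force only the mental model search.
        chain = FULL_CHAIN[:1]
    elif budget == "mid" and mental_models_sufficient:
        # mid: short-circuit once a fresh, non-empty mental model was found.
        chain = FULL_CHAIN[:1]
    else:
        # high, mid without fresh models, or (assumption) any unknown budget.
        chain = FULL_CHAIN
    return [tool for tool in chain if tool not in calls_made]
```

With budget "mid", the function returns the two lower layers after search_mental_models has run unless mental_models_sufficient is True, in which case it returns an empty list and the agent may answer. With budget "high", the flag is ignored and the full chain is always required.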