filter reasoning tokens out of Edit/Apply output and chat utility calls #12420
Open
achmelev wants to merge 3 commits into
Conversation
Add renderChatMessageWithoutThinking() to messageContent.ts, which returns an empty string for role: "thinking" chunks instead of rendering them as plain text. Replace renderChatMessage() with renderChatMessageWithoutThinking() at all call sites where thinking content must not appear in output:
- streamLines() in diff/util.ts (Apply and Edit-with-rules path)
- _streamComplete() in all provider implementations (Edit-without-rules path)
- BaseLLM.chat() in llm/index.ts (title generation, repo map summarisation, context retrieval tool selection, next-edit prediction, conversation compaction)
…gacy slash commands
Extends the renderChatMessageWithoutThinking fix to six remaining call sites: recursiveStream (diff buffer), toMarkDown (history export), and the four built-in legacy slash commands (commit, review, draftIssue, onboard) that streamed chunks directly to the UI.
…er integrity
The buffer in recursiveStream must be a faithful copy of all model output so the recursive continuation path can resume from the correct position. Thinking content is already filtered downstream by streamLines() in diff/util.ts, so no thinking leaks to the diff pipeline or UI.
Description
Problem
When using a reasoning model (Anthropic extended thinking, DeepSeek R1, xAI Grok, or any provider that emits `delta.reasoning_content`/`thinking_delta`) for Edit (Cmd+I) or Apply, the model's internal reasoning is written into the file alongside the actual code changes.

Continue converts all provider-specific reasoning formats into an internal `ChatMessage` with `role: "thinking"`. The bug is that `renderChatMessage()` in `core/util/messageContent.ts` handles `"thinking"` in the same switch branch as `"assistant"`, as sketched below.
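A minimal sketch of the offending branch, using simplified types (the real `ChatMessage` and message-part definitions in `core/` are richer):

```typescript
// Simplified types for illustration only; the real definitions differ.
type TextPart = { type: "text"; text: string };
type MessagePart = TextPart | { type: "imageUrl"; imageUrl: { url: string } };
interface ChatMessage {
  role: "user" | "assistant" | "system" | "thinking";
  content: string | MessagePart[];
}

function renderChatMessage(message: ChatMessage): string {
  switch (message.role) {
    case "user":
    case "assistant":
    case "thinking": // BUG: thinking falls through and renders like assistant text
      return typeof message.content === "string"
        ? message.content
        : message.content
            .filter((part): part is TextPart => part.type === "text")
            .map((part) => part.text)
            .join("");
    default:
      return "";
  }
}
```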
This makes thinking content indistinguishable from code output at every point downstream. The existing post-processing filters in `streamDiffLines.ts` (`filterEnglishLinesAtStart`, `filterCodeBlockLines`, etc.) operate only on string content and have no awareness of message roles; they provide no meaningful protection. The `reasoning: false` LLM option is a best-effort hint to the provider and does not suppress chunks that have already arrived. The result: reasoning text enters the diff pipeline tagged as `DiffLine { type: "new" }` and is accepted into the file as code.

This is tracked in GitHub issue #11590 ("Edit Model failed to apply, but just paste its reasoning process to my code").
The same root cause affects several internal one-shot utility calls that use `BaseLLM.chat()`, a non-streaming wrapper around `streamChat()` that accumulates all chunks into a single completion string. Without filtering, thinking content is silently merged into that string and corrupts the results of chat title generation, repo map summarisation, context retrieval tool selection, next-edit prediction, and conversation compaction.

Fix
New function
A new `renderChatMessageWithoutThinking()` is added to `core/util/messageContent.ts`, sketched below. `renderChatMessage()` itself is left unchanged because `gui/src/pages/gui/Chat.tsx` legitimately reads thinking content to populate the `ThinkingBlockPeek` collapsible UI component.
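A sketch of the new function, under the simplified types above (the actual implementation may differ in details):

```typescript
// New: identical to renderChatMessage(), except thinking chunks render as "".
function renderChatMessageWithoutThinking(message: ChatMessage): string {
  if (message.role === "thinking") {
    return "";
  }
  return renderChatMessage(message);
}
```

Delegating to `renderChatMessage()` keeps the two renderers in lockstep for every other role.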
Call site changes — three categories

Category 1 — Streaming chat, line-oriented output (`core/diff/util.ts`)
`streamLines()` is the conversion point from `ChatMessage` chunks to strings for all Edit/Apply paths: the main edit flow (`streamDiffLines` → `recursiveStream` → `llm.streamChat()`), lazy apply (`streamLazyApply`), and lazy replace. A single fix here covers all three, as the sketch below shows.
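A simplified sketch of that conversion point (the real `streamLines()` signature and buffering logic differ):

```typescript
// Sketch: ChatMessage chunks become output lines. With the fix, thinking
// chunks render as "" and never contribute characters to any line.
async function* streamLines(
  chunks: AsyncIterable<ChatMessage>,
): AsyncGenerator<string> {
  let buffer = "";
  for await (const chunk of chunks) {
    buffer += renderChatMessageWithoutThinking(chunk); // was renderChatMessage(chunk)
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // hold back the trailing partial line
    yield* lines;
  }
  if (buffer.length > 0) {
    yield buffer; // flush the final unterminated line
  }
}
```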
Category 2 — Streaming complete, provider uses Chat API internally (9 provider files)
When Edit runs without rules, `llm.streamComplete()` is called. Each provider implements `_streamComplete()` by calling its Chat API internally and converting `ChatMessage` chunks to strings before yielding them. The fix is applied at that conversion point in each provider: `OpenAI.ts`, `Anthropic.ts`, `Bedrock.ts`, `Gemini.ts`, `VertexAI.ts`, `Cohere.ts`, `Cloudflare.ts`, `Flowise.ts`, `CustomLLM.ts`. The shared pattern is sketched below.
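A hypothetical provider illustrating the pattern, with simplified signatures (the real methods also take options and an abort signal):

```typescript
// Hypothetical ExampleProvider: _streamComplete() wraps the provider's chat
// streaming and converts each ChatMessage chunk to a string as it yields.
abstract class ExampleProvider {
  protected abstract _streamChat(
    messages: ChatMessage[],
  ): AsyncGenerator<ChatMessage>;

  protected async *_streamComplete(prompt: string): AsyncGenerator<string> {
    const messages: ChatMessage[] = [{ role: "user", content: prompt }];
    for await (const chunk of this._streamChat(messages)) {
      // One-line fix at the chunk-to-string boundary in each provider.
      yield renderChatMessageWithoutThinking(chunk); // was renderChatMessage(chunk)
    }
  }
}
```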
Category 3 — Non-streaming chat (`core/llm/index.ts`)
`BaseLLM.chat()` is a non-streaming wrapper around `streamChat()` used for one-shot utility calls (title generation, repo map summarisation, tool selection, next-edit prediction, conversation compaction). Without the fix, thinking content would be merged into the returned completion string, as the sketch below illustrates.
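A simplified sketch of the accumulation (the real `BaseLLM.chat()` is a class method that also handles logging and options):

```typescript
// Sketch: chunks collapse into one completion string, so any unfiltered
// thinking chunk would be silently concatenated into the returned result.
async function chat(
  streamChat: (messages: ChatMessage[]) => AsyncGenerator<ChatMessage>,
  messages: ChatMessage[],
): Promise<ChatMessage> {
  let completion = "";
  for await (const chunk of streamChat(messages)) {
    completion += renderChatMessageWithoutThinking(chunk); // was renderChatMessage(chunk)
  }
  return { role: "assistant", content: completion };
}
```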
Additional call sites (follow-up)
The same fix is applied to `core/util/historyUtils.ts` (`toMarkDown()`) and the four built-in legacy slash commands (`/commit`, `/review`, `/draftIssue`, `/onboard`), which all stream `ChatMessage` chunks directly to the chat UI; the shared shape is sketched below.
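The slash-command shape, sketched with a hypothetical `runLegacySlashCommand` (each real command builds its own prompt):

```typescript
// Sketch: legacy slash commands stream chunk text straight to the chat UI,
// so every yielded string must already have thinking content removed.
async function* runLegacySlashCommand(
  llm: { streamChat: (msgs: ChatMessage[]) => AsyncGenerator<ChatMessage> },
  prompt: string,
): AsyncGenerator<string> {
  const history: ChatMessage[] = [{ role: "user", content: prompt }];
  for await (const chunk of llm.streamChat(history)) {
    yield renderChatMessageWithoutThinking(chunk); // was renderChatMessage(chunk)
  }
}
```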
Call site intentionally not changed
`core/edit/recursiveStream.ts` accumulates streamed chunks into an internal buffer intended as a faithful reproduction of all model output, for use by a recursive continuation mechanism (currently inactive) that re-prompts the model when it hits its token limit mid-edit. Stripping thinking content from this buffer would corrupt the continuation context. Thinking is filtered downstream by `streamLines()` (Category 1 above), so nothing reaches the diff pipeline or the UI regardless.

AI Code Review
@continue-review

Checklist
Screen recording or screenshot
Not applicable
Tests
No tests added; existing tests pass (except tests requiring an API key, which could not be run)
Summary by cubic
Prevents reasoning/"thinking" tokens from being inserted into files and utility outputs by filtering them at render time. Fixes cases where Edit/Apply and internal helpers pasted model reasoning into code (fixes #11590).

- Added `renderChatMessageWithoutThinking()` to drop `role: "thinking"` chunks.
- Applied it in `streamLines()` (Edit/Apply), all providers' `_streamComplete()` paths, `BaseLLM.chat()`, `toMarkDown()`, and the legacy slash commands (`/commit`, `/review`, `/draftIssue`, `/onboard`).
- Left `renderChatMessage()` unchanged for the chat UI; left the `recursiveStream` buffer unfiltered to preserve continuation context.

Written for commit ef340a0.
renderChatMessageWithoutThinking()to droprole: "thinking"chunks.streamLines()(Edit/Apply), all providers’_streamComplete()paths,BaseLLM.chat(),toMarkDown(), and legacy slash commands (/commit,/review,/draftIssue,/onboard).renderChatMessage()unchanged for chat UI; leftrecursiveStreambuffer unfiltered to preserve continuation context.Written for commit ef340a0. Summary will update on new commits. Review in cubic