
filter reasoning tokens out of Edit/Apply output and chat utility calls #12420

Open

achmelev wants to merge 3 commits into continuedev:main from achmelev:PRFilteringReasoningTokens

Conversation


@achmelev commented May 15, 2026

Description

Problem

When using a reasoning model (Anthropic extended thinking, DeepSeek R1, xAI Grok, or any provider that emits delta.reasoning_content / thinking_delta) for Edit (Cmd+I) or Apply, the model's internal reasoning is written into the file alongside the actual code changes.
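For concreteness, these reasoning deltas arrive in provider-specific shapes roughly like the following (approximate sketches, not exact wire formats):

// Approximate shapes of streamed reasoning deltas (illustrative only):
type AnthropicThinkingDelta = {
  type: "content_block_delta";
  delta: { type: "thinking_delta"; thinking: string };
};
type OpenAICompatibleReasoningDelta = {
  choices: { delta: { reasoning_content?: string; content?: string } }[];
};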

Continue converts all provider-specific reasoning formats into an internal ChatMessage with role: "thinking". The bug is that renderChatMessage() in core/util/messageContent.ts handles "thinking" in the same switch branch as "assistant":

case "thinking":   // ← falls through to the same return as "assistant"
case "assistant":
  return stripImages(message.content);

This makes thinking content indistinguishable from code output at every point downstream. The existing post-processing filters in streamDiffLines.ts (filterEnglishLinesAtStart, filterCodeBlockLines, etc.) operate only on string content and have no awareness of message roles — they provide no meaningful protection. The reasoning: false LLM option is a best-effort hint to the provider and does not suppress chunks that have already arrived.

The result: reasoning text enters the diff pipeline tagged as DiffLine { type: "new" } and is accepted into the file as code.

This is tracked in GitHub issue #11590 ("Edit Model failed to apply, but just paste its reasoning process to my code").

The same root cause affects several internal one-shot utility calls that use BaseLLM.chat() — a non-streaming wrapper around streamChat() that accumulates all chunks into a single completion string. Without filtering, thinking content is silently merged into that string and corrupts the results of: chat title generation, repo map summarisation, context retrieval tool selection, next-edit prediction, and conversation compaction.


Fix

New function

A new renderChatMessageWithoutThinking() is added to core/util/messageContent.ts:

export function renderChatMessageWithoutThinking(message: ChatMessage): string {
  if (message.role === "thinking") return "";
  return renderChatMessage(message);
}
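For illustration, the intended behaviour (hypothetical snippet, assuming a minimal ChatMessage shape):

// Thinking chunks render to the empty string; everything else is unchanged.
renderChatMessageWithoutThinking({ role: "thinking", content: "Let me plan the edit..." }); // ""
renderChatMessageWithoutThinking({ role: "assistant", content: "const x = 1;" }); // "const x = 1;"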

renderChatMessage() is left unchanged because gui/src/pages/gui/Chat.tsx legitimately reads thinking content to populate the ThinkingBlockPeek collapsible UI component.

Call site changes — three categories

Category 1 — Streaming chat, line-oriented output (core/diff/util.ts)

streamLines() is the conversion point from ChatMessage chunks to strings for all Edit/Apply paths: the main edit flow (streamDiffLines → recursiveStream → llm.streamChat()), lazy apply (streamLazyApply), and lazy replace. A single fix here covers all three:

// before:
const chunk = typeof update === "string" ? update : renderChatMessage(update);
// after:
const chunk = typeof update === "string" ? update : renderChatMessageWithoutThinking(update);
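For context, the conversion sits inside a line-splitting generator roughly like this (simplified sketch of streamLines(); the real implementation handles more edge cases):

export async function* streamLines(
  stream: AsyncIterable<ChatMessage | string>,
): AsyncGenerator<string> {
  let buffer = "";
  for await (const update of stream) {
    // Thinking chunks render to "", so they never contribute lines.
    const chunk =
      typeof update === "string" ? update : renderChatMessageWithoutThinking(update);
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line
    yield* lines;
  }
  if (buffer.length > 0) {
    yield buffer;
  }
}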

Category 2 — Streaming complete, provider uses Chat API internally (9 provider files)

When Edit runs without rules, llm.streamComplete() is called. Each provider implements _streamComplete() by calling its Chat API internally and converting ChatMessage chunks to strings before yielding them. The fix is applied at that conversion point in each provider:

OpenAI.ts, Anthropic.ts, Bedrock.ts, Gemini.ts, VertexAI.ts, Cohere.ts, Cloudflare.ts, Flowise.ts, CustomLLM.ts

// before:
yield renderChatMessage(chunk);
// after:
yield renderChatMessageWithoutThinking(chunk);
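The per-provider change is mechanical; each affected _streamComplete() follows roughly this pattern (simplified sketch; parameter lists vary by provider):

protected async *_streamComplete(
  prompt: string,
  signal: AbortSignal,
  options: CompletionOptions,
): AsyncGenerator<string> {
  // Delegate to the provider's Chat API and flatten chunks to strings.
  const messages: ChatMessage[] = [{ role: "user", content: prompt }];
  for await (const chunk of this._streamChat(messages, signal, options)) {
    // Previously renderChatMessage(chunk): thinking chunks leaked as plain text.
    yield renderChatMessageWithoutThinking(chunk);
  }
}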

Category 3 — Non-streaming chat (core/llm/index.ts)

BaseLLM.chat() is a non-streaming wrapper around streamChat() used for one-shot utility calls (title generation, repo map summarisation, tool selection, next-edit prediction, conversation compaction). Without the fix, thinking content would be merged into the returned completion string.

// before:
completion += renderChatMessage(message);
// after:
completion += renderChatMessageWithoutThinking(message);
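In context, the wrapper amounts to roughly the following (simplified sketch of BaseLLM.chat(); the real method also does logging, and the exact signature may differ):

async chat(
  messages: ChatMessage[],
  signal: AbortSignal,
  options: LLMFullCompletionOptions = {},
): Promise<ChatMessage> {
  let completion = "";
  for await (const chunk of this.streamChat(messages, signal, options)) {
    // Thinking chunks now contribute nothing to the accumulated string.
    completion += renderChatMessageWithoutThinking(chunk);
  }
  return { role: "assistant", content: completion };
}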

Additional call sites (follow-up)

The same fix is applied to core/util/historyUtils.ts (toMarkDown()) and the four built-in legacy slash commands (/commit, /review, /draftIssue, /onboard), which all stream ChatMessage chunks directly to the chat UI.

Call site intentionally not changed

core/edit/recursiveStream.ts accumulates streamed chunks into an internal buffer intended as a faithful reproduction of all model output, for use by a recursive continuation mechanism (currently inactive) that re-prompts the model when it hits its token limit mid-edit. Stripping thinking content from this buffer would corrupt the continuation context. Thinking is filtered downstream by streamLines() (Category 1 above), so nothing reaches the diff pipeline or the UI regardless.
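Conceptually, the split of responsibilities looks like this (illustrative sketch, not the actual recursiveStream code):

// Inside the recursive-continuation loop (sketch): the buffer must mirror
// everything the model emitted, including thinking, so a follow-up prompt
// can resume from the exact stopping point.
for await (const chunk of llm.streamChat(messages, signal, options)) {
  fullOutput += renderChatMessage(chunk); // deliberately unfiltered
  yield chunk; // streamLines() downstream drops thinking chunks
}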

AI Code Review

  • Team members only: AI review runs automatically when PR is opened or marked ready for review
  • Team members can also trigger a review by commenting @continue-review

Checklist

  • I've read the contributing guide
  • The relevant docs, if any, have been updated or created
  • The relevant tests, if any, have been updated or created

Screen recording or screenshot

Not applicable

Tests

No tests added; existing tests pass, except for tests requiring an API key, which could not be run.


Summary by cubic

Prevents reasoning/“thinking” tokens from being inserted into files and utility outputs by filtering them at render time. Fixes cases where Edit/Apply and internal helpers pasted model reasoning into code (fixes #11590).

  • Bug Fixes
    • Added renderChatMessageWithoutThinking() to drop role: "thinking" chunks.
    • Applied to streamLines() (Edit/Apply), all providers’ _streamComplete() paths, BaseLLM.chat(), toMarkDown(), and legacy slash commands (/commit, /review, /draftIssue, /onboard).
    • Kept renderChatMessage() unchanged for chat UI; left recursiveStream buffer unfiltered to preserve continuation context.


achmelev added 3 commits May 15, 2026 17:42
Add renderChatMessageWithoutThinking() to messageContent.ts which returns
an empty string for role:"thinking" chunks instead of rendering them as
plain text.

Replace renderChatMessage() with renderChatMessageWithoutThinking() at all
call sites where thinking content must not appear in output:
- streamLines() in diff/util.ts (Apply and Edit-with-rules path)
- _streamComplete() in all provider implementations (Edit-without-rules path)
- BaseLLM.chat() in llm/index.ts (title generation, repo map summarisation,
  context retrieval tool selection, next-edit prediction, conversation
  compaction)
…gacy slash commands

Extends the renderChatMessageWithoutThinking fix to six remaining call
sites: recursiveStream (diff buffer), toMarkDown (history export), and
the four built-in-legacy slash commands (commit, review, draftIssue,
onboard) that streamed chunks directly to the UI.
…er integrity

The buffer in recursiveStream must be a faithful copy of all model output
so the recursive continuation path can resume from the correct position.
Thinking content is already filtered downstream by streamLines() in
diff/util.ts, so no thinking leaks to the diff pipeline or UI.
@achmelev achmelev requested a review from a team as a code owner May 15, 2026 15:50
@achmelev achmelev requested review from sestinj and removed request for a team May 15, 2026 15:50
@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label May 15, 2026

github-actions Bot commented May 15, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@achmelev (Author) commented

I have read the CLA Document and I hereby sign the CLA

@cubic-dev-ai Bot left a comment

No issues found across 17 files


Labels

size:M This PR changes 30-99 lines, ignoring generated files.

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

Edit Model failed to apply, but just paste its reasoning process to my code.

1 participant