Skip to content

Token reduction + routing reliability, provider/cost fixes, native ABI guard#73

Open
vishalveerareddy123 wants to merge 10 commits into
mainfrom
feat/token-reduction-rtk
Open

Token reduction + routing reliability, provider/cost fixes, native ABI guard#73
vishalveerareddy123 wants to merge 10 commits into
mainfrom
feat/token-reduction-rtk

Conversation

@vishalveerareddy123

@vishalveerareddy123 vishalveerareddy123 commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Summary

Token-reduction features plus reliability fixes uncovered while running Claude Code through Lynkr end-to-end.

Token reduction (A1–A5, inspired by 9router)

  • RTK compressors: grep (per-file match cap), dedup_log (collapse repeated lines), smart_truncate (head/tail fallback).
  • Request bypass: canned responses for Claude CLI housekeeping (Warmup / count / title-prefill / isNewTopic) — no provider round-trip. CLI-only, always on.
  • MCP tool dedup: strip built-in WebSearch/WebFetch when Exa/Tavily MCP tools are present. Always on.
  • Caveman: opt-in terse-output system-prompt injector (off by default).

Routing reliability

  • Pass client tools through in client/passthrough mode — previously the model's tool calls were dropped as "hallucinated", stalling agentic sessions.
  • De-escalate risk false-positives: drop over-broad session/token substring keywords that forced ordinary requests to COMPLEX.
  • Session→provider affinity: keep a tool-bearing conversation on one provider so tool_call_id linkage doesn't break (fixes Azure/Moonshot 400s).

Providers, cost & telemetry

  • Moonshot: remap deprecated moonshot-v1-auto; pin temperature:1 + top_p:0.95 for kimi-k2.x thinking models.
  • OpenAI-format converter: replace empty-string content with a space (Kimi "tokenization failed").
  • Telemetry: request_text/response_text columns (+ migration), captured text + per-request cost_usd on success/fallback — builds the routing-ML training corpus.
  • Deterministic cost resolution: replace the registry's bidirectional fuzzy substring match (confidently-wrong prices) with an ordered ladder — override → exact → prefix-strip → alias → date-normalize → longest-prefix → unknown. Unknown models record cost_usd=null (+ warn) instead of a guess. Adds MODEL_PRICE_OVERRIDES.
  • Native ABI guard: postinstall rebuilds native deps on a Node-version mismatch (better-sqlite3 was silently disabling telemetry/sessions/memory after a Node bump).

Dashboard

  • Configured Providers now derives from TIER_* ∩ configured credentials, shows per-provider tiers, and flags tiers pointing at an unconfigured provider.

Tests

  • New: token-reduction.test.js (17), session-affinity.test.js, model-registry-cost.test.js; all in test:unit.
  • Verified live end-to-end (200, SIMPLE→ollama, cost + text recorded in telemetry).

Follow-up (not in this PR)

  • Trivial-turn fast path: unwrap the <system-reminder> scaffolding before the greeting/force-local check so a wrapped "Hi" routes SIMPLE instead of COMPLEX.

🤖 Generated with Claude Code

vishal veerareddy and others added 10 commits June 8, 2026 21:40
…aveman

Port category-A token-reduction features inspired by 9router:

- tool-result-compressor: add grep (per-file match cap), dedup_log (collapse
  repeated lines), and smart_truncate (head/tail fallback) compressors.
- bypass: short-circuit Claude CLI housekeeping (Warmup, count, title prefill,
  isNewTopic title extraction) with a canned response — no provider round-trip.
  Always on; CLI-only so real work is never affected.
- tool-dedup: strip built-in WebSearch/WebFetch when an equivalent MCP tool
  (Exa/Tavily) is present. Always on.
- caveman: opt-in terse-output system-prompt injector (lite/full/ultra) to
  reduce output tokens; off by default.

Adds test/token-reduction.test.js (17 cases).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…er affinity

- risk-analyzer: remove the over-broad 'session' and 'token' substring keywords
  that force-escalated ordinary requests to COMPLEX (they matched benign paths
  like src/sessions/* and tokenizer.js). Real secrets stay covered by
  secret/credential/api-key. Genuine auth/security paths still flag high.
- session-affinity: new module + wiring in determineProviderSmart. When a
  conversation already carries tool_use/tool_result history, reuse the provider
  it first routed to, so tool_call_id linkage doesn't break across providers
  (the Azure/Moonshot 400 "tool_call_id not found" errors). Fresh turns still
  route per-tier.

Adds test/session-affinity.test.js.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Gate smart tool selection to server mode only. In client/passthrough mode
  the client (Claude Code) owns tool execution, so stripping its tools made the
  model emit calls for removed tools that were then dropped as "hallucinated",
  stalling the session. Tools now pass through intact.
- Wire request bypass (early), MCP tool dedup, and caveman injection into
  processMessage / runAgentLoop.
- Thread session id (_sessionId) for provider affinity.
- config: add caveman flags (CAVEMAN_ENABLED/LEVEL).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…I guard

Providers:
- Moonshot: remap deprecated moonshot-v1-auto (400 "tokenization failed") to a
  fixed model; pin temperature=1 and top_p=0.95 for kimi-k2.x thinking models
  (they reject other values).
- openrouter-utils: replace empty-string message content with a space so
  Kimi/OpenAI-compatible providers don't 400 on "tokenization failed".

Telemetry / cost:
- telemetry: add request_text + response_text columns (+ migration for existing
  DBs); capture prompt/response text (truncated) and per-request cost_usd
  (model-registry pricing x tokens) on success + fallback paths. Builds the
  training corpus for the routing ML models.

Native modules:
- scripts/check-native.js + postinstall: detect a better-sqlite3 ABI mismatch
  (Node upgrade) and rebuild native optional deps. Best-effort, never fails
  install. Fixes telemetry/sessions/memory silently going dark after a Node bump.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Derive the Configured Providers list from TIER_* + MODEL_PROVIDER intersected
with configured credentials, instead of listing every provider that merely has
an endpoint set (which always showed llamacpp/lmstudio via their default
endpoints). Show the tiers each provider serves, and flag tiers pointing at a
provider with no credentials.

Also documents the caveman env flags in .env.example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing match

Replace the bidirectional `includes()` fuzzy match (which returned confidently
wrong prices for unrelated names) with a deterministic ladder:

  override → exact → provider-prefix-strip → alias → date-normalize
  → longest-prefix (one-directional, length-bounded) → unknown

- Unknown models now return `{ ...DEFAULT_COST, unknown: true }` and
  computeCostUsd records cost_usd=null (+ warn once) instead of a fabricated
  guess, so the bandit/cost-optimizer aren't poisoned.
- Add MODEL_PRICE_OVERRIDES env so operators can pin prices the registry lacks.
- Each result carries a `resolution` tag for debuggability.

Adds test/model-registry-cost.test.js (incl. a regression test that a name
containing a real key as a substring no longer false-matches).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ures

- Benchmark Results: switch to the Lynkr-vs-LiteLLM numbers from
  BENCHMARK_REPORT.md (53% / 87.6% token reduction with cost, 171ms cache,
  tier-routing comparison, ~50% monthly cost projection) + run date/versions.
- Advanced Features: document always-on optimizations (smart tool selection,
  RTK compression, MCP tool dedup, request bypass), opt-in caveman terse mode,
  and the cost-tracking / MODEL_PRICE_OVERRIDES behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…+ TOON)

Document the in-process tool_result compression that was previously undocumented:
- RTK pattern compressors (test/git/grep/lint/build/container/json/dir/file +
  dedup_log/smart_truncate), tier-aware size thresholds, and tee lossless
  recovery via GET /tee/:id.
- TOON binary JSON encoding with its config (TOON_ENABLED/MIN_BYTES/FAIL_OPEN).
Adds an overview row and fixes the phase-count labels.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Correct the curl installer's printed port from 8080 to 8081 (matches
  .env.example/README; the installer copies .env.example which uses 8081, so
  the old instructions were wrong for real installs).
- `npm install --production` → `--omit=dev` (keeps optionalDependencies; the
  postinstall native-ABI guard runs here).
- After install, probe better-sqlite3 and tell the user whether telemetry /
  memory / sessions are enabled, with a `npm run rebuild-native` hint if the
  native module isn't loadable. Lynkr still runs without it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update the GitHub Pages homepage version indicators (JSON-LD softwareVersion
and the hero badge) from 9.3.2 to 9.4.6 to match the latest npm release.
The benchmark-provenance line is left at v9.3.2 since that records the version
the published benchmarks were actually run on.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant