Token reduction + routing reliability, provider/cost fixes, native ABI guard#73
Open
vishalveerareddy123 wants to merge 10 commits into
Open
Token reduction + routing reliability, provider/cost fixes, native ABI guard#73vishalveerareddy123 wants to merge 10 commits into
vishalveerareddy123 wants to merge 10 commits into
Conversation
…aveman Port category-A token-reduction features inspired by 9router: - tool-result-compressor: add grep (per-file match cap), dedup_log (collapse repeated lines), and smart_truncate (head/tail fallback) compressors. - bypass: short-circuit Claude CLI housekeeping (Warmup, count, title prefill, isNewTopic title extraction) with a canned response — no provider round-trip. Always on; CLI-only so real work is never affected. - tool-dedup: strip built-in WebSearch/WebFetch when an equivalent MCP tool (Exa/Tavily) is present. Always on. - caveman: opt-in terse-output system-prompt injector (lite/full/ultra) to reduce output tokens; off by default. Adds test/token-reduction.test.js (17 cases). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…er affinity - risk-analyzer: remove the over-broad 'session' and 'token' substring keywords that force-escalated ordinary requests to COMPLEX (they matched benign paths like src/sessions/* and tokenizer.js). Real secrets stay covered by secret/credential/api-key. Genuine auth/security paths still flag high. - session-affinity: new module + wiring in determineProviderSmart. When a conversation already carries tool_use/tool_result history, reuse the provider it first routed to, so tool_call_id linkage doesn't break across providers (the Azure/Moonshot 400 "tool_call_id not found" errors). Fresh turns still route per-tier. Adds test/session-affinity.test.js. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Gate smart tool selection to server mode only. In client/passthrough mode the client (Claude Code) owns tool execution, so stripping its tools made the model emit calls for removed tools that were then dropped as "hallucinated", stalling the session. Tools now pass through intact. - Wire request bypass (early), MCP tool dedup, and caveman injection into processMessage / runAgentLoop. - Thread session id (_sessionId) for provider affinity. - config: add caveman flags (CAVEMAN_ENABLED/LEVEL). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…I guard Providers: - Moonshot: remap deprecated moonshot-v1-auto (400 "tokenization failed") to a fixed model; pin temperature=1 and top_p=0.95 for kimi-k2.x thinking models (they reject other values). - openrouter-utils: replace empty-string message content with a space so Kimi/OpenAI-compatible providers don't 400 on "tokenization failed". Telemetry / cost: - telemetry: add request_text + response_text columns (+ migration for existing DBs); capture prompt/response text (truncated) and per-request cost_usd (model-registry pricing x tokens) on success + fallback paths. Builds the training corpus for the routing ML models. Native modules: - scripts/check-native.js + postinstall: detect a better-sqlite3 ABI mismatch (Node upgrade) and rebuild native optional deps. Best-effort, never fails install. Fixes telemetry/sessions/memory silently going dark after a Node bump. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Derive the Configured Providers list from TIER_* + MODEL_PROVIDER intersected with configured credentials, instead of listing every provider that merely has an endpoint set (which always showed llamacpp/lmstudio via their default endpoints). Show the tiers each provider serves, and flag tiers pointing at a provider with no credentials. Also documents the caveman env flags in .env.example. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing match
Replace the bidirectional `includes()` fuzzy match (which returned confidently
wrong prices for unrelated names) with a deterministic ladder:
override → exact → provider-prefix-strip → alias → date-normalize
→ longest-prefix (one-directional, length-bounded) → unknown
- Unknown models now return `{ ...DEFAULT_COST, unknown: true }` and
computeCostUsd records cost_usd=null (+ warn once) instead of a fabricated
guess, so the bandit/cost-optimizer aren't poisoned.
- Add MODEL_PRICE_OVERRIDES env so operators can pin prices the registry lacks.
- Each result carries a `resolution` tag for debuggability.
Adds test/model-registry-cost.test.js (incl. a regression test that a name
containing a real key as a substring no longer false-matches).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ures - Benchmark Results: switch to the Lynkr-vs-LiteLLM numbers from BENCHMARK_REPORT.md (53% / 87.6% token reduction with cost, 171ms cache, tier-routing comparison, ~50% monthly cost projection) + run date/versions. - Advanced Features: document always-on optimizations (smart tool selection, RTK compression, MCP tool dedup, request bypass), opt-in caveman terse mode, and the cost-tracking / MODEL_PRICE_OVERRIDES behavior. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…+ TOON) Document the in-process tool_result compression that was previously undocumented: - RTK pattern compressors (test/git/grep/lint/build/container/json/dir/file + dedup_log/smart_truncate), tier-aware size thresholds, and tee lossless recovery via GET /tee/:id. - TOON binary JSON encoding with its config (TOON_ENABLED/MIN_BYTES/FAIL_OPEN). Adds an overview row and fixes the phase-count labels. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Correct the curl installer's printed port from 8080 to 8081 (matches .env.example/README; the installer copies .env.example which uses 8081, so the old instructions were wrong for real installs). - `npm install --production` → `--omit=dev` (keeps optionalDependencies; the postinstall native-ABI guard runs here). - After install, probe better-sqlite3 and tell the user whether telemetry / memory / sessions are enabled, with a `npm run rebuild-native` hint if the native module isn't loadable. Lynkr still runs without it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update the GitHub Pages homepage version indicators (JSON-LD softwareVersion and the hero badge) from 9.3.2 to 9.4.6 to match the latest npm release. The benchmark-provenance line is left at v9.3.2 since that records the version the published benchmarks were actually run on. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Token-reduction features plus reliability fixes uncovered while running Claude Code through Lynkr end-to-end.
Token reduction (A1–A5, inspired by 9router)
grep(per-file match cap),dedup_log(collapse repeated lines),smart_truncate(head/tail fallback).isNewTopic) — no provider round-trip. CLI-only, always on.Routing reliability
session/tokensubstring keywords that forced ordinary requests to COMPLEX.tool_call_idlinkage doesn't break (fixes Azure/Moonshot 400s).Providers, cost & telemetry
moonshot-v1-auto; pintemperature:1+top_p:0.95for kimi-k2.x thinking models.request_text/response_textcolumns (+ migration), captured text + per-requestcost_usdon success/fallback — builds the routing-ML training corpus.cost_usd=null(+ warn) instead of a guess. AddsMODEL_PRICE_OVERRIDES.postinstallrebuilds native deps on a Node-version mismatch (better-sqlite3 was silently disabling telemetry/sessions/memory after a Node bump).Dashboard
TIER_*∩ configured credentials, shows per-provider tiers, and flags tiers pointing at an unconfigured provider.Tests
token-reduction.test.js(17),session-affinity.test.js,model-registry-cost.test.js; all intest:unit.Follow-up (not in this PR)
<system-reminder>scaffolding before the greeting/force-local check so a wrapped "Hi" routes SIMPLE instead of COMPLEX.🤖 Generated with Claude Code