
feat: smart federation routing (load + VRAM + model + prefix affinity + cross-server sync) and unified Cluster UI#10124

Open
localai-bot wants to merge 16 commits into master from feat/p2p-federation-clusterrouting

Conversation

@localai-bot
Collaborator

@localai-bot localai-bot commented Jun 1, 2026

Smart, prefix-cache-aware routing for the p2p federation server, sharing the routing brain with the NATS distributed mode, plus a unified Cluster UI. Builds on #10123 (the pkg/clusterrouting extraction).

What

The p2p federation server graduates from a blind round-robin TCP proxy into a model-aware, prefix-cache-aware, load+VRAM-aware L7 router - using the same pkg/clusterrouting policy the NATS side uses, and reusing the existing core/services/nodes/prefixcache engine (extractor, radix index, and Sync) rather than reimplementing any of it. The two distributed UIs are folded into one Cluster page.

How

Stage 1 - load + VRAM-aware (L4): gossip each node's free GPU VRAM (schema.NodeData.AvailableVRAM, from xsysinfo); select via clusterrouting.PickBestReplica (least in-flight, then most free VRAM). Adds a clock-injectable NodeData.IsOnlineAt.
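A minimal sketch of the stage 1 selection rule (least in-flight, then most free VRAM) together with a clock-injectable online check. The peer struct, the ttl parameter, and the function names are hypothetical; the actual signatures of clusterrouting.PickBestReplica and NodeData.IsOnlineAt are not shown in this description and may differ.

```go
package main

import "time"

// peer is a hypothetical stand-in for a gossiped federation node; the real
// schema.NodeData has more fields and may name these differently.
type peer struct {
	ID            string
	InFlight      int       // requests currently being served
	AvailableVRAM uint64    // free GPU VRAM in bytes, as gossiped
	LastSeen      time.Time // last time the node was heard from
}

// isOnlineAt reports whether the peer was seen within ttl of now. Taking now
// as a parameter (instead of calling time.Now) keeps the check deterministic
// in tests. The ttl parameter is an assumption of this sketch.
func (p peer) isOnlineAt(now time.Time, ttl time.Duration) bool {
	return now.Sub(p.LastSeen) <= ttl
}

// pickBest returns the online peer with the fewest in-flight requests,
// breaking ties by most free VRAM. ok is false when no peer is online.
func pickBest(peers []peer, now time.Time, ttl time.Duration) (best peer, ok bool) {
	for _, p := range peers {
		if !p.isOnlineAt(now, ttl) {
			continue
		}
		if !ok || p.InFlight < best.InFlight ||
			(p.InFlight == best.InFlight && p.AvailableVRAM > best.AvailableVRAM) {
			best, ok = p, true
		}
	}
	return best, ok
}
```

A caller would pass time.Now() in production and a fixed time in specs, which is the point of injecting the clock.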

Stage 2 - L7 + model + prefix affinity: the proxy becomes an L7 HTTP-terminating proxy. It buffers the request under a configurable cap (--upload-limit; larger bodies get a 413), chooses a peer, and streams the response so SSE keeps flowing. Websocket upgrades bypass this path and fall back to a duplex copy, and non-duplex forwards force Connection: close to avoid a keep-alive leak. Model-aware: peers gossip their model set (NodeData.Models), and candidates are filtered to peers serving the requested model (an empty set counts as eligible, so mixed-version swarms aren't starved). Prefix affinity: prefixcache.ExtractChain/Index are reused to find the warm peer for a prompt prefix, and the final pick is made by a new load-guarded clusterrouting.PickWithAffinity, which keeps the VRAM tier. The serving peer is observed in the index after each serve.
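The two stage 2 decisions (model filtering and a load-guarded affinity pick) can be sketched roughly as follows. Everything here is illustrative: the candidate struct, the maxSkew threshold, and the treatment of an empty requested model are assumptions, and the sketch leaves out the VRAM tier that the real clusterrouting.PickWithAffinity is said to keep.

```go
package main

// candidate is a hypothetical view of a gossiped peer; field names are
// illustrative, not the real schema.NodeData layout.
type candidate struct {
	ID       string
	Models   []string // empty means the peer did not gossip a model set
	InFlight int
}

// servesModel reports whether c may handle model. An empty model set counts
// as eligible so peers that do not gossip models are not starved. Treating an
// empty requested model as "any peer" is an assumption of this sketch.
func servesModel(c candidate, model string) bool {
	if len(c.Models) == 0 || model == "" {
		return true
	}
	for _, m := range c.Models {
		if m == model {
			return true
		}
	}
	return false
}

// filterByModel keeps only the candidates eligible for model, in order.
func filterByModel(cands []candidate, model string) []candidate {
	var out []candidate
	for _, c := range cands {
		if servesModel(c, model) {
			out = append(out, c)
		}
	}
	return out
}

// pickWithAffinity prefers warmID (the peer believed to hold the prompt
// prefix in cache) unless it is more than maxSkew requests busier than the
// least-loaded candidate; in that case it falls back to the least-loaded one.
// The maxSkew guard is a hypothetical way to express "load-guarded".
func pickWithAffinity(cands []candidate, warmID string, maxSkew int) (candidate, bool) {
	if len(cands) == 0 {
		return candidate{}, false
	}
	least := cands[0]
	for _, c := range cands[1:] {
		if c.InFlight < least.InFlight {
			least = c
		}
	}
	for _, c := range cands {
		if c.ID == warmID && c.InFlight-least.InFlight <= maxSkew {
			return c, true
		}
	}
	return least, true
}
```

The guard keeps affinity from pinning every request with a shared prefix onto one overloaded peer: a warm cache is only worth it while the warm peer is not much busier than the alternatives.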

Stage 3 - cross-server affinity sync (opt-in): with --affinity-sync, federation servers share affinity by reusing prefixcache.Sync over a new genericChannelPublisher on edgevpn's generic channel (node.EnableGenericHub + a generic-channel handler; only sync-enabled servers join, so workers/members see no traffic). Same observe/invalidate event structs as the NATS path. Also adds the eviction ticker the federation index was missing (prevents unbounded tree growth, on or off).
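To illustrate why an eviction ticker matters, here is a toy sketch of an affinity index with periodic eviction. The map-based index, the TTL policy, and all names are assumptions made for the example; the real federation index is the radix-tree engine in prefixcache and its eviction rule may be different.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// entry records which peer last served a prefix key and when.
type entry struct {
	peer     string
	lastSeen time.Time
}

// affinityIndex is a toy stand-in for the prefix index (a map, not a radix tree).
type affinityIndex struct {
	mu      sync.Mutex
	entries map[string]entry
}

func newAffinityIndex() *affinityIndex {
	return &affinityIndex{entries: map[string]entry{}}
}

// observe records that peer served key at time now.
func (ix *affinityIndex) observe(key, peer string, now time.Time) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	ix.entries[key] = entry{peer: peer, lastSeen: now}
}

// evictOlderThan drops entries last observed before cutoff and returns how
// many were removed. Without a periodic call like this the index only grows.
func (ix *affinityIndex) evictOlderThan(cutoff time.Time) int {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	removed := 0
	for k, e := range ix.entries {
		if e.lastSeen.Before(cutoff) {
			delete(ix.entries, k)
			removed++
		}
	}
	return removed
}

// size returns the number of entries currently held.
func (ix *affinityIndex) size() int {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	return len(ix.entries)
}

// runEviction evicts stale entries every interval until ctx is cancelled.
// interval must be positive, since time.NewTicker panics otherwise.
func runEviction(ctx context.Context, ix *affinityIndex, interval, ttl time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case now := <-t.C:
			ix.evictOlderThan(now.Add(-ttl))
		}
	}
}
```

In this sketch the server would start runEviction in its own goroutine next to the proxy and cancel the context on shutdown. It runs regardless of whether sync is enabled, matching the "on or off" note above.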

Stage 4 - unified Cluster UI: fold /app/p2p ("swarm") and /app/nodes ("nodes") into one /app/cluster page with a shared summary and two collapsible, capability-gated sections (Distributed / Swarm). The two large pages are reused verbatim via an embedded prop (no rewrite); a new useP2PMode hook mirrors useDistributedMode. Old routes redirect; one sidebar entry; i18n across 5 languages. No backend / capabilities.js / Go-embed changes.

Notes / scope

  • Federation was a plain L4 proxy, so model-awareness genuinely required the L7 conversion (which is why it lands in stage 2, not stage 1).
  • No duplication: prefixcache, pkg/radixtree, the messaging events, and both existing UI pages already existed and are reused; the only new selection primitive is PickWithAffinity (to retain the VRAM tier, which prefixcache.Select lacks).
  • Backward-compatible gossip: NodeData is JSON-marshaled, so old peers omit the new fields (read as zero/nil = eligible / lowest VRAM tier).
  • --affinity-sync is off by default (single federation server is the common case); enable it on every server that should cohere.
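The backward-compatibility note about gossip can be illustrated with a small decode example: a payload from an older peer simply lacks the new keys, so they decode to zero values. The struct below is a trimmed, hypothetical version of NodeData, and the JSON tag names are assumptions rather than the real wire names.

```go
package main

import "encoding/json"

// nodeData is a trimmed, hypothetical version of the gossiped struct; the
// JSON tags are illustrative and may not match the real schema.NodeData.
type nodeData struct {
	ID            string   `json:"id"`
	AvailableVRAM uint64   `json:"available_vram,omitempty"`
	Models        []string `json:"models,omitempty"`
}

// decodeNode parses a gossip payload. Keys missing from an older peer's
// payload leave the matching fields at their zero values: 0 VRAM (lowest
// tier) and a nil model set (eligible for any model).
func decodeNode(payload []byte) (nodeData, error) {
	var n nodeData
	err := json.Unmarshal(payload, &n)
	return n, err
}
```

Decoding `{"id":"legacy"}` with this sketch yields AvailableVRAM == 0 and Models == nil, which the routing rules described above treat as the lowest VRAM tier and as eligible for every model.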

Testing

  • Go: new Ginkgo specs at every layer (pkg/clusterrouting, core/schema, core/p2p candidate/model/affinity/L7/sync). go test -race ./core/p2p/ ./core/schema/ ./pkg/clusterrouting/... ./core/services/nodes/prefixcache/ green; golangci-lint 0 issues on the changed Go packages.
  • UI: Playwright e2e specs for the Cluster page + old-route redirects (the React app has no unit/RTL harness). npm run build (Vite) compiles clean; eslint adds 0 new errors. The full Playwright e2e + coverage run needs CI / make test-ui (browsers + the Go mock server).
  • The libp2p byte-forwarding (proxyHTTPToPeer), the live generic-channel broadcast, and the UI e2e are integration-only; their decision/message/render logic is unit-tested or read-verified. Recommended manual smoke before merge: single-server L7, two-server --affinity-sync, and the /app/cluster page + redirects.

Assisted-by: Claude Code:claude-opus-4-8

mudler added 8 commits June 1, 2026 07:54
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
… VRAM)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…ting

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…raction

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…ffinity routing

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…ractModel param

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@localai-bot localai-bot changed the title feat(p2p): route federation with shared clusterrouting policy (load + VRAM) feat(p2p): model-aware, prefix-cache-aware federation routing (load + VRAM + affinity) Jun 1, 2026
mudler added 2 commits June 1, 2026 22:22
… sync

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…c channel

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@localai-bot localai-bot changed the title feat(p2p): model-aware, prefix-cache-aware federation routing (load + VRAM + affinity) feat(p2p): model-aware, prefix-cache-aware federation routing (load + VRAM + affinity + cross-server sync) Jun 1, 2026
mudler added 6 commits June 1, 2026 23:03
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…e sidebar entry

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@localai-bot localai-bot changed the title feat(p2p): model-aware, prefix-cache-aware federation routing (load + VRAM + affinity + cross-server sync) feat: smart federation routing (load + VRAM + model + prefix affinity + cross-server sync) and unified Cluster UI Jun 1, 2026

2 participants