feat(message_bus): add QUIC, TCP-TLS, WS, WSS transports for SDK clients#3192

Open: hubcio wants to merge 1 commit into master from feat/message-bus-multi-transport

hubcio (Contributor) commented Apr 28, 2026

Replica plane stays TCP forever: VSR FIFO + view-change timing,
fd-delegation, writev batching all rely on plaintext between trusted
replicas. SDK-client plane gains four transports alongside TCP:

  • QUIC: shard-0 terminal (compio-quic CID demux), 1 bidi stream per
    peer, 0-RTT off + listener defense-in-depth reject.
  • TCP-TLS: rustls 1.3, no client auth, 0-RTT off, compio-tls behind
    unified TransportConn::run with bounded close_grace shutdown.
  • WS: compio-ws over plaintext TCP; pre-upgrade fd cross-shard
    handover keeps fd-delegation on plain TCP only.
  • WSS: WebSocketStream over TlsStream; both handshakes run on the
    per-connection install task.

Shared: TransportListener / TransportConn trait family; WebSocketConfig
and close_grace threaded through MessageBusConfig and applied uniformly
across TCP-TLS, WS, WSS; bounded safe-shutdown (no select! over
stream.shutdown); single-task pump per WS/WSS using compio-ws
cancel-safe read.

Bus auth thin: both planes connect unauthenticated; server-ng gates via
LOGIN_USER / LOGIN_WITH_PAT and future LOGIN_REPLICA. Ping announces
replica_id only; no subprotocol, no ALPN, no MAC.

Per-connection metadata flows via IggyMessageBus::client_meta;
ShardFramePayload setup variants carry ClientConnMeta end to end.
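
As a reading aid for the transport list above, here is a minimal sketch of the SDK-client transport matrix. The `ClientTransport` enum and its methods are hypothetical and do not appear in the PR; the real abstraction is the TransportListener / TransportConn trait family, whose signatures are not shown in this conversation. Only the groupings (which transports are encrypted, which share close_grace) are taken from the description, plus the general fact that QUIC always runs over TLS 1.3.

```rust
/// Hypothetical summary type, not part of the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClientTransport {
    Tcp,
    Quic,
    TcpTls,
    Ws,
    Wss,
}

impl ClientTransport {
    /// Every SDK-client transport named in the description, plus plain TCP.
    pub const ALL: [ClientTransport; 5] = [
        ClientTransport::Tcp,
        ClientTransport::Quic,
        ClientTransport::TcpTls,
        ClientTransport::Ws,
        ClientTransport::Wss,
    ];

    /// True when bytes on the wire are encrypted.
    pub fn is_encrypted(self) -> bool {
        matches!(self, Self::Quic | Self::TcpTls | Self::Wss)
    }

    /// True for the transports the description lists as sharing close_grace.
    pub fn uses_close_grace(self) -> bool {
        matches!(self, Self::TcpTls | Self::Ws | Self::Wss)
    }

    /// True when frames travel inside WebSocket messages.
    pub fn is_websocket(self) -> bool {
        matches!(self, Self::Ws | Self::Wss)
    }
}
```

For instance, `ClientTransport::Wss` is both encrypted and WebSocket-framed, which is why the description has both handshakes running on the per-connection install task.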

codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 87.78999% with 300 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.76%. Comparing base (eb20ac5) to head (c732238).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/message_bus/src/replica/io.rs | 82.89% | 39 Missing and 7 partials ⚠️ |
| core/message_bus/src/transports/wss.rs | 86.97% | 32 Missing and 5 partials ⚠️ |
| core/message_bus/src/transports/quic.rs | 87.87% | 32 Missing and 4 partials ⚠️ |
| core/message_bus/src/transports/ws.rs | 87.01% | 27 Missing and 3 partials ⚠️ |
| core/message_bus/src/installer/replica.rs | 87.75% | 15 Missing and 3 partials ⚠️ |
| core/message_bus/src/transports/tcp_tls.rs | 93.51% | 12 Missing and 5 partials ⚠️ |
| core/server-ng/src/dedup.rs | 90.00% | 7 Missing and 10 partials ⚠️ |
| core/shard/src/coordinator.rs | 0.00% | 17 Missing ⚠️ |
| core/message_bus/src/installer/mod.rs | 44.82% | 16 Missing ⚠️ |
| core/message_bus/src/transports/tls/mod.rs | 90.09% | 6 Missing and 4 partials ⚠️ |

... and 13 more

Additional details and impacted files
@@              Coverage Diff              @@
##             master    #3192       +/-   ##
=============================================
- Coverage     74.08%   63.76%   -10.33%     
  Complexity      943      943               
=============================================
  Files          1159     1175       +16     
  Lines        102033    94258     -7775     
  Branches      79084    71326     -7758     
=============================================
- Hits          75593    60103    -15490     
- Misses        23770    31300     +7530     
- Partials       2670     2855      +185     

| Components | Coverage Δ |
|---|---|
| Rust Core | `61.64% <87.78%> (-13.67%)` ⬇️ |
| Java SDK | `60.14% <ø> (ø)` |
| C# SDK | `69.07% <ø> (-0.31%)` ⬇️ |
| Python SDK | `81.43% <ø> (ø)` |
| Node SDK | `91.53% <ø> (ø)` |
| Go SDK | `39.43% <ø> (ø)` |

| Files with missing lines | Coverage Δ |
|---|---|
| core/configs/src/server_ng_config/defaults.rs | `100.00% <ø> (ø)` |
| core/configs/src/server_ng_config/displays.rs | `0.00% <ø> (ø)` |
| core/configs/src/server_ng_config/message_bus.rs | `100.00% <ø> (ø)` |
| core/configs/src/server_ng_config/server_ng.rs | `40.98% <ø> (-2.77%)` ⬇️ |
| core/message_bus/src/config.rs | `100.00% <100.00%> (ø)` |
| core/message_bus/src/connector.rs | `94.31% <100.00%> (+1.46%)` ⬆️ |
| core/message_bus/src/installer/conn_info.rs | `100.00% <100.00%> (ø)` |
| core/message_bus/src/installer/quic.rs | `100.00% <100.00%> (ø)` |
| core/message_bus/src/installer/ws.rs | `100.00% <100.00%> (ø)` |
| ...e/message_bus/src/lifecycle/connection_registry.rs | `90.79% <ø> (+1.02%)` ⬆️ |

... and 28 more

... and 201 files with indirect coverage changes

atharvalade (Contributor) left a comment


Found these during the first round of review; I'll continue reviewing later. Overall this looks good.

if body.len() < iggy_binary_protocol::HEADER_SIZE {
    return Err(FrameDecodeError::BadHeader);
}
let total_size = u32::from_le_bytes(

framing.rs has a const _: () = { assert!(offset_of!(GenericHeader, size) == 48) } guard, but this function duplicates the same 48..52 magic range without one. If the GenericHeader layout ever shifts, TCP/QUIC get a compile error while WS/WSS silently read the wrong bytes.
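
One way to address this, sketched under assumptions: derive the byte range from `offset_of!` in a single place and let every transport path use that constant, so the compile-time guard covers WS/WSS as well. The `GenericHeader` below is a hypothetical stand-in (only the position of `size` at byte 48 comes from the comment above; the other fields and the 128-byte total are invented for illustration), and `read_total_size` is an illustrative name, not a function from the PR.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical stand-in for the real header. Only `size` at byte 48 is
/// taken from the review comment; the padding fields are assumptions.
#[repr(C)]
pub struct GenericHeader {
    pub leading: [u8; 48],
    pub size: u32,
    pub trailing: [u8; 76],
}

/// Single source of truth for where the size field lives.
pub const SIZE_OFFSET: usize = offset_of!(GenericHeader, size);
pub const HEADER_SIZE: usize = size_of::<GenericHeader>();

// Compile-time guard: if the layout shifts, the build fails here for
// every caller of SIZE_OFFSET, not just for one transport.
const _: () = assert!(SIZE_OFFSET == 48);

/// Reads the little-endian total size from a frame body, or returns
/// `None` when the body is shorter than a full header.
pub fn read_total_size(body: &[u8]) -> Option<u32> {
    if body.len() < HEADER_SIZE {
        return None;
    }
    let raw: [u8; 4] = body[SIZE_OFFSET..SIZE_OFFSET + 4].try_into().ok()?;
    Some(u32::from_le_bytes(raw))
}
```

With this shape the WS/WSS decoder would call the shared helper (or at least reference `SIZE_OFFSET`) instead of repeating the literal range.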

in_tx,
rx,
shutdown,
label,

run_pump discards max_message_size through the .. rest pattern, and decode_consensus_frame hardcodes framing::MAX_MESSAGE_SIZE. The TCP and QUIC paths honor the per-bus config value, so an operator who lowers max_message_size gets enforcement on TCP/QUIC but not on WS/WSS.
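
A minimal sketch of the fix this comment points toward, assuming the limit is passed into the decoder as a parameter instead of being read from a constant. The function name `decode_total_size`, the `TooLarge` variant, and the 128-byte `HEADER_SIZE` are hypothetical; only `FrameDecodeError::BadHeader`, the 48..52 range, and the idea of a per-bus `max_message_size` come from this conversation.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq, Eq)]
pub enum FrameDecodeError {
    BadHeader,
    /// Hypothetical variant carrying the offending size and the limit.
    TooLarge { size: u32, limit: u32 },
}

/// Assumed header length, for illustration only.
pub const HEADER_SIZE: usize = 128;
/// Byte range of the little-endian size field, per the review comment.
const SIZE_RANGE: Range<usize> = 48..52;

/// Decodes the total frame size and enforces the caller-supplied limit,
/// so a lowered per-bus value applies to this path too.
pub fn decode_total_size(body: &[u8], max_message_size: u32) -> Result<u32, FrameDecodeError> {
    if body.len() < HEADER_SIZE {
        return Err(FrameDecodeError::BadHeader);
    }
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&body[SIZE_RANGE]);
    let size = u32::from_le_bytes(raw);
    if size > max_message_size {
        return Err(FrameDecodeError::TooLarge { size, limit: max_message_size });
    }
    Ok(size)
}
```

The pump would then forward its configured value on every call rather than dropping it when destructuring its arguments.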

.per_client
.entry(client)
.or_insert_with(|| PerClient::with_capacity(self.per_client_capacity));
if state.find(request).is_some() {

lookup treats TTL-expired Done entries as Fresh, but mark_in_flight calls find() without a TTL check. A client retrying after TTL expiry sees Fresh from lookup then gets false from mark_in_flight because the physical slot still exists. The retry is silently dropped.
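
The mismatch can be illustrated with a small model in which both entry points go through one TTL-aware helper. Everything here is a hypothetical sketch: the `Dedup`, `Slot`, and `Lookup` types, the `live` helper, and the explicit `now` parameter are invented for illustration and are not the types in core/server-ng/src/dedup.rs.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lookup {
    Fresh,
    InFlight,
    Done,
}

#[derive(Debug, Clone, Copy)]
enum Slot {
    InFlight,
    Done { at: Instant },
}

/// Hypothetical dedup table keyed by (client, request).
pub struct Dedup {
    ttl: Duration,
    slots: HashMap<(u64, u64), Slot>,
}

impl Dedup {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, slots: HashMap::new() }
    }

    /// The one TTL-aware view: an expired Done slot counts as absent.
    fn live(&self, key: (u64, u64), now: Instant) -> Option<Slot> {
        match self.slots.get(&key).copied() {
            Some(Slot::Done { at }) if now.duration_since(at) >= self.ttl => None,
            other => other,
        }
    }

    pub fn lookup(&self, client: u64, request: u64, now: Instant) -> Lookup {
        match self.live((client, request), now) {
            None => Lookup::Fresh,
            Some(Slot::InFlight) => Lookup::InFlight,
            Some(Slot::Done { .. }) => Lookup::Done,
        }
    }

    /// Returns true when the request was claimed. Because it uses `live`,
    /// it agrees with `lookup` and overwrites an expired slot.
    pub fn mark_in_flight(&mut self, client: u64, request: u64, now: Instant) -> bool {
        if self.live((client, request), now).is_some() {
            return false;
        }
        self.slots.insert((client, request), Slot::InFlight);
        true
    }

    pub fn mark_done(&mut self, client: u64, request: u64, now: Instant) {
        self.slots.insert((client, request), Slot::Done { at: now });
    }
}
```

In this model a retry after the TTL sees `Fresh` from `lookup` and then succeeds in `mark_in_flight`, which is the behavior the comment says is currently missing.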

let (server_out, server_in, server_shutdown, server_handle) = drive(server_conn);
let (client_out, client_in, client_shutdown, client_handle) = drive(client_conn);

client_out

TCP-TLS's drive_close calls tls.shutdown() which sends close_notify, but WSS just does ws.close() + drop. The peer's rustls sees an unexpected EOF on the record layer, which can trigger false-positive alerts in TLS-aware load balancers or WAFs sitting in front.
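
The ordering being asked for can be sketched with a synchronous mock: close the WebSocket layer first, then shut down the TLS layer so close_notify goes out, and only then drop the stream. The `TlsShutdown` trait, `WssConn`, and `RecordingTls` are hypothetical illustration types; they are not the compio-tls or compio-ws APIs, and a real implementation would be async and would bound both steps with close_grace.

```rust
use std::io;

/// Hypothetical minimal view of a TLS stream: only the orderly-shutdown hook.
pub trait TlsShutdown {
    /// Stands in for the call that sends the close_notify alert.
    fn shutdown(&mut self) -> io::Result<()>;
}

/// Hypothetical WebSocket connection layered over a TLS stream `S`.
pub struct WssConn<S: TlsShutdown> {
    tls: S,
}

impl<S: TlsShutdown> WssConn<S> {
    pub fn new(tls: S) -> Self {
        Self { tls }
    }

    /// Placeholder for sending the WebSocket Close frame.
    fn ws_close(&mut self) -> io::Result<()> {
        Ok(())
    }

    /// Closes both layers in order. The TLS shutdown is attempted even if
    /// the WebSocket close failed, so the peer still sees close_notify.
    pub fn close(mut self) -> io::Result<S> {
        let ws = self.ws_close();
        let tls = self.tls.shutdown();
        ws.and(tls)?;
        Ok(self.tls)
    }
}

/// Test double that records whether shutdown was called.
#[derive(Default)]
pub struct RecordingTls {
    pub close_notify_sent: bool,
}

impl TlsShutdown for RecordingTls {
    fn shutdown(&mut self) -> io::Result<()> {
        self.close_notify_sent = true;
        Ok(())
    }
}
```

Calling `WssConn::new(RecordingTls::default()).close()` returns the inner stream with `close_notify_sent` set, which is the property a WSS close test could assert on.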
