Conversation
Move spec-like documents from root and docs/internal into a structured layout under docs/superpowers with timestamped filenames:
- specs/: active design documents (SSC PRD, technical spec, EdgeZero migration, auction orchestration flow, production readiness report)
- archive/: completed or historical specs (optimization, sequence, publisher IDs audit)
Design spec for streaming HTTP responses through the publisher proxy when Next.js is disabled. Covers two implementation steps:
1. Make the streaming pipeline chunk-emitting (HtmlRewriterAdapter, gzip, encoder finalization)
2. Wire up Fastly StreamingBody via stream_to_client() with entry point migration from #[fastly::main] to undecorated main()

Includes streaming gate logic, error handling, rollback strategy, and testing plan. Verified against Fastly SDK 0.11.12 API.
Add before/after measurement protocol using Chrome DevTools MCP tools: network timing capture, Lighthouse audits, performance traces, memory snapshots, and automated body hash comparison for correctness.
…ation test
- Add debug-level logging to process_chunks showing total bytes read and written per pipeline invocation
- Add brotli-to-brotli round-trip test to cover the into_inner() finalization path
- Add test proving HtmlWithPostProcessing accumulates output when post-processors are registered while the streaming path passes through
- Group std imports together (cell, io, rc) before external crates
- Document supported compression combinations on PipelineConfig
- Remove dead-weight byte counters from process_chunks hot loop
- Fix stale comment referencing removed process_through_compression
- Fix brotli finalization: use drop(encoder) instead of into_inner() to make the intent clear (CompressorWriter writes trailer on drop)
- Document reset() as no-op on HtmlRewriterAdapter (single-use)
- Add brotli round-trip test covering into_inner finalization path
- Add gzip HTML rewriter pipeline test (compressed round-trip with lol_html, not just StreamingReplacer)
- Add HtmlWithPostProcessing accumulation vs streaming behavior test
- Add Eq derive to Compression enum (all unit variants, trivially correct)
- Brotli finalization: use into_inner() instead of drop() to skip the redundant flush and make finalization explicit
- Document process_chunks flush semantics: callers must still call encoder-specific finalization after this method returns
- Warn when HtmlRewriterAdapter receives data after finalization (rewriter already consumed, data would be silently lost)
- Make HtmlWithPostProcessing::reset() a true no-op matching its doc (clearing auxiliary state without resetting the rewriter is inconsistent)
- Document extra copying overhead on the post-processor path vs streaming
- Assert output content in the reset-then-finalize test (was discarded)
- Relax the per-chunk emission test to not depend on lol_html internal buffering behavior — assert concatenated correctness + at least one intermediate chunk emitted
Non-processable 2xx responses (images, fonts, video) now stream directly to the client via PublisherResponse::PassThrough instead of buffering the entire body in memory. Content-Length is preserved since the body is unmodified.
Tests verify non-processable 2xx responses return PassThrough, non-processable errors stay Buffered, and processable content goes through Stream (not PassThrough).
Adds pass_through_preserves_body_and_content_length test that verifies io::copy produces identical output and Content-Length is preserved. Updates handle_publisher_request doc to describe all three response variants.
- Exclude 204 No Content from PassThrough (must not have body) - Remove Content-Length before streaming (stream_to_client uses chunked encoding, keeping both violates HTTP spec) - Add tests for 204 exclusion and empty-host interaction - Update doc comment and byte-level test to reflect CL removal
PassThrough reattaches the unmodified body and uses send_to_client() instead of stream_to_client() + io::copy. This preserves Content-Length (avoids chunked encoding overhead for images/fonts) and lets Fastly stream from its internal buffer without WASM memory buffering.
- Fix PassThrough doc comment operation order (set_body before finalize) - Update function doc to describe actual PassThrough flow (set_body + send_to_client, not io::copy) - Remove dead _enters_early_return variable, replace with comment
When the origin returns a processable 2xx response with an encoding the pipeline cannot decompress (e.g. `zstd` from a misbehaving origin), the buffered fallback previously still routed the body through process_response_streaming. `Compression::from_content_encoding` maps unknown values to `None`, so the rewriter would treat the compressed bytes as identity-encoded text and emit garbled output. Bypass the rewrite pipeline entirely in that case and return the origin response untouched. Adds a test asserting byte-for-byte pass-through and updates the is_supported_content_encoding doc to reflect the new behavior. Addresses PR #585 review feedback from @prk-Jr.
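A minimal sketch of such a known-list gate (the function name matches the one described in this PR; the exact signature and normalization are assumptions):

```rust
/// Returns true only for encodings the pipeline can actually decode.
/// Unknown values (e.g. "zstd") must route to the buffered, unmodified path
/// instead of being treated as identity-encoded text.
fn is_supported_content_encoding(encoding: &str) -> bool {
    matches!(
        encoding.trim().to_ascii_lowercase().as_str(),
        "" | "identity" | "gzip" | "deflate" | "br"
    )
}

fn main() {
    assert!(is_supported_content_encoding("gzip"));
    assert!(is_supported_content_encoding("")); // absent header = identity
    assert!(!is_supported_content_encoding("zstd"));
}
```

The key design point is an allow-list rather than a deny-list: any encoding the pipeline has never heard of is unsafe to rewrite, so it must fail closed.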
…eaming-pipeline-phase2

# Conflicts:
#	crates/trusted-server-core/src/publisher.rs
- Clarify PassThrough variant doc: finalize_response() and send_to_client() are applied at the outer dispatch level, not in this arm - Hoist status outside the early-return block and reuse is_success to eliminate the duplicate get_status() call - Exclude 205 Reset Content alongside 204 No Content per RFC 9110 §15.3.6; add pass_through_gate_excludes_205_reset_content test - Log binary pass-through before returning to aid production tracing
ChristianPavilonis
left a comment
Second pass review. Previous comments were addressed — has_html_post_processors() is implemented.
Previous comments resolved: Both prior inline comments are addressed — the has_html_post_processors() optimization is present in registry.rs and the streaming plan doc was updated.
New findings:
- Doc comment copy-paste in `HtmlRewriterAdapter::reset()` (commented inline). Minor, but renders incorrectly in rustdoc.
- Gate tests validate boolean expressions, not function behavior (commented inline). The `streaming_gate_*` and `pass_through_gate_*` tests reproduce the conditional logic from `handle_publisher_request` as local variables and assert on those variables. They pass even if the gate in the actual function is changed. The `unsupported_encoding_response_is_returned_unmodified` test (which uses a real `Response`) is the right model; the gate tests should follow that pattern, calling `handle_publisher_request` and asserting on the returned `PublisherResponse` variant.
- Missing WARN log when processable content is skipped due to empty request host (commented inline). The combined condition `!should_process || request_host.is_empty() || !is_success` logs at DEBUG for all three cases. An empty `request_host` with processable content is a misconfiguration that silently returns unrewritten HTML/JS to the client. Worth a separate warn branch.
No blocking issues. The core streaming architecture — chunk-emitting HtmlRewriterAdapter, unified process_chunks loop, PublisherResponse enum, and the pass-through/stream/buffered routing in main.rs — is sound. The Mutex accumulation buffers in GTM and NextJS rewriters are correctly drained on the final fragment path. The brotli encoder finalization via flush() inside process_chunks followed by into_inner() is correct. Content-Length handling is consistent across all three response paths.
ChristianPavilonis
left a comment
Findings from second pass are non-blocking — approving.
- Extract the gate decision into a pure classify_response_route so the formula lives in one place; handle_publisher_request dispatches on the returned ResponseRoute variant. Gate tests now exercise this function directly instead of re-stating its boolean expressions.
- Emit log::warn! when processable content returns unmodified because request_host is empty (misconfigured proxy Host header) so it is visible at production log levels.
- Fix copy-paste in HtmlRewriterAdapter::reset doc comment.
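A pure classifier in this spirit might look like the following sketch. The parameter list and exact conditions are assumptions distilled from this PR's routing description, not the actual source:

```rust
#[derive(Debug, PartialEq)]
enum ResponseRoute {
    PassThrough,        // 2xx non-processable: send_to_client with body intact
    Stream,             // processable, supported encoding: chunked rewrite
    BufferedProcessed,  // HTML + post-processors: full document needed
    BufferedUnmodified, // bodyless status, misconfig, or unsupported encoding
}

fn classify_response_route(
    status: u16,
    is_processable: bool,            // e.g. HTML/JS/CSS content type
    encoding_supported: bool,        // identity / gzip / deflate / br
    request_host_empty: bool,
    has_html_post_processors: bool,
    is_html: bool,
) -> ResponseRoute {
    // Statuses that prohibit a body never reach a body-emitting arm.
    if status == 204 || status == 205 {
        return ResponseRoute::BufferedUnmodified;
    }
    if !is_processable {
        // Binary pass-through only for success responses.
        return if (200..300).contains(&status) {
            ResponseRoute::PassThrough
        } else {
            ResponseRoute::BufferedUnmodified
        };
    }
    // Misconfigured Host or undecodable encoding: return origin bytes untouched.
    if request_host_empty || !encoding_supported {
        return ResponseRoute::BufferedUnmodified;
    }
    if is_html && has_html_post_processors {
        return ResponseRoute::BufferedProcessed;
    }
    // Processable non-2xx (branded 404/500) streams too.
    ResponseRoute::Stream
}
```

Because the function is pure, gate tests can enumerate `(status, flags)` tuples and assert on the returned variant, which is exactly what re-stated boolean expressions cannot guarantee.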
- Route non-2xx processable responses through the streaming pipeline so branded 404/500 HTML and error JSON still get origin URL rewriting, matching pre-streaming behavior on main.
- Exclude 204 No Content and 205 Reset Content at the top of classify_response_route so no downstream arm can emit a body for a status that prohibits one (RFC 9110 §15.3.5 / §15.3.6), regardless of Content-Type or post-processor registration.
- Stop GoogleTagManagerIntegration from claiming scripts it cannot plausibly rewrite. The rewriter now returns Keep on every fragment unless the running text contains a GTM/GA domain or ends with a proper prefix of one. Fixes a regression where GTM's Replace(unchanged_full) clobbered NextJsNextDataRewriter's Replace on fragmented __NEXT_DATA__ text nodes, because lol_html runs overlapping text handlers in order and last-replace-wins.
- Hoist flate2/brotli imports in StreamingPipeline::process to module scope to satisfy the repo's no-local-imports convention.
- Add regression tests: non-2xx Stream/BufferedProcessed routing, 204/205 exclusion for processable content, NextJs+GTM fragmented __NEXT_DATA__ round-trip, and boundary-prefix detection for GTM markers.
…ness
P1: Tighten GoogleTagManagerIntegration's boundary-safe prefix gate.
Previously any non-empty prefix of googletagmanager.com /
google-analytics.com was treated as a plausible continuation, so
fragments whose tails matched as little as "g" (common in English and
minified identifiers like "img", "slug", "config", "thing") engaged
GTM's RemoveNode + Replace path. On fragmented scripts that turned out
not to be GTM, GTM would then re-emit Replace(unchanged) on the last
fragment and clobber replacements from overlapping script rewriters
(e.g., NextJsNextDataRewriter on script#__NEXT_DATA__), because
lol_html runs overlapping text! handlers in order and the last
text.replace wins.
Require a minimum trailing prefix length of 6 bytes ("google" — the
common prefix of both markers). Below that, tails are ambiguous with
ordinary tokens and must not engage accumulation. Add a regression
test (fragmented_next_data_with_trailing_g_survives_gtm) that drives
fragmented __NEXT_DATA__ with an intermediate fragment ending in
a "g"-tail word and verifies NextJs's URL rewrite survives. Add a
dedicated unit test covering short-tail rejection and tighten the
existing full-match / boundary-prefix coverage.
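The boundary-safe gate described above can be sketched in a few lines of std-only Rust. The function name matches the tests mentioned in this PR; the marker list and minimum-tail constant are taken from the commit message, everything else is an assumption:

```rust
const MARKERS: [&str; 2] = ["googletagmanager.com", "google-analytics.com"];
// "google" — the common prefix of both markers. Shorter tails are ambiguous
// with ordinary tokens ("img", "slug", "config") and must not engage
// accumulation.
const MIN_TAIL: usize = 6;

fn might_contain_gtm_prefix(text: &str) -> bool {
    // A full marker anywhere in the running text always claims the script.
    if MARKERS.iter().any(|m| text.contains(m)) {
        return true;
    }
    // Otherwise the text must end with a *proper* prefix of a marker that is
    // at least MIN_TAIL bytes long — a plausible continuation across a
    // fragment boundary.
    MARKERS
        .iter()
        .any(|m| (MIN_TAIL..m.len()).any(|len| text.ends_with(&m[..len])))
}
```

A fragment ending in "https://www.google" engages accumulation (the marker may continue in the next fragment), while a fragment ending in "config" or "img" does not, so overlapping rewriters keep ownership of non-GTM scripts.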
P3: Refresh publisher docs to match the streaming behavior.
The module-level comment claimed process_response_streaming accepts
and returns fastly::Body, but it now writes into a generic Write and
returns (). The PublisherResponse docs and handle_publisher_request
summary also still described Stream as 2xx-only, while the current
classifier intentionally routes processable non-2xx responses through
Stream / BufferedProcessed too.
Adds six regression tests covering previously thin or uncovered paths introduced by the streaming optimization:

1. Fragmented RSC push script through the post-processor fallback (nextjs/mod.rs::small_chunk_rsc_push_survives_fragmentation_via_post_processor_fallback). Drives a large self.__next_f.push([...]) script through the full StreamingPipeline with chunk_size = 128 so NextJsRscPlaceholderRewriter returns Keep on every fragment and post_process_rsc_html_in_place_with_limit has to re-parse the accumulated HTML. Asserts all origin URLs are rewritten, no placeholder leaks, and the push-call envelope stays structurally intact (this is the path I suspect in the autoblog parse error; now we have a local harness to iterate on it).
2. Empty origin body on the streaming route (publisher.rs::stream_publisher_body_handles_empty_body). Exercises the Ok(0)-on-first-read branch of process_chunks plus the processor's is_last=true, chunk=[] terminal call.
3. Mid-stream decode error surfacing (publisher.rs::stream_publisher_body_surfaces_mid_stream_decode_error). Claims gzip encoding and feeds non-gzip bytes; asserts the error propagates out of stream_publisher_body so the adapter can drop the StreamingBody rather than emit malformed output.
4. PassThrough adapter contract (publisher.rs::publisher_response_pass_through_reattach_preserves_bytes). Walks the adapter's take-body/reattach pattern explicitly and asserts byte-for-byte identity plus Content-Length preservation, complementing the classifier-only test.
5. BufferedProcessed dispatch contract (publisher.rs::buffered_processed_sets_content_length_from_processed_body). Confirms the nextjs integration registers an HTML post-processor, that the classifier routes HTML to BufferedProcessed when one is present, and that feeding HTML through the same pipeline yields a rewritten body.
6. Document-state survival through the accumulating path (publisher.rs::document_state_placeholders_substitute_through_accumulating_path). Verifies that NextJsRscPlaceholderRewriter's IntegrationDocumentState entry written during streaming is read by NextJsHtmlPostProcessor after accumulation, so placeholders never leak to output and the origin URL in the RSC payload is rewritten.

All tests pass (848 total, +6 from this commit). Clippy and fmt clean.
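The Ok(0)-on-first-read and terminal-call contract that tests 2 and 3 exercise can be illustrated with a std-only sketch of a chunking loop. The real process_chunks interposes decompression, the HTML rewriter, and recompression between read and write; names and the closure-based processor here are simplifications, not the actual API:

```rust
use std::io::{Read, Write};

/// Reads the source in fixed-size chunks, hands each to `process`, and
/// writes whatever it emits. When read returns Ok(0), the processor gets
/// one terminal call with is_last = true and an empty chunk so it can
/// flush any accumulated state before the stream ends.
fn process_chunks<R: Read, W: Write>(
    mut reader: R,
    writer: &mut W,
    chunk_size: usize,
    mut process: impl FnMut(&[u8], bool) -> Vec<u8>,
) -> std::io::Result<()> {
    let mut buf = vec![0u8; chunk_size];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            // Empty body hits this branch on the very first iteration:
            // the processor still receives its is_last = true call.
            writer.write_all(&process(&[], true))?;
            return Ok(());
        }
        writer.write_all(&process(&buf[..n], false))?;
    }
}

fn main() -> std::io::Result<()> {
    let data: &[u8] = b"hello world";
    let mut out = Vec::new();
    process_chunks(data, &mut out, 4, |chunk, is_last| {
        let mut v = chunk.to_ascii_uppercase();
        if is_last {
            v.extend_from_slice(b"!"); // flushed only on the terminal call
        }
        v
    })?;
    assert_eq!(out, b"HELLO WORLD!".to_vec());
    Ok(())
}
```

Because errors propagate out of the loop via `?`, a mid-stream decode failure surfaces to the caller before any further bytes are written, which is what lets the adapter drop the StreamingBody instead of emitting malformed output.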
Summary
This PR lands the full streaming-response optimization for the publisher proxy, reducing peak WASM memory from ~4× response size to constant and improving TTFB/TTLB on every processable origin response. It replaces the original "buffer everything, rewrite, send" path with a chunked pipeline that streams through the content rewriters and writes directly to the client.
Delivered across four phases (plus follow-up review cycles):
Phase 1 — Chunk-emitting pipeline (streaming_processor.rs)

- `StreamingPipeline<R: Read, W: Write>` with a single `process_chunks` loop that handles uncompressed, gzip↔gzip, gzip→none, deflate↔deflate, deflate→none, brotli↔brotli, and brotli→none, including explicit per-codec finalization semantics (`finish()` for flate2, `flush()` + `into_inner()` for brotli).
- `HtmlRewriterAdapter` wraps `lol_html::HtmlRewriter` as a `StreamProcessor` that drains output after every chunk via a shared `Rc<RefCell<Vec<u8>>>` sink, so `lol_html` buffering no longer dictates allocation size.

Phase 2 — Stream to client via StreamingBody (publisher.rs, adapter main.rs)

- `handle_publisher_request` returns a `PublisherResponse` sum type (Stream / PassThrough / Buffered) instead of a single `Response`, so the adapter picks the right Fastly dispatch (`stream_to_client()` / `send_to_client()` / return-and-send).
- Entry point migrated from `#[fastly::main]` to an undecorated `main()` with `Request::from_client()` — `#[fastly::main]` auto-sends the returned response and is incompatible with streaming.
- `process_response_streaming` became generic over `W: Write`, so the same function writes into a `Vec<u8>` on the buffered route and a `StreamingBody` on the streaming route. `Content-Length` is removed on the streaming arm (chunked transfer).
- `restrict_accept_encoding` (quality-value-aware) strips unsupported encodings from `Accept-Encoding` before proxying, so the origin never responds with a codec the pipeline cannot decode.

Phase 3 — Fragment-safe script rewriters (google_tag_manager.rs, nextjs/script_rewriter.rs, nextjs/rsc_placeholders.rs)

- `lol_html` now fragments `<script>` text nodes at chunk boundaries. `NextJsNextDataRewriter` and `GoogleTagManagerIntegration` accumulate intermediate fragments into a `Mutex<String>` buffer, suppress them via `RemoveNode`, and emit the full rewritten text on `is_last_in_text_node`. The Mutex is uncontended (the pipeline is single-threaded; `lol_html::HtmlRewriter` is `!Send`) and is documented with the invariant that the buffer is empty at the start of every text node.
- `NextJsRscPlaceholderRewriter` deliberately does not accumulate — the accompanying `NextJsHtmlPostProcessor` re-parses the final document, so fragmented RSC scripts are safely deferred. Accumulating here would risk corrupting non-RSC scripts that happen to be fragmented.
- GTM registers a `script` selector that matches every inline script. To prevent GTM's final `Replace(accumulated)` from overwriting an earlier `NextJsNextDataRewriter::Replace` on the same element (`lol_html` runs overlapping `text!` handlers sequentially; last `text.replace` wins), GTM only engages its accumulation path when the running text contains a GTM/GA marker or ends with a proper prefix of one of length ≥ 6 bytes (`"google"` — the common prefix of `googletagmanager.com` and `google-analytics.com`). Shorter tails (`g`, `go`, `goo`, `goog`, `googl`, plus common English/minified tokens like `img`, `slug`, `config`, `thing`) no longer claim ownership, so overlapping rewriters are preserved.
- `HtmlWithPostProcessing` bifurcates: when no post-processors are registered, chunks pass through immediately (the streaming win); when post-processors are registered, output accumulates in an internal buffer until `is_last`, then post-processors run on the full document. Uses `IntegrationRegistry::has_html_post_processors()` (new, avoids cloning the `Arc` vec).

Phase 4 — Binary pass-through via send_to_client() (publisher.rs)

- Non-processable 2xx responses return `PublisherResponse::PassThrough { response, body }`. The adapter reattaches the body and calls `send_to_client()`, preserving `Content-Length` and avoiding chunked-encoding overhead. Fastly streams the body from its internal buffer without copying into WASM memory.

Routing classifier (classify_response_route)

- A pure function derives the `PublisherResponse` selection from `(status, content_type, content_encoding, request_host, has_post_processors)`. Four outcomes:
  - `PassThrough` — 2xx non-processable.
  - `Stream` — processable content (2xx and non-2xx — branded 404/500 HTML and error JSON still get origin URL rewriting, matching pre-streaming behavior) with supported encoding and either non-HTML or no HTML post-processors.
  - `BufferedProcessed` — HTML with post-processors registered (full document needed for injection).
  - `BufferedUnmodified` — 204/205, empty request host with processable content (misconfig warning), or unsupported `Content-Encoding`.

Unsupported encodings and accept-encoding restriction

- `is_supported_content_encoding` (known-list gate on `""`/`identity`/`gzip`/`deflate`/`br`) routes unsupported codecs (e.g. `zstd`) to `BufferedUnmodified` rather than feeding encoded bytes to the identity decoder.

Files touched
Test plan
Shipped in this PR (all green under `cargo test --workspace`, `cargo clippy --workspace --all-targets --all-features -- -D warnings`, `cargo fmt --all -- --check`):

- `lol_html_fragments_text_across_chunk_boundaries` verifies `lol_html` splits text at chunk boundaries (the assumption all fragment-safe rewriters rely on).
- `HtmlRewriterAdapter`: streams per chunk, handles large input, tolerates `reset()` as a no-op, and emits output before `is_last`.
- Classifier tests cover every `(status, content-type, encoding, host, post-processors)` combination, including 204/205 exclusion for processable content, non-2xx processable streaming, and unsupported-encoding fallback.
- `NextJsNextDataRewriter` accumulation, no-URL passthrough, and GTM's prefix gate (`might_contain_gtm_prefix_detects_full_match_and_boundary_prefix`, `might_contain_gtm_prefix_rejects_short_ambiguous_tails`). End-to-end tests with `chunk_size=32` (`small_chunk_next_data_rewrite_survives_fragmentation`, `small_chunk_gtm_rewrite_survives_fragmentation`) force fragmentation through the full HTML pipeline.
- `fragmented_next_data_survives_with_gtm_enabled` and `fragmented_next_data_with_trailing_g_survives_gtm` (the latter exercises the short-g-tail overlap case from review P1) confirm NextJs rewrites are not clobbered by GTM on the same fragmented element.
- `pass_through_preserves_body_and_content_length` proving byte-for-byte identity.
- `stream_publisher_body` end-to-end: gzip CSS round-trip with origin rewriting.
- Accept-Encoding restriction: unsupported codec (`zstd`), identity-only, quality-value handling (`br, gzip;q=0`).

Coverage gaps from the initial audit now closed (commit ebbfae7):
- `small_chunk_rsc_push_survives_fragmentation_via_post_processor_fallback` — fragmented `__next_f.push` through the post-processor fallback (also the path suspected in the autoblog parse error).
- `stream_publisher_body_handles_empty_body` — empty origin body on the streaming route.
- `stream_publisher_body_surfaces_mid_stream_decode_error` — mid-stream decode error must return Err so the adapter can drop the `StreamingBody` instead of emitting malformed output.
- `publisher_response_pass_through_reattach_preserves_bytes` — walks the adapter's take-body/reattach pattern explicitly.
- `buffered_processed_sets_content_length_from_processed_body` — asserts the classifier → pipeline composition for HTML + post-processor.
- `document_state_placeholders_substitute_through_accumulating_path` — `IntegrationDocumentState` written during streaming is read and consumed by the post-processor.

Test totals: 848 workspace tests, all green. Clippy + fmt clean.
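The quality-value-aware Accept-Encoding restriction exercised by the tests above could look like the following std-only sketch. The function name comes from this PR's summary; the parsing details (keeping the q parameter intact, dropping explicit q=0 entries) are assumptions about one reasonable implementation:

```rust
/// Strips codecs the pipeline cannot decode from an Accept-Encoding header,
/// so the origin never responds with an encoding the rewriter would garble.
/// Entries with an explicit q=0 are "not acceptable" per RFC 9110 and are
/// dropped as well.
fn restrict_accept_encoding(header: &str) -> String {
    const SUPPORTED: [&str; 4] = ["identity", "gzip", "deflate", "br"];
    header
        .split(',')
        .map(str::trim)
        .filter(|entry| {
            let mut parts = entry.split(';').map(str::trim);
            let codec = parts.next().unwrap_or("");
            // Treat q=0 (in its common spellings) as an exclusion.
            let q_zero = parts.any(|p| matches!(p, "q=0" | "q=0.0" | "q=0.00" | "q=0.000"));
            SUPPORTED.contains(&codec) && !q_zero
        })
        .collect::<Vec<_>>()
        .join(", ")
}

fn main() {
    assert_eq!(restrict_accept_encoding("zstd, gzip;q=0.8, br"), "gzip;q=0.8, br");
    assert_eq!(restrict_accept_encoding("br, gzip;q=0"), "br");
    assert_eq!(restrict_accept_encoding("zstd"), "");
}
```

Rewriting the request header (rather than rejecting unexpected response encodings after the fact) moves the negotiation to where it belongs: the origin simply never sees a codec the proxy cannot handle.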
Rollout
- An empty request `Host` falls back to the unchanged buffered path. Only pages that meet all streaming preconditions take the new code path.
- No changes to `IntegrationRegistration`, routes, or settings; `IntegrationRegistry::has_html_post_processors()` is a new additive method.