perf ticket 008: visibility buffer (Nanite-style) replaces 4-MRT G-buffer #27

@proggeramlug

Description

Deferred perf ticket — see docs/perf/008-visibility-buffer.md.

Summary

Replace Bloom's current 4-MRT G-buffer (18 bytes/pixel written per fragment) with a Nanite-style visibility buffer: store only (triangle_id, u, v, mesh_id) at ~8 bytes/pixel, and defer full PBR shading to a second pass that fetches vertex data from shared storage buffers. Expected gain: ≥ 50% fragment-bandwidth reduction, plus "every visible pixel shades exactly once" when combined with a depth prepass.
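To make the ~8 bytes/pixel figure concrete, here is a minimal CPU-side sketch of one way the four fields could fit into two 32-bit words. The bit split (12 bits of mesh_id, 20 bits of triangle_id, 16-bit unorm barycentrics) and all names (`VisSample`, `pack`, `unpack`, `quantize`) are assumptions for illustration; the layout in docs/perf/008-visibility-buffer.md may differ.

```rust
/// Hypothetical visibility sample; field widths are assumptions, not Bloom's layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VisSample {
    pub mesh_id: u32,     // assumed to fit in 12 bits
    pub triangle_id: u32, // assumed to fit in 20 bits
    pub u: u16,           // barycentric u as a 16-bit unorm
    pub v: u16,           // barycentric v as a 16-bit unorm
}

const TRI_BITS: u32 = 20;
const TRI_MASK: u32 = (1 << TRI_BITS) - 1;

/// Map a barycentric coordinate in [0, 1] to a 16-bit unorm.
pub fn quantize(x: f32) -> u16 {
    (x.clamp(0.0, 1.0) * 65535.0).round() as u16
}

/// Inverse of `quantize`, up to quantization error.
pub fn dequantize(q: u16) -> f32 {
    q as f32 / 65535.0
}

/// Pack a sample into two u32 words (8 bytes per pixel).
pub fn pack(s: VisSample) -> [u32; 2] {
    debug_assert!(s.triangle_id <= TRI_MASK);
    debug_assert!(s.mesh_id < (1 << (32 - TRI_BITS)));
    [
        (s.mesh_id << TRI_BITS) | (s.triangle_id & TRI_MASK),
        ((s.u as u32) << 16) | s.v as u32,
    ]
}

/// Recover the sample from its two packed words.
pub fn unpack(w: [u32; 2]) -> VisSample {
    VisSample {
        mesh_id: w[0] >> TRI_BITS,
        triangle_id: w[0] & TRI_MASK,
        u: (w[1] >> 16) as u16,
        v: (w[1] & 0xFFFF) as u16,
    }
}
```

The third barycentric weight is not stored; it can be recovered as 1 - u - v in the shading pass.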

Why deferred

The GPU bandwidth win is real (~14 MB/frame saved at 1600×900, multiplied by the overdraw factor, on a benchmark that currently writes 26 MB/pass), but it is invisible behind the vsync cap on Sponza. The main perf target (60 fps at full visual quality) is already met, so any further bandwidth reduction only adds headroom we can't measure on the current benchmark machine.
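The figures above follow directly from the resolution and the per-pixel sizes quoted in the summary. A small sketch of the arithmetic, assuming an overdraw factor of 1 and decimal megabytes (the helper names are illustrative only):

```rust
/// Benchmark resolution quoted in the ticket.
pub const WIDTH: u64 = 1600;
pub const HEIGHT: u64 = 900;

/// Bytes written by one full-screen pass at the given per-pixel cost,
/// assuming every pixel is written exactly once (overdraw factor 1).
pub fn bytes_per_pass(bytes_per_pixel: u64) -> u64 {
    WIDTH * HEIGHT * bytes_per_pixel
}

/// Convert bytes to decimal megabytes.
pub fn to_mb(bytes: u64) -> f64 {
    bytes as f64 / 1.0e6
}

/// Fractional saving when going from `old_bpp` to `new_bpp` bytes per pixel.
pub fn reduction(old_bpp: u64, new_bpp: u64) -> f64 {
    1.0 - new_bpp as f64 / old_bpp as f64
}
```

With 18 bytes/pixel the pass writes 25.92 MB (the "26 MB/pass" above), with 8 bytes/pixel it writes 11.52 MB, so the saving is 14.4 MB per pass, about a 56% reduction. Real overdraw scales all three numbers by the same factor.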

Reopen criteria

  • A target scene pushes past the 16.7 ms vsync ceiling on the benchmark machine. This ticket is the remaining GPU-side lever for bandwidth-bound scenes.
  • Integrated / mobile GPUs become a priority. Bandwidth matters disproportionately on tile-based and integrated hardware; this ticket is the single biggest available reduction.
  • Overdraw-heavy scenes (foliage, hair, transparent-dense particles) become the target.

Prerequisites

  • Ticket 009 (unified vertex + index buffers + per-mesh descriptor buffer) is a hard prerequisite — the shading pass needs a single bindless-style fetch across all meshes.
  • Ticket 005 (depth prepass) becomes useful again at that point; land it alongside this ticket.
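The reason unified buffers are a hard prerequisite can be sketched on the CPU: given only (mesh_id, triangle_id), the shading pass must be able to resolve three vertex indices without rebinding anything per mesh. The descriptor fields and function below are hypothetical and are not taken from Ticket 009.

```rust
/// Hypothetical per-mesh descriptor into shared index and vertex buffers.
#[derive(Debug, Clone, Copy)]
pub struct MeshDescriptor {
    pub first_index: u32, // offset of this mesh in the shared index buffer
    pub index_count: u32, // number of indices owned by this mesh
    pub base_vertex: u32, // offset added to each index to reach the shared vertex buffer
}

/// Resolve the three global vertex indices of a triangle, or None if
/// either ID is out of range for the buffers provided.
pub fn triangle_vertices(
    meshes: &[MeshDescriptor],
    indices: &[u32],
    mesh_id: u32,
    triangle_id: u32,
) -> Option<[u32; 3]> {
    let mesh = meshes.get(mesh_id as usize)?;
    let local = 3 * triangle_id as usize;
    // Reject triangles that would read past this mesh's own index range.
    if local + 3 > mesh.index_count as usize {
        return None;
    }
    let start = mesh.first_index as usize + local;
    let tri = indices.get(start..start + 3)?;
    Some([
        mesh.base_vertex + tri[0],
        mesh.base_vertex + tri[1],
        mesh.base_vertex + tri[2],
    ])
}
```

In a shader the same lookup would read from storage buffers instead of slices; the point of the sketch is only that one descriptor table plus one index buffer is enough to serve every mesh.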

Effort

2+ weeks for the baseline redesign: main_hdr_pass outputs only an Rgba32Uint target (tri_id, u, v, mesh_id); a new shading pass evaluates PBR from storage-buffer vertex fetches; and downstream MRT consumers (SSR / SSGI / SSAO / post-FX) are rewired to read from the rebuilt material channels.
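Once the shading pass has fetched the three vertices of a triangle, it rebuilds each interpolated attribute from the stored barycentrics. A minimal sketch, assuming u and v weight the second and third vertex, the first vertex gets 1 - u - v, and the stored values are already perspective-correct (all of these are assumptions, not confirmed by the ticket):

```rust
/// Interpolate a 3-component vertex attribute (position, normal, ...)
/// across a triangle using barycentric weights.
/// Assumed convention: `a` is weighted by 1 - u - v, `b` by u, `c` by v.
pub fn interpolate(a: [f32; 3], b: [f32; 3], c: [f32; 3], u: f32, v: f32) -> [f32; 3] {
    let w = 1.0 - u - v;
    [
        w * a[0] + u * b[0] + v * c[0],
        w * a[1] + u * b[1] + v * c[1],
        w * a[2] + u * b[2] + v * c[2],
    ]
}
```

Interpolated normals generally need renormalizing before lighting, which this sketch leaves to the caller.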

Quick-win intermediate (still deferred, ~2 days)

The ticket also documents a simpler intermediate step: drop unused MRTs when the dependent post-FX is disabled (velocity_rt only needed with TAA / motion blur; albedo_rt only needed with SSGI / SSR; material_rt only needed for SSR). That is a 30–50% MRT bandwidth cut specifically for low-quality modes on integrated hardware, worth doing when targeting those adapters.
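The dependency rules in the parenthesis above reduce to three boolean expressions. A sketch of that selection logic, with hypothetical `PostFx` and `OptionalTargets` types standing in for whatever settings structure Bloom actually uses:

```rust
/// Hypothetical post-FX toggles; names are illustrative only.
#[derive(Debug, Clone, Copy, Default)]
pub struct PostFx {
    pub taa: bool,
    pub motion_blur: bool,
    pub ssgi: bool,
    pub ssr: bool,
}

/// Which of the optional render targets must be written this frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OptionalTargets {
    pub velocity_rt: bool,
    pub albedo_rt: bool,
    pub material_rt: bool,
}

/// Apply the rules stated in the ticket: velocity for TAA / motion blur,
/// albedo for SSGI / SSR, material for SSR only.
pub fn required_targets(fx: PostFx) -> OptionalTargets {
    OptionalTargets {
        velocity_rt: fx.taa || fx.motion_blur,
        albedo_rt: fx.ssgi || fx.ssr,
        material_rt: fx.ssr,
    }
}
```

With every effect disabled, all three targets can be dropped; enabling SSR alone brings back albedo_rt and material_rt but still skips velocity_rt.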
