perf ticket 009: GPU-driven rendering — indirect multi-draw + GPU cull #28

@proggeramlug

Deferred perf ticket — see docs/perf/009-gpu-driven-rendering.md.

Summary

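Replace the scene graph's per-mesh CPU draw loop (one set_bind_group + draw_indexed per mesh, ~340 calls/frame on Sponza across the shadow, main, and depth-prepass passes) with a single draw_indexed_indirect_count call per pass (in wgpu: RenderPass::multi_draw_indexed_indirect_count, which requires Features::MULTI_DRAW_INDIRECT_COUNT). A GPU-side frustum-cull compute pass feeds it by writing the surviving draws' arguments and the draw count. The CPU then issues one draw per render pass, regardless of mesh count.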
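To make the data flow concrete, here is a CPU reference of what the cull pass would compute. It tests each mesh's bounding sphere against the six frustum planes, appends surviving draws to a compacted argument buffer, and returns the count that the indirect-count draw reads. This is a sketch under assumptions, not the planned shader. MeshDesc, Plane, and cull_and_compact are hypothetical names. The args struct mirrors the standard 20-byte indexed-indirect layout. first_instance is repurposed as a draw ID, which in wgpu needs Features::INDIRECT_FIRST_INSTANCE. On the GPU, one invocation per mesh would replace the loop and an atomic counter would replace out.push.

```rust
// CPU reference for a GPU frustum-cull + compaction pass (hypothetical sketch).

/// Mirrors the 20-byte layout consumed by indexed indirect draws:
/// index_count, instance_count, first_index, base_vertex, first_instance.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DrawIndexedIndirectArgs {
    pub index_count: u32,
    pub instance_count: u32,
    pub first_index: u32,
    pub base_vertex: i32,
    pub first_instance: u32,
}

/// Frustum plane with a unit-length normal: dot(n, p) + d >= 0 means "inside".
#[derive(Clone, Copy)]
pub struct Plane {
    pub n: [f32; 3],
    pub d: f32,
}

/// Hypothetical per-mesh descriptor: world-space bounding sphere + draw args.
#[derive(Clone, Copy)]
pub struct MeshDesc {
    pub center: [f32; 3],
    pub radius: f32,
    pub args: DrawIndexedIndirectArgs,
}

/// A sphere is culled only if it lies entirely behind at least one plane.
fn sphere_visible(planes: &[Plane; 6], c: [f32; 3], r: f32) -> bool {
    planes
        .iter()
        .all(|p| p.n[0] * c[0] + p.n[1] * c[1] + p.n[2] * c[2] + p.d >= -r)
}

/// Writes surviving draws into `out` (compacted, no holes) and returns the
/// draw count that the count-buffer variant of the indirect draw would read.
pub fn cull_and_compact(
    planes: &[Plane; 6],
    meshes: &[MeshDesc],
    out: &mut Vec<DrawIndexedIndirectArgs>,
) -> u32 {
    out.clear();
    for (i, m) in meshes.iter().enumerate() {
        if sphere_visible(planes, m.center, m.radius) {
            let mut a = m.args;
            // Assumption: first_instance carries the mesh index so the vertex
            // shader can fetch this mesh's descriptor from a storage buffer.
            a.first_instance = i as u32;
            out.push(a);
        }
    }
    out.len() as u32
}
```

Because culled draws are compacted rather than left in place with instance_count = 0, the indirect draw only walks surviving entries, while max_count can stay at the total mesh count.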

Why deferred

Pure CPU-side optimization on a GPU-bound benchmark. The perf README's own rule of thumb: "Sponza is GPU-bound, not CPU-bound. Don't chase CPU micro-optimizations expecting FPS improvement." After the wins from landed tickets 001-017 (uniform pool, frustum cull, matrix-inverse cache, shadow cascade cache), render-total CPU is already ~4 ms against a 16.7 ms vsync budget. Shaving another ~600 µs won't move FPS on Sponza; we'd be optimizing a resource we already have in surplus.
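The reasoning rests on frame time following whichever side, CPU or GPU, is slower. The sketch below is a back-of-the-envelope model with hypothetical function names. It assumes CPU recording and GPU execution fully overlap, which is a simplification, since real pipelining, vsync waits, and driver queuing are messier.

```rust
/// Frame budget in milliseconds at a given refresh rate (60 Hz -> ~16.7 ms).
pub fn vsync_budget_ms(refresh_hz: f64) -> f64 {
    1000.0 / refresh_hz
}

/// Simplified model (assumption): CPU and GPU work overlap across frames,
/// so steady-state frame time is bounded by the slower of the two.
pub fn frame_time_ms(cpu_ms: f64, gpu_ms: f64) -> f64 {
    cpu_ms.max(gpu_ms)
}

/// Fraction of the vsync budget the CPU side still leaves unused.
pub fn cpu_headroom(cpu_ms: f64, refresh_hz: f64) -> f64 {
    1.0 - cpu_ms / vsync_budget_ms(refresh_hz)
}
```

Under this model, for any GPU time above ~4 ms, cutting render-total CPU from ~4 ms to ~3.4 ms leaves frame_time_ms unchanged. At ~4 ms, the CPU already uses only about 24% of the 60 Hz budget.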

Reopen criteria

  • A CPU-bound scene arrives: a mesh count of 10 000+, many small static props, or CPU-expensive per-frame state updates that push render_total CPU past the vsync budget.
  • Ticket 008 (visibility buffer) reopens. 008's shading pass needs a shared vertex/index buffer + per-mesh descriptor buffer, which is exactly what this ticket builds, so this ticket is a hard prerequisite for 008.
  • Bindless texture support lands in wgpu. The current "one set_bind_group per draw" pattern is partly about per-material texture binds. Bindless makes indirect multi-draw a straightforward win without the material-binding workarounds the ticket's notes describe.
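The shared vertex/index buffer plus per-mesh descriptor mentioned above can be sketched as a packing step. Every mesh's vertices and indices are appended to one pair of buffers, and each mesh records where its slice starts. This is a hypothetical CPU-side illustration (MeshData, MeshRange, SharedGeometry, and pack are made-up names). It keeps indices mesh-local and relies on base_vertex to rebase them at draw time, which is one common choice. Rewriting the indices during packing would be the alternative.

```rust
/// One mesh's geometry before packing (hypothetical input type).
pub struct MeshData {
    pub vertices: Vec<[f32; 3]>,
    pub indices: Vec<u32>,
}

/// Per-mesh entry in the descriptor buffer: where this mesh lives inside
/// the shared vertex/index buffers.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MeshRange {
    pub first_index: u32,
    pub index_count: u32,
    pub base_vertex: i32,
}

pub struct SharedGeometry {
    pub vertices: Vec<[f32; 3]>,
    pub indices: Vec<u32>,
    pub ranges: Vec<MeshRange>,
}

/// Concatenates all meshes into one vertex buffer and one index buffer.
pub fn pack(meshes: &[MeshData]) -> SharedGeometry {
    let mut g = SharedGeometry {
        vertices: Vec::new(),
        indices: Vec::new(),
        ranges: Vec::with_capacity(meshes.len()),
    };
    for m in meshes {
        let range = MeshRange {
            first_index: g.indices.len() as u32,
            index_count: m.indices.len() as u32,
            base_vertex: g.vertices.len() as i32,
        };
        // Indices stay mesh-local; base_vertex rebases them when drawn.
        g.vertices.extend_from_slice(&m.vertices);
        g.indices.extend_from_slice(&m.indices);
        g.ranges.push(range);
    }
    g
}
```

Each MeshRange maps directly onto the first_index, index_count, and base_vertex fields of an indexed indirect draw, so the cull pass can copy them straight into its output.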

Effort

~1 week for the baseline draw_indexed_indirect_count path with GPU frustum cull. Material indirection still requires either bindless (not widely supported in wgpu 29) or a texture-array trick. That's where the risk sits, and why it's scoped at "week" rather than "days."
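One possible shape of the texture-array trick: every layer of a texture array must share the same size and format, so material textures are bucketed by (width, height, format). Each bucket becomes one array that is bound once, and each mesh descriptor stores an (array, layer) pair that the fragment shader uses to sample. This is a hypothetical sketch (Format, TextureDesc, TextureSlot, and assign_layers are illustrative names). It ignores mip counts and per-array layer limits such as wgpu's max_texture_array_layers, both of which a real version would need to handle, for example by resampling odd-sized textures into a common size.

```rust
use std::collections::HashMap;

/// Hypothetical subset of texture formats, enough to show bucketing.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Format {
    Rgba8Srgb,
    Rgba8Unorm,
}

/// Hypothetical description of one material's texture.
pub struct TextureDesc {
    pub width: u32,
    pub height: u32,
    pub format: Format,
}

/// Where a material's texture lands: which array, and which layer in it.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TextureSlot {
    pub array: u32,
    pub layer: u32,
}

/// Assigns each texture to an array keyed by (width, height, format).
/// Returns one slot per input texture plus the number of arrays needed.
pub fn assign_layers(textures: &[TextureDesc]) -> (Vec<TextureSlot>, usize) {
    // key -> (array id, next free layer)
    let mut arrays: HashMap<(u32, u32, Format), (u32, u32)> = HashMap::new();
    let mut slots = Vec::with_capacity(textures.len());
    for t in textures {
        let next_id = arrays.len() as u32;
        let entry = arrays
            .entry((t.width, t.height, t.format))
            .or_insert((next_id, 0));
        slots.push(TextureSlot { array: entry.0, layer: entry.1 });
        entry.1 += 1;
    }
    (slots, arrays.len())
}
```

The fewer distinct buckets the scene produces, the fewer arrays must be bound at once; with many buckets the trick degrades back toward per-material binds, which is where the schedule risk comes from.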

Files

  • native/shared/src/renderer/mod.rs: shared VB/IB, descriptor buffer, GPU cull compute shader, render pass using draw_indexed_indirect_count.
  • native/shared/src/scene.rs: rework of per-node GPU resources.
