Linear frustum cull: one pass over all splats, testing each
centre against the camera frustum in camera space. No BVH or
per-splat AABB precomputation.
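A minimal sketch of such a centre-only cull, not the renderer's actual code. The `Frustum` shape, the name `frustumCull`, the flat `[x, y, z, ...]` layout, and the convention that the camera looks down +z are all assumptions made for illustration.

```typescript
// Hypothetical frustum description; the real renderer may store planes instead.
interface Frustum {
  near: number;        // near plane distance (> 0)
  far: number;         // far plane distance
  tanHalfFovX: number; // tan(horizontal FOV / 2)
  tanHalfFovY: number; // tan(vertical FOV / 2)
}

// centres: flat [x0, y0, z0, x1, y1, z1, ...], already in camera space.
// Assumption: the camera looks down +z, so visible points have z > 0.
function frustumCull(centres: Float32Array, f: Frustum): number[] {
  const visible: number[] = [];
  const count = Math.floor(centres.length / 3);
  for (let i = 0; i < count; i++) {
    const x = centres[3 * i];
    const y = centres[3 * i + 1];
    const z = centres[3 * i + 2];
    if (z < f.near || z > f.far) continue;          // outside the depth range
    if (Math.abs(x) > z * f.tanHalfFovX) continue;  // outside left/right planes
    if (Math.abs(y) > z * f.tanHalfFovY) continue;  // outside top/bottom planes
    visible.push(i);
  }
  return visible;
}
```

Because only the centre is tested, a splat whose centre lies just outside the frustum is dropped even if its footprint would overlap the image; that is the trade-off of skipping per-splat bounds.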
Sort visible splats globally by camera-space z (front-to-back).
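The global sort can be sketched as an index sort over the visible set. This is an illustrative sketch only: `sortFrontToBack` and the flat centre layout are hypothetical, and it assumes that smaller camera-space z means closer to the camera.

```typescript
// visible: indices of splats that survived the cull.
// centres: flat [x0, y0, z0, ...] in camera space (assumed layout).
// Returns the visible indices ordered by ascending z (front-to-back
// under the assumption that +z points away from the camera).
function sortFrontToBack(visible: number[], centres: Float32Array): Uint32Array {
  const order = Uint32Array.from(visible);
  order.sort((a, b) => centres[3 * a + 2] - centres[3 * b + 2]);
  return order;
}
```

Sorting indices rather than the splat data keeps the splat buffers untouched; later stages only need the ordering.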
Split the image into a grid of sub-frames (each ≤ MAX_SUB_FRAME_TILES
tiles in either dimension) and render each independently. The rasterizer
retains its per-sub-frame running state; the project shader's
group-AABB cull skips splats outside the current sub-frame. The
CPU depth sort is global, so depth ordering is consistent across
sub-frames and there are no boundary seams.
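As a rough illustration of the sub-frame split, the sketch below partitions an image measured in tiles into rectangles no larger than a given limit per dimension. The `SubFrame` shape and `splitIntoSubFrames` are hypothetical names, and the limit is passed as a parameter rather than read from `MAX_SUB_FRAME_TILES`.

```typescript
// Hypothetical sub-frame descriptor, measured in tiles.
interface SubFrame {
  tileX: number;     // first tile column of the sub-frame
  tileY: number;     // first tile row of the sub-frame
  tilesWide: number; // width in tiles, at most maxTiles
  tilesHigh: number; // height in tiles, at most maxTiles
}

function splitIntoSubFrames(tilesX: number, tilesY: number, maxTiles: number): SubFrame[] {
  const out: SubFrame[] = [];
  for (let y = 0; y < tilesY; y += maxTiles) {
    for (let x = 0; x < tilesX; x += maxTiles) {
      out.push({
        tileX: x,
        tileY: y,
        // Edge sub-frames are clipped to the image bounds.
        tilesWide: Math.min(maxTiles, tilesX - x),
        tilesHigh: Math.min(maxTiles, tilesY - y),
      });
    }
  }
  return out;
}
```

For example, a 10 × 4 tile image with a limit of 4 yields three sub-frames of widths 4, 4, and 2. Since every sub-frame consumes the same globally sorted splat order, adjacent sub-frames blend overlapping splats in the same sequence.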
Per-chunk (GPU tile-bin pipeline, fully GPU-resident): project (writes
per-splat tile coverage) → prefix-sum (writes emitOffsets + totalPairs)
→ emit-pairs → prepare-indirect → radix sortIndirect (key + value:
tile keys sorted, splat indices reordered) → init-tile-offsets →
find-boundaries (atomicMin) → rasterize each tile's slice in depth
order. No per-chunk CPU readbacks.
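The data flow of the tile-bin stages can be mirrored on the CPU for reference. The sketch below is not the GPU implementation: `TileRect` and `binSplats` are hypothetical, the sort is a plain stable comparison sort standing in for the radix sort, and it assumes that splats arrive already depth-sorted so that a stable sort by tile key preserves depth order within each tile. Empty tiles are marked with `totalPairs` as a sentinel, which is also an assumption.

```typescript
// Tile rectangle covered by one splat: [x0, x1) by [y0, y1), in tile units.
interface TileRect { x0: number; y0: number; x1: number; y1: number; }

function binSplats(rects: TileRect[], tilesX: number, tilesY: number) {
  // project: number of tiles each splat covers.
  const counts = rects.map(r => (r.x1 - r.x0) * (r.y1 - r.y0));

  // prefix-sum: exclusive scan gives each splat's write offset.
  const emitOffsets = new Uint32Array(rects.length);
  let totalPairs = 0;
  for (let i = 0; i < counts.length; i++) {
    emitOffsets[i] = totalPairs;
    totalPairs += counts[i];
  }

  // emit-pairs: one (tile key, splat index) pair per covered tile.
  const keys = new Uint32Array(totalPairs);
  const values = new Uint32Array(totalPairs);
  rects.forEach((r, i) => {
    let o = emitOffsets[i];
    for (let ty = r.y0; ty < r.y1; ty++) {
      for (let tx = r.x0; tx < r.x1; tx++) {
        keys[o] = ty * tilesX + tx;
        values[o] = i;
        o++;
      }
    }
  });

  // sort: order pairs by tile key; ties keep emission (depth) order.
  const perm = Array.from({ length: totalPairs }, (_, i) => i);
  perm.sort((a, b) => keys[a] - keys[b] || a - b);
  const sortedKeys = Uint32Array.from(perm, p => keys[p]);
  const sortedValues = Uint32Array.from(perm, p => values[p]);

  // init-tile-offsets + find-boundaries: first pair index per tile,
  // computed with min() in place of the GPU's atomicMin.
  const tileStart = new Uint32Array(tilesX * tilesY).fill(totalPairs);
  for (let i = 0; i < totalPairs; i++) {
    const k = sortedKeys[i];
    tileStart[k] = Math.min(tileStart[k], i);
  }
  return { emitOffsets, totalPairs, sortedKeys, sortedValues, tileStart };
}
```

A rasterizer would then walk `sortedValues` from `tileStart[t]` until the key changes, blending that tile's splats in order.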
Render a splat scene to an RGBA byte buffer.
Whole-image pipeline: