Following some suggestions on my previous posts, I decided to reconstruct the z-buffer from my linear depth buffer and optimize the lighting pass.
To achieve this, I took the following steps:
- Changed the light-accumulation render target: it now also has a depth/stencil surface (DiscardContents);
- Right after binding it, and before the light rendering, I draw a full-screen quad with a shader that outputs the z-buffer, using our linear depth buffer as input. I know it’s not precise, since we lose a lot of information close to the near plane, but this fake z-buffer is only used in the lighting stage, and with coarse light volumes. (I have some artifacts when the geometry and lights are close to the far plane; maybe I can fix them using some bias);
- Instead of drawing screen-aligned quads, I’m now using a convex mesh that fits the light volume (just a sphere, scaled by the light’s radius). I could switch between front-face and back-face culling, depending on whether the light volume touches the camera’s near plane, as seen here, but I left this for next time. I’ve inverted the winding order of my light mesh, so I don’t need to change the culling state, and the depth compare function is set to GreaterEqual;
- For each light, compute the appropriate WorldViewProjection matrix (using the scale and position of each light), set the light properties as usual and render. I’m using this technique to recompute the pixel view-space position.
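To make the z-reconstruction step more concrete, here is a minimal sketch of the math such a full-screen pass could apply per pixel, written in Python rather than shader code. It assumes the linear depth buffer stores the view-space distance divided by the far-plane distance, and that the z-buffer uses a Direct3D-style perspective mapping to [0, 1]. The function names and the near/far values are hypothetical and are not taken from my source; the real shader may differ.

```python
def linear_to_device_depth(linear_depth, near, far):
    """Map linear depth (assumed: view distance / far) to a [0, 1] z-buffer value.

    Assumes a Direct3D-style perspective projection, where the near plane
    maps to 0 and the far plane maps to 1.
    """
    view_z = linear_depth * far
    # Clamp to the near plane so pixels in front of it do not go negative.
    view_z = max(view_z, near)
    return far * (view_z - near) / ((far - near) * view_z)


def device_to_linear_depth(device_depth, near, far):
    """Inverse mapping: [0, 1] z-buffer value back to linear depth."""
    view_z = (near * far) / (far - device_depth * (far - near))
    return view_z / far


if __name__ == "__main__":
    near, far = 1.0, 1000.0  # hypothetical clip planes
    for linear in (0.001, 0.01, 0.1, 0.5, 1.0):
        device = linear_to_device_depth(linear, near, far)
        print(f"linear {linear:.3f} -> z-buffer {device:.6f}")
```

Running it shows how unevenly the z-buffer range is spent: most of it is used up very close to the near plane, which is why a z-buffer rebuilt from a uniformly quantized linear depth is coarse there, and why values near the far plane end up so close together that a small bias might help.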
Here is a comparison of the area being affected by the lights:
In a test with 500 lights (341 visible, at the exact camera startup position in my project), the screen-aligned technique takes approximately draw: 3 ms and gpu: 28 ms. When I change to the mesh-based technique, those values drop to approximately draw: 1.7 ms and gpu: 16.7 ms. The draw time decreases because we don’t need to compute the screen-aligned quads anymore. Note that I don’t know if these measurements are 100% accurate; I’m using the technique described here. My CPU is an i5-430 and my GPU is an HD5650.
It proved to be a great step toward better performance, even with the z-reconstruct pass. I would like to see some results, criticism and suggestions.
By the way, here is the full source.
See you next time!