The debate over the pros and cons of different real-time lighting techniques has been running for years. Forward rendering, deferred shading and light pre-pass are among the best-known techniques today. Their definitions and variations can be found with a simple internet search, complete with all the mathematics, notation and formulas you could want, so I will not focus on that here.
I also won’t champion one technique over another, because the best technique is the one that best fits your needs (in my case, that was light pre-pass, or LPP for short).
Instead, I will show you a basic XNA 4.0 implementation of the LPP algorithm, first described by Wolfgang Engel (I don’t know if his was the first ever, but it was the first one I found), with full source code (see the “Source Code” topic) and a simple FX file to help you export your models into the LPP pipeline. I will also comment on the road I traveled to get it done; maybe you will find yourself stuck on some of the same bumps.
I assume you already know these techniques, since this is not a tutorial on how LPP works but on how I implemented it. Some good resources are listed in the reference section.
NOTE: if you want the latest source for this series, always visit the latest entry!
In the Doom 3® era (2003-2004) I was an avid C++ / OpenGL game programmer, and I implemented a forward renderer with per-pixel lights and stencil shadows. After that, I moved on to some Java + OpenGL, then to Unity and some third-party engines. My last roles were more tech-artist than graphics programmer, so I started doing some XNA research at home for fun and learning.
A few months ago I decided to implement a deferred approach, and at first I chose the classic deferred-shading algorithm. My goals were:
- to have a prototype running as fast as possible. If I had wanted every feature in place before hitting F5 (build and run), I probably would not be writing this. Always start small and add functionality as needed, instead of creating a huge monster that can’t even walk;
- lots of per-pixel lights, some of them casting shadows;
- some level of flexibility on light shading, like specular maps, rim and half-Lambert;
- reflexive, emissive and Image-Based-Lighted materials;
- HDR, DOF and all kind of post-processing effects;
- get it running at good frame rates on the Xbox 360.
After I had the first item done, I switched to LPP, to try to circumvent some problems I faced.
Both deferred shading and LPP rely on a GBuffer: a group of render targets that stores screen-space information about the scene, such as depth, normals, albedo, specular power, motion vectors and any other parameter needed to reconstruct the lighting using only that buffer and the light information. This way, we don’t need to redraw all the geometry for each light that intersects the scene: everything we need is already stored in the GBuffer. All we have to do is draw some geometry that encapsulates the light volume (a bounding sphere, a full-screen quad or something similar), use the screen-space pixel position to fetch the GBuffer data, and do the lighting. The LPP technique needs only a depth buffer and a normal/specular power buffer, but it requires two geometry passes for the final shading. On the other hand, classic deferred shading requires more buffers – at least an albedo buffer – but can be done with a single geometry pass.
After researching about Xbox graphics programming and XNA, I assumed the following premises:
- My GBuffer should fit into the Xbox 360’s 10 MB of eDRAM, to avoid predicated tiling. That means the more information we add to the GBuffer, the smaller its resolution must be;
- I also wanted to avoid the PreserveContents flag on my render targets, to avoid extra buffer copies (read a post from Shawn Hargreaves here);
In my deferred shading approach, to get ambient lighting with emissive maps, environment reflection and other techniques in a single geometry pass, I assumed I would need the accumulation buffer (the buffer that stores the final image) bound during the first pass, so each mesh could output its ambient light value along with its depth, normal, specular power and albedo (diffuse) texture, as implemented in Killzone 2. I used only 32-bit render targets (RTs), as XNA requires that all RTs bound at the same time have the same bit depth and size. My layout was:
RT0: Accumulation buffer (HDR blendable), for ambient color, (and HW Depth+Stencil)
RT1: Depth - Single(float)
RT2: View Space Normal and Specular Power (R10G10B10A2), using normal encoding
RT3: Albedo (Diffuse) in RGB, Specular Level in Alpha (RGBA32)
That means each GBuffer pixel needs 20 bytes, so the largest 16:9 screen resolution would be 942 x 530 pixels. Even if I removed the accumulation buffer, the maximum GBuffer resolution would be 1024×576 (many games run at sub-HD resolutions anyway), but I would lose the ambient output.
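To double-check those resolution figures, here is the budget math as a small Python sketch (the 10,000,000-byte reading of “10 MB” and the 16:9 aspect ratio are my assumptions, reverse-engineered from the numbers above):

```python
import math

# eDRAM budget, assuming "10 MB" means 10,000,000 bytes
EDRAM_BYTES = 10_000_000

def max_16_9_resolution(bytes_per_pixel):
    """Largest 16:9 resolution whose GBuffer still fits in eDRAM."""
    pixels = EDRAM_BYTES // bytes_per_pixel
    width = int(math.sqrt(pixels * 16 / 9))
    height = int(math.sqrt(pixels * 9 / 16))
    return width, height

# 4 render targets x 4 bytes + 4 bytes of HW depth+stencil = 20 bytes/pixel
print(max_16_9_resolution(20))   # (942, 530)
# depth + normal + HW depth+stencil only, as in the later LPP layout
print(max_16_9_resolution(12))   # (1217, 684)
```

The 20-byte case reproduces the 942 x 530 figure; the 16-byte case (no accumulation buffer) gives roughly 1054 x 592, which the article rounds down to the standard sub-HD 1024×576.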
With this fat GBuffer, I was able to compute all the shading for opaque geometry by drawing only simple geometry (mostly screen-aligned quads). The next step was alpha-blended and additively blended materials: a common approach is to switch to a forward solution for those, rendering into the accumulation buffer.
This is the result of my deferred renderer (diffuse, normal and depth on the top):
The problem is that since we have already resolved (unbound) the accumulation buffer – the one holding the HW z-buffer – and we are not using the PreserveContents flag, we no longer have a z-buffer: how can we z-test the transparent geometry? We could use the depth texture (which I believe is slow) to perform the test with clip(); refill the HW z-buffer by binding the depth texture and drawing a full-screen quad (we could even render to a smaller render target, saving some fill rate if that is a bottleneck); render the geometry again (exactly what the deferred approach tries to avoid); or use the PreserveContents flag and measure whether it is a real concern.
As I didn’t have any milestone or release dates, I decided to switch to LPP and see how it compares to the deferred implementation.
The first difference is the GBuffer layout: we only need depth and normals, so my buffer is now:
RT0: Depth (and HW Depth+Stencil)
RT1: View Space Normal and Specular Power (R10G10B10A2)
We could store depth in 24 bits (as the hardware already does), the encoded normal in RG (8 bits each) and specular power in another 8 bits, leaving 16 bits free (8 in the depth texture and 8 in the normal texture) for more information, such as motion vectors, but I left that for the next level. With this layout we can reach a 1217 x 684 resolution, about 67% more pixels than my deferred GBuffer. Another point worth noting is that the lighting equation is very simple (usually Phong or Blinn-Phong), using only the normal and depth information.
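The tighter packing mentioned above (24-bit depth, leaving 8 bits free in that texture) could be sketched as follows; the split into three 8-bit channels is my illustration, not code from the article:

```python
def pack_depth_24(depth):
    """Split a [0, 1) linear depth into three 8-bit channels."""
    d = int(depth * (1 << 24))               # quantize to 24 bits
    return (d >> 16 & 0xFF, d >> 8 & 0xFF, d & 0xFF)

def unpack_depth_24(rgb):
    """Rebuild the linear depth from the three channels."""
    r, g, b = rgb
    return ((r << 16) | (g << 8) | b) / float(1 << 24)
```

The round trip loses at most one quantization step (2^-24), which is the same precision the hardware depth buffer offers.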
The next difference is the lighting reconstruction. In the deferred approach, the lighting equation gives us the final value for each pixel, i.e. the interaction between the light properties (color, attenuation) and the pixel properties (normal, specular, diffuse). A simple example: if we have a gray pixel in the diffuse buffer, say RGB(0.1, 0.1, 0.1), and the light hitting that point is RGB(1, 1, 1), we output RGB(0.1, 0.1, 0.1). With 10 such lights affecting the same pixel, we end up with RGB(0.1, 0.1, 0.1) * 10, a fully saturated white pixel, RGB(1, 1, 1). In my first LPP attempt I used an RGBA32 buffer, with lighting values in the [0..1] range. Since we output only the lighting (with no interaction with the diffuse color), a single one of those lights already reaches the maximum storable value, RGB(1, 1, 1). When that is later multiplied by the same gray texture, we get a gray output of RGB(0.1, 0.1, 0.1) instead of the fully saturated one from the deferred approach. What I did was switch to a higher-precision RGBA64 buffer and multiply my lighting output by a constant of 0.01f (a magic number), so 100 white lighting values are needed before we hit the same clamping situation. All we have to do afterwards is multiply the lighting texture value by 100 (the inverse of 0.01f) when reconstructing the shading.
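Here is that scaling trick as a minimal Python sketch (the function names are mine, and `clamp` stands in for the render target's per-channel saturation):

```python
LIGHT_SCALE = 0.01   # the article's "magic number"

def clamp(v):
    """What a fixed-point render target does to each channel."""
    return min(max(v, 0.0), 1.0)

def accumulate(light_values):
    """Sum pre-scaled light contributions into one [0..1] channel."""
    total = 0.0
    for v in light_values:
        total = clamp(total + v * LIGHT_SCALE)
    return total

def reconstruct(accumulated, albedo):
    """Second geometry pass: undo the scale, then modulate by albedo."""
    return accumulated * (1.0 / LIGHT_SCALE) * albedo

# Ten white lights on a dark gray (0.1) albedo reach full saturation,
# matching the deferred result; without the scale, the accumulation
# buffer would clamp at 1.0 after the first light and the final pixel
# would stay at 0.1.
lit = reconstruct(accumulate([1.0] * 10), 0.1)
```

Note that this only postpones clamping; the RGBA64 target is what provides the extra precision to make the small pre-scaled values survive.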
Another difference is the flexibility we gain in the second geometry pass: we can output the ambient contribution on top of the lighting, and add environment reflection, emissive materials, rim light and much more. We can also have colored specular maps (which would require extra channels in deferred shading), a feature artists love.
Finally, we get the z-buffer back for transparent objects: since the second geometry pass reconstructs the shading anyway, we can start rendering transparent objects as soon as we are done with the opaque ones.
Here is a screenshot of my current LPP implementation, with normal and depth buffers on top (note the specular color map on the lizard):
Here it is. Use it at your own risk!
I’m providing a simple renderer that, given a camera, a list of lights and a list of meshes, returns a texture with the final image. It was extracted from my current little engine, with some features removed to keep it as simple as possible (such as shadows and SSAO), and I’ve documented it as much as I could. It requires XNA 4.0 with VS2010 Express, and a HiDef-capable computer. I won’t paste lots of source code snippets here, since it’s all available for download.
I’ve divided the solution into three projects: the extended content processor, the LPP renderer and the example itself (which also contains the assets).
The content processor currently works only with static meshes. It expects the materials to contain the following maps:
- NormalMap: with the specular power in its alpha channel. The alpha is multiplied by 100 in the GBuffer, giving a [0..100] range for the specular power;
- DiffuseMap: default color;
- SpecularMap: the light’s specular color will be multiplied by this map, use it for interesting effects;
- EmissiveMap: always output to the final buffer, without any attenuation or modulation. Useful for small LED lights, for example;
I’m also providing a custom effect file for use with 3D Studio Max. It’s based on the NVIDIA effect shipped with that software, and should be used in a DirectX material. It’s located at “LightPrePass/LightPrePassContent/LightPrePassFX.fx”. It contains the same maps described above, so the content processor can match them to our renderer. If any of those maps aren’t found, default textures are used; they are located in “LightPrePass/LightPrePassContent/textures/”. If you want to import new models, remember to change the content processor to “LightPrePass Model Processor” and to set “Premultiply Texture” to false (normal maps carry information in the alpha channel that has nothing to do with blending).
At this stage, a default effect is applied to all meshes: “LightPrePassContent/shaders/LPPMainEffect.fx”. It contains two techniques: one that outputs depth and normals into the GBuffer, and another that reconstructs the material’s shading. If you want different effects, just create new shaders that respect this rule.
Light Pre-Pass Renderer
The library contains only five classes:
Used for rendering screen-aligned quads, like full-screen effects or billboards.
Stores the minimum information needed for a rendering setup: world transform, projection transform and viewport. It’s decoupled from any kind of controller, as in the MVC pattern. It also contains a bounding frustum, built from the camera eye transform (the inverse of the world transform) multiplied by the projection matrix. It contains a very important method that projects a bounding sphere into a clipping rectangle. This is very useful for rendering point lights: it gives us the smallest screen-aligned quad that fits the light’s bounding sphere, saving a lot of pixel processing on unlit fragments.
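A rough Python sketch of what such a sphere-to-rectangle projection does (this simplified version just projects the sphere's axis-aligned extents and is not the exact math in the source, which must also handle perspective distortion and spheres crossing the near plane; all names here are mine):

```python
def sphere_screen_rect(center_view, radius, proj00, proj11, vp_w, vp_h):
    """Approximate pixel rectangle (left, top, right, bottom) covering a
    view-space bounding sphere. proj00/proj11 are the projection matrix's
    X/Y scale terms; the camera looks down -Z, pixel Y grows downward,
    and the sphere is assumed fully in front of the near plane."""
    x, y, z = center_view
    w = -z                                   # perspective-divide term

    def px_x(vx):
        return (vx * proj00 / w * 0.5 + 0.5) * vp_w

    def px_y(vy):
        return (0.5 - vy * proj11 / w * 0.5) * vp_h

    def clamp(v, hi):                        # keep the rect on screen
        return max(0.0, min(v, float(hi)))

    return (clamp(px_x(x - radius), vp_w), clamp(px_y(y + radius), vp_h),
            clamp(px_x(x + radius), vp_w), clamp(px_y(y - radius), vp_h))
```

A sphere centered on the view axis whose extents reach the frustum edges maps to the full viewport; smaller or off-center lights get proportionally smaller quads, which is where the fill-rate savings come from.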
Encapsulates a default XNA model, plus a world transform and some rendering methods. It assumes the effect is the one mentioned in the Content Processor topic (or a variation that follows its rules). We could optimize this class by caching the effect’s parameters instead of fetching them from the dictionary every frame.
Encapsulates a light’s properties, like radius, color, intensity (useful when you want some color channels brighter than 255), world transform and type. Right now only point lights are supported.
This is the main class. It’s responsible for all the rendering steps, from GBuffer creation up to resolving the final render target. At startup it loads the default effects (one for clearing the GBuffer and one for performing the lighting) and creates the GBuffer.
We have the following render targets, all with the DiscardContents flag:
Depth (Single): stores the pixel’s depth in linear space, with a HW z+stencil buffer attached. It’s part of the GBuffer;
Normal (R10G10B10A2): stores the pixel’s view-space normal, encoded with a stereographic projection, in the R and G channels; the blue channel stores the specular power. It’s part of the GBuffer;
Light accumulation (RGBA64): stores the sum of all light contributions, with diffuse color in RGB and specular intensity in A;
Output (HDRBlendable): stores the final shaded image. It has a HW z+stencil attached.
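The stereographic normal encoding used by the normal target can be sketched like this (one common formulation, not necessarily the exact formulas in the source; the [0..1] bias/scale and 10-bit quantization needed for the actual render target are omitted):

```python
def encode_normal(n):
    """Stereographic projection of a unit normal onto two channels.
    Assumes n.z > -1; view-space normals face the camera, so the
    projection pole at (0, 0, -1) is never hit."""
    x, y, z = n
    return (x / (1.0 + z), y / (1.0 + z))

def decode_normal(enc):
    """Invert the projection back to a unit 3D normal."""
    px, py = enc
    denom = 2.0 / (1.0 + px * px + py * py)
    return (denom * px, denom * py, denom - 1.0)
```

The round trip is exact for unit normals, which is why two channels are enough and the B channel is freed up for the specular power.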
For each frame, we have the following steps:
- Compute the frustum corners: I use the technique that combines the frustum corners with the depth buffer to recompute each pixel’s view-space position, so I compute the corners here since the camera won’t change during the frame;
- Bind and clear the GBuffer, clearing the HW z-buffer too: we set the depth texture and HW z to 1.0f, the farthest distance possible;
- First geometry pass: render all opaque meshes, outputting depth and normals;
- Resolve the GBuffer, bind the light accumulation render target and clear it to black. I use a 16-bit-per-channel render target here, so we have some room to play with precision and light intensities;
- For each light in the scene, compute the smallest screen-aligned quad that fits it, recompute the frustum corners for that smaller quad, apply the light parameters (color, intensity, radius) and draw it with additive blending;
- Resolve the light accumulation render target, bind the output render target and clear it;
- Second geometry pass: render all opaque meshes again, now using the effect that reconstructs the object’s shading from the light accumulation render target. Since the z-buffer is filled at this point, we can also draw unlit geometry (backgrounds, skyboxes) and transparent objects like trails, flares, fire, explosions, etc.;
- Resolve the output render target, return it to the main application.
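The position reconstruction mentioned in the first step can be sketched in Python as follows (the corner ordering and the depth convention — view-space z divided by the far-plane distance — are my assumptions for illustration):

```python
def view_ray(frustum_corners, uv):
    """Bilinearly interpolate the four view-space far-plane corners,
    mimicking what the rasterizer does across a full-screen quad.
    Corner order assumed: top-left, top-right, bottom-left, bottom-right;
    uv is the pixel's [0..1] screen coordinate, v growing downward."""
    tl, tr, bl, br = frustum_corners
    u, v = uv
    top = [a + (b - a) * u for a, b in zip(tl, tr)]
    bottom = [a + (b - a) * u for a, b in zip(bl, br)]
    return [a + (b - a) * v for a, b in zip(top, bottom)]

def reconstruct_position(frustum_corners, uv, linear_depth):
    """Scale the interpolated far-plane ray by the stored linear depth
    (view-space z over far-plane distance) to recover the view-space
    position -- no matrix inverse per pixel needed."""
    return [c * linear_depth for c in view_ray(frustum_corners, uv)]
```

This is why the corners are recomputed per light quad in the loop above: a smaller quad needs its own interpolation endpoints so the same vertex-shader trick still yields the correct per-pixel ray.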
There is a simple project that uses the renderer. The LightPrePass class is the main class; it loads a model (the lizard from the XNA normal mapping sample) and assigns it to our Mesh class. There are two helper classes: CameraController, which handles user input and moves the camera around the scene, and MovingLight, a simple class that constantly updates a light’s position. You can toggle the GBuffer output, the lights’ movement and the lights’ screen quads using an Xbox controller (if one is plugged into your PC) or the keyboard.
Here we can see the screen-aligned clipping quads in action: observe how many pixels are skipped for each light due to this optimization.
Here we can see the GBuffer: the first image is the normal buffer in view space (it has some odd colors since the normal is encoded in the RG channels and the specular power is in B). The second is the depth buffer, and the third is the light accumulation buffer. Because of our trick to avoid clamping, its intensity is barely perceptible in this window. Note the emissive map in action on the lizard’s spikes.
I’m very satisfied with the results. The renderer can handle lots of lights and a good variety of materials, deals with transparent objects, and is still a very simple class. I hope to see suggestions, improvements and some derivative samples built on this little example.
I already have some of the following topics implemented in my engine (the bold ones), so I can write them up easily if there are enough requests. Anyway, here are some interesting directions to continue from this point:
- Convert it into an Xbox 360 project (I don’t actually own one, sorry);
- Skinned pipeline;
- Tile-based light culling;
- Shadows, SSAO;
- Different light types;
- Effects that need the final buffer + depth: glass, water, heat;
- Use effect pre-processor to optimize shaders that don’t need specular color/emissive/normal maps, or to choose between different ambient or shading reconstruction techniques;
See you next time!