XNA Light Pre-Pass: Cascade Shadow Maps

Hi folks!

After a long time without opening Visual Studio (I deserved a little vacation after DBP), I'm back! I've submitted my game (far from what I expected it to be); here are some screenshots:

I would like to thank Virginia Broering, Rudi Bravo, Rafael Moraes, Felipe Frango, Rodrigo Cox, Marina Benites, Fabiano Lima, and especially Justin Kwok, who provided his old Xbox so I could test the game.

In this post I will talk about my implementation of Cascade Shadow Maps, a common technique for handling directional-light shadows. Some good descriptions can be found here and here. As before, the full source code for this sample can be found here: use it at your own risk!!

The basic idea is to split the view frustum into smaller volumes (usually smaller frustums, or frusta), and to generate a shadow map for each volume. This way, we get a better distribution of our precious shadow-map texels across each region: the volume closer to the camera is smaller than the farthest one, so we have more shadow resolution close to the camera. The drawback is more draw calls/state changes, since we render to more than one shadow map. We can use shadow-map atlasing to reduce the cost: we just need to create one big shadow map that fits all our smaller shadow maps and offset the texel fetch in the pixel shader.

You should measure/change/measure your application to decide the best number of splits (or cascades) and the resolution of the shadow maps. I'm using 3 cascades, at 1024×1024 each on PC and 640×640 on Xbox. Also, it's a good idea to limit shadows to the main/dominant/call-it-what-you-like directional light, since it's a performance-hungry feature.

The steps needed to have it working are:

  • At initialization, create a big render target that fits all 3 shadow maps (if you want to change the sample to use 2 or 4 or whatever, go ahead). On PC, that gives a 3072×1024 texture. The texture's format should be Single (floating point), with a depth buffer attached, and as before we use the DiscardContents flag. We output the depth to that texture in linear space.
  • For each frame:
    • Bind the render target and clear it to white, i.e. the farthest value we can have;
    • Compute the range of each sub-frustum. You can use a simple linear distribution, but it won't give you the best resolution distribution. I'm using a quadratic distribution, so the first frustum is much smaller than the last one;
    • For each cascade:
      • Compute an orthographic camera that fits the sub-frustum and points in the light's direction. In my sample I'm using the "fit to scene" technique described in the links above, where each new sub-frustum overlaps the previous ones; this way we can use some tricks to avoid shadow jittering when the camera moves. I didn't like the results I got, since we lose a lot of resolution with that trick, so I left a boolean to toggle it on/off. Compute the viewProjection matrix for this camera;
      • Compute the correct viewport for this sub-frustum;
      • Draw each mesh that is inside this sub-frustum, using the same technique we already use for spot light shadows, but with this new viewProjection matrix.
    • When rendering the directional light that has this cascade shadow map, choose the correct technique and send the parameters to the shader: the shadow map, the view-projection matrix for each sub-frustum and also its range (I pack the ranges into a single Vector3, as I have 3 cascades);
    • The shader first computes the pixel position in view space (we need that for lighting anyway), and then uses its Z to pick the correct cascade. I'm doing that in a single line; take a look at the shader LightingLPP.fx;
    • Convert the pixel position from camera space to world space, and then to light space. Fetch the correct shadow-map texel (remember that we are using one big render target with all the cascades, so we need to offset the coordinates according to the selected cascade) and do the depth comparison; I'm using a 4-sample PCF to reduce aliasing. A rough sketch of these last two steps is shown right after this list.
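
To make those last two steps concrete, here is a minimal HLSL sketch of how the cascade selection and the atlas fetch could look. It is only an illustration written against the list above, not a copy of LightingLPP.fx: the parameter and sampler names (CascadeDistances, CascadeViewProjection, ShadowMapSize, shadowMapSampler) and the exact atlas layout are assumptions.

float3   CascadeDistances;          // far distance of each of the 3 sub-frusta, in view space
float4x4 CascadeViewProjection[3];  // light view-projection matrix of each cascade
float2   ShadowMapSize;             // size of the whole atlas (e.g. 3072 x 1024)
sampler  shadowMapSampler : register(s3);

// Returns the lit fraction for this pixel (0 = fully shadowed, 1 = fully lit).
float ComputeShadowFactor(float3 positionWorld, float viewSpaceDepth)
{
    // Pick the cascade from the pixel's view-space depth
    // (the real shader does this selection in a single line).
    float cascadeIndex = 0;
    float4x4 lightViewProjection = CascadeViewProjection[0];
    if (viewSpaceDepth > CascadeDistances.x)
    {
        cascadeIndex = 1;
        lightViewProjection = CascadeViewProjection[1];
    }
    if (viewSpaceDepth > CascadeDistances.y)
    {
        cascadeIndex = 2;
        lightViewProjection = CascadeViewProjection[2];
    }

    // Project the pixel into that cascade's light space.
    float4 lightPos = mul(float4(positionWorld, 1), lightViewProjection);
    lightPos.xyz /= lightPos.w;

    // Clip space [-1..1] to texture space [0..1], then offset into the atlas:
    // each cascade occupies one third of the atlas width.
    float2 shadowUV = float2(0.5f * lightPos.x + 0.5f, -0.5f * lightPos.y + 0.5f);
    shadowUV.x = (shadowUV.x + cascadeIndex) / 3.0f;

    // 4-sample PCF: compare the pixel's light-space depth against neighbouring texels
    // (a small depth bias is usually added to this comparison).
    float2 texelSize = 1.0f / ShadowMapSize;
    float4 samples;
    samples.x = tex2D(shadowMapSampler, shadowUV).r;
    samples.y = tex2D(shadowMapSampler, shadowUV + float2(texelSize.x, 0)).r;
    samples.z = tex2D(shadowMapSampler, shadowUV + float2(0, texelSize.y)).r;
    samples.w = tex2D(shadowMapSampler, shadowUV + texelSize).r;
    float4 inLight = (lightPos.z <= samples) ? float4(1, 1, 1, 1) : float4(0, 0, 0, 0);
    return dot(inLight, float4(0.25f, 0.25f, 0.25f, 0.25f));
}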

That's it! Here are some screenshots of this sample:

See you next time!
J.Coluna


XNA 4.0 Light Pre-Pass: Alpha masking

One important feature of any renderer is alpha-masking support. Vegetation, chains and wire fences would be a nightmare to model and are prime candidates to become triangle-hungry monsters. The idea of alpha-masking is to decide whether a given pixel should be rendered using the alpha channel of a texture (we only need one channel, so we can store it in the diffuse map's alpha). If the value is bigger than a threshold, we draw the pixel; otherwise it is skipped.

With the introduction of fully shader-based pipelines, even this basic behaviour has to be implemented in the pixel shader. In HLSL we use the clip(value) function, which discards the pixel if value is negative. Note that the computation for that pixel is still performed; the result is just not written to the render target (or backbuffer). To effectively skip the processing, we could use dynamic branching and the corresponding notation for Xbox (I won't go into details here).

In the pixel shader, all we need to do is clip(diffuse.a - alphaReference): alpha values smaller than alphaReference produce a negative argument, so the pixel is skipped. We can play with the alphaReference value at run-time to make objects appear/disappear (useful for spawning effects).

Now the cool part: how to integrate it into my light pre-pass pipeline (if you don't know what I'm talking about, take a look at my old posts). As usual, you can get the full source code here. Use it at your own risk!!

First, we need to do the alpha masking in 3 different stages:

  • when rendering to the G-Buffer;
  • when reconstructing the light;
  • when drawing to a shadow map.

Thinking ahead, we may need to mix alpha masking with fresnel/reflection/skinned meshes/multi-layer materials/etc., so it's better to adopt a solution that prevents something like "shader_fresnel_alpha_skinned_dual_layer.fx". May I introduce you to… uber shaders!!

An uber shader is just a big shader that implements lots of behaviours (fresnel/reflection/etc.), and the application decides which path to follow. I will use pre-processor parameters (#define/#ifdef) to control the shader flow, since it's a compile-time-only process. I must confess I'm not a big fan of uber shaders, since the code sometimes gets messy, tricky to follow and not so human-readable, but for now I'm OK with it.

I've added the option to enable/disable alpha-masking, and also the alphaReference value, in 3ds Max®, so we need a way to get that information and store it in our processed mesh. To accomplish that, we need to make some changes to our content pipeline processors (it took me a while to get it working properly, so accept this as a good gift :p):

  • in the model processor (our LightPrePassProcessor.cs), we need to extract the alpha information from the original material and store a list of "defines" (for now I'm handling only alpha-masking, but the idea is to gather all kinds of information, like fresnel/reflection/etc.). After that, we put this list into the material's opaque data, like lppMaterial.OpaqueData.Add("Defines", listOfDefines);
  • we have to extend a material processor: I've created a class named LightPrePassMaterialProcessor to handle the "defines" we pushed in the first step and send them to the effect processor;
  • we also need to extend the EffectProcessor, a job for the LightPrePassFXProcessor class. It just reads the "defines" information stored in the context's parameters and copies it to its "Defines" property (a minimal sketch is shown below).
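
Here is a minimal sketch of that last processor, just to show the shape of the plumbing; treat the exact calls as an approximation of the real LightPrePassFXProcessor rather than a verbatim copy of it.

using Microsoft.Xna.Framework.Content.Pipeline;
using Microsoft.Xna.Framework.Content.Pipeline.Graphics;
using Microsoft.Xna.Framework.Content.Pipeline.Processors;

[ContentProcessor(DisplayName = "LightPrePass FX Processor")]
public class LightPrePassFXProcessor : EffectProcessor
{
    public override CompiledEffectContent Process(EffectContent input, ContentProcessorContext context)
    {
        // The material processor forwarded the per-material define list
        // (e.g. "ALPHA_MASKED") through the processor parameters.
        object defines;
        if (context.Parameters.TryGetValue("Defines", out defines))
            Defines = defines as string;   // EffectProcessor compiles the effect with these defines

        return base.Process(input, context);
    }
}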

With these steps working, we can focus on the shader itself. All we need to do is put the alpha check inside an "#ifdef ALPHA_MASKED … #endif" region (ALPHA_MASKED is the key I chose for that; it's defined in LightPrePassProcessor.cs). Here is a small snippet, from the technique that renders to the GBuffer:

#ifdef ALPHA_MASKED
    //read our diffuse
    half4 diffuseMap = tex2D(diffuseMapSampler, input.TexCoord);
    clip(diffuseMap.a - AlphaReference);
#endif

Note that since we don't need the diffuse color for the rest of this technique (remember we only output normals/depth here), we can put the texture fetch inside the alpha-mask region. We also need to support backface lighting, since almost nothing that uses alpha-masking is a closed mesh. To do that, we use the VFACE semantic (available only on SM3+) when that macro is defined, like this:

struct PixelShaderInput
{
    float4 Position : POSITION0;
    float3 TexCoord : TEXCOORD0;
    float  Depth    : TEXCOORD1;
    float3 Normal   : TEXCOORD2;
    float3 Tangent  : TEXCOORD3;
    float3 Binormal : TEXCOORD4;
#ifdef ALPHA_MASKED
    float  Face     : VFACE;
#endif
};
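
Inside the GBuffer pixel shader, the Face value can then be used to flip the normal for back faces. This is just one way of doing it (a sketch, not necessarily how the sample handles it); in D3D9 the VFACE value is negative for back-facing triangles:

#ifdef ALPHA_MASKED
    // VFACE is negative when the triangle is back-facing: flip the
    // interpolated normal so back faces are shaded like front faces.
    normal = (input.Face >= 0) ? normal : -normal;
#endif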


At this point you've probably got the idea of uber shaders (and this is just the beginning). We need to extend the same approach to the shadow-map generation, adding texture coordinates to the vertex input/output and performing the clip() inside the pixel shader. Remember also to set the culling to None in the technique declaration.

The trees in this sample were generated with Tree[d], an awesome free tree-generation tool.

I would like to thank the guys who donated some money. It's not about the money itself: to get to the point of donating anything, someone has to have read my blog, downloaded the code, run it, enjoyed it, returned to the blog, and clicked the donation button. That means I'm doing a good job of sharing the knowledge, and it motivates me to continue this series of samples.

Thanks guys, see you!

J.Coluna


XNA 4.0 Light Pre-Pass: casting shadows

It's been a long time since my last post, but I didn't give up on this blog. I was working really hard on a project aimed at DreamBuildPlay, but things didn't go as I expected. It's hard to keep everyone as motivated as you are, and frustrating to see all your weeks of work spent on teapots and boxes (well, at least I have Crytek's Sponza model).

Here is a screenshot of the editor I've been working on, with the game running inside its viewport:


It features cascade shadow maps for directional lights, spot lights (with shadows too), full HDR rendering + post-processing (bloom, tone mapping and screen-space lens flare), a gamma-corrected pipeline and SSAO, running at 60 fps on Xbox 360 (that figure is a little vague since it depends on overdraw, the number of simultaneous shadow-casting lights, etc., but it works very well).

OK, let's move on. In this post I will talk about (and, as usual, release the code for) shadows, specifically for spot lights (the easiest case). I'm using plain old shadow mapping, which basically consists of rendering the scene from the light's point of view and storing the depth of each fragment in a texture (the shadow map). Then, at the lighting stage, we transform each pixel to be lit into that same light space and compare its Z with the Z stored in the shadow map, lighting or shadowing the pixel. The full source code is here; use it at your own risk.

Since I don't want to use the PreserveContents flag on any render target, I have to generate all the shadow textures before the lighting stage begins: we cannot switch render targets like "shadow 0 -> lightAccumulation -> shadow 1 -> lightAccumulation -> etc.", otherwise we would lose their contents. The solution I'm using is:

  • at the initialization stage, create render targets for the maximum number of simultaneous lights you want to allow to cast shadows in a single frame (you can create them at different resolutions);
  • at the beginning of the rendering stage, determine visible lights;
  • sort these lights using any heuristic; I chose something like the light's radius divided by its distance to the camera (a sketch is shown after this list);
  • select the highest-rated lights as shadow casters, generate the shadows and assign each one its shadow map + light view-projection matrix (we could also use the heuristic to select the shadow-map resolution);
  • render meshes to the GBuffer as usual;
  • render the lights to the accumulation buffer, using the shadow information generated before

Remember that we should not simply draw the meshes selected by our main camera's culling: for spot lights, we can compute a frustum using the spot cone angle, an aspect ratio of 1.0f and the light transform. Compare this frustum against all the world's meshes (or use any partitioning structure you like) and pick only the meshes that intersect it. The code for constructing that frustum is in Light.cs, in the method UpdateSpotValues().
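
A minimal sketch of building that frustum, in the spirit of UpdateSpotValues() (the near-plane value and parameter names here are assumptions):

// Build a culling frustum for a spot light: a perspective projection with the
// spot cone angle, an aspect ratio of 1.0f, and the light's view transform.
BoundingFrustum BuildSpotFrustum(Vector3 position, Vector3 direction, Vector3 up, float spotAngle, float radius)
{
    Matrix view = Matrix.CreateLookAt(position, position + direction, up);
    Matrix projection = Matrix.CreatePerspectiveFieldOfView(spotAngle, 1.0f, 0.1f, radius);
    return new BoundingFrustum(view * projection);
}

// Usage: add to the shadow pass only the meshes whose bounds intersect it, e.g.
// if (spotFrustum.Intersects(mesh.BoundingSphere)) shadowCasters.Add(mesh);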

I've added another technique to the main shader (LPPMainEffect.fx) that outputs the shadow-map depth for a model; I already had the technique that writes to the GBuffer and the one that reconstructs the lighting. This makes it easier to use some uber-shader tricks to allow alpha-masked geometry or skinned meshes, since we can use #defines to change the behavior of the three stages accordingly.

The result is here:

Soon I will port over the optimizations I did in my engine: reconstruction of the z-buffer and stencil tricks for optimized pixel lighting. I can also post the Xbox version (it took me 15 minutes to fix the compilation problems, but I lost that version somewhere), although it has some useless per-frame allocations (aka iterators).

I hope you enjoy it. See ya!

J.Luna


XNA 4.0 Light Pre-Pass: shedding the light

After my last post, I decided to continue with the z-reconstruct approach and to use light-meshes where possible. I’ve implemented two more light types: directional and spot. You can get the source code here.

Here is a screenshot of the lights in action, in the game editor I’m working on:

Some models are from Crytek Atrium Sponza Palace, and others from the XNA samples.

Directional Lights

The directional light is the easiest one: just render a full-screen quad and do the math. We don't need to compute attenuation or light-to-pixel vectors (our directional light only has parallel rays), so the shader is very simple. The only problem is fill rate: since directional lights usually don't have attenuation, we always draw a full-screen quad for each directional light. To make it less expensive, I use the depth test to reject fragments that contain only background values and thus don't need to be lit: my full-screen quad outputs z = 1, the farthest distance possible, and I set the depth function to Greater (a sketch of the state is shown below).
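
A small sketch of the state involved, assuming XNA 4.0 (the state object name is mine, and the quad itself just needs its vertices emitted at z = 1):

// Depth test only, no writes: the quad is drawn at z = 1, so pixels whose stored
// depth is still 1.0 (the cleared background) fail the Greater test and are skipped,
// while pixels covered by real geometry (stored depth < 1.0) pass and get lit.
static readonly DepthStencilState DirectionalLightDepthState = new DepthStencilState
{
    DepthBufferEnable = true,
    DepthBufferWriteEnable = false,
    DepthBufferFunction = CompareFunction.Greater
};

// GraphicsDevice.DepthStencilState = DirectionalLightDepthState;
// ... then draw the full-screen quad with its position z set to 1.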

Here is a screenshot with a single directional light:

Spotlights

Spotlights are basically lights with a cone-shaped attenuation, like a desk lamp. In my code, I assume a spotlight behaves like a point light (it has a world position) plus a direction. I don't use the standard formulas to compute cone attenuation, falloff, etc., as described here. Instead, I've developed my own (crazy) formula that works fine, doesn't need pow or div, and still gives a customizable falloff at the cone's border.

To render the spotlights I've created a cone mesh, with length = 1 and radius ≈ 1. It is very coarse, about 30 triangles. I had to expand the radius to slightly more than 1 unit, because the tessellation makes the distance between the edges and the center smaller than 1. With this mesh in hand, I scale its length by the light radius and the cone radius by light radius * tan(spot angle); a rough sketch of the transform is shown below. All the math is in the source code, both the mesh scaling and the shading formula.
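
A rough sketch of that transform; it assumes the unit cone points down the world matrix's forward (-Z) axis, and it follows the post's radius * tan(spot angle) formula (adjust to your mesh and angle convention):

// Scale the unit cone so its length equals the light radius and its base radius
// equals radius * tan(spot angle), then orient and place it at the light.
Matrix ComputeSpotConeWorld(Vector3 position, Vector3 direction, Vector3 up, float radius, float spotAngle)
{
    float baseRadius = radius * (float)Math.Tan(spotAngle);
    return Matrix.CreateScale(baseRadius, baseRadius, radius)
         * Matrix.CreateWorld(position, Vector3.Normalize(direction), up);
}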

Here is a sequence of optimizations to reduce pixel processing:

From left to right, top to bottom:

  • Final image, single spot light;
  • Using a screen-aligned quad;
  • Using a sphere mesh;
  • Using a cone mesh (good);
  • Using z-test (better!);
  • Using clip() after the attenuation: it saves the specular computation (two normalizes).

As you can see, it's a HUGE win to use the cone mesh: we save an order of magnitude of pixels in that case. We could even use a frustum-cone intersection to skip the light in the culling stage, or a frustum-frustum test (built into XNA) instead of the frustum-bounding-sphere test I'm doing in the code.

Here are some screenshots of the sample I’m releasing:

What’s next?

I’m slowly importing my engine code into this project, so probably the next topic will be about skinned meshes and alpha masked objects, and then shadows or SSAO. Comments and replies are welcome.

Thanks to Crytek for making the Atrium Sponza Palace mesh and textures available to the public, and to the guys at Gamedev.net and App Hub forums.

See ya!

J.Luna


XNA 4.0 Light Pre-Pass: Optimization Round One

Following some suggestions on my previous posts, I decided to reconstruct the z-buffer from my linear depth buffer and optimize the lighting pass.

To achieve this, I did the following steps:

  • Changed the light-accumulation render target: now it also has a depth/stencil surface (DiscardContents);
  • Right after binding it, and before the light rendering, I draw a full-screen quad with a shader that fills the z-buffer, using our linear depth buffer as input (a sketch of this pass is shown after the list). I know it's not precise, since we lose a lot of precision close to the near plane, but this fake z-buffer is only used in the lighting stage, and with coarse light volumes (I get some artifacts when geometry and lights are close to the far plane; maybe I can fix it with some bias);
  • Instead of drawing screen-aligned quads, I'm now using a convex mesh that fits the light volume (just a sphere, scaled by the light's radius). I could switch between front-face and back-face culling depending on whether the light volume touches the camera's near plane, as seen here, but I left that for next time. I've inverted the winding order of my light mesh so I don't need to change the culling state, and the depth-compare function is set to GreaterEqual;
  • For each light, compute the appropriate WorldViewProjection matrix (using the scale and position of each light), set the light properties as usual and render. I’m using this technique to recompute the pixel view-space position.
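
Here is one possible shape for that z-restore pass, assuming the linear depth buffer stores view-space distance divided by FarClip and the standard XNA right-handed perspective projection; ProjectionValues is an assumed parameter fed from C# with (Projection.M33, Projection.M43):

float2 ProjectionValues;                 // x = Projection.M33, y = Projection.M43
float  FarClip;
sampler depthSampler : register(s0);     // the linear depth texture from the GBuffer

// Full-screen pass: rebuild the hardware z-buffer from the linear depth buffer.
float4 RestoreZBufferPS(float2 texCoord : TEXCOORD0, out float outDepth : DEPTH) : COLOR0
{
    // Linear depth is stored as distance / FarClip, with 0 at the camera origin.
    float viewDistance = max(tex2D(depthSampler, texCoord).r * FarClip, 0.0001f);

    // Post-projection depth of a view-space point at z = -viewDistance:
    // z/w = (z * M33 + M43) / (-z) = -M33 + M43 / viewDistance
    outDepth = -ProjectionValues.x + ProjectionValues.y / viewDistance;
    return 0;
}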

Here is a comparison of the area being affected by the lights:

In a test with 500 lights (341 visible, at the exact camera startup position in my project), the screen-aligned technique takes approximately 3 ms of draw time and 28 ms of GPU time. When I change to the mesh-based technique, those values decrease to approximately 1.7 ms and 16.7 ms. The draw time decreases because we don't need to compute the screen-aligned quads anymore. Note that I don't know if those measurements are 100% accurate; I'm using the technique described here, my CPU is an i5-430 and my GPU a HD5650.

It proved to be a great step for performance, even with the extra z-reconstruction pass. I would like to see some results, critiques and suggestions.

By the way, here is the full source.

See you next time!

J.Luna


Reconstructing view-space position from depth

After my first LPP implementation, I decided to go for the mesh-based approach for rendering point lights. It consists of rendering a convex mesh (usually a sphere) that fits the light's volume. This way we can use some depth/stencil tricks to reject pixels outside the light's range and save some pixel processing.

As I was using screen-aligned quads before, it was easy to compute each pixel's view-space position using this technique. However, as I'm using a mesh now, I had to figure out a way to recreate the position using some projection tricks.

I didn't find any good resource on Google, so I decided to write my own shaders and release them here, with some basic explanation.

First, I store the linear depth in the range [0..1], where 0 is at the camera origin (not the near plane) and 1 is at the far plane; all my code is valid for view space and for perspective projection only.

Second, we need to send tan(camera_fovY / 2) and camera_aspect * tan(camera_fovY / 2) to our shader, like this:

_lighting.Parameters["TanAspect"].SetValue(new Vector2(camera.TanFovy * camera.Aspect, -camera.TanFovy));

I’m negating camera.TanFovy due to some shader black magic (basically to avoid more negating operations inside the shader).

We also need to send our camera's FarClip to the shader, plus the WorldViewProjection matrix for the current light-volume mesh.

With this in hand we can use the concept of similar triangles and deduce that

posViewSpace.x = posClipSpace.x*TanAspect.x*depth;
posViewSpace.y = posClipSpace.y*TanAspect.y*depth;

where posClipSpace is in the range [-1..1] and depth is in the range [0..FarClip]. Here is the source, in HLSL (as I use it in my XNA project):

float2 PostProjectionSpaceToScreenSpace(float4 pos)
{
    float2 screenPos = pos.xy / pos.w;
    return (0.5f * (float2(screenPos.x, -screenPos.y) + 1));
}

struct VertexShaderOutputMeshBased
{
    float4 Position            : POSITION0;
    float4 TexCoordScreenSpace : TEXCOORD0;
};

VertexShaderOutputMeshBased PointLightMeshVS(VertexShaderInput input)
{
    VertexShaderOutputMeshBased output = (VertexShaderOutputMeshBased)0;
    output.Position = mul(input.Position, WorldViewProjection);

    //we will compute our texture coords based on pixel position further
    output.TexCoordScreenSpace = output.Position;
    return output;
}

float4 PointLightMeshPS(VertexShaderOutputMeshBased input) : COLOR0
{
    //as we are using a sphere mesh, we need to recompute each pixel position
    //into texture space coords. GBufferPixelSize is used to fetch the texel's center
    float2 screenPos = PostProjectionSpaceToScreenSpace(input.TexCoordScreenSpace) + GBufferPixelSize;

    //read the depth value
    float depthValue = tex2D(depthSampler, screenPos).r;
    depthValue *= FarClip;

    // Reconstruct position from the depth value, the FOV, aspect and pixel position
    // Convert screenPos to [-1..1] range
    // We negate the depthValue since it goes to -FarClip in view space
    float3 pos = float3(TanAspect * (screenPos * 2 - 1) * depthValue, -depthValue);

    // ... the lighting computation using pos goes here (omitted in this excerpt);
    // placeholder return so the snippet compiles on its own
    return float4(pos, 1);
}

As you can see, we don't need a matrix multiplication inside the pixel shader to reconstruct the position, only a few muls, which helps with processing cost. Hope it helps. See ya!

Coluna


XNA 4.0 Light Pre-Pass

Introduction

The discussion of the pros and cons of different techniques for real-time lighting has been running for years. Forward rendering, deferred shading and light pre-pass are some of the most popular techniques nowadays. Their definitions and variations can be found with a simple internet search, with all the complex mathematics, notation and formulas you could want, so I will not focus on that here.

Also, I won't defend one technique over another, because the best technique is the one that best fits your needs (in my case, the chosen one was light pre-pass, LPP for short).

Instead, I will show you a basic XNA 4.0 implementation of the LPP algorithm, first described by Wolfgang Engel (I don't know if his was the first ever, but it was the first one I found), with the full source code (see the "Source Code" section) and a simple FX file to help you export your models into the LPP pipeline. I will also comment on the road I traveled to get it done; maybe you will find yourself stuck on some of the same bumps.

I assume you already know about these techniques, since this is not a tutorial on how they work but on how I've implemented one of them. Some good resources are in the References section.

NOTE: if you want the latest source for this series, always visit the latest entry!

Motivation

In the Doom 3® era (2003-2004) I was an avid C++/OpenGL game programmer, and I implemented a forward renderer with per-pixel lights and stencil shadows. After that, I got into some Java + OpenGL, and then into Unity and some third-party engines. My last roles were more tech-artist than graphics programmer, so I started doing some XNA research at home for fun and learning.

A few months ago I decided to implement a deferred approach, and at first I chose the classic deferred-shading algorithm. My goals were:

  • to have a prototype running as fast as possible. If I had wanted all the features in place before hitting F5 (build and run), maybe I would not be writing this. Always start small and add functionality as needed, instead of creating a huge monster that can't even walk;
  • lots of per-pixel lights, some of them casting shadows;
  • some level of flexibility in light shading, like specular maps, rim and half-Lambert;
  • reflective, emissive and image-based-lit materials;
  • HDR, DOF and all kinds of post-processing effects;
  • get it running at good frame rates on the Xbox 360.

After I had the first item done, I switched to LPP, to try to circumvent some problems I faced.

The basics

Both deferred shading and LPP rely on a GBuffer, which is a group of render targets that stores screen-space information about the scene, like depth, normals, albedo, specular power, motion vectors and any other parameter you would need to reconstruct your lighting using only that buffer and the light information. This way, we don't need to redraw all the geometry for each light that intersects our scene: all the information we need is already stored in the GBuffer. All we have to do is draw some geometry that encapsulates the light volume (a bounding sphere, a full-screen quad or something like that), use the screen-space pixel position to fetch the GBuffer data and do the lighting. The LPP technique needs only a depth and a normal/specular-power buffer, but it requires two geometry passes for the final shading. On the other hand, the classic deferred-shading technique requires more buffers (at least an additional albedo buffer), but it can be done with a single geometry pass.

Implementing

After researching Xbox graphics programming and XNA, I started from the following premises:

  • My GBuffer should fit into the Xbox's 10 MB of eDRAM, to avoid predicated tiling. That means that the more information we add to the GBuffer, the smaller its resolution has to be;
  • I also want to avoid the PreserveContents flag on my render targets, to save extra buffer copies (read Shawn Hargreaves' post here).

In my deferred-shading approach, to get ambient lighting with emissive maps, environment reflection and other techniques in a single geometry pass, I needed the accumulation buffer (the buffer that stores the final image) bound during the first pass, so each mesh could render its ambient light value as well as its depth, normal, specular power and albedo (diffuse) texture, as implemented in Killzone 2. I used only 32-bit render targets (RTs), as XNA requires that all RTs bound at the same time have the same bit depth and size. My layout was:

RT0: Accumulation buffer (HDR blendable), for ambient color, (and HW Depth+Stencil)
RT1: Depth - Single(float)
RT2: View Space Normal and Specular Power (R10G10B10A2), using normal encoding
RT3: Albedo (Diffuse) in RGB, Specular Level in Alpha (RGBA32)

That means that each GBuffer pixel needs 20 bytes, and the largest screen resolution would be 942 x 530 pixels. Even if I remove the accumulation buffer, the maximum screen GBuffer resolution would be 1024×576 (a lot of games run in sub-HD resolution, anyway), but I would lose the ambient output.

With this fat GBuffer, I was able to compute all the shading for opaque geometry by drawing only simple light geometry (mostly screen-aligned quads). The next step was alpha and additive-blend materials: a common approach is to switch to a forward solution, rendering into the accumulation buffer.

This is the result of my deferred renderer (diffuse, normal and depth on the top):

The problem here is that we have already resolved (unbound) the accumulation buffer, the one that holds the HW z-buffer, and we are not using the PreserveContents flag, so we don't have the z-buffer anymore: how can we z-test our transparent geometry without it? We could use the depth texture to perform the test with clip() (which I believe is slow); refill the HW z-buffer by binding the depth texture and drawing a full-screen quad (in that case we could even render to a smaller render target, saving some fill rate if it's a bottleneck); render the geometry again (which is exactly what we try to avoid by going deferred); or use the PreserveContents flag and measure whether it's a real concern.

As I didn't have any milestones or release dates, I decided to switch to LPP and see how it compared to the deferred implementation.

The first difference is the GBuffer layout: we only need depth and normals, so my buffer is now

RT0: Depth (and HW Depth+Stencil)
RT1: View Space Normal and Specular Power (R10G10B10A2)

We could store depth in 24 bits (as our HW already does), normal encoded in RG (8 bits each), specular in another 8 bits and have 16 bits (8 in the depth texture and 8 in the normal texture) free to store more information, like motion vectors, but I left this for the next level. With that layout, we could have a 1217 x 684 resolution, a 60% improvement over my deferred GBuffer. Another point to note is that our lighting equation is very simple (usually Phong or Blinn-Phong), using only normals and depth information.

Another difference is the lighting reconstruction: in the deferred approach, the light equation gives us the final value for that pixel, i.e. the interaction between the light properties (color, attenuation) and the pixel properties (normal, specular, diffuse). A simple example: if we have a gray pixel in our diffuse buffer, say RGB(0.1, 0.1, 0.1), and the resulting light hitting that point is RGB(1, 1, 1), we output RGB(0.1, 0.1, 0.1). If 10 such lights affect the same pixel, in the end we have RGB(0.1, 0.1, 0.1) * 10, a white pixel RGB(1, 1, 1). In my first LPP attempt I used an RGBA32 light buffer, with lighting values in the range [0..1]. As we only output the lighting (without the interaction with diffuse), a single one of the above lights already reaches the maximum representable value, RGB(1, 1, 1). When we later multiply it by the same gray texture, we get a gray output RGB(0.1, 0.1, 0.1) instead of the really saturated one from the deferred approach. What I did was switch to a higher-precision buffer, RGBA64, and multiply my lighting output by a constant 0.01f (magic number), so 100 white lighting values are needed before we hit the same clamping situation described before. All we need to do then is multiply the lighting texture value by 100 (the inverse of 0.01f) when reconstructing the shading.
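
In shader terms the trick boils down to a pair of constants like the ones below (the names, sampler and the exact reconstruction line are assumptions; the real shaders are in the sample):

static const float LightBufferScale    = 0.01f;   // applied when writing the light buffer
static const float LightBufferScaleInv = 100.0f;  // 1 / LightBufferScale, applied when reading it
sampler lightSampler : register(s0);              // the RGBA64 light accumulation texture

// End of the lighting shader: scale down before accumulating (additive blending).
float4 PackLight(float3 diffuseLight, float specularLight)
{
    return float4(diffuseLight, specularLight) * LightBufferScale;
}

// Second geometry pass: undo the scale before combining with the material textures.
float4 UnpackLight(float2 screenPos)
{
    return tex2D(lightSampler, screenPos) * LightBufferScaleInv;
}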

Another difference is that we have some flexibility in the second geometry pass: we can output the ambient contribution plus the lighting, and add environment reflection, emissive materials, rim light and a lot more. We can also have colored specular maps (which would require more channels in deferred shading), a feature that artists love.

Finally, we get the z-buffer back for transparent objects: since the second geometry pass that reconstructs the shading also fills the z-buffer of the output target, we can start rendering transparent objects as soon as we are done with the opaque ones.

Here is a screenshot of my current LPP implementation, with normal and depth buffers on top (note the specular color map on the lizard):

Source code

Here it is. Use it at your own risk!

I'm providing a simple renderer that, given a camera, a list of lights and a list of meshes, returns a texture with the final image. It was extracted from my current little engine, but I removed some features (like shadows and SSAO) to keep it as simple as possible, and I've documented it as much as I could. It requires XNA 4.0 with VS2010 Express and a HiDef-capable computer. I won't paste lots of source code snippets here, since it's all available for download.

I’ve divided the solution into three projects: the extended content processor, the LPP renderer and the example itself (which also contains the assets).

Content Processor

The content processor works only with static meshes right now. It expects that the materials contain the following maps:

  • NormalMap: specular power in alpha. It will be multiplied by 100 in the GBuffer, so you have a range from [0..100] for your specular power;
  • DiffuseMap: default color;
  • SpecularMap: the light’s specular color will be multiplied by this map, use it for interesting effects;
  • EmissiveMap: it is always output to the final buffer, without any attenuation or modulation. Useful for small LED lights, for example.

I'm also providing a custom effect file for use with 3D Studio Max. It's based on the NVIDIA effect shipped with that software and should be used in a DirectX material. It's located at "LightPrePass/LightPrePassContent/LightPrePassFX.fx". It exposes the same maps described above, so the content processor can match them into our renderer. If any of those maps aren't found, default textures are used; they are located at "LightPrePass/LightPrePassContent/textures/". If you want to import new models, remember to change the content processor to "LightPrePass Model Processor" and to set "Premultiply Texture" to false (normal maps have information in the alpha channel that is not related to blending).

At this stage, a default effect is applied to all meshes: "LightPrePassContent/shaders/LPPMainEffect.fx". It contains two techniques: one that outputs the depth and normals into the GBuffer, and another that reconstructs the material's shading. If you want different effects, just create new shaders that respect this rule.

Light Pre-Pass Renderer

The library contains only five classes:

QuadRenderer.cs

Used for rendering screen-aligned quads, like full-screen effects or billboards.

Camera.cs

Stores the minimum information needed for a rendering setup: world transform, projection transform and viewport. It's decoupled from any kind of controller, as in the MVC pattern. It also contains a bounding frustum, built from the camera's eye transform (the inverse of the world transform) multiplied by the projection matrix. And it contains a very important method that projects a bounding sphere into a clipping rectangle. This is very useful for rendering our point lights: it gives us the smallest screen-aligned quad that fits the light's bounding sphere, saving a lot of pixel processing on unlit fragments.

Mesh.cs

Encapsulates a default XNA model, plus a world transform and some methods for rendering. It assumes that the effect is the one mentioned in the Content Processor section (or a variation that follows its rules). We could optimize this class by caching the effect's parameters instead of fetching them from the dictionary every frame.

Light.cs

Encapsulates a light's properties, like radius, color, intensity (useful when you want some color channels brighter than 255), world transform and type. Right now only point lights are supported.

Renderer.cs

This is the main class. It’s responsible for all rendering steps, starting with the GBuffer creation up to resolving the final target. Initially, it loads the default effects (one for clearing the GBuffer and one for performing the lighting) and creates the GBuffer.

We have the following render targets, all of them with the DiscardContents flag:

Depth (Single): stores the pixel's depth value in linear space. It has a HW z+stencil attached. It's part of the GBuffer;
Normal (R10G10B10A2): stores the pixel's normal in view space, using stereographic projection, in the R and G channels. The blue channel stores the specular power. It's part of the GBuffer;
Light accumulation (RGBA64): stores the sum of all lights' contributions, with diffuse color in RGB and specular intensity in A;
Output (HDRBlendable): stores the final shaded image. It has a HW z+stencil attached.

For each frame, we have the following steps:

  • Compute the frustum corners: I’m using the technique that uses the frustum corners and the depth buffer to recompute each pixel’s position in view space, so I compute it here as the camera won’t change during this frame;
  • Bind and clear the GBuffer, clear HW z-buffer too: we set our depth texture and HW z to 1.0f, the farthest distance possible;
  • First geometry pass: render all opaque meshes, outputting depth and normals;
  • Resolve the GBuffer, bind the light accumulation render target and clear it to black. I’m using a 16bit/channel render target here, so we have some room for playing with precision and light intensities;
  • For each light in the scene, compute the smallest screen-aligned quad that fits it, recompute the frustum corners for this smaller quad, apply light parameters (color, intensity, radius) and draw it, with additive blending;
  • Resolve the light accumulation render target, bind the output render target and clear it;
  • Second geometry pass: render all opaque meshes again, now using the effect that reconstructs the object shading using the light accumulation render target. As we have the z-buffer filled at this point, we could draw unlit geometry (background, skyboxes) and transparent objects like trails, flares, fire, explosions, etc;
  • Resolve the output render target and return it to the main application (a condensed sketch of this per-frame flow is shown below).
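
For reference, here is a condensed, method-level sketch of that flow using XNA 4.0 calls. It only loosely mirrors Renderer.cs: the field and helper names (_depthTarget, RenderToGBuffer, ReconstructShading, DrawLightQuad, ClearGBuffer) are illustrative.

public RenderTarget2D RenderFrame(Camera camera, List<Light> lights, List<Mesh> meshes)
{
    // GBuffer pass: depth + normals (both targets created with DiscardContents).
    GraphicsDevice.SetRenderTargets(_depthTarget, _normalTarget);
    ClearGBuffer();                                   // depth texture and HW z-buffer cleared to 1.0f
    foreach (Mesh mesh in meshes)
        mesh.RenderToGBuffer(camera);

    // Lighting pass: additive blending into the RGBA64 accumulation buffer.
    GraphicsDevice.SetRenderTarget(_lightTarget);
    GraphicsDevice.Clear(Color.Black);
    GraphicsDevice.BlendState = BlendState.Additive;
    foreach (Light light in lights)
        DrawLightQuad(light, camera);                 // smallest screen-aligned quad that fits the light

    // Second geometry pass: reconstruct the shading into the output buffer.
    GraphicsDevice.SetRenderTarget(_outputTarget);
    GraphicsDevice.Clear(Color.Black);
    GraphicsDevice.BlendState = BlendState.Opaque;
    foreach (Mesh mesh in meshes)
        mesh.ReconstructShading(camera, _lightTarget);

    GraphicsDevice.SetRenderTarget(null);             // resolve and hand the final texture back
    return _outputTarget;
}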

Example Project

There is a simple project that uses the renderer. The class LightPrePass is our main class; it loads a model (the lizard found in the XNA normal-mapping sample) and assigns it to our Mesh class. It has two helper classes: the first one is CameraController, which takes care of handling user input and moving the camera around the scene. The second one is MovingLight, a simple class that constantly updates a light's position. You can toggle the GBuffer output, the light's movement and the light's screen-quad display using your Xbox controller (if it's plugged into your PC) or your keyboard.

Here we can see the screen-aligned clipping quads in action: observe how many pixels are skipped for each light due to this optimization.

Here we can see the GBuffer: the first image is the normal buffer in view space (it has some weird colors since the normal is encoded in the RG channels and the specular power is in B). The second is the depth buffer, and the third is the light accumulation buffer. Due to our trick to avoid clamping, its intensity is barely perceptible in this window. Note the emissive map in action on the lizard's spikes.

Conclusion

I'm very satisfied with the results. The renderer can handle lots of lights and a good variety of materials and transparent objects, and it is still a very simple class. I hope to see some suggestions, improvements and derivative samples built on this little example.

Further Improvements

I already have some of the following topics done in my engine (the bold ones), so I can add them easily if there are enough requests. Anyway, here are some interesting directions to continue from this point:

  • Convert it into an Xbox 360 project (I don't actually have one, sorry);
  • Profiling/optimizations;
  • Skinned pipeline;
  • Tile-based light culling;
  • Shadows, SSAO;
  • Different light types;
  • Effects that need the final buffer + depth: glass, water, heat;
  • Use the effect pre-processor to optimize shaders that don't need specular color/emissive/normal maps, or to choose between different ambient or shading-reconstruction techniques.

See you next time!

Coluna

References

http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html

http://mquandt.com/blog/2010/03/light-pre-pass-round-2/

http://forums.create.msdn.com/forums/t/70326.aspx

http://forums.create.msdn.com/forums/t/26870.aspx

http://create.msdn.com/en-US/education/catalog/sample/normal_mapping

http://aras-p.info/texts/CompactNormalStorage.html

http://mynameismjp.wordpress.com/2009/03/10/reconstructing-position-from-depth/

http://msdn.microsoft.com/en-us/library/bb447672%28v=xnagamestudio.10%29.aspx

http://www.catalinzima.com/tutorials/deferred-rendering-in-xna/creating-the-g-buffer/

http://developer.valvesoftware.com/wiki/Half_Lambert

http://www.gamasutra.com/view/feature/2942/the_mechanics_of_robust_stencil_.php?page=6
