XNA 4.0 Light Pre-Pass

Introduction

The debate over the pros and cons of different real-time lighting techniques has been running for years. Forward rendering, deferred shading and light pre-pass are some of the best-known techniques nowadays. Their definitions and variations can be found with a simple search on the internet, complete with all the complex mathematics, notation and formulas you could want, so I will not focus on that here.

Also, I won't defend one technique over another, because the best technique is the one that best fits your needs (in my case, the chosen one was light pre-pass, LPP for short).

Instead, I will show you a basic XNA 4.0 implementation of the LPP algorithm, first described by Wolfgang Engel (I don't know if it was the first one ever, but it was the first one I found), with the full source code (see the "Source Code" topic) and a simple FX file to help you export your models into the LPP pipeline. I will also comment on the road I traveled until I had it done; maybe you will find yourself stuck on some of the same bumps.

I assume you already know about these techniques, since this is not a tutorial on how LPP works but on how I implemented it. Some good resources are listed in the references section.

NOTE: if you want the latest source for this series, always visit the latest entry!

Motivation

In the Doom 3® era (2003-2004) I was an avid C++/OpenGL game programmer, and I implemented a forward renderer with per-pixel lights and stencil shadows. After that, I moved on to some Java + OpenGL, and then to Unity and some third-party engines. My last roles were more tech-artist than graphics programmer, so I started doing some XNA research at home for fun and learning.

A few months ago I decided to implement a deferred approach, and at first I chose the classic deferred-shading algorithm. My goals were:

  • to have a prototype running as fast as possible. If I had wanted all the features in place before hitting F5 (build and run), I would probably not be writing this. Always start small and add functionality as needed, instead of creating a huge monster that can't even walk;
  • lots of per-pixel lights, some of them casting shadows;
  • some flexibility in light shading, like specular maps, rim lighting and half-Lambert;
  • reflective, emissive and image-based-lit materials;
  • HDR, DOF and all kinds of post-processing effects;
  • getting it running at good frame rates on the Xbox 360.

After I had the first item done, I switched to LPP to try to circumvent some problems I faced.

The basics

Both deferred shading and LPP rely on a GBuffer: a group of render targets that stores screen-space information about the scene, such as depth, normals, albedo, specular power, motion vectors and any other parameter you need to reconstruct the lighting using only those buffers and the light information. This way, we don't need to redraw all the geometry for each light that intersects the scene: everything we need is already stored in the GBuffer. All we have to do is draw some geometry that encapsulates the light volume (a bounding sphere, a full-screen quad or something similar), use the screen-space pixel position to fetch the GBuffer data and do the lighting. The LPP technique needs only a depth buffer and a normal/specular-power buffer, but it requires two geometry passes for the final shading. On the other hand, classic deferred shading requires more buffers (at least an additional albedo buffer), but it can be done in a single geometry pass.

Implementing

After researching Xbox graphics programming and XNA, I started from the following premises:

  • My GBuffer should fit into the Xbox's 10MB of eDRAM, to avoid predicated tiling. That means the more information we add to the GBuffer, the smaller its resolution must be;
  • I also wanted to avoid the PreserveContents flag on my render targets, to avoid extra buffer copies (Shawn Hargreaves has a good post about this).

In my deferred-shading approach, to support ambient lighting with emissive maps, environment reflections and other techniques in a single geometry pass, I decided that the accumulation buffer (the buffer that stores the final image) would have to be bound during the first pass, so each mesh could render its ambient term along with its depth, normal, specular power and albedo (diffuse) texture, as implemented in Killzone 2. I used only 32-bit render targets (RTs), since XNA requires that all RTs bound at the same time have the same bit depth and size. My layout was:

RT0: Accumulation buffer (HDRBlendable), for ambient color (and HW depth+stencil)
RT1: Depth (Single float)
RT2: View-space normal and specular power (R10G10B10A2), using normal encoding
RT3: Albedo (diffuse) in RGB, specular level in alpha (RGBA32)

That means each GBuffer pixel needs 20 bytes, so the largest usable resolution would be 942 x 530 pixels. Even if I removed the accumulation buffer, the maximum GBuffer resolution would be 1024 x 576 (a lot of games run at sub-HD resolutions anyway), but I would lose the ambient output.
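For the curious, here is the arithmetic behind those numbers as a back-of-the-envelope check (this is not code from the sample):

```csharp
// 10MB of eDRAM divided by the per-pixel cost of the GBuffer layout above.
const int edramBytes = 10 * 1024 * 1024;    // 10,485,760 bytes
int maxPixels = edramBytes / 20;            // 524,288 pixels with all 4 RTs bound
// 942 x 530 = 499,260 pixels -> fits with the accumulation buffer bound.
// Without the accumulation buffer we pay 16 bytes per pixel (655,360 pixels),
// so 1024 x 576 = 589,824 pixels also fits.
```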

With this fat GBuffer, I was able to compute all the shading for opaque geometry by drawing only simple geometry (mostly screen-aligned quads). The next step was alpha-blended and additive materials: a common approach is to switch to a forward solution, rendering them into the accumulation buffer.

This is the result of my deferred renderer (diffuse, normal and depth buffers shown on top):

The problem here is that we have already resolved (unbound) the accumulation buffer, the one that holds the HW z-buffer, and since we are not using the PreserveContents flag, we don't have the z-buffer anymore. How can we z-test the transparent geometry without it? We could use the depth texture (which I believe is slow) to perform the test with clip(); we could refill the HW z-buffer by binding the depth texture and drawing a full-screen quad (in this case we could even render to a smaller render target, saving some fill rate if that is a bottleneck); we could render the geometry again (which is exactly what the deferred approach tries to avoid); or we could use the PreserveContents flag and measure whether it is a real concern.

As I didn't have any milestones or release dates, I decided to switch to LPP and see how it compared to the deferred implementation.

The first difference is the GBuffer layout: we only need depth and normals, so my buffer is now:

RT0: Depth (and HW Depth+Stencil)
RT1: View-space normal and specular power (R10G10B10A2)

We could store depth in 24 bits (as the HW already does), the normal encoded in RG (8 bits each) and the specular power in another 8 bits, and still have 16 bits free (8 in the depth texture and 8 in the normal texture) for more information, like motion vectors, but I left that for the next level. With this layout we can go up to 1217 x 684, roughly a 60% improvement over my deferred GBuffer. Another point worth noting is that the lighting equation becomes very simple (usually Phong or Blinn-Phong), using only the normal and depth information.
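For reference, here is a minimal C# sketch of the stereographic-projection normal encoding described in Aras Pranckevičius' "Compact Normal Storage" article (linked in the references); the shaders in the sample may differ in the details:

```csharp
using Microsoft.Xna.Framework;

static class NormalEncoding
{
    // Scale factor from the "Compact Normal Storage" article; it keeps the
    // projected values of view-space normals inside [-1..1].
    const float Scale = 1.7777f;

    // Packs a unit, view-space normal into two [0..1] channels.
    // (A normal pointing exactly away from the camera, n.Z == -1, is the
    // degenerate case of the projection.)
    public static Vector2 Encode(Vector3 n)
    {
        Vector2 enc = new Vector2(n.X, n.Y) / ((n.Z + 1.0f) * Scale);
        return enc * 0.5f + new Vector2(0.5f);
    }

    public static Vector3 Decode(Vector2 stored)
    {
        Vector2 nn = (stored * 2.0f - new Vector2(1.0f)) * Scale;
        float g = 2.0f / (Vector2.Dot(nn, nn) + 1.0f);
        return new Vector3(nn.X * g, nn.Y * g, g - 1.0f);
    }
}
```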

Another difference is the lighting reconstruction. In the deferred approach, the light equation gives us the final value for the pixel, i.e. the interaction between the light properties (color, attenuation) and the pixel properties (normal, specular, diffuse). A simple example: if we have a gray pixel in the diffuse buffer, say RGB(0.1, 0.1, 0.1), and the light hitting that point is RGB(1, 1, 1), we output RGB(0.1, 0.1, 0.1). If ten lights like that affect the same pixel, in the end we have RGB(0.1, 0.1, 0.1) * 10, a white pixel RGB(1, 1, 1). In my first LPP attempt I used an RGBA32 buffer, with lighting values clamped to [0..1]. Since we output only the lighting (before any interaction with the diffuse color), a single one of those lights already reaches the maximum storable value, RGB(1, 1, 1). When we later multiply it by the same gray texture, we get a gray output of RGB(0.1, 0.1, 0.1) instead of the fully saturated result of the deferred approach. What I did was switch to a deeper buffer, RGBA64, and multiply my lighting output by a constant 0.01f (a magic number), so now 100 white lights are needed before we reach the clamping situation described above. All we need to do is multiply the lighting texture value by 100 (the inverse of 0.01f) when reconstructing the shading.
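In C# form, the scaling trick looks like this (the real math lives in the sample's shaders; these names and the clamp simulation are illustrative):

```csharp
using Microsoft.Xna.Framework;

static class LightBufferScaling
{
    // The magic numbers from the text: scale the lighting down before it is
    // written to the clamped light buffer, scale it back up on reconstruction.
    const float Scale = 0.01f;
    const float InvScale = 100.0f;

    // Simulates additive blending into a normalized render target, which
    // clamps every channel to [0..1].
    public static Vector3 Accumulate(Vector3 lightBuffer, Vector3 lightContribution)
    {
        return Vector3.Clamp(lightBuffer + lightContribution * Scale,
                             Vector3.Zero, Vector3.One);
    }

    // Second geometry pass: undo the scale and modulate by the surface color.
    public static Vector3 Reconstruct(Vector3 albedo, Vector3 lightBuffer)
    {
        return albedo * lightBuffer * InvScale;
    }
}
```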

The next difference is the flexibility we gain in the second geometry pass: we can output the ambient contribution together with the lighting, and add environment reflections, emissive materials, rim lighting and a lot more. We can also have colored specular maps (which would require extra channels in deferred shading), a feature that artists love.

Finally, we get the z-buffer back for transparent objects: since there is a second geometry pass to reconstruct the shading, we can start rendering the transparent objects as soon as we are done with the opaque ones.

Here is a screenshot of my current LPP implementation, with the normal and depth buffers on top (note the specular color map on the lizard):

Source code

Here it is. Use it at your own risk!

I'm providing a simple renderer that, given a camera, a list of lights and a list of meshes, returns a texture with the final image. It was extracted from my current little engine, but I removed some features (like shadows and SSAO) to keep it as simple as possible, and I documented it as much as I could. It requires XNA 4.0 with VS2010 Express and a HiDef-capable computer. I won't paste lots of source code snippets here, since it is all available for download.

I divided the solution into three projects: the extended content processor, the LPP renderer and the example itself (which also contains the assets).

Content Processor

The content processor only works with static meshes right now. It expects the materials to contain the following maps:

  • NormalMap: normal map, with the specular power in the alpha channel. The alpha will be multiplied by 100 in the GBuffer, so you have a [0..100] range for your specular power;
  • DiffuseMap: default color;
  • SpecularMap: the light's specular color will be multiplied by this map; use it for interesting effects;
  • EmissiveMap: always output to the final buffer, without any attenuation or modulation. Useful for small LED-like lights, for example.

I'm also providing a custom effect file for use with 3D Studio Max. It's based on the NVIDIA effect shipped with that software, and should be used in a DirectX material. It's located at "LightPrePass/LightPrePassContent/LightPrePassFX.fx". It contains the same maps described above, so the content processor can match them into our renderer. If any of those maps aren't found, default textures are used; they are located in "LightPrePass/LightPrePassContent/textures/". If you want to import new models, remember to change the content processor to "LightPrePass Model Processor" and to set "Premultiply Texture" to false (normal maps carry information in the alpha channel that is not related to blending).

At this stage, a default effect is applied to all meshes: "LightPrePassContent/shaders/LPPMainEffect.fx". It contains two techniques: one that outputs depth and normals into the GBuffer, and another that reconstructs the material's shading. If you want different effects, just create new shaders that respect this rule.

Light Pre-Pass Renderer

The library contains only five classes:

QuadRenderer.cs

Used for rendering screen-aligned quads, such as full-screen effects or billboards.
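A minimal sketch of such a quad in XNA 4.0, assuming the vertex shader passes positions through in normalized device coordinates (the sample's QuadRenderer may differ in the details):

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

class FullScreenQuad
{
    // Two clockwise triangles covering the whole screen in NDC.
    private readonly VertexPositionTexture[] vertices =
    {
        new VertexPositionTexture(new Vector3(-1f,  1f, 0f), new Vector2(0f, 0f)),
        new VertexPositionTexture(new Vector3( 1f,  1f, 0f), new Vector2(1f, 0f)),
        new VertexPositionTexture(new Vector3(-1f, -1f, 0f), new Vector2(0f, 1f)),
        new VertexPositionTexture(new Vector3( 1f, -1f, 0f), new Vector2(1f, 1f)),
    };
    private readonly short[] indices = { 0, 1, 2, 2, 1, 3 };

    // Assumes the caller has already applied an effect pass on the device.
    public void Render(GraphicsDevice device)
    {
        device.DrawUserIndexedPrimitives(PrimitiveType.TriangleList,
            vertices, 0, 4, indices, 0, 2);
    }
}
```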

Camera.cs

Stores the minimum information needed for a rendering setup: world transform, projection transform and viewport. It is decoupled from any kind of controller, as in the MVC pattern. It also contains a bounding frustum, built from the camera's eye transform (the inverse of the world transform) multiplied by the projection matrix. And it contains a very important method that projects a bounding sphere into a clipping rectangle. This is very useful for rendering point lights: it gives us the smallest screen-aligned quad that fits the light's bounding sphere, saving a lot of pixel processing on unlit fragments.
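One way to build such a rectangle is sketched below. This is an approximation, and not necessarily the exact math used in Camera.cs: it projects the six view-space extremes of the sphere and bounds them in screen space, falling back to the full viewport when the sphere reaches behind the camera:

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

static class LightClipRect
{
    public static Rectangle FromBoundingSphere(BoundingSphere sphere,
        Matrix view, Matrix projection, Viewport viewport)
    {
        // Move the sphere's center into view space.
        Vector3 center = Vector3.Transform(sphere.Center, view);
        float minX = float.MaxValue, minY = float.MaxValue;
        float maxX = float.MinValue, maxY = float.MinValue;

        Vector3[] offsets =
        {
            Vector3.UnitX, -Vector3.UnitX,
            Vector3.UnitY, -Vector3.UnitY,
            Vector3.UnitZ, -Vector3.UnitZ,
        };

        foreach (Vector3 offset in offsets)
        {
            Vector4 clip = Vector4.Transform(
                new Vector4(center + offset * sphere.Radius, 1.0f), projection);
            if (clip.W <= 0.0f)          // extreme behind the camera:
                return viewport.Bounds;  // be conservative, use the whole screen

            // Perspective divide, then map NDC [-1..1] to pixel coordinates.
            float x = (clip.X / clip.W * 0.5f + 0.5f) * viewport.Width;
            float y = (1.0f - (clip.Y / clip.W * 0.5f + 0.5f)) * viewport.Height;
            minX = MathHelper.Min(minX, x); maxX = MathHelper.Max(maxX, x);
            minY = MathHelper.Min(minY, y); maxY = MathHelper.Max(maxY, y);
        }

        return new Rectangle((int)minX, (int)minY,
                             (int)(maxX - minX), (int)(maxY - minY));
    }
}
```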

Mesh.cs

Encapsulates a standard XNA model, plus a world transform and some methods for rendering. It assumes the effect is the one mentioned in the Content Processor topic (or a variation that follows its rules). We could optimize this class by storing the effect's parameters instead of fetching them from the dictionary every frame.
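That optimization could look like the sketch below; the parameter names are illustrative, not necessarily the ones used by LPPMainEffect.fx:

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

class MeshEffectParameters
{
    private readonly EffectParameter world;
    private readonly EffectParameter worldView;
    private readonly EffectParameter worldViewProjection;

    public MeshEffectParameters(Effect effect)
    {
        // Each indexer access searches the parameter collection by name;
        // doing it once at load time removes that cost from the frame loop.
        world = effect.Parameters["World"];
        worldView = effect.Parameters["WorldView"];
        worldViewProjection = effect.Parameters["WorldViewProjection"];
    }

    public void Apply(Matrix worldMatrix, Matrix view, Matrix projection)
    {
        world.SetValue(worldMatrix);
        worldView.SetValue(worldMatrix * view);
        worldViewProjection.SetValue(worldMatrix * view * projection);
    }
}
```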

Light.cs

Encapsulates a light's properties: radius, color, intensity (useful when you want some of the color channels brighter than 255), world transform and type. Right now only point lights are supported.

Renderer.cs

This is the main class. It is responsible for all the rendering steps, from creating the GBuffer up to resolving the final render target. On initialization, it loads the default effects (one for clearing the GBuffer and one for performing the lighting) and creates the GBuffer.

We have the following render targets, all of them created with the DiscardContents flag:

Depth (Single): stores the pixel's depth in linear space. It has a HW z+stencil buffer attached. It is part of the GBuffer;
Normal (R10G10B10A2): stores the pixel's view-space normal, encoded with a stereographic projection into the R and G channels; the blue channel stores the specular power. It is part of the GBuffer;
Light accumulation (RGBA64): stores the sum of all light contributions, with diffuse lighting in RGB and specular intensity in A;
Output (HDRBlendable): stores the final shaded image. It has a HW z+stencil buffer attached.
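A sketch of how these targets could be created during initialization (variable names are illustrative, not the sample's exact fields):

```csharp
RenderTarget2D depthRT, normalRT, lightRT, outputRT;

void CreateRenderTargets(GraphicsDevice device, int width, int height)
{
    depthRT = new RenderTarget2D(device, width, height, false,
        SurfaceFormat.Single, DepthFormat.Depth24Stencil8, 0,
        RenderTargetUsage.DiscardContents);

    normalRT = new RenderTarget2D(device, width, height, false,
        SurfaceFormat.Rgba1010102, DepthFormat.None, 0,
        RenderTargetUsage.DiscardContents);

    lightRT = new RenderTarget2D(device, width, height, false,
        SurfaceFormat.Rgba64, DepthFormat.None, 0,
        RenderTargetUsage.DiscardContents);

    outputRT = new RenderTarget2D(device, width, height, false,
        SurfaceFormat.HdrBlendable, DepthFormat.Depth24Stencil8, 0,
        RenderTargetUsage.DiscardContents);
}
```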

For each frame, we have the following steps:

  • Compute the frustum corners: I'm using the technique that combines the frustum corners with the depth buffer to recompute each pixel's view-space position, so I compute the corners once here, since the camera won't change during the frame;
  • Bind and clear the GBuffer, clearing the HW z-buffer too: we set the depth texture and the HW z to 1.0f, the farthest possible distance;
  • First geometry pass: render all opaque meshes, outputting depth and normals;
  • Resolve the GBuffer, bind the light accumulation render target and clear it to black. I'm using a 16-bit-per-channel render target here, so we have some room to play with precision and light intensities;
  • For each light in the scene, compute the smallest screen-aligned quad that fits it, recompute the frustum corners for this smaller quad, apply the light parameters (color, intensity, radius) and draw it with additive blending;
  • Resolve the light accumulation render target, bind the output render target and clear it;
  • Second geometry pass: render all opaque meshes again, now with the effect that reconstructs each object's shading from the light accumulation render target. Since the z-buffer is filled again at this point, we can also draw unlit geometry (backgrounds, skyboxes) and transparent objects like trails, flares, fire, explosions, etc.;
  • Resolve the output render target and return it to the main application.
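Putting it together, a skeletal version of this per-frame flow, reusing the targets created above (RenderToGBuffer, ReconstructShading, DrawLightQuad and ClearGBuffer are illustrative stand-ins, not necessarily the sample's exact API):

```csharp
public RenderTarget2D RenderScene(GraphicsDevice device, Camera camera,
    List<Light> lights, List<Mesh> meshes)
{
    // Steps 1-3: bind and clear the GBuffer, then output depth + normals.
    device.SetRenderTargets(depthRT, normalRT);
    device.DepthStencilState = DepthStencilState.Default;
    ClearGBuffer(device);                       // quad sets depth texture to 1.0f
    foreach (Mesh mesh in meshes)
        mesh.RenderToGBuffer(camera);

    // Steps 4-5: resolve the GBuffer, then accumulate the lights additively.
    device.SetRenderTarget(lightRT);
    device.Clear(Color.Black);
    device.BlendState = BlendState.Additive;
    device.DepthStencilState = DepthStencilState.None;
    foreach (Light light in lights)
        DrawLightQuad(device, light, camera);   // clipped screen-aligned quad

    // Steps 6-7: resolve the light buffer, then reconstruct the final shading.
    device.SetRenderTarget(outputRT);
    device.Clear(Color.Black);
    device.BlendState = BlendState.Opaque;
    device.DepthStencilState = DepthStencilState.Default;
    foreach (Mesh mesh in meshes)
        mesh.ReconstructShading(camera, lightRT);

    // Step 8: resolve the output and hand it back to the application.
    device.SetRenderTarget(null);
    return outputRT;
}
```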

Example Project

There is a simple project that uses the renderer. The LightPrePass class is the main class: it loads a model (the lizard from the XNA normal-mapping sample) and assigns it to our Mesh class. There are two helper classes: CameraController, which handles user input and moves the camera around the scene, and MovingLight, a simple class that constantly updates a light's position. You can toggle the GBuffer output, the lights' movement and the lights' screen quads using your Xbox controller (if it's plugged into your PC) or your keyboard.

Here we can see the screen-aligned clipping quads in action: notice how many pixels are skipped for each light thanks to this optimization.

Here we can see the GBuffer: the first image is the normal buffer in view space (it has some odd colors, since the normal is encoded in the RG channels and the specular power is stored in B). The second is the depth buffer, and the third is the light accumulation buffer; due to our trick to avoid clamping, its intensity is barely perceptible in this window. Note the emissive map in action on the lizard's spikes.

Conclusion

I'm very satisfied with the results. The renderer can handle lots of lights, deals with a good variety of materials as well as transparent objects, and is still a very simple class. I hope to see some suggestions, improvements and derivative samples built on this little example.

Further Improvements

I already have some of the following topics done in my engine, so I can implement them here easily if there are enough requests. Anyway, here are some interesting directions to continue from this point:

  • Convert it into an Xbox 360 project (I don't actually own one, sorry);
  • Profiling/optimizations;
  • Skinned pipeline;
  • Tile-based light culling;
  • Shadows, SSAO;
  • Different light types;
  • Effects that need the final buffer + depth: glass, water, heat haze;
  • Use an effect pre-processor to optimize shaders that don't need specular color/emissive/normal maps, or to choose between different ambient or shading-reconstruction techniques.

See you next time!

Coluna

References

http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html

http://mquandt.com/blog/2010/03/light-pre-pass-round-2/

http://forums.create.msdn.com/forums/t/70326.aspx

http://forums.create.msdn.com/forums/t/26870.aspx

http://create.msdn.com/en-US/education/catalog/sample/normal_mapping

http://aras-p.info/texts/CompactNormalStorage.html

http://mynameismjp.wordpress.com/2009/03/10/reconstructing-position-from-depth/

http://msdn.microsoft.com/en-us/library/bb447672%28v=xnagamestudio.10%29.aspx

http://www.catalinzima.com/tutorials/deferred-rendering-in-xna/creating-the-g-buffer/

http://developer.valvesoftware.com/wiki/Half_Lambert

http://www.gamasutra.com/view/feature/2942/the_mechanics_of_robust_stencil_.php?page=6



35 Responses to XNA 4.0 Light Pre-Pass

  1. DrJBN says:

    Informative post

  2. Pingback: XNA 4.0 Light Pre-Pass « Sgt. Conker

  3. z0r says:

    Nice description of the light pre-pass. I currently use deferred shading and am wondering whether I need a pre-pass.
    Can you comment on the speed? How does rendering the geometry twice (in light pre-pass) compare to rendering it once in deferred shading?
    Furthermore, I would like to see it with shadows :).

  4. Eclectus says:

    Nice :)

    Have you considered saving fill rate on the point lights by drawing spheres (with inverted winding order, so you always draw the inside of them and they don't disappear when you go inside them) or octagonal sprites? The area of a circle is about 79% of the area of a square whose breadth equals the circle's diameter, meaning a 21% saving in fill rate for the lights. :)

    • jcoluna says:

      Yes, you are right! The only problem is that reconstructing the depth will be a bit more expensive using a sphere mesh, but I'll give it a shot in the optimization step. I considered octagonal sprites, but I chose quads for simplicity :p
      Thanks!

      • Eclectus says:

        No problem. I think quads were the right first choice; it's what I would have done too, to avoid premature optimization and have a 'tracer bullet'. :)

  5. Pingback: TECH :: XNA LIGHT PRE-PASS « Game Developers

  6. Pingback: XNA 4.0 Light Pre-Pass « Sgt. Conker

  7. Great post :)

    I’ve been dreading updating my XNA 2.0 LPP renderer but I’ve got no excuse now, thanks for an excellent resource!

  8. Peter says:

    Good work, and thanks for sharing your code. I'm really interested in seeing a speed comparison with a deferred renderer. I managed to write my own from a lot of other components: I had CPU geo-clipmapped terrain, CSSM, DOF, and post-process water + light shafts, everything with a deferred renderer, but on the Xbox it is very slow. On my laptop with an ATI 4670 the performance is quite good.

  9. xna says:

    Thanks for sharing your code. Would you mind if I translated this post into Russian?

  10. Pingback: Windows Client Developer Roundup 056 for 1/24/2011 - Pete Brown's 10rem.net

  11. Evan says:

    Nice write-up.

    I still don't think this (like regular deferred shading) is a great technique for XNA/Xbox 360, though, because of the loss of depth information when you unbind render targets. You're drawing screen-space quads for each light, which for large point lights are effectively full-screen quads, and doing the light accumulation calculation for pixels that may not even be affected by that light in the scene. If you still had the depth buffer at this point, you could draw 3D bounding geometry for the lights (as Eclectus mentioned) and use the front-faces/back-faces trick with the stencil buffer to only "shade" pixels in the scene that were actually touched by that light.

    An interesting comparison would be your current implementation with some large lights in the scene versus one where you recreate the depth buffer (the best option we have in XNA, unfortunately) before the light accumulation pass and then use 3D bounding volumes + stencil masking. I think the most efficient way to recreate the depth buffer, rather than using clip(), would be to bind your original depth texture and draw a full-screen quad with color writes disabled, outputting to the DEPTH semantic.

    • jcoluna says:

      I'm implementing the sphere-mesh technique right now; I will post the results as soon as I have them. Do you know a good technique for measuring elapsed drawing time in XNA? I tried profiling with the XNA performance measurement sample (http://create.msdn.com/en-US/education/catalog/sample/performance_sample), but it doesn't seem to work very well when it comes to GPU time. See ya!

      • Evan says:

        Even by drawing 3D meshes for the lights, without the depth buffer you lose the ability to optimize for situations like when the light is behind a wall and doesn’t actually affect anything on screen. Shawn’s presentation here: http://www.talula.demon.co.uk/DeferredShading.pdf covers a lot of these tricks for optimizing convex light hulls.

        I’d be curious to know if the benefits from these types of optimization would be worth the cost of manually recreating the depth buffer from your original depth texture.

  12. Pingback: Light Pre-Pass Rendering in XNA « gpubound

  13. Zoki says:

    Awesome! Thank you.

  14. Kdoto says:

    I didn't think those texture formats were available on the Xbox 360; perhaps that changed with XNA 4. Are you able to run this sample on the Xbox?

  15. System says:

    How can I load my own 3D model? Could you write a little tutorial? I have a little problem when loading my model: it doesn't appear at all.

    • jcoluna says:

      Try loading the existing FBX into your 3D DCC program (Blender, 3ds Max, whatever), just to compare the scales. Sometimes objects get exported at odd sizes/positions. Also check that the shader you are using is the one I provided with the code. I can do a quick tutorial (after June 14th), but it should be as easy as in the sample.
      -J.Coluna

      • System says:

        Thanks for the reply, problem solved. I had some problems with the shader; now it works fine. This is really the best lighting example code I've ever seen.

  16. troll4eg says:

    I need to draw models without lighting in the lit scene, e.g. a skinned model without any shaders + the lit scene. Is that possible?

  17. jcoluna says:

    Yes, it's possible. You have to assign a different shader to the model and draw it after the "reconstruct lighting" stage. That model probably doesn't need to output normals/depth in the first stage, so you can skip it. I will post a new entry about transparency, and also custom shaders.

    See ya!
    J.Coluna

  18. Pingback: Light pre pass en xna « Aprendiendo XNA

  19. jaimo says:

    Hi. Thanks for the excellent job.
    I just wanted to ask how, at this point, I can render transparent objects. Does it depend on the alpha channel of the textures?

    • jaimo says:

      And can I add custom effects (from effect files) to the objects in my scene the same way I did with forward rendering?
      Sorry to disturb you, but I'm new to programming.

      Thanks again.

  20. Pingback: New Rendering Engine | codingnick

  21. Pingback: Dev Diary #1 « Nick the Coder

  22. Pingback: Simple Shadow Mapping « Nick the Coder

  23. Hello, I think your website might be having browser
    compatibility issues. When I look at your blog site in Safari,
    it looks fine but when opening in Internet Explorer, it has some overlapping.
    I just wanted to give you a quick heads up! Other than that, very good blog!

  24. Joel Utting says:

    I am trying to get LPP working on dynamically created models like the ones in the Creators Club primitives example, but the dynamic model is not lighting correctly.

    This is displaying the tree from your tutorial (which works; I upped the light intensity) and, behind it, my dynamically created cube. I've tried everything I can think of. Would there be some trick to doing this?

    • jcoluna says:

      Sorry my friend, I didn’t get any notification about your question when you posted it.
      Maybe the tangents are missing in your models, or even the normals (or both). Hope it helps!
      -J.Coluna
