Parallel-Split Shadow Maps on Programmable GPUs

(To appear in GPU Gems 3)

   
   

Fan Zhang (1), Hanqiu Sun (1), Oskari Nyman (2)

1. Dept. of Computer Science & Engineering, The Chinese University of Hong Kong
   Email: {fzhang, hanqiu}@cse.cuhk.edu.hk

2. Dept. of Computer Science & Engineering, Helsinki University of Technology
   Email: oskari.nyman@hut.fi

 

 

Disclaimer: Due to copyright restrictions, we have not yet released our GPU Gems 3 paper (including the accompanying source code). Please check this page later for updates. This page is maintained by Fan Zhang. If you have any questions or suggestions, please contact me at fzhang@cse.cuhk.edu.hk. This page contains a few HTML background images (purely decorative :) downloaded from NVidia.com. If you find any information that conflicts with your copyright, please let me know at your convenience.
 
Personal Advertisement:)

I expect to graduate in September 2007 and am currently looking for a graphics-related job or postdoc position. Any helpful information or suggestions will be appreciated. For your review, please refer to the following materials:

Curriculum Vitae (.pdf version, .doc version)
Personal Profile (link to my profile at Monster.com)
Research Statement (upon request)
3D art from my publications (upon request)

If you are interested in my background, please feel free to email me at fzhang@cse.cuhk.edu.hk. I will gladly provide any supplementary materials for your evaluation.

 

 

Figure: PSSM(3; 1Kx1K) vs. SSM(2Kx2K) in Dawnspire: Prelude (panels: Overview, Shadow Qualities, Shadow Maps). PSSM(3; 1Kx1K) uses three 1Kx1K shadow maps instead of the single 2Kx2K shadow map used in standard shadow mapping.

 

 

Abstract

Shadow mapping is well known for its generality and efficiency, and it has therefore been extensively employed for real-time shadow rendering in diverse applications. However, it suffers from an inherent aliasing problem due to its image-based nature. In this paper, we present the implementation details of Parallel-Split Shadow Maps (PSSMs) on programmable GPUs. PSSMs split the view frustum into parts using planes parallel to the view plane and then generate a shadow map for each part. We propose a fast and robust splitting strategy, based on an analysis of shadow map aliasing, that results in a moderate aliasing distribution over the whole depth range. By applying a geometry approximation to each split part instead of to the entire scene, we obtain tighter bounding shapes for the visible objects, which improves the utilization of the shadow map resolution. On DirectX 9-level GPUs, we develop hardware-accelerated processing that, when synthesizing scene shadows, eliminates the rendering passes in excess of those required by standard shadow mapping. We also propose a fully GPU-based implementation on DirectX 10-level hardware, in which only a single rendering pass is required for generating the PSSMs and a single pass for rendering the scene shadows.

 

Download  (continuously updated)

 

Related Papers

1. "Parallel-split shadow maps for large-scale virtual environments". Project Page. In ACM VRCIA'06.

2. "Hardware-accelerated parallel-split shadow maps". To appear in International Journal of Image and Graphics. (coming soon...)

3. "Parallel-Split Shadow Maps on Programmable GPUs". To appear in GPU Gems 3. (will be posted later due to copyright restrictions)

Highlights: let me ask you a question: for the split scheme PSSM(m), in which the view frustum is split into m parts, how many rendering passes do you need to produce the shadow maps and the scene shadows? Without hardware acceleration, the answer is m+m. Using a pixel shader on DX9-level hardware, the number of rendering passes is m+1, as explained in our VRCIA paper. On DX10-level hardware, however, we eliminate ALL extra rendering passes when using PSSMs: the number of rendering passes is 1+1! Although I can't explain it in detail right now (due to copyright restrictions), I hope this exciting result will motivate some of you to study novel methods to achieve this goal. To really experience the power of PSSMs, try to get into the new DX10 features as soon as possible :)
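The per-pixel split selection behind the m+1 figure can be illustrated with a minimal Python sketch that emulates on the CPU what a pixel shader would do. It assumes the final pass has access to all m shadow maps and picks one per pixel by comparing the pixel's view-space depth against the split distances C_0..C_m. The function name select_split and the split values are hypothetical, and the actual shadow-map lookup and depth comparison are omitted.

```python
import bisect

def select_split(view_depth, splits):
    """Return the index k of the split part [C_k, C_{k+1}] containing view_depth.

    splits holds the m+1 split distances C_0..C_m (C_0 = near, C_m = far).
    Returns None for depths outside the view frustum. A depth lying exactly
    on an interior boundary C_k is assigned to the farther part k.
    """
    if view_depth < splits[0] or view_depth > splits[-1]:
        return None
    k = bisect.bisect_right(splits, view_depth) - 1
    # The far plane itself belongs to the last part, index m-1.
    return min(k, len(splits) - 2)

# Hypothetical split distances for PSSM(3) with n=1, f=1000 (logarithmic scheme).
splits = [1.0, 10.0, 100.0, 1000.0]
print([select_split(d, splits) for d in (0.5, 5.0, 10.0, 500.0, 1000.0)])
# [None, 0, 1, 2, 2]
```

Because every pixel chooses its own shadow map in this way, the scene only has to be drawn once with shadows, instead of once per split.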

     

 

Related Implementations

1. DirectX9 implementation, documents, developed by Oskari Nyman. (***Strongly recommended!***)

Highlights: this demo might be the most popular PSSM implementation on the internet so far.
Requirements: Visual C++ Express 2005 with Microsoft Platform SDK + DirectX 9 SDK

 

2. OpenGL implementation, documents, developed by Jeroen Put.

 

3. XNA implementation; see here for more details. (Many thanks to the author; all copyrights are reserved by the original author.)

 

4. OpenSceneGraph-based implementation (weblink)

             Terry Welsh's implementation documents, extra documents
             Adrian Egli's implementation, documents 

 

5. DirectX10 Implementation (will be posted later...)

Highlights: this implementation is the accompanying demo to our GPU Gems 3 paper. It implements three methods, 1) without hardware acceleration, 2) with DX9-level hardware acceleration, and 3) with DX10-level hardware acceleration, in both DirectX and OpenGL.
Requirements: Windows Vista + Visual Studio 2005.

      

screenshots from our DX10 implementation

 

6. PSSM+VSM (Variance Shadow Maps): see how this combination produces fantastic soft shadows; refer to here for more details.

     Download Demo, developed by Andrew Lauritzen. (***Strongly recommended!***)

Requirements: Windows Vista (for D3D10) + a D3D10-capable video card + DirectX Redist April 2007 + Visual C++ 2005 Redistributable Package.

    

7. A demo from the GODZ engine, developed by Richard Osborne.

     Download Demo        Documents

Highlights: This demo shows the implementation with a third-person camera. A detailed tutorial on this kind of implementation and other optimization techniques will be posted soon!
Notes from the author: This version requires DirectX 9.0c and a card that supports Shader Model 2.0+. This demo has collision detection & response, parallel-split shadow mapping (using three 1024x1024 shadow maps) + 16-sample PCF for soft shadows, and a multithreaded renderer. This is a rough alpha, not yet optimized for best performance. All shadow map parameters are exposed in the Default.lua config file.
Notes from me: Some of you may see the error message "This application has failed to start because d3dx9d_32.dll was not found. Re-installing the application may fix the problem." Please download d3dx9d_32.dll HERE and put it into "c:\windows\system32".

    

8. The OGRE 3D-based PSSM implementation.

     Weblink to the implementation details, developed by Rvkennedy.

Note: No source code yet.

    

 

 

Related Demos & Images

1. Video (low-resolution .wmv, ~24 MB) from our GPU Gems 3 paper.

     Erratum: "PSSM(2; 512x512)" in the video should read "PSSM(3; 512x512)".

2. Video (~30 MB) from the PSSM+VSM demo.

3. Video (~10 MB) from Hammer Engine.

4. Video (~5 MB) from Phoenix Engine.

5. See "VPSSM Shadows" on the page here.

6. Video from Dawnspire: Prelude (http://www.dawnspire.com).

     

More images included in our GPU Gems 3 paper will be posted later...

 

Games/Projects/Engines using PSSMs  (please contact me if you want to be the next one!)

Dawnspire: Prelude, courtesy of Silent Grove Studios.
The most popular DX9 PSSM demo, courtesy of Oskari Nyman.
OpenGL PSSM demo, courtesy of Jeroen Put.
VSM+PSSM on DX10, courtesy of Andrew Lauritzen.
Hammer Engine, courtesy of Sepehr Taghdissian.
GODZ Engine, courtesy of Richard Osborne.
Blade3D PSSM Shadows, courtesy of Blade3D.
Killzone 2: a talk on the PSSM implementation in Killzone 2, presented by Michal Valient.
Phoenix Engine

 

Further Research & Miscs

 

Question: What are the differences between PSSM and CSM (Cascaded Shadow Maps)?

Answer: Nothing comes from nowhere, and PSSMs are no exception. The idea of using multiple shadow maps was introduced by Tadamura et al. in 2001 ("Rendering optimal solar shadows with plural sunlight depth buffers"), further studied by Lloyd et al. in 2006 ("Warping and Partitioning for Low Error Shadow Maps"), and also implemented as cascaded shadow mapping in Futuremark's benchmark application 3DMark 2006.
        
        PSSMs better handle two major problems shared by all these algorithms:
        
         a) How to determine the split positions?

         For this issue, we proposed the practical split scheme to achieve a better tradeoff between theory and practice; see our paper for more information. Of course, the split positions can sometimes also be precomputed or adjusted manually. Let me share my personal experience of developing a "practical" split scheme. When I first saw the nice paper "Light Space Perspective Shadow Maps" (EGSR'04), I thought the logarithmic split scheme might be the best choice. This split scheme was later implemented in "Warping and Partitioning for Low Error Shadow Maps" (EGSR'06) and "Logarithmic Shadow Maps" (SIGGRAPH'06 sketch). However, as explained in our paper, exactly simulating this split scheme on discrete buffers, in order to obtain the theoretically even distribution of perspective aliasing over the whole depth range, does not work well. In practice, the logarithmic split scheme usually produces an "over-strong" split effect. For example, for n=1 and f=1000 in PSSM(3), the logarithmic scheme places the first split at n(f/n)^(1/3) = 10, so the first split part occupies only about the first 1% of the depth range! Since this conclusion may be somewhat "subjective", I strongly recommend trying all three split schemes (uniform, logarithmic, practical) and drawing your own conclusions. In general, the practical split scheme is more flexible in most cases.
        
         b) How to alleviate the performance drop caused by multiple rendering passes?
         In our GPU Gems 3 paper, we discuss this issue thoroughly. For the split scheme PSSM(m) (the frustum is split into m parts), the numbers of rendering passes 1) without hardware acceleration, 2) with DX9-level hardware acceleration, and 3) with DX10-level hardware acceleration are 2m, m+1, and 1+1, respectively. In particular, our DX10 implementation eliminates all extra rendering passes compared with the standard shadow mapping approach. For more details, see the upcoming book GPU Gems 3 and its accompanying source code.
         A common misunderstanding on this issue: some people think that PSSM(m) "always" needs m+m rendering passes, probably because the most popular PSSM implementation, provided by Oskari Nyman, works that way (the most popular implementation is not necessarily the most efficient one). This is wrong! In our ACM VRCIA'06 paper, we already implemented accelerated PSSM rendering using a pixel shader, which reduces the number of rendering passes to m+1.
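The three split schemes compared in a) above (uniform, logarithmic, practical) can be sketched as follows. This is a minimal illustration assuming the practical scheme is a weighted blend of the logarithmic positions C_i = n(f/n)^(i/m) and the uniform positions C_i = n + (f-n)i/m; the function name split_positions and the blend weight lam (including its 0.5 default) are hypothetical choices for the example. It also reproduces the n=1, f=1000, PSSM(3) case discussed in a).

```python
def split_positions(n, f, m, scheme="practical", lam=0.5):
    """Split distances C_0..C_m for PSSM(m) over the depth range [n, f].

    scheme: "uniform", "log" or "practical".
    lam:    weight of the logarithmic term in the practical scheme
            (assumed blend; lam=1 gives "log", lam=0 gives "uniform").
            The default 0.5 is only an illustrative value.
    """
    if not (0.0 < n < f) or m < 1:
        raise ValueError("need 0 < n < f and m >= 1")
    splits = []
    for i in range(m + 1):
        c_log = n * (f / n) ** (i / m)   # logarithmic split position
        c_uni = n + (f - n) * i / m      # uniform split position
        if scheme == "log":
            c = c_log
        elif scheme == "uniform":
            c = c_uni
        elif scheme == "practical":
            c = lam * c_log + (1.0 - lam) * c_uni
        else:
            raise ValueError("unknown scheme: " + scheme)
        splits.append(c)
    return splits

n, f = 1.0, 1000.0
log_splits = split_positions(n, f, 3, scheme="log")
print([round(c, 2) for c in log_splits])                 # [1.0, 10.0, 100.0, 1000.0]
# Fraction of the depth range covered by the first logarithmic split part:
print(round((log_splits[1] - n) / (f - n), 4))           # 0.009
print([round(c, 2) for c in split_positions(n, f, 3)])   # [1.0, 172.0, 383.5, 1000.0]
```

With lam = 0.5 the first practical split moves from 10 (logarithmic) out to 172, softening the "over-strong" logarithmic split while keeping denser splits near the viewer than the uniform scheme (334).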

         Furthermore, here are a few other differences; this part will be updated as our research progresses. First, constructing the light frustum for each split is not trivial. We use a consistent method for both spot and directional lights. Everyone knows the basic idea, but this step must be implemented carefully and optimized. Second, we proposed "geometry approximation" (called "scene-dependent projection" in GPU Gems 3) to optimize the use of each shadow map's resolution.
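For the simplest case, a directional light with an orthographic projection, fitting a light frustum to one split part can be sketched as follows. This is a minimal sketch under stated assumptions, not the method from our paper: the helper names are hypothetical, the light's view matrix is taken to be the identity (the split corners are treated as already expressed in light space), and the depth range is fitted to the split corners only, without extending it toward shadow casters outside the split as a scene-dependent projection would. The matrix has the scale/offset form of the crop matrix built from min and max vectors.

```python
import math

def split_frustum_corners(near, far, fovy_deg, aspect):
    """Eight corners of the split part [near, far] in view space.

    Assumes a symmetric perspective camera at the origin looking down -z.
    """
    t = math.tan(math.radians(fovy_deg) / 2.0)
    corners = []
    for d in (near, far):
        h = d * t        # half-height of the frustum slice at distance d
        w = h * aspect   # half-width of the frustum slice at distance d
        for sx in (-1.0, 1.0):
            for sy in (-1.0, 1.0):
                corners.append((sx * w, sy * h, -d))
    return corners

def crop_matrix(points):
    """4x4 scale/offset matrix (column-vector convention) mapping the
    axis-aligned bounding box of `points` to x, y in [-1, 1] and z in [0, 1].
    Requires a non-zero extent on every axis.
    """
    xs, ys, zs = zip(*points)
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    min_z, max_z = min(zs), max(zs)
    sx = 2.0 / (max_x - min_x)
    sy = 2.0 / (max_y - min_y)
    sz = 1.0 / (max_z - min_z)
    ox = -0.5 * (max_x + min_x) * sx
    oy = -0.5 * (max_y + min_y) * sy
    oz = -min_z * sz
    return [[sx, 0.0, 0.0, ox],
            [0.0, sy, 0.0, oy],
            [0.0, 0.0, sz, oz],
            [0.0, 0.0, 0.0, 1.0]]

def apply_matrix(m, p):
    """Transform a 3-D point by a 4x4 affine matrix (w is assumed to stay 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(3))

corners = split_frustum_corners(1.0, 10.0, 60.0, 1.5)  # hypothetical split part
crop = crop_matrix(corners)  # corners treated as already being in light space
mapped = [apply_matrix(crop, p) for p in corners]
# Up to rounding, mapped x and y now span [-1, 1] and z spans [0, 1].
```

In a real renderer the corners would first be transformed by the light's view matrix, and the crop matrix would then be combined with the light's projection before rendering that split's shadow map.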

Question: In which directions can we further improve PSSMs?

Answer: PSSM+VSM (variance shadow maps) and PSSM/CSM+LiSPSM (light space perspective shadow maps) might be good ways to further improve PSSMs. A good example of PSSM/CSM+LiSPSM is the commercial game Lost Planet; refer to the discussion here for more information (use Ctrl+F to find "Fan Zhang" on the page). For your convenience, I post the PSSM/CSM-related part of the discussion below:
         
              

         
Furthermore, since the shadow-map alignment depends on the light-view configuration, shadow boundaries may "flicker" as the viewer moves. See the movie here (.avi) for a demonstration of this problem. When the shadow map size is over 1024 and more than 3 splits are used, this phenomenon can be nearly eliminated with PCF/VSM. Another possible solution is used in Terry Welsh's OpenSceneGraph-based implementation, which moves the light projection in x and y in texel-sized increments, so the aliasing pattern always stays the same. However, when the size of the projection changes (i.e., the min and max vectors in the crop matrix in Oskari Nyman's DirectX9 implementation), the problem still occurs. In any case, as mentioned above, this issue is barely noticeable with PSSM+PCF/VSM.
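The texel-snapping idea can be sketched per axis in Python. This is a minimal illustration, not Terry Welsh's actual code: the hypothetical helper snap_window rounds one axis (x or y) of a light-space crop window down to a fixed texel grid, where shadow_map_size would be the shadow map resolution (e.g. 1024). Because the texel size is derived from the window width, the grid only stays fixed while the projection size stays fixed, which matches the limitation described above.

```python
import math

def snap_window(lo, hi, shadow_map_size):
    """Snap a 1-D light-space crop window [lo, hi] to a texel grid.

    shadow_map_size texels (>= 2) cover the window plus one texel of slack,
    and the lower bound is rounded down to a multiple of the texel size, so
    the returned window always contains [lo, hi].
    """
    texel = (hi - lo) / (shadow_map_size - 1)   # light-space size of one texel
    lo_snapped = math.floor(lo / texel) * texel
    return lo_snapped, lo_snapped + shadow_map_size * texel

# Sub-texel camera motion leaves the snapped window unchanged...
print(snap_window(0.25, 10.25, 11))   # (0.0, 11.0)
print(snap_window(0.75, 10.75, 11))   # (0.0, 11.0)
# ...while a whole-texel motion shifts it by exactly one texel.
print(snap_window(1.25, 11.25, 11))   # (1.0, 12.0)
```

Since shadow-map texels always land on the same light-space grid while the window size is constant, the rasterized shadow edges no longer shift by fractions of a texel from frame to frame.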
         
The last issue might be performance. Given the tremendous advances in modern GPUs, PSSMs can achieve high frame rates in practical applications even without hardware acceleration. In our GPU Gems 3 paper and implementation, we proposed acceleration methods for DX9-level and DX10-level GPUs; please check back on this page later. In summary, I don't think performance is a problem for PSSMs on modern GPUs.

 

Acknowledgements

All screenshots are from Dawnspire: Prelude (http://www.dawnspire.com), courtesy of Silent Grove Studios®. Thanks to Anders Hammervald (anders@hammervald.com) for his kind help in preparing all the images. Many thanks to the volunteers who have implemented and further researched our PSSM algorithm.

 

Last update: 04-May-2007
Copyright 2003-2007 Fan Zhang. All rights reserved.