Der Schmale – Real-time 3D programming

Putting my Helix 3D engine code online (JS/WebGL)

David — Thu, 20 Jul 2017 19:29:52 +0000

It’s been a while! Over the past couple of years, I’ve been working off and on (and more off than on) on a playground 3D engine called Helix. Starting as a WebGL port of a personal C++ engine I did and after rewriting it umpteen times, it ended up being a platform I do like to play around with. Also, the term “for shits and giggles” comes to mind.

At this point, it comes with a long list of disclaimers. It’s not optimised very much, the API is probably a bit different from what you’d expect, and most importantly: I made it for me, not you ;) I have no interest or motivation to compete with existing 3D engines and offer support for things that I’m not personally working on myself.

But in the spirit of sharing, I decided to make the code public. This way, I can more easily put some shader experiments online and as such it may serve an educational purpose. (Much like my Flash-based engine “Wick3d” from 10 years ago ;) )

Anyway, the code and some documentation are here:

Github (code + wiki)
Class reference

And some examples as coded by a coder (not all are optimised):

Some for desktop only (using WASD + mouse interaction), not optimised at all!

And finally, a tongue-in-cheek nod to my partners in crime Frank Reitberger and Nicolas Barradeau!

Amazing Horse!

But more about those guys at some other time soon :)

Project: WebGL Porsche 911 Showcase

David — Sun, 18 Oct 2015 14:52:41 +0000

I don’t really get to post much about actual projects for a couple of reasons. My work is usually behind-the-scenes graphics coding, which typically results in posts about the techniques rather than the projects themselves. In my last project, a showcase project for the new Porsche 911 with the German agency UDG, I was the user of a 3D engine for a change. Focusing on how things look rather than how things work was a nice change of pace. Furthermore, I was lucky to work together with two close friends: Frank Reitberger, taking the reins of our sub-team and catching the inter-team blows, and Simo Santavirta who worked on a lot of the playful background stuff, animations, and so on. Maybe they’ll put blog posts online about their parts, but I’ll just focus on my contributions here.

First of all: check out the project here!

My tasks (the ones I want to talk about anyway, no one cares about 100 iterations of model imports and texture compression) were mainly shader and engine-oriented: materials, reflections, etc. The engine in question is Mr. Doob’s ever popular Three.js. In what follows, I’ll explain some of the things I did in the project in words and concepts, not code. If anyone wants to know more about some aspect or other, just let me know.

Project overview

The project itself is a 5-chapter showcase for Porsche’s latest 911 models, showing off some facets they seem to be pretty proud of: design, performance (showing off the engine), driving (showing off the wheels/axles), some weird things the headlights do when turning, all that jazz! Parts of the site also had to run on newer mobile devices.

There were a couple of immediate challenges and we had to give the 3D modellers a really hard time to get poly & draw call count down as much as possible, as well as the amount and sizes of the textures.

(Oh, and I’ll admit it, I know nothing about cars. I don’t even have a driver’s license, nor do I want one. So yeah, most communication happened as “that springy thingy” or “that punchy thing inside the engine”. Since UDG is a German company, I did pick up on some great vocabulary. The winner? “Auspuff”, meaning “exhaust pipe” :’) Anyway… moving on.)

Custom work is more fun

Most of the work we had to do, even if you can’t tell by looking at it, required a degree of custom work. We hacked the three.js codebase in places in order to splice in our changes (I can’t say I generally like being limited to out of the box stuff, and neither should you). The materials were all custom-built so we had full control over lighting models, which type of lights to use depending on the material, baked maps, custom reflections, and weird animation code.

Lens flares not cut off by geometry

To give an example of the less obvious: small lights, lens flares or highlighted car parts are made by quads that always point towards the camera. When done manually with default code, these quads would intersect with the car’s geometry and not be visible (unlike an actual flare which scatters inside the lens). So the quad should be in front of regular geometry, but still fade out depending on how much geometry occludes the light itself. Rather than doing expensive occlusion tests like a default lens flare (there is code for that in the examples repository), we managed to make these things work by changing the vertex shader’s depth value and some algebra. It’s not perfect, some cut-off still occurs, but it works well enough given some patience to tweak the numbers since it’s not a complex occlusion situation (and it’s much more performant).

Similar tricks were used to get some of the transition animations to work: changing wheels in the showroom required some depth buffer trickery to make them morph into each other nicely.

Materials

Most of the material shaders were built keeping physical plausibility in mind. Given the limitations of WebGL and not being able to use some extensions, we couldn’t go all the way with this. There were no floating point textures, so no HDR to work with; we solved some things by, for example, simply scaling environment map values. All of the materials do have fresnel-based BRDFs with normalized distribution functions (we mostly avoided geometric self-shadowing or foreshortening terms for performance reasons). Expecting limited overdraw, we used Three’s forward renderer which gave us a lot of flexibility to tweak lighting models and materials as required for the surfaces. The scene was relatively static, so all shadows are just baked light- and ambient occlusion maps.

All materials except for the very rough ones (where it would be a nearly invisible waste of resources) use an environment map. We couldn’t rely on the EXT_shader_texture_lod extension so a mip-chain to handle different roughnesses was out of the question. Instead, we settled for 3 separate environment maps. The largest one, for very smooth surfaces, was updated in real time to represent the actual environment. The two others, for different degrees of roughness, were baked convolved cube maps. These were generated using Knald’s Lys, a tool I’ve grown very fond of. When required, the environment map was assigned a size and position in the shader. That way, we could calculate where the reflection ray intersects the reflection cube, resulting in much more locally correct reflections, which is especially important for the many flat surfaces we were dealing with.
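For the curious, the "intersect the reflection ray with the reflection cube" idea can be sketched on the CPU. This is not the project's shader code, just a minimal illustration assuming the environment map is bounded by an axis-aligned box, the shaded point lies inside it, and the reflection direction is non-zero. The function and parameter names are made up for the example.

```python
def corrected_reflection_dir(pos, refl, box_min, box_max, box_center):
    """Return the cube map lookup vector for a locally corrected reflection.

    pos        -- shaded point, assumed to be inside the box
    refl       -- reflection direction (does not need to be normalized)
    box_min/max -- corners of the axis-aligned box the environment map represents
    box_center -- position the environment map was captured from
    """
    # Find the smallest positive t at which the ray leaves the box.
    t_exit = float("inf")
    for p, d, lo, hi in zip(pos, refl, box_min, box_max):
        if d > 0.0:
            t_exit = min(t_exit, (hi - p) / d)
        elif d < 0.0:
            t_exit = min(t_exit, (lo - p) / d)

    # Point where the reflection ray hits the box wall.
    hit = [p + t_exit * d for p, d in zip(pos, refl)]

    # Look up the cube map along the vector from the capture position to that hit point.
    return [h - c for h, c in zip(hit, box_center)]


lookup = corrected_reflection_dir(
    pos=(1.0, 0.0, 0.0), refl=(0.0, 0.0, 1.0),
    box_min=(-2.0, -2.0, -2.0), box_max=(2.0, 2.0, 2.0),
    box_center=(0.0, 0.0, 0.0))
print(lookup)
```

Without the correction, the lookup would simply use the reflection direction (0, 0, 1), no matter where on the surface the point sits; with it, points off-centre sample a shifted part of the map.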

Slight duo-tone effect (yellow/red) for colour depth.

The car paint has a GGX Trowbridge-Reitz specular distribution model to get nicer highlight tails that allow for a better soft metallic look. Normals are perturbed both with a normal map and a fleck texture to get some subtle metallic flecks in there. I had hoped to be able to spend more time on the actual metallic clear-coat shader, but instead I had to adapt what we already had to match a series of Photoshopped screenshots (I had forgotten this is how 2D-oriented people like to work ;) ). The diffuse paint model supports a fresnel-based multi-layered “douchebag” paint effect, but that actually turned out to be little used except to add some depth in the paint: there’s no actual douchebag paints in the showcase. What a pity! With some tweaking and subtle use, however, it sometimes even gives a slight impression of subsurface scattering, which is always a nice extra with car paint.

Other “solid” materials just use the normalized Blinn-Phong model with regular Lambertian diffuse scattering. The metallic materials of course just use specular reflections: at least an environment map and optionally including the scene lights. In this case, the albedo colour is used as the normal incident specular reflection colour. In the picture on the right, some are black metal (kvlt!), some are more regularly coloured, but all are metal. Apart from this, there are also optional self-occlusion maps that can be used to darken some of the reflections in niches.
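To make the difference between the two distribution functions mentioned above a bit more tangible, here is a small numeric sketch of the GGX (Trowbridge-Reitz) and normalized Blinn-Phong distributions. The roughness-to-exponent mapping (exponent = 2/alpha² - 2) is a commonly used convention that I'm assuming here for comparison; it is not necessarily what the project's shaders did.

```python
import math


def ggx_d(n_dot_h, alpha):
    """GGX / Trowbridge-Reitz normal distribution function."""
    a2 = alpha * alpha
    denom = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * denom * denom)


def blinn_phong_d(n_dot_h, alpha):
    """Normalized Blinn-Phong distribution, with an assumed roughness mapping."""
    exponent = 2.0 / (alpha * alpha) - 2.0
    return (exponent + 2.0) / (2.0 * math.pi) * n_dot_h ** exponent


alpha = 0.5
for n_dot_h in (1.0, 0.9, 0.7, 0.5):
    print(n_dot_h, ggx_d(n_dot_h, alpha), blinn_phong_d(n_dot_h, alpha))
```

With this mapping both functions have the same peak at the centre of the highlight, but GGX falls off more slowly away from it, which is the "nicer highlight tail" that suits the car paint.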

A bunch of metallic materials with different configurations.

Not quite metal, not quite plastic.

I’ve been told the car rims aren’t actually metal, but they do seem to exhibit some definite metallic reflections. To get them to look convincing – but not quite chrome-like – we used a hybrid model. Basically, it’s a somewhat regular Blinn-Phong model with normal incidence reflections boosted, while reducing diffuse reflections based on the specular boost. Not very different from changing the “metallicness” value in something like Unreal.

Glass materials are mostly just environment maps using the Fresnel factor as alpha with normal alpha-blending. In the case of the car windows, there’s a layer that uses multiplicative blending to darken what’s behind it before applying the environment map in a second pass. It’s considerably more realistic than doing everything in one pass with default blending.

Most of the “special effect” materials such as the highlights are simply a flat colour with fresnel-based fall-off (think rim-lighting), and additive blending. There’s also some depth offset being applied to allow overdraw of near pixels while still preserving most occlusions.

Floor reflections

Reflections getting softer away from the floor

One of the most striking aspects of the original mood boards were the reflections of the car and the environment on the floor. Somewhat soft reflections as in real life: perfect reflections where the objects touch but getting blurrier the further away they are from the surface. Obviously we wanted to replicate this in the project as well. There is code out there to do planar reflections in three.js, but those result in perfect mirror-like reflections. To get what we wanted, we built our own reflection renderer, much like what I did for Away3D back in the day (see this) with some optimizations/omissions: our reflecting plane was always aligned with the XZ plane going through the origin without the camera ever crossing it. In other words: mirror the camera vertically and render the scene to a texture. To get the distance-based soft reflections working, we had to have all the materials output the fragment’s world space Y coordinate to the alpha channel.

An object close to the surface should not contribute to far object’s blur radius

Using the alpha value, we could calculate an approximate distance of the reflected point to the floor, which in turn could be used in the blurring stage. That blurring worked very much like a depth-aware blur. First, the central point is sampled to figure out how far that is from the floor. This distance is used to calculate the basic blur radius. Every point that’s then sampled within the blur radius has a weight calculated based on its own distance, so we can calculate a weighted average at the end. If we didn’t do this, objects close to the surface would be included in the blur of an object further away, which should not always be the case.
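A one-dimensional sketch of that weighted blur, for illustration only. The exact weighting used in the project isn't described here, so the rule below is an assumption: a neighbouring sample only counts if its own blur radius (derived from its own height above the floor) would reach the centre pixel. The names `radius_scale` and `max_radius` are hypothetical tuning parameters, and heights are assumed to be non-negative.

```python
def blur_reflection_row(colors, heights, radius_scale, max_radius):
    """Blur one row of a reflection texture using per-pixel floor distance."""
    out = []
    n = len(colors)
    for i in range(n):
        # The centre sample's height decides how far we search.
        radius = min(int(heights[i] * radius_scale), max_radius)
        total = 0.0
        weight_sum = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Assumed weighting: a neighbour contributes only if its own
            # blur would be wide enough to reach pixel i.
            own_radius = heights[j] * radius_scale
            weight = 1.0 if abs(j - i) <= own_radius else 0.0
            total += weight * colors[j]
            weight_sum += weight
        # The centre sample always has weight 1, so weight_sum is never zero.
        out.append(total / weight_sum)
    return out


# A bright pixel touching the floor next to dark pixels high above it.
print(blur_reflection_row([1.0, 0.0, 0.0], [0.0, 2.0, 2.0], 1.0, 2))
```

In the example the bright, floor-hugging pixel stays sharp and does not bleed into the blurred pixels of the higher object next to it.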

The final blurred texture is then used when rendering the floor itself, as with normal planar reflections, using the floor normals to perturb the sampled point a bit.

Last words

I’m not sure how useful a write-up like this is, as – again – it’s not something I get to do all that often. But at least I can show an actual project!

In the past year or so, I’ve gotten pretty comfortable with WebGL and while it always feels like a step back from all the amazing things you can do with desktop tech (what?! no compute shaders?!), the fun is also in the limitations themselves: finding cheap solutions or approximations with what you’ve got. I’ll never be a fan of Javascript – or even Typescript – tho ;)

Speaking at Reasons to be Creative 2015

David — Thu, 16 Apr 2015 10:52:50 +0000

Hey there!

A quick update to plug the fact that I’ll be speaking again at Reasons to be Creative in Brighton, September 7 to 9! As it’s one of my all-time favourite events, I’m stoked to be representing the real-time 3D graphics programming crowd (while being incredibly humbled by the other names on the bill)!

What can you expect from my talk? I haven’t settled on a topic for a full 100% yet, but if you follow my blog or have seen some of my previous talks, you should have an idea. Obviously, you can expect real-time 3D in some form or another. I’m playing with some ideas, but if anyone who’s coming has a specific request, I can try and incorporate it into the talk. If not, it’ll always be a good subject to talk about over a couple beers at night! :)

In any case, I’ll keep this post updated once I’ve submitted my session description, so check back later!

Hope to see you there!

Upcoming Talks: Crib Game Days & FITC Amsterdam

David — Wed, 17 Dec 2014 13:40:46 +0000

Hi everyone!

A quick update to plug a couple of events that have invited yours truly for a talk. Yes I know, it’s been about two years since my last real talk. I haven’t even been able to visit any conferences this year. I’m showing severe withdrawal symptoms and I’m ecstatic to be able to spend a couple of days in such fine company again. So check out:

Crib Game Days, January 23: Genk, Belgium
FITC Amsterdam, February 23/24: Amsterdam, The Netherlands

While different in size, both events have a great line-up, so be sure not to miss out on them! :)

The talk

The title of my talk will be “A Peek at the Future of 3D on the Web”. It will draw from having been quite intimate with its past and my experience with WebGL and its extensions. A while ago, I took it upon myself to take my playground DirectX engine Helix and create a JavaScript/WebGL version. Since the original Helix is DX11-based and relies on quite a few ‘modern things’, I wasn’t too worried about cross-platform functionality. If it ran well enough on both my desktop and laptop – both relatively capable machines – it’s all good. I might fix everything up and make it work “everywhere” if I have the time at some point, but that’s beside the point right now :) The Helix port will function as a sort of leitmotif throughout the presentation.

So obviously, the talk won’t be about how to build a WebGL game that runs reliably on all platforms; quite the contrary. Of course, there will be segments that show how to improve your rendering today, which I’ve found oddly lacking in existing projects/engines. I wouldn’t want you to go home without being able to apply anything directly, now would I? However, the main focus will be on what the future brings: either currently available through extensions (and hence might only work on a select number of devices) or what’s being proposed but currently only exists in e.g. OpenGL/DirectX.

Apart from all that…

I’ve been taking quite a long time off from paid work to focus on learning new things: brushing up on my JavaScript/WebGL, checking out Outracks’ UNO & Fuse Tools (looking promising, I might be showing some of this during the talk as well), some Unity (always a pleasure), and a brief foray into Python territory. Yes, watch me embellish my LinkedIn profile so headhunters can try and hire me for completely unrelated things! I’ve even started working on music again (gasp!), if you’re into post-rock/metal-ish sort of things ;)

Not sure if it counts as a sabbatical, but getting back to actual work (I gotsta get paid!) with new knowledge under my belt definitely feels gratifying. And hopefully, it will also lead to more blog posts in which I can actually show running examples ;)

Signing out!

– D

Unprojections Explained

David — Sun, 28 Sep 2014 16:32:10 +0000

Recently, one of the responses to the Reconstruction Positions […] post dealt with the unprojection of frustum corners. More specifically: with the inverted projection matrix and the final division by the $w$ coordinate. Being the lazy sod that I am on Sundays, I thought I’d quickly google it and paste a link with the explanation. Only one problem: I couldn’t find any decent articles! At least not within a reasonable amount of time, that is. I’m sure they’re out there somewhere ;) Most people asking “How do I unproject?” or “How can I get view space positions from a screen/mouse position?”* were told to check out existing open source code and copy it. That would indeed solve the issue at hand, but if you’re anything like me, you don’t like using code you don’t truly understand. So here’s my attempt to explain (Also… Mathematical rigour? What’s that? :) ).

*This question is also addressed in the earlier mentioned posts, but they’re geared toward shader-based post-processing and skimp on the unprojection part.

Homogeneous coordinates

In order to understand “un”-projections, it would help to know how projections work in the first place. I’ll probably be a bit too verbose in this part, but I reckon it’s good to have a proper intuitive grasp on it.

When working in regular 3D space, we tend to use 4D coordinates to differentiate between vectors and points (by setting the fourth coordinate, $w$, to 0 or 1, respectively). This lets us use 4D matrices to perform affine transformations (for example: rotations, scale, and translations and combinations) with a single matrix, without having translations affect vectors (since $w = 0$, the translation component will be nullified). If you’re not rolling your eyes at this point because I’m stating the obvious, you should grab any book on 3D programming math and revise :)

Anyway, these 4D coordinates are called homogeneous coordinates. Before projecting, the homogeneous aspect doesn’t really matter because the $w$ coordinate is hardly used. But, since we’re operating in 4D, we can do things not possible with a simple matrix in 3D, including projections. Projections are extensively covered all over the place, but let’s revisit homogeneous coordinates and how they’re relevant for this article.

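More generally, homogeneous coordinates can be seen as an “extension” to regular triplets by adding said $w$ coordinate, moving towards 4D. They map back to good old 3D as follows:

$$(x, y, z, w) \mapsto \left(\frac{x}{w}, \frac{y}{w}, \frac{z}{w}\right)$$

This is a projection from 4D to 3D. This is in fact also used to “rearrange” the z coordinate for perspective projections to get the divide-by-z, but you can find that explained in any proper 3D book as well. You’ll see that any scalar multiple of homogeneous coordinates will project to the same 3D point. For example, a point $\mathbf{p} = (x, y, z, w)$ multiplied by a non-zero scalar $s$ maps to:

$$s\,\mathbf{p} = (sx, sy, sz, sw) \mapsto \left(\frac{sx}{sw}, \frac{sy}{sw}, \frac{sz}{sw}\right) = \left(\frac{x}{w}, \frac{y}{w}, \frac{z}{w}\right)$$

So, we see that a homogeneous point $\mathbf{p}$ and $s\,\mathbf{p}$ represent the same 3D point, and we call scalar multiples of homogeneous points equivalent:

$$\mathbf{p} \sim s\,\mathbf{p}, \quad s \neq 0$$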

We’re used to working with the subset where $w = 1$. I’m not sure if there’s an actual name for this set, but let’s call them the principal representation of the point to make things easier to explain (that’s right, I’m coining things here!). This is almost always the representation we want in the end.

A final note about when $w = 0$. These points are called ideal points, and have some practical applications which we don’t need to concern ourselves with here. Multiplied with a scalar, an ideal point remains an ideal point. Furthermore, they are projected at infinity (division by 0). They don’t correspond to proper 3D points, which is at the base of why we can use them to represent vectors. But since we’re just dealing with points from now on, let’s let it rest at that :)

Check your 3D math book’s chapter on (perspective) projection again, and you should have a better idea of how the homogeneous coordinates function theoretically beyond “divide by $w$ for perspective foreshortening”. In any case, the important part here is this: scalar multiples of homogeneous coordinates represent the same 3D point.
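If you'd rather see that equivalence in numbers than in symbols, here is a tiny sketch (the function name `to_3d` is just something I made up for the example):

```python
def to_3d(p):
    """Map a homogeneous point (x, y, z, w) with w != 0 back to regular 3D."""
    x, y, z, w = p
    return (x / w, y / w, z / w)


p = (2.0, 4.0, 6.0, 2.0)
scaled = tuple(2.0 * c for c in p)  # a scalar multiple of p

print(to_3d(p))       # the 3D point p represents
print(to_3d(scaled))  # the same 3D point, despite different 4D coordinates
```

Both calls yield the same 3D point, which is the principal representation of `p` with the $w = 1$ dropped.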

Unprojecting

Your usual everyday projection happens as follows:

Provide a point $\mathbf{p}_{view}$ in view space (principal representation, $w = 1$).
Multiply with the projection matrix $P$: this yields a homogeneous coordinate $\mathbf{p}_{clip} = P\,\mathbf{p}_{view}$ with non-principal representation.
Divide by $w_{clip}$ to get the projected point in principal representation (the GPU does this for you for the vertex shader’s position output). This yields normalized device coordinates (NDC): $\mathbf{p}_{ndc} = \mathbf{p}_{clip} / w_{clip}$.

So when “unprojecting”, we want to figure out $\mathbf{p}_{view}$ when we know $\mathbf{p}_{ndc}$*. Simple solving, right?

* You may not know the full NDC coordinates, only window coordinates, but that’s okay, see below.

But wait, you’d need to know $w_{clip}$ to calculate $\mathbf{p}_{clip} = w_{clip}\,\mathbf{p}_{ndc}$! Mission impossible, because that’s obviously part of what we’re trying to figure out! But remember, we’re dealing with homogeneous coordinates here, so we can use the equivalence property. $w_{clip}$ is a simple scalar, which means $\mathbf{p}_{ndc}$ and $\mathbf{p}_{clip}$ are equivalent; they represent the same point! The matrix transformation does not affect equivalence, which means:

$$P^{-1}\,\mathbf{p}_{ndc} \sim P^{-1}\,\mathbf{p}_{clip} = \mathbf{p}_{view}$$

$P^{-1}\,\mathbf{p}_{ndc}$ is a homogeneous coordinate equivalent to $\mathbf{p}_{view}$. The last thing to do is map that back to the principal representation and we have the correct result:

$$\mathbf{p}' = P^{-1}\,\mathbf{p}_{ndc}, \qquad \mathbf{p}_{view} = \frac{\mathbf{p}'}{p'_w}$$

To recap, unprojection happens as follows:

Provide an NDC coordinate.
Multiply with the inverse projection matrix, yielding a homogeneous coordinate equivalent to the view position.
Divide by $w$ to get the principal representation of the view position.

This should at least explain what’s going on in the position reconstruction post. The coordinates unprojected there are the NDC coordinates corresponding to the frustum corners.
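To check that the recap really works, here is a small round trip in plain Python: project a view-space point, then unproject the resulting NDC coordinate. It's a sketch under a few assumptions: an OpenGL-style perspective matrix (column vectors, NDC z in [-1, 1]), a hand-rolled Gauss-Jordan inverse without any handling of singular matrices, and helper names that are mine.

```python
import math


def perspective(fov_y, aspect, near, far):
    """OpenGL-style perspective projection matrix (rows of a 4x4 matrix)."""
    f = 1.0 / math.tan(fov_y / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def transform(m, v):
    """Multiply a 4x4 matrix with a 4D column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def invert(m):
    """Invert a 4x4 matrix with Gauss-Jordan elimination (assumes it is invertible)."""
    n = 4
    a = [list(m[r]) + [1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def project(proj, view_pos):
    """View space -> NDC: multiply, then divide by w."""
    clip = transform(proj, list(view_pos) + [1.0])
    return [c / clip[3] for c in clip[:3]]


def unproject(inv_proj, ndc):
    """NDC -> view space: multiply with the inverse, then divide by w."""
    h = transform(inv_proj, list(ndc) + [1.0])
    return [c / h[3] for c in h[:3]]


proj = perspective(math.radians(60.0), 16.0 / 9.0, 0.1, 100.0)
view_pos = [1.0, -2.0, -5.0]
ndc = project(proj, view_pos)
print(ndc, unproject(invert(proj), ndc))
```

Note that `unproject` never needs the original clip-space $w$: the divide at the end takes care of it, exactly as argued above.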

What about screen positions?

If all you have are coordinates on the screen such as a mouse position, there’s some info lacking, huh? NDC coordinates are 3D so we’re obviously missing a component. But first things first, let’s give you the NDC $x$ and $y$ components. They’re obtained by a simple remapping to a [-1, 1] range:

$$x_{ndc} = 2\,\frac{x_{screen}}{width} - 1, \qquad y_{ndc} = 1 - 2\,\frac{y_{screen}}{height}$$

(The flip in $y$ assumes screen coordinates with the origin in the top-left corner and NDC with $y$ pointing up.)

But $z_{ndc}$ is an unknown. This shouldn’t be surprising, as a whole ray of points in space project to that same point on the screen. You’re essentially free to pick your own $z$ coordinate and something along that ray will come up. A value of 1 represents the intersection of the ray with the camera’s far plane. A value of 0 (DirectX) or -1 (OpenGL) represents one on the near plane. You can use either to get an unprojected position and together with the camera position in the same space, this can be used to construct a ray to perform ray intersection tests in your scene.
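As a sketch of the screen-to-NDC step: the function below remaps a mouse position and returns the two NDC points on the near and far planes that you would then unproject. It assumes a top-left screen origin with y pointing down, and the OpenGL convention of -1 for the near plane by default; the function names are made up for the example.

```python
def screen_to_ndc(x, y, width, height):
    """Remap screen coordinates (origin top-left, y down) to NDC x and y."""
    return (2.0 * x / width - 1.0, 1.0 - 2.0 * y / height)


def ray_endpoints_ndc(x, y, width, height, near_z=-1.0):
    """NDC points on the near and far planes for a screen position.

    near_z is -1.0 for OpenGL-style NDC and 0.0 for DirectX-style NDC.
    """
    ndc_x, ndc_y = screen_to_ndc(x, y, width, height)
    return (ndc_x, ndc_y, near_z), (ndc_x, ndc_y, 1.0)


near_point, far_point = ray_endpoints_ndc(400, 300, 800, 600)
print(near_point, far_point)
```

Unprojecting either point and connecting it with the camera position (or simply unprojecting both) gives you the picking ray.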

I hope this helped if you’re struggling to figure out this stuff. Until next time!

“Post-filtered” Soft Variance Shadow Mapping for Varying Penumbra Sizes

David — Thu, 24 Jul 2014 17:52:10 +0000

Okay, I’ll state this up front: I’m probably not going to use this approach in my own engine because of many issues inherent with Variance Shadow Mapping. However, I think I did end up with some interesting results to play with, so if VSM with fixed penumbra sizes (or just for filtering) is working well for you, the article may still be useful anyway.
Further worth noting is that most soft shadow articles discuss point lights, so I’ve done things with directional lights.

Introduction

Penumbra widening with distance

If you need an introduction to Variance Shadow Maps, I recommend Andrew Lauritzen’s classic article in GPU Gems 3: Summed-Area Variance Shadow Maps. It also contains a technique for very nice precise soft shadows. So, yeah, VSM uses probability theory to estimate whether or not a point is in shadow. Groovy! Unlike standard shadow mapping, this allows for texture filtering the same way regular texture sampling does (bilinear/anisotropic sampling, mip-mapping, etc), and you can use anti-aliasing while rendering the shadow maps. What’s more, the shadow maps can be pre-filtered with a separable blur. This way we can eliminate jaggies using a small filter kernel, or create very soft shadows (with a fixed penumbra size) using a larger kernel.
Real shadows, however, do not have a fixed penumbra size; they get “softer” further away from the occluding object. No matter the size of the penumbra, we will need to filter the shadow map to get a blurred version of the original. The two general approaches are:

Using mip-maps: sample the mip-levels since they already contain further filtered data. By default, this starts looking very boxy with larger penumbrae. To alleviate this, you’d have to generate the mip levels with more expensive filtering (more or less like blurring the mip-maps as you’re generating them).
Using Summed-Area Tables as in the GPU Gems article, resulting in very high-quality results. You can generate SATs like this: “Hensley [2005]: Fast Summed-Area Table Generation and its Applications”. Armed with this, any box filter over a rectangular region of the SAT takes just 4 texture samples, regardless of the size of the region.
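In case the "4 samples" claim for Summed-Area Tables sounds like magic, here is a minimal CPU sketch of the idea. It is not the GPU generation scheme from the Hensley paper, just the plain definition: every table entry stores the sum of everything above and to the left of it, so the sum over any rectangle falls out of four lookups.

```python
def build_sat(img):
    """Build a summed-area table with an extra zero row/column at the top/left."""
    h, w = len(img), len(img[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, x0, y0, x1, y1):
    """Sum of img over x0 <= x < x1, y0 <= y < y1 using exactly 4 lookups."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]


img = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0],
       [7.0, 8.0, 9.0]]
sat = build_sat(img)
area = (3 - 1) * (3 - 1)
print(box_sum(sat, 1, 1, 3, 3) / area)  # average over the lower-right 2x2 block
```

Dividing the box sum by the area gives the box-filtered value; for VSM you would store and filter both moments this way.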

Either way, you end up pre-filtering the shadow maps, the cost of which is dependent on the number and resolution of the shadow maps (cubic shadow maps, different cascades for directional lights, etc…); not something I wanted to spend too much frame time on. So instead of pre-filtering, I wanted to try and combine it with “post-filtering” in the lighting shader in a way similar to PCSS but without the crazy amount of samples. However, a standard mip-map chain still needs to be generated.

Overview

Calculating soft shadows with shadow maps is not exactly physically correct, but it does result in a visually pleasing approximation: penumbrae get wider with distance. Conceptually, we’ll be using the same method described in NVidia’s Percentage-Closer Soft Shadows paper (so review it ;) ) but of course the implementation will be quite different. As a recap, the steps involved are:

Find the search area where potential occluders could be
Find the average occluder depth
Calculate the penumbra size from the average depth
Test the shadow map to find the percentage (or in our case: probability) of occluders in the penumbra region.

The main building block

The main building block for our approach, unsurprisingly, uses Chebyshev’s inequality to find an upper bound for the probability that a sample is in the light. This is the default VSM fare:

// moments contains float2(E(x), E(x^2))
// referenceDepth contains the depth value of the point to be compared
float UpperBoundShadow(float2 moments, float referenceDepth)
{
    float variance = moments.y - moments.x * moments.x;
    // clamp to some minimum small variance value for numerical stability
    variance = max(variance, MIN_VARIANCE);
    float diff = referenceDepth - moments.x;

    // Chebyshev's inequality (one-sided)
    float upperBound = variance / (variance + diff*diff);

    // The upper bound is only valid when referenceDepth > moments.x
    // (if referenceDepth < moments.x, the comparison yields 1.0, ie: fully lit)
    return max(upperBound, referenceDepth < moments.x);
}
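To get a feel for the numbers, here is a CPU port of the same function in Python with a tiny worked example. The default value for `min_variance` is an arbitrary assumption on my part, just like `MIN_VARIANCE` needs tuning in the shader.

```python
def upper_bound_shadow(mean, mean_sq, reference_depth, min_variance=1e-5):
    """Chebyshev upper bound on the probability that the point is lit.

    mean and mean_sq are the two moments E(x) and E(x^2) of the depth
    distribution in the filter region.
    """
    variance = max(mean_sq - mean * mean, min_variance)
    if reference_depth <= mean:
        # The bound says nothing here; treat the point as fully lit.
        return 1.0
    diff = reference_depth - mean
    return variance / (variance + diff * diff)


# Filter region where half the texels are at depth 0.2 and half at depth 0.8.
depths = [0.2, 0.8]
mean = sum(depths) / len(depths)
mean_sq = sum(d * d for d in depths) / len(depths)

print(upper_bound_shadow(mean, mean_sq, 0.8))  # receiver at the far depth
print(upper_bound_shadow(mean, mean_sq, 0.4))  # receiver in front of the mean
```

For a receiver at depth 0.8, half of the region is covered by occluders at 0.2, and the bound comes out at (approximately) 0.5: in this simple two-plane case the upper bound is exact.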

Finding the occluder search area

This one is exactly the same as for PCSS with an exception for directional lights. If actual directional lights would exist, there would be no penumbra. After all, all light rays are parallel! Also, the traditional PCSS way of back-projecting makes little sense either because the light doesn’t have an actual position in space. To get some handle on it, we’ll settle for a fixed search area instead.

Find average occluder depth

The original PCSS approach tests shadow map samples in the search area to figure out whether they’re occluders. Then, the average depth for the occluders is calculated. Our approach will use Chebyshev’s inequality to again get an upper probability bound of occlusion for the entire search area. From this probability, we can calculate the average depth (see [Yang 2010] Variance Soft Shadow Mapping). $z_{avg}$ is the total average depth, and $z_{occ}$ and $z_{unocc}$ are the average depths for occluders and non-occluders, respectively. $p$ is the probability of a point being lit (ie: a non-occluder). Then, we can make the following observation:

$$z_{avg} = p\,z_{unocc} + (1 - p)\,z_{occ} \quad\Rightarrow\quad z_{occ} = \frac{z_{avg} - p\,z_{unocc}}{1 - p}$$

Since the area search approach is based on the simplification that receiving and casting planes are parallel to the shadow map, $z_{unocc}$ is the reference depth. The only thing left to do is calculate the probability and the average depth for the entire search area. We’ll do this again very coarsely: we’ll use a single sample in a coarser mip level. It’s not exactly precise, but seems to work well enough. The average depth is already right there in the red channel of the shadow map, and for the probability we’ll use the upper bound again. The code below is for illustration, don’t expect any optimizations:

// searchAreaSize is expressed in shadow map UV coords (0 - 1)
// shadowMapSize is the size of the shadow map in texels
// shadowMapCoord is the shadow map coord projected into directional light space (so z contains its depth)
float GetAverageOccluderDepth(float searchAreaSize, int shadowMapSize, float4 shadowMapCoord)
{
// calculate the mip level corresponding to the search area
// In practice, mipLevel would be passed in as a constant.
float mipLevel = log2(searchAreaSize * shadowMapSize);

// retrieve the distribution's moments for the entire area
// shadowMapSampler is a trilinear sampler, not a comparison sampler
float2 moments = shadowMap.SampleLevel(shadowMapSampler, shadowMapCoord.xy, mipLevel);
float averageTotalDepth = moments.x; // assign for semantic clarity
float probability = UpperBoundShadow(moments, shadowMapCoord.z);

// prevent numerical issues
if (probability > .99) return 0.0;

// calculate the average occluder depth
return (averageTotalDepth - probability * shadowMapCoord.z) / (1.0 - probability);
}
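A quick numeric sanity check of the two calculations in that function, again as a Python sketch rather than shader code. The example values (a 1024 texel shadow map, a search area of 1/64 of the map, occluders at depth 0.2 covering half of the area) are made up.

```python
import math


def search_mip_level(search_area_size, shadow_map_size):
    """Mip level whose texels are roughly as large as the search area."""
    return math.log2(search_area_size * shadow_map_size)


def average_occluder_depth(z_avg, p_lit, z_ref):
    """Solve z_avg = p * z_unocc + (1 - p) * z_occ for z_occ, with z_unocc = z_ref."""
    if p_lit > 0.99:
        # Practically no occluders; avoid dividing by (almost) zero.
        return 0.0
    return (z_avg - p_lit * z_ref) / (1.0 - p_lit)


# The search area spans 16 texels, so we sample mip level 4.
print(search_mip_level(1.0 / 64.0, 1024))

# Half the area is lit (receiver depth 0.8), the other half is occluded at 0.2,
# so the mip sample holds an average depth of 0.5.
print(average_occluder_depth(0.5, 0.5, 0.8))
```

The second call recovers the occluder depth of 0.2 (up to floating point rounding) from nothing but the averaged depth, the lit probability and the receiver depth.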

Calculate penumbra size from average depth

We calculate the penumbra size in exactly the same way as in PCSS. For directional lights, this again doesn’t hold up very well (ah, you missing light position!). Instead, we can simply use the distance to the average occluder as a scale factor. It’s fun when things get simpler!

// lightSize is the light size expressed in shadow map UV coords (0 - 1)
// shadowMapSize is the size of the shadow map in texels
// shadowMapCoord is the shadow map coord projected into directional light space (so z contains its depth)
// penumbraScale is a value describing how fast the penumbra should go soft. It can also be used to control the world space fall-off (by projecting world space distances to depth values)
float EstimatePenumbraSize(float lightSize, int shadowMapSize, float4 shadowMapCoord, float penumbraScale)
{
// the search area matches the light size
float averageOccluderDepth = GetAverageOccluderDepth(lightSize, shadowMapSize, shadowMapCoord);
float penumbraSize = lightSize * (shadowMapCoord.z - averageOccluderDepth) * penumbraScale;

// clamp to the maximum softness, which matches the search area
return min(penumbraSize, lightSize);
}

Calculate occluder probability

Instead of using a pre-blurred mip-map chain or a SAT table, we’ll perform the filtering on the fly. We’ll start by sampling a fixed number of points in a Poisson disk distribution to get the (approximate) moments of the entire filter region (ie: the penumbra size). We’ll rotate the sample points randomly to reduce banding in favour of noise. This is essentially the same as percentage closer filtering, but using probabilities instead. So, a first draft:

float4 shadowMapCoord = mul(fragmentPosition, shadowMapMatrix);
float penumbraSize = EstimatePenumbraSize(lightSize, shadowMapSize, shadowMapCoord, penumbraScale);
float2 moments = 0.0;
// ditherTexture contains 2d rotation matrix (cos, -sin, sin, cos), this will tile the texture across the screen
float4 rotation = ditherTexture.SampleLevel(nearestWrapSampler, screenUV * screenSize / ditherTextureSize, 0) * 2.0 - 1.0;

for (int i = 0; i < numShadowSamples; ++i) {
// poissonDiskValues contain the sampling offsets in the unit circle
// scale by penumbraSize / 2 to get samples within the penumbra radius (penumbraSize is diameter)
float2 sampleOffset = poissonDiskValues[i] * penumbraSize / 2;
float4 coord = shadowMapCoord;

// add rotated sample offset using dithered sample
coord.x += sampleOffset.x * rotation.x + sampleOffset.y * rotation.y;
coord.y += sampleOffset.x * rotation.z + sampleOffset.y * rotation.w;

// shadowMapSampler is a trilinear sampler, not a comparison sampler
// sample at the offset coordinate, not at the unmodified shadowMapCoord
moments += shadowMap.Sample(shadowMapSampler, coord.xy).xy;
}
moments /= numShadowSamples;

float lightContribution = UpperBoundShadow(moments, shadowMapCoord.z);

But we can do better, observing that when sampling the disk distribution, we’d get a better approximation if we could get the average over every disk’s area instead of only at the sample point. Again, we can use the mip levels to get an approximation. A Poisson disk distribution has a minimum distance between any two points, so we can use this to calculate the mip level to sample from. Let’s replace some of the shader code:

float4 shadowMapCoord = mul(fragmentPosition, shadowMapMatrix);
float penumbraSize = EstimatePenumbraSize(lightSize, shadowMapSize, shadowMapCoord, penumbraScale);
float2 moments = 0.0;
// ditherTexture contains 2d rotation matrix (cos, -sin, sin, cos), this will tile the texture across the screen
float4 rotation = ditherTexture.SampleLevel(nearestWrapSampler, screenUV * screenSize / ditherTextureSize, 0) * 2.0 - 1.0;

// calculate the mip level for the disk sample's area
// Sample points are expected to be penumbraSize * poissonRadius * shadowMapSize texels apart
// poissonRadius is half the minimum distance in the disk distribution
float mipLevel = log2(penumbraSize * poissonRadius * shadowMapSize);

for (int i = 0; i < numShadowSamples; ++i) {
// poissonDiskValues contain the sampling offsets in the unit circle
// scale by penumbraSize / 2 to get samples within the penumbra radius (penumbraSize is diameter)
float2 sampleOffset = poissonDiskValues[i] * penumbraSize / 2;
float4 coord = shadowMapCoord;

// add rotated sample offset using dithered sample
coord.x += sampleOffset.x * rotation.x + sampleOffset.y * rotation.y;
coord.y += sampleOffset.x * rotation.z + sampleOffset.y * rotation.w;

// shadowMapSampler is a trilinear sampler, not a comparison sampler
// sample at the offset coordinate, not at the unmodified shadowMapCoord
moments += shadowMap.SampleLevel(shadowMapSampler, coord.xy, mipLevel).xy;
}
moments /= numShadowSamples;

float lightContribution = UpperBoundShadow(moments, shadowMapCoord.z);

Not only does this give us a better estimate, it also reduces the noise from the random rotations because samples are expected to differ less. And what’s more, we don’t have to take all that many samples, even for quite large filter sizes! Another way of looking at this approach is as blurring the mip levels in the lighting shader – just enough to remove the jaggies – instead of doing so on the shadow map directly.
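To get a feel for the numbers, here is a small C++ sketch of the mip level selection, mirroring the log2 expression from the shader. The function name DiskSampleMipLevel and the clamp to level 0 are my own assumptions for illustration; they are not part of the shader code above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper mirroring the shader's mip selection.
// penumbraSize:  penumbra diameter in shadow map UV units
// poissonRadius: half the minimum distance between two points of the unit-disk distribution
// shadowMapSize: shadow map resolution in texels
float DiskSampleMipLevel(float penumbraSize, float poissonRadius, int shadowMapSize)
{
    // approximate footprint of a single disk sample, in texels
    float footprintInTexels = penumbraSize * poissonRadius * static_cast<float>(shadowMapSize);

    // assumption: footprints smaller than one texel simply read the full-resolution level
    return std::max(0.0f, std::log2(footprintInTexels));
}
```

For a 1024 texel shadow map, a penumbra covering 1/16 of the map and a poissonRadius of 0.25, each sample stands in for roughly 16 texels, so level 4 is sampled.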

Shadow map bounds

As usual with soft shadows, there’s an issue with sampling outside the shadow map boundaries. For this reason, it may be required to extend the shadow maps to accommodate the largest penumbra size (our “lightSize” value). You might also want to keep the light size within certain limits so that most of the shadow map usage isn’t just there to provide the area not on the screen.

Conclusion

Note the difference between the vase’s shadow and that of the distant flag pole next to it.

A problem with contact shadows for thin casters

As I’ve said before, chances are slim that I’ll actually use this approach in my own code, unless it’s for something very specific and manageable. Variance shadow maps have too many issues for me:

Light leaking: this can be reduced easily but not avoided completely, and the usual fixes have a strong impact on the softness of the penumbra, destroying some of our hard work.
Thin caster leaks: the closer a point gets to the occluder, the smaller the upper bound becomes (as it’s less and less likely for it to be in the shadow). This creates severe light leaking close to thin casters.

But, again, VSMs have been used with success in the past, so who knows, this article may still be of use to someone. You might run into other problems too, if you’re up for pursuing this.
And perhaps VSMs could be used only to perform the area search, with PCF sampling for the occlusion tests, which should remove any light leaking problems. Anyway, I’m open to any ideas, comments or poisonous arrows!

Deferred Subsurface Scattering using Compute Shaders

David — Mon, 02 Jun 2014 10:26:27 +0000

I’ve recently decided to look into supporting subsurface scattering in my playground rendering engine Helix. It’s not the first time I’ve dabbled in the subject, but not being limited to crappy platforms this time, I could push things a bit further. It’s a well researched and oft written about topic, so I’ve been reluctant to write up the results of my implementation, especially seeing how heavily it’s based on these writings. But then I reflected on why I started this blog in the first place: sharing a learning process. Not everyone likes reading papers, so an implementation overview to go along with them might be helpful. The implementation I’ll show is simplified and slightly different to what you’ll find in the source material:

Eugene d’Eon: GPU Gems 3 Chapter 14: Advanced Techniques for Realistic Real-Time Skin Rendering
Jorge Jimenez and Diego Gutierrez’ article in GPU Pro: Screen-Space Subsurface Scattering
Nicolas Schulz: The Rendering Technology of Ryse (I discovered relatively late how similar my solution was to this, which is both a bummer and encouraging ;) )

I really recommend checking out these links if you have more than a passing interest in the subject (or to verify that I really can’t take credit for much in this post!). Finally, Gaussian convolutions are an important concept in what follows so if you’re hazy on the subject, read up on Gaussian blurs and how to implement them on compute shaders (explained in any decent DirectX 11 intro book, or here).

The head model used in the screenshots is provided by Ten24 here.

Introduction

Simply put, light can do two things when hitting an object. Either it reflects (specular reflections) or it enters the object. In the latter case, it bounces around inside for a bit before re-emerging at the surface (diffuse scattering). The properties of the material define how far the light is likely to travel before exiting. This distance is often so small that we traditionally assume that it emerges at the exact same point it entered. For translucent materials, however, this distance can be quite large and the assumption fails to result in convincing images; surfaces look too harsh or claylike.
As light passes through, parts of its spectrum get absorbed. In other words, light tends to discolour more the further it travels underneath the surface. The function of discolouration over distance is expressed using a diffusion profile (refer to d’Eon). Some materials consist of several layers that absorb light differently (consider the layers of skin: oil, epidermis, dermis) so the diffusion profile can get quite complex. d’Eon uses the sum of 6 Gaussians to approximate the profile for skin, and Jimenez further reduces it to 4 (which can be implemented as 3, more on that later). Again, I suggest reading those articles if you haven’t yet, I don’t want to repeat them too much.
Translucency results in a couple of visible effects, depending on where the light enters and exits the surface relative to the viewer. We’ll deal with each in separate ways.

Back-lit transmittance: Light enters the back side of an object and exits from the front.
Same-side surface scattering: Light exits on the same side of an object.

“That’s a nice backlit ear you got there!”

For my own implementation, I put forth the following goals:

Both back-lighting and same-side surface scattering should be supported.
It needs to work nicely with Helix’s deferred rendering pipeline. This meant working in screenspace.
Support multiple material profiles: not only skin, but wax, marble, jade, sticky white substances, …
While not being limited to it, it should be able to represent believable skin

Render pipeline

To get started, I’ll detail the GBuffer layout and the render pipeline I settled on (even if it’s something I keep changing constantly ;) ):

0	Depth			Stencil
1	Albedo R	Albedo G	Albedo B	Emission
2	Packed normal X	Packed normal Y	Translucency	Extended Material Profile ID
3	Metallicness	Normal specular reflectivity	Roughness	TBD

Layer 3 is irrelevant for subsurface scattering since it’s only used for specular reflections. In case you’re interested, it’s not unlike Unreal Engine 4’s approach to specular representations.

Two entries are mainly of concern here:

Translucency: The amount of back-lighting allowed to pass through the surface.
Extended material profile: Contains an ID of the surface type. For example: 0 = default, 1 = skin, 2 = marble, … More on that later.

Subsurface scattering does not affect specular reflections, so we’ll need to accumulate the lighting using separate HDR light accumulation buffers for diffuse and specular (the R11G11B11_FLOAT texture format worked well for me). Our diffuse target does not have albedo applied yet. The render pipeline is as follows:

Render material properties to the GBuffer.
Render lights to diffuse + specular accumulation buffers (lighting includes transmittance)
Perform subsurface scattering
Combine lighting and apply albedo: light = (diffuse + emission) * albedo + specular
Post-processing (bloom, tone mapping, …)

Helix’ lighting pipeline

Why doesn’t the diffuse accumulation buffer have albedo applied, you ask? It probably doesn’t matter all that much, but my reasoning is as follows: albedo maps are usually generated from scans in evenly lit situations and as such already exhibit a degree of subsurface scattering (that’s why they are coloured the way they are in the first place!). Similarly, maps created by artists tend to mimic this look as well.

Extended Material Profiles

To support different material types, we store an “extended material profile” index in the GBuffer. This will be used to access a structured buffer object in the shaders. Each entry is of type ExtendedMaterialProfile which contains details about the (sub)surface properties. Since these properties are per material type and don’t vary per pixel (which is of course a simplification of reality) we don’t need to store all properties in the GBuffer, which would be prohibitively expensive. This construct is not necessarily only used for subsurface scattering but could be extended for other effects. The ExtendedMaterialProfile struct is defined in the shader as follows:

struct ExtendedMaterialProfile
{
// same-side scattering properties
int enableSubsurfaceScattering;
uint numGaussians;
float subsurfaceRadius;
float3 originalBlendFactors;
float3 subsurfaceBlends[MAX_GAUSSIANS];
float4 subsurfaceGaussianExponents;

// back-lit transmittance properties
int enableDistanceBasedTransmittance;
float3 transmittanceCoefficient;
};

What every property means will be explained as we go.

Back-lit Transmittance

For default materials, we handle back-lit transmittance in a very traditional way: we invert the normal, calculate lighting for that and add it to the calculated light:

diffuseLight = Diffuse(LightDir, Normal) + Diffuse(LightDir, -Normal) * extendedMaterialProfile.transmittanceCoefficient * GBuffer.translucency;

transmittanceCoefficient is a simple colour value to modulate the amount of light transmitted. This approach is useful for thin surfaces such as leaves or paper. For objects with more volume we need to calculate (or rather: estimate) how far light has travelled through the object in order to know how much of it is absorbed. This is in fact the same as we did way back in Away3D Broomstick.

To recap: we get the depth value from the shadow map and use that to calculate the position of the occluder (not sure how?). We can use the distance between the occluder and the shaded point as an estimate of how far the light has travelled through the object. Unfortunately, this approach requires lights to have shadow maps associated with them. Helix simply ignores distance-based transmittance for those that don’t. You may also want to consider storing linear depth values for point and spot lights to prevent reduced precision further away from the light.

Armed with this distance value, I’ve found that using the Beer-Lambert law for transmittance allows for convincing enough results for common cases. For each colour channel, the transmitted ratio of light $T$ for distance $d$ is as follows:

$$T(d) = e^{-\sigma d}$$

Here, $\sigma$ is the channel’s transmittanceCoefficient.

Again, we simply use the inverted normal to get an approximation of light hitting the other side of the surface. The total diffuse lighting for the pixel will be:

diffuseLight = Diffuse(LightDir, Normal) + Diffuse(LightDir, -Normal) * exp(-extendedMaterialProfile.transmittanceCoefficient * distance) * GBuffer.translucency;

Louis Bavoil suggests a nice artist-friendly way to calculate the transmittanceCoefficient value for a measured colour at a given distance, which is implemented on the C++ side of the ExtendedMaterialProfile in the following convenience method:

void ExtendedMaterialProfile::SetTransmittanceCoefficientByDistance(float3 measuredColor, float measureDistance)
{
enableDistanceBasedTransmittance = 1;
transmittanceCoefficient = -Ln(measuredColor) / measureDistance;
}
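As a quick check that the convenience method and the lighting formula are each other’s inverse, here is a small standalone C++ sketch. Color3, Transmittance and CoefficientFromMeasurement are hypothetical stand-ins for the engine’s float3 type and methods, written only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the engine's float3 colour type.
struct Color3 { float r, g, b; };

// Beer-Lambert: ratio of light left per channel after travelling `distance` through the medium.
Color3 Transmittance(const Color3& coefficient, float distance)
{
    return { std::exp(-coefficient.r * distance),
             std::exp(-coefficient.g * distance),
             std::exp(-coefficient.b * distance) };
}

// Inverse of the above: the coefficient that yields `measuredColor` at `measureDistance`.
Color3 CoefficientFromMeasurement(const Color3& measuredColor, float measureDistance)
{
    return { -std::log(measuredColor.r) / measureDistance,
             -std::log(measuredColor.g) / measureDistance,
             -std::log(measuredColor.b) / measureDistance };
}
```

Evaluating the transmittance at the measured distance gives back the measured colour, and doubling the distance squares each channel, which is why the darker channels fall off so much faster than the bright one.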

The transmittance mask used for the head

The enableDistanceBasedTransmittance property dictates which approach is used. For leaves, we’d set it to 0, for skin we’d want 1. The amount of transmitted light is modulated using a transmittance mask, the values of which are written to the GBuffer.

You could also use the diffusion profile to calculate the transmittance (which is something I’ll probably experiment with at some point). For now, this is faster and quite acceptable.

The result of the back-lit implementation

Same-side subsurface scattering

Comparison between not using same-side subsurface scattering and the implementation in Helix

This aspect deserves a bit more finesse. After all, it’s what gives skin its organic fleshy look and we want the implementation to be solid enough to support this believably. As humans, we’re very attuned to recognizing others as humans; we can easily spot fake ones based on small perceptual errors. We’ll base ourselves on Jimenez’ approach of using 4 Gaussians and we’ll treat other diffusion profiles the same way. Remember what Gaussian distributions look like for variance $v$ and distance $r$:

$$G(v, r) = \frac{1}{\sqrt{2 \pi v}} e^{-\frac{r^2}{2v}}$$

Jimenez’ approach can be thought of as performing 4 Gaussian blurs on the image and blending them together with different weights per colour channel. Note that the sum of all blend weights must be 1 for every colour channel or energy would be lost or gained when compositing.

Blending together weighted Gaussian convolutions

Talking about Gaussian blurs: some implementations do exactly that. By using a fixed sample count the samples’ weights can be precomputed (the total blended sum-of-Gaussians, that is). The sampling disk around the centre pixel is rotated to more closely match the orientation of the surface and projected to screen space. This way, a somewhat correct range is sampled. However, using the distances of the sampling points on the sampling disk as weights for the Gaussian convolution assumes that all sampled points are evenly spaced. This does not necessarily match with what’s on screen. Take a look at a top-down view of such a sampling:

As you can see, the real distances to the central point are quite different. This can lead to an over-contribution of samples at strongly varying surfaces, manifesting in halos. In my implementation, I used the depth buffer to reconstruct view space positions to get actual view-space distances. While still not a correct estimate of how far the light has travelled underneath the surface, it’s a better approximation. However, it also means that the samples aren’t necessarily evenly distributed with respect to the Gaussian curve. This is really only a problem at discontinuities and is in my opinion less objectionable than halo artefacts. Since our sampling weights are not known beforehand, we need to manually normalize the calculation. This means we’ll need to sum all calculated weights and use it to ‘average out’ the total.

There are some extra benefits to this approach. Because we’re manually normalizing the total, we don’t have to use a fixed sample count: we can limit the sampling to exactly the pixels we need. A traditional Gaussian convolution with precomputed weights would require us to sample a fixed number of points $N$, even if the projected radius covers fewer than $N$ pixels. This doesn’t really contribute any new info (sampling 11 points across 4 pixels is a waste). Similarly, when the projected radius covers more than $N$ pixels, we can now add the contribution for every pixel, increasing the quality.

A smaller bonus is that the Gaussian calculation can be simplified a bit. The Gaussian functions themselves don’t need to be scaled with a normalization factor as it will be implicit in the total (ie: we can drop the constant factor in front of the exponential in the Gaussian formula above).

Using the real distance and a manual normalization isn’t without its drawbacks, however. Most obviously, the position needs to be reconstructed from the depth buffer. This means extra texture fetches and calculations. Furthermore, we can’t precompute the sum-of-Gaussians and store them in a lookup texture. Every Gaussian will need to be calculated separately, and a total weight should be accumulated for each, so that each curve can be normalized individually. If we didn’t do this, we’d get an incorrect balance between the layers.

Finally, Jimenez observes that the first Gaussian for skin has such a small variance that it usually falls wholly within a single pixel. This means that we can just calculate 3 Gaussians and use the value from the original diffuse buffer in place of the first one.

Separability

2D Gaussians are separable: a 2D Gaussian blur is identical to a horizontal 1D Gaussian followed by a vertical one, reducing the number of samples necessary to $2N$ instead of $N^2$ for a kernel that is $N$ pixels wide. This is not the case with depth-dependent Gaussians, however, nor with the sum of several. Still, ignoring this and merging everything in 2 passes anyway (instead of doing it in up to 8) doesn’t result in a noticeable difference.
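For the plain, depth-independent case the separability is easy to verify numerically. The following C++ sketch uses hypothetical helper names and unnormalized weights; it only covers a single Gaussian on a regular pixel grid, so it says nothing about the depth-dependent weights used in this post.

```cpp
#include <cassert>
#include <cmath>

// Unnormalized 1D Gaussian weight at offset x for a given variance.
double Gaussian1D(double x, double variance)
{
    return std::exp(-(x * x) / (2.0 * variance));
}

// Unnormalized 2D Gaussian weight at offset (x, y) for a given variance.
double Gaussian2D(double x, double y, double variance)
{
    return std::exp(-(x * x + y * y) / (2.0 * variance));
}

// Samples per pixel for a full 2D kernel that is `width` pixels wide.
int SampleCount2D(int width) { return width * width; }

// Samples per pixel for a horizontal pass followed by a vertical pass.
int SampleCountSeparable(int width) { return 2 * width; }
```

Since the 2D weight is exactly the product of two 1D weights, an 11 pixel wide kernel needs 22 samples per pixel in two passes instead of 121 in one.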

Comparing the correct 2D approach with the less correct 2x1D separated approach

I needed to use the histogram in Photoshop with the “difference” blend layer to verify that there was in fact a slight difference, mainly due to the wider red Gaussian. In any case, it gets my pseudo-separable stamp!

Performing 1D convolutions sampling only at pixel centres is pretty efficient using compute shaders, meaning we can get higher fidelity and less noise than using a fragment shader using jittered samples.

Implementation

Returning to the ExtendedMaterialProfile struct, we can now explain what the other properties mean:

enableSubsurfaceScattering: indicates whether or not subsurface scattering should be performed for this material.
numGaussians: the number of Gaussians. 3 for skin, for example.
subsurfaceRadius: the sample radius in meters of the largest Gaussian, derived from the variances.
originalBlendFactors: the amount of unblurred diffuse lighting that is blended in. This is used to replace the smallest Gaussian for skin.
subsurfaceBlends: the blend weights for each Gaussian layer. Summed together with originalBlendFactors, they need to add up to 1 for each channel.
subsurfaceGaussianExponents: the constant factors used in the exponents of the Gaussian weights ($-\frac{1}{2v}$ for variance $v$, from the Gaussian formula above)

On the C++ side, another convenience method is provided to set the subsurface properties:

void ExtendedMaterialProfile::SetSubsurfaceScattering(unsigned int numGaussians, const float3* blendWeights, const float* variances)
{
float3 total(0.0, 0.0, 0.0);
enableSubsurfaceScattering = 1;
this->numGaussians = numGaussians;
// reset the radius so that an earlier call with larger variances doesn't leak into this one
subsurfaceRadius = 0.0f;

for (unsigned int i = 0; i < numGaussians; ++i) {
// calculate the total blend weights of the gaussians, used to automatically set the amount of unblurred light
total += blendWeights[i];
subsurfaceBlends[i] = blendWeights[i];
// constant factor of the exponent in the gaussian normal distribution
subsurfaceGaussianExponents[i] = -1.0f / (2.0f * variances[i]);
// use standard deviation as a radius estimate
// Gaussian is expressed in terms of millimeters, radius needs to be in meters, so divide by 1000.0f!
float radius = Sqrt(variances[i]) / 1000.0f;
if (radius > subsurfaceRadius)
subsurfaceRadius = radius;
}

// the amount of the original unblurred diffuse is just as much so that all blend weights sum to 1
originalBlendFactors = 1.0 - total;
}
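To make the numbers concrete, here is a standalone C++ sketch that evaluates the same two quantities for the skin settings used later in this post. Float3, OriginalBlendFactors and SubsurfaceRadiusMeters are hypothetical names chosen for this example and are not part of Helix.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the engine's float3 type.
struct Float3 { float x, y, z; };

// Weight of the unblurred diffuse light: whatever the Gaussian layers leave over, per channel.
Float3 OriginalBlendFactors(const Float3* blendWeights, unsigned int numGaussians)
{
    Float3 total = { 0.0f, 0.0f, 0.0f };
    for (unsigned int i = 0; i < numGaussians; ++i) {
        total.x += blendWeights[i].x;
        total.y += blendWeights[i].y;
        total.z += blendWeights[i].z;
    }
    return { 1.0f - total.x, 1.0f - total.y, 1.0f - total.z };
}

// Radius estimate in meters: one standard deviation of the widest Gaussian (variances in mm^2).
float SubsurfaceRadiusMeters(const float* variances, unsigned int numGaussians)
{
    float radius = 0.0f;
    for (unsigned int i = 0; i < numGaussians; ++i)
        radius = std::max(radius, std::sqrt(variances[i]) / 1000.0f);
    return radius;
}
```

With the skin settings, the unblurred light ends up with weights of roughly (0.24, 0.45, 0.62) and the sample radius is about 1.4 mm, so red is blurred the most and blue the least.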

We’re using compute shaders to perform the 1D convolutions, so we can efficiently pre-fetch the texture samples and calculate the view-space positions in faster group-shared memory. An overview of the compute shader:

Gather and precompute everything required for the current compute shader thread group, so we don’t need to do this per sample:
- diffuse radiance, sampled from accumulation buffer
- view-space position, derived from depth buffer
Retrieve the extended material ID.
Project the sample radius using the camera to get a radius approximation in screen space.
Loop over all samples falling within the projected sample radius.
- Calculate distance to central pixel
- Calculate Gaussian weight for pixel
- Add weighted radiance sample
- Add weight to total
“Average out” total radiance using total weight count.
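Stripped of everything GPU-specific, the inner loop of that overview can be sketched on the CPU as follows. This is a simplified illustration under my own assumptions: a single colour channel, a single Gaussian layer, and positions given as 1D coordinates in millimeters instead of reconstructed view-space positions. ConvolveNormalized is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One Gaussian layer of a 1D convolution whose weights come from actual distances.
// radiance:  per-pixel diffuse light along the row (one channel for brevity)
// positions: per-pixel surface position in millimeters (stand-in for view-space positions)
// exponent:  -1 / (2 * variance), as stored in subsurfaceGaussianExponents
float ConvolveNormalized(const std::vector<float>& radiance, const std::vector<float>& positions,
                         int center, int sampleRadius, float exponent)
{
    float total = 0.0f;
    float totalWeight = 0.0f;

    for (int i = -sampleRadius; i <= sampleRadius; ++i) {
        int index = center + i;
        if (index < 0 || index >= static_cast<int>(radiance.size()))
            continue;

        // weight depends on the real distance, not on the pixel offset
        float distance = positions[index] - positions[center];
        float weight = std::exp(distance * distance * exponent);

        total += weight * radiance[index];
        totalWeight += weight;
    }

    // the central sample always has weight 1, so totalWeight is never 0
    return total / totalWeight;
}
```

Two properties follow directly from the manual normalization: evenly lit surfaces stay exactly as bright as they were, and a neighbouring pixel that is far away in space (a depth discontinuity) gets a weight close to 0 and barely bleeds into the result.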

The code below should make this a bit clearer. It’s for the horizontal Gaussians only, but the vertical is nearly identical:

#define NUMTHREADS 256
#define MAX_RADIUS 32

#define MAX_GAUSSIANS 4

struct ExtendedMaterialProfile
{
int enableSubsurfaceScattering; // whether or not to use subsurface scattering (0 or 1)
uint numGaussians; // the amount of gaussians to approximate the diffusion profile
float subsurfaceRadius; // the radius in meters of the largest Gaussian
float3 originalBlendFactors; // the ratio of the original unblurred texture to add
float3 subsurfaceBlends[MAX_GAUSSIANS]; // the blend weights for each gaussian
float4 subsurfaceGaussianExponents; // The constant factor of the Gaussian exponents: -1.0f / (2.0f * variances[i]);
// We store the exponents for all 4 in a single float4 for convenience (see code)

int enableDistanceBasedTransmittance; // whether or not to use distance-based transmittance scattering (0 or 1, unused here but used in the lighting shader)
float3 transmittanceCoefficient; // if enableDistanceBasedTransmittance = 0, this is just the colour of the backlit light, otherwise, it's the density of the beer-law exponent
};

cbuffer cameraData
{
float4x4 projectionMatrix; // the local projection matrix
float4 viewFrustumVectors[4]; // contains the view frustum vectors for each edge going from the near to far plane, scaled so that z == 1, in clockwise order starting top-left
float2 renderTargetResolution; // the size of the render target
};

Texture2D<float3> diffuseSource;
Texture2D<float> gbufferDepth;
Texture2D<float4> gbufferNormalMaterial;
StructuredBuffer<ExtendedMaterialProfile> extendedMaterialProfiles;

RWTexture2D<float3> diffuseTarget;

groupshared float3 radianceSamples[2 * MAX_RADIUS + NUMTHREADS];
groupshared float3 positionSamples[2 * MAX_RADIUS + NUMTHREADS];

// Retrieve the view vector for a given pixel coordinate.
float4 GetViewVector(uint2 coord)
{
// turn coords into a uv ratio for interpolation
float2 uv = coord / renderTargetResolution;

// Uses standard bilinear interpolation
float4 top = lerp(viewFrustumVectors[0], viewFrustumVectors[1], uv.x);
float4 bottom = lerp(viewFrustumVectors[3], viewFrustumVectors[2], uv.x);
return lerp(bottom, top, uv.y);
}

// Unproject a value from the depth buffer to the Z value in view space.
// Multiply the result with an interpolated frustum vector to get the actual view-space coordinates
// Refer to http://www.derschmale.com/2014/01/26/reconstructing-positions-from-the-depth-buffer/ for more info on this:
float DepthToViewZ(float depthValue)
{
return projectionMatrix[3][2] / (depthValue - projectionMatrix[2][2]);
}

// returns the view-space position for the point at the given pixel
// coord: the pixel coordinate to sample the depth at
// viewDir: the view direction (with z == 1) matching the pixel coordinate
// Refer to http://www.derschmale.com/2014/01/26/reconstructing-positions-from-the-depth-buffer/ for more info on this
float3 GetViewPosition(uint2 coord, float3 viewDir)
{
return viewDir * DepthToViewZ(gbufferDepth[coord]);
}

[numthreads(NUMTHREADS, 1, 1)]
void main(uint3 dispatchThreadID : SV_DispatchThreadID, uint3 groupThreadID : SV_GroupThreadID)
{
// Store all radiance samples and view-space positions in groupshared memory.
// See Gaussian blur example at: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Efficient%20Compute%20Shader%20Programming.pps

float3 viewDir = GetViewVector(dispatchThreadID.xy).xyz;
float3 frustumDiff = float3((viewFrustumVectors[2].x - viewFrustumVectors[3].x) / renderTargetResolution.x, 0.0, 0.0);
float3 centerSample = diffuseSource[dispatchThreadID.xy];
float3 viewPosition = GetViewPosition(dispatchThreadID.xy, viewDir);

// the central pixel is placed in "groupThreadID.x + MAX_RADIUS"
radianceSamples[groupThreadID.x + MAX_RADIUS] = centerSample;
positionSamples[groupThreadID.x + MAX_RADIUS] = viewPosition;

if (groupThreadID.x < MAX_RADIUS) {
// use signed coordinates so the left border doesn't wrap around as an unsigned value for the first thread group
int2 coord = int2(dispatchThreadID.xy) - int2(MAX_RADIUS, 0);
radianceSamples[groupThreadID.x] = diffuseSource[coord];
positionSamples[groupThreadID.x] = GetViewPosition(coord, viewDir - frustumDiff * MAX_RADIUS);
}
else if (groupThreadID.x >= NUMTHREADS - MAX_RADIUS) {
int2 coord = int2(dispatchThreadID.xy) + int2(MAX_RADIUS, 0);
radianceSamples[groupThreadID.x + 2 * MAX_RADIUS] = diffuseSource[coord];
positionSamples[groupThreadID.x + 2 * MAX_RADIUS] = GetViewPosition(coord, viewDir + frustumDiff * MAX_RADIUS);
}

// Wait for all data to be ready
GroupMemoryBarrierWithGroupSync();

// fetch material profile ID, stored in alpha channel of the normal/material GBuffer texture
float4 normalMaterialData = gbufferNormalMaterial[dispatchThreadID.xy];
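// round to the nearest integer so precision loss in the normalized channel can't select the wrong profile
ExtendedMaterialProfile profile = extendedMaterialProfiles[uint(normalMaterialData.w * 255.0 + 0.5)];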

// if no subsurface scattering is required, simply output the original sample
if (!profile.enableSubsurfaceScattering) {
diffuseTarget[dispatchThreadID.xy] = centerSample;
return;
}

// project the sample radius
float w = viewPosition.z * projectionMatrix[2][3] + projectionMatrix[3][3];
float radiusProjection = projectionMatrix[1][1] / w * renderTargetResolution.x * .25;
int sampleRadius = profile.subsurfaceRadius * radiusProjection;

// sample radius too small, would just convolute a single pixel, so just return that immediately
if (sampleRadius < 1) {
diffuseTarget[dispatchThreadID.xy] = centerSample;
return;
}

// make sure we don't go out of bounds (usually when getting the camera very close)
sampleRadius = min(sampleRadius, MAX_RADIUS - 1);

float4 totalWeights = 0;
// stores all 4 blurs in a single var
float4x3 totals = 0;

for (int i = -sampleRadius; i <= sampleRadius; ++i) {
// Remember the central pixel is placed in "groupThreadID.x + MAX_RADIUS"
int index = groupThreadID.x + MAX_RADIUS + i;

float3 dir = positionSamples[index] - viewPosition;

// Calculate the squared distance and convert from meters^2 to millimeters^2 (squared, so multiply by 1000^2)
float distSqr = dot(dir, dir) * 1000000.0f;

// Calculate all 4 Gaussian weights
float4 weights = exp(distSqr * profile.subsurfaceGaussianExponents);

totalWeights += weights;

// add the sample to each layer with their respective Gaussian weight
[unroll]
for (int j = 0; j < MAX_GAUSSIANS; ++j)
totals[j] += weights[j] * radianceSamples[index];

}

// start with the amount of original diffuse light that we specified
float3 total = centerSample * profile.originalBlendFactors;

// add blended blurred results
[unroll]
for (int j = 0; j < MAX_GAUSSIANS; ++j)
total += totals[j] / totalWeights[j] * profile.subsurfaceBlends[j].xyz;

diffuseTarget[dispatchThreadID.xy] = total;
}

Conclusion

Showing the subtle fleshy discolouration using the skin settings

By adding different configurations to the StructuredBuffer in the shader, you can easily support different materials per shader. Skin is created using Jimenez’ settings:

float variances[3] = { .0516f, .2719f, 2.0062f };
float3 blends[3] = { float3(.1158f, .3661f, .3439f), float3(.1836f, .1864f, .0f), float3(.46f, .0f, .0402f) };
profile.SetSubsurfaceScattering(3, blends, variances);
profile.SetTransmittanceCoefficientByDistance(float3(.94f, .14f, .14f), .0002f);

Other materials can be created, such as wax:

float variances[4] = { .362f, 2.144f, 8.555f, 34.833f };
float3 blends[4] = { float3(.0544f, .1245f, .2177f), float3(.2436f, .2435f, .1890f), float3(.3105f, .3158f, .3742f), float3(.3913f, .3161f, .2189f) };
profile.SetSubsurfaceScattering(4, blends, variances);
profile.SetTransmittanceCoefficientByDistance(float3(.3913f, .3161f, .2189f), .1f);

Using the wax settings

I’ll leave you with an observation. Recently, I’ve been watching Breaking Bad (yeah, I’m way behind on the cool stuff). Don’t you think Cranston has the best distribution profile going on?

Reconstructing positions from the depth buffer pt. 2: Perspective and orthographic general case

David — Wed, 19 Mar 2014 21:30:56 +0000

Introduction

It’s been a while since last time, when I promised a general method for depth reconstruction regardless of projection type. I had told myself to do it “soon”. Due to lack of time partly caused by moving to Ghent and other random occurrences “soon” changed into “sooner or later” and finally we’ve arrived at “later”, but here we are!
As a small disclaimer, implementing such a general case only makes sense if you need to support different projection types and can’t provide separate shaders for each. Bespoke implementations are obviously faster, especially in the case of orthographic projections.
For what follows I will continue to use excruciatingly slow step-by-step derivations.

The orthographic case

For completeness’ sake, I’ll show position reconstruction for orthographic-only projections. This is considerably easier compared to perspective projections. After all, the value stored in the depth buffer maps linearly to the range between the near and far plane. As a result, reconstructing the view-space $z$ value is a simple linear interpolation between the two:

Recall that in DirectX, is simply the depth buffer value, while in OpenGL it’s . The complete position can be similarly reconstructed:

$\mathbf{p}_{near}$ and $\mathbf{p}_{far}$ are the positions on the near and far plane for the current pixel being shaded. They are calculated by bilinearly interpolating between the near and far frustum corners. By simply passing the frustum corners from the vertex shader to the fragment shader as interpolated values, this is handled by the hardware in the same way we interpolated the frustum direction in the previous post. The corners themselves are the 8 different combinations of the left/right, top/bottom and near/far extents of the view volume.

Reconstructing z: Generalization

Reconstructing the view-space $z$ value for the general case doesn’t change much compared to the perspective projection-only version, except that we can’t make the same assumptions concerning the projection matrix: the entries that produce the clip-space $w$ take different values for orthographic projections. So we’re stuck with this:

Review the previous post if you’re hazy on why.
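As a sanity check of the general case, here is a C++ sketch that projects a view-space z value to a depth value and back for both projection types. It assumes a DirectX-style setup (row vectors multiplied as v * M, left-handed view space, depth values from 0 to 1) and uses 0-based [row][column] indices; the struct and function names are hypothetical, and a different matrix convention would shuffle the indices around.

```cpp
#include <cassert>
#include <cmath>

// Only the four projection matrix entries that influence z and w.
struct ProjectionZW {
    double m22, m32; // clipZ = viewZ * m22 + m32
    double m23, m33; // clipW = viewZ * m23 + m33
};

ProjectionZW MakePerspective(double nearZ, double farZ)
{
    return { farZ / (farZ - nearZ), -nearZ * farZ / (farZ - nearZ), 1.0, 0.0 };
}

ProjectionZW MakeOrthographic(double nearZ, double farZ)
{
    return { 1.0 / (farZ - nearZ), -nearZ / (farZ - nearZ), 0.0, 1.0 };
}

// Forward projection: view-space z to depth buffer value (clipZ / clipW).
double ViewZToDepth(const ProjectionZW& p, double viewZ)
{
    return (viewZ * p.m22 + p.m32) / (viewZ * p.m23 + p.m33);
}

// General inverse, obtained by solving depth = clipZ / clipW for viewZ.
double DepthToViewZ(const ProjectionZW& p, double depth)
{
    return (p.m32 - depth * p.m33) / (depth * p.m23 - p.m22);
}
```

With m23 = 1 and m33 = 0 this collapses to the perspective-only expression m32 / (depth - m22); with m23 = 0 and m33 = 1 it becomes the linear orthographic mapping.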

Calculating the position from the z-value: Generalization

In the perspective-only case, we reconstructed the position value assuming the ray origin was 0. This is no longer the case for orthographic projections, as you can see below:

Evidently this means we’re going to have to use the full ray equation. We’ll have to define a ray origin point, which we’ll keep on the $xy$-plane going through the origin to remain compatible with the origin used in the perspective version. We’ll call this point $o$, with $o_z = 0$. For both cases, we define the ray to be:

$$p(t) = o + t\,D$$

Assuming nothing, we need to calculate $o$. The line coinciding with the ray can be expressed as $l(s) = n + s\,D$ using a different origin, one whose value we know. Let’s pick the near position $n$. $o$ is simply that line’s intersection with the $z = 0$ plane, found by solving $l_z(s) = 0$ for $s$:

$$n_z + s\,D_z = 0 \quad\Rightarrow\quad s = -\frac{n_z}{D_z}$$

Plugging $s$ back into the line equation, we get $o$:

$$o = n - n_z\,\frac{D}{D_z}$$

Remember $D / D_z$? This is the z-normalized view vector from the previous post, which I’ll write as $D'$ here (so $D'_z = 1$). Armed with a ray origin, we can redo the same math as before. We’re interested in the point on the ray that is the view position, i.e. $v = o + t\,D'$.

Using the fact that $o_z = 0$ and $D'_z = 1$, the z component gives $t = v_z$. Plugging back into the ray equation:

$$v = o + v_z\,D'$$

I’ve taken the long way round showing what you probably already figured intuitively: it’s the same as the perspective case, but simply taking into account the origin vector.
Similarly to both the bespoke orthographic and the perspective cases, $o$ and $D'$ are to be precomputed for each corner of the quad that’s being rendered. The vertex shader can then simply pass them along to the fragment shader so they’re automatically interpolated for the pixel we’re currently operating on.
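
To make the last two steps concrete, here is a C++ sketch of the origin calculation and the final reconstruction. It assumes the ray has already been z-normalized; `Vec3`, `rayOrigin` and `reconstructViewPosition` are illustrative names only.

```cpp
#include <cassert>

// Hypothetical minimal vector type, for illustration only.
struct Vec3 { float x, y, z; };

// Intersection of the view ray through nearPos with the z = 0 plane.
// ray must be z-normalized (ray.z == 1).
Vec3 rayOrigin(const Vec3& nearPos, const Vec3& ray) {
    assert(ray.z > 0.999f && ray.z < 1.001f);
    return {
        nearPos.x - ray.x * nearPos.z,
        nearPos.y - ray.y * nearPos.z,
        nearPos.z - ray.z * nearPos.z
    };
}

// View position from the interpolated origin, the interpolated
// z-normalized ray and the reconstructed view-space z.
Vec3 reconstructViewPosition(const Vec3& origin, const Vec3& ray, float viewZ) {
    return {
        origin.x + viewZ * ray.x,
        origin.y + viewZ * ray.y,
        origin.z + viewZ * ray.z
    };
}
```

For a perspective projection the origin comes out as zero and this collapses to the single multiplication from the previous post; for an orthographic one the ray is (0, 0, 1) and only the origin varies per pixel.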

Calculating the view vectors

All that’s left to do is to calculate the z-normalized view vectors for each quad corner. This is done by calculating the near and far frustum corners and z-normalizing their difference. The corners in view space are obtained exactly as last time: by unprojecting NDC coordinates. The following code shows this, and also performs the ray origin calculations.

// For near corners, you should set z = -1.0f instead of 0.0f for OpenGL
// This time it does matter since we're using the unprojection position for the origin calculation.
Vector3D nearHomogenousCorners[4] = {
    Vector3D(-1.0f, -1.0f, 0.0f, 1.0f),
    Vector3D(1.0f, -1.0f, 0.0f, 1.0f),
    Vector3D(1.0f, 1.0f, 0.0f, 1.0f),
    Vector3D(-1.0f, 1.0f, 0.0f, 1.0f)
};

Vector3D farHomogenousCorners[4] = {
    Vector3D(-1.0f, -1.0f, 1.0f, 1.0f),
    Vector3D(1.0f, -1.0f, 1.0f, 1.0f),
    Vector3D(1.0f, 1.0f, 1.0f, 1.0f),
    Vector3D(-1.0f, 1.0f, 1.0f, 1.0f)
};

Matrix3D inverseProjection = projectionMatrix.Inverse();
Vector3D rays[4];
Vector3D origins[4];

for (unsigned int i = 0; i < 4; ++i) {
    Vector3D& ray = rays[i];

    // unproject the far and near frustum corners from NDC to view space
    Vector3D nearPos = inverseProjection * nearHomogenousCorners[i];
    Vector3D farPos = inverseProjection * farHomogenousCorners[i];
    nearPos /= nearPos.w;
    farPos /= farPos.w;
    ray = farPos - nearPos;

    // z-normalize this vector
    ray /= ray.z;

    // ray origin: where the line through nearPos hits the z = 0 plane
    origins[i] = nearPos - ray * nearPos.z;
}

Working in world space

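You can simply switch to world space reconstruction by transforming both the ray origin $o$ and the z-normalized view vector $D'$ to world space (the origin as a point, the view vector as a direction), and everything will happen by itself: $v_{world} = o_{world} + v_z\,D'_{world}$, where $v_z$ is still the view-space z value.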
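
A hedged C++ sketch of the world space variant. It assumes a camera-to-world matrix stored row-major with the translation in the last column; `Mat4`, `transformPoint`, `transformDirection` and `reconstructWorldPosition` are names invented for this example. In practice the two transforms would be done once per quad corner on the CPU, not per pixel; they are combined here only to keep the sketch short.

```cpp
#include <cassert>

// Hypothetical minimal types, for illustration only.
struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };  // m[row][col], column-vector convention

// Transforms a point (w = 1): rotation plus translation.
Vec3 transformPoint(const Mat4& a, const Vec3& p) {
    assert(a.m[3][3] == 1.0f);  // expects an affine matrix
    return {
        a.m[0][0] * p.x + a.m[0][1] * p.y + a.m[0][2] * p.z + a.m[0][3],
        a.m[1][0] * p.x + a.m[1][1] * p.y + a.m[1][2] * p.z + a.m[1][3],
        a.m[2][0] * p.x + a.m[2][1] * p.y + a.m[2][2] * p.z + a.m[2][3]
    };
}

// Transforms a direction (w = 0): the translation is ignored.
Vec3 transformDirection(const Mat4& a, const Vec3& d) {
    return {
        a.m[0][0] * d.x + a.m[0][1] * d.y + a.m[0][2] * d.z,
        a.m[1][0] * d.x + a.m[1][1] * d.y + a.m[1][2] * d.z,
        a.m[2][0] * d.x + a.m[2][1] * d.y + a.m[2][2] * d.z
    };
}

// viewZ stays the view-space z; only the origin and the ray change space.
Vec3 reconstructWorldPosition(const Mat4& cameraToWorld, const Vec3& viewOrigin,
                              const Vec3& viewRay, float viewZ) {
    Vec3 o = transformPoint(cameraToWorld, viewOrigin);
    Vec3 d = transformDirection(cameraToWorld, viewRay);
    return { o.x + viewZ * d.x, o.y + viewZ * d.y, o.z + viewZ * d.z };
}
```

Note that the transformed ray is no longer z-normalized in world space, which is fine: it is still scaled by the view-space z.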

Conclusion

One final note about the calculation above. Working with only 2 projection types, you may consider checking for the projection type and simply passing in more easily calculated values: a zero origin for perspective projections, and the 4 combinations of the left/right and top/bottom extents (with z = 0) as origins for orthographic projections. That’s up to you. Personally, unless absolutely necessary, I’ll always prefer an elegant calculation to a horrible if-statement checking for types!

So finally, I got this post out. I hope it can be of use to anyone. As always, any questions or suggestions, feel free to drop a line in the comment box.

Reconstructing positions from the depth buffer

David — Sun, 26 Jan 2014 17:27:14 +0000

Introduction

[edit] To start things off more easily, I decided to limit this post to perspective projections and move on to the generalization (including orthographic projections) in a next blog post.

When doing deferred shading or some post-processing effects, we’ll need the 3D position of the pixel we’re currently shading at some point. Rather than waste memory and bandwidth by storing the position vectors explicitly in a render target, the position can be reconstructed cheaply from the depth buffer alone. This is data we already have at our disposal. The techniques to do so are pretty commonplace, so this article will hardly be a major revelation. So why bother writing it at all? Well…

Too often, you’ll stumble over code keeping a position render target anyway.
Often, articles explaining the technique are not entry-level and skip over derivations, making it hard for beginners to figure things out.
Many aspects of the implementation are scattered across many articles and forum posts. I’d like a single comprehensive article.
It somewhat made a relatively unexplained appearance in the previous post’s sample code, so I figured I might as well elaborate.
The therapeutic value of writing? ;)

Since I’m trying to keep it at an entry level, the math will have a slightly step-by-step approach. Sorry if that’s too slow :)

Note: I remember an article by Crytek briefly mentioning similar material, but I can’t seem to find it anymore. If anyone can point me to it, let me know!

Non-linearity

So as you (should) already know, the depth buffer contains values between 0 and 1, representing the depth on the near plane and far plane respectively. A ray goes from the camera through the “screen” into the world, and the depth defines where exactly on that ray the point lies. If you’ve never touched the depth buffer before, you might be tempted to simply linearly interpolate between where that ray intersects the near and far planes. However, the depth buffer’s values are not necessarily linear (they aren’t for perspective projections), so there goes that idea.

Often, linear depth is stored explicitly in a render target to make this approach possible. Depending on your case, this might be a viable option. If you expect to sample the depth along with the normals most of the time, you could store it alongside them and have all the data in one texture fetch. This is, of course, provided your render target has enough precision, which you might not want to spend on your normals.

Reconstructing z

So, it’s obvious we’re going to need to convert the depth buffer’s value to a linear depth representation. Instead of converting to another [0, 1]-based range, we’ll calculate the view position’s z value directly. As we’ll see later on, we can use this to very cheaply reconstruct the whole position vector. To do so, remember what the depth buffer’s value contains. In the vertex shader, we projected our vertices to homogeneous (clip) coordinates. These are eventually converted to normalized device coordinates (NDC) by the GPU by dividing the whole vector by its w component. The NDC coordinates essentially result in XY coordinates from -1 to 1 which can be mapped to screen coordinates, and a Z coordinate that is used to compare with and store in the depth buffer. It’s this value that we want. In other words, given projection matrix $M$ and view space position $v$:

$$z_{ndc} = \frac{z_{clip}}{w_{clip}} = \frac{(M v)_z}{(M v)_w}$$

Writing this out in full for $z_{clip}$ and $w_{clip}$:

$$z_{ndc} = \frac{M_{31} v_x + M_{32} v_y + M_{33} v_z + M_{34} v_w}{M_{41} v_x + M_{42} v_y + M_{43} v_z + M_{44} v_w} = \frac{M_{33} v_z + M_{34}}{M_{43} v_z + M_{44}}$$

Here, we assumed a regular projection matrix where the clip planes are parallel to the screen plane. No wonky oblique near planes! This means $M_{31} = M_{32} = M_{41} = M_{42} = 0$. $v$ is a regular point, hence $v_w = 1$.
Solving for $v_z$:

$$v_z = \frac{M_{34} - z_{ndc}\,M_{44}}{z_{ndc}\,M_{43} - M_{33}}$$

If you know you’ll have a perspective projection, you can optimize by entering the values for $M_{43}$ and $M_{44}$.
For DirectX ($M_{43} = 1$ and $M_{44} = 0$):

$$v_z = \frac{M_{34}}{z_{ndc} - M_{33}}$$

For OpenGL ($M_{43} = -1$ and $M_{44} = 0$):

$$v_z = -\frac{M_{34}}{z_{ndc} + M_{33}}$$

If you’re using DirectX, $z_{ndc}$ is simply the depth buffer value $d$. OpenGL uses the convention that $z_{ndc}$ ranges from -1 to 1, so $z_{ndc} = 2d - 1$.

Depending on your use case, you may want to precalculate this value into a lookup texture rather than performing the calculation for every shader that needs it.
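
The two perspective-only variants could look like this in C++. This is a sketch under the assumptions stated in the comments (column-vector convention, standard DirectX and OpenGL perspective matrices, depth buffer values in [0, 1]); the function names are invented for the example.

```cpp
#include <cassert>

// DirectX-style perspective matrix: row 4 is (0, 0, 1, 0), so NDC depth
// equals the depth buffer value. m33 and m34 are the entries in row 3,
// columns 3 and 4.
float viewZDirectX(float m33, float m34, float depth) {
    float denominator = depth - m33;
    assert(denominator != 0.0f);
    return m34 / denominator;
}

// OpenGL-style perspective matrix: row 4 is (0, 0, -1, 0), and NDC depth
// runs from -1 to 1, so the [0, 1] depth buffer value is remapped first.
float viewZOpenGL(float m33, float m34, float depth) {
    float zNdc = 2.0f * depth - 1.0f;
    float denominator = zNdc + m33;
    assert(denominator != 0.0f);
    return -m34 / denominator;
}
```

With a near plane at 1 and a far plane at 101, a depth buffer value of 0.505 maps to a view-space z of 2 in the DirectX case and -2 in the OpenGL case (where the camera looks down the negative z axis).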

Calculating the position from the z-value for perspective matrices

Now that we have the z coordinate of the position, we basically have everything we need to construct the position. For perspective projections, this is very simple. We know the point is somewhere on the view ray with direction $D$ (in view space, with origin = 0), so $v = t\,D$. For now, we assume nothing about $D$ (it’s not necessarily normalized or anything). We solve for $t$ using $z$, the only component we know everything about, and substitute.

$$t = \frac{v_z}{D_z} \quad\Rightarrow\quad v = \frac{v_z}{D_z}\,D$$

(Yes, I hear you sighing, this is a simple intersection test.)
With this formula, we can make an optimization by introducing a constraint on $D$. If we resize $D$ so that $D_z = 1$ (let’s call this a z-normalization), then things simplify:

$$v = v_z\,D$$

We can precalculate $D$ for each screen corner and pass it into the vertex shader. The vertex shader in turn can pass it on to the fragment shader as an interpolated value. Since interpolation is linear, we’ll always get a correct view ray with $D_z = 1$. This way, the reconstruction happens with a single multiplication! When using a compute shader, as in the HBAO example, the interpolation has to be performed manually.
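
For the compute shader case, the manual interpolation might be sketched like this in C++. The corner order is assumed to match the unprojection code in this post, (-1,-1), (1,-1), (1,1), (-1,1), and `u`, `v` are assumed to run from 0 to 1 left to right and bottom to top (flip `v` if your texture coordinates start at the top). All names are hypothetical.

```cpp
#include <cassert>

// Hypothetical minimal vector type, for illustration only.
struct Vec3 { float x, y, z; };

Vec3 mix(const Vec3& a, const Vec3& b, float t) {
    return { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y), a.z + t * (b.z - a.z) };
}

// Bilinear interpolation of the four z-normalized corner rays.
Vec3 interpolateRay(const Vec3 rays[4], float u, float v) {
    assert(u >= 0.0f && u <= 1.0f && v >= 0.0f && v <= 1.0f);
    Vec3 bottom = mix(rays[0], rays[1], u);
    Vec3 top = mix(rays[3], rays[2], u);
    return mix(bottom, top, v);
}

// With a z-normalized ray, the position is a single multiplication.
Vec3 reconstructPerspective(const Vec3& ray, float viewZ) {
    return { ray.x * viewZ, ray.y * viewZ, ray.z * viewZ };
}
```

Because every corner ray has z = 1, the interpolated ray has z = 1 as well, so no renormalization is needed per pixel.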

Calculating the view vectors

There are various ways to go about calculating the view vectors for a perspective projection. Since it’s only done once every time the projection properties change, it’s not exactly a performance-critical piece of code. I’ll go for what I consider the ‘neatest’ way. Since we use the projection matrix to map from view-space points to homogeneous coordinates and convert those to NDC, we can invert the process to go from NDC to view-space coordinates. The view directions for every quad corner correspond to the edges of the frustum linking near plane corners to far plane corners. In NDC, these corners are simply formed by the extents -1.0 and 1.0 (0.0 and 1.0 for z in DirectX), since the frustum forms a plain axis-aligned box there.

Edit: More on unprojections here.

// You could set z = -1.0f instead of 0.0f for OpenGL
// but it doesn't matter since any z value lies on the same ray anyway.

Vector3D homogenousCorners[4] = {
    Vector3D(-1.0f, -1.0f, 0.0f, 1.0f),
    Vector3D(1.0f, -1.0f, 0.0f, 1.0f),
    Vector3D(1.0f, 1.0f, 0.0f, 1.0f),
    Vector3D(-1.0f, 1.0f, 0.0f, 1.0f)
};

Matrix3D inverseProjection = projectionMatrix.Inverse();
Vector3D rays[4];

for (unsigned int i = 0; i < 4; ++i) {
    Vector3D& ray = rays[i];

    // unproject the frustum corner from NDC to view space
    ray = inverseProjection * homogenousCorners[i];
    ray /= ray.w;

    // z-normalize this vector
    ray /= ray.z;
}

Pass the rays into the vertex shader, either as a constant buffer using vertex IDs or as a vertex attribute and Bob’s your uncle!

Working in world space

If you want to perform your lighting or whatever in world space, you can simply transform the z-normalized view rays to world space and add the camera position. No need to perform matrix calculations in your fragment shader.

Conclusion

There we have it! I think it should be straightforward enough to implement this in a shader. If not, let me know and I shall have to expand on this. Just stop storing your position vectors now, mkay? :)

An alternative implementation for HBAO

David — Thu, 19 Dec 2013 23:13:13 +0000

Introduction

Image-space horizon-based ambient occlusion [HBAO] is a technique introduced by NVidia (Louis Bavoil et al.) in 2008. I recommend checking out the following resources to find out exactly how the algorithm works, since this post builds on them:

ShaderX7 – Image-Space Horizon-Based Ambient Occlusion (by Louis Bavoil & Miguel Sainz)
The original Siggraph paper (by Louis Bavoil, Miguel Sainz & Rouslan Dimitrov)
Siggraph presentation
Horizon-Based Ambient Occlusion using Compute Shaders

The ShaderX7 book in particular offers a well-explained detailed approach to the problem.
Before I continue, I’d like to extend my gratitude to Louis Bavoil for taking the time to review this article and giving some insightful tips on how to improve the shader.

A recap

Roughly, HBAO works by raymarching the depth buffer, and doing this in a number of equiangular directions across a circle in screen-space.

For the derivations, all points and vectors are expressed in spherical coordinates relative to the eye space basis. The azimuth angle rotates about the negated eye space Z-axis and the elevation angle is relative to the XY plane. This is a bit different to the common way of defining the polar angle, which is usually relative to the main axis, but the approach serves us well. The elevation angle is the only angle that will be of much interest to us, since the azimuthal integration will be handled entirely by marching in the different directions.

With each raymarch step, the line between the sample point and the centre point is considered. Each time the elevation angle is larger than the previous maximum, a new chunk of occluding geometry has been found. We add in the occlusion for the arc-segment between the last two found horizon vectors and weigh it with a distance function.

Note that the horizon angle is measured with respect to the plane parallel to the XY view plane, and therefore a tangent vector needs to be taken into consideration, with its angle relative to the same plane. This yields the following equation, as noted in ShaderX7:

For the derivation and other implementation details, please refer to the source material. It will be useful as I’ll explain the differences in the implementation I did for Helix.

The Helix approach

With respect to this article, the most important aspect of the original paper is that it uses eye-space as a basis for spherical coordinates, which is where this post will differ. While we’ll still use eye-space as a basis for our positions and vectors in the shader, for the derivations we’ll express the spherical coordinates relative to the normal vector and the tangent plane. Both changes will result in a leaner and, more importantly, trig-less approach. Here’s the previous figure reimagined:

Which will result in a similar equation, without involving the tangent angle:

We’ll take some liberties with the attenuation function, approximating it piecewise (see original). In practice, this means we define it to be constant between two adjacent horizon vectors, using the value at the furthest sample point. Furthermore, since we don’t snap to texel centres, we don’t need to make tangent adjustments as in the original. Focusing on the inner integral, we get:

Making the following observation, we can make our lives a lot easier:

This allows us to rewrite any occurrence of the sine of a horizon angle as a simple dot product with the normal. Using the piecewise approximation for the attenuation function, the equation then becomes:
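
As a rough C++ sketch of what marching a single direction could look like under this scheme: it assumes the sine of the elevation angle (relative to the tangent plane) equals the dot product of the unit normal with the normalized horizon vector, and it uses a quadratic falloff that is my assumption, not necessarily what Helix uses. It ignores jitter, bias and the beyond-90-degrees case, and all names are made up.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical minimal vector type, for illustration only.
struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Assumed falloff: 1 at the centre point, 0 at the sampling radius.
float attenuation(float distance, float radius) {
    float r = distance / radius;
    return r >= 1.0f ? 0.0f : 1.0f - r * r;
}

// Occlusion gathered along one march direction. normal must be unit length;
// samples are the reconstructed view-space positions along the march.
float marchDirection(const Vec3& centre, const Vec3& normal,
                     const std::vector<Vec3>& samples, float radius) {
    assert(radius > 0.0f);
    float maxSin = 0.0f;      // start at the tangent plane (elevation 0)
    float occlusion = 0.0f;
    for (const Vec3& s : samples) {
        Vec3 h = { s.x - centre.x, s.y - centre.y, s.z - centre.z };
        float length = std::sqrt(dot(h, h));
        if (length == 0.0f) continue;
        float sinH = dot(normal, h) / length;  // no trig functions needed
        if (sinH > maxSin) {
            // New horizon found: add the arc segment since the previous one,
            // weighted by the attenuation at the new (furthest) sample.
            occlusion += attenuation(length, radius) * (sinH - maxSin);
            maxSin = sinH;
        }
    }
    return occlusion;
}
```

The full effect would average this over all march directions and subtract the result from 1.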

[Update]

One thing I forgot to mention that needs to be taken into account is the case when the angle exceeds 90 degrees:

In this case, the occlusion should be total, but the dot product with the normal would result in the wrong angle. This is actually a limitation of working in screen-space, where marching in 2D does not match marching in 3D due to overlap. What we need to do is test the angle with the tangent: if the horizon vector points against the tangent (their dot product is negative), we simply do not have proper data in the direction we’re interested in: it could be completely occluded, or not at all. Personally, I find picking “half-occluded” works best in this case.

Some more implementation details

As with SSAO, to reduce banding artefacts, the orientations of all the sample directions should be rotated randomly per pixel. Helix takes an approach similar to Crytek’s original SSAO algorithm, using a 4×4 ‘dither’ texture containing the 2D rotation factors, which is indexed so that it becomes tiled 1:1 over the screen. To ensure an even distribution, rather than going for random angles, I instead used 16 evenly spaced angles between 0 and $2\pi / n$, with $n$ being the number of raymarching directions.
Furthermore, the dither texture also contains a jitter factor, which we’ll use to offset the starting position to reduce banding artefacts. This is similar to the original, with some nuances to play nice with the way view space positions are reconstructed in Helix. As per Louis Bavoil’s suggestion, introducing an extra sample closer to the center of the kernel creates more interesting contact occlusions. In my approach, both are combined by jittering the ray’s start position between the closest neighbour and the first sample size.
Using dithering obviously introduces a lot of noise. Performing a depth-dependent 4×4 box blur afterwards will remove this while making sure every pixel always has contributions from the same sample directions. Some implementations rotate the directions entirely randomly (non-tiled) and blur heavily, but I feel this tends to create what I can only describe as “static AO clouding”: you can see a static AO pattern moving along with the camera. Other implementations like to combine 4×4 dithering and heavy blurring, but personally I like to retain some of the higher frequency occlusions.
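
The rotation part of such a dither texture could be generated along these lines. This C++ sketch assumes the 16 angles evenly cover one sector of $2\pi / n$ and stores them in plain ascending order; how Helix actually arranges them within the 4×4 tile, and how the jitter factor is stored, is not covered. `Rotation`, `makeDitherRotations` and `ditherIndex` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// 2D rotation factors as they might be stored in one texel.
struct Rotation { float cosine, sine; };

// 16 evenly spaced rotation angles in [0, 2*pi / directionCount).
std::array<Rotation, 16> makeDitherRotations(int directionCount) {
    assert(directionCount > 0);
    const float pi = 3.14159265358979f;
    const float sector = 2.0f * pi / static_cast<float>(directionCount);
    std::array<Rotation, 16> rotations{};
    for (std::size_t i = 0; i < rotations.size(); ++i) {
        float angle = sector * static_cast<float>(i) / 16.0f;
        rotations[i] = { std::cos(angle), std::sin(angle) };
    }
    return rotations;
}

// Tiles the 4x4 table 1:1 over the screen: one texel per pixel.
std::size_t ditherIndex(unsigned int pixelX, unsigned int pixelY) {
    return (pixelY % 4u) * 4u + (pixelX % 4u);
}
```

Since the march directions are already spaced $2\pi / n$ apart, rotating by anything within one sector is enough to cover the full circle across the 16 pixels of a tile.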

Subtle normal-based occlusion

Regarding the normals, the source recommends using face normals which can be derived from the depth buffer. However, I simply use the per-pixel normals that are stored in the deferred renderer’s normal buffer. While this can introduce artefacts, the use of a bias angle largely cancels them out while still adding some normal-based occlusion.

Comparison

Below, you can see a comparison between this HBAO approach and traditional SSAO using similar settings and sample count.

HBAO (4×4 samples)

SSAO (16 samples)

HBAO (6×5 samples)

SSAO (32 samples)

Notice how there’s less over-occlusion, especially near discontinuities (for example between the curtain and floor), while some details are handled more correctly (such as the creases in the curtain to the right). Even with a relatively small number of samples, the improvements are remarkable. I did boost the parameters on both, violating my own rules, but purely for illustrative purposes!

Example shader

I guess no one will be happy without some sample code. I can’t publish the entire Helix code (mainly because it’s a pretty inefficient mess right now) but I can show the shaders for the ambient occlusion step. The rest (blur shaders, etc) are default fare. All code is in HLSL Shader Model 5 for DirectX 11.

Conclusion

And that’s it! I hope this post was useful looking into different AO techniques. Any questions, comments, corrections, … are more than welcome!