Monday, August 14, 2017

Notes On Screen Space HIZ Tracing

Note: The Markdown version of this document is available and might have better formatting on phones/tablets.

The following is a small gathering of notes and findings that we made throughout the implementation of hiz tracing in screen space for ssr in Stingray. I recently heard a few claims regarding hiz tracing which motivated me to share some notes on the topic. Note that I also wrote about how we reproject reflections in a previous entry which might be of interest. Also note that I've included all the code at the bottom of the blog.

The original implementation of our hiz tracing method was basically a straight port of the "Hi-Z Screen-Space Tracing" described in GPU-Pro 5 by Yasin Uludag. The very first results we got looked something like this:

Original scene:

Traced ssr using hiz tracing:


The weird horizontal stripes were reported when ssr was enabled in the Stingray editor. They only revealed themselves for certain resolution (they would appear and disappear as the viewport got resized). I started writing some tracing visualization views to help me track each hiz trace event:

Using these kinds of debug views I was able to see that for some resolution, the starting position of a ray when traced at half-res happened to be exactly at the edge of a hiz cell. Since tracing the hiz structure relies on intersecting the current position of a ray with the boundary of cell it lies in, it means that we need to do a ray/plane intersection. As the numerator of (planes - pos.xy)/dir.xy got closer and closer to zero the solutions for the intersection started to loose precision until it completely fell apart.

To tackle this problem we snap the origin of each traced rays to the center of a hiz cell:

float2 cell_count_at_start = cell_count(HIZ_START_LEVEL);
float2 aligned_uv = floor(input.uv * cell_count_at_start)/cell_count_at_start + 0.25/cell_count_at_start;

Rays traced with and without snapping to starting pos of the hiz cell center:

This looked better. However it didn't address all of the tracing artifacts we were seeing. The results were still plagued with lots of small pixels whose traced rays failed. When investigating these failing cases I noticed that they would sometimes get stuck for no apparent reason in a cell along the way. It also occurred more frequently when rays travelled in the screen space axes (±1,0) or (0,±1). After drawing a bunch of ray diagrams on paper I realized that the cell intersection method proposed in GPU-Pro had a failing case! To ensure hiz cells are always crossed, the article offsets the intersection planes of a cell by a small offset. This is to ensure that the intersection point crosses the boundaries of the cell it's intersecting so that the trace continues to make progress.

While this works in most cases there is one scenario which results in a ray that will not cross over into the next hiz cell (see diagram bellow). When this happens the ray wastes the rest of it's allocated trace iterations intersecting the same cell without ever crossing it. To address this we changed the proposed method slightly. Instead of offsetting the bounding planes, we choose the appropriate offset to add depending on which plane was intersected (horizontal or vertical). This ensures that we will always cross a cell when tracing:

float2 cell_size = 1.0 / cell_count;
float2 planes = cell_id/cell_count + cell_size * cross_step + cross_offset;
float2 solutions = (planes - pos.xy)/dir.xy;

float3 intersection_pos = pos + dir * min(solutions.x, solutions.y);
return intersection_pos;
float2 cell_size = 1.0 / cell_count;
float2 planes = cell_id/cell_count + cell_size * cross_step;
float2 solutions = (planes - pos)/dir.xy;

float3 intersection_pos = pos + dir * min(solutions.x, solutions.y);
intersection_pos.xy += (solutions.x < solutions.y) ? float2(cross_offset.x, 0.0) : float2(0.0, cross_offset.y);
return intersection_pos;

Incorrect VS correct cell crossing:

Final result:

Ray Marching Towards the Camera

At the end of the GPU-Pro chapter there is a small mention that raymarching towards the camera with hiz tracing would require storing both the minimum and maximum depth value in the hiz structure (requiring to bump the format to a R32G32F format). However if you visualize the trace of a ray leaving the surface and travelling towards the camera (i.e. away from the depth buffer plane) then you can simply acount for that case and augment the algorithm described in GPU-Pro to navigate up and down the hierarchy until the ray finds the first hit with a hiz cell:

if(v.z > 0) {
  float min_minus_ray = min_z - ray.z;
  tmp_ray = min_minus_ray > 0 ? ray + v_z*min_minus_ray : tmp_ray;
  float2 new_cell_id = cell(tmp_ray.xy, current_cell_count);
  if(crossed_cell_boundary(old_cell_id, new_cell_id)) {
    tmp_ray = intersect_cell_boundary(ray, v, old_cell_id, current_cell_count, cross_step, cross_offset);
    level = min(HIZ_MAX_LEVEL, level + 2.0f);
} else if(ray.z < min_z) {
  tmp_ray = intersect_cell_boundary(ray, v, old_cell_id, current_cell_count, cross_step, cross_offset);
  level = min(HIZ_MAX_LEVEL, level + 2.0f);

This has proven to be fairly solid and enabled us to trace a wider range of the screen space:

Ray Marching Behind Surfaces

Another alteration that can be made to the hiz tracing algorithm is to add support for rays to travel behind surface. Of course to do this you must define a thickness to the surface of the hiz cells. So instead of tracing against extruded hiz cells you trace against "floating" hiz cells.

With that in mind we can tighten the tracing algorithm so that it cannot end the trace unless it finds a collision with one of these floating cells:

if(level == HIZ_START_LEVEL && min_minus_ray > depth_threshold) {
  tmp_ray = intersect_cell_boundary(ray, v, old_cell_id, current_cell_count, cross_step, cross_offset);
  level = HIZ_START_LEVEL + 1;

Tracing behind surfaces disabled VS enabled:

Unfortunately this often means that the traced rays travelling behind a surface degenerate into a linear search and the cost can skyrocket for these traced pixels:

Number of iterations to complete the trace (black=0, red=64):

The Problem of Tracing a Discrete Depth Buffer

For me the most difficult artifact to understand and deal with when implementing ssr is (by far) the implications of tracing a discreet depth buffer. Unless you can fully commit to the idea of tracing objects with infinite thicknesses, you will need to use some kind of depth threshold to mask a reflection if it's intersection with the geometry is not valid. If you do use a depth threshold then you can (will?) end up getting artifacts like these:

The problem as far as I understand it, is that rays can osciliate from passing and failing the depth threshold test. It is essentially an amplified alliasing problem caused by the finite resolution of the depth buffer:

I have experimented with adapting the depth threshold based on different properties of the intersection point (direction of reflected ray, angle of insidence at intersection, surface inclination at intersection) but I have never been able to find a silver bullet (or anything that resembles a bullet to be honest). Perhaps a good approach could be to interpolate the depth value of neighboring cells if the neighbors belong to the same geometry? I think that Mikkel Svendsen proposed a solution to this problem while presenting Low Complexity, High Fidelity: The Rendering of "INSIDE" but I have yet to wrap my head around the proposed solution and try it.

All or Nothing

Finally it's worth pointing out that hiz tracing is a very "all or nothing" way to find an intersection point. Neighboring rays that exhaust their maximum number of allowed iterations to find an intersection can end up in very different screen spaces which can cause a noticeable discontinuity in the ssr buffer:

This is something that can be very distracting and made much worst when dealing with a jittered depth buffer when combined with taa. This side-effect should be considered carefully when choosing a tracing solution for ssr.


float2 cell(float2 ray, float2 cell_count, uint camera) {
 return floor(ray.xy * cell_count);

float2 cell_count(float level) {
 return input_texture2_size / (level == 0.0 ? 1.0 : exp2(level));

float3 intersect_cell_boundary(float3 pos, float3 dir, float2 cell_id, float2 cell_count, float2 cross_step, float2 cross_offset, uint camera) {
 float2 cell_size = 1.0 / cell_count;
 float2 planes = cell_id/cell_count + cell_size * cross_step;

 float2 solutions = (planes - pos)/dir.xy;
 float3 intersection_pos = pos + dir * min(solutions.x, solutions.y);

 intersection_pos.xy += (solutions.x < solutions.y) ? float2(cross_offset.x, 0.0) : float2(0.0, cross_offset.y);

 return intersection_pos;

bool crossed_cell_boundary(float2 cell_id_one, float2 cell_id_two) {
 return (int)cell_id_one.x != (int)cell_id_two.x || (int)cell_id_one.y != (int)cell_id_two.y;

float minimum_depth_plane(float2 ray, float level, float2 cell_count, uint camera) {
 return input_texture2.Load(int3(vr_stereo_to_mono(ray.xy, camera) * cell_count, level)).r;

float3 hi_z_trace(float3 p, float3 v, in uint camera, out uint iterations) {
 float level = HIZ_START_LEVEL;
 float3 v_z = v/v.z;
 float2 hi_z_size = cell_count(level);
 float3 ray = p;

 float2 cross_step = float2(v.x >= 0.0 ? 1.0 : -1.0, v.y >= 0.0 ? 1.0 : -1.0);
 float2 cross_offset = cross_step * 0.00001;
 cross_step = saturate(cross_step);

 float2 ray_cell = cell(ray.xy, hi_z_size.xy, camera);
 ray = intersect_cell_boundary(ray, v, ray_cell, hi_z_size, cross_step, cross_offset, camera);

 iterations = 0;
 while(level >= HIZ_STOP_LEVEL && iterations < MAX_ITERATIONS) {
  // get the cell number of the current ray
  float2 current_cell_count = cell_count(level);
  float2 old_cell_id = cell(ray.xy, current_cell_count, camera);

  // get the minimum depth plane in which the current ray resides
  float min_z = minimum_depth_plane(ray.xy, level, current_cell_count, camera);

  // intersect only if ray depth is below the minimum depth plane
  float3 tmp_ray = ray;
  if(v.z > 0) {
   float min_minus_ray = min_z - ray.z;
   tmp_ray = min_minus_ray > 0 ? ray + v_z*min_minus_ray : tmp_ray;
   float2 new_cell_id = cell(tmp_ray.xy, current_cell_count, camera);
   if(crossed_cell_boundary(old_cell_id, new_cell_id)) {
    tmp_ray = intersect_cell_boundary(ray, v, old_cell_id, current_cell_count, cross_step, cross_offset, camera);
    level = min(HIZ_MAX_LEVEL, level + 2.0f);
    if(level == 1 && abs(min_minus_ray) > 0.0001) {
     tmp_ray = intersect_cell_boundary(ray, v, old_cell_id, current_cell_count, cross_step, cross_offset, camera);
     level = 2;
  } else if(ray.z < min_z) {
   tmp_ray = intersect_cell_boundary(ray, v, old_cell_id, current_cell_count, cross_step, cross_offset, camera);
   level = min(HIZ_MAX_LEVEL, level + 2.0f);
  } =;

 return ray;

Sunday, July 16, 2017

Validating materials and lights in Stingray

Stingray 1.9 is just around the corner and with it will come our new physical lights. I wanted to write a little bit about the validation process that we went through to increase our confidence in the behaviour of our materials and lights.

Early on we were quite set on building a small controlled "light room" similar to what the Fox Engine team presented at GDC as a validation process. But while this seemed like a fantastic way to confirm the entire pipeline is giving plausible results, it felt like identifying the source of discontinuities when comparing photographs vs renders might involve a lot of guess work. So we decided to delay the validation process through a controlled light room and started thinking about comparing our results with a high quality offline renderer. Since SolidAngle joined Autodesk last year and that we had access to an Arnold license server it seemed like a good candidate. Note that the Arnold SDK is extremely easy to use and can be downloaded for free. If you don't have a license you still have access to all the features and the only limitation is that the rendered frames are watermarked.

We started writing a Stingray plugin that supported simple scene reflection into Arnold. We also implemented a custom Arnold Output Driver which allowed us to forward Arnold's linear data directly into the Stingray viewport where they would be gamma corrected and tonemapped by Stingray (minimizing as many potential sources of error).

Material parameters mapping

The trickiest part of the process was to find an Arnold material which we could use to validate. When we started this work we used Arnold 4.3 and realized early that the Arnold's Standard shader didn't map very well to the Metallic/Roughness model. We had more luck using the alSurface shader with the following mapping:

// "alSurface"
// =====================================================================================
AiNodeSetRGB(surface_shader, "diffuseColor", color.x, color.y, color.z);
AiNodeSetInt(surface_shader, "specular1FresnelMode", 0);
AiNodeSetInt(surface_shader, "specular1Distribution", 1);
AiNodeSetFlt(surface_shader, "specular1Strength", 1.0f - metallic);
AiNodeSetRGB(surface_shader, "specular1Color", white.x, white.y, white.z);
AiNodeSetFlt(surface_shader, "specular1Roughness", roughness);
AiNodeSetFlt(surface_shader, "specular1Ior", 1.5f); // ior = (n-1)^2/(n+1)^2 for 0.04
AiNodeSetRGB(surface_shader, "specular1Reflectivity", white.x, white.y, white.z);
AiNodeSetRGB(surface_shader, "specular1EdgeTint", white.x, white.y, white.z);

AiNodeSetInt(surface_shader, "specular2FresnelMode", 1);
AiNodeSetInt(surface_shader, "specular2Distribution", 1);
AiNodeSetFlt(surface_shader, "specular2Strength", metallic);
AiNodeSetRGB(surface_shader, "specular2Color", color.x, color.y, color.z);
AiNodeSetFlt(surface_shader, "specular2Roughness", roughness);
AiNodeSetRGB(surface_shader, "specular2Reflectivity", white.x, white.y, white.z);
AiNodeSetRGB(surface_shader, "specular2EdgeTint", white.x, white.y, white.z);

Stingray VS Arnold: roughness = 0, metallicness = [0, 1]

Stingray VS Arnold: metallicness = 1, roughness = [0, 1]

Halfway through the validation process Arnold 5.0 got released and with it came the new Standard Surface shader which is based on a Metalness/Roughness workflow. This allowed for a much simpler mapping:

// "aiStandardSurface"
// =====================================================================================
AiNodeSetFlt(standard_shader, "base", 1.f);
AiNodeSetRGB(standard_shader, "base_color", color.x, color.y, color.z);
AiNodeSetFlt(standard_shader, "diffuse_roughness", 0.f); // Use Lambert for diffuse

AiNodeSetFlt(standard_shader, "specular", 1.f);
AiNodeSetFlt(standard_shader, "specular_IOR", 1.5f); // ior = (n-1)^2/(n+1)^2 for 0.04
AiNodeSetRGB(standard_shader, "specular_color", 1, 1, 1);
AiNodeSetFlt(standard_shader, "specular_roughness", roughness);
AiNodeSetFlt(standard_shader, "metalness", metallic);

Investigating material differences

The first thing we noticed is an excess in reflection intensity for reflections with large incident angles. Arnold supports Light Path Expressions which made it very easy to compare and identify the term causing the differences. In this particular case we quickly identified that we had an energy conservation problem. Specifically the contribution from the Fresnel reflections was not removed from the diffuse contribution:

Scenes with a lot of smooth reflective surfaces demonstrates the impact of this issue noticeably:

Another source of differences and confusion came from the tint of the Fresnel term for metallic surfaces. Different shaders I investigaed had different behaviors. Some tinted the Fresnel term with the base color while some others didn't:

It wasn't clear to me how Fresnel's law of reflection applied to metals. I asked on Twitter what peoples thoughts were on this and got this simple and elegant claim made by Brooke Hodgman: "Metalic reflections are coloured because their Fresnel is wavelength varying, but Fresnel still goes to 1 at 90deg for every wavelength". This convinced me instantly that indeed the correct thing to do was to use an un-tinted Fresnel contribution regardless of the metallicness of the material. I later found this graph which also confirmed this:

For the Fresnel term we use a pre filtered Fresnel offset stored in a 2d lut (as proposed by Brian Karis in Real Shading in Unreal Engine 4). While results can diverge slightly from Arnold's Standard Surface Shader (see "the effect of metalness" from Zap Anderson's Physical Material Whitepaper), in most cases we get an edge tint that is pretty close:

Investigating light differences

With the brdf validated we could start looking into validating our physical lights. Stingray currently supports point, spot, and directional lights (with more to come). The main problem we discovered with our lights is that the attenuation function we use is a bit awkward. Specifically we attenuate by I/(d+1)^2 as opposed to I/d^2 (Where 'I' is the intensity of the light source and 'd' is the distance to the light source from the shaded point). The main reason behind this decision is to manage the overflow that could occur in the light accumulation buffer. Adding the +1 effectively clamps the maximum value intensity of the light as the intensity set for that light itself i.e. as 'd' approaches zero 'I' approaches the intensity set for that light (as opposed to infinity). Unfortunatly this decision also means we can't get physically correct light falloffs in a scene:

Even if we scale the intensity of the light to match the intensity for a certain distance (say 1m) we still have a different falloff curve than the physically correct attenuation. It's not too bad in a game context, but in the architectural world this is a bigger issue:

This issue will be fixed in Stingray 1.10. Using I/(d+e)^2 (where 'e' is 1/max_value along) with an EV shift up and down while writing and reading from the accumulation buffer as described by Nathan Reed is a good step forward.

Finally we were also able to validate our ies profile parser/shader and our color temperatures behaved as expected:

Results and final thoughts

Integrating a high quality offline renderer like Arnold has proven invaluable in the process of validating our lights in Stingray. A similar validation process could be applicable to many other aspects of our rendering pipeline (antialiasing, refractive materials, fur, hair, post-effects, volumetrics, ect)

I also think that it can be a very powerful tool for content creators to build intuition on the impact of indirect lighting in a particular scene. For example in a simple level like this, adding a diffuse plane dramatically changes the lighting on the Buddha statue:

The next step is now to compare our results with photographs gathered from a controlled environments. To be continued...

Monday, July 3, 2017

Physically Based Lens Flare

While playing Horizon Zero Dawn I got inspired by the lens flare they supported and decided to look into implementing some basic ones in Stingray. There were four types of flare I was particularly interested in.
  1. Anisomorphic flare
  2. Aperture diffraction (Starbursts)
  3. Camera ghosts due to Sun or Moon (High Quality - What this post will cover)
  4. Camera ghosts due to all other light sources (Low Quality - Screen Space Effect)
Image credits: Die Hard (1), Just Another Dang Blog (2), PEXELS (3), The Matrix Reloaded (4)

Once finished I'll do a follow up blog post on the Camera Lens Flare plugin, but for now I want to share the implementation details of the high-quality ghosts which are an implementation of "Physically-Based Real-Time Lens Flare Rendering".

Code and Results

All the code used to generate the images and videos of this article can can be found on


The basic idea of the "Physically-Based Lens Flare" paper is to ray trace "bundles" into a lens system which will end up on a sensor to form a ghost. A ghost here refers to the de-focused light that reaches the sensor of a camera due to the light reflecting off the lenses. Since a camera lens is not typically made of a single optical lens but many lenses there can be many ghosts that form on it's sensor. If we only consider the ghosts that are formed from two bounces, that's a total of nCr(n,2) possible ghosts combinations (where n is the number of lens components in a camera lens)

Lens Interface Description

Ok let's get into it. To trace rays in an optical system we obviously need to build an optical system first. This part can be tedious. Not only have you got to find the "Lens Prescription" you are looking for, you also need to manually parse it. For example parsing the Nikon 28-75mm patent data might look something like this:

There is no standard way of describing such systems. You may find all the information you need from a lens patent, but often (especially for older lenses) you end up staring at an old document that seems to be missing important information required for the algorithm. For example, the Russian lens MIR-1 apparently produces beautiful lens flare, but the only lens description I could find for it was this:

MIP.1B manual

Ray Tracing

Once you have parsed your lens description into something your trace algorithm can consume, you can then start to ray trace. The idea is to initialize a tessellated patch at the camera's light entry point and trace through each of the points in the direction of the incoming light. There are a couple of subtleties to note regarding the tracing algorithm.

First, when a ray misses a lens component the raytracing routine isn't necessarily stopped. Instead if the ray can continue with a path that is meaningful the ray trace continues until it reaches the sensor. Only if the ray misses the sphere formed by the radius of the lens do we break the raytracing routine. The idea behind this is to get as many traced points to reach the sensor so that the interpolated data can remain as continuous as possible. Rays track the maximum relative distance it had with a lens component while tracing through the interface. This relative distance will be used in the pixel shader later to determine if a ray had left the interface.

Relative distance visualized as green/orange gradient (black means ray missed lens component completely)

Secondly, a ray bundle carries a fixed amount of energy so it is important to consider the distortion of the bundle area that occurs while tracing them. In In the paper, the author states:
"At each vertex, we store the average value of its surrounding neighbours. The regular grid of rays, combined with the transform feedback (or the stream-out) mechanism of modern graphics hardware, makes this lookup of neighbouring quad values very easy"
I don't understand how the transform feedback, along with the available adjacency information of the geometry shader could be enough to provide the information of the four surrounding quads of a vertex (if you know please leave a comment). Luckily we now have compute and UAVs which turn this problem into a fairly trivial one. Currently I only calculate an approximation of the surrounding areas by assuming the neighbouring quads are roughly parallelograms. I estimate their bases and heights as the average lengths of their top/bottom, left/right segments. The results are seen as caustics forming on the sensor where some bundles converge into tighter area patches while some other dilates:

This works fairly well but is expensive. Something that I intend to improve in the future.

Now that we have a traced patch we need to make some sense out of it. The patch "as is" can look intimidating at first. Due to early exits of some rays the final vertices can sometimes look like something went terribly wrong. Here is a particularly distorted ghost:

The first thing to do is discard pixels that exited the lens system:
float intensity1 = max_relative_distance < 1.0f;
float intensity = intensity1;
if(intensity == 0.f) discard;

Then we can discard the rays that didn't have any energy as they entered to begin with (say outside the sun disk):
float lens_distance = length(entry_coordinates.xy);
float sun_disk = 1 - saturate((lens_distance - 1.f + fade)/fade);
sun_disk = smoothstep(0, 1, sun_disk);
float intensity2 = sun_disk;
float intensity = intensity1 * intensity2;
if(intensity == 0.f) discard;

Then we can discard the rays that we're blocked by the aperture:
float intensity3 = aperture_sample;
float intensity = intensity1 * intensity2 * intensity3;
if(intensity == 0.f) discard;

Finally we adjust the radiance of the beams based on their final areas:
float intensity4 = (original_area/(new_area + eps)) * energy;
float intensity = intensity1 * intensity2 * intensity3 * intensity4;
if(intensity == 0.f) discard;

The final value is the rgb reflectance value of the ghost modulated by the incoming light color:
float3 color = intensity * * TempToColor(INCOMING_LIGHT_TEMP);


Image credits: 6iee

The aperture shape is built procedurally. As suggested by Padraic Hennessy's blog I use a signed distance field confined by "n" segments and threshold it against some distance value. I also experimented with approximating the light diffraction that occurs at the edge of the apperture blades using a simple function:

Finally, I offset the signed distance field with a repeating sin function which can give curved aperture blades:


The starburst phenomena is due to light diffraction that passes through the small aperture hole. It's a phenomena known as the "single slit diffraction of light". The author got really convincing results to simulate this using the Fraunhofer approximation. The challenge with this approach is that it requires bringing the aperture texture into Fourier space which is not trivial. In previous projects I used Cuda's math library to perform the FFT of a signal but since the goal is to bring this into Stingray I didn't want to have such a dependency. Luckily I found this little gem posted by Joseph S. from intel. He provides a clean and elegant compute implementation of the butterfly passes method which bring a signal to and from Fourier space. Using it I can feed in the aperture shape and extract the Fourier Power Spectrum:

This spectrum needs to be filtered further in order to look like a starburst. This is where the Fraunhofer approximation comes in. The idea is to basically reconstruct the diffraction of white light by summing up the diffraction of multiple wavelengths. The key observation is that the same Fourier signal can be used for all wavelengths. The only thing needed is to scale the sampling coordinates of the Fourier power spectrum:

(x0,y0) = (u,v)·λ·z0 for λ = 350nm/435nm/525nm/700nm

Summing up the wavelengths gives the starburst image. To get more interesting results I apply an extra filtering step. I use a spiral pattern mixed with a small rotation to get rid of any left over radial ringing artifacts (judging by the author's starburst results I suspect this is a step they are also doing):

Anti Reflection Coating

While some appreciate the artistic aspect of lens flare, lens manufacturers work hard to minimize them by coating lenses with anti-reflection coatings. The coating applied to each lenses are usually designed to minimize the reflection of a specific wavelength. They are defined by their thickness and index of refraction. Given the wavelength to minimize reflections for, and the IORs of the two medium involved in the reflection (say n0 and n2), the ideal IOR (n1) and thickness (d) of the coating are defined as n1 = sqrt(n0·n2) and d=λ/4·n1. This is known as a quarter wavelength anti-reflection coating. I've found this site very helpful to understand this phenomenon.

In the current implementation each lens coating specifies a wavelength the coating should be optimized for and the ideal thickness and IOR are used by default. I added a controllable offset to thicken the AR coating layer in order to conveniently reduce it's anti-reflection properties:

No AR Coating:

Ideal AR Coating:

AR Coating with offsetted thickness:


Currently the full cost of the effect for a Nikon 28-75mm lens is 12ms (3ms to ray march 352x32x32 points and 9ms to draw the 352 patches). The performance degrades as the sun disk is made bigger since it results in more and more overshading during the rasterisation of each ghosts. With a simpler lens interface like the 1955 Angenieux the cost decreases significantly. In the current implementation every possible "two bounce ghost" is traced and drawn. For a lens system like the Nikon 28-75mm which has 27 lens components, that's n!/r!(n-r)! = 352 ghosts. It's easy to see that this number can increase dramatically with the number of component.

An obvious optimization would be to skip ghosts that have intensities so low that their contributions are imperceptible. Using Compute/DrawIndirect it would be fairly simple to first run a coarse pass and use it to cull non contributing ghosts. This would reduce the compute and rasterization pressure on the gpu dramatically. Something I intend to do in future.


I'm not sure if this approach was ever used in a game. It would probably be hard to justify it's heavy cost. I feel this would have a better use case in the context of pre-visualization where a director might be interested in having early feedback on how a certain lens might behave in a shot.

Image credits: Wallup

Image credits: Wallpapers Web

Finally, be aware the author has filled a patent for the algorithm described in his paper, which may put limits on how you may use parts of what is described in my post. Please contact the paper's author for more information on what restrictions might be in place.