Large Scale Ambient Occlusion: Introduction

Ambient occlusion is a very important part of achieving realistic lighting. This is especially true for games like Homefront: The Revolution, where all the lighting is dynamic and so couldn't simply be precalculated and stored in lightmaps. Games typically use Screen Space Ambient Occlusion (SSAO) to provide ambient occlusion on a small scale (like the darkening under the brim of someone's hat), but we wanted to go further than that. We wanted to produce ambient occlusion on a large scale, such as the darkening in alleyways between buildings, or making the lighting inside an interior look realistic.

In the past, to achieve similar results, artists would place negative lights (lights that made an area darker instead of brighter), and it's a testament to the skill of those artists that previous games looked as good as they did. The downside was that it was a time-consuming approach, and for a relatively small team making an open world game it was clear we needed to work smarter. We therefore developed our own Large Scale Ambient Occlusion (LSAO) system, which we'll describe over a short series of blog posts.

This is the system we used for ambient occlusion in Homefront: The Revolution; in this series we'll go over how it works, how we used it, and the results.

Large Scale Ambient Occlusion on / off comparison

As can be seen from the examples above, LSAO had a large effect on the lighting. Interiors were realistically dark, meaning the High Dynamic Range (HDR) pipeline could do its job and make exterior lighting look appropriately bright by contrast, making the lighting overall look a lot more vibrant. Homefront: The Revolution was one of the first games to implement real-world dynamic range correctly, which is part of the reason our HDR implementation for PlayStation 4 Pro and Xbox One X was so highly praised.

The Data

The LSAO data itself was made up of a number of cells. Each cell contained six values for the brightness of the ambient light coming in from six principal directions (+X, -X, +Y, -Y, +Z, -Z).

Cells were grouped into volumes of regularly spaced cells. These volumes were implemented as a pair of 3D textures (+X, -X, +Y and -Y in one; +Z and -Z in the other) which were applied as a post-process step. Any pixel that was within a volume would have its ambient occlusion calculated by using its world normal to linearly interpolate between the six values of the cell it was in.
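As a rough illustration of that per-pixel lookup, here is a CPU-side sketch. The post doesn't state the exact blend weights, so this uses the common "ambient cube" scheme of weighting each axis by the squared normal component; the function name and cell layout are ours, not the engine's.

```python
# Hypothetical sketch: a cell stores six ambient values
# (+X, -X, +Y, -Y, +Z, -Z) and the world normal blends between them.
def ambient_from_cell(cell, normal):
    """cell: (pos_x, neg_x, pos_y, neg_y, pos_z, neg_z); normal: unit (x, y, z)."""
    total = 0.0
    for axis in range(3):
        n = normal[axis]
        # Pick the +axis value when the normal points that way, else the -axis value.
        value = cell[2 * axis] if n >= 0.0 else cell[2 * axis + 1]
        total += n * n * value  # squared components sum to 1 for a unit normal
    return total
```

A normal pointing straight up, (0, 0, 1), returns the +Z value exactly, while tilted normals blend smoothly between the faces.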

Generation

So how did we generate this data? It was clear from the start that the LSAO would have to be something we generated offline ahead of time. This went against one of CryEngine's mottos of "Real-time all the time", but we thought it was worth it for the results. Even so, we weren't used to waiting for results, so we set ourselves the following goals:

  • Take as little time to process as possible.
  • Absolute worst case: if an artist checked in a change at the end of the day, the LSAO must be rebuilt and included in the following morning's build.
  • Unofficial target: a full level should take less than an hour to process at production quality (so it could be done over a lunch hour).

The non-performance goals of LSAO generation were:

  • Artists mustn't have to do any markup to support it (remember, this is supposed to save them time).
  • Require as little hardware as possible.

Simple Approach

To calculate the ambient brightness, the incoming ambient light needs to be gathered and averaged. The obvious approach would be to fire out rays in a cone from each sample point around each axis and average the results. If a ray hits something, we add the brightness of the surface it hit; if it doesn't hit anything, it must be able to see the sky, so we add the brightness of the sky.
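This gather step can be sketched in a few lines. Everything here is a made-up toy, not the shipped code: the scene is a single infinite "ceiling" occluder, the brightness values are illustrative, and for simplicity rays are fired uniformly over the whole sphere rather than in per-axis cones.

```python
import math, random

SKY = 1.0      # assumed sky brightness
SURFACE = 0.2  # assumed brightness of hit geometry

def hits_ceiling(origin, direction, ceiling_z=2.0):
    # An infinite plane above the sample point acts as the only occluder.
    return direction[2] > 0.0 and origin[2] < ceiling_z

def gather(origin, n_rays=1000, rng=random.Random(42)):
    total = 0.0
    for _ in range(n_rays):
        # Pick a uniform direction on the unit sphere.
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        d = (r * math.cos(phi), r * math.sin(phi), z)
        # A hit contributes the surface brightness, a miss contributes the sky.
        total += SURFACE if hits_ceiling(origin, d) else SKY
    return total / n_rays
```

A sample point under the ceiling sees roughly half sky and half surface, so it averages out near 0.6, while a point above the ceiling sees only sky and returns 1.0.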


Ideally this process would be recursive: to calculate the brightness of a hit surface we need to know its ambient brightness, which is exactly what we're currently calculating. To simulate multiple bounces of the sky light, we need to light the surfaces with the current results whenever we hit a surface.
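One way to picture breaking that recursion is to run repeated passes, each relighting surfaces with the previous pass's results. This is a toy one-dimensional illustration, not the shipped method: two facing walls that each see half sky and half each other, with made-up albedo and sky values.

```python
ALBEDO = 0.5  # assumed surface reflectivity
SKY = 1.0     # assumed sky brightness

def bounce_passes(n_passes):
    a = b = 0.0  # ambient brightness of walls A and B, initially dark
    for _ in range(n_passes):
        # Each wall sees half sky, half the other wall lit by the previous pass.
        a, b = 0.5 * SKY + 0.5 * ALBEDO * b, 0.5 * SKY + 0.5 * ALBEDO * a
    return a, b
```

Each pass adds one more bounce of indirect sky light, and the values quickly converge towards the fixed point (2/3 in this toy setup).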

Our Approach

It became clear early on that we would want to run the generation on the GPU to achieve our performance goals. We also wanted a method that played to the GPU's strengths. The main realisation was that instead of firing multiple rays out from a single point, we could very efficiently fire millions of rays in a particular direction using plain old rasterisation with an orthographic projection. Rendering at even a low resolution like 1024×1024 would correspond to over a million rays.
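The equivalence is simple: under an orthographic projection every pixel corresponds to one parallel ray. A minimal sketch of that pixel-to-ray mapping (names and the square region convention are ours, purely for illustration):

```python
def ortho_rays(width, height, extent):
    """Yield one parallel -Z ray per pixel, covering a square region of
    half-size `extent` in the XY plane, starting at height `extent`."""
    direction = (0.0, 0.0, -1.0)  # all rays share the projection direction
    for py in range(height):
        for px in range(width):
            # Centre of the pixel mapped into [-extent, +extent].
            x = ((px + 0.5) / width * 2.0 - 1.0) * extent
            y = ((py + 0.5) / height * 2.0 - 1.0) * extent
            yield (x, y, extent), direction
```

A 1024×1024 render in this scheme yields 1,048,576 rays, which is where the "over a million rays" figure comes from.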

Imagine a grid of rays

Normally when you render, you only want to find the geometry closest to the camera. In this case we want to be able to "X-ray" through the scene and record all the geometry a ray passes through. We did this by rendering to a linked list. Every entry in the list was the depth, the brightness of the geometry (sampled from the results so far), and whether it was facing towards or away from the camera. Each pixel had a head pointer that pointed to the first entry in the list it owned. This was all done with UAVs and atomics in a pixel shader.
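A CPU-side sketch of that per-pixel linked list, with the head-pointer swap standing in for what the GPU's atomic exchange achieves (class and field names are ours; on the GPU this lived in UAVs):

```python
class FragmentList:
    def __init__(self, width, height):
        self.head = [[-1] * width for _ in range(height)]  # -1 means empty
        self.nodes = []  # shared pool: (depth, brightness, front_facing, next)

    def insert(self, px, py, depth, brightness, front_facing):
        # New node points at the old head, then becomes the head
        # (the pattern an atomic exchange gives you on the GPU).
        old_head = self.head[py][px]
        self.nodes.append((depth, brightness, front_facing, old_head))
        self.head[py][px] = len(self.nodes) - 1

    def walk(self, px, py):
        # Follow the chain from the head pointer, newest entry first.
        index = self.head[py][px]
        while index != -1:
            depth, brightness, front_facing, index = self.nodes[index]
            yield depth, brightness, front_facing
```

Note the entries come out in reverse insertion order, not depth order, so a later pass has to sort by depth before walking the ray.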

Once we had the results from the rays, we had to apply them to the cells. We'd render the volumes, but this time for every pixel we overlapped we'd follow the linked list to find any pairs of planes (a backface hit followed by a frontface) within the volume. If a pair was found, ambient light information was added to the cells between them. Pairs of planes facing away from each other were spaces inside objects, so those results were ignored. The start of the ray was an implicit back face (as if the sky was a bright physical object), and the end of the ray was also an implicit back face (as you'd only be able to see it from underground, so it should be ignored). When we added ambient light to a cell we actually got two directions for the price of one, as the result for the reverse of the ray direction is just as easy to add at this stage.
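The pair-finding logic for a single ray can be sketched as follows. This is our reconstruction of the rules described above (implicit bright back face at the start, spans left open at the end dropped); the structures and names are ours, and the real pass ran per volume on the GPU.

```python
SKY_BRIGHTNESS = 1.0  # assumed brightness of the implicit sky back face

def lit_spans(fragments):
    """fragments: iterable of (depth, brightness, is_front_face).
    Returns (span_start, span_end, brightness) for each lit span of air."""
    spans = []
    # Implicit back face at depth 0: the sky behaves like a bright object.
    open_depth, open_brightness = 0.0, SKY_BRIGHTNESS
    for depth, brightness, is_front_face in sorted(fragments):
        if is_front_face:
            if open_depth is not None:
                # Back face followed by front face: open air, light it.
                spans.append((open_depth, depth, open_brightness))
                open_depth = None  # span closed; we're now inside an object
        else:
            # Back face: leaving an object, a new candidate span begins
            # (a back face after a back face simply replaces it, so the
            # space inside objects is never lit).
            open_depth, open_brightness = depth, brightness
    # Anything still open at the ray's end would only be visible from
    # underground, so it is dropped.
    return spans
```

Each returned span would then be splatted into the cells it crosses, crediting both the ray direction and its reverse in one go.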

We repeated the process for around 4000 directions to achieve the final result. Fun fact: the directions we used had to be well distributed, so we repurposed a table of normals we'd used on the GameCube versions of TimeSplitters 2 and 3.
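The actual table came from TimeSplitters, but as a stand-in, a Fibonacci spiral is a common way to generate a similarly well-distributed set of unit directions over the sphere:

```python
import math

def sphere_directions(count):
    """Evenly distributed unit directions via a Fibonacci spiral."""
    golden = math.pi * (3.0 - math.sqrt(5.0))  # golden angle in radians
    directions = []
    for i in range(count):
        z = 1.0 - (2.0 * i + 1.0) / count      # evenly spaced in z
        r = math.sqrt(max(0.0, 1.0 - z * z))   # radius of the z-slice
        phi = golden * i                        # spiral around the axis
        directions.append((r * math.cos(phi), r * math.sin(phi), z))
    return directions
```

Calling this with 4000 gives a set of the same size as the one described above, with no two directions clumped together.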

Conclusion

Homefront: The Revolution shipped with LSAO, so we can evaluate how close we got to the goals we set out at the start. It certainly didn't require much hardware: at the end of the project the LSAO for the entire game was being processed at production quality by just four AMD R9 390s. As the work was GPU-intensive but light on CPU, these machines could also have handled other build tasks, taking the cost down to a very reasonable level. The goal of very little artist time was achieved, with the only setup being marking areas where higher-quality results were needed.

As for the performance goals, we did achieve the first target of being able to do the whole process overnight (with the slight cheat that the LSAO data would poke itself into the completed build when it was ready, so both could run in parallel). However, our unofficial target of not exceeding an hour was breached for three of the eleven levels that made up the game. The worst outlier was nearing seven hours at the end of the project. Ouch! Looking at the stats, though, that level wasn't much more expensive than others that completed within an hour. So, what gives? We suspect we blew the memory budget of the graphics cards (8 GB), causing lots of swapping across the PCI Express bus and slowing everything to a crawl. We saw the same cliff-edge performance problems earlier in the project on GPUs with less RAM.

This is all very simplified, and as always, if you have one big idea you have to follow it up with a load of smaller ideas to make everything work. There's lots of interesting stuff we've glossed over here, which will be the subject of the next blog post in the series.

About the author. Charlie Cole is Principal Core Technology Programmer at Dambuster Studios and has participated in all of the studio's projects since TimeSplitters: Future Perfect.