Sound in The Banner Saga
Greetings! As you guys know, we do a monthly progress report to keep you up to date on how the game is coming along. The usual report is coming next week, with lots of progress this month on the single player Saga.
Today we have an excellent update from Kpow Audio, who have been doing an astounding job with the sound and audio implementation on the game, which you can already get a good dose of in the Factions beta.
Kpow Audio is Michael Theiler and Peret von Sturmer, working remotely from Sydney, Australia, and what you hear from them is thanks to your support, without which we'd be doing our own foley, and that would not have been pretty. I very highly recommend checking this out even if you don't know much about sound in games. The depth in each branch of game development can be pretty fascinating stuff.
I'll leave it to Michael from Kpow to explain:
Situating an Ambience
When creating ambiences for games (this applies equally to film), I am striving to make them blend into the background, and not mask any important in-game sounds. For most ambiences, these are the most important qualities that I am attempting to achieve.
In order to achieve this, I first need to focus on the repetition and timing between audio occurrences in the sounds. This means spacing sounds, and adding and removing sound occurrences in my audio sequence. I then work on the frequencies in the sounds, using equalization to mould them into the right sound. Finally, I work on their sound propagation, and the sound of the space which they are to inhabit. These are the steps necessary to mould sound into something suitable for the space. Just adding reverb is not enough - the sound needs to be purpose built for the space’s reverberation and delay treatment.
The first task I need to do to ensure the ambience retreats into the background is to select the correct sounds. The more particular you are about the sounds you choose, the better results you will get. Don’t settle for a slightly inappropriate sound if you know it’s going to be a lot of work to massage the audio to make it sound right for the space. Often I will find nice long stereo files that contain approximately the right sounds, but they always need some work to be made to fit the particular space I am attempting to create. Usually they will need to be edited, removing anything that pops out and distracts you from the space and time of the location. I say time because often with an ambience the frequency of occurrence of particular sounds is something that needs to be considered. If there is too much happening, the space feels cluttered and busy. Even if you are depicting a busy location such as an outdoor market, or a busy mall, bunch too many sounds together too frequently and you have a mess. This kind of cacophony can be used as an effect, but in games you don’t have control of the player’s orchestration of the world you are creating. Therefore care must be taken to design the sounds in a pleasing, but apparently random manner. These same sensibilities are used when designing a more molecular, procedural ambience - tuning time between ambient audio events is what makes these spaces feel ‘right’. It is something that I learned after doing this for a long while: less is, more often than not, more. Keep most things subtle, and let the occasional sound pop out only if it sounds perfect to do so.
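As a rough illustration of tuning the time between ambient audio events, here is a minimal Python sketch of a scheduler that spaces one-shot sounds with random gaps and avoids playing the same file twice in a row. It is not how FMOD Designer or any other audio engine works internally; the function name, the file names and the gap values are all hypothetical and chosen only for the example.

```python
import random


def schedule_events(sounds, duration_s, min_gap_s, max_gap_s, seed=None):
    """Return a list of (time_in_seconds, sound_name) pairs.

    Hypothetical helper: each event is separated from the previous one by a
    random gap between min_gap_s and max_gap_s, and the same sound is never
    picked twice in a row when more than one sound is available.
    """
    if min_gap_s <= 0 or max_gap_s < min_gap_s:
        raise ValueError("gaps must satisfy 0 < min_gap_s <= max_gap_s")

    rng = random.Random(seed)  # a seed makes the sequence repeatable for tuning
    events = []
    last = None
    t = rng.uniform(min_gap_s, max_gap_s)
    while t < duration_s:
        # Exclude the previous sound so obvious repetition is avoided.
        choices = [s for s in sounds if s != last] or list(sounds)
        sound = rng.choice(choices)
        events.append((t, sound))
        last = sound
        t += rng.uniform(min_gap_s, max_gap_s)
    return events


if __name__ == "__main__":
    # Assumed values: sparse gull calls over a two minute stretch.
    for time_s, name in schedule_events(
        ["gull_01", "gull_02", "gull_03"], 120.0, 4.0, 15.0, seed=7
    ):
        print(f"{time_s:6.1f}s  {name}")
```

Widening the gap range makes the space feel emptier and narrowing it makes it busier, which is the "less is more" tuning described above.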
Removing anything that pops out is a balancing act. A space is determined by how the sound and its propagation ‘sits’ in the space. We manipulate this with delays and reverberation. If nothing pops out at all, the reverb doesn’t have the material with which to bloom, and therefore describe the space. So I am not trying to get rid of every descriptive sound, I am trying to ensure every sound is right. It sounds the right distance away for the space I am trying to describe, it sounds at the right level (usually low, but not always), it occurs infrequently or frequently enough to be believable.
The next consideration when building an ambience is the frequencies, their relationship, any build-ups of particular frequencies, and the overall mix (which actually comes last). As I mentioned before, I am trying to ensure the ambience sits behind any important close sounds. Every audio building block needs to be eq’d so it plays a background role. Usually the frequencies I am concerned with are the middle frequencies, from around 350Hz to around 2kHz. This varies of course, and I will have chosen audio already that doesn’t contain important information that is loud and overbearing in these frequencies. Also, I am talking by degrees. Everything has these frequencies in it to an extent; I just ensure they are not overwhelming or distracting in any way. After a multitrack of up to 30 channels has been eq’d, usually if an individual channel is solo’d, it’s surprising how much frequency content has been removed. The overall feel of the ambience needs to be fairly light. If there is too much bass, then in game you get used to the bass and start to ignore it, and a bass-heavy weapon, impact or vehicle sound loses its impact because the player has become accustomed to these frequencies, so the contrast is not there. So keeping things well tamed is important.
The final element, often the most time consuming, and probably the most important, is getting the space right for these sounds. They usually need to be pushed back. They often need to be diffuse, and the distance and diffusion usually need to sound a little more exaggerated than they would in real life in order to create some contrast and depth to your sound design. I like to think of these ambiences as layers. They have depth also - the sounds in them are not from the same distance or perspective - they are from over there, behind that, a block away and behind that hill.
In order to give perspective to these sounds, my setup usually starts at this first example, and morphs from there. I am going to take you through a couple of different mix setups. I use ProTools, so will be describing these techniques using the much maligned behemoth, but these techniques apply equally well to other DAWs.
Spatialisation Setup Number One
The first setup is one I used many times on the ambiences for LA Noire. We didn’t have a procedural system in place for LA Noire; instead we had looping tracks, between one and a half and three minutes long, that provided the ambience for different locations. There were over 100 different ambience tracks for exteriors and interiors, which faded in and out based on locations.
Once I had chosen the sounds and eq’d them to feel right, I would set up my delay auxiliary channel, and my reverb auxiliary channel. Both the delay and the reverb channel would be set to pre-fader. This way, I can control the apparent distance away that the sound appears, by reducing the amount of direct sound on the sound’s channel. So the signal you hear might be half direct, but all reverb, and some delay, creating a diffuse distant sound.
Automating the send level of the direct to the reverb and to the delay, and automating the reduction of the direct level of the individual tracks leads to a huge amount of variation in perceived distance from the space that the sound has propagated within. It lets me have a version of a sound get louder by having a good amount of direct sound, a little less reverb, and a little delay, therefore sounding closer, but then blooming within the space. Reducing the direct sound and keeping the reverb send up gives the sound a diffuse, more distant quality. Tuning the delay lets you describe the slap off buildings, or hillsides, or distant mountains or canyons, or an alleyway. This slap is something that quickly clues your ears in to the space around the sound. It may seem a little more crude than a reverb, but it can sound very beautiful, can be instantly evocative, and I believe is a very important ingredient in describing a space.
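To make the pre-fader idea concrete, here is a small Python sketch of the level arithmetic only. It assumes a simplified channel model in which a pre-fader send taps the signal before the channel fader and a post-fader send taps it after; the function names and the dB values are hypothetical, and summing the reverb and delay amounts is a crude stand-in for "how wet it sounds", not a measurement.

```python
import math


def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def channel_levels(fader_db, reverb_send_db, delay_send_db, pre_fader=True):
    """Linear amounts reaching the mix for one channel (simplified model)."""
    fader = db_to_gain(fader_db)
    # A pre-fader send ignores the fader; a post-fader send follows it.
    send_source = 1.0 if pre_fader else fader
    return {
        "direct": fader,
        "reverb": send_source * db_to_gain(reverb_send_db),
        "delay": send_source * db_to_gain(delay_send_db),
    }


def wet_to_dry_db(levels):
    """Crude wet/dry ratio in dB: (reverb + delay) relative to direct."""
    wet = levels["reverb"] + levels["delay"]
    return 20.0 * math.log10(wet / levels["direct"])


if __name__ == "__main__":
    close = channel_levels(0.0, -12.0, -18.0)   # fader up: sounds nearer
    far = channel_levels(-12.0, -12.0, -18.0)   # fader down, sends untouched
    print(f"close: {wet_to_dry_db(close):+.1f} dB wet/dry")
    print(f"far:   {wet_to_dry_db(far):+.1f} dB wet/dry")
```

With pre-fader sends, pulling the fader down 12dB raises the wet/dry ratio by the same 12dB because the sends do not move. With post-fader sends the ratio would stay constant and the sound would simply get quieter rather than more distant.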
The relationship between the direct sound, reverberation and the delay begins to build a picture of the space the sounds are occurring in. But if we stopped here, we would not have all the depth that can be achieved with the next step. I always, and I mean always, set up an equalizer after my reverbs and delays, on the same aux channel. This is hugely important, and is a technique that I have found improves mixes in any circumstance. Once it’s in, I usually start by carving out some highs and lows, usually shelves, but sometimes, if I need to be a little more brutal, I will use a low pass or high pass filter. With ambiences, I am trying to reduce their weight. I need the frequencies that give the ambiences their heavy, suffocating qualities, for other more important sounds, so these areas need to be tamed. The high frequencies also need to go. Often it can feel like the sparkle is leaving your mix, but very high frequency reverbs are not realistic, and often sound bad after any compression. I tend to avoid this sound in my reverbs unless they are for a specific effect. As I do this, I am often fairly brutal at first, and I am listening to the sounds behind the reverb become unveiled. This is where I create depth: a separation between the reverb and the direct sound, as if they are not in the same plane, but a related parallel plane. This is what gives the mix space. It lets you hear the reverb as support, and lets the main sounds function better.
Once the reverbs are done, I do exactly the same thing to the delay. Often with the delay I am a little more extreme, but I also find here is where I can give the description of space even more value. For example, the fast delay in a wood room sounds different to the one in a glass room. Obvious I know, but here is where I can quickly simulate that sound by eq’ing the delay. For a forest, it’s going to be pretty dead in the high frequencies - I might dampen everything above 1.2kHz. A quarry is going to be a bit more live, but still not have the higher frequencies very present.
And that’s basically it. I am sending all the tracks to a master bus, which has an overall eq with which I might take out a little more bass, and a limiter on it that usually never gets hit, and I record out. Then it’s a case of bringing that stereo file back in, making it loop seamlessly, and I am done.
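The forest example of dampening everything above 1.2kHz can be sketched with the simplest possible filter. The Python below is a one-pole low-pass, used here only as a stand-in for the eq plugin on the delay return: it rolls off far more gently (about 6dB per octave) than a real eq would, and the 48kHz sample rate and the function names are assumptions made for the example.

```python
import math


def one_pole_coefficient(cutoff_hz, sample_rate=48000):
    """Feedback coefficient for a one-pole low-pass at the given cutoff."""
    return math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)


def low_pass(samples, cutoff_hz, sample_rate=48000):
    """Filter a list of samples: y[n] = (1 - a) * x[n] + a * y[n - 1]."""
    a = one_pole_coefficient(cutoff_hz, sample_rate)
    out = []
    y = 0.0
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out


def gain_at(freq_hz, cutoff_hz, sample_rate=48000):
    """Linear gain of the same filter at freq_hz, from its transfer function."""
    a = one_pole_coefficient(cutoff_hz, sample_rate)
    w = 2.0 * math.pi * freq_hz / sample_rate
    return (1.0 - a) / math.sqrt(1.0 - 2.0 * a * math.cos(w) + a * a)


if __name__ == "__main__":
    # How much of each frequency survives a 1.2kHz damping of the delay return.
    for freq in (100, 1200, 5000, 10000):
        level_db = 20.0 * math.log10(gain_at(freq, 1200))
        print(f"{freq:>6} Hz: {level_db:6.1f} dB")
```

Lowering the cutoff makes the slap sound more absorbed, as in the forest; raising it gives the livelier character described for the quarry.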
Spatialisation Setup Number Two
For The Banner Saga, we are doing a more procedural ambience. I approached this a little differently. There are many different spaces in this game, but one of the more important ones is the opening view of the City of Strand. It is a distant view of a small city. I wanted to create the feeling that the city is alive and populated, audibly so from this distance. As it is a seaside city, I used seagulls as my close perspective sounds, and added wind and sea for the mid-distance. I then used shouting and the sound of anvils being struck for the distant city sounds. This worked out well, but getting the distance on these sounds correct took some time.
To get them sounding right, once I had picked out the sounds, I arranged variations of them sequentially in ProTools. This is procedurally generated content, so I needed lots of individual sounds with which to randomise their playback using FMOD Designer, in order to create the ambience. Once arranged, I set up a convolution reverb on their channel, set to something that approximated the slap and diffusion of being in the narrow streets of a stone and wood city. I then sent varying amounts of this to my usual reverb auxiliary channel, but this time I had two delays set up to provide the slaps.
These delays were to give a sense of bouncing off the surrounding mountains. There was a medium delay time of around 300ms, and a longer one around 600ms. Once set up, the length of the delays meant you could hear that these were facsimiles of the sound repeated exactly (despite some eq), so I needed to add another reverb after the delays (on each delay’s channel), to wash out the delays a little. This started to sound right, but was still a little heavy sounding for the distance, so I went a little harder on the eq’s, with my final settings including some shelving of -16dB at 130Hz and below, and a low pass at 18kHz. I then also have some cutting at 990Hz of around 10dB and also at 6.5kHz of around 7dB.
The eq’s on both delays are similar but have some differences - for example the eq on delay 2 has some boosting in the high frequencies as shown in the second eq image.
The delay aux channels’ levels are also both set fairly low, around -18 to -19dB, so they are only contributing their character subtly to the mix - it’s not a huge effect, despite being necessary for the sound.
The main reverb is a convolution reverb, with an Impulse Response that mimics the sound of a block party; it has some diffusion, as well as some hard surfaces slapping.
The eq after this just tames some bass and cuts some highs, with some lower mids also scooped. This created the sound I was after. It’s a fairly diffuse distant yelling and clanking, with a diffuse echo that sounds like it is carried off by the wind. It’s a more high frequency echo, so it simulates the effect of bass reduction over distance, and the whole effect, although subtle in its execution, is slightly exaggerated and stylised compared to how the sounds would be in real life.
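For a sense of scale, the numbers above can be turned into samples and linear gains. The Python sketch below mixes a dry signal with two single-repeat slaps at roughly 300ms and 600ms, returned at about -18 and -19; it treats those levels as dB, assumes a 48kHz sample rate, leaves out the eq and reverb that follow the delays, and uses hypothetical function names.

```python
def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def ms_to_samples(delay_ms, sample_rate=48000):
    """Delay time in milliseconds expressed as a whole number of samples."""
    return round(delay_ms * sample_rate / 1000.0)


def add_slaps(dry, taps, sample_rate=48000):
    """Return dry plus one delayed copy per (delay_ms, level_db) tap."""
    offsets = [ms_to_samples(ms, sample_rate) for ms, _ in taps]
    out = list(dry) + [0.0] * max(offsets, default=0)
    for offset, (_, level_db) in zip(offsets, taps):
        gain = db_to_gain(level_db)
        for i, sample in enumerate(dry):
            out[i + offset] += gain * sample
    return out


if __name__ == "__main__":
    # A single click makes the two slaps easy to see in the output.
    mixed = add_slaps([1.0], [(300.0, -18.0), (600.0, -19.0)])
    for index, value in enumerate(mixed):
        if value:
            print(f"sample {index:>6}: {value:.3f}")
```

At these levels each slap arrives at a little over a tenth of the dry amplitude, which is why the delays add character without reading as an obvious echo.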
You can hear how it eventually came out here: http://youtu.be/k9JAepiBpLs
And that’s it. The setups I’ve explained above work in almost all cases, and offer a huge amount of controlled tuning to create most spaces you will need.
The only thing left is to go through the specific tools used, though as long as you have a good reverb to start with, most DAWs come with everything else you need.
Equality by DMG Audio
This eq is very transparent. It does not provide any colouration, but it is crystal clear, dependable, and very easy to use. The display does a great job of showing what it is doing, and it provides a built-in frequency analyser which is often handy.
Reverberate by Liquid Sonics
Until Altiverb is released for PCs, this is the convolution reverb I have settled on. It is infinitely tweakable, which can be a time sink, but it is usually easy to get the sound you want without too much tweaking. I have boosted its functionality by purchasing some IRs that are great for Post Production. I believe the ‘block party’ preset I used above was from ‘kinetic sound prism’.
Valhalla Room by Valhalla DSP
This is a beautiful sheeny reverb, great for User Interface sounds, but it can be used for real spaces if you show some restraint. It’s also great for music. I used this to smear the long delays on the distance sounds for the city screen in The Banner Saga.
L2007 Limiter by Massey Plugins
All Massey’s plugins are well priced and extremely useful. I love this limiter for its sound, which is very transparent. It’s fast to set up, and always does what I expect.
Delays are usually just the standard ProTools delays. In the above instances I used the Extra Long Delay II.
An important consideration for me is how quickly the tools I use can get me to the sound I want or hear in my head. The above plugins don’t have a particularly coloured sound, so may not be great for anything that requires a lot of character, and they don’t have a lot of bells and whistles, but for my needs they are great in that they can be set up efficiently and provide the sound I want, fast. I have created templates that have my sessions set up with all the channels, auxiliaries and sends outlined above, ready to go. This lets me sit down and create, rather than jumping constantly from left to right brain states.