Remember, this was written to accompany a Keynote presentation, but I can't post that here, so you'll have to imagine it. It looked awesome ;-)
Today I’m going to be talking about audio mixing;
what’s the purpose of mixing over and above getting the levels right, and why a
good mix is so important. I’m also
going to talk about the stages a mix engineer will go through when working in
linear media, and what lessons we can learn, and what techniques we can use from
the linear world when working with interactive material.
So, here’s what I’m going to talk about
today. Firstly, a short
introduction. Secondly, I’ll ask
“what is mixing?” What’s the
purpose of it, and what’s to be gained by good mixing practices.
Thirdly, I’d like to talk about different
approaches to mixing systems within interactive entertainment.
Fourth, I want to look at some of the tools
and techniques we can use when mixing our games.
And lastly, I want to look at monitoring
and mixing standards in our industry.
INTRODUCTION
The industry has come a long way in the
last couple of years in terms of the quality of the audio content created for
games. Big developers now spend a considerable
amount of money recording new audio content, and usually even more money on
the writing and recording of original music for their titles.
However, the audio assets that go into a game are only 50% of the complete experience. The other 50% is down to implementation
of those assets, with a good mix being a large part of that.
In my view, the mixing process is something that in the past has been overlooked. In my experience, this is usually down
to the minimal amount of time scheduled between when a game is content complete
and when the game is mastered.
WHAT IS MIXING?
OK, so let’s start with the basics. Mixing is the process of bringing together all of your audio assets (sound effects, dialogue and music) and making them sit together nicely so that the whole becomes a coherent audio experience. Technically,
it’s about achieving clarity.
If you’ve got too many sounds sharing the same sonic characteristics
being played at once, the whole thing will just mush together and you won’t be
able to discern any detail.
Artistically however, mixing is about
focus.
It’s about using all of the audio material that you have put together in
your title, and modifying it in real time, in order to manipulate the person
playing the game into feeling what you want them to feel, and to make them
focus on what you feel is important. By dynamically changing the mix, we have a
massive amount of power over how the player perceives the situation they’re
in.
This definition of mixing is the same
whether you’re mixing a game, film, TV or music. However, when it comes to mixing for games specifically,
mixing processes fall into two categories.
Active and Passive mixing
Active Mixing is where event triggers that come directly from within the game itself
change the audio mix. An example of this might be the recall of a set of volumes for a group of sounds (a snapshot) triggered by an in-game event, or the tinnitus effect when a grenade goes off close to you in a first-person shooter, where all the sound is filtered except the ringing in your ears.
The other category is Passive Mixing. This is a bit more
subtle, and is more akin to the way that you would configure a music or film
mix in ProTools or Nuendo. Passive
mixing is what I would describe as the configuration of dynamics processors,
and how they interact with each other.
As an example of passive mixing, here’s probably
the simplest setup possible.
Here we have a simple routing diagram, detailing how the dialogue and the
music interact with each other.
We have a compressor on the music, with a
side chain input coming from the dialogue. So, although the compressor is on the music track, it’s not
actually listening to the music, it’s listening to the dialogue.
The louder the dialogue, the more the
compressor will attenuate the volume of the music track.
As I’ve said, this is a very simple
setup. I’ll go into a bit more
detail on routing a bit later.
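As a sketch of how that side-chain behaves, here's a minimal gain computer in Python. The threshold and ratio values are illustrative assumptions, not settings from any particular engine:

```python
def sidechain_gain(dialogue_rms_db, threshold_db=-30.0, ratio=4.0):
    """Gain change (in dB) applied to the music bus, driven by the
    dialogue level arriving at the compressor's key input."""
    over = dialogue_rms_db - threshold_db
    if over <= 0.0:
        return 0.0  # dialogue below threshold: music left alone
    # A 4:1 ratio lets 1 dB through for every 4 dB over threshold,
    # so the remaining 3 dB in every 4 becomes attenuation.
    return -(over - over / ratio)

# The louder the dialogue, the more the music is attenuated.
print(sidechain_gain(-40.0))  # quiet dialogue: 0.0 dB of reduction
print(sidechain_gain(-10.0))  # loud dialogue: -15.0 dB of reduction
```

The same function works unchanged whether the key input comes from dialogue, the player's weapon, or any other bus.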
The key difference between active and passive mixing is that
with active mixing the mix changes are triggered by events in the game. The system isn’t actually aware what is
coming out of the speakers, only that a certain event has been triggered.
With passive mixing the mix system is
actually listening to the audio signals themselves and then adjusting levels
depending on actual volume levels of the channels or sub-groups. As I said, this is more akin to how you
would set up the routing and audio processors on a mixing desk.
So, in my view, a perfect setup would be a
combination of both active and passive systems.
Different approaches to mixing
In videogames, developers use a wide
variety of different mixing techniques, depending on what technology they have
available to them. I’d like to
show you a couple of these different approaches.
Snapshots
The first of these I want to touch on is
the snapshot mixing system. This
is probably the easiest to implement, and because of that, the most widely used
technique. It’s something that’s
been used for years on mixing consoles used in film and music production, and
now the technology has trickled down, it’s found on a lot of live music mixing
consoles.
It enables the sound designer or mixer to
take a snapshot of the volumes of all the different channels at any particular
time, and then recall them as and when they’re needed. In live music, for example, an engineer
might have one snapshot for the support act, and then switch to another for the
main act, or they may want a different mix for each song.
Similarly, in games, we may want a snapshot
to be recalled when we reach a certain stage or location in the game. In order to heighten the tension we may
choose to reduce the ambience and music in a certain location and push up the
foley, footsteps or the protagonist’s breathing, for example.
If we do this over a slow enough period of
time, 9 times out of 10 the player will not notice this change has taken
place. This is a very powerful
hook we can use to get to the player.
Certain subtle changes in the mix over time, triggered by snapshots can
be used to make the player, either consciously or subconsciously, focus on
whatever the designers feel it is that they should be focusing on.
Or, we may want to recall a snapshot when the
pause button is pressed so that all the in-game sounds are muted, and then
recall another snapshot when the player continues in the game. This, by the way, has the added
advantage that if you use a mix snapshot when pausing a game, you don’t have to
individually stop every single sound, and then restart each and every sound
when the player continues. You
just recall a snapshot in which the in-game sounds carry on in the background,
but have their volume reduced to zero.
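A snapshot system along those lines can be sketched in a few lines of Python. The class name, group names and dB values here are hypothetical, purely to illustrate the pause example above:

```python
class Mixer:
    """Holds a volume (in dB) per mix group and recalls snapshots."""
    def __init__(self, groups):
        self.volumes = {g: 0.0 for g in groups}

    def recall(self, snapshot, fade_time=0.0):
        # A real engine would interpolate each fader over fade_time;
        # here we just apply the target values and record the fade length.
        self.last_fade = fade_time
        self.volumes.update(snapshot)

mixer = Mixer(["dialogue", "ambience", "sfx", "music", "ui"])

# Pause snapshot: in-game sounds carry on playing, but at zero volume,
# so nothing has to be individually stopped and restarted.
paused = {"ambience": -96.0, "sfx": -96.0, "music": -96.0}
mixer.recall(paused, fade_time=0.25)
print(mixer.volumes["sfx"])  # -96.0; the "ui" group is untouched at 0.0
```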
In Motorstorm, the team deal with snapshots
slightly differently. They use
snapshots within a much smaller timeframe, changing the mix for specific
events. They have a default mix
that’s set up initially, and then game events trigger changes in the volumes of
groups of sounds for very short events such as car impacts, that last for the
length of the impact, and then revert back to the default mix.
As I’ve said, snapshots are the easiest and most widely used run-time mixing technique these days. However, in the last couple of years, other ways of doing things have started to appear.
HDR Audio
Certain developers, such as DICE, Splash Damage and our own Guerrilla Studios have started to reduce the problem of huge amounts of sounds being triggered at once using what's been called High Dynamic Range, or ‘HDR Audio’.
The way it works is this.
A window is defined between a range of dB
values to cover the dynamic range for a given system. The size of that window is governed by the type of sound
system the user is using, whether it be headphones, TV or home cinema type
system.
Templates are created which contain the
samples as well as control data, including real-world loudness data for each
sound.
If a really loud sound is triggered, the
window will jump up so that that sound is contained at the top of the window,
and any other sounds that are triggered that fall below the lower limit of the
window are discarded. As the loud
sound dies away, the window, whose size overall doesn’t change, will move down
again. And so what you have is
this range of a constant size, moving up and down the loudness range, letting
sounds pass through if they’re within the range, and blocking other sounds if
they’re not.
In some cases, this system can deal with about 80% of the mix changes within the game automatically, with snapshot changes used for special cases.
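The window behaviour described above can be sketched like this. The 40 dB window size and -60 dB floor are made-up numbers; a real implementation would pick them per output device (headphones, TV, home cinema):

```python
class HDRWindow:
    """A fixed-size loudness window that rides the loudest active sound."""
    def __init__(self, size_db=40.0, floor_db=-60.0):
        self.size = size_db            # governed by the playback system
        self.top = floor_db + size_db

    def admit(self, loudness_db):
        # A louder sound pushes the window up so it sits at the top...
        if loudness_db > self.top:
            self.top = loudness_db
        # ...and anything below the lower edge is discarded entirely.
        return loudness_db >= self.top - self.size

    def decay(self, amount_db):
        # As the loud sound dies away, the window slides back down.
        self.top -= amount_db

window = HDRWindow()
print(window.admit(0.0))    # explosion: window jumps up, sound plays -> True
print(window.admit(-50.0))  # quiet footstep, now below the window -> False
window.decay(30.0)
print(window.admit(-50.0))  # explosion fading: footstep is audible again -> True
```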
Self-aware systems
Up until recently, most mix systems were
event based. Send the system an
event, and it would then change the overall mix in some way that you’ve
specified in advance.
However, with the increase in computational
power available to designers and programmers working on current generation
machines, it’s now possible for us to go one stage further and have systems
that are aware of what they’re actually outputting, and to make passive mix
decisions accordingly.
If you can store spectral information as metadata
about each audio file in your game, and you know how loud each sound is being
played and where it is in the 3D world, the system can know exactly what is
coming out of each speaker at any moment.
By laying down a set of rules beforehand,
you can increase or decrease the volume of any sound or set of sounds, or, more
crucially, increase or decrease certain spectral components of any sound or set
of sounds on the fly, to leave space in the overall mix for the stuff you want
to cut through. Basically we’re
talking on the fly EQ’ing automated by a set of rules you specify beforehand.
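A rule set like that could be expressed as data plus a small evaluator. The crude three-band split and the flat -6 dB duck below are invented for illustration; a real system would work on much finer spectral metadata:

```python
def apply_rules(sounds, duck_db=-6.0):
    """If dialogue is active, carve the mid band out of everything else
    so the speech-critical frequencies can cut through."""
    eq = {name: {"low": 0.0, "mid": 0.0, "high": 0.0} for name in sounds}
    if sounds.get("dialogue", {}).get("active"):
        for name in sounds:
            if name != "dialogue":
                eq[name]["mid"] = duck_db
    return eq

# Per-sound spectral metadata: three-band energy plus an active flag.
sounds = {
    "dialogue": {"low": 0.1, "mid": 0.9, "high": 0.3, "active": True},
    "ambience": {"low": 0.6, "mid": 0.7, "high": 0.4, "active": True},
}
print(apply_rules(sounds)["ambience"]["mid"])  # -6.0: mid band ducked
```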
Obviously it’s still early days, but I know
of at least a couple of developers, ourselves included, that are doing work on
this type of tech at the moment.
TOOLS AND TECHNIQUES
So leaving aside these systems, I want to
focus now on some of the more basic tools and techniques that can be used for
mixing game audio.
In order to do the job effectively and give us as much control as possible, we need to arm ourselves
with the right tools. In this
section, I’m going to give you a brief overview of the sorts of tools and
technologies that are useful to you when mixing.
I’m going to talk about the importance of setting priority values for different sets of sounds, so that if there are too many sounds playing you can intelligently drop the ones the player doesn’t need to hear, or won’t notice have stopped playing.
Next, how to organise the different types
of sounds into sub-groups so you can manipulate large amounts of sound
simultaneously.
I’ve already spoken about the ‘whys’ of
mixer snapshots. I’d like to speak
a little about the ‘hows’.
And lastly, I’ll talk about dynamics processors, and how to use them effectively. By dynamics processors, I mean compressors and limiters.
Prioritising Sound Effects
I worked on a PS3 project a couple of years
ago, and the audio engine we used had 40 channels that could be used at
once. If you have 40 channels
already being used and another sound is called, it will steal a voice from
something else, and something, somewhere will stop playing. If that sound is a looped ambience that
gets stolen, it will stop, and it won’t be started again, which could destroy
the atmosphere of a game.
Now 40 sounds may sound like a lot, but
when you take into consideration that the ambiences in Heavenly Sword took up a
minimum of say 10 channels, the weapons took up between 10 and 15, footsteps
could take up to 10 channels, depending on how many characters you had close to
you, you soon realise that 40 channels doesn’t really go a long way.
So, in order to make sure that important
sounds are always played and unimportant sounds are stopped when the number of
channels available is low, each sound effect needs to be given a priority
value.
A sound with a low priority value should
never steal a voice from a sound with a higher priority value. Similarly, if a sound effect is
triggered with a high priority value and all the channels are being used, the
system should stop playing the sound with the lowest priority value.
This way you can always be sure that the sounds that are crucial to the gameplay experience, such as critical dialogue or the player’s weapon, will always be played, while sounds such as footsteps, which the player would never notice missing when there’s a lot going on, give way to more important ones.
It also helps, in the case of a game in a 3D environment, to weight these priorities depending on the intrinsic loudness of a particular event, and on its distance from the camera. Even if there’s a lot going on, you may still want to hear really close footsteps, for example, or to ignore really distant explosions.
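As a sketch, here's that 40-voice pool with priority-based stealing in Python. The priority numbers and sound names are arbitrary; the only rule being demonstrated is that a sound never steals a voice from a higher-priority one:

```python
import heapq

MAX_VOICES = 40

def trigger(voices, sound, priority):
    """Start a sound on the voice pool; if the pool is full, steal the
    voice playing the lowest-priority sound, but never steal upwards."""
    if len(voices) < MAX_VOICES:
        heapq.heappush(voices, (priority, sound))
        return True
    lowest_priority, _ = voices[0]
    if priority > lowest_priority:
        heapq.heapreplace(voices, (priority, sound))  # stop the least important sound
        return True
    return False  # the incoming sound is the least important: don't play it

voices = []
for i in range(40):
    trigger(voices, f"footstep_{i}", priority=1)
print(trigger(voices, "critical_line", priority=10))  # True: steals a footstep
print(trigger(voices, "distant_rubble", priority=0))  # False: dropped
```

Weighting by distance and intrinsic loudness would then just mean computing `priority` per trigger instead of authoring it as a fixed value.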
Sub-Grouping Your Sounds
Grouping your sounds is very important, as it enables you to manipulate the levels of large groups of sounds
together, instead of having to modify the volumes of large amounts of single
channels simultaneously. It’s a
lot easier to put 30 different sounds on one mix group and then modify the
volume for just that one group than it is to modify those 30 sounds
individually.
How you group your sounds depends on the
sort of game you’re making, but you also have to take into consideration how
you want to manipulate the mix in real time to achieve the right effect from an
artistic standpoint, and then group your sounds accordingly.
When mixing a game with a large amount of
sounds, or lots going on at once, it’s always easier to pre-mix your sounds
into groups first. On the majority
of the cutscenes I’ve worked on I’ve had upwards of 100 different channels in
total.
Say you have 15 different channels that
contain all the elements of the game ambience. If you pre-mix all the ambience elements first, so they’re
all at the right levels in relation to each other, then put them all into one group,
you only then need to worry about the volume of that one group during the final
mix.
On the final mix of a game, or a cutscene,
I tend to pre-mix everything so that I end up with sometimes just 5 faders for
the whole game. They might be:
· Dialogue
· Ambience
· Sound Effects
· Music
· UI
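The five-fader idea can be sketched as a channel-to-group map where dB values simply add. The group names follow the list above; the channel names and levels are made up:

```python
# Group faders for the final mix (dB).
groups = {"dialogue": 0.0, "ambience": -6.0, "sfx": -3.0, "music": -9.0, "ui": 0.0}

# Each channel was pre-mixed to a level relative to its group.
channels = {"wind_loop": ("ambience", -4.0), "rain_loop": ("ambience", -2.0)}

def channel_level(name):
    group, fader = channels[name]
    return fader + groups[group]  # dB values add

# Pulling one group fader moves every pre-mixed channel in it together.
groups["ambience"] -= 6.0
print(channel_level("wind_loop"))  # -4 + (-12) = -16.0
```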
The other advantage that grouping sounds
together gives you is that you can use dynamics processing on whole groups,
instead of individual sounds. I’ll
explain the benefits and give you some examples of this later in the talk.
More About Snapshots
I’ve already spoken a bit about mixer
snapshots.
The main requirement of a decent mixer
should be that levels can be adjusted in real-time, without having to restart
the game. If you have to restart
your game every time you make the tiniest change, you’ll be there forever. On a recent project, we didn’t have
this functionality and in order to hear mix changes I had to change the volume
of each sound individually, rebuild the data and start the game again. This means it took approximately 10
minutes to change the volume of a sound and then hear that change in the game.
Dynamics Processing
When I talk of dynamics processors, I’m
talking about compressors and limiters.
I think it was a Pirelli advert whose tag-line was ‘Power is nothing without control’. Well, the same applies to the tools we use to make our games.
Compressors and limiters are all about
control, and if you’ve got literally hundreds of sounds potentially all being
triggered at the same time, then without control, things can easily get out of
hand.
One way of keeping things under control is
to put dynamics processors on subgroups that have the potential to get out of
hand, groups that are likely to have lots of big transients on them, such as
weapons, explosions, or in the case of a driving game for example, car impacts.
Putting compressors and limiters on all of your sub-groups, and setting them up correctly, will help you to maintain control and give you more clarity when there are a lot of sounds being played at once.
Routing Example
Here’s a diagram of a routing
configuration for a hypothetical game, indicating
how I would set up the subgroups, and where I would use dynamics
processors.
The red lines indicate side-chains from channels or groups that are fed to what’s called the key inputs of compressors on other channels or groups. This is an example of the passive mixing I mentioned earlier.
When using key inputs, the compressor isn’t actually listening to the audio on the channel it sits on. It’s reducing the gain on that channel depending on what another channel is doing.
There’s a couple of points of interest
here.
Firstly, the dialogue has been split up
into critical and non-critical dialogue.
Critical dialogue is dialogue the player absolutely must hear. Non-critical dialogue is throwaway
stuff that merely adds to the mood and sets the scene. In this example, the critical dialogue
is sent to the key inputs of compressors on the sound effects, music and
non-critical dialogue. If critical
dialogue is triggered and there’s a lot happening on the other channels the
compressors on the other channels will reduce the gain of them so that the
critical dialogue comes through.
The other example here is the bullet-by and
NPC weapons. The player's weapon
and the bullet-bys are sent to the key input of the compressors on the NPC
weapons so if the player fires their gun, or the bullet-bys are triggered
loudly, they will automatically reduce the volume of the NPC weapons.
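That routing can be written down as data: a map from each key source to the buses whose compressors it drives. The bus names mirror the diagram; the flat -6 dB per active source is an invented simplification of what a real compressor would do:

```python
# Side-chain routing: key source bus -> buses whose compressors it keys.
key_routing = {
    "critical_dialogue": ["sfx", "music", "non_critical_dialogue"],
    "player_weapon": ["npc_weapons"],
    "bullet_by": ["npc_weapons"],
}

def duck_offsets(loud_sources):
    """Gain offsets (dB) on target buses while the given key sources are loud."""
    offsets = {}
    for source in loud_sources:
        for target in key_routing.get(source, []):
            # Each loud key source pulls its targets down a further 6 dB.
            offsets[target] = offsets.get(target, 0.0) - 6.0
    return offsets

print(duck_offsets(["critical_dialogue"]))           # music, sfx and chatter duck
print(duck_offsets(["player_weapon", "bullet_by"]))  # NPC weapons duck twice
```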
MONITORING
I’d like to say a quick word about
monitoring your game. Regardless of the type of game you’re making, it’s
preferable to mix your game in a critical listening environment. By a critical listening environment, I
mean a room that is acoustically accurate, with a monitoring system that has
been properly calibrated.
I know that not all developers have the luxury of custom built recording studios or an acoustically accurate room, but there
are plenty of commercial recording studios that are dying to get into game
audio. With the music industry in
a bit of a mess, a lot of the major recording studios in London are now
actively courting game developers, trying to bring in new business.
In my view it’s a very worthwhile investment to book a couple of days in a studio to play through the game on a monitoring system that is different from the one you’re used to working on.
One of the most common mistakes is judging a mix on a consumer surround setup; the chances of such a system being set up and calibrated properly are minimal.
Therefore, listening to your game in a
properly calibrated room on a properly calibrated monitoring system will show
up any problems with your mix that may not be obvious on your own setup. If it sounds good on an accurate
system, it’ll sound good anywhere.
STANDARDS
The film and TV industries have had audio
standards for years. The audio
systems in most cinemas are setup in a certain way, and film and TV mixers know
exactly how their audience will be listening to their content.
However, in the world of video games, no
standards currently exist, and to be honest, it really shows in the wide
ranging differences between the sound from one game to another.
Generally, games have been excessively loud
in the past. Do any of you
remember the startup sound on the PlayStation 2? That sound was played on the machine at full volume, as loud
as it could be played. So, when
you switched your machine on, you would set the volume on your TV based on that
sound.
That meant that most games would need to
make all of their audio really loud in order to match the volume of that
initial startup sound.
Now, a really loud game, means very little
dynamic range. No light or shade
at all. I remember producers telling me that they wanted ‘everything louder than everything else!’
However, we as an industry are starting to rectify this, and standards are beginning to emerge*. Most of the people involved are talking of a dialogue level standard of between -18 and -22 dB RMS.
When mixing, we at Sony are adopting the DVD standard reference level of 79 dB SPL. DVDs are obviously tailored for viewing in the home, on consumer-level equipment, and so it makes sense for us to follow that lead.
* The section on standards was written before the advent of BS1770.