Paul White explains some of the principles behind 3D sound perception, and shows how you can apply some of these to your mixes using nothing but simple effects and a little EQ.
Despite the promise of Dolby Surround CDs, most music still has to be mixed to work in good old‑fashioned stereo, and no matter what trickery we employ, the sound ultimately emerges from just two point sources — the stereo speakers. Recently, several devices have come onto the market, all designed to create the impression of three‑dimensional sound from a standard two‑speaker stereo system, the best‑known of which is probably Roland's RSS (Roland Sound Space) processor. Such devices are only partially successful in emulating the spatial characteristics of true quadraphonic sound, and in any event, they're far too costly for the project studio owner to consider buying (though they can be rented). Nevertheless, such systems have at least promoted an awareness that there's more to stereo recordings than simple panning. The idea behind this article is to examine some of the auditory clues that affect the way we perceive sound, and to look at ways these may be approximated in the studio using conventional equalisers and effects units. Some truly dramatic effects may be achieved with signal processing, though I don't think anybody has really got close to making a pair of loudspeakers produce the same sense of natural space as real life. As the famous Scottish poet once so nearly said, "The best laid pans of mics and men...".
Our eyes give us stereoscopic vision by presenting two simultaneous viewpoints of the world that are almost, but not quite, the same. By processing the slight differences between the images, our brains can gain some clues as to the distance of objects. Our ears work on a similar principle — from the slightly different signals coming in from both ears, information can be extracted relating to the direction of the sound source, and, in some cases, the distance to the sound source. An analysis of the reflected or reverberant sound also helps us learn something about the immediate environment, even in total darkness.
Only if a sound is either directly in front or directly behind us (and if there is no significant level of reflected sound) do both ears receive exactly the same signal. Move the sound source to one side, and it is obvious that the sound must arrive at one ear before the other; the speed of sound is roughly 1100 feet per second, which works out at a little over 1 foot per millisecond. This time difference (or inter‑aural delay for those who like jargon) is used by the brain to determine direction — but this isn't the only difference between the signals arriving at the two ears.
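To put a number on the size of this inter‑aural delay, the classic Woodworth spherical‑head approximation can be used — this formula is my addition, not from the article, and the head radius is an illustrative assumption; only the rough 1100 feet‑per‑second figure comes from the text:

```python
import math

SPEED_OF_SOUND_FT_PER_S = 1100.0  # rough figure quoted in the text

def interaural_delay_ms(azimuth_deg, head_radius_ft=0.29):
    """Woodworth spherical-head estimate of the inter-aural time
    difference for a distant source. Azimuth 0 is straight ahead,
    90 is directly to one side."""
    theta = math.radians(azimuth_deg)
    delay_s = (head_radius_ft / SPEED_OF_SOUND_FT_PER_S) * (theta + math.sin(theta))
    return delay_s * 1000.0

# A source hard to one side gives the maximum delay:
print(round(interaural_delay_ms(90), 2))  # → 0.68
```

So even in the worst case, the two ears disagree by only about two‑thirds of a millisecond — yet the brain resolves direction from it.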
Take the case of a sound source positioned to the right of the listener in an otherwise empty space; the sound arriving at the right ear will be unobstructed, while the sound arriving at the left ear will be masked or shadowed by the head itself. This reduces the level of the sound reaching the left ear, but because of the physical size of the head compared to the wavelengths of audible sound, not all frequencies are reduced in level by the same amount. In fact, low frequencies are relatively unaffected, because their wavelength is significantly greater than the diameter of the head, but the higher the frequency, the greater the masking effect.
This gives the brain three sets of parameters to juggle with in order to compute direction — time delay, level difference, and a form of low‑pass filtering. This is clearly far more complex than the level‑only difference we get when shifting a mono signal from one side of a mix to the other using a pan pot. Even though a pan pot is quite successful in making a sound appear to come from one side or the other, it comes over as rather two‑dimensional, while real‑life sounds exist in three dimensions.
There's one major aspect of the human hearing system I haven't covered yet: how you tell whether a sound is in front of or behind you. If a sound is directly in front or directly behind, the signals arrive at both ears at exactly the same time — there's no delay. If a sound originates directly over the head, the same thing applies. A clue to how we discriminate between front and rear is provided by the fact that our ears are recessed inside fleshy funnels, and it seems reasonable to assume that when a sound is coming from behind, the outer ear actually masks the sound in some way, changing its perceived frequency characteristics. Measurements confirm that this is indeed the case, so it may be possible to simulate the difference between in‑front and behind sounds using some form of EQ.
Roland's RSS 3D sound system was apparently researched with the aid of a dummy head, complete with synthetic outer ears, to monitor and record sounds originating at different positions. The relative left/right level changes, time delays and filtering effects were all analysed and then duplicated, using DSP‑controlled filters, delays and level shifters. One problem with this approach is crosstalk; when we listen on speakers, the right ear hears some of the output from the left speaker and vice versa, so the only way to achieve sufficient separation is to use headphones. This clearly isn't acceptable for a commercial system, so Roland went a stage further and generated a crosstalk cancelling signal. Put another way, what you shouldn't be hearing in the left ear (from the right speaker) is synthesized and then reversed in phase before being added to the normal left signal, and vice versa for the right. On paper, this would suggest that if you're sitting at exactly the right position between the speakers, you'll hear the same signal as you'd have heard on headphones without the cancelling signal.
However, in the real world, people don't sit in exactly the right place, room reflections interfere with the direct sound from the speakers, and different people respond in different ways. The outcome is that systems working on this principle seem to work well when sounds are being panned in 3D, but are less convincing when stationary sounds are panned to a position behind the listener. This would suggest that there's still a lot about the human hearing mechanism that isn't understood, and we don't yet know enough to fool it reliably. Another potential problem is that any system that works by adding delays or manipulating phase is likely to sound quite different when the recording is heard in mono, and as long as mono TV sound and mono radio receivers are in common usage, mono‑compatibility is a major cause for concern.
So far I seem to have painted a depressing picture — if even systems costing tens of thousands of pounds don't work as well as we'd like, what hope is there for the rest of us? As it turns out, there's quite a lot you can do to enhance a stereo recording, even if you can't actually move sounds behind the listener. Take conventional panning for instance — how can we improve on that? We know that in real life, there's a difference in level and also a delay in the arrival times of the sound at the two ears for all off‑axis sources, and knowing that the speed of sound is around 1 foot per millisecond, delaying the quieter side by about one millisecond should sound more convincing than simply panning hard to one side or the other. You can try this using just a simple digital delay unit, and the setup is shown in Figure 1. After that, you can decide whether the result was worth the extra effort.
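As a sketch of the Figure 1 idea, here is what the delay‑panning trick looks like in code. This is a minimal illustration assuming a 44.1kHz sample rate; the function name and the gain value (roughly -3dB) are my own choices, not from the article:

```python
def delay_pan(mono, sample_rate=44100, delay_ms=1.0, quiet_gain=0.7):
    """Pan a mono signal towards the left by making the right
    channel both quieter (about -3dB) and roughly 1ms late,
    mimicking the level and arrival-time differences between
    the two ears. Returns (left, right) sample lists."""
    d = int(sample_rate * delay_ms / 1000.0)   # delay in whole samples
    left = list(mono) + [0.0] * d              # near ear: unprocessed
    right = [0.0] * d + [s * quiet_gain for s in mono]  # far ear: delayed, attenuated
    return left, right
```

With a 1ms delay at 44.1kHz, the right channel ends up 44 samples behind the left — tiny on paper, but enough for the precedence effect to pull the image across.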
If you think this produces a more useful way of panning some sounds, you could try to emulate another of nature's parameters, head masking, by rolling off a little of the top end from the quieter side signal, and adding a little top end to the signal at the louder side. The delayed signal only needs to be 3dB or so lower in level than the undelayed signal; don't pan the sound hard to one side or you'll lose the delayed signal altogether. Also, experiment with the delay time, because this can affect the apparent source of your 'phantom' sound. Processing the sound in this way will inevitably affect its mono compatibility, so check your mix frequently using the 'mono' button to ensure that the mono result is still acceptable.
If varying the delay time changes the apparent position of the sound, then it doesn't take a great mental leap to consider the effects of modulating the delay. A slow, shallow vibrato should create a noticeable sense of movement, which can be used as an alternative to the more usual stereo chorus. More obvious versions of this effect can be very effective in adding a bit of life to synth strings and pad keyboards. In real life, we are hearing slowly‑modulated delays all the time as we move towards or away from reflective surfaces such as walls. In this scenario, the time delay between the original sound source and its first reflection from the wall depends on the position of the listener relative to both the sound source and the reflective surface. If any of these is moving, modulation will take place.
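A bare‑bones modulated delay of this kind is easy to sketch. This is a generic chorus/vibrato outline — the parameter values and interpolation scheme are illustrative choices of mine, not settings from the article:

```python
import math

def modulated_delay(mono, sample_rate=44100, base_ms=10.0,
                    depth_ms=2.0, rate_hz=0.5, mix=0.5):
    """Slow, shallow delay modulation: the delay time sweeps
    sinusoidally around base_ms, and the delayed (wet) signal is
    mixed back in with the dry input -- a simple chorus/vibrato."""
    out = []
    for n, dry in enumerate(mono):
        delay_ms = base_ms + depth_ms * math.sin(2 * math.pi * rate_hz * n / sample_rate)
        pos = n - delay_ms * sample_rate / 1000.0   # fractional read position
        i = int(pos)
        frac = pos - i
        if i >= 0 and i + 1 < len(mono):
            wet = mono[i] * (1 - frac) + mono[i + 1] * frac  # linear interpolation
        else:
            wet = 0.0   # delay line not yet filled
        out.append((1 - mix) * dry + mix * wet)
    return out
```

A rate of around 0.5Hz and a couple of milliseconds of depth gives the slow, gentle movement described above; push both up for a more obvious chorus.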
So far, we've looked at ways of enhancing the sense of left/right space within a mix, by combining delay and EQ with conventional panning. However, we haven't examined the role of reflected or reverberant sound, and in a real‑life situation, there's nearly always a significant amount of this.
While the original sound source may be a single point source that the ear can localise, the reflected sound will emanate from all reflective surfaces, and in a typical room or hall, there will be little difference in level or spectral content between the reverb arriving at the left ear and that arriving at the right. Digital reverb units simulate this by taking a mono input and creating a pseudo‑stereo output, where both outputs are similar in character but where the individual short delays making up the reverb have different timings between the two outputs. Indeed, so convincing is artificial reverb that it can go a long way towards covering the shortcomings of simple panning when it comes to generating a real sense of stereo width.
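The decorrelation trick — two outputs that are similar in character but differently timed — can be illustrated with two sets of early‑reflection taps. The tap times and gains below are arbitrary examples, not values from any real reverb unit:

```python
def pseudo_stereo(mono, left_taps, right_taps):
    """Feed one mono input through two different sets of short
    delay taps -- (delay_in_samples, gain) pairs -- so the two
    outputs share the same gains but have different timings."""
    def render(taps):
        out = [0.0] * (len(mono) + max(d for d, _ in taps))
        for d, g in taps:
            for n, s in enumerate(mono):
                out[n + d] += s * g
        return out
    return render(left_taps), render(right_taps)

# Same gains, slightly different timings on each side:
left, right = pseudo_stereo([1.0], [(10, 0.5), (23, 0.3)],
                                   [(13, 0.5), (29, 0.3)])
```

Because the two channels carry the same energy at slightly different times, the ear hears one diffuse space rather than two separate signals.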
Perhaps more interesting is the concept of creating a sense of front‑to‑back perspective. The positioning of stereo speakers makes it very easy when it comes to moving a sound from the left‑hand side of the room to the other, but it gives us no real help when we want to move a sound forwards or backwards. To do this, we have to simulate the audio cues that exist in real life, and in the absence of any more sophisticated processing equipment, we have to rely mainly on reverb, level and EQ.
Starting with the most obvious, we can use level; if we hear a familiar sound, we know how loud it's supposed to be, so if it sounds quieter, we assume it's further away. To demonstrate this, record a whisper onto tape and play it back at normal speech level — it'll sound very close indeed! But level on its own isn't particularly convincing. In real life, the air absorbs high frequencies more readily than it does low frequencies, so the further away a sound is, the more high‑frequency detail is lost. We can approximate this effect using simple hi‑cut EQ. Conversely, we can make a sound seem nearer by giving a slight emphasis to high frequencies, and this is a factor exploited in harmonic enhancers, which tend to produce a very 'in‑your‑face' sound. It is often also the case that any reverberation accompanying a distant sound will have less of a stereo spread than that produced by a nearby sound, so if you want to place a sound at a distance, try panning the reverb to mono — and don't forget to roll some top off the reverb too.
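The two distance cues just described — reduced level and a duller top end — can be combined in a few lines of code. This is a minimal sketch using a one‑pole low‑pass filter; the cutoff frequency and gain value are illustrative assumptions, not figures from the article:

```python
import math

def push_back(mono, sample_rate=44100, distance_gain=0.5, cutoff_hz=3000.0):
    """Make a sound seem further away: drop its level and roll off
    the high end with a simple one-pole low-pass filter."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)  # filter coefficient
    y = 0.0
    out = []
    for s in mono:
        y = (1.0 - a) * s + a * y   # one-pole low-pass
        out.append(y * distance_gain)
    return out
```

Lowering cutoff_hz and distance_gain together pushes the sound progressively further back; raising them brings it forward again.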
A further factor is that as you get very close to a sound source, you'll hear more of the direct sound and less reverberation, so having your lead vocal awash with reverb isn't going to help it stay in front. If you need to use a strong lead vocal reverb, try using between 50 and 100ms of pre‑delay to separate the vocal from the reverb.
In a mix where you may want to use several reverb settings, consider dedicating one unit to distant sounds. You can do this by picking a patch with plenty of high‑frequency damping, using it in mono, and panning it to the same position in the mix as the sound being processed.
Digital delays can also be used to emulate the way discrete echoes behave in real life; as with any other distant sound, there will be a degree of high‑frequency roll‑off, and as each subsequent reflection has further to travel, you'd expect each reflection to sound duller than the previous one. This progressive dulling happened as a natural side‑effect of tape echo units, but when working digitally, it has to be set up deliberately. Many digital effects units now provide the facility for imitating tape‑delay, but if yours doesn't have it, you can still set up the effect using your mixer as shown in Figure 2. Here the effects unit is set up to produce a single delay, which is fed into the mixer via an input channel rather than via an effects return. If the digital delay is being fed from (say) Aux 1, you can then use the Aux 1 control on the delay unit's return channel to loop some of the delayed sound back to the input, creating repeat echoes. Using the channel EQ to roll off a little top will result in successive echoes sounding less bright, which is exactly the effect we're after. Note that if Aux 1 is a post‑fade effects send, the amount of feedback will vary if either the input or return channel levels are changed. Because of this, the effect works best on signals that can be set up and left alone, though you could route both the original and delayed signals to a separate Group if you need to change the level during a mix.
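The Figure 2 mixer loop — a single delay with EQ'd feedback — can also be sketched as code. Here a one‑pole low‑pass filter stands in for the channel EQ inside the feedback path, so each repeat comes back both quieter and duller; the delay time, feedback amount and damping frequency are illustrative values of mine:

```python
import math

def tape_style_echo(mono, sample_rate=44100, delay_ms=300.0,
                    feedback=0.5, damping_hz=2500.0, mix=0.5):
    """Repeat echoes with a low-pass filter in the feedback loop,
    so successive repeats are progressively quieter and duller."""
    d = int(sample_rate * delay_ms / 1000.0)
    a = math.exp(-2.0 * math.pi * damping_hz / sample_rate)
    buf = [0.0] * (len(mono) + 4 * d)   # room for a few repeats
    lp = 0.0
    out = []
    for n in range(len(buf)):
        dry = mono[n] if n < len(mono) else 0.0
        wet = buf[n]
        lp = (1.0 - a) * wet + a * lp   # dull the signal being fed back
        if n + d < len(buf):
            buf[n + d] += dry + lp * feedback
        out.append((1.0 - mix) * dry + mix * wet)
    return out
```

Just as on the mixer, the feedback value sets how many repeats you hear, while the damping frequency controls how quickly they dull off.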
We've discussed individual sounds, but as the number of different‑sounding loudspeakers on the market has proved, the human ear will get used to anything if given time (the eye is the same; wear pink sunglasses all day and everything will look perfectly normal — until you take them off again!). It seems that the brain doesn't deal in absolutes, but instead prefers to compare one thing directly with another, and whether dealing with light or sound, this comes down to contrast. Put simply, for something to sound up‑front in a mix, something else needs to sound further away, and one of the biggest mistakes people make at the mixing stage is to try to make each sound as big and up‑front as possible. This clearly doesn't work, because if everything's fighting for a place in the front row, you lose the means to build a front‑to‑back perspective (and you almost certainly set up a cluttered mix into the bargain). Instead, decide what's most important (usually the vocals or a solo instrument), push that to the front, and place the other sounds behind. If you're doing dance music, the rhythm section takes front‑stage and the vocals may have to sit a little further back, but whatever the style, it's usually obvious what needs to lead and what needs to take a supporting role.
The following check list recaps the main principles discussed here, and offers a few pointers to help you add that extra dimension to your mixes.
TO BRING SOUNDS TO THE FRONT:
- Keep the tone fairly bright or use an enhancer.
- Don't over‑use reverb, and use a brighter reverb for close sounds than for distant ones. Pre‑delay can help prevent the reverb from pushing the sound back into the mix.
- Give the reverb a wide spread.
- For very close sounds, such as whispers, keep the sound very dry and bright.
TO PUSH SOUNDS BACKWARDS:
- Make the sounds quieter than the sounds you'd like to appear at your imaginary 'front‑of‑stage'.
- Use high‑frequency roll‑off to simulate the natural dulling that occurs with distance.
- Use narrow panned or even mono reverbs.
- Use less bright reverbs than for the 'front‑of‑stage' sounds.
- Use a higher mix of reverb to direct sound and experiment with longer reverb times where you can do so without muddying the mix.
- Use echoes that lose HF as they decay.
- Don't use an enhancer on distant sounds — instead, consider patching your enhancer in via a pair of subgroups rather than via the master insert points. This way, you can send just your up‑front sounds via the enhancer group, and this will help to produce the contrast you're trying to achieve.