Just how does stereo work, and how can you manipulate a stereo audio signal to your mix's advantage?
Everyone is familiar with stereo sound, but when it comes to mixing there's more to it than being able to place a sound at a point between two speakers. Various processes can be applied to a stereo signal to rebalance a stereo image, to make mono sources appear to be heard in stereo, or just to make something sound more impressive in stereo. However, for some processes there are trade‑offs: have you ever used a 'widener' to pan guitars outside the speakers, only to find that you can't hear them in mono? This article explains how stereo works, explores what you can do to manipulate stereo files, and discusses the trade‑offs. I'll start with a little history...
The origins of stereophonic audio reproduced over two channels can be traced back to Clément Ader and the Paris Electrical Exhibition of 1881, but the real basis of two‑channel stereo as we know it today dates from the pioneering work of Alan Blumlein and his EMI colleagues in the early 1930s. Blumlein realised that sound reproduction using multiple speakers inherently means that both ears hear all of the speakers. Consequently, trying to reproduce time‑of‑arrival differences captured by spaced microphones would be extremely problematic: the physical spacing of the speakers relative to the listener would add further time‑of‑arrival differences, compromising the accuracy of the imaging.
Blumlein saw that this apparent problem (both ears hearing both speakers) could be used to advantage, if only level or intensity differences were relayed from the two speakers. If the physical placement of the speakers was controlled, the inherent time‑of‑arrival differences between the speakers and ears could be used to fool the human hearing system into converting the source‑signal intensity differences into perceived time‑of‑arrival differences — and hence creating believable and stable stereo imaging. For this reason, Blumlein's stereophonic system was originally referred to as producing 'Intensity Stereo.'
A handy advantage of Blumlein's approach is that it is inherently mono‑compatible: combining the two channels results in a clean mono mix, with no unwanted coloration. To work correctly, the physical relationship between the listener and the two speakers is constrained, such that they each sit at the corners of an equilateral triangle, typically with the length of each side between two and four metres, depending on the size of the speakers and the room. The interaction of the signals from both speakers arriving at each ear results in the creation of a new composite signal, which is identical in wave shape but shifted in time. The time‑shift is towards the louder sound and creates a 'fake' time‑of‑arrival difference between the ears, so the listener interprets the information as coming from a sound source at a specific bearing somewhere within a 60‑degree angle in front.
If the two speakers produce equally loud sounds, the signal combinations at both ears are identical, so there are no apparent time‑of‑arrival differences and the sound image is perceived to be directly in front of the listener, as a 'phantom centre' image. Varying the relative levels of the two channels introduces apparent time-shifts, and offsets the perceived source position towards the louder side. Although the exact level offset needed for a given position varies slightly with hearing acuity and the monitoring conditions, a figure of 12‑16dB is generally sufficient to place a sound firmly over to the louder side.
The inter‑channel level differences required to create the illusion of a sound source somewhere between the speakers can be created artificially using a pan pot, of course, but real spatial information can also be captured when recording, using a coincident microphone array.
If, rather than sitting at the apex of the ideal listener‑speaker triangle, the listener moves over to one side, the stereo image quickly collapses into the nearer speaker, because the signal from the closer speaker arrives much earlier than that from the more distant one. The resulting physical time‑of‑arrival differences completely swamp those generated by the inter‑channel level differences of the sounds they are reproducing.
Let's move on to consider the different ways of controlling and manipulating the stereo image. The obvious starting point is the pan pot, originally called the 'panoramic potentiometer' and invented in 1938 by Disney's sound department as part of their pioneering work for the film Fantasia. The pan pot is a device with one input and two outputs, and varies the signal level reaching each output. When set to a central position, equal amounts of the input signal are passed to each output. There's no inter‑channel level difference, so there's a phantom centre image. As the control is rotated towards one side, that output receives a constant amount of input signal, while the opposite side receives less and less. The resulting inter‑channel level difference creates the required stereo image position from the speakers.
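As a rough sketch of the idea, a pan pot can be modelled as a pair of gains derived from one control position. The sine/cosine mapping used here is an assumption (it is only one of several possible laws, and real pan pots may behave differently), and the names `pan_gains` and `level_difference_db` are purely illustrative.

```python
import math

def pan_gains(position):
    """Return (left_gain, right_gain) for a pan position in [-1.0, +1.0].

    -1.0 is fully left, 0.0 is centre, +1.0 is fully right.
    Assumes a sine/cosine mapping, which is only one possible pan law.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and +1.0")
    angle = (position + 1.0) * math.pi / 4.0   # sweeps from 0 to pi/2
    return math.cos(angle), math.sin(angle)

def level_difference_db(left_gain, right_gain):
    """Inter-channel level difference in dB (positive means left is louder)."""
    if left_gain <= 0.0:
        return float("-inf")
    if right_gain <= 0.0:
        return float("inf")
    return 20.0 * math.log10(left_gain / right_gain)

# Example: the level difference at a few control positions.
for position in (-1.0, -0.5, 0.0, 0.5):
    left_gain, right_gain = pan_gains(position)
    print(position, level_difference_db(left_gain, right_gain))
```

At the centre position the two gains are equal, so the level difference is zero and the source appears as a phantom centre image.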
On stereo channels, the pan pot is usually replaced with a balance control. This type of control usually works by pulling central sounds away from the centre of the sound stage, while leaving the edges of the stage firmly at the speakers (although their levels may be altered). Imagine the left and right sides of the stereo image represented by rubber bands, each attached to their respective speaker and joined at the centre. Moving the balance control squashes up one rubber band while stretching out the other — and the stretched side is usually attenuated proportionally. So when the balance control is turned to the extreme left or right, the quieter channel may be faded out completely, giving the impression that the sound stage has moved completely over to one side. This is not usually the case, however, as the remaining channel is only one half of the stereo pair, rather than a panned mono signal.
A more sophisticated approach is to provide two controls notionally called 'Width' and 'Offset'. A width control allows the image width of a stereo signal to be increased or decreased using simple Mid/Side processing. As the width is adjusted, central sounds remain in the centre of the sound stage; it is the edges that are pulled in or pushed outwards. The width control usually ranges from 'Narrow' or 'Mono' to 'Wide' or 'Spread', with the centre position being 'Normal' or 'Stereo.'
If the stereo width is reduced, an offset control can be used to reposition the narrower image in the full sound stage. This control alters the relative gains of the two channels, decreasing one while increasing the other, exactly like the balance control. In fact, if the width is unchanged (or widened), the offset control behaves exactly as a normal balance control.
The Mid/Side (M/S) technique was yet another of Blumlein's revolutionary ideas. Instead of thinking about the stereo image as having a left half and a right half, the M/S approach essentially considers it as being composed of central and side elements. The Mid signal is the mono sum of both left and right, and basically describes those elements present in both channels. The Side signal is the difference between the two channels, and describes those elements that contribute to the stereo width.
It follows from this that the balance between the Mid and Side signals determines stereo width. If the Side signal is removed completely, all that remains is a mono sum — and the resulting sound is often not quite what you might expect or hope for. For example, if the stereo recording was captured with spaced mics, or has timing differences between channels (such as caused by an azimuth error on a tape machine), the mono signal may well sound dull compared with the stereo version. This is a surprisingly common issue with some samples and loops and comes back to our old friend, mono compatibility!
Increasing the level of the Side signal relative to the Mid increases the significance of the difference elements within the stereo image, giving the effect of a wider image: elements panned towards the edges become more dominant.
Stereo sound can be captured and conveyed in either the left/right format (used in conventional systems like mixing consoles and CD players), or in M/S format (which is used for FM radio broadcasts, and is effectively at the heart of stereo vinyl records). It's simple to convert between the two formats using a 'phase amplitude matrix'. The same process is used to create M/S from L/R, or L/R from M/S, and the necessary equations are:
Mid = (left + right) –3dB
Side = (left – right) –3dB
Left = (mid + side) –3dB
Right = (mid – side) –3dB
The 3dB attenuations are optional: they're included so that a complete round-trip process (say L/R to M/S to L/R) doesn't result in an increase in signal level. Many matrix systems don't apply the attenuation as part of the conversion, so the overall level may need to be reduced manually after multiple passes through the matrix.
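The matrix can be sketched in a few lines of code. This is a minimal illustration working on single sample values; the function name `ms_matrix` and the example sample values are made up for the purpose, and the 3dB attenuation is approximated as a gain of 1/√2.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)   # roughly -3dB

def ms_matrix(a, b, scale=SCALE):
    """Sum/difference matrix: returns (a + b, a - b), each scaled.

    Feed it (left, right) to get (mid, side), or (mid, side) to get
    (left, right). The same operation works in both directions.
    """
    return (a + b) * scale, (a - b) * scale

# Round trip with the -3dB scaling: L/R -> M/S -> L/R.
left, right = 0.8, -0.3
mid, side = ms_matrix(left, right)
left2, right2 = ms_matrix(mid, side)

# Round trip without attenuation: the level doubles (about +6dB).
loud_left, loud_right = ms_matrix(*ms_matrix(left, right, scale=1.0), scale=1.0)
```

With the scaling in place, `left2` and `right2` come back at the original level; without it, `loud_left` and `loud_right` are twice the size of the inputs, which is why the level may need trimming after an unattenuated matrix.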
Most DAWs include an M/S conversion plug‑in, and plenty of third‑party plug‑ins can do the job too, such as the freeware Voxengo MSED (www.voxengo.com/group/freevst). Hardware equivalents are also available.
Using a dedicated conversion matrix is the easiest way to convert between the formats, but the matrices are trivially simple to create manually in hardware or software mixers. To convert L/R to M/S, you need to both sum the two channels together (which is exactly what a mixing bus does) and subtract them. For the subtraction, all that's required is to flip the polarity of one channel and then mix them together again: if the two channels are carrying the same material there will be no output (because there's no difference), and if they're carrying different things there will be an output.
The first thing to do is route the matrix input channels to a pair of buses (let's say 47 and 48). Route the left input equally to both buses, and duplicate the right input, with the original version going to bus 47 and the duplicate to bus 48. This duplicated channel also requires a polarity inversion. So, bus 47 receives the left and right inputs (L+R = Mid), while bus 48 receives the left input and a polarity inverted right input (L–R = Side).
Obviously, the two right channels must have perfectly matched signal levels, and the overall gain must equal that of the left channel through to the buses. Depending on the specific configuration of the mixer, it may therefore be necessary to fine-tune the signal path gains to match levels properly. Often, where a pan‑pot has to be used as part of the signal routing, the built‑in attenuation (or gain) of the pan‑pot when panned fully to one side, or in the centre, will mess up the levels slightly. So, having set the routing up, it's worth checking and adjusting the level from the inputs through to the buses with a reference alignment tone or similar.
Exactly the same routing arrangements are used to convert back from M/S to L/R. The Mid signal feeds both left and right buses, while the Side signal is duplicated with a polarity inversion in the feed to the right bus. In this case, though, being able to adjust the level of the Side signal is often useful to vary the overall stereo width — and for this it's necessary to link the two Side channel faders so that the +S and –S elements track together properly.
As mentioned above, a stereo width control employs exactly this kind of Mid/Side processing. The original stereo signal is converted to the M/S format, the level of the Side signal is boosted or attenuated as appropriate, and the result is then converted back to the L/R format. Adjusting the Side signal relative to the Mid allows the perceived image width to be reduced (all the way to mono, if required) or expanded. The danger with the latter is that a 'hole' might appear at the centre of the image, making the whole sound rather unfocused and unpleasant. In broad terms, the level of the Side signal shouldn't exceed that of the Mid signal: when the two are exactly equal, the stereo image fully occupies the entire width between the speakers. Any increase in the level of the Side signal from here pushes the sound outside the speakers — which might initially sound impressive, but will cause phase and mono compatibility problems.
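A width control of this kind might be sketched as follows. The function name `adjust_width` and the convention that a `width` of 1.0 means 'unchanged' are assumptions for illustration, and the sketch halves the signals on the way in (rather than applying 3dB at each conversion) so that the round trip is at unity gain.

```python
def adjust_width(left, right, width):
    """Scale the Side signal of one stereo sample pair.

    width = 0.0 gives mono, 1.0 leaves the signal unchanged,
    and values above 1.0 widen the image.
    """
    mid = 0.5 * (left + right)     # convert L/R to M/S
    side = 0.5 * (left - right)
    side *= width                  # boost or attenuate the Side signal
    return mid + side, mid - side  # convert M/S back to L/R

narrow = adjust_width(0.8, -0.2, 0.0)   # both outputs equal the Mid signal
same = adjust_width(0.8, -0.2, 1.0)     # original signal restored
wide = adjust_width(0.8, -0.2, 2.0)     # Side doubled
```

Note that the sum of the two outputs is the same whatever the width setting, so any extra Side level added by widening simply disappears again when the signal is summed to mono.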
The M/S format is a very useful one for signal processing, because it allows treatments to be applied separately and differently to those elements at the centre of the image and those predominantly at the outer edges. However, it must always be remembered that any process that changes the relative levels of the Mid and Side signals will also change the image width for those processed signals. That alone can be a useful facility: you can widen a stereo image to create a 'hole' in the centre, in which to place another source. For example, if an acoustic guitar is recorded with a stereo coincident mic pair, this will tend to give a fairly strong centrally‑based stereo image. By deliberately widening this stereo source to create a 'hole', you can 'sit' a vocal into the mix without trampling quite so heavily over the guitar parts. Care needs to be taken not to push the guitar signal too wide, which would risk mono compatibility problems, but it can be a useful technique.
M/S processing like this became a very important technique in the late 1950s and '60s, when stereo vinyl records became commonplace, and following the work of Holger Lauridsen, an engineer with the Danish State Radio in the 1950s. The Mid element of a stereo track determines the horizontal or lateral deflection of the disc groove, whereas the Side determines the vertical deflection. Too much Side signal can cause all manner of tracking problems — including throwing the stylus straight out of the groove!
Although keeping the low frequencies out of the Side signal may have become widely adopted due to the physical limitations of vinyl, it's still relevant today. By forcing the low frequencies to the centre, you ensure that their greater energy is distributed equally between the two speakers. So 'lateral and vertical' (or 'lat and vert') processing has been a core mastering technology for at least half a century, and survives to this day with such functions as Brainworx's 'mono maker' process included in many of their M/S plug‑ins.
Another simple application of fiddling with the balance of Mid and Side signals can be to help tame an over-reverberant stereo recording. Captured reverberation in a stereo signal is normally incoherent between the two channels, and so tends to exist mainly in the Side channel. Reducing the level of the Side signal can reduce the apparent amount of audible reverberation to a useful degree. With judicious use of equalisation in the Side channel, it's often possible to rein in the more obvious reverberation elements without squeezing the wanted signal back to mono too. Lots of plug‑ins are available to perform this kind of processing with greater or lesser degrees of sophistication, such as the various offerings from Brainworx, and there are also some hardware units, such as the Rupert Neve Designs Portico 5014 Stereo Field Editor, which do much the same thing.
In fact, the use of equalisation on the Side signal is fairly fundamental to the strength of the technique as a processing format. For example, a little boost across just the top octave or two (boost above 8‑16kHz) generally adds width at the top end, making the mix sound a little more spacious, airy and open. If the stereo source being processed was captured with coincident mics, or derived using pan‑pots with multitracked sources, the image will become wider, but also remain precise and sharply focused.
However, if the stereo signal was captured using spaced mics, increasing the Side signal will tend to blur the imaging even more than spaced mic arrays do naturally, resulting in a wider but less well-defined stereo image. In practice, this is unlikely to be an issue, and the perceived benefits of the greater width will probably outweigh the less accurate imaging.
Applying some LF cut to the Side signal has the effect of narrowing the bass, making it much easier to cut as a vinyl record, and often making it sound more cohesive and punchy at the bottom end too. Conversely, boosting the LF in the side signal will make it sound much more spacious and natural — although it's a good idea to limit the boost to no more than 6dB with a shelf equaliser, having maximum boost below about 250Hz (and turning over below 600Hz). Once again Blumlein got here first, with a process he called 'Shuffling' which, in effect, converts small LF phase differences between channels into useful level differences that enhance the stereo spread. This technique works very well with simple coincident mic arrays, and is well worth experimenting with.
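The LF cut on the Side signal mentioned at the start of the paragraph above can be sketched with a very simple filter. Everything specific here is an assumption made for illustration: the first-order (6dB/octave) high-pass design, the function names, and the 100Hz default cutoff are placeholders rather than recommended settings.

```python
import math

def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """First-order high-pass filter, processed sample by sample."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def narrow_bass(left, right, cutoff_hz=100.0, sample_rate=48000):
    """Cut the low frequencies of the Side signal only, then return L/R."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    side = one_pole_highpass(side, cutoff_hz, sample_rate)
    new_left = [m + s for m, s in zip(mid, side)]
    new_right = [m - s for m, s in zip(mid, side)]
    return new_left, new_right
```

Because only the Side signal is filtered, the Mid signal (and therefore the mono sum) passes through untouched; the bass just loses its width.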
You can also use the M/S domain to process the dynamics of a mix, taking advantage of the format to affect central sounds independently of more widely spaced sounds. For example, if the lead vocals (which are normally central) are a little too loud in the mix, compressing the Mid channel independently of the Side channel can often help to re‑balance things without disturbing the more widely spaced backing singers, guitars and drums. Often, in this kind of application, though, it helps to use a multi‑band approach — using equalisation to restrict the part of the spectrum over which the dynamics processor has effect. Again, the Brainworx BX dynEQ allows this kind of approach, as does a new model called BX Shredspread, which is optimised for dealing with electric guitars. Another application combining equalisation and dynamics is the processing of sibilance in a vocal without affecting the brilliance and clarity of the more widely spaced cymbal crashes or guitar and keyboard parts.
Some of the more sophisticated M/S processing plug‑ins also allow some manipulation of the phase relationship between the Mid and Side signals. This affects how they recombine when converting back to L/R stereo, and has the effect of altering the perceived depth of the stereo image, essentially allowing central sources to be pulled forward or pushed back relative to the edge sources. Again, the Portico Stereo Field Editor includes this kind of feature.
A common requirement is to create a stereo effect from a mono source, and there are several different ways of achieving this, each with various pros and cons. One very effective, and totally mono‑compatible, solution is to treat the mono source as the Mid element of a Mid/Side stereo signal, create a fake Side signal to go with it, and then decode them together to form a normal left‑right stereo signal. This is essentially what most stereo enhancers actually do — though possibly with a few extra bells and whistles thrown in...
To understand how this idea works, consider a singer in front of an M/S microphone array placed in a room. The forward‑facing Mid mic will capture the direct sound from the singer (our mono source) and relatively little else. The sideways‑facing Side mic will not capture any direct sound from the singer at all, but will capture the reflections from the walls of the room.
If we take a simplistic view here, the main difference between the direct and reflected sounds is the delay between them: the reflected sound has to travel to the wall and bounce back to the microphone, so it arrives later than the direct sound.
Consequently, to create a simple fake Side signal for the mono source, all that's initially required is to delay the mono source: anything between about seven and 70 ms will work. The longer the delay, the larger the 'room' will appear to be (because it has taken longer for the direct sound to reach the walls and bounce back), although how convincing the effect will be depends to an extent on the nature of the material. A 70ms delay may produce a credible effect with a mono orchestral recording, but will sound totally unconvincing for a solo acoustic guitar. As with all M/S processing, the level of the Side signal relative to the Mid determines the impression of the overall stereo width, so it's a case of experimenting with the delay time and the level of the Side signal to arrive at a convincing and satisfactory balance. In practice, the more spectrally complex the source sound, the better this effect works. The big advantage, in many cases, with this process is that summing the left‑right channels to mono completely removes the (fake) Side component, leaving only the original mono source — hence perfect mono compatibility.
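The process described above can be sketched as follows, working on a list of samples. The function name `fake_stereo`, the `side_level` parameter and its default value are hypothetical; the delay is given in samples, so a 20ms delay at a 48kHz sample rate would be 960 samples.

```python
def fake_stereo(mono, delay_samples, side_level=0.5):
    """Treat `mono` as the Mid signal, use a delayed copy as a fake Side
    signal, and decode the pair to left/right."""
    if delay_samples < 0:
        raise ValueError("delay_samples must not be negative")
    left, right = [], []
    for n, mid in enumerate(mono):
        # Delayed copy of the source (silence until the delay has elapsed).
        delayed = mono[n - delay_samples] if n >= delay_samples else 0.0
        side = side_level * delayed
        left.append(mid + side)    # L = M + S
        right.append(mid - side)   # R = M - S
    return left, right

source = [1.0, 0.5, -0.25, 0.0, 0.75]
left, right = fake_stereo(source, delay_samples=2)

# Summing to mono cancels the fake Side signal, leaving the source.
mono_sum = [0.5 * (l + r) for l, r in zip(left, right)]
```

Here `mono_sum` matches `source`, which is the mono compatibility the technique relies on.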
But how does this fake stereo process create an impression of stereo? What it effectively does is to chop the original sound up into narrow spectral components, and then spread them out across the soundstage. The delay time determines how the signal is divided (it determines the lowest frequency affected) and the level of the Side signal determines how widely they are spaced. The process relies on the creation of a pair of interleaved comb-filters in the left and right channels. The matrix that converts M/S to L/R adds and subtracts the Mid and Side signals, and if you combine a signal with a delayed version of itself the result is a comb filter.
In effect, the delay introduces a phase shift that varies with frequency. At low frequencies, where the wavelength is long, the delay imposes little phase shift, so the low frequencies will add constructively on the left channel (M+S) and cancel on the right channel (M–S). (This is actually a bit of a problem, which I'll come back to in a moment...). Moving up the spectrum, at some frequencies the delay will create a phase shift where the original and delayed versions add, and at others where they cancel. The result is a series of deep notches that look a little like comb teeth; hence the name comb filter.
As the Mid and fake Side signals are added to create the left output and subtracted to create the right, the two resulting comb filters work in opposite polarity to each other: where one has notches, the other has peaks. This means that some frequencies will appear only in the left channel, while others will appear only in the right. Our easily fooled sense of hearing interprets that as a blurred stereo spread; there isn't any real imaging information, of course, but it is a surprisingly effective trick.
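To see the interleaved comb filters in numbers, the gain of each output channel can be calculated for a steady tone at a given frequency. This is a sketch under simple assumptions: the fake Side signal is nothing more than a delayed copy of the Mid at a relative level `side_level`, and the function name `channel_gains` is made up for the example.

```python
import cmath
import math

def channel_gains(freq_hz, delay_s, side_level=1.0):
    """Return (left_gain, right_gain) for a steady tone at freq_hz.

    Left is Mid plus the delayed copy, right is Mid minus the delayed copy.
    """
    # Phase rotation imposed by the delay at this frequency.
    rotation = cmath.exp(-2j * math.pi * freq_hz * delay_s)
    left_gain = abs(1.0 + side_level * rotation)
    right_gain = abs(1.0 - side_level * rotation)
    return left_gain, right_gain

# Example with a 10ms delay: step through some frequencies.
rows = []
for freq in (0.0, 25.0, 50.0, 75.0, 100.0, 150.0):
    rows.append((freq, channel_gains(freq, 0.010)))
```

With a 10ms delay the two channels swap roles every 50Hz: the left channel peaks at 0Hz and 100Hz, exactly where the right channel has its notches, and the reverse is true at 50Hz and 150Hz. At very low frequencies nearly everything ends up on the left.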
An unwelcome side‑effect of this approach is that the left channel will end up with lots of bass (because of the summed Mid and fake Side signals) whereas the right‑hand channel won't have any (because they are subtracted). This is because the delay used to create the fake 'S' is usually too short to produce much phase‑shift at low frequencies, so LF signals tend to combine more or less in phase on one side and out of phase on the other, making the stereo feel lopsided. The way to cure that problem is to use the mono‑ing idea we've already looked at: applying a high‑pass filter to the fake Side signal means that there can be no stereo width at LF, and all the bass is now locked to the centre. A high‑pass filter at about 100Hz usually works well.
An alternative but equally useful fake-stereo technique is to use reverb. This idea is based on the fact that most musical instruments are too small to create much sense of size for a listener sited more than a few feet away (grand pianos and pipe organs excepted!). What creates the sense of a stereo spread in these cases is the room reflections, and that's fundamentally what reverb processors are designed to generate.
If you set up a small room reverb patch that has very strong early reflections and no (or the barest minimal) reverb tail, you'll find it will create a very convincing stereo effect for a mono source. Ideally, you'd set it so that there was no reverb tail at all (most reverb processors can be persuaded to do that with a little tweaking). Importantly, mono compatibility should still be good, although there will probably be some remnant of the artificial room reflections mixed in with the original mono source (usually, not all the room reflections cancel out when summed to mono). The level balance between the dry sound and the reverb reflections will determine the perspective of the sound (how far away the source appears to be), while the composition of the early reflections will determine how big or small the room appears to be — and thus the perceived stereo width. Most reverb units allow the nature of the early reflections to be altered to some extent, and experimentation is the key to creating a believable effect.
Short delays can also be used to create a fake stereo effect, either by copying and sliding tracks within a DAW, or using outboard hardware delays or ADT (Automatic Double Tracking) processors. The technique works by panning the original and delayed version widely across the image, using delay times of anything between about five and 15 ms (there is considerable scope for experimentation). However, when the stereo mix is reduced to mono, the combination of original and delayed signals produces a comb-filtering effect, often colouring the instrument's sound severely. Reducing the level of the delayed version, or using multiple delays, minimises this problem but weakens the stereo impression.
When altering the stereo width of a signal it's very easy to fall into the trap of making things sound impressive but completely incompatible in mono. The problems can range from mild coloration or even subtle phasing effects and poor balance translations, right through to completely missing instruments. Continually checking the mono compatibility is therefore a very good reality check: simply press the mono button on the monitoring section of your mixer or DAW regularly as the mix and the M/S processing progresses, and listen for anything that's not quite right!
It also helps a lot to have proper phase metering monitoring the left‑right output, and there are two basic types: a simple phase meter and a Lissajous display, both of which are available in plug‑in and hardware form. The simplest is the basic phase meter, which has a scale from –1 through 0 to +1. It may take the form of a conventional analogue meter with a waggly needle, but is more commonly an LED bar-graph these days. Usually the region from 0 to +1 is coloured green and from 0 to –1 is red. If the left and right channels are exactly the same (ie. a central mono source), the two channels are fully coherent (there is no phase difference) and the meter will show +1. As the level of incoherence between the channels increases the meter will move down towards the zero mark. A coincident mic array capturing an orchestra will tend to produce an almost fully incoherent stereo signal, and the meter will hover close to zero. In effect, a zero reading indicates a 'perfect' stereo image.
If the two channels have significant out‑of‑phase components, the meter will move into the red zone and head off towards –1, reaching that mark when the left and right channels are identical but of opposite polarity. Minor brief skirmishes just below the zero mark are rarely a problem, but serious deflections towards –1 or prolonged periods below the zero mark indicate potentially severe phase problems and mono compatibility issues. The aim is to get the phase meter close to zero when maximum stereo width is desired, without going over into the red zone!
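A reading of this kind can be approximated by calculating the normalised correlation between the two channels. This is only a sketch: it assumes the meter behaves like a correlation coefficient taken over one block of samples, whereas a real meter averages continuously over time, and the function name `phase_correlation` is illustrative.

```python
import math

def phase_correlation(left, right):
    """Return a value from -1.0 to +1.0 for two equal-length sample lists."""
    cross = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    if energy == 0.0:
        return 0.0   # silence on either channel: nothing to compare
    return cross / energy

# Ten full cycles of a tone, plus a copy shifted by a quarter cycle.
count = 1000
tone = [math.sin(2.0 * math.pi * 10.0 * n / count) for n in range(count)]
shifted = [math.cos(2.0 * math.pi * 10.0 * n / count) for n in range(count)]
inverted = [-x for x in tone]

mono_reading = phase_correlation(tone, tone)          # close to +1
wide_reading = phase_correlation(tone, shifted)       # close to 0
reversed_reading = phase_correlation(tone, inverted)  # close to -1
```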
The alternative — and to my mind more useful and informative — form of meter is the Lissajous display, or 'Goniometer'. This is the 'ball of string' display which is included in many DAWs as an option, and is basically a two‑dimensional bar-graph meter. Imagine that the right channel is shown as a horizontal bar-graph and the left channel as a vertical bar-graph — and the display itself is made up of a dot in the area between the two bar-graphs, tracking the movement of both. If the two channels are displaying the same thing (a central mono source), a thin diagonal line will be produced, whereas if the two channels are carrying different signals, a complex pattern will be traced, which often resembles a tangled ball of string.
To make this basic Lissajous display a little more meaningful in audio applications, the display is usually rotated anti-clockwise by 45 degrees, so that when both channels are carrying the same material the thin line is vertical — essentially now pointing at the centre of the image. If only the left channel has a signal, the line tilts 45 degrees left, while if only the right channel is active, it tilts 45 degrees to the right. The length of the line indicates the amplitude of the signal, of course, but it is usually arranged to grow symmetrically out from the centre of the display.
When panning a mono source, this kind of display makes the actual position within the stereo image very clear indeed, and that's a large part of its strength. I find it particularly useful when trying to optimise the stereo recording angle of a coincident mic pair, to ensure that the sound sources occupy the required image width, for example. As the stereo width of the source increases from mono, the thin line bulges out into a vertical ellipse of tangled string until, eventually, it becomes circular, as the width reaches the widest possible level while still maintaining mono compatibility. This equates to the zero mark of the phase meter. If the channel incoherence increases, the circular ball flattens out into a horizontal ellipse, and when the two channels are carrying the same material, but in opposite polarities, the display will show a thin horizontal line. So the aim, quite clearly, when mixing or processing signals is to keep the display image somewhere between a circle and a vertical line — but never horizontally flattened!
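The rotated display can be sketched as a coordinate calculation for each pair of samples. The function name `goniometer_point` and the sign conventions (positive x to the right, positive y upwards) are assumptions for illustration; actual meters may scale or orient the plot differently.

```python
import math

ROOT_HALF = math.sqrt(0.5)   # sine and cosine of 45 degrees

def goniometer_point(left, right):
    """Map one (left, right) sample pair to (x, y) on the rotated display.

    Before rotation the right channel is horizontal and the left channel
    is vertical; the plot is then turned anti-clockwise by 45 degrees.
    """
    x, y = right, left
    rotated_x = (x - y) * ROOT_HALF
    rotated_y = (x + y) * ROOT_HALF
    return rotated_x, rotated_y

centre = goniometer_point(1.0, 1.0)        # on the vertical axis
left_only = goniometer_point(1.0, 0.0)     # up and to the left
right_only = goniometer_point(0.0, 1.0)    # up and to the right
out_of_phase = goniometer_point(1.0, -1.0) # on the horizontal axis
```

Seen this way, the vertical axis is effectively the Mid signal and the horizontal axis the Side signal, which is why a horizontally flattened pattern means there is more Side than Mid.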
It might sound complicated, and initially the display can be utterly mesmerising, but as experience grows and you learn to interpret the display, it quickly becomes an invaluable tool that provides an immense amount of information about the stereo signal. I rely on this kind of display when recording and mixing, and would now be as lost without it as I would be trying to mix without basic level meters! In fact, if I had the choice I would choose the Goniometer over the bar-graphs any day!
With this kind of metering, over-wide processed guitars or keyboard pads are instantly obvious as a horizontally flattened ball of string, and sources that are panned off‑centre are obvious as the ellipse leans towards one side or the other. Subtle effects that might not be immediately audible often stand out very clearly on the Lissajous meter, and you can then go back and tweak the problem source or adjust the processing to correct the problem.
There are many useful techniques that enable you to get the best out of a stereo track, whether it's an individual instrument or a complete mix. Remember, though, that there are often trade‑offs — and while it's your choice as the engineer which compromises to make, and what balance to strike between stereo enhancement and mono compatibility, it always pays to be aware of those trade‑offs as you mix!
There are several different pan pot 'laws' in common use that determine the relationship between the control rotation and the inter‑channel level difference, generally following a sine/cosine relationship. An important aspect of the panning law is the actual output level from each channel when the pan pot is at the centre position, and common options are 3, 4.5 or 6dB of attenuation.
Why the different options? In systems where mono compatibility is critically important — such as in broadcast environments — it makes sense to ensure that the signal level of a source doesn't vary as it is panned across the sound stage. When panned centrally, the input signal is obviously sent to both output channels, and when summed to mono those two channels are added together. Summing two identical signals electrically results in a 6dB increase in level, so to maintain a constant derived mono signal level regardless of pan position, the centre point of the pan pot needs to attenuate both outputs by 6dB relative to the extreme edge positions. This is called a constant‑voltage law.
Some systems achieve this level variation by passing the input signal unchanged when panned fully to one side or the other, but attenuate it progressively as the pan nears the centre. Others increase the signal gain as the input is panned towards the edges, and some use a combination of both techniques. There are advantages and disadvantages of each approach, but the important aspect is that when panned centrally, the level at both outputs is reduced relative to the level when panned fully to one side.
As we have seen, for ideal mono compatibility, 6dB of attenuation is required for centrally panned sources. However, when listening to stereo speakers their outputs combine acoustically, not electrically, and this implies that a different attenuation amount is required. When reproducing the same signal from both speakers, the perceived level increases by only about 3dB compared with the same signal from one speaker only. So to maintain a constant perceived level as the pan pot is rotated, the centre position attenuation needs to be just 3dB rather than 6dB. This is called a constant‑power law.
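The 6dB and 3dB figures drop straight out of the decibel arithmetic. The sketch below assumes that an electrical sum of two identical signals doubles the voltage, and models the acoustic case as a doubling of power, which is a simplification of what really happens in a room.

```python
import math

def db_from_voltage_ratio(ratio):
    """Level change in dB for a change in signal voltage."""
    return 20 * math.log10(ratio)

def db_from_power_ratio(ratio):
    """Level change in dB for a change in power."""
    return 10 * math.log10(ratio)

# Electrical (mono) sum: two identical signals double the voltage.
electrical_gain_db = db_from_voltage_ratio(2.0)
# Acoustic sum, modelled here as two speakers doubling the radiated power.
acoustic_gain_db = db_from_power_ratio(2.0)

print(round(electrical_gain_db, 2), round(acoustic_gain_db, 2))  # 6.02 3.01
```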
While a lot of modern DAWs allow the user to choose the most appropriate pan law for their application, it isn't very practical to reconfigure analogue mixing consoles for different pan laws. So a compromise option has been widely used for many decades, providing 4.5dB of attenuation for central sources. For someone listening in mono, panning a source across the sound stage using this compromise law will result in a barely noticeable 1.5dB 'bulge' around the central position. A stereo listener would hear a similarly marginal drop in level across the centre. In practice, these pan level 'errors' are usually negligible and few casual listeners even notice. Nevertheless, it does pay to be aware of the effect of different panning laws, as they do affect the fine balance of panned elements within a mix when auditioned in mono and stereo.
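To see how the three laws compare, here is a rough sketch based on the sine/cosine relationship mentioned earlier. Raising the sine and cosine gains to a power of 1, 1.5 or 2 is one common way of arriving at centre attenuations of roughly 3, 4.5 and 6dB; it is an assumption for illustration, and a given console or DAW may well shape its curve differently.

```python
import math

def pan_gains(position, exponent=1.0):
    """position: 0.0 = hard left, 0.5 = centre, 1.0 = hard right.
    exponent: 1.0 for about 3dB at centre, 1.5 for about 4.5dB, 2.0 for about 6dB."""
    angle = position * math.pi / 2
    return math.cos(angle) ** exponent, math.sin(angle) ** exponent

def to_db(voltage_ratio):
    return 20 * math.log10(voltage_ratio)

for name, exponent in [("3dB", 1.0), ("4.5dB", 1.5), ("6dB", 2.0)]:
    left, right = pan_gains(0.5, exponent)
    centre_db = to_db(left)                           # level in each channel at centre
    mono_db = to_db(left + right)                     # electrical sum, relative to hard-panned
    stereo_db = 10 * math.log10(left**2 + right**2)   # power sum, relative to hard-panned
    print(f"{name} law: centre {centre_db:.2f}dB, mono sum {mono_db:.2f}dB, "
          f"stereo power {stereo_db:.2f}dB")
```

With the 1.5 exponent, the mono sum comes out about 1.5dB high at the centre and the stereo power about 1.5dB low, which matches the 'bulge' and dip described above.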
When using two dynamics processors, such as limiters or compressors, to process a normal left‑right stereo signal, it is normally essential that the gain reduction is applied identically for both channels. In practice, this is achieved with a 'stereo link' switch, which ensures that the control voltages generated by the two side‑chains are applied equally to both channels. Depending on the design of the units, one channel might provide complete control of all parameters (threshold, ratio, attack, release and so on) for both units, but more usually each unit has to be adjusted for identical settings, so that both side‑chains work in the same way.
Linking the two channels modifies the overall level of the stereo signal in the appropriate way, and the stereo image is not affected. If the two compressors were not linked, a loud sound on the extreme left‑hand side, say, would trigger the left channel compressor to reduce the gain, while the right‑hand channel compressor would not react at all. As a result, central sources (which should have equal level in both channels) would appear to rush to the right‑hand side, as they would be louder in the right channel than in the compressed left channel. As the left channel's compressor released its gain reduction, the image would drift back towards the centre. Clearly, this kind of image shifting is normally very undesirable, hence the need to link compressors when processing left‑right stereo.
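The image shift is easy to demonstrate with a simple static compressor curve. In this sketch the threshold, ratio and levels are hypothetical values, and the link is modelled as the larger of the two gain reductions being applied to both channels; real units may combine their side-chains in other ways.

```python
def gain_reduction_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static compressor curve: gain change in dB (zero or negative) for an input level."""
    over = max(0.0, level_db - threshold_db)
    return 0.0 - over * (1.0 - 1.0 / ratio)

def stereo_gains_db(left_level_db, right_level_db, linked):
    """Return the (left, right) gain changes in dB for the two channels."""
    left_gain = gain_reduction_db(left_level_db)
    right_gain = gain_reduction_db(right_level_db)
    if linked:
        shared = min(left_gain, right_gain)  # the larger reduction drives both channels
        return shared, shared
    return left_gain, right_gain

# A loud hard-left sound: the left side-chain sees -8dB, the right only -30dB.
print(stereo_gains_db(-8.0, -30.0, linked=False))  # (-9.0, 0.0)
print(stereo_gains_db(-8.0, -30.0, linked=True))   # (-9.0, -9.0)
```

Unlinked, a central source ends up 9dB louder in the right channel than the left, so it lurches to the right. Linked, both channels drop by the same amount and the image stays put.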
However, when using dynamics for Mid/Side processing, the aim is deliberately to process the Mid and Side channels independently, specifically to achieve a rebalancing effect. So the stereo‑linking facility is not used when processing Mid/Side signals — and if the Mid channel is compressed slightly without corresponding compression of the Side channel, the re‑converted left‑right stereo signal will appear to 'breathe' in and out: the stereo width will tend to vary as the processing is applied. Fortunately, most people are very insensitive to this kind of image variation, and perceive only the altered dynamics, which will have the effect of rebalancing the centre and edge sounds.
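The width change can be sketched with a simple sum-and-difference matrix. The scaling used here (halving on encode, none on decode) is just one convention, chosen so that a round trip returns the original samples, and the gain values are purely illustrative.

```python
def ms_encode(left, right):
    """Convert a left/right sample pair to Mid/Side."""
    return (left + right) / 2, (left - right) / 2

def ms_decode(mid, side):
    """Convert a Mid/Side sample pair back to left/right."""
    return mid + side, mid - side

def rebalance(left, right, mid_gain=1.0, side_gain=1.0):
    """Apply independent gains to Mid and Side, then return to left/right."""
    mid, side = ms_encode(left, right)
    return ms_decode(mid * mid_gain, side * side_gain)

print(rebalance(1.0, 0.5))                # (1.0, 0.5): unchanged
print(rebalance(1.0, 0.5, mid_gain=0.5))  # (0.625, 0.125): Mid pulled down, image wider
```

With the Mid channel pulled down by 6dB, the left/right ratio grows from 2:1 to 5:1, so the source sits further out towards the edge. That is the 'breathing' width change a compressor on the Mid channel produces as its gain reduction comes and goes.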
Before considering the various ways of processing stereo material, it would help to understand what stereo actually is and how it works. Our human hearing system, although only equipped with two sound receptors, is capable of detecting and locating (with varying degrees of accuracy) sounds in the full sphere surrounding us, and it does this using a combination of three basic techniques: time‑of‑arrival and phase differences, level differences, and spectral analysis.
The primary method involves the detection of time‑of‑arrival differences of a sound at each ear. A sound wave takes roughly one millisecond to travel one foot (about 3ms per metre). So for any sound source that is not directly in front of or behind the head, sound waves will reach one ear fractionally before the other. The average spacing of an adult's ears is a little over six inches (15cm), suggesting that the largest possible time‑of‑arrival difference for a sound to reach each ear is about 0.5ms. In practice, most people can determine the bearing of a sound in front of them (in the horizontal plane) to within about two degrees, and that corresponds to a time‑of‑arrival difference of less than 0.02 milliseconds!
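These figures can be checked with a very simple model. The sketch below treats the ears as two points 15cm apart with a straight-line path between them and takes the speed of sound as 343m/s; both are simplifying assumptions, since the real path bends around the head.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed (air at about 20 degrees C)
EAR_SPACING = 0.15      # metres, the figure quoted above

def arrival_time_difference_ms(bearing_deg, ear_spacing=EAR_SPACING):
    """Time-of-arrival difference in ms for a distant source at the given bearing.
    0 degrees is straight ahead and 90 degrees is directly to one side."""
    extra_path = ear_spacing * math.sin(math.radians(bearing_deg))
    return 1000 * extra_path / SPEED_OF_SOUND

print(round(arrival_time_difference_ms(90), 3))  # source fully to one side
print(round(arrival_time_difference_ms(2), 4))   # source two degrees off centre
```

Under these assumptions the maximum difference works out at about 0.44ms and the two-degree case at about 0.015ms.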
The brain is believed to 'latch on' to the leading edge of a transient sound as the reference point for measuring time‑of‑arrival differences, and in normal life most sounds contain plentiful transient information on which to base arrival time difference measurements. It follows, though, that if a sound does not contain transients, a time‑of‑arrival difference cannot be calculated — and this is revealed clearly when trying to locate the acoustic source of a continuous tone signal.
While time‑of‑arrival differences provide left‑right positioning information, they're unable to differentiate between frontal and rearward sounds. The time‑of‑arrival difference for a sound at 45 degrees (front‑right) to the listener is the same as that for a sound approaching at 135 degrees (the mirror‑image position behind the listener, at back‑right). This inherent ambiguity is resolved by small automatic and unconscious movements of the head to augment the time‑of‑arrival information. By rotating and tilting the head slightly, the time‑of‑arrival difference changes and, for example, rotating the head to the left will decrease the time‑of‑arrival difference if the source is in front, but will increase it if the source is behind the listener.
If you are unable to move your head it becomes almost impossible to tell whether a sound source is in front or behind, and this may be the reason most people have great difficulty in perceiving believable frontal sounds when listening to binaural recordings: head movements provide no additional time‑of‑arrival information when you are wearing headphones and, since you can't see the source in front of you, the brain decides that it must be behind! The significance of sight should not be underestimated either, because sight is the dominant sense for most people, and the direction‑finding ability of our hearing is generally used to guide our eyes to the source — so if you can't see the source of sound, the chances are it is behind you!
If the time‑of‑arrival mechanism relies on transients, and transients fundamentally comprise high‑frequency sound elements, how do we locate low‑frequency sound sources? Well, the phase difference between sounds arriving at each ear is used for this. The size of the average head means that phase differences become ambiguous above about 2kHz, because the path length around the head produces phase shifts in excess of 360 degrees. In other words, a relative phase difference of 20 degrees is indistinguishable from a relative phase shift of 380, 740 or 1100 degrees, yet these absolute phase differences imply very different source bearings. However, for low frequencies the wavelength of sound is so long that the path length around the head becomes insignificant and phase shifts become a meaningful way of measuring source location.
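The ambiguity is simply a matter of phase wrapping around every 360 degrees. As a rough sketch, assuming a 15cm path difference between the ears and a speed of sound of 343m/s (both illustrative figures):

```python
def phase_difference_deg(frequency_hz, path_difference_m, speed=343.0):
    """Absolute phase difference produced by a given extra path length."""
    return 360.0 * frequency_hz * path_difference_m / speed

def wrapped(phase_deg):
    """The phase difference as it can actually be sensed: only the remainder after whole cycles."""
    return phase_deg % 360.0

print([wrapped(p) for p in (20, 380, 740, 1100)])  # [20.0, 20.0, 20.0, 20.0]
print(round(phase_difference_deg(100, 0.15), 1))   # low frequency: well under one cycle
print(round(phase_difference_deg(4000, 0.15), 1))  # high frequency: more than one full cycle
```

At 100Hz the whole path difference amounts to a small fraction of a cycle, so the phase reading is unambiguous. At 4kHz it exceeds 360 degrees and the reading could correspond to several different bearings.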
However, within a closed environment like a typical project‑studio control room, LF reflections and standing waves within a room create additional phase differences between the ears which vary unpredictably. As a result, the multitude of contradicting phase shifts makes it impossible to determine the bearing of a low frequency sound source with any accuracy. So whereas it is possible to detect the direction of a low‑frequency sound source out of doors, it is almost impossible indoors — and that can be used to advantage, for example, in placing a sub‑woofer as part of a 2.1 monitoring system: in effect, the subwoofer can be located virtually anywhere in the room, with no detrimental effect on the stereo sound stage.
Another mechanism used for the detection of sound direction is the difference in the intensity of sounds arriving at each ear. Sound intensity diminishes with distance at the rate of about 6dB for each doubling of distance. However, this fact alone is of little help, because the distance between the ears is usually a lot less than the distance between the source and each ear, and consequently the level differences detected by each ear are extremely small. To put some figures on it, a source four metres away and bearing 30 degrees produces a level difference between the ears of just 0.2dB, which is undetectable.
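The figure quoted above can be reproduced from the inverse-square relationship alone. This sketch assumes the ears are two points 15cm apart with no head in between, so it accounts only for the distance effect.

```python
import math

def interaural_level_difference_db(distance_m, bearing_deg, ear_spacing_m=0.15):
    """Level difference between the ears due purely to the different path lengths."""
    bearing = math.radians(bearing_deg)
    source_x = distance_m * math.sin(bearing)  # sideways offset of the source
    source_y = distance_m * math.cos(bearing)  # distance in front of the listener
    near = math.hypot(source_x - ear_spacing_m / 2, source_y)
    far = math.hypot(source_x + ear_spacing_m / 2, source_y)
    return 20 * math.log10(far / near)  # 6dB per doubling of distance

print(round(interaural_level_difference_db(4.0, 30.0), 2))
```

For a source four metres away at 30 degrees, this comes out at roughly 0.16dB, which rounds to the 0.2dB mentioned above.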
All is not lost though, as another useful acoustic property comes into play. When the wavelength is smaller than the dimensions of an object placed in the path of a sound wave, a 'sound shadow' — a region where the sound intensity is greatly reduced — is created on the side away from the source. Conveniently, our head creates a sound shadow for frequencies above about 2kHz, and this exaggerates the level difference between the ears for any source which is not directly in front, above or behind. The shadowed level difference increases with frequency, reaching about 20dB at 15kHz. Moreover, for frequencies above about 7kHz, the pinnae (the outer ear flaps) also create shadows for sounds approaching from behind and above. Again, the directional information derived from these high‑frequency sound shadow effects can be enhanced by small movements of the head to move the distant ear in and out of the sound shadow.
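A quick wavelength calculation shows why the shadowing starts where it does. The sketch assumes a speed of sound of 343m/s; comparing the results with a head roughly 15 to 18cm across and an outer ear of a few centimetres is my own rough illustration.

```python
def wavelength_cm(frequency_hz, speed=343.0):
    """Wavelength in centimetres for a given frequency."""
    return 100 * speed / frequency_hz

for frequency in (200, 2000, 7000, 15000):
    print(f"{frequency}Hz: {wavelength_cm(frequency):.1f}cm")
```

At around 2kHz the wavelength has shrunk to about the size of the head, and by 7kHz it is down to a few centimetres, comparable to the outer ear.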
The final weapon in our armoury of sonic location‑finding relies on spectral analysis. The human hearing system seems able to recognise specific peaks and notches in the frequency spectrum of received sounds which are caused by a comb filtering effect, resulting from the mutual interference of sounds reflecting from the pinna and the shoulders. The nature of these spectral notches relates not only to the bearing of the sound source but, critically, to the specific shape of the pinna. So this is very much a personal direction‑finding mechanism, and a specific set of spectral notches could imply a different source direction for different people. This technique is not particularly accurate, but it does help to remove ambiguities in the other mechanisms. Some pseudo‑surround‑from‑stereo systems have made use of this effect in the past, but with variable and unpredictable results, as you might expect.
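To get a feel for where such notches fall, consider the simplest possible case: a direct sound combined with a single reflection that has travelled a little further. The real ear involves many reflections, so treat this as an illustration only; the 2cm extra path is a hypothetical figure.

```python
def notch_frequencies_hz(extra_path_m, count=3, speed=343.0):
    """First few comb-filter notch frequencies for one reflection added to the direct sound."""
    delay_s = extra_path_m / speed
    # Cancellation occurs where the delay equals an odd number of half cycles.
    return [(2 * k + 1) / (2 * delay_s) for k in range(count)]

print([round(f) for f in notch_frequencies_hz(0.02)])
```

An extra path of 2cm puts the first notch at about 8.6kHz. A small change in the path length, which is what a different source bearing or ear shape would produce, moves the notches noticeably.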
There's plenty of commercially released material in which you can hear interesting stereo‑manipulation processes at work, or where mono and stereo playback affects the tonality due to phase cancellation when summed to mono. I've commented on a few examples below.
This album mostly contains Madonna's hit tracks remixed using a processing system called QSound, from Archer Communications (now called QSound Labs), which, it is claimed, gives a 3D soundstage via stereo replay systems. It certainly produces impressively large and wide stereo sound, but is generally less satisfactory (or even unpleasant) in mono. 'Vogue' has huge keyboard pads from the start, the tonality of which changes significantly, and not for the better, when auditioned in mono. Many other instruments also change character in detrimental ways, including some vocal lines and percussion.
Midway through this album track, there is a sequence where the main motif starts to revolve in a circular motion around the listener. It's a very powerful effect on headphones, and actually remains quite effective on stereo speakers. However, when auditioned in mono, the volume of the track dips to nothing repeatedly. The reason is that the rotating effect is achieved with a co‑ordinated combination of auto‑panning and polarity reversals — the latter being responsible for the level nulls when the pan passes through the centre!
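The mono null is easy to reproduce in principle. This sketch is not the processing used on the track, which I can only infer from listening; it simply shows what happens when a centrally panned source has one channel polarity-inverted. The sample values and the 6dB centre gain are illustrative.

```python
def mono_sum(left, right):
    """Sum two channels sample by sample, as a mono fold-down would."""
    return [l + r for l, r in zip(left, right)]

source = [0.5, -0.25, 0.8, -0.6]  # hypothetical sample values
gain = 0.5                        # centre position under a 6dB pan law

left = [gain * s for s in source]
right_normal = [gain * s for s in source]
right_inverted = [-gain * s for s in source]

print(mono_sum(left, right_normal))    # the original source is recovered
print(mono_sum(left, right_inverted))  # every sample cancels to zero
```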
Andy Wallace's mix of this song may be impressive, but if you compare the track in stereo and mono, you'll notice that the tonality of the guitar parts changes significantly, with an obvious attenuation of the higher harmonic frequencies when listened to in mono.
In all of these examples, the mono version is perfectly acceptable, but the sound quality or character is definitely compromised in comparison with the stereo original. They provide good illustrations of how the engineer, producer or artist has struck a balance between creating interest in the stereo version and keeping a solid and acceptable mono mix balance.