Hugh Robjohns continues SOS's series on surround sound with a detailed look at Dolby Pro Logic, the widely used surround system originally developed for use in the cinema, and considers how you might adapt your home recording setup to allow mixing in this format.
In Part 1, I recapped in some detail the background and history of surround sound over the last 50 years or so. In this issue we will take a rather more practical approach to working in surround by looking more closely at one of the two technologies which survived the quadraphonic era and which is still in widespread use today: the Dolby matrix system, known today as Dolby Surround or Dolby Pro Logic. I will be entering into a full discussion of the other surviving system, Ambisonics, which I also mentioned last month, in Part 3 of this series.
Despite the current marketing prevalence of 5.1 surround formats, the Dolby matrix system should not be dismissed out of hand. There are literally millions of Pro Logic compatible decoders in use around the world, both in stand-alone form and built into domestic hi-fi and satellite TV receivers, televisions and home-theatre systems, as well as in cinemas where the use of the system originated. There are also plenty of media which can support matrix-encoded surround but not 5.1 — analogue television broadcasts, computer games, CDs, cassettes, videos, and so forth. It also has the advantage of being an easy format to work with and is extremely effective when used appropriately. Compatible alternatives to Dolby's equipment are available from the likes of RSP Technologies (Circle Surround) and Lexicon, although if you have access to a Pro Logic decoder (for example in an AV receiver or home theatre system) and can lay your hands on five monitor speakers, it is also possible to create matrixed material using a conventional mixer and digital delay line, which provides a low-cost way of getting into surround sound. More on this in the box at the end of this article.
Dolby Encoding
As explained last month, Dolby matrix encoding is used to convey four channels (Left, Centre, Right and Surround, or LCRS) over a two-channel medium. However, trying to convey four discrete channels of audio over a two-channel system is akin to the 'quart in a pint pot'. It clearly won't fit, and something has to be lost in the process — in this case, channel separation — which is the primary drawback of the approach.
The reason Dolby wanted to encode four channels into two in the first place was principally for backwards compatibility. Analogue sound is carried on film in an optical soundtrack adjacent to the picture frames. Dividing the available area into two to allow the use of stereo sound had already degraded the noise performance quite considerably — which is how Dolby became involved with the film industry in the first place (Dolby A or SR noise-reduction systems are used on all analogue film soundtracks). Dividing the optical track area into four was simply not a practical idea.
However, by matrixing four channels over two tracks, replay by a stereo or mono projector system still produced acceptable mono or stereo results (albeit with the complete exclusion of surround channel effects over a mono replay system). Furthermore, by employing a stereo replay head followed by a suitable decoder, the four original channels could also be reconstituted to a large extent.
Figure 1 shows how the Encoder passes the original left and right channels straight through to the output, but mixes into them modified centre and surround information. The centre channel is reduced in level by 3dB to preserve headroom in the stereo output, but the clever bit is how the surround channel is handled. This is also reduced in level by 3dB but then passes through a band-pass filter to remove frequencies below 100Hz and above 7kHz. The low frequencies are removed simply to allow small surround loudspeakers to be employed. The high-frequency roll-off serves two purposes: firstly, to reduce the audibility of any high-frequency crosstalk caused by imperfectly aligned replay systems (ie. azimuth errors in the optical replay heads), and secondly, to enhance a psychoacoustic effect. Human hearing attenuates rearward sound sources above about 7kHz, so by further reducing these frequencies in the Dolby surround channel, the system can fool the already weak directional sense of the brain at these frequencies into thinking the surround channel is behind, even if the nearest speakers are to the side.
Following the band-pass filter, the signal is passed through a simplified Dolby B-type noise reduction process. As mentioned briefly last month, in the original system, the Dolby B was included to tackle the noise introduced by the analogue delay lines used in the decoding part of the process (of which more in a moment).
The last stage splits the signal in two with ±90-degree phase shifts before adding to the left and right outputs. Thus the surround signal is encoded in opposite polarity on the left and right channels, but is also phase-shifted with respect to the original left, right and centre channels. This last point is essential: without it, it would not be possible to 'fly' sounds overhead from surround to centre, or from one side to the rear.
The complete matrix-encoded output is referred to as Lt and Rt (left and right 'total'), and can be recorded on any conventional two-channel medium. The only essential requirement of the recording medium is that it must possess both amplitude and phase linearity between the two channels — any errors will cause excessive crosstalk and image instability during decoding.
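The core of the encoding matrix described above can be sketched in a few lines of Python. This is only an illustration, not Dolby's implementation: each channel is represented by a single complex phasor (one sine-wave component), so that a 90-degree phase shift becomes multiplication by j. The function name `encode` is hypothetical, the choice of +90 degrees for Lt and -90 degrees for Rt is an assumption (the text only says the two shifts are opposite), and the band-pass filter and B-type noise reduction are left out.

```python
# -3 dB expressed as a linear gain (about 0.708).
G = 10 ** (-3 / 20)


def encode(l, c, r, s):
    """Matrix-encode four phasors (L, C, R, S) into the pair (Lt, Rt).

    Assumption: the surround is shifted +90 degrees into Lt and
    -90 degrees into Rt. Filtering and noise reduction are omitted.
    """
    lt = l + G * c + 1j * G * s  # left, plus centre, plus surround at +90 degrees
    rt = r + G * c - 1j * G * s  # right, plus centre, plus surround at -90 degrees
    return lt, rt


# A centre-only signal appears identically in Lt and Rt.
lt_c, rt_c = encode(0, 1, 0, 0)

# A surround-only signal appears in opposite polarity in Lt and Rt,
# so it vanishes when the two are summed to mono.
lt_s, rt_s = encode(0, 0, 0, 1)
mono_s = lt_s + rt_s
```

The surround-only case shows why a mono replay system loses the surround channel altogether: the two halves cancel when Lt and Rt are added.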
Decoding
Assuming an accurate stereo record and replay medium, there is no loss of separation between left and right channels, nor between centre and surround. However, separating centre and surround from left or right is another matter entirely! Simple sum-and-difference decoding achieves barely 3dB separation between the centre and left/right or surround and left/right channels.
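The 3dB figure can be checked with a little arithmetic. The sketch below is an illustration under stated assumptions rather than a description of any real decoder: channels are single complex phasors, the -3dB factor is taken as exactly 1/√2, the passive decoder is assumed to scale its sum and difference outputs by the same factor, and the names `encode`, `passive_decode` and `separation_db` are all hypothetical.

```python
import math

G = math.sqrt(0.5)  # about -3 dB as a linear gain


def encode(l, c, r, s):
    """Simplified matrix encoder (no filtering or noise reduction)."""
    return l + G * c + 1j * G * s, r + G * c - 1j * G * s


def passive_decode(lt, rt):
    """Sum-and-difference decoding: returns (L, C, R, S)."""
    return lt, G * (lt + rt), rt, G * (lt - rt)


def separation_db(wanted, crosstalk):
    """Level difference in dB between a wanted output and its crosstalk."""
    return 20 * math.log10(abs(wanted) / abs(crosstalk))


# Centre-only input: the centre output is restored, the surround is silent,
# but the left and right outputs still carry the centre at only 3 dB down.
l1, c1, r1, s1 = passive_decode(*encode(0, 1, 0, 0))
centre_to_left = separation_db(c1, l1)

# Left-only input: nothing reaches the right output, but the centre and
# surround outputs each carry the left signal at only 3 dB down.
l2, c2, r2, s2 = passive_decode(*encode(1, 0, 0, 0))
left_to_centre = separation_db(l2, c2)
```

Both `centre_to_left` and `left_to_centre` come out at about 3dB, while the opposite pairs (left/right and centre/surround) show no crosstalk at all in this idealised model.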
Dolby's arrangement for cinemas, with three frontal channels and a single, widespread surround channel, actually goes some way to mitigating the problem. A degree of crosstalk from surround to the left or right has little impact on the perceived sound stage, because the sound is mainly associated with the cinema screen at the front anyway. However, the reverse condition, that is crosstalk from left and right channels to the surround, is less acceptable. The situation is helped greatly by the low-pass filtering and by the short time delay applied to the surround channel during decoding. Because of this delay, any crosstalk from the surrounds will arrive at the listener about 20ms after the direct sound from the front channels. It is a well-documented psychoacoustic phenomenon that if the same sound arrives twice at the human ear in rapid succession, our hearing suppresses the perception of the second instance, even if it is louder. This is known as the Precedence or Haas effect, and it means that the crosstalk from the surrounds is not noticeable.
Active Decoding
The only way to improve the channel separation, and therefore the imaging accuracy of the system, is to use an active, adaptive decoding system to manage the crosstalk from moment to moment.
Active decoding was used in some of CBS and EMI's SQ quad decoders in the 1970s, using simple gain riding to enhance the perceived separation between matrix-encoded tracks. The idea was that voltage controlled amplifiers (VCAs) controlled the replay level of each channel, such that unwanted crosstalk in adjacent channels could be silenced by reducing their level. For example, with a loud vocalist on one channel, the two adjacent channels were turned down to make the crosstalk inaudible. However, the obvious drawback was that other instruments playing on these channels suffered obvious and unacceptable level changes, and the image localisation became unstable.
Instead of gain-riding to reduce the audibility of crosstalk, Dolby's approach was to try to cancel the crosstalk out by adding in the signal in opposite polarity (see Figure 2). Each channel's signal is therefore inverted and a portion added into both adjacent channels, thus cancelling the crosstalk, with the necessary level being controlled by a VCA. Detection circuits monitor the strength and spatial position of the 'dominant' (ie. most audible) signal at any moment in time and determine how much crosstalk needs to be reduced in which channels, changing the gain of the appropriate VCAs accordingly.
By way of an example, consider a strong centre-channel signal — dialogue, say — which will inevitably also appear on the left and right channels thanks to the action of the encoding matrix. The centre signal's crosstalk on the left channel can be removed by introducing a portion of inverted right channel to the left — the centre component, being equal in both left and right, will then cancel out completely.
That was the good news... the bad news is that an amount of inverted right-only signal has now been added into the left channel, and because the same cancellation process has to be performed on the right channel too, there will be some inverted left-only signal on the right channel. Consequently, although this arrangement avoids the unacceptable level changes caused by a simple gain-riding approach, it does suffer from what Dolby sweetly like to call 'spatial redistribution'.
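The centre-dominant example can be put into numbers with a short Python sketch. It is a simplification for illustration only: the signals are single sample amplitudes rather than audio, the surround is ignored, and the names `lt_rt` and `cancel_centre` and the gain `k` (standing in for the VCA setting) are hypothetical.

```python
G = 10 ** (-3 / 20)  # -3 dB as a linear gain


def lt_rt(l, c, r):
    """Encoded pair for left, centre and right content (surround ignored)."""
    return l + G * c, r + G * c


def cancel_centre(lt, rt, k):
    """Subtract a portion k (0 to 1) of the opposite channel from each side.

    k stands in for the VCA gain set by the detection circuits.
    """
    return lt - k * rt, rt - k * lt


# Dialogue alone: with full cancellation the centre crosstalk disappears
# from both the left and right outputs.
left_out, right_out = cancel_centre(*lt_rt(0.0, 1.0, 0.0), k=1.0)

# The side-effect: a right-only sound now appears, inverted, in the left output.
left_leak, right_kept = cancel_centre(*lt_rt(0.0, 0.0, 0.5), k=1.0)
```

In the second case `left_leak` is -0.5: the right-only sound has been 'spatially redistributed' into the left output in opposite polarity, exactly the trade-off described above.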
Fortunately, this spatial redistribution of background sounds maintains constant acoustic power in the auditorium, and the potential imaging problems tend to be masked by the improved focus of the dominant signal at any moment in time. The amount of signal cancellation (and therefore spatial redistribution) is also adjusted continually according to the degree of dominance. If sound in one channel is much louder than that in the others, the maximum degree of cancellation is applied, but as the difference in levels falls, less cancellation is needed because a larger degree of self-masking occurs: the wanted sound in each channel tends to mask the crosstalk from the other more effectively. If similar sounds are present in all channels at the same time (for example when rain or wind effects are all-encompassing), no directional enhancement is needed at all, and the signal-cancellation process is disabled completely.
You might think that difficulties would arise when two or more strong signals are present at the same time in different channels, as the decoder can provide directional enhancement for signals at only one position at a time. However, the system constantly assesses the location of the most dominant signal and applies the appropriate directional enhancement, even though this may need to be altered at great speed to accommodate the changing sound balance. The result is that each dominant signal is perceived as being completely separate and stable in its own location.
When there is no dominant signal, or when the degree of dominance between multiple signals is very small, the system reverts to a virtually passive mode, permitting more crosstalk in preference to creating an impression of 'nervousness' or instability in the sound stage. In practice, this adaptive decoder is capable of maintaining around 30dB of perceived separation between adjacent channels, with few side-effects.
Adaptive Matrix Technology
A practical decoder requires some complex electronics, or, these days, digital algorithms. In fact the majority of the Dolby decoder's circuitry is involved in analysing the input signals rather than processing the outputs. Figure 3 shows the structure of the decoder, with the Lt and Rt input signals being passed straight to a combining network from which the decoded outputs are obtained.
The inputs are also routed through a band-pass filter to simple passive sum-and-difference decoders. The filtering removes low frequencies which carry no useful directional information, and high frequencies which could be distorted by amplitude or phase errors in the recording medium.
The left-right and centre-surround channel pairs derived from the passive decoders are analysed independently to detect dominant signals. Four control voltages are generated (EL, ER, EC, and ES) corresponding to the relative strength of the dominant signal in the left, right, centre and surround channels. If a certain threshold is exceeded, they are used to adjust the gain of eight VCAs — four working on the level of the Lt signal and four affecting the Rt signal. This octet of gain-adjusted signals is then inverted and combined with the original Lt and Rt signals to produce the four 'directionally enhanced' output signals — L, C, R, and S.
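To give a feel for the analysis stage, here is a rough Python sketch of dominance detection along the two axes. It is emphatically not Dolby's detector: the band-pass filtering is omitted, the level comparison by RMS in dB is an assumption, and the function names and the 3dB threshold are hypothetical values chosen purely for illustration.

```python
import math


def level_db(samples, floor=1e-9):
    """RMS level of a block of samples in dB (floored to avoid log of zero)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(max(rms, floor))


def dominance(lt, rt):
    """Return (left-right, centre-surround) level differences in dB.

    Positive values mean left or centre dominates; negative values
    mean right or surround dominates.
    """
    centre = [a + b for a, b in zip(lt, rt)]    # passive sum
    surround = [a - b for a, b in zip(lt, rt)]  # passive difference
    return level_db(lt) - level_db(rt), level_db(centre) - level_db(surround)


def dominant_direction(lt, rt, threshold_db=3.0):
    """Name the dominant direction, or None if nothing exceeds the threshold."""
    lr, cs = dominance(lt, rt)
    if max(abs(lr), abs(cs)) < threshold_db:
        return None  # no clear dominance: stay close to passive decoding
    if abs(lr) >= abs(cs):
        return "L" if lr > 0 else "R"
    return "C" if cs > 0 else "S"


# A short test tone, used here as stand-in programme material.
tone = [math.sin(2 * math.pi * i / 16) for i in range(64)]
silence = [0.0] * 64

centre_case = dominant_direction(tone, tone)                   # same in Lt and Rt
surround_case = dominant_direction(tone, [-x for x in tone])   # opposite polarity
left_case = dominant_direction(tone, silence)                  # Lt only
```

A real decoder would turn these differences into the four continuously varying control voltages rather than a single label, but the two-axis comparison is the same idea.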
Dolby Artefacts
It will come as little surprise to learn that the adaptive decoding can introduce 'artefacts' to the encoded material. One of the most obvious effects of the decoder is the way panning becomes 'sticky'. In trying to pan a sound from one channel to another, the pan control seems to do nothing over a large part of its travel, then suddenly everything will happen at once! This is a direct result of the crosstalk-cancelling process, and all you can do about it is learn the new characteristics of the panner. It is also important never to pan a signal in isolation, as the imaging will be affected by other dominant signals in the mix.
The inherent spatial redistribution can also become rather obvious in some circumstances. If there are two equally loud signals in separate channels, the system will decide from moment to moment which is dominant and continually adjust the crosstalk cancellation such that both will sound distinct and with precise spatial positioning. However, if one is reduced in level, its dominance will fall below a threshold and only the louder sound will benefit from crosstalk cancellation. The lower sound will then be spatially redistributed so that it suddenly becomes audible everywhere as crosstalk in the adjacent channels! The only way to overcome this is through careful control of relative levels and also by introducing sounds in other channels.
Because of the unpredictable and sometimes unintuitive nature of the problems that mixing in Dolby Surround can create, Dolby insist that any surround mixing using the format is performed through a complete encode-decode system. This means that the four-channel LCRS mix (sometimes known as the 'stem' mix) from the mixing desk is passed to the encoder, the output of which is recorded as the LtRt signal as you would expect — but the LtRt signal is then also passed on to a decoder, and it is from here that the LCRS monitoring speakers are fed. In other words, you monitor the results of an encoded and decoded mix, thus getting a better idea of what people will hear when they play back your mix through their decoders. This way, you will notice any anomalies thrown up by the encode/decode process and be able to address them during your mixing sessions.
Introducing conventional stereo material to a surround mix needs some care. Typical multi-miked music is inherently phase-coherent and tends to be reproduced with a narrower sound stage than normal. Using a stereo-width enhancer, if available, can redress the balance reasonably well. In contrast, artificial reverb present in the material usually contains incoherent phase information and will tend to be decoded as surround information. This may produce a pleasant effect but, if not, you will need to narrow the stereo image to pull the reverbs away from the rear. Clearly, a degree of compromise is going to be needed between these two aspects of a stereo track.
Stereo material recorded using spaced-mic techniques (including binaural methods) contains a lot of phase-incoherent information and therefore tends to be reproduced by the decoder over all four channels, a large amount of the signal coming from the Surround channel. This characteristic can be used to advantage — binaural sound effects automatically generate an encompassing soundscape, for example. A lot of special effects for TV and film are recorded binaurally for this very reason.
In Part 3...
In the next instalment of this series, I will describe the Ambisonics system and why it was so far ahead of its time when first introduced over 25 years ago.
DIY Matrix Surround
If you want to try mixing in Dolby matrix surround for yourself, all you need is a stereo mixing desk with a phase-reverse facility on at least one input channel, a digital delay line, five loudspeakers for LCRS monitoring, and a Pro Logic decoder. Decoders can be bought as stand-alone units, but they are much more commonly found in modern AV and satellite TV receivers, home theatre systems, and even some hi-fi systems. See the diagram below for the complete DIY Dolby Surround mixing setup.
First, you need to connect the five loudspeakers to the five outputs of your Pro Logic decoder (Left, Centre, Right, and two speakers for the Surround channel), as shown in the diagram. A word on speaker placement is advisable here; unlike in a conventional stereo setup, the left and right loudspeakers of a Dolby system should be placed at an angle of 22.5 degrees left and right of the centre speaker, relative to the listening position (normal stereo employs a ±30-degree angle). The reason for the narrower configuration is to match the width of the frontal sound stage to the size of the screen — the speakers are normally placed behind the screen in a cinema. Setting the speakers too wide will over-emphasise some of the negative effects of the decoder processing. Connect the decoder to your mixer's ordinary stereo outputs, which should also feed your stereo recorder. Now all you have to do is arrange the routing in your mixer so that the matrixed Lt and Rt signals emerge at the stereo outs.
To this end, any stereo material feeding the desk should be panned hard left and right in the usual way, to spread out across the sound stage automatically. If the desk has width controls, you may find increasing the width of multi-miked material useful. Spaced mic material may need to be narrowed to pull it towards the front speakers and out of the surrounds. Mono channels can be routed directly between left, centre and right as necessary using the pan pot in the normal way — just beware of the unusual panning law.
To send to the surround channel, the easiest technique is to use an auxiliary output, configured pre- or post-fader as required. Feed this aux output through a digital delay set to somewhere around seven or eight milliseconds (it is not critical), and split its output to feed two mixer channels, with a phase reverse switched into one. Match their levels carefully and pan hard left and right to the stereo output. This provides the opposite-polarity surround signal mixed into the left and right outputs, while the delay decorrelates the surround signal from the centre/left and right channels and allows the flying of sounds around the sound stage. By adjusting the balance between the main stereo output and the aux send of each channel you can control the position of each sound. The only real trap to avoid is putting any important sound, such as critical effects, dialogue or lead instruments, into the surround channel only, as this is lost completely if the two-channel master is replayed in summed mono. You should also bear in mind that this is not a discrete-channel system and it has limitations, so use the surround channel sparingly to enhance the sense of immersion and involvement, or for occasional dramatic effect. Have fun!
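For anyone who would rather try the routing in software first, the aux-send trick can be sketched in Python as follows. This is a minimal illustration, not a replacement for a real encoder: signals are plain lists of samples, the function names and the 48kHz sample rate are assumptions, and the delay defaults to 7.5ms as a value within the 'seven or eight milliseconds' suggested above.

```python
def delay(samples, n):
    """Delay a block by n samples, keeping its length (zeros are fed in)."""
    n = min(n, len(samples))
    return [0.0] * n + samples[:len(samples) - n]


def diy_encode(left, right, surround_send, sample_rate=48000, delay_ms=7.5):
    """Mix a delayed surround send into the stereo bus in opposite polarity.

    left and right are the ordinary stereo mix buses; surround_send is the
    aux output. Returns the (Lt, Rt) pair to feed the recorder and decoder.
    """
    n = int(round(sample_rate * delay_ms / 1000))
    delayed = delay(surround_send, n)
    lt = [a + b for a, b in zip(left, delayed)]   # normal-polarity return
    rt = [a - b for a, b in zip(right, delayed)]  # phase-reversed return
    return lt, rt


# A sound sent to the surround only, using a low sample rate to keep it short.
quiet = [0.0] * 20
effect = [1.0] * 20
lt, rt = diy_encode(quiet, quiet, effect, sample_rate=1000, delay_ms=8)

# Summing to mono shows the trap: the surround-only sound vanishes.
mono = [a + b for a, b in zip(lt, rt)]
```

Every value in `mono` is zero, which is why important sounds should never live in the surround channel alone.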