SOS's guide to the world of surround sound comes right up to date with the first of two looks at the modern 5.1 standard, and its use with DVD‑Video and DVD‑Audio discs.
The previous parts of this series have all been concerned with the earliest forays into surround sound, such as those from the quadraphonic era, and the surviving systems from that period, such as the matrixed Dolby Surround format and Ambisonics. For the next couple of months, I want to focus attention on the various surround sound systems being promoted today, mainly in association with DVD for the domestic market, and also in the cinema. In particular, I will explain the terminology, features and capabilities of these systems with the aim of providing a basis from which to explore some practical implementations and applications of surround in further parts of this series.
To start with, it is important that everyone understands the loudspeaker format adopted for most current surround systems — the infamous '5.1' arrangement. This was actually defined as long ago as 1987 (although it didn't become a commercial reality until 1993) and generally requires three full-range speakers across the front of the listener, two behind, and a subwoofer (fed by the so-called low-frequency effects channel or LFE, of which more in a moment). The system, including the requirement for a subwoofer, was essentially derived from the experience gained with the 70mm film format in the cinema.
Many people treat the term '5.1' as the definition of surround sound, without really understanding what these seemingly innocuous numbers actually represent. There are in fact many alternative modern surround sound arrangements with different, but related descriptors.
As I explained last month, the Ambisonics (B-format) surround system is the only format which is completely independent of the loudspeaker arrangement in the listening room. As long as the decoder in an Ambisonics system understands where the speakers are positioned in your room, you can put them almost anywhere you like. However, the commercial 'discrete-channel' surround sound formats (including 5.1) require the loudspeakers to be in precisely defined positions relative to the listener in order to reproduce the intended surround sound field.
Surround systems are frequently defined in terms of the number and nature of independent sound signals being conveyed, or their intended allocation to physical loudspeakers in the room for reproduction. By way of example, an ordinary stereo signal can be denoted as either 2/0 or 2.0 — and the use of the slash or the point is critical to the meaning. The first of these terms, 2/0, means that the system provides two independent signals intended for frontal loudspeakers, with none for the rear. The second form, 2.0, implies that just two 'full-bandwidth' channels are required to convey the signal.
What is meant by the term 'full-bandwidth channel' here is a medium capable of conveying audio signals spanning the audible frequency range between at least 20Hz and 20kHz. The alternative would be a 'restricted-bandwidth channel', which is generally understood to mean a medium supporting low-frequency signals only — say between 5 and 120Hz.
The '.1' in 5.1 arises because the LFE channel, which feeds the subwoofer, uses a much lower sampling rate than the main channels. The standard sampling rate is 48kHz, but the LFE is sampled at 240Hz (providing a maximum audio bandwidth of 120Hz). This lower rate is 1/200th of the principal rate, or 0.005, but the marketing guys weren't keen on promoting a '5.005' system, hence the simplification to 5.1! Counted in octaves, though, the LFE channel is far from negligible: the spectrum from 5Hz to 120Hz occupies a little under five of the 12 octaves between 5Hz and 20kHz, which is roughly 40 percent of the entire span!
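The arithmetic behind these figures is easy to check. The following is a minimal sketch using only the numbers quoted above (48kHz, 240Hz, and a 5Hz lower limit); the function and variable names are illustrative only.

```python
import math

MAIN_RATE_HZ = 48_000  # standard sampling rate quoted in the text
LFE_RATE_HZ = 240      # LFE sampling rate quoted in the text


def octaves(low_hz, high_hz):
    """Number of octaves between two frequencies."""
    return math.log2(high_hz / low_hz)


rate_ratio = LFE_RATE_HZ / MAIN_RATE_HZ   # LFE rate as a fraction of the main rate
lfe_bandwidth_hz = LFE_RATE_HZ / 2        # Nyquist limit of the LFE channel
lfe_octaves = octaves(5, lfe_bandwidth_hz)
full_octaves = octaves(5, 20_000)
octave_share = lfe_octaves / full_octaves

print(f"LFE rate / main rate: {rate_ratio:.3f}")
print(f"LFE bandwidth: {lfe_bandwidth_hz:.0f}Hz")
print(f"LFE span: {lfe_octaves:.2f} of {full_octaves:.2f} octaves "
      f"({100 * octave_share:.0f} percent)")
```

Measured this way, the LFE channel covers a little over 38 percent of the octaves between 5Hz and 20kHz, in line with the rounded figure in the text.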
As a further example of the use of these descriptive denotations, the matrixed Dolby Surround system described in Part 2 of this series (see SOS September 2001) is often referred to as 3/1, to indicate three independent signals intended for frontal loudspeakers, plus a single signal for the rear (even though this is usually shared among multiple speakers). However, this surround format could also be denoted legitimately as 2.0, as the matrixed signal is conveyed on just two channels, Lt and Rt, as we have seen.
The almost universally employed '5.1' term, then, refers to that system's five full-bandwidth channels plus its additional, restricted-bandwidth channel. These channels are conventionally described as '3/2L', meaning three independent frontal speaker feeds, two independent rear speaker feeds, and an LFE signal for low-end effects — at least, that is its use in the cinema. In music applications, the use of the LFE channel is open to argument. Many tracks make no use of it at all, although some engineers use the LFE channel to carry elements of the drums and bass. The problem is that this approach can lead to some rather unpredictable results in the home, distorting the balance of the musical instruments if end-users haven't set up their surround system correctly. With LFE effects in films, of course, setting the subwoofer too loud or quiet changes only the relative impact of the film.
The table below shows the complete set of common format and loudspeaker descriptions, with comments as to their typical application.
| Format | Loudspeakers | Typical application |
|---|---|---|
| 2.0 | 3/1 | The original matrixed Dolby Surround. |
| 3.0 | 3/0 | Used rarely, where a film soundtrack employs three front channels but no surrounds. |
| 4.0 | 2/2 | The classic quadraphonic arrangement intended for speakers positioned in the four corners of a square. There are a few films on DVD presented in this format, but it is most widely used with re-released quadraphonic music such as Mike Oldfield's Tubular Bells. |
| 5.0 | 3/2 | The full, modern surround sound format but without using the LFE channel. Often used in music applications. |
| 5.1 | 3/2L | As above but with the extra LFE channel. Widely used for most modern film soundtracks, including the Dolby Digital and DTS systems. |
| 6.1 | 3/3L | This is an arrangement with three rear surround channels plus three frontal channels, plus the LFE channel. It was a format introduced by Dolby for Star Wars Episode 1: The Phantom Menace and is also referred to as EX (for Dolby-encoded material), or ES (for DTS-encoded material). |
| 7.1 | 5/2L | This format is only used in large-screen cinemas and employs additional loudspeakers between the left-centre and right-centre pairs. The Sony SDDS film format uses this arrangement. |
| 10.2 | 6/4LL | This is a configuration proposed by Tom Holman (of Lucasfilm and THX fame), in which two 5.1 systems are used, one at floor level and the other at ceiling level, thereby having the ability to convey height information as well as horizontal surround imaging (making the system 'periphonic'). |
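For the discrete formats in the table, the two notations are related by simple counting: front plus rear feeds give the number before the point, and the number of 'L' suffixes gives the number after it. The sketch below illustrates this; `channel_descriptor` is a hypothetical helper written for this article, not part of any standard. It deliberately does not apply to matrixed Dolby Surround, where the 3/1 speaker feeds are carried on only two channels.

```python
import re


def channel_descriptor(speaker_layout):
    """Convert a front/rear speaker layout such as '3/2L' to a channel count such as '5.1'.

    Only meaningful for discrete formats, where each speaker feed has its own channel.
    """
    match = re.fullmatch(r"(\d+)/(\d+)(L*)", speaker_layout)
    if match is None:
        raise ValueError(f"not a front/rear layout: {speaker_layout!r}")
    front = int(match.group(1))
    rear = int(match.group(2))
    lfe_channels = len(match.group(3))  # one 'L' per restricted-bandwidth channel
    return f"{front + rear}.{lfe_channels}"


for layout in ["2/0", "3/0", "2/2", "3/2", "3/2L", "3/3L", "5/2L", "6/4LL"]:
    print(layout, "->", channel_descriptor(layout))
```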
Dolby have now introduced a system of so-called 'sound mode icons' for use on DVD packaging to indicate to the consumer which of the many possible audio formats is used on the disc. The icons diagrammatically represent the number of speakers used for reproduction of the sound format employed on the disc, from mono up to 5.1 surround (see diagram).
Of all these possible formats, 5.1 seems most likely at the time of writing to be of most appeal to the home-based musician wanting to create mixes for release on DVD. A closer examination of the details of this format is therefore in order, to aid those thinking of setting up a 5.1-capable recording and playback system in their studios.
The physical positioning of the loudspeakers in a 5.1 system has been defined by the International Telecommunication Union (ITU). The definition states that the front speakers should be arranged left, centre and right, with the angle between the left (or right) and centre speakers being 30 degrees (see diagram below). A narrower angle of 22.5 degrees has been suggested for use in cinematic systems to comply with another requirement that the left and right speakers should be within four degrees of the edge of the screen. However, the ITU standard is always used with music-only systems, and personally I prefer the wider angle with films too, as it provides a larger sound stage with the screen acting as a window into it. The 30-degree angle also means that conventional stereo can be auditioned correctly using just the left and right speakers. The three front speakers are usually denoted as 'L', 'C' and 'R' for Left, Centre and Right respectively.
All three speakers should be positioned at the same distance from the listener, lying on an arc (not a straight line) centred at the listening position. In other words, the centre speaker should be set back slightly from a line drawn between the left and right speakers. This is not always practical, though, and in cases where all three front speakers must be placed along a straight line (perhaps because of soffit mounting), the centre speaker feed should be delayed to reinstate the correct time alignment. This is a facility provided by many of the better surround-sound controllers, of which more later in this series.
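To get a feel for the numbers involved, the front speaker positions and the centre-channel delay can be worked out with basic trigonometry. This is a rough sketch only: the 2m listening radius is a hypothetical example, the speed of sound is assumed to be 343m/s, and the function names are illustrative.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C


def speaker_position(angle_deg, radius_m):
    """(x, y) in metres; listener at the origin, centre front along +y, positive angles to the right."""
    angle = math.radians(angle_deg)
    return (radius_m * math.sin(angle), radius_m * math.cos(angle))


def centre_delay_ms(radius_m, front_angle_deg=30.0):
    """Delay needed when the centre speaker sits on the straight line joining L and R."""
    # On that line the centre speaker is radius * cos(angle) away instead of radius.
    shortfall_m = radius_m * (1.0 - math.cos(math.radians(front_angle_deg)))
    return 1000.0 * shortfall_m / SPEED_OF_SOUND_M_S


radius_m = 2.0  # hypothetical listening distance
for name, angle_deg in [("L", -30.0), ("C", 0.0), ("R", 30.0)]:
    x, y = speaker_position(angle_deg, radius_m)
    print(f"{name}: x = {x:+.2f}m, y = {y:.2f}m")
print(f"Centre delay if mounted in line with L and R: {centre_delay_ms(radius_m):.2f}ms")
```

With these assumptions the centre speaker ends up about 27cm too close when mounted in line, which corresponds to a delay of a little under a millisecond.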
The cinema standard is to set the sound level from each channel (front and rear) such that pink noise at -18dBFS on a digital replay medium produces 85dB SPL from each channel. Pink noise is used as it provides equal energy in each octave band and is a more reliable test signal for acoustic measurements. A fully modulated digital signal should therefore produce peaks of 103dB on each channel. This is asking a lot in a small room with more modest loudspeakers, and so an alternative standard is to use 80dB SPL. Of course, lower monitoring levels are perfectly acceptable if you prefer — the important thing is that the level of each channel is correct relative to the others.
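The peak figure follows directly from the calibration level. A minimal sketch of the sum, using only the values given above (the function name is illustrative):

```python
def peak_spl_db(reference_spl_db, reference_level_dbfs=-18.0):
    """SPL produced by a full-scale (0dBFS) signal, given the SPL at the reference level."""
    headroom_db = 0.0 - reference_level_dbfs  # distance from the reference level up to full scale
    return reference_spl_db + headroom_db


print(peak_spl_db(85.0))  # 103.0, the cinema standard
print(peak_spl_db(80.0))  # 98.0, the alternative for smaller rooms
```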
The rear loudspeakers should ideally be placed at the same distance from the listening position as the front speakers, at an angle of 110 degrees (±10) from the centre front. This angle is a compromise between producing the best sense of 'envelopment' in the sound stage (at 90 degrees) and the best rear-quadrant imaging (at 135 degrees).
As already mentioned, some surround speaker controllers provide facilities to introduce delay, which enables the rear-channel speakers to be mounted closer to the listening position than the ideal recommendations while maintaining correct time alignment. I prefer to denote these speakers as 'SL' and 'SR' for Surround Left and Surround Right, although you will often see them as 'Ls' and 'Rs'.
There is a degree of confusion involved in the required replay level for the rear channels. In music and television applications, as well as in domestic replay equipment, all five channels are set up with identical replay levels. However, for film and theatre systems, each of the rear channels is calibrated 3dB lower in level than the front channels. This is so that their acoustical sum produces a level equivalent to one front channel, and is a hangover from the use of Dolby Surround, where the single rear-channel signal of that format would be reproduced by both rear speakers of a 5.1 system.
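The 3dB figure can be checked by adding the two surround channels on a power basis. The sketch below assumes that the two rear speakers combine as uncorrelated sources at the listening position, which is a simplification, and uses 85dB SPL as an example front-channel level.

```python
import math


def power_sum_db(levels_db):
    """Combined level of several sources, assuming their acoustic powers simply add."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


front_db = 85.0               # example front-channel calibration level
surround_db = front_db - 3.0  # each surround channel set 3dB lower
combined_db = power_sum_db([surround_db, surround_db])
print(f"Two surrounds at {surround_db:.0f}dB combine to {combined_db:.2f}dB")
```

Under that assumption the pair of surrounds lands within a few hundredths of a decibel of a single front channel.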
This disparity is taken into account during the transfer to DVD of films mixed for theatrical release, with a -3dB level adjustment to each surround channel. Some surround controllers also incorporate facilities to implement this level correction locally if desired.
In large rooms (and cinemas) more than two rear-channel speakers are often employed to improve the sense of 'envelopment', although these are grouped in two arrays and driven by the SL and SR signals. In this case, the speakers should be distributed evenly about the nominal 110-degree angle from the centre front, and not exceeding the limits of 60 and 150 degrees. A recent addition to cinema sound is a third, central surround speaker. The signal for this is matrix-encoded within the SL and SR channels using a revision of the original Dolby MP matrix system.
Unlike the early Dolby Surround system, the rear channels in modern 5.1 surround systems are at full bandwidth and require, in theory at least, full-range speakers. This is often not feasible in a domestic situation, so a large number of consumer surround systems (such as the Videologic Digitheatre DTS system shown below) employ small (even tiny) speakers for the five main positions, with a subwoofer which handles the low-frequency components of all the channels, including the LFE requirements. Depending on the precise design of the system, this often imposes a reduction in spatial accuracy at mid and low frequencies, but this is usually accepted as a fair trade-off for domestic purposes.
The redirection of low frequencies from the main channels to the subwoofer is called 'Bass Management' and is an increasingly common technique which is certainly not restricted to consumer systems. A large number of professional systems employ bass management as well, although in this situation it would more typically only redirect frequencies below about 80Hz from the main channels to the subwoofer. One of the main arguments for using bass management in professional environments is that even if the main monitors have the ability to handle the full sound spectrum, acoustical summation of low frequencies sounds different to electrical summation, because the room in which playback takes place has an appreciable effect on how we perceive the low-end frequencies. Since virtually all consumer surround systems employ bass management, it therefore makes a lot of sense to employ similar facilities in any surround studio you set up.
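In signal-flow terms, bass management splits each main channel at the crossover frequency, sends the high part to its own speaker, and sums all the low parts electrically with the LFE signal to feed the subwoofer. The sketch below illustrates only that routing. It uses a first-order filter purely for simplicity (real crossovers are steeper), leaves the LFE signal at unity gain, and all names in it are hypothetical.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """First-order low-pass filter, a deliberately simple stand-in for a real crossover."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    state = 0.0
    out = []
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def bass_manage(mains, lfe, cutoff_hz=80.0, sample_rate_hz=48_000):
    """Return (speaker_feeds, sub_feed).

    mains: dict mapping channel name to a list of samples.
    lfe:   list of samples of the same length.
    """
    sub_feed = list(lfe)  # the subwoofer always carries the LFE signal
    speaker_feeds = {}
    for name, samples in mains.items():
        low = one_pole_lowpass(samples, cutoff_hz, sample_rate_hz)
        # Whatever the low-pass removes stays with the main speaker.
        speaker_feeds[name] = [x - l for x, l in zip(samples, low)]
        # The low part is redirected and summed electrically into the sub feed.
        sub_feed = [s + l for s, l in zip(sub_feed, low)]
    return speaker_feeds, sub_feed


# Example: a 1kHz tone on L only should stay almost entirely in the L speaker feed.
rate = 48_000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / rate) for n in range(rate // 10)]
silence = [0.0] * len(tone)
feeds, sub = bass_manage({"L": tone, "R": silence}, silence)
print(f"peak in L feed: {max(abs(v) for v in feeds['L']):.2f}")
print(f"peak in sub feed: {max(abs(v) for v in sub):.2f}")
```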
The turnover frequency of the LFE channel is generally defined as 120Hz, although I have come across an alternative recommendation which employs an 80Hz crossover. The most important aspect of the LFE channel however, is that it is designed with 10dB of extra headroom and replays with 10dB more level in each third-octave band of its operational range, compared with the five main speakers. This is definitely not the same as producing 10dB more level overall!
Setting the replay level of a subwoofer with an ordinary sound level meter is not appropriate since the reproduced bandwidth is restricted. A third-octave analyser should be used (ie. one which splits the sound it is measuring into a series of bands, each a third of an octave wide, before carrying out its analysis), and the measurement should be based on a per-band analysis. However, if you only have a broad-band sound level meter, a useful rule of thumb is to adjust the gain on the sub until a pink noise signal at -18dBFS generates a sound pressure level about 4dB higher than that of each of the five main channels.
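The gap between '10dB per band' and 'about 4dB overall' comes from the narrower bandwidth of the LFE channel. Pink noise has equal power in every octave, so a channel covering fewer octaves reads lower on a broad-band meter. The sketch below is a rough model of this, assuming ideal band edges, an unweighted meter, and a 20Hz lower limit for both the sub and the main speakers (the 20Hz figure is an assumption made for this example).

```python
import math


def octaves(low_hz, high_hz):
    """Number of octaves between two frequencies."""
    return math.log2(high_hz / low_hz)


def lfe_broadband_offset_db(per_band_boost_db=10.0,
                            lfe_band_hz=(20.0, 120.0),
                            main_band_hz=(20.0, 20_000.0)):
    """Broad-band pink-noise level of the LFE channel relative to one main channel."""
    # Equal power per octave means total power scales with the number of octaves covered.
    octave_ratio = octaves(*lfe_band_hz) / octaves(*main_band_hz)
    bandwidth_term_db = 10.0 * math.log10(octave_ratio)
    return per_band_boost_db + bandwidth_term_db


print(f"Expected broad-band difference: {lfe_broadband_offset_db():+.1f}dB")
```

With these assumptions the model gives a figure just over 4dB, which agrees with the rule of thumb. A different lower frequency limit or a weighted meter would shift the result.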
Surround systems are obviously more complex than stereo systems. Clearly, more channels of amplification are required and consumer systems generally use a multi-channel amp, often with the necessary preamp facilities built-in. However, it is perfectly acceptable to construct a system from existing stereo amplifiers (or powered/active loudspeakers) — provided that the facility exists to fine-tune the relative levels of each channel. If you take this approach, give careful thought to the quality and dynamic capability of each of the amps used, allocating the best to the left and right speakers, followed by the centre and sub, and then the surrounds, in that order.
It is very important that all five main loudspeakers have identical (or at least very similar) frequency and phase responses, and dispersion. Ideally, five identical speakers would be employed, but where this is not practical, the next best would be for three identical front channels with different but matched rear channels. Most loudspeaker manufacturers can offer special front-centre channels with similar performance to main speakers, but in a more convenient (often horizontal) enclosure.
I have laboured the standard specifications of the surround loudspeaker monitoring system because mixing on non-standard systems will have undesirable side effects when your material is played on a consumer's system. Just as mixing music on bright-sounding speakers tends to result in a dull finished track, mixing surround on a system which over-emphasises rear imaging, for example, will often lead to the use of too much rear reverberation. It is therefore very important that everyone mixes using the same (or at least very similar) monitoring arrangements, so that mastered surround material is produced to a consistent standard. This approach works well for stereo and is just as important (if not more) for surround.
The most common replay source for discrete surround material is the DVD‑V, the video replay version of the DVD format, although some laserdiscs also carry discrete surround tracks (usually DTS-encoded discs — see below). The audio facilities offered by DVD players vary widely. Many are equipped with stereo analogue outputs plus a digital output. In this case, multi-channel surround sound is only available via the digital output with an external decoder. This is usually built in to the surround controller or preamp, although many stand-alone decoders exist too. Some more expensive DVD players have built-in decoder facilities and provide analogue outputs for the left, centre, right, two surround and subwoofer signals, which can be passed on to a multi-channel amplifier.
There are plenty of magazines and other publications covering the hi-fi aspects of DVD, and I don't want to go into too much of that detail here. However, I would like to remove any confusion from the fact that the physical surround sound arrangement is often inextricably bound up with bespoke data reduction systems within the DVD‑V format. A great many readers will probably associate Dolby Digital with the 5.1 surround sound format when, in fact, the Dolby term refers only to a multi-channel data-reduction scheme. It is just as likely to be carrying a mono sound track as it is the ubiquitous 5.1. Admittedly, in the table earlier in this article, I made specific connections between 5.1 and the Dolby Digital and DTS formats, but these are only the most common associations. Dolby Digital and DTS are, in fact, both multi-channel audio data-reduction systems first and foremost. The number of channels they can convey is fairly flexible, and Dolby Digital, in particular, can actually handle all formats from 1.0 to 7.1 if so required.
There are three principal digital audio data-reduction schemes employed in the cinema sound world, and unsurprisingly, some of them are now being used in DVD‑V manufacture, in order that all the video for a feature film and also the soundtrack can be crammed onto the DVD (which would be next to impossible if the audio was put on the disc at full resolution, let alone in higher-quality formats such as 24-bit, 96kHz).
The most widespread film data-reduction format is the Dolby Digital format, which uses an algorithm called AC3 ('Audio Coding 3'). This is a very sophisticated perceptual coding system (for more on this and other data-reduction techniques see SOS August 1998). AC3-encoded audio is carried on 35mm film prints as a block of 78x78 black-and-white pixels recorded between the sprocket holes on the film (see photo). Until recently, most Dolby Digital films employed the 5.1, 3/2L surround arrangement, but a lot of the latest films have adopted the Dolby Digital EX format with the third, central rear channel.
Sony's less common SDDS (Sony Dynamic Digital Sound) format uses ATRAC algorithms (similar to those employed on Minidiscs) to reduce the amount of audio data required to convey surround sound in a 7.1, 5/2L configuration. This data is recorded as blue-and-white pixels distributed on both edges of the film, outside the sprocket perforations (seen on the close-up view).
The third system is the DTS, or Digital Theatre Systems format. The data-reduced audio information is actually replayed from separate CD‑ROM units which are synchronised to the film via a bespoke optical timecode track, recorded between the standard stereo analogue audio sound tracks and the film's picture frame.
In the DVD‑V home theatre market both Dolby Digital and DTS data-reduction systems are employed. However, the DVD‑V format only specifies the use of Dolby Digital (AC3) coding and MPEG — DTS is an optional format, and although some players can also decode it, most can only dispatch the data stream through the digital output for an external decoder to deal with. MPEG data reduction is another sophisticated and high-quality perceptual coding system capable of dealing with multi-channel sound, but it does not seem to be widely used, mainly, I suspect, because there is no cinematic equivalent.
The latest addition to the DVD family is the DVD‑Audio or DVD‑A disc. This is intended as a very high-quality audio-only format typically providing full 5.1, 3/2L surround sound material, recorded linearly in 24-bit resolution at up to 96kHz or higher sampling rates. I mention the format here because although it does not use any form of 'lossy' audio data-reduction scheme such as AC3 or DTS, a bespoke 'data-packing' technique is nevertheless employed, simply to maximise the playing time of the disc.
The scheme adopted in DVD‑A is called MLP (Meridian Lossless Packing), a system designed by the British hi-fi company Meridian. The concept is not dissimilar to that of the PKZIP, Winzip or Stuffit programs used to compress computer files: MLP packs multi-channel audio data in a very space-efficient manner, typically achieving a 2:1 compression. On decoding, the data is fully reconstructed to its original form, without any losses whatsoever. Although a British design, licensing of the MLP algorithms is handled exclusively by Dolby, simply because that company is so well set up to handle licensing agreements and is intimately involved in the DVD industry. Nevertheless, MLP should not be confused with Dolby's own data-reduction systems.
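A back-of-envelope calculation shows why packing matters for playing time. The sketch below assumes six channels all stored at 24-bit, 96kHz, a nominal single-layer disc capacity of 4.7GB, and no allowance for disc overheads. The capacity figure and the function names are assumptions made for this example, not taken from the text above.

```python
def pcm_bit_rate(channels, sample_rate_hz, bits_per_sample):
    """Raw linear PCM data rate in bits per second."""
    return channels * sample_rate_hz * bits_per_sample


def playing_time_minutes(capacity_bytes, bit_rate, packing_ratio=1.0):
    """Playing time for a given capacity; packing_ratio=2.0 models 2:1 lossless packing."""
    return capacity_bytes * 8 * packing_ratio / bit_rate / 60.0


DISC_CAPACITY_BYTES = 4.7e9  # assumption: nominal single-layer disc

rate = pcm_bit_rate(channels=6, sample_rate_hz=96_000, bits_per_sample=24)
print(f"Raw data rate: {rate / 1e6:.3f}Mbit/s")
print(f"Unpacked: {playing_time_minutes(DISC_CAPACITY_BYTES, rate):.0f} minutes")
print(f"With 2:1 packing: {playing_time_minutes(DISC_CAPACITY_BYTES, rate, 2.0):.0f} minutes")
```

On these figures, 2:1 packing roughly doubles the playing time from about three quarters of an hour to about an hour and a half.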
In the next part of this series I'll explain the ideas behind DVD metadata — information encoded with the 5.1 audio on a DVD which instructs the player how to reproduce the audio, including replay levels, dynamic range compression, and even how to down-mix the 5.1 information to stereo.
Just as a good two-channel (stereo) recording can be made with two mics, so a good, natural-sounding recording suitable for playback on a 5.1 system can be made with (you guessed it) five mics, with each mic producing the signal for one of the main playback channels. However, exactly as with stereo, opinions differ amongst those making 5.1 recordings as to the relative merits of spaced and coincident mic techniques. I described the Ambisonic coincident mic arrangement at some length last month, so perhaps I should redress the balance a little by talking about five-channel spaced mic arrays. As with stereo spaced mic techniques, there are several alternatives, but most engineers are now beginning to converge on one increasingly popular technique — more on this in a moment.
First, for the benefit of those who have not previously encountered the distinction between spaced and coincident techniques even in stereo, allow me to recap (for more detail, see the dedicated two-part feature on this subject back in SOS February and March 1997). When recording in stereo using a pair of mics, different recording engineers tend to employ one of two principal techniques. Stereo 'purists' favour the coincident system, which uses a pair of closely mounted and angled directional mics designed to capture only amplitude differences between the two channels, courtesy of the microphone's polar patterns, without any timing or phase differences.
The alternative spaced arrangement usually employs omnidirectional mics (although directional designs are sometimes also used). These mics are placed apart such that they capture timing differences between the two channels (in addition to relatively small amplitude differences). The best known spaced-mic arrangement is probably the so-called 'Decca Tree' which uses three large-diaphragm omnidirectional mics arranged at the ends of an inverted T, typically measuring roughly 2.0 metres across and 1.5 metres deep. However, there are many alternatives and variations including the Faulkner Array, and binaural techniques such as the Jecklin Disc.
Both coincident and spaced mic techniques can produce very pleasing results — the coincident arrangement tends to provide more precise imaging (to my ears at least) but the spaced techniques usually convey more, well, 'spaciousness'! Not surprisingly, many alternative techniques have been devised to try to incorporate the best of both systems — the ORTF arrangement being a classic example.
When it comes to selecting a stereo mic technique for a given recording situation, there is no absolute right or wrong, and personal preferences play a strong part. Exactly the same situation pertains with surround sound. The Ambisonics approach is equivalent to the stereo coincident system, and a spaced surround array is the equivalent of the Decca Tree. There are, however, as many, if not more, variations in mic technique for surround as there are for stereo recording, although because the medium is still relatively new, a lot more experimentation is taking place as engineers learn what does and doesn't work.
One commercial example of a bespoke spaced surround mic array is available from the German outboard manufacturer SPL in conjunction with Brauner microphones. This array is called the ASM5 (adjustable surround microphone) and is derived from the INA5 design developed by Volker Henkels and Ulf Herrmann (see diagram). It employs a total of five spaced cardioid microphone capsules, the front three arranged in a configuration resembling the Decca Tree, but with fairly close spacing (the three arms measure only 17.5cm from the centre hub). The two rear mics are located about 60cm behind the centre hub, at 60-degree angles. The relatively distant placing of the rear mics introduces a degree of additional delay which helps to ensure frontal dominance in the reproduced sound stage. The polar patterns of the mics in SPL/Brauner's system can be changed between omni, cardioid and figure of eight, plus all the intermediate responses, to suit the required sound pickup and acoustic environment.
As is always the case in the audio industry, everyone has their own preferences and favoured techniques. However, it seems many engineers are using a similar basic microphone arrangement to the INA5 design, but perhaps with larger arm dimensions, or different angles, and perhaps using omnidirectional mics instead of cardioids, arrived at through experimentation in a particular situation.
Once again, there are no rights and wrongs here, only compromises, although it pays to be very careful about checking compatibility between surround, stereo and mono if your surround recording also needs to be used in one of those formats with fewer channels. With spaced mic surround arrays, the issues of stereo and mono compatibility in folddown mixes can become rather difficult. As the physical separation between mics increases, this situation is usually made worse, as the inherent timing differences when sounds are captured by multiple mics can create phasing when the various mic signals are combined in a folddown mix. A fantastically spacious surround playback can sometimes turn into an horrendously phasey mush in stereo... and the mono usually doesn't bear thinking about!
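The 'phasey' quality of a folddown can be estimated from the spacing between the mics. When the same sound reaches two mics with a path difference and the two signals are summed at equal level, cancellations occur at every frequency for which the delay equals an odd number of half-cycles. The sketch below assumes a speed of sound of 343m/s, equal levels at both mics, and takes 60cm purely as an illustrative path difference. The real figure depends on the direction of the source.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C


def comb_notches_hz(path_difference_m, max_hz=20_000.0):
    """Frequencies cancelled when two equal-level copies of a sound are summed."""
    if path_difference_m <= 0:
        raise ValueError("path difference must be positive")
    delay_s = path_difference_m / SPEED_OF_SOUND_M_S
    notches = []
    k = 0
    while True:
        # Cancellation occurs where the delay is an odd multiple of half a period.
        frequency_hz = (2 * k + 1) / (2.0 * delay_s)
        if frequency_hz > max_hz:
            break
        notches.append(frequency_hz)
        k += 1
    return notches


notches = comb_notches_hz(0.6)
print(f"Delay: {1000.0 * 0.6 / SPEED_OF_SOUND_M_S:.2f}ms")
print("First notches (Hz):", [round(f) for f in notches[:4]])
```

Larger spacings push the first notch lower and pack the notches closer together, which is one way of seeing why the problem usually gets worse as the mics move further apart.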