CUTTING EDGE

Dolby's DP571 Dolby E encoder and DP572 decoder.

As a musician, you may well have heard of Dolby A, B and C — but Dolby E? Dave Shapton introduces the format and speculates about its possible implications for music recording.

One of the things I try to do with Cutting Edge is report on developments outside what you'd consider to be the music technology industry. This is because music technology is more 'consumer' than 'industrial' (I don't mean that in any derogatory sense) and we often don't see new developments in the shops until they can be produced cheaply enough for a mass market. The same is definitely not true of the television broadcast industry, where price tags are often mistaken for serial numbers.

I was talking to a Product Manager from JVC Professional the other day, who told me about a development that will enable them to put eight channels of full‑bandwidth audio onto their video tape decks' PCM stereo audio channels. It sounded interesting, so I pressed him for details.

The technique, it turns out, is called Dolby E. Well, like me, you've probably heard of Dolby — and 'E', for that matter. But I'd certainly never heard of Dolby E, most likely because it is not a consumer format. It's actually only intended for professional use, especially in editing suites and video post‑production facilities. What it does, quite simply, is compress a 5.1 mix (see 'Surround In Brief' box), plus an additional stereo pair, so that they can be conveyed on a conventional AES/EBU signal path.

Adding multi‑channel sound to a facility is an expensive process, but the cost can be minimised by using existing infrastructure as far as possible. This is where Dolby E comes in. Because it can compress a 5.1 surround mix, plus a conventional stereo channel, into a single AES bitstream, every existing stereo connection can effectively carry eight channels. So there's no need to modify patchbays, switches, and — most importantly — tape decks.

Another clever thing about Dolby E is that the compression 'packages' the data so that the packets start and end in the same place as the frames of video they inevitably accompany. So not only can you edit the compressed audio with the video, it ties in nicely with the way most video tape decks interlace audio with the video scan lines. If it didn't work like this there would be a possibility of clicks and pops every time there was a video edit, as a data frame was sliced into two meaningless pieces.

You Don't Do Heavy Metal In Dobly!

Everything you always wanted to know about Dolby Digital, surround, and lots more, at www.dolby.com.

Interesting though this stuff is, why am I dwelling for so long on a process that is intended for delivery rather than multitrack recording? Dolby E is a good, relatively high‑bit‑rate compression format that is claimed to be able to withstand 20 generations of compression and re‑compression, but you wouldn't want to use it, or any other kind of lossy compression scheme, for your multitrack masters.

So let's think for a while about what we, on the musical side of things, could do with a multitrack delivery format.

We could deliver proper discrete surround sound — and indeed we can already, with the new DVD Audio 'Standards'. But I think we can potentially go beyond that. The simple fact is that before long, storage will be so cheap that we could probably release records with as many constituent tracks as we like: 24 or even 48. New variants of DVD‑type discs with tens of layers exist in the lab, and the only reason they're not in production today is probably that there would be no obvious use for all that storage as an entertainment delivery medium. There wouldn't be much point in putting a literal clone of the original multitrack tape in the shops, because to do so would be to miss out one of the most essential parts of the production process: the mix. But there is a way round this. We need to look again at Dolby E.

Meta Matters

In addition to providing eight channels of audio compressed to the space occupied by two, Dolby E includes a mechanism for embedding 'metadata'. Metadata's a big and important topic and I want to deal with it more thoroughly in future articles, but for our purposes here, think of it as 'data about data'. The best way to understand this is to think of a theoretical digital loudspeaker: one where the only connection between it and the music source is, let's say, an S/PDIF cable. Sounds like a good idea, doesn't it? You'd be sending what was essentially a digital clone of the original material directly to the loudspeakers. Great. But how do you control the volume?

An S/PDIF digital audio signal contains only information about the waveform of the audio represented by the data. Feed it into a D‑A converter and the assumption will be that you want it reproduced at maximum level. The signal will have been optimised to make the best use of the 16 bits available (in the case of a CD). The trouble is that there is nowhere in the signal path to control the level. Essentially, there is no equivalent of an amplifier with a big knob marked 'Volume'.

You could emulate an amplifier by doing DSP (Digital Signal Processing) on the digital loudspeaker feed, to reduce the volume, but the resultant digital signal would have a much lower resolution than the optimum 16 bits. In fact, a very quiet setting could result in only three or four bits being used. You could only use this method of control on audio which had been digitised to a meaningful 32 bits or so — but this would be an inappropriate use of the technology, at best.
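To put rough numbers on that, here is a sketch (in Python, purely for illustration; the figures are worked examples, not Dolby's arithmetic) of how a purely digital volume cut eats into a 16‑bit signal's resolution:

```python
# How many bits of a 16-bit signal survive a purely digital volume cut?

FULL_SCALE = 2 ** 15 - 1  # 32767: peak magnitude of a 16-bit audio sample

def effective_bits(gain):
    """Bits of resolution left after scaling a full-scale signal by `gain`."""
    peak = int(FULL_SCALE * gain)    # largest sample value after attenuation
    return max(peak.bit_length(), 1)

print(effective_bits(1.0))       # 15 magnitude bits (plus sign) at full volume
print(effective_bits(1 / 4096))  # roughly -72dB: only 3 bits left for the music
```

At a quiet late‑night listening level, in other words, the music is being carried by only three or four bits, which is exactly the problem the metadata approach below avoids.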

The sensible way is to use metadata, sent with the digital bitstream to tell the receiving device — the digital loudspeaker — how loud to play the audio. The metadata, in this example, is created by the familiar 'Volume' knob on the 'digital amplifier'. As you turn the knob, the metadata is multiplexed with the digital audio bitstream and decoded by the loudspeaker, which tells the (analogue!) voltage‑controlled amplifier in the loudspeaker how loud it should be. Note that the digital audio is still sent and decoded at its optimum resolution. It's not until it reaches the analogue domain that the digital metadata is used to control the loudspeaker volume.
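As a sketch of that idea (the frame layout and field names here are invented for illustration, and are not Dolby E's actual format), the audio samples pass through the digital path untouched, and the volume metadata only becomes a gain at the analogue stage:

```python
# Illustrative sketch: full-resolution audio multiplexed with volume metadata.
# The samples are never rescaled digitally; the metadata drives the VCA.

def make_frame(samples, volume_db):
    """Multiplex a block of full-resolution samples with volume metadata."""
    return {"audio": samples, "meta": {"volume_db": volume_db}}

def digital_loudspeaker(frame):
    """Decode: samples pass through untouched; metadata sets the VCA gain."""
    vca_gain = 10 ** (frame["meta"]["volume_db"] / 20)  # dB -> linear
    return frame["audio"], vca_gain  # gain is applied in the analogue domain

audio, gain = digital_loudspeaker(make_frame([12000, -8000, 32767], volume_db=-20.0))
print(audio)            # [12000, -8000, 32767] -- still full 16-bit resolution
print(round(gain, 3))   # 0.1 -- the -20dB setting, applied only after the D-A
```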

Now, if we can do this with volume, why not do it with every other aspect of a mix? Not only levels of the individual tracks, but all the other settings as well: EQ, dynamics, reverb and other effects. Dolby E actually has the ability to send metadata to the end user's equipment to control the average level, dynamic range, and so on. But imagine a consumer format where the raw multitrack recordings (at their full digital resolution) are packaged with all the metadata needed to recreate the original mix — and all variations of it as well. Not only would it sound better, but you could give a certain amount of power to the user to create their own mixes. At a more mundane level, you could 'optimise' the mix to take into account room acoustics and other environmental factors.
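A toy sketch of that 'multitrack plus metadata' idea (the track names and metadata fields are invented for illustration): the release carries the raw tracks at full resolution, plus the producer's settings as metadata, from which the player rebuilds the stereo mix — and the listener is free to tweak the metadata:

```python
# Illustrative only: rebuild a stereo mix from raw tracks + per-track metadata.

def mix_down(tracks, metadata):
    """Sum raw tracks into stereo using per-track gain/pan metadata."""
    n = len(next(iter(tracks.values())))
    left, right = [0.0] * n, [0.0] * n
    for name, samples in tracks.items():
        gain = 10 ** (metadata[name]["gain_db"] / 20)  # dB -> linear
        pan = metadata[name]["pan"]                    # -1 hard left, +1 hard right
        for i, s in enumerate(samples):
            left[i] += s * gain * (1 - pan) / 2
            right[i] += s * gain * (1 + pan) / 2
    return left, right

# The disc ships the raw tracks...
tracks = {"vocals": [0.5, 0.6], "guitar": [0.2, 0.1]}
# ...and the producer's mix as metadata, which the listener could adjust.
producer_mix = {
    "vocals": {"gain_db": 0.0, "pan": 0.0},
    "guitar": {"gain_db": -6.0, "pan": -0.5},
}
left, right = mix_down(tracks, producer_mix)
```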

General MIDI For Audio?

It's not that simple, though. I've missed out the rather important question of where the equipment comes from to recreate a complex mix in someone's living room. Where are they going to put all those racks of outboard gear?

Well, we're getting a bit ahead of the current state of the art here, but we are not that far away from being able to reproduce an entire recording studio's worth of gear in a desktop computer. And what you can do with a desktop computer today you could do with a processor‑packed DVD player tomorrow. Fine: so we'll have the horsepower to do the processing — but how do we make sure it sounds the same as the producer intended?

I can think of two ways. First, you could have an arrangement similar to General MIDI, where metadata calls up a 'standard' effect: a bit like the way you know that any GM keyboard always has a distorted electric guitar on the same patch number. As you know, though, these patches, despite conforming to the general sound category, can sound pretty different from each other. Inevitable, really, when the person responsible for making the sample only had a written description of the sound to go on.

A better way would be to standardise the DSP algorithm used to create the effect. You'd be restricted in the studio to using exactly the same effects as the user, but at least you could guarantee that a 'multitrack + metadata' mix created in the user's home would sound the same as in the studio.

Absolutely the best way to do it would be to supply the DSP effects themselves as metadata. You'd need to have a standardised processor (or an abstraction of it — like a Java virtual machine) in the user's disc player or hi‑fi, that could run any process or effect asked of it, but if you did it like this you'd have the ultimate flexibility both in the studio and in the home, to either listen to the original mix as the producer intended, or to mess around with it at home as if you were in the studio yourself.

If you find all this stuff about mixes and metadata a bit far‑fetched, the surprising news is that we are doing something like this already. It's what happens every time we save a project in Cubase, Logic, or Reason.

It's Behind You! Surround In Brief

Let's look at what we mean by the term 5.1. This is something we see more and more of these days, but not everyone knows what it means. It's quite simple.

In the days before stereo, when we only had one ear, in the middle of our foreheads, there was mono. At least, that's what it sounded like, on a good day. Curiously, if you play mono through a stereo speaker system, and get one of the channels out of phase, it's not so easy to locate the sound, and you lose certain frequencies, particularly bass ones, as the equal and opposite sounds from the loudspeakers cancel each other out. It's possible to use effects of phase in a beneficial way as well.

Stereo film soundtracks brought with them some positional information, and also opened the way for Dolby Stereo (what your television set might refer to as 'Dolby Pro Logic', which is essentially the domestic version).

Dolby Stereo uses phase encoding to 'bend' the positional information in the stereo track so that, in addition to left and right, you have 'rear' and 'centre' information. These are not discrete tracks, and they overlap significantly, but the technique does succeed in wrapping the sound‑stage round the listener to a useful degree. (The 'centre' track, by the way, is used to anchor the on‑screen dialogue to the middle of the screen. Without this, to anyone sitting to either side of the screen, the dialogue would appear to be off‑screen as well.) Curiously, although most Pro Logic setups have two rear speakers, they both carry the same — ie. mono — signal.
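The matrixing behind this can be sketched as follows — a simplified version, ignoring the 90‑degree phase shifts a real Dolby Stereo encoder applies to the surround channel. Notice how a centre‑only input comes back strongly on the decoded centre, but also leaks into left and right: the channels overlap rather than being discrete.

```python
import math

# Simplified 4:2:4 matrix (illustrative; real encoders phase-shift the surround).
K = 1 / math.sqrt(2)  # -3dB, the usual matrix coefficient

def encode(l, c, r, s):
    """Fold left, centre, right and surround into a stereo pair (Lt/Rt)."""
    lt = l + K * c - K * s
    rt = r + K * c + K * s
    return lt, rt

def decode(lt, rt):
    """Passive decode: sum recovers centre, difference recovers surround."""
    return lt, rt, lt + rt, rt - lt  # L', R', C', S'

# Centre-only input: recovered on the centre, but it leaks into L and R too.
lt, rt = encode(0.0, 1.0, 0.0, 0.0)
print(decode(lt, rt))
```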

So Dolby Stereo is a useful advance over stereo for surround reproduction, and it's about as good as it gets when we watch analogue or digital TV in this country. The main limiting characteristic of Dolby Stereo is that it uses phase encoding to multiplex left, centre, right and rear channels into a stereo pair, and doesn't give you complete separation between the channels.

Why do you necessarily need complete separation? Well, you don't, in absolute terms, but if you're going to do your encoding and multiplexing digitally you might as well keep the channels fully discrete, because you can. In any case, complete separation will tend to give you better positional accuracy, and fewer 'phantom' positional anomalies caused by environmental factors such as the room acoustic or the nature of the loudspeakers.

Dolby Digital encodes five full‑bandwidth audio channels: front left, front centre, front right, rear left and rear right, plus one with a limited frequency response for the sub‑woofer channel. Most film soundtracks have a version that is mixed for Dolby Digital, and for some time high‑end mixing consoles have offered 5.1 mix busses to allow for this.