Distributing Surround Mixes On The Net

By Steve Marshall

Published January 2008

Until now, huge file sizes and the need for dedicated equipment have prevented the distribution of surround mixes over the Internet. That could be about to change...

A properly set-up surround speaker system allows the listener to localise sound from all directions. The idea behind Head-Related Transfer Functions is to mimic the spatial cues that tell us where a sound is coming from.

It's free, it can compress 5.1 surround files down to almost the size of a stereo MP3, the sound quality is excellent and there's a plugin for Winamp. There's also a binaural option for playing 5.1 on stereo headphones, so you don't even need surround speakers. Does all that sound too good to be true?

MP3 Surround could eventually transform the way we listen to music, and it probably will. Yet since its 2005 launch, it has been largely ignored by the music industry and the media — who, quite justifiably, think they've heard this before.

Grounded Surround

The story of surround sound in the home has so far been one of heroic failure. Over the last 40 years, a succession of brilliant innovations have arrived with fanfares — then faded away to public indifference. The Quadraphonic system, launched in 1970, was implemented beautifully: to record four synchronous audio tracks onto a vinyl record is no mean achievement. But there was no agreement between manufacturers on standardising the system; technical difficulties and conflicting formats meant that very few people felt confident enough to buy into it, and so Quad died. But Quad was never going to accurately reproduce a 3D soundfield anyway, due to a basic flaw in the maths: four speakers placed in the corners of a room cannot produce the 'phantom images' that stereo or 5.1 can. At 90 degrees, the angles are too wide. It can give some good effects, but it's not proper surround sound.

Next came Ambisonics, again born in the '70s and still popular among some audiophiles and academics. Ambisonics is genuinely brilliant: it's still arguably the most accurate and versatile way to record and reproduce a 3D soundfield, and can even reproduce height. Endlessly adaptable, the system can incorporate any number of speakers, in almost any positions. (The Soundfield microphone is based on Ambisonics and can record 360-degree surround sound onto four tracks.) Ambisonics' failure has been attributed to its being promoted by the British government-funded National Research Development Corporation, the same people who brought you the hovercraft. Or didn't.

From the '80s onwards came a succession of surround formats that readers may be more familiar with: Dolby Pro Logic, Dolby Digital, DTS and so on. The concept of placing the speakers in positions based on a listening circle was established; subwoofers provided the 'point one', and then innovation was simply a case of adding more and more satellite speakers to reach 5.1, 6.1, 7.1, 10.2...

Designed by the inventor of THX, the 10.2 format is claimed to be 'twice as good as 5.1' and includes two channels for height. The latest and most lavish surround format, though, is 22.2! With two entire levels of height and two subs, it is the companion to Ultra High Definition Video and I don't expect many of us will be buying it just yet.

Surround Today

So who does buy this stuff? Surround sound is regarded by the male-dominated hi-fi world as having an extremely low WAF, or 'wife acceptance factor'. How many people do you actually know who have a 7.1 system in their home? I know just one and yes, he's a bachelor. But surround sound is good value for money. As the number of surround channels has increased over the years, so the sound quality and dynamic range have improved, yet the cost continues to fall. So if surround sound has never been better or cheaper, why isn't it more popular?

It's not only surround sound that's experienced this increase in quality: digital stereo equipment has been going the same route. Super Audio CD and DVD-Audio offer much greater bandwidth and dynamic range than CD, yet today's most popular format turns out to be low-bit-rate MP3! How can this be?

The only answer is that the hi-fi approach actually leaves most people cold. Most people are not stupid; they know that MP3 is not as high in quality as CD but they simply don't care. They don't want or need super-high fidelity; they want technology that fits in their pocket and is cheap, or preferably free. MP3 fits the bill precisely. The high data-compression of MP3 results in smaller files and makes it feasible to send tracks over the Internet quickly and easily, thus creating a new market for music sales. Hi-fi hasn't gone away, but it has become something of a niche market, like surround sound.

MP3 Surround

The public's unwillingness to invest in surround sound has not dented the audio industry's conviction that surround is the future. And now there is finally a chance that the industry could be proven right! For some years, surround capability has been creeping into music recording packages. Logic comes well equipped for up to 7.1 surround, and Steinberg's audio engine for Nuendo and Cubase was 'engineered from the ground up' for surround. PCs have been coming equipped with 5.1 surround cards for a while now — all of this for no particular reason, other than a commonly shared hunch that surround would somehow happen eventually. Enter MP3 Surround.

In this, the audio industry seems finally to have come up with a surround product that there is a demand for, and one that uses existing hardware. Investment in new equipment is optional but not essential: the important part is that the new medium is driven by free software. MP3 Surround was developed mainly by Fraunhofer, inventors of the original MP3 codec, and is currently available as a free evaluation download from their web site (www.iis.fraunhofer.de). The free software is available for PC, Mac or Linux. What's more, MP3 Surround is completely backwards-compatible: surround files will play on any of the previous generation of MP3 players, albeit in stereo only.

There are three main parts to the MP3 Surround system. First is the encoder/decoder or 'codec' (which means 'compress and decompress'). The encoder is the cleverest part of MP3 and is the result of detailed research into psychoacoustics, combined with some very serious number-crunching. By removing elements of the original recording that are inaudible, file sizes can be drastically reduced. In MP3 Surround the compression is astonishingly powerful, resulting in 5.1 surround files that are only about 10 percent bigger than stereo files!

The decoder comes separately and there are two versions: a standalone MP3 Surround player and a plugin for the freeware Winamp that enables streaming. The second part of the system is Ensonido, a binaural simulator that allows the playback of MP3 Surround on stereo headphones alone, using HRTF (Head-Related Transfer Function) technology to simulate the effect of a 5.1 soundfield. The third component of MP3 Surround is MP3 SX or Stereo eXtended. By analysing the ambience of a stereo recording, SX can synthesize a pair of rear channels and create artificial surround sound.

Downloading and installing the MP3 Surround software takes only a few minutes, and already there is some free music to listen to on the download site. Fraunhofer's business partner, the US Thomson company, has a much wider variety of tracks and styles on their site at www.all4MP3.com.

For best results you'll need a PC or Mac with a 5.1 soundcard, running five speakers and a subwoofer. If you don't have a suitable soundcard you can still listen in binaural surround by using Ensonido with headphones, but to appreciate how good the codec is, you should use speakers. The interface for the MP3 Surround player is as simple as it can be: you just drag and drop files onto it and select a playback method from 5.1 surround, Ensonido or stereo. Sound quality is extremely good, particularly when the amount of data compression is taken into account. I expected a thin, grainy sound but heard quite the opposite: the sound is solid and full, but with far more subtlety and detail than I would have thought possible.

MP3 Surround does have weaknesses, but they're really only apparent when directly compared with systems such as DTS or Dolby. MP3 Surround sounds impressive, but follow it with a DTS track and you'll immediately notice the rather deeper bass and brighter top of DTS. Dynamic range is also affected, but again, this is not obvious until systems are compared. It is only to be expected, though: MP3 Surround runs at a constant bitrate of 192 kilobits per second (kbps) but DTS runs at 1411kbps, while Meridian's 'lossless packing' system (MLP) used on DVD-A discs runs at a variable bitrate of between 6000 and 9000 kbps! Take all of that into account and MP3 Surround is all the more amazing.
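To put those bitrates in perspective, here is a rough back-of-envelope sketch in Python. It treats each format as a constant-bitrate stream, which is only strictly true of MP3 Surround and DTS, and the three-and-a-half-minute duration is my own illustrative assumption rather than a figure from Fraunhofer.

```python
def stream_size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Size in megabytes (10**6 bytes) of a stream at a constant bitrate."""
    bits = bitrate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000


DURATION_S = 210  # assumed: a three-and-a-half-minute track

# Bitrates as quoted in the text; MLP is variable, so both ends of the range are shown.
BITRATES_KBPS = [
    ("MP3 Surround", 192),
    ("DTS", 1411),
    ("MLP (low end)", 6000),
    ("MLP (high end)", 9000),
]

for name, kbps in BITRATES_KBPS:
    print(f"{name:>14}: {stream_size_mb(kbps, DURATION_S):7.2f} MB")
```

For a track of that length, the sum comes out at about 5MB for MP3 Surround and about 37MB for DTS, before any container overhead.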

Making Surround MP3s

Unlike other surround codecs, MP3 Surround uses 'Binaural Cue Coding', in which the signal is represented as a single mono sum channel plus some difference data that is used to reconstruct the other channels. The MP3 Surround encoder is very easy to use. Easy, that is, when you've already made a multichannel WAV. Then it's simply a matter of dragging and dropping the WAV onto the MP3 Surround interface and naming the new file. Initially, I objected very strongly to this method, because I couldn't actually make a multichannel WAV and had to get a friend to do it for me! I would have much preferred it if the encoder could just be fed with six mono WAVs, but this is the demo version, and it is free.

My gripe is that multichannel WAV is only supported by the latest version of most music packages, and that users (such as myself) may not want the disruption or expense of upgrading. For simply turning mono WAV files into multichannel WAVs, Fraunhofer technicians recommended CopyAudio, part of the AFsp library, which is freeware. I would regard it more as 'boffinware', as I found it completely unusable, but you might fare better. However, if you have the latest version of Logic or Soundscape, or any of the Steinberg products, making multichannel WAVs should be very simple.

Comparing DTS & MP3 Surround

For the past few years I've been producing my own work in surround sound, using binaural location recordings made with a dummy head (or, more recently, with mics in my ears). I convert the binaural material into DTS 5.1 and make a kind of hi-fi musique concrète with extremely vivid spatial imaging: sounds come from every direction, even from overhead where there are no speakers. I have to sell it on disc because although the quality is superb, the file sizes are very big.

In order to put binaural samples on the 'net, I've already converted binaural material to stereo MP3, and with great success, even at low bit-rates. MP3 Surround will only code at a constant bit rate of 192kbps, which is considered to be hi-fi by most MP3 fans (MP3 can run as high as 320kbps, but rarely does). Fraunhofer's published listening tests indicate that most people can barely discern a difference in quality between their compressed version and the original audio, so I wanted to hear for myself, and directly compare MP3 Surround with DTS, using my own material.

On my Bilocation web site is a short demo mix in DTS 5.1 that can be downloaded and burned to an audio CD for playing on a 'home cinema' surround system. To me, the DTS compressed version sounds exactly the same as the six original WAVs, but it's a big file: at three and a half minutes in length, the DTS file is 36MB. I set about making a surround MP3 of the same mix so I could compare the two.

The first step was to go back to the original six WAV files — one for each channel of the 5.1 surround mix and all of exactly the same length.

The next step was to interleave the six WAVs by converting them into a multichannel WAV. This must be done in the right order: Left Front, Right Front, Centre, LFE, Left Surround, Right Surround. This produced a big uncompressed file of 107MB (all my work is done at 44.1kHz but MP3 Surround will also accept 48kHz). The final step was to drag the multichannel WAV onto the MP3 Surround window and marvel as the file was compressed in seconds, down to an amazing 4.86MB! And that's all there is to it.
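For anyone stuck at the interleaving stage as I was, the job can be sketched in a few lines of Python using only the standard `wave` module. This is a minimal sketch, not the tool Fraunhofer recommend: the file names are hypothetical, it assumes six equal-length mono PCM files, and it writes a plain PCM header rather than the 'extensible' WAV header with a speaker mask, so whether a given encoder accepts the result is an assumption you would need to test.

```python
import wave

# Channel order required by the encoder, as given in the text.
# The file names themselves are hypothetical.
CHANNEL_FILES = [
    "front_left.wav", "front_right.wav", "centre.wav",
    "lfe.wav", "surround_left.wav", "surround_right.wav",
]


def interleave_mono_wavs(mono_paths, out_path):
    """Interleave equal-length mono PCM WAVs into one multichannel WAV."""
    readers = [wave.open(path, "rb") for path in mono_paths]
    try:
        first = readers[0]
        width = first.getsampwidth()
        rate = first.getframerate()
        frames = first.getnframes()
        for r in readers:
            if r.getnchannels() != 1:
                raise ValueError("every input must be mono")
            if (r.getsampwidth(), r.getframerate(), r.getnframes()) != (width, rate, frames):
                raise ValueError("inputs must match in bit depth, sample rate and length")
        data = [r.readframes(frames) for r in readers]
    finally:
        for r in readers:
            r.close()

    n_channels = len(data)
    frame_bytes = n_channels * width
    out = bytearray(frames * frame_bytes)
    # Copy byte b of every sample of channel ch into its slot within each frame.
    for ch, chunk in enumerate(data):
        for b in range(width):
            out[ch * width + b::frame_bytes] = chunk[b::width]

    with wave.open(out_path, "wb") as writer:
        writer.setnchannels(n_channels)
        writer.setsampwidth(width)
        writer.setframerate(rate)
        writer.writeframes(bytes(out))
```

Calling `interleave_mono_wavs(CHANNEL_FILES, "mix_5_1.wav")` would then produce the multichannel file, with the channels in the order listed.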

Comparing the two formats was now simply a matter of switching between the DTS version playing from disc and the MP3 Surround version playing from the PC's hard drive. The Bilocation mix contains a lot of spatial movement, all of it 'real' rather than panned. MP3 Surround reproduced this quite well, with all the imaging pretty much as it is in the DTS version. There were differences, though, mostly with the 'above' effects. There was still a sense of 'above', but it was a bit vague, and not always in the right place. I assume this was due to the MP3 process removing too many 'unnecessary' frequencies. (I do realise this test is a bit unfair, as 5.1 isn't even supposed to have height!) One part of the track was recorded in a huge dome in India, where children clap and shout into a natural repeat echo. Although this still sounded good in MP3 Surround, it was here that the limitations showed up. The echoes that faded to silence were OK, but at one point there is a max-level (0dB) handclap, right next to the mics, that in the DTS version opens up the acoustic and explodes into ambience. The MP3 Surround version of this is disappointing and sounds very obviously 'compressed'. But really, that is my only criticism! Also, my material is hardly typical. If you'd like to try this experiment for yourself, both of the files are still available at www.bilocation.co.uk.

Look, No Speakers

One very exciting addition to the MP3 Surround system is Ensonido. Based on HRTF technology, it can simulate the effect of a 5.1 speaker system binaurally, for headphones. Head-Related Transfer Functions have been around for a long time (at least 30 years), but thus far they have been used mainly as an acoustics research tool. They are algorithms that mimic the contribution of the pinnae (the shapes and folds of the human outer ear) to our hearing. As sounds approach us from different directions, they are 'coloured' by the pinnae, in a way that the brain can decode as spatial cues. This is the principle behind binaural recording: the HRTF information doesn't have to be artificial, it can be obtained simply by stuffing a small pair of mics in your ears!

Laboratory HRTFs are made using a special dummy head with microphones inside it. The head is placed in an anechoic chamber to eliminate room ambience, and test tones or broadband noise are played in many different positions all around the head. The resulting set of recordings can then be analysed and subjected to Fast Fourier Transforms to extract the spatial information. This process is not cheap, quick or easy; until quite recently it was almost impossible to obtain HRTFs, and none were in the public domain until 2001.
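Once a measurement exists, applying it is conceptually simple: in the time domain an HRTF becomes a short impulse response for each ear, and a dry mono sound is convolved with the pair to place it at the measured position. The Python sketch below shows only that general principle. The two impulse responses are made-up toys (a bare delay and level drop at the far ear), not measured data, and nothing here is claimed to be how Ensonido works internally.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sequences of floats."""
    out = [0.0] * max(len(signal) + len(impulse_response) - 1, 0)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source with a left-ear and a right-ear impulse response."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy responses for a source on the listener's left (hypothetical values):
# the left ear hears it at once, the right ear three samples later and quieter.
TOY_HRIR_LEFT = [1.0, 0.0, 0.0, 0.0]
TOY_HRIR_RIGHT = [0.0, 0.0, 0.0, 0.5]

left_ear, right_ear = render_binaural([1.0, 2.0], TOY_HRIR_LEFT, TOY_HRIR_RIGHT)
```

A real measured response would also carry the spectral colouring of the pinnae, which is precisely the part a toy like this leaves out.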

Ensonido comes ready-fitted to the MP3 Surround player and the Winamp plugin. There are four HRTF sets to choose from, and users are advised to experiment in finding which of the four best suits their ears. In my case, none of the four options gave very good results. I don't think my ears are particularly unusual, but although I could detect a difference in timbre between the four, I didn't really feel that I was hearing surround sound with any of them. Although the sides were good, I felt the front and back imaging was very shallow, only extending out by a foot at the most. I've heard far better imaging with old-fashioned stereo binaural, though friends have reported better results and I don't know why that should be.

I asked Fraunhofer how the four HRTF options differ: are they differentiated by size, sex or ear shape? It appears that the modes derive from composite HRTF measurements taken from dummy and real heads, combined with different room acoustics, and were "selected in listening tests to provide a good combination of localisation and spectral neutrality".

And what type of headphones are meant to be used? Open, closed or earbuds? This is always a problem area for surround sound listening. Closed headphones can physically distort the pinnae. Open headphones are preferable, but the pinnae will colour the sound to some extent. Earbuds, on the other hand, bypass the pinnae altogether, but tend to be rather compromised in sound quality due to their size. And although it's rarely mentioned, virtually all headphones, even the most expensive hi-fi ones, have a notch at 5kHz as part of their design. Fraunhofer had the difficult task of optimising Ensonido for all these factors, but they say that Ensonido works best for open headphones and earbuds with as flat a response as possible.

The evaluation versions of the MP3 Surround encoder and decoder are simple to use.

How Does MP3 Surround Work?

The MP3 format actually dates back to 1992, and has been in common use for over 10 years. For better or worse, it's fair to say that the MP3 format has transformed the music industry in that time. The format has proved to be extremely versatile, supporting all common sampling rates from 16 to 48 kHz and bit rates up to 320kbps. Because MP3 is so well established, it made sense to develop a new format for surround sound that would be completely compatible with existing stereo MP3, as well as other surround systems and playback equipment. This was made possible by the technique of 'Binaural Cue Coding' (BCC). In essence, this is not very different from the Middle + Sides (M+S) technique used historically for recording stereo, but it is a radical departure from previous surround codecs. BCC combines all the separate audio channels into one 'sum' channel, which is augmented with spatial 'side information' — in other words, instructions on how to separate them again. File sizes are drastically reduced, because the spatial side information for each channel is relatively small.
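The 'sum plus side information' idea can be illustrated with a deliberately crude Python toy. It is not Fraunhofer's algorithm: as I understand it, a real BCC coder works in separate frequency bands and carries more than level information, whereas this sketch uses one broadband level cue per channel per frame. The function names and frame size are my own inventions for illustration.

```python
import math


def bcc_encode(channels, frame_size):
    """Return (sum_signal, gains): a mono sum plus one level cue
    per channel per frame. Expects non-empty, equal-length channels."""
    length = len(channels[0])
    total = [sum(ch[n] for ch in channels) for n in range(length)]
    gains = []
    for start in range(0, length, frame_size):
        end = min(start + frame_size, length)
        sum_energy = sum(x * x for x in total[start:end])
        frame_gains = []
        for ch in channels:
            ch_energy = sum(x * x for x in ch[start:end])
            # Level cue: how loud this channel is relative to the sum.
            gain = math.sqrt(ch_energy / sum_energy) if sum_energy > 0 else 0.0
            frame_gains.append(gain)
        gains.append(frame_gains)
    return total, gains


def bcc_decode(total, gains, frame_size):
    """Rebuild each channel by scaling the sum signal with its level cues."""
    n_channels = len(gains[0])
    out = [[0.0] * len(total) for _ in range(n_channels)]
    for f, frame_gains in enumerate(gains):
        start = f * frame_size
        end = min(start + frame_size, len(total))
        for c, gain in enumerate(frame_gains):
            for n in range(start, end):
                out[c][n] = gain * total[n]
    return out
```

The point to notice is the shape of the data: however many channels go in, only one audio signal comes out, accompanied by a handful of numbers per frame. Reconstruction is only approximate in general, since every rebuilt channel is a scaled copy of the same sum.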

A fairly wide range of MP3 Surround recordings can be downloaded from Thomson Media's web site.

Downmixing is central to the BCC technique and is what makes it compatible with existing stereo MP3. The fact that the 'sum' channel has been separated from the spatial side information means that it can be reassembled with any number of channels. It can be treated as a mono downmix to be played on its own, and a stereo mix can be easily constructed. This can be done automatically, but there is also an option in the system for a manual or 'artist' mix instead, though this is not yet available in the free evaluation software.

Although it currently only works with 5.1, the MP3 Surround format should be able to cope with any number of channels. If you started with 22.2, it should be possible to construct any format between that and mono, which makes the codec ideal for radio and video use. MP3 Surround is already in use by the latest generation of Internet video sites offering HD and surround sound. The new DivX video codec uses MP3 Surround, and the French commercial station Radio Classique has been using it for streaming surround sound from the Internet since 2004.

Is This The Future?

For many years, musicians and producers have been searching for a way to sell surround mixes over the Internet, but until now, file sizes have just been far too big to make this practical. MP3 Surround thus has an awful lot going for it. File sizes are only 10 percent bigger than stereo, sound quality is excellent, new hardware is not essential, the system is expandable and it is backwards-compatible with existing stereo MP3. Many audio manufacturers have already taken out licences, including Philips, Sony, Samsung, Yahoo and Yamaha. Funai launched the world's first MP3 Surround DVD player this year and more hardware products are on their way. Steinberg have worked closely with Fraunhofer for some years, and Cubase 4 and the upcoming Nuendo 4 offer full support for MP3 Surround. The codec has already been accepted by the video industry and is in use by several Internet radio stations, while gaming applications are on their way.

There's no word yet from the old-school record industry, whom I think we can assume know nothing about MP3 Surround and probably don't want to. Who cares? We are the record industry! If anyone with a home studio and a broadband connection really wants to, they can sell surround tracks right now.

Ensonido is great in principle, as it solves the problem of all those speakers. It's a great shame that it doesn't actually work, for me at least! It may work for many other people; I don't know yet. But this is software and it can be improved. My feeling is that surround sound for headphones could be hugely successful when third-party developers start to offer customised or modified HRTF sets for Ensonido.

And the quality issue? The audiophile in me doesn't honestly approve of MP3, but I want to sell surround recordings as Internet downloads for a mass market. Only MP3 Surround makes this a practical proposition, so I accept the slight loss in quality. And if anyone else shares my love of hi-fi surround sound, they can always buy the same tracks on disc: they'll cost a lot more, they'll have to be posted, but they'll be actual 'objects of desire', with nice packaging that you can store on a shelf, in a collection. I can envisage a market where many different formats and resolutions are offered, some physical, as well as different-sounding mixes of the same track. Why not? The 'long tail' economics of the Internet suggest that there may be more profit in low-volume diversity than in a handful of blockbusters. If the medium is effectively free, as with MP3, then diversity is no problem. Stereo MP3 was not an overnight phenomenon: it actually took six years to take off, even though a vast body of stereo material already existed. So it will take time for a new format to grow, but let's get busy mixing in surround. It does seem to have a future at last. And really, for musicians and producers struggling to make a living in a chaotic market, this is a gift!

Transfer Your Own Head

Creating HRTFs is a complex and expensive business. Here at Salford University, a dummy head is placed in an anechoic chamber and test tones are recorded from strategically placed speakers.

A great deal of research is going into Head Related Transfer Functions at the moment, largely because of the mobile phone industry. The advent of stereo Bluetooth means that some people will soon be wearing stereo headphones most of the time! Personal gadgets are all rapidly merging into all-purpose 'devices' that combine phones, music and video players, Internet browsers, games consoles and so on. With HRTF technology comes the capacity for adding surround sound and 3D 'mobile environments'. The idea is that you can wear a stereo headset and take phone calls, listen to music, chill out in a 3D tropical rainforest, that kind of thing. Some systems will also incorporate noise-cancelling to remove unwanted ambient sound.

All of this is possible even without the use of headphones. Spatial sound environments can actually be projected into the air, simply from a pair of microspeakers an inch apart, mounted in one end of a phone. The old but effective 'transaural' technique has been enhanced by incorporating HRTF data: a combination of phase cancellation and time delays is used to eliminate crosstalk between two speakers, and to simulate the effect of wearing headphones. Once each of our ears is receiving only the signal that is meant for it (and none of the other channel), we're into binaural territory. HRTF coding can then be used to make the sound appear to come from much further away, and even from bigger, virtual speakers.
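The crosstalk-cancelling trick can be sketched with a very simplified model in Python. I'm assuming here that the leak from each speaker to the opposite ear is nothing more than a delayed, quieter copy of that speaker's signal; a real head also filters the sound, and the delay and attenuation values are hypothetical parameters, not figures from any product.

```python
def cancel_crosstalk(in_left, in_right, delay, attenuation):
    """Pre-distort two ear signals so the modelled crosstalk cancels at the ears."""
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    n = len(in_left)
    spk_left, spk_right = [0.0] * n, [0.0] * n
    for i in range(n):
        past = i - delay
        # What is about to leak across from the opposite speaker.
        leak_into_left = attenuation * spk_right[past] if past >= 0 else 0.0
        leak_into_right = attenuation * spk_left[past] if past >= 0 else 0.0
        # Subtract the leak in advance; the correction leaks too, so the loop recurses.
        spk_left[i] = in_left[i] - leak_into_left
        spk_right[i] = in_right[i] - leak_into_right
    return spk_left, spk_right


def signals_at_ears(spk_left, spk_right, delay, attenuation):
    """Model each ear as its own speaker plus a delayed, quieter copy of the other."""
    n = len(spk_left)
    ear_left = [
        spk_left[i] + (attenuation * spk_right[i - delay] if i >= delay else 0.0)
        for i in range(n)
    ]
    ear_right = [
        spk_right[i] + (attenuation * spk_left[i - delay] if i >= delay else 0.0)
        for i in range(n)
    ]
    return ear_left, ear_right
```

Within this toy model, feeding the output of `cancel_crosstalk` into `signals_at_ears` with the same delay and attenuation gives back the original left and right signals, which is exactly the 'headphones without headphones' condition described above.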

Amazingly, spatial sound is even possible from only one loudspeaker! A 'dipole' transmits sound from both the front and the back surfaces of a driver, radiating in a figure-of-eight pattern (the two outputs will naturally be out of phase with each other). It's then possible to process the resulting output using HRTF algorithms, simulating the effect of several speakers placed around a room.

It seems likely that some form of HRTF-based transaural technology will eventually become the most convenient way of listening to surround sound at home. All those messy satellite speakers, stands and cables could be eliminated by only using one or two speakers inside a TV to simulate a 3D surround system. With HRTF processing the effect can be spectacular: a convincingly solid soundfield is projected out into the room, which now appears to be full of correctly positioned surround speakers. Such systems have been tried in the past without much success, but as the technology continues to get better and cheaper, we can expect to see affordable systems that actually work.

But if all these techniques depend on Head Related Transfer Functions, don't we have a potential problem? Our ears are all different. Ears come in many shapes and sizes, and, like fingerprints, none are exactly the same. One approach is to search for the universal HRTF that suits most people, and there are already many patented variations on this theme.

Alternatively, technology is available that can tailor a unique HRTF to each individual. Either by a series of simple listening tests, or by inspecting a photo of your ear, it is already possible to generate custom HRTFs, and we can expect to see some radical developments in this area in the near future.

It will eventually be possible to modify Ensonido with custom HRTFs, although this has not been implemented in the evaluation version. Fraunhofer do stress though, that Ensonido is not a pure HRTF processor, and that: "It combines the acoustic reception of a human head, room acoustics measurements and simulations and equalisation. The individual HRTF measurement is only one element in the design of Ensonido."

Making HRTFs

I visited the acoustics department of Salford University to see a typical setup for measuring Head Related Transfer Functions. It's not cheap or easy to do! An £8000 Bruel & Kjaer dummy head, fitted with measuring mics, sits in the centre of an anechoic room that is fully floating and acoustically isolated from the rest of the building. The foam wedges on the walls have to be as long as possible, to absorb the lower frequencies (they absorb at quarter wavelength). Even the floor is absorbent and has thick wire mesh suspended above it to walk on. This room is totally anechoic down to 100Hz and cost almost a million pounds to build!
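The quarter-wavelength rule explains why those wedges have to be so long. As a rough worked example in Python, assuming a speed of sound of 343 metres per second (my own round figure for room-temperature air, not a number supplied by Salford):

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C


def quarter_wavelength_m(frequency_hz: float) -> float:
    """Quarter of the wavelength, in metres, of a tone at the given frequency."""
    wavelength = SPEED_OF_SOUND_M_S / frequency_hz
    return wavelength / 4.0


print(f"Quarter wavelength at 100 Hz: {quarter_wavelength_m(100):.2f} m")
```

On that assumption, a room that is anechoic down to 100Hz implies wedges somewhere in the region of 86 centimetres deep.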

Point One

The 'point one' of surround systems is the LFE channel, so-called because of its very limited bandwidth compared to the other channels. LFE actually means 'low frequency effects' and it is supposed to be used for occasional film sound effects, such as explosions, not kick drums. When I had originally made my Bilocation demo mix I didn't have an LFE track, as the Minnetonka software that I'd used for DTS coding doesn't require one. The MP3 Surround encoder expects an LFE track whether it's used or not, so I had to record a WAV of silence that matched the other five in length and sample rate.
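Rather than recording silence, a matching dummy LFE file could be generated. The following is a minimal Python sketch using the standard `wave` module; the file names in the usage note are hypothetical, and it assumes the reference file is ordinary uncompressed PCM.

```python
import wave


def write_silent_match(reference_path, out_path):
    """Write a mono WAV of silence with the same bit depth, sample rate
    and length as the reference file."""
    with wave.open(reference_path, "rb") as ref:
        width = ref.getsampwidth()
        rate = ref.getframerate()
        frames = ref.getnframes()

    # 8-bit PCM WAV is unsigned, so silence is 0x80; wider samples are signed, so it is 0.
    silent_byte = b"\x80" if width == 1 else b"\x00"

    with wave.open(out_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(width)
        out.setframerate(rate)
        out.writeframes(silent_byte * (frames * width))
```

Something like `write_silent_match("front_left.wav", "lfe.wav")` would then give a sixth file that lines up with the other five.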

MP3 Stereo eXtended

When the world gets used to MP3 Surround, ordinary stereo will seem far too flat and boring for us to listen to, right? I guess that's the reasoning behind MP3 SX. Simple to use, it can create new surround mixes from stereo: you just drag MP3 files onto the interface. Allegedly "the ambience of the original track is analysed and two rear channels are created to match". I still have lots of old mono records and have never felt the need to convert them to artificial stereo, but I tried to get enthusiastic about MP3 SX anyway. The word 'ambience' is very tantalising here, but misleading. I imagined that dry sounds would remain unchanged in the stereo mix, and that reverbs would become three-dimensional: quite an exciting prospect. So I made some MP3s to test it with.

David Bowie's 'Heroes' is well known for its innovative vocal treatment. The vocal starts off quite dry, but when Bowie starts to belt it out after about three minutes, the room acoustics open up and he seems to be louder. It's a great effect in stereo, but converting the track to MP3 SX had the exact opposite effect to what I'd imagined. The whole track was generally 'smaller' and there was a very unpleasant effect like a very slow, pumping compression where the overall level would suddenly drop for no reason. Tony Visconti would not be happy.

I tried some other tracks. Medeski Martin & Wood are a Hammond organ trio with a trademark sound that is rough and sounds very 'roomy'. Here again, the ambience was reduced by MP3 SX. Only cymbals seemed to be expanded into the rear channels, and they seemed to be coming through a cheap '80s chorus pedal. Brian Eno's 'Shadow' from the On Land album was the most successful track I tried, but only because it features a sound like tropical insects that happens to be at the right frequency to get copied into the rear channels. Tracks by Joe Meek and Abba, featuring spring and plate reverb, were very disappointing: hardly anything came out of the rear channels except a bit of hi-hat.

I give it two out of 10... See me later.