Binaural recordings can sound amazing on headphones, but don't work very well on conventional stereo speakers. Can they be adapted more successfully for surround systems?
When 5.1 surround sound started to become a viable consumer format, I got very excited about the possibilities. I expected lots of amazing albums to appear that really explored this new medium. Instead, look what's happened: surround remixes of Led Zeppelin, Fleetwood Mac, the Beatles... Just as when CD first appeared, the record companies are concentrating on their back catalogues. I really wanted to hear something that wasn't just a remix of a standard music album, so I set about making the surround album that I wanted to hear. I wanted sound that came from different directions around me, but I also wanted to hear sounds coming from above. This was a feature of the Ambisonics system invented 30 years ago, and I eventually found that it's possible in 5.1 surround — without hanging speakers from the ceiling! I also discovered that it's possible to make recordings with only two microphones and convert them into 3D surround sound, and I used both techniques in the creation of my album Bilocation.
Bilocation is an album in 5.1 surround. If anything, it's 'about' the magic of acoustic spaces. It's made up of many different recordings that I made over a long period of time. Some of the recordings are musical, and some are records of places I've been. I'm very fond of the stereo test records of the '50s and '60s, with their steam trains and fireworks, and wanted my album to have more than a hint of that. When I bought my first portable DAT recorder over 10 years ago, I set about recording everything around me. I lived in East London at the time and intended to eventually move, so I recorded the traffic, the trains, the helicopters. I wanted to eventually listen to them in a quiet place in the country. Then when I was commissioned to compose music for Channel 4's Wild India I took a trip to India with my DAT machine, to research Indian music. I made an epic five-week trip around India by train and came back with hundreds of hours of recording. Some recordings were of music, but most were of the amazing sounds of everyday life in India — parrots in the street, chanting inside the temples, the noisiest traffic I've ever heard. I also recorded lots of Indian radio, which can be very bizarre!
As soon as I got back from India I went to the huge annual fair in Hull and made lots more recordings — some of them actually on the rides. Eventually I did move to the country, and discovered a Saxon church with an amazing acoustic. The church is in the middle of a field and is hardly used, so it's practically empty and has a reverb better than anything electronic. It doesn't have electricity, though, and I had to work with batteries. I began playing back music and sounds in the church and rerecording them with my portable DAT. Snippets of Indian radio were played back in the church, under canal bridges and in empty farm buildings. The collection of sounds kept on growing.
My setup for recording whilst travelling has been the same for over 10 years, and it's never let me down yet. My portable DAT is a Casio DAR100. I chose this machine because it's reasonably small and light, and can be set to record at 44.1kHz. Replacement rechargeable batteries can still be found for it (direct from Casio). Its only disadvantage is that it's not possible to use dry batteries in an emergency. The mic amps, though, are very noisy, and not really usable for quiet recordings, so I use a custom-built high-quality preamp. This cost me about £80 and was well worth it. This means I can use the line inputs, which are very quiet. Minidisc recorders are smaller and lighter than DATs, but I took a brand new Sharp Minidisc to the Pyramids a couple of years ago, and it totally failed to work when I got there, so I'll be sticking to DAT in future!
I still use a pair of modified tie-clip electret mics bought from the now sadly defunct Tandy. I think they were only £30 for the pair. Electret mics like this are usually powered at 1.5 Volts, but this is just for the manufacturer's convenience — it's easy to fit a 1.5 Volt battery in a small box — and most electret mics can actually be powered at 9V, which dramatically improves their performance. They're usually omni-directional as well, which means a flat frequency response of at least 30Hz-20kHz. The tie-clip fitting is also useful when travelling. I always carry a tatty old canvas shoulder bag with my gear in it — it's as well to not attract thieves! When I want to record something, I just leave the DAT and preamp inside the bag and clip the mics onto the front of it, spaced at about six inches to a foot apart. This means I don't attract attention, and don't end up with recordings of people asking what I'm doing. It also damps the mechanical sound of the DAT recorder working. Sometimes I've rolled a towel up to something like a head shape, and clipped the mics where the ears should be. It's crude, but it works... My Tandy PZMs are also converted to run at 9 Volts. They are good, but a bit heavy when travelling and noisier. For a time I used them fixed to both faces of a ping-pong bat, with reasonable results.
Many of my recordings are binaural: recorded with mics mounted either where the ears should be on a dummy head, or on a human head. (Some purists would argue that this is 'pseudo-binaural' as they put the mics inside their ears!) However it's done, the results can be startlingly real when played back on headphones. Sounds can be located around, above and below the head. But when played on speakers, the effect is not so great. The problem is crosstalk from the speakers. With headphones, each ear can only hear the sound recorded on that side of the head, but speakers spread the sound, and spill the sound meant for the left ear to the right ear, and vice versa. This seriously muddies the three-dimensional imaging.
One solution that's been around a long time is transaural processing. Take a standard pair of speakers, placed at the correct stereo angle of 30 degrees each side of centre. The output of each channel is fed out of phase to the other to cancel out the crosstalk. A small delay is also added to the out-of-phase component, to allow for the length of time it takes for the sound to pass from one side of the head to the other; if separate bands of frequencies are delayed by differing amounts, the effect is even more realistic. Much research is being done in this area, and it is widely reported on the Internet.
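The cross-feed described above can be sketched in a few lines of plain Python. This is only a minimal illustration, assuming each channel is a list of float samples. The function names, the 10-sample default delay and the unity gain are hypothetical choices for the example, not values from any particular transaural processor, and real systems treat separate frequency bands differently.

```python
def delay(samples, n):
    """Delay a list of samples by n samples, keeping the original length."""
    if n <= 0:
        return list(samples)
    return ([0.0] * n + list(samples))[:len(samples)]


def cancel_crosstalk(left, right, delay_samples=10, gain=1.0):
    """First-order crosstalk cancellation (illustrative only).

    Each output channel is its own input minus a delayed copy of the
    opposite input, i.e. the opposite channel fed in out of phase.
    """
    delayed_left = delay(left, delay_samples)
    delayed_right = delay(right, delay_samples)
    out_left = [l - gain * r for l, r in zip(left, delayed_right)]
    out_right = [r - gain * l for r, l in zip(right, delayed_left)]
    return out_left, out_right


# A single click in the left channel shows up, inverted and delayed,
# in the right channel, where it is meant to cancel the spill.
out_l, out_r = cancel_crosstalk([1.0, 0.0, 0.0, 0.0],
                                [0.0, 0.0, 0.0, 0.0],
                                delay_samples=2)
print(out_l, out_r)
```

A single-pass subtraction like this only cancels the first arrival of the crosstalk, which is one reason the result depends so heavily on where the listener sits.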
The transaural technique has one serious disadvantage: the 3D imaging is only really effective in a tiny 'sweet spot', which can sometimes be only a few inches wide! What I realised (and it's pretty obvious) is that with a 5.1 system the 'sweet spot' can be made much bigger, by using two speakers on each side. If the front and rear speakers are fed the same signals on each side, the sweet spot can be several feet across. I set up a 5.1 mixer on my Soundscape R*Ed system, and bought a fairly cheap surround decoder and speakers. I made sure I got a decoder with six external inputs for the surround channels, and started experimenting.
My most spectacular binaural recording is of a late-night police car and helicopter chase in London. On headphones, the effect is amazing — the helicopter really appears to be above your head. I played the DAT recording into Soundscape, and copied the take. On the copy I swapped the left and right channels, and reversed the phase. I then experimented with delaying the out-of-phase, channel-swapped copy. First I dragged the takes around on screen by small amounts, but I could hear an echo. Then I did some sums. Sound travels at about a foot per millisecond, so the delay across a human head should be around half a millisecond at the most. Soundscape has a Sample Delay module that can be dropped into the mixer channels. It allows a track to be delayed by any amount from 1 to 225 samples. At a sampling frequency of 44.1kHz, one sample is one 44100th of a second, so a millisecond is 44.1 samples. I found that delays of up to 20 samples were the most effective, but knew that in the higher frequencies there is more complex stuff going on. What our brains decode as directional information is derived from the varying audio spectra and delays produced when sounds come from different directions. The complexity of it increases with frequency. Many acoustics labs have developed intricate algorithms for dealing with this, but I didn't have access to them. So on the out-of-phase copy, I just EQ'ed the highest frequencies away with a low-pass filter. The result was like magic — a helicopter flying around my studio! I tried other recordings of crowds, spinning round on fairground rides, and so on, and they all produced spectacular results. The spinning sounds actually appeared to pan around the speakers.
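The sums mentioned above are easy to check. The sketch below uses only figures from the text (44.1kHz sampling, sound travelling at about a foot per millisecond, the 1 to 225 sample range of the delay module); the function names and the assumed half-foot path across the head are illustrative.

```python
SAMPLE_RATE = 44100  # Hz, the sampling frequency quoted in the text


def ms_to_samples(ms, sample_rate=SAMPLE_RATE):
    """Convert a time in milliseconds to a (fractional) number of samples."""
    return ms * sample_rate / 1000.0


def samples_to_ms(samples, sample_rate=SAMPLE_RATE):
    """Convert a number of samples to a time in milliseconds."""
    return samples * 1000.0 / sample_rate


def path_delay_ms(distance_feet, feet_per_ms=1.0):
    """Time for sound to travel a distance, at roughly a foot per millisecond."""
    return distance_feet / feet_per_ms


print(ms_to_samples(1.0))                        # 44.1 samples per millisecond
print(round(ms_to_samples(path_delay_ms(0.5))))  # 22 samples for half a foot
print(samples_to_ms(20))                         # a 20-sample delay, a little under half a millisecond
print(samples_to_ms(225))                        # the module's maximum, roughly 5ms
```

So the 20-sample delays that worked best sit just inside the half-millisecond estimate for the width of a head.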
I've used Soundscape for the last 10 years or so, but had never tried mixing for surround before. My system is a Soundscape R*Ed, with 32 audio tracks and 16 analogue inputs and outputs. I used six of the analogue outputs, and connected them to the phono inputs of my Sherwood decoder. I then made a mixer for surround. This is very easy in Soundscape: modules are simply selected and dropped into place. I built the mixer by starting with stereo input channels. Each stereo channel strip used aux sends for the different surround channels: a stereo send to the front pair (L and R), a mono send to the centre (C), another stereo send to the rears (SL and SR) and a mono send to the subwoofer channel (LFE).
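For readers who think better in code, the routing of one stereo channel strip can be sketched like this. It is not how Soundscape works internally; the function names and send levels are hypothetical, and the mono sends are assumed to be a simple average of left and right.

```python
BUSES = ("L", "R", "C", "SL", "SR", "LFE")


def route_stereo_strip(left, right, front=1.0, centre=0.0, rear=0.0, lfe=0.0):
    """Feed one stereo strip to the six surround busses via four sends.

    front and rear are stereo sends; centre and lfe are mono sends,
    assumed here to carry the average of the two input channels.
    """
    mono = [(l + r) * 0.5 for l, r in zip(left, right)]
    return {
        "L": [front * s for s in left],
        "R": [front * s for s in right],
        "C": [centre * s for s in mono],
        "SL": [rear * s for s in left],
        "SR": [rear * s for s in right],
        "LFE": [lfe * s for s in mono],
    }


def mix_strips(strips):
    """Sum several routed strips, bus by bus (all strips the same length)."""
    length = len(strips[0]["L"])
    return {bus: [sum(strip[bus][i] for strip in strips) for i in range(length)]
            for bus in BUSES}


strip = route_stereo_strip([1.0, 0.5], [0.0, 0.5], front=1.0, centre=0.5, rear=0.5)
print(strip["L"], strip["C"], strip["SL"])
```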
With some of the tracks, I wanted to have a little stereo image that appears to sit a few feet in front of the centre speaker — a bit like a small stage. I used this for some of the material that was recorded from Indian radio and rerecorded in the church. To do this, I put the stereo source recording into two mono strips, each with only the 5.1 module in it. This appears on the channel strip as a small square that can be made bigger when clicked on. A dot representing the pan position can be moved around the square, and it outputs to all channels apart from the subwoofer. The module can be automated, as can every element of the mixer. One of my setups needed a 'V' formation between the rear channels and centre, so I dropped in four more mono strips that each had mono sends to all of the six channels. This way I could spread stereo tracks across any positions I wanted.
The transaural processing was done by making two stereo mixer strips. The first was for the original binaural recording, and was initially set up to send equally to the front and rear channels. A copy was made of the binaural audio in the arrange page, and assigned to the next pair of tracks. The copy was then phase-inverted and channel-reversed. The mixer strip for the copied channel was also fed equally to the front and rear channels, and contained a Sample Delay module and a low-pass filter. The delay was adjusted according to what worked best for each piece of audio. Delay times varied between 9 and 224 samples (at 44.1 samples per millisecond). I made sure that six tracks of Soundscape were left available to mix the six busses down onto, and the last few strips of the mixer were for monitoring those channels. Each channel strip also ended in a fader that fed another pair of busses for making a stereo mix. The mixing was done mostly by setting levels within the arrange page — I never move faders. The only thing that needed adjustment throughout the mix was in the transaural channels. Some sounds needed a different delay, and the front-to-back balance varied too. So I used the Soundscape automation in snapshot mode at the start of each of these sections. When I was happy with the surround mix, I digitally bounced it onto the six empty tracks. This produced six Soundscape takes of exactly the same length, which were converted to WAVs for encoding.
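The two-strip arrangement can be approximated in plain Python as follows. This is a rough sketch rather than a model of the actual mixer: the one-pole low-pass filter stands in for whatever filter was used, and the names, the filter coefficient and the default 20-sample delay are all assumptions made for the example.

```python
def delay(samples, n):
    """Delay a list of samples by n samples, keeping the original length."""
    if n <= 0:
        return list(samples)
    return ([0.0] * n + list(samples))[:len(samples)]


def one_pole_lowpass(samples, alpha):
    """Simple one-pole low-pass filter; alpha=1.0 passes the signal unchanged."""
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def transaural_busses(left, right, delay_samples=20, alpha=0.5,
                      front=1.0, rear=1.0):
    """Mix the original strip with its processed copy onto six busses."""
    # Second strip: the copy is channel-swapped, phase-inverted,
    # delayed and low-pass filtered.
    copy_left = one_pole_lowpass(delay([-s for s in right], delay_samples), alpha)
    copy_right = one_pole_lowpass(delay([-s for s in left], delay_samples), alpha)
    mix_left = [a + b for a, b in zip(left, copy_left)]
    mix_right = [a + b for a, b in zip(right, copy_right)]
    silence = [0.0] * len(left)
    # Both strips feed the front and rear pairs; centre and LFE stay silent here.
    return {
        "L": [front * s for s in mix_left],
        "R": [front * s for s in mix_right],
        "C": list(silence),
        "SL": [rear * s for s in mix_left],
        "SR": [rear * s for s in mix_right],
        "LFE": list(silence),
    }


busses = transaural_busses([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0],
                           delay_samples=1, alpha=1.0, rear=0.5)
print(busses["R"], busses["SR"])
```

Changing `delay_samples` and the `front` and `rear` levels per section corresponds to the snapshot automation described above, and all six busses come out the same length, as the six bounced takes did.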
I had come up with the idea for Bilocation some time ago, as a stereo project, and had experimented with putting my sounds together in different combinations. I had to be able to make sense of the vast amount of material I'd recorded. I started by editing my best sounds, compiling them onto 10 CDs, and I made detailed track sheets. This all took ages, but was worth the effort. One thing in particular had put the stereo version of Bilocation on hold: mixing different recordings made in different acoustics together sometimes produced a muddy and confused result. In surround, by contrast, many different sources can be combined and still be heard separately. I found that the recordings made in the old church would fit with virtually anything, as they have such a distinctive acoustic character. Other sounds were difficult to combine, and some just sounded best on their own.
The stereo version of Bilocation had started with a recording of surf in the Indian Ocean. I'd EQ'ed the top off it, making pink noise that still sounded like surf. The idea was to 'wash out' the listeners' ears, so they could then hear more clearly. Some recording engineers use this trick, and listen to pink noise on headphones before a mix. This didn't really work in the 5.1 version — I needed something more readily identifiable. So I tried rain instead. Across the front left and right speakers I put a recording of rain in a London back yard, with water gurgling down the drainpipes. Then I put a recording of rain hitting a window across the rear left and right speakers, and a different bit of rain into the fronts. The result was very filmic, and firmly established England as a starting location! I then took a binaural recording I'd made at dawn in a banana grove in South India, and did the transaural treatment on it. This filled the room with tropical sound — parrots, insects, and so on. When the rain was very slowly crossfaded into the banana grove the effect was magical; rather like in the children's book Where The Wild Things Are, a forest appears to grow in the room.
It took me about three weeks of working every day to compile the 40-minute final version of Bilocation. I loved every minute of it! It was rather like editing a film. I'd find some sounds I liked from the 10 CDs, and just play with them until I found interesting combinations. I built up sections of a couple of minutes long, then found ways of joining them together. I really wanted 'magical' transitions between scenes, and this was achieved by a lot of finicky work with cross-fades.
I bought my surround system very cheaply, in the Richer Sounds January sale. For £80 I got a Sherwood tuner/decoder, with six RCA phono inputs for the individual channels. The Pioneer speakers are five satellites that go down to 100Hz, and a large subwoofer. All are solidly built from metal, and cost me £70. My studio is very large, so I set up my speakers in a 14ft circle, positioning them using string. I then mounted them all at ear-height on mic stands. I put a hefty chair exactly on the sweet spot, and marked its position on the floor.
When the mix was finished, I wanted some 'grot boxes' to check that the same effects could be obtained on a smaller, cheaper system such as a computer. I found a set of five speakers and a subwoofer in Safeway for £30! I set these up in a six-foot circle around my PC, and found that although they are over-bright and 'tinny', the surround and overhead effects are just as powerful. The main difference I found was that the angle of the rear speakers could be much greater. With this smaller circle an angle of 120 degrees was preferable. I suspect this has something to do with the angle of dispersion of the two different sets of speakers.
In Bilocation the location sounds are often combined with musical sounds made in different acoustic spaces. I'd sometimes set up stereo ping-pong echoes on a synth sound or an E-bowed guitar, and place the speakers wide apart in a live space. This could be a concrete garage, an old church, under a brick arch — anything interesting. Then I'd play live and record it to DAT. One section of Bilocation has a beautiful pedal steel guitar played by a friend in my studio. I set up speakers with a ping-pong echo at opposite ends of the empty church and rerecorded it binaurally. I had also made a set of recordings in the church of myself 'overtone singing'. This is a way of singing two notes at once, by changing the shape of the mouth. It produces a drone with changing 'whistle' tones that sweep through the harmonics of the voice. I'd recorded many different notes in the church, each sung in different positions, and combined them in Soundscape into slowly evolving chords. As the voices are actually recorded inside the reverb, the effect is quite eerie. The overtone singing is used on two sections. One features a violent electrical storm in England, where I almost got killed! When the storm started, I'd grabbed my DAT and mics, and an umbrella, and set off to a quiet field to record. The thunder got louder and closer, and ended in a lightning strike in the field — about 20 feet from me! When transaurally processed, the thunder can be heard moving overhead, and the final strike is pretty scary.
Another section is my homage to the stereo test records, where a huge steam train materialises in your living room! I had been on an overnight train in South India, when it stopped at a tiny country station. I started to record night insects out of the open carriage window. Just as I started to record, another steam train pulled slowly into the station, on the next track to mine, and stopped right in front of my mics. The passengers are chattering, and one keeps spitting out of the window. Then the whistles blow, and the train slowly pulls away and dissolves again.
My favourite recording of all is of a wonderful building in India, the Golgumbaz in Bijapur. This is a huge square building topped by a 38-metre dome, the second biggest in the world. There is a 'whispering gallery' at the top, similar to St Paul's, but whispering is futile, as most people come in and shout! The bottom of the building has an open arch on all four sides, and thousands of swallows go whizzing through, twittering loudly. The sound at the top of the dome is amazing! The dome produces a clear echo of about half a second, which repeats, allegedly 12 times. The echoing shouts and screams of children combine with the swallow calls, and produce something very musical and beautiful. The recordings I made were binaural, and when transaurally processed the effect is just like being in the dome. This was then combined with some musical loops I made many years ago, binaurally recorded onto analogue tape. I had made a large cross by clamping two aluminium ladders together at 90 degrees, and suspended it from the ceiling in a large concrete room, like a carousel. I hung four speakers onto the ends of the ladders, and played back tape loops into each of them. I had several sets of loops that, when played together, formed chords. A dummy head was placed inside the circle of moving speakers. When the arrangement was spun around, the loops played arpeggios that had a Doppler shift and very strong spatial imaging. This works beautifully when combined with the echoing dome.
Another favourite is the sound of a galloping horse — recorded from on the horse! In London I used to share a horse with a woman who became terminally ill and couldn't ride any more. I wanted to record the horse going round her favourite route in Epping Forest, so she could play it on a Walkman in hospital. This is tricky, as a galloping horse goes at about 25 miles an hour, and wind noise is a big problem. I took my Tandy PZM mics, and drilled a hole in each corner of the base plate. I bent some pieces of thick wire into arcs, and hot-glued them into the holes, so they spanned opposite corners and formed something like the frame of a mountain tent. Then I stretched two layers of nylon stocking over the top. I tested the mics with a car, and the wind noise didn't appear until the car got to 35 mph. Then I fixed the mics to my riding boots and set off with the DAT. The results were excellent, but as the mics were nearly a metre apart, hardly binaural. This was corrected by converting the recording to M&S and narrowing the stereo image. When transaurally processed, this sounds disturbingly like the real thing!
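Narrowing a stereo image by way of M&S is a small piece of arithmetic. The sketch below shows the usual sum-and-difference approach, assuming channels held as lists of float samples; the function name and the width parameter are made up for the illustration and the actual amount of narrowing used on the horse recording isn't stated.

```python
def narrow_stereo(left, right, width=0.5):
    """Convert L/R to mid/side, scale the side signal, convert back.

    width=1.0 leaves the image untouched; width=0.0 collapses it to mono.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = (l + r) * 0.5            # what the two channels share
        side = (l - r) * 0.5 * width   # the difference, scaled down
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right


# A sound hard left moves towards the centre as the width is reduced.
print(narrow_stereo([1.0], [0.0], width=1.0))   # ([1.0], [0.0])
print(narrow_stereo([1.0], [0.0], width=0.5))   # ([0.75], [0.25])
print(narrow_stereo([1.0], [0.0], width=0.0))   # ([0.5], [0.5])
```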
The finale of Bilocation is the spectacular flying helicopter. Psychoacoustics is a very powerful thing. When people listen to the helicopter recording cold, they can all hear it flying above their head. But after listening to 40 minutes of surround sound with differing acoustics, the ears really tune in to the spatial information, which becomes more focussed. When the helicopter is heard at the end of the album, you can practically see it moving around above you!
Bilocation costs £13.99 (plus £1 p&p for orders from outside the UK) and is available from my web site at www.bilocation.co.uk, or by post from: Bilocation, PO Box 2700, Devizes, Wilts, SN10 3ZU, UK. The site also includes further information on transaural processing, binaural recording, setting up a 5.1 system and surround sound in general.
Making a satisfactory 5.1 mix in the studio could be seen as merely an intellectual exercise — rather pointless unless it can be made available for others to hear. So it must be coded somehow.
I had read about a piece of software called Soft Encode for combining six mono WAVs into one Dolby Digital bitstream, but it is no longer available. I eventually found someone who had the software, and made a 5.1 coded file. It sounded terrible! The top frequencies suffered very badly, and lots of the spatial information that depended on short delays had simply vanished. I was gutted. The fact is that coding involves some hefty data compression (around 10:1) and Soft Encode just wasn't sophisticated enough for what I needed. After ringing around, I was advised to go the DTS route. Not all surround systems in the world are DTS-capable, but apparently 200 million of them are... DTS coding uses a very complex algorithm to compress the data and interleave the six channels.
Luckily, a friend had just bought the latest software from the Minnetonka company, and coded it for me. I supplied the six WAVs on two CD-Rs, and the process simply involved loading them into his machine and pressing go. I was amazed at the result — it sounded exactly as I'd monitored it originally, yet it now fitted onto one CD-R. I have since been told that equally good results can be had with Dolby Digital, but more tweaking is needed to get the best from it.
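A back-of-envelope calculation shows why the uncoded mix needed two discs. The figures below assume 16-bit samples at 44.1kHz and a nominal 700MB of data per CD-R, neither of which is stated in the text; the 40-minute length, the six channels and the roughly 10:1 compression figure come from the article. The function names are illustrative.

```python
import math


def pcm_bytes(minutes, channels, sample_rate=44100, bytes_per_sample=2):
    """Size of uncompressed PCM audio in bytes (WAV headers ignored)."""
    return int(minutes * 60 * sample_rate * bytes_per_sample * channels)


def raw_bit_rate(channels, sample_rate=44100, bits_per_sample=16):
    """Uncompressed data rate in bits per second."""
    return channels * sample_rate * bits_per_sample


CDR_BYTES = 700_000_000  # assumed nominal CD-R data capacity

six_wavs = pcm_bytes(40, 6)
print(six_wavs)                          # 1270080000 bytes for six 40-minute mono WAVs
print(math.ceil(six_wavs / CDR_BYTES))   # 2 discs needed
print(raw_bit_rate(6))                   # 4233600 bits per second before coding
print(raw_bit_rate(6) / 10)              # the rate left after roughly 10:1 compression
```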
Binaural recording with microphones has something of Heath Robinson about it, but it can be a powerful tool. Even more powerful is synthesized binaural sound. The Head Related Transfer Function is what is behind transaural processing, and it's complex. It takes about half a millisecond for sound to pass from one side of a human head to the other — but not all sounds come from the same direction. Sounds reaching the head from different angles and elevations have complex peaks and troughs in their audio spectra, and these again are time-related. This mass of information is decoded by the brain to construct a 3D sound picture. If this process can be synthesized, then false 3D sound images can be created. This has huge implications for virtual reality, gaming, and so on.
A great deal of research is being done in calculating HRTFs, but although I found lots of information on the Internet about them, none of it was comprehensive. The fact is, HRTFs are potentially big business. The data is very difficult to collect, and too valuable to give away. I emailed several universities who had posted HRTF information, asking for more. Not one replied... There is some information available though, and it's worth looking. The HRTF data can of course be programmed into a DSP. So would anyone out there like to have a crack at making a transaural plug-in, in VST, and release it as freeware? Please.
An excellent page of links for 3D audio sites can be found at www.wareing.dircon.co.uk/3daudio.htm.