Creating the score for Pirates Of The Caribbean: At World's End required an incredible amount of music technology, and demonstrated why 64-bit computing offers new levels of realism for orchestral sample libraries.
Film music is perhaps the single area where music technology has its largest application. Not only does film music require all the usual elements of sequencing and recording, but there are the added complications of synchronisation, frame rates, submitting mixes to dub stages, and all sorts of nonsense that other musicians simply don't have to worry about. This complexity makes film musicians ever more reliant on technology for reasons of efficiency: so many tedious tasks need to be performed that it would be painful to consider having to do these things without the aid of technology. As painful, in fact, as it was to carry out such tasks before the technology came along in the first place.
At the same time, technology can also help those working in film music to be more creative — or at least that's the hope. Whether it's creating a new sound that inspires an idea, or working with a computer-aided orchestra that allows for more experimentation, technology can take a musician to more places creatively than they could otherwise explore. And this is especially important when you have to be creative under a time pressure, as is usually the case in film music. So, when it works, music technology allows film composers to be creative, efficiently.
I get to spend my days worrying about this issue, working for the film composer Hans Zimmer. Just recently, while working on Pirates Of The Caribbean III: At World's End, Hans decided he wanted to push our use of technology further than normal. This resulted in arguably the most technically accomplished, realistic-sounding orchestral mock-ups ever created, and I think some of the best music written for the Pirates series so far. It also led to Hans' day-to-day technical coordinator referring to the film as 'Samples of the Caribbean'...
Hans is well known for having created his own private collection of sampled orchestral instruments, and the core of this library was originally recorded back in 1994 onto eight-track digital Tascam DA88 format. After the recordings had been edited, instruments were programmed for Hans' sampler of choice at the time, Roland's S760, which wasn't a bad option, considering that each unit took only 1U of rack space and Hans needed over 30 of them to play back the orchestra. Over the years, more samples were added to the library, and the library itself went through many conversions as more powerful samplers were released, such as Emu's E4. In 2002, the original recordings were transferred again from the original DA88s, remastered and programmed natively into Tascam's Gigastudio, and these newly created versions still make up a large part of the palette five years on.
One of the reasons samples are so important in film music is that their use allows the director, and other people involved in the film-making process, such as the editor or producers, to hear how a piece of music (usually referred to as a cue) is going to sound against the film. As Hans himself has commented many times, using samples of real orchestral instruments is going to give a better impression than the composer sat at the piano going "but you know, this is going to sound great when the horns come in!"
These days it's quite common for sampled demos to be used by editors to cut against while they're working on the film, or for showing previews to test audiences, before the final versions of the cues have been approved and performed by a real orchestra. There are a couple of advantages here, but the main one is that it allows for ideas to be tried out easily before the expense of trying them out with a real orchestra. So good demos, and thus good samples, are important for modern film composers to get their ideas across.
Although they still sound pretty good, Hans really wanted to start again with new recordings: partly because the technology has improved radically since 1994, but mostly because he was simply bored with hearing the same old sounds every day. So in 2004, a year after I'd started working with Hans — and 10 years after the original library had been recorded — we started again. The new library was recorded using the same studio as its predecessor (AIR Studios in London), and largely the same bunch of musicians as before, but there were also some key differences.
While the original recordings were made at 44.1kHz with a 16-bit resolution, the new recordings were captured at 88.2kHz/24-bit using Apogee converters. Originally we had planned to record at 176.4kHz, and we did actually record the first sessions using two Pro Tools systems: one at 176.4, and another at 88.2kHz. The differences between the two recordings were very subtle indeed, and although I would have liked to keep the higher sampling rate (I even wanted to record a DSD archive version of all the sessions as well!) ultimately it would have been too expensive. It wasn't just the cost of recording the data to consider, but the cost of backing up that data, the systems required to edit the data, and so on.
One of the reasons we were creating so much data is that instead of recording a single surround mix of the samples, we recorded each sample using 16 separate mics, each on its own channel, just as it might be heard in an actual film score recording. In recent years there have been commercial libraries that allow you to balance, say, close, room and stage mixes; but our goal was to build a system that would allow us to mix from all 16 microphone channels in real time, even when the composer was playing back the samples. While this might sound a little crazy (and that's partly because it is crazy), the result would enable us to work with the mix of a piece of music containing samples in exactly the same way we would work with a piece of music recorded by a real orchestra. Perhaps our secret weapon in the whole recording process was AIR's chief engineer, Geoff Foster, and we relied completely on his experience for the choice of microphones and positioning.
Once the samples had been recorded, the next task was to process them using noise-reduction techniques. Noise reduction is quite a sensitive issue: after taking so much time and care recording the samples, the last thing you want to do is make those samples sound inferior. But the fact is that noise reduction is a necessary compromise when you're creating samples, because no matter how small the amount of noise present in your recording, when that noise is multiplied by the number of times the recording is going to be layered up in the sampler as you play notes, it becomes a real problem.
So at the end of 2004 I flew back to England with my partner on the project, Abhay Manusmare, and visited noise-reduction specialists CEDAR Audio. Geoff Foster came up from London and we spent a day with one of CEDAR's specialists, investigating how their high-end Cambridge system could be used to clean up our samples while preserving as much of the natural reverb of the room and the tone of the instruments as possible. We were pretty impressed, and we decided to purchase a couple of Cambridge systems for the processing of our samples.
This left the Herculean task of editing the samples and turning them into playable instruments, and for this Hans and I turned to a mutual friend, Claudius Bruese. Claudius lives in Germany and is best known for his early work at Waldorf, as product manager for the Wave, and his recent work on products such as Steinberg's The Grand and Halion Symphonic Orchestra. He's also an accomplished composer and musician and a brilliantly organised person to deal with the large task that lay ahead!
However, what do you use to edit high-resolution, multi-channel samples into playable instruments? Well, to be honest, we couldn't find anything. It's not just about the cutting, for which an application like Pro Tools would be fine, but the looping and assigning of metadata as well. So at that point we gave up on the whole project. Just kidding. Fortunately, by this time, Hans had invested in a company called Wizoo (which is now part of Digidesign) and a team of programmers created an editing system that would allow us to create the content for our new library.
There's a big change going on right now in the computer world as we all start adopting 64-bit operating systems, and the big advantage of using a 64-bit operating system, as you've probably heard, is its ability to access a large amount of memory.
When you run an application on 32-bit Windows, for example, that application can only use around 2GB of memory (or up to 3GB in some circumstances). If you run any plug-ins within that application, such as a software sampler plug-in, that plug-in will be included under the application's umbrella of memory usage. So if you're running a sequencer and a bunch of plug-ins on 32-bit Windows, the maximum amount of memory they can collectively access is 2GB.
Under a 64-bit operating system, an application has no such limitation (well, there is a limitation, but it's so damn high that you're likely to run out of money to buy more memory before you reach that limit, and most current workstations can only physically accommodate 16 or 32GB of memory anyway). However, running a 64-bit operating system isn't an instant solution. Not only will you need 64-bit drivers for all your MIDI and audio hardware, but the applications you run need to be specially compiled for 64-bit operating systems in order to make use of all the memory. At present, as far as I'm aware, Cakewalk's Sonar is the only major DAW commercially available for Windows in a 64-bit version.
Assuming you have the appropriate drivers for all your hardware, your current 32-bit Windows applications can perform better on 64-bit Windows. I briefly mentioned earlier that some applications can be made to use up to 3GB on computers with 4GB of memory running 32-bit Windows, and 32-bit applications can actually use up to 4GB if run on 64-bit Windows, provided the developers set a certain flag correctly in the program's executable file. Even though Cubase isn't available in a 64-bit version, for example, the current 32-bit version will already take advantage of this option.
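The scale of the difference is easier to appreciate with the arithmetic written out. A quick sketch (Python used purely as a calculator here; the figures are the Windows defaults described above):

```python
# Address space available to a single process, in bytes.
# These figures illustrate the Windows limits discussed in the text.

GIB = 1024 ** 3

# A 32-bit pointer can address 2^32 bytes in total...
total_32bit = 2 ** 32
print(total_32bit // GIB)        # 4 (GB of raw addressable space)

# ...but 32-bit Windows reserves half of that for the system by default,
# leaving around 2GB for the application (3GB in some circumstances,
# or 4GB for a suitably flagged 32-bit app on 64-bit Windows).
default_app_space = total_32bit // 2
print(default_app_space // GIB)  # 2 (GB for the application)

# A 64-bit pointer can address 2^64 bytes: over 17 billion gigabytes,
# far beyond what any workstation of the time could physically hold.
total_64bit = 2 ** 64
print(total_64bit // GIB)        # 17179869184 (GB)
```

In other words, the practical ceiling under 64-bit Windows is set by how much memory you can afford and physically install, not by the address space.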
For this reason, we were running our GVI host software (and later our own samplers) on systems with 64-bit Windows. Our audio hardware was RME's Fireface, and although this was overkill, since we only needed the ADAT outputs, it was one of the few bits of hardware available last year with good 64-bit drivers and a couple of ADAT ports — although that situation has now changed. Fortunately, since we were already using a custom-developed system for sending MIDI over networks, we could build that technology straight into the host application without having to worry about installing MIDI drivers. (Commercial alternatives like Music Lab's MIDI Over LAN now support 64-bit Windows.)
Even though we had developed the host in-house, though, it wasn't possible to make it a true 64-bit application, because one feature of 64-bit applications is that they can only load plug-ins compiled for 64-bit systems. Similarly, 32-bit applications can only use plug-ins compiled for 32-bit systems. So if you're running 64-bit Windows and you want to use a 32-bit plug-in, you have to run it within a 32-bit application. (Cakewalk actually have a feature in Sonar called Bit Bridge that allows those running the 64-bit version of Sonar to use 32-bit plug-ins, but that wasn't an option for us, and a full explanation is a little beyond the scope of even this article!) So even though we could have used a 64-bit application like Sonar to access a large amount of memory in a music context, there still weren't any 64-bit sampler plug-ins that we could run within Sonar to play back our orchestral sounds.
During 2005 we didn't get much chance to work on the new sample library, because of other film projects; but towards the end of 2005, and throughout 2006, Claudius and his team really started ploughing through the content we'd recorded so far. Finally, in the Summer of 2006, we began to receive some test instruments of the new library that we could play: short strings and long strings.
The short strings sounded great, and with six or seven velocity layers and up to six round-robin variations on each note (incorporating up and down bows), produced surprisingly realistic performances. Unfortunately, though, Hans absolutely hated the long strings! On the day he heard them, we ended up sitting in the studio until six in the morning, and he was pretty depressed about the whole thing. Part of the problem was that they weren't very playable, and the fact that they weren't very playable made it difficult to judge their quality when compared to the old library.
The reason for this lack of playability was the way each note had been cut. The full natural attack had been preserved, which was great for playing individual notes, but made it hard to perform sweeping legato lines. However, this turned out to be a blessing in disguise, because Claudius later came up with a solution. This was to provide three different attacks for the long notes (original, short, and something in between) that could be selected by the person playing the samples, using keyswitches.
The new attacks solved the problem of playability and still gave the option of having the natural attacks, but it was an interesting lesson. One of the things we really tried to achieve with the samples was to create the most natural, realistic sound possible, and the idea of manipulating the recordings too much as they became sampled instruments was initially quite abhorrent. However, the experience of the long notes served as a reminder of the paradox of recreating acoustic instruments electronically: sometimes you have to do quite unnatural things to recreate something natural. Playing in a legato style is quite natural for a real violinist, but quite unnatural for a sampler.
Even though these test instruments were still far from the ultimate potential of the new library, they were pretty good and sounded, as you would have hoped, significantly better than the old library. So Hans decided he really wanted to use them on Pirates 3: if he was going to have to do another sequel, went the reasoning, we might as well try to make the music sound as different as possible and really push the boundaries. At the time, I had no idea how far that idea would take us.
The first draft instruments we received from Claudius were not the full-quality final versions, partly because we had no way of playing back 16-channel samples with a decent amount of polyphony at high sample rates. So instead, four-channel Gigastudio versions were created (at 16-bit/44.1kHz), consisting of two stereo Giga instruments (one for the left and right front speakers, and one for the left and right rear speakers) that needed to be played back as a stacked instrument on the same MIDI channel in Gigastudio. Although this worked well, there were a couple of big problems.
To say that memory is one of the problems we had with Gigastudio might sound a little odd — isn't the fact that it streams from hard disk meant to save memory? Well, yes and no. Like all streaming samplers, even though Gigastudio does indeed stream samples from hard disk, in order to do this efficiently it needs to pre-load the first part of every sample used in an instrument into memory.
The reason for this is that the seek time of a hard drive — which is to say, the time it takes for the hard drive to locate an item of data that's required — is about seven or eight milliseconds. Now think about the buffer size on your soundcard, which is probably set to somewhere between 1.5 and 12 milliseconds. At the point when you ask Gigastudio to play a note, there simply wouldn't be time to fetch that data from the hard disk and process it without introducing a tremendous amount of latency — especially when you start to play more and more notes. So, instead, the first part of the sample is played from memory, giving the hard disk sufficient time to start streaming the remainder of the sample.
With all our round-robin variations, different velocity layers, and alternate attacks, there were quite a large number of samples to pre-load, even just for one instrument. And we had separate long and short instruments for violins, violas, celli and basses, all needing to be loaded and played back!
In addition to memory, another problem we encountered was that playing back one of our instruments required hundreds of voices, using about half the processing resources of one fairly beefy computer running Gigastudio. On a modern system, you might get between 600 and 800 voices when running Gigastudio 3. This may sound like a large number, but it's the voice count for mono voices. For our quad samples, each note required a minimum of four voices, so that's only 150 to 200 notes all of a sudden, which easily get used up when you consider that all the samples have four seconds of natural ambience after them. Play a trill, and with the amount of overlapping notes you've just lost a quarter of your polyphony!
Worse still, one of the key features we use in Gigastudio for long strings is the ability to crossfade, using a MIDI controller, between different velocity layers, and this creates a real nightmare for the voice count. If we go back to the discussion of streaming and how it takes too much time to find something on a hard drive at the precise point at which the sampler requires it, and if you consider that the sampler never knows what's coming next, the only solution is that the sampler always has to be ready. Part of being ready is to cache the first part of every sample, as we've already described; but in the case of crossfading between multiple samples in real time, what happens is that all velocity layers must be played simultaneously, as the user could move the MIDI fader at any time to crossfade to another velocity layer. Our samples have seven velocity layers, and since each note requires four voices to play back (because the samples are in quad) that means each note requires 7 x 4 = 28 voices to play back. Suddenly, 600 to 800 voices per system becomes 21 to 28 notes per system!

Although these types of issues are inherent to all samplers, a particular restriction with Gigastudio 3 is that its sample playback engine can only make use of one processor core. So even if you buy a more powerful computer with four or eight cores, Gigastudio will still only use one of those cores to play back samples.
Just as we were starting to look around for another solution last September, we began playing around with Tascam's GVI, the VST plug-in version of Gigastudio, which offered a seemingly good solution. With GVI, we could load our Giga instruments and play them in any VST host. And if that host was able to take advantage of multiple cores, we could run many GVI plug-ins on the same computer and the host would schedule these to take advantage of the available system resources, meaning that we'd get more polyphony. However, there weren't many choices for a suitable host last September: we could choose a sequencer like Sonar or Cubase, but this seemed like a clumsy way to run a couple of VST Instruments on a computer, especially in terms of how the user would interact with those instruments.
So I instead looked into creating a simple host application that would allow us to run multiple GVI instances and would support multiple processing cores. After about a month, we had something up and running, and although there were a few quirks with GVI at the start (no one had actually tested it on a multi-core system before!) eventually it was going well enough for Hans to start using it as work was beginning on Pirates III.
One obvious advantage of the custom software approach was that we could create a tool that very directly addressed our needs. In this case, that meant a host that automatically launched multiple GVI instances the user could switch between; mixed the outputs of all the instances so that outputs one and two of each plug-in went to the same hardware output, outputs three and four to the next, and so on; and supported a single load and save command that stored and retrieved all the parameters of all instances.
Unfortunately, it wasn't a perfect solution. Originally, the idea was to create a Gigastudio-like host application where, since Gigastudio has eight pages of 16 channels to load instruments, our host would load eight GVI instances to provide the same number of MIDI channels capable of loading the same number of instruments. However, it turned out that if GVI was assigned a high number of voices, say 600 to 800, it actually used a small amount of processing even when the plug-in wasn't doing anything. This isn't a big deal if you're running one instance, but it is a big deal if you're running eight instances, even though the computer we were using was powered by two dual-core 3GHz Intel Xeons. So we throttled the system down to run four GVI instances instead, and while this was good — we achieved about 1500 voices under ideal circumstances — it still wasn't great.
Other than processing power, the other big problem with the GVI approach again came back to memory. As an example, our short strings samples alone required about 6GB of disk space, and we had difficulty loading these instruments into the system to get the performance we needed. Usually Hans has several instances of each string group (violins, violas, celli and basses) in his palette, but he also uses 'full strings' patches as well. In our old library we had real full strings samples, recorded with all the players in the room at the same time. With our new library we had only recorded individual groups, so we also needed to create special stacked versions to produce a full strings patch.
The required line-up for the short strings in Hans' palette (which would be mirrored by the long strings, along with trill and tremolo instruments too) was basically one of threes. Three violins, three violas, three celli and basses (combined patches, because Hans likes to have the celli extend down to basses when he's writing), three basses, and three sets of full strings — a total of 27 instruments to be loaded.
Because each GVI instance doesn't know about instruments loaded in other GVI instances that might be running on the same machine, we ended up with the same instruments loaded several times on the same machine. While we could potentially have loaded up one instance with all the instruments required (using GVI's stacked instrument function), we would not have achieved the polyphony we required. And the minute we spread the instruments over multiple instances of GVI, we started to run out of memory.
In the end, it took four systems, each with two dual-core 3GHz Xeons and 4GB RAM, to load and play back the short strings for Hans to work with. And it took another four systems of the same specification to load and play back the long strings (including trills and tremolos), on top of the existing five machines running Gigastudio and the rest of the orchestra — and we still didn't have enough power to load in our new brass samples, which required yet another system.
This 14-computer configuration was OK for Hans alone, but an added complication with the project was that it wasn't just Hans who needed access to these new sounds. On a big movie like Pirates Of The Caribbean: At World's End, it's very hard (if not impossible) for one composer to handle the workload. In addition to writing the themes and defining the orchestration, there are the individual cues that need to be written for the movie. And on top of that, it's very rare that the picture stays the same, so as the picture gets edited the composer has to conform his cues to the latest cut of the picture.
Yet another complication was that Hans was going to work at Disney studios in Burbank (just down the road from his facility in Santa Monica), so a duplicate mobile version of his studio was going to be created for him to work with at Disney. In addition, a second support composer would work with him at Disney and require an identical rig. And on top of this, we had four groups of people who would also need access to the new sounds. With the four-core systems required to play back the new samples costing around $7000 each, there was just no way to supply eight or nine of these systems to each studio that needed them. We needed to find another way, and quickly, since Hans was already working away on the big themes for the film, including one for the main antagonist, Beckett, which had quite a nice bit of counterpoint that became a real showpiece for the new short strings.
It was clear that the memory problems we were running into could be reduced if we were able to work with a 64-bit OS (see the 'When I'm 64-Bit' box), since this would remove the 2GB upper limit on RAM in the 32-bit version of Windows. Our CPU use could also be improved by having a sampler engine that was able to use all available processing cores in such a way that you could run a large number of sampler channels that all had access to the same pool of sample data. Such a sampler engine didn't exist, but since I didn't have anything better to do over Christmas, I decided to play around and see if I could write one. We didn't need that many features to play back our new samples: we just needed it to work reliably and be efficient enough to reduce the number of computers required to play back our new palette.
Once the decision was made to focus on 64-bit Windows, we ordered computers with 8GB memory installed, and since the short strings only took up 6GB of disk space, it occurred to me that we could actually just load all that sample data into memory and do away with disk streaming altogether. This had two important advantages: firstly, it was easier to implement, and secondly, the sampler should theoretically perform better if it had all the data it needed in memory.
Using Hans' Beckett sequence as my benchmark, I cobbled together a simple sample engine. I used the host I'd written as a starting point, since that already had a multi-core-capable audio engine and needed only minor tweaks to make it a 64-bit application. I won't bore you with the implementation details, but after Christmas I was able to show Hans the new short strings running on just one system, with a maximum polyphony of about 1800 mono voices.
With the short strings now able to be played from one computer, we started distributing systems to the other composers who were working on the project, and I began playing around with the engine so that we could get the long strings to play back as well. A few weeks later this was working well enough to put into the hands of the other composers, and in the end each studio needed two additional computers: one for the legato long strings, and one for the trills and tremolo long strings. It would have been nice to get all the long strings onto one computer, but at the time we would have needed to install 16GB of memory into those systems, and since that was quite an extra expense we opted to put the money towards an additional system instead and get extra processing resources.
Ultimately, developing a new sampler and deploying it immediately on a movie like Pirates was something of a 'flying by the seat of your pants' experience. Fortunately it worked out, and one advantage of the simple and specific approach of developing something for ourselves was that we could maintain higher reliability. Despite these machines cranking out thousands of voices of polyphony, 24 hours a day, for several months, they never crashed once, which I think was the feature Hans liked best.
As I write this article, we've just about said goodbye to Pirates and are gearing up for the next movie. Fortunately, it won't require anywhere near the technological advancements we just went through, so while Hans and the other composers are working with the systems designed for Pirates, it gives me a chance to sit back and figure out the next step. We still need to finish recording our new sample library, Claudius still has a big job on his hands to finish the editing, and despite this brief glimpse of the potential of the new sounds, we've yet to experience the full flexibility of the original concept.
Although a project like Pirates with a composer like Hans is an extreme example of the application of music technology, I think it clearly shows why there's such a need for 64-bit technology in all computer-based studios. Since the release of Gigasampler back in 1998, sample libraries have been growing in size; and even if you're not going to sample your own orchestra, to get the most out of commercial libraries such as the Vienna Symphonic Library and the East West Quantum Leap Symphonic Orchestra requires lots of computing power and memory. Indeed, at the Winter NAMM show earlier in the year, East West announced the company's forthcoming Play engine, which is going to be available in a 64-bit version to overcome exactly the limitations we've been discussing.
Windows users can start to enjoy the 64-bit revolution right now, using the appropriate versions of XP or Vista, with the majority of audio and MIDI hardware now offering 64-bit driver support. Mac users will hopefully be able to do the same with the release of Mac OS 10.5 'Leopard', in October. So whichever platform you're working with, if you think music making has been good for the first 32 bits, just wait and see what becomes possible with the next 32.