Some music applications will completely fail to take advantage of the multiple cores of a modern CPU - but which ones, and why? We find out, and advise on how you can make best use of however many cores your PC has.
Over the last couple of years, the PC musician has been offered first dual-core processors, then quad-core models, and octo-core machines (currently featuring two quad-core processors) are now available for those with deep enough pockets. Competitive pricing has already ensured a healthy take-up of DAWs based around a quad-core CPU, yet many users haven't cottoned onto the fact that not all software benefits from all these cores. Some existing software may only be able to use two of them, reducing potential performance by a huge 50 percent, while older software may only be able to utilise a single core, reducing potential performance to just 25 percent of the total available. This month PC Musician investigates which audio software works with dual-core, quad-core PCs and beyond, what benefits you're likely to get in practice over a single-core machine, and which software may for ever languish in the doldrums.
In the days when most musicians ran Windows 95, 98 or ME, the question of running multiple processors didn't arise, because none of these operating systems supported more than a single CPU. It was Windows NT and then Windows 2000 that introduced us to the benefits of being able to share the processing load between multiple CPUs: Windows 2000 Professional supported one or two processor chips, while the more expensive Server version supported up to four, and the Advanced Server up to eight. However, at this early stage each processor was a physically separate device, so to be able to (for instance) use twin processors, you needed a specially designed motherboard with two CPU sockets. Many audio developers and interface manufacturers didn't actively support Windows 2000, so most musicians stuck with Windows 98.
In 2001, Microsoft released Windows XP in Home and Professional versions, and once again most consumers who opted for the Home version were limited to a single physical processor, although the Professional version supported two. By this stage many musicians were straining at the leash, wanting to run more and more plug-ins and software instruments, and this Professional version let them do exactly that, using dual-processor motherboards and twin Xeon or Pentium 4 processors.
Multi-processing options really opened up the following year, when Intel introduced first Xeon and then Pentium 4C processor ranges with Hyperthreading technology, which let these CPUs appear to both Windows XP Home and Professional (or Linux 2.4x) as two 'virtual' processors instead of one physical one. They each shared the various internal 'sub-units', including the all-important FPU (Floating Point Unit), but could run two separate processing 'threads' simultaneously.
Intel claimed up to a 30 percent improvement with specially written applications over a standard processor, but as many musicians soon found, having a Hyperthreaded processor didn't necessarily benefit them at all unless they were running several applications simultaneously, since applications like MIDI + Audio sequencers had to be rewritten to take advantage of Hyperthreading. Steinberg's Nuendo 2 was one of the few music apps to support it, but although various others followed, a few (such as Tascam's Gigastudio) needed a major rewrite before they would even run with HT enabled. Nevertheless, my own tests (published in PC Notes June 2004) showed that with optimised audio applications such as Cubase SX2 you could expect a significant drop in CPU overheads where it really mattered, at low latencies of 3ms or under.
The biggest change came in late 2004, when both AMD and Intel seemed to agree that processor clock speeds had reached a ceiling. Intel abandoned plans to release a 4GHz model in their Prescott CPU range, and in 2005 both companies largely switched to releasing dual-core models. Unlike the twin virtual processors of Intel's Hyperthreading range, these featured two separate processing chips mounted inside one physical package. By placing two processor cores into a single piece of silicon, manufacturers could provide significantly faster performance than a single processor, even when under-clocking them and running them at lower voltages, so that they didn't run hotter than the single-core variety.
By late 2006 we had been introduced to quad-core processors, which have now dropped in price and can even be run with Windows XP Home (which is licensed to run a single physical processor, however many cores it has inside). However, if running XP Professional (and the x64 64-bit version), Vista Home Premium, Business, Enterprise or Vista Ultimate you also gain the option of installing two quad-core processors on a suitable motherboard, to provide a total of eight processing cores. Unfortunately, as with so many new hardware advancements, much software has had a long way to catch up before it could take advantage of so many cores.
Determining how much extra performance you'll get from a particular software application with four or more cores will require some benchmark testing, but fortunately it's far easier to determine whether or not a particular application is utilising all the available cores. Windows Task Manager (launch it using the Ctrl-Alt-Delete keyboard shortcut, aka the 'three-fingered salute') has a Performance page that offers a CPU Usage History, and as long as you select the 'One Graph Per CPU' option in the View menu you'll get as many individual graph windows displaying CPU activity as you have cores.
When you're using a PC with multiple processors of whatever type, to gain any significant performance benefit the software you run has to be specially written or adapted with multiple processors in mind. The way multi-processing works is that applications are divided into 'threads' (semi-independent processes that can be run in parallel). Even with a single processor there are huge advantages in this programming approach. Many applications use multiple threads to enable multi-tasking, so that one task can carry on while another is started; and when multiple processors are available, different threads can be allocated to each CPU.
With some processor-intensive programs, such as 3D graphics and CAD software, it's comparatively easy to split off different functions to each processor. However, the situation becomes somewhat more complicated with an application such as a MIDI + Audio sequencer, since all the different tracks are generally being streamed in real time and must remain in sync.
Early schemes used by audio software for sharing tasks between multiple processors were fairly crude; they tended to devote each CPU to a specific duty, so that (for instance) audio mixing and effects were handled in one thread, MIDI processing in another, and user interface responses in yet another. When a MIDI + Audio sequencer is run with several identical processors under such a scheme, the entire audio-processing workload is normally handled by one processor, with any remaining tasks left to the others. Since audio processing is by far the most significant overhead for any music application, this approach resulted in a typical overall performance improvement of just 20 to 30 percent for a dual-core processor over a single-core processor running at the same clock speed.
To gain further improvement, you need to split the audio processing in some way between the various CPUs, so that it can be processed in parallel. This means added code and complexity, and rather explains why some audio software really benefits from four or more cores, while some doesn't. Steinberg introduced their 'Advanced Multiple Processing Support' on Cubase VST version 5, splitting the audio processing between the processors and giving much larger performance boosts of 50 to 60 percent. Many other audio developers (although not all) followed with similar improvements, and although there are no guarantees, most applications optimised in this way should also subsequently benefit from quad-core and octo-core PCs.
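To see roughly why the two approaches give such different results, here is a minimal Python sketch. It is an illustration only: the workload shares (80 percent audio, 70 percent shareable) are hypothetical figures chosen for the example, and the second function uses the textbook Amdahl's law formula rather than anything a developer has told me about their own engine.

```python
def dedicated_duty_speedup(audio_share: float) -> float:
    """Two cores with fixed duties: one does all the audio, the other the rest.

    The job can only finish as fast as the busier of the two cores.
    """
    if not 0.0 < audio_share < 1.0:
        raise ValueError("audio_share must be between 0 and 1")
    return 1.0 / max(audio_share, 1.0 - audio_share)


def amdahl_speedup(parallel_share: float, cores: int) -> float:
    """Amdahl's law: only parallel_share of the work is spread evenly over the cores."""
    if not 0.0 <= parallel_share <= 1.0:
        raise ValueError("parallel_share must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / ((1.0 - parallel_share) + parallel_share / cores)


if __name__ == "__main__":
    # Hypothetical workload: audio processing is 80 percent of the total.
    print(f"fixed duties, dual-core: {dedicated_duty_speedup(0.8):.2f}x")  # about 1.25x
    # Hypothetical workload: 70 percent of the total can be shared between cores.
    print(f"audio split, dual-core:  {amdahl_speedup(0.7, 2):.2f}x")  # about 1.54x
    print(f"audio split, quad-core:  {amdahl_speedup(0.7, 4):.2f}x")  # about 2.11x
```

With these assumed shares, the fixed-duty scheme lands in the 20 to 30 percent range and the split-audio scheme in the 50 to 60 percent range for a dual-core machine. Real applications will differ, since synchronisation overheads are ignored here.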
Despite the possibilities, even today many mainstream office applications and games have not really been optimised for multiple processors, and some developers have been resistant to rewriting their applications to support more than two cores, since debugging an application that can run several threads in parallel is far harder than debugging one in which everything happens in a single queue of tasks. Of those applications that have been optimised for multiple processors, most can still only take advantage of two processors, so you'll only get the best performance from them on a dual-core or twin single-core computer. If, for instance, you run a game that can only take advantage of two cores on a quad-core machine, it will only be able to access up to 50 percent of the available processing power.
With quad-core processors and beyond, applications that may benefit include 3D graphics modelling, ray-tracing, and rendering, plus video-encoding tasks, image processing and some scientific tasks. You're always likely to achieve good performance when running several different applications simultaneously, since each will get a good share of the pie, but with MIDI + Audio applications you want a single application to have all its tasks shared out as fairly as possible between the available cores.
While it's possible to specifically assign each Windows programming task to a separate processor, you can also let Windows handle its CPU resources dynamically across a single processor by giving each task a specific priority. The lowest priority is nearly always given to the user interface, which is why screen updates can get sluggish on a single-core machine when you run lots of real-time software plug-ins.
Conversely, any PC with multiple cores is always likely to remain more responsive even when most of the cores are stressed, because the user interface is still happily ticking away on another one. Even if you're running elderly applications that are not multi-threaded, you can still benefit from a dual-, quad- or octo-core machine if you're running several such applications simultaneously, as Windows will allocate each one to a different core.
Developers told me that although most instruments and plug-ins run as several threads, they have no control over how these are distributed among the available cores. This is totally managed by the host application, and according to all the tests I carried out while researching this feature, most audio applications treat each mono/stereo audio track (or soft-synth/sampler track), plus associated plug-in effects, as a single task, and allocate it to a single processor core. You can easily confirm this for your own applications using Task Manager (see the 'Checking Your Tasks' box) and systematically adding a series of demanding plug-ins to the same audio track. I suggest a convolution reverb with the longest Impulse Response you can find.
If you're running multiple cores (whether in the same chip or spread across multiple processors in discrete packages) the above has certain implications. Let's say you have a physically-modelled synth that consumes a lot of CPU resources. Since our synth track is a single task, on a quad-core processor it can only consume a maximum of 25 percent of the overall processing power available — ie. the maximum available from a single core. So, even though your sequencer's 'CPU meter' may indicate 100 percent loading in this situation, and it's possible for your audio application to glitch and stop playback because one of the four cores has run out of steam, you still have 75 percent of your CPU resources available to run other synths and plug-ins, which should automatically get allocated to the remaining cores. Confusing, isn't it? So if you find yourself 'maxing out' a single core by, for instance, running lots of instruments on different tracks, all linked to a single multitimbral software sampler, launch another instance of it and run some of your instruments from that one instead.
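The effect of splitting a heavy instrument can be sketched in a few lines of Python. This is a toy model, not how any particular sequencer works: the track names and load figures are made up, loads are expressed as a percentage of one core, and the 'heaviest track goes to the least-busy core' rule is simply an assumed stand-in for whatever the host really does.

```python
def allocate_tracks(track_loads, cores):
    """Place each track, heaviest first, on whichever core is currently least loaded.

    A track is never split: its whole load lands on a single core.
    Returns the per-core loads and a mapping of track name to core index.
    """
    core_loads = [0] * cores
    placement = {}
    heaviest_first = sorted(track_loads.items(), key=lambda item: item[1], reverse=True)
    for name, load in heaviest_first:
        target = min(range(cores), key=lambda i: core_loads[i])
        core_loads[target] += load
        placement[name] = target
    return core_loads, placement


def report(label, track_loads, cores=4):
    core_loads, _ = allocate_tracks(track_loads, cores)
    overall = sum(core_loads) / cores
    status = "GLITCH" if max(core_loads) > 100 else "ok"
    print(f"{label}: cores={core_loads} overall={overall:.1f}% -> {status}")


if __name__ == "__main__":
    # Hypothetical project: one multitimbral sampler carrying every instrument.
    one_instance = {"sampler (all parts)": 120, "reverb": 30, "drums": 20}
    # Same instruments shared between two sampler instances on separate tracks.
    two_instances = {"sampler A": 60, "sampler B": 60, "reverb": 30, "drums": 20}
    report("one instance ", one_instance)
    report("two instances", two_instances)
```

In the first case one core is asked for 120 percent and the audio would glitch, even though the PC as a whole is less than half loaded. In the second, exactly the same total work fits comfortably.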
When measuring multi-core performance of audio applications, it's therefore important to choose a suitable benchmark test that will allow the applications the best chance of spreading the processor load as evenly as possible. I carefully tested single-, dual- and quad-core PCs, all having identical clock speeds, with Cubase SX running the Thonex and Blofelds DSP40 tests. As you can see from the graphs, while the older Thonex test only displays a 20 to 30 percent improvement between the dual-core and quad-core results, Blofelds showed much better scaling. A quorum of DAW builders seem to have agreed that Vin Curigliano's DAWbench suite is currently the best test available to measure differences in multi-core system performance, since it starts with a real-world song and then ignores the application's CPU meter in favour of adding more and more plug-ins and/or soft synths across 40 tracks until you hear audio glitching, which largely mirrors what many musicians do in the real world.
The original DAWbench Blofelds DSP40 test is for those Cubase/Nuendo users who mainly record audio tracks and use lots of plug-ins (there's now also a new SONARbench DSP test that uses the same techniques), while the L-Factor II test is for Cubase/Nuendo owners who instead run lots of software synths. Such 'on the edge' tests are also useful in comparing audio driver performance, as well as spotting operating system issues such as jerky graphic scrolling under stress, and the extra overheads imposed by the Windows XP and Vista Aero graphics over the Windows 'classic' look.
What tasks you're going to perform with your audio application may also affect the ideal number of cores, and thus which is the 'best' PC for the job. For drummers and vocalists monitoring their own live performances on headphones, the Holy Grail is to run a system that runs with barely discernible latency. Many would be happy using a buffer size of 64 samples, which would mean a total real-world latency for audio monitoring with plug-ins of just under 5ms (at a sample rate of 44.1kHz), or around 3.5ms for playing soft synths. If you still find this unacceptably high and prefer not to rely on 'zero latency' monitoring solutions (which bypass any plug-in effects), 32-sample buffers would offer total audio monitoring latency of around 3.5ms (around 2.7ms for soft synths), again at a 44.1kHz sample rate.
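The time represented by a single buffer is simple arithmetic: the number of samples divided by the sample rate. The short Python sketch below works this out for the buffer sizes mentioned. Note that it only gives the duration of one buffer; the total monitoring figures quoted above are larger, presumably because they also include separate input and output buffering plus converter and driver overheads, none of which is modelled here.

```python
def buffer_duration_ms(buffer_samples: int, sample_rate_hz: int = 44100) -> float:
    """Duration of one audio buffer in milliseconds."""
    if buffer_samples <= 0 or sample_rate_hz <= 0:
        raise ValueError("buffer size and sample rate must be positive")
    return 1000.0 * buffer_samples / sample_rate_hz


if __name__ == "__main__":
    for size in (32, 64, 128):
        print(f"{size:>3} samples at 44.1kHz = {buffer_duration_ms(size):.2f}ms per buffer")
```

Halving the buffer size halves the time available to finish each block of processing, which is why low-latency settings are so much harder on the CPU.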
Blofelds DSP40 tests by a range of DAW builders who have access to lots of PCs based around different processors have shown that at really low buffer sizes, such as 32 samples, a single quad-core processor will always outperform a single dual-core processor or (more interestingly) a system featuring two dual-core processors, and sometimes even a dual quad-core system. In some tests at these really low latencies, when stressed with lots of plug-ins and instruments, the single quad-core machine was the only one to complete them successfully, making it the current king for low-latency performance.
If you're happy to use a higher buffer size, of 128 samples or above (audio monitoring latency of around 8ms), you'll probably be able to run significantly more plug-ins and soft synths using two quad-core processors than one. Those involved in lots of recording work who want 'real time' monitoring may thus prefer a single quad-core, while others who rely mainly on samples and soft synths may get even more mileage from a twin quad-core system.
This is the biggie: it's all very well having a hugely powerful quad-core or octo-core PC, but not a lot of use if your software only uses two or four cores from those available, or makes a poor job of sharing resources between them. The secret is for the application to balance requirements across the available cores, so that you don't get any audio glitches as a result of one or more cores running out of juice while there's some still available from the others.
For the reasons mentioned above, stereo audio editors may not take full advantage of a multi-core PC — something I soon confirmed with Steinberg's Wavelab 6, which only used one core for DSP processing during playback or audio rendering. Its author Philippe Goutier says that a second core will be used for disk access and the user interface, which does at least mean that the application will always remain responsive to new commands, but he hopes to improve core-sharing now that so many musicians have multi-core PCs.
The vast majority of stand-alone soft synths also seem to mostly use a single core, but as soon as you load the VSTi or DXi version into a host VSTi or DXi application, this host should distribute the various plug-ins and soft synths across the available cores to make best use of resources. Fortunately, most multitrack audio applications can distribute the combined load from all your tracks between as many cores as they find, although it's perhaps inevitable that since many of the latest versions were released long before quad-core and octo-core PCs were in regular use, some don't manage it quite as efficiently as others. Even now some developers don't have octo-core test systems.
Before coming to any conclusions about the multi-core performance of your particular sequencing package, make sure you have any appropriate parameters set correctly. For instance, in the case of Cubase/Nuendo you'll need to tick the 'Multi Processing' box in the Advanced Options area of the Device Setup dialogue, while for Sonar the tick-box labelled 'Use Multiprocessing engine' is the one to check. With these settings deactivated you'll only be using one of your cores, and performance will plummet. In Reaper, most multi-core users will need to tick the 'FX render-ahead' option in the Audio Buffering dialogue to enable the full benefits of native plug-in multi-processing. Universal Audio UAD1 owners should leave this option un-ticked, however, because of current UA driver issues.
Reaper's Justin Frankel told me that he routinely does a lot of his development on a dual quad-core Xeon PC, so it's hardly surprising that the default Reaper settings work well with up to eight-core machines, typically offering over 95 percent utilisation of all eight cores. Reaper mostly uses 'Anticipatory FX processing' that runs at irregular intervals, often out of order, and slightly ahead of time. Apparently, there are very few times when the cores need to synchronise with each other, and using this scheme he can let them all crank away using nearly all of the available CPU power. Exceptions include record input monitoring, and apparently when running UAD1 DSP cards, which both prefer a more classic 'Synchronous FX multi-processing' scheme.
Steinberg's Cubase SX, Cubase 4 and Nuendo all work decently on quad-core systems, scaling up well from single to dual-core and quad-core PCs. However, Cubase 4 and Nuendo 4 don't currently provide all the benefits they could at low latency with a dual quad-core system. Compared with the potential doubling of plug-in numbers from dual to quad, when you move to 'octo' you may only be able to run about 40 percent more plug-ins down to buffer sizes of 128 samples, while below this you may even get worse performance than a quad-core system.
Steinberg developers have already acknowledged the problem, which is apparently due to "a serialisation of the ASIO driver, which eats up to 40 percent of the processing time. Together with the other synchronisation delays, only 25 to 30 percent of the 1.5-millisecond time-slice can be used for processing. This is not very efficient." Steinberg have promised to address the issue in a Nuendo 4 maintenance update, and have hinted that it may also result in changes to the ASIO specification.
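To put those percentages into perspective, this small Python sketch converts them into actual processing time per time-slice. The 1.5ms slice and the 25 to 30 percent range come from the quote above; the function name is my own, and nothing here models how Cubase or Nuendo schedules its work.

```python
def usable_processing_ms(slice_ms: float, usable_fraction: float) -> float:
    """Time left for real plug-in processing within one time-slice."""
    if slice_ms <= 0 or not 0.0 <= usable_fraction <= 1.0:
        raise ValueError("slice must be positive and fraction between 0 and 1")
    return slice_ms * usable_fraction


if __name__ == "__main__":
    low = usable_processing_ms(1.5, 0.25)
    high = usable_processing_ms(1.5, 0.30)
    print(f"usable time per 1.5ms slice: {low:.3f}ms to {high:.3f}ms")
```

In other words, well under half a millisecond of every slice is left for the plug-ins themselves, with the rest lost to driver serialisation and synchronisation.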
Cakewalk's Sonar does seem to scale well, sometimes giving a better percentage improvement when moving from a quad-core to an octo-core PC than the current version of Nuendo/Cubase 4, but the jury still seems to be out on whether choosing ASIO or WDM/KS drivers gives better results; with some systems ASIO is a clear winner, while in others WDM/KS drivers move significantly ahead.
Digidesign have a reputation for being slow but thorough when testing out new hardware to add to their 'approved list', and as I write this in early November 2007 their web site states that Intel Core 2 Quad processors and Intel Xeon quad-core have not been tested by Digidesign on Windows for any Pro Tools system.
Nevertheless, Pro Tools HD/TDM users started posting recommendations for rock-solid systems featuring twin dual-core Opteron processors (four CPU cores in all) in mid-2006, and there are now loads of Pro Tools LE users successfully running both quad-core and even a few octo-core PCs in advance of any official pronouncements (there are lots of specific recommendations on both quad-core and octo-core PC components in a vast 126-page thread on the Digi User Conference at http://duc.digidesign.com/showflat.php?Cat=&Number=988224). Despite the lack of official 'qualification', all Pro Tools systems seem to scale well on quad-cores, happily running all four cores up to 100 percent utilisation, and many users are very pleased with their quad-core 'native' CPU performance.
Like various other audio applications, even the latest Mac version of Logic Audio doesn't yet fully benefit from having eight processor cores at its disposal, but for die-hard PC users of Logic the situation is rather more serious: Apple discontinued development and support for those using Logic on the PC back in 2002, so the most recent version (5.5.1) is now some five years old. Although it's a multi-threaded application, Logic 5.5.1 for Windows is not really optimised for multiple processors, so only one of the cores is likely to get much of a workout. However, there's a partial workaround, using the I/O Helper plug-in available from Logic version 5.2 onwards, which can force any plug-ins on a track with it inserted to run on a second core, so that you can use lots more plug-ins/instruments overall (there's a more detailed description on Universal Audio's web site at www.uaudio.com/webzine/2003/may/index5.html). Logic Audio 5.5.1 also has a problem if more than 1GB of system RAM is installed (see http://community.sonikmatter.com/forums/lofiversion/index.php/t8032.html for some suggestions on this one), and also has problems running some VST plug-ins. It's unlikely to benefit from a quad-core processor at all, and I wouldn't recommend running it on a new quad-core PC, so its shelf-life is looking increasingly limited.
Overall, getting the best out of a multi-core PC generally means a little detective work from the user. You need to make sure you have the most appropriate audio application settings (which might be different if you run DSP cards), and you also need to be cautious when running heavy-duty synths or plug-ins that might consume one of your cores in a single gulp. Keeping an occasional eye on the Windows Task Manager may also help, since the CPU meters provided by most sequencers are becoming rather less useful now that they are monitoring so many individual cores.