Theoretical figures and manufacturers' measurements may suggest that the latency of your soft synths is negligible — but that's no use if their timing is all over the place when you actually play them. Our real-world results give you the true picture.
Way back in SOS November 2000 I discussed the many issues involved in moving from hardware synths to the software variety, including audio quality, sampler and soft-synth aliasing, latency, and latency jitter. The beauty of VST and DX Instruments is that their playback timing from a MIDI + Audio sequencer is always guaranteed to sample accuracy, since the waveforms are generated slightly ahead of time and output just like any other audio track. However, judging by subsequent discussion in the SOS Forum, some people found MIDI latency jitter a difficult concept to get their heads around. In practical terms it means not only that some software applications exhibit a time delay between pressing a MIDI note and hearing the soft synth output, but also that this delay may vary quite a bit. If, for example, you set a 12ms buffer in Cubase 5.1, your MIDI-to-audio latency might theoretically vary from 12ms to 24ms while playing in real time or capturing a performance.
I followed up this feature with one in SOS March 2001, which presented an overview of the various causes of MIDI and audio timing problems, ranging from audio clock jitter to short clicks and pops, sequencer resolution, MIDI bottlenecks, and drifting between audio and MIDI tracks over several minutes, along with some suggested tests to identify such problems on your own PC music system.
Since I wrote these features I've noticed that plenty of SOS readers still seem to be reporting sloppy MIDI timing both on standard MIDI recordings and when playing soft synths, so I thought it about time I came up with some ways to move beyond theory and perform some rather more practical real-world tests. What I wanted to do was physically measure the time delay between pressing a key on an external synth keyboard, and hearing the audio output emerging from the loudspeakers, for a variety of hardware and software synths.
My aim in setting up the test rig was to compare the timing of the final audio output at the end of the signal chain with that of the MIDI signal generated at the start of the chain. I decided against trying to source the latter from the terminals in my MIDI keyboard, and instead soldered up a modified MIDI adaptor lead, as outlined in the excellent Hinton Instruments Professional MIDI Guide. This effectively allowed me to tap into the MIDI output signal across pins two and five inside the MIDI plug, and monitor this directly. The only part of the chain that this doesn't eliminate is the keyboard scanning delay, which is typically 1ms, although some early synths such as the Prophet 5 only scanned the keyboard at around 200 times per second, giving a latency jitter of up to 5ms depending on when you pressed the key in each cycle.
I decided to use my soundcard to record the MIDI waveforms, so as a safety precaution I also connected a 5.1kΩ resistor in series with pin five to prevent any soundcard damage in the event of a MIDI fault, and then soldered this up to an audio cable terminated by an in-line jack socket. I attached this new assembly between my keyboard's MIDI output and the original lead, then plugged its flying audio lead into one channel of my soundcard, and the audio signal whose timing I wished to investigate into the other. By recording a few note-presses, I ended up with the MIDI Note On trigger data on one channel and the corresponding audio output on the other. The difference in time between the two gave a fairly accurate value for MIDI-to-audio latency that included the vast majority of the signal chain, and showed up any delays caused by the MIDI interface, its connection and drivers inside the PC, the Windows operating system, processing and buffering delays due to the soft synth, and finally the D-A conversion time. Since these delays will always be part of everybody's chain, it seemed important to include all of them.
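The comparison between the two channels can be sketched in a few lines of Python. This is only an illustration, not the procedure used for the measurements in this article: it assumes the stereo recording has already been loaded as two lists of floating-point samples, and that it has been trimmed so that the first pulse on the MIDI channel belongs to the Note On message. The function names and the 0.1 threshold are hypothetical.

```python
def first_onset(samples, threshold=0.1):
    """Return the index of the first sample whose magnitude reaches threshold."""
    for index, value in enumerate(samples):
        if abs(value) >= threshold:
            return index
    return None


def midi_to_audio_latency_ms(midi_channel, audio_channel,
                             sample_rate=44100, threshold=0.1):
    """Time from the start of the MIDI pulses to the start of the audio, in ms."""
    midi_start = first_onset(midi_channel, threshold)
    audio_start = first_onset(audio_channel, threshold)
    if midi_start is None or audio_start is None:
        raise ValueError("no onset found on one of the channels")
    return (audio_start - midi_start) / sample_rate * 1000.0


# Synthetic stand-in for a real recording: the MIDI pulse starts at
# sample 1000 and the synth note starts 441 samples later.
midi_channel = [0.0] * 1000 + [-0.8] * 40 + [0.0] * 1000
audio_channel = [0.0] * 1441 + [0.5, -0.5] * 300

latency = midi_to_audio_latency_ms(midi_channel, audio_channel)
print(f"{latency:.1f}ms")  # 441 samples at 44.1kHz
```

A simple threshold like this only works on a clean recording with a fast-rising synth waveform; on noisy material you would need to pick the start points by eye.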
If you want to try some measurements for yourself, only connect up this modified cable to your soundcard during the test period, since it may create a ground loop when plugged into some keyboards. The MIDI specification is carefully designed to avoid these by insisting that each MIDI input is opto-isolated, so that the metal chassis connections of two MIDI devices are never directly connected. Even the Gameport MIDI Adaptor cables available for many consumer soundcards should incorporate an opto-isolator on the MIDI input, and should have no ground connection on pin 2 of that input, although some violate this MIDI hardware specification. If you ever plug in a PC Gameport MIDI Adaptor cable and get a hum, demand a replacement!
In all the following measurements, I attempted to exclude experimental errors by recording about half a dozen key presses and averaging their results, although if I got any unusual figures during these I took considerably more. Sometimes most of the results were about the same, with an occasional increase to a higher figure, while others simply varied over a repeatable range, with the variation presumably due to jitter. I used my Echo Mia for the recordings, mostly at a sample rate of 44.1kHz, and even with its 22kHz audio bandwidth it still proved possible to see the 31.25kbaud MIDI data fairly clearly.
Most MIDI devices emit continuous Active Sensing messages as a safety measure to detect malfunctioning MIDI cables or devices, so you'll probably see regular negative-going 'blips' in your soundcard MIDI recordings. Thankfully, these are easy to distinguish from most other messages such as Note On commands, which appear as clumps of such pulses that are noticeably wider overall and last about 1ms. Since you already know a ballpark value for your latency, you'll soon spot the Note On command associated with each soft-synth audio note, and once you've identified the first one they're easy to spot in future.
All my measurements were taken from the start of the MIDI Note On message to the start of the synth note, since this seemed to me the fairest way. Strictly speaking, no synth can start reacting to any MIDI data until the end of the message, which in the case of an isolated MIDI Note On message is some 1ms later. So, if you want a strict 'reaction time', then subtract 1ms from all my results. However, given that a synth keyboard will probably take on average about 1ms before the MIDI data starts being sent, due to its internal keyboard scanning delay, I took the view that these two would pretty well cancel each other out, so that the figures I report should fairly accurately reflect the time it takes from the keyboard contact being closed to the commencement of the synth note.
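The 'some 1ms' figure for a Note On message follows from the MIDI serial format: 31,250 bits per second, with each byte framed by one start bit and one stop bit. The sketch below works this out; the function names are hypothetical and the 10.0ms reading is purely an illustrative value, not one of the measurements.

```python
MIDI_BAUD = 31250     # bits per second on a MIDI cable
BITS_PER_BYTE = 10    # 1 start bit + 8 data bits + 1 stop bit


def midi_message_ms(n_bytes):
    """Transmission time in ms for a MIDI message of n_bytes bytes."""
    return n_bytes * BITS_PER_BYTE / MIDI_BAUD * 1000.0


def strict_reaction_ms(measured_ms, message_bytes=3):
    """Measured latency minus the time taken to receive the message itself."""
    return measured_ms - midi_message_ms(message_bytes)


# A full Note On is three bytes: status, note number and velocity.
note_on_ms = midi_message_ms(3)
print(f"Note On takes {note_on_ms:.2f}ms")  # 0.96ms

# Illustrative reading only: convert a 10.0ms measurement to a strict reaction time.
print(f"{strict_reaction_ms(10.0):.2f}ms")  # 9.04ms
```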
The best way to generate an easily measured start point on the synth sound is to start with a waveform with a suitably fast rise time, such as a square or pulse wave, turn the cutoff frequency of any filters to maximum and resonance to minimum, and set any envelope generators to their fastest attack time. Judging by the hundreds of results I measured, and their repeatability, I'm confident that they are accurate to within 0.1ms.
It seemed sensible to start by measuring the delay between MIDI Note On data and the audio output of a few hardware synths to provide some useful comparisons with their software counterparts, so I started by plumbing in my old Korg M1. Like most hardware synths, this provides a consistent and fairly low delay that I measured at 3.2ms, which seems very good considering its 16-note polyphony. Another industry standard is Roland's JV1080, which turned in a rock-solid but slightly higher timing of 4.4ms, no doubt due to its 64-note polyphony.
These figures may surprise some of you who expect hardware to respond instantaneously, but in most cases digital synths are themselves driven by microprocessors, and although their operating systems are obviously fine-tuned for real-time playing, it still takes a finite time to process everything.
Having established the response time of some hardware MIDI synths, the next obvious step was to incorporate a PC MIDI interface into the chain. I started by plugging in the test lead between my keyboard and the MIDI input socket of my Yamaha SW1000XG. This is effectively an MU100R hardware synth with added audio channels, but with an important difference — it uses its P21 custom gate array processor to handle MIDI processing, along with audio data transfers across the PCI buss, digital clock information, and many other functions.
This of course involves the PC's operating system, in this case Windows 98SE, which is almost bound to introduce some timing uncertainties, especially since I was connecting the interface and synth through software — in this case the excellent XGedit95. Sure enough, most of the time I measured an excellent latency that varied between 3.4ms and 3.9ms, which is almost exactly the same as a typical hardware synth, but just occasionally this rocketed anywhere up to 11ms, presumably when the gate array (or Windows) was carrying out other duties.
I then switched to my Midiman Midisport 8x8/s interface (using its serial connection rather than USB), and took some readings after connecting one of its MIDI input drivers to the SW1000XG in exactly the same way. These were mostly between 4.3ms and 4.9ms — around 1ms higher than using the direct MIDI input of the SW1000XG — but although they also went higher on occasion, the highest reading I recorded was only 7.7ms. These figures already show the variation you may experience with different PC MIDI interfaces. It's not the fault of the interface, but the timing uncertainties of the additional path through the computer.
It's often recommended that musicians use PCI-based MIDI interfaces (or the MIDI ports on PCI soundcards) in preference to serial, parallel or USB devices, since they may be faster (see box below), and my admittedly limited set of measurements seems to support this. However, since my Midisport 8x8/s has both serial and USB ports, there was also an ideal opportunity to directly measure the performance difference between these two. To test this aspect out, I uninstalled its version 1.05 serial port drivers (the most recent to still include the serial port option), and replaced them with the latest version 1.08 USB drivers before taking exactly the same set of measurements. This time most readings were in the range 7.0ms to 9.1ms — about 3ms higher than they were with exactly the same interface using the serial port. I suspect this is due to the different architectures, and it may also depend on how often each interface is polled.
Very occasionally, moreover, the value jumped as high as 13.2ms, which seems to prove what many people have already said: a USB interface is more prone to jitter, and may exhibit twice as much as interfaces using other connections. It's difficult to make any hard and fast conclusions, especially as the figures also reflect my PC setup — any of you with more background tasks running may experience even larger swings, depending on when and how often these strike, and whether or not they have been given a higher priority than the MIDI drivers.
In the early days of the PC, nearly all dedicated MIDI interfaces used the serial port, the parallel port or a dedicated ISA expansion card. The most popular ISA card was Roland's MPU401, which became a standard specification followed by many other manufacturers, even when incorporated into other ISA and (later) PCI soundcards. However, serial and parallel-port devices were also extremely popular, particularly since with suitable drivers they could be plugged into either a Mac or a PC.
Despite the comparatively low speeds of early ports, serial and parallel interfaces mostly provided excellent performance, but were often extremely tricky to install and set up, especially in a crowded PC. Finding a dedicated IRQ could be difficult, as could setting up the most suitable mode for the parallel port inside the BIOS, and you could experience conflicts with dongles or even your printer when plugged into the same port.
Given these teething difficulties, it's not surprising that manufacturers jumped at the chance to embrace the USB standard in 1999, since this allowed up to 127 devices to be hot-plugged into a Mac or PC, with automatic configuration and no IRQ, BIOS or dongle conflicts. However, as my measurements indicate, and as many other musicians have also discovered, USB can add a small amount of latency — and, more importantly, a larger element of jitter.
This is because, unlike USB audio, which uses isochronous timing to guarantee its delivery time unless a disturbance is so large that it runs out of buffer space, USB MIDI uses asynchronous timing: data is delivered as and when it gets through, and its arrival depends a lot more on other factors. Once you try to send both MIDI and audio data via USB, they have the potential for huge conflicts, and I would personally only recommend this approach when using a single product that takes care of both data streams. At least then its drivers will be carefully written to juggle the demands of both MIDI and audio, to ensure that both emerge with the minimum of conflict.
My tests seem to prove that the humble serial or parallel-port MIDI interface still has a place for the musician who wants tight timing, but sadly only when using Windows 98, 98SE or ME. This is because it's very difficult to find serial or parallel-port drivers for the NT/2000/XP platform, since unlike the Windows 95/98 series, it maintains strict control over I/O ports, and will generate an exception error if any attempt is made to access a port without the appropriate privileges.
Microsoft now recommend using USB or FireWire I/O on new PCs, which is why serial and parallel-port dongles are fast disappearing from music applications, to be replaced by USB ones.
The next stage was to test a soft synth, to add the uncertainties of soundcard audio playback to the signal chain. I started off by examining the results using MME drivers with Native Instruments' Pro 52. I chose this because it's arguably one of the most popular soft synths available in stand-alone, VSTi, and now DXi formats, and also because its Play Ahead slider latency adjustment is used by the entire NI range for the PC, including Absynth, B4, Kontakt and Reaktor. I normally quote the lowest MME Pro 52 setting I can manage when reviewing soundcards as an indication of their MME driver quality, but have sometimes been suspicious of the displayed values. Now was my chance to measure them properly for the first time, including the MIDI interface component.
If you're trying out these tests for yourself, do take care to check the playback speed of your MIDI/audio recording to make sure the pitch of the notes matches those originally played in on the soft synth. I discovered to my cost after quite a few tests that if Pro 52 was already using my Echo Mia outputs at 44.1kHz, and I then selected its inputs in Wavelab 4.0 to make the MIDI/audio recording, any recordings I made were also at 44.1kHz, regardless of the recording sample rate I selected in Wavelab. For instance, a recording I made at 96kHz showed up at 96kHz on playback, but listening tests confirmed that it was actually a 44.1kHz one played back at more than double speed. This is not only very confusing, but will totally mess up your timing measurements!
I started with the lowest 10ms Play Ahead setting, which actually measured between 8.9ms and 13.7ms when fed from the SW1000XG MIDI input. A check at 50ms Play Ahead provided figures that showed an increase of almost exactly 40ms, so it seems that once you subtract the 3.5 to 3.9ms of the SW1000XG MIDI input component you are indeed left with a top MME latency of about 10ms, but with a total jitter of about 5ms.
NI told me that to keep MIDI jitter low when using MME drivers, they prepare buffers of about 5ms (at 44.1kHz) and tell the soundcard when a buffer is ready. As soon as the soundcard has finished playing a buffer, it sends back a message. If there are more messages than buffers the buffer size is increased slightly, and if fewer, they are made slightly smaller. The Play Ahead time simply indicates the number of 5ms buffers that you need to ensure that the soundcard never runs out of audio.
However, when I measured Pro 52 running on my SW1000XG I got very different results. The lowest reported Play Ahead value I could manage without breakup was 20ms, but this produced measured values between 42.3ms and 46.0ms. Further tests with a Play Ahead setting of 40ms produced results between 61.3ms and 66.1ms, which confirmed my suspicions — all the measured values were about 20ms higher than reported. The reason for this is that NI's mechanism can only indicate the size of the software buffer under its control, and in the case of the SW1000XG there is an additional 1K 'flip' hardware buffer on the card, which at 44.1kHz adds 23.2ms latency. Designs like this cause a lot of confusion, leaving many musicians under the impression that they are running at considerably lower latencies than they are, although thankfully this soundcard design approach is now rare.
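To see how the hidden hardware buffer accounts for the discrepancy, the following sketch converts the 1K buffer to milliseconds and compares each reported Play Ahead value with the measured range quoted above. The helper name and data layout are hypothetical; the figures are taken directly from the text.

```python
SAMPLE_RATE = 44100


def samples_to_ms(samples, sample_rate=SAMPLE_RATE):
    """Convert a buffer length in samples to milliseconds."""
    return samples / sample_rate * 1000.0


# The 1K 'flip' buffer on the card, which the software cannot see.
hidden_buffer_ms = samples_to_ms(1024)
print(f"1K hardware buffer: {hidden_buffer_ms:.1f}ms")

# (reported Play Ahead, lowest measured, highest measured), all in ms.
readings = [(20.0, 42.3, 46.0), (40.0, 61.3, 66.1)]
for reported, low, high in readings:
    print(f"Play Ahead {reported:.0f}ms: measured "
          f"{low - reported:.1f}ms to {high - reported:.1f}ms above reported")
```

The excess comes out a little larger than the hardware buffer alone, which is to be expected given that the measured figures also include the MIDI interface component.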
It does however show that you can never totally rely on the latency values reported by software. Many SW1000XG owners have reported figures of 12ms latency when running its WDM drivers inside Sonar, even though the hardware prevents this from dropping below about 28ms. This is because the drivers aren't correctly reporting back the total buffer size.
I was particularly interested in measuring GigaStudio's performance, since its developers have consistently maintained that their lower-level interface to Windows provides timing on a par with MIDI hardware. GSIF apparently talks to the soundcard at kernel level, and also allows software routines to remain in the kernel, unlike those of ASIO and WDM, which access the kernel via the Windows I/O Manager. GSIF drivers normally use three buffers of 128 samples, resulting in a theoretical MIDI to audio latency that varies between 5.8ms and 8.7ms at 44.1kHz, depending when the MIDI data is received.
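The theoretical range quoted for GSIF is consistent with a simple model in which a note waits for between two and three buffer periods, depending on where in the cycle the MIDI data arrives. The sketch below assumes exactly that model (the lower bound of one buffer fewer than the total is an assumption that fits the quoted figures, not a documented property of GSIF), and the function name is hypothetical.

```python
def buffer_latency_range_ms(buffer_samples, n_buffers, sample_rate=44100):
    """Return (lowest, highest) theoretical latency in ms.

    Assumes the delay lies between (n_buffers - 1) and n_buffers
    buffer periods, depending on when the MIDI data arrives.
    """
    one_buffer_ms = buffer_samples / sample_rate * 1000.0
    return (n_buffers - 1) * one_buffer_ms, n_buffers * one_buffer_ms


low, high = buffer_latency_range_ms(128, 3)
print(f"{low:.1f}ms to {high:.1f}ms")  # 5.8ms to 8.7ms
```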
Using the GSIF drivers of my Echo Mia soundcard, I measured latencies between 9.8ms and 12.9ms using the SW1000XG MIDI input, between 10.6ms and 13.7ms using the serial port drivers of my Midisport 8x8/s, and between 15.3ms and 17.7ms using the USB drivers of my Midisport 8x8/s. Unlike the measurements taken using XGedit95, there were no occasional higher figures even when using a USB-based MIDI interface, which seems to prove that the GSIF drivers do indeed exhibit a low jitter of 3ms. Playing GigaStudio will therefore feel comparatively 'tight', even though the overall latency is somewhat higher than a hardware synth (on my system anyway).
A huge number of musicians now rely on ASIO drivers to provide them with low enough latency to monitor their audio recordings, as well as to play soft synths, both in 'real time'. To measure their performance I once again used my Echo Mia and the stand-alone version of NI's Pro 52, but using the ASIO drivers option with a 128-sample buffer size at 44.1kHz, which equates to a theoretical latency of 2.9ms. This time I also recorded the real-world audio latency (as described in the box), which measured 5.0ms. When playing Pro 52 I measured a range of readings between 7.8ms and 9.7ms including the serial MIDI interface component, which is an excellent result. As a cross-check I repeated the measurements when running Pro 52 as a VST Instrument inside Cubase 5.1 with the same ASIO driver settings, and they proved to be identical.
Compared with the GSIF measurements using the same interface, running Pro 52 using ASIO drivers produced better figures for both latency and jitter, whether stand-alone or as a VST Instrument. However, whereas GigaStudio160 maintains its performance when you ramp up the polyphony as far as 160 notes (assuming your PC hard drive can manage this), most musicians already know that as you increase the load with ASIO drivers you often have to increase the buffer size to avoid glitching.
With this in mind I took some more measurements, first with a buffer size of 256 samples, giving a theoretical latency of 5.8ms (and with my Echo Mia a measured real-world latency of 8.0ms). This time the MIDI to audio latency varied from 10.3ms to 15.5ms, showing that the jitter component had risen to just over 5ms, compared with 3ms for GSIF at the same latency. For those whose PCs can't for one reason or another manage this buffer size and have to work at 512 samples, although the reported latency inside Cubase is still just 11.6ms (real-world 13.7ms), the actual MIDI to audio variation measured from 16.0ms to 27.4ms, which is a huge amount of jitter.
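Taking jitter as simply the spread between the lowest and highest reading, the ranges quoted above can be compared side by side. This is a small sketch using only figures from the text (the GSIF row uses the serial-port readings); the variable and function names are hypothetical.

```python
# (label, lowest, highest) MIDI-to-audio latency in ms, as quoted in the text.
measurements = [
    ("GSIF, 3 x 128 samples", 10.6, 13.7),
    ("ASIO, 128 samples", 7.8, 9.7),
    ("ASIO, 256 samples", 10.3, 15.5),
    ("ASIO, 512 samples", 16.0, 27.4),
]


def jitter_ms(low, high):
    """Jitter taken as the spread between the lowest and highest reading."""
    return high - low


for label, low, high in measurements:
    print(f"{label}: {jitter_ms(low, high):.1f}ms jitter")
```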
I also had M Audio's USB Duo in for review, which gave me the chance to test out a modern USB audio peripheral for real-world latency as well. Its lowest ASIO setting was reported by Pro 52 as 441 samples, which should equate to exactly 10ms at 44.1kHz, and my real-world audio latency measurement was 11.7ms at this sampling rate — an increase of just 76 samples, which is better than the figure my Echo Mia achieved.
However, things were not so rosy when it came to playing soft synths through the USB Duo. I decided to disable the USB drivers of my Midisport 8x8/s and use the SW1000XG MIDI input, to minimise any disruption due to having two USB devices running simultaneously. Even with this basic configuration, however, the measured time between key press and soft synth audio output varied wildly between 13.6ms and anything up to 32ms. I took dozens of measurements, running Pro 52 both stand-alone and inside Cubase VST, double-checked everything I could think of including the Duo's USB thread priority, and nothing improved matters.
The only thing I can currently surmise is that the USB audio activity had somehow caused the MIDI data to be delayed. This is worrying, since my USB MIDI interface performed well in isolation, although with a slightly higher latency and jitter than using the same interface with the serial port, and the M Audio Duo also performed well for audio recordings, with no glitching when used in full-duplex mode with 10ms latency at 44.1kHz. However, I intend to get to the bottom of the cause and report back shortly with more details.
You might expect that determining audio latency is easy: you just open the appropriate Control Panel dialogue window, either from inside your music application or directly as a utility, and read off the buffer size in samples, which is then converted to a latency value depending on the current sample rate. For instance, 512 samples at the most commonly used sample rate of 44.1kHz equates to 512/44100 seconds or 11.6ms (commonly rounded by most software packages to the nearest whole number, in this case 12ms), while 128 samples gives a latency of 2.9ms, normally displayed in software as 3ms.
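The conversion from buffer size to theoretical latency is simple enough to check for yourself. The sketch below uses a hypothetical helper name and assumes the software rounds to the nearest whole millisecond for display, as described above.

```python
def buffer_latency_ms(buffer_samples, sample_rate=44100):
    """Theoretical latency of one buffer in milliseconds."""
    return buffer_samples / sample_rate * 1000.0


for size in (128, 256, 512):
    ms = buffer_latency_ms(size)
    # Many applications display only the nearest whole number of milliseconds.
    print(f"{size} samples: {ms:.1f}ms, displayed as {round(ms)}ms")
```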
However, real-world latencies are slightly larger, due to other delays such as those introduced by the A-D and D-A converters. As a consequence I came up with a simple way to measure the real-world audio latency of your soundcard, including all the extras. This is also an ideal way to check whether or not your soundcard buffer size and latency are being reported correctly by your software.
This time you don't need any special leads, but it's helpful to have an external MIDI synth to use as a signal generator. You need a pulse with a fast rise time, which you can create starting with a square or pulse wave, ideally not passed through any filters — if this is unavoidable open the filter frequency to maximum and reduce resonance to minimum. Set any envelope up for fastest attack, and make sure any built-in DSP effects are disabled or turned down.
Now you need a full-duplex music application with audio monitoring functions. I used Cubase VST, and set its monitoring to Record Enable type. Route your hardware synth's output to your soundcard's left input channel only, and then connect the soundcard's left output socket to its right input socket. If you're using a mixer for your connections, don't make the mistake of leaving the left soundcard output panned left, since you'll set up a feedback loop.
Next, make a recording playing several single high synth notes. This will contain the original sound on the left channel, while on the right will be the same sound after passing through the A-D converter, a set of soundcard audio buffers on the way in, a second set on the way out, and finally the D-A converter. You can now directly measure the time delay between their two start points, which is the real-world monitoring latency. You can divide this figure by two to get the audio output latency, which is the only part of the chain used for soft-synth playback.
There's absolutely no jitter in these measurements, and as long as you're careful to position a marker at exactly the same point on each channel, your results will be accurate to a single sample. For a 128-sample buffer with a theoretical latency of 128/44100 or 2.90ms, my Echo Mia measured 10.07ms, or 5.03ms each for both recording and playback — this 2.13ms difference is 94 samples at 44.1kHz — and subsequent checks showed that whatever the buffer size or sample rate, there was a constant additional delay of 94 samples to both input and output. So, for playing soft synths in 'real time' only the 2.13ms of the playback chain is added, while when monitoring audio inputs in 'real time' there is a total of 4.26ms added to the signal chain, over and above the audio buffers.
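The arithmetic behind the 94-sample figure can be reproduced as follows. The sketch assumes, as above, that the round-trip delay splits equally between input and output; the function names are hypothetical.

```python
SAMPLE_RATE = 44100


def samples_to_ms(samples, sample_rate=SAMPLE_RATE):
    return samples / sample_rate * 1000.0


def ms_to_samples(ms, sample_rate=SAMPLE_RATE):
    return ms * sample_rate / 1000.0


def extra_samples_per_direction(round_trip_ms, buffer_samples,
                                sample_rate=SAMPLE_RATE):
    """Delay beyond the buffer itself, per direction, in whole samples."""
    one_way_ms = round_trip_ms / 2.0
    extra_ms = one_way_ms - samples_to_ms(buffer_samples, sample_rate)
    return round(ms_to_samples(extra_ms, sample_rate))


# Echo Mia: 10.07ms measured round trip with a 128-sample buffer.
extra = extra_samples_per_direction(10.07, 128)
print(extra, "extra samples per direction")  # 94

# Predicted real-world output latency at larger buffer sizes.
for size in (256, 512):
    print(f"{size} samples: {samples_to_ms(size + extra):.1f}ms")
```

The predictions for 256 and 512 samples come out at about 7.9ms and 13.7ms, in line with the real-world figures of 8.0ms and 13.7ms quoted for the Echo Mia at those buffer sizes.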
Whether you've followed all the theory here, or just cherry-picked the results that interest you, there should be plenty of food for thought. Although many musicians rely totally on the latency figures reported by software utilities, the truth can be considerably more complex: real-world latency involves many more factors than the size of the soundcard buffer, is sometimes slightly misreported, and is occasionally totally at odds with the reported values.
Both VST and DX Instruments now offer sample-accurate playback timing inside host applications like Cubase VST, Logic Audio and Sonar, while soundcard buffers now offer latencies down to 3ms, and sometimes 1.5ms or less (although given the number of interrupts these lower figures generate, they are generally not practical, due to the increase in CPU overhead). However, playing a soft synth in real time can never offer sample-accurate timing, and my initial measurements show that you can expect anywhere between 3ms and 15ms to be added onto the reported soundcard latency before you hear a note, simply due to the MIDI interface and Windows operating system. This rather explains why some musicians claim to find soft synths sluggish even when their soundcard buffer is set to a typical 10ms — the truth is that this setting may well result in a 25ms latency between playing a note and hearing the result, which falls into the 'noticeable' area. If your PC isn't particularly well set up and has various background tasks running, you may occasionally experience even more sluggish notes.
Although many musicians complain that MIDI is inherently flawed, since an eight-note chord will emerge as eight notes spread over 8ms, the reality is that it's almost impossible to hear this in a real-world situation. However, once you take into account the other timing uncertainties that may further spread the notes once they enter the MIDI interface, there's a much greater chance of a performance being compromised.
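The spread of a chord over a single MIDI cable follows from the serial data rate. The sketch below is an illustration with a hypothetical function name; it also shows the effect of running status, a standard MIDI feature (not discussed above) whereby consecutive messages of the same type omit the repeated status byte.

```python
BYTE_MS = 10 / 31250 * 1000.0  # one MIDI byte: 10 bits at 31.25kbaud = 0.32ms


def chord_spread_ms(n_notes, running_status=False):
    """Time in ms to transmit n_notes Note On messages back to back."""
    if n_notes <= 0:
        return 0.0
    if running_status:
        # First message is three bytes, the rest omit the status byte.
        n_bytes = 3 + 2 * (n_notes - 1)
    else:
        n_bytes = 3 * n_notes
    return n_bytes * BYTE_MS


print(f"{chord_spread_ms(8):.2f}ms")                       # 7.68ms
print(f"{chord_spread_ms(8, running_status=True):.2f}ms")  # 5.44ms
```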
PCI MIDI ports do seem to offer the tightest timing option for the PC musician, followed by devices that use the serial port or parallel ports, while the USB port currently provides the 'loosest' performance. Even those interfaces using special technology like Emagic's AMT and Steinberg's Midex ranges improve playback timing only — they cannot bypass these uncertainties during recording.
If, like me, you have a MIDI port on your soundcard, as well as a larger eight-in/eight-out interface connected via another type of port, you could perhaps connect the PCI port to your master keyboard to capture the tightest performances. However, it's also important to keep these timing measurements in perspective. Most musicians can easily adjust to latencies even as high as 15ms, as long as they are reasonably consistent — it's the jitter that tends to be more problematic, as this determines the amount of 'looseness'.
Emerging technologies such as FireWire/mLAN and USB 2.0 already promise much greater bandwidth for transmitting MIDI data, and Steinberg's VST System Link can connect together multiple computers running Cubase VST, Cubase SX and Nuendo with sample-accurate MIDI timing. However, as long as we rely on the traditional MIDI keyboard and interface to record our performances in the first place, we could still be subject to the majority of delays and uncertainties discussed here. Whatever happens in the future, I shall certainly start including both reported and measured values for audio latency in my soundcard reviews. After all, we all live in the real world.
In Part 2 of this feature I'm hoping to include some more feedback from manufacturers, and I'll also be looking more closely at the situation for musicians using Windows XP, since many are now moving across with the promise of lower latencies using WDM-format drivers inside Sonar and the release of the Windows 2000/XP-only Cubase SX. Since these Windows platforms are completely different from the Windows 95/98/ME family, they have thrown up another raft of possible timing problems for musicians.