The issue of latency, or monitoring delay, is one that still seems to cause lots of confusion among people starting out with computer-based recording. In this article, I'll explain what latency is, why it happens, and what the options are for dealing with it.
In a nutshell, latency is a delay between audio or MIDI going into your computer, and sound coming out. When your microphone, guitar or keyboard is plugged into your Mac or PC, its output doesn't appear at the speakers instantly: there are a number of stages at which its progress gets halted temporarily, and the result is that what you hear from the speakers or headphones is slightly 'behind' what went in. If the cumulative effect of all these stoppages is too great, the delay becomes noticeable and, beyond a certain point, can make it very difficult to play or sing in time.
So what are the bottlenecks that cause these delays, and can't they be eliminated? To answer the second question first, some of them can be eliminated, and others can be reduced. However, there will always be some input-to-output delay in any digital recording system. The reason for this is that the first and last stages in the trip your audio takes are through analogue-to-digital and digital-to-analogue converters respectively, and all A-D and D-A converters cause a delay. In other words, the process of creating a digital representation of an analogue signal takes time, and so does the process of turning a set of numbers back into a continuously fluctuating voltage (that is, an analogue signal).
The good news is that the amount of delay the converters are responsible for is small. Typically, A-D and D-A converters together delay the signal by considerably less than five milliseconds (ms), and it's generally agreed that most people's perceptions are not sharp enough to notice a delay of this order. In fact, this is the reason why latency is not usually thought of as a problem in digital mixers, or stand-alone digital multitrackers. Digital mixers do have latency, but usually, this is only to the extent that their converters have latency, which is small enough to be imperceptible in most circumstances. It is worth noting, though, that there are situations where even a couple of milliseconds' delay can be a problem: for instance, if you use a PA rather than headphones to allow musicians to hear themselves, you usually get some spill into the mics. If that PA is being fed from a digital mixer or recorder, you may get 'comb filtering' between the spill and the direct sound.
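As a rough illustration of why even a couple of milliseconds can matter, the sketch below estimates where the notches fall when a sound is mixed with a delayed copy of itself. It assumes an idealised case that goes beyond what is described above: both copies at equal level and a single fixed delay, which gives cancellations at odd multiples of 1/(2 x delay). Real spill will differ in level and includes an acoustic path delay too, so treat the figures as indicative only. The function name is made up for this example.

```python
def comb_notch_frequencies_hz(delay_ms, count=4):
    """First few cancellation frequencies for a signal summed with a
    copy of itself delayed by delay_ms (idealised: equal levels)."""
    delay_s = delay_ms / 1000.0
    # Cancellation occurs where the delay equals an odd number of half-cycles.
    return [(2 * k + 1) / (2.0 * delay_s) for k in range(count)]


# A 2ms delay puts the first notches near 250Hz, 750Hz, 1250Hz and 1750Hz.
print([round(f) for f in comb_notch_frequencies_hz(2.0)])
```

Shorter delays push the first notch higher: under the same assumptions, a 1ms delay would move it to around 500Hz.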
There's No Getting Away From It
In this article I've been considering latency as it affects audio recording. However, latency can also be a problem for software synths, and if it is, there's actually much less you can do about it. Since the sound you're hearing is actually being produced by the computer, workarounds such as using an analogue mixer are obviously not feasible. Your only option is to try to work with the smallest buffer sizes possible, to minimise the delay between pressing a key on your MIDI keyboard and hearing the output of the soft synth. You might find that changing some settings in your soft synth and DAW makes it work better at lower buffer sizes: for instance, sample-based synths may be less prone to glitching if you set them to load all their samples into RAM rather than streaming them from hard disk. If you can't achieve a comfortable latency setting without occasional glitches in the output from your soft synth, this may not be a problem. After all, as long as the system is correctly recording the MIDI data you're entering, you can always increase the buffer size on playback after you've finished recording it, and the glitches will go away.
So how come delay can be such a problem in computer setups? After all, aren't digital mixers basically just computers without keyboards? The fundamental problem is that computers have a whole lot of processes going on inside them at any one time, and overseeing the input and output of audio is only one of these. Yes, digital mixers and multitrack recorders are computers too, but they're computers that are designed with one purpose in mind. You could draw a crude analogy by comparing a teacher who has to look after a class of 30 pupils with one giving a private music lesson to a single student: the teacher with one pupil can give him or her undivided attention, because there's no need to worry about the other kids throwing paper planes at the back of the classroom.
In other words, a computer can only 'look' at what's coming in from the outside world at intervals. When that computer is designed and dedicated to the job of recording audio, these intervals can be very, very small. However, when it's a generic Mac or PC, running a complex modern operating system and quite possibly a bunch of other programs too, the intervals have to be larger. To deal with this, data coming to and from the audio hardware is 'buffered'. Instead of passing every sample on to the CPU as soon as it's received, the soundcard builds up a little pile of them, so that the CPU can deal with a bunch of samples every so often, rather than taking each as it arrives. A similar system operates on the way out, too, and it's this buffering process that adds extra delays to the signal's path through your computer.
Usually, the size of these buffers is measured in samples, and the input and output buffers are the same size. You can work out how much delay they cause by taking the number of samples, multiplying by two, and dividing by the number of samples per second at the current sample rate. So, for instance, if you're recording at 44.1kHz in a computer system with input and output buffers set to 512 samples, the delay caused by buffering, in seconds, will be 512 x 2 / 44100, or about 0.023 (that is, 23ms). Add the delay caused by the converters, and you reach a figure of 25 milliseconds or more. A delay of this order will definitely be noticeable: if you listen to the audio output from your computer alongside the original sound, there will appear to be a slapback delay. It might make your singer sound like John Lennon, but not everyone wants that.
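The buffer sum above is easy to check in code. This is only a sketch of the arithmetic as stated (equal-sized input and output buffers, converter delay left out), and the function name is invented for the example.

```python
def buffer_latency_ms(buffer_samples, sample_rate_hz):
    """Delay from one input buffer plus one output buffer of equal size,
    in milliseconds. Converter delay is not included."""
    return buffer_samples * 2 / sample_rate_hz * 1000.0


# 512-sample buffers at 44.1kHz: 512 x 2 / 44100 seconds.
print(round(buffer_latency_ms(512, 44100), 1))  # 23.2
```

Because the sample rate is on the bottom of the fraction, the same buffer size gives a shorter delay at higher sample rates.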
Finally, as if this delay weren't enough, there is another possible cause of latency in software-based recording systems. Most plug-in effects and processors don't delay the signal coming through them, but some compressors, EQs, convolution processors and limiters depend on a technique known as 'lookahead'. In other words, they look into the future, but the only way they can do this is by delaying the incoming audio stream. For example, a lookahead limiter might not decide what to do with the sample at position 'X' until it has inspected several hundred samples that follow it. This means it has to wait until it has received those samples from the host program, so the entire signal is delayed.
What's more, plug-ins running on DSP cards like the TC Powercore, Universal Audio UAD1 or Creamware Scope cause delays, even if they don't use lookahead. That's because when your DAW software sends audio to one of these cards to be processed, it has to be buffered in and out, just as if it was going to the outside world.
So, if you're recording, say, a vocal, and you put a plug-in that uses lookahead or runs on a DSP card across that track, you'll be hit with another delay. The same applies to plug-ins inserted across the whole mix, or any group channels to which your inputs are routed. And if plug-in delay compensation is switched on in your DAW (which ensures that all signals arrive at the output in sync), using one of these plug-ins anywhere in the mix will cause everything to be delayed by that plug-in's lookahead or buffer time.
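The effect of plug-in delay compensation can be sketched in a few lines. This is a simplified model rather than any particular DAW's implementation: it assumes each track reports a fixed plug-in latency in samples, and the track names and figures are invented for illustration.

```python
def compensation_delays(track_latencies):
    """Given each track's plug-in latency in samples, return the extra
    delay each track needs so that all tracks reach the output in sync."""
    longest = max(track_latencies.values())
    return {name: longest - latency for name, latency in track_latencies.items()}


tracks = {"vocal": 0, "guitar": 0, "bass": 512}  # bass has a lookahead plug-in
print(compensation_delays(tracks))
# {'vocal': 512, 'guitar': 512, 'bass': 0}
```

In this model the vocal track has no delaying plug-ins of its own, yet it is still held back by 512 samples to stay in step with the bass.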
As we've seen, the delay between an audio signal entering your computer and coming out again can be accumulated from several different sources (see diagram, left):
The time taken to convert an analogue signal to digital, and to convert the digital output back to analogue form.
The delay caused by the soundcard buffering the information it receives before it's read by the CPU, and by the parallel buffering process on the way out.
The delay caused by buffering audio data to and from any DSP cards that are being used in the project.
The delay added by any plug-ins that use lookahead techniques.
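To see how these sources add up, here is a minimal sketch that totals them. The class and field names are hypothetical, and the 2ms converter figure and 441-sample lookahead are assumptions chosen purely for illustration; real values depend on your hardware and plug-ins.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    sample_rate_hz: int
    converter_ms: float        # A-D plus D-A combined (assumed figure)
    buffer_samples: int        # size of each of the input and output buffers
    dsp_card_samples: int = 0  # extra buffering to and from a DSP card
    lookahead_samples: int = 0 # delay added by lookahead plug-ins

    def total_ms(self):
        """Sum of all four sources, in milliseconds."""
        samples = (2 * self.buffer_samples
                   + self.dsp_card_samples
                   + self.lookahead_samples)
        return self.converter_ms + samples / self.sample_rate_hz * 1000.0


tracking = LatencyBudget(44100, converter_ms=2.0, buffer_samples=512)
with_limiter = LatencyBudget(44100, converter_ms=2.0, buffer_samples=512,
                             lookahead_samples=441)
print(round(tracking.total_ms(), 1))      # 25.2
print(round(with_limiter.total_ms(), 1))  # 35.2
```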
So if it all gets too much, what can you do about it? Much of the confusion surrounding latency has to do with the efforts that soundcard manufacturers and software designers have made to eliminate the problem, and the often inaccurate language that is used to describe the various possible solutions. In essence, there are three possible approaches, and each has its own pros and cons.
The most drastic way of avoiding problems with latency is to completely give up on monitoring your audio inputs via the computer. Instead, an alternative monitoring path is constructed in the analogue domain. The signal that's produced at your microphone or guitar is an analogue one, as is the signal passing down the wire that feeds your speakers or headphones: so a 'short circuit' that allows you to route the one directly to the other will enable you to hear your sources with no delay whatsoever.
Monitoring your inputs in the analogue domain is the only way to hear them with no latency at all, and if you have the right analogue hardware, it's straightforward to do. For example, if you use an analogue mixer as a 'front end' for a computer-based recording system, your headphones would typically be attached to the main stereo output, or to one of the auxiliary outputs, while direct outputs from the individual channels feed the inputs of your soundcard or audio interface. The outputs from your DAW system then feed further channels on the mixer. All you need to do in order to monitor those input channels directly is create a routing from those channels to the headphone bus, as well as sending them out of their direct outputs. A few digital audio interfaces, such as Digidesign's M Box, also contain analogue circuitry allowing you to set up this kind of direct monitoring, but this is unusual.
Using an analogue mixer (see diagram, right) is one of the most popular ways of setting up monitoring, and as well as being the only way to create a truly latency-free monitor mix, it has some other advantages. Principal among these is that, depending on what facilities your mixer offers, it is easy to set up one or more different balances of individual inputs with the pre-recorded material being output from your DAW. You'll be able to set up different monitor balances for different musicians. It is possible to do this in the digital domain too, but this is the simplest way of doing so and, in the heat of a recording session, the ability to set up and modify monitor mixes quickly and easily is valuable.
There are, however, downsides to this approach. One is that you need a mixer to use it, and lots of people either don't have space for a mixer, or have other needs that would be better met by alternative choices such as stand-alone preamps and monitor controllers. Another is that you can't monitor with effects, unless these are applied in the analogue domain. So if your singer can only pitch properly when he hears himself with a Grand Canyon-style reverb, this approach is only viable if you invest in a hardware effects unit.
A third problem is that if you monitor in the analogue domain, you don't hear any problems that happen in the digital domain. So if your soundcard is introducing clicks and pops, you might not find out until it's too late.
A fourth is that not all DAW programs make it particularly convenient to work in this way. If you're monitoring your inputs in the analogue domain, you don't want to hear a duplicate of those inputs in the digital domain, so the DAW tracks you're recording to need to be muted. However, when you're playing back what you've just recorded, you'll want those tracks to be unmuted again. Most DAWs offer the facility to automatically mute record-enabled tracks during recording, but not all do, so you may find yourself having to hit the mute button rather a lot, which can be tedious!
A fifth and final problem is that monitoring via an analogue mixer means that your monitor inputs are always active unless you remember to physically mute them on the mixer. This is distracting at best, and can lead to problems with feedback if you're recording with microphones in the same room as your monitor speakers.
Of the four possible sources of latency listed earlier, the delay caused by the A-D and D-A converters is usually the only one that's set in stone but, as we've seen, it's not enough to be a problem in most circumstances. The two latter sources, DSP-based and lookahead plug-ins, are easily eliminated by the simple expedient of not using those plug-ins in a project until the mix stage. A cheap-and-cheerful reverb should be fine for tracking purposes, for example, even if it isn't an appropriate choice when mixing.
That leaves delays caused by buffering incoming and outgoing audio between soundcard and computer. These delays can be controlled, at least up to a point, by reducing the buffer size. A modern Mac or PC with suitable audio hardware should be able to work with buffers as small as 64 samples. With the A-D and D-A conversion delays added in, the total system latency will then be as low as five or six milliseconds, which is well below the threshold where most musicians perceive it as an audible delay. If you can set your system up like this, you get the advantage of being able to monitor live inputs with effects (as long as they're not DSP card or lookahead plug-ins), and the reassurance that what you're monitoring is exactly what is being laid down to your hard drive. When the sound you're recording is actually produced or shaped by the computer, such as when you're playing through a software guitar-amp simulator, this is really the only option.
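If you have a target latency in mind, a short script can tell you the largest buffer size that stays within it, which is worth knowing because larger buffers are generally easier on the computer. This is a sketch under stated assumptions: the 3ms converter figure is a guess, the list of buffer sizes is simply a typical set of powers of two, and the function name is made up.

```python
def largest_buffer_within(target_ms, converter_ms, sample_rate_hz,
                          sizes=(32, 64, 128, 256, 512, 1024, 2048)):
    """Return the largest buffer size (in samples) whose total latency,
    input buffer + output buffer + converters, is within target_ms.
    Returns None if even the smallest size is too slow."""
    best = None
    for size in sizes:  # sizes are assumed to be in ascending order
        total_ms = converter_ms + size * 2 / sample_rate_hz * 1000.0
        if total_ms <= target_ms:
            best = size
    return best


print(largest_buffer_within(10.0, 3.0, 44100))  # 128
```

With the same assumed converter delay, a 6ms target at 44.1kHz would call for 64-sample buffers.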
Even today, not everyone finds that they can reliably operate their music-recording machines at very low buffer sizes. For example, even though my main music computer is a modern 2GHz Centrino laptop, running the latest version of Windows XP, I've never yet found a Firewire or USB audio interface that will work reliably with it at latencies below 20ms or so. And even if your computer can operate reliably at a 5ms latency, doing so will probably place a much heavier load on the CPU than working with a larger buffer size.
For this reason, nearly all modern audio interfaces offer a feature that is variously called 'low-latency monitoring', 'near-zero latency', 'direct monitoring' or (wrongly and confusingly) 'zero latency'. Behind the array of different names, the principle is always the same: inputs that you want to monitor can be routed directly to outputs, but this time, the routing happens in the digital domain. In essence, the input signal from your microphone or guitar passes through the A-D converter, and is then split. One copy goes to the computer to be recorded. The other is passed straight to the output of the soundcard, where it's converted back to analogue.
This means you can hear your input signals without having to wait for them to be buffered in and out of the computer. However, they still have to pass through the A-D and D-A converters, which is why this method doesn't give true zero-latency monitoring.
Compared with using an analogue mixer, this approach has some advantages. For one thing, you don't have to buy an analogue mixer! For another, it gives you full recall over your monitor mixes, with all the flexibility and savings of time that brings. For a third, it also means you're eliminating at least some of the ways in which what's going to hard disk can differ from what you're hearing, though it doesn't eliminate all of them.
However, it also shares one of the key negative features of analogue monitor mixing. Usually, 'direct monitoring' means monitoring without effects (though a few soundcards feature built-in DSP effects that can be applied to a directly monitored signal). Also, the process of setting up direct monitoring tends to be more complex and less hands-on than using an analogue mixer, which can be a problem when your singer suddenly hears full-scale digital noise in her headphones, and you don't know where it's coming from!
Something else that I cited as a disadvantage of analogue monitor mixing is its lack of integration with DAW software. With an analogue mixer, your monitor channels will always be live unless you physically mute them on the mixer; and conversely, not all DAWs can automatically mute tracks during recording only. This, in theory, is an area where 'direct monitoring' can bring advantages, but unfortunately, it's another area where confusion is rife.
The confusion arises because two separate programs are often involved in setting up direct monitoring: the DAW itself, and the 'control panel' utility that allows you to change settings on your audio interface. If your interface supports direct monitoring, you will find that the control panel offers the ability to activate it, and, in many cases, to specify a routing of inputs to outputs. The sophistication of this control panel software varies enormously. Some simple interfaces just allow you to turn direct monitoring on and off, with no control over routing. At the other end of the scale, utilities like RME's Total Mix and MOTU's Cue Mix include heavy-duty mixer functionality, allowing you to route inputs to your heart's content, and potentially to create multiple monitor mixes and route them to different hardware outputs. This is great, but it's still possible to end up in the situation where your monitor mixing arrangements are entirely separate from your recording software. You don't want to have to be constantly switching from your DAW to the control panel every time you want to mute a monitor input, and nor do you want to hear duplicates of your directly monitored inputs emerging from your DAW.
The idea behind Steinberg's ASIO Direct Monitoring protocol (see diagram, left) is to provide a way for the DAW to tell the control panel what to do. The amount of control that ASIO Direct Monitoring gives to the DAW application is limited, but the key feature it enables is allowing the DAW to switch direct monitoring on and off for each input. This means you can turn monitoring off and on for any input without leaving your DAW. In theory, it also means you never need to manually mute inputs or playback tracks, because your recording application will offer you a choice of automated approaches to making inputs and recorded material audible. Cubase, for instance, allows you to have monitoring switched on automatically on tracks that are actually recording, or on all tracks that are record-enabled, whether you're actually recording or not. A third option emulates the behaviour of an analogue tape machine, allowing you to hear the inputs when recording and when the transport is stopped, but not during playback.
The ASIO Direct Monitoring protocol also covers pan and volume, though not all hardware manufacturers implement this part of it. If your audio interface does, it means you can set up a basic monitor mix using the pan controls and channel faders in your DAW mixer, rather than having to do it in the control panel.
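The three automatic monitoring behaviours described above for Cubase boil down to a small decision rule. The sketch below is purely illustrative: the names are hypothetical and are not taken from Cubase or the ASIO SDK, and it assumes that the tape-machine mode applies only to record-enabled tracks, which the description above does not state outright.

```python
from enum import Enum


class Transport(Enum):
    STOPPED = "stopped"
    PLAYING = "playing"
    RECORDING = "recording"


class MonitorMode(Enum):
    WHILE_RECORDING = 1       # only on tracks that are actually recording
    WHILE_RECORD_ENABLED = 2  # on every record-enabled track, at all times
    TAPE_MACHINE = 3          # when recording or stopped, not during playback


def input_is_monitored(mode, record_enabled, transport):
    """Decide whether a track's input should be heard via direct monitoring."""
    if not record_enabled:
        return False  # assumption: only record-enabled tracks are monitored
    if mode is MonitorMode.WHILE_RECORDING:
        return transport is Transport.RECORDING
    if mode is MonitorMode.WHILE_RECORD_ENABLED:
        return True
    # Tape-machine style
    return transport in (Transport.STOPPED, Transport.RECORDING)


print(input_is_monitored(MonitorMode.TAPE_MACHINE, True, Transport.PLAYING))  # False
```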
For all the problems of latency, it's something that computer-based musicians have to learn to live with and work around. I hope this article will help you to do so, and also that it has dispelled some of the myths surrounding the issue.