Digital Problems, Practical Solutions

Getting The Best From Digital Audio By Hugh Robjohns
Published February 2008

Designers overcame most inherent problems with digital audio long ago, but for many of us, the experience still isn't perfect. This in-depth guide tells you why - and what you need to know to put things right...

The analogue/digital debate has been running so long that many of the arguments have completely lost any basis in technical fact and have become urban myths. In this article, I'll correct some of the basic misconceptions and, via the SOS web site, provide some audio examples to demonstrate the reality. Along the way, I'll hopefully help you to pinpoint any problems you might be experiencing in your own system.

Digital Versus Analogue

Digital and analogue systems are designed to do the same job, but they do it differently and therefore have different characteristics. We could liken the situation to the choice of a petrol or diesel engine for your next car: both can power the car adequately to get you from A to B, but the experience is different, so some people prefer one to the other, and one is often better suited to some situations than the other.

The same is true of analogue and digital recording, mixing or processing. In some cases the inherent distortions of analogue recording bring something extra that complements the music, while in others it might distract. Sometimes the greater dynamic range of digital systems benefits complex productions where analogue systems would descend into a mush of mix-bus noise. It comes down to horses for courses — there's no absolute right or wrong, good or bad. But we must understand the practical and technical limitations of both systems equally, appreciate the strengths of their creative contributions, and learn when to choose one over the other.

Sampling Confusion

Perhaps the biggest misconceptions about digital audio surround the issue of sampling — probably due to over-simplifications introduced by most tutors when trying to explain inherently complicated concepts.

The classic mistake is to compare digital audio with film. Audio sampling does not work in the same way as film: while film is a sampling system of sorts, the sampling rate is woefully inadequate, and as a result, film creates only an illusion of movement. By capturing and showing a relatively slowly changing sequence of static pictures, film and TV rely on the persistence of vision and the interpretative abilities of the brain to construct (not reconstruct) an impression of life-like movement. Film does not — and cannot — convey all the information embodied in the light waves reflected from the source scene to reconstruct them accurately. It doesn't even convey accurate movement (hence the wheels on the stage coach often appearing to go backwards). In fact, you only have to look away from the screen to perceive the flickery nature of what is being shown, and to realise its very poor relationship to reality.

In contrast, sampled audio is (potentially) a totally accurate and faithful means of conveying a complete audio signal from one place to another. I won't wade through the sums, but there is a very elegant and irrefutable mathematical proof for the sampling process. Furthermore, sampling is used in a very wide range of industries outside our own domestic and professional audio interests, many of them in life-critical applications!

The bottom line is that sampling works, and it is theoretically a perfect process: you get back exactly what you put in, provided the system is engineered adequately, and real-world weaknesses are purely down to failures of implementation — and there were certainly some serious weaknesses (and optimistic over-selling!) in the early days of digital audio, some 25 years ago. The memory of many of those technical weaknesses still haunts the industry today, even though advancing technology removed most of them years ago.

Perfectly Modulated

Sampling is a simple modulation process — an amplitude modulation (AM) process, to be precise. Just like AM radio, it can convey audio accurately from one place to another via an appropriate medium. We all listen to the radio and know it works, so why doubt exactly the same process when it is used in digital audio equipment?

An audio signal at 5kHz creates 'images' or 'side-bands' at 35kHz and 45kHz when sampled at a rate of 40kHz.

Perhaps the need to constrain the audio bandwidth in order for the modulation process to work properly feels strange, because it isn't something we ever had to think about with analogue systems. In the case of AM radio, the audio bandwidth is typically limited to about 4kHz, while basic-rate digital sampling at 44.1 or 48kHz takes that up to a little over 20kHz.

Radio broadcasting uses a continuous sine-wave carrier signal, and we modulate the amplitude of that continuous high-frequency carrier with the wanted audio signal. In digital audio, we use a series of brief pulses running at the sample rate instead of a continuous carrier, but we still modulate the amplitude of that pulse stream in the same way. Apart from the fact that one system is time-continuous and the other chopped up, the two processes are mathematically identical, and the created 'side-bands' are essentially the same.

Whether we're talking about AM radio or digital audio sampling, the requirement to limit the audio bandwidth is because of the 'side-bands' that all modulation processes produce. These side-bands are the sum and difference products of the wanted audio signal frequencies with the carrier frequency. Essentially, the process of modulation creates 'images' of the wanted audio either side of the carrier frequency — the upper and lower side-bands in radio parlance, or the upper and lower images.

Modulation essentially adds the wanted audio signal to the carrier frequency to create the upper side-band (in effect, the audio is frequency-shifted up to the carrier frequency), and also subtracts it from the carrier frequency to create the lower side-band (producing a mirror-image of the audio extending down from the carrier). In radio broadcasting these side-bands are transmitted through 'the ether' to your radio — side-bands either side of 198kHz in the case of BBC Radio 4 long-wave. The radio receiver detects these transmitted side-bands and 'demodulates' them to extract the wanted audio again. It's a system we've been using for almost a hundred years and we all take it for granted.
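
By way of illustration, here's a minimal Python sketch of that sum-and-difference arithmetic, using the 5kHz-at-40kHz sampling example and the Radio 4 long-wave carrier mentioned above.

```python
# Minimal sketch: sum-and-difference products of a modulation process.
# The carrier and audio figures are the examples quoted in the text.

def sidebands(carrier_hz, audio_hz):
    """Return (lower, upper) side-band frequencies for one audio component."""
    return carrier_hz - audio_hz, carrier_hz + audio_hz

# 5kHz audio sampled at 40kHz -> images at 35kHz and 45kHz
print(sidebands(40_000, 5_000))    # (35000, 45000)

# 5kHz audio on the 198kHz Radio 4 long-wave carrier -> 193kHz and 203kHz
print(sidebands(198_000, 5_000))   # (193000, 203000)
```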

In the case of AM radio, the size of the radio side-bands determines the frequency spacing of the adjacent radio channels.

A full-range audio signal generates images or side-bands which must be kept spectrally separate from the source audio.

With radio, the bandwidth of the audio (and thus the size of the side-bands) determines the minimum spacing of adjacent radio carriers: the more tightly constrained the audio, the smaller the side-bands and the more radio channels you can fit into a given area of radio spectrum. That's why AM radio channels are in 9kHz increments (10kHz in the US) and have an audio bandwidth of only about 4kHz.

In digital audio the side-bands are an unwanted side-effect of the sampling process, but they still determine the audio bandwidth we can use: if we sample an audio signal at 44.1kHz, a pair of side-bands is produced extending above and below that (carrier) sample rate. We aren't going to use them, but they're there nonetheless, and we have to remove them when we output the wanted audio (otherwise they could fry the tweeters in your speakers, or cause distortions in equipment that can't handle such high-frequency content cleanly).

Unfortunately, unlike the side-bands created in radio, these ones are pretty close (in spectrum terms) to the wanted audio, simply because the sampling rate is relatively low. If we sample a 20kHz signal at 44.1kHz, for example, the lower side-band will appear at 24.1kHz (44.1kHz minus 20kHz). The wanted audio (20kHz) and the unwanted lower side-band (24.1kHz) are only about a quarter of an octave apart.

To remove the unwanted side-bands without affecting the wanted audio, we need a very steep low-pass filter (the 'reconstruction filter'). In the example above, if the 20kHz signal is near peak level, and we want the side-bands to be attenuated below the noise floor, we need a filter that reduces them by more than 100dB in about a quarter of an octave. The steepest filter on most audio mixers has a slope of 18dB per octave, so you can appreciate the scale of the challenge!
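
To put rough numbers on that challenge, the following back-of-envelope sketch (the 100dB target and the 18dB-per-octave comparison are the figures quoted above) estimates the slope such a filter would need.

```python
import math

# Back-of-envelope sketch of the reconstruction-filter requirement in the
# 20kHz-sampled-at-44.1kHz example above. Figures are illustrative only.
audio_hz = 20_000
sample_rate_hz = 44_100
lower_image_hz = sample_rate_hz - audio_hz          # 24.1kHz
octaves = math.log2(lower_image_hz / audio_hz)      # roughly 0.27 octaves

required_attenuation_db = 100                        # push images below the noise floor
slope_needed = required_attenuation_db / octaves     # roughly 370 dB/octave

print(f"Image at {lower_image_hz} Hz, {octaves:.2f} octaves above 20 kHz")
print(f"Slope needed: ~{slope_needed:.0f} dB/octave (vs. 18 dB/octave on a typical mixer filter)")
```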

One solution is to sample at a much higher rate, so that the side-bands are further away from the wanted audio. However, the higher the sampling rate, the more data you have to store or process — and even 44.1kHz was hard enough to accommodate 25 years ago when digital audio started to become practical. The low-pass filters used in the earliest digital products weren't really up to the job of filtering out side-bands at the low sample rates of the time. Not only did they often fail to remove the lower side-band adequately, but they also affected the wanted audio in a way that had unpleasantly audible side effects — harsh, clinical, scratchy... I'm sure you've heard these descriptions!

A very steep low-pass filter is needed to remove the unwanted side-bands when reconstructing the analogue output.

The filtering challenge isn't restricted to the output side of a sampled system: we need a virtually identical filter on the input. This 'anti-alias' filter (see the 'Aliasing' section) must be there to make sure that the system operates correctly. You can't remove the unwanted side-bands if they're allowed to overlap the wanted audio, so we have to keep the two separated. To do that, we choose a sample rate that is at least twice the highest frequency we want to sample, so that the lower side-band can't extend down far enough to overlap the wanted audio — and we must make absolutely sure that no input signal above half the sample rate ever gets into the system. This second low-pass filter at the input suffers the same audible and technical issues as the output (reconstruction) filter: a double whammy!

Fortunately, technology has advanced and almost all modern digital equipment uses 'delta-sigma' converters. These operate internally at much higher sample rates than that at which the system is running, and the steep low-pass filtering (which is still needed) is performed entirely in the digital domain, where it's far more accurate and has minimal audible side-effects.
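
The detail of a real delta-sigma modulator is well beyond a few lines of code, but the 'filter digitally, then drop the rate' half of the idea can be sketched roughly as follows. This is a simplified illustration only, using SciPy's general-purpose decimation routine rather than anything resembling a converter chip's actual filter.

```python
import numpy as np
from scipy import signal

# Sketch of the 'filter digitally, then reduce the sample rate' idea.
# Not a real delta-sigma modulator - just the decimation stage.
fs_internal = 44_100 * 16          # pretend the converter runs 16x oversampled
t = np.arange(fs_internal) / fs_internal
x = np.sin(2 * np.pi * 1_000 * t)  # a 1kHz test tone captured at the high rate

# Digital low-pass filtering and rate reduction in one step (16:1).
y = signal.decimate(x, 16, ftype='fir', zero_phase=True)
print(len(x), '->', len(y), 'samples; output rate =', fs_internal // 16, 'Hz')
```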

Audio Examples!

As mentioned in the main text, we've placed a number of audio files on the Sound On Sound web site that demonstrate some of the theory and problems described within this article. So if you want to know what the different snaps, crackles and pops sound like, or to hear an exaggerated example of what dither actually does, go to www.soundonsound.com/sos/feb08/articles/digitalaudiofiles.htm.

Joining The Dots

Another common misunderstanding is caused by a widely-used diagram showing a sampled waveform. Each sample is shown as a thin bar, with adjacent samples then appearing as 'steps', and people mistakenly associate this 'stepped' appearance with quantising errors — even though sampling and quantising are entirely separate and independent elements of the process, and no quantising has yet taken place! The 'steps' on the diagram aren't there because something is missing, but because something has been added — something we don't want or need, and which the reconstruction filter removes: the 'steps' are actually created by the addition of the side-bands, nothing more and nothing less.

The stepped nature of the reconstructed samples is due to the presence of high-frequency images.

The higher the sample rate, the higher the frequency of the images, and the smaller the steps. Intuitively, it is probably clear that the higher the sample rate, the easier the reconstruction filter's job becomes.

If you redraw the graph with a much higher sample rate, the 'steps' appear smaller, and people mistakenly believe this is proof that sampling at higher rates is more accurate. It isn't: it actually shows that the reconstruction filter has an easier job at higher sampling rates, because the unwanted side-bands are moved further away from the wanted audio.

In fact, it doesn't make any difference whether we sample a 100Hz tone at 250Hz, 2500Hz, 25,000Hz or 44.1kHz: provided the reconstruction filter is properly engineered we'll always get back the same 100Hz tone and nothing else. There are two audio files on the web site that demonstrate this: one (100Hz_44.mp3) is a 100Hz sine wave sampled at 44.1kHz, and the other (100Hz_8k.mp3) is the same signal sampled at 8000Hz. Assuming your player is engineered properly, there should be no audible difference between the two, as both sample rates satisfy the Nyquist (Shannon) criterion of being at least twice as high as the source signal bandwidth. However, it is obviously a lot easier to design a filter that rolls off above 101Hz to remove side-bands centred on 44.1kHz than it is to make one that achieves the 100dB attenuation between 101Hz and 150Hz that you'd need for the 250Hz sampling rate!
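
If you'd like to convince yourself of this numerically rather than by ear, the following sketch generates its own 100Hz tones (rather than using the web-site files) and shows that the 8kHz version can be turned back into the 44.1kHz version almost exactly; the small residual error comes from the practical resampling filter, not from the sampling principle itself.

```python
import numpy as np
from scipy import signal

# Sketch: a 100Hz tone sampled at 8kHz carries the same information as one
# sampled at 44.1kHz, since both rates comfortably satisfy the Nyquist criterion.
f = 100
t8 = np.arange(8_000) / 8_000       # one second at 8kHz
t44 = np.arange(44_100) / 44_100    # one second at 44.1kHz

x8 = np.sin(2 * np.pi * f * t8)
x44 = np.sin(2 * np.pi * f * t44)

# Reconstruct the 44.1kHz version from the 8kHz samples (ratio 441:80).
x44_from_8 = signal.resample_poly(x8, 441, 80)

# Compare away from the ends, where the resampling filter has edge effects.
err = np.max(np.abs(x44_from_8[2000:-2000] - x44[2000:-2000]))
print(f"Max difference in the central region: {err:.2e}")  # expect a tiny value
```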

So the size of the steps in that sampled signal diagram simply reflects the challenge facing the reconstruction filter: the higher the sample rate, the easier the filter's job becomes. Once those unwanted side-bands are removed, we're left (filter designs permitting!) with the original audio, with nothing added and nothing taken away. I wasn't lying when I said earlier that the sampling process is theoretically a perfect one!

Aliasing

Cast your mind back to the 'wheels going backwards' effect I mentioned earlier when comparing film with audio sampling — an effect caused by the sampling rate being too low (less than twice the bandwidth of the wanted signal). Now imagine a wheel with one spoke painted bright red. The wagon is moving and the wheel is rotating, and the camera takes a picture for the first frame of a film — let's say the painted spoke happens to be vertical, in the 12 o'clock position. The wagon continues to move and the wheel revolves several times before the camera takes the next picture (1/24 second later). This time the painted spoke happens to be at 9 o'clock, the next frame at 6 o'clock, and so on. When we replay the film it appears as if the wheel is rotating slowly anticlockwise, when in reality it was going clockwise at a much faster rate. The effect is called 'aliasing' because what we are seeing is false information — an alias.

If the sample rate is less than twice the audio bandwidth, the lower side-band will overlay the wanted audio, and high-frequency input signals will be heard as low frequencies.

Even after the reconstruction filter has removed the upper side-band, the part of the lower side-band that overlays the wanted audio remains, and is audible.

In a properly designed digital audio system, aliasing shouldn't happen. The sample rate is at least twice the bandwidth of the wanted audio, and a steep, 'brickwall' anti-alias filter ensures nothing above half the sample rate gets in. In that way, the lower side-band produced is kept clear of the wanted audio. But what happens if we allow signals higher than half the sample rate into the system, or choose a sample rate that isn't more than twice the signal bandwidth?

The answer is that part of the lower side-band ends up overlaying the wanted audio, and becomes audible. Since the lower side-band is spectrally reversed, high-frequency source signals appear as lower frequency signals (aliases), the audible equivalent of the wheels going backwards. What's more, there's no musical relationship between the input and alias frequencies; the relationship is between the input frequency and the sample rate, and therefore sounds very discordant and unnatural.
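
A quick sketch of that arithmetic: the alias lands at the input frequency's distance from the nearest multiple of the sample rate, which is why the result is musically unrelated to the input.

```python
# Sketch of where an unfiltered input frequency ends up after sampling.
# The alias is the distance to the nearest multiple of the sample rate.

def alias_frequency(f_in_hz, sample_rate_hz):
    return abs(f_in_hz - sample_rate_hz * round(f_in_hz / sample_rate_hz))

fs = 44_100
for f in (10_000, 23_000, 30_000, 40_000):   # only 10kHz is below fs/2
    print(f"{f} Hz in -> {alias_frequency(f, fs)} Hz out")
# 10000 -> 10000 (legitimate), 23000 -> 21100, 30000 -> 14100, 40000 -> 4100
```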

A file on the SOS web site (aliased piano.mp3) demonstrates this with some simple piano music. After about 10 seconds you'll hear a glitch, which is when I started reducing the sampling rate from the standard 44.1kHz. The reduction happens in a series of switched steps, and you'll hear more of them introduced through the rest of the piece. As the sample rate comes down, the high piano harmonics start to appear at lower, discordant frequencies, and the effect becomes stronger as the sample rate is reduced further. In the end, with the sample rate down to about 6kHz, the beautiful piano sounds like a very nasty electronic harpsichord sample!

As I said earlier, this kind of problem shouldn't happen with well-engineered equipment (at least not within the analogue-digital conversion stage), but it can happen by accident if digital signals are passed between equipment operating at different sample rates, or if sample-rate conversion isn't performed properly.

If you've ever received a greetings card with a voice message you'll have heard the effect of aliasing caused by improper sample rate conversion. The original sound has been sampled at a very low sample rate (to save data) without implementing the appropriate anti-alias filter, so frequencies higher than half the sample rate have been allowed into the digitising process, resulting in aliases. This problem also commonly occurs in cheap computer games and some Internet video and audio clips.
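
As a rough illustration of what such devices get wrong, the sketch below compares simply throwing samples away with filtering first and then reducing the rate. The 6kHz tone and the 6:1 ratio are arbitrary example figures.

```python
import numpy as np
from scipy import signal

# Sketch of 'improper sample-rate conversion': dropping samples with no
# anti-alias filter, versus filtering first. Figures are illustrative.
fs = 44_100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 6_000 * t)     # a 6kHz tone - fine at 44.1kHz

naive = x[::6]                        # new rate 7350Hz, no filtering: 6kHz is above
                                      # the new 3675Hz limit, so it reappears as a
                                      # 1350Hz alias (7350 - 6000)
proper = signal.decimate(x, 6)        # filter first, then reduce the rate: the 6kHz
                                      # tone is filtered out, as it should be
print(len(naive), len(proper))        # both streams are now 7350 samples long
```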

Computer Clicks, Pops & Glitches

When you connect a computer into your digital audio setup, a whole raft of problems can arise, mostly relating to audio clicks, pops and gaps (terms that generally refer to the length of the interruption, from a single sample to a sizeable chunk of audio). Problems tend to fall within three broad areas: mains supply, recording, and playback.

A low buffer setting may be needed for low-latency applications (such as playing VST instruments live), but as your mix gets busier you may find you need to increase the buffer size to prevent unwanted glitching. The trade-off is that higher buffer settings increase latency.

Although the causes of interference riding piggyback on the mains supply can sometimes be tricky to track down, they're likely to affect both audio recording and playback, and their timing will bear no relation to anything happening in the music. Notice whether clicks and pops coincide with your central heating, oven, microwave or freezer switching on. If so, they may be nothing to do with your computer at all, instead requiring interference suppression at source, or more careful connection of your gear to the mains. Intermittent crackling problems are often caused by faulty audio or mains cables, so check them before blaming your computer. Continuous hums and buzzes may also relate to electric light dimmers, so keep these away from the studio as well.

Computer-related recording and playback problems may relate to clocking issues (discussed in the main text), but most are due to hardware or software inside your computer. Despite having reviewed over 84 audio interfaces, I've rarely run into this category of click and pop — largely because I choose the hardware components in my PC carefully for maximum compatibility with music hardware and software, and carefully set up my operating system with the same end in mind. Stick with recommended motherboard chip sets, avoid unusual expansion cards wherever possible, and if you have a Firewire audio interface make sure your computer features one of the Firewire controller chips recommended by the interface manufacturer. Mac users generally have an easier time here, simply because there are fewer hardware variations for audio interface manufacturers to test.

Software problems often stem from the audio interface RAM buffers being too small, and the data running out before the operating system can get back to top them up (playback) or empty them (recording). So if you hear even a single click in your audio recording or playback, it's probably due to something preventing the operating system from filling and emptying those audio buffers in time for smooth delivery to your audio interface. If those interruptions become more frequent, the isolated clicks and pops turn into occasional crackles, and eventually to almost continuous interruptions that sound like distortion as the audio starts to break up more regularly.
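
As a rough guide to the trade-off mentioned above, the buffer length in samples divided by the sample rate gives the latency each buffer adds; a minimal sketch:

```python
# Sketch of the buffer-size / latency trade-off.
# (One-way latency per buffer; real-world round-trip figures will be higher.)

def buffer_latency_ms(buffer_samples, sample_rate_hz):
    return 1000.0 * buffer_samples / sample_rate_hz

for frames in (64, 128, 256, 512, 1024):
    print(f"{frames:5d} samples at 44.1kHz ~ {buffer_latency_ms(frames, 44_100):.1f} ms")
# 64 ~ 1.5ms, 128 ~ 2.9ms, 256 ~ 5.8ms, 512 ~ 11.6ms, 1024 ~ 23.2ms
```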

The most obvious cure is to increase the audio interface buffer size, and make sure you have the latest drivers installed and the recommended operating system tweaks in place. If you only get an occasional click, see if it coincides with a soft synth playing more notes and temporarily pushing the CPU 'over the edge', or with a non-musical background task cutting in unexpectedly from time to time, which you may be able to disable. Occasionally a rogue plug-in or soft synth can cause sudden processing 'spikes' as well, so to track down problems try temporarily disabling the plug-ins you're using, one at a time, to see if that cures the problem. Once you have compatible hardware and no unexpected software interruptions, you should hear no clicks or pops even with the audio buffer size down at 2ms — or lower. Martin Walker

Jitter & Clocking

A lot of fuss is still made about jitter, but while it is potentially a serious issue it's rarely a practical problem these days — simply because equipment designers and chip manufacturers have found very effective ways of both preventing it and dealing with it.

Jitter is the word used to describe very short-term timing variations between one sampling moment and the next. In a 48kHz sampled system, the time between each clock pulse should be 20.8333 (recurring) microseconds. If the gap between some pulses is, say, 20.80 microseconds and between others 20.85, we have timing errors, which translate into waveform amplitude errors — otherwise known as distortion.

This can happen at either the A-D stage, or the D-A stage, but it is more serious if it happens at the former, because those distortions are then locked into the digital signal. A jittery A-D clock means that the amplitude of audio samples is measured fractionally early or late, but stored as if taken at the precise required time. So these digitised sample amplitudes are really telling lies about the true amplitude at the required moment in time.

This extreme example of jitter shows how the first blue sample is produced too early and the second too late, with the result that the intended waveform (red) is distorted (purple).

A similar problem afflicts the D-A converter because it is trying to reconstruct analogue samples from the digitised amplitude data. If it produces those sample amplitudes slightly early or late, again it is distorting the true waveform. The saving grace is that if the jitter can be removed (by using a better D-A converter, say), then the original data can be used to reconstruct the proper waveform.

If the clock jitter is entirely random, the resulting distortion will also be random, and a random signal is noise. Since a high-frequency signal changes faster than a low-frequency one, small timing errors will produce larger amplitude errors in a high-frequency signal. So random jitter tends to produce a predominantly high-frequency hiss. I've yet to hear that on any current digital system, though — clocking circuits these days are just too good for this to be a practical problem.
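
For a feel of why high frequencies suffer more, here's a rough worst-case estimate (my own back-of-envelope maths, treating the error as the amount a full-scale sine can change during the timing error):

```python
import math

# Rough worst-case amplitude error caused by a clock-timing error dt when
# sampling a full-scale sine at frequency f: the waveform can change by at
# most about 2*pi*f*dt of full scale between the intended and actual instants.

def worst_case_error_db(f_hz, jitter_seconds):
    error = 2 * math.pi * f_hz * jitter_seconds   # fraction of full scale
    return 20 * math.log10(error)

for f in (100, 1_000, 10_000, 20_000):
    print(f"{f:6d} Hz, 1ns of jitter -> error around {worst_case_error_db(f, 1e-9):.0f} dBFS")
# The same 1ns timing error is ~46dB more significant at 20kHz than at 100Hz.
```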

On the other hand, if the jitter variations are cyclical or related to the audio, the distortions will be tonal (similar to aliasing) or harmonic, and they'd tend to be far more obvious and audible. But I've not heard that on any current digital audio system either: other than in very low cost equipment with extremely inferior clocking structures, A-D and D-A jitter just isn't a practical problem anymore.

Another source of jitter (the strongest source these days) is cable-induced. If you pass digital signals down a long cable (or fibre), the nice square-wave signals that enter degrade into something that looks more like shark fins at the other end, with slowed rise and fall times. This is caused by the cable's capacitance (or the fibre's internal light dispersion), so the longer the cable, the worse the degradation becomes. That's why digital cables need to be wide-bandwidth, low-capacitance types.

This matters because most digital signals incorporate embedded clocks along with the audio data, and that clocking information is determined from the rise and fall between the data pulses. If the clocking edges are vertical, the clocking moments are obvious. However, if the clocking edges slope, the timing point becomes ambiguous — and we now have embedded jittery clocks!

When passing digital audio between one system and the next, the precise clock timing actually doesn't matter that much, as long as the average sample rate is the same for both. All that's needed is to be able to determine at each clock moment what the binary value of each bit is in the binary word.

However, when sampling or reconstructing an analogue signal, the clocking moments are critically important, as explained. So if a D-A relies on using the jittery embedded clocking information from its input signal to reconstruct the analogue output, there could be a problem with jitter noise or distortions. Fortunately, most modern D-As incorporate sophisticated jitter-removal systems to provide isolation between the inherently jittery incoming clocks embedded in the digital signal, and the converter stage's own reconstruction clock.

In most cases, A-D converters operate from an internal, low-jitter clock, and it is only necessary to use external embedded clocks when slaving multiple A-D converters. To minimise the potential for clock jitter, it is generally best to use the A-D converter's internal clock as the system master whenever possible. If you have to use external clocks, use the shortest and best-quality clocking cables between devices that you can, fed from the most stable clock master available.

Quantisation

A linear system has a linear transfer curve in which the relationship between input level and output level is proportional.

The other big stumbling block in the digitisation process is the concept of 'resolution'. Again, a common way of describing quantisation is to show how the amplitude measurement of audio samples is inherently inaccurate, because of the clearly defined quantising increments. These measurement errors reduce with increasing word length (because there are more quantising levels, spaced more closely together), so while eight bits can only count 256 levels, 16 bits can count to 65,536 levels and 24 bits can count to 16,777,216 levels — so it seems obvious that 24 bits gives higher 'resolution' and is more accurate than 16 or 8 bits. This may be true, but it is also very misleading, because audio quantising isn't implemented in that simplistic way.
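
For reference, here's the simple arithmetic behind those level counts, together with the textbook dynamic-range approximation of roughly 6dB per bit (a theoretical figure for an undithered full-scale sine, not a measured converter specification):

```python
# Sketch of how word length relates to the number of quantising levels and,
# roughly, to dynamic range (textbook 6.02N + 1.76 dB approximation).

for bits in (3, 8, 16, 24):
    levels = 2 ** bits
    approx_snr_db = 6.02 * bits + 1.76
    print(f"{bits:2d} bits: {levels:>10,} levels, ~{approx_snr_db:.0f} dB")
# 3 bits: 8 levels ~20dB; 8: 256 ~50dB; 16: 65,536 ~98dB; 24: 16,777,216 ~146dB
```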

If you draw a graph to show the relationship between input signal level and output signal level (often called a transfer curve), and show what happens with an analogue system operating with unity gain, you get a straight line at 45 degrees. As the input level increases, the output level increases in direct proportion — which means we have a straight-line graph, and the system is described as being 'linear', or free from amplitude distortions.

With the simple quantising system, we get a staircase. As the input level rises, the output remains at a fixed level until the next quantising threshold is reached, at which point the level suddenly jumps to a new fixed output level. Clearly, this is very non-linear and the audible result is a distorted output. There are audio files on the web site to demonstrate this with simple piano music. The first file (Piano_16.mp3) is the original music taken from a CD, the second (Piano_8.mp3) is re-quantised to eight bits, and the third (Piano_3.wav) to just three.

A crudely quantised system has a stepped transfer curve in which the output level increases in quantised steps as the input level rises linearly.

As you can hear, the fewer the bits, the bigger the quantising steps, the more non-linear the quantised digitisation becomes, and the worse the distortion. At just three bits the piano is almost unrecognisable. You'll also notice (at least in the 8-bit version) that when the signal level is quite high the quantising errors are essentially random, and sound like noise, but as the level of the piano falls the errors become more identifiable as distortion. When the level falls below that of the lowest quantising threshold the output is completely silent — there's no background hiss (unless introduced by your monitoring system!).

In these examples, I've maintained the original source audio amplitude and just reduced the word length, in order to make the effects very obvious. But exactly the same effects will happen in a crudely quantised 24-bit system as the signal is decreased in amplitude to the point where it only crosses the bottom eight, or just the bottom three, bits. The distortion effects would then be less audible, simply because the signal level would be very low — but they would still be there, and that non-linearity is unacceptable.
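
A minimal sketch of that crude requantisation, showing how a signal smaller than half a quantising step simply disappears (the 440Hz tone and the levels chosen are illustrative only):

```python
import numpy as np

# Sketch of crude, undithered requantisation: round each sample to the
# nearest of 2**bits evenly spaced levels.

def quantise(x, bits):
    step = 2.0 / (2 ** bits)            # signal assumed to span -1..+1
    return np.round(x / step) * step

t = np.arange(44_100) / 44_100
quiet = 0.01 * np.sin(2 * np.pi * 440 * t)   # a very low-level signal

y = quantise(quiet, 3)
print(np.unique(y))   # with 3 bits the whole signal sits on one level: silence
```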

Dither

Analogue systems don't distort as signals get quieter, and we don't want a digital system to distort either, so we need to linearise the quantisation process. The solution is a technique called dither, which everyone has heard of, but few understand.

Essentially, dither linearises the quantisation process by deliberately adding a low-level, noise-like (randomly changing) signal to the wanted audio, so that the quantised output constantly jumps between adjacent levels at random. The resulting transfer curve on a graph is a 'hairy' straight line, rather than a clean staircase: we now have a linear system with a defined noise floor, instead of a non-linear system without one. An A-D converter can be dithered by the electronic noise from its analogue input stage electronics, while the dithering signal in a word-length reduction process is usually derived from the truncated bits, so that low-level audio information is retained within the 'noise'.
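
Here's a minimal sketch of dithered requantisation to compare with the undithered sketch earlier; the triangular (TPDF) dither shape is a common choice, though not something prescribed here:

```python
import numpy as np

# Sketch of dithering a word-length reduction: add roughly one quantising
# step of random noise before rounding, so that even a signal far below one
# step still modulates the output on average.

def quantise(x, bits, dither=False):
    step = 2.0 / (2 ** bits)
    if dither:
        # Triangular (TPDF) dither spanning +/- one step: a common choice.
        x = x + (np.random.uniform(-0.5, 0.5, x.shape) +
                 np.random.uniform(-0.5, 0.5, x.shape)) * step
    return np.round(x / step) * step

t = np.arange(44_100) / 44_100
quiet = 0.01 * np.sin(2 * np.pi * 440 * t)

print(len(np.unique(quantise(quiet, 3))))                # 1 level: pure silence
print(len(np.unique(quantise(quiet, 3, dither=True))))   # several levels: the tone survives in the noise
```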

Dither noise fills each 'quantisation step' to produce a linear system, but one which carries an inherent level of noise roughly equal to the amplitude of one quantising level.

An audio file (introduceddither.mp3) on the web site demonstrates the audible effect of dither. A simple sine wave is quantised with a low word length and the resulting distortion is quite audible. As white noise is introduced along with the tone, the distortion can be heard to reduce dramatically, but it reappears as the noise is faded down. It's important to note that the noise is not 'masking' the distortion products, but is linearising the system so that it no longer causes distortion.

In reality, the amount of 'noise' necessary to correctly dither the system is only that needed to occupy one quantising level. In the case of the 3-bit example the noise needs to be at about -18dBFS (a lot of noise!), while in an 8-bit system the amount needed would be about -48dBFS — only a little worse than an average cassette tape's noise floor. You can hear examples of correctly dithered 8-bit (ditheredpiano_8.mp3) and 3-bit (ditheredpiano_3.mp3) signals on the web site, with the same piano music as before. Note how there's no distortion at all, and the decaying piano notes fade smoothly into the noise floor, just as if you had recorded at low level into an analogue system, even when quantised with just three bits! You can also hear the piano in the 3-bit version, even though it is well within the noise floor of the dithered system.

As the word length of a digital system is increased, the amount of dither noise needed obviously falls commensurately, so a 16-bit system requires dither noise at roughly -96dBFS (only slightly above the noise floor of conventional analogue equipment) and 20- or 24-bit equipment will usually be dithered more than sufficiently at around -120dBFS by the inherent noise of the analogue electronics feeding the converter inputs.

A useful twist to this dithering process is that the spectral content of the audible noise is less important than its statistical properties. Consequently, it is possible to 'shape' the spectral content of the dithering signal to make it less psychoacoustically audible. For example, if you reduce the amplitude of the noise in the mid-frequency range while increasing it at the high frequency end, it will have a more 'hissy' character but sound quieter overall.
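
One very crude way to illustrate that spectral shaping is to high-pass filter white noise by taking its first difference: the total power stays in the same ballpark, but the energy shifts towards the top of the band, where the ear is less sensitive. This is only an illustration, not any manufacturer's actual algorithm.

```python
import numpy as np

# Sketch of 'shaping' the dither spectrum: differencing white noise tilts its
# energy towards high frequencies, so similar total power sounds quieter.
rng = np.random.default_rng(0)

white = rng.uniform(-0.5, 0.5, 48_000)
shaped = np.diff(white, prepend=0.0)      # first difference = gentle high-pass

# Comparable order of magnitude of total power, but the shaped version has
# far less energy at low and mid frequencies.
print(np.var(white), np.var(shaped))
```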

Several manufacturers have taken advantage of this in products such as Sony's Super Bit Mapping, or Apogee's UV22. In the case of the Sony system (which was used on DAT machines and other digital products) the dithered signal would measure with a signal-to-noise ratio of around 93dB (as you'd expect for a 16-bit system), but sound as though it was around 20dB quieter, more closely resembling a 20-bit system.

Again, there are files on the web site to demonstrate this effect at 8-bit (noiseshapedpiano_8.wav) and 3-bit (noiseshaped piano_3.wav) word lengths. In these examples a fairly crude spectral reshaping of the dithering noise has been used that puts more energy at the high end, reducing the amount of noise in the mid- and low-frequency regions.

The audible effect is to make the hiss much 'hissier', but as the piano has few high harmonics it also makes the piano music more audible within the hiss. This is particularly apparent in the 3-bit example, in which the original quality of the piano recording, and its subtle decay of notes, is clearly audible, despite still being only a 3-bit recording!

This demonstrates that in a correctly dithered digital system the wanted audio can fade down into a constant and smooth noise floor without losing 'resolution' and without gaining distortion artifacts. In fact, it works in exactly the same way as fading a signal down in an analogue system.

The only difference that the word length of a digital system actually makes is in defining the level at which that noise floor sits: around -93dBFS in a 16-bit converter, and around -120dBFS in a 20- or 24-bit system. In theory, a 24-bit system should have a dithered noise floor at around -140dBFS, but few systems are that good. This isn't because of limitations with the digital side of things, but because of the inherent noise of analogue electronics: analogue really is the weak link in the chain at this level of performance.

Fixed Point Versus Floating Point

A question I'm often asked is why different DAW applications seem to produce different sound quality, even when mixing signals that have no EQ and no plug-ins applied. Obviously, if the input or output level is allowed to clip you'll hear problems, but if the levels are kept below clipping, why the difference? In part, this may be due to the difference between fixed- and floating-point mathematics — the two systems used in digital mixing.

Most real-world maths follows the fixed-point system, where the figure immediately to the left of a decimal point represents units and the one two places to the left represents tens. Similarly, the number immediately to the right of the point represents tenths, and so on. The important detail is that the decimal point never changes position — hence the term 'fixed point'. These examples use the base-10 numbering system, but binary maths can work in exactly the same way: a DAW's fixed-point mixer may use upwards of 32 bits to allow a practical amount of headroom when summing several channels of 24-bit signals. Indeed, 48 or 56 bits is not uncommon, and is often referred to as 'double precision' working.

The other system uses what is known as floating-point maths, so instead of having one very long number representing large values, we use a smaller number teamed with a multiplier. In the case of decimal numbers, we'd have one more manageably sized number for the value and another saying how many noughts should be added to the end of it to create the required result. Again, there is a binary equivalent.

Both systems work fine until you run out of numerical headroom (or footroom) or, in other words, when a calculation's result exceeds the system's ability to express that number. However, the significance of quantisation error is different between the two systems. Using fixed-point maths, larger values benefit from a larger signal-to-noise ratio, so the percentage error decreases right up to the point of clipping, above which all bets are off! Very low-level signals, on the other hand, are always closer to the dither noise floor.

With floating-point maths, the first number (the mantissa) constantly moves from large to small as signal levels vary, then the exponent (the number of following noughts in a base 10 system) changes and the first number may flip from a very small value to a very large one and vice versa. In practice this means that the signal-to-noise ratio and/or quantisation error is constantly changing, unlike the fixed-point system, where big numbers are always the most accurately expressed and small numbers always the least.
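
The contrast can be seen directly by comparing the step size of a fixed-point representation, which is constant everywhere, with the spacing between adjacent floating-point values, which scales with the value itself. The figures below are illustrative and not any particular DAW's internals.

```python
import numpy as np

# Sketch: a fixed-point step is the same absolute size at every level, while
# a float's step ('spacing') shrinks as the value being represented shrinks.

fixed_24bit_step = 2.0 / (2 ** 24)                 # constant, whatever the level

for value in (1.0, 0.01, 1e-6):
    float32_step = np.spacing(np.float32(value))   # gap to the next float32 value
    print(f"value {value:>8}: float32 step {float32_step:.1e}, "
          f"24-bit fixed step {fixed_24bit_step:.1e}")
```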

However, modern systems should have enough resolution that either system will deliver good results, and the floating-point system has the advantage of vast amounts of headroom. Having said that, the people who write the software may use various mathematical shortcuts to lessen CPU load, so the noise and distortion figures that can be obtained in theory may not always pertain in practice. In my experience with my own floating-point DAW, the subjective quality of the output is always better if the individual track levels are set to peak at around -12dB, rather than -1 or -2dB, even though you can set the output fader to avoid numerical clipping in either case. Furthermore, the more tracks you're mixing, the more safety headroom you need to leave, so in this respect mixing digital audio isn't that different from mixing analogue audio. The bottom line, then, is that the sound of your DAW may be more to do with what the programmers aren't telling you about their method, than the difference between floating- and fixed-point maths. Paul White

Headroom

This brings me neatly on to the subject of headroom. In the good old days of hand-crafted studio furniture and analogue audio, we used to work with a fairly generous amount of headroom above the nominal system alignment level (typically +4dBu or 0VU), and the provision of headroom has been taken so completely for granted that none of our standardised analogue metering systems even bother to show it! Possibly because of this, few younger sound recording engineers seem to be aware of the provision of headroom, let alone why it is there.

Typically, if the reference level of an analogue desk or recorder's metering is calibrated to +4dBu, say, the system can cope quite happily with transient peaks well above that. Commonly, 18dB of headroom is provided in the majority of professional systems, which means they can accommodate peak levels of +22dBu or more before clipping or suffering excessive distortion.

So for much of the time the average signal levels would probably be between 12 and 20dB below the system's true clipping point — which is something we've never been concerned about in the analogue world (not least, I suspect, because the traditional analogue meters don't show us how much headroom we aren't using!).

Even with such an apparently generous amount of headroom, the average level is still at least 90dB above the console's noise floor, and we rarely have to worry about poor signal-to-noise ratios. The total dynamic range available in an analogue console is around 115dB — the difference between the noise floor at -90dBu or so, and clipping at around +22dBu. OK, analogue tape recorders would struggle to match that, but with modern noise-reduction systems it is possible to achieve a dynamic range of 90dB or more without too much trouble.

Guess what? The best of the early 16-bit systems matched the dynamic range of analogue tape a long time ago, and modern 24-bit digital systems generally exceed the dynamic range of even the best analogue consoles. Even budget converters now routinely achieve a dynamic range of 112dB or more, and easily match good-quality analogue systems in that regard.

So, given that 24-bit digital systems enjoy a similar dynamic range to good analogue systems, we can (and, I would argue, should) adopt the same kind of working levels and headroom margins. These practices were developed for good engineering and operationally pragmatic reasons, and those same arguments still apply — which makes me wonder why so many people still insist on peaking original digital recordings to within a gnat's whisker of 0dBFS. I'm not talking about mastering to 0dBFS (an entirely separate discussion) but making original, live recordings in the studio or on location.

Perhaps the widespread misunderstandings of how sampling, quantisation and dithering work, combined with some very specific (but now seriously outdated) working practices associated with early 16-bit digital systems, have led many people into thinking headroom is a bad thing, and that signal levels should be peaked as close to 0dBFS as possible. In fact, quite the reverse is true. A lack of headroom is a very bad thing as far as recording and mixing is concerned: it makes recording fraught with worry about accidental clipping, and mixing a nightmare of poor gain structure and non-optimal signal processing.

By adopting similar gain structuring and signal levels to traditional analogue techniques, working with digital equipment becomes just as easy, and, to my ears, sounds at least as good and probably better. If you routinely allow something like 12dB of headroom above normal signal peaks, you won't have to worry about accidental overloads on brief transients.

Equally, you won't have to worry about poor signal-to-noise ratios (or 'lost resolution') because the lovely smooth dithered noise floor is still something like 100dB below your peaks and probably 70dB or more below even the pianissimo parts. In 99 percent of cases the digital system's noise floor isn't the limitation anyway; the ambient noise floor of your recording room will be considerably higher.
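
Putting the round figures from the last two paragraphs together (the 30dB gap between peaks and pianissimo passages is an assumption for the sake of the example):

```python
# Sketch of the gain-structure arithmetic above, using round figures.
peak_target_dbfs = -12              # allow ~12dB of headroom above normal peaks
dithered_noise_floor_dbfs = -120    # typical real-world 24-bit converter figure

print("Margin above the noise floor at peak level:",
      peak_target_dbfs - dithered_noise_floor_dbfs, "dB")      # ~108dB

pianissimo_dbfs = peak_target_dbfs - 30   # assume quiet passages ~30dB below peaks
print("Margin for quiet passages:",
      pianissimo_dbfs - dithered_noise_floor_dbfs, "dB")        # still ~78dB
```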

Furthermore, when you come to mix a hard drive full of tracks with average levels of around -20dBFS, they will sum to something under 0dBFS. You won't have to pull down the master fader to remove the output converter overloads, and the plug-ins won't have to calculate values in excess of 0dBFS. The result will often be a smoother, more analogue-like sound, and there will also be a lot less operational hassle.

Peak Practice

I believe a large part of the reason people work with minimal headroom is because of the almost universal use (including in modern DAW software) of digital meters scaled to 0dBFS. This approach might have been appropriate for early 16-bit systems, but it isn't now. There's a reason why high-end professional digital consoles are increasingly being fitted with traditional analogue-style meters. These no longer routinely show the entire scale to 0dBFS, and the operators are working with traditional analogue headroom margins without any problems at all.

With modern 24-bit digital systems, you can afford to leave the same amount of headroom as you did by default in analogue, and taking an analogue approach to metering, using VU or PPM meters, can help avoid unnecessary worries about clipping. Though many sequencers don't have this facility built in, software such as PSP Audioware's freeware Vintage Meter offers a solution.

I tend to work with a DK-Technologies MSD600M++ meter, which can be switched to show several different scales and calibrations. I normally use a standard IEC Type II PPM (the type of meter used throughout the UK broadcast industry) with the BBC 1-7 scale. The default calibration sets the reference mark (PPM4) to -18dBFS on steady tone, and I try to contain signal peaks to PPM6 (nominally 8dB higher). However, because the Type II PPM is deliberately slugged to ignore fast transient peaks, a true peak-reading meter would show transient peaks not of -10dBFS (8dB above -18dBFS) but probably something more like -6dBFS. Am I worried by that? Of course not — that's what the headroom margin is there for.
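
For anyone wanting to relate those marks to digital levels, here's a small sketch of that calibration, assuming (as the '8dB higher' comment implies) 4dB between adjacent PPM marks:

```python
# Sketch of the PPM calibration described above: PPM4 lined up with -18dBFS
# on steady tone, and 4dB between adjacent marks. These are steady-state
# readings; fast transients will measure several dB higher on a true-peak meter.

def ppm_to_dbfs(mark, ref_mark=4, ref_dbfs=-18, db_per_mark=4):
    return ref_dbfs + (mark - ref_mark) * db_per_mark

for mark in range(1, 8):
    print(f"PPM{mark}: {ppm_to_dbfs(mark)} dBFS")
# PPM4 -> -18dBFS, PPM6 -> -10dBFS, PPM7 -> -6dBFS
```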

The same relationship between meter zero and digital headroom applies to those who prefer to use VU meters, too, and a lot of skilled engineers are working in the digital domain with VU meters without any problems at all, because they have adopted sensible, well-proven headroom margins.

The operational advantage of using analogue-style meter scales and headroom margins is that we can use meters that are more familiar and generally easier on the eye, allowing us to concentrate more on what we are trying to do creatively than on whether or not we are about to clip the system.

Mythbusters

Hopefully, this article has helped to straighten out some of those long-held myths and confusions about digital audio, and maybe even encouraged some of you to look upon digital audio equipment in a new light and with new working practices. Properly engineered digital audio does work, and matches or exceeds the performance of analogue audio in many technical areas. It sounds different in some ways, certainly, which leads to a personal and aesthetic choice of medium. But the arguments about which is better don't stand up on their own any more. You have to ask: better for what? Digital is a clear winner in some areas and analogue in others. Of course, there's nothing to stop you mixing and matching the technologies, to have the best of both worlds...