Distortion
There are two types of distortion, which can be described as 'linear' and 'non-linear'. I've already discussed linear distortion, which manifests as deviations in the frequency response (magnitude and phase) away from ideal flat lines. Non-linear distortion occurs when any sound that was not present in the original signal is heard on playback. Ideally for a loudspeaker, there should be none. But all loudspeakers have distortion, and any marketing claim to the contrary is, to use the polite form, a 'terminological inexactitude'. Of course, some loudspeakers distort more than others, and all distort in different ways that depend on the design. We can look at non-linear distortion in two ways.
Harmonic Distortion
First, we'll consider harmonic distortion, where the content that's added is related to the original signal. Second-order harmonic distortion is sound added at twice the stimulation frequency. For example, we play 1kHz into the loudspeaker, and we listen for a 2kHz signal that should not be there. Similarly, third-order harmonic distortion is unwanted sound at three times the stimulation frequency, in this case at 3kHz. (You can work out how fourth, fifth and further harmonic distortion work.) If you combine all of these harmonics, you get the 'total harmonic distortion', or 'THD'. Usually, in speaker development, one tries to minimise the second- and third-order harmonic distortions, and the higher orders and the THD as a whole will then decrease accordingly. Distortion can be expressed as a dB level below the stimulus signal, or as a percentage of the signal. It's easy to swap between them quickly using this simple, rounded pattern: 0dB = 100 percent, -10dB = about 30 percent, -20dB = 10 percent, -30dB = about 3 percent, -40dB = 1 percent. And so on.
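For anyone who prefers to see the arithmetic, the dB-to-percent pattern can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and the THD helper assumes the common root-sum-square way of combining harmonic amplitudes, which is not the only definition in use.

```python
import math

def db_to_percent(level_db):
    """Convert a distortion level in dB relative to the stimulus into percent."""
    return 100.0 * 10.0 ** (level_db / 20.0)

def percent_to_db(percent):
    """Convert a distortion percentage into dB relative to the stimulus."""
    return 20.0 * math.log10(percent / 100.0)

def thd_percent(harmonic_levels_db):
    """Combine harmonic levels (dB relative to the fundamental) into a THD
    percentage. Assumption: root-sum-square of the harmonic amplitudes."""
    amplitudes = [10.0 ** (level / 20.0) for level in harmonic_levels_db]
    return 100.0 * math.sqrt(sum(a * a for a in amplitudes))

# Print the conversion pattern at 10dB steps.
for level in (0, -10, -20, -30, -40):
    print(f"{level:>4} dB = {db_to_percent(level):.1f} percent")
```

Run as written, the -10dB and -30dB steps come out at roughly 31.6 and 3.2 percent, so '30' and '3' are convenient roundings rather than exact values.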
But while harmonic distortion is conceptually easy to measure — and helpful in diagnosing problems during development of the loudspeaker — a single THD figure cannot tell us what the loudspeaker will actually sound like. Third-order harmonic distortion, for example, always sounds bad. (It's heard when a sine wave clips in an amplifier, and we all know how bad that sounds!) Second-order harmonic distortion comes from non-linearities in the system — for example, the 'air springs' on either side of the bass driver being different (the air on the inside of the cabinet is stiffer than the open air outside, and more so in a sealed cabinet than a vented one) — and it sounds nice. It brings 'warmth' and 'fullness' to the sound.
Graph 5: Harmonic distortion at 90dB SPL. THD (green), second (blue), third (red). Ideally all distortion should be eliminated (green arrows).
Graph 6: Harmonic distortion at 95dB SPL. THD (green), second (blue), third (red). Ideally all distortion should be eliminated (green arrows).
Consider, then, two imaginary speakers whose published specifications quote the same THD figure. The first has a high level of second-order harmonic distortion and a low level of third-order harmonic, while the second has a high level of third-order harmonic distortion and a low level of second-order harmonic. Despite them having the same published distortion figure, the first will probably sound quite nice, and the second will most likely sound unpleasant.
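The two imaginary speakers can be put into numbers. The sketch below uses made-up harmonic levels, chosen only to illustrate the point, and again assumes a root-sum-square THD that counts just the second and third harmonics.

```python
import math

def thd_percent(h2_db, h3_db):
    """THD in percent from second and third harmonic levels (dB relative to
    the fundamental). Assumption: root-sum-square, higher orders ignored."""
    h2 = 10.0 ** (h2_db / 20.0)
    h3 = 10.0 ** (h3_db / 20.0)
    return 100.0 * math.sqrt(h2 * h2 + h3 * h3)

# Hypothetical figures, not measurements of any real product.
speaker_a = {"h2_db": -40.0, "h3_db": -60.0}  # mostly second-order
speaker_b = {"h2_db": -60.0, "h3_db": -40.0}  # mostly third-order

thd_a = thd_percent(**speaker_a)
thd_b = thd_percent(**speaker_b)
print(f"Speaker A: {thd_a:.3f} percent THD")
print(f"Speaker B: {thd_b:.3f} percent THD")
```

Both speakers land on the same figure of about 1 percent, even though their harmonic make-up is reversed. The single number simply cannot distinguish them.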
So a single THD figure isn't really all that helpful in defining what makes a good loudspeaker. Like most things in acoustics, harmonic distortion is frequency-dependent. It also changes significantly with signal level. This is especially true as the loudspeaker approaches its limits, which usually happens at low frequencies first. So we need to know the level of the loudspeaker output when quoting the numbers. Also, what is the test signal? Usually sine waves are used for testing purposes but who listens to them for entertainment?
But for all that, remember that a studio monitor is a measurement tool — ideally, we don't want it to add anything, so no matter how 'nice' the distortion may seem, it's unwelcome in a studio monitor.
Intermodulation Distortion
Intermodulation distortion is the sound of your FM radio not quite tuned in properly. It does not sound good, and clearly it has no place in studio monitoring. Imagine leaning on 30 notes of a church organ: that's akin to the stimulating signal used to measure intermodulation distortion. Each note modulates the others, creating new frequency information not in the original signals. There should be nothing there but, of course, there always is. This distortion always sounds bad, but it gets worse with higher signal levels, and in a studio monitor it must be minimised.
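To see where the new frequency information lands, here is a minimal sketch that uses just two tones rather than the 30-note stimulus described above. The function name and the 'order' cut-off are my own assumptions; it lists only the sum and difference products up to third order.

```python
from itertools import product

def intermod_products(f1, f2, max_order=3):
    """Return sorted frequencies |m*f1 + n*f2| where m and n are both
    non-zero integers and 2 <= |m| + |n| <= max_order."""
    found = set()
    coeffs = range(-max_order, max_order + 1)
    for m, n in product(coeffs, repeat=2):
        order = abs(m) + abs(n)
        # Skip pure harmonics (m or n zero) and orders outside the range.
        if m == 0 or n == 0 or order < 2 or order > max_order:
            continue
        freq = abs(m * f1 + n * f2)
        if freq > 0:
            found.add(freq)
    return sorted(found)

# Two test tones at 1000Hz and 1100Hz (arbitrary example values).
print(intermod_products(1000, 1100))
```

None of the resulting frequencies is a simple multiple of either tone, which goes some way to explaining why this kind of distortion is so unmusical.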
Unfortunately, no speaker manufacturer quotes intermodulation distortion figures because there are no agreed standard ways to measure it (what level, how many modulations, and so on). There is no standard single number that can be put on it, and the market would not understand how to interpret the graphs or figures even if they were published.
How Loud?
Another specification that's often poorly described by marketing departments is the maximum sound pressure level, or SPL. It's another case where a single number is often quoted but is inadequate because the situation is somewhat complex. Again, there are few standards that everyone follows. Here are some real examples:
Max SPL = 110dB
Max SPL = 110dB at 1m
Max SPL (100Hz-6kHz) = 110-114 dB at 1m
The first figure tells us nothing, and the second tells us nothing at a distance of 1m! The third is at least a bit more informative, but it's actually not very useful either — to truly understand what the number means, we need to know the input signal, the acoustical conditions and the measurement method. Even so, this parameter is (again) strongly frequency dependent, so the original data is important.
Graph 7: Maximum SPL at 1m for 1 percent (red) and 3 percent (blue) THD. Ideally the SPL for a certain distortion level should be maximised (green arrows).
For example, in a compact three-way sealed cabinet, the maximum output below 500Hz will naturally be somewhat lower than at mid-range frequencies. If this is a problem for your studio (for instance, if you have a larger room with a longer listening distance) and the material you produce (such as hip-hop, or action films), a larger loudspeaker and/or subwoofer(s) are needed. Conversely, if you're working on folk music consisting of an acoustic guitar and a vocal, this product could be a very good choice on its own.
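The effect of a longer listening distance on a '1m' figure can be estimated roughly. The sketch below assumes a point source in free field, where level falls by about 6dB for each doubling of distance. A real room adds reflections, so the actual loss is normally smaller, and the 110dB starting value is just the example figure quoted earlier.

```python
import math

def spl_at_distance(spl_at_1m_db, distance_m):
    """Free-field point-source estimate of SPL at a given distance.
    Assumption: inverse-square law, no room reflections."""
    return spl_at_1m_db - 20.0 * math.log10(distance_m)

# Listening distances in metres (arbitrary example values).
for distance in (1.0, 2.0, 4.0):
    print(f"{distance:.0f}m: {spl_at_distance(110.0, distance):.1f}dB SPL")
```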
Sometimes single values can be appropriate. For example, Dolby use a standard method of pink noise as the signal and a sound level meter set to C-weighted and slow, located at the listening position. The loudspeaker(s) connected to a channel must be able to reproduce this signal to at least 85dB for at least one minute. This is a fully described and clear requirement that can be used to compare any speaker with any other.
Of course, just because a speaker can play loud does not mean it has to be played loud. A speaker that can play loud but is not being played loud is likely reproducing the sound with lower distortion. If you can't see why, think of two cars, one with a larger engine than the other; even if the smaller-engined car can travel as fast as the other, the larger-engined one works less hard and gives a smoother, quieter ride at the same speed.
One manufacturer has managed to measure, for one loudspeaker model, 12 different SPL values ranging over 23dB; the values are strongly dependent on the measurement signal and measurement conditions. This demonstrates that a single SPL value is useless if the conditions are not specified, because a valid comparison of specifications between products then becomes impossible.
Directivity/Dispersion
Not all the sound generated by the loudspeaker travels straight out towards the listening position. Most of it travels in other directions, bounces off surfaces in the room and might then, eventually (later, because it has travelled further), reach the listening position. These indirect sounds will not sound the same as the direct sound, due to the effect of room acoustics. This in turn greatly affects the perceived sound quality.
For example, take a room with all the walls covered with thin damping material — not so uncommon, as it is easy to do with off-the-shelf foam panels. The higher frequencies will be very well absorbed and the lower frequencies will not. Therefore, the indirect sound will have overly attenuated high frequencies and a 'dull' tonal quality. Adding this into the direct sound, even if that has a perfectly flat response, is likely to lead to a lifeless and boring overall sound quality.

Graph 8: A dispersion plot. The green line is an idealised shape, and the blue dotted line is commonly seen in loudspeakers without waveguides.
If the directivity is well controlled, the effect of this tonally different indirect sound can be reduced (though never completely eliminated). The consequence is a loudspeaker that's easier to install into a variety of acoustical conditions whilst still achieving a good result. Those different acoustical conditions could be in the same room (for example, the rear ceiling loudspeaker of a 3D system is not in the same acoustical conditions as the front left loudspeaker) or in different rooms (important for multi-room facilities or when moving a project from one room to another).
A dispersion plot displays how the sound spreads out from the loudspeaker in three dimensions. The example shown in Graph 8 is the horizontal dispersion of a larger three-way loudspeaker. Vertically, the plot will look different if the loudspeaker's physical shape is different in that plane, which almost all are. Ideally, the dispersion should be smooth and controlled at mid-range to higher frequencies (the flat part of the green line) and it will usually widen towards lower frequencies (the sloped part of the green line). If one sees dispersion like the blue line (common in loudspeakers without waveguides), the loudspeaker will sound different in acoustically different spaces, which leads to poor consistency and poor translation. It's possible that a loudspeaker like this might sound OK in one room but less so in another, acoustically different space. For this reason, the customer takes a risk buying products like this, and it's likely the reason why one commonly sees comments in forums such as "but you have to listen to them in your own room to be certain".
Graph 9: An example of a polar plot, showing speaker directivity at different frequencies.
Another way to view directivity is by using a polar plot. This is visually easier to understand immediately, but it conveys less information, because plotting all the frequencies is not practically possible on one diagram. In the example shown in Graph 9, which only has three octave bands, you can see how this loudspeaker has a cardioid response at low frequencies to reduce the effect of the reflection off the front wall when the loudspeaker is positioned away from that wall (a common placement for a midfield monitor in larger control rooms). As seen in Graph 8, loudspeakers are usually omnidirectional at these low frequencies, and this would be shown as a circle on a polar plot.
Graph 10: An example of a directivity plot. The green line describes an idealised shape.
Graph 11: A low-frequency dispersion plot of Genelec's new subwoofer design. Frequencies as low as 80Hz are highly directional, but this is not the norm!
Another method of displaying directivity data (Graph 10) is using the directivity index (DI). A good result is a smooth and gradual increase towards higher frequencies, with a plateau above the crossover frequency. Very recently Genelec launched a subwoofer that extends the controlled directivity response all the way down to about 80Hz, and this can be seen in the more colourful plot in Graph 11. However, this novel technology comes with a significant cost and is likely to be out of reach for many SOS readers.
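To connect the polar plot with the directivity index, here is a rough sketch that computes a DI from an idealised polar pattern. It assumes the pattern is symmetrical around the loudspeaker's axis, which real loudspeakers are not (as noted above, horizontal and vertical dispersion usually differ). The two pattern functions are textbook shapes, not measurements.

```python
import math

def directivity_index_db(pattern, steps=10000):
    """DI in dB for an axially symmetric pattern(theta), where pattern
    returns pressure relative to the on-axis value and theta is in radians.
    Assumption: symmetry around the axis, so one polar curve is enough."""
    d_theta = math.pi / steps
    integral = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta  # midpoint rule
        integral += pattern(theta) ** 2 * math.sin(theta) * d_theta
    q = 2.0 / integral  # directivity factor
    return 10.0 * math.log10(q)

def omni(theta):
    """A circle on the polar plot: the same level in every direction."""
    return 1.0

def cardioid(theta):
    """Idealised cardioid: full level in front, a null directly behind."""
    return 0.5 * (1.0 + math.cos(theta))

print(f"Omnidirectional DI: {directivity_index_db(omni):.2f}dB")
print(f"Cardioid DI: {directivity_index_db(cardioid):.2f}dB")
```

The omnidirectional pattern gives a DI of essentially 0dB and the ideal cardioid about 4.8dB. A DI curve is this calculation repeated at every frequency.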
Waterfalls
In waterfall plots, we collect together a bunch of frequency-response curves that are later and later in time, and then plot them all on one 3D graph, with the curves at the back being earlier in time. Any downward ridges extending from the back towards the front of the graph (red lines in Graph 12) indicate a resonance. Ideally there should be no resonances but, as acoustics is not black and white, one can ask this question: how short does a resonance need to be before it is inaudible?
Graph 12: A waterfall plot. The superimposed red lines would describe a strong mid-range resonance.
Luckily, there are some numbers for this in reference paper 2 (see the 'References' box later). Mid-range resonances usually come from standing waves inside the cabinet, or from undamped resonances in the port(s). It would sound like one note of a piano being louder than the others despite being played at the same level. In the example shown in Graph 12 there are no audible mid-range resonances. Loudspeakers are effectively a band-pass filter, with some (more if vented) significant resonant behaviour around the low-frequency corner. A sealed cabinet is less resonant in the bass region, as can be seen in this well-behaved waterfall plot. Again, it's possible to mask this resonance by making the system linear-phase, but at the cost of significantly increased latency (many tens of milliseconds).
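The idea behind a waterfall plot can be sketched with synthetic data. Everything here is an assumption for illustration: the impulse response is invented (a click plus one decaying 500Hz resonance), the sample rate is arbitrary, and the time slices use a plain rectangular cut-off rather than the shaped windows that measurement software normally applies.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, arbitrary value for this sketch

def synth_impulse_response(n_samples=1600, res_freq=500.0, decay_s=0.05):
    """Unit impulse plus one exponentially decaying resonance (made-up data)."""
    ir = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        ring = 0.2 * math.exp(-t / decay_s) * math.sin(2 * math.pi * res_freq * t)
        ir.append((1.0 if n == 0 else 0.0) + ring)
    return ir

def level_db(ir, start, freq):
    """Magnitude in dB at one frequency, using only samples from start onwards."""
    acc = 0j
    for n in range(start, len(ir)):
        acc += ir[n] * cmath.exp(-2j * math.pi * freq * n / SAMPLE_RATE)
    return 20.0 * math.log10(abs(acc) + 1e-12)

def waterfall(ir, freqs, slice_starts_ms):
    """One row of levels per time slice; later rows discard more of the
    early part of the response."""
    rows = []
    for start_ms in slice_starts_ms:
        start = int(start_ms * SAMPLE_RATE / 1000)
        rows.append([level_db(ir, start, f) for f in freqs])
    return rows

freqs = [250.0, 500.0, 1000.0]
starts = [0, 10, 20, 40]
rows = waterfall(synth_impulse_response(), freqs, starts)
for start_ms, row in zip(starts, rows):
    print(f"{start_ms:>3}ms  " + "  ".join(f"{level:6.1f}" for level in row))
```

In the printed table the 500Hz column stays well above its neighbours in the later slices and falls only slowly. That lingering column is the ridge you would see running through a waterfall plot.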
Additional Requirements
There are a number of other considerations, which are important but require a less detailed explanation — so I've gathered them together in this next section.
- There should be no rattles or rubbing sounds (it should be self-evident why this is undesirable!). These 'irregular distortions' are hard to measure automatically; it's easier to listen for them by ear using a sine wave to stimulate them into action. They occur at specific frequencies and sound like buzzing. You can also try physically shaking the cabinet and tapping around the enclosure with your fingers whilst listening for 'interesting' noises — there should be none.
 
- There should be no audible self-generated noise (0dB A-weighted) at the listening position; there will be some noise at the drivers, but one does not usually listen at that distance! Note that this adds up with every loudspeaker in the room, so a 3D audio system installed in a very small, quiet room could easily have annoying amounts of self-generated noise at the listening position. Note also that often the source equipment is what's actually generating the noise, and the loudspeaker is just doing its job reproducing that 'input signal'. You can check if it's the speaker by unplugging the input connector whilst listening for a change in the noise level at the front. Usually the noise has a white spectrum (a gentle hissing sound) but there can be noise leakage from the power supply into the rest of the electronics, causing a 'crunchy' sound to be heard. This is more annoying and obviously not desirable.
 
- A studio monitor should have a high maximum input level. Most loudspeakers made today can accept the maximum output of most audio interfaces, which is typically 18-19dBu, but not all can accept the higher signal levels output by the loudest interfaces or by larger desks, which can be as much as 24dBu.
 
- The design should include limiters to protect the drivers from accidental overload and ensure long-term reliability. The more sophisticated the limiter, the more one will be able to squeeze out of the drivers before one of them breaks. DSP lookahead limiters can be much more complex than reactive analogue ones, and as they are very fast-acting they can guarantee that no driver overload occurs.
 
- If multiple model types from the same manufacturer are used in a multi-channel system (maybe smaller ones at the back of the room) or in multiple rooms, they should have the same sound character. An oft-overlooked aspect in this regard is the phase response, which needs to be the same for each speaker — otherwise localisation of images when panning becomes unreliable. The easiest ways to ensure phase consistency across different models are for them all to have a linear phase response (no manufacturer currently offers this) or for them all to be exactly the same speaker model (which is often not practical due to the size of the room).
 
- A speaker should have an adjustable response, to allow compensation for the acoustical conditions in which each loudspeaker finds itself. While possible, it's unusual that the anechoic response is the right one for the installed loudspeaker — one should expect to have to adjust the controls in some way. In-situ measurements are recommended to ensure that better adjustment choices are made.
 
- Some manufacturers have a full range of mounting hardware for their products but most don't, in which case you must find or make your own solution, which takes time and costs money. Like the loudspeakers, mounting hardware shouldn't rattle.
 
- The speakers should be physically, electronically and acoustically robust, for a long, trouble-free lifetime.
Other numbers like cabinet and driver sizes, weight, crossover frequencies, amplifier power, power consumption, input impedance, sensitivity and so on are all secondary considerations to what I've discussed above.
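Two of the points in the list above lend themselves to quick arithmetic: how self-generated noise adds up across several loudspeakers, and what the dBu input figures mean in volts. The sketch below assumes every loudspeaker contributes the same uncorrelated noise level at the listening position, which is a simplification. The function names and the 12-speaker count are my own.

```python
import math

def combined_noise_db(single_speaker_db, n_speakers):
    """Level of n equal, uncorrelated noise sources at the listening position.
    Assumption: each loudspeaker contributes the same level there."""
    return single_speaker_db + 10.0 * math.log10(n_speakers)

def dbu_to_volts_rms(level_dbu):
    """Convert dBu to volts RMS; 0dBu is 0.7746V RMS by definition."""
    return 0.7746 * 10.0 ** (level_dbu / 20.0)

# A hypothetical 12-speaker 3D system, each speaker just at 0dB on its own.
print(f"12 speakers: {combined_noise_db(0.0, 12):.1f}dB")

# The interface and desk output levels mentioned in the list.
for level in (18, 24):
    print(f"+{level}dBu = {dbu_to_volts_rms(level):.2f}V RMS")
```

On these assumptions, 12 loudspeakers that are each just inaudible on their own add up to nearly 11dB more noise, and a 24dBu source delivers about twice the voltage of an 18dBu one.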
Eight Crucial Questions
Summing up (in a literary rather than a mathematical or mixing sense!), there are many measurements which can be used to describe various aspects of the performance of a loudspeaker. Looking at only one or some of them will give only a partial account of a speaker's objective quality. Only when all of them are viewed together can you see a complete picture of the overall performance of a speaker. And if there are no specs at all, the customer is completely in the dark about what they are buying. To recap, to judge the suitability of a loudspeaker for studio monitoring from its specifications, we need to answer the following questions:
1. Frequency response — how low does it go, and how flat is the line?
2. Phase response — how flat is it?
3. Group delay — how flat is it, and is the latency acceptable?
4. Harmonic distortion — how was it measured and how low is it?
5. Intermodulation distortion — how was it measured and how low is it?
6. Maximum SPL — how was it measured and how high is it?
7. Directivity — how immune does this make the loudspeaker to the influence of room acoustics?
8. Waterfall plot — does this reveal any audible resonances?
Demand Answers!
When considering the purchase of new speakers for your studio, try hard to find answers to all these questions — whatever budget you have available, this will help you make a good choice. Note that if you are to answer most of them properly, graphs are a necessity; beware single-number values for frequency-dependent properties, especially if the 'small print' (defining the input signal type, level, acoustical conditions and so on) is missing — in which case ask more questions of your supplier or manufacturer.
If studio monitor manufacturers refuse to publish or supply the information you need, it can only really be for a few reasons.
- First, the design engineers don't know how to measure these things — this would be a very worrying situation, because if they cannot measure properly then how can they make a good design?
- Second, perhaps the design engineers can measure, but are embarrassed by the results or don't know how to interpret them and fix the problems; either way, they're hiding the information.
- Third, the marketing department cannot be persuaded to publish the results, because they do not understand that buyers in this market use monitors as measurement instruments for evaluating sound and so need to see the full performance. In that case, why not ask them to supply the information?
I'll end, then, by urging you one last time to demand more information, so you can make informed purchasing decisions. Once you have a technically and objectively excellent loudspeaker, installed properly (including calibration) in a well-treated room, you'll be able to make artistic decisions much more reliably. Which is what it's all about!
The author has worked for Genelec and Klein+Hummel/Neumann for a combined total of 22 years. At the time of writing he was working on a freelance basis, and the views expressed here are his own — and can hopefully be seen to be unbiased!
Subwoofers
Despite subwoofers being optimised for low-frequency reproduction, they're still loudspeakers, and most of what I've discussed in the main text applies to them. Because subwoofers operate at lower frequencies than other loudspeakers, though, the group-delay increase will be higher due to the lower low-frequency cutoff. Unless there's a design disaster, this is unlikely to be audible — but as research in this is lacking we cannot be certain.
Unless the subwoofer is very large and not working hard, its distortion will be much higher than that of the main loudspeakers. In some cases the distortion can be higher even than the original sound (>0dB or >100 percent), and a better name for such a product would be a 'distortion generator'! Published distortion graphs would be welcome to enable comparison but, sadly, these are often lacking.
If the subwoofer has bass management, so that the loudspeakers are rolled off at a higher frequency (typically 80-100Hz), the loudspeakers can be expected to have a reduction in distortion and be capable of louder playback, because the bass driver does not have to work so hard. However, this filtering increases group delay around the crossover frequency which can be audible in good-quality rooms — the bass is less 'tight' than before. If the loudspeakers are played full-range and the subwoofer spliced on to the bottom end of the loudspeaker's natural response (not a trivial in-room tuning task!), these distortion and maximum SPL benefits will not be gained, but the loudspeaker's group delay will not be increased. Pick your poison — there is no free lunch here!
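The link between filter corner frequency and group delay can be illustrated with a textbook case. The sketch below assumes a second-order Butterworth high-pass filter (Q of about 0.707), chosen purely for simplicity. Real bass-management filters and loudspeaker roll-offs are commonly steeper, so treat the numbers as indicative of the trend only.

```python
import math

def highpass_group_delay_ms(freq_hz, corner_hz, q=1 / math.sqrt(2)):
    """Group delay in milliseconds of a second-order high-pass filter.
    Assumption: H(s) = s^2 / (s^2 + (w0/q)s + w0^2), Butterworth by default."""
    w = 2.0 * math.pi * freq_hz
    w0 = 2.0 * math.pi * corner_hz
    numerator = (w0 / q) * (w0 * w0 + w * w)
    denominator = (w0 * w0 - w * w) ** 2 + (w0 * w / q) ** 2
    return 1000.0 * numerator / denominator

# Compare two corner frequencies (example values only).
for corner in (40.0, 80.0):
    delay = highpass_group_delay_ms(corner, corner)
    print(f"{corner:.0f}Hz corner: {delay:.2f}ms at the corner frequency")
```

Halving the corner frequency doubles the delay at the corner, which is the trend behind the earlier remark that a lower low-frequency cutoff brings a higher group-delay increase.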
Finally, the deeper bass response that subwoofers and larger loudspeakers have demands better low-frequency acoustical treatment in the room. This is a principal reason why home studios often do not benefit from adding subwoofers — they serve to 'excite the parts others cannot reach', serving to do little other than make poorly treated or untreated room problems audible. DSP subwoofers are becoming more prevalent, and they can contain much more filtering than analogue ones. This allows them to be tuned very exactly to the room, potentially resulting in a very good sound quality. However, in-situ acoustical measurements are necessary; doing this by ear is impossible.
References
The 'reference papers' mentioned in the main text are:
1. Andrew Goldberg, Quantifying Consistency In Loudspeaker System Production, Proc. of 142nd AES Conv., Berlin, Germany, preprint 9713, (2017);
2. Bruno Fazenda, M. Stephenson and A. Goldberg, Perceptual Thresholds For The Effects Of Room Modes As A Function Of Modal Decay, J. Acoust. Soc. Am. 137, No. 3, pp. 1088-1098, (2015).
The graphs that appear in this article can be found on the manufacturers' websites, and are reprinted with their kind permission (they retain copyright): Georg Neumann GmbH (www.neumann.com); PSI Audio (www.psiaudio.swiss); Musikelectronic Geithain GmbH (www.me-geithain.de); JBL Professional by HARMAN (www.jblpro.com); Genelec Oy (www.genelec.com).
