Monitors are better than they've ever been — but they're still the weakest link in most studios. We trace the technology from its roots in the 1930s, and find out whether there's a DSP revolution on the way.
Far be it from me to deploy a cliché in the first sentence, but music recording technology has come a long way in the last decade or two. Who'd have thought, 20 years ago, that in 2019 we'd be able to create complex music on a small handheld touchscreen device, share it instantly with somebody on the other side of the world and then call them — with video — from the same device to enthuse over just how cool we both are? It is truly extraordinary, and if I think back to when I first started making and recording music with friends back in the early '80s, what we have now would have fallen then into Arthur C Clarke's 'magic' category (see Note 1).
Can you tell where this is going? Despite the 'magic' we now take for granted, there's one piece of recording hardware that, in comparison to all the DAW, network, and plug-in malarkey, often looks as if it has spent the last 50 years standing still, never mind the last few years. I'm talking about monitor speakers, of course. In the context of extraordinary change elsewhere, doesn't it strike you as odd that the monitor that's still the one most often seen in commercial studios, the Yamaha NS10, was designed in 1978 — well before I first slipped an extra high-output chrome cassette tape into my Fostex 250 multitracker? So, having been asked initially to write about "developments in studio monitor technology", I'm going to begin by explaining why that phrase, these digital days, sometimes looks as if it might be a contradiction in terms.
If you consider a music production 'workflow', the elements between microphone (see Note 2) and monitor really do nothing more than process, store and transfer information. This was true even back in the analogue days. Developments in electronic data processing and storage since perhaps the 1980s, but at an extraordinary rate in the more recent past, have simply increased our ability to store and process musical information. The hardware at either end of the workflow, however (monitors and microphones), is different. These devices don't store information, and if they process information at all, they do so only as an adjunct to their primary role of transduction. Microphones take information in the acoustic domain and translate it, via the electro-mechanical domain, into the electronic; monitor speakers do the opposite. And thanks to making a living simultaneously in three domains, monitors and microphones are subject to three sub-sets of the laws of physics...
Now, there's still likely to be a long way to go in electronic data processing and storage before the fundamental physical laws of that domain call a halt. We've become blasé about technological leaps of an order of magnitude every few years and the pace shows no sign of slowing. In the electro-mechanical and acoustic domains where monitors live, however, we've been pretty much toe to toe with the laws of physics since Rice and Kellogg patented the moving-coil speaker in 1925.
It's the density of air relative to the density of viable diaphragm materials that is the fundamental limiting factor on a monitor's moving-coil driver technology. When you move a driver diaphragm backwards and forwards in the hope of creating an acoustic wave, there's an enormous impedance mismatch between the diaphragm and the air, so very little of the diaphragm energy makes it into the acoustic domain. This is why a typical moving-coil driver is just a couple of percent efficient. It's the same reason that doing breaststroke in air doesn't move you forward quite so well as it does in water!
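To put a rough number on that 'couple of percent', here is a minimal Python sketch using the standard Thiele/Small small-signal expression for a driver's reference efficiency. The formula is textbook electro-acoustics rather than anything specific to the monitors discussed here, and the driver parameters in the example (resonance, equivalent volume and electrical Q) are hypothetical values chosen purely for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def reference_efficiency(fs_hz, vas_m3, qes):
    """Thiele/Small reference efficiency, returned as a fraction.

    fs_hz  -- driver free-air resonance in Hz
    vas_m3 -- equivalent compliance volume in cubic metres
    qes    -- electrical Q at resonance
    """
    return (4 * math.pi ** 2 / SPEED_OF_SOUND ** 3) * fs_hz ** 3 * vas_m3 / qes


# Hypothetical bass/mid driver: 45Hz resonance, 40 litres Vas, Qes of 0.35.
eta = reference_efficiency(45.0, 0.040, 0.35)
print(f"Reference efficiency: {eta * 100:.2f} percent")
```

With these made-up parameters the result comes out at roughly one percent, so almost all of the electrical input is lost rather than radiated as sound.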
Imagine a diaphragm made from a material with the same density as air — air itself, for instance (see Note 3). It would couple to the air around it almost perfectly, and energy transfer would be close to 100 percent. We can't use diaphragms made of air, of course, because along with being light, a diaphragm has to retain its shape when vigorously accelerated, so it has to be rigid too. And the lightest viably rigid diaphragm material? Well it's probably still the one Rice and Kellogg used: paper. (Or, as I've seen it described in a speaker brochure, "natural cellulose-based fibre composite"!) So, while in the domain of musical data processing and storage, technology races ahead at a giddy rate, in the electro-mechanical-acoustic domain, unless somebody makes an order-of-magnitude breakthrough in materials science (in which case they'd be well advised to apply their discovery in a few other fields before bothering with studio monitors!), moving-coil monitors are, I suspect, destined to remain pretty much 'as is' for a while yet.
However, even the laws of physics allow for a bit of wriggle room, and there's no shortage of engineers striving to find innovative ways to exploit it. The wriggle room arises, ironically, thanks to monitors straddling those three domains, because this multiplies the number of degrees of freedom within which monitor designers can wriggle. Almost every month in SOS, a nearfield monitor review provides evidence of monitor designers wriggling and coming up with something slightly different in terms of the way the basic electro-acoustics are addressed.
For example, at around the same time I began writing what would become this article, I was listening to Kii Audio's remarkable Three monitors (reviewed in SOS January 2017: www.soundonsound.com/reviews/kii-audio-three). In terms of their drivers and even, despite appearances, their enclosure, the Kii monitors break little truly new ground. They are unusual, however, in the way contemporary DSP techniques have been employed to control dispersion, and in the way their mid-range amplifiers have been configured to enhance the control they have over the driver diaphragms through feedback of voice-coil current. Having said that, while the concept of DSP-based dispersion control is a recent one, that of diaphragm feedback and correction is not: Philips launched a range of active hi‑fi speakers in the 1980s featuring 'motional feedback'. The Kii Three also incorporates compensation to correct the group delays inherent in the crossover filtering, and similar delay compensation is claimed by PSI to endow their monitors with more linear time-domain characteristics than is typical of moving-coil speakers. However, as with any speaker incorporating multiple non-coincident drivers, time-domain characteristics are in practice significantly influenced by the drivers themselves and their differing path lengths to the listener's ears. So I wonder if the unusually neat-looking square wave illustrated on PSI's promotional material as evidence of linear time-domain characteristics might look rather less neat if the measuring microphone were moved a little...
The PSI monitors represent something of a speaker-technology curate's egg, in that they're advanced in some respects, yet traditional in others. Thanks to the multi-disciplinary nature of speaker design, such 'curate's eggery' isn't unusual. Speaker manufacturers, in the grand scheme of things, tend to be relatively small organisations, with limited development resources. So innovating simultaneously across more than one discipline is tough (especially tough when a Sales Director with nothing new to sell at the next trade show strides purposefully into the R&D office. I speak from experience!).
As if conveniently to illustrate my point about the multi-disciplinary nature of speaker design, a product launched a few years ago by sE Electronics, but later coming under the Munro Sonic brand, offered something in complete contrast to the PSI or Kii approach. Andy Munro, the designer, chose to wriggle in a different domain. It can't have been too difficult to come up with a name for the Egg: its form is unmistakably ovoid. The shape arises thanks to one of the acoustic fundamentals of a diaphragm radiating from an enclosure: the shape of the enclosure affects the acoustic radiation. American audio engineering pioneer Harry F Olson published a paper in 1969 describing the influence of enclosure shape on acoustic radiation. He showed that rectilinear enclosures have the greatest influence and spherical enclosures have the least. The effect arises due to the diaphragm radiation, at frequencies where the wavelength is less than the dimensions of the enclosure, diffracting and re-radiating from the enclosure edges. Thanks to the physical distance between the diaphragm and enclosure edges, the re-radiation happens a short time after the direct radiation, so the two interfere and frequency-response anomalies result. A spherical, or egg‑shaped, enclosure has no edges so is all but immune to the phenomenon.
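The timing involved in edge re-radiation is easy to estimate. The Python sketch below uses a deliberately crude two-path model: it assumes a distant on-axis listener, so that the extra path travelled by the re-radiated sound is roughly the distance from the driver to the enclosure edge, and it says nothing about the polarity or strength of the re-radiation, which depend on the edge geometry and listening angle. The 0.15m and 0.25m dimensions are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s (assumed)


def edge_delay_ms(driver_to_edge_m):
    """Extra arrival time, in ms, of the edge re-radiation (distant on-axis listener)."""
    return driver_to_edge_m / SPEED_OF_SOUND * 1000.0


def half_wavelength_frequency_hz(driver_to_edge_m):
    """Lowest frequency at which the extra path equals half a wavelength."""
    return SPEED_OF_SOUND / (2.0 * driver_to_edge_m)


def diffraction_onset_hz(baffle_dimension_m):
    """Frequency above which the wavelength is shorter than the baffle dimension."""
    return SPEED_OF_SOUND / baffle_dimension_m


# Hypothetical box: driver centre 0.15m from the nearest edge, baffle 0.25m wide.
print(f"Edge delay: {edge_delay_ms(0.15):.2f} ms")
print(f"Half-wavelength frequency: {half_wavelength_frequency_hz(0.15):.0f} Hz")
print(f"Wavelength shorter than baffle above: {diffraction_onset_hz(0.25):.0f} Hz")
```

For a box of these proportions the interference lands in the low kilohertz region, which is right in the middle of the band where our hearing is most critical.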
So why, you might reasonably ask, are all monitor enclosures not egg-shaped (see Note 4)? One answer illustrates a fundamental law of electro-acoustics; a law that comes into play when an enclosure is used to contain the rear radiation of a driver and stop it destructively interfering with the forward radiation: for a given driver there's a fixed relationship between low-frequency bandwidth, enclosure internal volume, and baseline efficiency (electrical power in compared with acoustic power out). Reduce the enclosure volume and either the efficiency must fall or the low-frequency cutoff must rise. For the same height, depth and width, an egg-shaped enclosure can't help but enclose much less air than a rectilinear enclosure, so if, as a speaker designer, you take the egg-shaped road, you have to accept either that your speaker will require significantly more powerful amplification and drivers that are better engineered to resist thermo-mechanical stress and compression — which is expensive — or that your speaker's low-frequency bandwidth will be restricted.
To illustrate the kind of calculation that Andy Munro perhaps slept on for a night or two, if we consider a conventional rectangular enclosure of Egg-like basic dimensions and with, say, 15mm-thick panels, we get an internal volume of 22 litres. According to its published specification, however, the Egg encloses just 14 litres. By my calculations, and all other things being equal, this difference in enclosed volume would typically result in a speaker that is either approximately 2.5dB less efficient for the same low-frequency cutoff, or has a low-frequency cutoff around 10Hz higher for the same efficiency (or a compromise between the two). Now, 2.5dB or 10Hz may not sound like much, but remember, firstly, a 2.5dB loss in efficiency means that the driving amplifier needs to be nearly twice as powerful (a 3dB increase in acoustic volume requires a doubling of power), and secondly, at the frequencies in question, 10Hz corresponds to about a quarter of an octave, which is musically significant. Bandwidth and efficiency, though, aren't everything in nearfield monitor design (witness again the enduring appeal of the NS10), and the Munro Sonic Egg is an innovative attempt to put radiation linearity at the top of the priority list. On the basis that non-conformity in this deeply conformist world is a good thing, it definitely deserves to be celebrated.
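The arithmetic behind this kind of trade-off can be sketched in a few lines of Python. The sketch assumes the simplest form of the volume/efficiency/bandwidth relationship, in which efficiency is proportional to enclosure volume multiplied by the cube of the low-frequency cutoff (often called Hofmann's Iron Law), and it assumes a 60Hz baseline cutoff, which is an illustrative figure and not a published specification. It is not a reconstruction of the calculation behind the figures quoted above: the bare proportionality gives an efficiency penalty of about 2dB, in the same region as the approximately 2.5dB quoted, which may reflect a fuller model.

```python
import math


def efficiency_change_db(new_volume_l, old_volume_l):
    """Efficiency change for the same cutoff, assuming efficiency is proportional to volume."""
    return 10.0 * math.log10(new_volume_l / old_volume_l)


def cutoff_ratio(new_volume_l, old_volume_l):
    """Cutoff multiplier for the same efficiency, assuming volume * cutoff^3 stays constant."""
    return (old_volume_l / new_volume_l) ** (1.0 / 3.0)


def power_ratio(loss_db):
    """Amplifier power multiplier needed to make up an efficiency loss in dB."""
    return 10.0 ** (loss_db / 10.0)


BASELINE_CUTOFF_HZ = 60.0  # assumed for illustration only

ratio = cutoff_ratio(14.0, 22.0)
print(f"Efficiency change: {efficiency_change_db(14.0, 22.0):.1f} dB")
print(f"Cutoff moves from {BASELINE_CUTOFF_HZ:.0f} Hz to {BASELINE_CUTOFF_HZ * ratio:.0f} Hz")
print(f"That shift is {math.log2(ratio):.2f} octaves")
print(f"Power needed to recover 2.5 dB: {power_ratio(2.5):.2f} times")
```

The cutoff shift of around 10Hz, or a little under a quarter of an octave, and the 'nearly twice' power figure both fall out of the same few lines.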
While the volume/efficiency/bandwidth law is fundamentally inalienable, there is however still a bit of wriggle room. In the mechanical-acoustic domain you can modify the law by reflex-loading your monitor to get some value out of the rear radiation of the driver (see Note 5). Reflex loading is cheap, easy to do, apparently a free lunch, and is loved by industrial designers because it can provide an extra bit of eye-candy; all of which explains why you hardly ever see a monitor without a port! Of course, the free lunch is a mirage, because reflex loading unavoidably introduces time-domain errors — and often various compression and distortion effects — to the speaker's low-frequency performance which, at their worst, a significant body of opinion (this body included) believes constitute a problem.
With active drive there's also wriggle room in electronics. Equalisation can extend a monitor's low-frequency bandwidth beyond the natural cutoff defined by the driver's electro-mechanical parameters and the enclosure volume and tuning frequency (if the enclosure is ported). All that's really happening, however, is that as efficiency falls below the low-frequency cutoff, the amplifier delivers more power. Active low-frequency equalisation can be a very useful tool, but taken too far it runs into the issues of group delay from over-ambitious equalisation filters, compression and distortion from low-frequency drivers asked to operate at the extremes, and reflex ports through which it simply isn't possible to push enough air without turbulence and compression (see Note 6).
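A rough feel for the cost of low-frequency equalisation comes from a little Python. The sketch below relies on two textbook approximations: that a sealed enclosure's response falls at an asymptotic 12dB per octave below cutoff (24dB per octave for a ported one), and that for a given sound level a sealed-box driver's cone excursion is proportional to the inverse square of frequency. It ignores the gentler slope around the cutoff itself, and the half-octave extension is a hypothetical example.

```python
def boost_needed_db(octaves_below_cutoff, slope_db_per_octave):
    """Approximate EQ boost needed to flatten the response that far below cutoff."""
    return octaves_below_cutoff * slope_db_per_octave


def power_multiplier(boost_db):
    """Extra amplifier power, as a multiple, implied by an EQ boost in dB."""
    return 10.0 ** (boost_db / 10.0)


def excursion_multiplier(octaves_lower):
    """Extra cone excursion for the same sound level that many octaves lower."""
    return 4.0 ** octaves_lower


# Hypothetical case: extend a sealed-box monitor's bass by half an octave.
boost = boost_needed_db(0.5, 12.0)
print(f"Boost needed: {boost:.0f} dB")
print(f"Amplifier power: {power_multiplier(boost):.1f} times")
print(f"Cone excursion: {excursion_multiplier(0.5):.1f} times")
```

Half an octave of extra bass, in other words, costs around four times the amplifier power and twice the cone travel, which is why over-ambitious equalisation so quickly runs into compression and distortion.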
You should be getting the impression from all this that monitor design is a little like herding cats or squeezing a balloon. Make gains in one area and the laws of physics conspire to pop up and blow a raspberry at you somewhere else. So along with products like the Kii, PSI and the sE/Munro monitors that nail their flags to a specific mast or two, there are also technologically advanced monitors in the market that take a more holistic approach and wriggle just a little in multiple domains. A good example of this approach is the One series from Genelec. The One series integrates Genelec's existing DSP‑based SAM/GLM room-optimisation technology and their dual‑coincident driver technology, first introduced on their 8260A, with some innovative new ideas on integrating low-frequency drivers in compact cabinets. The key to the last of these is that the wavelength at low frequencies is so long that, rather than treating a bass driver as a diaphragm that sets up a wave, you can treat it more like an air pump: the acoustic wave is set up where the pumped air exits. This conceptual leap opens up freedom to locate the bass driver diaphragm (or diaphragms) to fit the demands of really compact enclosures and then use DSP-based equalisation to correct the resulting frequency response. If there's a downside to pushing simultaneously on multiple engineering boundaries it's that doing so is not inexpensive, and the Genelec One series illustrates that quite effectively!
So the art of good monitor design is to find a viable path through a multitude of often conflicting multi-disciplinary constraints. Successful designs often have a sense of inevitability about their final form because the answer to each design problem in turn dictates the final solution: as in a maze, there is only one route to the exit. The Genelec monitors are a classic example of this. They are the way they are because the logic of the design process allowed no other solution.
Surely monitor development can't creep forward at a (pre-climate change) glacial rate while the rest of recording technology zooms off into the stratosphere? Well, yes it can, and it probably will. There seems little doubt that we'll see more integration of monitors into the network audio paradigm (we already have many active monitors with digital and network inputs), although how the 'networkification' of monitors will improve their audio performance I'm not sure. Perhaps network inputs and on-board DSP will make more practical and commonplace the really effective driver and room correction that has been hovering above the horizon for well over a decade? Surely that will bring a fundamental leap in performance? It will certainly make things sound different, but different is not always better. Active room correction is fundamentally limited in what it can achieve once the influence of the room becomes chaotic in the reverberant field. Room 'measurement' and EQ are effectively limited by the sheer complexity of the problem (see Note 7). Even if more DSP power is thrown at the problem, the risk is that any improved accuracy will apply over an increasingly limited listening window and that, if time-domain errors are part of the correction, the processing will introduce unacceptable latency. There's also no escaping the fact that applying 'corrective EQ' to the speaker significantly distorts the amplitude response of the direct sound, an issue that some commentators consider a fatal flaw.
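The latency point can be made concrete with a small Python sketch. It assumes, purely for illustration, that the correction is done with a linear-phase FIR filter, whose delay is half its length, and it uses the common rule of thumb that the number of taps needed is roughly the sample rate divided by the frequency resolution required. Real correction systems may use quite different filter structures, and the 48kHz and 10Hz figures are hypothetical.

```python
import math


def taps_for_resolution(sample_rate_hz, resolution_hz):
    """Rule-of-thumb FIR length for a given frequency resolution, rounded up to odd."""
    taps = math.ceil(sample_rate_hz / resolution_hz)
    return taps if taps % 2 == 1 else taps + 1


def linear_phase_latency_ms(num_taps, sample_rate_hz):
    """Delay of a linear-phase FIR filter: half its length, in milliseconds."""
    return (num_taps - 1) / 2 / sample_rate_hz * 1000.0


# Hypothetical case: 10Hz resolution for low-frequency room correction at 48kHz.
taps = taps_for_resolution(48000, 10)
print(f"Taps: {taps}")
print(f"Latency: {linear_phase_latency_ms(taps, 48000):.1f} ms")
```

A delay of this order is of no consequence for mixing, but it is easily enough to be a nuisance for a musician monitoring their own performance through the speakers.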
But there have been a couple of very recent applications of heavyweight DSP that might point to a new era in speaker design. The first is Trinnov's range of stand‑alone DSP room/speaker optimisers, which feature a four-channel measurement mic and can (at the cost of latency) compensate not only for frequency-domain issues, but also for time-domain ones — in other words, phase anomalies. The second, I mentioned earlier: the Kii Audio Three. This is a rare example of DSP being used to offer something genuinely new in a speaker: cardioid dispersion right down to low frequencies. Speakers based on moving-coil drivers are naturally cardioid at mid- and high frequencies, and one of their fundamental problems is that, being naturally omnidirectional at low frequencies, the way they drive the acoustics of the listening room varies with frequency. So it's not rocket science to imagine that if you could make a speaker with a relatively tight and consistent radiation pattern over its full range, the vagaries of the listening room would be, to some extent, taken out of the equation. The 'rocket science' part of the equation is making it work, which is what Bruno Putzeys and his colleagues at Kii have achieved. It's not just Kii Audio that have deployed DSP in the direction of dispersion control, though — take a quick look at Bang & Olufsen's remarkable Beolab 90 and Beolab 50 speakers and tell me you're not interested.
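For reference, the ideal first-order cardioid pattern is easy to tabulate. The Python sketch below is just the textbook polar equation, not a description of how Kii's DSP achieves its dispersion (which isn't detailed here), and the angles chosen are arbitrary.

```python
import math


def cardioid_gain_db(angle_deg):
    """Level of an ideal cardioid relative to on-axis, in dB, at a given angle."""
    amplitude = 0.5 * (1.0 + math.cos(math.radians(angle_deg)))
    if amplitude <= 0.0:
        return float("-inf")  # the ideal pattern has a perfect null directly behind
    return 20.0 * math.log10(amplitude)


for angle in (0, 45, 90, 135, 180):
    print(f"{angle:3d} degrees: {cardioid_gain_db(angle):.1f} dB")
```

The attraction for room acoustics is plain from the numbers: output to the sides is around 6dB down and, in the ideal case, there is nothing at all to the rear, so far less energy is sent towards the wall behind the speaker.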
Another area in which you might imagine on-board DSP playing a role is the correction of driver errors. It's perfectly feasible in the frequency domain, but problematical in the all-important time domain. Time-domain errors are fundamentally intractable in a universe where cause precedes effect (a driver with mass and compliance will always bounce when 'hit' by an impulse and no amount of upstream DSP power will stop it), and many of the errors we might want to correct are not only non-linear and level-dependent, which is challenging enough, but also unpredictable. It comes back to my opening premise that a monitor pushes information through multiple domains. Manipulation of the information in one domain can't always fix the errors in the subsequent one. It also comes back to the fundamental issues of all moving-coil speaker technology, where materials science and electro-acoustics are the keys: if you want to make a better driver, then all the DSP in the world won't help as much as would finding a new, ultra-light, ultra-rigid material for a diaphragm, or developing a new driver topology that improves significantly on the one used by Rice and Kellogg.
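The 'bounce' of a driver with mass and compliance can be estimated by treating it as a simple mass-spring-damper, as in the Python sketch below. This is a minimal model that ignores the enclosure, the voice-coil inductance and every non-linearity, and the moving mass, compliance and Q values are hypothetical. The decay formula applies only when Q is greater than 0.5, which is the condition for any ringing at all.

```python
import math


def resonance_hz(moving_mass_kg, compliance_m_per_n):
    """Resonant frequency of a mass on a spring of the given compliance."""
    return 1.0 / (2.0 * math.pi * math.sqrt(moving_mass_kg * compliance_m_per_n))


def decay_time_ms(resonance_frequency_hz, q_factor, decay_db=60.0):
    """Time for the ringing envelope to fall by decay_db after an impulse (Q > 0.5)."""
    omega0 = 2.0 * math.pi * resonance_frequency_hz
    damping_ratio = 1.0 / (2.0 * q_factor)
    return math.log(10.0 ** (decay_db / 20.0)) / (damping_ratio * omega0) * 1000.0


# Hypothetical bass driver: 20g moving mass, 1mm/N suspension compliance, Q of 0.7.
f0 = resonance_hz(0.020, 1.0e-3)
print(f"Resonance: {f0:.1f} Hz")
print(f"Time to decay by 60 dB: {decay_time_ms(f0, 0.7):.0f} ms")
```

Even this well-damped example takes tens of milliseconds to settle, and because the ringing follows the impulse that caused it, upstream processing can only reshape the signal that goes in, not cancel the stored energy after the event.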
So where are we on those fronts? Well, leaving aside the fact that Yamaha developed, for their NS1000M of the late 1970s, high and mid-frequency diaphragms made from immensely light and rigid vapour‑deposited beryllium (since when, sadly, Yamaha's speaker technology seems to have gone backwards), it tends to be hi‑fi speaker companies rather than professional monitor companies that force the pace in driver development. Professional monitor manufacturers, with a few honourable exceptions, seem mostly content to rely on the driver technology of two decades or more ago. For instance, Bowers & Wilkins have their vapour-deposited diamond high-frequency domes that remain rigid up to around 50kHz (a typical 'soft‑dome' tweeter diaphragm will have stopped moving as a whole at well below 10kHz). And again, Focal introduced beryllium domes in hi‑fi speakers long before they appeared in professional monitors.
At the other end of the bandwidth, while diaphragm technology in the pro sector extends only as far as a few examples of honeycomb composite cones (Focal and ADAM), the high-end hi‑fi sector is home to a number of very advanced ceramic-diaphragm bass and bass/mid drivers, with those from German manufacturer Accuton (www.accuton.de) being particularly impressive. Accuton's technology masterpiece is probably the 120mm-diameter ceramic-dome mid-range driver (see Note 8), the likes of which, sadly, I don't expect to see in a professional monitor any time soon. It would be fascinating to see more of this style of cutting-edge driver technology employed in professional monitors, but while the cost pressure in the sector is unrelentingly downward, and the technology 'buzz' is unrelentingly about DSP and digital electronics rather than electro-acoustics, it seems somewhat unlikely.
So, has there really been only one significant speaker technology development (DSP) since the mid 1990s? Perhaps I'll be shot down in flames for saying so, but I fear that's the way things are. Even worse, I believe the commercial pressure for ever more 'performance' (particularly bandwidth) for ever less cost has held speaker technology back. It seems to me that some fundamental stuff has been largely forgotten — if we were to take a few of the best small hi‑fi speakers of the 1970s or 1980s, drive them with a decent amp, and put them up against some contemporary nearfield monitors, I suspect we'd be quite surprised at just how well they work. Heresy? Possibly. But do a bit of digging around the archives and listen to a few classic designs yourself and see what you think.
But what of the future? Is there anything other than DSP lurking over the electro-acoustic horizon? Well, there is something that, if the hype is to be believed (and you don't get much more hype than a Nobel prize), might just be a game changer: graphene. Graphene is a sheet material created by a specific arrangement of carbon atoms and its mechanical properties are extraordinary. Compared with steel, graphene sheet is said to be five to six times lighter, have 10 times the tensile strength and 13 times the bending strength. Graphene headphone diaphragms already exist and the material has found its way into speaker diaphragms as a reinforcing element. But some day soon, someone will surely make an entire speaker diaphragm from graphene — and that's something I would definitely like to hear.