The Realiser A8 is claimed to be able to capture the sound of any monitoring environment in the world and perfectly reproduce it through headphones. Could it be time to sell your speakers?
The audio industry has been toying with the idea of artificially simulating speaker listening ever since someone first demonstrated that playing back binaural recordings (captured via mics within a person's ear canals) over headphones could recreate a thoroughly compelling three-dimensional spatial illusion. While SOS readers will probably be most familiar with the Virtual Reference Monitoring technology included in some affordable Focusrite audio interfaces, the last 20 years have seen many products that have attempted to harness this binaural concept, with heavyweight players like Roland, Sennheiser, AKG, Studer and (most recently) Beyerdynamic all dipping toes in the water.
However, uptake of such systems for serious studio work has remained pretty minimal, the main reason being that most of the simulations simply aren't very convincing for the majority of people: they're all right as gimmicks for recreational listening, but not effective enough to persuade audio engineers to rely on them for heavy-duty recording or mixing. For example, almost all current products let the virtual monitoring environment swing round with your head when you move it, which powerfully contradicts the binaural illusion on a psychological level. And nothing we have so far reviewed personalises its speaker emulation to match each individual user's ear shape, an aspect of physiology that's fundamental to binaural audio.
During early childhood, each of us learns to recognise how the acoustic properties of our unique ear shapes affect the frequency response of sounds arriving from different angles. As a result, each person learns a different set of filter curves, referred to as their Head Related Transfer Functions (HRTFs). This is why the perceived realism of any binaural recording will vary dramatically between listeners, depending on how well their own HRTFs match those of the ear (real or prosthetic) used to make the original recording. All the binaural monitoring products we've seen thus far have either employed generic HRTF data obtained by averaging results from many different individuals, or have given you a choice of several different HRTF data sets in the hope that one of them will match your own ears tolerably well.
I've been trying out binaural encoders like these for almost 20 years now, and have been forced to conclude that I'm some kind of mutant, because I've never had a passable semblance of realism from any of them. So when I heard that a small UK company had created a binaural speaker emulator that actually measures your personalised HRTF characteristics, I was all ears — so to speak!
The product in question is the Smyth Research Realiser A8, which has at its core a convolution-based technology called Smyth Virtual Surround (SVS). Supplied miniature in-ear microphones let you sample not only your unique control-room listening system (ears included!), but also the frequency response of your specific headphones. The resultant Personalised Room Impulse Response (PRIR) and Headphone EQ (HPEQ) files are then combined into one of four onboard presets to recreate the sensation of listening to your loudspeakers as you'd normally do.
A two-piece head-tracking system is included, too. Wired to the Realiser is a little infra-red detector gizmo that clips to your DAW screen or console meterbridge and logs the angle of a wireless transmitter module fixed to the headband of your cans. The head-top unit is about the size of a 9V battery (but much lighter), and its protruding USB-style connector mates not only with the headphone mounting clip, but also with a socket on the Realiser itself for charging purposes. The detector's maximum range is about 10m, and although this is dependent on environmental factors to an extent, I didn't experience any difficulties within a 5m range as long as the head-top module had been adequately charged.
Inside the Realiser there's enough DSP horsepower for eight SVS processing channels, each with its own hardware I/O socket. The channels are nominally designated Left, Right, Centre, Subwoofer, Left Surround, Right Surround, Left Back and Right Back, although the actual position of each virtual speaker isn't restricted, and the hardware I/O can also be freely reassigned.
Some processing facilities are common to all the channels. The input signal passes through a high-pass filter with switchable cutoff frequency (80Hz, 120Hz, 240Hz and 480Hz) and hi-fi style Bass and Treble tone controls, each offering ±12dB gain in 2dB increments. Following this is a ±12dB gain control and up to 63ms of delay, should you wish to adjust the balance or time-alignment of the virtual speakers after the sampling process. The main SVS processing module then uses the PRIR and HPEQ files to simulate each channel's virtual speaker binaurally, and all eight SVS outputs are mixed to feed your headphones via a software volume control and switchable safety limiter.
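For readers who like to see a signal path spelled out, here is a minimal Python sketch of the per-channel chain described above. It is an illustration under stated assumptions, not Smyth Research's implementation: the function names are hypothetical, the high-pass filter, tone controls and limiter are left out, the delay is expressed in samples, and a real unit would use fast partitioned convolution instead of this slow direct form.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def delay(signal, n_samples):
    """Delay a signal by prepending n_samples of silence."""
    return [0.0] * n_samples + list(signal)


def convolve(signal, impulse):
    """Direct-form convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out


def render_channel(signal, gain_db, delay_samples, ir_left, ir_right):
    """One virtual speaker: trim gain, delay, then one convolution per ear.

    ir_left and ir_right stand in for the combined PRIR/HPEQ response
    from this speaker to each ear (hypothetical names).
    """
    g = db_to_gain(gain_db)
    pre = delay([g * s for s in signal], delay_samples)
    return convolve(pre, ir_left), convolve(pre, ir_right)


def sum_signals(signals):
    """Sample-wise sum of signals that may differ in length."""
    n = max(len(s) for s in signals)
    return [sum(s[i] for s in signals if i < len(s)) for i in range(n)]


def mix_to_headphones(rendered):
    """Mix a non-empty list of (left, right) pairs into one stereo feed."""
    left = sum_signals([pair[0] for pair in rendered])
    right = sum_signals([pair[1] for pair in rendered])
    return left, right
```

The point of the sketch is simply that each virtual speaker needs two convolutions, one per ear, so eight channels means sixteen long convolutions running at once, which gives some idea of why dedicated DSP is involved.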
In addition to this, channels 3-6 are prefaced by a rudimentary mixing matrix, which can blend signals from all eight hardware inputs. This makes it possible to reassign some of the audio streams in a 7.1 mix for auditioning in 5.1, for instance, while switchable 80Hz low-pass filtering on channels 3 and 4 lets you redirect low end from multiple inputs to a single virtual subwoofer — providing, in other words, a way of implementing bass management.
Two more stereo processing channels, each furnished with its own independent Mix Block, expand the options further. The first generates a stereo downmix of all the input channels without SVS processing — in other words, a normal 'inside the head' headphone listening experience. The second stereo channel adds a switchable low-pass filter (with 80Hz, 120Hz, 240Hz and 480Hz settings) and up to 255ms of delay, and supplies a pair of dedicated rear-panel outputs marked Tactile. If you connect an external 'seat shaker' device to these outputs, you can supplement the virtualised listening experience with the kind of physical low-frequency sensation that headphones are incapable of delivering on their own. Alternatively, you can use a physical subwoofer instead, provided that you adjust each input channel's high-pass filtering to avoid doubling up the hybrid system's subjective bass level.
The fundamental process of sampling a speaker system is fairly straightforward. You start by routing the Realiser's outputs to the appropriate speakers and putting the mini-mics in your ears. You get a set of foam plugs to fit different ear sizes, and the mics are supported by these so that their diaphragms are held securely at the openings of your ear canals. A built-in calibration routine checks for optimal levels, whereupon sine-wave sweeps are used to make binaural PRIR measurements at the monitoring sweet spot for three different head angles (Look Angles, in Realiser parlance): facing the centre of the stereo image; facing the left speaker; and facing the right speaker. Leaving the mini-mics in situ, you then don your headphones (circumaural designs gave the best results for me) and connect them to the Realiser for another level calibration and then the HPEQ test sequence.
There are, however, numerous refinements available to this basic scheme. For instance, the PRIR sweeps can be lengthened from three (the default) to 12 seconds in duration, and repeated up to eight times per speaker to improve the measurement fidelity. Without doing repeat measurements, I found that noise in the PRIR produced a synthetic-sounding low-level reverb 'ghost' for a fraction of a second after DAW playback was halted, so I opted to minimise this by using eight 12-second sweeps per speaker for each of the three Look Angles, despite the increased sampling time required — around 12 minutes for a stereo system, up to roughly 45 minutes for a full 7.1 setup. Where the speakers are quite distant, using more than two sweeps can cause some high-frequency dulling because of phase variations introduced by the air, but I experienced no such issues in typical mid-to-nearfield scenarios.
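As a rough sanity check on those sampling times, the sweep arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and the noise figure rests on the textbook assumption that the noise in each sweep is uncorrelated with the others, which the Realiser's documentation may or may not claim.

```python
import math


def sweep_minutes(speakers, look_angles=3, sweeps=8, sweep_seconds=12):
    """Total time spent playing sweeps, ignoring calibration and pauses."""
    return speakers * look_angles * sweeps * sweep_seconds / 60


def averaging_gain_db(sweeps):
    """Signal-to-noise improvement from averaging, for uncorrelated noise."""
    return 10 * math.log10(sweeps)


stereo = sweep_minutes(2)      # two speakers
surround = sweep_minutes(8)    # 7.1 means eight speakers, subwoofer included
gain = averaging_gain_db(8)    # benefit of eight repeats over one
```

The sweeps alone account for 9.6 minutes in stereo and 38.4 minutes for 7.1, so the quoted totals of around 12 and 45 minutes presumably include level calibration and the time taken to turn your head between Look Angles. Under the uncorrelated-noise assumption, eight repeats lower the noise floor by about 9dB relative to a single sweep.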
The accuracy with which you line up your head for the different Look Angles has a bearing on the realism of the PRIR, so it's useful that there's an option to engage the head-tracking system (or a separate measurement step involving audio pilot tones) to capture the exact angle of your head during the sampling process, in order to eliminate that element of human error — or to sample any system that has speakers placed in unusual positions or tucked out of sight.
If you're willing to dive deep into the operating system, you can even build up PRIR files one speaker measurement at a time, so even if you only have a stereo monitor system, you can sample those two speakers repeatedly at different locations in your room to create an SVS surround setup. You could even use just one speaker to achieve improbably tight frequency matching across the whole rig!
Although there's little you can do to refine the HPEQ measurement itself, you can control how you use its raw data. What the Realiser does is take the measured headphone frequency response and invert it into a compensatory 32-band graphic EQ curve. However, this process involves some technical compromises (such as the fact that your ear canal's resonance isn't captured by the foam-clad mini-mics), so the Realiser lets you finesse the HPEQ setting by ear, either in a broad-brush manner by adjusting the potency of the EQ across three frequency zones, or by tweaking each of the 32 HPEQ bands individually, in 0.5dB increments over a ±9.5dB range. In the latter scenario, the Realiser feeds a noise test signal through each of the EQ bands in turn so you can directly compare real and virtualised sounds via a special head-tracking mode (see 'Off The Beaten Tracking' box for details).
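The inversion idea is easy to illustrate. The Python sketch below turns a measured per-band deviation (in dB) into a correction that respects the 0.5dB step and ±9.5dB range quoted above. Everything else is an assumption for illustration: the function names, the single `strength` factor standing in for the per-zone potency control, and the rounding behaviour are hypothetical, and the Realiser's real algorithm is not documented here.

```python
def hpeq_band(measured_db, strength=1.0, step=0.5, limit=9.5):
    """Compensating gain for one EQ band.

    measured_db is how far the headphone response sits above (+) or
    below (-) the target in this band. strength scales the correction,
    with 1.0 meaning full inversion (hypothetical parameter).
    """
    correction = -measured_db * strength
    correction = max(-limit, min(limit, correction))
    return round(correction / step) * step


def hpeq_curve(measured_bands_db, strength=1.0):
    """Apply the per-band inversion across a whole measured curve."""
    return [hpeq_band(m, strength) for m in measured_bands_db]
```

For instance, a band measured 3.2dB hot would get a 3dB cut, while a band measured 12dB down could only be boosted by the maximum 9.5dB, which is one reason a deep notch in a headphone's response cannot be fully corrected this way.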
The first question everyone asks me when I tell them about the Realiser is this: does it actually work? In other words, can SVS indistinguishably mimic the sound of my speakers? Well, let me be absolutely clear: no, it can't. I tried Beyerdynamic DT880 Pro, Sennheiser HD650, and Stax SR202 headphones, all seriously high-quality beasts. I bypassed all the Realiser's analogue output circuitry by driving a stand-alone headphone amp from its rear-panel optical S/PDIF output. I pushed the measurement techniques and post-sampling refinements as far as I could, with direct assistance from Smyth Research themselves. I connected my subwoofer to the Tactile outputs to add physical bass sensation. I squeezed my eyes shut, crossed my fingers, and believed with all my heart... but, try as I might, my well-worn reference CD never did sound exactly the same through the Realiser as through my own nearfields.
However, defining the Realiser in terms of what it can't do is, in my opinion, missing the point — because what the Realiser can do is astonishing! Despite never truly cloning the sampled speaker system, it does get startlingly close, and my first experience of its speaker-listening illusion left me, quite literally, open-mouthed. Even after extensive A/B comparisons, I still often felt moved to briefly lift a headphone earpiece in order to banish doubts about whether I was hearing my headphones or my speakers — and I lost count of the number of times I mistakenly tried to adjust the SVS headphone volume from my analogue monitor controller! Sound images are sharp and well-defined, panning decisions are trustworthy, and common headphone-mixing pitfalls such as misjudged effects levels and centre-versus-edge imbalances are easily avoided. In short, the Realiser acts very much like a real speaker system — albeit a slightly different speaker system than the one you sampled.
And the story is little different in surround. Despite my own limited experience in this field, sampling a 5.1 post-production system at SAE's Munich campus demonstrated that SVS's rear imaging was utterly uncanny, and panning around the different speakers felt very natural. On that basis, I'd personally expect no limitations to the system's usability for surround work beyond the hardware's I/O count.
A big concern many people have about headphone monitoring is the lack of physical bass sensation, but I was genuinely surprised at how little difference the use of the Tactile output actually made to my mixing decisions in the long term. Just hearing the low-frequency mix components within such a believably speaker-like context seems to clarify most low-frequency level and quality questions on its own somehow, and in no less reliable a manner than 95 percent of nearfield monitoring systems I've heard, given the strong influence of room resonances on real-world bass reproduction. Furthermore, the Realiser's nifty Direct Bass feature (see the 'Better Than The Real Thing?' box) can remove the effects of LF room modes from its emulation entirely, delivering low-end fidelity that's well beyond the capabilities of the speaker system you originally sampled!
So forget about treating the Realiser like some kind of audio photocopier. The most important question to my mind is whether you can produce release-quality work on the Realiser as quickly and reliably as on a similarly priced monitoring system comprising physical speakers, mounting hardware and acoustic treatment. Hand on heart, I honestly think you can.
The last sentence has probably just propelled most readers' eyebrows into orbit, and some of you may already be dismissing me as a cloth-eared crank. Indeed, I would have expected an equally powerful knee-jerk reaction from an earlier self, which is why I ended up going rather overboard with the review process, vainly attempting to dissuade myself from such an inflammatory stance. I hogged the review unit for more than a year, sampling various different speaker systems including the Blue Sky 2.1 nearfield system and Avantone Mix Cube mid-range monitor, which are both mainstays of my mixing rig. I took several mixes from start to finish using the Realiser exclusively, including the 'Mix Rescue' remixes in SOS March and April 2013. I used the SVS emulation of my own speakers as a reference during several complicated location recordings, such as the full-band dates featured in the October 2012 and April 2013 'Session Notes' columns.
But despite all that, I've been unable to escape the conclusion that the Realiser is at least as worthy a performer, in terms of both mixing power and mixing speed, as any similarly priced physical monitoring system. It can take a little time to acclimatise your ear to a given Preset, but it's an indicator of how good the SVS system is that this mental break-in period seems no longer than when switching to any new pair of speakers or an unfamiliar acoustic environment.
Now let's consider the Realiser's UK price: factoring in a pair of monitoring-grade headphones and a few hours sampling speakers in a local studio, you're looking at an outlay in the order of £3000 ($4500) or so for a functional system. Expecting studio owners to fork out such a sum would be ludicrous if SVS weren't capable of serious mix work, but if you're still reading this then I'm assuming you're willing to entertain the possibility that it is. Taken in that light, the price actually carries the whiff of a bargain, as I see it [smell it, surely? - Ed].
Let me explain. If I had to duplicate my own main nearfield monitoring system (with its speaker stands, decouplers and acoustic treatment) from scratch, I estimate it'd cost £1000 ($1000) more than the Realiser, and you could probably add another £1000 ($1000) to that if I used off-the-shelf acoustics products rather than thriftily constructing my own. The uncomfortable fact that the SVS emulation of this system proved itself an equally reliable mixdown tool would be reason enough to consider the Realiser good value for money, but it also offers enormous ancillary benefits.
Firstly, you can take advantage of commercial-grade speaker monitoring pretty much anywhere: in a broom cupboard, in the car, in the middle of a crowded office, or even lounging by the pool! This presents a paradigm shift as far as home studios are concerned, and also makes the Realiser a location-recording tool par excellence, given that its 1U half-rack box will fit in a laptop bag. Then there are the expansion possibilities: with a little practice, you can sample new speaker systems from a standing start in under an hour, store as many samples as you like via the front-panel SD card slot, and break into the world of surround-sound monitoring even if you've no surround system available to sample.
Clearly, though, there are some situations where real speakers still steal a march on these virtual counterparts. Nothing beats them for vibing up a band after their first take, for instance, and collaborative music-making is also easier if you can fill your control room with sound. To be fair, a single Realiser can actually host two users, with independent Preset selection and head-tracking, at the expense of reducing the maximum PRIR sample length to 250ms, but you do still have to create a second set of PRIR and HPEQ files, so it's not something you're likely to try on impulse after pub closing time.
Latency may also be a deterrent in a few applications, because the convolution processing incurs a minimum delay of around 15ms. I never found this distracting during my own location sessions, but given that you have to add this delay on top of any latency inherent in your recording system, I imagine that many musicians may rule out using SVS for overdubbing purposes. You can always bypass the emulation for that, though, using the Realiser simply to amplify a traditional headphone foldback feed.
I also missed the psychological 'averaging' effect that you get with real speakers when you stroll round your room. Although you can move around quite a bit without losing the head-tracking, your virtual monitoring position remains riveted in the stereo sweet spot, which might not be the position that provides the most useful frequency balance in your sampled room. In practice, however, any time this lost me by slowing down mix decisions was recouped via the Realiser's sharper low-level detail and the improved low-frequency linearity afforded by the Direct Bass feature.
My biggest gripe about the Realiser has nothing to do with the sound, though: it's the clunkiness of the user interface. Using the small four-line LCD display and button-festooned remote control has all the appeal of programming bagpipe multisamples on an Akai S1000 with a broken data wheel! Creating a surround setup one speaker at a time seems mind-numbingly convoluted, while manually tweaking the 32-band HPEQ involves (no word of a lie) hundreds of button presses — a process made all the more soul-destroying when the Realiser occasionally seemed to forget my settings or shift them to neighbouring bands, eventually driving me to record them on paper! There are also buttons and menu entries that do nothing, which only adds to the impression of a job half finished, despite the stability and maturity of the underlying SVS algorithms.
At this price point, some studio users will doubtless frown on the line-lump power supply, unbalanced analogue connectivity (all RCA phonos), and preference for multi-channel digital transfer over HDMI rather than something like ADAT or AES3. In practice, though, I found the unbalanced audio caused me no problems during the review period, despite my linking the unit up to multiple different systems for sampling and monitoring purposes, and so I felt little need to use the digital I/O beyond confirming that it actually operated as claimed, sitting in-line between my computer's HDMI output and my display without interrupting the video feed, and handling resolutions of up to 24-bit/192kHz.
Frankly, though, to hell with the niggles! For me it's an absolute bombshell that SVS allows me to mix on headphones just as quickly and easily as on my more expensive non-virtual monitoring system, so I reckon the Realiser fully deserves to cut a swathe through the project-studio monitoring market. It doesn't render real speakers redundant, but neither did the Line 6 Pod render real guitar amps redundant, and that still shifted units by the truckload — a triumph of convenience, economics, and real-world usability.
But what blows my tiny mind even more is the tantalising prospect that SVS technology might trickle down beyond this first-generation proof of concept. Imagine, if you will, a kind of supercharged Focusrite VRM Box with computer-hosted processing, slick software control GUI, and USB-connected I/O hardware. What price could that be manufactured for? And how about a cheaper playback-only version with just headphone and head-tracker sockets, which would allow colleges to sample their whole student intake at the start of a course, and then teach mixing and sound-design in a packed-out computer lab with everyone listening from the flagship studio's virtual sweet spot? Big mixing and mastering facilities could tap into a regular revenue stream by hosting speaker-sampling events, confident in the knowledge that personalised PRIR samples would be impossible to bootleg...
The revolution starts here.
Many thanks to Rainer Schwarz and Matthias Schaaff at SAE Munich for their help with this review.
There's nothing else on the market that offers such heavily personalised binaural speaker simulation as the Realiser. However, if your hearing system happens by chance to be very close to the average, then Beyerdynamic's Headzone system (reviewed in SOS March 2007) may provide a viable alternative at a slightly lower price — albeit with two fewer input channels, no option to switch between different speaker models, and a narrower range of supported headphone models.
The way Smyth Research's head-tracking algorithm cleverly interpolates between the three Look Angle measurements in a PRIR file is one of the key reasons why the Realiser can offer DIY speaker sampling at all. Without this, you'd have had to repeat your impulse-response measurements for small Look Angle increments across the entire stereo field — multiplying the sampling time beyond the point where cramp typically sets in! However, an inevitable limitation of this approach is that the Realiser will only follow your head through a ±30-degree angle. As such, you have to select what happens if your attention wanders further off-axis than that: the Realiser can mute the headphones, for instance, or defeat the head-tracking so that the centre of the soundstage follows your nose.
An important related feature here, though, is that the Realiser's head-tracking can also sense when the headphones are tilted downwards. By engaging a special Demo mode, you can get the unit to mute the headphones whenever this happens, and simultaneously pass any signals arriving at the unit's rear-panel inputs directly through to its outputs — and onwards to your monitor system. In this way you can instantly compare a real speaker system with its SVS-emulated version, just by pulling the headphones on and off.
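To make the Look Angle interpolation idea concrete, here is a deliberately naive Python sketch. Smyth Research's actual algorithm isn't described in this review, so a plain linear crossfade between the two nearest measured impulse responses is swapped in purely as a stand-in, and the function name and data layout are hypothetical. Returning `None` outside the measured range leaves it to the caller to mute the headphones or freeze the tracking, as described above.

```python
def interpolate_prir(head_angle, measured):
    """Blend the two measured responses that bracket the head angle.

    measured maps a Look Angle in degrees to an impulse response (a list
    of samples); it needs at least two angles, all of equal length.
    Returns None when the head is outside the measured range.
    """
    angles = sorted(measured)
    if head_angle < angles[0] or head_angle > angles[-1]:
        return None
    for lo, hi in zip(angles, angles[1:]):
        if lo <= head_angle <= hi:
            w = (head_angle - lo) / (hi - lo)  # 0 at lo, 1 at hi
            return [(1 - w) * a + w * b
                    for a, b in zip(measured[lo], measured[hi])]
    return None


# Three toy two-sample responses at -30, 0 and +30 degrees.
toy = {-30: [1.0, 0.0], 0: [0.0, 1.0], 30: [0.0, 0.0]}
halfway_left = interpolate_prir(-15, toy)
too_far = interpolate_prir(45, toy)
```

A straight crossfade like this smears arrival times when the two responses are not time-aligned, so a production system would almost certainly do something more sophisticated. The sketch only shows why three measurements can stand in for a continuous range of head positions.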
Although the Realiser never sounds exactly the same as the specific speaker system it samples, that certainly doesn't mean that it sounds worse. In fact, one of the most intriguing things about this system for users of small studios is that the Realiser actually has the potential to be better, for mixing purposes, than the speakers it's modelling! For example, because standard convolution is incapable of capturing dynamic effects such as compression and distortion, the SVS emulation will usually exhibit less power compression than real speakers, meaning that as long as you have an excellent set of headphones, the frequency-response characteristics of the virtual speakers change less than those of real ones as you adjust the monitoring volume. You'll also hear less distortion, which means that the Realiser tends to present instrument timbres more cleanly and reveal a greater degree of low-level mix detail — the latter further enhanced by any reduction in background noise your headphone earcups provide.
But that's not all. The maximum duration of the impulse response data in a PRIR file is around 800ms, but the Realiser lets you choose, post-sampling, how much of it you actually want to use. This means that you can reduce any undesirable reverb tail in your monitoring room by truncating the PRIR data, much as you can with an impulse response running in a convolution reverb plug-in. (Even if you're sampling a bone-dry monitoring environment, you might also wish to use this to eliminate any vestige of the convolution engine's artificial 'ghost' reverb overhang, as mentioned in the main article.) For more detailed control, you can also apply a handful of preset volume envelopes to the PRIR data to adjust the balance between the beginning and end of the file, thereby suggesting a change in the distance between the speaker and the listener as the ratio of direct/reflected sound changes. And these PRIR tweaks aren't just available on a global level — you can also apply them speaker by speaker. It's pretty freaky stuff.
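Truncating an impulse response is simple enough to sketch in Python. The version below keeps the first `keep_ms` milliseconds and applies a short linear fade-out so the cut doesn't leave an abrupt step at the end. The fade, the function name and the parameters are assumptions for illustration; the review doesn't say how the Realiser shapes the cut-off point.

```python
def truncate_ir(ir, sample_rate, keep_ms, fade_ms=5.0):
    """Keep the first keep_ms of an impulse response, fading out the end.

    ir is a list of samples. The last fade_ms of the kept portion is
    ramped linearly to zero (an assumed refinement to avoid a hard edge).
    """
    keep = min(len(ir), int(sample_rate * keep_ms / 1000))
    fade = min(keep, int(sample_rate * fade_ms / 1000))
    out = list(ir[:keep])
    for i in range(fade):
        out[keep - fade + i] *= 1.0 - (i + 1) / fade
    return out


# Toy example: ten samples at a 1kHz sample rate, kept to 6ms with a 2ms fade.
shortened = truncate_ir([1.0] * 10, sample_rate=1000, keep_ms=6, fade_ms=2)
```

At a 48kHz sample rate the full 800ms response is roughly 38,400 samples per speaker per ear, so trimming it to, say, 200ms also cuts the amount of data being convolved to a quarter.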
But easily the coolest weapon in the Realiser's arsenal is the Direct Bass feature. This allows you to reroute the low-frequency information that normally feeds the Tactile output, mixing it into the headphone signal instead, but bypassing the SVS processing. What this means is that if your speaker system's low end is compromised by low-frequency room modes (in other words, if you've spent less than five figures on acoustic design), you can filter the lumpy low end out of the SVS emulation and replace it with unsullied direct signal (low-pass-filtered and delayed as appropriate), in effect giving you ruler-flat low end. Indeed, despite having spent at least £1500 acoustically treating the low-frequency resonances in my own mix room, I still ended up preferring the unnatural neutrality of Direct Bass when emulating my monitors on the Realiser. Yes, you read that right: the Direct Bass function delivered a clearer and more usable low-end balance than my full-range nearfield monitoring system!
Now you might expect the Direct Bass to sound odd, because it's completely free of the sampled room reflections that affect the virtual speaker signals. However, as long as you take your time setting up the appropriate crossover, level and delay parameters, you'll find that it does little to undermine the realism of the SVS illusion, because low-frequency spatial cues are naturally so weak, even in real rooms. Those people with large 2.0 stereo setups may notice some reduction in the subjective width of low-frequency sources, but if your speakers are smaller two-way designs or a 2.1 system with a shared subwoofer, this difference will be pretty minimal. And besides, it's a negligible price to pay for a speaker listening experience with the smoothest bass response you're ever likely to hear.
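The principle behind Direct Bass can be sketched as a simple crossover, shown here in Python for one ear. This is an illustration only: the one-pole filter is far gentler than anything a real crossover would use, the Realiser's actual filter slopes aren't given in this review, and all the names and parameters are hypothetical.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """Gentle 6dB/octave low-pass filter (a stand-in for the real crossover)."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    state = 0.0
    out = []
    for x in signal:
        state += a * (x - state)
        out.append(state)
    return out


def direct_bass(dry, svs, cutoff_hz, sample_rate,
                delay_samples=0, bass_gain_db=0.0):
    """Replace the low end of the SVS output with filtered direct signal.

    dry is the unprocessed input, svs the room-emulated output for one ear.
    """
    gain = 10 ** (bass_gain_db / 20)
    # Low band: taken from the direct signal, so it carries no room modes.
    bass = [gain * x for x in one_pole_lowpass(dry, cutoff_hz, sample_rate)]
    bass = [0.0] * delay_samples + bass
    # High band: the SVS output minus its own low-passed copy.
    svs_low = one_pole_lowpass(svs, cutoff_hz, sample_rate)
    highs = [x - low for x, low in zip(svs, svs_low)]
    # Pad the shorter band with silence, then sum the two.
    n = max(len(bass), len(highs))
    bass += [0.0] * (n - len(bass))
    highs += [0.0] * (n - len(highs))
    return [b + h for b, h in zip(bass, highs)]
```

Because the high band is formed by subtracting the low-passed copy, the two bands sum back to the original whenever the emulated and direct signals are identical. In other words, the crossover itself adds no colouration in this sketch, and any audible difference comes from removing the room's low-frequency contribution.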