Artificial reverb algorithms are designed to add 'fake' room ambience to recorded sounds — but a prototype reverb unit from Sony purports to deliver new levels of realism by imposing the characteristics of actual reverberant spaces on the sound. Hugh Robjohns explains...
Reverberation is amazingly complex and, with limited DSP capabilities, the vast majority of reverb units can only synthesize a relatively crude approximation to the real thing: a few early reflections and a mush of decaying noise! Having said that, many reverb units manage to produce perfectly acceptable results for most purposes, but only the most expensive machines stand comparison with the real acoustic of a large, well‑designed hall or church. However, the continuing advance in DSP power is now allowing an old idea to become a practical reality, with a potential leap in the realism of 'artificial' reverberation.
This old idea is a process called 'convolution' — 'convolving' or imposing one real sound on another — in this case adding reverberation to a music or other sound source. Convolution is a relatively simple process (as simple as any other kind of digital signal processing, anyway!) but requires an immense amount of number‑crunching — which is why it is only now becoming a viable technique. If you don't want to know the theory, skip the next section!
Convolution is a complex and computationally intensive process, but the core idea is actually very simple. Think about standing in a large hall or church and clapping your hands just once to produce a sharp click. Clapping hands generates a 'transient signal' which triggers dozens of reflections and echoes, producing an elaborate reverberation unique to that particular venue. Let's also assume you were able to capture that reverberation pattern on some kind of digital recorder.
Now think about a digital audio signal and, in particular, the stream of samples of which it is composed. The level of each sample is determined by the amplitude of the original analogue audio signal at a given moment in time, but any one of those samples, in isolation, would sound like a click not too dissimilar from our transient handclap. A single sample, if replayed with sufficient volume in the hall or church described earlier, would therefore generate the same kind of reverberation pattern as our earlier handclap, a pattern we had the good foresight to record!
This is where the leap of faith comes in! How about making a machine which would replay the entire recording of the reverberation pattern we made, triggered by each and every sample in the digital audio data stream, with the level of the replayed reverberation modified to correspond with the amplitude of the triggering audio sample? The resulting sound would be the recorded natural reverberation pattern of the real venue imposed on the digital audio signal, in exactly the same way as if the sound source was being played and recorded in the natural acoustic space. This whole process is called convolution.
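The idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any real reverb unit is implemented: the function name `convolve`, the three‑sample 'room' and the three‑sample 'dry' signal are all made up for the example, and a real reverb recording would run to hundreds of thousands of samples.

```python
def convolve(dry, impulse_response):
    """Direct convolution: every dry sample triggers a scaled copy of the
    recorded reverb, and all the overlapping copies are summed."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for n, sample in enumerate(dry):                # each incoming audio sample...
        for k, tap in enumerate(impulse_response):  # ...replays the whole reverb recording
            out[n + k] += sample * tap              # scaled by that sample's amplitude
    return out


# Hypothetical toy data: a 'room' whose response is the direct sound
# followed by two decaying echoes, and a three-sample dry signal.
room = [1.0, 0.5, 0.25]
dry = [1.0, 0.0, -2.0]

wet = convolve(dry, room)
print(wet)  # [1.0, 0.5, -1.75, -1.0, -0.5]
```

Note that the first value of the toy room response stands for the direct sound, so the dry signal appears in the output along with its reverberation.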
The problem, as always, is in taking this fairly simple idea and turning it into a practical reality. The challenge of recording the natural reverberation is discussed in the 'Recording Reverb' box, but let us assume for the time being that a perfect recording exists. What would that recording contain? Well, a typical large hall might have a reverberation time (RT60) of perhaps five seconds (that means that it would take five seconds for the reverberation sound to fall to 60dB below the level of the initial triggering sound). However, we need the reverberation tail to decay all the way down to the noise floor — which is potentially ‑144dB or more (for a 24‑bit system) below the maximum peak level. That implies we are looking at a total reverberation time, to the last dying ember of the reverb tail, of perhaps around 15 seconds — which, at 44.1kHz, represents well over half a million samples.
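The arithmetic behind those figures can be checked quickly. The sketch below assumes the reverb decays at a steady rate in decibels (which real halls only approximate), and the variable names are purely illustrative. On that assumption the tail reaches the 24‑bit noise floor after about 12 seconds, so the rounder figure of 15 seconds leaves some margin.

```python
# Figures quoted in the text.
rt60_seconds = 5.0       # time for the reverb to fall by 60dB
noise_floor_db = 144.0   # approximate dynamic range of a 24-bit system
sample_rate = 44100      # samples per second

# Assumption: the decay is a straight line when plotted in dB against time.
decay_db_per_second = 60.0 / rt60_seconds
seconds_to_noise_floor = noise_floor_db / decay_db_per_second

samples_to_noise_floor = int(seconds_to_noise_floor * sample_rate)
samples_in_15_seconds = 15 * sample_rate

print(seconds_to_noise_floor)   # 12.0
print(samples_to_noise_floor)   # 529200
print(samples_in_15_seconds)    # 661500
```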
The convolution process requires that for each sample in the digital audio signal (the one we want to add reverb to), we have to replay that string of half a million samples of reverberation, suitably modified in level in accordance with the amplitude of the audio signal. Changing the gain of a digital signal requires a multiplication process. It gets even more complicated when you consider that this stream of modified reverberation samples then has to be added to all the other strings of reverberation generated by all the preceding audio samples, as well as to the audio samples themselves. And that is one hell of a lot of multiplications and additions!
Multiply that lot by two for stereo, or four (or more) for surround... and then more than double the number you have just thought of if you want a 96kHz sampling option, because the reverb recording itself then contains over twice as many samples! It goes without saying that these big sums have to be performed before the next audio sample arrives and, to make sure rounding errors don't reduce your output resolution, all calculations must be performed with double or triple precision (with, say, 56‑bit wordlengths).
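To put a rough number on 'one hell of a lot', the sketch below counts multiply‑and‑add operations per second for the brute‑force approach described above, in which every reverb sample is multiplied by every incoming audio sample. The function name and the 15‑second tail are illustrative assumptions, and the figures say nothing about how Sony's hardware actually organises the work.

```python
def multiply_adds_per_second(tail_seconds, sample_rate, channels):
    """Cost of brute-force convolution: one multiply-add per reverb sample,
    for every incoming audio sample, on every channel."""
    reverb_samples = tail_seconds * sample_rate
    return reverb_samples * sample_rate * channels


mono_44k = multiply_adds_per_second(15, 44100, 1)
stereo_44k = multiply_adds_per_second(15, 44100, 2)
stereo_96k = multiply_adds_per_second(15, 96000, 2)

print(mono_44k)     # 29172150000
print(stereo_44k)   # 58344300000
print(stereo_96k)   # 276480000000
```

On this simple model a single 44.1kHz channel already needs around 29 billion multiply‑adds every second. Moving to 96kHz increases both the number of reverb samples and the rate at which audio samples arrive, so the brute‑force cost rises by a factor of nearly five.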
A few products which use convolution for simulated reverberation have already reached the market, including a Pro Tools plug‑in from Sound Forge and the TC Electronic M5000, which uses convolution to enhance the realism of the early reflections in some of its reverb processes. However, neither of these examples has the DSP capability to perform real‑time convolution to the degree necessary to calculate the entire reverberation tail to the full resolution of 24 bits, let alone at 96kHz sampling rates. Thus, at present, convolution is either performed as an off‑line process or used as only one part of the overall method of synthesizing reverb. All that is about to change, though, thanks to Sony, and most of the other major players in this field are bound to follow with similar systems in the relatively near future. Convolutional reverberation is about to become the technology of the new millennium.
If you read through the Frankfurt MusikMesse show report last month in Sound On Sound, you may have noticed a reference to Sony's prototype 'Sampling Reverb' machine, the DRE S777. As you can see from the accompanying photo, it already looks like a finished product and I'm told we can expect it to be launched in around six months. The picture also reveals that the S777 bears little resemblance to the normal crop of digital reverberators we know and love. The LCD panel, four soft keys and the data entry wheel are pretty much par for the course, but the Chippendale front panel makes it stand out in the crowd. And what about that CD‑ROM drive?
Sony have been recording the natural reverberation of many of the best‑known and best‑loved venues in Japan and Europe, and have the data for 10 rooms already complete, with many more to come. The amount of data required to record the reverberation of a room is vast and will be supplied for use with the DRE S777 on CD‑ROM (hence the front‑panel drive). A small amount of editing of the reverb characteristics will apparently be possible, but nothing like that available on existing artificial reverberation units. Typically, the soft keys and control wheel will allow the user to customise the pre‑delay and, perhaps, decay settings and damping for the provided halls, but little else. However, since the whole idea of the machine is to provide instant access to some of the best‑sounding halls and rooms in the world, why would you want to change perfection?
Although the product will be launched with CD‑ROMs for a selection of well‑known halls, future discs may include a wide range of alternative acoustic spaces — for example, domestic spaces, forests, virtual rooms, submarines, and who knows what else! It may even become possible, with later‑generation machines, for users to record and process their own reverberation environments. Furthermore, the convolution process can be used for a lot more than just imposing real reverb characteristics on a sound, and Sony have big plans for the S777 and its descendants in the future.
This leading‑edge technology will obviously not come cheap — at least for the first generation of products — and the glorious polished wood front panel implies an extra nought on the price tag to start with! I'm told the DRE S777 will be marketed in direct competition with the Lexicon 300L, at between £3500 and £6000 depending on fitted options. I have not heard the machine, but those who have tell me it sounds incredibly natural and believable, so it should represent very good value for money, especially when you consider that the current state‑of‑the‑art Lexicon 480L will set you back around eight and a half grand! The S777 will be part of Sony's 24‑bit world, with 24‑bit digital I/Os as standard, operating at either 44.1 or 48kHz sample rates. Further options will provide analogue I/Os, and a DSP upgrade to permit either four‑channel surround or 96kHz sample rates for stereo outputs.
In order to handle the monumental computational tasks involved, Sony have had to produce their own application‑specific DSP chips, using the very latest fabrication technology, to execute these repetitive but complex calculations at such tremendous speeds, performing 256,000‑point convolutions. Although the R&D investment has been high, it is likely that these new processors will eventually find their way into mass‑market products such as hi‑fi surround decoders and processors, which will spread the costs over a much wider product base.
It sounds simple, doesn't it? How complicated can recording a bit of real reverberation be? Well, it is actually very difficult if you want to be able to use the data for convolution.
In a perfect world, all that would be needed is a system which generated an acoustic transient click, and then recorded the resulting reverberation. After all, in the slightly bizarre engineering world of time‑domain impulse responses, a transient click contains all frequencies at equal amplitudes and is thus the perfect test signal. However, a true transient (or impulse, to use the engineering speak) is infinitesimally short, and thus contains no energy... and you definitely need acoustic energy if you want to generate an audible reverberation pattern which you can record!
If the duration of the 'click' is increased to contain more energy, it ceases to have a uniform frequency response (which is one of the problems afflicting loudspeaker testing using only impulse‑response systems). A second problem concerns what format to record the signal on. Ideally, the recording should have rather greater dynamic range than the final production medium, so that any errors or inaccuracies are inaudible. However, the best we can currently do is 24‑bit, which is already a production and release format.
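The point about click duration and frequency content can be illustrated with a toy example. The sketch below compares the spectrum of a one‑sample click with that of a click stretched to four samples, using a deliberately naive Fourier transform over just 16 samples; the function name and the signal lengths are illustrative assumptions only.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Naive discrete Fourier transform, returning the level in each frequency bin."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


n = 16
ideal_click = [1.0] + [0.0] * (n - 1)      # a single-sample impulse
wide_click = [1.0] * 4 + [0.0] * (n - 4)   # a longer click carrying more energy

flat = magnitude_spectrum(ideal_click)
uneven = magnitude_spectrum(wide_click)

print([round(level, 3) for level in flat])     # the same level in every bin
print([round(level, 3) for level in uneven])   # peaks and complete nulls
```

The single‑sample click has the same level in every frequency bin, while the four‑sample click is louder at low frequencies and has nulls where it contains no signal at all, so any reverb it excited would be missing those frequencies.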
A solution chosen by the Sony engineers developing the DRE S777 is this: instead of an impulse test signal, they use a special frequency sweep generated by a computer and output through a large, very loud, and accurate PA system into the space whose acoustic they wish to capture. The resulting natural reverb is captured through four microphones which are placed in the venue at precise distances and angles from the PA. (Four microphones are used because the DRE will be capable of reproducing natural reverberation in a four‑channel surround‑sound format). The frequency sweep is performed and recorded 16 times in all.
Back in Japan, the 16 recordings are time‑aligned in a computer and combined. Because the 16 test signals are all phase‑coherent they add constructively, increasing in level by 6dB each time the number of recordings is doubled, and boosting the wanted signal level by 24dB. The background noise, by contrast, is random, and so adds at only around 3dB per doubling (12dB in total), so the result of combining all 16 recordings is an improvement in the signal‑to‑noise ratio of around 12dB, or two bits' worth. Thus the captured reverberation signal has a dynamic range 12dB greater than that of the 24‑bit signal processor in which it will be used.
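Those decibel figures follow from standard rules for summing signals, and can be checked as below. The sketch assumes the 16 wanted signals are identical and perfectly aligned, and that the noise in each recording is random and unrelated to the noise in the others; the function names are illustrative.

```python
import math


def coherent_gain_db(recordings):
    """Identical, time-aligned signals add in amplitude."""
    return 20 * math.log10(recordings)


def noise_gain_db(recordings):
    """Unrelated random noise adds in power."""
    return 10 * math.log10(recordings)


signal_gain = coherent_gain_db(16)
noise_gain = noise_gain_db(16)
snr_improvement = signal_gain - noise_gain
extra_bits = snr_improvement / (20 * math.log10(2))   # roughly 6dB per bit

print(round(signal_gain, 2))       # 24.08
print(round(noise_gain, 2))        # 12.04
print(round(snr_improvement, 2))   # 12.04
print(round(extra_bits, 2))        # 2.0
```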
The recorded reverberant frequency sweep signal is further processed to convert it from the frequency domain to the time domain (Fourier transforms raise their heads at this point!). The original sweep source can also be converted to a time‑domain signal, and the disparity between a true impulse and the real‑world test signal determined (see diagram). By applying the inverse of this difference to the recorded reverberation, a new signal is produced which is what would have been captured if a perfect impulse test signal had been used in the first place! This is the reverberation pattern which can then be used in the convolution process and is stored on a CD‑ROM for use with the DRE S777.
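One common way to carry out this kind of correction is to divide the spectrum of the recording by the spectrum of the test signal, and the sketch below shows that approach on toy data. It is not Sony's documented method. The sweep design (equal level at every frequency, with a phase that makes it glide in pitch), the helper names, the 64‑sample length and the three‑echo 'room' are all assumptions made for illustration, and a wrap‑around convolution stands in for playing the sweep in a real hall.

```python
import cmath
import math


def dft(x):
    """Naive forward Fourier transform (fine for a 64-sample toy example)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def make_sweep(n, stretch):
    """Hypothetical test sweep: unit level in every bin, quadratic phase.
    n must be even; stretch is an integer controlling the sweep length."""
    spectrum = [0j] * n
    for k in range(n // 2 + 1):
        spectrum[k] = cmath.exp(-4j * math.pi * stretch * k * k / (n * n))
    for k in range(1, n // 2):
        spectrum[n - k] = spectrum[k].conjugate()   # mirror so the signal is real
    return [value.real for value in idft(spectrum)]


def play_in_room(x, room):
    """Stand-in for the hall: wrap-around convolution of the sweep with the room."""
    n = len(x)
    return [sum(room[j] * x[(t - j) % n] for j in range(n)) for t in range(n)]


def recover_impulse_response(recorded, sweep):
    """Divide the recording's spectrum by the sweep's spectrum."""
    ratio = [y / x for y, x in zip(dft(recorded), dft(sweep))]
    return [value.real for value in idft(ratio)]


n = 64
room = [0.0] * n
room[0], room[3], room[7] = 1.0, 0.5, -0.25   # made-up direct sound plus two echoes

sweep = make_sweep(n, 16)
recorded = play_in_room(sweep, room)
recovered = recover_impulse_response(recorded, sweep)

print([round(value, 6) for value in recovered[:8]])
```

The recovered values match the made‑up room response, which is the result a perfect click would have given. A real measurement would also have to cope with noise, with a loudspeaker that is not perfectly flat, and with recordings of many thousands of samples, for which a fast Fourier transform would be used instead of the naive version here.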