Celemony's clever restoration tool invisibly fixes the speed variations that can plague material recorded on tape or vinyl.
Anyone under the age of 30 might be excused for wondering what 'wow and flutter' are, let alone what they sound like! With the advent of the CD in 1983, wow and flutter were effectively banished, but previously they were very familiar parts of all recorded sound, caused by short-term variations in playback speed due to inherent mechanical inaccuracies. For the unfamiliar, 'wow' is a slow, usually cyclic, speed variation, while 'flutter' is a faster variation (below and above about 4Hz, respectively). These variations in the recording or replay speed create equivalent variations of pitch, producing a wobbly or unstable sound.
Analogue tape and cassette machines, and vinyl record players, all exhibit some level of wow and flutter: they are intrinsic artifacts of the technology involved. Common causes include vinyl records with the mounting hole stamped slightly off-centre, and tape machines with slightly eccentric capstans or pinch rollers, or varying tape tension. For completeness, I should also mention another form of flutter called 'scrape flutter' which is peculiar to tape machines and not normally included in traditional wow and flutter measurements. Scrape flutter is caused by the moving tape vibrating at a high frequency (usually above 100Hz) as it is dragged across the tape heads and guides. This very high vibration rate creates an intermodulation effect with the wanted audio, which produces a kind of subtle but coarse anharmonic distortion of the wanted sound, rather than an obvious pitch variation.
To provide some ballpark figures, the wow and flutter measurements included in equipment technical specifications ranged from something like 0.03 percent for the best professional tape machines and turntables, up to around 0.3 percent or so for budget consumer equipment. Semi-professional equipment would usually measure around 0.07 percent.
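To put those percentages into musical terms, a speed error can be converted into a pitch error in cents (hundredths of a semitone). The sketch below does this for the three ballpark figures quoted above. It assumes each figure is a simple peak speed deviation, which is a simplification: published specifications are normally weighted measurements, so treat the results as rough guides only. The function name is my own.

```python
import math

def speed_deviation_to_cents(percent: float) -> float:
    """Pitch error in cents caused by a speed error of `percent` percent."""
    return 1200.0 * math.log2(1.0 + percent / 100.0)

# Ballpark figures from the text, treated here as peak deviations (an assumption).
for label, percent in [("professional", 0.03),
                       ("semi-professional", 0.07),
                       ("budget consumer", 0.3)]:
    cents = speed_deviation_to_cents(percent)
    print(f"{label}: {percent}% is roughly {cents:.2f} cents")
```

On this reckoning, even budget equipment wobbles by only around five cents, which helps to explain why the effect is so dependent on the programme material.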
Interestingly, the audibility of a given level of wow and flutter depends enormously on both the type of audio material and the listening environment. Wow and flutter artifacts are most audible on material with sustained tones, like solo organ or slow piano music, whereas they are very hard to detect in strongly rhythmic music. Moreover, wow and flutter can be quite hard to recognise in an acoustic space with a very short reverberation time, but they become glaringly obvious in a more reverberant space, because sustain created by the reverberation provides a pitch reference against which any cyclic pitch variations can be easily detected!
Although wow and flutter are inescapable in all mechanical systems, they can be minimised by very accurate machining of all rotating elements, and by using expensive bearings and massively heavy flywheels to damp out any drive speed variations. However, they can never be eradicated completely, and the law of diminishing returns applies: professional tape machines and turntables were very large, very heavy, and hugely expensive by today's standards, yet their wow and flutter performance was only an order of magnitude better than budget consumer equipment.
Digital systems potentially suffer from wow and flutter too, since variations in the sample rate will create corresponding variations in replay or record speed and the consequent pitch changes. Thankfully, though, the very high-frequency crystal-based clocks employed even in the cheapest digital system are phenomenally stable in comparison with any mechanical system and, to all intents and purposes, the pitch stability of CDs and computer DAWs is perfect. Instead, the small timing variations that do exist in digital clocks create an artifact known as 'jitter' which produces low-level noise or anharmonic distortions… but that's a topic for a different article!
The point is that the virtually perfect speed stability of the digital environment, combined with the ability to perform very complex DSP frequency-domain analysis of audio signals, means that it is now possible and practical to both detect wow and flutter within an archive recording, and reduce or remove it — which is where Celemony's Capstan software comes in.
Celemony are well known as the company that brought us the Melodyne pitch-manipulation software, numerous versions of which have been reviewed in Sound On Sound over the last decade (November 2001, January 2004, April 2006 and December 2009). Capstan is based on the technology underlying the polyphonic version of Melodyne, called Direct Note Access (DNA), but has been heavily optimised specifically to detect and reduce periodic and chaotic speed-variation artifacts, whether caused by wow and flutter in tape machines or off-centre vinyl discs, or one-off phenomena such as tape catching on the reel, a sticky edit passing through, or a power-supply glitch.
The challenging aspect of this kind of process lies in detecting the pitch variations in the first place. Other solutions have involved tracking the bias frequency within a tape recording, or measuring the mechanical eccentricity of a record, but these are obviously format-specific. Capstan's approach, by contrast, is based on analysing the musical content of the recording, and so is inherently agnostic with respect to the source format. Analysing the music over a relatively long time scale means that the assumption can be made that most sustained musical tones are probably supposed to have a constant pitch, and so the average pitch can be determined. Deviations from this average are probably caused by mechanical recording artifacts, and the speed deviation information can be used to vary the digital sample rate (ie. replay speed) in an inverse and complementary manner, thus correcting them. This solution also doesn't introduce any other artifacts, such as pitch-shifting or time-stretching glitches, because it just uses basic varispeed to restore the required stable pitch. The varispeed process results in a varying sample rate, but that can be dealt with easily by employing a standard sample-rate conversion algorithm to generate a fixed-sample-rate output.
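The correction half of that process (inverse varispeed followed by conversion back to a fixed sample rate) can be illustrated in a few lines of Python. This is only a sketch under generous assumptions: the speed curve is taken as already known, whereas finding it is the hard part that Capstan's proprietary analysis handles, and crude linear interpolation stands in for a proper sample-rate converter. All of the function names and test values are hypothetical.

```python
import math

def cumulative_positions(speed):
    """Position of each recorded sample on the true timeline, in samples."""
    pos = [0.0]
    for s in speed[:-1]:
        pos.append(pos[-1] + s)
    return pos

def correct_speed(samples, speed):
    """Undo a known speed curve by variable-rate resampling.

    samples[n] is the wobbly transfer and speed[n] its playback speed
    relative to nominal (1.0 = correct). Linear interpolation stands in
    for a proper sample-rate converter.
    """
    pos = cumulative_positions(speed)
    out, n, k = [], 0, 0
    while k <= pos[-1]:
        # Find the pair of recorded samples that straddles true instant k.
        while pos[n + 1] < k:
            n += 1
        frac = (k - pos[n]) / (pos[n + 1] - pos[n])
        out.append(samples[n] + frac * (samples[n + 1] - samples[n]))
        k += 1
    return out

def demo(sr=8000, tone_hz=440.0, wow_hz=2.0, depth=0.003):
    """Return (max error before correction, max error after correction)."""
    # One second of a steady tone replayed with 0.3 percent wow at 2Hz.
    speed = [1.0 + depth * math.sin(2 * math.pi * wow_hz * n / sr) for n in range(sr)]
    pos = cumulative_positions(speed)
    wobbly = [math.sin(2 * math.pi * tone_hz * p / sr) for p in pos]
    fixed = correct_speed(wobbly, speed)
    clean = [math.sin(2 * math.pi * tone_hz * k / sr) for k in range(sr)]
    before = max(abs(a - b) for a, b in zip(wobbly, clean))
    after = max(abs(a - b) for a, b in zip(fixed, clean))
    return before, after

before, after = demo()
print(f"max error before: {before:.3f}, after: {after:.3f}")
```

Because the fix is pure resampling along a corrected timeline, nothing is stretched or pitch-shifted in the usual DSP sense; the only residual error in this toy version comes from the deliberately simple interpolation.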
Achieving all this requires some very serious number-crunching, so Capstan is available only as a stand-alone application, and only runs on 64-bit operating systems — Windows 7 64-bit or Mac OS X 10.5.8 or later — with at least 4GB of RAM installed. An iLok dongle is required to store the authorisation code, and installation involves the usual faff of registering on the Celemony web site with a serial number to generate an iLok code. You then have to go to the iLok web site to download the code to your iLok dongle before, finally, you can run the program, which was previously installed from a CD-ROM or downloaded from the registered-user area of the Celemony web site.
Capstan can process mono or stereo audio in all the common uncompressed audio file formats (WAV, AIFF, SND, AUD and RIFF), with sample rates up to 192kHz and word lengths between 16- and 32-bit, the latter in either fixed or floating-point versions. A version 1.1 update introduced a workaround which allows its processing to be applied to multi-channel or multitrack recordings, by analysing a representative mix of the source channels to generate the appropriate correction data, which can then be applied to each of the original files (assuming they have the same sample rate, word length and duration as the analysed track). It's a little laborious, but it certainly works!
As it happens, I spent a lot of time last Christmas transferring dozens of quarter-inch tapes bequeathed to me and containing a wealth of jazz and classical recordings, mostly dating back to the early 1960s. The majority were actually quite good-quality historic recordings derived either from original master tapes or records played on a transcription turntable, but a few were of substantially lower quality. In particular, a noisy recording of Chopin piano works exhibited some detracting wow and flutter artifacts, and I had been mulling over how best to restore this material. The Capstan software couldn't have arrived at a better time!
When you instruct Capstan to open an audio file, it immediately starts its analysis, and this can take quite a while: for example, it took about four minutes to process a 10-minute mono piano recording using a 2.5GHz Intel i5 laptop. Once the track has been analysed, it is displayed on the screen, with the audio waveform in the top half, along with a pair of blue lines that I'll explain in a moment. The lower section is the speed analysis window, which displays sustained tones at semitone intervals. The detected speed variation is shown as a wavy blue line, with upward curves indicating speed increases and downward ones speed decreases. A thin red line indicates the average pitch reference. The vertical magnification can be altered with buttons at the bottom right-hand corner, if required.
A set of simple transport buttons at the top of the window aids navigation of the audio waveform, along with a timeline that indicates the magnification level and identifies the cursor position. Hotkeys can be used to control the transport, audition mode and display functions if preferred, with Avid-standard key assignments as the default. Location markers can also be added, complete with comments and notes to aid navigation of a project. The waveform can be zoomed to any magnification, and any portion displayed. A section of audio can also be highlighted by dragging across it for repeat auditioning and process editing, if required. A pair of larger buttons at the top of the program window toggles playback between the original audio and the speed-corrected version. Various configuration options determine the audition pre-roll time, fast playback dim level, audio interface setup, keyboard shortcut assignments, and so on.
The upper of those two blue lines in the top waveform window represents the 'intensity curve', while the lower one, which is a little thicker and has a white core line, is the 'smoothing curve'. Clicking on either line introduces a node which allows the line to be raised or lowered on one side relative to the other — just like editing fader automation data on most DAWs — while double-clicking a node removes it. Dragging the intensity curve upwards or downwards allows the amount of speed-variation correction to be adjusted. I found that Capstan seemed to err on the side of too little correction, and dragging the intensity curve line upwards often helped to create a perfect fix.
The smoothing curve works in the same way: dragging it upwards smooths out the most rapid speed-curve variations, and again, I found that a little smoothing often seemed to improve the correction, giving a more stable result, as well as making the underlying wow and flutter trends more visible. The more musically complex the material, the better, as far as Capstan is concerned, since this gives it much more pitch information to analyse. Very simple things such as sparse solo piano works are clearly more challenging for it, and the speed-variation curve can become quite wild during periods of silence! Occasionally, the vibrato of a very dominant source can be mistaken for speed variation, but such analytical errors are easily redressed by increasing the smoothing and/or reducing the intensity over the affected section of material.
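Celemony don't document how the two curves act on the detected speed data, but one plausible way to picture them is as a gain and a low-pass filter applied to the deviation curve. The sketch below is my own guess at that behaviour, not Capstan's actual method: 'intensity' simply scales the correction, and 'smoothing' is a centred moving average. For simplicity, both are single values here rather than curves that vary along the timeline.

```python
def shape_correction(deviation, intensity=1.0, smooth_len=1):
    """Scale a speed-deviation curve and smooth it with a centred moving average.

    deviation  : detected speed deviations, one value per analysis frame
    intensity  : gain applied to the correction (1.0 = as detected)
    smooth_len : odd window length in frames (1 = no smoothing)
    """
    half = smooth_len // 2
    smoothed = []
    for i in range(len(deviation)):
        # The window shrinks at the ends so the output keeps the input length.
        window = deviation[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return [intensity * d for d in smoothed]

# A single wild frame (for example, during a silence) is tamed by smoothing.
print(shape_correction([0.0, 0.0, 3.0, 0.0, 0.0], intensity=1.0, smooth_len=3))
```

Seen this way, raising the smoothing trades responsiveness to fast flutter for immunity to rogue analysis frames, which matches the behaviour I heard.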
Where there are really big speed variations of more than a semitone — the kind of thing that might be caused by a tape snagging or a power glitch when recording — the Capstan auto-correction system jumps to the next nearest semitone, producing a kind of Cher-style auto-tuned effect. This situation is easily corrected by dragging the speed-correction curve to the correct pitch level. In extreme situations, you can even draw in the required speed-correction curve by hand! Usefully, if you highlight a section of audio, dragging the intensity or smoothing curves inside the highlighted section automatically places edit nodes at both ends, so only the highlighted section is altered. This makes fine-tuning the Capstan correction curves very fast and straightforward. (By the way, Capstan is also able to detect and correct the kind of gradual speeding up or slowing down that can be caused by slow battery failure, or varying tape tension.)
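The semitone-jumping behaviour makes sense if the analysis measures each tone's offset from the nearest semitone, since an error of 130 cents then looks identical to one of 30 cents. That is my interpretation rather than anything Celemony have confirmed, but the arithmetic is easy to sketch (the function name is hypothetical):

```python
def residual_cents(offset_cents: float) -> float:
    """Offset from the nearest equal-tempered semitone, in the range [-50, 50)."""
    return (offset_cents + 50.0) % 100.0 - 50.0

# A modest drift is measured correctly...
print(residual_cents(30.0))    # 30.0
# ...but a drift of more than a semitone aliases to the same reading,
# so an automatic fix would leave the passage a full semitone out.
print(residual_cents(130.0))   # 30.0
```

Dragging the curve by hand, as described above, effectively supplies the missing whole-semitone part of the correction.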
Since the normal correction process calculates the average pitch tuning of the source material, it conforms the processed track to this tuning and, as a result, the duration of the processed file exactly matches that of the source file. However, if you know the original tuning of the source file, this can be entered into the Inspector window, whereupon the entire file will be replayed at an average speed that delivers this desired tuning, albeit with a corresponding change in the total duration. This is very useful where the record machine ran at a different speed from the replay machine! Conversely, if the file absolutely must run to a specific duration — perhaps because it is associated with video or film — the required running time can also be entered in the Inspector window. Capstan will then adjust the average replay speed to achieve the set duration, but with a corresponding change in pitch. As a practical example, the Capstan Inspector window informed me that the Chopin piano material I was working on had an average pitch of 441.27Hz, and restoring perfect tuning was as easy as clicking the assumed pitch of 440Hz!
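The trade-off between tuning and running time is simple varispeed arithmetic, and can be sketched as follows. The function names are my own, and the 600-second duration and 610-second target are purely illustrative figures, not taken from my Chopin transfer.

```python
import math

def retune(measured_hz: float, target_hz: float, duration_s: float):
    """Speed ratio, new duration and pitch shift needed to hit a target tuning."""
    speed = target_hz / measured_hz           # below 1.0 means slow down
    new_duration = duration_s / speed         # slower replay runs longer
    cents = 1200.0 * math.log2(speed)         # resulting pitch shift
    return speed, new_duration, cents

def fit_duration(duration_s: float, required_s: float, measured_hz: float):
    """Speed ratio and resulting tuning when forcing a specific running time."""
    speed = duration_s / required_s
    return speed, measured_hz * speed

# Pulling 441.27Hz down to 440Hz on a hypothetical 10-minute file.
speed, new_duration, cents = retune(441.27, 440.0, 600.0)
print(f"speed x{speed:.5f}, duration {new_duration:.2f}s, shift {cents:.2f} cents")

# Forcing the same file to run for 610 seconds instead.
speed, tuning = fit_duration(600.0, 610.0, 441.27)
print(f"speed x{speed:.5f}, tuning {tuning:.2f}Hz")
```

In the first case the correction amounts to about five cents and adds under two seconds to a ten-minute recording, which shows how small these tuning offsets are in timing terms.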
The speed processing information for an audio file is saved as a separate '.capstan' file, and this is loaded along with the audio file of the same core name in the same file location to effect the required speed correction. However, as I mentioned earlier, an option introduced in Capstan v1.1.0 added a new file menu option called 'Apply speed curve to other files', and clicking on this opens a directory window to select the relevant files so that multitrack or multi-channel material can be corrected.
Overall, I found that Capstan was a delight to work with. Once it's up and running, the software is brilliantly simple and intuitive to use, and I found it astonishingly effective. The automatic Capstan correction decisions are very good, but in most cases a little fine-tuning will optimise the results, especially where there are silences between tracks, or where there are exposed solo instruments. This fine-tuning is fast and easy to do, and being able to compare the source and processed signals so easily makes it obvious when the processing is optimised.
There's no denying that Capstan looks expensive, but to my mind its price is fully justified by its complexity, ingenuity and sheer effectiveness, especially in the context of the scale of the relatively specialist audio restoration market. For restoration specialists, the cost can be amortised over many years of project work, making it a cost-effective solution. For those of us with a more casual interest in restoring an occasional archive recording, the outright cost is prohibitive, but, thankfully, Celemony have come up with a very attractive compromise: Capstan can be 'rented' for a five-day period (with a time-limited iLok licence) rather than bought outright, making it a very affordable and practical solution for one-off projects. I wish other software manufacturers would adopt the same idea!
There is also a downloadable demo; this won't allow processed material to be exported or saved, and playback stops after seven seconds, but it is enough to test Capstan's functionality and effectiveness. The Celemony web site also includes several very helpful tutorials and demonstration audio files. I was very pleased indeed with the improvement Capstan brought to my Chopin piano material, and the five-day rental option makes Capstan a realistically affordable solution for everyone with a serious audio restoration requirement. It is a very impressive tool and well worth exploring!
CEDAR Audio's Respeed algorithm, which is part of the CEDAR Cambridge suite of audio-restoration tools, provides very similar functionality and costs a similar amount, although it requires the Cambridge hardware platform to function, adding further outlay.
For copyright reasons, I can't make available the Chopin recording I used to test Capstan, but the program comes with several example files which do an excellent job of demonstrating its capabilities. To hear the 'before' and 'after' versions, surf to /sos/aug12/articles/capstanmedia.htm. Celemony also have a very good video on their web site showing Capstan in action: www.celemony.com/cms/index.php?id=capstan