Mixing audio can be difficult, but unmixing is impossible — unless, that is, you have access to Audionamix’s innovative ADX technology.
Founded in 2008 and now based in Paris and Los Angeles, Audionamix have built a global reputation for their audio separation and isolation software. The company’s proprietary ADX technology and its expertise in isolating speech, vocals and other monophonic melodic elements from a full mix have not only allowed engineers, producers and artists to craft new arrangements, but have also given television and film companies the ability to enhance older mono and stereo soundtracks, both by adding new music and sound effects to old dialogue and by creating 5.1 surround mixes.
In early 2014, Audionamix released the Mac-only ADX Trax, the world’s first non-destructive, automated melodic audio source separation software. By retaining the integrity of the original source, ADX Trax allowed repeated refinements of the initial separation to achieve the best possible results, and in late 2014 Trax Pro appeared, bringing with it non-destructive spectral editing. June 2016 saw the release of Trax 3 and Trax Pro 3, both of which offered faster separation processing speeds, a consonant annotation tool and a pan-specific editing feature that enabled direct editing of audio content in user-defined areas of the stereo field. In 2017 a new, speech-optimised algorithm that automatically detects and separates speech from background elements — musical or otherwise — was added to both Trax 3 and Trax Pro 3 to create what Audionamix say is the first fully featured, automated speech separation software: ADX Trax 3 SP and ADX Trax Pro 3 SP.
The Trax Pro 3 SP user interface, like that of Trax Pro 3, has three screens named Separate, Process and Spectral. The Separate screen handles the separation of the target speech or melodic source from the background elements of a stereo or mono audio file. Extraction target options include Speech and Melodic Vocals, Speech Only, Melodic Vocals Only or General, which extracts all melodic content in the file. Initially, the Trax Pro 3 SP software analyses the audio file and encodes specific information from it. It then uploads the result to Audionamix’s cloud-based servers, where an automatic separation is performed. The separated files are then downloaded and decoded to produce a Vocal track (the extracted target) and a Music track (everything else).
The Separate screen displays what Audionamix call a Pitchogram: a frequency-restricted (60Hz-1.1kHz) spectral view, optimised for displaying both speech and melodic vocals, which shows the material that has been automatically identified as the target content. A highlighted blue line called the Pitch Guide runs across the screen, marking the fundamental frequencies of the melodic content that the software has targeted for extraction. Although only the fundamental frequencies are marked, the software actually extracts all the harmonics of these frequencies that it is able to identify. In addition, if automatic marking of consonants was specified, the Pitchogram displays vertical green lines marking those areas where the ADX software has detected what it believes to be consonants. Also at this point, the Process screen carries a waveform display of the automatic extraction that can be switched between Vocal and Music, whilst the Spectral screen displays a spectrogram that can be similarly switched between the two extractions.
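To make the relationship between a marked fundamental and its harmonics concrete, here is a minimal Python sketch. It is purely illustrative and says nothing about how the proprietary ADX algorithm actually works: the function names and the 2kHz ceiling are my own assumptions, and only the 60Hz-1.1kHz display range is taken from the Pitchogram description above.

```python
# Illustrative only: hypothetical helpers, not part of any Audionamix API.

# Display range of the Pitchogram as quoted in the review (in Hz).
PITCHOGRAM_RANGE_HZ = (60.0, 1100.0)


def harmonic_frequencies(f0_hz, max_hz):
    """Return the fundamental and its integer multiples up to max_hz."""
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    count = int(max_hz // f0_hz)
    return [f0_hz * k for k in range(1, count + 1)]


def in_pitchogram(freq_hz):
    """True if a frequency falls inside the Pitchogram's display range."""
    low, high = PITCHOGRAM_RANGE_HZ
    return low <= freq_hz <= high


if __name__ == "__main__":
    # A 220Hz fundamental with harmonics up to an assumed 2kHz ceiling.
    partials = harmonic_frequencies(220.0, 2000.0)
    visible = [f for f in partials if in_pitchogram(f)]
    print("all partials:", partials)
    print("inside the Pitchogram range:", visible)
```

The point of the sketch is simply that a single marked pitch implies a whole series of partials, many of which lie above the range that the Pitchogram displays.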
Both the automatic Vocal and Music extractions can be auditioned on the Separate screen, and they can be soloed, muted and adjusted in level. For some uses, this first draft may be good enough, in which case you can simply proceed to the next step, which takes place in the Process screen. However, the ADX algorithms are not quite as discriminating as the human ear, and areas of the Pitch Guide may indicate erroneous identification of the target’s frequencies.
To improve the separations, Trax Pro 3 SP offers several ways of editing the automatically generated Pitch Guide. Perhaps the most useful of these is the Eraser tool, which allows you to remove areas of the Pitch Guide or consonant markers where there is no relevant content. The Pitch Guide can be drawn freehand using the Pencil tool, while the Pitch Magnet function can be used to identify and redraw the most likely line in a section of the Pitch Guide. The Consonants tool allows you to insert or move consonant markers where required. The Pitchogram can be zoomed in and out in both the horizontal and vertical dimensions, and a useful Guide Tone function allows you not only to hear the pitch of the line being drawn by the Pencil tool, but also to check a pitch in the track against a virtual, vertically oriented piano keyboard. In addition, if you’re trying to extract a melodic track, and you have or can create a MIDI file of the melody, you can load that as the Pitch Guide, provided that its length matches that of the file being processed.
Once the Pitch Guide has been edited, the result is uploaded to the Audionamix cloud servers and a ‘Refined’ automatic separation, based on the revised Pitch Guide, is returned. You can carry on refining this refined separation for as long as you need. A Marquee tool allows you to upload a portion of the Pitchogram, rather than the entire file, to the cloud for processing, speeding up upload, processing and download times.
Once you are satisfied with the revised separation, the action moves to the Process screen, where a new separation, based on the refined separation’s Pitch Guide, can be created. This process can be run in HQ (high quality), with or without boosted consonants, and can also include any short or long reverberation associated with the target. The process can also be run only on a user-defined section of the stereo field, which is useful if the target being extracted is panned off-centre. The resulting separation can then be further refined by the Post-Processing Drum Enhancement/Removal and Spatial Isolation tools. There are no parameter adjustments on the Drum tool, which reduces percussive elements in a separation when run on its Vocal tab and enhances them on its Music tab. The Spatial Isolation tool is just that: by adjusting Frequency, Ambience, Tonal, Noise and Pan range controls, you can isolate the spatial elements of the wanted sound, auditioning and A/B’ing your efforts in real time.
Running the Process and either of the Post-Processing functions creates a new separation file, and you’ll probably find yourself building up a number of these. Since you can only display four files at any one time on the Process screen, you may find it advantageous to start comping the best bits into one master comp track, which is a simple procedure in Trax Pro 3 SP.
With some source files, you may have produced completely usable Vocal and Music separations by this stage. However, for more complex separations where, for example, voices are ‘hiding’ in music beds, or too much of one track has been left in the other, you’ll want to move to the Spectral editing screen. Spectral editing is a skilled and somewhat time-consuming operation, but Trax Pro 3 SP’s implementation makes it probably as easy as it can be to produce good results quite quickly.
The key lies in the non-destructive nature of all ADX Trax editing, which means that whatever is removed from, say, the Vocal track is transferred to the Music track and vice versa. This, coupled with an undo history that can be up to 100 changes long, and the ability to select, audition and edit across either frequency or time or through a smart (and scalable) harmonic identification and selection tool, gives you the freedom to experiment in moving unwanted content from one track to the other. An Invert function helps in identifying frequencies and areas in a track that might usefully be moved to the other track, and three additional spectral editing tools give you the ability to work on unwanted noises.
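The principle that nothing is ever thrown away can be sketched in a few lines of Python. This is my own simplified model, not Audionamix’s implementation: I assume that each track is a small grid of spectral values (time frames by frequency bins) whose sum makes up the mix, and `move_region` and `mix_total` are hypothetical names.

```python
# Illustrative model only: tracks are lists of frames, each frame a list of
# per-bin values. Real spectral editors work on far richer data.

def move_region(source, target, frames, bins, amount=1.0):
    """Move a fraction of the selected cells from source to target in place.

    Whatever leaves one track arrives in the other, so the two tracks
    together still add up to the original mix.
    """
    for t in frames:
        for k in bins:
            moved = source[t][k] * amount
            source[t][k] -= moved
            target[t][k] += moved


def mix_total(track_a, track_b):
    """Sum of every cell in both tracks, used to check nothing was lost."""
    return sum(sum(frame) for frame in track_a) + sum(sum(frame) for frame in track_b)


if __name__ == "__main__":
    vocal = [[1.0, 2.0], [3.0, 4.0]]
    music = [[0.5, 0.5], [0.5, 0.5]]
    before = mix_total(vocal, music)

    # Move one cell from Vocal to Music, then hand half of it back.
    move_region(vocal, music, range(0, 1), range(1, 2))
    move_region(music, vocal, range(0, 1), range(1, 2), amount=0.5)

    print("mix preserved:", mix_total(vocal, music) == before)
```

Because an edit is just a transfer, taking too much is corrected by moving some of it back, rather than by undoing the edit and starting again.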
The first of these, the Tonal/Noise Filter, targets harmonic or noisy components for removal and could be used, for example, to remove the overhanging harmonics of a guitar chord from the sound of a singer’s breath by targeting the area and changing the balance of the filter from tonal towards noise. Doing this would protect the breath sound at the expense of the tonality in the harmonics, while the reverse would protect the harmonics at the expense of the breath. The Smart Attenuation tool uses spectral information to the right and left of a selection to attenuate the selection, so that a finger squeak from an acoustic guitar could potentially be removed without affecting the rest of the track. The third tool, Denoise, can learn the spectral profile of the unwanted sound and then remove that profile from a selected area. For example, if you had bleed on a vocal track from a loud guitar or the singer’s headphones, the tool could be used to sample the bleed from an area of the track without vocals and then remove that bleed across the whole track.
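The learn-then-remove idea behind a tool like Denoise can be illustrated with basic spectral subtraction. I should stress that this is a generic textbook technique, sketched under my own assumptions, and not a description of what Trax Pro 3 SP does internally; the function names and the `strength` parameter are hypothetical.

```python
# Generic spectral subtraction on magnitude frames (lists of per-bin values).
# A sketch of the learn/remove idea only, not the Trax Pro 3 SP algorithm.

def learn_profile(noise_frames):
    """Average magnitude per frequency bin over frames containing only noise."""
    if not noise_frames:
        raise ValueError("need at least one noise-only frame")
    n_frames = len(noise_frames)
    n_bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / n_frames for k in range(n_bins)]


def subtract_profile(frames, profile, strength=1.0):
    """Subtract the learned profile from every frame, clamping at zero."""
    return [
        [max(0.0, magnitude - strength * profile[k]) for k, magnitude in enumerate(frame)]
        for frame in frames
    ]


if __name__ == "__main__":
    # Two frames taken from a passage with bleed but no vocal.
    bleed_only = [[0.25, 0.0, 0.5], [0.75, 0.0, 0.5]]
    profile = learn_profile(bleed_only)

    # One frame containing vocal plus bleed.
    vocal_with_bleed = [[1.0, 2.0, 0.25]]
    print(subtract_profile(vocal_with_bleed, profile))
```

In this simple form, any bin quieter than the learned profile is clamped to zero, which is one reason why naive subtraction can sound artificial and why more sophisticated tools exist.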
Once you’re satisfied with what you’ve got, the last step on the Spectral screen is to render your track, at which point it appears in the Trax Pro 3 SP file list and can be accessed in the other two screens. The last remaining step is to Bounce your Vocal and Music tracks (and a mix and/or a STEMS-format file if that’s appropriate) out to disk via the Process screen’s ShuttleExport function.
Although Audionamix are currently emphasising the speech separation qualities of Trax Pro 3 SP, this aspect of its operation has simply been added to the program’s existing ability to separate melodic vocal and instrumental parts. With the proviso that the lead vocal or featured instrument is not buried too deeply in the accompaniment, the initial automatic separation of their melodic lines is extremely impressive, as is the software’s ability to keep the reverb with the melodic line when post-processing the refined separation after manually optimising the Pitch Guide.
Detailed spectral editing is a time-consuming process, but Trax Pro 3 SP’s non-destructive operation certainly saves time, if only because I didn’t have to worry about moving slightly too much of the frequency spectrum between the Vocal and Music tracks: I found it much quicker to return the excess than to undo the original edit and try to refine the selection while ‘flying blind’.
The speech separation was a revelation. I began with a typical restoration project: a copy of the original recording of Martin Luther King’s ‘I have a dream’ speech, marred by a 60Hz hum. Trax Pro 3 SP’s automatic separation produced a very usable Vocal track that required only minimal Pitch Guide editing to generate a very good refined separation. Post-processing using the Denoise tool and a modicum of spectral editing gave a result that sounded subjectively much better than simply removing the 60Hz hum component.
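For comparison, the ‘simple’ approach of removing only the 60Hz component usually means a narrow notch filter. The sketch below is my own illustration of that baseline, using the widely published Audio EQ Cookbook biquad notch formulas; the 8kHz sample rate, the Q of 30 and the function names are assumptions chosen for the example, and it has nothing to do with the processing inside Trax Pro 3 SP.

```python
import math

# Baseline hum removal with a biquad notch (Audio EQ Cookbook formulas).
# Sample rate, Q and names are assumptions made for this illustration.

def notch_coefficients(freq_hz, sample_rate, q=30.0):
    """Return normalised (b, a) coefficients for a notch at freq_hz."""
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def apply_biquad(samples, b, a):
    """Filter a sequence of samples with a direct-form I biquad."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine(freq_hz, sample_rate, n_samples):
    """Generate a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate) for n in range(n_samples)]


if __name__ == "__main__":
    sample_rate = 8000
    b, a = notch_coefficients(60.0, sample_rate)
    hum = apply_biquad(sine(60.0, sample_rate, 16000), b, a)
    tone = apply_biquad(sine(1000.0, sample_rate, 16000), b, a)
    # Compare levels once the filter has settled (last quarter of the signal).
    print("60Hz level after notch:", rms(hum[-4000:]))
    print("1kHz level after notch:", rms(tone[-4000:]))
```

A notch like this removes the hum’s fundamental but leaves any harmonics of the hum and the broadband noise untouched, which is consistent with the separated and denoised result sounding better to my ears.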
Never being one for an easy life, I next tried my luck on a track in which a deep male spoken voice was accompanied by a synthesized drone with prominent bass and low-mid elements. Although the automatic and refined separations extracted the higher-frequency components largely successfully, in places where the pitches of the drone and the spoken voice were tightly intermeshed it took a considerable amount of editing to get to a result that I felt relatively happy with. However, I hate to think how much longer it would have taken me without the ability to refine separations both continuously and non-destructively.
Working with Trax Pro 3 SP enabled me to create, with relative ease, high-quality separations and extractions that I felt were superior to those that I’ve managed to achieve with other software packages in the past. However, my efforts pale in comparison to the results that you’ll find in the Trax Pro 3 SP tutorials and the other videos on the Audionamix web site. I’d urge you to watch these to hear the quality of the results that practised and experienced editors can produce from both spoken and melodic sources.
If ‘unmixing’ of either musical or dialogue-based tracks is something you ever need to do, then you should undoubtedly audition ADX Trax Pro 3 SP; and although the perpetual licence is not cheap, the subscription options provide cost-effective ways both of evaluating the program and of accessing it as required on a project-by-project basis.
I don’t know of any programs with the same automatic and non-destructive capabilities as ADX Trax Pro 3 SP, but Magix’s SpectraLayers Pro 4 does enable you to manually separate speech, vocals and melodic lines from the other elements in a mix, and vice versa. For music-related requirements, the more affordable, non-SP version of Trax Pro 3 is the obvious alternative, with Celemony’s Melodyne offering some similar features.