Extract Vocals From A Stereo Mix

Tips & Techniques
Published June 2016
By John Walden

Have you ever wondered how to separate a vocal or the instrumental backing from a stereo mix? We find out what’s possible — and what isn’t.

Whether it be to sample parts for a remix, mashup or new composition, or simply to create a rough karaoke-style backing track, people often ask how they can remove or isolate vocals from stereo audio files such as you’d find on a CD. Most SOS readers understand that doing this anywhere near perfectly is impossible, but sometimes it’s possible to separate things to a useful extent. In these pages, I’ll explain the tools and techniques you can use to do this as well as is currently possible. Your chances of success depend on several factors, including: the nature of the source; what files and versions of the track are available; how much money you’re prepared to invest; how much time it’s worth spending; and what quality of result is acceptable. I’ll start with the simplest approaches, using tools that are free/bundled with your DAW, before demonstrating the worth of increasingly sophisticated and costly software. You’ll find audio files to accompany the various examples on the SOS web site (see box).

Centre Parting

The first approach gives us separate control over any elements panned to the centre (that’s where the main lead vocal sound typically sits, usually alongside elements such as the bass, kick, snare and hi-hat) and those panned off-centre. The aim is to remove the vocal, leaving behind a usable backing track. Underpinning it is the principle of phase-cancellation, as used by lots of free ‘vocal-removal’ plug-ins. You can do this entirely manually, by inverting the polarity of the left or right channel of your stereo file and summing the two channels to mono, but a more elegant approach is to convert the left-right (L-R) stereo file of your ‘source’ mix into Mid-Sides (M-S) stereo using a plug-in such as the freeware Brainworx bx_solo (www.brainworx-music.de/en/plug-ins/bx_solo). I’ll use that here.

If your DAW doesn’t include a Mid-Sides processor plug-in, investigate Brainworx’s freebie bx_solo. One instance is used on the stereo ‘source’ track, and another on a duplicate.

Import your stereo ‘source’ mix to a new track in your DAW, duplicate the track, and insert an instance of bx_solo on both the original and the duplicate. For one track, engage bx_solo’s ‘M solo’ (Mid solo) switch, so you only hear elements in the centre, and label that track ‘Mid’ for easy reference. For the other track, activate ‘S solo’ (Sides solo) and call it ‘Sides’.
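For readers who like to see the arithmetic behind the Mid and Sides tracks, here is a minimal Python sketch of the conventional M-S encoding. It assumes Mid = (L + R) / 2 and Sides = (L - R) / 2; the scaling is an assumption (a plug-in such as bx_solo may scale its outputs differently), and the sample values are invented purely for illustration.

```python
def ms_encode(left, right):
    """Split L-R stereo into Mid (common to both channels) and Sides (the difference).

    Assumed convention: M = (L + R) / 2, S = (L - R) / 2.
    """
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    sides = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, sides


def ms_decode(mid, sides):
    """Rebuild L-R stereo from M-S: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, sides)]
    right = [m - s for m, s in zip(mid, sides)]
    return left, right


# Hypothetical samples: a vocal panned dead centre and a guitar panned hard left.
vocal = [0.5, -0.25, 0.75, 0.0]
guitar = [0.25, 0.5, -0.5, 0.125]

left = [v + g for v, g in zip(vocal, guitar)]   # centre vocal + left-only guitar
right = list(vocal)                             # centre vocal only

mid, sides = ms_encode(left, right)
print("Mid:  ", mid)     # contains the vocal plus half of the guitar
print("Sides:", sides)   # contains only (half of) the guitar
```

In this toy example the centre-panned vocal is entirely absent from the Sides signal, which is why the vocal-reducing EQ only has to be applied to the Mid track.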

Next, solo the Mid track and apply an EQ cut spanning the main vocal. The exact frequencies depend on the nature of the vocal, but if in doubt start with a band centred around 2kHz, then experiment with the amount of cut and the Q (bandwidth). In the example, I ended up with a severe cut (-24dB) centred on 1.6kHz and a broadish (1.7) Q, to ‘scoop’ the vocal bits out of the Mid track. I also applied low-end and top-end boost to augment the kick, bass and hi-hats, as they’d been attenuated a bit along with the vocal. This pretty much removed the vocal but left a chunk of the bass and kick intact.
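To get a feel for how much of the spectrum a cut like this affects, the following Python sketch evaluates a generic peaking-EQ biquad at a few frequencies. It uses the widely published Audio EQ Cookbook peaking-filter formulas as a stand-in; that choice, the 44.1kHz sample rate and the function names are all assumptions, and the EQ plug-in used for the example may well have a different filter design and Q definition.

```python
import cmath
import math


def peaking_eq(f0, gain_db, q, fs=44100.0):
    """Biquad peaking-EQ coefficients (Audio EQ Cookbook form, assumed here)."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp]
    return b, a


def gain_db_at(freq, b, a, fs=44100.0):
    """Magnitude response of the biquad at `freq`, in dB."""
    z1 = cmath.exp(-1j * 2 * math.pi * freq / fs)   # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))


# The settings quoted in the text: -24dB at 1.6kHz with a Q of 1.7.
b, a = peaking_eq(f0=1600.0, gain_db=-24.0, q=1.7)
for f in (100, 800, 1600, 3200, 10000):
    print(f"{f:>6} Hz: {gain_db_at(f, b, a):6.1f} dB")
```

Running it shows the full cut at the centre frequency and progressively less attenuation either side, which gives a rough idea of which neighbouring sounds will be caught by the scoop.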

With the M-S method, the ‘mid-only’ track may require pretty detailed EQ’ing to strike the optimum balance between reducing the vocal level and retaining the character of other elements in the centre of the mix.

Having killed as much of the vocal in the Mid track as possible, play your Mid and Sides tracks together to hear your vocal-free(ish) mix. The Sides track gives you a sense of the stereo image from the original track, while the EQ’d Mid track gives you bottom-end solidity. Unfortunately, other central sounds with higher frequency content (such as the snare or hi-hats) suffer as much as the vocal from this kind of EQ cut. To counter that, you can experiment with more complex EQ curves, with modest EQ boosts on the Sides track or, perhaps, a dynamic EQ (good if the snare is louder than the vocal). The result will never be perfect, but you might improve it.

Using this approach for my ‘sparse’ and ‘full’ mixes (see the ‘Audio Examples’ box) yielded interesting, albeit imperfect, results. The EQ balance is certainly different from the original, but it does retain the essence of the stereo positioning in the original mix, and for a karaoke or rough instrumental version in a rush, it might be sufficient.

Going Solo

For the phase-cancellation approach to work, as well as finding that perfect vocal-free instrumental track in the first place, the full mix and vocal-free mix must be perfectly aligned.

If you attempt the M-S technique above, or listen to the examples, you’ll appreciate how dependent the results are on the source material. Even with ‘good’ material, a lot of effort is required to reach a usable result. Removing the backing to leave you with an isolated vocal is usually even harder, but that doesn’t mean you can’t try. There’s one very simple method (check the audio example; the result is easily good enough) but there’s a significant catch: you need access to both the full mix and an otherwise identical instrumental-only version. You align the two tracks and invert the polarity of one, leaving you with the difference or, in other words, the vocal. Whether such an instrumental mix is easily available depends on the track. Record labels sometimes release an ‘instrumental only’ mix (see the ‘Cheat If You Can!’ box). If one is available (and an a cappella isn’t!) it’s going to be your best bet. If you can’t find such a mix, it might be possible to copy a loop from a part of the track with no vocal to cancel an identical loop during the vocal part. Pop-style and electronic tunes are good candidates; it’s the same principle, it just takes a little more work.

Here’s how to do the cancellation. Having imported both versions into your DAW and placed the files onto separate tracks, you need to line them up accurately, to the sample level. You may need to zoom right in on the waveform and find an obvious transient (e.g. a drum hit) in both mixes that you can use to align things. Then, invert the polarity of one track (it doesn’t matter which). Material that’s identical in both tracks will be cancelled, so as long as things are perfectly aligned, only the vocal will remain. If things don’t cancel perfectly, the chances are that things aren’t quite aligned: go back and try again. If that still doesn’t work, the mixes probably aren’t quite identical (perhaps they were differently optimised, or just triggered slightly different reactions from any mix-bus processing). In that case, little can be done to improve things, but the results might still be decent.
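The align-then-invert procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signals are tiny made-up lists rather than real audio, the alignment is found by a brute-force cross-correlation (an automated stand-in for lining up a transient by eye), and the function names are hypothetical.

```python
import math


def best_shift(full, inst, max_shift):
    """Find how many samples `inst` must be delayed to line up with `full`.

    Brute-force cross-correlation: try every shift in the allowed range and
    keep the one where the two signals agree most strongly.
    """
    def correlation(shift):
        total = 0.0
        for i, sample in enumerate(full):
            j = i - shift
            if 0 <= j < len(inst):
                total += sample * inst[j]
        return total

    return max(range(-max_shift, max_shift + 1), key=correlation)


def cancel(full, inst, shift):
    """Delay `inst` by `shift` samples, invert its polarity and sum with `full`."""
    out = []
    for i, sample in enumerate(full):
        j = i - shift
        aligned = inst[j] if 0 <= j < len(inst) else 0.0
        out.append(sample - aligned)   # subtracting == inverting polarity, then summing
    return out


# Hypothetical test signals: a backing track made of a few sharp "drum hits"
# and a quiet sine wave standing in for the vocal.
n = 40
backing = [0.0] * n
backing[10], backing[11], backing[25] = 1.0, -0.6, 0.8
vocal = [0.1 * math.sin(2 * math.pi * i / 16) for i in range(n)]

full_mix = [b + v for b, v in zip(backing, vocal)]
instrumental = backing[3:]   # same backing, but this file starts 3 samples later

shift = best_shift(full_mix, instrumental, max_shift=8)
residual = cancel(full_mix, instrumental, shift)          # should be the vocal alone
off_by_one = cancel(full_mix, instrumental, shift - 1)    # deliberately misaligned
```

With the correct shift the residual matches the stand-in vocal, whereas the version that is out by a single sample still contains the drum hits at nearly full level, which illustrates why sample-accurate alignment matters.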

If you can’t find a backing track, you might wonder if you can use an instrumental created using the M-S method described earlier. I’m afraid you can’t! If you’ve followed the discussion thus far, the key problem should be obvious: that version resulted from EQ’ing, and so isn’t identical to the instrumental bed in the mixed track; phase-cancellation leaves too much behind.

Tooling Up

Thus far, I’ve considered low-tech solutions that are easy to attempt with everyday tools. If you’re more serious about vocal extraction, you’ll inevitably start looking for more capable, specialist software. Unfortunately, while there are lots of plug-ins that claim to offer vocal removal, most are based on the principles already discussed, and we’re not exactly inundated with capable products dedicated to this task. That said, some sophisticated efforts have started to appear in recent years, including Prosoniq’s Isolate and Roland’s R-Mix (both discontinued, as far as I can tell). Again, the results tended to be very dependent upon the nature of the source material, but at least you were given more sophisticated options than polarity inversion and M-S EQ.

The only current software I’m aware of that’s both dedicated to the task and does it with any degree of aplomb is Audionamix’s Trax Pro. But a few other products out there can be usefully pressed into service — in the next section, I’ll look at what Melodyne, spectral editors and Trax Pro have to offer.

Melodyne Editor

Melodyne’s polyphonic algorithm applied to the first few bars of Adele’s ‘Rolling In The Deep’. Can you spot the note blobs for the vocal melody?

We’ve reviewed Melodyne a number of times, most recently v4 in SOS February 2016 (http://sosm.ag/melodyne4-review). Melodyne Editor allows you to adjust the pitch (and timing) of individual notes in a polyphonic audio file. You could, for example, take a recording of a piano or guitar that contains chords and, by pitch-shifting selected notes, transform the progression from major to minor. This is the kind of task Celemony had in mind, and the results can be mindblowing.

But if it can identify individual notes and adjust their amplitude, might it help you identify and separate out a vocal melody? Providing you can identify which ‘blobs’ are dominated by the vocal (multiple instruments might play the same note simultaneously), there’s no reason it can’t be attempted, and Celemony clearly recognise this, because it’s demonstrated on their site in a tutorial video based on Adele’s ‘Rolling In The Deep’. This song was probably selected not only because it’s well known, but because it was a good candidate: the instrumental backing is fairly simple, with a single strummed guitar playing muted chords that are quite low in pitch, while the lead vocal is sung in the octave above. As a result, when you see the note blobs identified by Melodyne, it’s fairly easy to identify, and then isolate/mute, the two sources. The result is still impressive, though, and the isolated vocal is more than adequate for, say, a dance remix, where any artifacts can easily be masked by other elements and effects.

I attempted the same trick with my two example tracks (the ‘sparse’ and ‘full’ mixes). I’ll offer a few observations on the results in a minute, but first, let’s run through how to do it, using the plug-in version of Melodyne Editor (as that works with all DAWs). Having inserted an instance of Melodyne on your track containing the stereo mix audio clip, perform the usual Melodyne ‘transfer’ process, so that the plug-in can analyse the audio. If all goes well, Melodyne should automatically choose the Polyphonic algorithm. Once the analysis is complete, you should see a collection of note blobs.

Next, spend a little time looking for blobs that are obviously associated with the lead vocal melody. In the screenshots, from my sparse-mix example, some of the blobs belonging to the vocal melody are obvious: they’re louder, so the blobs are bigger, and their fundamental frequencies are in the approximate range C4 to G4. To refine the identification of this range further, click on some blobs using Melodyne’s Main tool to audition them. You’ll see quickly which blobs represent the main vocal audio, even if some contain other audio elements too.

In my ‘sparse’ mix Melodyne example, the vocal melody is predominantly displayed in the larger note blobs, in the C4-G4 range, but less easy to spot are the harmonics present in blobs above this range.

Having done your best to identify the pitch range of the vocal, select all blobs outside that range and delete them. But tread cautiously. Removing note blobs below the fundamental pitch of your vocal usually poses no problem, but the human voice generates harmonics above the fundamental frequency. You’ll almost certainly find elements of the vocal in some of the higher-pitched blobs; if you delete them you’ll lose valuable content, leaving the vocal sounding dull.
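The trade-off between a strict pitch-range cull and a more cautious one can be illustrated with a small Python sketch. Everything here is hypothetical: Melodyne does not expose its blobs as a list like this, the blob labels and pitches are invented, and the note names assume the convention in which C4 is middle C (MIDI note 60) and A4 is 440Hz.

```python
def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = MIDI 69 = 440Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


C4, G4 = 60, 67   # MIDI note numbers, assuming C4 = middle C

# Hypothetical blobs as (label, MIDI pitch); invented purely for illustration.
blobs = [("bass", 40), ("guitar", 52), ("vocal", 62), ("vocal", 65),
         ("vocal harmonic", 74), ("cymbal wash", 96)]


def split_by_range(blobs, low, high, headroom=0):
    """Keep blobs from `low` up to `high + headroom` semitones; flag the rest for deletion."""
    top = high + headroom
    keep = [b for b in blobs if low <= b[1] <= top]
    delete = [b for b in blobs if not (low <= b[1] <= top)]
    return keep, delete


strict_keep, strict_delete = split_by_range(blobs, C4, G4)
cautious_keep, cautious_delete = split_by_range(blobs, C4, G4, headroom=12)

print(f"C4 = {midi_to_hz(C4):.1f}Hz, G4 = {midi_to_hz(G4):.1f}Hz")
print("Strict cull keeps:  ", strict_keep)
print("Cautious cull keeps:", cautious_keep)
```

The strict cull throws away the blob an octave above the melody that carries vocal harmonics, while allowing an octave of headroom retains it at the cost of keeping more material to audition by ear.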

It’s a promising start, then, but not without problems. Having removed the obvious low- and high-pitch blobs, playback of what remains is likely to confirm the continued presence of non-vocal elements. Quite possibly, elements of the vocal will be missing too (perhaps starts and ends of words), or things will sound a little unnatural. It’s at this point that we stray from the familiar Melodyne workflow, and the whole thing starts to become more laborious!

In Melodyne’s Note Assignment mode, you can control the threshold that determines whether Melodyne makes a potential blob active or inactive. Note that I’ve zoomed in here for clarity; there are plenty more potential note blobs to consider out of shot.

When working in Polyphonic mode, if you select the Note Assignment mode (in v3 this had a +/- cursor icon but in v4 it’s a spanner), a number of additional note blobs will appear, including those you’ve just deleted (don’t worry, they have been deleted). There will also be lots of plain white blobs — these are notes the algorithm has extracted from the audio but set as ‘inactive’, because other blobs (the coloured ‘active’ ones) seemed to Melodyne more likely to accurately represent the audio.

You can alter the sensitivity of the algorithm in assigning notes as active or inactive: adjust the position of the ‘ball’ and the brackets around it, in the slider that appears beside the Note Assignment button when that button is active. If doing more conventional pitch/time correction, we might experiment with these controls to fine-tune the threshold between active and inactive notes, to get more or fewer active blobs. For vocal extraction, though, it’s easier simply to whack the ball to the right and make all the blobs active.

Hard Labour

Now comes the tedious bit. Switch back to the Main tool (the standard cursor button at the left end of the tool strip), loop playback, set to work identifying which blobs contain no vocal elements, and delete them. Start by removing all the newly activated low- and high-pitched blobs that are outside the likely range of the lead vocal. But for the rest of the notes, there’s no substitute for a painstaking manual process. If you’re unsure about a note’s contribution to the vocal, use the amplitude tool to reduce its volume instead of deleting it; you can come back to it easily if you need to revisit your decisions. Having removed all the non-vocal blobs you can identify, you’ll most likely still face a few issues: some parts of words (usually the starts and ends) may be missing or unclear; some of the remaining vocal blobs will contain traces of other sounds; and in some blobs the voice itself won’t sound natural.

Melodyne’s Note Separation options may allow you to rescue the starts or ends of particular words in your vocal, which are often missed by Melodyne due to their lack of pitch information.

To address the ‘missing’ bits of words, get busy with Melodyne’s Note Separation options. In the Main tool mode, you’ll see that some (not all) notes have two different types of thin vertical markers at their starts or ends, or placed in the middle of a note blob. The two types are distinguished by the presence/absence of small triangles at top and bottom. If you hover the cursor over any of these markers, it will change automatically to the Note Separation tool and you can drag the marker left/right. (Do this with the plain vertical lines and you’ll time-stretch the audio on either side of the marker.) To rescue the lost bits at the starts or ends of words, we really need to be moving the markers with the triangles. Rather than stretch portions of the notes you can already see, this will expose any extra bits of the blobs that Melodyne ‘decided’ not to use.

In Main tool mode, many of the note blobs don’t display these note start/end markers — once you’ve identified a problem note, zoom in on it and switch back to Note Assignment mode, and start/end markers will be displayed for all notes. You can then hover the cursor over the marker and drag left/right to expose more of the note.

The final remaining issue, for which there’s no reliable solution in Melodyne, is that you’re still left with traces of other sounds mixed in with your vocal, or parts of the vocal that have been robbed of some frequencies. As certain resonances/frequencies within any instrumental bed are likely to match and/or mask similar resonances/frequencies within the lead vocal, some of the note blobs contain a blend of sounds that simply cannot be separated further. You can, of course, switch back to Note Assignment mode, and re-activate note blobs in any areas where the vocal quality loss is worst, but there’s no guarantee you’ll strike a better balance.

If this whole process sounds like it might be quite a lot of work, well... it is. But what we’re attempting is something that’s akin to uncooking a curry to leave you with the original ingredients! When you apply the approach to real-world examples, the best results will come from mixes that are quite sparse (and the vocal therefore prominent) in the first place; a busy, sophisticated mix where the vocal fights to be heard and is treated with various effects will prove much more challenging. My two audio examples demonstrate this very clearly. The vocal extracted from the ‘sparse’ mix is far from pristine, but place it in an EDM backing track and process it to be ‘artificial sounding’ and you’d probably get away with it. The extracted vocal from the ‘full’ mix would be a much less useful starting point, whatever the genre; it might simply be a case of making the best of a bad job by sampling only the better vocal phrases from this ‘take’, rather than using the complete vocal.

Spectral Editing

Let’s see what we can do with a different set of tools: spectral editors. I’ve already mentioned that the human voice is made up not only of the fundamental note being sung, but also harmonics; while most of the energy might reside around the fundamental, there will also be significant energy at multiples of this base frequency. So, for example, if you sing a note at A4 (440Hz), there will be harmonics at 880Hz, 1320Hz, and so on. These harmonic relationships work pretty well for vowel sounds, but the pitch of consonants tends to be less well defined. (Hold on to that thought.)
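The harmonic arithmetic is easy to check for yourself. This short Python sketch lists the harmonics of a sung note up to a chosen ceiling; the function name and the 5kHz ceiling are arbitrary choices made for illustration, and harmonic 1 is taken to be the fundamental itself.

```python
def harmonic_series(fundamental_hz, limit_hz):
    """List every harmonic of `fundamental_hz` up to `limit_hz`.

    Harmonic 1 is the fundamental; harmonic n sits at n times its frequency.
    """
    series = []
    n = 1
    while fundamental_hz * n <= limit_hz:
        series.append(fundamental_hz * n)
        n += 1
    return series


a4 = harmonic_series(440, 5000)   # A4, with an arbitrary 5kHz ceiling
print(a4[:3])     # the fundamental and the two harmonics quoted in the text
print(len(a4))    # how many harmonics fit below the ceiling
```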

Spectral editors allow you to see what’s going on at different frequencies all the way along the timeline. These usually take the form of a plot that displays time and frequency on the X and Y axes respectively, and audio energy distribution in the form of colour. You look at this and use editing tools akin to those in conventional audio editors or graphics software (scissors, marquee tools and the like) to select and process your frequency/time selections. The technology was pioneered by CEDAR, but there are now several spectral-editing tools. iZotope’s RX5 is popular, and you’ll find similar functionality in some general-purpose audio editors, including Steinberg’s WaveLab. When preparing this article, I’d recently acquired Sony’s SpectraLayers 3, so I’ve used that here. Again, I’ve used the ‘sparse’ and ‘full’ mixes of ‘So Easy’ by Cristina Vane.

A section of my ‘sparse’ mix in SpectraLayers. The vocal melody and its various harmonics are clearly visible in the spectral display.

The ‘full’ mix in SpectraLayers: the vocal harmonics are still visible, but there’s a lot of other data in the display created by the fuller instrumentation, which potentially masks the vocals.

The two screenshots show a short section from both mixes once the audio has been imported into SpectraLayers. The display shows both a standard waveform and the spectral waveform display. The greater the energy, the more intense the colour. In the ‘sparse’ mix, with just a simple guitar/keyboards backing, the strong spectral energy of the vocal and its melodic phrasing can easily be distinguished. It’s centred around 300Hz (the fundamental frequency range). You can also see the same melody picked out in its various harmonics, with similar pitch variations but centred at 600, 900 and 1200 Hz, and so on. In fact, if you zoom in and adjust the intensity of the colour display, you can see 15-plus harmonics, going up into the 4-5 kHz range, getting progressively less intense. As you might expect, the ‘full’ mix spectral waveform is a lot busier: the same vocal melody is in there somewhere, along with all its harmonics, but it’s much harder to discern. Even harder is trying to identify those non-harmonic, ‘noise’ components of the vocal sound. This obviously makes it trickier to isolate the vocal.

Here, SpectraLayers’ Harmonics Selection tool has been used to select the harmonics of the vocal in the first two vocal phrases (they appear highlighted) but not yet in the third. The user can control just how many harmonics are searched for using the settings just above the spectral display.

While SpectraLayers has a range of editing tools, only a small number are useful for this task. One is particularly handy: the Harmonics Selection tool. This allows you (using the mouse) to trace a clear melodic line such as that provided by our lead vocal in the sparse mix. As if by magic, it will detect and select all the harmonics of that line within the spectral display. For the sparse mix, simply tracing over the fundamental frequency of the melody, which was mostly easily visible, allowed me to select what turned out to be the bulk of the vocal.

Each time you process a separation, another pair of tracks appears in your project, one containing the separated vocal, the other the instrumental backing track.

A number of configurable settings allow you to fine-tune this tool’s performance. First, it can operate in one of three modes: Replace Selection, Add To Selection and Subtract From Selection. Three buttons at the left end of the toolbar allow you to switch modes when the Harmonics Selection tool is in use. By switching between the latter two, you can gradually move through your audio file, refining your selection while you attempt to grab the vocal. Once you’ve made a selection, SpectraLayers will, by default, solo the selected material on playback, so it’s easy to check your progress.

If the vocal is clearly visible, as in the ‘sparse’ mix, this first pass can be done pretty quickly. The best approach, though, is to select more than you need. Better to select all the vocal, so it sounds full and there aren’t any frequencies obviously missing, than too little, even if you end up selecting non-vocal elements too; you can tidy things up later. Where the melodic structure of the vocal is much less obvious, as with the ‘full’ mix, it’s trickier. This proved particularly true in the area around the fundamental, at approximately 300Hz. But in the area from the second harmonic (600Hz) upwards, the melodic structure appears more clearly. Usefully, in the toolbar, you can tell the Harmonics Selection tool which harmonic you’re going to trace. If you set this to ‘3’, for instance, then trace your selection, the tool will find the two harmonic layers beneath your trace, the lowest being the fundamental. It will still reach upwards, to select the higher harmonics too. This won’t guarantee a clean selection in a more complex (frequency-rich) mix, but it certainly helps you make an initial selection.
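The idea of tracing one harmonic and letting the tool work out the rest can be sketched as follows. This is not SpectraLayers’ actual algorithm, which is not documented here; it is a simplified single-frame illustration in Python, with a hypothetical function name, an invented 50Hz analysis grid and an arbitrary 20Hz tolerance.

```python
def select_harmonic_bins(bin_freqs, traced_hz, traced_harmonic=1, tolerance_hz=20.0):
    """Return indices of frequency bins lying close to any harmonic of the voice.

    `traced_hz` is the frequency the user traced, and `traced_harmonic` says
    which harmonic that trace follows (1 = the fundamental).
    """
    fundamental = traced_hz / traced_harmonic
    selected = []
    for index, freq in enumerate(bin_freqs):
        nearest = max(1, round(freq / fundamental))   # nearest harmonic number
        if abs(freq - nearest * fundamental) <= tolerance_hz:
            selected.append(index)
    return selected


# Hypothetical analysis grid: one bin every 50Hz from 0 to 1500Hz.
bin_freqs = [50.0 * k for k in range(31)]

# Trace the third harmonic (900Hz) of a voice whose fundamental is 300Hz.
picked = select_harmonic_bins(bin_freqs, traced_hz=900.0, traced_harmonic=3)
picked_freqs = [bin_freqs[i] for i in picked]
print(picked_freqs)
```

Dividing the traced frequency by the harmonic number recovers the fundamental, so the selection reaches down to the two layers beneath the trace as well as up to the higher harmonics. A real tool would repeat this for every moment along a time-varying trace.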

Having made your selection, go to the Edit menu and choose Cut, then Paste To New. This will separate your selected material from the full mix and place it on a new ‘layer’. If you spent sufficient time refining your selection, it should contain the required vocal. A panel to the right of the main waveform/spectral display shows all the current layers in the project, rather like Photoshop. You can highlight which layer you want to work with, and can mute/solo layers. So, on playback, you can hear more than one layer at a time, and as the layers are automatically colour-coded you can establish which layer is contributing at which frequencies in the main display.
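Conceptually, Cut followed by Paste To New splits one set of spectral data into two complementary layers. The Python sketch below models that for a single frame of made-up magnitudes; the function name, the values and the hard on/off selection are all assumptions made for illustration, not a description of how SpectraLayers stores its layers.

```python
def cut_to_new_layer(spectrum, selected):
    """Move the selected bins to a new layer, leaving silence behind in the original."""
    chosen = set(selected)
    new_layer = [value if i in chosen else 0.0 for i, value in enumerate(spectrum)]
    remainder = [0.0 if i in chosen else value for i, value in enumerate(spectrum)]
    return new_layer, remainder


# Hypothetical magnitudes for six frequency bins in one moment of the mix.
frame = [0.1, 0.9, 0.2, 0.7, 0.05, 0.4]

vocal_layer, backing_layer = cut_to_new_layer(frame, selected=[1, 3, 5])
recombined = [v + b for v, b in zip(vocal_layer, backing_layer)]
print(vocal_layer)
print(backing_layer)
```

In this simplified model, playing both layers together reproduces the original exactly, and anything else that happened to share a selected bin goes along with the vocal.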

Your different layers can be superimposed on each other in the SpectraLayers display. Here, the main isolated vocal layer (blue) is superimposed on the vocal-free backing track (purple).

Having selected and moved your vocal to a new layer, you should have two layers: one containing a (mostly) vocal-free instrumental backing track, and a second containing a (mostly) isolated vocal. Unless you’ve been very lucky, the vocal will contain scraps of the instrumental backing and vice versa. But you can continue to refine and improve your result for as long as you can justify spending effort on it. The next step is to edit your isolated vocal layer to remove as many non-vocal elements as SpectraLayers’ toolset allows. These can either be erased or, if a vocal-free backing track is also of use to you, selected and copied to another layer. You then need to do a second pass of the backing track, selecting any remaining vocal elements and creating a further layer containing these. The result should then be four layers: two containing different elements of the backing track and two containing different elements of the isolated vocal. Mute the backing track layers and see just how isolated that isolated vocal actually is. Your OCD-afflicted inner perfectionist will dictate how many further cycles of editing are required!

As with the other approaches in this workshop, I’ve provided audio examples for both my ‘sparse’ and ‘full’ mix tracks. I did my first-pass extraction as described above. Then, in both cases, I spent quite some time tidying up. The results might not be perfect, but you should be able to hear that this approach holds more promise than anything else we’ve considered thus far. With time and effort, you can achieve something that’s going to be usable in the right musical context.

Audionamix ADX Trax Pro

ADX Trax Pro in action. The software can perform an automatic separation of your vocal from the backing track (the results are shown in the two waveform displays at the top of the screen), but you can modify the pitch curve it generates (shown in blue/green in the main Spectral view) manually if required to try and improve the separation (my edits are shown in red).

As far as I know, only one current spectral-editing tool is designed specifically for this task and nothing else: Audionamix’s ADX Trax Pro. The company also offer a lighter version, ADX Trax, and a VST/AU plug-in and iOS app based on similar technology. Only ADX Trax and ADX Trax Pro allow you to attempt a full vocal isolation; the others provide voice changing effects and vocal gain change, rather than full separation. A permanent ADX Trax Pro licence has a professional price tag ($499) but it’s also available on subscription, with a minimum term of one month, so it needn’t break the bank.

The typical workflow for vocal separation in ADX Trax Pro comprises three main steps: an automatic ‘first pass’ separation; use of some ‘cleaning’ tools, to improve the separation; and, if required, some manual spectral editing. The three stages are reflected in the tabbed screens that you can move between using the buttons at the top.

When you load your source audio file, ADX Trax Pro will attempt a first-pass separation automatically. It’s quite an interesting process; it involves the software identifying the vocal and creating a pitch curve for it — rather like Auto-Tune does when faced with a monophonic vocal track, though this is a much more complex task. It uses the pitch curve as a basis for deconstructing the track into two audio files, one for the vocal and another for the backing track.

This deconstruction can take some time, and not only because of the number-crunching: the full audio mix is encoded and sent over the Internet to Audionamix’s server system, where the separation processing is performed. The results are returned to your computer and then decoded to give you your two audio tracks. (Obviously, an Internet connection is necessary.)

There are a few tick boxes to explore for this automatic separation process, but when it comes down to it, you just let Trax Pro get on with it. Depending on your Internet speed, processing takes from a couple of minutes upwards. For my example files (each is about a minute in length), this took about three minutes, but I also tried a few full-length (about four-minute) mixes and these took rather longer. The Separate screen then shows the ‘music-only’ and ‘vocal-only’ waveforms at the top of the display, while the bulk of the screen shows the spectral view of the original file, with Trax Pro’s pitch curve superimposed upon it.

Refinements

If your source mix is fairly simple, or the vocal quite high in level relative to the backing material, these automatic results can be pretty impressive without any further user input. That wait for the processing to complete doesn’t seem quite so long when this happens! But if the mix is a busy one, or the original vocal is ‘buried’ in the track, ADX Trax Pro faces a stiffer challenge, and you’ll need to spend time on the second and third stages.

The first option is to edit the pitch curve manually. There are tools for this in the Separate screen, including a freehand draw tool, an eraser and a Pitch Magnet, akin to Photoshop’s Magnetic Lasso. The last is really useful: it attempts to identify and help you ‘find’ the pitch curve as you drag the mouse over the spectral display. Usefully, any edits you make to the pitch curve are shown in a different colour from the default. Once you’re content with your tweaks, the large Separate button will process the separation based on this refined curve. Another, hopefully better-separated, pair of tracks will then appear.

For some source material, the automatic process can perform a decent first-pass separation. While ADX Trax Pro cannot guarantee a perfect end result this way, it certainly reduces the time it takes to reach a point at which further editing will yield only marginal improvements in the separation.

The ADX Trax Pro Process screen includes a number of options you can explore to try and improve the first-pass vocal separation.

The Process screen offers further control, via four key options. The Processing dialogue allows you to refine how consonants and reverb are handled, and includes a HQ (high-quality) option for the separation process; this takes longer but can produce better results, so may be best used when you’re happy with the rest of your tweaks. The two ‘post-processing’ tools allow you to apply additional filters to remove traces of drum tracks and to remove things such as high/low frequencies, ambience and noise. These are all worth exploring but the results are very source dependent; there can be occasions when you find yourself going backwards!

The final Process screen option is a comping track. Once you’ve attempted several versions of the separation, you can display up to four of them and make selections from each to build a composite, as you might in a DAW. Being able to do this in ADX Trax Pro is really useful, as different approaches may produce better results in different song sections.
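To make the comping idea concrete, here is a minimal Python sketch of how a composite might be assembled from several separation attempts. It is purely illustrative: the function `build_comp`, the take names and the sample values are all hypothetical, and nothing here reflects how ADX Trax Pro works internally.

```python
# Hypothetical sketch: none of these names come from ADX Trax Pro.

def build_comp(takes, regions):
    """Assemble a composite from several equal-length separation attempts.

    takes   -- dict mapping a take name to a list of samples
    regions -- list of (start, end, take_name) tuples; end is exclusive,
               and the regions must tile the whole track in order
    """
    length = len(next(iter(takes.values())))
    if any(len(samples) != length for samples in takes.values()):
        raise ValueError("all takes must have the same length")

    comp = []
    position = 0
    for start, end, name in regions:
        if start != position or end <= start:
            raise ValueError("regions must be contiguous and in order")
        # Copy this section from whichever attempt sounded best here.
        comp.extend(takes[name][start:end])
        position = end
    if position != length:
        raise ValueError("regions must cover the whole track")
    return comp


# Three made-up separation attempts, six samples long.
takes = {
    "auto": [1, 1, 1, 1, 1, 1],
    "edited_curve": [2, 2, 2, 2, 2, 2],
    "hq": [3, 3, 3, 3, 3, 3],
}
comp = build_comp(takes, [(0, 2, "auto"), (2, 5, "hq"), (5, 6, "edited_curve")])
```

In this toy case the composite takes its opening from the automatic pass, its middle from the HQ pass and its ending from the pass with the hand-edited pitch curve.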

The Process screen’s post-processing options can also be applied to fine-tune the separation in various ways.

Once you’ve gone as far as the semi-automated processing allows, you can make further manual edits in the Spectral screen, an environment that’s not unlike SpectraLayers. This provides tools to adjust your view of the spectral data and select spectral data in various ways, including a Harmonic Selection option that’s similar in principle (if slightly different in operation) to that of SpectraLayers. The crucial difference is that ADX Trax Pro’s edits are completely non-destructive. The Spectral view allows you to toggle between the ‘vocal’ track and the ‘music’ track. (If you were to play back both tracks together, you’d hear the original mix unaltered.) What’s particularly clever is that, as you edit the spectral data in the vocal or music track, that data is automatically returned to the other track. So, if you cut some unwanted drum noise from the vocal, it’s returned to the music track, and both tracks are that step closer to perfection. As all the original spectral data is preserved, you can move back and forth between the tracks in this way, making whatever edits you think might help, safe in the knowledge that nothing will be lost; you can undo or refine the edits as many times as you wish.
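The principle of two complementary tracks can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Audionamix’s implementation: each track is a small grid of made-up spectral values (`track[frame][bin]`) that are treated as simply additive, whereas real spectral data also carries phase. The function names `move_region` and `recombine` are hypothetical.

```python
# Hypothetical model of a vocal/music pair whose sum is always the original mix.
# Assumption: values are simple additive quantities (real spectra carry phase).

def move_region(source, destination, frames, bins):
    """Cut a rectangular region from `source` and add it to `destination`."""
    for f in frames:
        for b in bins:
            destination[f][b] += source[f][b]
            source[f][b] = 0.0


def recombine(vocal, music):
    """Sum the two tracks bin by bin."""
    return [[v + m for v, m in zip(vf, mf)] for vf, mf in zip(vocal, music)]


# A made-up mix of two frames by three frequency bins.
mix = [[4.0, 2.0, 1.0], [3.0, 5.0, 2.0]]

# A first-pass 'vocal' estimate; the 'music' track is whatever is left over.
vocal = [[1.0, 2.0, 0.0], [0.0, 4.0, 2.0]]
music = [[m - v for m, v in zip(mf, vf)] for mf, vf in zip(mix, vocal)]

# Suppose frame 1, bin 2 of the vocal is really drum bleed: cut it from the
# vocal and it lands in the music track.
move_region(vocal, music, frames=range(1, 2), bins=range(2, 3))
```

Because every cut from one track is added to the other, `recombine(vocal, music)` still equals `mix` after any number of edits, which is the sense in which nothing is lost.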

Once you’ve generated a number of different vocal separations, you can create a composite track based on the best sections from each separation attempt.

As with SpectraLayers, really detailed spectral editing takes time, and there will be occasions where parts of an instrument and the vocal frequency occupy the same space in the spectral data; you simply cannot separate such elements completely. But the fact that ADX Trax Pro automatically moves edits between the vocal and music tracks at least allows a more streamlined workflow for this very specific task. Combine this with the fact the software also does a huge chunk of the initial separation work for you and you should see that you’re likely to arrive at the best possible end result, be that a satisfactory one or not, much more quickly.

Right On Trax

So, it scores highly on efficiency of process, but does ADX Trax Pro actually deliver usable results? My two examples proved interesting. As expected, the ‘sparse’ mix was an easier proposition than the ‘full’ one. While it would obviously depend on what use you had in mind, if it were to be used in a busy EDM-style remix, I suspect the first-pass separation would have got me about 95 percent of the way there without any input from me. It wasn’t ‘perfect’, but a modest amount of editing in the Spectral view cleaned up the most obvious gremlins. The only section that proved at all tricky was the final sung phrase where, in the original mix, the vocal level tails off as part of the performance. The keyboards clash with the vocal at this point, so some quite detailed editing was required. Even so, the quality of the end result for this section wasn’t quite as good.

The additional bass and drum elements in the ‘full’ mix posed greater challenges. The majority of the bass frequencies were dealt with by the automatic separation, and any additional tidying up was easily achieved using the rectangular selection tool in the Spectral page. While the drum post-processing option in the Process screen will certainly reduce any drum bleed on your vocal track, it’s still possible to ‘see’ drum spectral data in the isolated vocal’s spectral waveform. Drum hits appear as areas of higher spectral energy in a vertical pattern (they’re high in volume over a broad frequency range but short in duration). Providing you have the patience, this sort of thing can be edited in the Spectral screen.
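To illustrate why drum hits show up as vertical lines, the following Python sketch flags time frames in which most frequency bins are loud at once. It is an assumption-laden toy: the function `broadband_frames`, the threshold values and the tiny hand-made ‘spectrogram’ are all hypothetical, and this is not how ADX Trax Pro’s drum post-processing works.

```python
def broadband_frames(spectrogram, level, coverage=0.8):
    """Return indices of frames where at least `coverage` of the bins exceed `level`.

    spectrogram -- list of frames, each a list of magnitudes per frequency bin
    """
    flagged = []
    for index, frame in enumerate(spectrogram):
        loud_bins = sum(1 for magnitude in frame if magnitude > level)
        if loud_bins >= coverage * len(frame):
            flagged.append(index)
    return flagged


# Made-up data: four time frames by five frequency bins.
spectrogram = [
    [0.1, 0.9, 0.1, 0.8, 0.1],  # sustained vocal harmonics only
    [0.9, 0.9, 0.8, 0.9, 0.7],  # drum hit: energy across all bins
    [0.1, 0.9, 0.1, 0.8, 0.1],  # back to vocal harmonics
    [0.1, 0.9, 0.2, 0.8, 0.1],
]
hits = broadband_frames(spectrogram, level=0.5)
```

Here only the second frame is flagged: the vocal harmonics persist across frames but occupy just a couple of bins, while the drum hit fills nearly every bin for a single frame. A more careful version would also check that flagged frames are isolated in time.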

The Trax Pro spectral editor (shown here with the Harmonic Selection tool in use) automatically pastes material you cut from one track back into the other associated track: trim some piano away from the isolated vocal, and it will be returned to the instrumental track and vice versa, so the totality of the spectral data for the complete mix is always preserved between the two tracks.

In terms of the audio examples provided to accompany this article, for the ‘full’ mix’s isolated vocal, what you’re hearing are the results of the full suite of ADX Trax Pro’s toolset and perhaps an hour spent ‘cleaning’ in the Spectral screen. That said, I’m a relatively inexperienced user of the software. With practice, you soon begin to ‘see’ the different elements of a mix in the spectral display, so with regular use you’d almost certainly be able to achieve better results than this more quickly. (The same goes for SpectraLayers and other spectral editors.) So you can see what’s possible, the guys at Audionamix (who obviously have much more experience) kindly had a go at my mixes, and I’ve shared their results along with mine.

While the nature of the source material will always dictate the possible quality of the end result, there are some examples on the ADX Trax Pro website that are, frankly, staggeringly good, including a suite of vocal isolations that Audionamix did themselves for a Barry Manilow project. They were commissioned to isolate vocals from classic songs by artists including Louis Armstrong, Dusty Springfield, Marilyn Monroe and Judy Garland for Manilow’s My Dream Duets album in 2014. Whether you’re a fan of Manilow’s music or not, the results are very impressive. (I didn’t ask how long that took them!)

For my ‘full’ mix example, while the harmonics of the vocal melody are clear to see in the automatic separation performed by ADX Trax Pro, elements of the drums are also clearly visible in the form of vertical lines. These would require additional editing via the Process or Spectral screens to further clean the isolated vocal.

Incidentally, in exploring ADX Trax Pro, I applied the software not only to the Cristina Vane examples, but also to a number of contemporary commercial recordings. The process wasn’t always successful but I was surprised at just how far ADX Trax Pro can go. For example, when applied to Adele’s ‘Rolling In The Deep’ (as used by Celemony in their Melodyne video), the results could easily have been used in a dance/electro remix; there may have been a few artifacts, but nothing really obvious or distracting. In contrast, trying to isolate the vocal from the Foo Fighters’ ‘The Pretender’ proved far more challenging. Given the busier mix and the fact that the vocal in this track often fights against a wall of guitars and drums (which is part of what gives the track its sense of power), this is exactly what you’d expect. Regrettably, copyright restrictions prevent me from sharing the results of these experiments with you.

Mission Impossible?

As I stated at the outset, extracting the human voice from a stereo mix is not a trivial task, and the various audio examples are a sobering reminder of just how difficult it is to ‘unmix’ audio. The only results I might consider ‘perfect’ are those that employ phase-cancellation, and then only because the ‘sparse’ mix was deliberately created to illustrate the theory; you won’t find many commercial mixes where the mix engineer has been quite so accommodating! So you’ll never be able to, say, extract Adele’s vocal part and use it in any minimalist arrangement, be it acoustic or electronic; and you’ll not manage to get a ‘clean’ separation of a Dave Grohl vocal from a busy mix that could be used as the main element of a track. But if your needs are more experimental or recreational, you might not need to reach such heights of perfection.

If that’s simply not good enough for you, you’ll need to turn to the sort of tools I’ve looked at in the later parts of this article. I suspect many of you already have access to Melodyne, and perhaps some sort of spectral editor. So if you don’t plan on spending a wodge of cash, perhaps start your investigations there. But if you’re really serious about doing this as well as it can currently be done, you really must check out ADX Trax Pro: this is its sole reason for existence, and it offers an undeniably sophisticated, state-of-the-art toolset. That it can produce isolated vocals that can, in the right context, be repurposed in a fashion that leaves the casual listener blissfully unaware of what you’ve done, is remarkable. But, even with this state-of-the-art software, you’re still at the mercy of the original material; some mixes will make the task very difficult or, if you need ‘perfect’ results, impossible.

It’s not quite a ‘mission impossible’ but, if it is pristine isolated vocals you are after (rather than obviously synthetic vocals for an electronic mix), it is certainly a task to undertake with the right tools, patience and realistic expectations.

Cheat If You Can!

Type ‘[track], [artist], official a capella vocal’ or ‘[track], [artist], official instrumental version’ into a search engine and you may be able to avoid a lot of hassle: once you’ve sifted out the sound-alike karaoke sites, you’ll often find official a capella or instrumental versions of commercial tracks. You’ll also usually find ‘unofficial’ versions, the fruits of the labours of those who’ve trodden this path before you. And there are sites hosting a capella versions Hoovered up from around the Web: some house top-40-style material, while others (for example, www.acapellaheaven.com) offer indie artists’ vocals. There are catches, though: except for the legitimate versions, the quality of the recordings and performances is very variable; there may be a fee or requirement to watch adverts; and the terms of use may limit what you can do with the files.

A simple Internet search might save you unnecessary hard work, as there are lots of a capella versions out there already.

Audio Examples

You can find the audio examples on the SOS web site (http://sosm.ag/jun16media). All are based on an excerpt of Cristina Vane’s ‘So Easy’, which was the subject of Mike Senior’s SOS September 2013 Mix Rescue; thanks to Mike and Cristina for permitting its use here. You can find out more about Cristina’s music at www.facebook.com/cristinavanemusic and www.soundcloud.com/cristinavane.
