Audionamix are aiming to make the process of vocal extraction not only better, but quicker.
Limits on the power of digital signal processing sometimes have less to do with the processing itself, and more with our ability to use it. If you expose every parameter in a sophisticated system for user control, then getting the best results requires a level of expertise that few users will ever attain. Alternatively, machine learning can internalise some of that expertise, but risks creating a system that is hard to control because not even the developers know what's really going on under the hood.
Source separation is one of those fields where this can be a real issue. The ultimate goal is a system that can take in a mixed music track and spit out two or more clean stems comprising vocals, drums, guitars and so on. At present, though, that goal is most closely approached through extensive human hand-holding in complex packages like Audionamix's ADX TRAX Pro 3 SP, which allows the user to painstakingly guide the computer through the process of identifying and separating vocal parts.
Back in March 2018, I reviewed Audionamix's first attempt to repackage their technology in an immediate, preset-based environment. XTRAX Stems provided simple and entirely automated separation of mixed audio tracks into vocal, drums and 'music' stems. The initial version was undeniably impressive, and eminently usable for subtly rebalancing a mix where the vocal level wasn't quite right; but, inevitably, the results of a couple of mouse-clicks didn't really compare with what you'd get from several hours' work in the full-fat TRAX Pro, and the stems were rarely usable in isolation. And whereas XTRAX Stems faced little competition at launch, it now has to contend with the excellent DeMIX Pro from AudioSourceRE, which can operate at the preset level and also as a more advanced, controllable package somewhat along the lines of TRAX Pro.
But Audionamix aren't resting on their laurels, and they have quickly moved to update XTRAX Stems to version 2. It's still a stand-alone program, but is now available for Windows 7 and Windows 10 as well as Mac OS. The original featured two separation algorithms, Generic and Automatic, each available in a standard and an 'HQ' variant. XTRAX Stems 2 drops the standard variants of both, leaving only the 'high-quality' versions, and adds a third algorithm called Advanced. Most importantly, however, all three algorithms now offer a significantly greater degree of user control.
As before, the actual separation is handled by a remote server, so you need a fast-ish Internet connection to use XTRAX Stems 2. To initiate a separation, you simply drag and drop an audio file in one of the many supported formats. Unless you change this behaviour, XTRAX Stems will then immediately analyse the file using whichever algorithm you've set as the default. The separation process has been sped up significantly since version 1, but still takes a minute or two for a typical three-minute track.
With other source separation software I've tried, every new iteration of the separation settings requires another trip down Internet alley and an ensuing wait, meaning that the limiting factor on quality is often the patience of the user. XTRAX Stems 2 is different, because the stems it produces aren't static and immutable. Instead, they're accompanied by a three-way Separation Balance control that allows you to retrospectively alter the separation weighting.
This is easier to use than it is to describe. From the user's perspective, it takes the form of a triangle with a red ball inside it. Each corner of the triangle represents one of the three fundamental components of the separated track, and as before, these always combine perfectly to recreate the full mix, so long as you don't alter level or pan settings in the mixer. Dragging the red ball around within the triangle changes the relative weighting given to the three components: they still sum in the same way, but by moving the ball, you might for instance be telling XTRAX Stems that some of the energy it assigned to the Drums component should be re-categorised as Music.
As a practical example, if you mute the vocal stem and place the ball as far as you can into the Vocal corner, you'll hear something that sounds pretty much like the original track, but with the vocal somewhat lower in level. Drag the red ball towards the middle and XTRAX Stems will identify more and more energy within the track as belonging to the vocal component, so what you hear with the vocal stem muted contains less and less of the vocal, but also more obvious processing artifacts. By the time you've dragged the ball to the opposite side of the triangle, the soloed vocal stem contains not only all the vocals but a fair chunk of backing track too.
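Audionamix don't document how the Separation Balance control works internally, so the following is only a rough sketch of the general idea, not their method. It assumes the ball position can be read as barycentric weights over the triangle's three corners, and that each slice of the mix carries a hypothetical 'affinity' score for vocal, drums and music. The function names, the scores and the direction in which the ball shifts energy are all illustrative assumptions. The point it shows is that reweighting changes how energy is shared out, while the three stems still sum to the original mix.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def barycentric_weights(p: Point, a: Point, b: Point, c: Point) -> Tuple[float, float, float]:
    """Return the weights of point p relative to corners a, b, c (they sum to 1)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


def rebalance(
    mix: Sequence[float],
    scores: Sequence[Tuple[float, float, float]],
    weights: Tuple[float, float, float],
) -> Tuple[List[float], List[float], List[float]]:
    """Split each mix value into (vocal, drums, music) using weighted soft masks.

    `scores` holds hypothetical non-negative affinities per slice of the mix;
    a real separator would derive something like this from its analysis pass.
    """
    stems: Tuple[List[float], List[float], List[float]] = ([], [], [])
    for value, slice_scores in zip(mix, scores):
        weighted = [w * s for w, s in zip(weights, slice_scores)]
        total = sum(weighted)
        for stem, part in zip(stems, weighted):
            # If no component claims this slice, share it equally so the sum is preserved.
            share = part / total if total > 0 else 1.0 / 3.0
            stem.append(value * share)
    return stems


if __name__ == "__main__":
    # Assumed layout: vocal, drums and music corners of the on-screen triangle.
    corners = ((0.0, 0.0), (1.0, 0.0), (0.5, 1.0))
    mix = [1.0, 0.5, 2.0]
    scores = [(0.6, 0.3, 0.1), (0.2, 0.2, 0.6), (0.1, 0.8, 0.1)]

    ball = (0.1, 0.05)  # ball dragged close to the first corner
    vocal, drums, music = rebalance(mix, scores, barycentric_weights(ball, *corners))
    for i, value in enumerate(mix):
        print(value, vocal[i] + drums[i] + music[i])
```

Because the weights only change how each slice is divided, no second analysis pass is needed to try a new ball position, which is consistent with the immediate response described above.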
The key, of course, is to find the sweet spot for your chosen end goal, be that stripping out an a cappella lead vocal for use in remixing, vocal removal for karaoke, or whatever. To this end, Audionamix have included a short list of presets in a drop-down menu that recall combinations of red ball position and track mute/solo settings. I didn't find these terribly useful, but the triangular control itself is so easy to use that this really doesn't matter; and the beauty of it is that you hear the results immediately. In fact, it's rather a shame there's no parameter automation, as this would make it easy to alter the separation balance to suit different song sections.
In use, the Advanced algorithm often does seem to deliver somewhat better results than the other two, but it's not always a radical improvement. However, the additional possibilities opened up by the Separation Balance control really do make a big difference to the potential of XTRAX Stems. This control doesn't allow you to address all the possible ways in which separations can fail; for instance, there's still no way of telling XTRAX Stems that it has misidentified a solo lead instrument as a vocal part, which sometimes happens. Nor does it elevate the best possible results that you can get with XTRAX Stems to the level that can be achieved using detailed manual control in DeMIX Pro or TRAX Pro 3 SP, but I suspect that's not really the goal.
In source separation, there will always be a trade-off whereby a greater degree of separation comes at a higher cost in terms of artifacts. What the Separation Balance control offers is a very effective way of optimising this trade-off for your chosen use case, without requiring endless re-runs of the separation process. So, for example, XTRAX Stems is never able to cleanly remove every last vestige of a lead vocal from a mixed track and leave behind a pristine karaoke backing track; but by judicious use of the Separation Balance control and the faders, you may well be able to create a clean-sounding backing track where the vocal is reduced in level enough to allow another vocal that matches its timing to be overdubbed on top. As long as you keep your expectations within the bounds of the realistic, the Separation Balance control allows you to get usable results from XTRAX Stems 2 in many contexts where version 1 would have struggled. And although it remains basically a preset-based approach, there will be quite a few situations where the speed with which you can get these results outweighs the refinement offered by more labour-intensive alternatives.
Pros
- Remains very easy to use.
- Advanced algorithm is a worthwhile improvement over the other options.
- New Separation Balance control allows the separation to be optimised in real time, with no need to repeat the analysis process.
Cons
- Remains an essentially preset-based program, so sooner or later you'll run up against its limitations!
Summary
XTRAX Stems 2 makes preset-based source separation significantly more versatile without compromising its speed or ease of use.