Cubase Pro 10's Audio Alignment Panel can do more than just deliver tight backing vocals!
The Audio Alignment Panel (AAP) was one of Cubase 10's most eye-catching new features. It allows you to align the timing of two (or more) audio clips almost instantly, by automating Cubase's AudioWarp and Hitpoints/Slices facilities. What was previously a time-consuming manual chore now takes just a few clicks. It holds obvious appeal for working with double-tracked lead vocals and stacked backing vocals, but it can also be useful in some other scenarios, as I'll show. Let's start with the obvious...
On The Double
Screen 1 (red waveforms), above, shows a section of a double-tracked lead vocal. The waveforms (and the audio examples below) show that the singer actually did a pretty good job, but there was scope for improvement, and the AAP allowed us to tighten things nicely.
Open the AAP (from the Project window toolbar or via the Audio menu), select the audio clip that will act as your Reference (timing master) in the Project window, and click the '+' button for the Reference entry in the AAP. The track's name should appear. Repeat this for the Target clip (the audio whose timing you wish to adjust to match the Reference). In this example, that's our second lead vocal take.
For double-tracked vocals, the Match Words option is best; the AAP will use AudioWarp to stretch/compress the Target to more closely match its timing to that of the Reference. You can also set the Alignment Precision, which is similar in concept to quantising (higher values produce a more precise timing match). I chose 100 percent, which could be extreme for some styles, but if you take things too far you can simply undo the processing once applied. Experimenting with the Alignment Precision in this way makes it super-easy to find the setting that sounds the most musically appealing. And that's it. It takes just a few clicks and a few seconds. Typically, no further manual tweaking will be required, but you can move the AudioWarp markers by hand as normal if you feel the need to tweak any little details.
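If it helps to picture what Alignment Precision is doing, the quantise analogy can be sketched in a few lines of Python. This is purely a conceptual model, not Steinberg's actual algorithm: the function name apply_precision and the marker times are invented for illustration, and the sketch simply assumes that each Target marker moves a given percentage of the way towards its matching Reference position.

```python
def apply_precision(target_times, reference_times, precision_percent):
    """Move each target marker part of the way towards its reference position.

    Hypothetical helper: a model of the 'quantise strength' idea only,
    not how Cubase computes its AudioWarp markers.
    """
    if len(target_times) != len(reference_times):
        raise ValueError("expected one reference position per target marker")
    strength = precision_percent / 100.0
    return [t + strength * (r - t) for t, r in zip(target_times, reference_times)]


# Made-up marker positions in seconds.
reference = [0.00, 0.50, 1.00, 1.50]
target = [0.04, 0.46, 1.06, 1.50]

for pct in (0, 80, 100):
    moved = apply_precision(target, reference, pct)
    print(pct, [round(x, 3) for x in moved])
```

In this model, 0 percent leaves the Target untouched, 100 percent lands every marker on the Reference, and 80 percent leaves a fifth of each original offset in place, which is one way to think about a tighter result that still sounds natural.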
Well Stacked
The same principles apply when dealing with more parts, as with stacked backing vocals. Screen 2 (blue) shows five vocal takes and, as before, I've identified my Reference take. But this time, I've selected and added multiple (four) Target clips. If you do this and then click Align Audio, all targets will be time-corrected relative to the Reference clip in a single operation. Again, experiment with the Alignment Precision — in this case, I opted for 80 percent, which struck a nice balance between tightness and a natural sound.
In my experience, with a reasonable vocal performance as a starting point, the AAP produces a very good result nine times out of 10. If you find that you do need to go in and manually tweak an AudioWarp marker or two, don't get frustrated: just try to remember how much time you've saved overall!
Read My Lips
Another application for the AAP is Automatic Dialogue Replacement (ADR), a common practice in film/TV post-production. The idea is that less-than-perfect on-set dialogue is overdubbed and replaced in a studio, but with timing correction applied so that lip-sync with the picture is maintained. There are specialist software tools for this task (notably Synchro Arts' Revoice Pro 4, reviewed in SOS April 2019), but if you can't justify the additional outlay, is the Audio Alignment Panel worth exploring?
Screen 3 (purple) shows a typical case, with a section of on-set spoken dialogue (the top-most track, recorded with a boom mic and with background noise) and a version of the same dialogue recorded in more ideal studio conditions, to be used as a replacement (the middle track; no correction yet applied). As indicated by their respective waveforms, the replacement dialogue is not a disaster, but still there's scope to tighten the timing.
In the AAP, the on-set dialogue acts as our Reference and the studio version our Target. I experimented with different Alignment Precision settings and also with engaging the Prefer Time Shifting setting. The latter is well worth exploring if your voiceover artist is particularly good to start with, as it does less time stretching/compression and more slicing and moving — so, in theory, it ought to sound cleaner. In this case, despite my less-than-average voiceover skills, the default AAP settings did the best job.
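To make the distinction between shifting and stretching concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than a description of what Prefer Time Shifting does internally: the Segment class, both helper functions and the phrase timings are hypothetical, and each phrase is treated as a single slice with a start and end time.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Segment:
    """Hypothetical phrase or slice, with times in seconds."""
    start: float
    end: float

    @property
    def length(self) -> float:
        return self.end - self.start


def align_by_shift(target: Segment, reference: Segment) -> Segment:
    """Slide the slice so its start matches the reference; its length is untouched."""
    return Segment(reference.start, reference.start + target.length)


def align_by_stretch(target: Segment, reference: Segment) -> Tuple[Segment, float]:
    """Match both start and end; also return the stretch ratio this requires."""
    ratio = reference.length / target.length
    return Segment(reference.start, reference.end), ratio


# Made-up timings: the studio phrase starts late and runs slightly long.
ref = Segment(1.00, 1.80)
tgt = Segment(1.10, 1.95)

shifted = align_by_shift(tgt, ref)
stretched, ratio = align_by_stretch(tgt, ref)
print("shift only:", shifted, "length kept at", round(shifted.length, 3))
print("stretch:", stretched, "ratio", round(ratio, 3))
```

Shifting leaves the audio inside each slice unprocessed, which is why it ought to sound cleaner, but it can only fix where a phrase starts. Stretching fixes the start and the end at the cost of resampling the audio in time.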
One other option worth trying is to split the Target audio into separate clips for each distinct phrase and then process these independently. While this takes more time — and in this case it didn't really seem to bring any great benefits — it could be a useful approach when faced with just a few stubborn phrases.
The bottom-most track shows the AAP-corrected dialogue. The shifts in the waveform transients, while modest, are easily seen and heard; on the whole, the result is a much tighter match to the timing of the original dialogue. Would it satisfy the more exacting demands of a sound editor on a Hollywood film? Perhaps not. But for most of us it's certainly a huge time-saver, even if we find ourselves making the occasional manual edit.
Guitar's The Star
Vocals aren't the only source that's commonly double-tracked. Guitars are another popular candidate; in fact, in some genres (metal springs to mind) they can often be quadruple-tracked. A really good player can pull this feat off without breaking sweat, but we mere mortals might need a little help to get the same super-tight result. In principle, it's no different from tightening vocals, but if the part is busy, or the sound overdriven, or if the player uses techniques such as palm muting or slides, it's not difficult to imagine Cubase struggling to determine where the AudioWarp markers should sit. So, is the AAP a viable option here?
In Screen 4 (green) there are again three waveforms. Top and middle are the original, unprocessed, double-tracked guitar parts (a grungy riff followed by a short chord sequence) and, as the marker positions indicate, the performances were not a million miles apart but the note/chord transient positions are consistently different. The bottom track shows the result of applying the AAP to the middle track, using the top one as a Reference. While some transients don't line up perfectly with the Reference, the match is certainly closer — and if you check out the audio examples, you'll hear that the riff section in particular feels much tighter.
I tried the same approach with a range of other guitar parts, and the results were generally pretty good. Yes, you can defeat the AAP with parts that are rhythmically very complex or with sounds slathered in effects, both of which understandably make automatic identification and matching of transients more challenging. And if you usually DI your guitar parts, it's perhaps worth using the clean DI signal for alignment, before adding distortion or amp-sim effects. But given that the process takes only a few seconds, it's well worth seeing if the results are usable or not.
So, for vocals, this new facility is a huge time-saver that delivers great results. And even for non-vocal applications the Audio Alignment Panel can be well worth a try.
Audio Examples
Audio Example 01
Double-tracked vocal example. Two passes of the same double-tracked vocal performance are presented in this clip. In the first pass, the two tracks are unprocessed and panned left/right. In the second pass, the Audio Alignment Panel settings shown in the screenshot in the main article have been applied. The result is a much tighter match between the two tracks.
Audio Example 02
Stacked backing vocals example. Two passes of the same multi-tracked backing vocal performance are presented in this clip. In the first pass, five takes are included, panned at various points across the stereo field (note that the performance contains two phrases with similar wording). In the second pass, one track has been used as a Reference and the other four as Targets within the Audio Alignment Panel. The settings used are shown in the screenshot within the main article. While there are still a couple of spots where you can hear timing differences (and these could be manually edited within the Sample Editor if required), the result is a much tighter match between the various tracks while still maintaining a natural sound.
Audio Example 03
Automatic dialogue replacement (ADR) example. Three passes of the same spoken phrase are presented. The first contains the 'on-set' audio with the voice and a certain amount of ambient background noise picked up from the location by the boom mic. The second pass contains both the same on-set dialogue and the replacement dialogue recorded in a studio situation (panned left/right) but without any other processing. There are some obvious timing differences between the two performances. In the third pass, the studio dialogue has been timing-matched to the on-set dialogue by using the Audio Alignment Panel. This results in a much tighter timing match between the two performances. Further manual editing of the timing might be required in a mission-critical work context, but the AAP has done much of the work. The studio dialogue could then be subjected to suitable processing (EQ/reverb, etc.) to 'place' it in the location of the on-screen action.
Audio Example 04
Double-tracked guitars example. Three passes are presented of the same double-tracked guitar performance with the guitars panned left/right. The first pass contains the two performances as recorded. In the second pass, the Audio Alignment Panel has been used with the settings shown in the article screenshot. This produces tighter timing between the two performances, which is perhaps most noticeable in the riff section in the first part of the performance. The third pass is presented simply as an experiment. It pairs the unprocessed Target track with the processed version of the same track (the track used to reference the processing is not included). This produces an interesting result with (unsurprisingly) very tight tracking between the two performances. You might need to listen carefully for phasing issues, or for spots where the two versions are exactly time-locked (no time shifting has been done) and the stereo image folds into the centre (as seems to happen on the last chord), but this might offer an interesting alternative means of generating double-tracked parts.
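The point about the stereo image folding into the centre can be checked with a little arithmetic. The Python sketch below uses a handful of made-up sample values and a hypothetical side_energy helper: it measures how much 'side' (left minus right) content a hard-panned pair contains. It is a toy illustration of the principle, not an analysis of the actual audio example.

```python
def side_energy(left, right):
    """Sum of squared side samples, where side = (left - right) / 2.

    Hypothetical helper: zero means the two channels are identical,
    so the hard-panned pair is heard as a single central source.
    """
    return sum(((l - r) / 2.0) ** 2 for l, r in zip(left, right))


# Made-up sample values standing in for a short stretch of guitar.
take = [0.0, 0.5, -0.3, 0.8, -0.6]
identical = list(take)       # processed copy where no time shift was applied
nudged = [0.0] + take[:-1]   # same audio, delayed by one sample

print("identical copies:", side_energy(take, identical))
print("slightly shifted:", side_energy(take, nudged))
```

Wherever the processed copy ends up sample-for-sample identical to the original, there is no side signal at all and the part collapses to mono. Any timing difference, however small, restores some width, although very short offsets are also where the phasing mentioned above tends to creep in.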