Last month, we showed how to comp a great vocal performance. Now we look at how you can use editing tools to improve both timing and tuning.
In last month’s mag, you’ll find an article (https://sosm.ag/dec21-vocal-comping) in which I offered some tips about how to comp a great vocal performance from multiple takes. Once you’ve done that, though, there may well be more editing work to do. There’s been a lot of hand‑wringing over the years about whether vocal pitch correction is ‘a good thing’, but it’s so widely used in mainstream music styles that the listening public have for the most part come to expect recorded singers to sound superhuman in this respect. And while it gets less press, vocal timing is also comprehensively manipulated in most chart productions, because lead parts are often faded up in the mix to the point where they significantly impact on the song’s overall groove.
As such, there’s little avoiding corrective editing if you’re trying to put out commercial‑grade song productions. But achieving decent‑sounding results can be far from straightforward, because it’s easy to suck the life out of a vocal performance during the process. So in this article I’d like to provide some tips for tightening your vocal performances effectively without sacrificing their naturalness and humanity.
You can make your life a lot easier just by preparing properly for the recording stage. After all, singers respond to what they’re hearing, which means any inherent backing‑track problem can make it tough for them to bring their ‘A’ game. So make an effort to complete corrective edits elsewhere in the production before the singer steps up to the mic, and try to put together a sensible mix balance too. If the drums and percussion are too loud, for instance, it can be difficult to accurately judge pitch; too quiet and the rhythmic accuracy may suffer.
Some headphones don’t have great bass fidelity, which can cause sub‑heavy bass or kick‑drum timbres to all but vanish from the foldback mix, thereby robbing the performer of vital pitching/timing reference points. Monitoring latency, however small, can also mess with a singer’s pitching, which is why I usually prefer to avoid software monitoring if I can — either by setting up analogue monitoring or by asking the singer to slide one side of their headphones off so they can hear themselves acoustically. Some singers swear that adding vocal reverb to their headphones helps them pitch more accurately but I’ve never found that personally, so your mileage may vary!
Minimising the capture of room reflections and background noise on your vocal mic can be another wise move. Pitch/time‑processing algorithms find noisy and reverberant signals more challenging to analyse and process, and this basically translates into a greater likelihood that you’ll end up with unpleasant processing artefacts. And one further recording tip for rhythmic songs: encourage the singer to move a little with the song’s beat while they perform, since that can do wonders for the rhythmic accuracy.
There’s a plethora of great pitch‑correction software these days but, whatever you’re using, and assuming you’re not using it as a deliberate effect, the main principle to follow is to keep the processing to a minimum. The simpler and more targeted your pitch correction, the fewer unmusical side effects it’ll produce. With this in mind, it makes sense to avoid set‑and‑forget automatic tuning‑correction processes, because they process pretty much everything to some extent — irrespective of whether the music needs it or how much the processing is damaging the vocal sound. The fact is that most top‑tier professionals still prefer to do lead‑vocal editing work manually, so they can be guided at all times by their own ears.
I normally suggest working through the performance methodically, phrase by phrase, but then, within each phrase, correcting the most problematic notes first. What you’ll discover by working this way is that many notes aren’t actually that important to the impression that the singer is fundamentally ‘in tune’, and as long as you get the most important ones sorted out, any other pitching vagaries just add emotional colour to the performance. Trying to nail every little syllable to the pitch grid, on the other hand, is a great way to kill a performance stone dead.
In a similar vein, try implementing your desired tuning via simple pitch offsets to whole words or syllables, rather than trying to iron out all the tiny variations within each syllable. Now, I realise that you sometimes encounter longer syllables during which the pitch wanders undesirably, and that some tuning software offers a pitch‑variation slider that seems to offer a quick fix for this. It’s a slippery slope, though, because complacency with that slider can quickly lead to ‘robotitis’: a mechanical, synth‑like vocal tone that’s totally devoid of character. Much better to slice longer syllables into smaller sections for offset‑based tuning.
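To make the difference between those two approaches concrete, here is a minimal Python sketch. It is not how any particular tuning product works internally: the function names, the A4 = 440Hz reference and the example pitch values are all assumptions made purely for illustration. It compares moving a whole syllable by one constant offset with flattening every moment of the syllable onto the target note.

```python
import math
from statistics import median

A4_HZ = 440.0  # assumed tuning reference, not taken from the article


def hz_to_note(freq_hz):
    """Convert a frequency to a fractional MIDI-style note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)


def note_to_hz(note):
    """Convert a fractional MIDI-style note number back to a frequency."""
    return A4_HZ * 2.0 ** ((note - 69.0) / 12.0)


def offset_correct(contour_hz):
    """Shift a whole syllable by one constant amount so that its median
    pitch lands on the nearest semitone. The internal pitch movement is
    left untouched. Returns (corrected_contour_hz, offset_in_cents)."""
    notes = [hz_to_note(f) for f in contour_hz]
    centre = median(notes)
    offset = round(centre) - centre  # in semitones
    corrected = [note_to_hz(n + offset) for n in notes]
    return corrected, offset * 100.0


def flatten(contour_hz):
    """Force every point of the syllable onto the nearest semitone,
    which removes all of the singer's natural pitch movement."""
    notes = [hz_to_note(f) for f in contour_hz]
    target = round(median(notes))
    return [note_to_hz(target)] * len(contour_hz)


def spread_cents(contour_hz):
    """Distance between the highest and lowest pitch in the syllable."""
    notes = [hz_to_note(f) for f in contour_hz]
    return (max(notes) - min(notes)) * 100.0


# Hypothetical syllable: centred 30 cents below A4, with a little wobble.
syllable = [note_to_hz(68.7 + d) for d in (-0.10, 0.0, 0.05, 0.0, 0.10)]
tuned, applied_cents = offset_correct(syllable)
```

In this made-up example the syllable is moved up by about 30 cents, and the 20 cents of natural movement inside it survives intact. The `flatten` version lands on the same note but has no movement left at all, which is the numerical equivalent of the ‘robotitis’ described above.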
With all that in mind, here are a few other specific tuning issues to watch out for:
- Noisy components within vocal recordings (things like consonants and breaths) can cause problems for pitch‑shifting algorithms in general, so you should avoid processing those if possible.
- If you notice that your pitch‑correction software has misidentified the pitch of a note, take the time to remedy that manually. While this won’t likely affect the pitch‑shifting of that specific note much, it may well create smoother pitch transitions to and from adjacent notes.
- While most modern tuning correctors can tweak a note’s pitch without altering the frequencies of its vowel resonances (called formants), I’ve found that pitch‑shifts of more than a full tone can still make the vocal formants feel out of line. So don’t be afraid to tweak the formant settings manually on those occasions.
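If you want to know how far a given correction actually moves a note, the interval between two frequencies can be expressed in cents, where a semitone is 100 cents and a full tone is 200. The short Python sketch below uses hypothetical helper names, and treats the ‘more than a full tone’ rule of thumb from the last bullet point as an adjustable threshold rather than a hard limit.

```python
import math

FULL_TONE_CENTS = 200.0  # two semitones


def shift_in_cents(detected_hz, target_hz):
    """Signed size of the pitch shift needed to move a note from its
    detected pitch to its target pitch (positive means upwards)."""
    return 1200.0 * math.log2(target_hz / detected_hz)


def worth_checking_formants(detected_hz, target_hz,
                            threshold_cents=FULL_TONE_CENTS):
    """Flag shifts large enough that the formants may deserve a listen.
    The threshold is a rule of thumb, not a property of any software."""
    return abs(shift_in_cents(detected_hz, target_hz)) > threshold_cents
```

A semitone nudge from 440Hz to roughly 466Hz would not be flagged here, whereas a shift of a minor third from 440Hz to roughly 523Hz would be. Either way, the flag is only a prompt to listen more closely.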
A Stitch In Time
Many of the same software tools that offer tuning correction also provide audio‑stretching facilities that can be used for timing correction. Again, though, I’d warn against using any kind of bulk ‘quantise’ function for lead vocals, because that’s never generated musical results for me. To be honest, I actually avoid time‑stretching lead singers altogether wherever possible, because I’ve never liked how that kind of processing affects the vocal timbre and reduces the sense of ‘air’. I far prefer to use traditional audio editing techniques, which I demonstrated last month (SOS December 2021: https://sosm.ag/dec21-vocal-comping), and only turn to time‑stretching as a last resort when my timing edits open up some gap in the line that’s otherwise unbridgeable.
As with tuning, if you’re working through a vocal performance phrase by phrase, try to resist the urge to pin every single moment to the grid. Focus instead on which syllables really have to be on the beat and which can afford to be looser without affecting the groove. Furthermore, I’ve found it a decent rule of thumb that the smaller the chunks of audio you’re shuffling around to correct a perceived timing problem, the less natural the final edit will often sound.
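As a rough illustration of that selective approach, the Python sketch below measures how far each syllable sits from the nearest grid line, but only reports the syllables you have marked as rhythmically important. Everything in it is an assumption for the sake of the example: the function names, the 16th‑note grid, the 30ms tolerance and the onset times are hypothetical, and where a syllable ‘starts’ is itself a judgement call. The output is best treated as a list of places to audition first, not as a list of edits to make.

```python
def grid_offset_ms(onset_s, bpm, subdivisions_per_beat=4):
    """Signed distance in milliseconds from an onset to the nearest
    grid line. Positive means late, negative means early."""
    step_s = 60.0 / bpm / subdivisions_per_beat
    nearest_s = round(onset_s / step_s) * step_s
    return (onset_s - nearest_s) * 1000.0


def audition_list(syllables, bpm, tolerance_ms=30.0):
    """Return (text, offset_ms) for syllables marked as rhythmic anchors
    that sit further from the grid than the tolerance. Syllables not
    marked as anchors are skipped, however loose they are.

    syllables: list of (text, onset_seconds, is_anchor) tuples."""
    flagged = []
    for text, onset_s, is_anchor in syllables:
        offset = grid_offset_ms(onset_s, bpm)
        if is_anchor and abs(offset) > tolerance_ms:
            flagged.append((text, round(offset, 1)))
    return flagged


# Hypothetical phrase at 120bpm (one 16th note lasts 125ms).
phrase = [
    ("cra", 1.045, True),   # anchor syllable, about 45ms late
    ("zy", 1.260, False),   # loose, but not an anchor, so left alone
    ("love", 2.010, True),  # anchor syllable, within tolerance
]
to_check = audition_list(phrase, bpm=120)
```

With these made‑up numbers only the first syllable is reported, even though the second is also off the grid, because it was not marked as one that has to land on the beat.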
The Eyes Have It
Whether you’re editing tuning or timing, there are some important elements of psychology that you have to engage with. Most importantly, you must recognise that we humans are hard‑wired to favour our eyesight over our hearing, and that this can very easily catch you out when working on any detailed studio task. If a vocal note ‘looks’ right on your DAW’s pitch/time grid, your brain will strongly bias you to think that it sounds right whereas, in reality, a naturally in‑tune and in‑time performance will almost always deviate significantly from those grids. In the case of pitch, this may be because of small misinterpretations of the note’s pitch within the software itself, whereas with timing issues there may be ambiguity in what actually defines the perceptual ‘start’ of a syllable.
For instance, if you look at the waveform of the word “crazy”, you’ll see that there’s an opening transient for the “c”, then a waveshape and level transition through the “r” sound before the more sustained “a” vowel fully arrives. Which of these elements constitutes the start of the note? Although you might instinctively put the “c” on the beat, this often (but not always) makes the word as a whole feel late, because the sustained “a” vowel then begins significantly after the beat. There’s nothing cut‑and‑dried about questions like this, so the only solution is to continually challenge yourself to trust your ears more than your eyes.
Now, I’m not saying that your software’s pitch and time grids can’t be helpful in suggesting how a dodgy‑sounding note might sensibly be adjusted — on the contrary, they can save you a lot of time. And in arrangements without a well‑defined tempo grid, the waveforms of rhythm instruments can serve a similar purpose. However, you do have to discipline yourself to close your eyes or look away from the screen as much as you can while you work — both when choosing notes for adjustment, and when deciding whether your changes have improved the situation. For my own part, I also make a point after editing each song section (each verse, say) to get up from my studio chair and listen back from a different part of the room, facing away from my DAW screen! It’s amazing how much clearer your decisions become once you step out of ‘editing mode’ like this and adopt more of a listener’s mindset away from the controls.
Remember that you’ll be just as affected by the backing track you’re hearing as the original singer was, so don’t expect to be able to make appropriate tuning and timing judgements if the backing balance is nothing like it will be in the final mix. Similarly, for any mainstream work, it’s vital to check your edits both on full‑range monitors and under more bandwidth‑restricted listening conditions, because differences in the audibility of the frequency extremes can make a big difference to how the vocal performance feels. I’d also recommend working at moderate volume levels, because pitch perception actually changes slightly with listening level, and in my experience, vocal tuning decisions made at higher listening volumes tend to skew slightly flat.
A lot of engineers don’t consider the impact of their own physical movements while editing, either. I realise that it can be useful to bop along with the music to arrive at a clear opinion about where a vocal should sit within the track’s groove — but there’s also a sense in which this can be a bit self‑fulfilling. In other words, if you’re dancing around while trying to judge the success of an edit, your own physical sensations can overshadow the subtler rhythmic cues coming through your ears, and it’s easy to fool yourself into thinking that your edit is more in time than it actually is. In fact, if I find myself wanting to move around while listening back to an edit, it often causes me to question the quality of the edit; I get suspicious that I’m trying to delude myself that it feels OK when it doesn’t! (It’s a bit like when you keep wanting to turn up the monitoring volume while mixing, which often serves as a red flag that the mix simply isn’t good enough yet.) So I sometimes specifically do a final editing check without moving around at all, to be more certain of what I’m actually hearing.
Another psychological phenomenon you have to acknowledge is that you’ll tend to get more sensitive to pitch and timing disparities the more you listen to any given phrase. Indeed, it’s for this very reason that I think studio productions benefit from tighter tuning than live performances, because the listener is more likely to listen to the recording repeatedly. To an extent this works to your advantage while editing, because precise judgements get easier the more sensitive your ear becomes. I’ll often hold off doing serious editing tweaks at the very start of any editing session, simply because I’m waiting for my ears to ‘warm up’ in this respect.
The flip side of this progressive increase in perceptual sensitivity is that it becomes increasingly easy to over‑edit, simply because you start hearing pitch and timing nuances that will never really impinge on the consciousness of regular listeners. This is something I struggle with myself, as a bit of a detail‑oriented personality, so I have come to rely heavily on a few countermeasures. One is to avoid spending more than a few minutes on each phrase, concentrating just on making it better, rather than making it perfect. Then, after I’ve edited a whole verse, say, I’ll listen back to it in its entirety and ask myself what actually still bothers me. As often as not, aspects of the vocal performance that seemed wayward when I was deep in the weeds end up just seeming desirably human once I’ve ‘zoomed out’ a bit. Moreover, I can have greater confidence that moments which still feel off‑kilter do genuinely warrant more refinement.
And, of course, you’ll get even more objectivity if you do this kind of full‑section playback after a cup of tea — in fact, in general, breaks are crucial when involved in such detailed work. For this reason, I’ve developed something of a rhythm while editing, where I’ll work mostly in detail (with occasional full‑section passes) for about 45 minutes, and then take a break. Tea and biscuits safely consumed, I’ll play back the sections I was working on beforehand and make final adjustments to those, before starting the workflow cycle again with detailed editing of the next section.
With this advice in mind (and plenty of practice!) you should be well on the way to the ideal outcome of corrective editing — a performance that sounds like it hasn’t been edited at all!
Although this article is primarily concerned with lead vocals, backing vocals often need corrective editing too. Here, however, I’m often happy to lift my moratorium against automatic tuning and timing functions, either because the backing part is low in the mix or because it’s part of a layered texture — in both scenarios the audibility of undesirable processing artefacts is greatly reduced.
My main additional concern with backing vocals, though, is their harder consonants, especially when several parts are layered together, because flamming between consonants on different tracks not only sounds a bit scrappy (especially if the singers are panned in stereo), but can also impede lyric intelligibility. I also usually try to avoid backing‑vocal consonants preceding those of the lead, because it can be quite distracting as a listener, especially over headphones.
Unmatched note‑endings can also stick out like a sore thumb, so that’s another thing to listen out for. When shortening an ending, I prefer not to just fade out mid‑note, because this can sound rather unnatural. Instead, I usually slice a section out of the middle of the final syllable, and then close the gap by simply sliding the note’s natural ending earlier in time (using the matched‑waveform editing technique described in last month’s article about comping). When a note end needs lengthening, I’ll tackle that in a similar way (ie. using simple audio edits) if there’s any risk that chorusey time‑stretching artefacts might become audible, but otherwise I’ll frequently just spot‑fix it with a bit of time‑stretching.
Overall, it’s as well to be aware that how tightly backing vocals are edited is often a strong genre marker, so before you get too carried away with minutiae, do check out a few relevant commercial releases to help you maintain your perspective.
Should I Start With Pitch Or Timing?
For years, I’ve dealt with pitch editing first, followed by timing, although I mostly owe this habit to the practicalities of working with various versions of Celemony Melodyne (my own pitch‑correction software of choice), and don’t have a huge preference either way. Furthermore, I do quite a lot of comparative research of studio professionals, and there seems to be no broad consensus about where they start either. So I wouldn’t waste too much energy worrying about it, and just start with whichever you prefer. Mind you, I wouldn’t recommend trying to correct timing and tuning at the same time, personally, because each task feels to me like it uses very different ‘muscles’ mentally, and each is also demanding enough to require your full concentration.
Additional Resources Online
To support this article, I’ve set up a special resources page on my website, where you’ll find audio and video demonstrations of the editing techniques discussed here, downloadable multitracks with unedited vocal takes (so you can practise your corrective editing), and links to further reading.