Part 2: Having created the conditions for a successful mix last month, what can you do to fit your vocal in the mix?
In Part 1 of this two-part series, I explained some ways to prepare your vocal recordings to maximise your chances of a good result in the mix. This time, I’ll run through what you can do to perfect your vocals during the mix itself. Much of what makes for a good mix is highly subjective and the same goes for vocal sounds too: every voice is different and we all have different tastes. That said, there are certain constants you’ll encounter with almost every vocal and it’s those I’ll address here.
Judicious equalisation (EQ) obviously helps a vocal sit well in the mix, but note that ‘EQ’ isn’t the preserve of dedicated equalisers. The mic itself may already have added a first stage of equalisation — most large-diaphragm condenser mics (and a fair few dynamics) have a natural ‘lift’ in the vocal range, and directional (eg. cardioid) mics will add bass boost due to the proximity effect if they’re used close to the source (as they almost always are for vocals). Mic selection isn’t really a mix issue, of course (unless, I suppose, you’re deliberately using one of the new crop of modelling mics), but it’s well worth listening out for what the mics might have added. Similarly, keep an ear out for what any other processing such as compression and analogue tape emulation does to the tonal balance of the vocal. My point here is that before you even reach for an EQ and start tweaking, you really need to make an effort first to listen to the part and then decide how you want it to sound.
When it comes to applying EQ, there are four main areas it can help to consider adjusting for the male voice (which is primarily what I work with): the lows, below about 150-200 Hz; the lower mid-range, around 300-400 Hz; the upper mid-range, in the 2-4 kHz range; and the highs, which can overlap the upper mids. For female vocals, the formant tone (which gives vocals much of their character) will be somewhat higher, so if these ranges don’t work try moving them up about half an octave. I recommend using the upper bass and mid-range as the ‘reference’ for vocal EQ — try leaving that region alone while you work first on the lows and then the upper mid-range and highs.
Low Frequencies. EQ can easily address excessive bass that was captured due to the proximity effect (ie. bass boost caused by singing close to the mic). However, note that some singers deliberately use the proximity effect to make their voice sound less ‘thin’, so don’t automatically EQ to reduce the low end — do so only if it dominates, or leads to a muddy or muffled sound.
A high-pass filter usually works well here, with a moderate slope (eg. 12dB per octave). Loop a section of the vocal where the proximity-effect bass boost is problematic, then slowly increase the high-pass filter’s cutoff frequency. It’s possible to set this filter too high, of course, leading to a somewhat thin sound — you should strive for a balance in which the low-frequency energy gives a rich, full sound, without overwhelming the rest of the vocal. If it’s difficult to find the ‘sweet spot’ try the same technique but with a gentler, 6dB/octave slope. If, on the other hand, the proximity effect is severe, perhaps try an 18dB/octave filter.
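To put numbers on those slopes, here’s a quick back-of-the-envelope sketch. It uses the idealised straight-line response (a real filter rounds off near the cutoff, but well below the cutoff this is a reasonable guide), and the frequencies are just illustrative:

```python
import math

def hpf_attenuation_db(freq_hz, cutoff_hz, slope_db_per_octave):
    """Idealised attenuation (in dB) of a high-pass filter below its cutoff.

    Assumes the asymptotic straight-line response; real filters round
    off near the cutoff, but well below it this is a reasonable guide.
    """
    if freq_hz >= cutoff_hz:
        return 0.0
    octaves_below = math.log2(cutoff_hz / freq_hz)
    return slope_db_per_octave * octaves_below

# With the cutoff at 100 Hz, a 12dB/octave filter pulls 50 Hz down by 12 dB...
print(hpf_attenuation_db(50, 100, 12))  # -> 12.0
# ...while an 18dB/octave filter is far more drastic two octaves down, at 25 Hz.
print(hpf_attenuation_db(25, 100, 18))  # -> 36.0
```

In other words, the steeper the slope, the more decisively the proximity-effect build-up is removed for a given cutoff setting — which is why a gentler 6dB/octave slope makes the ‘sweet spot’ easier to find.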
Sometimes, though, you’ll find no suitable compromise setting, and reducing the proximity effect’s lowest frequencies with a high-pass filter will make the voice too thin. In this scenario, a shelving EQ may prove more effective, so here’s a trick that many engineers use. First set the shelf frequency and then cut to reduce the muddiness, as you’d expect. Then, add some resonance (Q) to produce a slight ‘bump’ just above the cutoff frequency. (Some high-pass filters allow adding resonance in this way, but many don’t — in which case just use a regular parametric EQ boost alongside the shelf.) This bump increases bass that’s out of the range of the proximity-effect ‘mud’, allowing you to reduce the lowest frequencies while retaining a relatively full sound.
The Highs. I think of the ‘highs’ as containing two separate regions: the upper mid-range, which provides lyrical intelligibility, and the treble range, which starts around 5-6 kHz and extends upwards, where we perceive ‘air’ and ‘transparency’. A high-frequency shelf with little or no resonance can work well, except where the vocal is hissy or there are ‘ess’ problems that a de-esser can’t fix. In that case, the extended response above 8kHz or so can simply add noise, which doesn’t help the vocal, so try applying a parametric boost in the 4-7 kHz range, with a wide Q for a gentler slope. This should give a glossy, intelligible high-frequency response without boosting those ultra-high frequencies.
The Upper Mids. When it comes to the mid-range, listen particularly carefully to the vocal in context with the rest of the mix, because our ears are most sensitive in this frequency range, and you might just find that fixing the lows and highs was all that was required.
Set the vocal’s level in relation to the mix so that you can hear the low and high ends clearly. If the vocal still sits too far back in a busy track, focus on the upper mids, using a parametric EQ boost, typically in the 2.5-4.5 kHz range. It’s OK if this overlaps with the high-frequency shelf you applied earlier — the shelf is providing a general boost, whereas this upper-mid EQ is a more focused one.
Boost gently with a moderate Q, and sweep slowly across the upper mids; there will usually be a frequency where the vocal sounds present and ‘right’. Avoid boosting too far, though, because the ear’s sensitivity means boosts in this range can sound harsh and unnatural — and they may also make the lows and highs you’ve already attended to seem deficient. If the vocal still doesn’t seem prominent enough after a conservative upper mid-range boost, then you probably need to raise the vocal’s overall level.
The Low Mids. One common problem is excess energy around 300-400 Hz. Because many instruments produce energy in this range, the sounds can ‘pile up’ and sound muddy. A slight, somewhat broad, cut in this area can often tighten up the vocal sound.
As a last tip on conventional EQ, try to avoid what I call ‘iterative EQ’, which is a kind of EQ ‘arms race’: the lows seem thin, so you boost the bass, only to find that the highs don’t seem clear, so you boost the highs; now the lows need a little boost... and so on. Instead, if, for example, one EQ move leaves the vocal sounding thin, try cutting back the highs somewhat (and raising the overall vocal level if required), rather than boosting the bass.
Dynamic EQ has traditionally been a tool for mastering and solving problems with specific instruments (eg. taming an overly bright hi-hat or synth filter) but it can be useful with vocals. It combines a concept similar to multiband compression (the dynamic aspect) and EQ (which offers more precise ‘curves’). You specify a threshold for a particular EQ band, and when the audio in that range exceeds the threshold, the EQ boosts or cuts (depending on which you specify) according to the ratio.
Here’s an example of when you might choose to use a dynamic EQ. With a vocalist, you’ve used a static EQ to boost the ‘intelligibility’ frequencies, but that boost means that the level of those frequencies is excessive in certain places — so you use a dynamic EQ to correct that. Dynamic EQ is also useful for reducing resonances that aren’t particularly problematic at lower levels, but become annoying when they’re too loud. (De-essing vocals is another candidate for dynamic EQ, as discussed last month).
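If it helps to see that threshold/ratio behaviour in black and white, here’s a minimal sketch of a single downward dynamic-EQ band. The control names and numbers are purely illustrative, not those of any particular plug-in:

```python
def dynamic_eq_gain_db(band_level_db, threshold_db, ratio, max_cut_db=12.0):
    """Cut applied by a downward dynamic-EQ band, in dB.

    When the band's level exceeds the threshold, the band is cut by the
    overshoot scaled according to the ratio -- just like a compressor
    acting on that frequency range alone.
    """
    overshoot = band_level_db - threshold_db
    if overshoot <= 0:
        return 0.0  # below threshold: the EQ stays flat
    cut = overshoot * (1 - 1 / ratio)
    return -min(cut, max_cut_db)

# An 'intelligibility' boost gets strident when the band hits -6dB
# against a -12dB threshold at 3:1 -- the band is pulled down by about 4dB:
print(dynamic_eq_gain_db(-6, -12, 3))
# In quieter phrases the band stays below threshold and nothing happens:
print(dynamic_eq_gain_db(-20, -12, 3))  # -> 0.0
```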
Personal bias alert: I don’t particularly like the ‘sound’ of compression on voices! If I can hear the compressor ‘working’, I’m not happy. That’s why I do the phrase-by-phrase normalisation technique mentioned in the previous article; it gives a consistent dynamic range without altering the internal dynamics of the vocal — the peaks remain just as peaky, and have the same relationship to the valleys they had before, but the part as a whole is more even. Some compression may still be required to control the dynamics, but this way I’m not asking the compressor to work so hard — or so audibly!
When I use compression on vocals, I tend to use two dynamics processors in series. The first is a fairly transparent limiter, with a threshold set so it applies no more than around 3-6 dB of gain reduction. The purpose of this is to bring any remaining ‘rogue peaks’ into balance with the rest of the vocal. A compressor follows, and is also set for about 6dB of gain reduction. Doubling up the dynamic-range control, coupled with relatively light limiting and compression, gives as loud and present a sound as would be obtained with heavy-handed compression — but without the breathing and pumping artifacts. And even if you do want more extreme compression, the vocals won’t sound ultra-compressed.
Not all compressors have conventional threshold and ratio controls. For example, compressors that emulate the classic LA-2A hardware (an optical compressor with tube amplification) have a ‘peak reduction’ knob that controls the amount of gain reduction, similarly to how a threshold control operates, a gain control that makes up for the loss in level caused by compressing the dynamic range, and a switch to choose between two ratios: limiting or compression. The crucial control is the peak-reduction knob; a meter will show the reduction amount. For whatever reason, this kind of compression often seems to flatter voices.
Now let’s ignore my personal bias and assume you enjoy hearing the effect of compression on vocals. Like EQ, although every processor, mic and performance is different, there’s a general procedure that often gives good results.
Because the ear is not as sensitive to level changes as it is to pitch changes, it’s not always easy to hear fine gradations in compression. So let’s break a rule and say that, sometimes, you should listen with your eyes (at least partially), by which I mean looking at the compressor’s gain-reduction meter. I aim for about 6dB of reduction, with the meter ‘dancing’ between 0 and 6 dB of gain reduction, rather than having it stay ‘pinned’ at -6dB.
The key to a good result is finding the right interplay of threshold, ratio and attack time. A 20-30 ms attack will let through enough of the vocal’s attack for it to sound ‘real’. Start with a ratio of 2:1 and lower the threshold until you reach the desired amount of gain reduction. If the effect isn’t dramatic enough, first try increasing the ratio; then return the ratio to 2:1 and instead reduce the threshold, and choose which of these two tactics you prefer the sound of. Both will reduce the dynamic range, but lowering the threshold will give you more compression of lower levels, whereas a higher ratio will flatten the louder parts more. The difference may seem subtle, but you should be able to recognise it if you compare the two approaches in this way.
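The two tactics are easy to compare on paper using the standard static transfer curve. This is only a sketch (real compressors add a knee and time constants), and the example levels are arbitrary:

```python
def comp_out_db(in_db, threshold_db, ratio):
    """Static compressor curve: above threshold, level rises 1/ratio as fast."""
    if in_db <= threshold_db:
        return in_db
    return threshold_db + (in_db - threshold_db) / ratio

quiet, loud = -30.0, -6.0  # levels of a soft phrase and a loud one

# Tactic 1: higher ratio (4:1) with a modest threshold -- flattens the loud parts.
a = (comp_out_db(quiet, -18, 4), comp_out_db(loud, -18, 4))
# Tactic 2: keep 2:1 but lower the threshold -- compresses the quieter levels too.
b = (comp_out_db(quiet, -36, 2), comp_out_db(loud, -36, 2))
print(a)  # (-30.0, -15.0): the soft phrase passes untouched
print(b)  # (-33.0, -21.0): even the soft phrase is brought down
```

Both settings narrow the gap between soft and loud, but they distribute the gain reduction differently — and that distribution is exactly the difference you should be listening for.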
Finally, if you want the effect of lots of compression but without the artifacts, one trick is to place two compressors in series, set with relatively high thresholds and low ratios. There’s not enough compression in either to do violence to the signal, but when combined, they sort of multiply each other to give a compressed sound that doesn’t sound like a compressed sound.
Lots of compressors — particularly plug-in ones — now come with a ‘mix’ or ‘blend’ control that allows you to mix some of the ‘dry’, untreated sound back in with the compressed sound. But you can achieve a similar effect with any compressor by ‘multing’ your vocal part to two tracks and putting the compressor only on one, to leave you with one channel fader for the dry sound and another for the compressed one.
This can be a useful tactic when you want to preserve the peakiness of the louder parts while bringing up the level of the quieter details, whether that’s for effect or to improve intelligibility. You’d generally set the compressor quite assertively, with a high ratio and a fast-ish attack, and then, with the dry vocal playing in the context of the mix, bring up the compressor channel’s fader slowly from full attenuation until you hear those missing details. Because the compressor is clamping down on the loud peaks, when you raise the fader, you’re not changing their level as much relative to the dry sound as you are that of the quieter details.
Note that if you use a second channel for this technique, rather than a compressor’s built-in controls, you’ll probably want to route both the dry and compressed tracks to a group bus so you can apply any further processing or send effects to the composite compressed vocal sound.
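To see why the parallel channel favours the quiet details, it’s worth working through the arithmetic of summing a dry channel with a heavily compressed one. In this sketch the threshold, ratio and fader level are arbitrary illustrations of the ‘assertive’ settings described above:

```python
import math

def db_to_lin(db):
    return 10 ** (db / 20)

def lin_to_db(x):
    return 20 * math.log10(x)

def comp_out_db(in_db, threshold_db=-30.0, ratio=8.0):
    """Assertively set compressor: high ratio, low threshold."""
    if in_db <= threshold_db:
        return in_db
    return threshold_db + (in_db - threshold_db) / ratio

def parallel_db(in_db, comp_fader_db=-6.0):
    """Dry channel at unity gain summed with a compressed channel on its own fader."""
    dry = db_to_lin(in_db)
    wet = db_to_lin(comp_out_db(in_db) + comp_fader_db)
    return lin_to_db(dry + wet)

peak_lift  = parallel_db(-3.0) - (-3.0)    # lift added to a loud peak
quiet_lift = parallel_db(-40.0) - (-40.0)  # lift added to a quiet detail
print(round(peak_lift, 1), round(quiet_lift, 1))  # the quiet detail gains far more
```

The loud peak barely moves (the compressed copy of it is heavily gain-reduced before it’s added), while the quiet detail gets several decibels of lift — which is the whole point of the technique.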
An expander is the opposite of a compressor — below a threshold, the output drops off faster than the input. For example, with an expansion ratio of 1:2, for every decibel the input drops off, the output drops off 2dB. In some ways, it can be considered a more refined version of the noise gate. In fact, some processors combine expansion and gating.
The main use with vocals is to set a low threshold, around -45 to -60 dB or so, and use a fairly steep expansion ratio, like 1:4 or 1:10. This will make any low-level noise even lower in level, and may reduce the need to do manual editing to reduce the silence between vocal phrases.
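As a sketch of the numbers involved, using a threshold and ratio in the ranges suggested above:

```python
def expander_out_db(in_db, threshold_db=-50.0, ratio=4.0):
    """Downward expander: below the threshold, level falls `ratio` times faster."""
    if in_db >= threshold_db:
        return in_db  # above threshold: the signal passes unchanged
    return threshold_db + (in_db - threshold_db) * ratio

print(expander_out_db(-45.0))  # -> -45.0: sung material is untouched
print(expander_out_db(-55.0))  # -> -70.0: noise 5dB below threshold ends up 20dB below
```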
Reverb and vocals were made for each other: very few recordings put the voice totally out front, with no ambience at all, even when they sound relatively dry. However, there’s more to creating the right vocal reverb sound than simply dialing in a preset and crossing your fingers.
Acoustic spaces create the most natural-sounding reverb, but many of us are used to hearing other types of reverbs which have appeared on our favourite records over the last several decades. A limitation of real spaces is that there’s only one ‘preset’ — and the small matter of fitting a concert hall in your project studio! Emulating the classic concrete room sound that was on so many great recordings, let alone other acoustic environments, is not an easy task either.
Synthesized or algorithmic reverb processors model the two main phenomena that create reverb in an acoustic space: the early reflections, which are the initial sounds heard as the sound waves first bounce off various surfaces; and the reverb ‘tail’, which is more of a wash of sound, created by emulating the gazillion reflections that occur in a real room, with all their variations in amplitude and frequency response. Both processes have associated controls.
Another parameter, pre-delay, sets the gap between the direct sound arriving at the listener and the arrival of the first reflection (ie. the sound that has bounced off whichever surface in the room is nearest the source or the listener). The larger the space, the longer the pre-delay. I don’t personally use a lot of pre-delay or early reflections on voice, because I generally want the vocal to sound upfront compared with the rest of the track, and have the reverb tail more in the background. But you may have different aims, and it’s worth familiarising yourself with what all the parameters do. Note that most digital reverbs are not true stereo devices; typically they mix stereo inputs to mono, and then synthesize a stereo space. Hence you can obtain stereo reverb effects from an initially mono vocal track.
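If you want to relate pre-delay settings to real spaces, the time is just the extra distance the reflected sound travels compared with the direct sound, divided by the speed of sound. A rough calculation (the path lengths are made-up examples):

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # in air at around 20 degrees Celsius

def pre_delay_ms(direct_path_m, reflected_path_m):
    """Gap between the direct sound and the first reflection reaching the listener."""
    return (reflected_path_m - direct_path_m) / SPEED_OF_SOUND_M_PER_S * 1000

# Small room: 3m direct path, 10m via the nearest wall.
print(round(pre_delay_ms(3.0, 10.0), 1))   # -> 20.4 (ms)
# Concert hall: 15m direct path, 40m via a side wall -- a much longer pre-delay.
print(round(pre_delay_ms(15.0, 40.0), 1))  # -> 72.9 (ms)
```

Hence the rule of thumb that bigger pre-delay settings imply bigger implied spaces.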
Convolution reverb is conceptually more like sampling. A ‘snapshot’ called an impulse response (or IR) of a real room’s characteristics is recorded using real microphones, and the convolution plug-in uses some clever maths to impose those characteristics onto your audio. This approach produces a highly realistic sound, much like how a sampler can produce more realistic sounds than an analogue synthesizer.
The trade-off is the traditional sampler-versus-synthesizer issue: the difficulty lies first in selecting the right sounds (the IR is only as good as the room in which it was captured, and the positions of both the source and the mics used to capture it), and then in editing those sounds to fit your particular composition. But just as some companies have figured out how to get ‘inside the sample’, most modern convolution reverbs are quite editable, and as easy to use as standard reverbs. (The one thing they can’t replicate successfully is any modulation used in the patches of a high-end algorithmic reverb.)
For vocals, choosing whether to use convolution or algorithmic reverb is purely a matter of taste. Algorithmic reverb can give a more diaphanous, airy type of sound, while convolution gives a more realistic, you-are-there vibe. It’s like the difference between an impressionistic painting and a photograph; both can give enjoyment, for different reasons. Once you’ve made your choice, here are some of the key points to consider.
The Virtues Of Diffusion. A reverb’s diffusion control adjusts the density (‘thickness’) of the echoes: high diffusion packs the echoes closer together, while low diffusion spreads them out. With percussive sounds, low diffusion produces a spray of discrete, audible repeats, like marbles hitting steel, so denser settings tend to suit percussion. But with voice, which is more sustained, low diffusion gives plenty of reverb effect without a thick wash of reflections that could ‘step on’ the vocal.
One Reverb Or Many? Back in the stone age of recording, a recording had one reverb, and all signals that needed reverb were bussed to it. Later on, studios often used a specific reverb for vocals. Much of the motivation for doing this was to make the voice more distinctive, and if the studio had a plate reverb, that was often the reverb of choice, because it tended to have a brighter, crisper sound than a traditional room reverb. Another personal bias alert: I don’t use a lot of reverb in my mixes, so I give the voice its own reverb, about half the time using my own ‘idealised’ white noise-based impulses with convolution reverb, or algorithmic reverb when I want to draw more attention to the reverb effect.
Damping. If sounds bounce around in a hall with hard surfaces, the reverb’s decay tails will be bright and ‘hard’. With softer surfaces (eg. wood instead of concrete), the reverb tails will lose high frequencies as they bounce around, producing a subjectively ‘warmer’ sound. If your reverb can’t create a smooth-sounding high end, introduce some damping to place the focus more on the mid-range and lower frequencies. For voice, to maintain a relatively bright reverb sound, I don’t use too much damping.
Decay Time & Frequencies. Many reverbs offer a frequency crossover point, with separate decay times (RT) for high and low frequencies. To prevent too much competition with mid-range instruments, use less decay (and a lower level) on the lower frequencies and increase decay on the highs. This adds ‘air’ to the vocals, as well as emphasising some of the sibilants and mouth noises that serve to ‘humanise’ a vocal. Vary the crossover setting to determine what works best for a particular voice; experimentation is important because of differences between male and female voices, and different individuals’ tonality, range, and so forth. Start at around 500Hz and move up from there until you discover the right sound.
Remember that, within reason, crispness with vocals is usually a good thing, because it increases intelligibility — but be careful adding it with a reverb if you already added massive amounts of high-frequency EQ to the vocal itself!
Adding delay to vocals has been popular since the days of adding echo from a tape recorder. Even subtle amounts can help fill out a vocal, and give a ‘bigger’ sound without the potential pitfalls of doubling the voice. An eighth-note delay is good for thickening the voice, and EQ is your friend when using echo. Rolling off the lows will prevent the vocal echo from stepping on the mid-range-oriented instruments, while boosting the highs can give an ‘airy’ quality. Although some plug-ins allow adjusting both EQ and the echo mix, you can also use a send, insert a filter in the send before the echo, set the echo for delayed sound only, and then mix in the desired amount of echo to the main signal.
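Setting that eighth-note (or any other rhythmic) delay is simple arithmetic from the song’s tempo. Here’s the sum, assuming the song was tracked to a known BPM:

```python
def note_delay_ms(bpm, note_fraction=0.125):
    """Delay time for a note value at a given tempo (0.25 = quarter note).

    A quarter note lasts 60000/bpm milliseconds; other values scale from there.
    """
    quarter_note_ms = 60000.0 / bpm
    return quarter_note_ms * (note_fraction / 0.25)

print(note_delay_ms(120))          # eighth note at 120bpm -> 250.0 ms
print(note_delay_ms(120, 0.1875))  # dotted eighth at 120bpm -> 375.0 ms
```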
When used more as an effect than for thickening, the echo will likely be longer, and include a significant amount of feedback. This can ‘muddy’ the sound, which is something you can solve with automation, applying echo selectively, typically when there are gaps in the vocal. Here are five different approaches for you to consider:
Send Effect. Route the vocal to the echo as a send effect, and automate the send level to bring the desired amount of echo in and out during the sections where you want it.
Multing. Split the section to be echoed to another track. The advantage of this approach is you may not have to use automation, but just set up the echo as a track insert plug-in. It will affect only the parts of the vocal that were moved to that track.
Plug-in Automation. Manipulate the desired controls (mainly feedback and mix), and record your automation moves. These will create envelopes you can manipulate further if needed.
Clip Effect. Split the section of a clip where you want to add echo, and insert an echo effect in the clip (if your DAW supports that approach — not all do). Note that the echo may not extend beyond the clip. This can work to your advantage if the clip is followed by a section where you don’t want echo, but if you wish the echo to sustain and your DAW doesn’t provide that as an option, you can simply extend the clip with silence before processing it.
Ducked Delay. This technique requires that you use the echo/delay as a send effect, and ideally for the effect to be used only by your vocal and not other instruments. You send the vocal part to the external side-chain of a compressor that’s placed after the delay plug-in. The compressor is set up with a low threshold so that when you can hear the main vocal part the compressor is pulling down the level of the delays. When there’s a gap in the vocal, the compressor stops compressing and allows the delay line through. It’s a clever and appealing idea (which, by the way, could also be applied to reverb sends) but note that you don’t have quite such precise control in a mix as when using automation.
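The ducking compressor on the delay return behaves like an ordinary compressor, except that it’s keyed from the dry vocal’s level via the side-chain. A sketch of the gain it applies, with all control names and values purely illustrative:

```python
def ducked_delay_gain_db(vocal_level_db, threshold_db=-40.0, ratio=10.0,
                         max_duck_db=24.0):
    """Gain applied to the delay return, keyed from the dry vocal's level.

    While the vocal is above the (deliberately low) threshold, the
    echoes are pulled down hard; in the gaps they return to full level.
    """
    overshoot = vocal_level_db - threshold_db
    if overshoot <= 0:
        return 0.0  # gap in the vocal: echoes pass at full level
    duck = overshoot * (1 - 1 / ratio)
    return -min(duck, max_duck_db)

print(ducked_delay_gain_db(-12.0))  # -> -24.0: vocal present, echoes well down
print(ducked_delay_gain_db(-60.0))  # -> 0.0: vocal silent, echoes bloom
```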
One of the main goals when mixing is to give vocals a position of prominence in the mix. For years, engineers made sure vocals were the appropriate level by manually ‘riding the fader’ while mixing, to change levels as needed. Then automation came along, first on sophisticated consoles, and later in DAW software, which allowed those moves to be memorised and refined.
The next big step in terms of automation was in 2009 when Waves’ Vocal Rider plug-in was released. This does automatic gain-riding so that the vocal remains at a target level you specify. That might sound rather like compression and expansion, and in essence it is — but there are a couple of important differences. First, the plug-in can write the gain changes as automation to your DAW, allowing you to edit the ‘gain-riding’ action in detail if desired. Second, a side-chain input can receive a reference signal that can inform the ‘optimum’ level for the vocal. If this reference is a premix of the entire mix, then the vocal level will change based on the level of the mix itself — getting louder in louder passages, and softer in softer passages, but always intelligible. Similar products are now available from other manufacturers, including Melda, Hornet Plugins, and TB Pro Audio.
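Conceptually, each automated ‘fader move’ such a plug-in writes is a nudge toward a target level, optionally informed by the side-chain reference. Here’s a toy version of that idea; the names, defaults and the 6dB ‘headroom over the mix’ figure are all invented for illustration and don’t describe any particular product:

```python
def ride_gain_db(vocal_db, target_db=-18.0, mix_db=None,
                 sensitivity=0.5, max_move_db=6.0):
    """One gain-riding move: nudge the vocal part of the way toward a target.

    If a reference mix level is supplied (the side-chain idea), the target
    tracks the mix, keeping the vocal proportionally on top of it.
    """
    goal = target_db if mix_db is None else mix_db + 6.0  # sit ~6dB over the mix
    move = (goal - vocal_db) * sensitivity
    return max(-max_move_db, min(max_move_db, move))      # keep moves gentle

print(ride_gain_db(-24.0))                # quiet phrase, fixed target: ride up 3dB
print(ride_gain_db(-24.0, mix_db=-20.0))  # loud backing: ride up further, 5dB
```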
The main advantage of vocal riding programs compared with compression is that they don’t alter the fidelity or moment-to-moment dynamics, just the overall level — there’s no more processing than you’d apply by moving a fader. So are they a panacea? Well, yes and no! For narration or audio books, where a consistent voice level is crucial, vocal gain-riding plug-ins are worth their weight in gold. For music, it depends on the music itself. These programs can’t make artistic decisions, only technical ones — so you will always have more control by riding gain manually, and making edits to your automation. However, they can save a lot of time when setting up a mix, and you can always ‘punch in’ automation for those points at which you disagree with the software’s decisions.
A good mix isn’t just about the vocals — there are lots of other elements to consider too. And setting the levels of the vocal relative to the rest of the track, as I discussed earlier, is only part of the story. Just as reverb and delay tails can mask vocal intelligibility, so too can other sources in your mix. This isn’t the place to tell you how to mix your guitars, keyboards, drums, samples and whatever else, but it’s worth pointing out that if you have difficulty hearing parts of your vocal, it can often be due to other sounds masking the vocal at that point. In this case, rather than riding the vocal level up, it’s worth hunting down which part is masking the vocal and EQ’ing it to counter the problem, or turning it down (or even muting it, if that works for the arrangement). In other words, turning something else down or ‘switching it off’ will often make your vocal cut through better without your having to do anything to the vocal itself.
One question that comes up time and again about vocals in a mix is how to get them sounding ‘wide’ when they’ve been recorded in mono — and wide in a way that can’t be achieved with stereo reverb and delay alone. It’s another subject that’s big enough to fill a book, but there are a few techniques that can be explained in brief here. Arguably, the best is to plan for width early on and capture genuine double tracks when recording — you sing and record the same part two or three times, and when it comes to the mix you blend them all together, perhaps opposition-panning two parts, or placing a main one in the centre, and opposition-panning two others either side, maybe at a slightly lower level. For songs that have been tracked to a click at a consistent tempo, you might find that you can fake this effect by copying and pasting the second ‘take’ from other parts of the song (eg. you might have sung the same chorus part two or three times).
Another possibility is to fake the double tracks using one of the better pitch-correction applications. Synchro Arts Revoice Pro has a clever facility to generate believable doubles, and Melodyne’s Random Deviation feature is capable of good results too; you might also be able to press into service an offline pitch and time-warp processor, such as Logic’s Flex Pitch or Cubase’s VariAudio — just use slightly different pitch- and time-quantisation settings on the doubles. If you don’t have any such software, a less natural-sounding but often subjectively nice technique is to send the vocal to two mono pitch-shifter plug-ins: one panned to one side and shifting the vocal up, the other panned opposite the first and shifting the vocal down by the same amount. You can also apply a pitch-correction plug-in before or after the pitch-shift, which will help to differentiate the two parts still further, and can experiment with adding short delays to each too. This is similar to the sort of spacious effects that Eventide units became famous for — in fact, the Eventide H3000 Factory plug-in would be another off-the-peg option here!
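For the two-pitch-shifter trick, shift amounts are usually specified in cents (hundredths of a semitone), and the corresponding playback ratios sit symmetrically around unity. A quick sketch — the nine-cent figure is just a typical starting point, not a magic number:

```python
def cents_to_ratio(cents):
    """Pitch-shift amount expressed as a frequency ratio: 1200 cents = one octave."""
    return 2 ** (cents / 1200)

# One copy a few cents sharp, panned one way; the other the same
# amount flat, panned the other way.
up_ratio   = cents_to_ratio(+9)
down_ratio = cents_to_ratio(-9)
print(round(up_ratio, 5), round(down_ratio, 5))  # just above and just below 1.0
```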
Note, though, that it’s incredibly easy to overdo this sort of thing, resulting in a vocal that sounds obviously processed, and leaves the vocal sounding a little too ‘diffuse’. So proceed, by all means, but do so with caution!
For neophytes, presets provide starting points from which, with a little tweaking, you can achieve good results, and some manufacturers pride themselves on the quality of their presets — iZotope’s Nectar vocal processing plug-in is a good example, as its large library of presets is one of its main selling points.
It’s important to remember, though, that presets can take you only so far. Every mic, voice and mix is different; there’s no one-size-fits-all ‘male vocal’ or ‘female background vocal’ preset. Something that may sound fabulous on a condenser mic could sound terrible with the dynamic mic a singer prefers to use. And pay particular attention to presets that involve threshold-based processors — the threshold must always be adjusted to suit the specific source.
For me, the greatest value of presets is that you can start creating your own, either from scratch or by modifying existing presets. When a certain combination of processors works well with my voice and a particular microphone, I’ll save it as a preset. Sometimes a preset will work with a piece of music without any tweaks whatsoever, and sometimes a few tweaks are needed, but overall it saves time to have a point of departure that’s known to work in a specific context.
I hope these techniques help you achieve more satisfactory results when recording and mixing your own vocals. To hear these techniques in action, check out the vocals in my two most recent projects, ‘Simplicity’ and ‘Neo-’, at www.craiganderton.com/music.html. What’s particularly noticeable is the consistent dynamics, without obvious compression. The DSP techniques from Part 1 are present as well.