Songs suffering from superfluous sibilance? Study SOS's super strategies for sorting esses...
When we talk about sibilance in relation to a vocal recording, we're referring primarily to the 's' sounds: noisy high‑frequency consonants created by the turbulence as air whistles through a singer's teeth. Sibilant sounds (or sibilants for short) often pose technical problems in modern music, because a number of routine production decisions can conspire to emphasise them unnaturally. For a start, the overwhelming preference for close‑miking vocals tends to highlight the noise components, and matters are often made worse if the vocal mic is positioned on the horizontal plane of the mouth, where sibilance is typically focused. The choice of mic can compound the problem, because bright mics are usually favoured for a forward‑sounding timbre, and the most commonly used design, the large‑diaphragm condenser, can have harsh‑sounding high‑frequency capsule resonances, especially in the case of budget models.
Mix processing of vocal parts is also a culprit. High‑frequency EQ boost is par for the course on many recordings, to get the vocal to sound close to the listener and to cut through the mix, and this will clearly add emphasis to sibilants. Compression is also part of the problem, because most compressors won't react a great deal to the high‑frequency energy produced by sibilance, so the sibilants will be controlled less assertively than the rest of the vocal signal, therefore rising in level, relatively speaking.
Whatever its root cause, excessive sibilance is a persistent concern in vocal production, but discussion of it tends to get relegated to a footnote in articles about compression or EQ. So in this article I'll explore in more detail the various tools that the mix engineer can call upon to deal with it.
The simplest approach to de‑essing is to turn down the level of the vocal signal whenever sibilance occurs. Some engineers do this manually, either by carefully editing vocal sibilants onto a separate track or by using detailed fader automation. Manual de‑essing isn't exactly an enthralling task, but it's nonetheless pretty straightforward to carry out in modern software recording systems, because sibilants show up on the vocal waveform as dense pseudo-random regions, which are a doddle to pick out by eye.
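To make the fader-ride idea concrete, here's a minimal sketch in Python. It assumes you've already picked out the sibilant regions by eye and noted them as non-overlapping (start, end) sample positions; the function names and the 6dB figure are purely illustrative, and a real automation move would use short fades rather than an instant step.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear gain factor."""
    return 10.0 ** (db / 20.0)

def ride_fader(samples, sibilant_regions, dip_db):
    """Return a copy of samples with each (start, end) region turned down by dip_db.

    Regions are half-open sample ranges and are assumed not to overlap.
    """
    gain = db_to_gain(-abs(dip_db))
    out = list(samples)
    for start, end in sibilant_regions:
        for i in range(start, min(end, len(out))):
            out[i] *= gain
    return out

# Hypothetical example: ten samples at full level, one sibilant region at 2..4.
vocal = [1.0] * 10
ridden = ride_fader(vocal, [(2, 5)], dip_db=6.0)
```

Everything outside the marked regions passes through untouched, which is exactly what you'd expect from a hand-drawn fader dip.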
Given the dread with which most people look on the idea of manual de‑essing, it's hardly surprising that a number of more 'set‑and‑forget' shortcuts to solving the sibilance problem have arisen. These rely on tuning dynamics processors to respond only to sibilance, a feat which is achieved by EQ'ing the signal feeding the processor's detector circuit independently of the signal passing through the processor's gain‑reduction element. This works by virtue of the fact that sibilant noise bursts are usually focused somewhere in a region from 4-10kHz, and within this region they'll be much higher in level than any other element of the recording. If you can isolate just the sibilant frequency region in the dynamics processor's detector circuit, the processor's Threshold control can be set so that it only reacts to the sibilance.
Some all‑in‑one 'voice‑channel' processors allow you to switch their EQ into the detector circuit, but more often than not engineers set this up manually, using a separate 'side‑chain' or 'key' input on a stand‑alone dynamics module. Whichever method you use, the simplest starting point is to high‑pass filter the detector signal at around 4kHz, and you might find that this is all you need to do to get your dynamics processor triggering reliably. If you're still having trouble, try supplementing the filter with a whopping peak EQ boost at about 7.5kHz.
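The effect of that detector filtering can be sketched in a few lines of Python. This is only an illustration under some loud assumptions: it uses a gentle first-order (6dB/octave) high-pass filter rather than whatever slope your own processor offers, and it stands in pure sine tones at 200Hz and 7.5kHz for 'voiced' and 'sibilant' energy, which real vocals certainly aren't. All the names here are made up for the example.

```python
import math

def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """First-order RC-style high-pass filter (6dB/octave)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def sine(freq_hz, seconds, sample_rate):
    """Full-scale test tone."""
    count = int(seconds * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(count)]

SAMPLE_RATE = 44100
vowel_like = sine(200.0, 0.05, SAMPLE_RATE)      # stand-in for voiced energy
sibilant_like = sine(7500.0, 0.05, SAMPLE_RATE)  # stand-in for sibilant energy

# Both tones have the same level going in, but the detector sees them very differently.
vowel_detector = rms(one_pole_highpass(vowel_like, 4000.0, SAMPLE_RATE))
sibilant_detector = rms(one_pole_highpass(sibilant_like, 4000.0, SAMPLE_RATE))
```

With the low-frequency content pushed well down in the detector path, a threshold set anywhere between the two detector levels would respond to the 'sibilant' tone alone.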
Going back to our two manual de‑essing methods, if you want to create an automatic version of the first (chopping out the sibilance onto another track), use a gate as your dynamics processor, setting it up as a send effect and inverting the polarity of the return channel. When an 's' comes along, it'll get through the gate and phase‑cancel that segment of the ungated track. The level of the return fader adjusts how much the sibilant segments will be reduced in level.
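The arithmetic behind that polarity trick is simple enough to sketch. The Python below assumes an idealised gate that is either fully open or fully closed, passes audio at unity gain, and adds no latency, so the cancellation is sample-accurate; in practice any delay or phase shift in the return path would spoil it. The gate's open/closed decision is supplied as a ready-made list here, and the function names are hypothetical.

```python
import math

def gated_send_deess(dry, gate_open, return_gain):
    """Mix the dry vocal with a polarity-inverted, gated copy of itself.

    gate_open holds one boolean per sample (True while the gate passes sibilance).
    return_gain is the linear level of the effect-return fader.
    """
    out = []
    for x, is_open in zip(dry, gate_open):
        gated = x if is_open else 0.0
        inverted_return = -return_gain * gated
        out.append(x + inverted_return)
    return out

def reduction_db(return_gain):
    """Level change applied to the sibilant segments for a given return gain below 1.0."""
    return 20.0 * math.log10(1.0 - return_gain)

# Hypothetical four-sample vocal with sibilance on the middle two samples.
dry = [0.2, 0.8, -0.8, 0.1]
gate_open = [False, True, True, False]
deessed = gated_send_deess(dry, gate_open, return_gain=0.5)
```

A return gain of 0.5 halves the sibilant segments (roughly a 6dB cut), while a return gain of 1.0 would cancel them completely, so the useful fader range sits below unity.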
If you want to automate the second of our manual de‑essing methods, just insert a compressor on the main vocal channel and adjust its threshold and ratio controls to clamp down appropriately on each instance of sibilance that's picked up. The latter setup is much more common (see, for example, Bob Clearmountain's recent mention of it in SOS March 2009), not only because it's simpler, but because the compression action will respond more or less firmly in proportion to how strident each individual instance of sibilance is. However, the former tactic has some unique advantages too — and don't write it off yet, because I'll be coming back to it later.
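To see why a compressor responds in proportion to how strident each sibilant is, it helps to look at the usual static gain calculation. This is a minimal hard-knee sketch in Python; the threshold, ratio, and detector levels are illustrative numbers, not recommended settings, and real compressors often add a soft knee.

```python
def gain_reduction_db(detector_level_db, threshold_db, ratio):
    """Gain reduction (in dB) for a hard-knee compressor.

    Nothing happens below the threshold. Above it, the output rises by
    only 1/ratio dB for every 1dB rise in the detector level.
    """
    overshoot = detector_level_db - threshold_db
    if overshoot <= 0.0:
        return 0.0
    return overshoot * (1.0 - 1.0 / ratio)

# Hypothetical detector levels for three moments in a vocal line.
no_sibilance = gain_reduction_db(-30.0, threshold_db=-20.0, ratio=4.0)
mild_ess = gain_reduction_db(-16.0, threshold_db=-20.0, ratio=4.0)
harsh_ess = gain_reduction_db(-8.0, threshold_db=-20.0, ratio=4.0)
```

The mild sibilant overshoots the threshold by 4dB and is pulled down by 3dB, whereas the harsh one overshoots by 12dB and is pulled down by 9dB. A gate, by contrast, simply opens or closes.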
Whether you use a compressor or a gate, you'll need to choose suitable attack and release times, and in this application they need to be fairly fast because we're dealing with short bursts of high‑frequency energy. I tend to use a sub‑1ms attack time if possible, but even then I've found that the front end of some sibilants can still break through. Different models of compressor and gate may respond quite differently at such short time settings, so it's worth experimenting here if you've got a tough sibilance problem. I also like to use a dynamics processor with a lookahead facility if the sibilance is particularly problematic. With release times, again I tend to set these very short (around 10ms) so that the de‑essing resets as soon as the sibilance is past. Longer release times will give you unmusical level lurches as the processor resets its gain‑reduction.
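The difference between a short and a long release is easy to demonstrate with a simple envelope follower. The Python sketch below assumes one common convention for turning a time setting into a smoothing coefficient, exp(-1 / (time x sample rate)); actual compressors and gates define their attack and release times in various ways, which is one reason they behave differently at fast settings. The test signal is a crude 20ms constant-level burst followed by silence.

```python
import math

def time_to_coeff(time_ms, sample_rate):
    """One-pole smoothing coefficient for a given time constant (assumed convention)."""
    return math.exp(-1.0 / (time_ms * 0.001 * sample_rate))

def envelope_follower(detector, attack_ms, release_ms, sample_rate):
    """Track the rectified detector level with separate attack and release speeds."""
    attack = time_to_coeff(attack_ms, sample_rate)
    release = time_to_coeff(release_ms, sample_rate)
    env = 0.0
    out = []
    for x in detector:
        level = abs(x)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out

SAMPLE_RATE = 44100
burst = [1.0] * int(0.020 * SAMPLE_RATE)    # 20ms stand-in for a sibilant
silence = [0.0] * int(0.050 * SAMPLE_RATE)  # 50ms after the sibilant has passed
detector = burst + silence

fast = envelope_follower(detector, attack_ms=0.5, release_ms=10.0, sample_rate=SAMPLE_RATE)
slow = envelope_follower(detector, attack_ms=0.5, release_ms=100.0, sample_rate=SAMPLE_RATE)
```

Fifty milliseconds after the burst, the 10ms-release envelope has all but returned to zero, while the 100ms-release envelope is still more than halfway up, so whatever follows the sibilant would still be getting turned down.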
With simple level‑based de‑essing like this, once your chosen processing setup is triggered quickly and cleanly by the sibilants, the only real choice you have to make is how much to pull down the level of the problem regions — and that's pretty much a question of taste, because norms vary with different musical styles. The only thing to watch out for is that if you soften the sibilance too much you'll start to make the singer sound like they're lisping, and they might think you're taking the pith.
On occasion, simple level‑based de‑essing can struggle to control the harshness of a vocal's sibilance without lisping side‑effects. This is typically because one portion of the sibilant frequency range is particularly strong, such that it sounds harsh even when the sibilance level as a whole is low enough to lisp. All the de‑essing tactics we've looked at so far can be refined in response to this problem, but to understand how this works it makes sense first to cast your mind back to the two manual de‑essing methods I mentioned at the outset. In the first one, where the sibilant regions are sliced out onto a separate audio track, you'll recall that you can achieve simple full‑band de‑essing by simply lowering the sibilance track's fader. However, if you first equalise this track to smooth off any particularly harsh‑sounding frequency regions, then you can achieve a smoother vocal sound with less reduction in the overall sibilance levels, and potentially fewer lisping side‑effects. If, on the other hand, you'd adopted the second manual de‑essing approach of automating fader levels, you might improve your results by automating the level of an equaliser band centred on the harshest‑sounding sibilant frequency.
Whichever of these routes you take, you can achieve similar‑quality results with care, but the more automatic processing approaches are once again more appealing to the majority of engineers. Although it's not impossible to implement such processing for yourself in some software sequencers if you work from first principles, the setup can get quite involved, so this is the point at which most recording musicians turn to a specialised processor called a de‑esser.
De‑essers fall roughly into two main types, along the lines of our manual de‑essing methods. The first of these involves the processor automatically slicing out the sibilant audio segments, allowing you to fade and EQ the sibilant audio stream to taste; while the second provides a single band of dynamic EQ that can be targeted at the harshest sibilance region and will pull down a specified range of frequencies during moments of sibilance. Almost all off‑the‑shelf de‑essers use some version of the latter algorithm, so let's look in more detail at those first, before discussing the extra possibilities offered by the signal‑chopping paradigm.
With dynamic EQ‑based de‑essers, there are a number of things you usually need to do to set them up. First of all, you have to get them triggering reliably, as with the simpler full‑band de‑essing methods, and the side‑chain EQ required to do this is usually built into the specialist de‑essers. You'll also usually get a button allowing you to listen to the EQ'd side‑chain independently, and this usually makes it much easier to find effective values for the EQ parameters — in general terms, the nastier you make the sibilance sound in the side‑chain, the easier it'll be for the de‑esser to detect it!
Once the de‑essing is triggering in the right places, you then turn to a set of controls that define how the dynamic EQ cut responds when triggered. Typically, you'll get a frequency control with which you can home in on the worst‑sounding areas of the sibilance, and some kind of Range or Sensitivity parameter that lets you decide how assertively the EQ kicks in when sibilance is detected. On some de‑essers (such as the Waves Renaissance De‑esser), you get more detailed EQ controls, much as you might on a full‑featured parametric EQ, allowing you, for example, to choose different filter types or adjust the bandwidth of the selected filter.
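For anyone curious how band-limited de-essing differs from a full-band level dip, here's a heavily simplified Python sketch. It is not how any particular de-esser works: I've assumed a first-order split at 4kHz, which gives a shelf-like cut of everything above the split rather than a bell-shaped band; the 'sibilance detected' decision is passed in as a ready-made list; and the gain switches instantly, which would click on real audio. The names and settings are hypothetical.

```python
import math

def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """First-order RC-style high-pass filter (6dB/octave)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / sample_rate)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def split_band_deess(samples, sibilant_mask, cutoff_hz, range_db, sample_rate):
    """Turn down only the band above cutoff_hz, and only where the mask is True."""
    band_gain = 10.0 ** (-abs(range_db) / 20.0)
    high = one_pole_highpass(samples, cutoff_hz, sample_rate)
    out = []
    for x, h, is_sibilant in zip(samples, high, sibilant_mask):
        low = x - h  # complementary low band, so low + high rebuilds the input
        gain = band_gain if is_sibilant else 1.0
        out.append(low + gain * h)
    return out

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

SAMPLE_RATE = 44100
count = int(0.05 * SAMPLE_RATE)
sibilant_like = [math.sin(2.0 * math.pi * 7500.0 * i / SAMPLE_RATE) for i in range(count)]

processed = split_band_deess(sibilant_like, [True] * count, 4000.0, 6.0, SAMPLE_RATE)
bypassed = split_band_deess(sibilant_like, [False] * count, 4000.0, 6.0, SAMPLE_RATE)
```

Because the two bands sum back to the input, the signal passes through effectively unchanged whenever no sibilance is flagged, and only the upper band is pulled down when it is.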
Dynamic EQ‑based de‑essing can be pretty powerful, but some producers still prefer to work by chopping out the sibilant sections and then EQ'ing the sibilance‑only track. Currently I know of only one manufacturer offering this kind of processing approach in plug‑in form, and that's Eiosis, with their E2 De‑esser, but it's certainly possible to imitate this process from first principles in most sequencers, by expanding on the gated‑send approach I've already touched on. Whatever way you decide to go, having the sibilants on a separate mixer channel affords probably the most powerful processing options. Even if you decide only to EQ the sibilance, you can create a much more complex EQ curve with even a simple parametric EQ plug‑in than is possible in any dedicated de‑esser plug‑in, and that EQ can be combined with compression for simultaneous level‑only control.
However, having all the non‑sibilant parts on a separate track has its own set of advantages too. For a start, you can EQ to add general brightness without making the sibilants harsh, and you can also make freer use of psychoacoustic enhancers such as Aphex's Aural Exciter range. Because these kinds of enhancers often add new high‑frequency distortion components, they can make sibilants very harsh‑sounding without actually increasing their signal levels a tremendous amount, something that is hard to remedy with most de‑essing strategies; much better to avoid the problem altogether by leaving the sibilants unprocessed! Sibilance can also give away the action of many send effects, particularly reverbs and delays, so the option of sending to such effects from your separate non‑sibilant track has some appeal too. However, you have to bear in mind in these instances that other consonants which may get past the sibilance detection (such as 'F' and 'T' sounds) could still cause problems in both these cases.
As you can see, when it comes to de‑essing, there are many ways to skin a cat. The reason it's worth knowing about the different approaches is simply that the ones which make most sense will usually depend on what specific mixing facilities you have available to you. Do check, though, that your chosen de‑esser doesn't alter the vocal sound when it's not supposed to be doing anything — most de‑essers won't, but I've been surprised to come across exceptions.
Irrespective of the tools on hand, the main thing is to take a pragmatic approach to dealing with the problem. In particular, you need to be realistic about the pros and cons of automatic de‑essing processors: while they can save a lot of time by dealing with the majority of a vocal's sibilance problems, most real‑world recordings with excessive sibilance will include a few instances that get the better of the automatic processing and are better tweaked into line by hand with automation of one kind or another. If you try to set a de‑esser plug‑in to rein in the very worst esses, then the likelihood is that the less offensive ones will lisp. One final thing not many people consider is that high‑frequency content on other tracks which happens to be in the vocal sibilance range can make the vocal esses sound worse than they really are, so you might actually need to EQ other tracks correctively to tackle obtrusive sibilants in some cases.
Most of what can be said about lead‑vocal de‑essing applies just as much to backing vocals, if not more so, owing to the fact that they're often more heavily compressed and EQ'd! Fortunately, you can, in many situations, get away with heavier de‑essing on layered BV parts without lisping creeping in; if you're not getting lisping in the context of the mix, don't worry too much if individual parts seem to be lisping from time to time. You may even find that you can bus the backing vocals to a single de‑esser and still get reasonable sibilance reduction, although there will be some cases where you need more individual control over the separate parts than this. Whatever level you choose for your BV sibilants, though, it's frequently necessary to tighten their timing, because otherwise the intelligibility of the lyrics can really suffer. Nowhere is this more crucial than when you're dealing with lush, panned, multitracked backing harmonies, as any games of 'pass the sibilant' across the stereo field will be really distracting and quickly make your production sound amateurish.
Choosing where to place a de‑esser can be a question of 'suck it and see', as each de‑esser responds differently. I normally process post‑compression and post‑EQ, as it's only then that I'm able to judge whether de‑essing is required. I leave any distortion‑based psychoacoustic enhancement until after the de‑esser, though, as the de‑essed sibilants then hit the enhancer's internal saturation at a lower level, and thus sound smoother. This processing order suits dynamics‑based de‑essers much better than audio‑chopping methods, which are better placed at the start of the chain. Not only is it easier to set up reliable triggering for the gating when it's the first thing in the chain, but it's also possible to EQ and/or compress one of the sliced audio streams independently.
Another location where you might find a de‑esser plug‑in is first in line in an effects‑return channel, where it is used to prevent sibilants splashing around undesirably in reverbs, delays, and modulation treatments. In this application, the de‑esser's setup tends to be much less critical, because even if the processed signal that's feeding the subsequent effect lisps like Daffy Duck, this will very rarely sound unnatural within a full mix. In fact, the only real pitfall to look out for is that if you alter the send level to the effects channel after the de‑esser's threshold has been configured, the sibilance‑reduction may stop working in the way you need it to. To be honest, I normally use a separate de‑essed send effect for each different group of vocal parts, so that I can just use the effects‑return channel's fader to change the level of the reverb (or whatever) in the mix.
Excessive sibilance is something that is much better dealt with while tracking and mixing than after the mix is complete. Nonetheless, there can be occasions when you want to bring down vocal sibilance levels at the mastering stage, in which case you usually need to have powerful tools on hand. While you could simply strap a dynamic EQ over the whole mix and get some useful improvement, there's an equal likelihood that you'll affect the clarity of other high‑frequency sounds at the same time. In this event, you might be able to gain more surgical control over a centrally‑panned lead vocal if you use M/S matrixing to separate out the mono components of your mix for independent processing. If that's no good, you'll either have to spend some quality time with your system's mixer automation or investigate more specialised spectrogram‑based audio‑restoration tools (such as those first featured in Cedar's pioneering Retouch), which are becoming ever more affordable.
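The M/S matrixing mentioned above is just sum-and-difference arithmetic, as this Python sketch shows. It uses the common convention of halving the sum and difference on encode so that decoding returns the original levels; the two-sample 'vocal' and 'hi-hat' signals are made up purely to keep the numbers readable.

```python
def encode_ms(left, right):
    """Convert left/right to mid (sum) and side (difference), each halved."""
    mid = [(lt + rt) * 0.5 for lt, rt in zip(left, right)]
    side = [(lt - rt) * 0.5 for lt, rt in zip(left, right)]
    return mid, side

def decode_ms(mid, side):
    """Convert mid/side back to left/right."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Hypothetical mix: a centrally panned vocal plus a hi-hat panned hard left.
vocal = [0.5, -0.5]
hihat = [0.25, 0.25]
left = [v + h for v, h in zip(vocal, hihat)]
right = list(vocal)

mid, side = encode_ms(left, right)

# Any de-essing would be applied to mid only; a plain level drop stands in for it here.
mid_processed = [m * 0.5 for m in mid]
new_left, new_right = decode_ms(mid_processed, side)
```

The vocal cancels out of the side signal entirely, so processing the mid channel leaves that part of the hi-hat alone. Bear in mind, though, that the mid channel still carries half of the hi-hat (and anything else with a mono component), so this gives more control rather than perfect isolation.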
This Download contains a demonstration multitrack project for Cockos Reaper. Reaper is available for Windows and Mac OS, and can be tried out for free — just surf over to www.reaper.fm and download the appropriate 3MB installer.
- Track 1 in the project contains a small section of rather sibilant vocal from one of our Mix Rescue mixes (the St. Vitus song 'Word Gets Around', remixed back in SOS November 2008). The remaining tracks show a variety of different ways (but by no means every way!) in which you can implement de-essing.
- Track 2 (Vox_ManualLevelRide): Fader automation has been manually drawn in to dip the levels of the three sibilant regions.
- Track 3 (Vox_ManualChopEsses) and Track 4 (Vox_ManualChopRemainder): Audio editing has been used to slice the audio region onto two different tracks, the upper one containing just sibilance and the lower containing the remaining audio. A lower fader setting on the upper track then achieves a reduction in the sibilance level.
- Track 5 (Vox_Send) and Track 6 (Vox_GatedSendEsses): Here the upper track is left unprocessed, but feeds a gate set up as a send effect. The gate is made sensitive to the sibilance by EQ'ing its detector signal and then the effect channel is polarity inverted so that the bursts of sibilance passing through the gate phase-cancel with the unprocessed track when they're mixed together. The fader level of the lower track adjusts the degree of sibilance reduction.
- Track 7 (Vox_SidechainComp): This is a common de-essing approach whereby a compressor is made sensitive to sibilance by EQ'ing its detector signal. The compression settings then adjust the amount of sibilance reduction.
- Track 8 (Vox_AutomatedEQ): Here Reaper's automation system is used to control an EQ peak cut, reducing the harshness of just the sibilant sections.
- Track 9 (Vox_ManualChopEQEsses) and Track 10 (Vox_ManualChopEQRemainder): Here the same basic approach has been used as on Track 3 and Track 4, but instead of adjusting the fader level of the sibilance-only track, an EQ peak has been used to target just the harshest frequency range.