Techniques For Vocal De-essing
Sound Advice
Songs suffering from superfluous sibilance? Study SOS's super strategies for sorting esses...
Mike Senior
When we talk about sibilance in relation to a vocal recording, we're referring primarily to the 's' sounds: noisy high-frequency consonants created by the turbulence as air whistles through a singer's teeth. Sibilant sounds (or sibilants for short) often pose technical problems in modern music, because a number of routine production decisions can conspire to emphasise them unnaturally. For a start, the overwhelming preference for close-miking vocals tends to highlight the noise components, and matters are often made worse if the vocal mic is positioned on the horizontal plane of the mouth, where sibilance is typically focused. The choice of mic can compound the problem, because bright mics are usually favoured for a forward-sounding timbre, and the most commonly used design, the large-diaphragm condenser, can have harsh-sounding high-frequency capsule resonances, especially in the case of budget models.
Mix processing of vocal parts is also a culprit. High-frequency EQ boost is par for the course on many recordings, to get the vocal to sound close to the listener and to cut through the mix, and this will clearly add emphasis to sibilants. Compression is also part of the problem, because most compressors won't react a great deal to the high-frequency energy produced by sibilance, so the sibilants will be controlled less assertively than the rest of the vocal signal, therefore rising in level, relatively speaking.
Whatever its root cause, excessive sibilance is a persistent concern in vocal production, but discussion of it tends to get relegated to a footnote in articles about compression or EQ. So in this article I'll explore in more detail the various tools that the mix engineer can call upon to deal with it.
Basic De-essing: Two Different Approaches
The simplest approach to de-essing is to turn down the level of the vocal signal whenever sibilance occurs. Some engineers do this manually, either by carefully editing vocal sibilants onto a separate track or by using detailed fader automation.

Dedicated de-essing plug-ins like Universal Audio's Precision De-esser (left) may be all you need, but attenuating each sibilant manually using automation (above) allows you to be more selective.
Manual de-essing isn't exactly an enthralling task, but it's nonetheless pretty straightforward to carry out in modern software recording systems, because sibilants show up on the vocal waveform as dense pseudo-random regions, which are a doddle to pick out by eye.
Given the dread with which most people look on the idea of manual de-essing, it's hardly surprising that a number of more 'set-and-forget' shortcuts to solving the sibilance problem have arisen. These rely on tuning dynamics processors to respond only to sibilance, a feat which is achieved by EQ'ing the signal feeding the processor's detector circuit independently of the signal passing through the processor's gain-reduction element. This works by virtue of the fact that sibilant noise bursts are usually focused somewhere in a region from 4-10kHz, and within this region they'll be much higher in level than any other element of the recording. If you can isolate just the sibilant frequency region in the dynamics processor's detector circuit, the processor's Threshold control can be set so that it only reacts to the sibilance.
Some all-in-one voice-channel processors allow you to switch their EQ into the detector circuit, but more often than not engineers set this up manually, using a separate side-chain or key input on a stand-alone dynamics module. Whichever method you use, the simplest thing is to try high-pass filtering the detector signal at around 4kHz, and you might find that this is all you need to do to get your dynamics processor triggering reliably. If you're still having trouble, try supplementing the filter with a whopping peak EQ boost at about 7.5kHz.
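To make the side-chain idea concrete, here is a minimal Python sketch of what high-pass filtering the detector signal achieves. Everything in it is an illustrative assumption rather than a description of any particular processor: the filter is a gentle first-order design (real side-chain filters are often steeper), and the 'vowel' and 'ess' are plain sine tones standing in for a sung note and a sibilant, whereas real sibilance is noise-like.

```python
import math

def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """First-order (6dB/octave) high-pass, standing in for the side-chain EQ."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        prev_out = a * (prev_out + x - prev_in)
        prev_in = x
        out.append(prev_out)
    return out

def rms(samples):
    """Root-mean-square level, used here as a crude detector reading."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def sine(freq_hz, amplitude, sample_rate, seconds):
    n = int(sample_rate * seconds)
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

SAMPLE_RATE = 44100
vowel = sine(200.0, 1.0, SAMPLE_RATE, 0.1)  # loud, low-frequency test tone
ess = sine(7000.0, 0.3, SAMPLE_RATE, 0.1)   # quieter, high-frequency test tone

# What the detector 'hears' once the 4kHz high-pass is in the side-chain.
detector_vowel = rms(one_pole_highpass(vowel, 4000.0, SAMPLE_RATE))
detector_ess = rms(one_pole_highpass(ess, 4000.0, SAMPLE_RATE))

print("full-band levels:", rms(vowel), rms(ess))
print("detector levels: ", detector_vowel, detector_ess)
```

Without the filter the louder low tone dominates the detector, but after filtering the quieter high tone reads as the louder of the two, which is what lets a Threshold setting pick out sibilance alone. The audio passing through the gain-reduction element is not filtered at all.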
Going back to our two manual de-essing methods, if you want to create an automatic version of the first (chopping out the sibilance onto another track), use a gate as your dynamics processor, setting it up as a send effect and inverting the polarity of the return channel.

De-essing using a gate on a send-effect channel: if you invert the channel's polarity, the signal let through the gate cancels out the corresponding part of the signal on the main vocal track.
When an 's' comes along, it'll get through the gate and phase-cancel that segment of the ungated track. The level of the return fader adjusts how much the sibilant segments will be reduced in level.
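The relationship between the return fader and the resulting sibilance reduction is not one-to-one in decibels, and a little arithmetic shows why. The sketch below assumes an idealised setup: the gate passes the sibilant unaltered while open, and the send path adds no latency or phase shift, so the inverted return simply subtracts a scaled copy of the vocal. The function name is hypothetical.

```python
import math

def sibilant_reduction_db(return_fader_db):
    """Level change of a sibilant when a polarity-inverted, gated copy
    is mixed in at the given return-fader setting (idealised model)."""
    g = 10.0 ** (return_fader_db / 20.0)  # fader setting as a linear gain
    if g > 1.0:
        raise ValueError("return gain above unity overshoots cancellation")
    if g == 1.0:
        return float("-inf")              # complete cancellation
    # dry signal minus g times itself leaves (1 - g) of the original
    return 20.0 * math.log10(1.0 - g)

for fader in (-20.0, -12.0, -6.0, -3.0):
    print(fader, "dB return ->", round(sibilant_reduction_db(fader), 2), "dB")
```

Under these assumptions a return fader at -12dB trims the sibilants by only about 2.5dB, while a return at roughly -6dB halves them. The closer the return gets to unity, the more drastic each small fader move becomes, and any delay or filtering in the send path will make the cancellation less complete than this model suggests.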
If you want to automate the second of our manual de-essing methods, just insert a compressor on the main vocal channel and adjust its threshold and ratio controls to clamp down appropriately on each instance of sibilance that's picked up. The latter setup is much more common (see, for example, Bob Clearmountain's recent mention of it in SOS March 2009), not only because it's simpler, but because the compression action will respond more or less firmly in proportion to how strident each individual instance of sibilance is. However, the former tactic has some unique advantages too, so don't write it off yet: I'll be coming back to it later.
Whether you use a compressor or a gate you'll need to choose suitable attack and release times, and in this application they need to be fairly fast because we're dealing with short bursts of high-frequency energy. I tend to use a sub-1ms attack time if possible, but even then I've found that the front end of some sibilants can still break through. Different models of compressor and gate may respond quite differently at such short time settings, so it's worth experimenting here if you've got a tough sibilance problem. I also like to use a dynamic processor with a lookahead facility if the sibilance is particularly problematic. With release times, again I tend to set these very short (around 10ms) so that the de-essing resets as soon as the sibilance is past. Longer release times will give you unmusical level lurches as the processor resets its gain-reduction.
With simple level-based de-essing like this, once your chosen processing setup is triggered quickly and cleanly by the sibilants, the only real choice you have to make is how much to pull down the level of the problem regions, and that's pretty much a question of taste, because norms vary with different musical styles. The only thing to watch out for is that if you soften the sibilance too much you'll start to make the singer sound like they're lisping, and they might think you're taking the pith.
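To illustrate why a short release lets the processing reset promptly, here is a small Python sketch of a generic envelope follower with separate attack and release smoothing. It assumes the common textbook convention that the attack and release times are exponential time constants; real compressors and gates define these times in different ways, so the numbers are indicative only.

```python
import math

def smoothing_coeff(time_ms, sample_rate):
    """One-pole smoothing coefficient for a given time constant (assumed convention)."""
    return math.exp(-1.0 / (time_ms * 0.001 * sample_rate))

def envelope_follower(samples, attack_ms, release_ms, sample_rate):
    """Track the rectified signal level, rising at the attack rate
    and falling at the release rate."""
    attack = smoothing_coeff(attack_ms, sample_rate)
    release = smoothing_coeff(release_ms, sample_rate)
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out

SAMPLE_RATE = 44100
# A crude stand-in for a sibilant: 20ms at full level, then 100ms of silence.
burst = [1.0] * 882 + [0.0] * 4410

fast = envelope_follower(burst, 0.5, 10.0, SAMPLE_RATE)   # 10ms release
slow = envelope_follower(burst, 0.5, 100.0, SAMPLE_RATE)  # 100ms release

check = 882 + 441 - 1  # the sample 10ms after the burst ends
print("10ms release:", round(fast[check], 3))
print("100ms release:", round(slow[check], 3))
```

Ten milliseconds after the burst ends, the detector with the 10ms release has already fallen to roughly a third of its peak, whereas the 100ms version is still reading around 90 percent, so a processor driven by it would still be holding the vocal down well after the sibilant has gone.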
Advanced Frequency-selective Processing
On occasion, simple level-based de-essing can struggle to control the harshness of a vocal's sibilance without lisping side-effects. This is typically because one portion of the sibilant frequency range is particularly strong, such that it sounds harsh even when the sibilance level as a whole is low enough to lisp. All the de-essing tactics we've looked at so far can be refined in response to this problem, but to understand how this works it makes sense first to cast your mind back to the two manual de-essing methods I mentioned at the outset. In the first one, where the sibilant regions are sliced out onto a separate audio track, you'll recall that you can achieve simple full-band de-essing by simply lowering the sibilance track's fader. However, if you first equalise this track to smooth off any particularly harsh-sounding frequency regions, then you can achieve a smoother vocal sound with less reduction in the overall sibilance levels, and potentially fewer lisping side-effects. If, on the other hand, you'd adopted the second manual de-essing approach of automating fader levels, you might improve your results by automating the level of an equaliser band centred on the harshest-sounding sibilant frequency.
Whichever of these routes you take, you can achieve similar-quality results with care, but the more automatic processing approaches are once again more appealing to the majority of engineers. Although it's not impossible to implement such processing for yourself in some software sequencers if you work from first principles, the setup can get quite involved, so this is the point at which most recording musicians turn to a specialised processor called a de-esser.
De-essers fall roughly into two main types, along the lines of our manual de-essing methods. The first of these involves the processor automatically slicing out the sibilant audio segments, allowing you to fade and EQ the sibilant audio stream to taste; while the second provides a single band of dynamic EQ that can be targeted at the harshest sibilance region and will pull down a specified range of frequencies during moments of sibilance. Almost all off-the-shelf de-essers use some version of the latter algorithm, so let's look in more detail at those first, before discussing the extra possibilities offered by the signal-chopping paradigm.

A dynamic EQ set up to de-ess a vocal. The EQ's upper band (Band 4) kicks in when the signal in that band exceeds a user-determined threshold on the internal side-chain signal.
With dynamic EQ-based de-essers, there are a number of things you usually need to do to set them up. First of all, you have to get them triggering reliably, as with the simpler full-band de-essing methods, and the side-chain EQ required to do this is usually built into the specialist de-essers. You'll also usually get a button allowing you to listen to the EQ'd side-chain independently, and this usually makes it much easier to find effective values for the EQ parameters. In general terms, the nastier you make the sibilance sound in the side-chain, the easier it'll be for the de-esser to detect it!
Once the de-essing is triggering in the right places, you then turn to a set of controls that define how the dynamic EQ cut responds when triggered. Typically, you'll get a frequency control with which you can home in on the worst-sounding areas of the sibilance, and some kind of Range or Sensitivity parameter that lets you decide how assertively the EQ kicks in when sibilance is detected. On some de-essers (such as the Waves Renaissance De-esser), you get more detailed EQ controls, much as you might on a full-featured parametric EQ, allowing you, for example, to choose different filter types or adjust the bandwidth of the selected filter.
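For readers who like to see the principle in code, the following Python sketch shows a much-simplified frequency-selective de-esser. It is not how any named product works: instead of a true dynamic EQ bell, it splits the signal into a low band and a high band with a first-order crossover and turns down only the high band when that band exceeds a threshold. The function name, the instant attack, and the parameters (`split_hz`, `threshold`, `max_cut_db` playing the role of a Range control) are all assumptions made for illustration.

```python
import math

def split_band_deess(samples, sample_rate, split_hz=5000.0,
                     threshold=0.1, max_cut_db=-9.0, release_ms=10.0):
    """Attenuate only the band above split_hz when its level exceeds threshold."""
    rc = 1.0 / (2.0 * math.pi * split_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)                       # one-pole low-pass coefficient
    release = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))
    floor_gain = 10.0 ** (max_cut_db / 20.0)     # deepest cut allowed (Range)
    low = 0.0
    env = 0.0
    out = []
    for x in samples:
        low += alpha * (x - low)                 # low band
        high = x - low                           # everything else is the high band
        level = abs(high)
        env = level if level > env else release * env  # instant attack, smooth release
        if env <= threshold:
            gain = 1.0                           # below threshold: leave untouched
        else:
            gain = max(threshold / env, floor_gain)
        out.append(low + gain * high)
    return out

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def sine(freq_hz, amplitude, sample_rate, n):
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

SAMPLE_RATE = 44100
ess = sine(7000.0, 0.5, SAMPLE_RATE, 4410)    # sine stand-in for a sibilant
vowel = sine(200.0, 0.5, SAMPLE_RATE, 4410)   # sine stand-in for a sung note

print("ess in/out RMS:  ", rms(ess), rms(split_band_deess(ess, SAMPLE_RATE)))
print("vowel in/out RMS:", rms(vowel), rms(split_band_deess(vowel, SAMPLE_RATE)))
```

With these test tones the high-frequency one is pulled down while the low-frequency one passes through essentially unchanged, because its energy in the upper band never reaches the threshold. A real dynamic EQ would use a narrower, tunable filter shape so that only the harshest part of the sibilance is cut.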
Beyond Dynamic EQ
Dynamic EQ-based de-essing can be pretty powerful, but some producers still prefer to work by chopping out the sibilant sections and then EQ'ing the sibilance-only track. Currently I know of only one manufacturer offering this kind of processing approach in plug-in form, and that's Eiosis, with their E2 De-esser, but it's certainly possible to imitate this process from first principles in most sequencers, by expanding on the gated-send approach I've already touched on. Whatever way you decide to go, having the sibilants on a separate mixer channel affords probably the most powerful processing options. Even if you decide only to EQ the sibilance, you can create a much more complex EQ curve with even a simple parametric EQ plug-in than is possible in any dedicated de-esser plug-in, and that EQ can be combined with compression for simultaneous level-only control.

Chopping out the esses from a vocal part in order to EQ them without affecting the rest of the vocal.
However, having all the non-sibilant parts on a separate track has its own set of advantages too. For a start, you can EQ to add general brightness without making the sibilants harsh, and you can also make freer use of psychoacoustic enhancers such as Aphex's Aural Exciter range. Because these kinds of enhancers often add new high-frequency distortion components, they can make sibilants very harsh-sounding without actually increasing their signal levels a tremendous amount, something that is hard to remedy with most de-essing strategies; much better to avoid the problem altogether by leaving the sibilants unprocessed! Sibilance can also give away the action of many send effects, particularly reverbs and delays, so the option of sending to such effects from your separate non-sibilant track has some appeal too. However, you have to bear in mind in these instances that other consonants which may get past the sibilance detection (such as 'F' and 'T' sounds) could still cause problems in both these cases.
The Big Picture
As you can see, when it comes to de-essing, there are many ways to skin a cat. The reason it's worth knowing about the different approaches is simply that the ones which make most sense will usually depend on what specific mixing facilities you have available to you. Do check, though, that your chosen de-esser doesn't alter the vocal sound when it's not supposed to be doing anything. Most de-essers won't, but I've been surprised to come across exceptions.
Irrespective of the tools on hand, the main thing is to take a pragmatic approach to dealing with the problem. In particular, you need to be realistic about the pros and cons of automatic de-essing processors: while they can save a lot of time by dealing with the majority of a vocal's sibilance problems, most real-world recordings with excessive sibilance will include a few instances that get the better of the automatic processing and are better tweaked into line by hand with automation of one kind or another. If you try to set a de-esser plug-in to rein in the very worst esses, then the likelihood is that the less offensive ones will lisp. One final thing not many people consider is that high-frequency content on other tracks which happens to be in the vocal sibilance range can make the vocal esses sound worse than they really are, so you might actually need to EQ other tracks correctively to tackle obtrusive sibilants in some cases.
Working With Backing Vocals
Most of what can be said about lead-vocal de-essing applies just as much to backing vocals, if not more so, owing to the fact that they're often more heavily compressed and EQ'd! Fortunately, you can, in many situations, get away with heavier de-essing on layered BV parts without lisping creeping in; if you're not getting lisping in the context of the mix, don't worry too much if individual parts seem to be lisping from time to time. You may even find that you can bus the backing vocals to a single de-esser and still get reasonable sibilance reduction, although there will be some cases where you need more individual control over the separate parts than this. Whatever level you choose for your BV sibilants, though, it's frequently necessary to tighten their timing, because otherwise the intelligibility of the lyrics can really suffer. Nowhere is this more crucial than when you're dealing with lush, panned, multitracked backing harmonies, as any games of 'pass the sibilant' across the stereo field will be really distracting and quickly make your production sound amateurish.
Where Should I De-ess?
Choosing where to place a de-esser can be a question of 'suck it and see', as each de-esser responds differently. I normally process post-compression and post-EQ, as it's only then that I'm able to judge whether de-essing is required. I leave any distortion-based psychoacoustic enhancement until after the de-esser, though, as the reduced sibilants then hit the enhancer's internal saturation at a lower level, and thus sound smoother. This processing order suits dynamics-based de-essers much better than audio-chopping methods, which are better placed at the start of the chain. Not only is it easier to set up reliable triggering for the gating when it's the first thing in the chain, but it's also possible to EQ and/or compress one of the sliced audio streams independently.
Another location where you might find a de-esser plug-in is first in line in an effects-return channel, where it is used to prevent sibilants splashing around undesirably in reverbs, delays, and modulation treatments. In this application, the de-esser's setup tends to be much less critical, because even if the processed signal that's feeding the subsequent effect lisps like Daffy Duck, this will very rarely sound unnatural within a full mix. In fact, the only real pitfall to look out for is that if you alter the send level to the effects channel after the de-esser's threshold has been configured, the sibilance-reduction may stop working in the way you need it to. To be honest, I normally use a separate de-essed send effect for each different group of vocal parts, so that I can just use the effects-return channel's fader to change the level of the reverb (or whatever) in the mix.
De-essing At The Mastering Stage
Excessive sibilance is something that is much better dealt with while tracking and mixing than after the mix is complete. Nonetheless, there can be occasions when you want to bring down vocal sibilance levels at the mastering stage, in which case you usually need to have powerful tools on hand. While you could simply strap a dynamic EQ over the whole mix and get some useful improvement, there's an equal likelihood that you'll affect the clarity of other high-frequency sounds at the same time. In this event, you might be able to gain more surgical control over a centrally-panned lead vocal if you use M/S matrixing to separate out the mono components of your mix for independent processing. If that's no good, you'll either have to spend some quality time with your system's mixer automation or investigate more specialised spectrogram-based audio-restoration tools (such as those first featured in Cedar's pioneering Retouch), which are becoming ever more affordable.
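The M/S matrixing mentioned above boils down to simple sums and differences of the left and right channels. Here is a minimal Python sketch, assuming the common convention in which Mid and Side are each scaled by a half so that decoding returns the original levels (other tools may scale differently). The function names, and the `mid_processor` argument standing in for whatever de-esser you apply, are hypothetical.

```python
def ms_encode(left, right):
    """Convert left/right samples into Mid (sum) and Side (difference)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Convert Mid/Side samples back into left/right."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def process_mid_only(left, right, mid_processor):
    """Apply a processor to the Mid signal alone, leaving Side untouched."""
    mid, side = ms_encode(left, right)
    return ms_decode(mid_processor(mid), side)

def halve(signal):
    """Trivial stand-in for a de-esser: a fixed 6dB cut."""
    return [0.5 * s for s in signal]

# A centrally panned sound (identical in both channels) takes the full cut...
print(process_mid_only([0.5, -0.25], [0.5, -0.25], halve))
# ...while a purely out-of-phase component lives in Side and is unaffected.
print(process_mid_only([0.5, -0.25], [-0.5, 0.25], halve))
```

Bear in mind that anything panned off-centre still contributes to the Mid signal, so processing Mid alone reduces the collateral damage rather than isolating the vocal completely.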
Downloads
This download contains a demonstration multitrack project for Cockos Reaper. Reaper is available for Windows and Mac OS, and can be tried out for free: just surf over to www.reaper.fm and download the appropriate 3MB installer.
Track 1 in the project contains a small section of rather sibilant vocal from one of our Mix Rescue mixes (the St. Vitus song, 'Word Gets Around' remixed back in SOS November 2008). The remaining tracks show a variety of different ways (but by no means every way!) in which you can implement de-essing.
Track 2 (Vox_ManualLevelRide): Fader automation has been manually drawn in to dip the levels of the three sibilant regions.
Track 3 (Vox_ManualChopEsses) and Track 4 (Vox_ManualChopRemainder): Audio editing has been used to slice the audio region onto two different tracks, the upper one containing just sibilance and the lower containing the remaining audio. A lower fader setting on the upper track then achieves a reduction in the sibilance level.
Track 5 (Vox_Send) and Track 6 (Vox_GatedSendEsses): Here the upper track is left unprocessed, but feeds a gate set up as a send effect. The gate is made sensitive to the sibilance by EQ'ing its detector signal and then the effect channel is polarity inverted so that the bursts of sibilance passing through the gate phase-cancel with the unprocessed track when they're mixed together. The fader level of the lower track adjusts the degree of sibilance reduction.
Track 7 (Vox_SidechainComp): This is a common de-essing approach whereby a compressor is made sensitive to sibilance by EQ'ing its detector signal. The compression settings then adjust the amount of sibilance reduction.
Track 8 (Vox_AutomatedEQ): Here Reaper's automation system is used to control an EQ peak cut, reducing the harshness of just the sibilant sections.
Track 9 (Vox_ManualChopEQEsses) and Track 10 (Vix_ManualChopEQRemainder): Here the same basic approach has been used as on Track 3 and Track 4, but instead of adjusting the fader level of the sibilance-only track, an EQ peak has been used to target just the harshest frequency range.