Tracking Vocals Without Headphones

Kick The Cans! By Matt Houghton & Neil Rogers
Published October 2015

Live rooms and headphones can kill the mood for some vocalists — so why not ditch the cans and record in front of your monitors?

From a technical point of view, tracking vocals is normally one of the simplest recording tasks. Very good-sounding results are achievable using modestly priced gear even in sub–par rooms. For starters, vocalists are often close-miked, which puts the voice–to–room–sound ratio in your favour. Where the room sound remains an issue, creating an acoustically ‘dry’ area behind the vocalist and mic with a polyester duvet (or something more stylish and expensive!), positioning a pop shield, setting headphone levels and coaching a singer to use some basic mic technique are all fairly straightforward means of improving matters.

None of this advice is of much use, though, if a singer is struggling to deliver a great performance because they’re ill at ease in the studio. Anxiety can manifest itself in a singer’s inability to be as accurate with pitch as they normally are, or in performances that lack the required energy or emotion. Unfortunately, helping a singer to raise their game when they’re feeling anxious, uncomfortable or lacking confidence can be right up at the other end of the difficulty spectrum!

Working With Cans

One factor that can influence a singer’s ability to perform is the recording environment. Some artists like to work on creating a more comfortable atmosphere in the studio — we’ve known more than one bring in candles and insist on removing their shoes before performing! However off the wall their approach, if it puts them at ease and helps them ‘get into the zone’ it’s likely to be a good thing for the recording.

For some people, though, the very fact of having to wear headphones makes them uncomfortable. That’s hardly surprising: no matter how much effort you put into perfecting the cue mix, singing while wearing headphones isn’t exactly a natural situation! Some vocalists try to overcome the problem by taking the headphones off one ear. That means they can hear the natural sound of their voice and the live–room acoustics via their ‘free’ ear, while listening to the cue mix in the other. When working with such an artist, note that they often position the spare earcup so that the cue mix spills into the mic, so make sure they can hear the whole mix in the channel they’re listening to and, if possible, mute the other. Also, note that even if that approach feels more natural, a vocalist may still find that being ‘put on the spot’ in a live room or booth makes them anxious.

In Control (Room)

Neil sets up the vocal mics in front of the monitor speakers in his control room at Half Ton studios in Cambridge.

Let’s examine one way in which you might overcome such problems: inviting the vocalist to perform their part in the control room, with their cue mix playing over the speakers. The advice applies equally to the typical home recordist, who has no choice but to record and monitor in the same room and yet, in our experience, typically still relies on headphones. While you might imagine that performing in front of speakers would conjure a nightmare of spill and phase cancellation, it needn’t actually be more problematic than the conventional headphones and live–room/vocal–booth approach; it’s just that you’re faced with a different set of issues. Importantly, it allows for direct communication between singer and engineer, and frees the singer entirely from the shackles of headphones.

Although we’ve used these techniques in the past, it’s usually been in the heat of a session, and we haven’t had the luxury of time to experiment. So, to help us prepare this article, we spent half a day in the studio tracking vocals for local Cambridge band Fred’s House, trying out a few variations on the technique and figuring out which yielded the best results in terms of eradicating the problems of spill and of encouraging a good performance.

Spill Versus Sonics

The primary technical concern when recording in front of the speakers is how to minimise the monitor–mix spill on the recorded vocal — too much spill makes it difficult to process the vocal part when mixing, and even where processing is minimal, spill can cause undesirable phase cancellation that affects the tonality of the corresponding parts in the mix.

In theory, there are several ways of jumping this particular hurdle. You can, for instance, bring the vocalist closer to the mic’s diaphragm. This increases the ratio of voice to spill — so the spill ends up lower in level and therefore less problematic. That’s one reason why so many people tend to associate moving–coil dynamic mics such as the Shure SM58 with the sing–in–front–of–speakers approach, but there are plenty of alternative handheld mics, some rather more high fidelity, that could be used in this role (see the Handheld Mics box). A different tactic, potentially, is to select a mic whose polar pattern enables you to reject much of the sound being played back via the speaker.
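To put rough numbers on the close-miking point, here is a minimal sketch that assumes the voice behaves like a point source in free field (the inverse-distance law), which real singers and rooms only approximate. The distances are hypothetical and were not measured in our session. Because the speakers stay where they are, the spill reaching the mic is unchanged, so any rise in voice level is a direct gain in the voice-to-spill ratio.

```python
import math


def level_change_db(old_distance_m, new_distance_m):
    """Direct-sound level change when a source moves closer to the mic.

    Assumes an idealised point source in free field (inverse-distance law).
    """
    return 20.0 * math.log10(old_distance_m / new_distance_m)


def voice_to_spill_gain_db(old_voice_distance_m, new_voice_distance_m):
    # The speakers do not move, so spill at the mic stays the same and the
    # ratio improves by exactly the rise in voice level.
    return level_change_db(old_voice_distance_m, new_voice_distance_m)


if __name__ == "__main__":
    # Hypothetical singer-to-mic distances in metres.
    for old, new in [(0.30, 0.15), (0.30, 0.05)]:
        gain = voice_to_spill_gain_db(old, new)
        print(f"{old:.2f} m -> {new:.2f} m: {gain:.1f} dB better voice-to-spill ratio")
```

In practice you would turn the preamp down to compensate for the louder voice, which is why the spill ends up lower in level on the recording.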

Another option is to use one of several variations of the phase–cancellation technique, the aim of which is to neutralise the spill in one of two ways. The first is to play the cue mix in mono, invert the polarity of one of your two speakers, and place the vocal mic equidistant from both speakers. The idea is that the out–of–phase (but otherwise identical) mono signals from each speaker cancel out in the middle, and so aren’t picked up by the mic. The second requires you to play back and capture the cue mix using the vocal mic, as a separate take from the vocal itself. The polarity of this ‘instrumental’ take is inverted in your DAW, and so should, in theory, neatly cancel any spill captured when tracking the vocal.
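The two–take idea can be sketched with synthetic signals. This is an idealised illustration only: the sine waves standing in for the voice and the spill, and all the helper names, are assumptions made for the example, and it assumes the spill is captured identically on both passes, which never quite happens in a real room.

```python
import math

SAMPLE_RATE = 48000  # assumed sample rate for the illustration


def sine(freq_hz, amplitude, n_samples):
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def mix(*tracks):
    """Sum tracks sample by sample, as a DAW bus would."""
    return [sum(samples) for samples in zip(*tracks)]


def invert(track):
    """Polarity inversion: flip the sign of every sample."""
    return [-x for x in track]


def peak(track):
    return max(abs(x) for x in track)


def two_take_cancellation(vocal, spill):
    vocal_take = mix(vocal, spill)   # pass 1: singer plus speaker spill
    instrumental_take = list(spill)  # pass 2: speakers only, same mic
    return mix(vocal_take, invert(instrumental_take))


if __name__ == "__main__":
    vocal = sine(220.0, 0.5, 4800)   # stand-in for the voice
    spill = sine(1000.0, 0.1, 4800)  # stand-in for the monitor spill
    cleaned = two_take_cancellation(vocal, spill)
    print("peak spill left after cancellation:", peak(mix(cleaned, invert(vocal))))
```

With identical spill on both passes the residue is down at the level of floating-point rounding; the practical question is how far real-world differences between the two passes erode that.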

There are potential downsides to both approaches. The inverted speaker–polarity trick is a bad choice for several reasons. It requires you to play back the monitor mix in mono, and as the aim is to make things feel more natural for the vocalist, that’s not ideal. More importantly, it sounds odd for the vocalist standing between the speakers — it can feel disorientating, almost as if the sound is being sucked out of the middle of your head. Again, not really an improvement over headphones! There’s also a practical problem, because to get the best results you really need to place the mic between the speakers, and that’s a physical position that will require you to move any mixing desk or other studio furniture.

The biggest downside of the recorded–cue–mix technique is that it takes more time than conventional recording because you need to capture two takes: one with the vocal and one without. That said, as long as you use an identical cue mix at the same volume each time you do a take, you only need to capture the cue mix via the mic once and either double its level or duplicate its track in the DAW for every additional vocal take. There’s the risk of things like mic–stand droop causing small differences in mic position, though, resulting in less than perfect cancellation, unless you recapture the cue mix.
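As a small worked example of the ‘double its level’ point, the sketch below sums several vocal takes that share the same spill and subtracts one instrumental capture scaled by the number of takes. The sample values and function names are made up for illustration, and the spill is assumed to be identical on every pass.

```python
import math


def cancel_shared_spill(vocal_takes, instrumental_take):
    """Sum the vocal takes, then subtract the one instrumental capture
    scaled by the number of takes (same result as duplicating the
    polarity-inverted track once per take)."""
    n_takes = len(vocal_takes)
    summed = [sum(samples) for samples in zip(*vocal_takes)]
    return [s - n_takes * i for s, i in zip(summed, instrumental_take)]


def required_gain_db(n_takes):
    """Level offset for the single instrumental track, relative to unity."""
    return 20.0 * math.log10(n_takes)


if __name__ == "__main__":
    spill = [0.01, -0.02, 0.03]    # made-up spill samples
    lead = [0.10, 0.20, -0.30]     # made-up clean vocal samples
    double = [0.00, 0.05, 0.10]
    takes = [[v + s for v, s in zip(lead, spill)],
             [v + s for v, s in zip(double, spill)]]
    print(cancel_shared_spill(takes, spill))
    print(f"gain for two takes: +{required_gain_db(2):.2f} dB")
```

Doubling the level corresponds to roughly +6dB on the instrumental track’s fader for two simultaneous vocal takes.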

As our examples will show, you’re likely to attain the best results using this basic two–take cancellation tactic when you try a few refinements. While surprisingly effective, the cancellation isn’t perfect in practice and so you need to find ways to minimise the audible imperfections.

Testing Techniques

Before the band joined us, we discussed which tests to do. We decided to set up two mics side by side. The first was a Shure SM7 cardioid dynamic, a model that’s typically used in fairly close proximity to the vocalist’s mouth, and the second was a Neumann U87 multi–pattern large–diaphragm condenser mic, which we knew would suit the particular vocalists singing on this session. This would enable us to consider what sort of tonal sacrifices would need to be made if the dynamic mic offered greater rejection of spill, as well as to compare the results using the U87’s cardioid and figure–of–eight patterns. Although this meant the mics would perhaps not be quite optimally positioned, it did mean we could try all options with both mic types, without requiring a ridiculous number of takes.

Two mics were used simultaneously while testing: a Shure SM7 moving-coil dynamic and a Neumann U87 multi-pattern capacitor.

The mics were placed about two and a half metres in front of the speakers, so that the singer was looking straight towards the centre of the monitors’ stereo image, and both mics were hooked up via identical Focusrite ISA preamps. The acoustic treatment at the rear and sides of the control room provided a nice, controlled environment for the recordings, neither too dry nor lifeless, and all the vocals were recorded without compression or EQ. Having let the vocalists warm up, we captured a first take using a headphone monitor mix in the usual way to provide a sort of reference point. Then, with the mics in the same position, we moved on to the sing–in–front–of–speaker techniques.

How Loud?

As the main reason for recording like this is to create the right vibe and allow the singer to perform naturally, the level of playback is important. So, ignoring the potential for spill for the time being, task number one was to work with the first vocalist, Griff, to set a level that would enable him to sing with the right amount of energy for the song and for his voice to work properly. We found that a level just above the point at which you could comfortably have a conversation across the room was loud enough for Griff to do his thing without it becoming too oppressive after more than a few minutes.

The optimum monitor–mix level might be different for each vocalist and for different material; a heavy rock part requiring a particularly gutsy vocal performance might benefit from it being louder still, for example. But bear in mind that a key reason we settled on the chosen level was that Griff was still able to hear his own voice in the room when singing; a louder cue mix might negate this benefit.

As we were arriving at this conclusion, we’d recorded some takes so that we could also discover how much spill resulted from the different monitor levels, without any of the phase–cancellation trickery applied. We captured one take with the monitor mix at the preferred volume and one at a slightly quieter volume — quiet enough to be able to talk comfortably over the music. As you’d expect, there was noticeably less spill with the quieter take. However, Griff made it known that he felt very uncomfortable singing with the music at this lower level.

We also took the opportunity to compare the recordings captured via the Neumann and Shure mics. There wasn’t a huge amount of difference between the two in terms of the room sound and speaker spill captured — and it’s worth mentioning that the spill on both was lower than we’d imagined it would be, and low enough that it would probably have been possible to get an acceptable result using these takes without further work.

It’s worth noting that there might have been scope to move the singer an inch or two closer to the SM7 than the U87, which would obviously have improved the spill performance further. We all preferred the sound of the U87, though, which is reason enough to explore ways to reduce the spill.

Reducing Spill With EQ

Next, without using any phase trickery, we tried the simplest way of reducing the level of spill further: EQ. Rather than EQ the spill on the vocal recording itself (which would affect the vocal part’s tonality), we used an EQ plug–in on the DAW’s stereo master bus, the idea being to remove problematic elements from Griff’s monitor mix. That way, it couldn’t spill in the first place!

The most audibly annoying spill on the earliest takes was in the higher frequencies, and as mix engineers in some genres often emphasise breath or ‘air’ frequencies on vocal recordings, it made sense to start by applying a low–pass filter. The idea was to roll it down gradually, until Griff felt that his monitor mix was compromised, and then roll it back up slightly to find a turnover frequency that would eliminate much of the HF spill without affecting the performance. Eventually, we settled on a 12dB/octave LPF set just below 8kHz. Although Griff said he could notice a difference when comparing the EQ’d and unprocessed mixes, he didn’t feel that he was losing any of the information he needed to be able to perform, and wouldn’t be remotely worried about singing to the EQ’d version. In terms of results, this tactic met its aims rather well: Griff seemed able to perform quite naturally to the EQ’d monitor mix, and the recorded result was much cleaner, without sacrificing anything in terms of the recorded vocal’s tonality.

By using a low–pass filter on the playback in the room, the spill in the microphone will be kept away from the high-end ‘air’ frequencies of the vocal. Our tests showed that you can also remove other problematic elements of spill in the same way, such as the snare drum, which is the 5kHz cut in the second screen.
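For a feel of how much a 12dB/octave low–pass filter takes out of the monitor mix, here is a minimal sketch. It assumes a second–order Butterworth response and an 8kHz turnover; the plug–in we used may have a different filter shape, and our setting was just below 8kHz, so treat the figures as indicative only.

```python
import math


def lowpass_attenuation_db(freq_hz, cutoff_hz, order=2):
    """Magnitude response of an ideal Butterworth low-pass filter in dB.

    order=2 gives the 12dB/octave slope well above the cutoff; the
    Butterworth shape itself is an assumption made for this sketch.
    """
    ratio = freq_hz / cutoff_hz
    return -10.0 * math.log10(1.0 + ratio ** (2 * order))


if __name__ == "__main__":
    cutoff_hz = 8000.0  # assumed turnover frequency
    for freq_hz in (1000.0, 4000.0, 8000.0, 12000.0, 16000.0):
        att = lowpass_attenuation_db(freq_hz, cutoff_hz)
        print(f"{freq_hz / 1000:>4.0f} kHz: {att:6.1f} dB")
```

The point to take from the numbers is that the vocal’s ‘air’ region is attenuated in the playback well before it reaches the mic, while the range the singer relies on for pitch and timing is barely touched.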

Snare Solution

Despite our satisfaction with the results, there remained some spill issues that we felt could be improved. The snare drum, for instance, remained very audible on the vocal recording. While this spill didn’t really detract from the vocal, it had the potential to detract from the sound of the snare drum in the mix. As the snare was important to the song’s groove, it was important for the singer to hear it clearly while tracking so we couldn’t simply mute the part! Instead, we repeated the filtered–monitor–mix process, but this time in addition to the 8kHz LPF we added a 3–4dB cut at around 5kHz, which softened the snare somewhat, without really interfering with the groove. Griff noticed a slight change, as we suspected he would when asked, but it was not enough for him to be at all distracted. In fact, he seemed quite happy singing to the playback with the EQ applied, and it certainly had the desired effect of reducing the level of the snare drum in the spill.
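The 5kHz dip can be sketched as a standard peaking biquad, using the widely published Audio EQ Cookbook formulas. The sample rate, the Q value and the exact 3.5dB cut are assumptions for the example; we did not note the bandwidth used on the day, and your EQ plug–in may implement its bell curve differently.

```python
import cmath
import math


def peaking_eq_coefficients(sample_rate_hz, centre_hz, gain_db, q):
    """Biquad coefficients (b, a) for a peaking EQ, Audio EQ Cookbook form."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * centre_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin)
    a = (1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin)
    return b, a


def magnitude_db(b, a, freq_hz, sample_rate_hz):
    """Evaluate the biquad's gain in dB at one frequency."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate_hz)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(num / den))


if __name__ == "__main__":
    fs = 48000.0  # assumed sample rate
    # Hypothetical settings: 3.5dB cut at 5kHz with a moderate Q of 1.4.
    b, a = peaking_eq_coefficients(fs, 5000.0, -3.5, 1.4)
    for f in (500.0, 2500.0, 5000.0, 10000.0):
        print(f"{f / 1000:>5.1f} kHz: {magnitude_db(b, a, f, fs):6.2f} dB")
```

A gentle cut like this leaves the low end of the snare and the rest of the groove largely intact, which is consistent with Griff barely noticing the change.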

You could get much more sophisticated with this technique, removing non–critical mix elements, and EQ’ing the mix–bus to your heart’s content. And if you’re working alone in a single–room home studio, it’s well worth putting some effort into this — you might be surprised by just how spill–free a result you can achieve. But you have to balance that against the natural flow of a real recording session and consider whether it would disrupt things too much.

There’s one final important point to make that didn’t crop up in the tests: given that there’s going to be some spill when working in this way, you shouldn’t include anything in the cue mix that isn’t going to be in the final mix, such as click tracks and speculative guide parts.

Opposites Attract

Next we tried the rudimentary phase–cancellation technique described earlier. We first tracked the vocal to a backing track playing over the speakers, as before, and then recorded the same track played back at the same level over the speakers, using the same fixed–position vocal mics, but without any vocal. We then polarity-inverted (or ‘phase–flipped’) this instrumental take in the DAW to see how close to perfect the phase–cancellation of the spill was, and whether it had interfered with the recorded vocals in any way.

We’d used this technique previously to remove some click–track spill and had been pleasantly surprised with how successful it had been. The same was true here: the phase cancellation did a remarkably good job of reducing the spill. Yet, while the result was probably usable, it wasn’t perfect. While the technique was effective in removing the low–frequency and mid-range spill, the higher frequencies didn’t cancel quite so precisely. Also, loud, transient–rich sounds, including the snare drum hits, remained audible. On the plus side the vocal part itself suffered no ill effects. In fact, it sounded good. Our only concerns were how the remaining spill might interact with the corresponding sounds in the mix, and how we might best tackle any problems.
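One plausible contributor to the weaker high–frequency cancellation is a tiny timing difference between the two passes, for instance from the mic moving by a millimetre or so. We did not measure this in the session, so the sketch below is a what-if rather than a diagnosis: it computes how much of a sine wave survives when it is subtracted from a copy of itself delayed by a given path difference.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate, for air at room temperature


def residual_db(freq_hz, path_difference_m):
    """Level of what is left (relative to the original spill) when a sine
    is cancelled against a copy delayed by the given path difference.

    Uses sin(wt) - sin(w(t - tau)), whose amplitude is 2*|sin(pi*f*tau)|.
    """
    delay_s = path_difference_m / SPEED_OF_SOUND_M_S
    residual = 2.0 * abs(math.sin(math.pi * freq_hz * delay_s))
    return 20.0 * math.log10(residual) if residual > 0.0 else float("-inf")


if __name__ == "__main__":
    shift_m = 0.001  # hypothetical 1mm change in mic position
    for freq_hz in (100.0, 1000.0, 4000.0, 8000.0):
        print(f"{freq_hz:>6.0f} Hz: {residual_db(freq_hz, shift_m):6.1f} dB")
```

Under these assumptions the same small shift that leaves bass spill buried far below the original level leaves far more of the top end intact, which matches what we heard.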

When using this technique, it’s important to ensure that the playback level is the same when recording the instrumental as when you record the vocal and that the mic position doesn’t change. If those factors vary, the cancellation will be only partial and insufficient to be of use. Another potential drawback is that if you have people in the control room during the session (this is about getting the atmosphere right, after all) any noise they make whilst tracking will be recorded. They’ll tend to keep quiet while the vocalist is singing, but you’d be amazed at the temptation to comment while playing back the instrumental monitor mix you need to record!
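To show why matching the playback level matters, this sketch estimates how much spill survives when the instrumental pass is captured slightly louder or quieter than the spill on the vocal pass. It assumes the two captures are otherwise identical, and the mismatch values are hypothetical.

```python
import math


def residual_from_level_mismatch_db(mismatch_db):
    """Spill left after cancellation, relative to the original spill level,
    when the instrumental capture differs in level by mismatch_db."""
    gain = 10.0 ** (mismatch_db / 20.0)
    residual = abs(1.0 - gain)
    return 20.0 * math.log10(residual) if residual > 0.0 else float("-inf")


if __name__ == "__main__":
    for mismatch_db in (0.1, 0.5, 1.0, 3.0):
        left = residual_from_level_mismatch_db(mismatch_db)
        print(f"{mismatch_db:>4.1f} dB mismatch leaves spill at {left:6.1f} dB")
```

Even a small nudge of the monitor level control between passes costs a lot of cancellation, so it is worth leaving the control untouched until both passes are in the bag.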

Having recorded a pass of the instrumental played over the speakers using the vocal mics, we inverted the polarity of that channel to try to eliminate the spill on the vocal track through phase cancellation.

It’s also important to think about how you process the two channels when mixing. You need to combine the two tracks before any processing, level change or panning. So you’ll either want to route them to a group bus, or render them as a single part before you start processing. The latter’s probably the better option, as you’ll have no use for the individual tracks in the mix. Also, if you’re using one instrumental take for, say, three vocals, you’ll need to copy the instrumental take to partner up with each individual vocal take.
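The reason for combining before processing can be demonstrated with a toy example. Here a tanh soft clipper stands in for any non-linear processor (compression, saturation and so on); the signals, the drive amount and the function names are all assumptions made for the illustration.

```python
import math

SAMPLE_RATE = 48000  # assumed sample rate


def sine(freq_hz, amplitude, n_samples):
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def saturate(track, drive=3.0):
    """Toy non-linear processor standing in for compression or saturation."""
    return [math.tanh(drive * x) for x in track]


def spill_residue(vocal, spill):
    """Return (combine-then-process, process-then-combine) peak errors,
    each measured against the processed clean vocal."""
    vocal_take = [v + s for v, s in zip(vocal, spill)]
    inverted = [-s for s in spill]
    target = saturate(vocal)

    combined_first = saturate([a + b for a, b in zip(vocal_take, inverted)])
    processed_first = [a + b for a, b in
                       zip(saturate(vocal_take), saturate(inverted))]

    err_combined = max(abs(x - t) for x, t in zip(combined_first, target))
    err_processed = max(abs(x - t) for x, t in zip(processed_first, target))
    return err_combined, err_processed


if __name__ == "__main__":
    good, bad = spill_residue(sine(220.0, 0.5, 4800), sine(1000.0, 0.1, 4800))
    print("combine, then process:", good)
    print("process, then combine:", bad)
```

Processing each track separately changes the spill differently on the two tracks, so they no longer cancel. The same goes for a simple fader move or pan on just one of them.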

EQ & Phase Cancellation

Though encouraged by the success of the techniques we’d tried so far, there was still room for improvement, so next we decided to combine the EQ and phase–cancellation techniques — the idea being that any elements that weren’t cancelling particularly well could be eliminated at source, leaving phase cancellation to take care of the rest. It’s always nice when you test a theory and it works out well in practice: the EQ had precisely the effect we’d hoped for, particularly when combining a low–pass filter and a little dip around 5kHz (a region in which there can be a lot going on for both snare drums and vocals). The overall result was a level of spill comparable with that when a singer moves the headphones off one ear. The recorded vocal sounded nice, natural and unprocessed.

Considering how little extra effort this had taken, the results were impressive. We were happy to track like this — there’s barely any sonic compromise, and yet it can make a world of difference for the artist. And if the time taken to track the instrumental is a worry, you have to set that against the potential time–savings reaped through the instant, eye–to–eye communication you have with the singer, given that you’re working in the same room.

Null & Void?

A figure–of–eight microphone has a deep ‘side null’ which means the sides of the mic reject sound very efficiently. Directing one of these nulls at the speakers will help to minimise spill. While this might make a single vocalist a little uncomfortable, as their monitor mix will be delivered on one side of them, it can be a great way of working with dual vocalists if they’re able to balance their parts as naturally as Vik and Griff (pictured here) can.

Despite our satisfaction with the results of the EQ’d cancellation approach, there were a couple more tricks that we wished to explore. The next one involved switching the U87 to its figure–of–eight pattern and positioning it so that one of its deep side nulls pointed towards the speakers. In other words, we aimed it to capture the vocal but reject the sound from the speakers. This yielded a worthwhile further reduction in the level of spill but had its downsides. First, this configuration meant that Griff now faced sideways relative to the speakers and therefore heard the guide track louder in one ear than the other. He found this a bit strange and, while he did a professional job, we can’t say he looked particularly at ease with it! It was also noticeable that the tone of the vocals changed, which was possibly due to the mic’s increased pickup to the rear; although it rejected the direct sound from the speakers, it picked up more of the room sound. If you feel it’s worth trying this, then it might be worth putting some sort of absorber behind the mic.
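The trade-off between side rejection and rear pickup can be seen from the textbook first–order polar pattern equations. This sketch uses the idealised formulas, not measured U87 data; real mics never achieve a perfect null, and the -60dB floor is just an arbitrary limit to keep the printout readable.

```python
import math

CARDIOID = 0.5         # omni weighting for an ideal cardioid
FIGURE_OF_EIGHT = 0.0  # no omni component at all


def sensitivity(angle_deg, omni_weight):
    """Ideal first-order pattern: omni_weight + (1 - omni_weight) * cos(angle)."""
    return omni_weight + (1.0 - omni_weight) * math.cos(math.radians(angle_deg))


def pickup_db(angle_deg, omni_weight, floor_db=-60.0):
    """Pickup relative to on-axis, clamped at an arbitrary display floor."""
    level = abs(sensitivity(angle_deg, omni_weight))
    if level <= 10.0 ** (floor_db / 20.0):
        return floor_db
    return 20.0 * math.log10(level)


if __name__ == "__main__":
    for angle in (0, 90, 180):
        card = pickup_db(angle, CARDIOID)
        fig8 = pickup_db(angle, FIGURE_OF_EIGHT)
        print(f"{angle:>3} deg  cardioid {card:6.1f} dB   figure-of-eight {fig8:6.1f} dB")
```

On paper, the figure–of–eight is far better than the cardioid at rejecting sound arriving from the side, but it is just as sensitive at the rear as at the front, which fits with the extra room sound we heard.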

Given that we had a figure–of–eight mic set up already and two vocalists to hand, we thought we’d try the old technique of tracking two vocalists on the one figure–of–eight mic, with one singing into the front, and one the rear. So Vik joined Griff at the mic, each facing the other as they sang. It takes a well-drilled duo (which it has to be said Vik and Griff are) to balance themselves nicely over the course of a whole take on a single mic, but it’s well worth trying. Interestingly, they both seemed to really enjoy being able to perform like this, and they remarked that having each other to focus on overcame any unease Griff had felt previously about the monitoring being off to one side.

How Loud Can You Go?

Given the levels we’d been working at and the absence of headphones, Griff could hear his voice in the room and so hadn’t felt the need to monitor his vocal over the speakers. But in some scenarios that might be required, especially if the singer wants you to crank the levels up a bit while they channel their inner rock god! If you can avoid this without compromising the performance, then do. But we decided to explore what effect it would have on proceedings. The first thing we did was raise the playback level and then, careful to avoid feedback (obviously a risk), began to introduce some of the vocal foldback into the monitors.

As you’d expect, the increase in playback level meant that there was more bleed in general. Another downside was that while experiments with phase cancellation did reduce spill, they also interfered with the vocal sound. Obviously, the vocal part coming over the speakers interfered with the one we were aiming to capture directly. On the plus side, Griff found that he quite liked the louder monitor mix, and you might find it worth working this way if you find it brings about a noticeable improvement in performance. You can always decide to just live with the spill, particularly if you’re going to be the one doing the mixing!

Do remember, though, that listening to music at these levels will end up skewing your own perception of what sounds good or bad, so it would be worth you (the engineer) either using ear plugs while you track or stepping out of the room during takes. That way, you’ll be better able to assess the quality of the take you just tracked!


Although it’s interesting to explore techniques like these in a controlled environment, it’s easy to get sucked into thinking only about the spill and not about the quality of the vocal parts themselves. It was difficult to get too analytical about this during our session due to the number of takes, but the one thing that was very clear to us was how dramatically the performance changed the second we lost the headphones: we’d heard a singer trying to sing in a controlled and precise way and then heard a singer just doing their thing. There are a few ways to think about this. Studio technique can certainly be learnt and it’s very possible for singers to learn how to sing naturally with headphones, in much the same way that it’s possible for a drummer to learn how to play naturally to a click track. Thinking about how we’ve used this technique in the past, it’s normally been as a way of breaking the ice, or for when we’ve got a bit stuck when recording vocals. Also, forgetting about the sonics for a moment, the psychological effect of being in the same room is an important factor; the whole thing becomes more of a collective process, rather than a me–and–them affair.

Downsides? The spill, of course, if it’s noticeable when mixed in with the track, although some producers say the spill is one of the reasons they like this technique as it can add some ‘vibe’ to a sterile–sounding track. And while the cancellation technique seemed to work the best in our session, in terms of workflow it can be a bit of a fiddle, especially if you have a band kicking about. Having to do several takes of the instrumental with everyone keeping quiet could be difficult to manage in some situations. Another consideration is that your ears inevitably get tired very quickly if the playback is loud, and it can seem a little strange putting headphones on and off or leaving the room all the time. How you like to process a vocal might also be a consideration: if you sometimes like to touch up a vocal with tuning software, you can be left with some unpleasant artifacts if there’s too much spill in the background.

Summing up, though, testing the nuances of these techniques in a controlled setting was a really useful exercise, and a pleasant one too, given we had such willing and helpful vocalists (thanks to Fred’s House for being such good sports!). The phase cancellation combined with the low–pass filtering worked really well, and we’ll certainly be trying this out on future sessions. Recording vocals without the headphones in general is a good trick to have up your sleeve, and if you find yourself a bit stuck in a vocal session, or things are just not happening, we’d encourage you to give it a go. And don’t feel that you have to change your mic of choice to make it work: provided your room is a decent–sounding space for recording, you should be able to use whichever vocal mic you usually would.

Handheld Mics

Handheld mics such as the SM58 pictured here offer potential pros and cons.

As we’ve indicated in the main article, spill is effectively made lower when using a mic in close proximity to the singer’s mouth: the mic still picks up the same spill and room noise but the vocal is much louder in relation to it. The mic is not in a fixed position, of course, so phase–cancellation techniques won’t work, but as the spill is so low, that’s not a problem. Note that you needn’t be limited to the classic choice of a Shure SM58, as not only is there a newer, more capable equivalent (the Beta 58) and high–quality handheld dynamics from several manufacturers, but there is now also a whole host of handheld condenser mics intended primarily for on–stage use, such as the Neumann KMS105. (You’ll find a roundup featuring various models on the SOS web site at

We didn’t have sufficient time or access to enough models to run through all the possible alternatives on the day, but for a final take with Griff, we decided to track in this way with an SM58 just to demonstrate that the approach is workable, and encouraged him to ‘get into’ things and move about a bit. There was very little spill on this recording, as we’d expected, and Griff seemed to enjoy himself. The SM58 didn’t really suit Griff: it emphasised the mid–range in an unpleasant way, and there were pops and plosives all over the place. But such artifacts can often be fixed with EQ or audio repair software, and sometimes vocal clarity is just not the main issue — first and foremost, it’s often all about capturing the right vibe and performance.

In all honesty, we wouldn’t recommend this approach as a first choice if your vocalist is happy singing in front of a higher–quality fixed–position studio mic. As the other tests show, you’ll probably get better results that way. You’ll also probably have a greater range of mics from which to choose. But it’s all about horses for courses: if there’s a stage condenser that seems particularly to suit the vocalist’s voice, or the handheld experience serves to enhance their performance, then it’s certainly worth a try.

Vocalist Reaction

Given that the main reason for considering all these headphoneless recording techniques is to enable the artist to give their all to the performance, we made a point of asking Griff and Vik for their thoughts on the experience as we went along. We’ve given a flavour of that in the main article, but Vik perhaps summed it up best when she explained that Fred’s House are “a live act at heart and it simply feels more natural, and more like a performance singing to the speakers, especially on a more upbeat-type song”.

That last point is an important one; it’s a technique that you would perhaps use on some artists and on only some tracks, and might go back to a more conventional headphone setup for others. You can find out more about Fred’s House on their web site.

Audio Examples

Fred’s House kindly agreed to let us supply some audio examples of the approaches we explored with them on the day. You can find these on the SOS web site.