The spoken voice is something most people take for granted, but recording a voice properly is certainly not the trivial task many people assume it to be, and there are just as many traps and pitfalls for the unwary as in any other recording situation. In fact, the spoken voice is, in many ways, far more demanding of studio acoustics, microphones, recording equipment and the expertise of the engineer than the singing voice. The reason for this is quite simply that a singing voice is, more often than not, accompanied by other instruments and processed to some degree with reverberation, both of which tend to mask minor deficiencies in quality.
In the case of a spoken voice, any unnatural coloration (perhaps from room acoustics, bad mic technique or a low-quality microphone) stands out, because we are all very used to hearing human voices and can spot anything 'wrong' extremely quickly. Likewise, heavy compression may be applied to a singing voice to provide control and keep it at the top of a mix, but with the spoken voice, even relatively small amounts of compression can be very obvious indeed.
As with any instrument, placing a microphone too close to the source of a spoken voice will 'distort' the sound, because it won't capture all elements of the sound in the correct balance. Moving the microphone further away will usually achieve a better, more natural sound quality, but also captures more 'room sound', which may not be acceptable.
A human voice (whether speaking or singing) has three sources contributing to the complete sound: the chest, the mouth (including teeth and lips), and the nose.
CHEST: The chest is basically a large resonant cavity, and this accounts for a lot of the lower frequencies in the voice. Just place your ear on someone's chest while they talk to see how significant this element actually is (better ask their permission first!). Both the front and the back of the chest cavity vibrate, and because this part of the sound is at fairly low frequencies, it tends to travel as a 'spherical wave', which means that it radiates out in all directions.
MOUTH: The mouth, teeth and lips are responsible for the main articulation of words, and produce mainly mid and high frequencies. These tend to propagate as 'planar waves', which means they spread out slowly with increasing distance, but primarily travel in the direction the mouth is facing. In other words, this element of the voice is only fully available in front of the person speaking, not behind or to the sides.
One of the biggest problems when recording a voice is that of 'plosives' -- the 'P's and 'B's which create a distinct blast of air, often causing the microphone diaphragm to bounce off its end-stops! You can feel the strength of the wind blast by placing your hand in front of your mouth and saying a few P's and B's. Although you will probably not be able to feel the blast on your hand more than a few inches away from your mouth, there is often enough energy to upset a microphone up to a foot or so away. Fortunately, these plosive blasts are remarkably directional, and tend to occupy a relatively narrow 'corridor' extending directly forward and usually slightly downward from the mouth.
NOSE: Few people associate the nose with talking or singing, but you only have to suffer a heavy cold to realise how important its contribution is. Very little actual sound is generated, but there is a surprising amount of 'wind' -- not from speaking, but from breathing, and from some characteristic mannerisms, such as a 'humph' reaction to a comment, or a 'snorting' laugh. These actions generate wind blasts which are very similar to plosives, but are generally directed straight down and are only likely to cause problems if the microphone is very close to the mouth, or is mounted on the chest (a personal microphone on a tie-clip, for example).
The ideal place for a microphone will vary considerably with the speaker and the location. However, there are some guidelines which may be deduced from the description above about where the voice comes from. Firstly, there's no point in putting the microphone behind the speaker, since the only element of sound to travel in that direction is the low-frequency chest resonance -- so it has to be around the front somewhere. This may seem like an obvious statement, but it's always worth thinking these things through logically.
The only real problem area in front of a speaker is directly in front of the mouth, where plosives will cause unacceptable pops and thumps. A good guide to finding a position for a microphone is to imagine a large cone (like a megaphone) with the pointy end attached to the mouth and facing straight ahead for about half a metre or so. A microphone will usually give good results if placed anywhere on the circumference of the wide end of that imaginary cone (see Figure 1), although the furniture and room acoustics may make some positions better than others -- a point I'll return to shortly.
Each type of microphone has its own blend of characteristics, and selecting a microphone is as much an artistic decision as a technical one. Different situations, voices and acoustics will affect the decision, but there are no absolute rights and wrongs. Try different mics in different places, and resist the temptation to reach for the EQ to try to create a sound you like. As always, it pays to get the right microphone in the right place. Fistfuls of EQ rarely produce a satisfying solution and can often lead to other problems.
If you surveyed every studio in the land about their favourite microphone for voice work, probably 90% would specify a large-diaphragm condenser (or capacitor) microphone. The size of the diaphragm has a subtle but important effect on recorded quality, which seems to suit the human voice particularly well. Most large-diaphragm condenser microphones also have switchable polar responses, which adds to their appeal for voice work. Suitable voice microphones include top-end ones such as the Neumann U87, TLM170 and KM(S)84, or the AKG C414. Condenser microphones with valve head-amps have always been a popular choice too, and although vintage valve microphones are both rare and expensive, there is an increasing number of modern valve microphones available, such as the Rode Classic.
If your budget doesn't stretch to a condenser mic, there are still a number of options available in the dynamic and electret categories. Electrets essentially operate in the same way as condensers, but are considerably cheaper to manufacture (which is handy, since their life-span is strictly limited!). Although most electret microphones have relatively small diaphragms (extremely small in the case of personal mics -- see 'Personal Mics' box), they share many of the same useful qualities as condenser mics, especially in their extended, smooth high-frequency response, which helps to convey naturalness and clarity.
Dynamic microphones have a lot to offer and should certainly not be ignored. The BBC, for example, often uses a ribbon microphone (whose design can be traced back to the 1940s) for voice work, especially in the studios of the World Service. The microphone in question is an STC/Coles 4038, which has a fantastically accurate figure-of-eight polar response and an extremely light ribbon diaphragm, exhibiting many of the same sonic qualities as a good condenser microphone. Typically, the 4038 is positioned between a pair of speakers (the human kind, not the monitor kind!), each captured by one lobe of the figure-of-eight response (see Figure 2). The correct balance between the voices is achieved by moving the microphone along an imaginary line between the two people, closer to the weaker voice and away from the louder voice. This may seem a crude technique, but it produces excellent results, which always sound natural and well balanced.
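If you like to put rough numbers to this, the balancing trick can be sketched with the inverse-square law, under which level falls by about 6dB for every doubling of distance. The Python below is a minimal sketch under loudly stated assumptions: free-field conditions, identical sensitivity on both lobes, and no allowance for room reflections or proximity effect. The function name `balance_position` and the example figures (speakers 1.2 metres apart, one voice 6dB louder than the other) are hypothetical.

```python
import math


def balance_position(separation_m: float, level_diff_db: float) -> tuple[float, float]:
    """Return (distance to louder voice, distance to quieter voice) in metres.

    Hypothetical helper. Assumes free-field inverse-square behaviour (about
    6dB per doubling of distance), equal sensitivity on both lobes of the
    figure-of-eight, and ignores room reflections and proximity effect.
    """
    # Level falls as 20*log10(distance), so to make up level_diff_db the
    # louder voice must be this many times further away than the quieter one.
    ratio = math.pow(10, level_diff_db / 20)
    to_quiet = separation_m / (1 + ratio)
    to_loud = separation_m - to_quiet
    return to_loud, to_quiet


loud, quiet = balance_position(1.2, 6.0)
print(f"{loud:.2f} m from the louder voice, {quiet:.2f} m from the quieter voice")
```

With those figures the mic ends up about twice as far from the louder voice as from the quieter one; in practice you would still make the final adjustment by ear.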
Moving-coil microphones tend to have a slightly dulled sound quality compared to both ribbons and condensers, because all that copper wire glued to the back of the diaphragm slows it down a bit! However, this is not necessarily a disadvantage, and moving-coil microphones are often used for voice work, especially where robustness and reliability are important considerations. The ElectroVoice RE20, for example, is a large, robust moving-coil microphone which is frequently seen in use as a DJ microphone in American radio studios. AKG D202s and D222s, and Beyer M201s, are also very popular for voice work.
The one kind of dynamic microphone best avoided for the spoken voice is the stage vocalist mic -- designs like the Shure SM58 or the Beyer M88 (see 'Vocalist Mics' box).
The polar response of a microphone is determined by the mic's physical construction. There are basically just two forms. In the first, the diaphragm is stretched across one end of a sealed box (much like a drum); this is called 'pressure operation' and produces an omnidirectional polar response (picking up sound equally well from all directions). This is the simplest kind of microphone to design and build, and usually provides a very smooth, extended and even frequency response. The other form of construction supports the diaphragm top and bottom, leaving it open to the air on both sides. This is called 'pressure gradient' operation and produces a figure-of-eight polar response (picking up sound well to the front and back, but rejecting sounds coming from the sides). This design is far from simple and suffers a whole host of mechanical and physical problems, which manifest themselves operationally as extreme susceptibility to handling noise, rumble and wind, and a tendency to emphasise low frequencies when close to the sound source (known as the proximity effect, or bass tip-up). However, these problems, though important, are outweighed by the advantages of a microphone which is able to reject sounds from specific directions.
The most common and popular polar response is undoubtedly the cardioid (picks up sound mainly from one direction) and, to a lesser extent, the hypercardioid. This type of polar response is created by combining both forms of operation (pressure operation and pressure gradient) within a single unit. Consequently, cardioid and hypercardioid designs also suffer from the handling, rumble, wind and proximity effects of figure-of-eight mics, although not usually to quite the same degree. So what does all this mean in relation to positioning a microphone to record a spoken voice?
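The way the two forms of operation combine can be expressed as a simple weighted sum: a constant pressure (omnidirectional) term plus a pressure-gradient term that varies with the cosine of the angle of arrival. The sketch below assumes idealised first-order patterns; the weightings (0.5 for cardioid, 0.25 for hypercardioid) are the usual textbook values, and real microphones depart from them, especially at the frequency extremes. The names `PATTERNS`, `sensitivity` and `level_db` are illustrative only.

```python
import math

# Idealised first-order polar patterns: the weight of the pressure (omni)
# component. The remainder, (1 - weight), is the figure-of-eight cos term.
PATTERNS = {
    "omni": 1.0,
    "cardioid": 0.5,
    "hypercardioid": 0.25,
    "figure-of-eight": 0.0,
}


def sensitivity(pattern: str, angle_deg: float) -> float:
    """Relative output (1.0 = on-axis) for sound arriving at angle_deg."""
    a = PATTERNS[pattern]
    return a + (1 - a) * math.cos(math.radians(angle_deg))


def level_db(pattern: str, angle_deg: float) -> float:
    """sensitivity() in dB, ignoring polarity; -inf at a true null."""
    s = abs(sensitivity(pattern, angle_deg))
    return 20 * math.log10(s) if s > 1e-12 else float("-inf")


for name in PATTERNS:
    row = ", ".join(f"{ang:>3} deg: {level_db(name, ang):6.1f} dB" for ang in (0, 90, 180))
    print(f"{name:>16}  {row}")
```

Note that the hypercardioid's rear lobe comes out at about 6dB down with reversed polarity (a negative sensitivity), which is why the sketch takes the absolute value before converting to dB.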
Most importantly, anything other than an omnidirectional microphone will generate increasing amounts of bass the closer it is to the speaker. With a directional microphone designed for a flat frequency response at distances of a foot or so, the proximity effect can be deliberately used to create a 'warm' sounding voice, but use this trick with care, as increased proximity can also lead to increased popping. The classic deep and rich 'advertisement voice-over' sound relies directly on the proximity effect.
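To see why closeness matters so much, here is a rough model of the proximity effect. For an ideal first-order microphone facing a point source, the pressure-gradient part of the response is lifted by a factor that grows as the product of wavenumber and distance (kr) shrinks, so the effect is strongest at low frequencies and short distances. This is a sketch only: a real voice is not a point source, the speed of sound is assumed to be 343m/s, and `proximity_boost_db` is a hypothetical helper, not a published specification.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (roughly room temperature)


def proximity_boost_db(freq_hz: float, distance_m: float, omni_part: float) -> float:
    """On-axis bass lift, relative to the far-field response, of an ideal
    first-order mic (omni_part: 1.0 omni, 0.5 cardioid, 0.0 figure-of-eight)
    facing an ideal point source at distance_m."""
    kr = 2 * math.pi * freq_hz / SPEED_OF_SOUND * distance_m
    # In a spherical wave the gradient term gains an extra 1/(j*k*r)
    # component; its magnitude relative to the far field is computed here.
    gain = math.sqrt(1 + ((1 - omni_part) / kr) ** 2)
    return 20 * math.log10(gain)


for d in (0.05, 0.15, 0.30, 1.00):
    print(f"{d * 100:5.0f} cm: cardioid +{proximity_boost_db(100, d, 0.5):4.1f} dB, "
          f"fig-8 +{proximity_boost_db(100, d, 0.0):4.1f} dB at 100 Hz")
```

The omni term contributes no lift at all, which matches the point above that only directional microphones suffer bass tip-up, and the figure-of-eight (all gradient) shows more lift than the cardioid (half gradient).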
Other than making deliberate use of the proximity effect, directional polar responses should be chosen to control and reject sound reflections from the studio furniture, scripts (paper can reflect a surprising amount of sound), control room window and room acoustics. Consider all reflective surfaces carefully, then position the mic (and select its polar response, if you can) to achieve the best isolation between voice and reflections. I'm assuming here that the room sound is not an artistic part of the recording, which is normally the case -- the voice is wanted 'dry'. However, there are cases when the room sound is important (radio drama, for example), so the techniques should be modified accordingly.
In an ideal, well-treated room, reflections will not be a problem, and in this case, an omnidirectional microphone is usually the best choice, since its inherent design has far fewer compromises and almost always (in my opinion, anyway) sounds better than a directional microphone.
If there are two or more people talking, it may be possible to arrange them so that they share the useful part of the polar response of a single microphone (as in the case of the BBC's 4038 ribbon mic). This should be the first approach, because the more microphones there are, the more room sound will be heard and the harder it will be to balance them.
Often it is necessary to use additional microphones, but try to avoid having all microphones fully open at the same time. Ideally, only one microphone should be 'open' at any time, the others being partially faded down (typically about 5 or 10dB) to avoid distant pickup and excessive coloration. Don't close the other mics completely, because as they are opened and closed, the room sound will change.
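It is worth putting numbers to how much those partially faded mics add. The sketch below assumes, for simplicity, that every mic picks up an equal amount of room sound and that these contributions are uncorrelated, so they add on a power basis; real pickups are partly correlated, so treat the results as a rough guide. The function name `room_sound_increase_db` is hypothetical.

```python
import math


def room_sound_increase_db(n_mics: int, others_down_db: float) -> float:
    """Rise in room sound, relative to a single open mic, when one mic is fully
    open and the other n_mics - 1 are pulled down by others_down_db.

    Assumes each mic picks up an equal amount of mutually uncorrelated room
    sound, so the contributions add on a power basis.
    """
    others_power = (n_mics - 1) * 10 ** (-others_down_db / 10)
    return 10 * math.log10(1 + others_power)


for down in (0, 5, 10):
    print(f"3 mics, others -{down} dB: room sound +{room_sound_increase_db(3, down):.1f} dB")
```

Under those assumptions, three fully open mics give nearly 5dB more room sound than one, whereas pulling the other two down by 10dB cuts that to under 1dB.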
Driving the faders in this way is an acquired skill, and if a fader is opened late, it is usually extremely obvious. The first part of a successful technique is to always watch the person (or people) not currently talking. There are usually subtle clues when someone is about to talk, perhaps in their eye contact with the current speaker, perhaps in the way they draw breath or wet their lips, and these clues will allow you to anticipate them and open the correct microphone. Secondly, you need to develop a physical technique which allows you to rapidly open and close multiple faders with very precise positioning.
These techniques must be practised over and over again to develop the necessary level of skill, but it is worth it. Automatic devices like gates and expanders cannot achieve anything like the same level of subtlety, and usually do more harm than good, because they tend to miss the initial transients as they open a microphone channel!
Spoken voice recording usually involves the speaker sitting at a table with a script, but don't be afraid to ask the speaker to read standing up: this usually improves breathing and posture, and the improvement can often be heard. It also avoids the reflection problems caused by a table.
If a table is to be used, it should be chosen with care, since it will be perfectly placed to reflect the voice back up to the microphone, resulting in coloration. Also think carefully about where to put it in the room, and try to keep it away from walls (or control room window) to avoid horizontal reflections back to the microphone.
A professional acoustic studio table is not load-bearing (never try to sit on one!) because the top is made from perforated hardboard covered in thick felting. The felt absorbs high frequencies and the low frequencies pass straight through the table, so nothing should be reflected back up to a microphone. If the script is laid on the table, this will form a reflective surface, so professionals usually use a script rack to angle the paper and aim reflections away from the microphone. Again, think carefully about where you position the mic to avoid reflections, and also to avoid wind-blasts and other noises from turned script pages.
The script rack has another important advantage, which is that it helps to keep the reader's head up. If the reader is continually moving his or her head whilst reading (looking up when familiar with the words, but looking down to read from the script), the sound quality will change dramatically. If you want proof, listen to the sound quality changes on a television news-reader's voice when they read out-of-vision (narrating a filmed insert, for example). When a TV presenter is in-vision, they usually read from the tele-prompter on the front of the camera, but when out-of-vision, most look down to their scripts.
If there are two or more people talking, position them to provide a comfortable eye-line, and arrange the microphones so that their polar responses reject the unwanted speakers. The further apart they are, the easier it will be to balance the microphones, but don't take this to extremes -- increased distance will tend to make them raise their voices, causing stronger room reflections!
As with most mic techniques, if the right mic is in the right place, with the performer in the right part of the right acoustic environment, the sound will be perfect! But in an imperfect world, some form of signal processing may be necessary.
Gentle compression is often used on a spoken voice to help improve intelligibility and reduce the dynamic range slightly (especially in the case of radio drama). Try to avoid limiters if possible, as these tend to sound rather obvious because of the unnatural step-change in dynamics. If you do use a limiter, keep the gain reduction below about 4dB to avoid the most obvious audible effects. Gentle compression at 2:1 or 3:1 is far better (going up to 5:1 if the speaker is particularly dynamic!) with a low threshold so that everything is squashed gently. Personally, I prefer to insert the compressor ahead of the equaliser whenever possible, but if the equaliser is upstream of the compressor, be aware that altering the EQ settings changes the level reaching the compressor, and so changes how hard it works relative to its threshold! This is not such a problem with gentle compression slopes, but can become significant if you are using limiters.
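For anyone unfamiliar with how ratio and threshold interact, the static behaviour of an idealised hard-knee compressor can be sketched as follows. It ignores attack and release times and knee shape, and the threshold and input levels in the example are hypothetical, chosen only to show how the ratio changes the amount of gain reduction.

```python
def gain_reduction_db(input_db: float, threshold_db: float, ratio: float) -> float:
    """Static gain reduction of an idealised hard-knee compressor, in dB (>= 0).

    Ignores attack, release and knee shape.
    """
    over = input_db - threshold_db
    if over <= 0:
        return 0.0
    # Above threshold, every `ratio` dB in produces only 1 dB out.
    return over * (1 - 1 / ratio)


# A hypothetical speech peak 12 dB over a low threshold:
for ratio in (2, 3, 5):
    print(f"{ratio}:1 -> {gain_reduction_db(-8, -20, ratio):.1f} dB of gain reduction")
```

The same arithmetic explains the warning about EQ ahead of the compressor: raising the level feeding the compressor by x dB, while the signal is already above threshold, adds x(1 - 1/ratio) dB of gain reduction. For a 3dB boost that is 1.5dB at 2:1, but nearly 3dB with a 20:1 limiter.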
EQ should be used with care in any circumstances, but particularly so with a spoken voice, because coloration can be detected so easily. A high-pass filter to remove deep bass is often very helpful in minimising the audible effects of popping and wind-blasting, and in reducing low-frequency exterior noises such as traffic, air-conditioning and so on (there is very little voice content below about 80Hz in most cases). A very gentle upper-mid lift around 2-6kHz can help to improve clarity and brighten up moving-coil mics if needed.
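If you are working in a DAW and want to see what such a filter does, here is a minimal sketch of a second-order high-pass filter in Python, using the widely published RBJ 'Audio EQ Cookbook' biquad formulas. The 80Hz cutoff comes from the text; the 48kHz sample rate and the Q of 0.7071 (a Butterworth-style, maximally flat response) are assumptions, and the helper names are hypothetical.

```python
import math


def highpass_coeffs(cutoff_hz, sample_rate, q=0.7071):
    """Second-order high-pass biquad (RBJ Audio EQ Cookbook), normalised by a0.

    Returns (b, a) with b = [b0, b1, b2] and a = [a1, a2].
    """
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 + cos_w0) / 2 / a0, -(1 + cos_w0) / a0, (1 + cos_w0) / 2 / a0]
    a = [-2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a


def apply_biquad(samples, b, a):
    """Filter a list of samples using the Direct Form I difference equation."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def magnitude_db(freq_hz, b, a, sample_rate):
    """Filter gain at freq_hz, evaluated from the transfer function."""
    w = 2 * math.pi * freq_hz / sample_rate
    z1 = complex(math.cos(w), -math.sin(w))  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = 1 + a[0] * z1 + a[1] * z1 * z1
    return 20 * math.log10(abs(num / den))


fs = 48_000
b, a = highpass_coeffs(80, fs)
for f in (20, 40, 80, 200, 1000):
    print(f"{f:5d} Hz: {magnitude_db(f, b, a, fs):6.1f} dB")
```

With these settings the filter is 3dB down at 80Hz and roughly 12dB down an octave below, while leaving the main speech band essentially untouched.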
Beware de-essers! I have never found them particularly effective at reducing spoken sibilance, and they're extremely time-consuming to set up. It's far better to re-position the microphone, change the mic type (to a moving coil, perhaps) or even change the speaker if you have a serious problem with sibilance!
Reverberation is not often needed on a spoken voice (other than for dramatic effect), but small voice-booths are often excessively damped and sound extremely dead. A tiny amount of a 'small room' reverb program may help to make the sound more natural -- but be very careful not to overdo the effect. If other listeners are aware of it, you've used too much!
The spoken voice is a very critical sound source, purely because everyone knows what a voice is supposed to sound like. Mechanical and electrical colorations are very obvious, even if the particular voice is unknown to the listener. As always, take the time to experiment with different microphones in different positions, and try to get the best possible sound from the studio environment before resorting to signal processing.
If you've ever watched TV, you will undoubtedly have seen people wearing 'tie-clip' or 'personal' mics. In general these are used for their aesthetic appeal rather than for reasons of high sound quality, although to be fair, many current personal mics are actually pretty good.
The chest is far from the ideal position for sound pickup, and manufacturers of 'personal' microphones have to doctor the frequency response of the microphone in a fairly serious way to create an acceptable sound. There are two major problems: firstly, the close proximity to the chest cavity produces a disproportionate amount of low-frequency sound; secondly, the microphone 'sees' very little of the sound output from the mouth, so would tend to sound dull and inarticulate. To overcome these problems, personal microphones typically have an extremely 'non-flat' frequency response, rolling off low frequencies (to regain an appropriate balance) and boosting middle and high frequencies to extract the best articulation and detail.
Place this type of microphone anywhere other than on the chest and the sound quality will change radically as a result. The classic example is probably in theatre and costume drama use, where microphones are often hidden in the hair or wigs. In this situation, much the same high-frequency boost characteristic is needed as if it were mounted on the chest but there is very little low-frequency pickup from the head, and the mic's LF roll-off exacerbates the problem, often needing extreme amounts of EQ from the mixing desk!
One thing you may have noticed about tie-clip microphones is that they often appear to be mounted upside down (with the wire coming out of the top rather than the bottom). Virtually all personal mics are omnidirectional, and their very small size means that the frequency response remains remarkably constant from the front or back, so mounting the mic upside down does not change the sound quality in any appreciable way. However, this does protect the diaphragm from wind blasting. Remember my earlier comment about wind-blasts from the nose, which travel straight down? A personal mic is almost inevitably positioned directly in the line of fire but by turning the microphone over, you can protect the diaphragm from this kind of abuse!
The voice-over studio is a lonely place when there's only one performer. Always try to let the performer know what is going on in the control room over a talkback system. Many engineers set up an open talkback system between them and the artist as soon as the tape stops rolling, just so the performer knows that there's still someone out there! Similarly, it is important that the artist knows when his or her words are being recorded and when the reading is a rehearsal.
Ideally, the studio configuration will allow the performer to see the control room through a window. However, when you set the studio up, avoid positioning the artist so that the engineer (or producer) falls directly on the same eye-line as the script. Sudden movements in the control room (or inappropriate facial expressions like smiles and frowns) are likely to cause stumbles in the reading or performance. It's far better if the performer has to turn his or her head to the side to see into the control room: not so far that it is difficult or uncomfortable, but far enough that the control room is out of view while he or she is looking at the script.
Vocalist dynamic mics like the Shure SM58 and Beyer M88 are designed to be used extremely close to the mouth, in order to get enough separation between the voice and spill from the backline instruments. They also have cardioid or hypercardioid polar responses to reject foldback from the monitors.
The polar response means that they inevitably suffer bass tip-up or proximity effect (made worse by being used so close to the mouth), and consequently, manufacturers engineer the frequency response of this type of microphone to fall away rapidly below 200 or 300Hz, relying on the proximity effect to restore an acceptable bass balance in use.
If this kind of mic is used more than an inch or two away from the mouth, there is little proximity effect, so when used at a more discreet distance, the sound becomes extremely thin and lacking in bass. Another reason why these microphones are not suited to spoken voice work is that they often have uneven mid-band frequency responses, which are intended to help vocal clarity in a PA situation, but merely add coloration to a spoken voice.
A studio designed specifically for recording speech (or a dedicated voice booth) has two key features:
Firstly, it should be relatively free of reflections so that the recorded voice has no discernible reverberation. Studios designed for this kind of work (as opposed to any kind of music studio) would typically have a reverberation time of about 0.3 seconds.
Secondly, the studio should be free of any ambient or external noise (typically specified as a 'Noise Criterion' of NC20 -- which, loosely translated, means 'extremely quiet'!). A quiet studio is very important for speech work, because the spoken voice is a relatively quiet sound source, and there are unlikely to be other sounds to mask background noises such as road traffic.
Creating a truly quiet studio can be difficult and extremely expensive. Box-within-box constructions, where a small studio or voice booth is built within an already quiet room can be a great help, but are far from cheap. A more pragmatic solution might be to find a room in a quiet location in the first place, and then make the actual recording in the dead of night! Wherever you end up recording the voice, pay particular attention to any kind of background noise -- particularly if you will be editing re-takes into an earlier master recording. Abrupt changes in background 'atmospheres' are usually pretty obvious to the listener.
Fortunately, dealing with a reverberant room is a little easier and cheaper. An average-sized, well-furnished living room will already have a reverberation time of about 0.4 seconds, and making the room even deader is simply a matter of covering an increased proportion of the reflective surfaces with absorbent material (try drawing the curtains, for example). Conveniently, for the human voice, the most critical frequency range, as far as reverberation is concerned, is between 500Hz and 2kHz. Carpets are pretty good at soaking up energy in this region, so you can easily control reverberation by hanging old carpets on the walls (it will give your studio the 'Medieval Castle' look too!).
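The effect of adding absorbent material can be estimated with Sabine's formula, RT60 = 0.161 V / A, where V is the room volume in cubic metres and A is the total absorption (each surface area multiplied by its absorption coefficient, summed). The room dimensions and coefficients below are hypothetical, illustrative mid-band values rather than measured data, and Sabine's formula becomes less accurate as a room gets very dead, so treat the result as a ballpark figure.

```python
def sabine_rt60(volume_m3: float, surfaces: list[tuple[float, float]]) -> float:
    """Sabine estimate of reverberation time in seconds, for one frequency band.

    surfaces: (area in m^2, absorption coefficient 0..1) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)  # metric sabins
    return 0.161 * volume_m3 / total_absorption


# Hypothetical, sparsely furnished 5 m x 4 m x 2.5 m room; the coefficients
# are illustrative mid-band guesses, not measured data.
length, width, height = 5.0, 4.0, 2.5
volume = length * width * height
wall_area = 2 * (length + width) * height

untreated = [
    (length * width, 0.30),  # carpeted floor
    (length * width, 0.05),  # plastered ceiling
    (wall_area, 0.10),       # painted walls
]
# Cover 10 m^2 of bare wall with absorbent (old carpet, curtains, foam).
treated = [
    (length * width, 0.30),
    (length * width, 0.05),
    (wall_area - 10.0, 0.10),
    (10.0, 0.60),
]
print(f"untreated: {sabine_rt60(volume, untreated):.2f} s")
print(f"treated:   {sabine_rt60(volume, treated):.2f} s")
```

Even in this crude model, hanging a modest area of absorbent on the walls makes a clear difference, which is the principle behind the old-carpet trick.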
If you want a more professional look to the walls, there is a wide variety of acoustic foam tiles available (Canford Audio sell both their own make and the highly recommended German Illsonic tiles -- call them on 0191 415 0205).
If you don't want to go to all this trouble, there are a couple of alternative ideas that might work to an acceptable level. The first is to place the absorbing sides of a couple of acoustic screens at the sides of the microphone, forming a 'V' shape with the wide end where the performer will be. The idea here is to stop any reflections from getting back to the microphone. (If you don't have acoustic screens, try the old carpets again!). One last idea (and I don't recommend you ask a professional voice-over artist to play along with this one) simply requires a large and heavy overcoat. Drape the coat over both the performer and the microphone (a second mic-stand to hold the coat clear of the microphone is a good idea) and voila!
Before you roll about on the floor in fits of laughter and total disbelief, I've seen many seasoned professional reporters use this very technique in the back seat of the crew car, when recording their voice-overs to a late news story. One last word of advice, though: have a torch handy, because it's hard to read a script in the dark under a heavy coat!