If you're a musician, podcasting is the perfect application for the studio skills and equipment you already have.
The term 'podcast' has been around for 15 years, and the basic concept dates back to at least the turn of the millennium, but the last few years have seen podcasting really hit the mainstream. For many, the unprecedented success of award-winning true-crime show Serial really blew the lid off in 2014, and we've since seen concerted development by mainstream media outlets such as the BBC, NPR, the New York Times, the Wall Street Journal and Audible, as well as an explosion of niche channels developed by grass-roots enthusiasts. But the podcasting scene still feels a bit like the Wild West, with plenty of opportunities for creative new shows to carve out their own piece of this rapidly expanding audience.
In this article, I'll provide some down-to-earth tips for SOS readers who are considering putting together a podcast for the first time. The good news for project-studio owners is that your music equipment and technological skills give you a massive head start. Indeed, I only got into podcasting myself a few years ago, but have managed to produce a couple of different monthly podcasts — the Cambridge-MT Patrons Podcast (www.cambridge-mt.com/podcast) and Project Studio Tea Break (www.projectstudioteabreak.com) — without the benefit of any traditional broadcasting experience.
The simplest podcast format is where you speak directly to your listeners about a topic. This requires minimal gear: a mic, a stand, and a cable; an audio interface with at least one mic input; and some DAW software. And if you're recording direct from your studio chair, you can combine the workflow of recording and editing in a very natural way. Just record a few phrases; edit and quality-check them; re-record and patch up anything substandard; then rinse and repeat! You quickly learn the habit of backtracking a few words whenever you make a mixtape... er... whenever you make a mistake, so that it's easy to edit seamless repairs. It also becomes second nature to favour slice-points in gaps, breaths, or noisy consonants, and you soon realise how much quicker it is to rerecord dodgy sections than spend ages editing audio snippets around.
The mechanics of speech recording should be straightforward for SOS readers, since it's not a million miles away from capturing sung vocals, but a crucial difference is that the voice will usually be more exposed in podcasts. This means paying greater attention to background details, so large-diaphragm capacitor mics are a decent first choice, on account of their typically high output and low noise floor. A down side is that many such mics are designed to emphasise a vocalist's high frequencies, bringing a danger of excessive sibilance and lip smacks (those little clicks you get when the speaker's lips and tongue briefly stick to each other), both of which are more distracting in the absence of a backing band!
One way to square this circle is to use a dynamic mic (the Shure SM7B and Electro-Voice RE-20 are popular choices), minimising the noise floor by miking up close and perhaps using an inline gain booster such as a Cloudlifter, sE Electronics DM1 or Triton Audio FetHead to help out the mic preamp.
The dynamic mic's heavier diaphragm will reduce lip noise and sibilance, and typically gives a more rounded 'radio DJ' tone that many people like. I prefer the high-frequency 'air' and detail of a capacitor mic, so I use the least forward-sounding of my large-diaphragm mics instead, miking above my mouth height to reduce sibilance (which tends to be worst in a horizontal plane at lip height). The secret to keeping lip smacks at bay is to take a sip of water every few phrases so that the vocal apparatus remains well hydrated. But the biggest reason I like using a capacitor mic is that its higher sensitivity allows me to work further away without noise problems: maybe 12-18 inches, with the mic around forehead height. This keeps the mic out of my line of sight while editing, but also allows me to move around a fair bit without the mic's proximity-effect bass boost or off-axis frequency-response variations unduly affecting the vocal tone. For me, this makes it easier to talk freely and naturally while recording, without causing mix difficulties later on.
Whatever gain-management precautions you take while recording, noise-reduction processing can prove beneficial at the editing stage. Most real-world project studios aren't particularly well soundproofed against external noise (especially from traffic), and there will frequently also be sources of unwanted noise in the room, such as fans, central-heating systems, lights and other appliances. Fortunately, products like iZotope's RX Elements make removing steady-state background noise (hiss, buzz, hum, and the like) straightforward and affordable. My main advice here is to remember, when recording, to capture some of the background noise in isolation, so that you can use it to train the noise-reduction algorithm for the best results. (Don't leave this until later, as some noise sources will vary in character at different times of day).
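For the technically curious, the principle behind profile-based noise reduction can be sketched in a few lines of Python. This is only a toy spectral-subtraction illustration of why that isolated noise capture matters — commercial tools like RX use far more sophisticated algorithms — and the function and parameter names here are my own invention:

```python
import numpy as np
from scipy.signal import stft, istft

def denoise(signal, noise_sample, rate, nperseg=1024, reduction=1.0):
    """Toy spectral subtraction: learn an average noise spectrum from an
    isolated recording of the room's background noise, then subtract it
    from the programme audio's magnitude spectrum."""
    # Average magnitude spectrum of the isolated background noise
    _, _, noise_stft = stft(noise_sample, rate, nperseg=nperseg)
    noise_profile = np.mean(np.abs(noise_stft), axis=1, keepdims=True)

    # Short-time spectrum of the programme material
    _, _, sig_stft = stft(signal, rate, nperseg=nperseg)
    magnitude = np.abs(sig_stft)
    phase = np.angle(sig_stft)

    # Subtract the learned profile, clamping at zero so magnitudes
    # can't go negative, then resynthesise with the original phase
    cleaned = np.maximum(magnitude - reduction * noise_profile, 0.0)
    _, out = istft(cleaned * np.exp(1j * phase), rate, nperseg=nperseg)
    return out
```

The key point the code makes is that the algorithm needs a stretch of noise *on its own* to build the profile from — which is exactly why you should record a few seconds of room tone at the session.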
One way to make a podcast feel instantly more polished is to add theme music — and this is where being a recording musician really works in your favour! Not only does using your own music sidestep the potentially thorny issue of licensing fees, but it also allows you to generate multiple variants of your theme music for different purposes. You might have an intro theme ending with a simple rhythm-section 'bed' that fades away gradually under your opening comments, and a selection of short five-second 'stings' to place between your podcast's different thematic segments — things like readers' questions, product tests, interviews, quizzes and (if you're lucky!) advertising spots. And, finally, you might have an outro version of the theme, with a slow fade-in and a strong, clear ending to round out the show. This is one area where music producers are uniquely placed to set their podcasts apart, both because plenty of new podcasts can't afford to use music at all, and because your spoken content will be easier to digest if you use regular musical 'punctuation' to give listeners a bit of a breather from the sound of your voice!
There are also plenty of shows that make much greater use of musical underscore and sound design, especially in the documentary/drama space. However, those are typically much more editing-intensive, so I'm not sure I'd recommend that route for your first foray into podcasting — save it for when you're already comfortable with the basic mechanics and can better gauge the work involved. After all, one of the most frequently cited tenets of online content generation is that your output should appear regularly, so every extra 'production hour' you spend per episode increases the likelihood that you'll have to start postponing or skipping episodes.
Adding music has some loudness implications, given that no clear loudness standard has yet been agreed for podcast files, and there are plenty of podcast players that don't yet implement loudness-normalisation routines. In general, my advice for choosing a loudness level would be to download some of your favourite podcasts, run them through a loudness meter, compare their loudness-matched sonics, and then take your cues from that. (If you don't already have access to loudness metering, Youlean and Melda both produce decent freeware options, but my own preference is Klangfreund's affordable and more fully-featured LUFS Meter.) However, when there's music in your show, an important additional consideration is how you set the relative levels of the voiceover and music.
You see, it makes sense to mix speech and music elements at similar subjective loudness levels, so that listeners won't have to keep adjusting their volume dials during your episode. But music typically delivers significantly higher peak levels than speech, and if you subsequently try to use mastering processing to match your voiceover levels with mainstream speech-only podcasts, you'll likely squash your music examples to a pulp! So if you plan to use music tracks in your podcast and you want them to sound good, you'll have to keep the speech level lower. For instance, my Cambridge‑MT Patrons Podcast relies heavily on audio examples to illustrate mixing techniques, and I deliberately keep the voiceover level quite low (around -18 LUFS), so that I can avoid using loudness processing, which would mangle the dynamics of my audio examples. On the other hand, Project Studio Tea Break is a more traditional (and rather rowdy!) speech-based discussion format, which means I can afford to have the voiceover levels quite a lot higher, around -13 LUFS.
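If you fancy scripting your own level checks, the basic arithmetic of hitting a target level looks something like the Python sketch below. Be warned that this uses a crude RMS measurement in place of true LUFS — a proper podcast loudness figure requires the K-weighting and gating defined in ITU-R BS.1770, which is why I'd still point you at a real loudness meter — and the function names are my own:

```python
import numpy as np

def rms_db(x):
    """Crude programme level in dB (plain RMS, full scale = 0dB).
    NOTE: this is only a rough stand-in for LUFS, which also applies
    K-weighting and gating per ITU-R BS.1770."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))))

def normalise_to(x, target_db):
    """Apply a static gain so the RMS level lands on target_db
    (e.g. -18 for a speech bed that leaves music dynamics alone)."""
    gain_db = target_db - rms_db(x)
    return x * 10 ** (gain_db / 20)
```

The useful habit this illustrates is working in decibels for the gain calculation: the required gain is simply the target level minus the measured level, converted back to a linear multiplier at the end.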
[Since this article was originally published, the podcast industry has now pretty much standardised on -16 LUFS as the optimum level - Ed.]
The main problem with one-person podcasting is that it's limited by the creativity of the host, and there's a risk of monotony, on account of the single voice and single point of view. For this reason, one of the most popular formats involves two people conversing about a given topic. One variant is where they're both permanent co-hosts, for example in mainstream shows like The Dollop history podcast and the BBC's Curious Cases Of Rutherford & Fry, or (in the music-production niche) Ian Shepherd's The Mastering Show or The UBK Happy Funtime Hour.
Although this kind of show can enliven its subject matter by virtue of conversational interplay and personal chemistry between the co-hosts, the trade-off is that you have to develop new content every episode to keep the listener's interest. An alternative two-hander format that mitigates this is where a regular host interviews a series of guests. Because each participant will naturally bring their own perspective to the table, you continually get fresh content without having to research or script very much for yourself. Mainstream shows such as The Joe Rogan Experience or Marc Maron's WTF provide good examples of this, while shows like Working Class Audio and Recording Studio Rockstars — and SOS's own new podcasts, of course — cater directly to the project-studio niche by featuring a wide range of audio engineers and producers. The down sides here are the logistical challenges of finding and scheduling a regular supply of suitable guests; there's no such thing as a free lunch!
Whichever model you go for, easily the most common recording method is to use some kind of Internet telephony service, like Skype or FaceTime. The simplest approach is to use a telephony service with built-in recording functionality, and this may be the only practical option for many host-guest scenarios — you may be an audio specialist, but your interviewee may not! The main thing in that case is just to make sure that both parties are using headphones for the call, so as to avoid problems with echo-back (when your voice emerges from your guest's phone speaker, re-enters their phone mic, and you hear it repeated back to you following the system's transmission delay).
If your guest is techno-savvy enough to be able to route a proper vocal mic through their call software, that's a bonus, but the sound will still be at the mercy of their recording technique/environment and the audio-streaming data compression. My main advice would be to ask your guest, if possible, to avoid/reduce any obvious sources of background noise and to hang up a duvet behind them (an open clothes closet works well too) to reduce room-reverb pickup. Duo podcasts tend to feel most natural and intimate when the listener can imagine both protagonists in the same physical space, and that illusion's a lot trickier to achieve if one side of the conversation is swathed in tumble-dryer noise or honky small-room ambience! This is also a situation where, if asked for technical input, I'd normally suggest the guest adopts the 'eating a dynamic mic' recording approach, simply because I think there's less to go wrong with that method, especially in untreated acoustics.
To be honest, though, I prefer to record my own duo podcast in a different way: my co-host and I track our voices independently to our own local DAW systems, leaving the telephony only to fulfil a communication function — we use the little earbud/mic headset for the phone call, but each of us also has a separate studio mic actually picking up our voices for the podcast recording. That way, at the end of the call, we've each captured just our side of the conversation, and my co-host can send me a WAV of his contribution to line up against my own for editing purposes.
This offers a few advantages. It allows us to keep the audio recording software independent of the telephony software, so the audio quality and reliability of the Skype connection don't compromise the production values of the podcast. In fact, I also like to use a completely separate device for the telephony, rather than running it on my DAW system, to reduce the likelihood of driver conflicts between the different applications. Trust me, you really don't want to have to re-record any conversation on account of audio glitching or computer crashes, because it's a nightmare trying to conjure any sense of spontaneity the second time around.
Having each voice on its own track, free from spill or overlaps, gives you much greater freedom at editing time. I find this useful when compensating for the time lag from a poor telephony connection, because it means I can edit out all the awkward pauses and overlaps that inevitably result from that rather unnatural conversational situation. I'd recommend recording the telephony audio as well, though — this not only provides a guide for sync'ing the two hi-res audio files, it can also save your bacon if one of the DAW systems crashes mid-take or either person forgets to press Record! On one occasion, I was able to salvage a podcast take during which my DAW PC blue-screened, simply by re-recording my side of the conversation (which was captured on the telephony audio) at the editing stage. Yes, it inevitably involved some slightly hammy voice acting on my part, but it was still preferable to trying to restage the whole unscripted segment from scratch.
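That sync'ing job needn't be done entirely by eye and ear, either. As a rough sketch of the idea (the function name and parameters are my own, and real sessions may need drift correction on top), you can estimate the offset between the telephony guide recording and a hi-res local track by cross-correlating them:

```python
import numpy as np
from scipy.signal import correlate

def find_offset(guide, local, rate):
    """Estimate, in seconds, how far `local` lags behind `guide` by
    finding the peak of their cross-correlation. Run each hi-res track
    against the telephony guide recording, then slide it by the result."""
    corr = correlate(local, guide, mode="full")
    lag = np.argmax(np.abs(corr)) - (len(guide) - 1)
    return lag / rate
```

A positive result means the local track starts later than the guide, so you'd drag it earlier by that amount on the timeline (or trim that many samples from its start).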