Controlling Synths With Your Voice

Opening Up Your Creative Potential
By Robin Bigwood
Published September 2022

Pitch to MIDI in PreSonus’ Studio One is a two‑stage process. First of all Melodyne, integrated at the track level, equips audio events with pitch information. Second, the audio event is dragged to an instrument track. Other DAWs offer variations on this theme.

We’re used to playing and sequencing synth parts in our productions, but why not sing them as well?

It might seem obvious to say it, but the human voice is often a uniquely expressive element in recordings and musical productions. Only a few instruments, mostly acoustic, get anywhere close to its capacity for inflection or (to use an equally valid term from the synth world) modulation. Pitch, timbre and intensity can all be exquisitely controlled, and that’s to say nothing of the extra layer of communication that comes from word‑generated meaning and imagery.

How can we make use of that potential, in electronic‑leaning, synth‑based productions, even if (and I speak from personal experience here) we’re not necessarily good singers ourselves? That’s what this article is all about. We’ll look at some of the interesting gear out there, and a range of approaches that can open up creativity‑loosening possibilities in this area.

Playing Synths With Your Voice

Getting straight to the heart of the matter, it’s possible (and can be really liberating) to play synth parts with your voice, or for that matter with many other melody instruments, like sax or flute. You regard the voice or instrument as a ‘front‑end’, an alternative to a MIDI keyboard, or use an existing recording of it in a DAW track.

In lots of modern DAW software, a pitch‑to‑MIDI ability is built in as an offline process, and the results are often really good. You start by recording your voice (or guitar, sax, kazoo... OK, maybe not kazoo) to an audio track. Then, perhaps after some kind of analysis takes place, the pitch information can be extracted to a MIDI/instrument track. The process varies from DAW to DAW: in Ableton Live, for example, the MIDI track, data and a placeholder virtual instrument are all created for you with one command; in others you might have to drag an audio region to a MIDI track and instantiate or configure an instrument of your choice.
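For the curious, the arithmetic at the heart of any pitch‑to‑MIDI process is simple once a frequency has been detected: it's the detection itself that is hard. The sketch below is not taken from any particular DAW. It just shows the standard equal‑temperament relationship between frequency and MIDI note number, assuming A4 is tuned to 440Hz and mapped to note 69. The function name is my own.

```python
import math

A4_HZ = 440.0   # reference tuning (an assumption; some music uses other references)
A4_MIDI = 69    # MIDI note number conventionally assigned to A4


def freq_to_midi(freq_hz: float) -> tuple[int, float]:
    """Return (nearest MIDI note number, deviation in cents) for a frequency."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # 12 semitones per octave, and an octave is a doubling of frequency
    exact = A4_MIDI + 12 * math.log2(freq_hz / A4_HZ)
    note = round(exact)
    cents = (exact - note) * 100  # how sharp (+) or flat (-) the source was
    return note, cents


if __name__ == "__main__":
    for hz in (220.0, 261.63, 440.0, 452.0):
        note, cents = freq_to_midi(hz)
        print(f"{hz:7.2f} Hz -> note {note}, {cents:+.1f} cents")
```

The 'cents' figure is the part that gets thrown away when a sung line is reduced to plain MIDI notes, which is one reason the results can sound flatter (in the expressive sense) than the original.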

For some tasks, this could be enough for great results. A simple monosynth line, or a decaying bass sound, could work straight off the bat. For more overtly shaped, expressive synth lines though, you might well want to do some additional work on the pitch‑to‑MIDI generated data.

For example, even very smooth, legato‑style singing (or playing) will tend to generate individual MIDI notes whose lengths abut each other at most. As a result, nuances such as legato and portamento transitions are lost, not to mention vibrato, bends, and variations in intensity.

On that first point, it becomes an issue if you’re driving a synth sound with any obvious attack, like a ‘wow’ filter sweep or short‑lived percussive element. You’ll almost certainly get a re‑trigger on every detected change of pitch, even if the sung notes were connected in legato fashion, or you just introduced some subtle bends or fall‑offs.

So a good solution for expressive results is to use a monophonic synth or other solo sound, and switch in its legato option. Then, looking at your MIDI data in a typical ‘piano roll’ editor, consider which pairs or groups of notes should be connected without a retrigger of envelope generators or a sample start, for best musical effect. Extend these notes’ ends a little past the start point of their immediate neighbours to the right: that will be enough to cause the legato transition on the synth. Adding some portamento/glide in the synth patch can add a somewhat vocal‑like ‘swoop’ too, that you can trigger at will.
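If you'd rather script the note‑extending trick described above than drag note ends by hand, the idea can be sketched like this. It is only an illustration under some assumptions: the `Note` type, the function name and the default values are all hypothetical, positions are in beats, and a simple 'gap smaller than a threshold' test stands in for the musical judgement about which notes really belong together.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    pitch: int    # MIDI note number
    start: float  # position in beats
    end: float    # position in beats


def overlap_for_legato(notes, max_gap=0.05, overlap=0.03):
    """Extend notes that touch (or nearly touch) the next note so that they
    overlap its start slightly, which a legato-mode monosynth reads as a slur."""
    ordered = sorted(notes, key=lambda n: n.start)
    result = []
    for current, following in zip(ordered, ordered[1:]):
        gap = following.start - current.end
        if gap <= max_gap:
            # Never shorten a note that already overlaps its neighbour
            new_end = max(current.end, following.start + overlap)
            current = replace(current, end=new_end)
        result.append(current)
    if ordered:
        result.append(ordered[-1])  # the last note has no neighbour to reach
    return result


if __name__ == "__main__":
    line = [Note(60, 0.0, 1.0), Note(62, 1.0, 2.0), Note(64, 2.5, 3.0)]
    for note in overlap_for_legato(line):
        print(note)
```

Here the first note is stretched to overlap the second, while the second is left alone because there's a clear half‑beat rest before the third. In practice you'd still want to audition the result and undo any joins that smother a deliberate re‑articulation.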

As for additional expression and shaping, well, some DAWs do extract level information from your audio and pass it on in the form of varying note velocity. It’s a start, but you might still choose to write in MIDI CC or automation data for synth parameters, such as volume, filter cutoff, or vibrato depth and speed. In no time at all that could give you really dynamic results.
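To give a feel for what 'level passed on as velocity' might mean, here's a minimal sketch. How any given DAW actually does it isn't documented here, so treat the mapping as an assumption: a level in dBFS is clamped to a range, then scaled linearly (in dB) onto MIDI velocities 1 to 127. The function name and the -48dB floor are my own choices.

```python
def level_to_velocity(level_db: float, floor_db: float = -48.0) -> int:
    """Map a level in dBFS onto MIDI velocity 1-127, linearly in decibels.

    floor_db is a hypothetical threshold: anything quieter gets velocity 1.
    """
    if floor_db >= 0.0:
        raise ValueError("floor_db must be below 0 dBFS")
    clamped = min(0.0, max(floor_db, level_db))
    fraction = (clamped - floor_db) / (0.0 - floor_db)  # 0.0 to 1.0
    return 1 + round(fraction * 126)


if __name__ == "__main__":
    for db in (-60.0, -48.0, -24.0, -6.0, 0.0):
        print(f"{db:6.1f} dBFS -> velocity {level_to_velocity(db)}")
```

The same kind of scaling could just as easily generate a stream of CC values for volume or filter cutoff, if you measured the level continuously rather than once per note.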

Offline pitch‑to‑MIDI usually won’t attempt to preserve articulation in the original audio like legato or portamento transitions. In this example three notes have been manually extended beyond their neighbours’ start points. With a suitable monosynth sound this reinstates some smooth upward transitions for pairs of notes, which makes them sound much more like the audio source. I’ve also manually added some modulation wheel data, for some vibrato and swells on the main notes.

Get Real

Offline pitch‑to‑MIDI is mightily useful, but the next level for audio control of synths is to do it with plug‑ins and other tools that analyse pitch in real time. They let you work live, which definitely feels more experimental and exciting, and the immediacy of feedback can draw you into creative areas outside of the well‑trodden paths around your keyboard controller. Having said that, you can of course also use them on pre‑recorded material in your DAW tracks. As I write this I’m not aware of any DAWs that natively incorporate dedicated real‑time pitch tracking, so third‑party is the way to go.

Simple to use, but offering staggering potential for sound variation, are a number of plug‑in‑based solutions. For real‑time use you’d instantiate them on an audio track or aux channel and route audio to them from a mic in your studio. A small audio buffer size on your computer will ensure a crisp response, and most plug‑ins of this type need a healthy input signal from a good neutral/clear vocal mic too, in a dry/dead room with no background noise.
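To see why buffer size matters so much, it helps to put numbers on it. The delay contributed by one buffer is just its length in samples divided by the sample rate, as this little sketch shows. Bear in mind it's a lower bound only: real round‑trip latency also includes the output buffer, converter delays and whatever analysis window the pitch detector itself needs, none of which is modelled here.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Delay contributed by a single audio buffer, in milliseconds."""
    if buffer_samples <= 0 or sample_rate_hz <= 0:
        raise ValueError("buffer size and sample rate must be positive")
    return 1000.0 * buffer_samples / sample_rate_hz


if __name__ == "__main__":
    for size in (64, 128, 256, 512, 1024):
        ms = buffer_latency_ms(size, 48000)
        print(f"{size:5d} samples at 48 kHz -> {ms:5.2f} ms per buffer")
```

At 48kHz a 64‑sample buffer costs a little over a millisecond, while 1024 samples costs more than 20, which is enough to make singing against a tracked synth feel distinctly laggy.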

It’s also essential to use headphones if you’re working this way: any bleed of the output of the plug‑in back into the pitch detector front‑end generally throws it into confusion. Having said that, it can be hugely helpful to hear your un‑effected voice directly somehow as you work, so you’re not pulled out of tune by the synth timbres produced. You can do this either by wearing only a single side of your headphones, or routing some dry input signal to the headphone mix in your DAW, via your audio interface or with an external mixer.

Waves OVOX plug-in.

A great plug‑in option is Waves OVOX, which is available in most plug‑in formats, and often for not too much money. It’s a combo of Auto‑Tune‑like pitch recognition (with selectable scales and so on, which really helps with accuracy), two synths/vocoders, and a generous provision of effects. With relatively few controls a huge number of sounds can be conjured, and what’s impressive is how much vocal nuance can make it through to the synth‑toned output: a bit like Auto‑Tune, when used with more naturalistic settings, OVOX will respect vibrato and subtle pitch inflections while keeping results bang in tune overall.
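Why do selectable scales help accuracy? Because a slightly wayward sung note can be pulled to the nearest pitch that actually belongs in the key. I don't know how OVOX implements this internally, so the following is only a generic sketch of scale snapping on MIDI note numbers, with hypothetical names throughout and ties resolved downwards.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone steps above the root


def snap_to_scale(note: int, root: int = 0, scale=MAJOR_SCALE) -> int:
    """Return the MIDI note nearest to `note` whose pitch class is in the scale.

    root is the pitch class of the key (0 = C, 2 = D, and so on).
    When two scale notes are equally close, the lower one wins.
    """
    allowed = {(root + step) % 12 for step in scale}
    for distance in range(12):
        for candidate in (note - distance, note + distance):
            if candidate % 12 in allowed:
                return candidate
    return note  # only reached if the scale is empty


if __name__ == "__main__":
    sung = [60, 61, 63, 66, 70]
    print("C major:", [snap_to_scale(n) for n in sung])
    print("D major:", [snap_to_scale(n, root=2) for n in sung])
```

A real‑time tool would apply something like this to a continuously varying pitch rather than to whole notes, which is how vibrato and small inflections can survive while the centre of each note stays in key.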

iZotope VocalSynth 2 plug-in offers a darker, more complex sonic flavour.

A different, rather darker and more complex sonic flavour is offered by iZotope’s VocalSynth 2. Four of its five processing modules have synth‑like architectures, but in fact the character here is less electronic (in the analogue synth sense) and distinctly more cyborg. Arguably it’s more an extreme pitch‑tracking vocal effect than a vocal‑driven synth, and undoubtedly VocalSynth can (and generally does) preserve a good deal of phonetic intelligibility. For some jobs that’s a really good thing.

Native Instruments’ The Mouth, which runs in Reaktor and the free Reaktor Player.

Weirder — brilliantly so — but flexible enough to be used in conventional and very beautiful‑sounding ways is Native Instruments’ The Mouth, which runs in Reaktor and the free Reaktor Player. It’s now more than a decade old, but still feels fresh and relevant. The Mouth takes a monophonic audio input and can create from it synth lines with up to eight voices, harmonised to a chosen key and scale, as well as monophonic bass sounds and MIDI‑driven vocoder and resonator timbres. There are some simple effects (delay and compression) to boot. As an aside, it’ll do exciting things with beatboxing and drum loops too, via a dedicated Beats mode. All the internal sounds are preset, and there aren’t many, but variation is provided by a bank of performance macros, and a handful of synth parameters. It absolutely invites experimentation, and for me any plug‑in that has a ‘Nonsense’ parameter is automatically of interest.

Theremouth plug-in, a pitch-driven Theremin.

Also in Reaktor (full version only this time) is the one‑trick‑but‑a‑lovely‑trick Theremouth, available from boscomac.free.fr along with many other gems. It’s a pitch‑driven Theremin, and I would defy even electronic‑savvy listeners to spot this wasn’t the real thing, without forensic analysis. The unquantised pitch recognition system used is amazingly responsive, allowing for very subtle vibrato and portamento. Use it for effects almost impossible to recreate on a keyboard‑based synth, unless it has a huge pitch ribbon or similar.

Finally, for more of a novelty try BitSpeek from SonicCharge. This affordable plug‑in applies pitch tracking to the tone‑generation technology that underpinned the Speak & Spell educational toys from the 1980s. There’s not huge scope for variation in the synthetic timbre, aside from introducing noise, grungy downsampling artefacts and strange ‘shuttered’ rhythmic fluctuations (which can be tempo sync’ed). But the effect is magnetic, full of historical and cultural resonances, and it’s often a great alternative to more traditional vocoder or talkbox sounds.

SonicCharge BitSpeek.

Techniques

What are some good ways to put pitch‑to‑MIDI tools and pitch‑driven synths to use? As I mentioned elsewhere, they make for intriguing and potentially fruitful alternatives to MIDI keyboard controllers. There’s a role for them in creative work even if you do also rely on a keyboard, because generating material by singing (or playing a monophonic acoustic instrument) forces you to think differently: more about melody and riffs than block chords for one thing, with the potential to cut you free of the keys and constructions you might normally gravitate towards. Somehow the effect is always intrinsically ‘looser’ too.

On a much more practical level, DAW‑based pitch‑to‑MIDI can be a good starting point for generating music notation, for musicians who aren’t conversant with it. It’s vital here to record to a click in your DAW, so that there’s a proper metrical framework, but that can be enough to generate useful scores to share with collaborators.

In typical vocal‑based pop production, layering is an especially useful technique. You might reinforce certain parts of a production: for example, taking a normal chorus‑section vocal and layering it up with a synth or string line. Or doubling an electric bass with sub‑bass from a monosynth. To be able to do this with one or two clicks is super handy.

You might also try replacing or layering sections of a vocal with an otherworldly noise‑like timbre, almost like a synthetic whisper. OVOX and VocalSynth 2 are both good at this, but you could arguably drive any suitable synth with an offline DAW process, a pitch‑to‑MIDI bridge, or one of the hardware solutions I mention. Analogue synth self‑resonant filter sounds, with some noise oscillator input too, and perhaps not even tracking accurately, can lend a particularly spooky atmosphere.

Real-time Pitch-To-MIDI

Yet another approach to real‑time pitch‑to‑MIDI is to use a standalone application dedicated solely to the task. These let you drive virtual (and often hardware MIDI) instruments of your choice alongside or separately from your DAW.

Imitone’s speed, accuracy and flexibility make it a stand‑out application‑based option.

About the most affordable (free with a time limit, or £8.99 without) is A2M from Beat Bars, a Polish company that make a range of intriguing, mould‑breaking hardware and software controllers. This Apple‑only product runs as an AUv3 plug‑in (in Audiobus, AUM and apeMatrix on iOS) and standalone (iOS and Mac OS) with identical capabilities: pitch recognition via two selectable algorithms, adjustable latency/accuracy trade‑off, and some basic control over input level and MIDI velocity output. It’ll transmit MIDI to virtual ports that other apps will see, a Network MIDI bus, or any hardware instruments connected to your device.

Imitone, available in two editions ($30 or $75, both very capable), is significantly more sophisticated and accurate, and nicer to use. This colourful and clear standalone application for Mac OS or Windows offers fast and accurate pitch/note recognition, includes optimisations for different voice ranges and modes for handling legato pitch transitions and vibrato, and can generate varying velocity or CC11 values (respectively) from expressive percussive or sustaining input. It has some onboard sounds, so can truly be used standalone, but it can also play hardware synths connected to your computer, or instruments in your DAW via inter‑application MIDI. Recommended.

Other players in this field include the more expensive Jam Origin MIDI Guitar 2, which despite the name can still work with the voice. Also Vochlea Dubler 2, which can be bought in a bundle with a specially calibrated mic and includes ‘percussive recognition’ for triggering samples or drums from beatbox‑like vocal input.

Hardware & History

Accurate pitch‑to‑MIDI is potentially a computationally‑intensive process, and it’s rarer to see it in hardware. Some vocal‑specific and multi‑effects pedals do something very like it as a basis for intelligent harmonisation, but for dedicated units you have to go back to the 1980s and the likes of the Roland VP70, Fairlight Voicetracker and Digitech/IVL Pitchtracker: the latter was technically a guitar‑oriented product, but some vocalists used it too. All as rare as hens’ teeth nowadays.

If you must work in hardware, first check out the Sonuus G2M: it’s a little battery‑powered box that’s guitar‑oriented but will also work with the voice and other instruments via a suitable preamp.

For straight synth control, without a more general MIDI output — in some ways the hardware equivalent of the plug‑ins I’ve mentioned — there are a few options. The Korg MS20 (and many of its modern clones and miniature versions) will track an audio input, often amazingly well, fed into its External Signal Processor section. In Eurorack check out the luxury‑option Sonicsmith ConVertor E1, the slightly more affordable Analogue Systems RS‑35, and the pitch‑to‑CV modes in Expert Sleepers’ ever‑handy Disting module.