Phonem sees wavetable synth pioneer Wolfgang Palm turn his attention to the tricky problem of vocal synthesis.
Synthesising the human voice — convincingly, anyway — is notoriously difficult. Although it’s true we’ve progressed a long way since Stephen Hawking began using his ‘Perfect Paul’ voice, the business of making a computer speak, let alone sing, remains a subtle and complex challenge. It shouldn’t be a surprise that a subtle, complex and — inevitably — challenging solution is required; one example is Phonem, the latest P from PPG.
Phonem is named after phonemes, which are the smallest discernible chunks of any language. By stringing phonemes together we construct words, which is simple enough in principle, but let’s see how the reality compares...
After a small download and straightforward installation, you’re ready to go. In appearance, Phonem resembles PPG’s other apps/plug-ins, and my impression is that it’s slightly more CPU-hungry than either Wavemapper 2 or Wavegenerator. For now, it’s confined to VST and AU formats (32- or 64-bit hosts), and the question of its eventual appearance on iOS is mere idle speculation.
The PPG revival began with the iPad and this has set the tone for all subsequent releases on Mac and PC. It explains the on-screen keyboard, which is an integral part of Phonem whether you can touch it or not. There’s also a ribbon controller and two XY pads, and it’s surprising how often, even with a mouse in hand, you keep reaching for them. Since they are valuable modulation sources, it’s worth assigning them to external MIDI control, although there’s no MIDI Learn function as such, just a text file to be edited manually.
Phonem is deep, blue and faintly intimidating so, like a total wuss, I sought instant comfort in the factory patches. I found much to admire in the Kraftwerkian robot voices, strange vocal snatches, weird choirs, whispers, resonant drones and synths. Synths? Well, yes — Phonem isn’t constrained by a human vocal tract and can extend its output into synthetic tones easily. Its multi-resonator filter is a thoroughly manipulable processor and, as this is a PPG, wavetable synthesis is also bundled in. It’s none too shabby either; wavetables and time-corrected samples (WTC and TCS) created in either Wavegenerator or Wavemapper 2 can be imported as sources, and there are other familiar synthesis elements such as envelopes, LFOs and a modulation matrix. The whole thing is topped off with reverb and delay effects.
After exploring the example sound banks, I was eager to try rolling my own. Without so much as a glance at the manual, I initialised a patch and clicked the ‘new text’ box, spontaneously beginning my first song since college. Unfortunately, my lyric-writing has not matured with age and my efforts were instantly rejected, the words not found. It transpires that (1) a (not very extensive) dictionary is built in and (2) it’s a very good idea to read the manual.
After some research, I discovered that the three important but unlabelled icons on the top bar are: timeline, parameter and effects/setup. The first of these is where the bulk of the action takes place, and due to the amount of detail involved, it’s further split into three sub-menus: Phonem, Track and Wave.
Having dropped the profanity and gibberish (for the time being), I started afresh with a quick line of Wordsworth. This gave me an ‘utterance’, represented on screen by overlapping frames of sound containing phonemes and pauses. Each frame can be selected for in-depth editing; click on one and it becomes the focus of the multi-resonator — that 12-peaked graphic positioned centrally. With the ‘trace’ option activated, you can watch the real-time actions of the resonators and excitation.
Had I wished, I could have entered a string of phonemes directly and generated my utterance with a click of the ‘Go Phon’ button. Either way, having selected an individual frame to edit, you can lengthen it, change the way it blends into the next frame, adjust the mix of the source excitation and noise, modify the roughness and more. Probably the most obvious step is to reshape it in the multi-resonator window. Adjustment of the three lower-frequency peaks makes fundamental changes that are comparable to the action of formant filters, while tweaking the upper frequencies provides fine-tuning of the voice character.
From a small menu on the right, you can select alternate phonemes from a list of over 50; they’re arranged into groups with names such as Sonor, Dipthong, Vplosive, Ufricative and Nasal. A diphthong is two vowels that blend into each other, but if you’re ever unsure of the sound or its usage, each phoneme in the list is accompanied by an example word. There should be enough material to phonetically construct any English, French or German word regardless of the dictionary, provided you have the time and patience.
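To illustrate why a limited dictionary leads to rejected words, here is a minimal Python sketch of a text-to-phoneme lookup. It is not PPG's code: the dictionary contents, the phoneme symbols and the function name are all made up for the example, and the real Phonem notation may look quite different.

```python
# Hypothetical mini-dictionary. The phoneme symbols are illustrative only
# and are NOT Phonem's actual notation.
DICTIONARY = {
    "i": ["AY"],
    "wandered": ["W", "AA", "N", "D", "ER", "D"],
    "lonely": ["L", "OW", "N", "L", "IY"],
}


def text_to_phonemes(text, dictionary=DICTIONARY):
    """Return (phonemes, missing_words) for a line of text.

    Known words are expanded into their phonemes, each followed by a
    pause marker "_". Unknown words are collected so the caller can
    report them or spell them out phoneme by phoneme instead.
    """
    phonemes, missing = [], []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:")
        entry = dictionary.get(word)
        if entry is None:
            missing.append(word)
        else:
            phonemes.extend(entry + ["_"])
    return phonemes, missing


phonemes, missing = text_to_phonemes("I wandered lonely as a cloud")
print(phonemes)
print("not in dictionary:", missing)
```

With this toy dictionary, 'as', 'a' and 'cloud' come back as missing, which is the point at which you would fall back on entering phonemes by hand.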
Early in the proceedings you’ll probably wish to choose a voice to use. Phonem isn’t like, say, Yamaha’s Vocaloid, in which you begin with a voice with a specific personality. Those available here are more generic and include two female, three male and one ‘fx’. To hear your phrase in one of these alternatives, you need to reprocess it (by clicking the ‘Transf’ button) but initially there didn’t seem much to choose between them. For example, Female 2 is breathier than Female 1 and is designed for singing rather than speech, but that’s only a starting point; the eventual application is up to you. To her credit, Female 2’s delivery of ‘I wandered lonely as a cloud’ was intelligible enough, but not an immediate replacement for Adele or Enya. Once again I was reminded of the huge number of ways in which an individual might sing or say the line — and there’s no way to avoid thinking about them.
In the search for a natural speech pattern, the adjustable blend curve controls the transition between one frame and the next, ensuring each phoneme can begin abruptly or more gradually. I mentioned that you can freely modify the formants in the multi-resonator, but it’s well worth studying the built-in tips for improving an utterance. Unless you’re a speech therapist, these will almost certainly contain suggestions you won’t have thought of.
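The idea of a blend curve can be sketched in a few lines of Python. This is only my assumption of how such a transition might work, not a description of Phonem's internals: each frame is reduced to a list of resonator frequencies, the curve is a simple power law, and the formant values are rough illustrative figures.

```python
def blend(frame_a, frame_b, position, curve=1.0):
    """Interpolate between two frames of resonator frequencies (Hz).

    position runs from 0.0 (all frame_a) to 1.0 (all frame_b).
    curve == 1 gives a linear transition, curve > 1 stays close to
    frame_a for longer and then moves quickly, curve < 1 moves early
    and settles gradually.
    """
    weight = position ** curve
    return [a + (b - a) * weight for a, b in zip(frame_a, frame_b)]


# Approximate, illustrative formant positions for two vowels.
ah = [700, 1200, 2600]
ee = [300, 2300, 3000]

print(blend(ah, ee, 0.5))           # halfway, linear
print(blend(ah, ee, 0.5, curve=2))  # same position, later transition
```

At the halfway point the linear version has already moved half the distance, while the curved version has covered only a quarter of it, so the second vowel arrives later and more suddenly.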
The utterance can be played back using a variety of techniques, and perhaps the easiest of these to grasp is setting the playback speed with the time envelope. The envelope is capable of time-stretching and reverse effects, and can loop portions of the phrase too. Armed with only this simple modulation source, Phonem can speak in tongues or deliver alien vocals with minimal effort. However, in order to have my utterance work properly in the context of a song, hitting a key and playing it through wasn’t going to suffice. Amongst the other mechanisms on offer, you can opt to play just the selected phoneme, or sections between markers or — and this is where it gets interesting — you can define a series of ‘anchor points’ to serve as suitable pauses.
The process of adding anchor points feels a little odd. They are always inserted before the active point, rather than at the position you click, and it can take a fair bit of zooming in and moving them around before the utterance is broken up correctly. Having done that you have the option to play up to each anchor point based on a Host trigger (a note), the Host BPM or a combination of these. Here again I felt it was slightly cumbersome, at least compared to a similar process in the Roland V-Synth, because to navigate through involves playing legato. This wouldn’t be an issue except that the next anchor point only plays when you release the old note, not when you play a new one. Still, with a bit of correction in Logic’s piano-roll editor I was able to spread my utterance so that it was spaced exactly as I wanted it.
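For anyone trying to picture the legato behaviour, the following Python sketch models one reading of it. It is an assumption based purely on what I observed, not PPG's implementation: the class name, the frame positions and the decision to rewind once every key is released are all hypothetical.

```python
class AnchorPlayer:
    """Steps through an utterance split into segments by anchor points."""

    def __init__(self, anchors, length):
        # Segment boundaries: start, each anchor, end (positions in frames).
        self.bounds = [0] + sorted(anchors) + [length]
        self.segment = 0
        self.held = set()

    def _current(self):
        if self.segment >= len(self.bounds) - 1:
            return None  # ran past the last segment
        return (self.bounds[self.segment], self.bounds[self.segment + 1])

    def note_on(self, note):
        """Only the first note of a phrase starts playback."""
        first = not self.held
        self.held.add(note)
        return self._current() if first else None

    def note_off(self, note):
        """Releasing the old note is what triggers the next segment."""
        self.held.discard(note)
        if not self.held:
            self.segment = 0  # assumption: rewind when all keys are up
            return None
        self.segment += 1
        return self._current()


player = AnchorPlayer(anchors=[40, 90], length=120)
print(player.note_on(60))   # first segment plays
print(player.note_on(62))   # overlapping note: nothing new yet
print(player.note_off(60))  # old note released: second segment plays
```

Seen this way, the timing of each segment follows the note-off of the previous note rather than the note-on of the new one, which is why the note lengths needed adjusting in the piano-roll editor.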
Having dealt with the timing, it was time to address the pitch, which can be controlled either by the notes played in the host DAW or from an internal pitch track. The default delivery is a C3 monotone, but with a range of C0 to E8 you needn’t limit yourself to human capabilities. If you make use of the pitch track, you can draw in the transitions freely, which often sounded far more natural than playing notes on a keyboard. The pitches entered are independent of your host notes, and they needn’t be bang on either. Underneath is another track, the Curve Controller, which is available as a free modulation source.
So far our efforts have been confined to a single utterance, but Phonem also has a song mode. It’s a bit basic though. Once activated, any utterances you’ve created will be played in sequence. They’re always played in alphabetical order and to manage them requires you to break out (via the System Browser button) and tweak the files directly using your Mac or Windows file manager. Via the usual methods you can rename or delete each utterance, and I found prefixing them with a number was the most straightforward way to get all the lyrics happening in the correct order. You’re advised to set up sub-folders within your own user directory to keep it neatly organised. An alternative method of song creation would be to run multiple instances of Phonem, each with its own voice and text.
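The numbering trick is easy to demonstrate with a short Python sketch. The utterance names are invented, and I am assuming a plain character-by-character alphabetical sort; I have not verified exactly how Phonem compares names.

```python
def song_order(names):
    """Return utterance names in plain alphabetical (string) order."""
    return sorted(names)


# Named after their first word, the lines play in the wrong order.
print(song_order(["wandered", "lonely", "cloud"]))

# Bare numbers help, but "10" sorts before "1_" and "2" as text...
print(song_order(["1_wandered", "2_lonely", "10_cloud"]))

# ...so zero-padded prefixes are the safer habit.
print(song_order(["01_wandered", "02_lonely", "10_cloud"]))
```

If the sort really is purely alphabetical, padding the numbers to a fixed width keeps the order intact once a song grows beyond nine utterances.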
Ordinarily, the source for vocal synthesis is a raw spectrum which is shaped in the Excitation Editor before it hits the filter. For more synthetic exploration, it’s possible to specify a wavetable instead. On the Wave page, up to 64 bars represent the harmonics of a single wave and up to 64 of these can be arranged in sequence. You can draw the contents of each frame individually, or import them from either of the previous PPG apps. You can then apply vocal filtering in exactly the same way to this new source; there’s even a wavetable envelope ready and waiting to perform the usual scanning functions.
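The relationship between harmonic bars, waves and scanning can be sketched in Python as follows. This is generic additive wavetable synthesis under my own assumptions (sine-phase harmonics, a 256-sample cycle, linear crossfading between neighbouring waves), not a description of how Phonem renders its tables.

```python
import math


def wave_from_harmonics(amplitudes, size=256):
    """Build one cycle by summing sine harmonics (additive synthesis).

    amplitudes[0] is the level of the fundamental, amplitudes[1] the
    second harmonic, and so on (up to 64 bars in Phonem's editor).
    """
    cycle = []
    for n in range(size):
        phase = 2 * math.pi * n / size
        cycle.append(sum(a * math.sin((h + 1) * phase)
                         for h, a in enumerate(amplitudes)))
    return cycle


def wavetable(frames, size=256):
    """Render a sequence of harmonic frames into a list of cycles."""
    return [wave_from_harmonics(f, size) for f in frames]


def scan(table, position):
    """Crossfade between adjacent waves; position runs from 0.0 to 1.0."""
    idx = position * (len(table) - 1)
    lo = int(idx)
    hi = min(lo + 1, len(table) - 1)
    frac = idx - lo
    return [a + (b - a) * frac for a, b in zip(table[lo], table[hi])]


# Two frames: a pure fundamental, then a saw-like stack of 64 harmonics.
table = wavetable([[1.0], [1.0 / h for h in range(1, 65)]])
halfway = scan(table, 0.5)
print(len(table), len(halfway))
```

Driving `position` from an envelope gives the familiar wavetable sweep, and the resulting cycle would then stand in for the raw spectrum as the source fed to the vocal filter.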
If you turn to the Parameter page, you can modify the base values of the underlying oscillators, changing the balance and character of the noise in the excitation signal or giving your voice a chorus-like quality with the Beat control, which simulates the detuning of three oscillators.
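As a rough picture of what a control like Beat might be doing, here is a Python sketch that mixes three sine oscillators, one at the played pitch and one detuned either side. The symmetrical spread in cents and the use of plain sines are my assumptions; the manual does not spell out the actual method.

```python
import math


def beat_voice(freq, detune_cents, t):
    """Average of three sine oscillators: one centred, two detuned.

    freq is in Hz, detune_cents is the offset of the outer pair,
    and t is the time in seconds.
    """
    ratio = 2 ** (detune_cents / 1200)
    freqs = (freq / ratio, freq, freq * ratio)
    return sum(math.sin(2 * math.pi * f * t) for f in freqs) / 3


# With no detune the three oscillators collapse into a single sine.
print(beat_voice(220.0, 0, 0.001))

# A few cents of detune makes the oscillators drift in and out of phase.
print([round(beat_voice(220.0, 8, n / 10), 3) for n in range(5)])
```

The slow drift between the three pitches is what the ear hears as a chorus-like thickening.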
The modulation matrix is a fast graphical means of connecting Phonem’s many sources and destinations, with each destination modulated by a single source. The matrix may be small but it contains important parameters such as aspiration, formants offset, excitation brightness, ‘vocal fry’ and more. There’s a certain amount of ‘suck it and see’ involved in modulating these, and the tiny bi-polar depth sliders aren’t easy to set precisely, but other than those limitations, the matrix is invaluable for bringing a voice to life. When you make a connection (LFO1 to formants offset, for example), the relevant LFO appears in the display ready for tweaking. This is particularly helpful since you can only view one LFO and one envelope at a time.
In total there are six envelopes and seven LFOs. Several of the latter are specifically designed for vocal purposes; there’s an LFO for vibrato, one for growl and one for flutter, each tailored appropriately. For example, the flutter LFO has an initial peak followed by a fair degree of randomness. To reduce clutter, the LFOs and envelopes that are pre-wired don’t appear in the matrix.
Lastly, the effects are minimal, just a delay, reverb and overdrive. It’s probably assumed you’ll use the effects in your DAW (I did), but the reverb doesn’t sound at all bad, the delay offers host sync and the drive provides a quick way of dipping your utterances in dirt.
A human voice synthesizer has a potentially scary number of parameters to juggle, and PPG have chosen not to hide these from the user nor simplify them by adding programmed voice personalities. As a result, Phonem is immensely powerful but also quite complex, with an interface that takes time to master. During the review period, I experienced the occasional CPU peak and unexplained crash, and it was only with a late revision to the manual that I began to understand how the program manages all its resources. Even after reading the manual and watching the online tutorials I still find it time-consuming to create utterances of more than a few words. Assembling these into finished songs is even tougher, not helped by the fairly basic song mode.
With patience, Phonem can be taught to sing, speak, whisper or rasp like a 40-a-day smoker, and it’s a relatively painless process to coax forth Kraftwerk-style robots or alien chatter. Perhaps the most instant rewards come when you aren’t in pursuit of conventional speech or intelligible vocals at all; Phonem can be a rich source of shifting atmospherics, ghostly wails and all kinds of resonant, choral pads. If you have any interest in unusual, sometimes extraordinary vocal textures, you should check it out.
- A powerful and flexible voice synthesizer offering detailed editing of tonality, timing and pitch.
- Wavetable synthesis is slotted in too, plus a modulation matrix and effects.
- It is therefore capable of a far broader range of tones than it appears.
- Not for the faint-hearted or impatient.
- Even with practice, creating a natural-sounding voice is not easy.
- A bigger dictionary would help.
- The song mode is fairly basic.
Human voices, singing robots, other-worldly choirs and evolving synths: Phonem is capable of all these and more. While it’s also demanding and occasionally obtuse, if you put in the effort, the creative possibilities are endless.
- Review system: Mac Pro running OS 10.8.5, 16GB RAM, 2x2.66GHz Quad-Core Intel Xeon running in 64-bit mode.
- Version reviewed: 1.0.1 (AU).
- DAW Host: Logic 10.0.7 in 64-bit mode.