Virsyn Cantor

Singing Synthesis Software [XP / Mac OS X]

Published October 2004

If your computer could sing, what would it sound like? With Virsyn's Cantor software synth, you can find out...

To an engineer of 30 years ago, today's software tools would be the stuff of science fiction. Who could have predicted that a small grey box would be able to record hundreds of audio tracks, put a tone-deaf singer in tune, or place a virtual symphony orchestra at our fingertips?

We're used to this sort of technological miracle by now, but even so, it's hard not to feel a touch of Tomorrow's World-style wonderment about the advent of singing synthesizers. Yamaha's Vocaloid technology, introduced last year, allows developers to sample the characteristic building blocks of a human voice and create a virtual vocalist, ready to sing anything you care to throw at him, her or it. At this year's Frankfurt Musikmesse, meanwhile, innovative German soft-synth designers Virsyn gave the world its first glimpse of their take on the singing computer concept. A mere four months later, the finished program is with us.

Cantor bears a number of operational similarities to Vocaloid. It can operate either as a stand-alone application or as a VST or Audio Units plug-in, with ReWire and RTAS support promised in a forthcoming update; and even when used as a plug-in within a host sequencer, it employs its own piano-roll-style grid for note and lyric entry. There are, however, three fundamental differences between Cantor and Vocaloid. First, Cantor is not based on samples. Instead, a morphing additive synthesis engine derived from Virsyn's Cube software synth is used to generate the 39 phonemes which Virsyn use to reproduce English speech or singing. Each phoneme is created by passing an additive sound source through a formant filter, which morphs between a start and an end state. These filter responses are fully editable, and up to six peaks and three troughs in the formant filter response can be specified as morph points.
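To make the idea of a morphing formant filter more concrete, here is a minimal Python sketch of the general technique. It is not Virsyn's engine, whose internals aren't documented here. It assumes a harmonic source with a 1/k rolloff, models each morph point as a simple bell-shaped peak, and moves every peak in a straight line from its start state to its end state. Troughs aren't modelled, and all the names and formant values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Formant:
    freq: float   # centre frequency in Hz
    width: float  # width in Hz at half the peak height
    gain: float   # linear peak gain


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: returns a at t=0 and b at t=1."""
    return a + (b - a) * t


def morph_formants(start: List[Formant], end: List[Formant], t: float) -> List[Formant]:
    """Move each morph point from its start state (t=0) to its end state (t=1)."""
    return [
        Formant(lerp(s.freq, e.freq, t), lerp(s.width, e.width, t), lerp(s.gain, e.gain, t))
        for s, e in zip(start, end)
    ]


def filter_gain(freq: float, formants: List[Formant]) -> float:
    """Sum of simple bell-shaped peaks, standing in for a drawn filter response."""
    total = 0.0
    for f in formants:
        x = (freq - f.freq) / (f.width / 2)
        total += f.gain / (1 + x * x)  # equals gain/2 at freq = centre +/- width/2
    return total


def voiced_partials(f0: float, n_partials: int, formants: List[Formant]) -> List[float]:
    """Amplitude of each harmonic k*f0 after filtering (1/k source rolloff assumed)."""
    return [filter_gain(k * f0, formants) / k for k in range(1, n_partials + 1)]


# Rough textbook first and second formants for 'ah' and 'ee' (illustrative values only).
AH = [Formant(730, 90, 1.0), Formant(1090, 110, 0.5)]
EE = [Formant(270, 60, 1.0), Formant(2290, 200, 0.4)]
```

For example, voiced_partials(110.0, 30, morph_formants(AH, EE, 0.5)) gives the partial levels for a filter halfway between the two vowels; in a real-time engine, t would advance over the course of each phoneme.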

The second crucial difference, which is a consequence of the first, is that unlike Vocaloid, Cantor is a true virtual instrument, which can be 'played' in real time from a MIDI keyboard. What's more, all changes to notes, lyrics and vocal timbre are made instantaneously, with no need to wait whilst a file is rendered, and all parameters can be MIDI controlled and automated within your sequencer.

The third key point of departure is a conceptual one. Whereas Vocaloid is intended to create realistic, human-sounding singing voices that could replace a real lead or backing vocalist without the listener being aware, Cantor belongs more to the realm of 'special effects' vocal tools such as vocoders, guitar talkers, ring modulators and Mellotron choirs. Virsyn don't claim that the results will sound like real vocals; instead, they have designed Cantor to have qualities that no human vocalist could achieve, including a formant range that extends well beyond the usual male/female spectrum.

Start Singing

The Cantor box contains a single CD-ROM with Mac and Windows installers, plus a slim but welcome perfect-bound manual in English and German. Installation is straightforward, and the copy protection consists of a serial number which you enter when you first run the program. Registering at Virsyn's web site is highly recommended, as it enables you to download the frequent updates, along with additional files such as example projects and new phoneme sets.

Cantor's user interface is almost identical in the stand-alone and plug-in versions, apart from a few minor features which I'll point out as they crop up. Each instance of the program is eight-part multitimbral, but the parts are monophonic, so if you want Cantor to sing multi-part harmonies, you'll need to create each line separately. Editing duties are distributed between five screens, which are brought to the front using tabs at the left-hand side, with the bulk of your work being done in the first of these, the Score window. This combines a fairly conventional piano-roll editor with controls for the most important parameters used to adjust the timbre of the synthesized voice.

Cantor's Voice editing page. The partial display in the top half of the window allows you to edit the additive source used to generate voiced phonemes such as vowels, whilst the noise transfer function below determines the frequency content of the noise used in generating unvoiced phonemes such as 't'.

Getting started is pretty easy. You select the Pencil tool from the small array above the piano-roll and draw in a note of the required length. When you let go of the mouse button, the text field above the note turns yellow, indicating that you should now enter a word or syllable. As you do so, Cantor automatically translates it into the appropriate phoneme or combination of phonemes, optionally displaying the results below the note bar. A hyphen tells Cantor that the following note should be treated as part of the same word. The engine used to translate words into phonemes is licensed from Carnegie Mellon University, and is remarkably good. In the entire time I used Cantor, the only cases I encountered where it didn't know the correct translation were proper names, and even then it usually made a good guess. In practice, you quickly forget it's even there. If it does fail you, it's possible to edit the phoneme data for each note directly, but unlike Vocaloid, Cantor doesn't offer any way of adding your own words to its dictionary.
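The Carnegie Mellon connection suggests something along the lines of the CMU Pronouncing Dictionary, which spells words in ARPAbet symbols with stress digits and uses a set of 39 phonemes, though the review doesn't say exactly what Virsyn licensed. The sketch below assumes that dictionary format, using a three-word excerpt; the function names and the rule that a trailing hyphen joins a note's text to the next one are hypothetical illustrations of the workflow described above. Unlike Cantor, it simply returns None for unknown words rather than guessing.

```python
import re
from typing import List, Optional

# Tiny illustrative excerpt in CMU Pronouncing Dictionary style (ARPAbet plus stress digits).
PRONUNCIATIONS = {
    "HELLO": "HH AH0 L OW1",
    "WORLD": "W ER1 L D",
    "SING": "S IH1 NG",
}


def join_syllables(note_texts: List[str]) -> List[str]:
    """Rebuild whole words from per-note text; a trailing hyphen continues the word."""
    words, current = [], ""
    for text in note_texts:
        if text.endswith("-"):
            current += text[:-1]
        else:
            words.append(current + text)
            current = ""
    if current:  # a dangling hyphenated fragment at the end of the phrase
        words.append(current)
    return words


def to_phonemes(word: str) -> Optional[List[str]]:
    """Look up a word and strip the stress digits; None means the word is unknown."""
    entry = PRONUNCIATIONS.get(word.upper())
    if entry is None:
        return None
    return [re.sub(r"\d", "", symbol) for symbol in entry.split()]
```

So the notes "hel-", "lo" and "world" become the words "hello" and "world", and "hello" maps to HH AH L OW.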

Other available editing tools include an Arrow tool for selecting and moving existing notes, and an Eraser tool. Also in the toolbar are buttons to copy and paste selected notes, an Undo button that lets you reverse up to 16 of your actions, and options to enable or disable quantisation and the phoneme display. Scrolling and zooming are achieved as in Cubase SX, by clicking and dragging in the ruler view; this works well enough, but I missed having scroll bars and clickable zoom settings too. Overall, anyone familiar with the basic concept of a piano-roll editor will find Cantor's editor pretty straightforward to use, but one thing that gets annoying is that there are no keyboard shortcuts whatsoever. Editing would be a lot faster if you could use key input to switch between the different tools, but as it is, you can't even use the Backspace key to delete selected notes; nor can the QWERTY keyboard be used to zoom or scroll the screen, or to nudge notes. Some plug-in formats do impose restrictions on keyboard input to plug-ins, so this is perhaps understandable in the plug-in versions of Cantor, but the stand-alone version is no different. More superficially, it would be nice if selecting the Pencil or Eraser tool actually changed the cursor to a pencil or eraser; as it is, you just get an arrow, whichever tool is selected. I also encountered occasions when the Pencil tool didn't put notes quite where I expected, but this wasn't a big problem in practice.

Once you've entered some note and lyric data, you can use the transport buttons above the window to play it back; if you're using the plug-in version, you can also choose to slave Cantor to the host program's transport. However, it's more fun to play Cantor 'live' from a MIDI keyboard. In this mode, Cantor still cycles through the lyrics as you play, but the pitch and timing of the notes you've entered in the piano-roll editor are ignored in favour of incoming MIDI data, whether it comes directly from your keyboard or from a MIDI track in your sequencer. If you have the Legato button switched off, Cantor still uses the note lengths from the piano-roll editor, whereas engaging Legato mode means that each note is sustained until you let go of the key or play another note. When you're playing Cantor in this way, you do have to be careful to leave gaps where a word ends in a consonant: these final consonants are triggered by the MIDI Note Off, so playing legato causes them to be skipped. With 'pedal legato' mode set in the Voice editing page, it is possible to have Cantor sustain a single syllable over multiple MIDI notes by holding down the sustain pedal and playing legato.
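The behaviour described above, where final consonants fire on MIDI Note Off and get lost when you play legato, can be modelled as a small monophonic state machine. This is a hypothetical Python sketch of that behaviour as the review describes it, not Cantor's code: the class names, the onset/vowel/coda split of each syllable and the last-note-priority rule are all assumptions, and pedal legato isn't modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Syllable:
    onset: List[str]  # consonants sung at Note On
    vowel: str        # sustained nucleus
    coda: List[str]   # final consonants, released at Note Off


@dataclass
class MonoLyricVoice:
    syllables: List[Syllable]
    index: int = 0
    current_pitch: Optional[int] = None
    current: Optional[Syllable] = None
    sung: List[str] = field(default_factory=list)

    def note_on(self, pitch: int) -> None:
        # A new note while another is held (legato) simply takes over, so the
        # held syllable's final consonants are never released.
        syllable = self.syllables[self.index % len(self.syllables)]
        self.index += 1
        self.sung.extend(syllable.onset + [syllable.vowel])
        self.current_pitch, self.current = pitch, syllable

    def note_off(self, pitch: int) -> None:
        # Only releasing the key that is currently sounding triggers the coda.
        if pitch == self.current_pitch and self.current is not None:
            self.sung.extend(self.current.coda)
            self.current_pitch, self.current = None, None


def perform(syllables: List[Syllable], events: List[Tuple[str, int]]) -> List[str]:
    """Run ("on" | "off", pitch) events through a voice and return the phonemes sung."""
    voice = MonoLyricVoice(syllables)
    for kind, pitch in events:
        if kind == "on":
            voice.note_on(pitch)
        else:
            voice.note_off(pitch)
    return voice.sung


# "jumped out", split into syllables using ARPAbet-style symbols (illustrative).
JUMPED_OUT = [Syllable(["JH"], "AH", ["M", "P", "T"]), Syllable([], "AW", ["T"])]
```

Played detached, the phrase keeps its "mpt"; played legato (second key down before the first is released), those consonants vanish, which is exactly why leaving gaps matters.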

Finding The Balance

Having provided Cantor with some raw materials in the shape of lyrics and sequence data, you can begin to explore the voice-editing controls. Sensible user interface design means that you can do most of the editing you're likely to need using the dozen or so knobs at the left of the main Score editor. These include familiar parameters such as Volume, Glide (portamento), Pan, and Vibrato Rate and Depth, plus a selection of slightly more unusual controls. Of these, Ensemble provides a gentle chorus effect, Bright allows you to adjust the amount of high-frequency content in the output, and Humanise introduces random variation in pitch, volume, and vibrato rate and depth, to offset the machine-like quality of the results.
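Humanise-style variation is easy to picture in code. This hypothetical Python sketch applies seeded random offsets to the vibrato rate, the vibrato depth and an overall per-note detune; the ranges (a proportional spread set by the humanise amount, plus up to ten cents of detune) are assumptions rather than Cantor's values, and volume could be varied in exactly the same way.

```python
import math
import random
from typing import List


def humanised(value: float, amount: float, rng: random.Random) -> float:
    """Randomly scale value by up to +/- amount (0 = no variation, 0.2 = +/-20%)."""
    return value * (1.0 + amount * rng.uniform(-1.0, 1.0))


def pitch_curve(f0: float, duration: float, rate_hz: float, depth_cents: float,
                humanise: float = 0.0, seed: int = 0, steps: int = 100) -> List[float]:
    """Per-note pitch trajectory in Hz: sinusoidal vibrato plus optional humanising."""
    rng = random.Random(seed)  # seeded so that a given note always varies the same way
    rate = humanised(rate_hz, humanise, rng)
    depth = humanised(depth_cents, humanise, rng)
    detune = humanise * rng.uniform(-10.0, 10.0)  # per-note offset in cents (assumed range)
    curve = []
    for i in range(steps):
        t = duration * i / steps
        cents = detune + depth * math.sin(2 * math.pi * rate * t)
        curve.append(f0 * 2 ** (cents / 1200))
    return curve
```

With humanise left at 0, every note gets identical, perfectly regular vibrato; raising it gives each note (or each seed) a slightly different rate, depth and tuning.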

The Phoneme editing window displays the start and finish states of a morphing formant filter. You can draw transfer functions for each with the mouse, and place morph points to tell Cantor how to map the filter response between the two.

Four other controls have a deeper influence on the nature of the synthesized voice. Metallic, according to Virsyn, 'turns the vocal source from a harmonic, partial structure into an inharmonic, metallic one', with actual results not unlike a ring modulator. Balance adjusts the relative levels of voiced phonemes and unvoiced ones, which for most purposes means the balance between vowel sounds and soft consonants on the one hand, and sibilant and hard consonants such as 's' and 't' on the other. Breath introduces breath noise into the sound, and Gender is perhaps the most fundamental control, covering the full spectrum from impossibly deep bass, through obviously male, female and child singer territory into areas previously inhabited only by the Smurfs and Kate Bush.
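Virsyn don't say how Metallic works internally, but one common way of turning a harmonic partial series into an inharmonic one is to stretch the partial frequencies so that they no longer fall on whole-number multiples of the fundamental. The sketch below assumes an exponential stretch; the scaling constant, the 0 to 1 range of the metallic value and the function names are hypothetical.

```python
from typing import List


def partial_frequencies(f0: float, n_partials: int, metallic: float) -> List[float]:
    """Partial frequencies for an additive source.

    metallic = 0 gives a harmonic series (k * f0); larger values (assumed 0..1)
    stretch the series so the partials no longer sit at integer multiples of f0.
    """
    stretch = 1.0 + 0.5 * metallic  # hypothetical mapping from knob value to exponent
    return [f0 * k ** stretch for k in range(1, n_partials + 1)]


def is_harmonic(freqs: List[float], f0: float, tol: float = 1e-9) -> bool:
    """True if every frequency is (within tol) a whole-number multiple of f0."""
    return all(abs(f / f0 - round(f / f0)) < tol for f in freqs)
```

Because the stretched partials no longer line up with the harmonic series, the ear stops fusing them into a single clearly pitched tone, which is broadly why such spectra come across as bell-like or metallic.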

All of these parameters can be automated if you're using Cantor within a suitable plug-in host, and most can also be automated within Cantor's Score editor itself, so there's plenty of scope for variation even within a single vocal phrase. A nice touch is that when you hover the mouse over a control, its name is replaced by a numeric readout for that parameter.

Voice Working

If you want to get more deeply involved in shaping the vocal sound, you need to head for the Voice editing page. Here you'll find two graphical windows in which you can 'draw' by clicking and moving the mouse. The upper window defines the spectrum of the additive source that is passed through Cantor's formant filters, whilst the lower one specifies a transfer function (ie. an EQ shape) for the breath noise. These use the same click-and-drag drawing method as Virsyn's Cube additive synth, which is about as easy to use as it gets.
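The review doesn't say how Cantor stores these hand-drawn graphs. A common way to evaluate a drawn EQ curve at arbitrary frequencies is to interpolate between breakpoints on a logarithmic frequency axis, and the Python sketch below assumes exactly that: straight lines in dB against log frequency, flat beyond the first and last points, and distinct, positive breakpoint frequencies. The function name and the example curve are hypothetical.

```python
import bisect
import math
from typing import Callable, List, Tuple


def make_transfer_function(points: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Build a gain curve from hand-drawn (frequency_hz, gain_db) breakpoints."""
    pts = sorted(points)
    xs = [math.log2(f) for f, _ in pts]  # octave-based frequency axis
    ys = [g for _, g in pts]

    def gain_db(freq: float) -> float:
        x = math.log2(freq)
        if x <= xs[0]:
            return ys[0]   # hold the first drawn value below the curve
        if x >= xs[-1]:
            return ys[-1]  # hold the last drawn value above the curve
        i = bisect.bisect_right(xs, x)
        t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    return gain_db


# A hypothetical breath-noise shape: rolled off at the bottom, peaking around 1kHz.
BREATH_EQ = make_transfer_function([(100, -24.0), (1000, 0.0), (8000, -12.0)])
```

Working on a log axis means a line drawn between 1kHz and 8kHz passes its halfway gain at about 2.8kHz, which matches how the graph looks on screen rather than how the numbers fall on a linear scale.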

As on the Score editor, further parameters are available for control at the left of the screen. You can specify the number of partials to be generated by the additive source — using more partials increases the sound quality but also the CPU load — and the corner frequencies of a high- and a low-shelving filter. Less familiar parameters include Blur, which makes Cantor sing in a Mockney accent, and Noise Mod, which determines the extent to which the noise is related in timbre to the pitch of the note; at low values, it's pure noise, but as you increase the dial, the noise acquires more of a pitched quality. This, again, is presumably derived from the noise-generating system in Cube, which works by modulating the partials to create a more or less anharmonic noise signal.

Actually, I lied about Blur. What it does is allow you to speed up or slow down the rate at which Cantor morphs between phonemes. The results of varying this are described in the manual as sounding like a 'contrast enhancement' or 'motion blur' of the phoneme sequence, and are usually pretty subtle in practice.

Although some of the more unusual parameters can be a little unpredictable until you get used to them, all of the editing tools on the Voice page are easy enough to use. The difficulty level takes a pretty vertical hike when you get to the Phoneme page, which enables dedicated users to create their own phoneme sets. That's no reflection on Virsyn's editing tools, which seem well designed for the job in hand — it's just that the job in hand is frighteningly complicated and requires a lot of specialised knowledge. In essence, you design the morphing filter response for each phoneme using two click-and-draw transfer-function graphs, one representing the filter state at the start of the phoneme and the other the end. You then mark the most prominent peaks and troughs in the transfer function as morph points, and specify whether the phoneme is voiced or unvoiced — voiced phonemes use the additive partial generator as a source, whereas unvoiced ones use the noise source. Other parameters include Transit (morph) times for the voiced and unvoiced parts of the sound, and Sustain, which specifies at what point during the morph Cantor holds the sound when you play a sustained note.
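The Transit and Sustain parameters can be pictured as a playhead moving through the morph. In this hypothetical Python sketch, the morph position runs linearly from 0 (start state) to 1 (end state) over the transit time, parks at the Sustain position while the note is held, and carries on to the end state after release. The linear motion and the function name are assumptions; since Cantor has separate transit times for the voiced and unvoiced parts, each part would simply be given its own value.

```python
from typing import Optional


def morph_position(t: float, transit: float, sustain: float,
                   release_time: Optional[float] = None) -> float:
    """Morph position in [0, 1] at time t (seconds) after the phoneme starts.

    transit: seconds for a full start-to-end morph.
    sustain: position (0..1) at which the morph is held while the note is down.
    release_time: when the note was released, or None if it is still held.
    """
    speed = 1.0 / transit
    if release_time is None or t < release_time:
        return min(t * speed, sustain)               # advance, then park at sustain
    held = min(release_time * speed, sustain)        # where the morph was at release
    return min(held + (t - release_time) * speed, 1.0)  # finish the morph
```

A short note released before the sustain point never pauses, while a long one holds the partly morphed sound until the key comes up.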

Designing a phoneme set from scratch will be beyond the ken of most users, but it can be useful and interesting to play around with the default sets. For instance, the default 't' phoneme in one of the sets generates a burst of high-pitched noise that I found distracting, and the phoneme editor makes it possible to tone this down.

Cantor's built-in effects include distortion, chorus, delay and global reverb.

There are two further editing pages, FX and Mix. The former offers basic distortion, stereo delay, chorus and reverb effects, which might be useful in the stand-alone version of Cantor but will probably go unused where dedicated effects plug-ins are available. Sensibly, the reverb is set up as a global effect, whilst the others are implemented on a per-voice basis. The Mix page allows you to assign MIDI input channels to each of Cantor's eight voices, and to set each voice's output level, pan and reverb send level.

A simple mixer allows you to adjust the level, pan and reverb send for each of the eight monophonic voices in an instance of Cantor, and to specify which MIDI channel they will respond to.

Middle Of The Load

CPU load is unlikely to be a huge problem with modern computers. On the modestly specified review machine, Cantor's CPU meter rarely exceeded 20 percent for a single-note part, and I imagine that a G5 or Pentium 4 box will cope without trouble; it's not as though you'd want to have multiple instances of Cantor playing 27-note chords in any case.

Pardon?

When I first heard Cantor at this year's Frankfurt Musikmesse exhibition, I thought that it would probably become the voice of a thousand novelty hits. After I'd tried it out myself for the first time, though, I wasn't so sure. The results were certainly novel, and you'd have been unlikely to mistake them for a human vocal, even one that had been stuffed through Auto-Tune and vocoded to death. However, there's no way you'd have been able to use Cantor v1.0 for a lead vocal, because its singing was pretty much unintelligible: the output was recognisably vocal-like in character, but often impossible to decipher unless you had the lyrics in front of you.

Fortunately, Virsyn are very responsive to user feedback, and they have worked hard to alleviate this problem. Whereas the initial release of Cantor came with just a single phoneme set, version 1.02 ships with eight different factory sets, most of which are clearly better than the original. The vowel sounds in the default set were always fine, as were some consonant sounds such as hard 'g' and 'k', but many of the other consonant sounds were very indistinct. The new phoneme sets go some way towards putting this right, with noticeable improvements to sounds such as the 'w' in 'we', the soft 'th' in 'this', the 'f' in 'fish' and the 'p' and 'd' in 'jumped'. The user can also aid intelligibility by playing in a sympathetic manner, leaving rests in the right places and not sustaining notes on unsuitable syllables, as well as keeping the Gender setting appropriate for the playing pitch; judicious twiddling of the Balance between voiced and unvoiced phonemes and the Noise Mod parameter can also help.

Despite these improvements, though, making out what Cantor is singing still demands careful attention, and the clarity is substantially inferior to that of a typical sample-based speech synthesizer; it's about on the same level as a vocoded human voice. Some sounds are still problematic in most of the sets, including the 'n' in 'new', the 'y' in 'you', the 'sh' in 'shell' and the 'b' in 'bite'; with voiced consonants, as in the latter case, it's often impossible to emphasise an initial consonant sufficiently compared to the vowel that follows. However, the average intelligibility has been raised enough that you can now often decipher a dodgy word from the context, which definitely wasn't the case with version 1.0. What's more, the eight different factory sets do have noticeably different characters, although as you'd expect, these are all obviously machine-like.

Cantor In Context

The improvements in v1.02 now make it possible to use Cantor in a role where it's required to convey a lyrical message, but it would be asking too much of it to replace a human lead singer. You still need to concentrate to understand the singing, and with its distinctive, synthetic character, it seems to work better for short, repeated phrases such as you might hear in a dance track. Creative abuse of parameters such as the vibrato rate and depth can generate some unique vocal timbres which it's easy to imagine the likes of Daft Punk exploiting, although there are plenty of dead ends to be encountered here, and it takes practice to achieve something you'd actually want to use.

The sound that Cantor makes is undoubtedly distinctive, but whether it's actually pleasant or useful is of course a matter of taste. Personally, I found it most effective in small doses. It doesn't really sound lush or warm like a classic vocoder, and nor does it have the organic quality of a guitar talk box; on the other hand, if you're planning to bring A Brief History Of Time: The Musical to the West End, you might find it just the thing. There's often a trade-off to be made between intelligibility and musical usefulness: increasing the prominence of the voiced consonants makes Cantor easier to understand, but also tends to make those consonants spitty or sibilant, so that they stick out of the track in a way that's awkward to deal with.

Cantor also has instrumental applications: you can get some very usable lead and bass sounds by forgetting that this is supposed to be a vocal synth, and strange vocoder-esque pads are easy enough to come by. In addition, there are lots of interesting effects to be had by forcing Cantor to gabble through a vocal sequence at ridiculously high speeds, or abusing parameter automation. The fact that each Cantor part is monophonic is quite a limitation when you're using it in a more textural role, though, as it's not possible to play chords in one pass from a keyboard.

Even taking into account its potential as a pad or bass-line generator, you couldn't really describe Cantor as a versatile synth, but it is something you would turn to if you wanted to lift a track out of the ordinary. The best way to decide if Cantor is for you is to download the fully functional 4MB demo version from the Virsyn web site; if you do try it, make sure to listen to the different factory phoneme sets in order to give it a fair hearing.

Finally, let's not forget that Cantor is just a really interesting piece of software, with obvious value as an educational tool. Although it doesn't sound realistic, it does make it a lot easier to understand the nature of human speech and singing, and playing around with the phoneme sets can be fascinating. Virsyn deserve enormous credit for taking something as complex as vocal synthesis and implementing it without the aid of samples, in a format that's so intuitive and fast to edit. You won't want to use it on everything you record, but in a world full of 'me too' products and vintage emulations, Cantor stands out as an ambitious, intriguing and above all original software synth.

System Requirements

MAC

G4 400MHz or faster, 256MB of RAM, Mac OS X 10.2 or later.

PC

Pentium III 600MHz or faster, or Pentium 4 or Athlon XP/MP, 256MB of RAM, Windows XP.

Pros

Given the complexity of what Cantor is doing, its user interface is simple and intuitive.
Automatic translation of English words into phonemes is almost flawless.
All edits are implemented in real time.
Vocal lines can be played from a MIDI keyboard.
Lots of control over the sound of the voice, so plenty of unusual effects are possible.
Has great potential as an educational tool.

Cons

The singing is much less intelligible than a human vocal or a typical speech synthesizer.
It's not always easy to produce a pleasing timbre, or one that fits within a track.
No keyboard shortcuts for Score editor, even in stand-alone version.

Summary

Cantor might turn out to have more theoretical interest than practical usefulness, but it's got enough of both to make it worth investigating.

information

£199.99 including VAT.
