Yamaha made quite an impact when they announced Vocaloid towards the end of last year. Zero-G were the first company to enter a licensing agreement to use Yamaha's singing synthesis engine, and they released the female Lola and male Leon in time for the January 2004 NAMM show. I was lucky enough to review both Vocaloid and the Lola/Leon combination in the March 2004 issue of SOS (read the review on-line at www.soundonsound.com/sos/mar04/articles/vocaloidlandl.htm). My feelings were a little mixed: Vocaloid is undoubtedly a remarkable technology and, at its best, capable of equally remarkable results. However, the downside is the amount of detailed editing of the various expression controls needed to craft a natural-sounding vocal line. While this is not too daunting a prospect for short vocal phrases or harmony backing vocals, trying to create a lead vocal for a full-length track would be a pretty major undertaking.
At the time of the release of Lola and Leon, Zero-G also announced that a third vocalist, Miriam, was in development. Based on the very considerable vocal talents of Miriam Stockley (see the 'Taking Stock' box for some of Miriam's credits), 'virtual' Miriam is now available. The release also brings some significant price cuts — Lola and Leon now retail at £129.95 rather than £199.95 — and an update to the synthesis engine. As I am unlikely to be in the privileged position of inviting the real Miriam Stockley into my studio, I was keen to see how her virtual counterpart might sound.
Miriam is supplied with the latest version of the Vocaloid engine. As suggested by the version number (126.96.36.199), this is not a major upgrade, consisting mostly of minor tweaks and fixes. However, the release notes indicate that it does tackle three niggles I raised in the earlier review: VST Instrument functionality, the responsiveness of the synthesis engine and the Play With Synthesis option have all been improved.
Miriam has the same copy-protection system as its predecessors, which ties it to the particular Ethernet LAN card installed in your system — if you don't have one, you'll need to spend an extra £20 or so to get one. While the earlier two releases each required around 600MB of hard disk space to store their sample database, once expanded from the installation CD, Miriam requires close on 2.5GB. This difference is not explained in the documentation but is probably due to significantly more detailed sampling of each of the phonemes used as the building blocks for pronunciation of lyrics. In theory, this ought to result in an improvement in the intelligibility of the singing produced by the synthesis engine.
On the surface, nothing obvious has changed in the Vocaloid interface. This was discussed in detail in the earlier review, so I'll avoid too much of a recap here. In essence, notes to form a melody line are entered into a fairly standard piano-roll editor and, above each note, lyrics can be typed in. Each syllable of a word needs to be given a separate note, with syllables being connected via a minus sign (-).
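To make the one-syllable-per-note rule concrete, here is a minimal Python sketch of how a typed lyric line maps onto the notes of a melody. The `Note` class and `assign_lyrics` helper are hypothetical illustrations of the splitting rule described above (spaces separate words, a minus sign separates syllables within a word); they are not part of Vocaloid Editor or any Zero-G software.

```python
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int       # MIDI note number (hypothetical representation)
    lyric: str = ""  # the syllable sung on this note


def assign_lyrics(notes: list[Note], lyrics: str) -> list[Note]:
    """Give each syllable its own note, in order.

    Hypothetical helper: words are split on whitespace, then each word is
    split on '-' so that 'hel-lo' yields two syllables for two notes.
    """
    syllables = [s for word in lyrics.split() for s in word.split("-") if s]
    if len(syllables) != len(notes):
        raise ValueError(f"{len(syllables)} syllables for {len(notes)} notes")
    for note, syllable in zip(notes, syllables):
        note.lyric = syllable
    return notes


melody = assign_lyrics([Note(60), Note(62), Note(64)], "hel-lo there")
print([(n.pitch, n.lyric) for n in melody])
```

Raising an error on a count mismatch mirrors the practical point in the text: a two-syllable word needs two notes, so the melody and the lyric have to be planned together.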
Once the lyrics are entered, they are automatically transformed into phonetic sounds, which can be displayed below each note and edited manually if required. The synthesis engine then extracts the required phonemes from the sample database of the selected virtual vocalist and pitch-shifts the fundamental and overtone elements of the sounds to the required note, leaving the formants intact. It is at this stage that the additional samples within the Miriam database ought to be an advantage, as less pitch-shifting is likely to be required.
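The argument that a larger sample database should mean less pitch-shifting can be sketched numerically. The Python below assumes, purely for illustration, that each phoneme is sampled at a handful of pitches and that the engine starts from the nearest one; Zero-G do not document how the Miriam database is organised, so the sampled pitch lists here are hypothetical. Because the formants are left intact, only the shift applied to the fundamental and its overtones depends on the distance in semitones, which converts to an equal-tempered frequency ratio of 2^(semitones/12).

```python
def shift_ratio(semitones: float) -> float:
    """Frequency ratio applied to the fundamental and overtones for an
    equal-tempered shift of `semitones` (formants are not scaled)."""
    return 2.0 ** (semitones / 12.0)


def smallest_shift(target: int, sampled_pitches: list[int]) -> int:
    """Semitones from the nearest sampled pitch (MIDI note) to the target."""
    nearest = min(sampled_pitches, key=lambda p: abs(p - target))
    return target - nearest


# Hypothetical databases: a sparse one and a denser one for the same phoneme.
sparse = [57, 64]
dense = [57, 60, 64]
target = 61  # C#4

for name, db in (("sparse", sparse), ("dense", dense)):
    st = smallest_shift(target, db)
    print(name, st, round(shift_ratio(st), 4))
```

With the denser hypothetical set, the nearest sample is one semitone away rather than three, so the engine has less pitch-shifting to do, which is the advantage the larger Miriam database ought to offer.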
The final, and generally the most time-consuming, task is to add all the necessary expression to the vocal to make it sound as natural as possible. The Icon Palette helps here and includes attack, vibrato and dynamics presets that can be dragged onto each note. These can then be edited manually and further expression can be added via the Control Track (at the bottom of the editor) through settings such as pitch-bend, velocity, harmonics, brightness and gender factor. All of these can add character to the vocal, making it more lifelike, but this stage certainly does take some experience and experimentation to get to grips with. As before, multiple Vocaloid tracks are available so harmony vocal parts can easily be constructed once a main line has been created.
As noted earlier, the major drawback of Vocaloid is the lengthy nature of this expression-editing process, but the changes to the synthesis engine in this version do provide some help here, as less time is spent waiting for the vocal to be re-synthesized after each minor edit. In addition, the Play With Synthesis option, which allows playback to begin while synthesis is going on in the background, now appears to work smoothly.
When I reviewed Lola and Leon, using the Vocaloid Editor application as a ReWire client to a Cubase SX host worked fine, but I had no luck using it as a VST Instrument. Things have certainly improved in that regard. As before, vocal lines have to be created within the stand-alone editor; they are then saved as MIDI files and imported into a VST host to be played back via the Vocaloid VSTi. The Vocaloid MIDI files do not contain Note On or Off events, and the data in the MIDI file is actually sent to the VSTi a little prior to playback of the vocal (to allow for processing). Aside from requiring a little trial and error when initially lining up the Vocaloid part within SX, this created no problems. A further plus with the VSTi is that it allows real-time control of many of Vocaloid's expression parameters and, unlike the control track in the stand-alone editor, all of these are displayed at the same time.
Miriam Stockley's CV is impressive. As well as solo albums (most recently Second Nature, released in 2002), she has an extensive list of credits as a collaborator and session singer. These include work with Tina Turner, Elton John, George Michael, Freddie Mercury, Chaka Khan, David Bowie, Seal, Mike Oldfield and Adiemus. She has also contributed vocals to material within a number of major film soundtracks, including Rob Roy, One Night Stand and Lord Of The Rings. Miriam was the subject of an SOS interview in the May 1999 issue (www.soundonsound.com/sos/may99/articles/miriam.htm) at the time of the release of her first solo album, simply titled Miriam. If you want to find out more, then Miriam's web site www.miriamstockley.com has all the latest news on her activities — and if you want a real vocal treat, listen to some of the music clips.
These useful improvements to the synthesis engine aside, how does Miriam actually sound? I tested by creating some new vocal phrases and comparing the performance of Lola with Miriam when singing the same lines. My impression is that Miriam's pronunciation is perhaps just a little better at times, although with simple phrases or vocal ad libs, these differences are less noticeable. In comparison to Lola (or Leon for that matter), results with Miriam sound a little smoother, and this seems most noticeable in the transitions between pitches in multi-syllable words. My guess would be that both of these improvements are down to the larger phoneme sample base supplied with Miriam.
Zero-G supply a somewhat larger collection of example pieces created with Miriam than was the case with Lola or Leon. These include some WAV demos of the virtual Miriam with a full musical backing, 'performing' both solo lead and harmony vocals. Given that this is still very early days for the Vocaloid technology, some of these are frighteningly good (check out the 'Never Give Up' audio demo via the Zero-G web site for an example) and they clearly demonstrate the potential of the synthesis engine. While the lead vocals are not, perhaps, 100 percent convincing as yet, some of the backing/harmony vocal parts that can be created are clearly very usable in the right musical context.
That said, the main frustration is still the extensive expression editing (and occasional manual phoneme editing) required to create phrases that do not sound obviously synthesized — and while Miriam does seem to produce better results than the original Lola and Leon, crafting a convincing lead vocal line is still a very lengthy process. A more minor complaint is that the selection of Vocaloid MIDI files supplied with Miriam does not include the very best of the audio demos. This is a great shame as loading these into the editor would allow new users to see exactly the sorts of detailed editing work required to create a more natural finished vocal. Perhaps these will be added to the Zero-G web site at some stage?
Finally, in reviewing Miriam, it is difficult not to make some comparison between Vocaloid and Virsyn's Cantor (reviewed in the October 2004 issue of SOS), despite the fact that Virsyn themselves are keen to play down any such comparison! As explained by Sam Inglis in that review, Cantor uses synthesis to create its phoneme sounds, rather than the sample-based approach adopted by Vocaloid. While the two applications share some obvious visual and operational similarities, Cantor is not intended to emulate a live singer. If you want special-effect-style vocals that are obviously synthesized, either application will deliver, although Cantor will probably do it quicker and with less CPU overhead. However, if you want vocals that might replace a live singer, then Vocaloid is the way to go — it might take a lot of work, but it can be done, particularly in the context of backing vocals.
Pentium III 1GHz or faster, 512MB RAM, Windows 2000 or XP, Ethernet LAN card, 2.6GB hard disk space.
One seemingly simple improvement Yamaha might consider is to provide MIDI input to Vocaloid Editor so that initial note entry could be done via a keyboard. I found it easier to create my melody lines in SX and then import the MIDI part into Vocaloid Editor to begin the vocal construction, but it would be nice to be able to avoid this step.
More significant (and obviously more complex!) would be some improvements in the expression editing process. As I commented in my original review of Lola and Leon, I wonder whether the vocalist models could include information on how the live singer performs transitions between pitches of different intervals and at different tempos, in such a way that Vocaloid could apply suitable expression parameters automatically. Additional Icon Palette tools for the ends of notes to match those available for note attacks might also be useful.
Miriam brings some improvements to the Vocaloid concept and, once again, demonstrates the considerable potential of Yamaha's synthesis engine, but a virtual vocalist is a much more ambitious undertaking than a virtual drummer or rhythm guitarist. While Vocaloid may still be a little way from the songwriter's Holy Grail of a 'singer in a box', the significant price drop may well encourage some users to take the plunge simply out of curiosity. Whatever your take on the desirability of replacing human singers with their virtual counterparts, it will be very interesting to see just how far Yamaha can take their technology.