Can Dreamtonics really deliver a session singer in software?
Virtual musicians, in the sense of virtual instruments with powerful performance elements, are now a fixture in the workflows of countless composers, producers and recording musicians. However, while the drummer, bassist, keyboardist and guitarist of your virtual session band can now undoubtedly produce the goods, what about the vocals? Well, Dreamtonics might well suggest that their flagship product — Synthesizer V — can do just that. Session singer in a software‑shaped box, anyone?
Voice In The Machine
The human voice — spoken or sung — is a hugely complex instrument, making the technical challenge of synthesizing it a very considerable one. A number of brave developers have tried, though, and for solo voices, perhaps the most widely known product is Yamaha’s Vocaloid. While the potential of the technology was clear to see when SOS reviewed the Sonika voice database for Vocaloid 2 in March 2010, the workflow was somewhat laborious. Backing vocals or obviously processed EDM vocal styles were possible (and became a thing in their own right) but creating a lead that might fool the listener into believing it was a ‘real’ voice remained out of reach.
Of course, in music software terms, 2010 is a long time ago. While Yamaha have continued to move Vocaloid forwards, over that same time span, competition has also appeared. One of these newcomers is currently gaining a lot of interest: Dreamtonics’ Synthesizer V.
Dreamtonics are also based in Japan, and Synth V is very similar in concept to Vocaloid. Running either standalone or as a plug‑in, the software has two main components: the synthesis engine and a selection of voice databases for individual ‘singers’. The most recent iterations of the engine include AI elements that use machine learning to improve the realism of the end result. The current selection of voice databases (built from recordings of real singers and available individually as separate purchases) includes native Japanese, Mandarin Chinese and English singers. I had access to a number of the native English voice databases for this review but, with the newer voices, the engine also lets a singer switch between all three languages.
Two versions of Synth V are available. The Basic version is free‑to‑try with some function/feature limitations, but does at least allow you to experience the synth engine in action. The paid‑for Pro edition obviously removes any limitations and, as described more fully below, provides an extensive list of editing and style options that can be applied to the synthesized voice.
It’s also worth mentioning that — at the time of writing at least — the available documentation is lagging somewhat behind the development of the software itself. Dreamtonics are actively working on improving matters on this front, but it did leave me unsure whether I fully understood all the features during the course of the review. Watch this space...
Sing Something Simple
Synth V’s UI contains three key elements. First, a vertical strip of buttons (far right) allows you to toggle open/closed a series of sub‑panels. Each focuses on a specific set of command options and can be placed (by dragging) on either the left or right side of the overall UI. Second, the Arrangement panel provides a DAW‑like ‘project window’ containing a vertical arrangement of the tracks within your current project and a bar‑based timeline display. Mini note displays along this timeline provide useful visual feedback for the overall arrangement.
A project can contain multiple synth voice tracks based upon one or more of the voice databases you have installed. Very usefully, you can also add audio tracks (termed Instrumental Tracks) into the arrangement. In the standalone application these might most obviously be used for an instrumental mix as musical context for your synth voice creation. All the tracks — voice synth or audio — have volume, pan, mute and solo options. There are no effects options but it’s perfectly adequate for the core task of creating the synthesized vocal(s).
Third, for the selected voice track in the Arrangement panel the Piano Roll panel shows the MIDI‑like note ‘blobs’ that represent the melody and timing of the sung performance. The display also shows the engine’s AI‑generated pitch curve, with features such as pitch slides between legato notes and vibrato on sustained notes. However, these properties are fully editable, and the Note Properties panel can be used to specify the pitch transition and vibrato settings for individual (or selected groups of) notes. The Piano Roll also shows any lyrics you have added for each note blob (just double‑click on a blob and type the word you want) and the phonemes the engine has assigned to these words to generate the appropriate pronunciation.
At the base of the Piano Roll panel, you can pop open further sub‑panels that allow you to create modulation curves for key properties of the voice. These include Pitch Deviation, Loudness and Tension, amongst a few others, and, as with any virtual instrument, modulation can add considerable expression and realism to the performance.
You can add notes into the Piano Roll editor manually, via a MIDI file or by recording them from a MIDI keyboard. New notes are given a default ‘la’ lyric and, if you enable the Instant Mode button (top right of the Piano Roll panel), Synth V automatically revises the vocal waveform as you add or edit notes. The note editing toolset is very much like a MIDI editor, allowing you to change pitch, position and length very easily. Keyboard shortcuts are also supported, and you can check the default configuration for these in the Settings panel.
Having added a voice track and assigned the required voice database, generating a sung vocal requires you to enter/create a pattern of MIDI‑like notes and type in the lyrics for each note. As you enter or edit notes/lyrics, Synth V works away in the background generating the resulting vocal, and you can see the calculated waveform within the Piano Roll panel. You then simply hit the Play button to audition the result.
The first few times you use Synthesizer V, you may find yourself staring at your studio monitors with your jaw smacking off the floor. Yes, the odds are that there will be some further editing work required — words where the pronunciation or phrasing doesn’t flow in a truly ‘human’ fashion, for example — but, equally, there will be other phrases (and, quite possibly, a lot of them) where the degree of realism is, frankly, quite staggering. This is way beyond the ‘good enough for backing vocals or robotic EDM vocal hooks’ stage; it’s right into prime‑time lead vocals territory. Whatever Dreamtonics are doing with their AI‑based algorithms under the hood, it is very, very clever indeed.
Get Real
Once Synth V has created its initial vocal, it’s then up to you to decide just how much additional finessing is required. At the Track level, if you are using one of the newer AI‑based voice databases (as are all the ones I had access to), the Voice panel offers a set of ‘Vocal Modes’. These vary for each voice but might have labels such as Soft, Airy, Passionate or Light. You can use the Vocal Mode sliders to create blends of these various Modes and, to a large extent, these then dictate the style of vocal delivery created.
These seem to operate at the Track level so, if you want your singer to sing a verse in a combination of Soft and Light modes, but a chorus that blends the Power and Passionate modes, then you simply create those vocal parts on two separate tracks within the Arrangement panel. You can set the Voice panel’s Parameters sliders (Loudness, Tension and Breathiness, for example) at track level, but these parameters can also be modulated via the Piano Roll’s Parameters sub‑panels. The Piano Roll’s Parameters panel also includes a Pitch Deviation pane. This modulates the pitch relative to the pitch curve displayed within the main Piano Roll display (that is, it ‘deviates’ the pitch). If Synth V’s pitch variations are not always exactly what you want, this modulator allows you to customise pitch in a very precise fashion.
Within the Piano Roll, if you select individual notes, the Note Properties panel lets you apply specific settings for pitch transitions (from one note to the next) and vibrato (its onset, depth and frequency, amongst other options). Controlling vibrato did leave me scratching my head a little and I’m still not sure I fully understand how these various controls interact. By default, Synth V’s AI does a good job with pronunciation but the Note Properties panel’s Phonemes settings, which allow you to adjust the relative duration and strength of each phoneme within a word, let you finesse the pronunciation and customise the delivery of a word if required.
If this isn’t already enough, there are other commands to assist you in the performance creation process. It’s here that some comprehensive documentation of the full feature set would be really beneficial as I’m sure there are features — for example, the intriguing AI Retakes panel, the Ornament Selected Notes command, or working with note Groups — that I’m not yet fully exploiting and that I suspect would improve either workflow or the quality of the end result.
Singer Auditions
I was able to try four voice databases during the review. Mai is bundled with Synthesizer V Studio and, while built from a native Japanese female voice, it sings very effectively in English. The sound is youthful and would suit pop or dance styles. The other three — Natalie, Kevin and Solaria — are all native English voices. Natalie’s voice could carry a range of styles (most obviously in the dance/electronic genres but also ballads and indie pop‑rock), while Kevin also appears quite flexible and might go from EDM through to pop‑rock styles.
However, to my ears at least, the cream of the vocal crop is Solaria (see the ‘Talent Pool’ box for more details). Based upon the singing talents of Broadway‑trained vocalist Emma Rowley, this voice can cover a very wide range of styles from intimate pop, powerful EDM, pop‑rock and off into symphonic rock. It’s also the voice that seemed to produce realistic results with the least manual editing. The results can be very impressive indeed.
With each voice offering its own character, and each also having its own set of Vocal Mode options for customisation, there are a lot of musical, song or performance styles that you could coax out of just these four voices. If I were to have a wishlist item, though, it would be for both male and female voices with a ‘rasp’ Vocal Mode. You can hint at this with some of the above voices but, as yet, it doesn’t really get you into Chris Cornell or Lzzy Hale territory.
DAW Tour
Inserted as an Instrument VST within Cubase 12, Synth V’s plug‑in format provides exactly the same (resizeable) working environment. Your Synth V creations and settings are also saved within your overall DAW project.
The plug‑in seems to support multiple audio outputs (I was able to activate these within Cubase) but I couldn’t find an obvious means to then route individual vocal tracks from the Synth V plug‑in to different channels within the Cubase MixConsole. I might simply have missed something here but, if not, it would be a great addition. I did experience some other occasional less‑than‑fluid workflow wrinkles (for example, pop‑up dialogue boxes that hid themselves behind other DAW windows) but nothing of any major consequence.
Are We There Yet?
So, the feature set and workflow are impressive, but can Synthesizer V actually produce a believable natural lead vocal performance? Well, a qualifier or three aside, amazingly, I think it can. Over my years of doing product reviews for SOS, there have been a few occasions when my jaw has seriously hit the floor (and one product — Melodyne — that did that twice!). Synthesizer V is one of those moments. Given that synthesis of realistic singing is such an ambitious aim, that Synthesizer V can even get close feels like music software as science fiction. Putting aside the artistic desirability of a virtual singer for a moment, the underlying technology is seriously impressive.
Those qualifiers? Well, first, it’s still perfectly possible for a Synth V vocal to sound very obviously unnatural. Dreamtonics have done a tremendous job of getting the maximum level of ‘natural’ out of the minimum user effort, but you may still find yourself needing to dig in and finesse the pronunciation, tuning or vocal character. However, the tools are there to do just that.
Second, your definition of ‘usable lead vocal’ will undoubtedly depend upon the musical situation that you are working within. If you simply want to add a few vocal hooks to an EDM track to play to your mates, the required quality bar is going to be somewhat lower than if you are creating a theme song for a Hollywood blockbuster film. Context is everything but, once you get to grips with Synth V, I think it’s surprising just how far you might go up that quality spectrum.
Third, and perhaps obviously, while Synth V might let you add vocals to any song when you can’t actually sing them yourself, it can’t make a bad song into a great one. Whether your vocals come from a human or a piece of software, writing a great song that’s going to connect with a human audience is still down to you.
Deep Fakes?
The impressive nature of the technology aside, does the world need computer generated vocals in its music? I get that this concept would be the total antithesis of musicality to some musicians, producers and music consumers. Indeed, in part, it’s a view I can share. Of all the elements of a great song, it’s generally the vocal — its words, its phrasing, its emotions — that makes the most intimate connection with a listener. Can Synth V make that connection in the same way that, for example, singers such as Adele or Joe Cocker or Whitney Houston or Robert Plant (and many others) can? While you can certainly get it to simulate emotion in various ways, impressive though the technology is, the results are not in that superstar vocalist territory.
If you want to audition an example that I think captures the essence of where Synth V is currently at, do a YouTube search for ‘Synth V Adele Easy On Me cover’. This is perhaps an unrealistic comparison (Adele has one of the most iconic voices of our generation; lots of human singers can’t get close) but, even so, the Synth V generated vocal is a fabulous demonstration of what can currently be achieved, and is undeniably impressive... even if it doesn’t deliver that ‘something special’ emotional intensity of Adele’s original.
Yet. Because the obvious question is where this AI technology might go? Deep fake video can already fool our eyes into believing we are seeing actual video footage of people (for example, movie stars, or other celebrities, alive or dead) doing stuff they have never actually done. How long before we see that deep fake concept applied to the voices of famous singers? Where the AI ‘learns’ their sound and the varied characteristics of their voices and can recreate it to a level that our ears are fooled? My time with Synth V suggests that’s going to come, and it would be a combination of fascinating (from the technological perspective), exciting, and frightening all at the same time.
Singer In A Box?
Deep fake might be for the future; so who might buy Synthesizer V Studio right now? Well, it’s undoubtedly a niche product but, if you are a songwriter/producer who doesn’t sing, whether to simply add vocal hooks or backing vocals, for demo projects with your vocal ideas as guides for session singers, or to pitch songs to artists, then Synth V — with suitable voice databases — now provides a remarkable solution. For many of these kinds of applications, in practised hands, Synth V will easily exceed the required quality bar.
What about a lead vocal on a commercial release? Well, those musicality issues aside, I’d not find this too difficult to imagine in a pop, dance or EDM context as vocal styles often diverge from ‘natural’ anyway. For singer‑songwriter ballads backed by just an acoustic guitar or piano? Well, you could do it, but you would have to be very thorough with your Synth V refinements to completely fool a listener paying very close attention. You might get them second guessing though... and that’s an achievement in itself. That said, maybe someone has already pulled this trick off and the rest of us don’t know about it yet. If it hasn’t happened already, then I suspect it’s not far away.
Synthesizer V Studio is a technical marvel. Synth V’s vocals might not have all the nuances or character of a truly exceptional vocalist, but it’s still a remarkable piece of software. It’s also remarkably affordable, the price of entry for the engine and one or two vocal packs being accessible to almost anyone.
Whether you approve of the concept of computer‑generated lead vocals or not, Dreamtonics appear to have taken actual magic and turned it into code. Synthesizer V is groundbreaking music technology.
Talent Pool
While Dreamtonics have their own collection of voice databases available for Synth V, third‑party developers are also involved in this element of the process. Other companies developing voices for Synth V include Audiologie, AHS Co, Quadimension and Animen.
Solaria — probably the most impressive of the English‑based voices I looked at here — was developed in collaboration with Dreamtonics by US‑based Eclipsed Sounds. However, they have a second singer — Asterian — that, at the time of writing, was about to be launched. This voice database is being built from the vocals of US singer David Hollaway. I got to try a pre‑release version of Asterian and I’ve also heard David sing/speak via YouTube; if you can imagine the voice of Barry White in software, then David’s virtual singing incarnation will have you in the right ballpark. It’s deep, rich and very impressive. It’s also very different stylistically from any of the other existing voice databases, suiting soul and perhaps some hip‑hop styles. It would also work well on song styles related to musical theatre or film. I’ll be very interested to see (hear?) what Eclipsed Sounds have to offer next.
Cover Me
Once down the vocal synthesizer rabbit hole, you may encounter the numerous YouTube cover versions of popular songs, where a Synth V (or Vocaloid) generated vocal has been used to replace the original. Yes, there is the good, the bad and the downright ugly. However, the best of these are a really impressive showcase of just how far this technology can be pushed.
There are also some interesting and educational elements to the workflow used to create many of these cover songs. These include stem extraction from the original audio, pitch curve analysis via software tools such as Praat (developed by academics from the University of Amsterdam) and the use of Synth V’s own scripting language (a user‑created script set called ‘Real Vocal’ seems to be the most popular of these) to import the original vocal’s pitch information into Synth V, giving your synthesized vocal elements of the pitch inflection contained within the original.
Some YouTubers also make their Synth V files for these vocals available to download so you can then import them into your own system, and these can be very instructional. If you want a head start, search YouTube for Synth V covers of tracks by Adele, Paramore, Linkin Park and Kate Bush; prepare to be both amazed and amused.
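For the curious, the pitch‑import step described above boils down to some simple arithmetic: a measured pitch in Hz is compared with the pitch of the note blob it belongs to, and the difference is expressed in cents (hundredths of a semitone). What follows is only a minimal sketch of that idea in Python. It is not the ‘Real Vocal’ scripts, nor Synth V’s own scripting interface; the function names are hypothetical, and it assumes the analysis tool has exported a list of (time, frequency) pairs in which unvoiced frames are marked as 0 or None.

```python
import math

A4_HZ = 440.0    # standard concert pitch
A4_MIDI = 69     # MIDI note number of A4


def midi_to_hz(note):
    """Equal-tempered frequency in Hz of a MIDI note number."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def deviation_cents(f0_hz, note):
    """How far a measured pitch sits from the note's nominal pitch, in cents."""
    return 1200.0 * math.log2(f0_hz / midi_to_hz(note))


def pitch_deviation_curve(f0_track, note):
    """Convert (time_s, f0_hz) pairs into (time_s, cents) pairs for one note.

    Assumption: unvoiced frames are exported as 0 or None, and are skipped.
    """
    return [(t, deviation_cents(f, note)) for t, f in f0_track if f]


# Hypothetical frames from a singer wavering around A4 (MIDI note 69)
frames = [(0.00, 436.0), (0.01, 0.0), (0.02, 440.0), (0.03, 445.0)]
for time_s, cents in pitch_deviation_curve(frames, 69):
    print(f"{time_s:.2f}s  {cents:+.1f} cents")
```

A curve like this is the kind of data that a Pitch Deviation style modulation lane represents: zero means ‘exactly on the note’, and positive or negative values bend the pitch sharp or flat.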
Audio Examples
To provide an impression of what is possible with Dreamtonics’ Synth V, I’ve created four short audio examples which you can find below. Each is based around a different musical style and consists of a simple backing track (to provide the musical context) and a single lead vocal produced within Synth V (no double‑tracks or harmony parts).
These examples were created after I’d been using Synth V off and on over about a two‑week period within which I was working on the review itself. I guess that makes me still a ‘new’ but perhaps not a ‘novice’ user. Even within that period of time, however, the workflow most certainly became faster as I gained familiarity with the various controls and options.
To provide a (hopefully) realistic impression of what’s possible based upon this level of experience, I deliberately limited the amount of time I spent on each of these examples. The backing tracks were quickly roughed out and (a touch of compression and/or reverb aside) are just intended to provide a bed within which to hear the vocal part. I then spent around 60‑90 minutes creating each of the vocals you hear from scratch. This involved a number of steps. First, I created a suitable melody on a piano, and this was then imported into the Synth V plug‑in running within my Cubase project. Second, I then selected a suitable voice database for the specific project and wrote (no prizes for originality!) and entered the required lyrics for each of the melody notes. Finally — and where the most time was spent — I worked my way through the performance, adjusting note timings, pitch data and pronunciation/emphasis of the individual phonemes as required. I also experimented with the various Vocal Modes that each voice offers to provide contrast within the different parts of each example.
Pros
- Truly remarkable technology.
- Can create genuinely useful results.
- Good selection of voice database options, and Solaria is capable of excellent results.
- Given just what’s possible, the price is also remarkable.
Cons
- Results equivalent to an accomplished singer rather than an exceptional one.
- Occasional quirks in an otherwise remarkably smooth workflow.
- Documentation currently a bit thin on the ground.
Summary
Synthesizer V is a truly remarkable product. Dreamtonics are bringing the concept of a software singer right to the very edge of believable.
Information
Dreamtonics Synthesizer V Studio Pro $89; voice databases from $79. Prices include VAT.