The pitch‑shifting and harmonisation facilities available in the Digitech Vocalist series have been widely acclaimed as the most natural‑sounding on the market. Paul White talks to Fred Speckeen of Canadian company IVL Technologies, who are behind the Vocalist and other pitch‑shifting technology.
The Digitech Vocalist series of intelligent, vocal harmonising pitch‑shifters occupies a unique place in the hi‑tech music market, but it's not commonly known that the pitch‑tracking and shifting technology at the core of these products comes from Canadian company IVL Technologies. Fred Speckeen, their Vice President of Music/Pro Audio Products, visited London for a few days recently, and I was lucky enough to be able to secure an interview.
Phil Scott, who was one of the original founders of IVL and is still the President, used to be a professional psychologist, and in his spare time also studied the flute. When his flute instructor told him about intonation and how important it was, Phil decided to look at existing tuners to see if there was some way to get real‑time visual feedback of pitch. Tuners tend to track rather slowly, however, so Phil decided to work on the problem himself. A friend at a telecommunications company in Canada put him in touch with Brian Gibson, and in 1983, Phil and Brian started IVL in Victoria, British Columbia. Their first product was the original Pitchrider 2000, which had an LED display on its front panel to indicate what pitch was being played, and whether it was being played sharp or flat. The pair then took the 2000 to the NAMM music industry show at about the same time that MIDI had arrived on the scene, and it soon became evident that there was interest in pitch‑to‑MIDI conversion for non‑keyboard playing musicians. This launched the Pitchrider series and later the 7000, which was the guitar version of that product.
After Fred Speckeen had filled me in on the company's background, we began chatting about IVL's technology and its future direction.
Has the company pursued guitar‑to‑MIDI conversion any further?
"We have had enquiries, and we could do a much better job today than we did then, but guitar synths have never been big sellers. Vocal harmony generation seemed a more logical way to go."
Without telling your competitors too much about your process, how does your pitch‑shifting system differ from all the other pitch‑shifters on the market?
"Where we differ is in our implementation of pitch recognition. It's a digital system that's faster than traditional methods of pitch‑tracking. Pitch‑shifting is a different type of technology to pitch‑tracking, but you can improve your pitch‑shifting if you know what the original pitch is — if you know what the pitch is, you know what the wavelength is, and knowing this, you can set the splicing points to give the minimum of artifacts. When you're dealing with a monophonic source such as the human voice, the quality of shifting is far smoother than can be obtained using a conventional system that doesn't take account of the input pitch.
"Our system can determine pitch in about two cycles of the signal, depending on the transients. No single technique works in all circumstances, so we use a hybrid approach that looks at a variety of signal features and statistics, before coming up with a pitch determination. In the pitch‑to‑MIDI days, we experimented with systems that would spit the pitch out quickly, then we'd keep on correcting it over time. The advantage of the way the Roland MIDI guitar system seems to work [where the pitch detection and sound module are all in the same box — Ed] is that you can read a transient before you know what the pitch is, and you can trigger a non‑pitched percussive tone out of the sound module. Once the pitch has been established, you can then crossfade into the pitched sound and the human ear won't notice."
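To make the link between pitch, wavelength and splice points concrete, here is a minimal Python sketch. It is not IVL's hybrid method, which the interview does not detail. It assumes a plain normalised autocorrelation over a window of about two and a half cycles, and the names `estimate_period` and `splice_length` are hypothetical, chosen only for illustration.

```python
import math

SAMPLE_RATE = 44100  # assumed sample rate, not taken from the article


def estimate_period(samples, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalised autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        if n <= 0:
            break
        a = samples[:n]
        b = samples[lag:lag + n]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def splice_length(period, target_length):
    """Pick a loop length near target_length that is a whole number of periods."""
    cycles = max(1, round(target_length / period))
    return cycles * period


# Synthetic "voice": a fundamental plus one harmonic, 100 samples per cycle,
# observed for only two and a half cycles.
voice = [math.sin(2 * math.pi * n / 100) + 0.5 * math.sin(4 * math.pi * n / 100)
         for n in range(250)]

period = estimate_period(voice, min_lag=50, max_lag=150)
pitch_hz = SAMPLE_RATE / period
loop = splice_length(period, target_length=1024)
```

Because the loop length is a whole number of cycles, the waveform at the end of the loop lines up with the waveform at its start, which is what keeps splicing artifacts low on a steady monophonic note.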
The Digitech Connection
"The company pursued pitch‑shifting when John Johnson, President of Digitech and DOD Electronics Corporation, suggested we ought to try designing an intelligent pitch‑shifter at a far lower cost than the Eventide systems that were currently around. That's when our relationship with DOD started."
At this stage, then, you were taking the old 'chop and loop' system of pitch‑shifting but adding the refinement of intelligent looping to minimise glitching?
"That's right. There was no thought of vocal processing at this stage. However, as we were tracking the pitch of the original input, we could ask the user what key they were playing in, then generate musically meaningful harmony based on what they were playing. Traditional pitch‑shifters could only shift the pitch by a fixed number of semitones. Immediately, we had a volume‑selling product."
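The difference between a fixed shift and a key-aware harmony can be shown in a few lines of Python. This is only a sketch under simple assumptions: notes are MIDI note numbers, the key is major, and the input note is already in the scale. The function names are hypothetical and do not come from any Digitech or IVL product.

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of a major scale


def fixed_shift(note, semitones):
    """Traditional pitch-shifter: always the same interval."""
    return note + semitones


def diatonic_harmony(note, key_root, degrees_up=2):
    """Return the note `degrees_up` scale steps above `note` in a major key.

    `note` and `key_root` are MIDI note numbers; degrees_up=2 gives a third.
    Notes outside the scale are rejected to keep the sketch short.
    """
    pitch_class = (note - key_root) % 12
    if pitch_class not in MAJOR_SCALE:
        raise ValueError("note is not in the chosen major scale")
    index = MAJOR_SCALE.index(pitch_class)
    octave, new_index = divmod(index + degrees_up, 7)
    return note - pitch_class + MAJOR_SCALE[new_index] + 12 * octave


C = 60  # middle C as the key root
third_above_c = diatonic_harmony(60, C)   # C -> E, a major third
third_above_d = diatonic_harmony(62, C)   # D -> F, a minor third
fixed_on_d = fixed_shift(62, 4)           # D -> F sharp, outside C major
```

The fixed shift lands on F sharp when the singer moves to D, whereas the key-aware version chooses F and stays inside the scale.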
Do you design circuitry for Digitech, or do you build the whole product?
"We design and manufacture completely in Canada, but that design process includes much consultation with professionals, Digitech and its distributors, and others around the world. We also prepare manuals and technical documents, and work closely with Digitech on marketing materials, training and trade shows. It's a great partnership."
The really big step must have been your vocal processing system, which manages to shift vocal pitch without making the result sound too much like Mickey Mouse. How did that evolve?
"We were analysing warranty cards coming back from owners of the Digitech IPS33 [reviewed back in SOS May 1989 — Ed], and a number of people were using them primarily for creating background vocals. Brian Gibson looked at the problem of making the voice sound more natural — up to then, we'd been pitch‑shifting in the usual way, by effectively speeding up or slowing down the sound, so it sounds as though your head's getting smaller when you increase the pitch. The new method worked extremely well for voice and also seems to work nicely on instruments like sax. In 1991, we introduced the Digitech VHM5 — a 5‑part vocal harmony processor [SOS review October 1991 — Ed]. It was an immediate hit and has received a number of awards, including a TEC Award from the AES."
I guess you had two main technical problems — optimising the pitch‑tracking to work with the human voice, and processing the shifted sound in a way that sounded natural. There's lots of talk now about systems using phase vocoding, regular vocoding and formant correction — formant filtering and so on. What approach have you taken, and how does it differ from what your competitors are doing?
"I think we have a novel approach. The basic problem is that with traditional methods, you get the chipmunk effect, because as the pitch moves up, the head and body seem to get smaller. Our technology moves the pitch but keeps the body the same size. Because our goal was to make affordable products that produce the harmonies in real time, we didn't choose complex frequency‑domain analysis/resynthesis methods. Instead, we turned to our pitch recognition expertise and implemented pitch‑synchronous techniques that let us do all the processing in the time domain. Doing this in real time takes an amazing amount of processing, and accomplishing that using inexpensive hardware is very difficult. Since the company has always been in the music industry, whenever we look at a new product, the first thing we consider is the price. Apparently this is a revolutionary concept among hi‑tech companies! We try to innovate in both hardware and software, and use off‑the‑shelf components, combined with custom parts, to make products that are affordable."
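One well-known family of pitch-synchronous, time-domain methods is overlap-add of pitch-period grains (often called TD-PSOLA). The interview does not say that this is what IVL uses, so the Python sketch below is purely illustrative. It assumes a constant, already-known pitch period with pitch marks at exact multiples of that period, and `psola_shift` is a hypothetical name.

```python
import math


def hann(n):
    """Hann window of n points."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def psola_shift(signal, period, ratio):
    """Change pitch by `ratio` while keeping the duration unchanged.

    Grains two periods long are cut around pitch marks, windowed, and
    overlap-added at a new spacing of period / ratio. Each grain keeps its
    original shape, so its spectral envelope is not stretched the way it
    would be if the sound were simply played faster.
    """
    if len(signal) < 2 * period:
        raise ValueError("signal must hold at least two periods")
    window = hann(2 * period)
    new_period = max(1, round(period / ratio))
    last_mark = (len(signal) // period - 1) * period
    out = [0.0] * len(signal)
    out_mark = period  # centre of the first output grain
    while out_mark + period <= len(signal):
        # Use the input pitch mark nearest to this output position.
        src_mark = min(max(round(out_mark / period) * period, period), last_mark)
        start_in = src_mark - period
        start_out = out_mark - period
        for i in range(2 * period):
            out[start_out + i] += signal[start_in + i] * window[i]
        out_mark += new_period
    return out


# Synthetic tone with a 100-sample period, raised by a ratio of 1.25.
tone = [math.sin(2 * math.pi * n / 100) + 0.5 * math.sin(4 * math.pi * n / 100)
        for n in range(2000)]
shifted = psola_shift(tone, period=100, ratio=1.25)
```

The output repeats every 80 samples instead of every 100, so the pitch rises, yet the output is the same length as the input. The sketch does not normalise the gain of the overlapping windows, and a real voice would need pitch marks that follow the changing period.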
Is the plan still to keep the pro audio side of your business closely linked to DOD/Digitech?
"That's a very good place to be for us. They are good people, a very strong company, and the distribution network is superb. A new product can be in over 80 countries in 60 to 90 days, and our distributors are very helpful as we develop product definitions, making sure we get the right features at the right price."
Are there any other directions that interest you?
"As you may know, we have established two divisions within IVL, and we're broadening our markets quickly. The two divisions are Music/Pro Audio and Multimedia/Consumer. What's interesting is the synergy we're finding between the two divisions, and it's feeding both our core technology base and our creativity in product design. The Multimedia/Consumer division is actively supplying the commercial Karaoke market in Japan. There, Karaoke music is essentially being supplied as MIDI Files, which can also contain information about key settings, allowing us to integrate our technology for real‑time vocal harmony. We have a number of integrated and stand‑alone products which use that technology, and our customers now hold around 70% of the market share in Japan. Our primary customer is Yamaha Corporation who, as you know, is a major player in the music and pro audio industry, and we're exploring future ways to partner together. Interestingly, Roland is another player in that market. Similarly, we are opening the Korean market, and will ship our first products in June. The Korean market is also predominantly MIDI‑based, so we see great potential. In addition, we have Chinese partners who supply a stand‑alone, non‑MIDI harmony box to go with VCD players, VHS Karaoke players, and CD players. Karaoke is a very big market throughout Asia.
"In the music/pro audio area, we're committed to improving harmony generation, plus a number of other things that you can do with the voice. In particular, we are focusing much effort on expanding a forthcoming line of hardware processors, the Digitech Vocal Solutions. It's interesting what we've learned about the singing voice during our research. From what we have learned about the weaknesses people have with their voices, and the challenges of capturing great live and recorded vocals, we think that we can apply some unique solutions to improve both live and recorded singing."
Might this include things like minor, real‑time pitch correction?
"That would make a lot of sense, and its first pro audio implementation is in the Digitech Studio Vocalist. In addition to pitch aspects, we have also learned a lot about the harmonic content of the voice, and we think we have identified things we can do in both the analogue and digital domain to improve voice quality."
To make your machines work in real time, it's often desirable to have an input of MIDI chord data, but if you're in a band without a keyboard player, this may not be practical. What can you do for these people, other than allow them to dial in a stock key and hope for the best?
"Purely diatonic harmonisation may work around 50% of the time, but in the Studio Vocalist there are a number of scalic harmony styles that help cover songs that are based on mixed scalic modes. Still, songs with multiple key modulations are a problem when you don't have the advantage of MIDI control or our automatic MIDI chord recognition. To handle this in a non‑MIDI environment, the original VHM5, Vocalist II, and Studio Vocalist each have something called Song List, which allows you to specify the harmonies in the order you will need them, and then trigger them from a footswitch as they change. Although designed for the live acoustic musician, I know of a few producers who actually run the Studio Vocalist this way in the studio."
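The Song List idea, as described here, amounts to an ordered list of harmony settings stepped through by a footswitch. A minimal Python sketch follows. The class, its method names and the choice to stay on the last step at the end of the list are assumptions made for illustration, not a description of the actual units.

```python
class SongList:
    """Ordered harmony settings, advanced one step per footswitch press."""

    def __init__(self, steps):
        if not steps:
            raise ValueError("a song list needs at least one step")
        self.steps = list(steps)
        self.position = 0

    @property
    def current(self):
        """The harmony setting that is active right now."""
        return self.steps[self.position]

    def footswitch(self):
        """Advance to the next setting; hold the last one at the end (assumed)."""
        if self.position < len(self.steps) - 1:
            self.position += 1
        return self.current


# Hypothetical song: verse in C major, bridge in A minor, chorus in F major.
song = SongList([("C", "major"), ("A", "minor"), ("F", "major")])
first = song.current
second = song.footswitch()
third = song.footswitch()
after_end = song.footswitch()
```

Each press swaps in the next key and scale, so the harmony follows a modulation without any MIDI chord input.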
I would imagine one solution would be to build a machine with an off‑line learn mode. You could either sing it a melody, play it a melody, or feed it a MIDI File for a song, and it would be able to deduce the type of harmony required. You could then save that harmony template as a preset and make further manual edits if the resulting harmony wasn't quite right. This would surely be more musician‑friendly than programming.
"We have a number of ideas to make the process more musician‑friendly, and what you suggest would certainly be possible."
What sort of wish lists do your end‑users have?
"There are many things our customers want, but overall they want products that help them get on with the creative process. Our challenge is not only bending technology to meet these needs, but also bridging left‑brain, digital technology to creative, right‑brain users."
Obviously, you have to make the most of your technological strengths, and it would seem that with your expertise, you could do a lot of other interesting things.
"There are lots of things you can do with this technology — and when you analyse pitch, you get a lot of other information as well. If you make a list of the things you can extract from pitch analysis, then map that against a whole range of parameters you can control, things start to look very interesting. We have lists of all kinds of potential products, but what we come back to is — what's our focus? What are we best at and what do our customers need? If we take our attention away from the vocal market, we run the risk of losing that market. We also have to decide how we focus our pure research for future products."
Presumably your technology also has further implications in the guitar and other markets...
"We are still in the guitar market — our Whammy Pedal has been called 'the wah‑wah pedal of the '90s' — but it really comes down to focus and price versus performance. We can direct our pitch‑shifting, pitch‑tracking and other technologies in a number of directions. The reality is that there are far more possibilities than we can possibly tackle, so we're always trying to make the right decision."
I can imagine that working off‑line, you can perform correlations on whole sections of audio to identify and strip out vocal formants, but in the case of a real‑time system like yours, you have to make some assumptions about the character of the voice being processed.
"Yes, this is particularly an issue with voice transformations. With the Digitech Studio Vocalist we have added gender changing, where you can switch the apparent sex of the shifted voices."
This opens up a lot of interesting new avenues for future development, not the least being the ability to rebuild the formants of your voice to imitate well‑known singers. I spoke to Eventide many years ago and suggested that a plug‑in card that enabled Elvis Presley to sound like Bob Dylan (or more impressively, Bob Dylan to sound like Elvis Presley!) wouldn't be out of the question. Aside from the legal implications, is this something we should expect to hear more of in the near future?
"With regard to vocal characterisation, our first implementation is on the Studio Vocalist, and we've been pleased with its utility — in fact we have a demo CD available from dealers where you can hear the effect for yourself. We're pleased with this first step. The kind of thing you're talking about, though, is very demanding technically, but as with all aspects of our products, we are constantly looking for ways to further increase performance."
IVL MULTIMEDIA/CONSUMER DIVISION
- Coda Vivace instrument and vocal music education modules
- Giganetworks karaoke voice processor
- Taito Voice Champ (stand‑alone vocal processor)
- Xing Multistation harmony adapter and multimedia terminal
- Xirlink Magic Mic (stand‑alone, non‑MIDI processor)
- Yamaha/DK DAMII commercial karaoke system
IVL MUSIC/PRO AUDIO DIVISION
- Digitech Studio Vocalist vocal harmony processor (review SOS August '95)
- Digitech Vocalist II vocal harmony processor
- Digitech MIDI Vocalist vocal harmony processor (review SOS July '96)
- Digitech VTP1 dual‑channel tube preamp (review elsewhere in this issue)
- Digitech RPM1 rotary speaker emulator (review SOS February '96)
- Digitech Studio 5000 instrument harmony processor (review SOS August '95)
- Digitech Whammy II and Bass Whammy pitch‑shifting pedals
- Korg ih vocal harmony processor (review elsewhere in this issue)
- Zeta Systems violin‑MIDI converter