Physical Modelling, Firewire & DSP

Cutting Edge By Dave Shapton
Published July 2000

Yamaha's VL1 was the first commercially available physical modelling synthesizer. It provided accurate models of many real instruments, but no‑one has yet come up with a physically modelled piano.

Physical modelling is continuing to develop with increases in DSP power. Dave Shapton ponders the possibilities for future music technology applications, and also sees trouble on the Firewire horizon...

In 1985 I had a call from an old school chum. He'd been working for a company designing modems and had come across a curious new device called a Digital Signal Processor (DSP). His idea was to build a full‑scale digital mixing console using these DSP chips. At the time the only full‑scale digital mixing desk was the Neve DSP1 — a significant development for pro audio, but based on a technology that used separate logic chips rather than DSPs. My friend reckoned we could do pretty much the same thing at a fraction of the cost, and possibly end up with a more powerful device.

Surprisingly, we attracted funding for the project and built working prototypes, which we touted round the professional trade shows for a couple of years. One feature of our device, which was hailed in professional quarters as 'radical' and 'the way of the future', was that there was no control surface. The truth of the matter was that we couldn't afford to build a control surface, and still less did we have any idea how to design one. Nor did we have any A‑D and D‑A converters (the situation was not as ludicrous as you might think, because we had a direct digital link to a Sony DASH 24‑track digital recorder).

What is not so surprising is that the money ran out just when interest in the product was picking up. Our bank balance started to glow fluorescent red at about the same time we realised that our DSP chips (which were only ever intended to power digital telephone exchanges) were seriously underpowered and, basically, inappropriate for the job. We had to call it a day and I had to concentrate on more mundane matters like earning a living.

Ever since then I've kept an eye on DSP developments and I still get a jolt when I read about yet another staggering increase in processor power. I've just had another shock reading about the Roland VP9000 in last month's SOS. Before this it was Yamaha's DSP Factory digital studio card — virtually a whole professional mixing desk, plus effects, on a single PCI card. I got another jolt when I realised, a few years ago, that you could do real‑time digital signal processing on a completely standard, unaccelerated desktop computer processor. I believe we're now into a 'new age' of DSP where we no longer have to worry about processing power for doing things like mixing, dynamics, EQ and reverb. Individual chips pack enough power for all but the most demanding tasks, and if you really need hundreds of channels then you can use multiple DSPs.

Model Forte

The Matrox RT2000 is one of the first widely available cards to use the Firewire interface, and its popular success has shown up problems with some combinations of product and chipset.

I suppose the first 'new age' DSP process was physical modelling. It's been around for a while now and we're getting quite accustomed to the technique. However, there are still some PM goals that seem some way off: a decent acoustic piano, for example. Real pianos only have one preset, but there are so many interacting physical factors that they combine to make every note in every performance unique. Think about it: you've got the velocity of the hammer, the resonance of the metal frame, the strings themselves (some of which are in pairs or threes) and the wooden case. We've lived with piano samples for years, and some of them are pretty good. But without accurate PM, there is an absolute limit to the plausibility of a piano sample (and this applies more or less to most polyphonic instruments), because if you play two notes together, what you get is not two notes but two pianos. That's because every note creates a unique set of circumstances. Where the frequency of a note has a close correspondence with a part of the structure of a piano, then that component will resonate louder than it would if a note with which it had a nonharmonic relationship was playing — but then that other note could be resonating with something else. And then, of course, there's the other strings to take into account. (You can demonstrate this sort of effect if you take the lid off a piano, press the sustain pedal and sing or shout into the piano case. You'll hear a sort of eerie frequency‑quantised reverb which is the sound of the strings resonating to your voice.)

Without modelling, the only way to accurately reproduce the sound of two notes being played together is to sample those two notes (or three notes or four, or whatever). That's all fine as long as you understand that you'd ultimately need more individual samples than the number of notes in the universe. Then you'd have to multiply this quantity by the number of permutations and combinations of velocities that you hit the notes with. What you'd end up with is a number that is from our perspective indistinguishable from infinity, and a prospect of achieving this that is indistinguishable from zero.
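To get a feel for how quickly that number runs away, here is a rough count in Python. The figures are my own assumptions rather than anything measured: 88 keys, 127 velocity steps per note (as in MIDI), and no more than ten notes sounding at once. It ignores pedals, note timing and everything else that makes a real performance unique, so if anything it undercounts.

```python
from math import comb

KEYS = 88          # assumption: a standard piano keyboard
VELOCITIES = 127   # assumption: MIDI-style velocity steps per note
MAX_NOTES = 10     # assumption: at most ten notes held at once


def samples_needed(max_notes):
    """Count every combination of 1..max_notes keys, each at any velocity."""
    return sum(comb(KEYS, k) * VELOCITIES ** k for k in range(1, max_notes + 1))


single_notes = samples_needed(1)        # 88 keys x 127 velocities = 11176
up_to_two_notes = samples_needed(2)
up_to_ten_notes = samples_needed(MAX_NOTES)

print(f"single notes:     {single_notes:,}")
print(f"up to two notes:  {up_to_two_notes:,}")
print(f"up to ten notes:  {up_to_ten_notes:.3e}")
```

Even with these crude limits, the count for up to ten notes comes out above ten to the power of thirty, which is rather more sample memory than anyone is likely to fit in a rack.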

But you could do it with physical modelling. You've seen how the quality of 3D images in computer games has become more and more convincing. Three‑dimensional figures in computer games are made from shaded triangles — the smaller and more numerous the triangles, the more convincing the model. The detail in computer games has grown in proportion to available processing power, and I think the same is likely to be the case for PM, where complex instruments are concerned. The more DSP you throw at a model, the more faithfully you can reproduce the way a note will sound, taking into account the effects within the instrument of playing that note.
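For readers who have never seen a physical model in code, the sketch below shows about the simplest one there is: the well-known Karplus‑Strong plucked string. To be clear, this is not how the VL1 or any commercial instrument works, and it is nowhere near a piano. The function name, the damping value and the sample rate are all my own illustrative choices. The point is only that the 'string' is a short loop of memory, and a more faithful model means more such loops, more filters, and more DSP.

```python
import random
from collections import deque


def pluck(frequency_hz, duration_s, sample_rate=44_100, damping=0.996, seed=0):
    """Karplus-Strong plucked string: a delay line with a gentle low-pass in the loop."""
    rng = random.Random(seed)
    period = int(sample_rate / frequency_hz)   # delay-line length sets the pitch
    # The 'pluck': fill the string with a burst of noise.
    string = deque(rng.uniform(-1.0, 1.0) for _ in range(period))
    output = []
    for _ in range(int(duration_s * sample_rate)):
        first = string.popleft()
        output.append(first)
        # Averaging neighbouring samples loses high frequencies a little on
        # every trip round the loop, much as a real string loses energy.
        string.append(damping * 0.5 * (first + string[0]))
    return output


tone = pluck(220.0, 0.5)   # half a second of the A below middle C
early_energy = sum(s * s for s in tone[:2000])
late_energy = sum(s * s for s in tone[-2000:])
print(len(tone), early_energy > late_energy)
```

The note starts bright and noisy and settles into a decaying pitched tone, all from a few lines of arithmetic per sample.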

There are other 'new age' DSP processes that I'd like to see (I really don't know if these are feasible, and if anyone has any ideas on this perhaps they could let me know...):

  • How about a MIDI guitar that could extract multiple notes from a single pickup? It's a difficult task for DSP, but not impossible I would have thought.
  • Cancellation or correction of room acoustics. The reverse process already exists: Sony's convolution‑based reverb 'samples' a room by measuring its impulse response, so perhaps if we could provide a DSP device with data on the impulse response of a room, it could remove that room's contribution as well. I daresay you'd have to measure the impulse response from the microphone position, though, and you'd have to do this to an extremely high precision to avoid introducing artifacts. (Perhaps it would be better to just gate it and EQ it after all!)
  • Derivation of a physical model algorithm from an acoustic input. This would be a type of resynthesis. There are probably limits to how accurate it could be, but you'd probably get some highly innovative new sounds from the mistakes made in the process!
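The room-correction idea can at least be tried on paper. The sketch below is a toy under heavy assumptions: a three-tap 'room' that I made up, a noiseless recording, and a simple long-division deconvolution that only behaves for well-conditioned impulse responses. It is not how any real product does it. It does show both halves of the argument, though: with the exact impulse response the dry signal comes back, and with a slightly mis-measured one it doesn't.

```python
def convolve(signal, impulse_response):
    """What the room (or a convolution reverb) does to a dry signal."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def deconvolve(wet, impulse_response, dry_length):
    """Undo the convolution sample by sample; needs impulse_response[0] != 0."""
    dry = []
    for n in range(dry_length):
        acc = wet[n]
        # Subtract the echoes of the samples already recovered.
        for k in range(1, min(n, len(impulse_response) - 1) + 1):
            acc -= impulse_response[k] * dry[n - k]
        dry.append(acc / impulse_response[0])
    return dry


dry = [1.0, -0.5, 0.25, 0.0, 0.75]
room = [1.0, 0.5, 0.25]              # hypothetical: direct sound plus two echoes
wet = convolve(dry, room)

recovered = deconvolve(wet, room, len(dry))
error_exact = max(abs(a - b) for a, b in zip(dry, recovered))

room_mismeasured = [1.0, 0.55, 0.25]  # first echo measured 10 percent too loud
recovered_bad = deconvolve(wet, room_mismeasured, len(dry))
error_mismeasured = max(abs(a - b) for a, b in zip(dry, recovered_bad))

print(error_exact, error_mismeasured)
```

A real room has an impulse response thousands of samples long, plus noise, so the sensitivity to measurement error is likely to be far worse than this toy suggests.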

Captured On Film

There's a new video‑capture card from Matrox that is causing quite a stir in the video industry. Not only can it play two video tracks simultaneously for effects such as dissolves (what we'd call crossfades) and picture‑in‑picture, but it can do 3D effects in real time too. Probably the most impressive of these is the 3D page‑turn. Of course, a serious video editor is about as likely to use page‑turns as a serious record producer is to apply flanging to a folk song — but that's not the point. What is important about the RT2000 is that it works with the DV format. DV is like the DAT of video, except that it really has taken off as a consumer format in a way that DAT was never allowed to do because of copyright piracy paranoia.

It's very easy to mistake DV cassettes for DAT tapes: they use 6mm tape in a very compact package. However, whereas DAT is an uncompressed audio format, DV is video that is compressed by 5 to 1 — that's the same ratio as Minidisc, by the way. Minidisc sounds great, and DV video looks fantastic compared to the vague watercolour splodges we are used to with analogue domestic formats such as VHS. DV can be good enough to broadcast, although you probably wouldn't use it to film a BBC costume drama. You'd use it for documentaries, though, and for any context where the message was more important than the medium. DV is actually so good that, with a good lens and a big enough light‑sensor array (called CCDs or Charge Coupled Devices), it can give professional formats a really big scare.

I know from the last readership survey that a significant number of SOS readers are interested in desktop video: but that's not why I'm writing about the RT2000 in this column. The real reason is that the RT2000 is one of the first devices employing Firewire/I‑link/IEEE 1394 that is being used by a large number of people; and there are some interesting lessons to be learned for when we start using this type of technology in the music arena.

Physically, Firewire is about as exciting as a shoelace, but it lives up to its name in performance. Current versions of the interface can do 400 MBits/sec. To put that in perspective, CD‑quality audio has a data rate of 1.4 MBits/sec, and most MP3 files are encoded at 0.128 MBits/sec. Even uncompressed studio‑quality video only (only!) has a data rate of 270 MBits/sec — well within the capabilities of Firewire.
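If you want to check the sums, here is the back-of-envelope version in Python. The rates are the ones quoted above, with the CD figure derived from 44,100 samples per second, 16 bits and two channels. The stream counts are raw-bandwidth ceilings only: I'm assuming no protocol overhead at all, which a real Firewire bus would certainly have.

```python
# All rates in bits per second, kept as integers to avoid rounding surprises.
FIREWIRE_BPS = 400_000_000
UNCOMPRESSED_VIDEO_BPS = 270_000_000
MP3_BPS = 128_000

# CD audio: 44,100 samples/s x 16 bits x 2 channels
CD_BPS = 44_100 * 16 * 2

cd_streams = FIREWIRE_BPS // CD_BPS      # whole stereo CD streams that fit
mp3_streams = FIREWIRE_BPS // MP3_BPS    # whole 128 kbit/s MP3 streams that fit
video_headroom_bps = FIREWIRE_BPS - UNCOMPRESSED_VIDEO_BPS

print(f"CD audio rate:        {CD_BPS / 1e6:.4f} Mbit/s")
print(f"CD streams that fit:  {cd_streams}")
print(f"MP3 streams that fit: {mp3_streams}")
print(f"Left after video:     {video_headroom_bps / 1e6:.0f} Mbit/s")
```

On paper, then, one cable has room for a few hundred stereo CD‑quality streams, or for studio‑quality video with 130 Mbit/s to spare.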

DV Deviation

The DV video format compresses video to a data rate of 25 MBits/sec which, together with audio and machine control information, doesn't add up to anything that would suggest a problem for Firewire. Instead, the problem — as more and more ordinary users are finding — is with the way the data is interpreted by devices at either end of the Firewire connection.

Surprisingly — because you would think this would be the most difficult thing — most devices seem to work with video transfer. Plug a DV camera into a DV VCR and you'll normally get a picture. With luck you'll get sound too. Now, one of the best features of DV is that, in addition to digital video and audio, the Firewire connection carries machine control and timecode as well. Brilliant! All the data (video and audio) and metadata (timecode and control), in both directions, down one cable! Does it work? Sometimes. Is it really annoying when it doesn't work? Yes: enough to make you wonder how a system like this, designed from the ground up, could possibly work in such an erratic and unpredictable way.

There are, of course, inconsistencies in the way DV machine control is implemented across the wide range of devices available. That's not altogether surprising, because it's been like that with professional analogue video and audio for as long as there has been machine control. I'm going to look at this in a bit more detail in a future article, but for now there's an even more fundamental problem: the differences between chipsets that implement Firewire, and other new interfaces such as USB. What seems to be happening is that manufacturers of end‑user goods, such as DV capture cards or USB audio interfaces, are at the mercy of whoever made the chipsets their equipment is based on. I've found that with DV/Firewire and USB, the chipset can dictate whether the equipment works or not. With DV, Sony chipsets seem to work with just about anything. Those from Adaptec and Texas Instruments give varying results. With USB, some motherboard chipsets work as they should. With others, you can get single‑sample glitches and, which is worse, low‑level artifacts that you could easily miss if they were masked by audio content. So the message, more than ever, is to make sure that you test any setup using either Firewire or USB before you commit to buying it; or make sure the shop will take it back if there are problems. It's early days for these interface formats, but the more feedback we give the manufacturers when they don't work properly, the quicker they'll get fixed.

Graphics & Video: Matrox RT2000

The RT2000 video‑capture card is the most powerful video device yet to appear at an affordable price. It's actually based around a computer graphics card, the Matrox G400, which in its normal guise is popular with computer gamers because it's fast, especially at 3D. It's not unusual to find computer displays running at four times the resolution of a TV picture, and the frame rates are higher too. With this in mind, it's perhaps not so surprising that a well‑specified computer games card can easily process broadcast‑quality video: all it needs is some video I/O and some drivers. One speciality of the RT2000/G400 is 'bump‑mapping': the process of superimposing a video picture on a shape or textured surface. It can do this in real time, which means that there is actually no limit to the number of effects it can do, given the right software. Maybe in a few years we'll be able to map our faces onto old Duran Duran videos — or even do the reverse, so that even if you don't sound much like Elvis, you could at least look like him!