Is MP3 the ultimate audio file format? No! Martin Russ looks at net resources dealing with what lies beyond...
Computers continue to get faster and faster. This has two effects: firstly, things get done faster (although programs often expand to use all the extra speed, so there is no net gain); and secondly, things that were once impossibly slow become possible. If you had told most people 10 years ago that musicians would now be collaborating over a computer network, publishing pages about their bands on a medium that more than a hundred million people could read, and that you would be able to download CD-like digital audio via a computer, well, they'd have laughed.
So I'll make you laugh all over again. In five years' time, you won't be downloading MP3 files for music, MIDI files will be part of a larger standard, synthesizers may well share a common language for describing the synthesis techniques they use, and even sample CDs will work very differently. And I can show you why!
From MIDI To Structured Audio
As an SOS reader, you'll be aware that MIDI files are 'descriptive' representations of music: they contain a 'recipe' for the music, but the sounds themselves are stored and generated externally, in a synth or other sound-generating MIDI device. Problems can arise if you want to give files to others to listen to: if other people cannot replicate your sounds, the file could sound completely different on playback. To ensure that what you record plays back in a similar way on someone else's setup, you need a standard soundset. That's what General MIDI is for, of course, and as demand has grown for control possibilities beyond the GM spec that keep the same degree of standardisation, 'extended standards' such as Roland's GS and Yamaha's XG have been developed.
Now that the Internet is taking off so rapidly, there are new demands on the creators of audio formats and standards — people want to be able to place music on the web for others to download (and, of course, they need to ensure that it always sounds the same, irrespective of who is playing it back). Uncompressed CD‑quality audio results in file sizes that are too large to download practically, even with the fastest computers and modem connections. Compressed audio formats such as MP3 help with the file‑size problem, but audio quality can suffer.
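To put rough numbers on 'too large to download practically', here's a back-of-envelope calculation in Python. The CD figures (44.1kHz, 16-bit, stereo) are standard; the four-minute track, the 56kbit/s modem (with no protocol overhead) and the 128kbit/s MP3 rate are assumptions chosen purely for illustration.

```python
# Back-of-envelope sizes for one track of CD-quality audio.
# Assumptions (not from the article): a 4-minute track, a nominal
# 56 kbit/s modem with no protocol overhead, and a 128 kbit/s MP3.

SAMPLE_RATE_HZ = 44_100   # CD sample rate
BITS_PER_SAMPLE = 16      # CD word length
CHANNELS = 2              # stereo

TRACK_SECONDS = 4 * 60    # assumed track length
MODEM_BPS = 56_000        # assumed modem speed, bits per second
MP3_BPS = 128_000         # assumed MP3 bit rate, bits per second


def cd_bit_rate() -> int:
    """Bits per second of uncompressed CD audio."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def size_bytes(bit_rate: int, seconds: int) -> int:
    """File size in bytes for a stream at bit_rate lasting `seconds`."""
    return bit_rate * seconds // 8


def download_minutes(n_bytes: int, link_bps: int) -> float:
    """Time to move n_bytes over a link of link_bps, ignoring overhead."""
    return n_bytes * 8 / link_bps / 60


if __name__ == "__main__":
    cd_size = size_bytes(cd_bit_rate(), TRACK_SECONDS)
    mp3_size = size_bytes(MP3_BPS, TRACK_SECONDS)
    print(f"CD bit rate: {cd_bit_rate()} bit/s")
    print(f"CD track:  {cd_size / 1e6:.1f} MB, "
          f"{download_minutes(cd_size, MODEM_BPS):.0f} min by modem")
    print(f"MP3 track: {mp3_size / 1e6:.1f} MB, "
          f"{download_minutes(mp3_size, MODEM_BPS):.0f} min by modem")
    print(f"Compression ratio: {cd_bit_rate() / MP3_BPS:.1f}:1")
```

Run as-is, it shows about 42MB for the CD-quality track (over an hour and a half by modem) against about 3.8MB (roughly nine minutes) for the MP3, a ratio of about 11:1.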
So, what if there were a new standard, a compact audio file format that, like a MIDI file, described the behaviour of the music (with note events, controller and tempo data, and so on), but which also described the sounds that should be used to play it? Files in this format could be transmitted over the Net reasonably quickly, and would always sound the same. One term for a format that does exactly this is 'Structured Audio', and that's what I'm looking at this month. There's a nice (if rather breathless in tone) introduction to it at: www-edlab.cs.umass.edu/~anramani/mp4_history&concepts.html (see screenshot far left).
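The core idea can be sketched in a few lines of Python: ship the notes together with a description of the instrument, and let the listener's machine render the audio. The dictionary layout, the additive 'instrument' (three harmonics with an exponential decay) and the 22.05kHz rate are all hypothetical choices for illustration, not any real Structured Audio format.

```python
import json
import math

SAMPLE_RATE = 22_050  # Hz, an assumed playback rate


def midi_to_hz(note: int) -> float:
    """Equal-tempered pitch, with A4 (MIDI note 69) at 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


# A hypothetical 'structured audio' file: the notes AND the sound that plays them.
# The instrument is described by parameters, not recorded samples.
song = {
    "instrument": {"harmonics": [1.0, 0.5, 0.25], "decay": 3.0},
    "notes": [  # [start_s, duration_s, midi_note, velocity]
        [0.0, 0.5, 60, 100],
        [0.5, 0.5, 64, 100],
        [1.0, 1.0, 67, 110],
    ],
}


def render(song: dict) -> list[float]:
    """Rebuild the audio from the description alone."""
    inst = song["instrument"]
    norm = sum(inst["harmonics"])                 # scale so peaks stay within +/-1
    spans = [(int(s * SAMPLE_RATE), int(d * SAMPLE_RATE))
             for s, d, _, _ in song["notes"]]
    out = [0.0] * max(first + n for first, n in spans)
    for (first, n), (_, _, note, vel) in zip(spans, song["notes"]):
        freq = midi_to_hz(note)
        for i in range(n):
            t = i / SAMPLE_RATE
            env = (vel / 127) * math.exp(-inst["decay"] * t)
            tone = sum(a * math.sin(2 * math.pi * freq * (h + 1) * t)
                       for h, a in enumerate(inst["harmonics"]))
            out[first + i] += env * tone / norm
    return out


if __name__ == "__main__":
    description = json.dumps(song).encode()
    audio = render(song)
    print(len(description), "bytes of description")
    print(len(audio) * 2, "bytes as 16-bit mono PCM")
```

The description serialises to well under 200 bytes, while the two seconds of mono audio it renders would occupy 88,200 bytes as 16-bit PCM. Because the sound is computed rather than looked up in a local soundset, every player following the same rules produces identical samples.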
In this one web page, there's mention of the elite Machine Listening Group at MIT (the Massachusetts Institute of Technology, in Cambridge, near Boston, USA); CSound, the computer language intended to help people describe sound; NetSound, which reworks CSound, MIDI files and samples into an Internet-targeted file format (more on this at the NetSound home page at sound.media.mit.edu/~mkc/netsound.html); and MPEG4.
This isn't what you think. MPEG4 is not the successor to MP3, which is strictly MPEG1 Audio Layer 3 (for more information on the different formats in one place, check out the MPEG home page, see left, at drogo.cselt.stet.it/mpeg). Things may move fast in the computer world, but not that fast! Instead, MPEG4 is the third in an ongoing series of international standards intended to give the media worlds common techniques and file formats. MPEG1 and MPEG2 were aimed at video for the CD-ROM and digital TV markets respectively, whilst MPEG4 is aimed more at Internet users and the synthetic generation of video or music. In a nutshell, it offers the promise of a relatively compact joint video and audio encoding format, and the audio side is dealt with in much the same way as I've already described.
There's an overview of MPEG4 at: wwwam.HHI.DE/mpeg-video/standards/MPEG4.htm#E10E9, and if it doesn't blow your mind, you aren't concentrating! If you've ever wanted to compare and contrast two different writing styles, read sound.media.mit.edu/mpeg4/sa-intro.html and compare it with the Edlab document at www-edlab.cs.umass.edu/~anramani/mp4_history&concepts.html that I mentioned earlier. The Edlab pages also provide a second useful introduction, at www-edlab.cs.umass.edu/~anramani/mp4_toolset.html, which really starts to throw four-letter acronyms at you. Once you've bookmarked the sound.media.mit and Edlab URLs, they make good jumping-off points for lots of further exploration.
But back to the story of MPEG4. In October 1998, all of this MPEG4-related activity resulted in a historic harmonisation of music-industry sound-synthesis standards. Although it was momentous and far-reaching, you have probably never heard of it. Do DLS2 or SASBF mean anything to you? If not, read on...
DLS2 is the MPEG4 common format for creating sounds via wavetable synthesis, and merges work from the MIDI Manufacturers Association (www.midi.org), Creative Technology Ltd (parent company of E-mu Systems and Creative Labs) and MPEG (the Moving Picture Experts Group). Downloadable Sounds Level 2 (DLS2) is known within MPEG as the Structured Audio Sample Bank Format (SASBF), and it essentially combines enhanced DLS1 samples, Creative's SoundFonts and a series of Structured Audio tools from MPEG researchers. The press release, at sound.media.mit.edu/mpeg4/press_release2.html, is full of warm and encouraging words.
Before you write this off as just wavetable synthesis, you should know that the Structured Audio part of MPEG4 is about descriptive methods of sound generation, and wavetable is but one of these. The DLS and MPEG4 page from MIT (sound.media.mit.edu/mpeg4/dls_background.html) reminds us that Structured Audio also covers synthesis, streaming and more. Downloadable samples and SoundFonts point to the future of formats like sample CDs, and may even influence the direction in which conventional samplers evolve next. If a general-purpose, standardised way of describing real-time synthesis develops, the current generation of synthesizers, samplers and hybrids could well be replaced by a new generation of machines that take multiple methods of sound generation to new heights.
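Wavetable synthesis, the basis of DLS and SoundFonts, boils down to looping through stored sample data at a speed set by the desired pitch. Here is a minimal Python sketch, assuming a single-cycle table computed in code (a real bank would hold recorded samples, plus loop points and envelopes) and simple linear interpolation between table entries.

```python
import math

SAMPLE_RATE = 44_100
TABLE_SIZE = 256  # assumed table length


def make_table(size: int = TABLE_SIZE) -> list[float]:
    """One stored cycle of a waveform (here a sine plus some 3rd harmonic).
    In a DLS or SoundFont bank this would be a recorded sample instead."""
    return [math.sin(2 * math.pi * k / size)
            + 0.3 * math.sin(3 * 2 * math.pi * k / size)
            for k in range(size)]


def wavetable_osc(table: list[float], freq: float, n_samples: int,
                  sample_rate: int = SAMPLE_RATE) -> list[float]:
    """Loop through the table at a speed set by freq, interpolating linearly
    between neighbouring entries for non-integer read positions."""
    size = len(table)
    step = freq * size / sample_rate          # table entries advanced per output sample
    phase = 0.0
    out = []
    for _ in range(n_samples):
        i = int(phase)
        frac = phase - i
        a, b = table[i], table[(i + 1) % size]   # wrap at the loop point
        out.append(a + frac * (b - a))
        phase = (phase + step) % size
    return out


if __name__ == "__main__":
    table = make_table()
    natural = SAMPLE_RATE / TABLE_SIZE               # pitch when step == 1
    a440 = wavetable_osc(table, 440.0, SAMPLE_RATE)  # one second of A440
    print(f"natural pitch {natural:.2f} Hz; rendered {len(a440)} samples of A440")
```

Reading the table one entry per output sample plays it at 44100/256, about 172Hz; doubling the step raises it an octave, which is essentially how a sampler transposes a recording.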
If you look into the component parts of Structured Audio on the web, it's difficult to avoid seriously heavy academic material. The NetSound paper at sound.media.mit.edu/~mkc/icmc96/icmc96.html outlines how NetSound uses MIDI-file events to control a real-time CSound synthesis engine which, instead of General MIDI-type sampled sounds, has wavetable, FM, granular and additive synthesis, phase vocoding and many other techniques in its arsenal. CSound also provides sound processing such as reverb, echo, chorus and so on.
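The NetSound arrangement, MIDI events driving a synthesis engine rather than a fixed sample set, can be sketched in Python. The note-on/note-off status bytes (0x9n and 0x8n, with a velocity-0 note-on acting as a note-off) follow the MIDI spec; everything else is an assumption. Events are given as already-timed tuples (as if the MIDI file's delta times had been converted to seconds), and the 'engine' is a hypothetical two-operator FM voice with illustrative ratio and index values, not CSound's own opcodes.

```python
import math

SAMPLE_RATE = 22_050
NOTE_OFF, NOTE_ON = 0x80, 0x90


def midi_to_hz(note: int) -> float:
    """Equal temperament, A4 = MIDI note 69 = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


def pair_notes(events):
    """Turn (time_s, status, note, velocity) channel messages into
    (start, duration, note, velocity) tuples. A note-on with velocity 0
    is treated as a note-off, as the MIDI spec allows."""
    held = {}    # note number -> (start time, velocity)
    notes = []
    for time_s, status, note, vel in events:
        kind = status & 0xF0             # ignore the channel nibble
        if kind == NOTE_ON and vel > 0:
            held[note] = (time_s, vel)
        elif kind in (NOTE_OFF, NOTE_ON) and note in held:
            start, v = held.pop(note)
            notes.append((start, time_s - start, note, v))
    return notes


def fm_tone(freq, amp, seconds, ratio=2.0, index=3.0):
    """Two-operator FM: a modulator at ratio*freq bends the carrier's phase.
    The index shrinks with the envelope, so the tone dulls as it fades."""
    n = int(seconds * SAMPLE_RATE)
    out = []
    for k in range(n):
        t = k / SAMPLE_RATE
        env = 1.0 - k / n                # linear decay to silence
        mod = index * env * math.sin(2 * math.pi * ratio * freq * t)
        out.append(amp * env * math.sin(2 * math.pi * freq * t + mod))
    return out


def render(events):
    """Mix every paired note into one buffer."""
    notes = pair_notes(events)
    length = max(int(s * SAMPLE_RATE) + int(d * SAMPLE_RATE) for s, d, _, _ in notes)
    mix = [0.0] * length
    for start, dur, note, vel in notes:
        first = int(start * SAMPLE_RATE)
        for i, x in enumerate(fm_tone(midi_to_hz(note), vel / 127, dur)):
            mix[first + i] += x
    return mix


if __name__ == "__main__":
    events = [
        (0.0, 0x90, 60, 100),   # note-on, channel 1, middle C
        (0.5, 0x80, 60, 0),     # note-off
        (0.5, 0x90, 64, 90),    # note-on, the E above
        (1.0, 0x90, 64, 0),     # note-on with velocity 0 = note-off
    ]
    print(pair_notes(events))
    print(len(render(events)), "samples rendered")
```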
Deeper still is the information on exactly how Structured Audio produces music via a language that is sent as a bitstream (usually over the Internet) and then reconstructed as music by a virtual orchestra. The Structured Audio Orchestra Language, abbreviated to SAOL but pronounced 'sail', is quite fascinating, and leaves you wanting to rush out and play with it. The SAOL home page (see screenshot, top) is a good starting point: sound.media.mit.edu/~eds/mpeg4-old/. Detailed papers on MPEG4 Structured Audio can be found at sound.media.mit.edu/mpeg4/sa-papers.html.
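The bitstream-plus-orchestra idea can be sketched too. In this Python example the event layout (a 6-byte record packed with `struct`) and the two-instrument 'orchestra' are hypothetical and bear no relation to the real MPEG4 bitstream syntax. Also, whereas in Structured Audio the instrument definitions are themselves written in SAOL and sent in the stream, here they are hard-coded Python functions for brevity.

```python
import math
import struct

SAMPLE_RATE = 8_000

# Hypothetical event record (NOT the real MPEG-4 bitstream syntax):
# instrument id (uint8), MIDI note (uint8), start ms (uint16), duration ms (uint16).
EVENT = struct.Struct(">BBHH")


def midi_to_hz(note: int) -> float:
    """Equal temperament, A4 = MIDI note 69 = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


def sine(freq: float, t: float) -> float:
    return math.sin(2 * math.pi * freq * t)


def square(freq: float, t: float) -> float:
    return 1.0 if math.sin(2 * math.pi * freq * t) >= 0 else -1.0


# The 'virtual orchestra': instrument definitions held by the decoder,
# looked up by the id carried in each event.
ORCHESTRA = {0: sine, 1: square}


def encode(events) -> bytes:
    """Pack (instrument, note, start_ms, dur_ms) tuples into a bitstream."""
    return b"".join(EVENT.pack(*e) for e in events)


def decode(stream: bytes) -> list[tuple]:
    """Unpack the bitstream back into event tuples."""
    return list(EVENT.iter_unpack(stream))


def perform(stream: bytes, gain: float = 0.5) -> list[float]:
    """Decode the stream and let the orchestra play every event."""
    events = decode(stream)
    spans = [(s * SAMPLE_RATE // 1000, d * SAMPLE_RATE // 1000)
             for _, _, s, d in events]
    out = [0.0] * max(first + n for first, n in spans)
    for (first, n), (instr, note, _, _) in zip(spans, events):
        play, freq = ORCHESTRA[instr], midi_to_hz(note)
        for i in range(n):
            out[first + i] += gain * play(freq, i / SAMPLE_RATE)
    return out


if __name__ == "__main__":
    score = [(0, 69, 0, 250), (1, 57, 250, 250)]  # sine A4, then square A3
    stream = encode(score)
    print(len(stream), "bytes in the stream")
    print(len(perform(stream)), "samples out")
```

Two notes take 12 bytes on the wire; the decoder turns them into 4,000 samples (half a second at 8kHz).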
The Future Lies... This Way
Before I started you clicking around on the Internet this month, you might never have heard of Structured Audio, and the idea that MIDI, real-time synthesis and samples might be used anywhere other than inside a MIDI & Audio sequencer would have seemed silly. Well, as we have seen, MPEG4 takes those elements and makes them the basis of a sophisticated method of transmitting music over the Internet. Unlike MP3, which needs big bandwidths and non-real-time streaming, Structured Audio offers the promise of real-time playback over a simple streaming system, maybe even via a modem on a telephone line. Are you still laughing?