MPEG4 Specification; Multicasting

Dave Shapton explores part of the MPEG4 specification: Structured Audio, which allows music to be streamed with absolutely no loss of quality at extremely low data‑rates. Is this the way that all media will be encoded in the future?

At the end of last month's Net Notes I mentioned an audio format that allows music to be streamed with absolutely no loss of quality, and at extremely low data‑rates. And I told you there's a catch — which is that the music concerned is created algorithmically and, surprisingly, the way it's done is part of the MPEG specification.

You've probably heard about MPEG, the modestly named Moving Picture Experts Group. MPEG1 and MPEG2 are used for video compression. MPEG1 is optimised for reproducing video on computer screens and is often used for playing video from CD‑ROMs; indeed, the Video CD format, which bombed completely in the West, has become the dominant way to distribute films in the Far East. MPEG2 is the basis for digital TV and DVD. And, of course, MP3 is actually MPEG1, Layer 3 audio. For more information about MPEG have a look at:

MPEG4 is the next level of MPEG video compression and it's a very ambitious specification, which is only partially implemented. A very small part of the MPEG4 specification, the part concerned with lossy video compression, is used as the basis for Microsoft's Windows Media 7 format, although Microsoft describe it as "enhanced" MPEG4. And part of the MPEG4 specification is what it describes as "Structured Audio".

Structured Audio

The way Structured Audio is communicated is different to any other kind of audio format. That's because it is actually sent as a computer program — a program that, when run, generates a sound. A computer language, called SAOL, is used to describe the algorithm that produces the sound. The sound generation method is immaterial; all that matters is that the process can be described as an algorithm — which means that just about any synthesis technique can be used, whether it's FM, Additive, Subtractive, Physical Modelling or any new technique that may be developed in future.

Key to the technique is that it is 'normative': in other words, when decoded, the audio will sound the same wherever and however it is reproduced. It's also resolution‑independent, because it is essentially a description of the sound as opposed to a physical representation of it (Postscript, the page description language, works in exactly the same way). So Structured Audio can be decoded to any sample rate and any bit depth.
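To make the idea of resolution independence concrete, here is a minimal sketch in Python (not SAOL, which is the language the specification actually uses). The function names `fm_tone` and `render`, and all the parameter values, are my own assumptions for illustration. The point is only that the sound is held as a formula, and the sample rate and bit depth are chosen at the moment of decoding.

```python
import math

def fm_tone(t, carrier_hz=440.0, mod_hz=110.0, index=2.0):
    """Describe a sound as a function of time in seconds (simple FM synthesis)."""
    return math.sin(2 * math.pi * carrier_hz * t
                    + index * math.sin(2 * math.pi * mod_hz * t))

def render(description, seconds, sample_rate, bit_depth):
    """Evaluate the description at a chosen sample rate and quantise it
    to signed integers of the chosen bit depth."""
    peak = 2 ** (bit_depth - 1) - 1          # largest positive sample value
    count = round(seconds * sample_rate)     # number of samples to produce
    return [round(description(i / sample_rate) * peak) for i in range(count)]

# The same description, decoded for two different playback systems.
cd_quality = render(fm_tone, 0.01, 44_100, 16)
studio_quality = render(fm_tone, 0.01, 96_000, 24)
```

Both lists come from exactly the same description; only the decoder's settings differ.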

If all this is giving you a feeling of déjà vu about MIDI, you're not far wrong, because Structured Audio includes legacy support for MIDI. But MIDI only specifies which notes are played and how, whereas Structured Audio specifies how the notes sound as well. Structured Audio is interesting because it hints at how all media will be encoded in future: as objects rather than samples.

Digital Representations

So, what is object‑based media and why is it so important to the Internet? To understand this we first need to look at the way we represent digital media now. As the percentage of SOS readers who understand audio sampling is probably higher than any other body of individuals, I won't use valuable column space by going over familiar stuff — so here's the bit that's important to this discussion.

When you sample audio, you cut it up into chunks. The theory says that if the chunks are small enough, and there are enough of them, then you won't notice that they are discontinuous. In fact, if you filter them properly, the result has no 'blockiness' at all. But there are side‑effects, such as aliasing (the presence of non‑harmonically related tones caused by interference between the original signal and the sampling frequency). So it's good, but not perfect.
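As a rough illustration of aliasing, the sketch below works out where a pure tone would appear after sampling if nothing filtered it out beforehand. The function name `alias_frequency` is hypothetical, and the model is deliberately simplified to a single sine wave with no anti‑aliasing filter.

```python
def alias_frequency(tone_hz, sample_rate):
    """Frequency at which a pure tone appears once sampled, assuming no
    anti-aliasing filter. Tones above half the sample rate fold back down."""
    folded = tone_hz % sample_rate
    return min(folded, sample_rate - folded)

# A 1kHz tone sampled at 44.1kHz is captured as itself.
in_band = alias_frequency(1_000, 44_100)

# A 30kHz tone is above half the sample rate, so it folds back into the audible band.
out_of_band = alias_frequency(30_000, 44_100)
```

The folded tone has no musical relationship to the original, which is why aliasing sounds so unpleasant.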

Now think of a visual equivalent to audio sampling. Think of a letter 'I' (in a font like Arial). If you were to look very closely at your computer screen you'd see that it is simply a vertical line of dots. Now, what about a slash (/), like the character you find in Internet addresses. Look closely at that and it's a staircase. Even if your screen is running at a very high resolution, it'll still look like a staircase — but there'll be more, smaller steps.

Object‑based Media

How can we improve on this? Well, it's easy enough to describe the letter 'I'. I can do it now for you: it's a black vertical line. A slash is a black vertical line leaning over a bit. This might sound trivial to you, but it's not; because if I send that description to a printing or display device, it will reproduce the letter at its maximum resolution. Computer screens typically have a resolution of around 70 dots per inch (note the precise unit of measurement 'the dot'!). The print you're reading now has a resolution of around 2,400 dots per inch. I mentioned Postscript earlier. It's a 'page description language' and it works just like this. When you feed a Postscript 'description' into a Postscript interpreter it recognises the descriptions of the letters (and where they are put on the page), draws the outlines of the letters and fills them with as many dots as are needed.
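Here is a toy sketch of that idea in Python. The description "a line leaning over a bit" is written once, and the output device decides how many dots to spend on it. The function `rasterise_slash` and the grid sizes are assumptions made up for this example; a real Postscript interpreter is far more sophisticated.

```python
def rasterise_slash(size):
    """Draw a leaning line on a grid that is `size` dots tall and half as wide.
    Returns one string per row, with '#' marking the dot that is filled in."""
    width = size // 2
    rows = []
    for y in range(size):                      # y = 0 is the top row
        x = (size - 1 - y) * width // size     # column of the line on this row
        rows.append(" " * x + "#" + " " * (width - 1 - x))
    return rows

def count_steps(rows):
    """Number of distinct columns used, in other words the steps in the staircase."""
    return len({row.index("#") for row in rows})

coarse = rasterise_slash(8)     # a low-resolution device
fine = rasterise_slash(16)      # the same description on a finer device

print("\n".join(coarse))
```

The same description yields four steps on the coarse grid and eight on the fine one: more, smaller steps, without sending any extra data.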

MPEG4 structured audio works by describing the calculations that will make a particular sound, and then, having provided the means to re‑create the sound, describes the way in which the sounds are to be played (thus playing a similar role to a MIDI file, in my opinion).
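The two halves described above, the recipe for the sound and the list of what to play, can be sketched in Python as follows. This is only an analogy and not SAOL; the names `sine_instrument`, `SCORE` and `perform` are hypothetical, and the instrument is a plain sine wave to keep the example short.

```python
import math

def sine_instrument(freq_hz, t):
    """The 'how it sounds' part: the calculation that makes one note."""
    return math.sin(2 * math.pi * freq_hz * t)

# The 'what is played' part: (start in seconds, duration in seconds, pitch in Hz).
SCORE = [
    (0.0, 0.5, 261.63),   # C
    (0.5, 0.5, 329.63),   # E
    (1.0, 1.0, 392.00),   # G
]

def perform(instrument, score, sample_rate):
    """Run the instrument for every note in the score and mix the results."""
    total = max(start + duration for start, duration, _ in score)
    out = [0.0] * round(total * sample_rate)
    for start, duration, freq_hz in score:
        first = round(start * sample_rate)
        for i in range(round(duration * sample_rate)):
            out[first + i] += instrument(freq_hz, i / sample_rate)
    return out

audio = perform(sine_instrument, SCORE, 8_000)
```

Swapping in a different instrument function changes the sound without touching the score, which is the separation that MIDI on its own cannot offer.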

The fact that MPEG4 Structured Audio can work with physical models is, for me, the most exciting aspect of it. And it makes the analogy with Postscript even better. Postscript gives brilliant results because it can be used to describe the 'outlines' of characters. The precise outline of a set of letters is what gives a font its individual look, and that's exactly what physical modelling does, in one or more extra dimensions. A physical model of a trumpet describes the 3D outline of the instrument, together with other definable properties that make it sound the way it does. Of course, the better the blueprint, the better the sound. By sending a description of the instrument as a model, and then sending the notes (and data about how the notes are played) you have to send far less data than you would using conventional methods which involve transmitting sampled audio. And, best of all, the recreated audio will be as good as the original!
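To put some rough numbers on the saving, the sketch below compares the data rate of CD‑quality sampled audio with that of a stream of note events. The CD figures are standard; the note stream (20 events a second at 8 bytes each) is purely an assumption chosen for illustration, and the one‑off cost of sending the instrument model is ignored.

```python
def pcm_bits_per_second(sample_rate, bit_depth, channels):
    """Data rate of uncompressed sampled audio."""
    return sample_rate * bit_depth * channels

def event_bits_per_second(events_per_second, bytes_per_event):
    """Data rate of a stream of note events (hypothetical event size)."""
    return events_per_second * bytes_per_event * 8

cd_rate = pcm_bits_per_second(44_100, 16, 2)      # 1,411,200 bits per second
note_rate = event_bits_per_second(20, 8)          # 1,280 bits per second (assumed)

print(f"Sampled audio needs about {cd_rate / note_rate:.0f} times the data rate")
```

Even if the assumed event stream were ten times busier, the description would still need only a small fraction of the bandwidth of the samples.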

Of course, it's not possible to take conventional audio and convert it directly into MPEG4 Structured Audio. It may be possible to synthesise solo instruments as the result of intense analysis, and it may even be possible polyphonically; but I wouldn't like to say when.

MPEG4 is the first media compression and storage format to consider the use of object‑based media. It's a fascinating and important subject, and one which I'll be covering in more detail in future articles. Don't forget that Structured Audio is only a very small part of the MPEG4 specification: there are more conventional "lossy" audio codecs as well.

MPEG4 is not the final flavour of MPEG. There's MPEG7 and even MPEG21 — each version tries to be more abstract and general. I reckon there will eventually be an MPEG Infinity, which will try to encapsulate everything. The popular name for it will be "the Universe".

Multicasting


Another technique for streaming to large audiences is called multicasting. If multicasting sounds like broadcasting, you're right, but there are some important differences, the biggest of which is that multicasting only takes place at the 'edge' of the Internet. I know it sounds odd to talk about the edge of something as shapeless as the global Internet, but it does make sense. To understand the concept you need to think of the Internet again as branches on a tree or, more accurately, as two trees cut in half and joined at the trunk.

On this model, your local ISP is what we would describe as the 'edge' of the Internet, where the last division between the smallest branches can be found. Between the ISP and you there are no further devices (called routers) to direct the stream packets. You can think of the local ISP and everyone connected to it almost as if it was a local area network! Of course, it's different from a 'real' LAN because not all the users are connected using the same piece of wire; in fact, many of them will be relying on the Plain Old Telephone System (known as POTS in the acronymically hyper‑active communications industry). But in terms of data topology (what connects to what) it's a LAN.

Multicast works in a really clever, almost obvious way. A multicast stream is sent to a single address. Surely that's a Unicast? No. It's a special address. What's special about it is that every user that wants to receive the multicast actually pretends that its own address is the multicast address. It's as simple as that: if everyone that wants to receive the stream pretends to have the same address, and that address is the 'virtual' address that the stream is sent to, then you only have to send one stream, however many people want to see it!
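In programming terms, "pretending to have the multicast address" means asking the operating system to join a multicast group. The Python sketch below shows roughly how a receiver might do that with the standard `socket` module. The group address `239.1.2.3` and port `5004` are hypothetical values chosen for illustration, and the helper names are my own.

```python
import ipaddress
import socket

GROUP = "239.1.2.3"   # hypothetical multicast group address
PORT = 5004           # hypothetical port number

def membership_request(group, interface="0.0.0.0"):
    """Build the 8-byte request that asks the operating system to join a group:
    the group address followed by the local interface address."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return socket.inet_aton(group) + socket.inet_aton(interface)

def open_receiver(group=GROUP, port=PORT):
    """Open a UDP socket that receives whatever is sent to the group address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group is the 'pretending' step described in the text.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

A player would call `open_receiver()` and then read packets with `recvfrom()`. However many listeners join the group, the sender still transmits each packet only once.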

So why doesn't everyone use multicast, all the time? Well, because it only works on the edge of the network. Which means that, except in special cases, it only works where there are no routers between the user and the network, which in practice means the link between the user and the ISP.

Multicast has a place in Internet broadcasting, though. Firstly, some ISPs cover wide areas and have thousands of users. I've seen some webcasts where looking at the 'properties' tab of my media player reveals that I'm part of a multicast group. It's also useful on an Intranet (that's what you call a Local Area Network that runs Internet Protocol). Note that some Intranets are big enough to have routers, but they are not normally big enough to behave as non‑deterministically as the Internet. There are also special multicast protocols that allow multicasting over LANs with routers.

If you want to know more about multicasting, and have a strong stomach for things deeply technical, have a look at www‑ (it's an unusual web address, so make sure you type it exactly as it is printed with an underscore between 'mbone' and 'review'). The same web page also talks about Mbone, which is a project to set up a Multicast Backbone. Mbone is essentially a special protocol that works with Mbone‑enabled routers on the Internet to allow a Multicast stream to disguise itself as a Unicast. The reasoning behind this is that you only need to send one stream to an edge location on the Internet, where it can be converted to a local Multicast. It is actually a very good solution to broadcasting on the Internet, but my impression is that Mbone is not widely used or supported, and that bigger companies like Akamai and Digital Island are using their own fast, high‑bandwidth links to provide high‑capacity streaming links for their customers.

Well, that's enough about streaming for the time being. If you've stayed with me this long, congratulations! And remember, if you find this stuff hard, you're not alone. The fact is that the technology is moving so quickly that even if you only have a superficial knowledge of how it works, you'll be way ahead of most people!