Streaming is now the most important way in which music is consumed. So how can you make sure your music sounds as good as possible?
The way we listen to music has changed fundamentally in recent years. Not only are we not buying CDs any more, but increasingly, we’re not buying downloads either. Instead, streaming is now the main route by which music reaches the ears of the public. Last year, for example, 431 billion song streams were logged in the USA alone: that’s well over 1000 for every person in the country!
Obviously, not all streams represent sales, and a download or physical sale might lead to hundreds of listens, whereas a stream always represents a single listen. Nevertheless, the figures are so huge that it’s clear that streaming now dominates the market.
This isn’t only true in terms of volume, either. As of 2015, streaming is now the most important source of revenue for recorded music. Digital revenues now surpass physical sales, and for the first time since 1995, overall revenue from recorded music has risen. In streaming, the music business might have found what it has spent more than 20 years searching for: a consumer-friendly way of delivering music digitally and making money from it. The shift from physical media to online delivery is sure to continue, and within the digital sector, downloads will continue to wane while streaming grows and grows.
If you’re producing music, this means you can’t afford to ignore streaming. In fact, it means that if you are going to optimise your output for any particular delivery method, you should choose streaming rather than CD or download. That, however, is easier said than done. Different streaming services will do different things to your music — and most of them don’t publish any details of what it is that they do! So how can you make sure that your mixes will sound as good as they possibly can to today’s listeners?
Like many download formats, streaming usually uses data compression. What is delivered to the listener’s device is not a bit-for-bit copy of the mix that was originally uploaded to the streaming server. Instead, a variety of techniques are used to reduce the bandwidth consumed in the data transfer. Different streaming sites use different codecs: although MP3 is sometimes encountered, the most widely used are AAC and Vorbis.
All these codecs can operate at different levels of data reduction, and the resulting sound quality can vary enormously as a consequence. However, one thing they all have in common is that they don’t handle ‘overs’ very well. A mix that might sound great in uncompressed 16-bit, 44.1kHz form can easily distort when encoded as a data-reduced audio stream.
This problem has been well known in the TV broadcast industry for many years, and as a result, broadcasters have introduced specific delivery requirements and new metering tools to avoid it. However, music delivery is a much less closely regulated world than broadcast, with a wider and less standardised spectrum of delivery formats. In the TV world, it’s fairly easy to find out in advance what might happen to your audio on broadcast, and to check the effects of this during mixing or post-production. This is much more difficult in the world of music, partly because streaming services don’t always make public what codec settings or loudness normalisation standards they are using, and partly because of the sheer range of things they might do to your mix at delivery.
As streaming grows ever more important, new tools are becoming available that are purpose-designed to let you hear how your mixes will sound under different streaming codec settings. Plug-ins such as Nugen Audio’s MasterCheck Pro and Sonnox’s Fraunhofer Pro-Codec, along with the Codec Preview option in iZotope’s Ozone 7 Advanced, allow producers and mastering engineers to audition encoded streams in real time and compare them with the uncompressed originals. Their specially designed meters identify the inter-sample peaks that might be causing problems, so that any necessary adjustments can be made to mitigate the effects of codec distortion.
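To see why inter-sample peaks escape an ordinary sample-peak meter, here is a rough Python sketch of the principle behind true-peak metering. Real meters use efficient polyphase oversampling filters along the lines of ITU-R BS.1770; the crude windowed-sinc interpolator and the 4x factor here are purely illustrative.

```python
import math

def upsample_4x(samples, half_width=32):
    """Crude 4x windowed-sinc interpolation. The point is to estimate the
    waveform *between* the stored sample values, which is where 'overs'
    can hide from a plain sample-peak meter."""
    out = []
    for i in range(len(samples) * 4):
        t = i / 4.0  # position in units of the original sample period
        acc = 0.0
        centre = int(math.floor(t))
        for n in range(centre - half_width, centre + half_width + 1):
            if not 0 <= n < len(samples):
                continue
            x = t - n
            if abs(x) >= half_width:
                continue
            sinc = math.sin(math.pi * x) / (math.pi * x) if x else 1.0
            hann = 0.5 + 0.5 * math.cos(math.pi * x / half_width)  # taper the kernel
            acc += samples[n] * sinc * hann
        out.append(acc)
    return out

# A sine at a quarter of the sample rate, sampled 45 degrees away from its
# peaks: every stored sample sits near 0.707 FS, but the underlying
# waveform peaks at 0.999 FS.
sig = [0.999 * math.sin(math.pi / 2 * n + math.pi / 4) for n in range(200)]
print(max(abs(s) for s in sig))               # sample peak: ~0.707 (about -3 dBFS)
print(max(abs(s) for s in upsample_4x(sig)))  # true peak: close to full scale
```

A master whose sample peaks read -3 dBFS can thus still be brushing 0 dBFS between samples, and the gentler reconstruction filters in a lossy codec can push such material into audible clipping.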
Part of the reason why optimising your audio for streaming is complicated is that not only do different services use different codecs, but most also employ various levels of data reduction depending on the listener’s device, the connection type, and whether or not money has changed hands. Typically, there are three levels of data compression:
- Low bit-rate: mobile connection/free service.
- Medium bit-rate: desktop/mobile subscription.
- High bit-rate: desktop subscription service.
A codec will use three techniques to meet the specific bit-rate requirements of a given streaming context. The first is to discard what is known as “perceptually irrelevant signal”: audio that is considered to be perceptually ‘masked’ by other elements of the music. For example, human hearing might not be able to detect a quiet sound occurring immediately after a loud snare-drum hit, so the codec discards the data needed to reproduce that sound.
The second technique employed to reduce the bit-rate of an audio stream is called “spectral band replication”. High frequencies in the source audio are discarded, in effect reducing the sample rate; these are then reconstructed from the lower frequencies on playback, using principles akin to those underlying exciters or enhancers.
Finally, where high levels of data reduction are required, a similar approach is taken to stereo. The source signal is matrixed from conventional left/right stereo into Middle and Sides channels, and the Sides information is discarded or drastically data-reduced. Some stereo information is then reconstructed at the decoding stage, but at very low bit rates listeners can find themselves hearing the music in mono.
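The matrixing itself is simple sum-and-difference arithmetic. As a minimal sketch (real codecs quantise the Sides data in the frequency domain rather than simply zeroing it in the time domain, but the end result at very low bit rates is the same), discarding the Sides channel leaves identical left and right outputs:

```python
def to_mid_side(left, right):
    # Sum-and-difference matrix: Mid is what the two channels share,
    # Sides is what differs between them. Halving keeps levels consistent.
    mid  = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def to_left_right(mid, side):
    # The inverse matrix: with the full Sides signal this is lossless.
    left  = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# If an aggressive codec setting discards the Sides data entirely,
# decoding yields the same signal in both channels: mono.
left, right = [1.0, 0.5], [0.5, 0.25]
mid, side = to_mid_side(left, right)
mono_l, mono_r = to_left_right(mid, [0.0] * len(side))
print(mono_l == mono_r)  # True
```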
As explained in the ‘Cracking The Codec’ box, it’s reasonably easy to train your ears to detect the artifacts introduced by different codecs and settings. Deciding which settings to optimise your mixes for is a different matter, and almost inevitably, if your music sounds as good as possible in one set of circumstances, you’ll be compromising the way it sounds in another.
In general, though, it’s best not to be too concerned about how your music will sound at very low-bandwidth settings. These are typically more susceptible to overload problems, and in order to reproduce audio without distortion, require the input to be heavily limited and controlled. The processing needed to do this would often suppress transients and musical detail that is important to the source and which can be reproduced acceptably on higher-bandwidth streams. Given that audio streamed at low bandwidth will never sound great in any case, and will often be consumed in very poor listening environments — on mobile phone speakers, through earbuds on the bus, and so on — it’s not worth removing musical detail that should come through at high bandwidths merely in order to avoid inter-sample clipping at the lowest settings.
Data reduction isn’t the only processing that streaming services apply to music. Nearly all of them apply some form of loudness normalisation. Sometimes this is active only if enabled by the user, but more often, it is switched on by default, and in some cases it is permanently enabled and can’t be defeated. This last category includes YouTube, which is the most-used music streaming service — though, bizarrely, it can take a couple of days for loudness normalisation to take effect after a track is uploaded! And although users of Spotify, Apple Music and Tidal can choose to switch off loudness normalisation, most don’t. What this means is that the large majority of people hearing your music online will be listening to a loudness-normalised stream.
The point of this loudness normalisation is to ensure that when you hear lots of tracks from different artists or albums back-to-back, they are presented at a consistent level, so the user doesn’t constantly have to turn the volume up and down. Importantly, it does not affect dynamics within a track: rather, the entire track is analysed, and a static gain offset is applied as a result. In some systems, this works both ways, so that not only are excessively loud tracks turned down, but quiet ones are turned up. Some systems also apply limiting to ensure that loudness normalisation cannot introduce clipping.
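To show just how static a correction this is — one number for the whole track, not a dynamics process — here is a hypothetical helper. The -14 LUFS default is an assumption based on commonly quoted streaming targets; each service sets its own target and measures the loudness itself.

```python
def normalisation_gain_db(track_lufs, target_lufs=-14.0, upward=True):
    """Return the single static gain offset (in dB) a loudness-normalised
    service would apply to an entire track. No compression or limiting is
    involved here: the track's internal dynamics are untouched."""
    gain = target_lufs - track_lufs
    if not upward:
        gain = min(gain, 0.0)  # some systems only turn loud tracks down
    return gain

print(normalisation_gain_db(-9.0))                 # -5.0: a loud master is turned down
print(normalisation_gain_db(-20.0))                # 6.0: a quiet track is turned up
print(normalisation_gain_db(-20.0, upward=False))  # 0.0: down-only systems leave it alone
```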
The upshot of this is that whereas you can, arguably, make your music stand out in a peak-normalised format such as CD by aggressively limiting the stereo master, this is likely to be counter-productive in a streaming environment. You can test this for yourself with a service that allows loudness normalisation to be switched on and off (see the ‘Sharp Practice’ box).
The take-home message here is that unless there’s an artistic reason to do so, there is no point in reducing the dynamic range of your track beyond the threshold at which loudness normalisation will start to turn it down. That threshold varies from service to service: of all the popular services, Spotify used to have the highest, at -11 LUFS, but this has since been reduced to -14 LUFS, so it is probably wise to treat -14 LUFS as a limit. Mastering louder than this merely for the sake of it will yield no benefit on streaming services, and will actually make your music come across less well. Note that to ensure your mixes don’t go beyond this level, you can’t use a traditional peak or VU meter: you need a meter capable of displaying Loudness Units (LU).
There is, however, more than one way to skin this particular cat. Loudness normalisation is based on the average loudness of an entire track, and it’s possible to achieve the same average loudness measurement in a number of different ways. One track could be consistently at or around that average reading, with virtually no internal dynamics, while another could have loud and quiet sections that swing significantly above and below it, and both tracks could measure the same. From a mixing and production perspective, this means that by including breakdowns that really drop in level, you can also have sections that are significantly louder than the average. What will stand out in a streaming environment is not louder tracks, but tracks that are more dynamic for a given average loudness.
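The averaging principle can be illustrated with plain RMS figures — genuine loudness measurement uses K-weighted, gated integration as defined for LUFS, but the arithmetic point is the same: two tracks can share one average level while having very different peaks.

```python
import math

def rms_db(samples):
    # Plain RMS average; real LUFS metering adds K-weighting and gating,
    # but the principle of averaging over the whole track is the same.
    return 20 * math.log10(math.sqrt(sum(s * s for s in samples) / len(samples)))

flat    = [0.25] * 1000                # a track pinned at one level throughout
dynamic = [0.05] * 500 + [0.35] * 500  # a quiet breakdown, then loud choruses

print(round(rms_db(flat), 1), round(rms_db(dynamic), 1))  # -12.0 -12.0: same average
print(max(flat), max(dynamic))  # 0.25 0.35: but the dynamic track's loud sections hit harder
```

After normalisation both would play out at the same average level, but the dynamic track’s loud sections would land well above it.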
You can envision all playout services as having a ‘letterbox’ through which the audio must be ‘posted’. By making the right decisions at the mixing and mastering stage, you can ensure that your music usually fits through the letterbox without modification. The key to getting your music to sound great on streaming services is to mix to the optimal PLR or peak-to-loudness ratio, where ‘loudness’ refers to the target loudness value to which streaming is normalised, and ‘peak’ is the highest inter-sample peak level that avoids codec distortion.
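The arithmetic of that letterbox is simple. As a worked example (the figures here are illustrative, not a recommendation):

```python
def plr(true_peak_dbtp, integrated_lufs):
    # Peak-to-loudness ratio: the distance between the highest true
    # (inter-sample) peak and the integrated loudness of the track.
    return true_peak_dbtp - integrated_lufs

# A hypothetical master peaking at -1 dBTP with an integrated loudness of
# -14 LUFS: its PLR of 13 fits a -14 LUFS 'letterbox' while keeping 1 dB
# of true-peak headroom against codec distortion.
print(plr(-1.0, -14.0))  # 13.0
```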
Compromises are still inevitable, unfortunately. Different streaming services have different PLR windows, and although loudness normalisation is increasingly the norm, it’s not always present. And even if you want to go to the trouble of creating multiple masters for different streaming services, you can’t easily submit more than one master of each track to an aggregator, so it’s hard to see how you would get them all to the right places.
Finally, there’s the fact that client education and knowledge tends to lag behind technological developments such as loudness normalisation. If you submit a master that is perfectly optimised for Spotify, your client may still complain that it doesn’t sound as loud as tracks in their iTunes library or on CD. If there’s time, you can demonstrate the futility of compressing it further by uploading different versions to a private YouTube account, so the client can hear it after normalisation — but, as previously mentioned, YouTube doesn’t apply its loudness normalisation straight away. Alternatively, you can use a plug-in such as Nugen Audio’s MasterCheck Pro to illustrate the effect straight away. Its Offset to Match button will adjust the playback level to reflect the loudness normalisation value chosen as a reference.
There are still many wrinkles to be ironed out in the world of loudness-optimised playback, and in some ways it’s disappointing that most music is now being heard in data-compressed form. However, it is surely better than the race to the bottom that saw CDs become ever louder and more distorted; and as the bandwidth of fixed and mobile Internet connections continues to grow, we can be confident that data reduction will eventually fade from the scene. Most of all, mastering engineers can once again focus on making music sound as good as possible, without having to worry about competing for level.
Jon Schorah is Creative & Marketing Director of Nugen Audio.
The perceptual effects of the encoding and decoding cycle used in streaming data-compressed audio can vary from the inaudible to the drastic, depending on the source material, the codec and the level of data reduction taking place. If you wish, it’s actually quite easy to train yourself to hear the artifacts in an encoded stream. But before you do this, bear in mind that you won’t be able to unhear it — this is knowledge that could spoil your enjoyment of music for ever...
Encoding artifacts are usually most prominent in the Sides component of a stereo signal, so the easiest way to learn to recognise them is to matrix the signal into Mid-Sides and mute the Mid component. Begin with a low bit-rate stream: you will hear all sorts of musical noise in the soloed Sides component, representing encoding artifacts, and once you get a feel for what these sound like, you’ll be able to hear them in the full mix too. Once you are confident you can hear the artifacts in the full mix at a low bandwidth, try the same process again with different codecs and higher bandwidths. Nugen Audio’s MasterCheck Pro is a good tool for making this sort of comparison, as it allows you to switch codecs and to solo the Sides signal directly.
If you want to check for yourself the consequences of dynamic range reduction within a loudness-normalised system, a good example is provided by the two different masters of ZZ Top’s ‘Sharp Dressed Man’ that are available on Spotify. The first dates from 1983, when CD was just becoming established as a format, while the second is a 2008 remaster.
As already mentioned, loudness normalisation is enabled in Spotify by default. To turn it off, go to the Spotify menu and choose Edit / Preferences / Show advanced settings / Playback / Set the same volume level for all songs — which, I think we can agree, is not a setting that most users are going to happen across by accident!
With the normalisation switched off, you’ll hear that the 2008 remaster of ‘Sharp Dressed Man’ is noticeably louder than the 1983 master. In fact it measures about 3LU (Loudness Units) louder, and as a result, seems to have more punch when the two are heard back-to-back. The mastering engineer responsible for the remaster did a good job of optimising it for a peak-normalised playback environment. However, with loudness normalisation enabled — as it is by default — the result is different. Both versions play out at the same subjective level, but listeners consistently prefer the 1983 master. It retains more micro-dynamic variation and is widely perceived to have more ‘presence’ and ‘detail’, especially in the mid-range.