
How The Ear Works

Exploration By Emmanuel Deruty
Published March 2011

The most important pieces of hardware in any studio are the ones on the sides of your head. We look at the amazing journey your music makes on its way to the brain...

In this article, we're going to have a close look at the tool we all use every day: the ear. This small organ has quite a few surprises in store for us. We'll see that it's literally crammed with equalisers and dynamic compressors, including a multi-band one. It even includes an extremely efficient filter bank, as well as a highly sophisticated analogue-to-digital converter. Armed with this knowledge, sometimes referred to as 'psychoacoustics', we'll discover numerous practical consequences for music production. Those include the choice of monitoring level, ideas for how to deal with bass frequencies in a mix, and a surprising antidote to frequency overlap.

Note that this article won't attempt to cover psychoacoustics in its entirety. In particular, we'll be restricting our focus to monaural audition and setting aside the notion of integration time, which is the audio equivalent of 'persistence of vision'. Stay tuned for future articles covering stereo image issues and the very diverse uses one can make of the ear's integration time in music production.

The Recording Studio In Your Head

We'll start our study of the ear by looking at Figure 1, which shows the morphology of the ear as it is usually represented, divided into three sections. The outer ear consists of the auditory canal and the exterior of the tympanic membrane, better known as the eardrum. The malleus, incus and stapes, which are small bones often referred to as ossicles, belong to the middle ear, along with the interior of the tympanic membrane. Then there is the inner ear, which includes the cochlea and the semicircular canals. Last, we find two nerves that connect the ear to the brain. (The semicircular canals and vestibular nerves don't relay any information relating to hearing; their purpose is to give us a sense of gravity and balance, so we'll leave them aside.)

Figure 1: The morphology of the human ear (diagram derived from Chittka L, Brockmann A (2005): Perception Space — The Final Frontier, www.plosbiology.org).

What we call 'sound' is in fact a progressive acoustic wave — a series of variations in air pressure, spreading out from whatever source made the sound. When these pressure variations strike the ear, they find their way through the external auditory canal to the tympanic membrane, setting it into vibration. The signal is thus converted to mechanical vibrations in solid matter. These vibrations of the tympanic membrane are transmitted to the ossicles, which in turn transmit them to the cochlea. Here the signal undergoes a second change of nature, being converted into pressure variations within liquid. These are then transformed again by specialised hair cells, which convert the liquid waves into nervous signals.

This extraordinary signal path encompasses four distinct states of information: acoustic, mechanical (solid), mechanical (liquid), and electric, more specifically electro-chemical. The very nature of the information also changes: from analogue, it becomes digital. Really! Whereas the mechanical stages propagate information in a form analogous to the original sound wave, the nervous signal is even more decorrelated from the sound wave than AES-EBU audio signals can be. Put simply, the ear includes a built-in analogue-to-digital converter. Figure 2 summarises these changes of state and nature in the sound information.

Figure 2: A complex set of audio devices!

The Outer Ear: EQ & Compression

Besides such changes in state and nature, the audio signal is also subject to important changes in content. It is, in short, rather heavily equalised and compressed. Let's begin with the external auditory canal, which has the shape of a narrow tube. Now, as I explained in a previous SOS article (/sos/sep10/articles/convolution.htm), reverb in a very small space is so short that it's perceived as EQ, rather than as ambience. The reverberation of the auditory canal boosts frequencies around 3kHz by 15 to 20dB.
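
If you fancy a back-of-the-envelope check of that figure, here's a minimal Python sketch that treats the canal as a tube closed at one end by the eardrum. The 2.5cm canal length is an assumed typical value, not something taken from the article.

```python
# A rough sketch: model the auditory canal as a quarter-wave resonator
# (a tube open at one end, closed at the other by the eardrum).
# The 2.5cm canal length is an assumed typical value for illustration.

SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees Celsius
CANAL_LENGTH = 0.025     # metres (assumed)

# A tube closed at one end resonates at f = c / (4 * L)
resonant_freq = SPEED_OF_SOUND / (4 * CANAL_LENGTH)
print(f"Estimated canal resonance: {resonant_freq:.0f} Hz")   # about 3430 Hz
```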

At this point, the signal is transformed into mechanical vibrations by the eardrum, which also acts as an EQ. To understand why, compare it to an actual drum. Strike a timpani and it will resonate at certain frequencies, which are inversely proportional to the size of the instrument. Strike a snare drum and, being much smaller, it will resonate at higher frequencies. Strike an eardrum, and it resonates at even higher frequencies, thus filtering the input signal correspondingly. The tympanic membrane is also attached to a muscle called the tensor tympani. When confronted with high sound pressure levels, this muscle contracts, heavily damping eardrum movements. It is, in other words, a mechanical compressor/limiter, allowing low‑level vibrations through unaltered, but damping larger vibrations.
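
As a loose software analogy (and emphatically not a model of the muscle itself), a static soft limiter shows the same qualitative behaviour: small excursions pass almost untouched, large ones are progressively damped. The tanh curve and the threshold below are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative sketch only: a static 'soft limiter' with the qualitative
# behaviour the text ascribes to the tensor tympani. Small excursions pass
# almost unchanged, large ones are progressively damped. The tanh curve and
# the 0.2 threshold are arbitrary choices, not a model of the actual muscle.

def soft_limit(x, threshold=0.2):
    # Nearly linear below the threshold, saturating above it
    return threshold * np.tanh(np.asarray(x) / threshold)

print(soft_limit(0.05))   # ~0.049: quiet vibration, passed through almost unaltered
print(soft_limit(1.0))    # ~0.2: loud vibration, heavily damped
```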

The Middle Ear: EQ, Compression & Impedance Matching

Behind the eardrum, we find the ossicles. The purpose of these minute bones is to convert the eardrum vibrations into pressure variations in the cochlear fluid. Now, converting acoustic waves into variations in fluid pressure is no easy matter — look at what happens when you've got water in your ears. This means that conversion of acoustic waves from air to water is anything but efficient. Put differently, fluids have a high input impedance when receiving acoustic waves.

The ear's answer to the problem is simple: give me a lever and I can move the Earth! To overcome this high input impedance, the ossicles form a complex system of levers that drastically increases the pressure variations between the eardrum and the entrance of the inner ear. This is made possible physically by the fact that the eardrum is 20 times the size of the cochlear window. It really works as a conventional lever does: low pressure across a wide area is converted into higher pressure on a small area.
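
For the curious, that 20:1 area ratio on its own already accounts for a substantial pressure gain; here's the arithmetic as a quick Python sketch (the extra gain from the lever action of the ossicles themselves is left out).

```python
import math

# Minimal sketch: convert the 20:1 eardrum-to-cochlear-window area ratio
# quoted in the text into an approximate pressure gain in decibels.
# (The lever action of the ossicles adds a little more on top of this.)
AREA_RATIO = 20.0   # eardrum area / cochlear window area

gain_db = 20 * math.log10(AREA_RATIO)   # pressure is an amplitude-like quantity
print(f"Area ratio alone: roughly {gain_db:.0f} dB of pressure gain")   # ~26 dB
```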

Dealing with impedance matching through a system of levers as complex as the ossicles doesn't come without side-effects. The ossicles' frequency response is not flat, turning them into another EQ. In this case, the frequency response is decent around 0.5kHz, gets even better near 1-2kHz, and then degrades steadily above that. The ossicles also serve as a compressor/limiter, thanks to what's called the stapedian muscle. Like the tensor tympani in the case of the eardrum, the stapedian muscle stabilises the ossicles at high levels.

The middle ear also contains the eustachian tube, whose purpose is easiest to grasp by analogy: seal the opening at the rear of a kick drum, and you suddenly get much less sound! Likewise, if you seal the cavity behind the eardrum, you suddenly have problems hearing properly. This happens regularly, for example when we're on an airplane or when we get a cold. In both cases, the eustachian tube gets clogged, and that prevents the tympanic membrane from moving as it should.

The Inner Ear: Multi‑band Compression, Pitch Tracking & ADC

By now, the audio signal has reached the inner ear, and that means the cochlea. This snail-shaped organ is filled with liquid and, logically enough, sealed so that no fluid can leak out. That creates a problem: liquids are incompressible, so without some give in the structure, the fluid enclosed inside the cochlea would completely block the movements of the ossicles. This is the purpose of the round window, a small, elastic membrane on the surface of the cochlea that allows the fluid inside to move. Indeed, stiffening of the oval window, the opening on which the stapes pushes, can lead to hearing losses of about 60dB.

Inside the cochlea we find the tectorial membrane, which moves along with the pressure variations of the cochlear fluid. As shown in Figure 3, this membrane is in contact with the cilia on top of the hair cells. There are two kinds of hair cells. The outer hair cells are the actual receptors: when the tectorial membrane moves, so do the cilia on the outer cells, and this movement is encoded into electrical digital signals that travel to the brain through the cochlear nerve. The inner cells have a different role: when the audio signal gets louder, they stick themselves to the tectorial membrane in order to limit its movements, playing the role of another dynamic compressor.

Figure 3: Inside the cochlea.

This tectorial membrane exhibits a clever design. Its stiffness is variable, decreasing gradually towards the centre of the 'snail'. This is a way of tuning the membrane to different frequencies. To understand the phenomenon, consider guitar tuning: when you want the pitch of a string to be higher, you stretch it so that it gets more tense, and stiffer. Generally speaking, stiffer materials are able to vibrate at higher frequencies. This makes the tectorial membrane a bank of filters, with an important result: the outer cells are frequency-specific, each group of cells being dedicated to particular frequencies. Now consider the inner cells, and their ability to attenuate the tectorial membrane's movement: they function as a frequency-specific compressor — in other words, a multi-band compressor!
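
A crude software analogy for this behaviour, and emphatically not a physiological model, is to split a signal into a handful of band-pass channels; compress each channel separately and you have something like the multi-band compressor the cochlea implements mechanically. In the Python sketch below, the band edges, filter order and sample rate are all arbitrary illustrative choices.

```python
import numpy as np
from scipy.signal import butter, sosfilt

# A loose software analogy for the cochlea's filter bank, not a physiological
# model: split a signal into a few band-limited channels with Butterworth
# band-pass filters. Compressing each channel separately would then give a
# basic multi-band compressor. Band edges and filter order are arbitrary.

FS = 44100  # sample rate in Hz

def make_filter_bank(edges_hz, order=4):
    """One band-pass filter (as second-order sections) per adjacent pair of edges."""
    return [butter(order, [lo, hi], btype="bandpass", fs=FS, output="sos")
            for lo, hi in zip(edges_hz[:-1], edges_hz[1:])]

def split_into_bands(signal, bank):
    return [sosfilt(sos, signal) for sos in bank]

# Example: four octave-wide bands from 250Hz to 4kHz, fed with white noise
edges = [250, 500, 1000, 2000, 4000]
bank = make_filter_bank(edges)
test_signal = np.random.randn(FS)          # one second of white noise
bands = split_into_bands(test_signal, bank)
for (lo, hi), band in zip(zip(edges[:-1], edges[1:]), bands):
    print(f"{lo}-{hi} Hz band, RMS level: {np.sqrt(np.mean(band ** 2)):.3f}")
```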

The tectorial membrane's decreasing stiffness towards the centre of the 'snail' serves another important purpose, which is frequency tracking. A particular audio frequency will set the membrane in motion at a particular position, and that vibration will be sensed by a specific set of outer cells. A comparatively lower frequency will set the membrane in motion closer to the centre of the 'snail', and that vibration will be sensed by another set of outer cells. The brain, by analysing which set of outer cells was put in motion, will then be able to tell that the second frequency was the lower one. Notice how, during this process, the tectorial membrane really acts in the manner of a filter bank, performing an actual spectral analysis of the input signal. Figure 4 illustrates the rough position of a few key frequencies on the cochlea.

Figure 4: Filter bank frequencies on the cochlea.

Harmonic sounds come as a set of regularly spaced pure tones: if the fundamental frequency is 100Hz, the harmonic frequencies will be 200Hz, 300Hz, 400Hz and so on. As shown in Figure 5, each one of those frequencies will correspond to a particular area of the tectorial membrane. Suppose a given harmonic sound comes with its fundamental frequency plus nine harmonics. In this case, no fewer than 10 distinct areas of the tectorial membrane will be set in vibration: this provides an abundance of coherent information to the brain, which will have no difficulty in quickly and easily finding the right pitch. This is what makes the human ear so powerful for pitch identification.

Figure 5: Pitch tracking inside the cochlea.
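
To get a feel for how much pitch information the harmonic spacing alone carries, here's a small Python sketch that builds the 100Hz example above (a fundamental plus nine harmonics) and then recovers the fundamental simply by measuring the average spacing between spectral peaks. The window and peak-detection settings are arbitrary.

```python
import numpy as np
from scipy.signal import find_peaks

# Sketch: recover the pitch of a harmonic tone purely from the spacing of its
# partials. The 100Hz fundamental plus nine harmonics follow the example in
# the text; the window and peak-detection settings are arbitrary.

FS = 44100
F0 = 100.0
t = np.arange(FS) / FS                                  # one second of signal
tone = sum(np.sin(2 * np.pi * F0 * k * t) for k in range(1, 11))

spectrum = np.abs(np.fft.rfft(tone * np.hanning(len(tone))))
freqs = np.fft.rfftfreq(len(tone), 1 / FS)

# Keep only clear spectral peaks, then read the pitch off their average spacing
peaks, _ = find_peaks(spectrum, height=spectrum.max() * 0.1)
harmonic_freqs = freqs[peaks]
estimated_f0 = np.mean(np.diff(harmonic_freqs))
print(f"Estimated fundamental: {estimated_f0:.1f} Hz")  # close to 100 Hz
```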

With the hair cells, we come to the end of the audio path inside the ear. Hair cells are neurons, and the purpose of the outer hair cells is to convert the mechanical vibrations that come from their cilia into nerve signals. Such signals are binary (all or nothing), and seem to be completely decorrelated from the analogue signals to which they correspond. In other words, they're digital signals, and the outer hair cells are, in effect, analogue-to-digital converters.

What's It All For?

There is no denying that the ear is quite a complicated device, with a lot of processing built in. We've met no fewer than four EQs and three compressors. Are those really necessary? And what can be their purpose?

Let's begin with the built-in compressors. They're not always active: in a quiet ambience, none of them is functioning. The louder the signal that reaches the ear, the harder the compressors work to attenuate what finally arrives at the outer hair cells. This enables the human ear to withstand a dynamic range of roughly 140dB, which corresponds to a ratio of 100,000,000,000,000:1 in sound intensity between the loudest and the quietest sounds we can perceive. Considering we're dealing with three tiny mechanical compressors, that's no mean feat.
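
If you want to check that figure, the arithmetic is a one-liner:

```python
# Quick check of the figure above: 140dB expressed as a ratio of sound power.
dynamic_range_db = 140
power_ratio = 10 ** (dynamic_range_db / 10)
print(f"{power_ratio:,.0f} : 1")   # 100,000,000,000,000 : 1
```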

The actions of the equalisers are also far from accidental. If you refer back to the descriptions above, you will notice that the frequency values that get boosted are centred around 1-3kHz, while additional processing inside the cochlea adds a strong gain near 4kHz. Those frequencies correspond to the part of speech that's linked to understanding, so the EQs in the ear make us better able to communicate with each other. Oral communication being an extremely important aspect of Homo sapiens behaviour, we can safely assume that we're dealing with Darwinian EQs.

What About The Bass?

Naturally enough, the way the ear works has huge consequences for how we perceive music — and for how we make it. Take the EQs inside the ear: we've seen that they privilege the frequencies around 0.5‑4kHz. Those three octaves contain the frequencies we are most sensitive to. And, naturally enough, they are the ones most favoured by musicians and composers, consciously or not. Think about it: how many concertos for violin have been written, and how many for double bass? How many for trumpet, and how many for tuba?

In more modern applications, consider the kick drum in a mix. If you want it to be heard, you have to boost the medium‑high frequencies that are present in the attack. The same goes for the electric bass: if you want it to be completely drowned inside a mix, remove the medium‑high frequencies! It's as simple as that. To make bass instruments clear and audible, you need to make sure the mid-range frequencies are in place. Low frequencies themselves are clearly useful — they can convey roundness, body, strength, and so on — but any part of the musical language linked to understanding, anything that requires sensitivity, needs to be inside or near the 0.5‑4kHz range.

For an example, think about music being played through a telephone. Telephones degrade considerably anything outside the 0.5-4kHz range. Sure, music through a telephone is ugly, but the essential message of the music remains intact: it's easy to recognise the song, and you have no trouble hearing the different instruments. By contrast, try listening to music through a 100-400Hz band-pass filter. That's still a band two octaves wide, yet it makes music completely incomprehensible. It becomes very hard to recognise any of the instruments, and the message is completely lost.
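
If you'd like to try the comparison for yourself, here's a rough sketch using SciPy. The filename is just a placeholder for any 16-bit WAV file of your own, and the band edges simply follow the figures quoted above.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfilt

# Sketch of the listening experiment described above: compare a 'telephone'
# band (0.5-4kHz) with a 100-400Hz band. "some_song.wav" is a placeholder
# for any 16-bit mono or stereo WAV file; the filter order is arbitrary.

fs, audio = wavfile.read("some_song.wav")
audio = audio.astype(np.float64)
if audio.ndim > 1:
    audio = audio.mean(axis=1)        # fold to mono, to keep things simple

def bandpass(signal, lo_hz, hi_hz, fs, order=6):
    sos = butter(order, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos")
    return sosfilt(sos, signal)

def to_int16(x):
    return np.clip(x, -32768, 32767).astype(np.int16)

telephone_band = bandpass(audio, 500, 4000, fs)   # the message survives
low_band = bandpass(audio, 100, 400, fs)          # the message is largely lost

wavfile.write("telephone_band.wav", fs, to_int16(telephone_band))
wavfile.write("low_band.wav", fs, to_int16(low_band))
```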

Before the advent of the studio, this 0.5-4kHz constraint was not so much of an issue, because most instruments operate within this frequency range anyway. With electroacoustic music — and that includes many kinds of pop music nowadays — one must be more careful. The 0.5-4kHz range should contain the most important part of the message. To carry the essence of your music, forget about the bass frequencies: your priority lies elsewhere.

The same naturally applies to the very high frequencies. Don't try to convey your musical message using the 4‑20kHz range: it's useless. (This, admittedly, is so obvious that only a handful of unreasonable avant garde composers with a strong tendency to sadism have ever been tempted to write and produce music based mainly on such frequencies.)

The 'Ghost Fundamental'

Looking back at Figure 5, one observation leaps to mind: together, the harmonics carry more information about the pitch of a sound than the fundamental frequency does. There can be as many as 16 audible harmonics, setting 16 zones of the tectorial membrane in motion, whereas the fundamental tone only puts one zone in motion. Not only are the harmonics themselves important; so is the spacing between them, which is constant and repeated up to 15 times. Just think how much easier it is to identify pitch from a 'normal' instrumental sound with harmonics than from a pure sine wave. In fact, the fundamental frequency is so unimportant as far as pitch is concerned that it can simply be removed without any impact on perceived pitch, an experiment famously carried out by Pierre Schaeffer during the 1950s.
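
Here's a small sketch of that 'ghost fundamental' effect: two synthetic tones, one with the full harmonic series and one with the fundamental deleted, both show the same 10ms periodicity, which is what pitch perception follows. The crude autocorrelation estimator and its search range are arbitrary choices, not a model of the ear.

```python
import numpy as np

# Sketch of the 'missing fundamental': a 100Hz harmonic tone keeps the same
# 10ms periodicity whether or not the fundamental itself is present.
# The crude autocorrelation estimator and its search range are arbitrary.

FS = 44100
F0 = 100.0
t = np.arange(int(0.25 * FS)) / FS   # a quarter of a second is plenty

full = sum(np.sin(2 * np.pi * F0 * k * t) for k in range(1, 11))    # harmonics 1-10
ghost = sum(np.sin(2 * np.pi * F0 * k * t) for k in range(2, 11))   # fundamental removed

def estimate_pitch(x, fs, fmin=50.0, fmax=500.0):
    """Very rough pitch estimate: pick the strongest autocorrelation lag in range."""
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

print(estimate_pitch(full, FS))    # ~100 Hz
print(estimate_pitch(ghost, FS))   # still ~100 Hz, despite no energy at 100 Hz
```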

This knowledge is very relevant to music production. Let's say that you're writing an arrangement for a song and you find yourself fighting frequency overlap in the 100‑1000Hz range. For instance, you've got a sampled brass section that gets in the way of the lead vocals. A straightforward solution would be to equalise the brass section to attenuate the frequencies that neighbour those of the lead vocals. This will probably work, but there is a more radical solution: just remove the fundamentals from the brass samples. The pitch will remain the same, and the frequency overlap problem will be greatly reduced. True, it will alter the brass section timbre, but so would an EQ.

Conversely, it's perfectly possible to 'design' whole instrumental sections that appear to be pitched between 100 and 200Hz, whereas in truth this frequency range remains unused. This leaves more room for other instruments, which in turn means more freedom, richer arrangements, and easier mixes.

How Loud Should You Mix?

Let's recap and list the ear's built‑in EQs and compressors:

  • External auditory canal: EQ.
  • Eardrum: EQ + compressor.
  • Ossicles: EQ + compressor.
  • Cochlea: EQ + compressor.

In three cases out of four, dynamic and frequency processing are closely linked, with a fundamental consequence: the timbre of the sound we hear depends on its intensity. In other words, the ear applies a different EQ to incoming audio signals depending on how loud they are. This behaviour is very different from that of a studio EQ, which applies the same frequency gains regardless of input signal level.

Now, this is quite natural. Remember that most transducers in the ear are mechanical. They're physical objects that are set into vibration. Obviously, you can't expect a vibrating object to react identically when it's slightly shivering and when it's moving like crazy. This is particularly true of a complex lever system such as the ossicles. It's only common sense that the resonant frequencies will differ. This means that you just can't expect the ear's EQs to apply the same gain whatever the input level — and they don't.

When representing the ear's EQ transfer functions, we have to do so for a set of given input levels. Figure 6 shows three such transfer functions at different levels; we've assumed, for this graph, that the EQ's gain is always 0dB at 1000Hz. Those transfer functions are derived from what are called the equal loudness curves, shown in Figure 7. The equal loudness curves are far more commonly seen, but they're a bit obscure and don't map directly onto the concept of EQ.

Figure 6: The equalisation applied in the ear depends on level.
Figure 7: The well-known Fletcher-Munson, or equal loudness, curves.

Obviously, the ear's EQ transfer functions are of enormous importance when it comes to mixing. Looking at Figure 6, what can we read? Consider two tones, one at 60Hz and one at 1000Hz, that seem to be equally loud. Reduce the monitoring level by 40dB, and the 60Hz tone will now sound about 15dB quieter than the 1000Hz one. To convince yourself, listen to a kick drum sample very quietly: you don't hear any bass frequencies. Turn up the volume considerably, and suddenly the same kick drum can be heard with plenty of bass frequencies.

This has an important consequence. Imagine you're in the middle of a mix and you're listening to it at quite a high level. Everything is going well, your sound is round and powerful... Now reduce the monitoring volume: hey presto, no more bass frequencies. Your mix is shallow and anything but powerful. So are you sure you want to mix that loud? Naturally, the same principle applies when you're actually writing the music: if you don't want to be disappointed afterwards, turn the volume down. High monitoring levels are for the consumer, not the producer. And, by the way, notice how this phenomenon entirely confirms what I was explaining above: forget about those bass frequencies, really. They're just not reliable.

The choice of a given monitoring level is not trivial. Not only does cranking up the volume change the frequency balance of a mix, but it also changes the pitch. Don't believe it? Try the following experiment. Take a pair of good-quality headphones, such as Beyerdynamic DT770s or AKG K271s. Use them to listen to some music at high level, then remove the headphones from your head and place them 10 centimetres in front of your face. You will hear all the pitches of the song go down noticeably, by up to half a tone. This effect is more obvious with some tracks than others; try it with the chorus of 'Circus' by Britney Spears.

Now that we know how the ear works, we can suggest an explanation for this phenomenon. When we listen to loud music, all the compressors in the ear swing into action, including the tensor tympani and stapedian muscles. The tensor tympani tightens the eardrum, and the stapedian muscle stiffens the ossicles. As we've seen before, when a vibrating object gets more tense, it resonates at a higher pitch, and the same applies to the eardrum and ossicles. As a consequence, when the audio input gets considerably louder, the whole mechanical part of the ear resonates at higher frequencies. This shifts the pattern of stimulation along the tectorial membrane towards higher frequencies, and confuses the brain into thinking it's listening to transposed musical content.

Conclusion

As we've seen throughout this article, the ear is a very complicated device, and hearing is a delicate and complex phenomenon. Even if we restrict ourselves to monaural audition, the sense of hearing seems to be at the origin of many illusions. Perceptual aspects that we generally think of as independent, such as timbre, volume and pitch, are in fact inter‑related. The fundamental frequency of a harmonic sound, the very one we use to define pitch, is actually of no importance at all as far as our perception of pitch is concerned.

This should put into perspective the common saying that the ears are the ultimate judge in music production. To some extent they certainly are, but as we are now aware, they can also be fooled extremely easily. The only reasonable suggestion would be prudence. When designing sounds, when mixing, in music and/or ambience production in general, when you want to evaluate what you're doing, take your time. Change your monitoring level, walk to the other end of the studio, listen to your work on other speakers... anything that can stop you fooling yourself. And don't hesitate to use meters, be they loudness meters or spectrograms: true, they show reductive information that is not necessarily related to what we hear, but they cannot be tricked.

When writing or producing music, the ease with which the ear is confused can also work to your advantage. Knowing how the ear works means that you have a head start when it comes to shaping the message you want your listeners to receive!  

Why Hearing Declines With Age

When people age, their hearing degrades. Considering the complexity of the hearing system, this should not come as a surprise. The main symptom is difficulty in hearing high-frequency sounds. This problem originates from the destruction of the more fragile hair cells associated with high frequencies. Why are those hair cells more fragile than the others? Because they have to be much more sensitive: in a normal environment, high frequencies are much quieter than low frequencies. Figure 8 shows the spectral density of a male human voice. Frequencies above 8kHz are 45dB below frequencies around 200Hz, and don't forget that -45dB means roughly 32,000 times quieter in terms of sound power. When listening to someone talking, hair cells associated with frequencies above 8kHz have to be 32,000 times as sensitive as the ones associated with frequencies around 200Hz: no wonder they're more fragile.

Figure 8: Spectral density of a male human voice.
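
Again, the arithmetic is easy to verify:

```python
# Checking the figure above: -45dB expressed as a ratio of sound power.
level_difference_db = 45
power_ratio = 10 ** (level_difference_db / 10)
print(f"About {power_ratio:,.0f} times quieter")   # prints about 31,623, i.e. roughly 32,000
```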

Indeed, those particular hair cells are subject to much stress. Listen to loud music and they get destroyed. Eat too much fat or smoke too much, and your arteries become clogged, the hair cells don't get enough oxygen, and they die. This applies to the inner hair cells as well, and since those behave like a dynamic compressor, when they die your tectorial membrane becomes less able to handle high signal levels — so even more outer hair cells die as a result! It's no wonder so many people become deaf after 60.

Trouble doesn't stop at the inner ear. Muscles and body tissues in general get less elastic with age, and this naturally applies to both the tensor tympani and stapedian muscles. Since they don't work as well, they're less able to perform dynamic compression. This results in louder signals getting to the inner ear, and even more hair cells being destroyed. Incidentally, this also explains why some older people sometimes hear sounds as overly loud: it happens when the middle ear's dynamic compression no longer works as it should.

On a brighter note, for some time this loss of high‑frequency hearing can be compensated for, thanks to a very interesting phenomenon. Suppose that, at some point, a group of high‑frequency‑specific hair cells are destroyed. The brain will reconfigure all the remaining hair cells, so that hair cells that were previously dedicated to lower frequencies will now handle the missing higher frequencies. Naturally, this has drawbacks. To begin with, it reduces the number of cells dedicated to each frequency band. This loss of definition reduces our ability to discriminate between different frequencies that are close together.

For people with perfect pitch, this reconfiguration of the hair cells brings annoying consequences. Think about it: once reconfigured, the hair cells that were previously associated with a given pitch are now associated with a different one, so the brain is going to confuse one pitch with the other. To give a practical example, for the last five years I have consistently heard everything approximately half a tone lower than standard 440Hz tuning: what sounds like an A to me is actually a B-flat. As I said before, it's really annoying.

A final consequence of hearing impairment is an increase in 'otoacoustic emissions' — in other words, the ear's background noise. Older people sometimes claim they hear 'ringing'. Such ringing originates from the hair cells dedicated to the highest frequencies these people can still hear. Because those cells are in bad shape, their signal-to-noise ratio diminishes considerably, to the point where it's practically 0dB: the background noise gets confused with actual audio signal.

Not that I want to sound grumpy, but if you want to make a career out of sound engineering, mixing or mastering, stop smoking, get some exercise and turn that monitoring volume down. Otherwise, 25 years from now, you'll be out of a job...