We introduce a little‑known mic configuration that has some unique advantages over more familiar stereo arrays.
There are myriad stereo microphone arrays, each with unique strengths and weaknesses. If you have only a pair of cardioid mics, your options are limited to ‘coincident X‑Y cardioids’ or some form of near‑spaced array such as DIN, NOS or ORTF; the last of these is the best‑known and most popular. In this article, I’ll suggest an alternative which offers some worthwhile advantages. I can’t claim the credit for it: I first came across this idea in the mid‑’80s, but was reminded of it recently when searching for something else entirely in my Studio Sound magazine archive. As far as I know, the late, great Michael Gerzon (of Ambisonics fame) first described this arrangement in print, and he credited Tony Faulkner. But as Tony already has a couple of eponymous mic arrays, I’m taking the liberty of naming this one for Michael!
Before describing this ‘Gerzon array’, let’s consider some pros and cons of other cardioid stereo arrays. When Alan Blumlein was developing his ideas for stereophonic sound in the 1930s, he noted that to achieve accurate imaging across the stereo soundstage, the signals reproduced by the speakers should differ only in their relative amplitudes, not their phase or timing. To achieve this, he developed the concept of the coincident mic array, in which directional capsules are mounted directly above one another (ensuring phase coincidence in the horizontal plane), and angled so that their relative sensitivities vary with the sound source’s angle of incidence. In other words, a sound source off to one side of the mic array’s central axis will be picked up at a greater level by one mic than the other, simply because of the degree of off‑axis rejection afforded by the mics’ polar patterns, thereby generating the required inter‑channel amplitude difference.
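To put rough numbers on that inter‑channel amplitude difference, here is a minimal Python sketch. It assumes ideal cardioid patterns, with a sensitivity of 0.5 × (1 + cos θ), and perfectly coincident capsules; real mics deviate from this, especially at high frequencies. The function names are purely illustrative.

```python
import math

def cardioid_gain(off_axis_deg):
    """Sensitivity of an ideal cardioid at a given off-axis angle (assumption)."""
    return 0.5 * (1.0 + math.cos(math.radians(off_axis_deg)))

def level_difference_db(source_deg, mutual_angle_deg):
    """Left-minus-right level in dB for a coincident cardioid pair.

    source_deg is measured from the array's centre line, positive to the left.
    """
    half = mutual_angle_deg / 2.0
    left = cardioid_gain(source_deg - half)   # left mic is aimed at +half
    right = cardioid_gain(source_deg + half)  # right mic is aimed at -half
    return 20.0 * math.log10(left / right)

# Level differences for a 90-degree X-Y pair as the source moves left of centre.
for angle in (0, 15, 30, 45):
    print(angle, "deg:", round(level_difference_db(angle, 90), 1), "dB")
```

A source on the centre line gives no level difference, while one directly on‑axis to the left mic of a 90‑degree pair is about 6dB louder in the left channel than the right.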
An X‑Y coincident array formed of two cardioid mics at a 90‑degree mutual angle is the aural equivalent of a fish‑eye lens, producing a ‘stereo recording angle’ (the angle around the front of the mic array, across which sounds appear fully left or right when auditioned on speakers) of about 196 degrees. This means the array has to be positioned very close to the orchestra or choir in order for the recording to fill the reproduced sound stage. Coincident X‑Y cardioids typically need to be placed more or less directly above the conductor’s head, inevitably generating a very close, relatively dry sound, and giving an exaggerated imbalance between the orchestral players near the centre and edges of the ensemble, and between those at the front and back. Not surprisingly, then, X‑Y cardioids don’t deliver a naturally pleasing and balanced sound. On the upside, the stereo imaging is well focused and precise, as with all coincident arrays, and mono compatibility is rock solid.
A stereo mic array doesn’t just capture the wanted musicians, though. It also captures the reverberant sound of the room. In the case of X‑Y cardioids, the recording venue’s reverberation energy tends to build up around the middle of the stereo image, rather than being spread evenly across the full width. Hold that thought, as I’ll return to it later.
In the late ’50s and early ’60s, European broadcasters sought alternative stereo arrays that would offer a more ‘natural’ sound character. Many commercial record companies favoured widely spaced omnidirectional mics, but these had relatively vague stereo imaging and, more importantly, often had poor mono compatibility, which was a big problem for the broadcasters. So broadcasters experimented with ‘near‑spaced’ cardioid arrays, assuming that the angle between cardioid mics would still generate the inter‑channel amplitude differences of a coincident array, while the mic spacing would generate some inter‑channel timing/phase differences, like a spaced omni array.
Several variations emerged: the Dutch national broadcaster, Nederlandse Omroep Stichting, came up with the NOS array, the Germans developed the DIN array, and the Office de Radiodiffusion Télévision Française at Radio France developed the ORTF array. These configurations are detailed in the table, with values for their SRAs and angular distortions; data for a typical spaced omni array is also included for comparison. (Of course, there are countless other possible combinations of capsule spacing, mutual angle and SRA.) With SRAs of between roughly 80 and 100 degrees, these three named near‑spaced arrays all require a more distant placement than the coincident X‑Y array, giving a more reverberant sound character. Technically, the NOS arrangement has slightly more accurate stereo imaging, but with a 30cm capsule spacing, it’s also the most impractical to rig. The DIN and ORTF formats use smaller spacings (20 and 17 cm, respectively), and the former’s 90‑degree mutual angle is rather easier to judge than the 110 degrees needed for ORTF.
So what of the Gerzon array? With a capsule spacing of just 5cm, it’s more ‘quasi‑coincident’ than ‘near‑spaced’, and this results in significantly better mono compatibility than the DIN, NOS or ORTF arrangements. Also, at 120 degrees, the Gerzon array’s mutual angle is rather greater than any of the arrays discussed above — and with cardioid mics, it turns out that a mutual angle of around 120 degrees delivers an almost perfectly flat and uniform distribution of the room’s reverberation energy across the full stereo image. While this is rarely mentioned, it is important, because the best stereo image stability and sense of front‑back dimensions behind the loudspeakers occurs when the reverberation distribution curve is as flat as possible. The only stereo mic array that achieves a flatter (and wider) reverberation distribution is the classic Blumlein array (figure‑8 capsules at 90 degrees), which is widely admired for its natural stereo imaging.
Another strength of the Gerzon array is that, for any given frontal source location, the small capsule spacing produces virtually the same phase/amplitude differences at the ears of a loudspeaker listener (at least up to around 2kHz) as would be obtained when listening to the sound source live. In this respect, the Gerzon array can claim to be more accurate than coincident X‑Y cardioids.
For a given stereo width of source ensemble, the Gerzon array’s SRA of around 130 degrees means it should be placed further away than a coincident X‑Y array, giving a slightly more reverberant perspective and more natural side‑to‑side and front‑to‑back balances. Even so, it would still be closer than any of the conventional near‑spaced arrays, which give even more reverberant perspectives.
My only word of caution is that the Gerzon array’s wide mutual angle means that central sound sources arrive well off‑axis (60 degrees) to both mics, so good‑quality cardioids, with a very consistent polar pattern across the full bandwidth, out to at least 90 degrees, are required. Most decent small‑diaphragm mics should be fine, but be wary of large‑diaphragm models, and if the sound character of central sources seems tonally ‘coloured’, try different mics!
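As a rough illustration of how far off‑axis a central source sits, the sketch below takes half the mutual angle and works out the level drop an ideal cardioid would show there. The ideal‑cardioid formula is an assumption, and it says nothing about the off‑axis tonal colouration of a real mic, which is the actual concern.

```python
import math

def centre_source_off_axis(mutual_angle_deg):
    """Return (off-axis angle in degrees, level in dB re on-axis) for a centre source.

    Assumes an ideal cardioid: gain = 0.5 * (1 + cos(theta)).
    """
    off_axis = mutual_angle_deg / 2.0
    gain = 0.5 * (1.0 + math.cos(math.radians(off_axis)))
    return off_axis, 20.0 * math.log10(gain)

# Compare a 90-degree X-Y pair, ORTF (110 degrees) and the 120-degree array.
for name, mutual in (("X-Y", 90), ("ORTF", 110), ("Gerzon", 120)):
    angle, loss = centre_source_off_axis(mutual)
    print(name, angle, "deg off-axis,", round(loss, 1), "dB")
```

With a 120‑degree mutual angle the ideal‑cardioid level at the centre is about 2.5dB below on‑axis, which is small in itself; the practical issue is whether the mic’s frequency response stays consistent at that angle.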
While this Gerzon array has some very intriguing and important technical strengths, it inevitably has less perceived spaciousness than the ORTF array, given the small capsule spacing. However, some simple processing delivers an even better sense of spaciousness, without any sonic degradation. This is achieved using another Blumlein invention called Shuffling, which widens the stereo image below about 600Hz, in effect by converting small inter‑channel phase differences at the mics into inter‑channel amplitude differences for the speakers. Although similar processing could be performed on an ORTF recording, that array’s wider capsule spacing can introduce audible comb‑filtering effects, as explained in the ‘Stereo Widening’ box.
The Shuffling is achieved by boosting only the Sides signal by about 8dB below 600Hz (the shelf corner frequency and amount of boost can be experimented with for best results). It is trivially simple to achieve in a DAW using either a pair of M‑S conversion plug‑ins placed either side of a shelf filter whose channels can be controlled independently, or a stereo EQ with an M‑S mode.
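For anyone who would rather experiment outside a DAW, the processing chain can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the low shelf uses the widely published ‘Audio EQ Cookbook’ biquad form with a shelf slope of 1, in which the nominal corner frequency is the midpoint of the shelf, and the function names, 48kHz sample rate and M‑S scaling are my own choices rather than anything Blumlein or Gerzon specified.

```python
import math

def low_shelf_coeffs(fs, corner_hz, boost_db):
    """Normalised biquad coefficients for a low shelf (cookbook form, slope 1)."""
    a_lin = 10.0 ** (boost_db / 40.0)
    w0 = 2.0 * math.pi * corner_hz / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2.0 * math.sqrt(2.0)
    k = 2.0 * math.sqrt(a_lin) * alpha
    b0 = a_lin * ((a_lin + 1) - (a_lin - 1) * cos_w0 + k)
    b1 = 2 * a_lin * ((a_lin - 1) - (a_lin + 1) * cos_w0)
    b2 = a_lin * ((a_lin + 1) - (a_lin - 1) * cos_w0 - k)
    a0 = (a_lin + 1) + (a_lin - 1) * cos_w0 + k
    a1 = -2 * ((a_lin - 1) + (a_lin + 1) * cos_w0)
    a2 = (a_lin + 1) + (a_lin - 1) * cos_w0 - k
    return (b0 / a0, b1 / a0, b2 / a0), (a1 / a0, a2 / a0)

def biquad(samples, b, a):
    """Run a direct-form I biquad over a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def shuffle(left, right, fs=48000, corner_hz=600.0, boost_db=8.0):
    """Boost only the Sides signal below the shelf corner, then return to L-R."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    b, a = low_shelf_coeffs(fs, corner_hz, boost_db)
    side = biquad(side, b, a)          # the Mid signal is left untouched
    new_left = [m + s for m, s in zip(mid, side)]
    new_right = [m - s for m, s in zip(mid, side)]
    return new_left, new_right
```

Calling shuffle(left, right) on two equal‑length lists of samples returns the widened pair. A purely central (mono) signal has no Sides content, so it passes through unchanged, which is why the processing does not upset the centre image.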
Interestingly, the shelf filter required for this bass boost typically introduces about 25 degrees of phase lag at 600Hz, where our sense of hearing is quite sensitive to phase shifts — but at 600Hz the small capsule spacing of the Gerzon array just so happens to introduce a similar amount of phase lead in the Sides signal (relative to the Mid), almost perfectly counteracting that lag. It’s a very happy twist of fate which results in both improved stereo imaging and reduced ‘phasiness’.
So, the next time you find yourself needing to record in stereo with a pair of cardioids, why not try the Gerzon array? It’s an easy array to rig: simply overlap the mic capsules by just 5cm with a mutual angle of 120 degrees. You may find you prefer the sound of this array as it is, since it is not as intrinsically ‘phasey’ as ORTF, and mono compatibility is far better. But, should you hanker for a more ORTF‑like sense of spaciousness, try a little Shuffling on the recorded file in your DAW afterwards, as described above.
From a practical perspective, I find the more compact form of the Gerzon array helpful when recording public concerts, and the ability to dial in greater spaciousness in post‑production without degrading the array’s other properties is extremely useful. But the biggest attractions for me are the more focused and stable stereo imaging, the less phasey character, and a far, far better portrayal of front‑back depth and of the sense of space of the recording room.
Stereo widening works in the Mid‑Sides domain by increasing the amplitude of the Sides signal relative to the Mid. With Left‑Right stereo source signals that differ only in their inter‑channel amplitudes, it works very nicely. But with a spaced array, the distance between the two mics inherently generates inter‑channel phase shifts. Converting the L‑R signals into M‑S involves addition and subtraction, so with a spaced array the M‑S signals inevitably suffer strong comb‑filtering effects.
For example, imagine a sound source at the extreme left of an ORTF array. At low frequencies, the phase difference between the two mics will be very small, but that phase difference builds with rising frequency. At 1kHz, sound has a wavelength of about 34cm, so given the ORTF’s capsule spacing of 17cm, the sound arriving at the right mic will be half a wavelength later than that at the left mic, and thus in opposite polarity. The same condition occurs at 3kHz, 5kHz, 7kHz and so on, of course. Conversely, at 2, 4, 6 kHz and so on the waves will be in phase, so if the amplitude in both channels was the same, summing them to generate the Mid would result in deep cancellation notches at 1, 3 and 5 kHz, with peaks at 2, 4 and 6 kHz. In the Sides channel, the nulls and peaks would swap over. If the level of the Sides signal were then raised to increase the stereo width, some narrow frequency bands would be moved outwards, but others would not, so the stereo imaging would actually be degraded, rather than enhanced.
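The arithmetic behind those notch and peak frequencies is easy to check. The sketch below assumes a speed of sound of 340m/s (the rounded figure implied by a 34cm wavelength at 1kHz) and a source at the extreme side, so that the path difference equals the full capsule spacing; the function name is illustrative.

```python
SPEED_OF_SOUND = 340.0  # metres per second (rounded, assumed value)

def comb_frequencies(spacing_m, count=3, c=SPEED_OF_SOUND):
    """Return (notches, peaks) in Hz for the Mid signal of a spaced pair.

    Assumes the worst case, where the path difference equals the capsule
    spacing and both channels carry the signal at equal level.
    """
    delay = spacing_m / c                                    # inter-channel delay, s
    notches = [(2 * k + 1) / (2.0 * delay) for k in range(count)]  # odd half-waves
    peaks = [(k + 1) / delay for k in range(count)]          # whole wavelengths
    return notches, peaks

ortf_notches, ortf_peaks = comb_frequencies(0.17)
print([round(f) for f in ortf_notches])   # Mid notches for a 17cm spacing
print([round(f) for f in ortf_peaks])     # Mid peaks for a 17cm spacing

gerzon_notches, _ = comb_frequencies(0.05)
print(round(gerzon_notches[0]))           # first Mid notch for a 5cm spacing
```

Under the same assumptions, a 5cm spacing pushes the first Mid notch up to about 3.4kHz, which is one way of seeing why the more closely spaced array is kinder to mono compatibility.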
In mitigation, the fact that the ORTF array uses angled cardioid polar patterns means that for a source at the extreme left, the signal level in the right channel would be around 20dB lower than that in the left channel. That significant difference in amplitude substantially reduces the depths of the cancellation nulls and peaks in the combined signals, and so reduces the scale of the stereo image degradation. Nevertheless, broadband stereo widening with spaced‑mic arrays often does more harm than good.
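Both the level difference and its effect on the comb filtering can be estimated with a short calculation. This sketch assumes ideal cardioids, a source at 90 degrees to the left of the centre line, and ORTF’s 110‑degree mutual angle; the ripple figures are expressed relative to the louder channel on its own. The helper names are illustrative.

```python
import math

def cardioid_gain(off_axis_deg):
    """Sensitivity of an ideal cardioid at a given off-axis angle (assumption)."""
    return 0.5 * (1.0 + math.cos(math.radians(off_axis_deg)))

def channel_difference_db(source_deg, mutual_angle_deg):
    """Level of the nearer channel over the farther one, in dB."""
    half = mutual_angle_deg / 2.0
    near = cardioid_gain(source_deg - half)
    far = cardioid_gain(source_deg + half)
    return 20.0 * math.log10(near / far)

def sum_ripple_db(difference_db):
    """Peak and null of the summed signal, in dB relative to the louder channel.

    difference_db must be positive, i.e. the channels are not equal in level.
    """
    quieter = 10.0 ** (-difference_db / 20.0)   # quieter channel, linear
    peak = 20.0 * math.log10(1.0 + quieter)     # channels in phase
    null = 20.0 * math.log10(1.0 - quieter)     # channels in opposite polarity
    return peak, null

diff = channel_difference_db(90, 110)
peak, null = sum_ripple_db(20.0)
print(round(diff, 1), round(peak, 2), round(null, 2))
```

With a 20dB level difference, the summed signal ripples by less than 1dB either way, instead of swinging between a 6dB peak and a complete null as it would with equal levels.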
Blumlein’s Shuffling technique, in which the stereo widening is only applied at low frequencies, largely avoids the problem, as it restricts the processing to frequencies where the inter‑channel phase shifts are much less than 180 degrees. In the case of the ORTF array, experimentation suggests it is best to restrict any spatial equalisation to below 320Hz.
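To see how the inter‑channel phase shift grows with frequency, the worst‑case figure can be computed directly from the capsule spacing. As before, this is a sketch that assumes a speed of sound of 340m/s and a source at the extreme side of the array; the function name is illustrative.

```python
SPEED_OF_SOUND = 340.0  # metres per second (rounded, assumed value)

def worst_case_phase_deg(freq_hz, spacing_m, c=SPEED_OF_SOUND):
    """Inter-channel phase shift in degrees when the path difference equals the spacing."""
    return 360.0 * freq_hz * spacing_m / c

# ORTF (17cm) at its suggested 320Hz limit, and at the 600Hz Shuffling corner.
print(round(worst_case_phase_deg(320, 0.17), 1))
print(round(worst_case_phase_deg(600, 0.17), 1))
# The 5cm array at the 600Hz Shuffling corner.
print(round(worst_case_phase_deg(600, 0.05), 1))
```

On these assumptions, ORTF reaches roughly 58 degrees at 320Hz and 108 degrees at 600Hz, whereas the 5cm spacing gives only about 32 degrees at 600Hz.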
The relationship between an ensemble of sound sources ranged in front of a stereo mic array, and their perceived positions when heard over stereo loudspeakers, is not a fixed or linear one. All microphone arrays introduce some degree of angular distortion, where the angles between the actual sources and the mics are effectively misreported in the virtual sound images reproduced by the loudspeakers. It’s broadly equivalent to the effect of watching an old 4:3 television programme stretched out to fill a modern 16:9 TV screen.
However, in the case of stereo mic arrays, angular distortion usually causes sounds near the centre to be reproduced wider than they should be, and sounds nearer the edges tend to bunch together. Different combinations of capsule spacings, mutual angles, and polar patterns produce different amounts of angular distortion. The precise amount can be calculated for a given array, but it typically varies from 2 to 7 degrees, with most being around 4‑6 degrees. More information on this, and many other technical aspects of stereo mic arrays, can be found in Michael Williams’ book, Microphone Arrays for Stereo and Multichannel Sound Recording.