You might expect the job of a forensic audio analyst to be similar to that of an audio engineer, but it’s a lot more complicated than that!
Nearly all of us have mobile phones that can be used as recording devices, and this has led to a proliferation of audio recordings being used as evidence in the courtroom. In the overwhelming majority of these cases, the recordings will have been made in less than optimal conditions, on low-quality devices, by people with little or no knowledge of the recording process. Whereas studio recordings are carefully planned and made in a purpose-designed environment, forensic recordings are often made on the spur of the moment, in hostile conditions, and the audio quality reflects that. However, just as these recordings are typically miles apart technically from studio recordings, so too is the impact they can have on somebody’s life.
As audio recordings become increasingly important in legal cases, so too does the role of the forensic audio examiner. This is a job with several facets, which can include advising others on improving audio capture, analysing recordings and evaluating their authenticity, and applying processing in order to make them more intelligible. The final destination for these recordings is the courtroom, so the processes between the capture and playback are all geared towards ensuring the recording makes it there and reaches its potential. This means following strict guidelines to ensure the chain of custody is maintained.
The differences between forensic and studio recordings are many, and arise for a number of reasons, including the capture device, the environment and the people making the recordings.
The device involved is typically either a portable audio recorder or, most commonly, a mobile phone, used to record either events in the surrounding environment or a telephone conversation. Audio recording is not the core function of a mobile phone, and because the design goal is usually to achieve acceptable quality for telephone transmission, the bandwidth is often limited to around 4kHz. As a result, audio recordings made on a phone are often of very low quality by other standards.
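To hear roughly what a narrowband call discards, a recording can be crudely band-limited to the conventional narrowband telephone pass band of about 300Hz to 3.4kHz (narrowband telephony samples at 8kHz, so nothing above the 4kHz Nyquist limit can be carried at all). The sketch below assumes NumPy and uses a brick-wall FFT mask; the function name `telephone_band` is hypothetical, and this is an illustration rather than a model of any particular codec.

```python
import numpy as np

def telephone_band(x, fs, low_hz=300.0, high_hz=3400.0):
    """Crudely simulate a narrowband phone channel by zeroing FFT bins outside [low_hz, high_hz]."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0  # brick-wall mask, for illustration only
    return np.fft.irfft(spectrum, n=len(x))

fs = 16000
t = np.arange(fs) / fs                      # one second of audio
speech_band = np.sin(2 * np.pi * 1000 * t)  # inside the phone band: kept
high_content = np.sin(2 * np.pi * 5000 * t) # stand-in for high-frequency speech energy: discarded
phone = telephone_band(speech_band + high_content, fs)
print(np.allclose(phone, speech_band, atol=1e-9))  # True
```

A brick-wall mask is chosen only because it makes the loss easy to see; real channels roll off gradually and add coding artifacts on top.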
The microphones and analogue-to-digital converters in these devices are often chosen because they are the cheapest components that are sufficient for the task they are required to do. As a result, they often introduce audible distortion and frequently have a high inherent noise level. Considered as audio recording devices, then, phones tend to have a very poor signal-to-noise ratio even before external environmental noise is taken into account. If the recording is of a telephone call, the people talking are usually (but not always) close to the microphone, which helps to keep the levels reasonable; but other issues can then arise, such as artifacts from the low-bit-rate audio coding used for transmission.
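The signal-to-noise ratio mentioned above is usually quoted in decibels, as 10 log10 of the ratio of signal power to noise power. A minimal sketch for estimating it from a recording, assuming there is a section containing only the device's noise and that noise and speech are uncorrelated (so their powers add); the function names are hypothetical.

```python
import math

def mean_power(samples):
    """Mean of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)

def estimate_snr_db(noisy_speech, noise_only):
    """Estimate SNR in dB from a speech-plus-noise section and a noise-only section.

    Assumes the noise is steady and uncorrelated with the speech, so the
    powers add and the speech power is the difference.
    """
    p_noise = mean_power(noise_only)
    p_speech = max(mean_power(noisy_speech) - p_noise, 1e-12)  # guard against <= 0
    return 10 * math.log10(p_speech / p_noise)
```

For example, speech at a mean power of 1.0 over a noise floor at 0.01 gives 20dB; a phone whose own noise floor is high starts with a low figure before any environmental noise is added.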
Mobile phone microphones are generally monaural, and even if the phone contains two mics, they can be treated as mono for all intents and purposes: the distance between them is negligible, so any inter-channel level or time differences have no real effect on the recording. This makes it incredibly difficult to make azimuth or elevation judgements from the recording in an attempt to determine where the speakers were in relation to the microphone.
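A quick way to see why closely spaced mics behave like one is to compute the largest time difference a sound can produce between them: the spacing divided by the speed of sound. The sketch assumes a speed of sound of 343m/s and a 48kHz sample rate; both are illustrative values, and the 1cm spacing is a hypothetical example rather than a measured phone layout.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C

def max_delay_seconds(spacing_m, c=SPEED_OF_SOUND_M_S):
    """Largest possible arrival-time difference between two mics (source on their axis)."""
    return spacing_m / c

def max_delay_samples(spacing_m, fs, c=SPEED_OF_SOUND_M_S):
    """The same difference expressed in samples at sample rate fs."""
    return max_delay_seconds(spacing_m, c) * fs

for spacing in (0.01, 0.20):
    print(f"{spacing * 100:.0f}cm: {max_delay_seconds(spacing) * 1e6:.0f} us, "
          f"{max_delay_samples(spacing, 48000):.1f} samples at 48kHz")
```

With these assumptions, 1cm gives about 29 microseconds (roughly 1.4 samples), while 20cm gives about 0.58ms (roughly 28 samples), which leaves far more usable timing information.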
In terms of the environment, it is glaringly obvious that forensic recordings are not made in conditions designed to minimise the effects of the environment. More often than not they are made in conditions that may include air-conditioning systems, other voices, vehicle noise, hum, background music, hard reverberant surfaces, clothing rubbing against the microphone and endless other factors. You name it, it’s in there. They say prevention is better than cure, but unless the recording was premeditated there is no prevention, and the only cure available is whatever processing the forensic audio examiner can apply.
Finally, the individuals operating the recording equipment are often untrained, with little to no knowledge of the audio capture process or understanding of what can and can’t be done to improve poor recordings after the fact.
When all of the above factors are combined, it is clear that there is a lot of potential for the creation of absolutely terrible recordings. If every factor is at its worst, and we’re dealing with a low-quality recording device operated by an individual with no knowledge of audio in a stone-walled room occupied by multiple speakers, with air-conditioning noise and music overlaid, the prognosis is not good. All that can be done is to approach each recording independently, and seek to do our best regardless of the individual, device and environment from which the recording came. If you’re a studio engineer complaining about excessive fret noise from a guitar, spare a little thought for those of us who are bombarded with terrible recording after terrible recording, just hoping for the day when the biggest problem is a 50Hz hum or some birds chirping in the background!
Despite the false expectations created by television shows such as CSI, forensic audio examiners are subject to the laws of physics, and this means there are limits to what is possible. It is of the utmost importance that these limitations are understood not only by forensic audio analysts, but also by judges, juries, clients and other lay people, and that the message is reinforced that much of what is presented in television shows and movies is not possible in the real world.
For example, consider two recordings of the same speech, one with music in the background and the other with the continuous rumble of an engine. The music will not only contain frequencies that occupy the same band as speech, especially if there is a vocal part, but will also be highly variable, making it practically impossible for an algorithm to follow and remove it. In this case, the limit on how well the recording can be enhanced comes down to the nature of the problem and the lack of an algorithm sophisticated enough to remove it. In the second recording, by contrast, the rumble will most likely not occupy the same frequency region as speech, so a simple high-pass filter may be able to attenuate the engine noise while leaving the speech intact. The nature of the noise also makes it a perfect candidate for a noise-removal algorithm: because the engine sound is continuous and repetitive, the algorithm can learn a noise profile from a section containing only engine noise and apply it to clean up the sections that contain speech.
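The noise-profile approach can be sketched as basic magnitude spectral subtraction, one well-known (and fairly crude) form of profile-based noise reduction; commercial tools use more refined methods, so treat this as an illustration only. It assumes NumPy, 512-sample frames with 50 percent overlap, and uses white noise as a stand-in for engine rumble; all names are hypothetical.

```python
import numpy as np

FRAME = 512
HOP = FRAME // 2
# Periodic Hann window: at 50% overlap the windows sum to exactly 1, so
# overlap-adding unmodified frames reconstructs the input.
WINDOW = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(FRAME) / FRAME)

def frames(x):
    """Windowed, 50%-overlapping frames of x."""
    return np.array([x[i:i + FRAME] * WINDOW
                     for i in range(0, len(x) - FRAME + 1, HOP)])

def noise_profile(noise_only):
    """Average magnitude spectrum of a section containing only the steady noise."""
    return np.abs(np.fft.rfft(frames(noise_only), axis=1)).mean(axis=0)

def spectral_subtract(x, profile, over_subtraction=1.0):
    """Subtract the profile from each frame's magnitude, keep the phase, overlap-add."""
    out = np.zeros(len(x))
    for n, frame in enumerate(frames(x)):
        spectrum = np.fft.rfft(frame)
        magnitude = np.maximum(np.abs(spectrum) - over_subtraction * profile, 0.0)
        cleaned = magnitude * np.exp(1j * np.angle(spectrum))
        start = n * HOP
        out[start:start + FRAME] += np.fft.irfft(cleaned, n=FRAME)
    return out

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(2 * fs) / fs
engine_only = rng.normal(0.0, 0.1, len(t))   # stand-in for a noise-only section
profile = noise_profile(engine_only)
tone = np.sin(2 * np.pi * 500 * t)           # stand-in for a speech component
cleaned = spectral_subtract(tone + rng.normal(0.0, 0.1, len(t)), profile)
```

Raising `over_subtraction` above 1 removes more noise but also more of the wanted signal and adds more of the "musical" artifacts spectral subtraction is known for, which is the kind of trade-off the examiner has to judge by ear.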
Frequently, alas, it is not possible to extract anything of useful evidential value from a recording, most often because the wanted information simply isn’t present in the recording. Deleted files on a computer can often be recovered because they were, at some point, present on the system; but if the device used to make the recording hasn’t captured the desired element of a signal, whether because of distance from the source, bandwidth limitations, perceptual encoding or other reasons, there is nothing to recover. Likewise, if an event has been recorded but is completely inaudible due to phenomena such as masking or reverberation, the chances of recovery are very small. Although there are techniques that can deconvolve a signal into separate elements such as source and reverberation, these will not be effective if the desired signal is below the noise floor. Some recordings are simply impossible to enhance, and the practised audio examiner should recognise this before attempting an enhancement that will only leave the client disappointed.
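In a real device an anti-aliasing filter normally removes content above the Nyquist limit before sampling, so it is simply absent. The sketch below shows the underlying reason no later processing can bring such content back: once sampled at 8kHz, a 5kHz cosine and a 3kHz cosine produce identical sample values, so the samples alone cannot say which was present. The frequencies are illustrative.

```python
import math

def sample_cosine(freq_hz, fs, count):
    """Sample values of a unit cosine at freq_hz, sampled at fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(count)]

fs = 8000
above_nyquist = sample_cosine(5000, fs, 64)  # 5kHz: above the 4kHz Nyquist limit
alias = sample_cosine(3000, fs, 64)          # 3kHz = 8kHz - 5kHz
same = all(abs(a - b) < 1e-9 for a, b in zip(above_nyquist, alias))
print(same)  # True
```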
Where enhancement is possible, this will usually involve tools that will be familiar to audio engineers in other contexts, particularly noise reduction and ‘spectral repair’. Techniques for using these tools have been covered in SOS before, and are usually fully explained in the documentation for relevant products. However, it’s important to note one key difference between the way these tools are used in forensic audio and in conventional recording and mastering contexts: there is a distinction between quality and intelligibility, and the forensic examiner’s primary responsibility is to maintain intelligibility. Intelligibility refers to the number of words that can reliably be transcribed from a recording, whereas quality is a subjective measure of the pleasantness of a recording. Sometimes it is possible to improve both factors, and sometimes improving quality naturally leads to an improvement in intelligibility, but this is not always the case, and this is a crucial point to remember when performing enhancements and when providing clients with the enhanced recording. A prime example is that whereas in a mastering or restoration context it might be desirable to attenuate high frequencies to remove hiss and other noise, the forensic audio examiner will often leave them alone or even boost them, as they contain voice information critical to intelligibility.
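As a sketch of leaving the high frequencies alone or even boosting them, the filter below adds a scaled copy of the high-frequency content back to the signal while leaving low frequencies essentially at unity gain. It is a simple shelving-style boost built from a one-pole low-pass filter; the function name and the `alpha` and `gain` values are hypothetical, not settings from any forensic tool.

```python
def high_boost(samples, alpha=0.2, gain=1.0):
    """Shelving-style high-frequency boost.

    A one-pole low-pass filter tracks the slow (low-frequency) part of the
    signal; whatever it misses is treated as high-frequency content and
    added back, scaled by `gain`. Low frequencies pass at close to unity gain.
    """
    low = 0.0
    out = []
    for x in samples:
        low += alpha * (x - low)           # one-pole low-pass
        out.append(x + gain * (x - low))   # original plus scaled high-pass residue
    return out

steady = high_boost([1.0] * 1000)        # DC: output settles back to 1.0
fizz = high_boost([1.0, -1.0] * 500)     # Nyquist: boosted to about 1.89
```

With these values the boost at the Nyquist frequency is about +5.5dB, while content near DC is left untouched, so hiss comes up along with consonant detail; whether that trade is worthwhile is judged on intelligibility rather than pleasantness.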
Although it’s good practice for studio engineers to store settings for the processors they apply, especially with outboard gear, this documentation will not affect the final outcome. Failure to take note of the ratio setting on your compressor won’t stop a song from being played on the radio, and won’t affect the judgements of audiences and critics. For forensic analysts, by contrast, adequate note-taking is not only good practice, it is compulsory. The documentation is as important as, if not more important than, the actual analysis itself.
It all relates to the ‘chain of custody’, so called because it is analogous to a chain of multiple links, any of which has the potential to be broken. The first link of the chain is put in place when the evidence is seized, with appropriate documentation stating who, what, where, how and when. The next step is to create a working copy for the forensic examiner to use while the original is stored, never to be touched again. This ensures that if questions are raised later on, especially with regard to the recording’s authenticity, it is possible to access the original evidence and provide the associated documentation to prove that no edits have been made since it was seized.
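The text does not prescribe how to prove that no edits have been made; one widely used approach is to record a cryptographic hash of the original file at seizure and check that every copy matches it. A minimal sketch with Python's standard `hashlib`, assuming SHA-256; `make_working_copy` is a hypothetical helper.

```python
import hashlib
import shutil

def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large recordings need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def make_working_copy(original_path, working_path):
    """Copy the original evidence and confirm the copy is bit-for-bit identical.

    Returns the original's hash so it can be written into the custody documentation.
    """
    original_hash = sha256_of(original_path)
    shutil.copy2(original_path, working_path)
    if sha256_of(working_path) != original_hash:
        raise RuntimeError("working copy does not match the original")
    return original_hash
```

Re-hashing the stored original later and comparing the result with the recorded value shows whether a single byte has changed since seizure.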
But what about enhancement? Isn’t that editing the recording? Yes and no. On one hand, we are changing the recording as it will no longer be in its original form, and similar processes could equally well be used for nefarious purposes. However, if we were aiming to manipulate evidence in this way, we would be looking to cover our tracks and so certainly wouldn’t be documenting any of the work performed. In many ways, then, it’s reasonable to say that the difference between inappropriate manipulation and forensic enhancement lies in the way we document what we do. Documenting the exact changes made to the file, and taking screenshots of every setting used, are essential in keeping our section of the chain intact.
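Alongside screenshots, a machine-readable log can record each processing step. The sketch below is one possible approach, not a format required by any standard: each entry stores the step number, tool, settings and the SHA-256 hashes of the audio before and after the step, appended to a JSON Lines file. `log_step` and the tool names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

def log_step(log_path, step, tool, settings, input_bytes, output_bytes):
    """Append one processing step to a JSON Lines log.

    Hashing the audio before and after each step ties every logged setting
    to the exact data it was applied to.
    """
    entry = {
        "step": step,
        "tool": tool,
        "settings": settings,
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "output_sha256": hashlib.sha256(output_bytes).hexdigest(),
        "utc_time": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry
```

For example, `log_step("case_log.jsonl", 1, "high-pass filter", {"cutoff_hz": 120}, raw_bytes, filtered_bytes)` records one step; because each step's output hash should equal the next step's input hash, a gap or undocumented edit shows up as a broken link.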
Once the work is complete, it must be provided in the form of a forensic report to the client. Writing these reports can often take longer than the work itself, due to the need for accuracy. We must never under- or overstate the facts, and we need to document every aspect of the work from the client’s instructions to the bibliography. Everything reported must have a foundation in provable facts, and anything that is opinion must be stated as such.
It is very likely that ISO 17025 will become mandatory for all digital forensic laboratories in the near future, and forensic audio documentation will need to comply with this standard, which demonstrates that a laboratory is competent to carry out testing and calibration. This means, among other things:
- Standardising elements of forensic reports.
- Following Standard Operating Procedures (or SOPs).
- Validating tools to ensure they perform as expected.
- Verifying tools to ensure they continue to perform as expected.
- Having a Quality Assurance programme in place to ensure work is reviewed before delivery.
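Parts of validation and verification can be automated. As a hypothetical sketch, the code below feeds a tool a known reference tone and checks that the gain it applies matches the requested value within a tolerance; re-running such a check on a schedule is one way to show that a tool continues to perform as expected. The 1kHz tone, the 0.1dB tolerance and the `apply_gain` tool are assumptions for illustration, not requirements taken from ISO 17025.

```python
import math

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def reference_tone(freq_hz=1000.0, fs=48000, seconds=1.0, amplitude=0.5):
    """Known test signal: a sine wave of fixed frequency and level."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / fs)
            for n in range(int(fs * seconds))]

def verify_gain_tool(tool, gain_db, tolerance_db=0.1):
    """Run `tool(samples, gain_db)` on a reference tone and check the measured gain.

    Returns (passed, measured_gain_db) so the result can be recorded either way.
    """
    tone = reference_tone()
    measured_db = 20 * math.log10(rms(tool(tone, gain_db)) / rms(tone))
    return abs(measured_db - gain_db) <= tolerance_db, measured_db

def apply_gain(samples, gain_db):
    """Hypothetical tool under test: a plain gain stage."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]
```

Returning the measured value as well as pass/fail means the verification record shows how close the tool came, not just whether it passed.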
It is this standard that the government is relying on not only to raise the quality of forensic science, but also to make it incredibly difficult for individuals who lack the correct education, experience and skill set to work in the field. When applied to a field such as audio forensics, in which there is very little education available and it is difficult to gain experience because demand for the service is limited, this could have a huge and long-lasting impact. The only solution is more education within the field, but with so few individuals practising forensic audio, this may prove difficult. If you are interested in moving into audio forensics, it may be advisable to find a training programme specific to the area.
From afar, it may seem that audio forensics and audio engineering use similar skills, but close up, that view turns out to be somewhat inaccurate. The tools are similar and recognisable, but in many respects you’re presented with the polar opposite of what an audio engineer encounters day to day: audio recorded not in specialised studio rooms but in poor-quality street environments, and not by skilled engineers but by untrained individuals with an iPhone. Audio engineers can go with whatever feels good, without worrying about documenting their processes, but forensic analysts may only apply techniques that can be referenced and proven to be accepted, and have to write down everything. And with the implementation of ISO 17025 making the transition into audio forensics considerably more difficult, anyone who wants to get into the field in the future could find themselves jumping through a whole lot more hoops.
Many people are now familiar with the ‘CSI effect’ and its impact on the world of forensic audio. The hit US TV series has created unrealistic expectations for what is possible within all forensic disciplines. CSI and shows like it will perform miracles, showcase poor science and solve the crime, all within an hour.
It is the first two points that have the biggest impact on audio forensics. Actors are shown taking a recording on which there is literally no audible speech and turning it into a crystal-clear telephone-quality conversation, all at the push of a button. On too many occasions to count I have been given audio in which I can hear no speech whatsoever, but the client insists they can, and becomes frustrated when I tell them I cannot enhance what isn’t there.
I have also been given recordings in which the gain of the recording device was set very high and the targeted speaker was three rooms away, leaving a recording that is just white noise with the faintest speech. In some cases, it is possible, with a large amount of work, to get something from these recordings, but they are at the most extreme end of not only what should be attempted, but what is actually possible.
The poor science showcased on these shows also creates unrealistic expectations of what a forensic report can conclude. At the push of a button, two recordings are compared, and a TV computer flashes “100 percent match confirmed” or something of that nature. In real life, a 100 percent match is simply not scientifically plausible. From a scientific standpoint, unless you are able to compare the speaker against every one of the 7.6 billion people on the planet, you cannot say with 100 percent certainty that two recordings are of the same speaker, and that’s without taking into account the error rate. Even if only two individuals were being compared and the voice was known to belong to one of them, there would still be a non-zero risk of error. With 7.6 billion people, it’s a lot worse! I have been asked so many times whether the result of a speaker comparison will be 100 percent accurate that I am considering compiling an ‘FAQ’-type document rather than spending time repeating the same logic in every report.
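One common way to formalise why certainty is unattainable is Bayes' theorem in odds form, where the comparison result is summarised as a likelihood ratio (how much more probable the observations are if the voices come from the same speaker than if they do not). The article does not name this framework, and the likelihood ratio of 1,000 below is an invented illustration, not the output of any real system. For any finite likelihood ratio and any prior below 1, the posterior probability stays below 1, and with a prior of one in 7.6 billion even strong evidence leaves it tiny.

```python
def posterior_probability(likelihood_ratio, prior_probability):
    """Bayes' theorem in odds form: posterior odds = prior odds x likelihood ratio."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Illustrative numbers only, not the result of any real comparison.
two_suspects = posterior_probability(1000, 0.5)        # 1000/1001: still short of 1
whole_planet = posterior_probability(1000, 1 / 7.6e9)  # well under one in a million
```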
Next time a drummer is moaning that they expected their kick drum to have more bottom end, spare a thought for forensic audio examiners who are expected to deliver a crystal-clear recording from a burst of white noise!
Prevention is always better than cure, and the capture of audio should be treated as the most important part of the forensic audio chain, as it has the largest impact on the final result. Every decision after the recording has been made is based on the quality of the original recording. There can be an attitude of ‘fix it in the mix’, where the person capturing the recording expects the forensic audio examiner to be able to fix any problems later on; to counteract this dogma, training should be made available to those in the field who are on the front line creating recordings. The emphasis should be on the correct settings for the recording device, optimal microphone placement and considered location choice. Some simple recommendations include:
- Use an uncompressed format such as WAV PCM.
- Use as high a sample rate and bit depth as possible (minimum 44.1kHz, 16-bit).
- Perform test recordings prior to the event to optimise gain.
- Use an external microphone.
- If using two or more microphones, place them at least 20cm apart.
- Direct the microphone diaphragm towards the target.
- Choose a location with low ambient noise levels.
- Choose a location with as little dynamic noise as possible.
- Avoid locations with competing talkers where possible.
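The first few recommendations lend themselves to a quick automated check of a test recording. The sketch below uses Python's standard `wave` module to read a 16-bit PCM WAV file, then reports whether the sample rate meets the 44.1kHz minimum recommended above, the peak level in dBFS and the number of clipped samples. `check_test_recording` is a hypothetical helper, and it handles 16-bit files only.

```python
import array
import math
import sys
import wave

def check_test_recording(source):
    """Report format and peak level of a 16-bit PCM WAV test recording.

    `source` may be a file path or a binary file object.
    """
    with wave.open(source, "rb") as w:
        rate, width = w.getframerate(), w.getsampwidth()
        frames = w.readframes(w.getnframes())
    if width != 2:
        raise ValueError("this sketch only measures 16-bit PCM")
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    return {
        "sample_rate_ok": rate >= 44100,
        "peak_dbfs": 20 * math.log10(peak / 32768) if peak else float("-inf"),
        "clipped_samples": sum(1 for s in samples if s in (32767, -32768)),
    }
```

Recording a few seconds at the planned settings and confirming there are no clipped samples, with the peak below 0dBFS, is one way to act on the advice to optimise gain before the event.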
Following this advice gives the forensic audio enhancement process more to work with, which in turn leads to better results and, ultimately, more usable forensic audio evidence in the courtroom.