
Q. How do audio and video stay in sync?

Video and audio sync.

I'm having some problems understanding exactly how audio-to-video synchronisation works. I know that the clock generates a pulse — it gives off a voltage at a certain frequency. But how does this keep the sound and the pictures in time?

SOS Forum post

Technical Editor Hugh Robjohns replies: The pulses generated by the clock generator are just that — pulses — but at a very precisely controlled rate. These are intended only to control the rate at which pictures or audio samples are taken, or the speed at which an analogue tape is dragged past the recording heads.

The missing link that provides the positional information is the timecode generator, and all this does is count the pulses from some arbitrarily agreed starting point, and keep track of that count as timecode within the recording medium somewhere.

When replaying, the replay clock generator provides more pulses to determine the replay speed of the medium, and the timecode numbers identify which bit of sound is supposed to align to which pictures.

Let's consider the example of pictures on a video tape, and sound on a separate digital recorder of some sort, imagining that you want to replay both together in perfect sync.

The 'rules' for this kind of material are that there must be 25 picture frames every second (sticking with European standards here for simplicity; the US ones are different) and 48,000 audio samples every second (the standard 48kHz sample rate), which works out at exactly 1920 audio samples per picture frame. In addition, the first sample of each block of 1920 must align precisely with the start of each picture frame. This last point is to facilitate editing, so that when cutting on a picture boundary you don't end up cutting halfway through an audio sample!
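The arithmetic behind these rules is simple enough to sketch. The short Python example below (a minimal illustration; the function names are invented for it) checks that 48kHz divides exactly by 25 frames per second, and shows which sample a picture cut lands on.

```python
# Frame/sample arithmetic for 25 fps pictures and 48 kHz audio (European rates).
FRAME_RATE = 25          # picture frames per second
SAMPLE_RATE = 48_000     # audio samples per second

def samples_per_frame(sample_rate: int, frame_rate: int) -> int:
    """Return the whole number of audio samples in one picture frame.

    Raises ValueError if the rates do not divide exactly, because then
    frame boundaries would not fall on sample boundaries.
    """
    if sample_rate % frame_rate != 0:
        raise ValueError("sample rate is not a whole multiple of the frame rate")
    return sample_rate // frame_rate

def first_sample_of_frame(frame: int) -> int:
    """Index of the audio sample that starts picture frame `frame` (counting from 0)."""
    return frame * samples_per_frame(SAMPLE_RATE, FRAME_RATE)

print(samples_per_frame(SAMPLE_RATE, FRAME_RATE))  # 1920
print(first_sample_of_frame(250))  # a cut at frame 250 (10 s in) lands on sample 480000
```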

To maintain these rules when recording, both the video camera and the digital sound recorder have to be running at the same very precisely controlled rate, and this is provided by something called a 'sync pulse generator' or SPG. It provides a continuous series of very precise pulses at the rate(s) required by the equipment to ensure that they capture 25 picture frames and 48,000 audio samples (1920 per frame) every second.
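One way to picture how a single SPG can serve both machines is to derive both rates from one master oscillator by integer division. The sketch below assumes a hypothetical 12.288MHz master clock, chosen only because it divides exactly into both 25Hz and 48kHz; a real SPG's internal design may well differ. The point it illustrates is that, because both rates come from the same source, the ratio between them cannot drift.

```python
# Hypothetical master oscillator frequency, for illustration only:
# 12.288 MHz = 256 x 48 kHz, and it also divides exactly by 25.
MASTER_HZ = 12_288_000

def divider_for(target_hz: int, master_hz: int = MASTER_HZ) -> int:
    """Integer divide ratio that turns the master clock into `target_hz`."""
    if master_hz % target_hz != 0:
        raise ValueError(f"{target_hz} Hz cannot be derived by integer division")
    return master_hz // target_hz

frame_div = divider_for(25)       # 491520 master cycles per picture frame
sample_div = divider_for(48_000)  # 256 master cycles per audio sample

# Both rates share one master, so the samples-per-frame ratio is fixed:
print(frame_div // sample_div)  # 1920
```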

In a studio setting, there will be one SPG which originates all the required timing pulses, which are then distributed as required. On location this is a little impractical, as it is often desirable to have camera and sound moving independently of each other, so sync cables connecting the two back to a central SPG are not a great idea. In this case, the usual solution is to equip the camera and the sound recorder with their own internal SPG systems, and to synchronise these to each other at regular intervals. Crystal oscillators are extremely stable these days, and once matched to each other they drift apart only very slowly indeed. Provided they are re-synced every couple of hours (in practice, usually whenever batteries or tapes are changed) they will remain close enough in timing to stay locked together.
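To put a rough number on 'very slowly indeed', the sketch below works out how far two free-running clocks separate over the two hours between re-syncs. The 1 part-per-million mismatch is an assumed figure for illustration, not a specification of any particular camera or recorder.

```python
FRAME_RATE = 25  # frames per second

def drift_after(hours: float, ppm_mismatch: float) -> tuple[float, float]:
    """Return (drift in milliseconds, drift in frames) after `hours` of
    free-running, for two clocks whose rates differ by `ppm_mismatch`
    parts per million."""
    elapsed_s = hours * 3600
    drift_s = elapsed_s * ppm_mismatch * 1e-6
    return drift_s * 1000, drift_s * FRAME_RATE

# Assumed 1 ppm mismatch, over the two hours between re-syncs:
ms, frames = drift_after(2, 1.0)
print(f"{ms:.1f} ms = {frames:.2f} frames")  # 7.2 ms = 0.18 frames
```

Under that assumption the error stays well under one 40ms frame, which is why periodic re-syncing is sufficient.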

So that takes care of the rate at which pictures and sound are captured. The next step is to make sure that, if they are separated, the pictures can subsequently be linked to the correct sound. This is achieved by using timecode. Timecode is simply a series of numbers (generally in the time format of hours, minutes, seconds and frames, but it could just as easily be a continuous stream of consecutive numbers). The timecode generator simply counts the SPG pulses in order to assign each picture frame (or each block of 1920 audio samples) a unique number by which it can later be identified.
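Since timecode is just a frame count written in hours, minutes, seconds and frames, converting between the two forms is straightforward at 25fps. The sketch below is illustrative only; it ignores the 24-hour wrap that real timecode generators apply.

```python
FPS = 25  # European frame rate

def frames_to_timecode(count: int, fps: int = FPS) -> str:
    """Convert a running frame count into HH:MM:SS:FF timecode."""
    frames = count % fps
    total_seconds = count // fps
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"

def timecode_to_frames(tc: str, fps: int = FPS) -> int:
    """Convert HH:MM:SS:FF timecode back into a running frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame field {frames} out of range for {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames

print(frames_to_timecode(90_061))         # 01:00:02:11
print(timecode_to_frames("01:00:02:11"))  # 90061
```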

When the camera and sound SPGs are synchronised to run at the same rate, their associated timecode generators are also synchronised so that they start counting picture frames (or blocks of 1920 samples) from the same point. After that, the camera and sound recorder can do their own thing, safe in the knowledge that picture frames are being shot and sound samples captured at identical rates, and that each is being labelled with the same timecode number sequence.

It is worth noting that the pulses and counting have to continue whether or not the recorders are actually recording anything. In practice, and purely for convenience, the timecode numbers are usually aligned to the actual time of day (TOD) when working in this mode, hence this is often called 'TOD working'. Obviously, this practice results in discontinuous timecode on the recorded tape, as recordings are started and stopped throughout the shooting day. Most editing controllers and workstations don't like that much, so it is usual to record for at least 10 seconds before the wanted material to make editing easier (although it is not disastrous if this can't be achieved).
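The sketch below illustrates TOD working under simple assumptions: timecode is taken as the frame count since midnight at 25fps, a recorded tape is modelled as a list of per-frame timecode values, and the function names are hypothetical. It shows where the timecode jumps between takes, and checks for the 10-second run-up before the wanted material.

```python
from datetime import time

FPS = 25

def tod_frame_count(t: time, fps: int = FPS) -> int:
    """Frame count for a time of day, as a TOD timecode generator would assign it."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return seconds * fps + (t.microsecond * fps) // 1_000_000

def find_breaks(recorded: list[int]) -> list[int]:
    """Positions where the recorded timecode does not advance by exactly one frame."""
    return [i for i in range(1, len(recorded)) if recorded[i] != recorded[i - 1] + 1]

def has_preroll(take_start: int, wanted_start: int, seconds: int = 10, fps: int = FPS) -> bool:
    """True if recording began at least `seconds` before the wanted material."""
    return wanted_start - take_start >= seconds * fps

# Two 10-second takes recorded with TOD timecode, from 10:00:00:00 and 10:05:00:00.
take1_start = tod_frame_count(time(10, 0, 0))
take2_start = tod_frame_count(time(10, 5, 0))
tape = list(range(take1_start, take1_start + 250)) + list(range(take2_start, take2_start + 250))
print(find_breaks(tape))  # [250]: the timecode jumps where take two begins
```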

In post-production, when it comes to putting the pictures and sound together, the rate at which the pictures are reproduced will be controlled by the pulses from the studio's SPG. The same SPG will also provide suitable pulses to control the rate at which the digital sound system produces audio samples — so we know that pictures and sound are being replayed at exactly the same rate that they were filmed at.

Now we just need to make sure they both start at the appropriate place, and this is where the timecode comes in. A suitable start point will be identified on the video source and the corresponding timecode noted. The sound system will then search for the same timecode number within its recordings and position itself accordingly. When the picture and sound replay is started, the speed of the two sources is controlled by the SPG, and the system will 'nudge' one of them (usually the sound, but not always) backwards or forwards slightly until the timecode information aligns precisely. Once aligned, the only thing controlling the speed is the pulses from the SPG.
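The arithmetic of locating and nudging can be sketched as follows. This is a simplified illustration, not how any particular synchroniser works: it assumes positions are expressed as a timecode frame count plus a number of samples into that frame, and that the audio recording's first sample carries a known timecode. A real chase-lock system applies corrections continuously rather than in one step.

```python
FPS = 25
SAMPLES_PER_FRAME = 48_000 // FPS  # 1920 at 48 kHz

def locate_sample(start_frame: int, recording_start_frame: int) -> int:
    """Sample position in an audio recording corresponding to timecode frame
    `start_frame`, given the timecode (frame count) of its first sample."""
    if start_frame < recording_start_frame:
        raise ValueError("requested timecode is before the start of the recording")
    return (start_frame - recording_start_frame) * SAMPLES_PER_FRAME

def to_samples(frame: int, offset: int = 0) -> int:
    """Absolute sample position of `offset` samples into timecode frame `frame`."""
    return frame * SAMPLES_PER_FRAME + offset

def nudge(video_frame: int, video_offset: int, audio_frame: int, audio_offset: int) -> int:
    """Samples the audio must move (positive = forwards) to line up with the video."""
    return to_samples(video_frame, video_offset) - to_samples(audio_frame, audio_offset)

# Recording starts at 10:00:00:00 (frame 900000); the edit starts at 10:00:10:00.
print(locate_sample(900_250, 900_000))  # 480000
# Video is at the start of frame 900300; audio is 1500 samples into frame 900299.
print(nudge(900_300, 0, 900_299, 1500))  # 420: move the audio forwards 420 samples
```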

In the case of video equipment, the required pulses from the SPG are provided in the form of a composite combination of pulses called 'video B&B' (Black and Burst) or 'Colour Black'. This signal contains the vertical sync pulses that define the start and end of each picture frame, the horizontal sync pulses that define the start and end of each picture line, and the colour subcarrier, which is used to make sure the colour information is coded correctly. In the case of digital audio equipment, the required pulses from the SPG are usually simple pulses at the sample rate, a signal called 'word clock'. Some systems will also use the AES or even S/PDIF composite signal, which embeds the word-clock information within the audio data.

Some video SPGs can also generate digital word clocks directly. In most cases, though, a B&B signal from the sync pulse generator has to be used as a reference to a digital clock generator, which then generates suitable digital word clocks for the digital equipment, locked to the video sync pulse generator.
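A clock generator locked to a 25fps video reference has to produce exactly 1920 word-clock cycles between successive vertical sync pulses. The sketch below shows that relationship, plus a simple lock check over a list of hypothetical measured tick counts (one count per frame period). It is an illustration of the principle only, not a model of how any real clock generator's phase-locked loop operates.

```python
FRAME_RATE = 25
SAMPLES_PER_FRAME = 1920

def word_clock_hz(frame_rate: int = FRAME_RATE, samples_per_frame: int = SAMPLES_PER_FRAME) -> int:
    """Word-clock frequency produced when locked to the video frame rate."""
    return frame_rate * samples_per_frame

def is_locked(ticks_per_frame: list[int], expected: int = SAMPLES_PER_FRAME) -> bool:
    """True if every measured frame period (vertical sync to vertical sync)
    contained exactly `expected` word-clock ticks."""
    return all(count == expected for count in ticks_per_frame)

print(word_clock_hz())                # 48000
print(is_locked([1920, 1920, 1920]))  # True
print(is_locked([1920, 1921, 1920]))  # False: one frame period saw an extra tick
```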

Published October 2003