Audio/Video Out-of-Sync in PC Video Capture
Inventa Australia Pty Ltd
Have you ever heard someone talking to you before you saw their mouth open? Weird? Contrary to the common understanding that light travels faster than sound? Sadly, this weird audio/video out-of-sync phenomenon is widespread in video captured and played back on personal computers.
Audio/video out-of-sync, also known as A/V out-of-sync, A/V sync error, lip-sync error, etc., is partly a matter of personal perception: different people may judge the same video clip's A/V sync differently. But when the problem is obvious, say audio leads the video by 1 second or lags behind it by 3 seconds, almost everyone will be disturbed. Some people in the professional field claim they can detect sub-frame A/V sync errors (one video frame is approx. 40 milliseconds long at 25 frames per second), as small as 10 ms or ¼ of a video frame, while ordinary home users will certainly notice discrepancies of half a second or more. A study done in the 1940s by Bell Laboratories in the U.S. concluded that A/V out-of-sync is detected when audio leads video by more than 35 milliseconds or lags video by more than 100 milliseconds. A more recent document based on experiments carried out in countries including Australia, the International Telecommunication Union Radiocommunication Sector (ITU-R) Recommendation ITU-R BT.1359-1 (1998), sets the average detectability threshold for ordinary people at about 45 ms for audio leading video and 125 ms for audio lagging video. Its acceptability thresholds, beyond which people will reject the video, are set at 90 ms for audio leading video and 185 ms for audio lagging video.
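The ITU-R BT.1359-1 figures quoted above can be turned into a simple check. The following is only a minimal sketch: the function name, the category labels, and the sign convention (positive offset means audio leads video) are assumptions made for illustration and are not part of the recommendation.

```python
# Thresholds in milliseconds, as quoted in the text from ITU-R BT.1359-1.
DETECT_LEAD_MS = 45    # audio ahead of video: detectability
DETECT_LAG_MS = 125    # audio behind video: detectability
ACCEPT_LEAD_MS = 90    # audio ahead of video: acceptability
ACCEPT_LAG_MS = 185    # audio behind video: acceptability


def classify_av_offset(offset_ms: float) -> str:
    """Classify an A/V offset.

    Assumed sign convention: positive = audio leads video,
    negative = audio lags video.
    """
    if offset_ms >= 0:
        detect, accept = DETECT_LEAD_MS, ACCEPT_LEAD_MS
    else:
        detect, accept = DETECT_LAG_MS, ACCEPT_LAG_MS

    magnitude = abs(offset_ms)
    if magnitude <= detect:
        return "below detectability threshold"
    if magnitude <= accept:
        return "detectable but acceptable"
    return "unacceptable"
```

With these thresholds, audio that is 60 ms early is already "detectable but acceptable", while audio that is 100 ms late still falls below the detectability threshold.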
Why, you might ask, is the tolerance tighter for audio coming before the video than for audio lagging behind it? This is because in our daily lives the human hearing system is more tolerant of audio lagging video than of audio preceding video, due to the fact that light travels much faster than sound (about 300,000 km per second for light, a few hundred to a few thousand metres per second for sound depending on the transmitting medium). We are used to hearing a sound after seeing its source if the source is some distance away from us, such as when watching a sports event in an open-air stadium. But we feel uncomfortable much more easily with the reverse situation, where the sound arrives before the picture of whatever generates it.
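To put a number on the stadium example, the natural lag of sound behind light can be computed from the distance. This is a rough sketch; the speed of sound used here (343 m/s, roughly dry air at 20 °C) and the function name are assumptions for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0          # assumed: dry air at about 20 degrees C
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum


def natural_audio_lag_ms(distance_m: float) -> float:
    """Milliseconds by which sound arrives after light from the same source."""
    sound_delay_s = distance_m / SPEED_OF_SOUND_M_S
    light_delay_s = distance_m / SPEED_OF_LIGHT_M_S
    return (sound_delay_s - light_delay_s) * 1000.0
```

For a spectator 100 m from the action this gives a lag of roughly 290 ms, so in everyday life we routinely experience audio lagging video by more than the 185 ms acceptability threshold, whereas audio leading video never occurs naturally.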
A/V out-of-sync in PCs can arise at many different stages of video processing, but the first stage, bringing the video signal into the PC (the video capture stage), is where it is most likely to happen, and most of the time it does.
To be fair, the A/V out-of-sync problem is not limited to videos on PCs. In theory and in practice, all digital video equipment has A/V sync problems. To the dismay of digital video enthusiasts, the A/V out-of-sync problem is unique to, or at least much more serious in, digitised video than analogue video. VHS, Hi8 and other analogue video tapes record the audio signal on the same physical medium as the video, at a fixed position relative to it, so synchronised video/audio input always results in a synchronised video/audio recording. When analogue video tapes are played back, as long as the recording medium has not physically deteriorated too much, video and audio are always played back in sync. In digital video recording and playback, things can be very different. Synchronised video/audio input can easily be digitised out of sync, and a synchronised digital video/audio recording can be played back out of sync.
The first generally accepted reason for A/V out-of-sync is that digital processing of a video signal is always more resource intensive than digital processing of an audio signal, so video processing always needs more time than the corresponding audio processing. This is why a video digitisation device, PCs included, that is fed perfectly synchronised video/audio input normally needs a special audio delay mechanism at its output end to allow the video output to “catch up” with the audio output. This might sound easy, but the engineering implementation (if ever implemented) actually faces numerous difficulties, resulting in various A/V sync problems when video/audio are played back from digital video devices such as PCs: typically the audio is heard before the video is seen, but audio might also lag behind video, or lead and lag video alternately.
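The audio delay mechanism described above can be pictured as a simple first-in, first-out buffer. The sketch below is a toy illustration under strong assumptions (a fixed, known video latency and a hypothetical class name); real devices have to cope with variable latency, which is where the difficulties start.

```python
from collections import deque


class AudioDelayLine:
    """Delay audio samples by a fixed time so that slower video can catch up."""

    def __init__(self, delay_ms: float, sample_rate_hz: int = 48_000):
        # Convert the delay from milliseconds to a whole number of samples.
        self.delay_samples = round(delay_ms * sample_rate_hz / 1000)
        # Pre-fill with silence: the first outputs are zeros.
        self._buffer = deque([0] * self.delay_samples)

    def process(self, samples):
        """Push new samples in and return the same number of delayed samples."""
        delayed = []
        for sample in samples:
            self._buffer.append(sample)
            delayed.append(self._buffer.popleft())
        return delayed
```

A delay of 40 ms (one video frame at 25 frames per second) at a 48 kHz sampling rate corresponds to 1,920 buffered samples.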
Most digital video processing devices have some kind of A/V out-of-sync problem. These include professional broadcasting devices as well as prosumer and amateur devices. As for generic PCs, they are prone to causing A/V out-of-sync problems at every stage of their video capture, compression, editing, and output processes, despite much-hyped commercial claims to the contrary.
In the commercial broadcasting field, special measures are taken to measure and correct the A/V out-of-sync problem. Apart from implementing audio delays to allow video processing to catch up, one complicated method is to insert audio synchronisation signals into the video stream in a visually invisible manner, so that prior to the final video/audio playback point these embedded audio sync marks can be checked against the corresponding audio signal arriving at that point, to find any out-of-sync and then delay or advance the audio or video accordingly. That said, audio/video out-of-sync is rarely completely eliminated in the broadcasting industry, although it can be effectively minimised. To this end, the ITU has another very logical recommendation, ITU-R BT.1377, which suggests that video and audio equipment be labelled to indicate its processing delay or delay range. You rarely see PCs and their A/V peripherals labelled this way, do you?
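The checking step of the sync-mark method can be sketched as follows. This is not how any particular broadcast product works; it assumes, purely for illustration, that some detector has already produced a list of arrival times for the marks embedded in the video and a matching list for the corresponding audio events. The function name and sign convention are hypothetical.

```python
from statistics import median


def estimate_av_offset_ms(video_mark_ms, audio_event_ms):
    """Estimate the A/V offset from paired sync-mark arrival times.

    Assumed sign convention: a positive result means the audio arrives
    before the video (audio leads), so the audio should be delayed.
    """
    if not video_mark_ms or len(video_mark_ms) != len(audio_event_ms):
        raise ValueError("need two non-empty lists of equal length")
    # The median is used so that one badly detected mark does not skew the result.
    return median(v - a for v, a in zip(video_mark_ms, audio_event_ms))
```

For example, marks seen in the video at 1000, 2000 and 3000 ms with matching audio events at 960, 1962 and 2958 ms give an estimated offset of 40 ms, meaning the audio should be delayed by about one video frame.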
The second generally accepted reason for A/V out-of-sync in digital video is an incorrect sampling clock. This comes in many different flavours. One typical situation in PC video editing is that the sampling clock used by the PC differs from that used by the original video source. For example, while PC video capture device manufacturers take exactly 48 kHz as the standard audio sampling clock for DV video, some Canon Mini DV cameras (e.g. XL1) were actually made to use a 48.009 kHz audio sampling frequency. Without special adjustment, a video capture card capturing one continuous hour of DV video from this kind of camera will have taken 32,400 fewer audio samples (9 samples × 3,600 seconds) than the DV camera did, which amounts to roughly two-thirds of a second (32,400 / 48,000 ≈ 0.675 s) of audio/video difference over the hour. To overcome this discrepancy between the sampling clocks of the DV camera and the PC video capture card, some PC video capture device manufacturers adjust their sampling clock accordingly. For example, Apple's Final Cut Pro software has special treatment for the Canon XL1 camera, if you tell it to use it!
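The arithmetic of the Canon XL1 example can be reproduced in a few lines. This is only a back-of-the-envelope sketch: the function name is made up, and the drift is expressed in seconds of audio at the capture card's nominal rate, which is an assumption about how the mismatch shows up.

```python
def clock_drift(source_hz: int, capture_hz: int, duration_s: int):
    """Return (sample_difference, drift_in_seconds) for mismatched audio clocks.

    source_hz  : actual sampling rate of the camera
    capture_hz : nominal sampling rate assumed by the capture device
    duration_s : length of the continuous capture in seconds
    """
    sample_difference = (source_hz - capture_hz) * duration_s
    drift_s = sample_difference / capture_hz
    return sample_difference, drift_s


# One hour captured from a 48.009 kHz source with a 48 kHz capture clock.
samples, seconds = clock_drift(48_009, 48_000, 3_600)
frames_at_25fps = seconds * 25  # the same drift expressed in PAL video frames
```

For the one-hour case this yields 32,400 samples, or 0.675 seconds, which is nearly 17 video frames at 25 frames per second and far beyond the acceptability thresholds quoted earlier.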
To achieve perfect audio/video sync, the video and audio sampling clocks in the PC's video capture hardware should ideally be locked together: every fixed number of video frames should be accompanied by a fixed number of audio samples. For PC video capture devices that do not have their own audio capture hardware, like many low-end analogue video capture cards, this is extremely difficult, because they need the PC's sound card to capture and sample the incoming audio, and the sound card's sampling clock can differ from the clock of the incoming video. The result is inevitable audio/video out-of-sync in any capture process where audio is captured separately from video (using the PC's sound card instead of the video capture hardware itself).
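What "locked together" means numerically can be sketched as below. The helper names are hypothetical, and exact fractions are used so that non-integer frame rates do not introduce rounding errors of their own.

```python
from fractions import Fraction


def samples_per_frame(sample_rate_hz: int, frame_rate) -> Fraction:
    """Exact number of audio samples that belong to one video frame."""
    return Fraction(sample_rate_hz) / Fraction(frame_rate)


def sync_error_ms(video_frames: int, audio_samples: int,
                  frame_rate, sample_rate_hz: int) -> float:
    """Audio duration minus video duration of a capture, in milliseconds.

    A result of 0 means the two clocks stayed locked; a positive result
    means more audio than video was captured.
    """
    video_duration_s = Fraction(video_frames) / Fraction(frame_rate)
    audio_duration_s = Fraction(audio_samples, sample_rate_hz)
    return float((audio_duration_s - video_duration_s) * 1000)
```

At 48 kHz and 25 frames per second every frame should carry exactly 1,920 audio samples. A one-hour capture (90,000 frames) whose separately clocked sound card delivered 4,800 samples too many ends up with 100 ms more audio than video.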
However, even with hardware where audio capture is built in, the A/V sync problem is still present. In the days of the old and venerable Miro DC30+, where analogue video and audio were captured together in hardware without relying on the PC's sound card, audio slippage could still be seen from time to time when exact A/V sync was required, typically in scenes such as someone manually ringing a huge bell, or a gun being fired with smoke coming out of its barrel.
The third common cause of audio/video out-of-sync is using host-PC based software to do realtime video/audio compression. Apart from pure digital data transfer such as Firewire DV video capture, most video capture processes on PCs need to compress video in realtime into a format suitable for a particular application: analogue video needs to be compressed into DV, MPEG, or a streaming format, DV video needs to be compressed into MPEG or a streaming format, etc. This realtime compression task can be accomplished either by a hardware circuit (chipsets and on-board firmware) on a dedicated video capture board, or by software running on the host PC. When it comes to decent quality video with synchronised audio, such as 25Mbps “DV” format video or 4Mbps~10Mbps DVD-compliant MPEG2 video, current host PCs, with all their mighty powers and glorious features in hardware and software, cannot handle the realtime compression task using host-PC based software. Product designs combining low-cost video grabbing hardware with host-PC based video compression software have all failed miserably, producing horribly low quality video and thoroughly out-of-sync audio. Typical examples include using realtime MPEG encoding software to create DVD-compliant video files through Firewire ports connected to DV cameras, and using realtime “DV” encoding software to create DV-compliant .AVI files for video editing through low-cost TV-tuner cards or AGP graphics cards connected to analogue VCRs/cameras. Exceptions might exist, through extremely smart engineering effort, but in general it is inevitable that host-PC software-based realtime video encoders cause audio/video out-of-sync. The reasons why generic PCs cannot handle realtime video compression properly lie mainly in their hardware and software design architecture; a detailed discussion of this is beyond the scope of this article.
Every application field that uses video capture on PCs has the A/V out-of-sync problem: video capture for video editing, video disc creation (DVD/VCD), TV watching, video streaming, surveillance, etc. Obviously, the more processing is required during the video capture process, the more likely an A/V sync problem becomes. Currently the most processing-intensive video capture process is realtime MPEG video capture, and this is where the most serious and most widespread audio/video out-of-sync happens.
As discussed earlier, realtime MPEG video capture devices using host-CPU based software encoding always create A/V out-of-sync in the captured video files. Will realtime MPEG capture devices based on hardware encoding chipsets fare better? Not necessarily. In fact, most of these devices still create A/V out-of-sync in their captured video files, in particular the low-end and mid-range products. A widely held but incorrect assumption is that once a capture card uses a particular “good” hardware MPEG encoding chipset, the A/V sync problem will go away. In fact, different realtime MPEG capture devices using the same hardware encoding chipset show different A/V out-of-sync symptoms. It is the design of the overall hardware and software that decides how well the A/V sync problem can be minimised, not simply the choice of a particular chipset. Design engineers in the field will tell you that they all regard A/V sync as a real problem in MPEG encoding devices, one that at best can be minimised through design effort but is certainly very hard to eliminate completely.