Automatic caption generation for video data. Time alignment between caption and acoustic signal | IEEE Conference Publication | IEEE Xplore