Cross-utterance context for multimodal video transcription | IEEE Conference Publication | IEEE Xplore