Attention-based Visual-Audio Fusion for Video Caption Generation | IEEE Conference Publication | IEEE Xplore