Multi-modal fusion for video understanding | IEEE Conference Publication