Look, listen, and decode: Multimodal speech recognition with images | IEEE Conference Publication | IEEE Xplore