MSR-VTT: A Large Video Description Dataset for Bridging Video and Language | IEEE Conference Publication | IEEE Xplore