Temporal-enhanced Cross-modality Fusion Network for Video Sentence Grounding | IEEE Conference Publication | IEEE Xplore