Global-Shared Text Representation Based Multi-Stage Fusion Transformer Network for Multi-Modal Dense Video Captioning | IEEE Journals & Magazine | IEEE Xplore