Bridging Video and Text: A Two-Step Polishing Transformer for Video Captioning | IEEE Journals & Magazine | IEEE Xplore