Movie Caption Generation with Vision Transformer and Transformer-based Language Model | IEEE Conference Publication