Multi-Modal Learning with Text Merging for TEXTVQA | IEEE Conference Publication | IEEE Xplore