A Multimodal Interpretable Visual Question Answering Model Introducing Image Caption Processor | IEEE Conference Publication | IEEE Xplore