Leveraging Efficient Training and Feature Fusion in Transformers for Multimodal Classification | IEEE Conference Publication | IEEE Xplore