Multimodal Transformer with Learnable Frontend and Self Attention for Emotion Recognition | IEEE Conference Publication | IEEE Xplore