A Transformer-based Multimodal Network for Audiovisual Depression Prediction | IEEE Conference Publication | IEEE Xplore