Temporal Bilinear Encoding Network of Audio-visual Features at Low Sampling Rates

Topics: Categorization and Scene Understanding; Deep Learning for Visual Understanding; Event and Human Activity Recognition; Features Extraction

In Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, pages 637-644, 2021.