Listen and Look: Multi-Modal Aggregation and Co-Attention Network for Video-Audio Retrieval | IEEE Conference Publication | IEEE Xplore