Loading [a11y]/accessibility-menu.js
Self-Supervised Learning for Heterogeneous Audiovisual Scene Analysis | IEEE Journals & Magazine | IEEE Xplore

Self-Supervised Learning for Heterogeneous Audiovisual Scene Analysis


Abstract:

Due to the difficulty of annotating large amounts of training data, directly learning the association of sound and its makers in natural videos is a challenging task for ...Show More

Abstract:

Due to the difficulty of annotating large amounts of training data, directly learning the association of sound and its makers in natural videos is a challenging task for machines. In this paper, we present a novel audiovisual model that introduces a soft-clustering module as the audio and visual content detector, and regards the pervasive property of audiovisual concurrency as the latent supervision for inferring the correlation among detected contents. Furthermore, we discover for the first time that the complexity of data has an impact on the training efficiency and subsequent performance of audiovisual model, i.e., more complex data brings more obstacles to the model training, and degrades the performance of downstream audiovisual tasks. To address the issue of audiovisual learning, we propose a novel heterogeneous audiovisual scene analysis module that trains the model from simple to complex scene. We show that such ordered learning procedure rewards the model the merits of easy training and fast convergence. Meanwhile, our audiovisual model can also provide effective unimodal representation and cross-modal alignment performance. We further deploy the well-trained model into practical audiovisual sound localization and separation tasks. We show that our localization model significantly outperforms existing methods, based on which we show comparable performance in sound separation task by comparison to several related SOTA audiovisual learning methods without referring external visualsupervision.
Published in: IEEE Transactions on Multimedia ( Volume: 25)
Page(s): 3534 - 3545
Date of Publication: 25 March 2022

ISSN Information:

Funding Agency:


Contact IEEE to Subscribe

References

References is not available for this document.