Abstract:
In speech processing systems, the performance of the Voice Activity Detector (VAD) is a bottleneck for the whole system. Traditional VADs are based solely on acoustic features. An additional modality, in the form of visual information, is used to build more robust VADs. In this paper, we propose a multimodal VAD based on decision fusion between the two modalities. Visual VAD (VVAD) decision vectors are interpolated so that logical operators can be applied to both modalities. To avoid this interpolation, we suggest a sequential arrangement of the two subsystems to achieve a multimodal VAD. The proposed method considerably reduces false alarm rates compared with the performance of a standalone audio VAD (AVAD).
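The decision-fusion idea in the abstract can be illustrated with a minimal sketch: per-frame binary decisions from an audio VAD and a visual VAD are combined with a logical AND, so a frame is labeled speech only when both modalities agree. The decision arrays and the false-alarm metric below are hypothetical illustrations, not the paper's actual data or decision rules.

```python
import numpy as np

# Hypothetical per-frame labels (1 = speech, 0 = non-speech).
truth = np.array([0, 0, 1, 1, 1, 0, 0, 1])
avad  = np.array([1, 0, 1, 1, 1, 1, 0, 1])  # audio VAD: triggers on noise too
vvad  = np.array([0, 0, 1, 1, 0, 0, 0, 1])  # visual VAD decisions

# Logical-operator fusion across modalities: AND keeps only frames
# where both subsystems declare speech.
fused = avad & vvad

def false_alarm_rate(pred, truth):
    """Fraction of non-speech frames wrongly flagged as speech."""
    nonspeech = truth == 0
    return float((pred[nonspeech] == 1).mean())

print(false_alarm_rate(avad, truth))   # audio VAD alone -> 0.5
print(false_alarm_rate(fused, truth))  # AND fusion      -> 0.0
```

The same AND rule can be realized sequentially, as the abstract suggests: run the AVAD first and pass only its speech-flagged frames to the VVAD for confirmation, which avoids interpolating the VVAD decision vector over the full frame sequence.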
Date of Conference: 16-19 April 2013
Date Added to IEEE Xplore: 16 September 2013
Electronic ISBN: 978-1-4673-5895-8