Abstract:
In speech processing systems, the performance of the Voice Activity Detector (VAD) is a bottleneck for the whole system. Traditional VADs are based solely on acoustic features. An additional modality, in the form of visual information, is used to build more robust VADs. In this paper, we propose a multimodal VAD based on decision fusion between the two modalities. Visual VAD (VVAD) decision vectors are interpolated so that logical operators can be applied to both modalities. To avoid this interpolation, we suggest a sequential arrangement of the two subsystems to achieve a multimodal VAD. The proposed method considerably reduces false alarm rates compared with the performance of a standalone audio VAD (AVAD).
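The decision-fusion idea in the abstract can be illustrated with a minimal sketch: per-frame binary decisions from an audio VAD and a visual VAD are combined with a logical AND, so a frame is labeled speech only when both modalities agree. The decision arrays and the false-alarm metric below are hypothetical illustrations, not the paper's actual data or decision rules.

```python
import numpy as np

# Hypothetical per-frame labels (1 = speech, 0 = non-speech).
truth = np.array([0, 0, 1, 1, 1, 0, 0, 1])
avad  = np.array([1, 0, 1, 1, 1, 1, 0, 1])  # audio VAD: triggers on noise too
vvad  = np.array([0, 0, 1, 1, 0, 0, 0, 1])  # visual VAD decisions

# Logical-operator fusion across modalities: AND keeps only frames
# where both subsystems declare speech.
fused = avad & vvad

def false_alarm_rate(pred, truth):
    """Fraction of non-speech frames wrongly flagged as speech."""
    nonspeech = truth == 0
    return float((pred[nonspeech] == 1).mean())

print(false_alarm_rate(avad, truth))   # audio VAD alone -> 0.5
print(false_alarm_rate(fused, truth))  # AND fusion      -> 0.0
```

The same AND rule can be realized sequentially, as the abstract suggests: run the AVAD first and pass only its speech-flagged frames to the VVAD for confirmation, which avoids interpolating the VVAD decision vector over the full frame sequence.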
Date of Conference: 16-19 April 2013
Date Added to IEEE Xplore: 16 September 2013
Electronic ISBN: 978-1-4673-5895-8