
Seeing Through Noise: Visually Driven Speaker Separation And Enhancement

Abstract:

Isolating the voice of a specific person while filtering out other voices or background noise is challenging when video is shot in noisy environments. We propose audio-visual methods to isolate the voice of a single speaker and eliminate unrelated sounds. First, face motions captured in the video are used to estimate the speaker's voice by passing the silent video frames through a neural-network-based video-to-speech model. Then the speech predictions are applied as a filter on the noisy input audio. This approach avoids using mixtures of sounds in the learning process, since the number of possible mixtures is huge and training on any subset of them would inevitably bias the model. We evaluate our method on two audio-visual datasets, GRID and TCD-TIMIT, and show that it attains significant improvements in signal-to-distortion ratio (SDR) and perceptual evaluation of speech quality (PESQ) over both the raw video-to-speech predictions and a well-known audio-only method.
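To make the filtering step more concrete, here is a minimal sketch, assuming magnitude spectrograms stored as `[time][frequency]` lists of non-negative floats. The abstract does not say exactly how the prediction is turned into a filter, so the per-bin ratio mask below is a generic, swapped-in stand-in rather than the authors' filter, and `ratio_mask`, `apply_mask`, and `simple_sdr` are hypothetical names. `simple_sdr` is the plain energy-ratio definition, not the full BSS Eval SDR typically used in published evaluations.

```python
import math
from typing import List

# Hypothetical representation: magnitude spectrogram as [time][frequency] floats.
Spectrogram = List[List[float]]


def ratio_mask(predicted: Spectrogram, noisy: Spectrogram, eps: float = 1e-8) -> Spectrogram:
    """Per-bin soft mask in [0, 1] (an assumed stand-in for the paper's filter).

    Each bin keeps the fraction of the noisy magnitude that the
    video-to-speech prediction accounts for; values are clipped at 1.
    """
    return [
        [min(1.0, p / (n + eps)) for p, n in zip(p_frame, n_frame)]
        for p_frame, n_frame in zip(predicted, noisy)
    ]


def apply_mask(mask: Spectrogram, noisy: Spectrogram) -> Spectrogram:
    """Filter the noisy spectrogram bin by bin with the mask."""
    return [
        [m * n for m, n in zip(m_frame, n_frame)]
        for m_frame, n_frame in zip(mask, noisy)
    ]


def simple_sdr(reference: List[float], estimate: List[float]) -> float:
    """Plain energy-ratio SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    Assumes a non-silent reference; the error floor avoids division by zero.
    """
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / max(error, 1e-12))


if __name__ == "__main__":
    predicted = [[1.0, 0.0], [0.5, 2.0]]  # toy video-to-speech prediction
    noisy = [[1.0, 1.0], [1.0, 2.0]]      # toy noisy input
    enhanced = apply_mask(ratio_mask(predicted, noisy), noisy)
    print(enhanced)
    print(simple_sdr([1.0, 0.0], [0.5, 0.0]))
```

Because every mask value is at most 1, this kind of filter can only attenuate bins of the noisy recording and never adds energy that was not there, which is one plausible reason to use the visual prediction as a filter rather than as the output itself.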

Published in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 15-20 April 2018

Date Added to IEEE Xplore: 13 September 2018

Electronic ISSN: 2379-190X

DOI: 10.1109/ICASSP.2018.8462527

Conference Location: Calgary, AB, Canada
