Are you Really Alone? Detecting the use of Speech Separation Techniques on Audio Recordings | IEEE Conference Publication | IEEE Xplore

Abstract:

The pervasive influence of digital media has brought about new challenges in verifying the authenticity and integrity of audio recordings. The ease of editing and altering audio has raised concerns regarding the potential malicious use of speech separation techniques, where multiple speakers' voices can be extracted from a mixed recording. In light of these emerging threats, the need for robust forensic detectors that can identify the presence of speech separation forgeries becomes increasingly crucial. In this paper, we propose a novel forensic detector designed to discern between original single-speaker speech recordings and those obtained using speech separation techniques applied to audio recordings containing multiple speakers. Leveraging the power of Convolutional Neural Networks (CNNs), we explore the efficacy of different Short-Time Fourier Transform (STFT) representations in tackling the task. While many conventional approaches in the literature employ the audio spectrogram (i.e., the STFT magnitude) as input for CNNs, our study explores the use of the STFT real and imaginary parts, as well as the STFT magnitude and phase. In doing so, we ensure the preservation of all essential information embedded within the speech signal. Results show that the proposed signal representation improves over the sole use of the spectrogram. Moreover, the proposed approach is able to generalize to datasets and speech separation techniques never seen in training. Finally, our proposed detector shows promising results on preliminary experiments performed on synthetically generated audio tracks.
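The input representations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length, hop size, Hann window, channel layout, and the function names `stft` and `stft_representations` are all assumptions made for illustration, and the CNN itself is not shown. The sketch only builds the three candidate inputs: the magnitude alone (the spectrogram), the real and imaginary parts, and the magnitude and phase.

```python
import numpy as np


def stft(signal, frame_len=512, hop=128):
    """Short-Time Fourier Transform with a Hann window.

    frame_len and hop are illustrative values, not taken from the paper.
    Returns a complex array of shape (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT of each frame (the input is real-valued).
    return np.fft.rfft(frames, axis=1)


def stft_representations(signal, frame_len=512, hop=128):
    """Build three candidate CNN inputs, each shaped (channels, time, freq)."""
    spec = stft(signal, frame_len, hop)
    magnitude = np.abs(spec)
    phase = np.angle(spec)
    return {
        # Spectrogram only: the phase information is discarded.
        "magnitude": magnitude[np.newaxis],
        # Real and imaginary parts as two channels: the full complex STFT.
        "real_imag": np.stack([spec.real, spec.imag]),
        # Magnitude and phase as two channels: also the full complex STFT.
        "mag_phase": np.stack([magnitude, phase]),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(16000)  # stand-in for one second at 16 kHz
    for name, array in stft_representations(audio).items():
        print(name, array.shape)
```

Both two-channel forms retain everything needed to recover the complex STFT, whereas the magnitude-only form drops the phase; this is the sense in which the abstract speaks of preserving the information in the speech signal.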
Date of Conference: 04-07 December 2023
Date Added to IEEE Xplore: 01 January 2024
Conference Location: Nürnberg, Germany
