Abstract:
Recent advances in speech synthesis and counterfeit audio generation have pushed the multimedia forensics community to develop speech deepfake detection techniques to avoid threats and unpleasant situations. Although synthetic speech detectors show excellent performance in controlled conditions, they are not always reliable in open-set conditions, i.e., when evaluated on data that differ substantially from those seen during training. This can lead to misleading scores and poorly indicative results in real-world scenarios. In this paper, we propose a method for estimating the reliability of a prediction made by a speech deepfake detector. This enables us to perform detection only on the most relevant portions of a signal, i.e., the time windows for which we obtain the most reliable scores, increasing the final accuracy of the developed systems. Since some audio fragments may not contain enough forensic traces for the task at hand and can negatively affect the system output, a reliability estimator allows us to discard them and focus only on the most pertinent data. The proposed method positively impacts the performance of the considered detector and shows excellent generalization capabilities on unseen datasets.
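The core idea described above — scoring a signal per time window, estimating how reliable each window-level score is, and aggregating only the most reliable ones — can be sketched as follows. This is a hypothetical illustration, not the authors' actual estimator: the function names, the keep-ratio gating strategy, and the mean aggregation are all assumptions made for clarity.

```python
def gated_score(window_scores, reliabilities, keep_ratio=0.5):
    """Aggregate per-window deepfake scores into one signal-level score,
    keeping only the most reliable windows.

    window_scores : per-window detector outputs (e.g., probability of fake)
    reliabilities : per-window reliability estimates (higher = more reliable)
    keep_ratio    : fraction of windows to retain (hypothetical gating rule)
    """
    # Pair each score with its reliability and sort by reliability, descending.
    pairs = sorted(zip(reliabilities, window_scores), reverse=True)
    # Keep at least one window, discarding the least reliable fragments.
    k = max(1, int(len(pairs) * keep_ratio))
    kept = [score for _, score in pairs[:k]]
    # Final decision score uses only the retained, reliable windows.
    return sum(kept) / len(kept)
```

For example, with scores `[0.9, 0.1, 0.8, 0.2]` and reliabilities `[0.9, 0.1, 0.8, 0.2]`, a keep ratio of 0.5 retains the two most reliable windows and averages their scores, so low-reliability fragments never influence the final output.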
Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023