Loading [a11y]/accessibility-menu.js
BTS-E: Audio Deepfake Detection Using Breathing-Talking-Silence Encoder | IEEE Conference Publication | IEEE Xplore

BTS-E: Audio Deepfake Detection Using Breathing-Talking-Silence Encoder


Abstract:

Voice phishing (vishing) is increasingly popular due to the development of speech synthesis technology. In particular, the use of deep learning to generate an arbitrary-c...Show More

Abstract:

Voice phishing (vishing) is increasingly popular due to the development of speech synthesis technology. In particular, the use of deep learning to generate an arbitrary-content audio clip simulating the victim’s voice makes it difficult not only for humans but also for automatic speaker verification (ASV) systems to distinguish. Countermeasure (CM) systems have been developed recently to help ASV combat synthetic speech. In this work, we propose BTS-E, a framework to evaluate the correlation between Breathing, Talking (speech), and Silence sounds in an audio clip, then use this information for deepfake detection tasks. We argue that natural human sounds, such as breathing, are hard to synthesize by Text-to-speech (TTS) system. We conducted a large-scale evaluation using ASVspoof 2019 and 2021 evaluation set to validate our hypothesis. The experiment results show the applicability of the breathing sound feature in detecting deepfake voices. In general, the proposed system significantly increases the performance of the classifier by up to 46%.
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023
ISBN Information:

ISSN Information:

Conference Location: Rhodes Island, Greece

Contact IEEE to Subscribe

References

References is not available for this document.