Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection

Li, Kai; Li, Sheng; Lu, Xugang; Akagi, Masato; Liu, Meng; Zhang, Lin; Zeng, Chang; Wang, Longbiao; Dang, Jianwu; Unoki, Masashi

doi:10.21437/Interspeech.2022-10088

Fake audio detection (FAD) is a technique for distinguishing synthetic speech from natural speech. In most FAD systems, it is essential to remove irrelevant features from acoustic speech while keeping only robust discriminative features. Intuitively, speaker information entangled in acoustic speech should be suppressed for the FAD task. In a deep neural network (DNN)-based FAD system in particular, the model may learn speaker information from the training dataset and then fail to generalize well to the test dataset. In this paper, we propose using a speaker anonymization (SA) technique to suppress speaker information in acoustic speech before it is input into a DNN-based FAD system. We adopted the McAdams-coefficient-based SA (MC-SA) algorithm, with the expectation that the entangled speaker information will then not be involved in DNN-based FAD learning. Based on this idea, we implemented a light convolutional neural network bidirectional long short-term memory (LCNN-BLSTM)-based FAD system and conducted experiments on the Audio Deep Synthesis Detection Challenge (ADD2022) datasets. The results showed that removing the speaker information from acoustic speech yielded a relative performance improvement of 17.66% in the first track of ADD2022.
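The abstract does not spell out the MC-SA algorithm, so the following is only a minimal sketch of its central step as it is commonly described: the poles of a frame's linear-prediction filter that have a non-zero imaginary part have their angle phi replaced by phi raised to the power of the McAdams coefficient, while their radius and any real poles are left unchanged. The function name `shift_pole_angles`, the example poles, and the coefficient value 0.8 are assumptions made for illustration and are not taken from the paper. A full anonymizer would also need linear-prediction analysis before this step and resynthesis after it, which are omitted here.

```python
import cmath


def shift_pole_angles(poles, alpha):
    """Apply the angle mapping phi -> phi**alpha to complex poles.

    Hypothetical helper for illustration only. `poles` is a list of
    complex numbers; `alpha` is the McAdams coefficient, assumed here
    to lie in (0, 1] so that shifted angles stay within (0, pi).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("this sketch assumes 0 < alpha <= 1")

    shifted = []
    for pole in poles:
        # Real poles carry no angle to shift, so they pass through unchanged.
        if abs(pole.imag) < 1e-12:
            shifted.append(pole)
            continue

        radius = abs(pole)
        phi = abs(cmath.phase(pole))  # angle magnitude in radians, in (0, pi)
        new_phi = phi ** alpha

        # Keep the sign of the original angle so that conjugate pairs
        # remain conjugate after the shift.
        sign = 1.0 if pole.imag > 0 else -1.0
        shifted.append(cmath.rect(radius, sign * new_phi))
    return shifted


# Illustrative input: one conjugate pole pair and one real pole.
example_poles = [cmath.rect(0.9, 0.5), cmath.rect(0.9, -0.5), complex(0.8, 0.0)]
for before, after in zip(example_poles, shift_pole_angles(example_poles, 0.8)):
    print(f"{cmath.phase(before):+.4f} rad -> {cmath.phase(after):+.4f} rad")
```

With a coefficient below 1, angles smaller than 1 radian move up and angles larger than 1 radian move down, which shifts the spectral peaks associated with those poles. A coefficient of exactly 1 leaves every pole where it was.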

Cite as: Li, K., Li, S., Lu, X., Akagi, M., Liu, M., Zhang, L., Zeng, C., Wang, L., Dang, J., Unoki, M. (2022) Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection. Proc. Interspeech 2022, 664-668, doi: 10.21437/Interspeech.2022-10088

@inproceedings{li22o_interspeech,
  author={Kai Li and Sheng Li and Xugang Lu and Masato Akagi and Meng Liu and Lin Zhang and Chang Zeng and Longbiao Wang and Jianwu Dang and Masashi Unoki},
  title={{Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={664--668},
  doi={10.21437/Interspeech.2022-10088}
}