OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Speaker-targeted Synthetic Speech Detection

Technical Report
DOI: https://doi.org/10.2172/1844063 · OSTI ID: 1844063

Text-to-speech technologies are evolving quickly toward realistic, human-like voices. As the technology improves, so does the opportunity for malpractice in speaker identification (SID) via spoofing, the process of impersonating a voice biometric through synthesis. More data typically yields a more realistic voice model, which poses a problem for well-known subjects, such as politicians and celebrities, who have vast amounts of multimedia available online. Detection of synthetic speech has relied on signal processing techniques that derive new acoustic features and on deep learning models trained to detect when an audio file has been manipulated by characterizing unnatural changes or artifacts. However, these techniques do not use any information about the speaker they are evaluating. This paper proposes incorporating information from the speaker-of-interest (SoI) into the models to guard certain vulnerable people against specific spoofing attacks. The wealth of data for well-known people can also be used to train a speaker-specific spoofing detector with a higher level of accuracy than a speaker-independent model. The paper proposes a new xResNet-PLDA system and compares it to three baseline systems: a state-of-the-art speaker identification system, an xResNet system trained to discriminate between bona fide and fake speech, and a speaker identification system in which the PLDA and calibration models were trained with bona fide and fake speech. We evaluate the systems in two scenarios, cross-validation and hold-out, with three different databases. We show that the proposed system dramatically outperforms the baseline systems in each scenario and for each database. Finally, we show that using a small amount of the SoI’s speech to adapt global calibration parameters improves the performance of the system, especially in unseen conditions.

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
DOE Contract Number:
NA0003525
OSTI ID:
1844063
Report Number(s):
SAND2022-1418R; 703333; TRN: US2302820
Country of Publication:
United States
Language:
English