Abstract:
In this work, the linear prediction (LP) residual, also known as the excitation component of speech, is processed to detect replay attacks. The LP residual is derived from speech using LP analysis with an appropriate LP order, and it represents excitation source information in implicit form. Features derived from the LP residual using signal processing algorithms are referred to as excitation source features. Two such source features, namely residual mel-frequency cepstral coefficients (RMFCC) and residual constant-Q cepstral coefficients (RCQCC), are derived and used for the replay attack detection task. A Gaussian mixture model (GMM) is used as the back-end classifier. The experimental study is conducted using the ASVspoof 2017 Version 2.0 database. The RMFCC-GMM and RCQCC-GMM systems provide equal error rates (EERs) of 20.89% and 18.51%, respectively. Score-level fusion of the two systems yields a notable 11.72% EER, indicating that the two features carry significant complementary information useful for replay speech detection. This suggests that combining source features obtained from the LP residual with suitable signal processing methods may offer a better alternative to existing solutions for replay attack detection. Further, score-level fusion of RMFCC, RCQCC, and CQCC features provides 9.18% EER, the best performance reported in this work.
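As an illustration of the LP residual mentioned in the abstract, the following is a minimal sketch of one common way to obtain it: the autocorrelation method with the Levinson-Durbin recursion, followed by inverse filtering. The abstract does not state which LP method, order, frame length, or windowing the authors used, so the function names (`autocorrelation`, `levinson_durbin`, `lp_residual`, `synth_frame`), the order of 2, and the synthetic test signal are all assumptions made for this example only.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (zero outside the frame)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for predictor coefficients a[1..order],
    where the prediction is s_hat[n] = sum_k a[k] * s[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err           # reflection coefficient for this stage
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)    # prediction error energy shrinks each stage
    return a[1:]


def lp_residual(frame, order):
    """Return (coefficients, residual), where the residual is
    e[n] = s[n] - sum_k a[k] * s[n - k] (samples before the frame are 0)."""
    a = levinson_durbin(autocorrelation(frame, order), order)
    residual = []
    for n in range(len(frame)):
        pred = sum(a[k - 1] * frame[n - k]
                   for k in range(1, order + 1) if n - k >= 0)
        residual.append(frame[n] - pred)
    return a, residual


def synth_frame(length=240, period=40):
    """Hypothetical test signal: a pulse train passed through a stable
    second-order all-pole filter s[n] = 1.3 s[n-1] - 0.6 s[n-2] + e[n]."""
    s = []
    for n in range(length):
        e = 1.0 if n % period == 0 else 0.0
        s1 = s[n - 1] if n >= 1 else 0.0
        s2 = s[n - 2] if n >= 2 else 0.0
        s.append(1.3 * s1 - 0.6 * s2 + e)
    return s


if __name__ == "__main__":
    frame = synth_frame()
    coeffs, res = lp_residual(frame, order=2)
    print("estimated LP coefficients:", coeffs)
    print("signal energy:  ", sum(x * x for x in frame))
    print("residual energy:", sum(x * x for x in res))
```

In this toy case the residual approximately recovers the pulse-train excitation, which is the sense in which the LP residual carries excitation source information. A real system would process windowed speech frames with a higher LP order and then compute cepstral features (such as RMFCC or RCQCC) from the residual; those steps are not shown here.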
Published in: 2023 IEEE International Conference on Internet of Things and Intelligence Systems (IoTaIS)
Date of Conference: 28-30 November 2023
Date Added to IEEE Xplore: 14 December 2023