An Experimental Analysis on Mapping Strategies for Cepstral Coefficients Multi-projection in Voice Spoofing Detection Problem

Contreras, Rodrigo Colnago; Viana, Monique Simplicio; Guido, Rodrigo Capobianco

doi:10.1007/978-3-031-42508-0_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14126)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing

Abstract

As certain electronic devices become cheaper, biometric user authentication systems are becoming increasingly common. In particular, microphone-based user recognition is already a reality for many people in everyday tasks, such as unlocking their own cell phone or accessing their bank account at an ATM. However, recent work shows that fraud, such as presenting a recording of an authentic user to the recognition system, can threaten the security of these systems. As a countermeasure, liveness detection techniques have emerged with the aim of determining whether the signal presented to the system comes from a human vocal tract or from a recording. Techniques based on the analysis of spectrograms, which are matrix representations of how the frequency content of a signal varies over time, are widely used for this purpose. However, little study has been carried out on the use of mapping operators that turn such representations into feature vectors belonging to \(\mathbb {R}^n\) and, consequently, make them easier to define and to use in most binary classifiers. In this work, we perform an experimental analysis of different mapping operators applied to the construction of handcrafted features from spectrograms calculated by various techniques. In addition, we analyze the effect of fusing and projecting such features. Finally, we observe that the proposed material, a tool of simple definition and low complexity, yields results that are competitive with the baselines reported in the literature.
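To make the idea of a mapping operator concrete, the following is a minimal sketch, not the authors' implementation. It assumes a spectrogram-like matrix stored as a list of rows (one row per coefficient, one column per time frame) and uses four illustrative operators (mean, standard deviation, minimum, maximum) that are assumptions chosen for the example; the operators actually studied in the chapter are not listed in the abstract. Each operator collapses every row into one real number, giving a vector in \(\mathbb {R}^n\), and fusion is illustrated here as plain concatenation. The names `OPERATORS`, `map_rows`, and `fuse` are hypothetical.

```python
from statistics import fmean, pstdev
from typing import Callable, Dict, List, Sequence

# Rows are coefficients (frequency axis); columns are time frames.
Matrix = List[List[float]]

# Hypothetical mapping operators (assumed for illustration): each one
# collapses a coefficient's trajectory over time into a single real number.
OPERATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": fmean,
    "std": pstdev,  # population standard deviation
    "min": min,
    "max": max,
}


def map_rows(matrix: Matrix, operator: Callable[[Sequence[float]], float]) -> List[float]:
    """Apply one operator to every row, giving a vector with one entry per row."""
    return [float(operator(row)) for row in matrix]


def fuse(matrix: Matrix, names: Sequence[str]) -> List[float]:
    """Concatenate the vectors produced by the selected operators."""
    features: List[float] = []
    for name in names:
        features.extend(map_rows(matrix, OPERATORS[name]))
    return features
```

For a matrix with n rows and k selected operators, `fuse` returns a vector of length n·k regardless of the number of time frames, which is what lets recordings of different durations be fed to a standard binary classifier.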

Acknowledgments

This study was financed in part by the São Paulo Research Foundation (FAPESP), processes #22/05186-4 (RCC) and #21/12407-4 (RCG).

Author information

Authors and Affiliations

Institute of Biosciences, Letters and Exact Sciences, São Paulo State University, São José do Rio Preto, SP, 15054-000, Brazil
Rodrigo Colnago Contreras & Rodrigo Capobianco Guido
Department of Computing, Federal University of São Carlos, São Carlos, SP, 13565-905, Brazil
Monique Simplicio Viana

Authors

Rodrigo Colnago Contreras
Monique Simplicio Viana
Rodrigo Capobianco Guido

Corresponding author

Correspondence to Rodrigo Colnago Contreras.

Editor information

Editors and Affiliations

Systems Research Institute of the Polish Academy of Sciences, Warsaw, Poland
Leszek Rutkowski
Częstochowa University of Technology, Częstochowa, Poland
Rafał Scherer
Częstochowa University of Technology, Częstochowa, Poland
Marcin Korytkowski
University of Alberta, Edmonton, AB, Canada
Witold Pedrycz
AGH University of Krakow, Kraków, Poland
Ryszard Tadeusiewicz
University of Louisville, Louisville, KY, USA
Jacek M. Zurada

About this paper

Cite this paper

Contreras, R.C., Viana, M.S., Guido, R.C. (2023). An Experimental Analysis on Mapping Strategies for Cepstral Coefficients Multi-projection in Voice Spoofing Detection Problem. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2023. Lecture Notes in Computer Science (LNAI), vol 14126. Springer, Cham. https://doi.org/10.1007/978-3-031-42508-0_27

DOI: https://doi.org/10.1007/978-3-031-42508-0_27
Published: 14 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42507-3
Online ISBN: 978-3-031-42508-0
eBook Packages: Computer Science, Computer Science (R0)
