An Experimental Analysis on Mapping Strategies for Cepstral Coefficients Multi-projection in Voice Spoofing Detection Problem

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2023)

Abstract

As certain electronic devices become cheaper, biometric user authentication systems are becoming increasingly common. In particular, microphone-based user recognition is already part of everyday tasks for many people, such as unlocking their own cell phone or accessing their own bank account at an ATM. However, recent works demonstrate that frauds such as presenting a recording of an authentic user to the recognition system can threaten the security of these systems. As a countermeasure, liveness detection techniques have emerged to determine whether the signal presented to the system comes from a human vocal tract or from a recording. Techniques based on the analysis of spectrograms, which are matrix representations of how the frequency content of the signal evolves over time, are widely used for this purpose. However, little study has been carried out on the use of mapping operators that turn such representations into feature vectors belonging to \(\mathbb {R}^n\), which makes them easier to define and to use with most binary classifiers. In this work, we perform an experimental analysis of different mapping operators applied in the construction of handcrafted features from spectrograms calculated by various techniques. In addition, we analyze the effect of fusing and projecting such features. Finally, we observed that the proposed material, a tool of simple definition and low complexity, achieves results competitive with those of the baselines in the field.
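To make the idea of a mapping operator concrete, the following is a minimal sketch of how a coefficient matrix might be reduced to a single vector in \(\mathbb {R}^n\). It is an illustration under stated assumptions, not the method evaluated in the paper: the function name `map_to_vector`, the choice of operators (mean, standard deviation, minimum, maximum), and the random input matrix are all hypothetical, since the abstract does not specify them.

```python
import numpy as np


def map_to_vector(coeffs, operators):
    """Reduce a (n_coefficients, n_frames) matrix to a 1-D feature vector.

    Each operator is applied along the time axis (axis=1), giving one value
    per coefficient, and the results are concatenated.
    """
    return np.concatenate([op(coeffs, axis=1) for op in operators])


# Hypothetical input: 13 cepstral coefficients over 200 frames.
# A real pipeline would compute this matrix from an audio signal.
rng = np.random.default_rng(0)
coeffs = rng.normal(size=(13, 200))

# Assumed operators, chosen only for illustration.
operators = [np.mean, np.std, np.min, np.max]

features = map_to_vector(coeffs, operators)
print(features.shape)  # (52,) = 13 coefficients x 4 operators
```

Whatever the number of frames, the output length depends only on the number of coefficients and operators, which is what lets recordings of different durations share one fixed-size input for a binary classifier.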

Notes

  1. https://scikit-learn.org/stable/

  2. https://superkogito.github.io/spafe/index.html

  3. https://librosa.org/

Acknowledgments

This study was financed in part by the São Paulo Research Foundation (FAPESP), processes #22/05186-4 (RCC) and #21/12407-4 (RCG).

Author information

Corresponding author

Correspondence to Rodrigo Colnago Contreras.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Contreras, R.C., Viana, M.S., Guido, R.C. (2023). An Experimental Analysis on Mapping Strategies for Cepstral Coefficients Multi-projection in Voice Spoofing Detection Problem. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2023. Lecture Notes in Computer Science, vol 14126. Springer, Cham. https://doi.org/10.1007/978-3-031-42508-0_27

  • DOI: https://doi.org/10.1007/978-3-031-42508-0_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42507-3

  • Online ISBN: 978-3-031-42508-0

  • eBook Packages: Computer Science, Computer Science (R0)