Multimedia utilization of non-computerized disguised voice and acoustic similarity measurement

Multimedia Tools and Applications

Abstract

This paper addresses speaker identification for multimedia applications when the voice is disguised non-electronically. Identifying a speaker whose voice is disguised non-electronically, that is, by physical variation of speech, is a difficult task in speech signal processing. Because non-electronic disguise changes the frequency spectrum of the speech signal, Mel-frequency cepstral coefficients (MFCC), delta Mel-frequency cepstral coefficients (ΔMFCC) and double-delta Mel-frequency cepstral coefficients (ΔΔMFCC) are used to characterize its spectral properties. A new algorithm is developed based on acoustic feature extraction by the MFCC technique from text-dependent speech signals of all speakers, each of whom altered their speech using six physical variation methods. The acoustic features, which include the correlation coefficients and the mean value, are extracted by the MFCC, ΔMFCC and ΔΔMFCC feature extraction methods. Thereafter, different classifiers operating on the extracted features are used to distinguish non-electronically disguised voice from normal voice.
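The abstract does not spell out how the delta features, mean values and correlation coefficients are computed, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes the MFCC frames have already been extracted (as a list of per-frame coefficient lists), uses the common regression formula for deltas with a hypothetical window half-width of 2, and compares a normal and a disguised utterance by the Pearson correlation of their per-coefficient mean vectors. The function names `delta`, `coefficient_means`, `summary_vector` and `pearson` are illustrative, and the numbers in the usage section are made-up toy values.

```python
from math import sqrt
from statistics import mean


def delta(frames, width=2):
    """Regression-based delta of a frame sequence (frames[t][k]).

    d[t] = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), n = 1..width.
    Frames beyond either end are replaced by the nearest edge frame.
    """
    num_frames = len(frames)
    num_coeffs = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for k in range(num_coeffs):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def coefficient_means(frames):
    """Mean over time of each coefficient (one value per cepstral index)."""
    return [mean(column) for column in zip(*frames)]


def summary_vector(mfcc_frames):
    """Concatenate the mean MFCC, delta-MFCC and delta-delta-MFCC values."""
    d1 = delta(mfcc_frames)   # first-order differences (delta)
    d2 = delta(d1)            # second-order differences (delta-delta)
    return (coefficient_means(mfcc_frames)
            + coefficient_means(d1)
            + coefficient_means(d2))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy MFCC frames (3 coefficients per frame); values are illustrative only.
    normal = [[1.0, 0.5, -0.2], [1.2, 0.4, -0.1], [1.1, 0.6, -0.3], [1.3, 0.5, -0.2]]
    disguised = [[0.8, 0.7, -0.4], [0.9, 0.6, -0.2], [1.0, 0.9, -0.5], [0.7, 0.8, -0.3]]
    similarity = pearson(summary_vector(normal), summary_vector(disguised))
    print(f"correlation between summary vectors: {similarity:.3f}")
```

In practice the MFCC frames would come from a signal-processing library, and the resulting summary vectors or correlation values could be passed to whichever classifier is being evaluated; the choice of summary statistic and window width here is a design assumption made only for illustration.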

Author information

Corresponding author

Correspondence to Mahesh K. Singh.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Singh, M.K., Singh, A.K. & Singh, N. Multimedia utilization of non-computerized disguised voice and acoustic similarity measurement. Multimed Tools Appl 79, 35537–35552 (2020). https://doi.org/10.1007/s11042-019-08329-y

  • DOI: https://doi.org/10.1007/s11042-019-08329-y
