Abstract
This paper addresses speaker identification for multimedia applications when the voice is disguised non-electronically. Identifying a speaker whose speech has been altered by physical variation is a difficult task in speech signal processing. Because non-electronic disguise changes the frequency spectrum of the speech signal, Mel-frequency cepstral coefficients (MFCC), delta Mel-frequency cepstral coefficients (ΔMFCC) and double-delta Mel-frequency cepstral coefficients (ΔΔMFCC) are used to characterize its spectral properties. A new algorithm is developed, based on MFCC acoustic feature extraction from text-dependent speech of all speakers, whose speech was then altered using six physical variation methods. The acoustic features, which include the correlation coefficients and the mean value, are extracted with the MFCC, ΔMFCC and ΔΔMFCC feature extraction methods. Different classifiers are then applied to the extracted features to separate non-electronically disguised voice from normal voice.
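As an illustration of the feature statistics named in the abstract, the following is a minimal Python sketch, not the authors' algorithm. It assumes an MFCC matrix of shape (frames, coefficients) is already available from some front end, and it uses the common regression formula for delta coefficients, which the abstract does not specify. The function names `delta`, `mean_vector` and `correlation`, the window width of 2, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np


def delta(features, width=2):
    """Regression-based delta over +/- `width` frames.

    `features` has shape (frames, coefficients). The window width and the
    edge padding are assumptions; the paper's settings are not given.
    """
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[0]
    # Repeat the first and last frames so every frame has full context.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]
        behind = padded[width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom


def mean_vector(features):
    """Mean value of each coefficient across all frames."""
    return np.asarray(features, dtype=float).mean(axis=0)


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    return float(np.corrcoef(a, b)[0, 1])


if __name__ == "__main__":
    # Synthetic stand-ins for MFCC matrices of a normal and a disguised
    # utterance (100 frames, 13 coefficients); real input would come from
    # an MFCC front end applied to recorded speech.
    rng = np.random.default_rng(0)
    normal = rng.normal(size=(100, 13))
    disguised = normal + 0.5 * rng.normal(size=(100, 13))

    for name, transform in [
        ("MFCC", lambda x: x),
        ("dMFCC", delta),
        ("ddMFCC", lambda x: delta(delta(x))),
    ]:
        r = correlation(mean_vector(transform(normal)),
                        mean_vector(transform(disguised)))
        print(f"{name}: correlation of mean vectors = {r:.3f}")
```

Here the double-delta features are obtained by applying `delta` twice, and the per-coefficient means of the normal and disguised utterances are compared by correlation. How such statistics feed the classifiers is described in the paper itself, not in this sketch.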
About this article
Cite this article
Singh, M.K., Singh, A.K. & Singh, N. Multimedia utilization of non-computerized disguised voice and acoustic similarity measurement. Multimed Tools Appl 79, 35537–35552 (2020). https://doi.org/10.1007/s11042-019-08329-y
DOI: https://doi.org/10.1007/s11042-019-08329-y