Multimedia utilization of non-computerized disguised voice and acoustic similarity measurement

Singh, Mahesh K.; Singh, A. K.; Singh, Narendra

doi:10.1007/s11042-019-08329-y

Published: 14 December 2019

Multimedia Tools and Applications, volume 79, pages 35537–35552 (2020)

Abstract

This paper addresses speaker identification for multimedia applications when the voice has been disguised non-electronically. Identifying a speaker whose speech has been altered by physical variation is a difficult task in speech signal processing. Because non-electronic disguise changes the frequency spectrum of the speech signal, Mel-frequency cepstrum coefficients (MFCC), delta Mel-frequency cepstrum coefficients (ΔMFCC) and double-delta Mel-frequency cepstrum coefficients (ΔΔMFCC) are used to characterize its spectral properties. A new algorithm is developed based on MFCC acoustic feature extraction from text-dependent speech signals of all speakers, each of whom also altered their speech using six physical variation methods. The acoustic features, which include the correlation coefficients and the mean value, are extracted with the MFCC, ΔMFCC and ΔΔMFCC feature extraction methods. Thereafter, different classifiers operating on the extracted features are used to distinguish non-electronically disguised voice from normal voice.
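
The abstract does not give the algorithm itself, so the following is only a minimal sketch of how delta features, mean values and a correlation coefficient could be computed once per-frame MFCC vectors are available. It assumes the common regression formula for delta coefficients with a window of two frames and a Pearson correlation between mean feature vectors; the function names, the window width and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
import math


def delta(frames, width=2):
    """Regression-based delta of per-frame coefficient vectors.

    Assumed formula: d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2),
    with edge frames repeated at the boundaries.
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = frames[min(t + n, n_frames - 1)][k]
                prv = frames[max(t - n, 0)][k]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def mean_vector(frames):
    """Mean of each coefficient across all frames."""
    n = len(frames)
    return [sum(f[k] for f in frames) / n for k in range(len(frames[0]))]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical MFCC frames (4 frames x 3 coefficients), not real data.
    normal = [[12.1, -3.4, 5.0], [11.8, -2.9, 4.6],
              [12.5, -3.1, 4.9], [12.0, -3.6, 5.2]]
    disguised = [[10.9, -1.8, 6.1], [11.2, -2.2, 5.7],
                 [10.5, -1.5, 6.4], [11.0, -2.0, 6.0]]

    d_normal = delta(normal)        # delta MFCC
    dd_normal = delta(d_normal)     # double-delta MFCC
    similarity = pearson(mean_vector(normal), mean_vector(disguised))
    print("correlation of mean MFCC vectors:", similarity)
```

In practice the MFCC frames would come from a feature extraction library rather than hand-written lists, and the resulting mean and correlation values would be passed to a classifier of the reader's choice.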

Author information

Authors and Affiliations

Department of ECE, JUET, Guna, MP, India
Mahesh K. Singh & Narendra Singh
Department of ECE, Thapar Institute of Engineering & Technology, Patiala, Punjab, India
A. K. Singh

Corresponding author

Correspondence to Mahesh K. Singh.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Singh, M.K., Singh, A.K. & Singh, N. Multimedia utilization of non-computerized disguised voice and acoustic similarity measurement. Multimed Tools Appl 79, 35537–35552 (2020). https://doi.org/10.1007/s11042-019-08329-y

Received: 11 January 2019
Revised: 11 July 2019
Accepted: 01 October 2019
Published: 14 December 2019
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11042-019-08329-y
