Multimedia analysis for disguised voice and classification efficiency

Singh, Mahesh K.; Singh, A. K.; Singh, Narendra

doi:10.1007/s11042-018-6718-6

Published: 01 October 2018

Volume 78, pages 29395–29411 (2019)

Multimedia Tools and Applications

347 Accesses
30 Citations

Abstract

In multimedia analysis, electronic voice disguise is a speech-editing process in which the characteristics of a voice are changed. The frequency-spectrum characteristics of the speech signal change as well during electronic disguise. This paper proposes a method for deriving an algorithm that estimates the classification efficiency of a disguised voice relative to its normal voice. Voices are disguised using practical approaches that shift them by different numbers of semitones. Feature extraction based on Mel-frequency cepstral coefficients (MFCC), delta MFCC (ΔMFCC), and double-delta MFCC (ΔΔMFCC) computes the acoustic features and their statistical moments, namely the mean and the correlation coefficient. The acoustic features and their statistical moments are then passed to several different classifiers, and the efficiency of disguised-voice classification is obtained for each classifier.
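The feature pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that an MFCC matrix of shape (frames, coefficients) has already been computed by some front end (random numbers stand in for it here), that ΔMFCC and ΔΔMFCC are obtained with the common regression formula over a window of two frames on each side, and that "correlation coefficient" means the pairwise correlation between coefficient tracks. The function names `semitone_ratio`, `delta`, `statistical_moments`, and `utterance_vector` are hypothetical.

```python
import numpy as np


def semitone_ratio(k):
    """Frequency scaling factor for a shift of k semitones (equal temperament)."""
    return 2.0 ** (k / 12.0)


def delta(feats, width=2):
    """Regression-based time derivative of a (n_frames, n_coeffs) feature array.

    Assumed formula: d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2),
    with the first and last frames repeated at the edges.
    """
    n_frames = feats.shape[0]
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros(feats.shape, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]
        behind = padded[width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom


def statistical_moments(feats):
    """Per-coefficient means followed by pairwise correlation coefficients."""
    means = feats.mean(axis=0)
    corr = np.corrcoef(feats, rowvar=False)          # (n_coeffs, n_coeffs)
    upper = np.triu_indices(feats.shape[1], k=1)     # each pair counted once
    return np.concatenate([means, corr[upper]])


def utterance_vector(mfcc):
    """Stack the moments of MFCC, delta MFCC, and double-delta MFCC."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return np.concatenate([statistical_moments(x) for x in (mfcc, d1, d2)])


# Placeholder input: 200 frames of 12 coefficients (stand-in for real MFCCs).
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((200, 12))
vec = utterance_vector(mfcc)
print(vec.shape)  # (234,): 3 streams x (12 means + 66 correlations)
```

A vector like `vec` would be computed once per utterance, for both normal and semitone-shifted recordings, and then handed to whichever classifiers are being compared. `semitone_ratio(k)` gives the factor by which frequencies are scaled for a disguise of `k` semitones, for example 2.0 for a shift of one octave (12 semitones).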



Author information

Authors and Affiliations

Department of ECE, JUET, Guna, MP, India
Mahesh K. Singh & Narendra Singh
Department of ECE, Thapar University, Patiala, Punjab, India
A. K. Singh

Corresponding author

Correspondence to Mahesh K. Singh.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Singh, M.K., Singh, A.K. & Singh, N. Multimedia analysis for disguised voice and classification efficiency. Multimed Tools Appl 78, 29395–29411 (2019). https://doi.org/10.1007/s11042-018-6718-6

Received: 04 June 2018
Revised: 25 August 2018
Accepted: 21 September 2018
Published: 01 October 2018
Issue Date: October 2019
DOI: https://doi.org/10.1007/s11042-018-6718-6
