Abstract
This paper describes a study of the art of mimicry as practiced by professional mimicry artists when imitating the speech characteristics of known persons, and explores the possibility of detecting whether a given speech sample is genuine or an imitation. A systematic approach was followed for collecting three categories of speech data, namely the original speech of the mimicry artists, their speech while mimicking chosen celebrities, and the original speech of those celebrities, in order to analyze variations in prosodic features. A method is described for automatically extracting the relevant prosodic features used to model speaker characteristics. Speech is first segmented into intonation phrases using speech/nonspeech classification, and each phrase is further segmented at valleys in the energy contour. Intonation, duration, and energy features are extracted for each of these segments. The intonation curve is approximated using Legendre polynomials. Other useful prosodic features include average jitter, average shimmer, total duration, voiced duration, and change in energy. The prosodic features extracted from the original speech of the celebrities and the mimicry artists are used to create speaker models with a Support Vector Machine (SVM), and detection of a given speech sample as genuine or impostor is attempted within an SVM-based speaker verification framework.
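The Legendre-polynomial approximation of the intonation (F0) contour mentioned above can be sketched as follows. This is a minimal illustration assuming NumPy; the function name `intonation_coeffs`, the polynomial order, and the synthetic contour are illustrative choices, not details taken from the paper.

```python
import numpy as np

def intonation_coeffs(f0, order=3):
    """Approximate a segment's F0 (intonation) contour with a
    low-order Legendre polynomial expansion; the coefficients
    serve as a compact shape descriptor of the contour."""
    # Map the segment's time axis onto [-1, 1], the natural
    # domain of the Legendre polynomials.
    x = np.linspace(-1.0, 1.0, len(f0))
    # Least-squares fit; coefficient 0 reflects the mean level,
    # coefficient 1 the overall slope, higher orders the curvature.
    return np.polynomial.legendre.legfit(x, f0, order)

# Hypothetical rising-falling F0 contour in Hz (synthetic data).
f0 = 120 + 30 * np.sin(np.linspace(0, np.pi, 50))
coeffs = intonation_coeffs(f0, order=3)
```

The handful of coefficients returned for each segment can then be concatenated with the duration and energy features to form the prosodic feature vector for SVM modeling.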
Acknowledgement
The authors thank the Kerala State Council for Science, Technology and Environment, India, for financial support for the study described in this paper.
Mary, L., Anish Babu, K.K. & Joseph, A. Analysis and detection of mimicked speech based on prosodic features. Int J Speech Technol 15, 407–417 (2012). https://doi.org/10.1007/s10772-012-9163-3