Combining Evidence from Temporal and Spectral Features for Person Recognition Using Humming

Patil, Hemant A.; Madhavi, Maulik C.; Jain, Rahul; Jain, Alok K.

doi:10.1007/978-3-642-27387-2_40

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7143)

Included in the following conference series:

Indo-Japanese Conference on Perception and Machine Intelligence

Abstract

In this paper, the hum of a person is used to identify a speaker with the help of a machine. In addition, novel temporal features (such as zero-crossing rate and short-time energy) and spectral features (such as spectral centroid and spectral flux) are proposed for the person recognition task. Feature-level fusion of each of these features with the state-of-the-art spectral feature set, viz., Mel Frequency Cepstral Coefficients (MFCC), is found to give better recognition performance than MFCC alone. In addition, it is shown that the person identification rate is competitive with that of baseline MFCC. Furthermore, a reduction in equal error rate (EER) of 1.46% is obtained when a feature-level fusion system is employed that combines evidence from MFCC, temporal, and the proposed spectral features.
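The features named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no frame length, hop size, window, exact feature definitions, or MFCC configuration, so the textbook definitions, the Hamming window, and all function names below (`frame_signal`, `temporal_features`, `spectral_features`, `fuse`) are assumptions made for illustration. Feature-level fusion is shown here as simple per-frame concatenation with an MFCC matrix that is assumed to be computed elsewhere.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def temporal_features(frames):
    """Per-frame zero-crossing rate and short-time energy (textbook forms)."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # assumption: count exact zeros as positive
    # Fraction of adjacent sample pairs whose sign differs.
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    # Sum of squared samples in each frame.
    ste = np.sum(frames ** 2, axis=1)
    return zcr, ste

def spectral_features(frames, sample_rate):
    """Per-frame spectral centroid (Hz) and spectral flux."""
    window = np.hamming(frames.shape[1])  # assumption: Hamming window
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sample_rate)
    # Centroid: magnitude-weighted mean frequency.
    centroid = (mag @ freqs) / (mag.sum(axis=1) + 1e-12)
    # Flux: squared change between successive normalized magnitude spectra;
    # the first frame has no predecessor, so its flux is set to 0.
    norm = mag / (mag.sum(axis=1, keepdims=True) + 1e-12)
    flux = np.zeros(len(frames))
    flux[1:] = np.sum(np.diff(norm, axis=0) ** 2, axis=1)
    return centroid, flux

def fuse(mfcc, *extra):
    """Feature-level fusion: append each per-frame scalar as an extra column."""
    return np.hstack([mfcc] + [e[:, None] for e in extra])

# Usage on a synthetic tone standing in for a hum recording (hypothetical data).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 1000 * t + 0.3)
frames = frame_signal(signal, frame_len=200, hop=80)
zcr, ste = temporal_features(frames)
centroid, flux = spectral_features(frames, sample_rate)
mfcc = np.zeros((len(frames), 13))  # placeholder for MFCCs computed elsewhere
fused = fuse(mfcc, zcr, ste, centroid, flux)
```

Each row of `fused` is one frame's MFCC vector extended by the four scalar features, which is the form a single classifier would consume under this concatenation assumption. In practice the appended features would likely need scaling to a range comparable with the MFCCs before classification.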

Author information

Authors and Affiliations

Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India
Hemant A. Patil & Maulik C. Madhavi
Hindustan Institute of Technology and Management, Keetham, Agra, Uttar Pradesh, India
Rahul Jain
Nikhil Institute of Engineering and Management, Mathura, Uttar Pradesh, India
Alok K. Jain

Editor information

Editors and Affiliations

Indian Statistical Institute (ISI), Machine Intelligence Unit, Kolkata, India
Malay K. Kundu & Sushmita Mitra
Centre for Development of Advanced Computing (C-DAC), Kolkata, India
Debasis Mazumdar
Indian Statistical Institute (ISI), Kolkata, India
Sankar K. Pal

About this paper

Cite this paper

Patil, H.A., Madhavi, M.C., Jain, R., Jain, A.K. (2012). Combining Evidence from Temporal and Spectral Features for Person Recognition Using Humming. In: Kundu, M.K., Mitra, S., Mazumdar, D., Pal, S.K. (eds) Perception and Machine Intelligence. PerMIn 2012. Lecture Notes in Computer Science, vol 7143. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27387-2_40

DOI: https://doi.org/10.1007/978-3-642-27387-2_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27386-5
Online ISBN: 978-3-642-27387-2
eBook Packages: Computer Science, Computer Science (R0)
