Abstract
With the advent of conversational voice assistants such as Alexa, Siri and Google Assistant, natural-language conversational systems, including chatbots and voice-recognition interfaces, are at a new high, and determining the age of a speaker is critical for setting the pertinent context. Age can be inferred from the speech signal through factors such as the physical attributes of the voice, linguistic attributes, frequency and speech rate. This paper discusses extracting spectral features of speech, such as cepstral coefficients, spectral decrease, centroid, flatness, spectral entropy, jitter and shimmer, as inputs that help classify speaker age through deep-learning techniques. A novel approach is presented, along with an implementation model using a Deep Neural Network and a Convolutional Neural Network to classify the features using three different classifiers. The results obtained from the proposed system outline its performance in speaker age recognition.
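To make the feature vocabulary concrete, the following is a minimal NumPy sketch (not the paper's implementation) of three of the named frame-level spectral features: spectral centroid, spectral flatness and spectral entropy. Frame length, sampling rate and the test tone are illustrative assumptions.

```python
import numpy as np

def spectral_features(frame, sr=16000):
    """Compute centroid, flatness and entropy of one analysis frame."""
    # Magnitude spectrum of a Hann-windowed frame.
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)

    # Spectral centroid: magnitude-weighted mean frequency (Hz).
    centroid = np.sum(freqs * spec) / np.sum(spec)

    # Spectral flatness: geometric mean / arithmetic mean of magnitudes
    # (close to 1 for noise-like frames, near 0 for tonal frames).
    flatness = np.exp(np.mean(np.log(spec + 1e-12))) / (np.mean(spec) + 1e-12)

    # Spectral entropy of the normalised power distribution (bits).
    power = spec ** 2
    p = power / np.sum(power)
    entropy = -np.sum(p * np.log2(p + 1e-12))
    return centroid, flatness, entropy

# Example: a single 1024-sample frame of a 440 Hz tone.
sr = 16000
t = np.arange(1024) / sr
c, f, e = spectral_features(np.sin(2 * np.pi * 440 * t), sr)
```

For a pure tone the centroid sits near the tone frequency and the flatness and entropy are low; noisier, breathier voices push flatness and entropy upward, which is one reason such features carry age-related information.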

Data availability
TIMIT, Switchboard and CMU Kids corpora.
Acknowledgment
I am grateful for the support and guidance provided by Prof. Dr. E. Chandra Eswaran throughout my research work.
Funding
This research work was supported by RUSA 2.0-BEICH.
Author information
Contributions
Both authors conceived the presented idea, developed the theory and performed the computations. Dr. E. Chandra encouraged K. Karthika to investigate the research problem and supervised the findings of this work. All authors discussed the results and contributed to the final manuscript. This work has been filed for Indian intellectual property protection under Patent Application Number 201841032399.
Ethics declarations
Conflict of Interest
The authors declare that they have no competing interests.
Replication of results
No replicated results are presented.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Kuppusamy, K., Eswaran, C. Convolutional and Deep Neural Networks based techniques for extracting the age-relevant features of the speaker. J Ambient Intell Human Comput 13, 5655–5667 (2022). https://doi.org/10.1007/s12652-021-03238-1