Robustness of Whisper Features for Infant Cry Classification

Charola, Monil; Rathod, Siddharth; Patil, Hemant A.

doi:10.1007/978-3-031-48312-7_34

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14339)

Included in the following conference series:

International Conference on Speech and Computer


Abstract

Early and correct identification of pathology from an infant's cry is an important and socially relevant research problem, as timely intervention can save the lives of many infants and improve their quality of life. This study proposes using the pre-trained Whisper Encoder Module (WEM) for the infant cry classification task, where Whisper refers to Web-scale Supervised Pretraining for Speech Recognition (WSPSR). The WEM features are compared with the state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) feature set for classifying normal vs. pathological infant cries. Additionally, we introduce a multi-class classification approach for pathological infant cries using Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks. Our study concludes that combining WEM features with Deep Neural Network (DNN) classifiers, such as CNN and Bi-LSTM, outperforms the MFCC feature set by a significant margin. In addition, a series of comprehensive experiments was conducted to assess noise robustness, and the results indicate that WEM features are more robust than MFCC. The experiments were performed using 10-fold cross-validation on the standard and statistically meaningful Baby Chilanto dataset, the In-House DA-IICT Corpus, and a combined dataset derived from these two datasets.
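A noise-robustness assessment of the kind mentioned in the abstract typically involves corrupting clean recordings with additive noise at controlled signal-to-noise ratios (SNRs) before feature extraction. The following is a minimal sketch of that mixing step only, using the Python standard library. The function names (`power`, `mix_at_snr`, `measured_snr_db`), the synthetic sine "cry", the Gaussian noise, and the SNR levels are illustrative assumptions; the abstract does not state which noise types or SNR values the authors used.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean, scaling the noise so the mixture has the requested SNR (dB)."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = power(clean)
    p_noise = power(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # SNR_dB = 10 * log10(p_clean / (gain^2 * p_noise)), solved for gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR of a mixture, recovered from the clean reference signal."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 16000  # assumed sampling rate, for illustration only
    # Stand-in for a cry recording: one second of a 450 Hz tone (hypothetical).
    clean = [math.sin(2 * math.pi * 450 * t / sample_rate) for t in range(sample_rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
    for snr_db in (20, 10, 0, -5):  # illustrative SNR levels
        noisy = mix_at_snr(clean, noise, snr_db)
        print(snr_db, round(measured_snr_db(clean, noisy), 2))
```

In such a setup, each noisy mixture would then be passed through both feature extractors (WEM and MFCC) so that the drop in classification accuracy can be compared per SNR level.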



Acknowledgements

The authors are thankful to the Ministry of Electronics and Information Technology (MeitY), New Delhi, Government of India, for sponsoring the project, National Language Translation Mission (NLTM): BHASHINI with the objective of Building Assistive Speech Technologies for the Challenged (Grant ID: 11(1)2022-HCC (TDIL)). They also thank the organizers, namely, the National Institute of Astrophysics and Optical Electronics, CONACYT Mexico for the statistically meaningful Baby Chilanto Database.

Author information

Authors and Affiliations

Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, India
Monil Charola, Siddharth Rathod & Hemant A. Patil


Corresponding authors

Correspondence to Monil Charola, Siddharth Rathod or Hemant A. Patil.

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Koneru Lakshmaiah Education Foundation, Vaddeswaram, India
K. Samudravijaya
Indian Institute of Information Technology Dharwad, Dharwad, India
K. T. Deepak
Indian Institute of Technology Dharwad, Dharwad, India
Rajesh M. Hegde
KIIT Group of Colleges, Gurugram, India
Shyam S. Agrawal
Indian Institute of Technology Dharwad, Dharwad, India
S. R. Mahadeva Prasanna

About this paper

Cite this paper

Charola, M., Rathod, S., Patil, H.A. (2023). Robustness of Whisper Features for Infant Cry Classification. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14339. Springer, Cham. https://doi.org/10.1007/978-3-031-48312-7_34


DOI: https://doi.org/10.1007/978-3-031-48312-7_34
Published: 22 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48311-0
Online ISBN: 978-3-031-48312-7
eBook Packages: Computer Science, Computer Science (R0)
