Linear Frequency Residual Features for Infant Cry Classification

  • Conference paper
Speech and Computer (SPECOM 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14338)

Abstract

Classification of normal vs. pathological infant cries is a socially relevant task, as crying is the only known mode of infant communication. Because the widely spaced harmonics of a high-pitched source sample the vocal tract response only quasi-periodically, the resulting spectrum gives commonly used features extremely poor spectral resolution. This paper investigates the effect of excitation source-based features, captured using the Linear Prediction (LP) residual, on the classification of normal vs. pathological infant cries. The performance of Linear Frequency Residual Cepstral Coefficients (LFRCC) was compared under matched conditions (of train and test data) against state-of-the-art feature sets, namely Mel Frequency Cepstral Coefficients (MFCC) and Linear Frequency Cepstral Coefficients (LFCC), using a Gaussian Mixture Model (GMM) and a Convolutional Neural Network (CNN) as classifiers. The study also investigated the effect of LFRCC in cross-database (i.e., mismatched conditions) and combined-database evaluation scenarios. LFRCC outperformed MFCC and LFCC by 24.9% and 17.43%, respectively, under mismatched conditions, and by 0.27%–1.11% on the combined database. The relatively better performance of the LFRCC feature set may be due to its ability to represent excitation source information, which is prominent in infant cries because formant structures are not well developed in the initial period of life.
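The abstract does not give the extraction details, so the following is only a minimal sketch of how an LFRCC-style feature could be computed, assuming the usual chain of LP inverse filtering, power spectrum, linearly spaced triangular filterbank, log, and DCT. The function names (lp_residual, linear_filterbank, dct2, lfrcc) and every default value (frame length, hop, LP order, filter and coefficient counts) are illustrative assumptions, not the authors' configuration.

```python
import numpy as np


def lp_residual(frame, order=12):
    """Inverse-filter one frame with its own LP coefficients (autocorrelation method)."""
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with light diagonal loading for stability.
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    R = r[idx] + 1e-9 * (r[0] + 1.0) * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Prediction-error filter A(z) = 1 - sum_k a_k z^-k applied to the frame.
    return np.convolve(frame, np.concatenate(([1.0], -a)))[:n]


def linear_filterbank(n_filters, n_fft):
    """Triangular filters with centres spaced uniformly in Hz (no mel warping)."""
    n_bins = n_fft // 2 + 1
    edges = np.linspace(0, n_bins - 1, n_filters + 2)
    bins = np.arange(n_bins)
    fb = np.zeros((n_filters, n_bins))
    for m in range(n_filters):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (bins - lo) / (mid - lo)
        falling = (hi - bins) / (hi - mid)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def dct2(x, n_out):
    """First n_out DCT-II coefficients along the last axis of x (unnormalised)."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return x @ basis.T


def lfrcc(signal, frame_len=400, hop=160, order=12,
          n_fft=512, n_filters=40, n_ceps=20):
    """Frame-wise cepstra of the LP residual; all defaults are assumed values."""
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(frame_len)
    fb = linear_filterbank(n_filters, n_fft)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        residual = lp_residual(frame, order)
        power = np.abs(np.fft.rfft(residual, n_fft)) ** 2
        feats.append(np.log(fb @ power + 1e-10))
    # reshape keeps a (0, n_filters) array when the signal is shorter than one frame.
    return dct2(np.array(feats).reshape(-1, n_filters), n_ceps)


if __name__ == "__main__":
    # Synthetic noise stands in for a cry recording; no real data is used here.
    demo = np.random.default_rng(0).standard_normal(16000)
    print(lfrcc(demo).shape)
```

Replacing linear_filterbank with a mel-spaced bank and skipping lp_residual would give an MFCC-like feature from the same skeleton, which is one way to see that the residual step is what targets excitation source information rather than the vocal tract envelope.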

Acknowledgements

The authors would like to express their heartfelt gratitude to several entities for their invaluable contributions to this research. First and foremost, we extend our thanks to the National Institute of Astrophysics and Optical Electronics and CONACYT Mexico for graciously providing access to the Baby Chilanto database, which played a pivotal role in our statistical analyses. We are also deeply appreciative of the Ministry of Electronics and Information Technology (MeitY), New Delhi, Government of India, for their generous sponsorship of the consortium project titled ‘BHASHINI,’ with the subtitle ‘Building Assistive Speech Technologies for the Challenged’ (Grant ID: 11(1)2022-HCC (TDIL)).

Furthermore, we would like to acknowledge the leadership of Prof. Hema A. Murthy and Prof. S. Umesh from IIT Madras, who spearheaded the consortium project. Their guidance and expertise have been instrumental in shaping this research endeavor. Lastly, we extend our gratitude to the authorities at DA-IICT Gandhinagar, India, for their unwavering support and collaboration throughout the course of this study.

Author information

Correspondence to Aastha Kachhi or Hemant A. Patil.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Uthiraa, S., Kachhi, A., Patil, H.A. (2023). Linear Frequency Residual Features for Infant Cry Classification. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_44

  • DOI: https://doi.org/10.1007/978-3-031-48309-7_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48308-0

  • Online ISBN: 978-3-031-48309-7

  • eBook Packages: Computer Science (R0)
