Abstract
Classification of normal vs. pathological infant cries is a socially relevant task, as crying is the only known mode of infant communication. Because the high-pitched, quasi-periodic excitation samples the vocal tract frequency response only at widely spaced pitch-source harmonics, commonly used spectral features have extremely poor spectral resolution for infant cries. This paper investigates the effect of excitation source-based features, captured using the Linear Prediction (LP) residual, for classification of normal vs. pathological infant cries. The performance of Linear Frequency Residual Cepstral Coefficients (LFRCC) was compared under matched train and test conditions against state-of-the-art feature sets, namely, Mel Frequency Cepstral Coefficients (MFCC) and Linear Frequency Cepstral Coefficients (LFCC), using a Gaussian Mixture Model (GMM) and a Convolutional Neural Network (CNN) as classifiers. This study also investigated the effect of LFRCC in cross-database (i.e., mismatched train and test conditions) and combined-database evaluation scenarios. LFRCC outperformed MFCC and LFCC by 24.9% and 17.43%, respectively, under mismatched conditions, and by 0.27%–1.11% on the combined database. The relatively better performance of the LFRCC feature set may be due to its capability to represent excitation source information, which is especially prominent in infant cries because formant structures are not yet well developed in the initial period of life.
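The LFRCC feature extraction referred to above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the 25 ms frame length, 10 ms hop, Hamming window, LP order of 12, 40 linearly spaced triangular filters, 20 cepstral coefficients, and frame-wise LP estimation by the autocorrelation method with Levinson-Durbin recursion are all illustrative choices. Each frame is inverse-filtered with the prediction-error filter A(z) to obtain the LP residual, whose power spectrum is passed through the linear-frequency filterbank, log-compressed, and decorrelated with a DCT-II. The helper names (`lpc`, `linear_filterbank`, `dct_matrix`, `lfrcc`) are hypothetical.

```python
import numpy as np


def lpc(frame, order):
    """LP coefficients a[0..order] with a[0] = 1 (autocorrelation method, Levinson-Durbin)."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12  # small floor avoids division by zero on silent frames
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def linear_filterbank(n_filters, n_fft, sr):
    """Triangular filters whose edges are equally spaced on a linear Hz axis (assumption)."""
    edges = np.linspace(0.0, sr / 2.0, n_filters + 2)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_filters, len(freqs)))
    for m in range(n_filters):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II matrix (first n_out rows)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    m = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    m[0] /= np.sqrt(2.0)
    return m


def lfrcc(signal, sr, frame_len=0.025, hop=0.010, order=12, n_filters=40, n_ceps=20):
    """Frame-level LFRCC-style vectors, shape (n_frames, n_ceps); parameters are illustrative."""
    flen = int(round(frame_len * sr))
    step = int(round(hop * sr))
    n_fft = 1 << (flen - 1).bit_length()  # next power of two >= frame length
    window = np.hamming(flen)
    fb = linear_filterbank(n_filters, n_fft, sr)
    dct = dct_matrix(n_ceps, n_filters)
    feats = []
    for start in range(0, len(signal) - flen + 1, step):
        frame = signal[start:start + flen] * window
        a = lpc(frame, order)
        residual = np.convolve(frame, a)[:flen]  # e[n] = sum_j a[j] x[n - j], i.e. filtering by A(z)
        power = np.abs(np.fft.rfft(residual, n=n_fft)) ** 2
        log_energy = np.log(fb @ power + 1e-10)  # floor keeps log finite for empty bands
        feats.append(dct @ log_energy)
    return np.array(feats)
```

For a GMM back end like the one named in the abstract, one GMM per class (normal and pathological) could be fitted to the frame-level vectors returned by `lfrcc`, and a test utterance assigned to the class with the higher average log-likelihood. Skipping the `lpc` and inverse-filtering steps (taking the spectrum of the windowed frame itself) would give an LFCC-style baseline for comparison.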
Acknowledgements
The authors would like to express their heartfelt gratitude to several entities for their invaluable contributions to this research. First and foremost, we extend our thanks to the National Institute of Astrophysics and Optical Electronics and CONACYT Mexico for graciously providing access to the Baby Chilanto database, which played a pivotal role in our statistical analyses. We are also deeply appreciative of the Ministry of Electronics and Information Technology (MeitY), New Delhi, Government of India, for their generous sponsorship of the consortium project titled ‘BHASHINI,’ with the subtitle ‘Building Assistive Speech Technologies for the Challenged’ (Grant ID: 11(1)2022-HCC (TDIL)).
Furthermore, we would like to acknowledge the leadership of Prof. Hema A. Murthy and Prof. S. Umesh from IIT Madras, who spearheaded the consortium project. Their guidance and expertise have been instrumental in shaping this research endeavor. Lastly, we extend our gratitude to the authorities at DA-IICT Gandhinagar, India, for their unwavering support and collaboration throughout the course of this study.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Uthiraa, S., Kachhi, A., Patil, H.A. (2023). Linear Frequency Residual Features for Infant Cry Classification. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_44
DOI: https://doi.org/10.1007/978-3-031-48309-7_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48308-0
Online ISBN: 978-3-031-48309-7
eBook Packages: Computer Science, Computer Science (R0)