Abstract
Classifying normal versus pathological infant cries is a socially relevant and challenging problem. Feature sets such as Mel Frequency Cepstral Coefficients (MFCC), Linear Frequency Cepstral Coefficients (LFCC), and Constant Q Cepstral Coefficients (CQCC) have been used for this task. However, these feature sets do not effectively represent the spectral and pitch components of a spectrum together, which leaves scope for improvement. Moreover, the infant cry can be considered a melodic sound, implying that fundamental-frequency and timbre-based features also carry vital information. This work proposes Constant Q Harmonic Coefficients (CQHC) and Constant Q Pitch Coefficients (CQPC), extracted by decomposing the Constant Q Transform (CQT) spectrum, for infant cry classification. A Convolutional Neural Network (CNN) is used as the classifier, along with traditional classifiers such as Gaussian Mixture Models (GMM) and Support Vector Machines (SVM). Results with the CNN classifier are compared against the MFCC, LFCC, and CQCC feature sets as baselines. Feature-level fusion of MFCC with log-CQHC and of MFCC with log-CQPC achieved 5-fold accuracies of 98.73% and 98.96%, respectively, surpassing the baseline MFCC. Furthermore, fusing MFCC with both log-CQHC and log-CQPC improved classification accuracy by 3%, 4.7%, and 5.85% compared with the baseline MFCC, LFCC, and CQCC feature sets, respectively. Extensive experiments with three classifier structures, namely GMM, SVM, and CNN, indicate superior results with the proposed feature extraction techniques.
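The decomposition and fusion described in the abstract can be illustrated with a minimal sketch. It assumes the Fourier-domain deconvolution commonly associated with constant-Q harmonic coefficients: the CQT magnitude spectrum is treated as a convolution, along the log-frequency axis, of a pitch-independent spectral component and a pitch component, which are separated by taking the magnitude and the phase of its Fourier transform. The abstract does not confirm that the authors use exactly this procedure. The function names, the `octave_resolution` parameter, and the array layout (bins by frames) are hypothetical choices for illustration, and the log compression used for log-CQHC and log-CQPC is omitted because the abstract does not specify it.

```python
import numpy as np


def decompose_cqt(cqt_mag, eps=1e-12):
    """Split a magnitude CQT spectrogram (bins x frames) into a
    pitch-independent spectral component and a pitch component.
    Assumed procedure, not confirmed by the abstract."""
    n_bins = cqt_mag.shape[0]
    # Zero-pad along frequency so the circular convolution implied by
    # the FFT approximates a linear convolution.
    ft = np.fft.fft(cqt_mag, n=2 * n_bins - 1, axis=0)
    mag = np.abs(ft)
    # Spectral (timbre) component: inverse FT of the magnitude only.
    spectral = np.real(np.fft.ifft(mag, axis=0))[:n_bins]
    # Pitch component: inverse FT of the phase-only (unit-modulus) spectrum.
    pitch = np.real(np.fft.ifft(ft / (mag + eps), axis=0))[:n_bins]
    return spectral, pitch


def harmonic_indices(octave_resolution, n_coeffs):
    """Log-frequency bin offsets of harmonics 1..n_coeffs."""
    k = np.arange(1, n_coeffs + 1)
    return np.round(octave_resolution * np.log2(k)).astype(int)


def harmonic_coefficients(spectral, octave_resolution, n_coeffs):
    """Sample the spectral component at the harmonic bin offsets."""
    return spectral[harmonic_indices(octave_resolution, n_coeffs), :]


def fuse_features(*feature_mats):
    """Feature-level fusion: stack coefficient matrices frame by frame.
    Every matrix must have the same number of frames (columns)."""
    return np.concatenate(feature_mats, axis=0)
```

In this sketch, a fused representation would be obtained by passing an MFCC matrix and the harmonic (or pitch) coefficients for the same frames to `fuse_features`; the result has one row per coefficient and one column per frame.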
Acknowledgements
The authors extend their heartfelt gratitude to the organizers of the National Institute of Astrophysics and Optical Electronics, CONACYT Mexico, for generously providing access to the Baby Chilanto database, which has proven to be statistically significant for their research. They would also like to express their appreciation to the Ministry of Electronics and Information Technology (MeitY), New Delhi, Government of India, for sponsoring the consortium project titled ‘Speech Technologies in Indian Languages’ as part of the ‘National Language Translation Mission (NLTM): BHASHINI’. This project, subtitled ‘Building Assistive Speech Technologies for the Challenged’, carries the Grant ID: 11(1)2022-HCC (TDIL).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pusuluri, A., Kachhi, A., Patil, H.A. (2023). Constant-Q Based Harmonic and Pitch Features for Normal vs. Pathological Infant Cry Classification. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14339. Springer, Cham. https://doi.org/10.1007/978-3-031-48312-7_33
DOI: https://doi.org/10.1007/978-3-031-48312-7_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48311-0
Online ISBN: 978-3-031-48312-7
eBook Packages: Computer Science, Computer Science (R0)