Constant-Q Based Harmonic and Pitch Features for Normal vs. Pathological Infant Cry Classification

  • Conference paper
Speech and Computer (SPECOM 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14339)

Abstract

Classification of normal vs. pathological infant cry is a socially relevant and challenging problem. Many feature sets, such as Mel Frequency Cepstral Coefficients (MFCC), Linear Frequency Cepstral Coefficients (LFCC), and Constant Q Cepstral Coefficients (CQCC), have been used for this task. However, these feature sets do not effectively represent the spectral and pitch components of a spectrum together, which leaves scope for improvement. In addition, the infant cry can be considered a melodic sound, which implies that fundamental frequency and timbre-based features also carry vital information. This work proposes Constant Q Harmonic Coefficients (CQHC) and Constant Q Pitch Coefficients (CQPC), extracted by decomposing the Constant Q Transform (CQT) spectrum, for infant cry classification. A Convolutional Neural Network (CNN) is used as the classifier, along with traditional classifiers such as Gaussian Mixture Models (GMM) and Support Vector Machines (SVM). The results obtained with the CNN classifier are compared against the MFCC, LFCC, and CQCC feature sets, which serve as baselines. The feature-level fusion of MFCC with log-CQHC and of MFCC with log-CQPC achieved 5-fold accuracies of 98.73% and 98.96%, respectively, surpassing the baseline MFCC. Furthermore, the fusion of MFCC with both the log-CQHC and log-CQPC feature sets improved classification accuracy by 3%, 4.7%, and 5.85% over the baseline MFCC, LFCC, and CQCC feature sets, respectively. Finally, extensive experiments with three classifier structures, namely GMM, SVM, and CNN, indicate superior results with the proposed feature extraction techniques.
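The abstract does not spell out how the CQT spectrum is decomposed, so the following is only a minimal sketch of one commonly described approach: on a log-frequency axis a change in pitch is approximately a shift, so the magnitude of the Fourier transform taken along the frequency axis gives a pitch-independent spectral component, while the remaining phase gives a pitch component. The function names (`decompose_cqt`, `harmonic_coefficients`, `harmonic_comb`), the bin counts, and the synthetic input are all assumptions made for illustration; they are not taken from the paper, and the authors' actual procedure may differ.

```python
import numpy as np


def decompose_cqt(cqt_mag, eps=1e-12):
    """Split a CQT magnitude spectrogram (bins x frames) into a
    spectral component and a pitch component (illustrative sketch)."""
    n_bins = cqt_mag.shape[0]
    # Zero-pad along the frequency axis so a shift does not wrap around.
    ft = np.fft.fft(cqt_mag, n=2 * n_bins - 1, axis=0)
    mag = np.abs(ft)
    # Magnitude only: unaffected by shifts along the log-frequency axis.
    spectral = np.real(np.fft.ifft(mag, axis=0))[:n_bins]
    # Phase only: keeps the position (pitch) information.
    pitch = np.real(np.fft.ifft(ft / (mag + eps), axis=0))[:n_bins]
    return spectral, pitch


def harmonic_coefficients(spectral, bins_per_octave, n_coeffs):
    """Sample the spectral component at the offsets of harmonics 1..n_coeffs.
    The k-th harmonic lies log2(k) octaves above the fundamental."""
    idx = np.round(bins_per_octave * np.log2(np.arange(1, n_coeffs + 1))).astype(int)
    return spectral[idx]


def harmonic_comb(n_bins, bins_per_octave, f0_bin, n_harmonics=6):
    """Hypothetical one-frame test input: harmonics with 1/k amplitudes."""
    frame = np.zeros((n_bins, 1))
    for k in range(1, n_harmonics + 1):
        frame[f0_bin + int(round(bins_per_octave * np.log2(k))), 0] = 1.0 / k
    return frame


# Same harmonic pattern at two different pitches (five bins apart).
low = harmonic_comb(84, 12, f0_bin=10)
high = harmonic_comb(84, 12, f0_bin=15)
spec_low, pitch_low = decompose_cqt(low)
spec_high, pitch_high = decompose_cqt(high)
cqhc = harmonic_coefficients(spec_low, bins_per_octave=12, n_coeffs=8)
```

In this toy setting the two spectral components should agree up to floating-point error even though the inputs differ in pitch, which is the property that makes such a component usable as a timbre-like feature. In practice the input would be a CQT magnitude spectrogram computed from the cry recording rather than a synthetic comb.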

Acknowledgements

The authors extend their heartfelt gratitude to the organizers of the National Institute of Astrophysics and Optical Electronics, CONACYT Mexico, for generously providing access to the Baby Chilanto database, which has proven to be statistically significant for their research. They would also like to express their appreciation to the Ministry of Electronics and Information Technology (MeitY) in New Delhi, Government of India, for their sponsorship of the consortium project titled ‘Speech Technologies in Indian Languages’ as part of the ‘National Language Translation Mission (NLTM): BHASHINI.’ This project, subtitled ‘Building Assistive Speech Technologies for the Challenged,’ carries the Grant ID: 11(1)2022-HCC (TDIL).

Author information

Corresponding author

Correspondence to Aditya Pusuluri.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pusuluri, A., Kachhi, A., Patil, H.A. (2023). Constant-Q Based Harmonic and Pitch Features for Normal vs. Pathological Infant Cry Classification. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14339. Springer, Cham. https://doi.org/10.1007/978-3-031-48312-7_33

  • DOI: https://doi.org/10.1007/978-3-031-48312-7_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48311-0

  • Online ISBN: 978-3-031-48312-7

  • eBook Packages: Computer Science, Computer Science (R0)
