Constant-Q Based Harmonic and Pitch Features for Normal vs. Pathological Infant Cry Classification

Pusuluri, Aditya; Kachhi, Aastha; Patil, Hemant A.

doi:10.1007/978-3-031-48312-7_33

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14339)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Classification of normal vs. pathological infant cry is a socially relevant and challenging problem. Many feature sets, such as Mel Frequency Cepstral Coefficients (MFCC), Linear Frequency Cepstral Coefficients (LFCC), and Constant Q Cepstral Coefficients (CQCC), have been used for this task. However, these features do not effectively represent the spectral and pitch components of a spectrum together, which leaves scope for improvement. Moreover, infant cry can be considered a melodic sound, implying that fundamental frequency and timbre-based features also carry vital information. This work proposes Constant Q Harmonic Coefficients (CQHC) and Constant Q Pitch Coefficients (CQPC), extracted by decomposing the Constant Q Transform (CQT) spectrum, for infant cry classification. A Convolutional Neural Network (CNN) is used as the classifier, along with traditional classifiers such as Gaussian Mixture Models (GMM) and Support Vector Machines (SVM). The results with the CNN classifier are compared against the MFCC, LFCC, and CQCC feature sets as baselines. The feature-level fusion of MFCC with log-CQHC and of MFCC with log-CQPC achieved 5-fold cross-validation accuracies of 98.73% and 98.96%, respectively, surpassing the baseline MFCC. Furthermore, the fusion of MFCC with the log-CQHC and log-CQPC feature sets improved classification accuracy by 3%, 4.7%, and 5.85% compared with the baseline MFCC, LFCC, and CQCC feature sets, respectively. Extensive experiments with three classifier structures, namely GMM, SVM, and CNN, indicate superior results for the proposed feature extraction techniques.
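The abstract describes CQHC and CQPC as obtained by decomposing the CQT spectrum, then log-compressed and fused with MFCC at the feature level. The sketch below illustrates one way to do this with NumPy, under stated assumptions. The decomposition and harmonic sampling follow, as we understand it, the CQHC recipe published by Rafii (IEEE Signal Processing Magazine, 2022). The following are our own illustrative assumptions, not the authors' confirmed method: `cqpc_hypothetical`, which applies the same harmonic sampling to the pitch component; the `abs`-then-log compression; and frame-wise concatenation for fusion. The input `cqt_mag` is a CQT magnitude matrix of shape (bins, frames). All function names are hypothetical.

```python
import numpy as np

EPS = np.finfo(float).eps


def cqt_deconv(cqt_mag):
    """Split a CQT magnitude spectrogram (n_bins x n_frames) into a spectral
    (timbre/envelope) component and a pitch component.

    On a CQT, a pitch change roughly shifts the pattern along the
    log-frequency axis. The FFT magnitude along that axis ignores the shift
    (spectral component); the FFT phase encodes it (pitch component).
    """
    n_bins = cqt_mag.shape[0]
    # FFT along frequency, zero-padded to 2*n_bins - 1 so shifts do not wrap
    ft = np.fft.fft(cqt_mag, n=2 * n_bins - 1, axis=0)
    ft_abs = np.abs(ft)
    # Magnitude only: shift-invariant spectral component
    spectral = np.real(np.fft.ifft(ft_abs, axis=0))[:n_bins, :]
    # Phase only: peaks at the log-frequency offset, i.e. the pitch position
    pitch = np.real(np.fft.ifft(ft / (ft_abs + EPS), axis=0))[:n_bins, :]
    return spectral, pitch


def harmonic_indices(bins_per_octave, n_coeffs):
    """CQT bin offsets of harmonics 1..n_coeffs (harmonic k is log2(k) octaves up)."""
    k = np.arange(1, n_coeffs + 1)
    return np.round(bins_per_octave * np.log2(k)).astype(int)


def sample_at_harmonics(component, bins_per_octave=12, n_coeffs=20):
    """Keep only the rows of a component that sit at harmonic offsets."""
    idx = harmonic_indices(bins_per_octave, n_coeffs)
    if idx[-1] >= component.shape[0]:
        raise ValueError("CQT has too few bins for the requested coefficients")
    return component[idx, :]


def cqhc(cqt_mag, bins_per_octave=12, n_coeffs=20):
    """Constant-Q harmonic coefficients: spectral component at harmonic offsets."""
    spectral, _ = cqt_deconv(cqt_mag)
    return sample_at_harmonics(spectral, bins_per_octave, n_coeffs)


def cqpc_hypothetical(cqt_mag, bins_per_octave=12, n_coeffs=20):
    """HYPOTHETICAL pitch-coefficient variant: samples the pitch component at
    the same harmonic offsets. Not necessarily the paper's CQPC definition."""
    _, pitch = cqt_deconv(cqt_mag)
    return sample_at_harmonics(pitch, bins_per_octave, n_coeffs)


def log_compress(coeffs):
    """Assumed log compression; abs() because both components can be negative."""
    return np.log(np.abs(coeffs) + EPS)


def fuse_features(*feature_mats):
    """Feature-level fusion (assumed): stack per-frame features (rows) of
    matrices that share the same number of frames (columns)."""
    n_frames = {m.shape[1] for m in feature_mats}
    if len(n_frames) != 1:
        raise ValueError("all feature matrices must have the same number of frames")
    return np.vstack(feature_mats)
```

In practice, `cqt_mag` could come from `np.abs(librosa.cqt(y, sr=sr, hop_length=512, n_bins=84, bins_per_octave=12))` and the MFCC matrix from `librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=512)`. Using the same hop length should keep the frame axes aligned, but check the shapes before fusing. Working on a precomputed magnitude matrix keeps the sketch independent of any particular CQT implementation.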

Acknowledgements

The authors extend their heartfelt gratitude to the National Institute of Astrophysics, Optics and Electronics (INAOE), CONACYT Mexico, for generously providing access to the Baby Chillanto database, which has been of great value to their research. They would also like to express their appreciation to the Ministry of Electronics and Information Technology (MeitY), New Delhi, Government of India, for sponsoring the consortium project titled ‘Speech Technologies in Indian Languages’ under the ‘National Language Translation Mission (NLTM): BHASHINI’. This project, subtitled ‘Building Assistive Speech Technologies for the Challenged’, carries the Grant ID 11(1)2022-HCC (TDIL).

Author information

Authors and Affiliations

Speech Research Lab, DA-IICT, Gandhinagar, Gujarat, India
Aditya Pusuluri, Aastha Kachhi & Hemant A. Patil

Corresponding author

Correspondence to Aditya Pusuluri.

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Koneru Lakshmaiah Education Foundation, Vaddeswaram, India
K. Samudravijaya
Indian Institute of Information Technology Dharwad, Dharwad, India
K. T. Deepak
Indian Institute of Technology Dharwad, Dharwad, India
Rajesh M. Hegde
KIIT Group of Colleges, Gurugram, India
Shyam S. Agrawal
Indian Institute of Technology Dharwad, Dharwad, India
S. R. Mahadeva Prasanna

About this paper

Cite this paper

Pusuluri, A., Kachhi, A., Patil, H.A. (2023). Constant-Q Based Harmonic and Pitch Features for Normal vs. Pathological Infant Cry Classification. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science (LNAI), vol 14339. Springer, Cham. https://doi.org/10.1007/978-3-031-48312-7_33

DOI: https://doi.org/10.1007/978-3-031-48312-7_33
Published: 22 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48311-0
Online ISBN: 978-3-031-48312-7
eBook Packages: Computer Science, Computer Science (R0)
