Abstract
Classifying infant cries as pathological or normal can help infer an infant’s health condition. Such an approach can be beneficial in many situations and may even save infants’ lives. In this paper, we propose a novel system for classifying infant cries based on Modified Group Delay Cepstral Coefficients (MGDCC), and we investigate the generalizability of the proposed MGDCC features. A Convolutional Neural Network (CNN) is used as the pattern classifier. The proposed MGDCC features perform better than widely used spectral features, such as Mel Frequency Cepstral Coefficients (MFCC), Linear Frequency Cepstral Coefficients (LFCC), and Group Delay Cepstral Coefficients (GDCC). Experiments are performed on two datasets, namely the Baby Chillanto (D1) dataset and the DA-IICT Infant Cry (D2) corpus, under several evaluation factors: noise robustness under signal degradation conditions, a cross-database scenario, and analysis of the latency period. We obtain a 2.25% increase in accuracy over the best previously reported accuracy for this task. The better performance of MGDCC may be due to its capability to implicitly capture time dependencies in the sequence of audio samples via Fourier transform phase information.
Notes
1. GitHub repository: https://github.com/ARTHARKING55/CNN_ICPR_MGDCC
References
Mehler, J., Jusczyk, P., Lambertz, G., Halsted, N., Bertoncini, J., Amiel-Tison, C.: A precursor of language acquisition in young infants. Cognition 29(2), 143–178 (1988)
Engelsma, J.J., Deb, D., Cao, K., Bhatnagar, A., Sudhish, P.S., Jain, A.K.: Infant-ID: fingerprints for global good. IEEE Trans. Pattern Anal. Mach. Intell. 44(7), 3543–3559 (2021)
Bonneh, Y.S., Levanon, Y., Dean-Pardo, O., Lossos, L., Adini, Y.: Abnormal speech spectrum and increased pitch variability in young autistic children. Front. Hum. Neurosci. 4, 237 (2011)
Makhoul, J.: Linear prediction: a tutorial review. Proc. IEEE 63(4), 561–580 (1975)
Uthiraa, S., Kachhi, A., Patil, H.A.: Linear frequency residual features for infant cry classification. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds.) SPECOM 2023. LNCS, vol. 14338, pp. 550–561. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-48309-7_44
Dewi, S.P., Prasasti, A.L., Irawan, B.: The study of baby crying analysis using MFCC and LFCC in different classification methods. In: 2019 IEEE International Conference on Signals and Systems (ICSigSys), Bandung, Indonesia, pp. 18–23 (2019)
Pusuluri, A., Kachhi, A., Patil, H.A.: Analysis of time-averaged feature extraction techniques on infant cry classification. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds.) SPECOM 2022. LNCS, vol. 13721, pp. 590–603. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20980-2_50
Abbaskhah, A., Sedighi, H., Marvi, H.: Infant cry classification by MFCC feature extraction with MLP and CNN structures. Biomed. Signal Process. Control 86, 105261 (2023)
Zhu, D., Paliwal, K.K.: Product of power spectrum and group delay function for speech recognition. In: 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, p. I-125 (2004)
Murthy, H.A., Gadde, V.: The modified group delay function and its application to phoneme recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hong Kong, vol. 1, p. I-68 (2003)
Hegde, R.M., Murthy, H.A., Gadde, V.R.R.: Significance of the modified group delay feature in speech recognition. IEEE Trans. Audio Speech Lang. Process. 15(1), 190–202 (2006)
Alsteris, L.D., Paliwal, K.K.: Evaluation of the modified group delay feature for isolated word recognition. In: Proceedings of the Eighth International Symposium on Signal Processing and Its Applications, Sydney, Australia, vol. 2, pp. 715–718 (2005)
Chittora, A., Patil, H.A.: Modified group delay based features for asthma and HIE infant cries classification. In: Král, P., Matoušek, V. (eds.) TSD 2015. LNCS (LNAI), vol. 9302, pp. 595–602. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24033-6_67
O’Sullivan, J., et al.: Automatic speech recognition for ASD using the open-source whisper model from OpenAI (2023)
Feng, T., Narayanan, S.: Foundation model assisted automatic speech emotion recognition: transcribing, annotating, and augmenting. In: ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, pp. 12116–12120 (2024)
Yang, Y., et al.: A robust audio deepfake detection system via multi-view feature. In: ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, pp. 13131–13135 (2024)
Charola, M., Kachhi, A., Patil, H.A.: Whisper encoder features for infant cry classification. In: Proceedings of INTERSPEECH, Dublin, Ireland, vol. 2023, pp. 1773–1777 (2023)
Hannan, E., Thomson, P.: Estimating group delay. Biometrika 60(2), 241–253 (1973)
Murthy, H.A., Yegnanarayana, B.: Group delay functions and its applications in speech technology. Sadhana 36, 745–782 (2011)
Zhu, X., Li, Y., Yong, S., Zhuang, Z.: A novel definition and measurement method of group delay and its application. IEEE Trans. Instrum. Meas. 58(1), 229–233 (2008)
Reyes-Galaviz, O.F., Cano-Ortiz, S.D., Reyes-García, C.A.: Evolutionary-neural system to classify infant cry units for pathologies identification in recently born babies. In: 2008 Seventh Mexican International Conference on Artificial Intelligence, pp. 330–335. Cambridge (2008)
Reyes-Galaviz, O.F., Cano-Ortiz, S.D., Reyes-García, C.A.: Validation of the cry unit as primary element for cry analysis using an evolutionary-neural approach. In: 2008 Mexican International Conference on Computer Science, Baja California, Mexico, pp. 261–267 (2008)
Chittora, A., Patil, H.A.: Data collection and corpus design for analysis of normal and pathological infant cry. In: 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), Gurugram, India, pp. 1–6 (2013)
Zheng, F., Zhang, G., Song, Z.: Comparison of different implementations of MFCC. J. Comput. Sci. Technol. 16, 582–589 (2001). Accessed 16 Apr 2024
Dewi, S.P., Prasasti, A.L., Irawan, B.: Analysis of LFCC feature extraction in baby crying classification using KNN. In: IEEE International Conference on Internet of Things and Intelligence System (IoTaIS), Hong Kong, pp. 86–91 (2019)
Joukov, N., Traeger, A., Iyer, R., Wright, C.P., Zadok, E.: Operating system profiling via latency analysis. In: OSDI, Seattle, WA, vol. 6, pp. 89–102 (2006)
Le, L., Kabir, A.N.M., Ji, C., Basodi, S., Pan, Y.: Using transfer learning, SVM, and ensemble classification to classify baby cries based on their spectrogram images. In: 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems Workshops (MASSW), Monterey, CA, USA, pp. 106–110 (2019)
Chunyan, J., Chen, M., Bin, L., Pan, Y.: Infant cry classification with graph convolutional networks. In: 2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS), Las Vegas, USA, pp. 322–327 (2021)
Parthasarathi, S.H.K., Padmanabhan, R., Murthy, H.A.: Robustness of group delay representations for noisy speech signals. Int. J. Speech Technol. 14, 361–368 (2011)
Acknowledgements
The authors especially thank Mr. Aditya PSS (JRF at the Speech Lab, DA-IICT) and the DA-IICT authorities for their helpful support in this study. The authors sincerely thank MeitY for funding this study under the project ‘BHASHINI’ (Grant ID: 11(1)2022-HCC(TDIL)).
Appendix: Noise Robustness of MGDCC
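Let \(\text{clean}_{\text{signal}}(x)\) be a clean signal, degraded by adding uncorrelated, additive noise \(\text{noise}(x)\) with zero mean and variance \(\sigma^2\). Then, the noisy signal \(\text{noisy}_{\text{signal}}(x)\) can be represented as,
\[ \text{noisy}_{\text{signal}}(x) = \text{clean}_{\text{signal}}(x) + \text{noise}(x) \tag{7} \]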
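Taking the Fourier transform and obtaining the power spectrum, we get,
\[ P_{\text{noisy}}\left(e^{j\omega}\right) = P_{\text{clean}}\left(e^{j\omega}\right) + \sigma^2(\omega) \tag{8} \]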
Two mutually exclusive frequency regions (higher and lower SNR) can be obtained from Eq. (8). For the lower signal-to-noise ratio (SNR) scenario, we examine frequencies \(\omega_0\) satisfying \(P_{\text{clean}}\left(e^{j\omega_0}\right) \ll \sigma^2\left(\omega_0\right)\), while for higher SNR, we focus on frequencies \(\omega_0\) where \(P_{\text{clean}}\left(e^{j\omega_0}\right) \gg \sigma^2\left(\omega_0\right)\) [29]. For low SNR, we have:
\[ P_{\text{noisy}}\left(e^{j\omega_0}\right) = \sigma^2\left(\omega_0\right)\left(1 + \frac{P_{\text{clean}}\left(e^{j\omega_0}\right)}{\sigma^2\left(\omega_0\right)}\right) \tag{9} \]
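Taking the logarithm on both sides of Eq. (9), using the Taylor series expansion, and neglecting higher-order terms, we get:
\[ \ln\left(P_{\text{noisy}}\left(e^{j\omega_0}\right)\right) \approx \ln\left(\sigma^2\left(\omega_0\right)\right) + \frac{P_{\text{clean}}\left(e^{j\omega_0}\right)}{\sigma^2\left(\omega_0\right)} \tag{10} \]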
Equation (10) can be further solved, and the GDF can be obtained as mentioned in [29]:
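Similarly, for higher SNR, we have:
\[ P_{\text{noisy}}\left(e^{j\omega_0}\right) = P_{\text{clean}}\left(e^{j\omega_0}\right)\left(1 + \frac{\sigma^2\left(\omega_0\right)}{P_{\text{clean}}\left(e^{j\omega_0}\right)}\right) \tag{12} \]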
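Taking the logarithm on both sides of Eq. (12), using the Taylor series expansion, and neglecting higher-order terms gives the expanded term:
\[ \ln\left(P_{\text{noisy}}\left(e^{j\omega_0}\right)\right) \approx \ln\left(P_{\text{clean}}\left(e^{j\omega_0}\right)\right) + \frac{\sigma^2\left(\omega_0\right)}{P_{\text{clean}}\left(e^{j\omega_0}\right)} \tag{13} \]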
Equation (13) can be solved for the GDF, and the resulting term can be represented as [29]:
The respective GDFs for these cases (Eq. (11) and Eq. (14)) can be summarized and represented as follows [29]:
The Fourier series coefficients of \(\ln \left( P_{\text{noisy}}\left( e^{j \omega _0}\right) \right) \) and \(\frac{1}{P_{\text{clean}}\left( e^{j \omega _0}\right) }\) are denoted by \(d_x\)’s and \(e_x\)’s, respectively. Equation (15) reveals that in the lower SNR scenario, the GDF is inversely proportional to the noise power, suggesting that the GDF effectively preserves peaks and valleys amidst additive noise. Conversely, for higher SNR values, the GDF is proportional to the noise power, although the noise power is lower than the signal power. These findings imply that the GDF tracks the signal spectrum rather than the noise spectrum.
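The two SNR regimes used in this appendix can be checked numerically. The sketch below is not taken from the paper: it assumes the additive model \(P_{\text{noisy}} = P_{\text{clean}} + \sigma^2\) and reads the logarithm and Taylor-expansion steps as the first-order approximations \(\ln(P_{\text{clean}} + \sigma^2) \approx \ln \sigma^2 + P_{\text{clean}}/\sigma^2\) for low SNR and \(\ln(P_{\text{clean}} + \sigma^2) \approx \ln P_{\text{clean}} + \sigma^2/P_{\text{clean}}\) for high SNR. The function name and the power values are hypothetical and chosen only for illustration.

```python
import numpy as np


def log_power_spectrum_approximations(p_clean, sigma2):
    """Return the exact ln(P_clean + sigma^2) and its two first-order forms.

    Assumed model (uncorrelated additive noise): P_noisy = P_clean + sigma^2.
    """
    p_clean = np.asarray(p_clean, dtype=float)
    exact = np.log(p_clean + sigma2)
    # Low SNR (P_clean << sigma^2): factor out sigma^2, then ln(1 + r) ~ r.
    low_snr = np.log(sigma2) + p_clean / sigma2
    # High SNR (P_clean >> sigma^2): factor out P_clean, then ln(1 + r) ~ r.
    high_snr = np.log(p_clean) + sigma2 / p_clean
    return exact, low_snr, high_snr


# Hypothetical clean-signal power values at three frequencies.
p_clean = np.array([0.5, 1.0, 2.0])

# Low-SNR case: noise power far above the signal power.
exact, low, _ = log_power_spectrum_approximations(p_clean, sigma2=100.0)
low_snr_error = np.max(np.abs(exact - low))

# High-SNR case: noise power far below the signal power.
exact, _, high = log_power_spectrum_approximations(p_clean, sigma2=0.01)
high_snr_error = np.max(np.abs(exact - high))

print(low_snr_error, high_snr_error)
```

The neglected term is second order in the power ratio, so each approximation is only as good as the separation between signal power and noise power at the frequency being examined.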
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shah, A.J., Chaudhari, H., Patil, H.A. (2025). Infant Cry Classification Using Modified Group Delay Cepstral Coefficients. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15314. Springer, Cham. https://doi.org/10.1007/978-3-031-78341-8_18
DOI: https://doi.org/10.1007/978-3-031-78341-8_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78340-1
Online ISBN: 978-3-031-78341-8
eBook Packages: Computer Science (R0)