Infant Cry Classification Using Modified Group Delay Cepstral Coefficients

  • Conference paper
Pattern Recognition (ICPR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15314)

Abstract

Classification of pathological vs. normal infant cries is used to infer an infant’s health condition. Such an approach can be beneficial in many situations and can even help save infants’ lives. In this paper, we propose a novel system for classifying infant cries based on Modified Group Delay Cepstral Coefficients (MGDCC), and we investigate the generalizability of the proposed MGDCC features. A Convolutional Neural Network (CNN) is used as the pattern classifier. The proposed MGDCC features are found to perform better than widely used spectral features, such as Mel Frequency Cepstral Coefficients (MFCC), Linear Frequency Cepstral Coefficients (LFCC), and Group Delay Cepstral Coefficients (GDCC). Experiments are performed on two datasets, namely the Baby Chillanto (D1) dataset and the DA-IICT Infant Cry (D2) corpus, under several evaluation conditions: noise robustness under signal degradation, a cross-database scenario, and analysis of the latency period. We obtain a 2.25% increase in accuracy over the best existing accuracy reported for this task. The better performance of MGDCC may be due to its capability to implicitly capture time dependencies in the sequence of audio samples via Fourier transform phase information.

Notes

  1. GitHub repository: https://github.com/ARTHARKING55/CNN_ICPR_MGDCC

References

  1. Mehler, J., Jusczyk, P., Lambertz, G., Halsted, N., Bertoncini, J., Amiel-Tison, C.: A precursor of language acquisition in young infants. Cognition 29(2), 143–178 (1988)

  2. Engelsma, J.J., Deb, D., Cao, K., Bhatnagar, A., Sudhish, P.S., Jain, A.K.: Infant-ID: fingerprints for global good. IEEE Trans. Pattern Anal. Mach. Intell. 44(7), 3543–3559 (2021)

  3. Bonneh, Y.S., Levanon, Y., Dean-Pardo, O., Lossos, L., Adini, Y.: Abnormal speech spectrum and increased pitch variability in young autistic children. Front. Hum. Neurosci. 4, 237 (2011)

  4. Makhoul, J.: Linear prediction: a tutorial review. Proc. IEEE 63(4), 561–580 (1975)

  5. Uthiraa, S., Kachhi, A., Patil, H.A.: Linear frequency residual features for infant cry classification. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds.) SPECOM 2023. LNCS, vol. 14338, pp. 550–561. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-48309-7_44

  6. Dewi, S.P., Prasasti, A.L., Irawan, B.: The study of baby crying analysis using MFCC and LFCC in different classification methods. In: 2019 IEEE International Conference on Signals and Systems (ICSigSys), Bandung, Indonesia, pp. 18–23 (2019)

  7. Pusuluri, A., Kachhi, A., Patil, H.A.: Analysis of time-averaged feature extraction techniques on infant cry classification. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds.) SPECOM 2022. LNCS, vol. 13721, pp. 590–603. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20980-2_50

  8. Abbaskhah, A., Sedighi, H., Marvi, H.: Infant cry classification by MFCC feature extraction with MLP and CNN structures. Biomed. Signal Process. Control 86, 105261 (2023)

  9. Zhu, D., Paliwal, K.K.: Product of power spectrum and group delay function for speech recognition. In: 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, p. I-125 (2004)

  10. Murthy, H.A., Gadde, V.: The modified group delay function and its application to phoneme recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hong Kong, vol. 1, p. I-68 (2003)

  11. Hegde, R.M., Murthy, H.A., Gadde, V.R.R.: Significance of the modified group delay feature in speech recognition. IEEE Trans. Audio Speech Lang. Process. 15(1), 190–202 (2006)

  12. Alsteris, L.D., Paliwal, K.K.: Evaluation of the modified group delay feature for isolated word recognition. In: Proceedings of the Eighth International Symposium on Signal Processing and Its Applications, Sydney, Australia, vol. 2, pp. 715–718 (2005)

  13. Chittora, A., Patil, H.A.: Modified group delay based features for asthma and HIE infant cries classification. In: Král, P., Matoušek, V. (eds.) TSD 2015. LNCS (LNAI), vol. 9302, pp. 595–602. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24033-6_67

  14. O’Sullivan, J., et al.: Automatic speech recognition for ASD using the open-source whisper model from OpenAI (2023)

  15. Feng, T., Narayanan, S.: Foundation model assisted automatic speech emotion recognition: transcribing, annotating, and augmenting. In: ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, pp. 12116–12120 (2024)

  16. Yang, Y., et al.: A robust audio deepfake detection system via multi-view feature. In: ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, pp. 13131–13135 (2024)

  17. Charola, M., Kachhi, A., Patil, H.A.: Whisper encoder features for infant cry classification. In: Proceedings of INTERSPEECH, Dublin, Ireland, vol. 2023, pp. 1773–1777 (2023)

  18. Hannan, E., Thomson, P.: Estimating group delay. Biometrika 60(2), 241–253 (1973)

  19. Murthy, H.A., Yegnanarayana, B.: Group delay functions and its applications in speech technology. Sadhana 36, 745–782 (2011)

  20. Zhu, X., Li, Y., Yong, S., Zhuang, Z.: A novel definition and measurement method of group delay and its application. IEEE Trans. Instrum. Meas. 58(1), 229–233 (2008)

  21. Reyes-Galaviz, O.F., Cano-Ortiz, S.D., Reyes-García, C.A.: Evolutionary-neural system to classify infant cry units for pathologies identification in recently born babies. In: 2008 Seventh Mexican International Conference on Artificial Intelligence, pp. 330–335. Cambridge (2008)

  22. Reyes-Galaviz, O.F., Cano-Ortiz, S.D., Reyes-García, C.A.: Validation of the cry unit as primary element for cry analysis using an evolutionary-neural approach. In: 2008 Mexican International Conference on Computer Science, Baja California, Mexico, pp. 261–267 (2008)

  23. Chittora, A., Patil, H.A.: Data collection and corpus design for analysis of normal and pathological infant cry. In: 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), Gurugram, India, pp. 1–6 (2013)

  24. Zheng, F., Zhang, G., Song, Z.: Comparison of different implementations of MFCC. J. Comput. Sci. Technol. 16, 582–589 (2001). Accessed 16 Apr 2024

  25. Dewi, S.P., Prasasti, A.L., Irawan, B.: Analysis of LFCC feature extraction in baby crying classification using KNN. In: IEEE International Conference on Internet of Things and Intelligence System (IoTaIS), Hong Kong, pp. 86–91 (2019)

  26. Joukov, N., Traeger, A., Iyer, R., Wright, C.P., Zadok, E.: Operating system profiling via latency analysis. In: OSDI, Seattle, WA, vol. 6, pp. 89–102 (2006)

  27. Le, L., Kabir, A.N.M., Ji, C., Basodi, S., Pan, Y.: Using transfer learning, SVM, and ensemble classification to classify baby cries based on their spectrogram images. In: 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems Workshops (MASSW), Monterey, CA, USA, pp. 106–110 (2019)

  28. Chunyan, J., Chen, M., Bin, L., Pan, Y.: Infant cry classification with graph convolutional networks. In: 2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS), Las Vegas, USA, pp. 322–327 (2021)

  29. Parthasarathi, S.H.K., Padmanabhan, R., Murthy, H.A.: Robustness of group delay representations for noisy speech signals. Int. J. Speech Technol. 14, 361–368 (2011)

Acknowledgements

The authors especially thank Mr. Aditya PSS. (JRF at Speech Lab, DA-IICT) and the DA-IICT authorities for their helpful support in this study. The authors sincerely thank MeitY for funding this study under the project ‘BHASHINI’ (Grant ID: 11(1)2022-HCC(TDIL)).

Author information

Corresponding author

Correspondence to Arth J. Shah.

Appendix: Noise Robustness of MGDCC

Let \(\operatorname {clean}_{\text {signal }}(x)\) be a clean signal that is degraded by uncorrelated additive noise \(\operatorname {noise}(x)\) with zero mean and variance \(\sigma ^2\). Then, the noisy signal \(\operatorname {noisy}_{\text {signal }}(x)\) can be represented as,

$$\begin{aligned} \operatorname {noisy}_{\text {signal }}(x)=\text { clean }_{\text {signal }}(x)+\operatorname {noise}(x) . \end{aligned}$$
(7)
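
To illustrate Eq. (7), the following minimal sketch adds zero-mean noise to a synthetic tone at a chosen SNR. The Gaussian noise model, the 200 Hz test tone, the 8 kHz sampling rate, and the helper name `add_noise` are assumptions made for illustration; they are not taken from the paper's experiments.

```python
import math
import random


def add_noise(clean, snr_db, seed=0):
    """Return (noisy, noise), where noisy = clean + zero-mean Gaussian noise at snr_db."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in clean) / len(clean)
    # SNR(dB) = 10 log10(signal_power / noise_var)  =>  solve for the noise variance
    noise_var = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_var)
    noise = [rng.gauss(0.0, sigma) for _ in clean]
    noisy = [s + n for s, n in zip(clean, noise)]
    return noisy, noise


# Hypothetical test signal: a 200 Hz sinusoid sampled at 8 kHz.
fs = 8000
clean = [math.sin(2 * math.pi * 200 * n / fs) for n in range(4000)]
noisy, noise = add_noise(clean, snr_db=10)

# Empirical SNR of this particular noise realisation, in dB.
measured_snr = 10 * math.log10(
    sum(s * s for s in clean) / sum(n * n for n in noise)
)
```

The measured SNR only approximates the requested value, because the sample variance of a finite noise realisation fluctuates around \(\sigma ^2\).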

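Taking the Fourier transform and computing the power spectrum, and noting that the cross terms vanish on average because the noise is uncorrelated with the clean signal, we get,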

$$\begin{aligned} P_{\text {noisy }}\left( e^{j \omega _0}\right) =P_{\text {clean }}\left( e^{j \omega _0}\right) +P_{\text {noise }}\left( e^{j \omega _0}\right) . \end{aligned}$$
(8)
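
The additivity in Eq. (8) holds on average rather than for a single frame, which can be checked numerically. The sketch below averages the power spectrum of a noisy cosine over many noise realisations and compares it with the clean power spectrum plus the expected noise power. The frame length, noise variance, number of trials, and the plain \(O(N^2)\) DFT are assumptions chosen to keep the example dependency-free.

```python
import cmath
import math
import random

N = 32           # frame length (hypothetical)
SIGMA2 = 0.25    # noise variance (hypothetical)
TRIALS = 1000    # number of independent noise realisations
rng = random.Random(1)

# Precomputed DFT twiddle factors e^{-j 2 pi k n / N}
TW = [[cmath.exp(-2j * math.pi * k * n / N) for n in range(N)] for k in range(N)]


def power_spectrum(x):
    """|DFT(x)|^2 for every bin, using a plain O(N^2) DFT."""
    return [abs(sum(w * v for w, v in zip(row, x))) ** 2 for row in TW]


clean = [math.cos(2 * math.pi * 4 * n / N) for n in range(N)]
p_clean = power_spectrum(clean)

# Average the noisy power spectrum over many realisations.
p_noisy_avg = [0.0] * N
for _ in range(TRIALS):
    noisy = [c + rng.gauss(0.0, math.sqrt(SIGMA2)) for c in clean]
    for k, p in enumerate(power_spectrum(noisy)):
        p_noisy_avg[k] += p / TRIALS

# For an unnormalised N-point DFT, white noise of variance sigma^2
# has expected power N * sigma^2 in every bin.
p_expected = [p + N * SIGMA2 for p in p_clean]
max_rel_err = max(abs(a - e) / e for a, e in zip(p_noisy_avg, p_expected))
```

With these settings the averaged noisy spectrum should stay close to `p_expected` in every bin; a single realisation would deviate much more because of the cross terms.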

Equation (8) yields two mutually exclusive frequency regions, one of lower and one of higher signal-to-noise ratio (SNR). For the lower-SNR region, we examine frequencies \(\omega _0\) satisfying \(P_{\text {clean }}\left( e^{j \omega _0}\right) \ll \sigma ^2\left( \omega _0\right) \), where \(\sigma ^2\left( \omega _0\right) \) denotes the noise power \(P_{\text {noise }}\left( e^{j \omega _0}\right) \), while for the higher-SNR region, we focus on frequencies \(\omega _0\) where \(P_{\text {clean }}\left( e^{j \omega _0}\right) \gg \sigma ^2\left( \omega _0\right) \) [29]. For low SNR, factoring \(\sigma ^2\left( \omega _0\right) \) out of Eq. (8) gives:

$$\begin{aligned} P_{\text {noisy }}\left( e^{j \omega _0}\right) =\sigma ^2\left( \omega _0\right) \left( 1+\frac{P_{\text {clean }}\left( e^{j \omega _0}\right) }{\sigma ^2\left( \omega _0\right) }\right) . \end{aligned}$$
(9)

Taking the logarithm of Eq. (9), expanding it in a Taylor series, and neglecting higher-order terms, we get:

$$\begin{aligned} \ln \left( P_{\text {noisy }}\left( e^{j \omega _0}\right) \right) \approx \ln \left( \sigma ^2\left( \omega _0\right) \right) +\frac{1}{\sigma ^2\left( \omega _0\right) }\left[ d_0+\sum _{x=1}^{+\infty } d_x \cos \left( \omega _0 x\right) \right] . \end{aligned}$$
(10)
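
As a sketch of the intermediate step between Eq. (9) and Eq. (10), assume \(u=P_{\text {clean }}\left( e^{j \omega _0}\right) / \sigma ^2\left( \omega _0\right) \ll 1\) and use \(\ln (1+u)=u-u^2/2+\cdots \):

$$\begin{aligned} \ln \left( P_{\text {noisy }}\left( e^{j \omega _0}\right) \right) =\ln \left( \sigma ^2\left( \omega _0\right) \right) +\ln \left( 1+\frac{P_{\text {clean }}\left( e^{j \omega _0}\right) }{\sigma ^2\left( \omega _0\right) }\right) \approx \ln \left( \sigma ^2\left( \omega _0\right) \right) +\frac{P_{\text {clean }}\left( e^{j \omega _0}\right) }{\sigma ^2\left( \omega _0\right) } . \end{aligned}$$

Under this reading, the bracketed term in Eq. (10) is a cosine series for \(P_{\text {clean }}\left( e^{j \omega _0}\right) \). The neglected terms are of order \(u^2\), which is why the approximation is restricted to lower-SNR frequencies.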

From Eq. (10), the group delay function (GDF) can be obtained as described in [29]:

$$\begin{aligned} \tau \left( e^{j \omega _0}\right) \approx \frac{1}{\sigma ^2\left( \omega _0\right) } \sum _{x=1}^{+\infty } x d_x \cos \left( \omega _0 x\right) . \end{aligned}$$
(11)

Similarly, for higher SNR, we have:

$$\begin{aligned} P_{\text {noisy }}\left( e^{j \omega _0}\right) = P_{\text {clean }}\left( e^{j\omega _0}\right) \left( 1 + \frac{\sigma ^2\left( \omega _0\right) }{P_{\text {clean }}\left( e^{j \omega _0}\right) }\right) . \end{aligned}$$
(12)

Taking the logarithm of both sides of Eq. (12) and applying a Taylor series expansion gives:

$$\begin{aligned} \ln \left( P_{\text {noisy }}\left( e^{j \omega _0}\right) \right) \approx \frac{d_0}{2}+\frac{\sigma ^2\left( \omega _0\right) e_0}{2}+\sum _{x=1}^{+\infty }\left( d_x+\sigma ^2\left( \omega _0\right) e_x\right) \cos \left( \omega _0 x\right) . \end{aligned}$$
(13)
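
As a sketch of the step from Eq. (12) to Eq. (13), assume \(v=\sigma ^2\left( \omega _0\right) / P_{\text {clean }}\left( e^{j \omega _0}\right) \ll 1\) and keep only the first-order term of \(\ln (1+v)\):

$$\begin{aligned} \ln \left( P_{\text {noisy }}\left( e^{j \omega _0}\right) \right) =\ln \left( P_{\text {clean }}\left( e^{j \omega _0}\right) \right) +\ln \left( 1+\frac{\sigma ^2\left( \omega _0\right) }{P_{\text {clean }}\left( e^{j \omega _0}\right) }\right) \approx \ln \left( P_{\text {clean }}\left( e^{j \omega _0}\right) \right) +\frac{\sigma ^2\left( \omega _0\right) }{P_{\text {clean }}\left( e^{j \omega _0}\right) } . \end{aligned}$$

Eq. (13) has this form if \(\ln \left( P_{\text {clean }}\left( e^{j \omega _0}\right) \right) \) and \(1 / P_{\text {clean }}\left( e^{j \omega _0}\right) \) are each expanded as cosine series, with coefficients playing the roles of \(d_x\) and \(e_x\), respectively.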

From Eq. (13), the GDF can be obtained as [29]:

$$\begin{aligned} \tau \left( e^{j \omega _0}\right) \approx \sum _{x=1}^{+\infty } x\left( d_x+\sigma ^2\left( \omega _0\right) e_x\right) \cos \left( \omega _0 x\right) \end{aligned}$$
(14)

The GDFs for the two cases (Eq. (11) and Eq. (14)) can be summarized as follows [29]:

$$\begin{aligned} \tau _{G D F}\left( e^{j \omega _0}\right) \approx \left\{ \begin{array}{l} \frac{1}{\sigma ^2\left( \omega _0\right) } \sum \nolimits _{x=1}^{+\infty } x d_x \cos \left( \omega _0 x\right) , \text { for lower SNR}, \\ \sum \nolimits _{x=1}^{+\infty } x\left( d_x+\sigma ^2\left( \omega _0\right) e_x\right) \cos \left( \omega _0 x\right) , \text { for higher SNR}, \end{array}\right. \end{aligned}$$
(15)
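
The two branches of Eq. (15) can be evaluated numerically once the infinite sums are truncated. The sketch below does this for hypothetical coefficient lists `d` and `e`; the values, the truncation to four terms, and the function names are assumptions chosen only to show how each branch depends on the noise power.

```python
import math


def gdf_low_snr(d, sigma2, omega):
    """Lower-SNR branch of Eq. (15), truncated to the len(d) - 1 available terms."""
    series = sum(x * d[x] * math.cos(omega * x) for x in range(1, len(d)))
    return series / sigma2


def gdf_high_snr(d, e, sigma2, omega):
    """Higher-SNR branch of Eq. (15), truncated in the same way."""
    return sum(
        x * (d[x] + sigma2 * e[x]) * math.cos(omega * x)
        for x in range(1, len(d))
    )


# Hypothetical, rapidly decaying coefficients (index 0 is not used by the GDF).
d = [0.0, 0.8, -0.4, 0.2, -0.1]
e = [0.0, 0.3, 0.1, -0.05, 0.02]
omega = math.pi / 5

# Lower SNR: doubling the noise power halves the GDF; its shape is unchanged.
low_a = gdf_low_snr(d, sigma2=2.0, omega=omega)
low_b = gdf_low_snr(d, sigma2=4.0, omega=omega)

# Higher SNR: a small noise power only perturbs the noise-free GDF slightly.
high_clean = gdf_high_snr(d, e, sigma2=0.0, omega=omega)
high_small = gdf_high_snr(d, e, sigma2=0.01, omega=omega)
```

In the lower-SNR branch the noise power acts only as a scale factor, while in the higher-SNR branch it enters through an additive term weighted by \(\sigma ^2\left( \omega _0\right) \).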

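In Eq. (10), the \(d_x\)’s are the Fourier series coefficients of \(P_{\text {clean }}\left( e^{j \omega _0}\right) \), as follows from matching terms with Eq. (9). In Eq. (13), the \(d_x\)’s and \(e_x\)’s are the Fourier series coefficients of \(\ln \left( P_{\text {clean }}\left( e^{j \omega _0}\right) \right) \) and \(\frac{1}{P_{\text {clean }}\left( e^{j \omega _0}\right) }\), respectively, as follows from matching terms with Eq. (12). Equation (15) reveals that in the lower-SNR scenario, the GDF is inversely proportional to the noise power, so the noise rescales the GDF without changing its shape, suggesting that the GDF effectively preserves peaks and valleys amidst additive noise. Conversely, in the higher-SNR scenario, the noise contributes a term proportional to the noise power, but this term is small because the noise power is much lower than the signal power. These findings imply that the GDF tracks the signal spectrum rather than the noise spectrum.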
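
For reference, the GDF of a finite frame can be computed directly from the signal, without phase unwrapping, using the standard identity \(\tau (\omega )=\left( X_R Y_R+X_I Y_I\right) /|X|^2\), where \(X\) and \(Y\) are the Fourier transforms of \(x[n]\) and \(n\,x[n]\). The sketch below is a plain-Python illustration of this unmodified GDF only; the function names, the small `eps` floor, and the test signal are assumptions, and it is not the authors' MGDCC pipeline.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n_pts = len(x)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * n / n_pts) for n, v in enumerate(x))
        for k in range(n_pts)
    ]


def group_delay(x, eps=1e-12):
    """Group delay (in samples) at each DFT bin of the frame x."""
    X = dft(x)
    Y = dft([n * v for n, v in enumerate(x)])  # transform of n * x[n]
    return [
        (a.real * b.real + a.imag * b.imag) / max(abs(a) ** 2, eps)
        for a, b in zip(X, Y)
    ]


# A unit impulse delayed by 3 samples has a constant group delay of 3 samples.
delayed_impulse = [0.0] * 16
delayed_impulse[3] = 1.0
tau = group_delay(delayed_impulse)
```

The division by \(|X|^2\) makes this function spiky wherever the magnitude spectrum is close to zero, which is why the `eps` floor is included in the sketch.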

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Shah, A.J., Chaudhari, H., Patil, H.A. (2025). Infant Cry Classification Using Modified Group Delay Cepstral Coefficients. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15314. Springer, Cham. https://doi.org/10.1007/978-3-031-78341-8_18

  • DOI: https://doi.org/10.1007/978-3-031-78341-8_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78340-1

  • Online ISBN: 978-3-031-78341-8

  • eBook Packages: Computer Science, Computer Science (R0)
