Infant Cry Classification Using Modified Group Delay Cepstral Coefficients

Shah, Arth J.; Chaudhari, Hiya; Patil, Hemant A.

doi:10.1007/978-3-031-78341-8_18

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15314)

Included in the following conference series:

International Conference on Pattern Recognition

Abstract

Classification of pathological vs. normal infant cries is used to infer an infant’s health condition. Such an approach can be beneficial in many situations and may even help save infants’ lives. In this paper, we propose a novel classification system based on Modified Group Delay Cepstral Coefficients (MGDCC) for classifying infant cries, and we investigate the generalizability of the proposed MGDCC features. A Convolutional Neural Network (CNN) is used as the pattern classifier in this study. The proposed MGDCC features are found to perform better than widely used spectral features, such as Mel Frequency Cepstral Coefficients (MFCC), Linear Frequency Cepstral Coefficients (LFCC), and Group Delay Cepstral Coefficients (GDCC). Experiments are performed on two datasets, namely the Baby Chillanto (D1) dataset and the DA-IICT Infant Cry (D2) corpus, and for various experimental evaluation factors, such as noise robustness under signal degradation conditions, a cross-database scenario, and analysis of the latency period. We obtained a 2.25% increase in accuracy compared to the best existing accuracy for the proposed task. The better performance of MGDCC may be due to its capability to implicitly capture time dependencies in the sequence of audio samples via Fourier transform phase information.
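As a rough illustration of the feature type named in the abstract, the following Python sketch computes modified group delay cepstral coefficients for a single frame, using the commonly cited formulation of the modified group delay function (Murthy and Gadde; Hegde et al. in the reference list). It is not the authors' implementation: the function names, the FFT size, `alpha`, `gamma`, the lifter length, the number of coefficients, and the synthetic test frame (16 kHz sampling, 450 Hz tone) are all assumptions chosen for illustration.

```python
import numpy as np

def smoothed_magnitude(mag, lifter_len):
    """Cepstrally smooth a full-length FFT magnitude spectrum (low-time liftering)."""
    cep = np.fft.ifft(np.log(mag + 1e-10)).real
    lifter = np.zeros_like(cep)
    lifter[:lifter_len] = 1.0
    if lifter_len > 1:
        lifter[-(lifter_len - 1):] = 1.0  # mirrored part keeps the spectrum real and even
    return np.exp(np.fft.fft(cep * lifter).real)

def dct2(values, n_out):
    """First n_out coefficients of an unnormalized DCT-II."""
    n = len(values)
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n) @ values

def mgdcc_frame(frame, n_fft=512, alpha=0.4, gamma=0.9, lifter_len=8, n_ceps=13):
    """Modified group delay cepstral coefficients of one windowed frame (illustrative values)."""
    n = np.arange(len(frame))
    X = np.fft.fft(frame, n_fft)        # spectrum of x[n]
    Y = np.fft.fft(n * frame, n_fft)    # spectrum of n * x[n]
    S = smoothed_magnitude(np.abs(X), lifter_len)
    tau = (X.real * Y.real + X.imag * Y.imag) / S ** (2.0 * gamma)
    tau_mod = np.sign(tau) * np.abs(tau) ** alpha
    return dct2(tau_mod[: n_fft // 2 + 1], n_ceps)

# Hypothetical test frame: a windowed tone with a little noise, 25 ms at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(400) / 16000.0
frame = np.hamming(400) * (np.sin(2 * np.pi * 450.0 * t) + 0.01 * rng.standard_normal(400))
features = mgdcc_frame(frame)
```

Replacing the raw magnitude in the denominator with its cepstrally smoothed version, and compressing with `alpha` and `gamma`, is what distinguishes the modified function from the plain group delay function, whose values become very large near spectral zeros.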

Notes

1. GitHub repository: https://github.com/ARTHARKING55/CNN_ICPR_MGDCC

References

1. Mehler, J., Jusczyk, P., Lambertz, G., Halsted, N., Bertoncini, J., Amiel-Tison, C.: A precursor of language acquisition in young infants. Cognition 29(2), 143–178 (1988)
2. Engelsma, J.J., Deb, D., Cao, K., Bhatnagar, A., Sudhish, P.S., Jain, A.K.: Infant-ID: fingerprints for global good. IEEE Trans. Pattern Anal. Mach. Intell. 44(7), 3543–3559 (2021)
3. Bonneh, Y.S., Levanon, Y., Dean-Pardo, O., Lossos, L., Adini, Y.: Abnormal speech spectrum and increased pitch variability in young autistic children. Front. Hum. Neurosci. 4, 237 (2011)
4. Makhoul, J.: Linear prediction: a tutorial review. Proc. IEEE 63(4), 561–580 (1975)
5. Uthiraa, S., Kachhi, A., Patil, H.A.: Linear frequency residual features for infant cry classification. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds.) SPECOM 2023. LNCS, vol. 14338, pp. 550–561. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-48309-7_44
6. Dewi, S.P., Prasasti, A.L., Irawan, B.: The study of baby crying analysis using MFCC and LFCC in different classification methods. In: 2019 IEEE International Conference on Signals and Systems (ICSigSys), Bandung, Indonesia, pp. 18–23 (2019)
7. Pusuluri, A., Kachhi, A., Patil, H.A.: Analysis of time-averaged feature extraction techniques on infant cry classification. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds.) SPECOM 2022. LNCS, vol. 13721, pp. 590–603. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20980-2_50
8. Abbaskhah, A., Sedighi, H., Marvi, H.: Infant cry classification by MFCC feature extraction with MLP and CNN structures. Biomed. Signal Process. Control 86, 105–261 (2023)
9. Zhu, D., Paliwal, K.K.: Product of power spectrum and group delay function for speech recognition. In: 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, p. I-125 (2004)
10. Murthy, H.A., Gadde, V.: The modified group delay function and its application to phoneme recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hong Kong, vol. 1, p. I-68 (2003)
11. Hegde, R.M., Murthy, H.A., Gadde, V.R.R.: Significance of the modified group delay feature in speech recognition. IEEE Trans. Audio Speech Lang. Process. 15(1), 190–202 (2006)
12. Alsteris, L.D., Paliwal, K.K.: Evaluation of the modified group delay feature for isolated word recognition. In: Proceedings of the Eighth International Symposium on Signal Processing and Its Applications, Sydney, Australia, vol. 2, pp. 715–718 (2005)
13. Chittora, A., Patil, H.A.: Modified group delay based features for asthma and HIE infant cries classification. In: Král, P., Matoušek, V. (eds.) TSD 2015. LNCS (LNAI), vol. 9302, pp. 595–602. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24033-6_67
14. O’Sullivan, J., et al.: Automatic speech recognition for ASD using the open-source whisper model from OpenAI (2023)
15. Feng, T., Narayanan, S.: Foundation model assisted automatic speech emotion recognition: transcribing, annotating, and augmenting. In: ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, pp. 12116–12120 (2024)
16. Yang, Y., et al.: A robust audio deepfake detection system via multi-view feature. In: ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, pp. 13131–13135 (2024)
17. Charola, M., Kachhi, A., Patil, H.A.: Whisper encoder features for infant cry classification. In: Proceedings of INTERSPEECH, Dublin, Ireland, vol. 2023, pp. 1773–1777 (2023)
18. Hannan, E., Thomson, P.: Estimating group delay. Biometrika 60(2), 241–253 (1973)
19. Murthy, H.A., Yegnanarayana, B.: Group delay functions and its applications in speech technology. Sadhana 36, 745–782 (2011)
20. Zhu, X., Li, Y., Yong, S., Zhuang, Z.: A novel definition and measurement method of group delay and its application. IEEE Trans. Instrum. Meas. 58(1), 229–233 (2008)
21. Reyes-Galaviz, O.F., Cano-Ortiz, S.D., Reyes-García, C.A.: Evolutionary-neural system to classify infant cry units for pathologies identification in recently born babies. In: 2008 Seventh Mexican International Conference on Artificial Intelligence, pp. 330–335. Cambridge (2008)
22. Reyes-Galaviz, O.F., Cano-Ortiz, S.D., Reyes-García, C.A.: Validation of the cry unit as primary element for cry analysis using an evolutionary-neural approach. In: 2008 Mexican International Conference on Computer Science, Baja California, Mexico, pp. 261–267 (2008)
23. Chittora, A., Patil, H.A.: Data collection and corpus design for analysis of normal and pathological infant cry. In: 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), Gurugram, India, pp. 1–6 (2013)
24. Zheng, F., Zhang, G., Song, Z.: Comparison of different implementations of MFCC. J. Comput. Sci. Technol. 16, 582–589 (2001). Accessed 16 Apr 2024
25. Dewi, S.P., Prasasti, A.L., Irawan, B.: Analysis of LFCC feature extraction in baby crying classification using KNN. In: IEEE International Conference on Internet of Things and Intelligence System (IoTaIS), Hong Kong, pp. 86–91 (2019)
26. Joukov, N., Traeger, A., Iyer, R., Wright, C.P., Zadok, E.: Operating system profiling via latency analysis. In: OSDI, Seattle, WA, vol. 6, pp. 89–102 (2006)
27. Le, L., Kabir, A.N.M., Ji, C., Basodi, S., Pan, Y.: Using transfer learning, SVM, and ensemble classification to classify baby cries based on their spectrogram images. In: 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems Workshops (MASSW), Monterey, CA, USA, pp. 106–110 (2019)
28. Chunyan, J., Chen, M., Bin, L., Pan, Y.: Infant cry classification with graph convolutional networks. In: 2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS), Las Vegas, USA, pp. 322–327 (2021)
29. Parthasarathi, S.H.K., Padmanabhan, R., Murthy, H.A.: Robustness of group delay representations for noisy speech signals. Int. J. Speech Technol. 14, 361–368 (2011)

Acknowledgements

The authors especially thank Mr. Aditya PSS. (JRF at the Speech Lab, DA-IICT) and the DA-IICT authorities for their helpful support in this study. The authors sincerely thank MeitY for funding this study under the project ‘BHASHINI’ (Grant ID: 11(1)2022-HCC(TDIL)).

Author information

Authors and Affiliations

Speech Research Lab, DA-IICT, Gandhinagar, India
Arth J. Shah, Hiya Chaudhari & Hemant A. Patil

Corresponding author

Correspondence to Arth J. Shah.

Editor information

Editors and Affiliations

University of Salford, Salford, Lancashire, UK
Apostolos Antonacopoulos
Indian Institute of Technology Bombay, Mumbai, Maharashtra, India
Subhasis Chaudhuri
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa
Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
IIT Kharagpur, Kharagpur, West Bengal, India
Saumik Bhattacharya
Indian Statistical Institute Kolkata, Kolkata, West Bengal, India
Umapada Pal

Appendix: Noise Robustness of MGDCC

Let $\operatorname {clean}_{\text {signal}}(x)$ be a clean signal that is degraded by uncorrelated additive noise $\operatorname {noise}(x)$ with zero mean and variance $\sigma ^2$. Then, the noisy signal $\operatorname {noisy}_{\text {signal}}(x)$ can be represented as,

$$\begin{aligned} \operatorname {noisy}_{\text {signal }}(x)=\text { clean }_{\text {signal }}(x)+\operatorname {noise}(x) . \end{aligned}$$

(7)
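A minimal Python sketch of the degradation model in Eq. (7) is given below. It assumes white Gaussian noise scaled to a target SNR; the noise type, the function name `add_noise_at_snr`, the 10 dB target, and the sinusoidal stand-in for a cry segment are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def add_noise_at_snr(clean, snr_db, rng):
    """Return (noisy, noise), where noise is zero-mean white Gaussian at the requested SNR."""
    signal_power = np.mean(clean ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return clean + noise, noise

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)  # hypothetical stand-in for a cry segment
noisy, noise = add_noise_at_snr(clean, snr_db=10.0, rng=rng)

# The SNR measured on this realization should be close to the requested value.
measured_snr_db = 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))
```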

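Taking the Fourier transform of Eq. (7) and computing the power spectrum, the cross terms vanish on average because the noise is uncorrelated with the clean signal, and we get,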

$$\begin{aligned} P_{\text {noisy }}\left( e^{j \omega _0}\right) =P_{\text {clean }}\left( e^{j \omega _0}\right) +P_{\text {noise }}\left( e^{j \omega _0}\right) . \end{aligned}$$

(8)
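The additivity in Eq. (8) holds in expectation, which can be checked numerically by averaging periodograms over many noise realizations. The sketch below is only an illustration under assumed settings (white Gaussian noise, a single sinusoid as the clean signal, unnormalized periodograms, 500 trials); none of these values come from the paper.

```python
import numpy as np

def mean_noisy_periodogram(clean, sigma, n_trials, rng):
    """Average |DFT|^2 of clean + noise over independent noise realizations."""
    acc = np.zeros(len(clean) // 2 + 1)
    for _ in range(n_trials):
        noise = rng.normal(0.0, sigma, size=len(clean))
        acc += np.abs(np.fft.rfft(clean + noise)) ** 2
    return acc / n_trials

rng = np.random.default_rng(1)
n = 256
sigma = 1.0
clean = np.sin(2 * np.pi * 20 * np.arange(n) / n)

p_clean = np.abs(np.fft.rfft(clean)) ** 2
p_noise = n * sigma ** 2 * np.ones_like(p_clean)  # expected periodogram of white noise
p_noisy = mean_noisy_periodogram(clean, sigma, n_trials=500, rng=rng)

# Relative gap between the averaged noisy spectrum and the sum of the two parts.
expected = p_clean + p_noise
relative_gap = abs(p_noisy.sum() - expected.sum()) / expected.sum()
```

For a single realization the cross term between signal and noise does not vanish, so the equality is only approximate frame by frame.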

Two mutually exclusive frequency regions (higher and lower SNR) can be identified from Eq. (8). For the lower signal-to-noise ratio (SNR) scenario, we examine frequencies $\omega _0$ satisfying $P_{\text {clean }}\left( e^{j \omega _0}\right) \ll \sigma ^2\left( \omega _0\right) $, while for the higher SNR scenario, we focus on frequencies $\omega _0$ where $P_{\text {clean }}\left( e^{j \omega _0}\right) \gg \sigma ^2\left( \omega _0\right) $ [29]. Here, $\sigma ^2\left( \omega _0\right) $ denotes the noise power $P_{\text {noise }}\left( e^{j \omega _0}\right) $ at frequency $\omega _0$. For low SNR, we have:

$$\begin{aligned} P_{\text {noisy }}\left( e^{j \omega _0}\right) =\sigma ^2\left( \omega _0\right) \left( 1+\frac{P_{\text {clean }}\left( e^{j \omega _0}\right) }{\sigma ^2\left( \omega _0\right) }\right) . \end{aligned}$$

(9)

Taking the logarithm on both sides of Eq. (9), expanding the second term in a Taylor series, and neglecting higher-order terms, we get:

$$\begin{aligned} \ln \left( P_{\text {noisy }}\left( e^{j \omega _0}\right) \right) \approx \ln \left( \sigma ^2\left( \omega _0\right) \right) +\frac{1}{\sigma ^2\left( \omega _0\right) }\left[ d_0+\sum _{x=1}^{+\infty } d_x \cos \left( \omega _0 x\right) \right] . \end{aligned}$$

(10)
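The intermediate step can be sketched as follows, assuming $u = P_{\text {clean }}\left( e^{j \omega _0}\right) / \sigma ^2\left( \omega _0\right) \ll 1$, so that $\ln (1+u) = u - u^2/2 + \cdots \approx u$:

$$\begin{aligned} \ln \left( P_{\text {noisy }}\left( e^{j \omega _0}\right) \right) = \ln \left( \sigma ^2\left( \omega _0\right) \right) + \ln \left( 1+\frac{P_{\text {clean }}\left( e^{j \omega _0}\right) }{\sigma ^2\left( \omega _0\right) }\right) \approx \ln \left( \sigma ^2\left( \omega _0\right) \right) + \frac{P_{\text {clean }}\left( e^{j \omega _0}\right) }{\sigma ^2\left( \omega _0\right) } . \end{aligned}$$

Since the clean power spectrum is an even, periodic function of frequency, it can be written as a cosine series, which gives the bracketed term in Eq. (10).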

From Eq. (10), the group delay function (GDF) is obtained as in [29]:

$$\begin{aligned} \tau \left( e^{j \omega _0}\right) \approx \frac{1}{\sigma ^2\left( \omega _0\right) } \sum _{x=1}^{+\infty } x d_x \cos \left( \omega _0 x\right) . \end{aligned}$$

(11)
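The step from a cosine series of the log spectrum to a GDF of the form $\sum x\, c_x \cos (\omega _0 x)$ relies on the standard relation for minimum-phase signals: if $\ln |X| = c_0/2 + \sum c_x \cos (\omega x)$, then the group delay is $\sum x\, c_x \cos (\omega x)$. The Python sketch below checks this relation numerically for a hypothetical two-tap minimum-phase sequence; the sequence, the FFT size, and the function names are assumptions for illustration. When the coefficients are taken from the log power spectrum instead of the log magnitude, the same expression holds up to a constant factor of 1/2, which does not affect the proportionality argument.

```python
import numpy as np

def group_delay_direct(h, n_fft):
    """Group delay from the spectra of h[n] and n*h[n]."""
    n = np.arange(len(h))
    H = np.fft.fft(h, n_fft)
    Hn = np.fft.fft(n * h, n_fft)
    return (H.real * Hn.real + H.imag * Hn.imag) / np.abs(H) ** 2

def group_delay_from_log_magnitude(h, n_fft):
    """Group delay of a minimum-phase sequence from cosine coefficients of ln|H|."""
    log_mag = np.log(np.abs(np.fft.fft(h, n_fft)))
    real_cep = np.fft.ifft(log_mag).real
    x = np.arange(1, n_fft // 2)
    c = 2.0 * real_cep[1 : n_fft // 2]  # convention: ln|H| = c_0/2 + sum_x c_x cos(omega x)
    omega = 2.0 * np.pi * np.arange(n_fft) / n_fft
    return np.cos(np.outer(omega, x)) @ (x * c)

h = np.array([1.0, -0.5])  # hypothetical minimum-phase sequence (zero at z = 0.5)
tau_direct = group_delay_direct(h, 512)
tau_cepstral = group_delay_from_log_magnitude(h, 512)
max_gap = np.max(np.abs(tau_direct - tau_cepstral))
```

The relation does not hold for sequences with zeros outside the unit circle, since the log magnitude alone then no longer determines the phase.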

Similarly for higher SNR, we have:

$$\begin{aligned} P_{\text {noisy }}\left( e^{j \omega _0}\right) = P_{\text {clean }}\left( e^{j\omega _0}\right) \left( 1 + \frac{\sigma ^2\left( \omega _0\right) }{P_{\text {clean }}\left( e^{j \omega _0}\right) }\right) . \end{aligned}$$

(12)

Taking the logarithm on both sides of Eq. (12) and applying a Taylor series expansion to the second term gives:

$$\begin{aligned} \ln \left( P_{\text {noisy }}\left( e^{j \omega _0}\right) \right) \approx \frac{d_0}{2}+\frac{\sigma ^2\left( \omega _0\right) e_0}{2}+\sum _{x=1}^{+\infty }\left( d_x+\sigma ^2\left( \omega _0\right) e_x\right) \cos \left( \omega _0 x\right) . \end{aligned}$$

(13)

From Eq. (13), the GDF is obtained as [29]:

$$\begin{aligned} \tau \left( e^{j \omega _0}\right) \approx \sum _{x=1}^{+\infty } x\left( d_x+\sigma ^2\left( \omega _0\right) e_x\right) \cos \left( \omega _0 x\right) \end{aligned}$$

(14)

The GDFs for the two cases (Eq. (11) and Eq. (14)) can be summarized as follows [29]:

$$\begin{aligned} \tau _{G D F}\left( e^{j \omega _0}\right) \approx \left\{ \begin{array}{l} \frac{1}{\sigma ^2\left( \omega _0\right) } \sum \nolimits _{x=1}^{+\infty } x d_x \cos \left( \omega _0 x\right) , \text { for lower SNR}, \\ \sum \nolimits _{x=1}^{+\infty } x\left( d_x+\sigma ^2\left( \omega _0\right) e_x\right) \cos \left( \omega _0 x\right) , \text { for higher SNR}, \end{array}\right. \end{aligned}$$

(15)

Here, the $d_x$’s denote the Fourier series coefficients of $P_{\text {clean }}\left( e^{j \omega _0}\right) $ in the lower SNR case (Eq. (10)) and of $\ln \left( P_{\text {clean }}\left( e^{j \omega _0}\right) \right) $ in the higher SNR case (Eq. (13)), and the $e_x$’s denote those of $\frac{1}{P_{\text {clean }}\left( e^{j \omega _0}\right) }$. Equation (15) reveals that in the lower SNR scenario, the GDF is inversely proportional to the noise power, so additive noise rescales the GDF while the peaks and valleys of the clean spectrum are preserved. Conversely, for higher SNR values, the GDF depends on the noise power only through the additive term $\sigma ^2\left( \omega _0\right) e_x$, which remains small because the noise power is much lower than the signal power. These findings imply that the GDF tracks the signal spectrum rather than the noise spectrum.
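The two branches of Eq. (15) can be compared against a direct evaluation on a toy spectrum. The Python sketch below is an illustration under stated assumptions, not a reproduction of the paper's experiments: the clean power spectrum $1 + 0.5\cos \omega $, the frequency-independent noise power, the values of `sigma2`, and all function names are hypothetical. It evaluates the series $\sum x\, c_x \cos (\omega x)$ exactly as written in the appendix, with $c_x$ taken from the log of the noisy power spectrum.

```python
import numpy as np

N_FFT = 1024
OMEGA = 2.0 * np.pi * np.arange(N_FFT) / N_FFT

def cosine_coeffs(values):
    """Coefficients c_x (x >= 1) with the convention f = c_0/2 + sum_x c_x cos(omega x)."""
    return 2.0 * np.fft.ifft(values).real[1 : N_FFT // 2]

def gdf_from_coeffs(coeffs):
    """Evaluate sum_x x * c_x * cos(omega x), the form used in Eqs. (11) and (14)."""
    x = np.arange(1, len(coeffs) + 1)
    return np.cos(np.outer(OMEGA, x)) @ (x * coeffs)

p_clean = 1.0 + 0.5 * np.cos(OMEGA)  # hypothetical smooth clean power spectrum

def gdf_noisy(sigma2):
    """Reference: series built from the log of the noisy power spectrum."""
    return gdf_from_coeffs(cosine_coeffs(np.log(p_clean + sigma2)))

def gdf_low_snr(sigma2):
    """Lower SNR branch of Eq. (15): d_x are coefficients of P_clean."""
    return gdf_from_coeffs(cosine_coeffs(p_clean)) / sigma2

def gdf_high_snr(sigma2):
    """Higher SNR branch of Eq. (15): d_x from ln(P_clean), e_x from 1/P_clean."""
    d = cosine_coeffs(np.log(p_clean))
    e = cosine_coeffs(1.0 / p_clean)
    return gdf_from_coeffs(d + sigma2 * e)

low_gap = np.max(np.abs(gdf_noisy(100.0) - gdf_low_snr(100.0)))
high_gap = np.max(np.abs(gdf_noisy(0.01) - gdf_high_snr(0.01)))
# Doubling the noise power in the low SNR regime should roughly halve the GDF.
scale_ratio = np.max(np.abs(gdf_noisy(100.0))) / np.max(np.abs(gdf_noisy(200.0)))
```

In the low SNR setting the shape of the series is unchanged and only its scale follows $1/\sigma ^2$, which is the behaviour the paragraph above describes.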

About this paper

Cite this paper

Shah, A.J., Chaudhari, H., Patil, H.A. (2025). Infant Cry Classification Using Modified Group Delay Cepstral Coefficients. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15314. Springer, Cham. https://doi.org/10.1007/978-3-031-78341-8_18

DOI: https://doi.org/10.1007/978-3-031-78341-8_18
Published: 02 December 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78340-1
Online ISBN: 978-3-031-78341-8
eBook Packages: Computer Science, Computer Science (R0)
