
A CNN-Based Method for Infant Cry Detection and Recognition

  • Conference paper
  • Web, Artificial Intelligence and Network Applications (WAINA 2019)

Abstract

Crying is the primary means by which a baby communicates with the outside world. When a baby cries, it is difficult for novice parents to understand the baby's needs immediately. If parents can accurately determine the cause of the cry, they can better understand the baby's emotional and physiological state and needs. In real-world applications, however, recording devices may also capture sounds that are not produced by a baby. To reduce the load on the recognition server and improve classifier accuracy, this study converts the input audio signal into a two-dimensional spectrogram, and a convolutional neural network (CNN) first determines whether the spectrogram represents a baby's cry. Detected cries are then classified into four categories (pain, hunger, sleepiness, and wet diaper) by additional one-dimensional CNNs. Experimental results showed that the proposed method achieves high cry detection and recognition rates.
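The detect-then-classify pipeline described in the abstract can be sketched roughly as follows in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the spectrogram is assumed to be a short-time Fourier transform log-magnitude spectrogram with a Hann window, and the frame length, hop size, and 0.5 decision threshold are illustrative values, not settings from the paper. The two CNNs are not implemented here; they are passed in as hypothetical callables `is_cry` (detection) and `classify` (recognition), and feeding the classifier the same spectrogram is also an assumption, since the abstract does not state the classifier's input.

```python
import numpy as np
from typing import Callable, Optional

# The four cry categories named in the abstract.
CRY_CLASSES = ("pain", "hunger", "sleepiness", "wet diaper")


def log_spectrogram(signal: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Return a (num_frames, frame_len // 2 + 1) log-magnitude STFT spectrogram.

    frame_len and hop are illustrative defaults (assumptions), not the paper's values.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.ndim != 1:
        raise ValueError("expected a mono 1-D signal")
    if len(signal) < frame_len:
        # Zero-pad short clips so that at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Slice overlapping frames and apply the Hann window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(num_frames)]
    )
    # Magnitude of the one-sided FFT of every frame (rows = time, cols = frequency).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # log1p keeps silent bins finite, whereas log(0) would be -inf.
    return np.log1p(magnitude)


def detect_then_classify(
    signal: np.ndarray,
    is_cry: Callable[[np.ndarray], float],        # hypothetical detection model
    classify: Callable[[np.ndarray], np.ndarray],  # hypothetical 4-way recognition model
    threshold: float = 0.5,                        # assumed decision threshold
) -> Optional[str]:
    """Stage 1 gates the input; only detected cries reach stage 2."""
    spec = log_spectrogram(signal)
    if is_cry(spec) < threshold:
        return None  # not a cry: skip the recognition stage entirely
    scores = classify(spec)
    return CRY_CLASSES[int(np.argmax(scores))]
```

The gating step is what relieves the recognition stage: audio that the detector rejects never reaches the classifier. A real one-dimensional CNN might, for example, treat the frequency bins as input channels and convolve along the time axis, but that is only one possible design and not necessarily the one used by the authors.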



Author information

Correspondence to Chuan-Yu Chang.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Chang, CY., Tsai, LY. (2019). A CNN-Based Method for Infant Cry Detection and Recognition. In: Barolli, L., Takizawa, M., Xhafa, F., Enokido, T. (eds) Web, Artificial Intelligence and Network Applications. WAINA 2019. Advances in Intelligent Systems and Computing, vol 927. Springer, Cham. https://doi.org/10.1007/978-3-030-15035-8_76
