A CNN-Based Method for Infant Cry Detection and Recognition

Chang, Chuan-Yu; Tsai, Lung-Yu

doi:10.1007/978-3-030-15035-8_76

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 927)

Included in the following conference series:

Workshops of the International Conference on Advanced Information Networking and Applications

Abstract

Crying is a baby’s primary means of communicating with the outside world. When a baby cries, novice parents often struggle to understand its needs immediately. If parents can accurately determine the cause of a cry, they can understand the baby’s emotional and physiological changes and needs. In real-world applications, recording devices may also capture sounds that are not produced by a baby. To reduce the burden on the recognition server and improve the accuracy of the classifier, this study proposes converting the recorded audio signal into a two-dimensional spectrogram. A convolutional neural network then determines whether the input spectrogram represents a baby’s cry. Detected cries are finally classified into four categories (pain, hunger, sleepiness, and wet diaper) by additional one-dimensional convolutional neural networks. Experimental results showed that the proposed method achieves high crying detection and recognition rates.
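
The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frame length, hop size, Hann window, and log scaling are assumptions, and `is_cry` and `classify_cause` are hypothetical callables standing in for the trained detection CNN and the one-dimensional recognition CNNs. The abstract does not say what the second stage takes as input, so passing it the raw signal is also an assumption.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram from a short-time Fourier transform.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad very short clips so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Transpose so rows are frequency bins and columns are time frames.
    return np.log1p(magnitude).T


def recognize(signal, is_cry, classify_cause):
    """Two-stage flow: detect a cry first, classify its cause only if detected.

    is_cry and classify_cause are hypothetical stand-ins for trained models.
    """
    spec = spectrogram(signal)
    if not is_cry(spec):
        # Non-cry audio is rejected before the recognition stage runs.
        return None
    return classify_cause(signal)
```

Gating on the detection stage means non-cry recordings never reach the recognition stage, which is one way to read the abstract's aim of reducing the load on the recognition server.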

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Douliu, Taiwan
Chuan-Yu Chang & Lung-Yu Tsai

Corresponding author

Correspondence to Chuan-Yu Chang.

Editor information

Editors and Affiliations

Department of Information and Communication Engineering, Fukuoka Institute of Technology, Fukuoka, Japan
Leonard Barolli
Department of Advanced Sciences, Hosei University, Koganei-Shi, Tokyo, Japan
Makoto Takizawa
Department of Computer Science, Technical University of Catalonia, Barcelona, Barcelona, Spain
Fatos Xhafa
Faculty of Business Administration, Rissho University, Tokyo, Japan
Tomoya Enokido

About this paper

Cite this paper

Chang, CY., Tsai, LY. (2019). A CNN-Based Method for Infant Cry Detection and Recognition. In: Barolli, L., Takizawa, M., Xhafa, F., Enokido, T. (eds) Web, Artificial Intelligence and Network Applications. WAINA 2019. Advances in Intelligent Systems and Computing, vol 927. Springer, Cham. https://doi.org/10.1007/978-3-030-15035-8_76

DOI: https://doi.org/10.1007/978-3-030-15035-8_76
Published: 15 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15034-1
Online ISBN: 978-3-030-15035-8
eBook Packages: Intelligent Technologies and Robotics (R0)
