Abstract
We propose a novel method that combines feature fusion and model fusion to improve infant cry classification performance. Spectrogram features extracted from a transfer-learning convolutional neural network (CNN) and mel-spectrogram features extracted from a mel-spectrogram decomposition model are fused and fed into a multilayer perceptron for better classification accuracy. The mel-spectrogram decomposition method feeds band-wise crops of the mel-spectrogram into multiple CNNs, followed by a merged global classifier, to capture more discriminative features. Feature fusion combines high-dimensional detailed information with characteristics closer to human auditory perception, allowing the CNNs to achieve better performance. The approach is evaluated on the Baby Chillanto and Baby2020 databases. On the Baby Chillanto database, our approach yields a significant 4.72% absolute reduction in classification error rate compared with a CNN model using single mel-spectrogram images, and the test accuracy reaches 99.26%, outperforming all other methods on this five-category classification task. A gender classification experiment on the Baby2020 database also shows a 3.87% accuracy improvement over a CNN model using single spectrograms.
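To make the described pipeline concrete, the following is a minimal PyTorch sketch of a feature-fusion architecture of this kind: band-wise crops of the mel-spectrogram pass through per-band CNNs, their outputs are concatenated with features from a transfer-learning branch applied to the full spectrogram, and a multilayer perceptron produces the class scores. This is an illustrative sketch, not the authors' implementation; the band count, feature dimensions, the simple stand-in backbone for the transfer-learning branch, and the names BandCNN and FusionClassifier are assumptions chosen for the example.

```python
# Illustrative sketch (not the paper's code): band-wise mel-spectrogram
# decomposition with per-band CNNs, fused with features from a
# transfer-learning CNN branch, classified by a multilayer perceptron.
import torch
import torch.nn as nn

N_MELS, N_FRAMES, N_BANDS, N_CLASSES = 128, 256, 4, 5  # hypothetical sizes

class BandCNN(nn.Module):
    """Small CNN applied to one frequency band of the mel-spectrogram."""
    def __init__(self, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, out_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class FusionClassifier(nn.Module):
    def __init__(self, pretrained_dim=512, band_dim=64):
        super().__init__()
        self.band_cnns = nn.ModuleList(BandCNN(band_dim) for _ in range(N_BANDS))
        # Stand-in for a transfer-learning backbone (e.g. a pretrained image CNN
        # applied to spectrogram images); a tiny CNN is used here for brevity.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(32 * 16, pretrained_dim), nn.ReLU(),
        )
        fused_dim = pretrained_dim + N_BANDS * band_dim
        # Multilayer perceptron over the fused feature vector.
        self.mlp = nn.Sequential(
            nn.Linear(fused_dim, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, N_CLASSES),
        )

    def forward(self, spec, mel):
        # spec: (B, 1, H, W) spectrogram image; mel: (B, 1, N_MELS, N_FRAMES)
        bands = torch.chunk(mel, N_BANDS, dim=2)             # band-wise crops
        band_feats = [cnn(b) for cnn, b in zip(self.band_cnns, bands)]
        fused = torch.cat([self.backbone(spec)] + band_feats, dim=1)  # feature fusion
        return self.mlp(fused)

model = FusionClassifier()
logits = model(torch.randn(8, 1, 224, 224), torch.randn(8, 1, N_MELS, N_FRAMES))
print(logits.shape)  # torch.Size([8, 5])
```

In practice the transfer-learning branch would be a pretrained image CNN fine-tuned on spectrogram images, and the mel-spectrograms would be computed from the cry recordings with an audio library such as librosa before band-wise cropping.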
Acknowledgement
This work was supported by the BNU-HKBU United International College Start-up Research Fund (UICR0700051-23) and the Shenzhen KQTD Project (No. KQTD20200820113106007). We acknowledge Molecular Basis of Disease (MBD) at Georgia State University for its support. We thank Dr. Carlos A. Reyes-Garcia, Dr. Emilio Arch-Tirado and his INR-Mexico group, and Dr. Edgar M. Garcia-Tamayo for collecting the Infant Cry database. We also express our great gratitude to Dr. Orion Reyes and Dr. Carlos A. Reyes for providing access to the Baby Chillanto database. We thank all parents, doctors, and nurses who supported the recording of the Baby2020 database.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Ji, C., Jiao, Y., Chen, M., Pan, Y. (2022). Infant Cry Classification Based-On Feature Fusion and Mel-Spectrogram Decomposition with CNNs. In: Pan, X., Jin, T., Zhang, LJ. (eds) Artificial Intelligence and Mobile Services – AIMS 2022. AIMS 2022. Lecture Notes in Computer Science, vol 13729. Springer, Cham. https://doi.org/10.1007/978-3-031-23504-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23503-0
Online ISBN: 978-3-031-23504-7