
Infant Cry Classification Based-On Feature Fusion and Mel-Spectrogram Decomposition with CNNs

Artificial Intelligence and Mobile Services – AIMS 2022 (AIMS 2022)

Abstract

We propose a novel method that uses feature fusion and model fusion to improve infant cry classification performance. Spectrogram features extracted from a transfer-learning convolutional neural network (CNN) model and mel-spectrogram features extracted from a mel-spectrogram decomposition model are fused and fed into a multilayer perceptron for better classification accuracy. The mel-spectrogram decomposition method feeds band-wise crops of the mel-spectrograms into multiple CNNs followed by a merged global classifier to capture more discriminative features. Feature fusion combines high-dimensional detailed information with characteristics closer to human auditory perception, which leads to better performance with CNNs. We evaluate the approach on the Baby Chillanto and Baby2020 databases. On the Baby Chillanto database, our approach reduces the absolute classification error rate by 4.72% compared with a CNN model using single mel-spectrogram images, and the test accuracy reaches 99.26%, outperforming all other methods on this five-category classification task. The gender classification experiment on the Baby2020 database also shows a 3.87% accuracy improvement over the CNN model using single spectrograms.
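The two ideas in the abstract can be illustrated with a short sketch. The Python/PyTorch code below is not the authors' implementation; the number of sub-bands, the branch and feature dimensions, and the use of a pretrained image CNN to produce the global spectrogram features are all illustrative assumptions. It only shows the overall shape of the method: band-wise mel-spectrogram decomposition with per-band CNN branches, and fusion of the resulting features with full-spectrogram features in a multilayer perceptron.

```python
# Minimal sketch (assumptions noted above), not the authors' code.
import librosa
import torch
import torch.nn as nn

def mel_bands(wav_path, n_mels=128, n_bands=4):
    """Load audio, compute a log-mel spectrogram, and split it into frequency sub-bands."""
    y, sr = librosa.load(wav_path, sr=None)
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
    band_size = n_mels // n_bands
    # Each crop keeps all time frames but only a contiguous range of mel bins.
    return [mel[i * band_size:(i + 1) * band_size, :] for i in range(n_bands)]

class BandCNN(nn.Module):
    """Small CNN applied to one sub-band crop; emits a fixed-size feature vector."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x):  # x: (batch, 1, band_bins, time)
        return self.net(x)

class FusionClassifier(nn.Module):
    """Concatenate band-wise features with full-spectrogram features, then classify with an MLP."""
    def __init__(self, n_bands=4, band_feat=64, global_feat=512, n_classes=5):
        super().__init__()
        self.band_cnns = nn.ModuleList([BandCNN(band_feat) for _ in range(n_bands)])
        self.mlp = nn.Sequential(
            nn.Linear(n_bands * band_feat + global_feat, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, n_classes),
        )

    def forward(self, band_crops, global_features):
        # band_crops: list of (batch, 1, band_bins, time) tensors.
        # global_features: (batch, global_feat), e.g. from a pretrained image CNN
        # applied to the full spectrogram (the transfer-learning branch).
        fused = torch.cat(
            [cnn(b) for cnn, b in zip(self.band_cnns, band_crops)] + [global_features],
            dim=1,
        )
        return self.mlp(fused)
```

Splitting the mel axis into contiguous bands lets each branch specialize on a different frequency region of the cry, while the concatenated vector gives the final MLP both the band-wise details and the global spectrogram context.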



Acknowledgement

This work was supported by the BNU-HKBU United International College Start-up Research Fund (UICR0700051–23) and the Shenzhen KQTD Project (No. KQTD20200820113106007). We acknowledge Molecular Basis of Disease (MBD) at Georgia State University for its support. We thank Dr. Carlos A. Reyes-Garcia, Dr. Emilio Arch-Tirado and his INR-Mexico group, and Dr. Edgar M. Garcia-Tamayo for collecting the Infant Cry database. We also express our gratitude to Dr. Orion Reyes and Dr. Carlos A. Reyes for providing access to the Baby Chillanto database. We thank all parents, doctors, and nurses who supported the recording of the Baby2020 database.

Author information

Corresponding authors

Correspondence to Ming Chen or Yi Pan.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ji, C., Jiao, Y., Chen, M., Pan, Y. (2022). Infant Cry Classification Based-On Feature Fusion and Mel-Spectrogram Decomposition with CNNs. In: Pan, X., Jin, T., Zhang, LJ. (eds) Artificial Intelligence and Mobile Services – AIMS 2022. AIMS 2022. Lecture Notes in Computer Science, vol 13729. Springer, Cham. https://doi.org/10.1007/978-3-031-23504-7_10


  • DOI: https://doi.org/10.1007/978-3-031-23504-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23503-0

  • Online ISBN: 978-3-031-23504-7

  • eBook Packages: Computer Science, Computer Science (R0)
