Abstract
We propose a novel method that combines feature fusion and model fusion to improve infant cry classification performance. Spectrogram features extracted from a transfer-learning convolutional neural network (CNN) and mel-spectrogram features extracted from a mel-spectrogram decomposition model are fused and fed into a multilayer perceptron for better classification accuracy. The mel-spectrogram decomposition method feeds band-wise crops of the mel-spectrogram into multiple CNNs, followed by a merged global classifier, to capture more discriminative features. Feature fusion combines high-dimensional detailed information with characteristics closer to human auditory perception, allowing the CNNs to achieve better performance. The approach is evaluated on the Baby Chillanto and Baby2020 databases. On the Baby Chillanto database, our approach yields a significant 4.72% absolute reduction in classification error rate compared with a CNN model using single mel-spectrogram images, and the test accuracy reaches 99.26%, outperforming all other methods on this five-category classification task. A gender classification experiment on the Baby2020 database also shows a 3.87% accuracy improvement over a CNN model using single spectrograms.
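To make the described pipeline concrete, the following is a minimal PyTorch sketch of a feature-fusion architecture of this kind: band-wise crops of the mel-spectrogram pass through per-band CNNs, their outputs are concatenated with features from a transfer-learning branch applied to the full spectrogram, and a multilayer perceptron produces the class scores. This is an illustrative sketch, not the authors' implementation; the band count, feature dimensions, the simple stand-in backbone for the transfer-learning branch, and the names BandCNN and FusionClassifier are assumptions chosen for the example.

```python
# Illustrative sketch (not the paper's code): band-wise mel-spectrogram
# decomposition with per-band CNNs, fused with features from a
# transfer-learning CNN branch, classified by a multilayer perceptron.
import torch
import torch.nn as nn

N_MELS, N_FRAMES, N_BANDS, N_CLASSES = 128, 256, 4, 5  # hypothetical sizes

class BandCNN(nn.Module):
    """Small CNN applied to one frequency band of the mel-spectrogram."""
    def __init__(self, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, out_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class FusionClassifier(nn.Module):
    def __init__(self, pretrained_dim=512, band_dim=64):
        super().__init__()
        self.band_cnns = nn.ModuleList(BandCNN(band_dim) for _ in range(N_BANDS))
        # Stand-in for a transfer-learning backbone (e.g. a pretrained image CNN
        # applied to spectrogram images); a tiny CNN is used here for brevity.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(32 * 16, pretrained_dim), nn.ReLU(),
        )
        fused_dim = pretrained_dim + N_BANDS * band_dim
        # Multilayer perceptron over the fused feature vector.
        self.mlp = nn.Sequential(
            nn.Linear(fused_dim, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, N_CLASSES),
        )

    def forward(self, spec, mel):
        # spec: (B, 1, H, W) spectrogram image; mel: (B, 1, N_MELS, N_FRAMES)
        bands = torch.chunk(mel, N_BANDS, dim=2)             # band-wise crops
        band_feats = [cnn(b) for cnn, b in zip(self.band_cnns, bands)]
        fused = torch.cat([self.backbone(spec)] + band_feats, dim=1)  # feature fusion
        return self.mlp(fused)

model = FusionClassifier()
logits = model(torch.randn(8, 1, 224, 224), torch.randn(8, 1, N_MELS, N_FRAMES))
print(logits.shape)  # torch.Size([8, 5])
```

In practice the transfer-learning branch would be a pretrained image CNN fine-tuned on spectrogram images, and the mel-spectrograms would be computed from the cry recordings with an audio library such as librosa before band-wise cropping.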
Acknowledgement
This work was supported by the BNU-HKBU United International College Start-up Research Fund (UICR0700051-23) and the Shenzhen KQTD Project (No. KQTD20200820113106007). We acknowledge Molecular Basis of Disease (MBD) at Georgia State University for its support. We thank Dr. Carlos A. Reyes-Garcia, Dr. Emilio Arch-Tirado and his INR-Mexico group, and Dr. Edgar M. Garcia-Tamayo for collecting the Infant Cry database. We also express our great gratitude to Dr. Orion Reyes and Dr. Carlos A. Reyes for providing access to the Baby Chillanto database. We thank all parents, doctors, and nurses who supported the recording of the Baby2020 database.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Ji, C., Jiao, Y., Chen, M., Pan, Y. (2022). Infant Cry Classification Based-On Feature Fusion and Mel-Spectrogram Decomposition with CNNs. In: Pan, X., Jin, T., Zhang, LJ. (eds) Artificial Intelligence and Mobile Services – AIMS 2022. AIMS 2022. Lecture Notes in Computer Science, vol 13729. Springer, Cham. https://doi.org/10.1007/978-3-031-23504-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23503-0
Online ISBN: 978-3-031-23504-7