
Feature Set Optimisation for Infant Cry Classification

  • Conference paper
Advances in Artificial Intelligence - IBERAMIA 2018 (IBERAMIA 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11238)


Abstract

This work deals with the development of features for the automatic classification of infant cry, considering three categories: neutral, fussing and crying vocalisations. Mel-frequency cepstral coefficients (MFCCs), together with standard functionals computed from them, have long been the most widely used features for all kinds of speech-related tasks, including infant cry classification. However, recent works have introduced alternative filter banks that lead to performance improvements and increased robustness. In this work, the optimisation of a filter bank for feature extraction is proposed, and two additional spectrum-based feature sets are compared. The first feature set is obtained by optimising filter banks with an evolutionary algorithm, in order to find a speech representation better suited to infant cry classification. In addition, the classification performance of the optimised representation is evaluated in combination with other spectral features based on the mean log-spectrum and the auditory spectrum. The results show that these feature sets can improve performance on the cry classification task.
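The pipeline outlined in the abstract (filter-bank energies, a cepstral transform, utterance-level functionals and an evolutionary search over the filter bank) can be illustrated with a small, self-contained sketch. It is not the authors' method and rests on several assumptions: filters are triangular and encoded directly by their edge positions in FFT-bin units; the search is a simple mutation-only evolutionary loop with elitist truncation selection; only mean and standard deviation are used as functionals; and the fitness function, which in the paper's setting would be the performance of a cry classifier, is a hypothetical callable supplied by the caller. All function names (`triangular_filterbank`, `cepstral_coefficients`, `functionals`, `mutate`, `evolve_filterbank`) are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence


def triangular_filterbank(edges: Sequence[float], n_bins: int) -> List[List[float]]:
    """Triangular filters defined by sorted edge positions given in FFT-bin units.

    Filter i rises from edges[i] to a peak at edges[i + 1] and falls back to zero
    at edges[i + 2], so len(edges) points define len(edges) - 2 filters.
    """
    filters = []
    for i in range(len(edges) - 2):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))  # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))  # falling slope
            else:
                row.append(0.0)
        filters.append(row)
    return filters


def cepstral_coefficients(power_spectrum: Sequence[float],
                          filters: List[List[float]], n_coeffs: int) -> List[float]:
    """Log filter-bank energies followed by an unnormalised DCT-II, as in MFCC extraction."""
    log_energies = [
        math.log(sum(w * p for w, p in zip(row, power_spectrum)) + 1e-10)  # floor avoids log(0)
        for row in filters
    ]
    m = len(log_energies)
    return [
        sum(e * math.cos(math.pi * n * (j + 0.5) / m) for j, e in enumerate(log_energies))
        for n in range(n_coeffs)
    ]


def functionals(frames: List[List[float]]) -> List[float]:
    """Per-coefficient mean and standard deviation over all frames of one recording."""
    n = len(frames)
    out = []
    for column in zip(*frames):
        mean = sum(column) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
        out.extend([mean, std])
    return out


def mutate(edges: List[float], n_bins: int, sigma: float, rng: random.Random) -> List[float]:
    """Gaussian perturbation of every edge, clipped to the spectrum range and re-sorted."""
    return sorted(min(max(e + rng.gauss(0.0, sigma), 0.0), n_bins - 1.0) for e in edges)


def evolve_filterbank(fitness: Callable[[List[float]], float], n_filters: int, n_bins: int,
                      generations: int = 50, pop_size: int = 10, sigma: float = 2.0,
                      seed: int = 0) -> List[float]:
    """Mutation-only evolutionary search with elitist truncation selection."""
    rng = random.Random(seed)
    # Start from evenly spaced edges, i.e. a linear-frequency filter bank.
    base = [i * (n_bins - 1) / (n_filters + 1) for i in range(n_filters + 2)]
    population = [base] + [mutate(base, n_bins, sigma, rng) for _ in range(pop_size - 1)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Keep the better half, refill the rest with mutated copies of survivors.
        parents = sorted(population, key=fitness, reverse=True)[: max(1, pop_size // 2)]
        children = [mutate(rng.choice(parents), n_bins, sigma, rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best


if __name__ == "__main__":
    n_bins = 32
    rng = random.Random(1)
    # Random spectra stand in for the STFT power frames of a real cry recording.
    frames = [[rng.random() for _ in range(n_bins)] for _ in range(20)]

    def toy_fitness(edges: List[float]) -> float:
        # Hypothetical placeholder: a real setup would score classification performance;
        # here we merely favour filters placed in the low-frequency region.
        return -sum(edges[1:-1])

    edges = evolve_filterbank(toy_fitness, n_filters=4, n_bins=n_bins)
    bank = triangular_filterbank(edges, n_bins)
    features = functionals([cepstral_coefficients(f, bank, n_coeffs=3) for f in frames])
    print(len(features))  # 3 coefficients x (mean, std) = 6
```

Encoding the filters by their raw edge positions keeps the example short; richer parameterisations (for instance, smooth curves that warp the frequency axis) reduce the number of free parameters and are a common alternative. Because the search is elitist, the returned filter bank is never worse, under the supplied fitness, than the evenly spaced starting bank.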


Notes

  1. Neural Systems Lab., Institute for Systems Research, UMCP. http://www.isr.umd.edu/Labs/NSL/.


Acknowledgements

The authors gratefully acknowledge the support of the Agencia Nacional de Promoción Científica y Tecnológica (PICT 2015-0977), the Universidad Nacional del Litoral (CAI+D 50020150100055LI, CAI+D 50020150100059LI and CAI+D 50020150100042LI) and the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina.

Author information


Correspondence to Leandro D. Vignolo.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Vignolo, L.D., Albornoz, E.M., Martínez, C.E. (2018). Feature Set Optimisation for Infant Cry Classification. In: Simari, G., Fermé, E., Gutiérrez Segura, F., Rodríguez Melquiades, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2018. IBERAMIA 2018. Lecture Notes in Computer Science, vol 11238. Springer, Cham. https://doi.org/10.1007/978-3-030-03928-8_37


  • DOI: https://doi.org/10.1007/978-3-030-03928-8_37


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03927-1

  • Online ISBN: 978-3-030-03928-8

  • eBook Packages: Computer Science, Computer Science (R0)
