Feature Set Optimisation for Infant Cry Classification

Vignolo, Leandro D.; Albornoz, Enrique Marcelo; Martínez, César Ernesto

doi:10.1007/978-3-030-03928-8_37

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11238)

Included in the following conference series:

Ibero-American Conference on Artificial Intelligence

Abstract

This work deals with the development of features for the automatic classification of infant cry, considering three categories: neutral, fussing and crying vocalisations. Mel-frequency cepstral coefficients, together with standard functionals computed from them, have long been the most widely used features for all kinds of speech-related tasks, including infant cry classification. However, recent works have introduced alternative filter banks that lead to performance improvements and increased robustness. In this work, the optimisation of a filter bank is proposed for feature extraction, and two other spectrum-based feature sets are compared. The first set of features is obtained by optimising filter banks with an evolutionary algorithm, in order to find a speech representation better suited to infant cry classification. Moreover, the classification performance of the optimised representation combined with other spectral features, based on the mean log-spectrum and the auditory spectrum, is evaluated. The results show that these feature sets are able to improve performance on the cry classification task.
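As a rough illustration of the kind of representation the abstract describes, the sketch below computes cepstral coefficients from a triangular filter bank whose centre frequencies and bandwidths are free parameters, plus a mean log-spectrum vector. It is a minimal sketch under stated assumptions, not the authors' implementation: the (centre, width) parameterisation, the frame sizes, the function names and the random test signal are all hypothetical, and the evolutionary search itself is not shown. An optimiser would repeatedly propose new `centres_hz` and `widths_hz` vectors and score each one by the accuracy of a classifier trained on the resulting features.

```python
import numpy as np

def triangular_filter_bank(centres_hz, widths_hz, n_fft, sample_rate):
    """Build one triangular filter per (centre, width) pair over the rFFT bins."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    centres = np.asarray(centres_hz, dtype=float)[:, None]
    half_widths = np.asarray(widths_hz, dtype=float)[:, None] / 2.0
    # Each triangle is 1 at its centre and falls linearly to 0 at centre +/- half width.
    return np.clip(1.0 - np.abs(freqs[None, :] - centres) / half_widths, 0.0, None)

def frame_power_spectrum(signal, frame_len, hop):
    """Split the signal into Hamming-windowed frames and return their power spectra."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n)[None, :] + 0.5) / n)
    return x @ basis.T

def cepstral_features(signal, sample_rate, centres_hz, widths_hz,
                      frame_len=400, hop=160, n_coeffs=13):
    """Cepstral coefficients per frame from a parameterised filter bank."""
    power = frame_power_spectrum(signal, frame_len, hop)
    bank = triangular_filter_bank(centres_hz, widths_hz, frame_len, sample_rate)
    log_energies = np.log(power @ bank.T + 1e-10)  # small offset avoids log(0)
    return dct_ii(log_energies, n_coeffs)

def mean_log_spectrum(signal, frame_len=400, hop=160):
    """Log power spectrum averaged over all frames of the recording."""
    power = frame_power_spectrum(signal, frame_len, hop)
    return np.log(power + 1e-10).mean(axis=0)

# Hypothetical usage: one second of noise standing in for a cry recording.
sample_rate = 16000
signal = np.random.default_rng(0).standard_normal(sample_rate)
centres_hz = np.linspace(200.0, 7000.0, 20)   # candidate filter centres
widths_hz = np.full(20, 600.0)                # candidate filter bandwidths
feats = cepstral_features(signal, sample_rate, centres_hz, widths_hz)
mls = mean_log_spectrum(signal)
```

With mel-spaced centres and matching bandwidths this reduces to an MFCC-like front end, which is why a filter bank is a natural object to optimise: the search space contains the standard representation as one point among many.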

Notes

1.
Neural Systems Lab., Institutes for Systems Research, UMCP. http://www.isr.umd.edu/Labs/NSL/.

Acknowledgements

The authors gratefully acknowledge the support of the Agencia Nacional de Promoción Científica y Tecnológica (with PICT 2015-0977), the Universidad Nacional del Litoral (with CAI+D 50020150100055LI, CAI+D 50020150100059LI, CAI+D 50020150100042LI), and the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) from Argentina.

Author information

Authors and Affiliations

Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), Facultad de Ingeniería y Cs. Hídricas, Universidad Nacional del Litoral CC217, Ciudad Universitaria, Paraje El Pozo, S3000, Santa Fe, Argentina
Leandro D. Vignolo, Enrique Marcelo Albornoz & César Ernesto Martínez
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
Leandro D. Vignolo & Enrique Marcelo Albornoz
Laboratorio de Cibernética, Facultad de Ingeniería, Universidad Nacional de Entre Ríos, Entre Ríos, Argentina
César Ernesto Martínez

Corresponding author

Correspondence to Leandro D. Vignolo.

Editor information

Editors and Affiliations

Universidad Nacional del Sur, Bahía Blanca, Buenos Aires, Argentina
Guillermo R. Simari
University of Madeira, Funchal, Portugal
Eduardo Fermé
Universidad Nacional de Piura, Castilla-Piura, Peru
Flabio Gutiérrez Segura
Universidad Nacional de Trujillo, Trujillo, Peru
José Antonio Rodríguez Melquiades

About this paper

Cite this paper

Vignolo, L.D., Albornoz, E.M., Martínez, C.E. (2018). Feature Set Optimisation for Infant Cry Classification. In: Simari, G., Fermé, E., Gutiérrez Segura, F., Rodríguez Melquiades, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2018. IBERAMIA 2018. Lecture Notes in Computer Science, vol 11238. Springer, Cham. https://doi.org/10.1007/978-3-030-03928-8_37

DOI: https://doi.org/10.1007/978-3-030-03928-8_37
Published: 09 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03927-1
Online ISBN: 978-3-030-03928-8
eBook Packages: Computer Science, Computer Science (R0)
