Abstract
This work addresses the development of features for the automatic classification of infant cry into three categories: neutral, fussing and crying vocalisations. Mel-frequency cepstral coefficients, together with standard functionals computed from them, have long been the most widely used features for all kinds of speech-related tasks, including infant cry classification. However, recent works have introduced alternative filter banks that improve performance and robustness. In this work, the optimisation of a filter bank is proposed for feature extraction, and two other spectrum-based feature sets are compared. The first feature set is obtained by optimising filter banks with an evolutionary algorithm, in order to find a more suitable representation for infant cry classification. In addition, the classification performance of the optimised representation combined with other spectral features, based on the mean log-spectrum and the auditory spectrum, is evaluated. The results show that these feature sets improve performance on the cry classification task.
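The abstract does not specify the filter parametrisation, the genetic operators or the fitness function, so the following is only a minimal sketch of the general idea of evolving a filter bank for cepstral features. Everything in it is an assumption made for illustration: the triangular filters defined by a vector of edge frequencies, the class-separation score used in place of a classifier-based fitness, the truncation selection with Gaussian mutation, and the synthetic three-class spectra produced by the hypothetical helper `toy_spectra`.

```python
import numpy as np

def triangular_filter_bank(edges_hz, n_bins, sample_rate):
    """One triangular filter per interior point of the sorted edge vector."""
    edges = np.sort(np.asarray(edges_hz, dtype=float))
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    bank = np.zeros((edges.size - 2, n_bins))
    for i in range(1, edges.size - 1):
        lo, mid, hi = edges[i - 1], edges[i], edges[i + 1]
        rising = (freqs - lo) / max(mid - lo, 1e-6)   # guard against zero width
        falling = (hi - freqs) / max(hi - mid, 1e-6)
        bank[i - 1] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def cepstral_features(power_spectra, bank, n_coeffs):
    """Log filter-bank energies followed by a DCT-II, as in MFCC-style pipelines."""
    log_energies = np.log(power_spectra @ bank.T + 1e-10)     # (frames, filters)
    n_filters = bank.shape[0]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * m + 1) / (2 * n_filters))   # (coeffs, filters)
    return log_energies @ dct.T

def separation_fitness(edges_hz, spectra, labels, sample_rate, n_coeffs=4):
    """Assumed stand-in fitness: between-class over within-class feature variance."""
    bank = triangular_filter_bank(edges_hz, spectra.shape[1], sample_rate)
    feats = cepstral_features(spectra, bank, n_coeffs)
    classes = np.unique(labels)
    means = np.array([feats[labels == c].mean(axis=0) for c in classes])
    between = means.var(axis=0).sum()
    within = sum(feats[labels == c].var(axis=0).sum() for c in classes)
    return between / (within + 1e-10)

def evolve_filter_bank(spectra, labels, sample_rate, n_filters=6,
                       pop_size=20, generations=15, seed=0):
    """Truncation selection with elitism and Gaussian mutation of edge frequencies."""
    rng = np.random.default_rng(seed)
    nyquist = sample_rate / 2.0
    pop = rng.uniform(0.0, nyquist, size=(pop_size, n_filters + 2))
    history = []
    for gen in range(generations + 1):
        scores = np.array([separation_fitness(ind, spectra, labels, sample_rate)
                           for ind in pop])
        order = np.argsort(scores)[::-1]
        history.append(float(scores[order[0]]))
        if gen == generations:
            break
        parents = pop[order[: pop_size // 2]]          # best half survives unchanged
        children = parents + rng.normal(0.0, 0.05 * nyquist, size=parents.shape)
        pop = np.vstack([parents, np.clip(children, 0.0, nyquist)])
    return np.sort(pop[order[0]]), history

def toy_spectra(n_per_class=30, n_bins=129, sample_rate=16000, seed=1):
    """Synthetic spectra with one hypothetical spectral peak per class."""
    rng = np.random.default_rng(seed)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    spectra, labels = [], []
    for c, peak in enumerate([500.0, 1500.0, 3000.0]):
        shape = np.exp(-0.5 * ((freqs - peak) / 300.0) ** 2)
        spectra.append(shape + rng.uniform(0.0, 0.1, size=(n_per_class, n_bins)))
        labels += [c] * n_per_class
    return np.vstack(spectra), np.array(labels)

if __name__ == "__main__":
    spectra, labels = toy_spectra()
    best_edges, history = evolve_filter_bank(spectra, labels, 16000)
    print("best edge frequencies (Hz):", np.round(best_edges, 1))
    print("fitness, first vs last generation:", history[0], history[-1])
```

Because the best half of each generation is carried over unchanged, the best fitness in this sketch cannot decrease from one generation to the next. In a real experiment the separation score would be replaced by the performance of a classifier trained on the resulting features.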
Notes
1. Neural Systems Lab., Institute for Systems Research, UMCP. http://www.isr.umd.edu/Labs/NSL/.
Acknowledgements
The authors gratefully acknowledge the support of the Agencia Nacional de Promoción Científica y Tecnológica (PICT 2015-0977), the Universidad Nacional del Litoral (CAI+D 50020150100055LI, CAI+D 50020150100059LI, CAI+D 50020150100042LI), and the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Vignolo, L.D., Albornoz, E.M., Martínez, C.E. (2018). Feature Set Optimisation for Infant Cry Classification. In: Simari, G., Fermé, E., Gutiérrez Segura, F., Rodríguez Melquiades, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2018. IBERAMIA 2018. Lecture Notes in Computer Science, vol 11238. Springer, Cham. https://doi.org/10.1007/978-3-030-03928-8_37
DOI: https://doi.org/10.1007/978-3-030-03928-8_37
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03927-1
Online ISBN: 978-3-030-03928-8
eBook Packages: Computer Science, Computer Science (R0)