Abstract
Automatic voice pathology diagnosis is widely investigated by the research community. Most recently proposed solutions combine robust feature descriptors with machine learning algorithms. Despite their success, it is difficult in practice to design handcrafted features that are optimal for a specific classification task. Deep learning approaches, particularly deep Convolutional Neural Networks (CNNs), have recently achieved significant breakthroughs in recognition tasks. In this study, the deep CNN, which has mainly been explored for image recognition, is applied to speech recognition. An approach is proposed for voice pathology recognition using both a deep CNN and a Genetic Algorithm (GA). The CNN weights are initialized using the solutions produced by the GA, which minimizes the classification error and increases the ability to discriminate voice pathologies. Moreover, three popular deep CNN architectures that have been investigated in the literature for image recognition are adapted for voice pathology diagnosis, namely AlexNet, VGG16, and ResNet34. The performance of the hybrid CNN-GA algorithm is compared with that of the conventional CNN and with other approaches based on the hybridization of deep CNNs and meta-heuristic methods. Experimental results show that the proposed method improves voice pathology classification accuracy by 5.4% over the basic CNN and by up to 4.27% over other meta-heuristic-based algorithms. The proposed approach also outperforms state-of-the-art work on the same dataset, with an overall accuracy of 99.37%.
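The idea of letting a GA choose the initial weights can be illustrated with a minimal sketch. The abstract does not specify the chromosome encoding, fitness function, or genetic operators, so everything below is an assumption made for illustration: a tiny linear threshold classifier stands in for the CNN, the fitness is the classification error on toy data, and the names `classification_error` and `evolve_initial_weights` are hypothetical.

```python
import random

def classification_error(weights, samples):
    """Fraction of samples misclassified by a linear threshold unit.
    The last entry of `weights` is the bias (stand-in for a CNN, an assumption)."""
    *w, bias = weights
    wrong = 0
    for features, label in samples:
        score = sum(wi * xi for wi, xi in zip(w, features)) + bias
        predicted = 1 if score > 0 else 0
        wrong += predicted != label
    return wrong / len(samples)

def evolve_initial_weights(samples, n_weights, pop_size=30, generations=40,
                           mutation_rate=0.2, seed=0):
    """Return the weight vector with the lowest error found by a simple GA."""
    rng = random.Random(seed)
    # Each individual is one candidate set of initial weights.
    population = [[rng.uniform(-1, 1) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: rank by fitness and keep the fitter half unchanged (elitism).
        population.sort(key=lambda ind: classification_error(ind, samples))
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            mother, father = rng.sample(elite, 2)
            cut = rng.randrange(1, n_weights)      # one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: perturb some genes with Gaussian noise.
            child = [g + rng.gauss(0, 0.3) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = elite + children
    return min(population, key=lambda ind: classification_error(ind, samples))

# Toy, linearly separable data: two features, label 1 when their sum is positive.
data_rng = random.Random(1)
samples = []
for _ in range(60):
    x = (data_rng.uniform(-1, 1), data_rng.uniform(-1, 1))
    samples.append((x, 1 if x[0] + x[1] > 0 else 0))

best = evolve_initial_weights(samples, n_weights=3)
print("error of GA-selected initial weights:", classification_error(best, samples))
```

In a full pipeline of this kind, the returned vector would presumably be loaded into the network as its starting point before ordinary gradient-based training. That step is an assumption here and is not shown.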
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ghoniem, R.M. (2019). Deep Genetic Algorithm-Based Voice Pathology Diagnostic System. In: Métais, E., Meziane, F., Vadera, S., Sugumaran, V., Saraee, M. (eds) Natural Language Processing and Information Systems. NLDB 2019. Lecture Notes in Computer Science, vol 11608. Springer, Cham. https://doi.org/10.1007/978-3-030-23281-8_18
DOI: https://doi.org/10.1007/978-3-030-23281-8_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23280-1
Online ISBN: 978-3-030-23281-8
eBook Packages: Computer Science, Computer Science (R0)