Abstract
Biological sequence classification is a key task in bioinformatics. For research labs, classifying unknown biological sequences is essential for identifying, grouping and studying organisms and their evolution. This paper compares three recent deep learning approaches to taxonomic classification on the 16S rRNA barcode dataset. Three CNN architectures are compared in combination with three feature representations: k-mer spectral representation, Frequency Chaos Game Representation (FCGR) and character-level integer encoding. The experimental results show that representations that retain positional information about the nucleotides in a sequence perform considerably better, with accuracies reaching 91.6% on the most fine-grained classification task.
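As a rough illustration of the three feature representations named in the abstract, the sketch below builds each one for a short DNA string. It is not the authors' implementation: the function names, the corner assignment used for the FCGR, the integer codes, and the choice of k are all assumptions made for the example, and conventions differ between papers.

```python
from itertools import product

import numpy as np

ALPHABET = "ACGT"


def kmer_spectrum(seq: str, k: int) -> np.ndarray:
    """Vector of k-mer counts; where each k-mer occurred is discarded."""
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    counts = np.zeros(len(index), dtype=np.int64)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing ambiguous bases such as N
            counts[j] += 1
    return counts


def fcgr(seq: str, k: int) -> np.ndarray:
    """2^k x 2^k grid of k-mer counts, laid out by the chaos game."""
    # Assumed corner assignment as (x, y) bits; other layouts are in use.
    corner = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
    size = 2 ** k
    grid = np.zeros((size, size), dtype=np.int64)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in corner for base in kmer):
            continue  # skip k-mers containing ambiguous bases
        x = y = 0
        # In the chaos game the last base selects the coarsest quadrant,
        # so it supplies the most significant bit of each coordinate.
        for base in reversed(kmer):
            bx, by = corner[base]
            x = (x << 1) | bx
            y = (y << 1) | by
        grid[y, x] += 1
    return grid


def integer_encode(seq: str, length: int) -> np.ndarray:
    """Character-level encoding: one integer per base, order preserved."""
    # Assumed codes; 0 is reserved for padding and unknown symbols.
    codes = {"A": 1, "C": 2, "G": 3, "T": 4}
    out = np.zeros(length, dtype=np.int64)
    for i, base in enumerate(seq[:length]):
        out[i] = codes.get(base, 0)
    return out


if __name__ == "__main__":
    example = "ACGTAC"
    print(kmer_spectrum(example, 2))
    print(fcgr(example, 2))
    print(integer_encode(example, 8))
```

The k-mer spectrum is a flat count vector, the FCGR arranges the same kind of counts on a square grid that a 2D CNN can consume, and the integer encoding keeps one value per nucleotide in sequence order, which suits a 1D CNN.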
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Helaly, M.A., Rady, S., Aref, M.M. (2020). Convolutional Neural Networks for Biological Sequence Taxonomic Classification: A Comparative Study. In: Hassanien, A., Shaalan, K., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2019. AISI 2019. Advances in Intelligent Systems and Computing, vol 1058. Springer, Cham. https://doi.org/10.1007/978-3-030-31129-2_48
DOI: https://doi.org/10.1007/978-3-030-31129-2_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31128-5
Online ISBN: 978-3-030-31129-2
eBook Packages: Intelligent Technologies and Robotics (R0)