Abstract
The correct classification of transposable elements (TEs) present in genomes is crucial to understanding the real role of these elements and their consequences for the organisms. Here we present a method that classifies TEs by training a convolutional neural network (CNN) to label them into classes, orders and superfamilies. Unlike previous works in the literature, the proposed method neither searches for sequence similarities nor uses traditional machine learning classifiers. Instead, the CNN itself automatically extracts features and classifies the sequences. We performed an extensive experimental evaluation, analyzing the proposed method under different scenarios. It was able to classify TE sequences from various datasets into 9 different superfamilies and obtained an accuracy of 94%. We also compare the proposed method with other state-of-the-art classification tools (PASTEC, REPCLASS and TECLASS); our method presents very promising results, outperforming PASTEC and REPCLASS.
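To make the idea of a CNN extracting features directly from a sequence more concrete, the following is a minimal sketch in plain Python. It is not the architecture used in the paper: the one-hot encoding, the single hand-set filter and the helper names (`one_hot`, `conv1d`, `relu_max_pool`) are all assumptions chosen for illustration. In a trained network the filter weights would be learned from labelled TE sequences rather than fixed by hand.

```python
# Illustrative sketch only: the encoding and the filter are assumptions,
# not the network described in the paper.
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Bases outside ALPHABET (e.g. N) become an all-zero row.
    """
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def conv1d(encoded, kernel):
    """Slide a k x 4 kernel over an L x 4 input (no padding, stride 1)."""
    k = len(kernel)
    out = []
    for start in range(len(encoded) - k + 1):
        total = 0.0
        for i in range(k):
            for j in range(len(ALPHABET)):
                total += encoded[start + i][j] * kernel[i][j]
        out.append(total)
    return out


def relu_max_pool(activations):
    """Apply ReLU, then global max pooling: one feature per filter."""
    return max((max(a, 0.0) for a in activations), default=0.0)


# Hypothetical filter that responds most strongly to the motif "TATA".
tata_filter = one_hot("TATA")
activations = conv1d(one_hot("GGCTATAAGC"), tata_filter)
feature = relu_max_pool(activations)
```

Here `feature` is largest when the motif appears somewhere in the input, regardless of position. A classifier of the kind the abstract describes would combine many such learned features in its final layers to predict a class, order or superfamily.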
Acknowledgements
The authors thank Prof. Dr Douglas Silva Domingues and Ms. Daniel Longhi Fernandes Pedro for all their comments on this work. This work has been supported by CNPq (grants #372528/2018-0, #431668/2016-7, #422811/2016-5); CAPES; Araucaria Foundation; SETI; PPGBIOINFO; and UTFPR.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
da Cruz, M.H.P., Saito, P.T.M., Paschoal, A.R., Bugatti, P.H. (2019). Classification of Transposable Elements by Convolutional Neural Networks. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2019. Lecture Notes in Computer Science, vol 11509. Springer, Cham. https://doi.org/10.1007/978-3-030-20915-5_15
DOI: https://doi.org/10.1007/978-3-030-20915-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20914-8
Online ISBN: 978-3-030-20915-5
eBook Packages: Computer Science, Computer Science (R0)