DeepED: A Deep Learning Framework for Estimating Evolutionary Distances

Liu, Zhuangzhuang; Ren, Mingming; Niu, Zhiheng; Wang, Gang; Liu, Xiaoguang

doi:10.1007/978-3-030-61609-0_26

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12396)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

An evolutionary distance is the number of substitutions per site between two aligned nucleotide or amino acid sequences; it reflects divergence time and is highly significant for phylogenetic inference. Over the past several decades, many molecular evolution models have been proposed for estimating evolutionary distances. Most of these models rest on a number of assumptions, and those assumptions agree well with some real-world data but not all. To relax these assumptions and improve the accuracy of evolutionary distance estimation, this paper proposes DeepED (Deep learning method to estimate Evolutionary Distances), a framework built on Deep Neural Networks (DNNs) that estimates evolutionary distances for aligned DNA sequence pairs. Its purpose-designed structure enables it to handle long, variable-length sequences and to find important segments in a sequence. The network models are trained on reliable real-world data that includes highly credible phylogenetic inferences. Experimental results show that DeepED models achieve an accuracy of up to 0.98 (R-squared), outperforming traditional methods.
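
For orientation, the kind of traditional, assumption-based estimate the abstract contrasts with can be sketched with the Jukes-Cantor (JC69) model, which assumes equal base frequencies and equal substitution rates and corrects the observed proportion of differing sites p to d = -(3/4) ln(1 - 4p/3). The sketch below is not the DeepED method, and the abstract does not say which baselines were compared; the function names, the gap handling, and the R-squared helper are illustrative assumptions.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned DNA sequences.

    Assumption: sites where either sequence has a gap or an ambiguous
    base (anything other than A, C, G, T) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor estimate of substitutions per site.

    The correction is undefined once p reaches 3/4 (saturation),
    in which case infinity is returned.
    """
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # One differing site out of ten gives p = 0.1.
    print(jc69_distance("ACGTACGTAC", "ACGTACGTAT"))
```

The correction grows faster than p because multiple substitutions at the same site hide one another; a learned estimator such as the one proposed here would replace the closed-form correction with a model fitted to reference distances, and R-squared would then measure how well its predictions track those references.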

This work is partially supported by National Science Foundation of China (U1833114, 61872201, 61702521) and Science and Technology Development Plan of Tianjin (18ZXZNGX00140, 18ZXZNGX00200).

Z. Liu and M. Ren contributed equally to this work.

Author information

Authors and Affiliations

College of Computer Science, Nankai University, Tianjin, China
Zhuangzhuang Liu, Mingming Ren, Zhiheng Niu, Gang Wang & Xiaoguang Liu

Corresponding authors

Correspondence to Gang Wang or Xiaoguang Liu.

Editor information

Editors and Affiliations

Department of Applied Informatics, Comenius University in Bratislava, Bratislava, Slovakia
Igor Farkaš
Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
Paolo Masulli
Department of Informatics, University of Hamburg, Hamburg, Germany
Stefan Wermter

About this paper

Cite this paper

Liu, Z., Ren, M., Niu, Z., Wang, G., Liu, X. (2020). DeepED: A Deep Learning Framework for Estimating Evolutionary Distances. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science, vol 12396. Springer, Cham. https://doi.org/10.1007/978-3-030-61609-0_26

DOI: https://doi.org/10.1007/978-3-030-61609-0_26
Published: 14 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61608-3
Online ISBN: 978-3-030-61609-0
eBook Packages: Computer Science, Computer Science (R0)
