Large-Scale Neighbor-Joining with NINJA

Wheeler, Travis J.

doi:10.1007/978-3-642-04241-6_31

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5724)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

Neighbor-joining is a well-established hierarchical clustering algorithm for inferring phylogenies. It begins with observed distances between pairs of sequences, and the clustering order depends on a metric related to those distances. The canonical algorithm requires O(n³) time and O(n²) space for n sequences, which precludes application to very large sequence families, e.g. those containing 100,000 sequences. Datasets of this size are available today, and such phylogenies will play an increasingly important role in comparative biology studies. Recent algorithmic advances have greatly sped up neighbor-joining for inputs of thousands of sequences, but are limited to fewer than 13,000 sequences on a system with 4 GB of RAM. In this paper, I describe an algorithm that speeds up neighbor-joining by dramatically reducing the number of distance values that are viewed in each iteration of the clustering procedure, while still computing a correct neighbor-joining tree. This algorithm can scale to inputs larger than 100,000 sequences because of external-memory-efficient data structures. A free implementation may be obtained from http://nimbletwist.com/software/ninja
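
For orientation, the canonical procedure that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of textbook neighbor-joining, not of NINJA's filtering or its external-memory data structures, which the abstract does not specify. It assumes the standard selection criterion Q(i, j) = (n − 2)·d(i, j) − r_i − r_j, where r_i is the sum of distances from cluster i to all active clusters. The function name `neighbor_joining`, the `node0`, `node1`, ... labels, and the edge-list output format are hypothetical choices made for this example.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook O(n^3) neighbor-joining sketch (not the NINJA algorithm).

    names: list of leaf labels; dist: square symmetric matrix (list of lists).
    Returns a list of (node, neighbor, branch_length) edges of an unrooted tree.
    """
    # Copy into a dict-of-dicts so the caller's matrix is left untouched.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    active = list(names)
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        # Row sums r_i over the currently active clusters.
        r = {a: sum(d[a][b] for b in active) for a in active}
        # Full scan of all pairs for the minimum Q(i, j); this scan is what
        # makes each iteration quadratic and the whole procedure cubic.
        i, j = min(
            combinations(active, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        u = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        # Branch lengths from the joined pair to the new node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        edges.append((i, u, li))
        edges.append((j, u, lj))
        # Distances from the new node to every remaining cluster.
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]
    # Connect the last two clusters directly.
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges
```

Calling `neighbor_joining(["a", "b", "c"], matrix)` with a 3×3 symmetric distance matrix returns three edges: two joining a pair of leaves to one internal node, and a final edge from that node to the remaining leaf. The pairwise scan inside the loop is the step that, per the abstract, NINJA avoids doing in full by reducing the number of distance values viewed in each iteration.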

Author information

Authors and Affiliations

Department of Computer Science, The University of Arizona, Tucson, AZ, 85721, USA
Travis J. Wheeler

Editor information

Editors and Affiliations

Center for Bioinformatics and Computational Biology, and Department of Computer Science, University of Maryland, College Park, MD, USA
Steven L. Salzberg
Department of Computer Sciences, The University of Texas at Austin, TX, USA
Tandy Warnow

Cite this paper

Wheeler, T.J. (2009). Large-Scale Neighbor-Joining with NINJA. In: Salzberg, S.L., Warnow, T. (eds) Algorithms in Bioinformatics. WABI 2009. Lecture Notes in Computer Science, vol 5724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04241-6_31

DOI: https://doi.org/10.1007/978-3-642-04241-6_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04240-9
Online ISBN: 978-3-642-04241-6
eBook Packages: Computer Science, Computer Science (R0)
