DOI: 10.1145/2037509.2037520
research-article

HPC selection of models of DNA substitution

Published: 21 September 2011

Abstract

Statistical model selection has become an essential step in the estimation of phylogenies from DNA sequence alignments. The program jModelTest offers several strategies for identifying the best-fit model for the data at hand, but for large DNA alignments this task can demand vast computational resources.
This paper presents a High Performance Computing (HPC) adaptation of jModelTest for shared memory multi-core systems and distributed memory cluster platforms. The performance evaluation of this HPC version on a shared memory system and on a cluster shows significant performance advantages, with speedups of up to 39. Such a speedup could reduce the execution time of some analyses from almost one day to about half an hour.
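
The quoted time reduction can be checked against the usual definition of speedup. The sketch below is only a rough consistency check: the symbols T_seq (sequential run time) and T_par (parallel run time) are introduced here for illustration, and the 24-hour bound is an assumed reading of "almost one day", since the abstract gives no exact timings.

```latex
S = \frac{T_{\text{seq}}}{T_{\text{par}}}
\quad\Longrightarrow\quad
T_{\text{par}} = \frac{T_{\text{seq}}}{S}
\le \frac{24 \times 60\ \text{min}}{39} \approx 37\ \text{min}
```

Under that assumption, a sequential analysis of somewhat less than 24 hours would finish in roughly half an hour at the maximum reported speedup of 39.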

Cited By

  • (2014) High-performance computing selection of models of DNA substitution for multicore clusters. International Journal of High Performance Computing Applications, 28:1 (112-125). DOI: 10.1177/1094342013495095. Online publication date: 1-Feb-2014
  • (2012) Quantifying functional heterothallism in the pseudohomothallic ascomycete Neurospora tetrasperma. Fungal Biology, 116:9 (962-975). DOI: 10.1016/j.funbio.2012.06.006. Online publication date: Sep-2012

Published In

CMSB '11: Proceedings of the 9th International Conference on Computational Methods in Systems Biology
September 2011
224 pages
ISBN:9781450308175
DOI:10.1145/2037509

Sponsors

  • TCSIM: IEEE Computer Society Technical Committee on Simulation
  • University Henri-Poincare, France
  • NVIDIA
  • CNRS: Centre National de la Recherche Scientifique
  • Microsoft Research

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Java threads
  2. high performance computing
  3. message passing in Java
  4. nucleotide substitution
  5. performance evaluation
  6. phylogeny

Qualifiers

  • Research-article

Conference

CMSB'11
Sponsor:
  • TCSIM
  • University Henri-Poincare
  • CNRS
  • Microsoft Research
