Accelerator design for protein sequence HMM search

Authors:
Rahul P. Maddimsetty, Washington University, St. Louis, MO
Jeremy Buhler, Washington University, St. Louis, MO
Roger D. Chamberlain, Washington University, St. Louis, MO
Mark A. Franklin, Washington University, St. Louis, MO
Brandon Harris, Washington University, St. Louis, MO

ICS '06: Proceedings of the 20th annual international conference on Supercomputing, June 2006, Pages 288–296
https://doi.org/10.1145/1183401.1183442

Published: 28 June 2006

ABSTRACT

Profile hidden Markov models (HMMs) are a powerful approach to describing biologically significant functional units, or motifs, in protein sequences. Entire databases of such models are regularly compared to large collections of proteins to recognize motifs in them. Exponentially increasing rates of genome sequencing have caused both protein and model databases to explode in size, placing an ever-increasing computational burden on users of these systems.

Here, we describe an accelerated search system that exploits parallelism in a number of ways. First, the application is functionally decomposed into a pipeline, with distinct compute resources executing each pipeline stage. Second, the first pipeline stage is deployed on a systolic array, which yields significant fine-grained parallelism. Third, for some instantiations of the design, parallel copies of the first pipeline stage are used, further increasing the level of coarse-grained parallelism.

A naïve parallelization of the first stage computation has serious repercussions for the sensitivity of the search. We present a pair of remedies to this dilemma and quantify the regions of interest within which each approach is most effective. Analytic performance models are used to assess the overall speedup that can be attained relative to a single-processor software solution. Performance improvements of 1 to 2 orders of magnitude are predicted.
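To make the pipeline argument concrete, the following is a minimal sketch of a generic steady-state throughput model in Python. It is not the analytic model from the paper: the `Stage` class, the function names, and every timing value are hypothetical, chosen only to illustrate how a pipeline is bounded by its slowest stage and how parallel copies of the first stage can shift the bottleneck. It assumes that copies of a stage scale linearly and that every item passes through every stage.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One pipeline stage (hypothetical model, not taken from the paper)."""
    name: str
    seconds_per_item: float  # service time of a single copy of this stage
    copies: int = 1          # number of parallel copies of this stage

    def throughput(self) -> float:
        # Items per second, assuming copies scale linearly (an idealization).
        return self.copies / self.seconds_per_item


def pipeline_throughput(stages):
    # In steady state a pipeline runs at the rate of its slowest stage.
    return min(stage.throughput() for stage in stages)


def speedup(stages, baseline_seconds_per_item):
    # Baseline: a single processor that needs baseline_seconds_per_item
    # to carry one item through all of the work on its own.
    return pipeline_throughput(stages) * baseline_seconds_per_item


# Illustrative numbers only; none of them come from the paper.
BASELINE = 4.0  # seconds per item for the single-processor software solution

one_copy = [Stage("stage 1", 0.25), Stage("stage 2", 0.125)]
four_copies = [Stage("stage 1", 0.25, copies=4), Stage("stage 2", 0.125)]

print(speedup(one_copy, BASELINE))     # stage 1 is the bottleneck
print(speedup(four_copies, BASELINE))  # stage 2 becomes the bottleneck
```

With these made-up values, adding copies of the first stage helps only until the second stage becomes the limit, which is why a model of this kind has to account for every stage rather than the accelerated one alone.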


Published in
ICS '06: Proceedings of the 20th annual international conference on Supercomputing
June 2006
385 pages
ISBN: 1595932828
DOI: 10.1145/1183401
General Chairs: Greg Egan (Monash University), Yoichi Muraoka (Waseda University)
Copyright © 2006 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags: HMMER, hidden Markov model, protein motif
Acceptance Rates
ICS '06 Paper Acceptance Rate: 37 of 141 submissions, 26%
Overall Acceptance Rate: 584 of 2,055 submissions, 28%