DOI: 10.1145/1183401.1183442
Article

Accelerator design for protein sequence HMM search

Published: 28 June 2006

ABSTRACT

Profile hidden Markov models (HMMs) are a powerful approach to describing biologically significant functional units, or motifs, in protein sequences. Entire databases of such models are regularly compared to large collections of proteins to recognize motifs in them. Exponentially increasing rates of genome sequencing have caused both protein and model databases to explode in size, placing an ever-increasing computational burden on users of these systems.

Here, we describe an accelerated search system that exploits parallelism in a number of ways. First, the application is functionally decomposed into a pipeline, with distinct compute resources executing each pipeline stage. Second, the first pipeline stage is deployed on a systolic array, which yields significant fine-grained parallelism. Third, for some instantiations of the design, parallel copies of the first pipeline stage are used, further increasing the level of coarse-grained parallelism.

A naïve parallelization of the first-stage computation has serious repercussions for the sensitivity of the search. We present a pair of remedies for this dilemma and quantify the regions of interest within which each approach is most effective. Analytic performance models are used to assess the overall speedup that can be attained relative to a single-processor software solution. Performance improvements of 1 to 2 orders of magnitude are predicted.
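The abstract credits the systolic array with the fine-grained parallelism of the first stage. Assuming that stage performs a Viterbi-style dynamic-programming pass over the profile HMM, a minimal sketch of why a systolic array fits is shown below. It uses a simplified model with only match, insert, and delete states (omitting the flanking and looping states that full tools such as HMMER use), and every name and number in it is hypothetical and not the authors' design: the transition table `T`, `DEFAULT_MATCH_SCORE`, and the functions `viterbi`, `row_major`, and `anti_diagonal`. Each cell reads only its upper, left, and upper-left neighbors, so all cells on one anti-diagonal are independent of each other.

```python
from typing import Dict, Iterable, List, Tuple

NEG_INF = float("-inf")

# Hypothetical log-odds transition scores (higher is better); not from the paper.
T = {"MM": 0.0, "MI": -3.0, "MD": -3.0, "IM": 0.0, "II": -1.0, "DM": 0.0, "DD": -1.0}
# Hypothetical score for a residue that a match column does not list.
DEFAULT_MATCH_SCORE = -4.0
# Cell = (i, j): i residues of the sequence consumed, j model columns passed.
Cell = Tuple[int, int]


def viterbi(seq: str, match_scores: List[Dict[str, float]], order: Iterable[Cell]) -> float:
    """Fill the match (vm), insert (vi) and delete (vd) tables in the given cell
    order and return the best score for aligning all of seq to the whole model.
    An order that reads a cell before it is filled raises TypeError (None + float)."""
    L, K = len(seq), len(match_scores)
    vm = [[None] * (K + 1) for _ in range(L + 1)]
    vi = [[None] * (K + 1) for _ in range(L + 1)]
    vd = [[None] * (K + 1) for _ in range(L + 1)]
    for i, j in order:
        if j == 0:
            # Column 0 holds only the begin state, at row 0.
            vm[i][j] = 0.0 if i == 0 else NEG_INF
            vi[i][j] = vd[i][j] = NEG_INF
        elif i == 0:
            # No residue consumed yet: only delete states are reachable.
            vm[i][j] = vi[i][j] = NEG_INF
            vd[i][j] = max(vm[i][j - 1] + T["MD"], vd[i][j - 1] + T["DD"])
        else:
            # Reads only cells (i-1, j-1), (i-1, j) and (i, j-1).
            emit = match_scores[j - 1].get(seq[i - 1], DEFAULT_MATCH_SCORE)
            vm[i][j] = emit + max(vm[i - 1][j - 1] + T["MM"],
                                  vi[i - 1][j - 1] + T["IM"],
                                  vd[i - 1][j - 1] + T["DM"])
            # Insert emissions are scored 0 (background) in this sketch.
            vi[i][j] = max(vm[i - 1][j] + T["MI"], vi[i - 1][j] + T["II"])
            vd[i][j] = max(vm[i][j - 1] + T["MD"], vd[i][j - 1] + T["DD"])
    return max(vm[L][K], vi[L][K], vd[L][K])


def row_major(L: int, K: int) -> List[Cell]:
    """Sequential software order: one cell at a time, row by row."""
    return [(i, j) for i in range(L + 1) for j in range(K + 1)]


def anti_diagonal(L: int, K: int) -> List[Cell]:
    """Wavefront order: cells with equal i + j do not depend on each other,
    so an array of processing elements could update each diagonal in one step."""
    return [(i, d - i)
            for d in range(L + K + 1)
            for i in range(max(0, d - K), min(L, d) + 1)]


if __name__ == "__main__":
    scores = [{"A": 2.0}, {"C": 2.0}, {"D": 2.0}]  # hypothetical 3-column model
    print(viterbi("ACD", scores, row_major(3, 3)))      # 6.0
    print(viterbi("ACD", scores, anti_diagonal(3, 3)))  # 6.0
```

Both orders produce the same score, but if every cell on a diagonal is updated at once, the wavefront order needs only L + K + 1 steps instead of (L + 1)(K + 1) sequential cell updates. That difference is the kind of fine-grained parallelism a systolic array can exploit.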

Published in

ICS '06: Proceedings of the 20th annual international conference on Supercomputing
June 2006
385 pages
ISBN: 1595932828
DOI: 10.1145/1183401

Copyright © 2006 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States
Acceptance Rates

ICS '06 Paper Acceptance Rate: 37 of 141 submissions, 26%
Overall Acceptance Rate: 584 of 2,055 submissions, 28%